From 06d85c06fa42d3a6272084ca50d7290c3a3ac403 Mon Sep 17 00:00:00 2001
From: dimakarp1996
Date: Tue, 5 Jul 2022 18:14:27 +0300
Subject: [PATCH 001/309] Dev (#13)

* feat: Python pipeline wrappers (#1491)
* feat: python pipeline wrappers
* minor docstring fixes
* docs: minor docstring edit
Co-authored-by: Maxim Talimanchuk
* refactor: Remove deeppavlov.configs.elmo (#1498)
* removed elmo config files
* removed elmo_file_paths_iterator, elmo_model and file_paths_reader
* refactor: returned file_paths_reader
* docs: newlines in file_paths_reader docstring
* refactor: remove deeppavlov.configs.skills (#1499)
* remove: deeppavlov.configs.skills
* delete: aiml_skill component
* remove: rasa_skill component
* remove: DSLSkill component
* docs: removed skills from docs apiref
* Feat/glue superglue update (#1508)
* Add wnli config
* Update copa config
* Fix path
* Fix record path
* Exclude train from evaluation
* Exclude train from evaluation
* add ranker
* update ranker
* feat: deeppavlov version update
Co-authored-by: Fedor Ignatov
Co-authored-by: slowwavesleep <44175589+slowwavesleep@users.noreply.github.com>
* fix: upload DeepPavlov BERT models with MLM & NSP heads parameters (#1502)
* fix: update urls to fixed BERT models
* fix: table in pretrained_vectors docpage
* fix: change bert_config_path to .../config.json
* fix: add tokenizer_config.json, config.json to BERT-sentence models
* fix: en_core_web_sm load error during tests (#1524)
* fix: en_core_web_sm loading error
* docs: fix comments grammar
* Minor change: fix typo (#1517)
* remove: Unnecessary models and components (#1523)
* remove: unpopular configs and components
* refactor: squad_ru_torch_bert -> squad_ru_bert + squad_ru_bert_infer moved to torch
* refactor: squad_torch_bert[_infer] -> squad_bert[_infer]
* refactor: ner_rus_bert_torch -> ner_rus_bert
* refactor: ner_ontonotes_bert[_mult]_torch -> ner_ontonotes_bert[_mult]
* refactor: ner_conll2003_torch_bert -> ner_conll2003_bert
* insults_kaggle_bert_torch -> insults_kaggle_bert
* refactor: ranking_ubuntu_v2_torch_bert_uncased -> ranking_ubuntu_v2_bert_uncased
* Revert "refactor: squad_torch_bert[_infer] -> squad_bert[_infer]"
  This reverts commit fb5bfed8a12519c83e718015ea564ad7176956d0.
* Revert "refactor: ner_conll2003_torch_bert -> ner_conll2003_bert"
  This reverts commit 7fee5e57feb5970f4019fec73d98f026a05d3f6a.
* Revert "refactor: ranking_ubuntu_v2_torch_bert_uncased -> ranking_ubuntu_v2_bert_uncased"
  This reverts commit e656079b70ae6260af065ea47ce37d9ff46c73dc.
* Revert "refactor: ner_ontonotes_bert[_mult]_torch -> ner_ontonotes_bert[_mult]"
  This reverts commit bb41a588170ac79700779442b793b4a46de4318d.
* remove: jupyter example notebooks
* remove: asr and tts components and models (#1526)
* feat: python 3.8/3.9 support (#1525)
* refactor: dependencies weakened to allow running deeppavlov on python 3.8/3.9
* fix: joblib import and doc build errors
* feat: added python3 tests and lowered pytorch version upper bound
* fix: Jenkinsfile venv6 usage
* refactor: reverted minor jenkinsfile changes
* feat: docker containers initial commit
* fix: Dockerfile and cmd.sh errors
* feat: call container tests from jenkinsfile
* fix: added entrypoint to image
* feat: printenv in jenkinsfile and cmd.sh for debug purposes
* fix: tfidf_logreg_en_faq compatibility with newer scikit-learn versions
* fix: tfidf_logreg_autofaq compatibility with sklearn 0.24+
* fix: removed test_tf_layers for python3.8+
* refactor: torch.nn.modules.module.ModuleAttributeError -> AttributeError
* feat: skip tensorflow-based model tests on python 3.8/3.9
* fix: locales for hdt installation
* fix: added pybind11 to image to fix kbqa/el
* refactor: building python from source in docker images
* fix: escape dollar sign in the Jenkinsfile
* fix: added pip upgrade to fix cryptography installation error
* fix: RECORD file not found error for installation of the older pybind11
* fix: skip_tf_config for configs located in test_src_dir
* fix: en_ranker_pop_wiki_test
* fix: update tfidf_logreg_autofaq_psearch logreg pkl to newer sklearn version
* refactor: spacy version update
* feat: added build-essential to dockerfile and updated kenlm to fix some tests
* fix: lxml python3.9 build
* Using Transformers 4.6 with torch_bert_ranker (#1532)
* Update transformer version in bert_ranker
* delete bert_config_file from .json
* remove: transformers28.txt from requirements
Co-authored-by: Fedor Ignatov
* Remove/zh squad (#1534)
* Remove zh squad
* Remove tests
* Fix docs
Co-authored-by: Fedor Ignatov
* remove: elmo (#1533)
* Remove elmo and elmo classifiers
* Delete elmo tests
* Fix docs
* docs: returned metric and dataset name for elmo
* Add Python code sample for ELMo
* fix: docs build
* docs: removed elmo model from README.md
* docs: removed team
* docs: added 3.8 and 3.9 support info
* refactor: jinja2 requirement moved from common requirements to docs
Co-authored-by: Fedor Ignatov
* Remove/rankers (#1537)
* Remove ranking
* Fix config
* Remove bert_sep_ranker
* Remove bert_sep_ranker
* Fix docs
* Clean up rankers
* Restore rel_ranking
* Fix references order
* Fix docs
Co-authored-by: Fedor Ignatov
* feat: Checking test containers exit codes (#1543)
* feat: checking containers exit code
* fix: torch_transformers_squad[_infer] added to requirements_registry
* feat: docker network prune
* fix: false non-zero exit at the end of the tests stage
* refactor: network prune -> network rm specific network
* fix: wrong network name in the finally section
* feat: added pre-loaded wikipedia_100K_vocab to brillmoore_wikitypos_en
* fix: docker network cleaning
* fix: config view (#1529)
* fix: config view
* Update ner.rst
Co-authored-by: Fedor Ignatov
* remove: telegram, msbot, alexa, alice connectors (#1548)
* Delete Telegram
* delete msbot, alexa, and alice connectors
* delete images and functions used in removed connectors, revert some changes in README.md
* update contribution_guide
* update contribution_guide.rst
* refactor: small formatting changes and contribution guide update
Co-authored-by: Fedor Ignatov
* feat: aliases (#1547)
* feat: model aliasing mechanism
* refactor: deprecation message
* NER migrate to PyTorch (#1545)
* add crf to ner
* remove tf ner configs
* update ner configs
* update configs with probas
* update ner docs
* remove lstm ner model
* reformat
* refactor
* replace allennlp with torchcrf
* fix: torchcrf added to autodoc_mock_imports list
* crf fixes
* refactor
* feat: ner and sentseg aliases
* add ner_collection3_bert to tests
* fix tests
Co-authored-by: Fedor Ignatov
* remove: gobot (#1544)
* remove: gobot configs, docs and part of py files
* remove: all remaining files from deeppavlov.models.gobot
* remove: gobot from README.md and requirements_registry
* entity linking to pytorch and reduce ram and gpu (#1516)
* lite entity linking
* fixes
* kbqa to pytorch and lite
* el and kbqa updates
* remove redundant components
* update outputs of ner configs
* update components registry
* reformat
* docstrings
* rename configs
* update tests
* update docs
* update docs
* add init.py
* refactor
* fix tests
* fix tests
* fix requirements
* fixes
* fix kbqa requirements and downloads
* fix downloads
* update requirements
* fix tests
* fix downloads
* feat: kbqa aliases
* remove: rel_ranking_infer requirement_registry duplicate
* refactor
* refactor
* update typing
* refactor
* refactor
* fixes
* refactor: license, import order, etc.
* refactor: removed ranking training configs, devices type changed
Co-authored-by: Fedor Ignatov
* remove: mt-bert (#1560)
* Replace tf squad with pytorch configs (#1539)
* replace tf squad with torch
* fixes
* infer in call method
* update redundant configs
* reformat, update config names in odqa
* fix config names in tests
* refactor
* update configs parameters
* update squad configs in tests
* refactor
* add score from 0 to 1
* fix docs
* update tests
* fix docs
* Update squad_ru_bert.json
* fix multi-gpu inference
* fix multi-gpu inference
* fix squad training
* odqa fixes
* refactor
* refactor: import order
* refactor: squad_preprocessor imports
* refactor
* update config names
* remove redundant squad configs
* update docs
* fix: removed nonexistent configs from test_quick_start
* feat: squad aliases
* loading in base class
* refactor
* fix torch model
* remove lang
* remove lang
* refactor: removed redundant newlines in two configs
* remove strict load flag
* rename configs, add squad2 config
* update docs, remove redundant config
* feat: instafail
* replace checkpoints made with the old transformers version with new checkpoints
* rename configs
* refactor: docs and redundant requirements
* add sent_tokenize
Co-authored-by: Fedor Ignatov
* remove: 35 unused components (#1563)
* update: registry.json
* remove: matplotlib from rured_reader
* refactor: updated requirements_registry and some requirement txt files
* remove: bilstm_nn component
* remove: bow component
* remove: dialog_db_result_iterator component
* remove: dialog_iterator component
* remove: dialog_indexing_iterator component
* remove: dstc_slotfilling component
* remove: dstc2_intents_iterator component
* remove: dstc2_ner_iterator component
* remove: sqlite_database component
* remove: ubuntu_v2_mt_reader component
* remove: siamese_reader component
* remove: kvret_reader component
* remove: kvret_dialog_iterator component
* remove: rel_ranker component
* remove: slotfill_raw[_rasa] components
* remove: simple_dstc2_reader component
* remove: md_yaml_dialogs_reader component
* remove: dstc2_reader component
* remove: char_splitter component
* remove: char_splitting_lowercase_preprocessor component
* remove: convert_ids2tags component
* remove: two_sentences_emb component
* remove: ru_sent_tokenizer component
* remove: random_emb_mat component
* remove: pymorphy_russian_lemmatizer component
* remove: ner_preprocessor component
* remove: multitask_iterator and multitask_reader components
* remove: kbqa_reader component
* remove: file_paths_reader and file_paths_iterator components
* remove: emb_mat_assembler
* remove: dictionary_vectorizer component
* remove: capitalization_featurizer component
* remove: glove component
* fix: nvidia source cert error
* fix: spacy models added to some components in requirements registry
* refactor: newline at the end of ru_core_news_sm
* remove: intent-catcher (#1564)
* refactor: all classifier models to pytorch (#1565)
* Replace tensorflow syntax parser with pytorch syntax parser from slovnet (#1569)
Co-authored-by: Fedor Ignatov
* remove: syntaxparser and morphotagger (#1573)
* refactor: train logging (#1572)
Co-authored-by: yurakuratov <9271630+yurakuratov@users.noreply.github.com>
* feat: nested configs overwriting (#1561)
* remove: TensorFlow (#1574)
* feat: external metrics (#1546)
Co-authored-by: Arij Aladel
Co-authored-by: Fedor Ignatov
* Add ner case-agnostic config (#1570)
Co-authored-by: Fedor Ignatov
* update: requirements (#1578)
Co-authored-by: Fedor Ignatov
Co-authored-by: Maxim Talimanchuk
Co-authored-by: Vasily
Co-authored-by: slowwavesleep <44175589+slowwavesleep@users.noreply.github.com>
Co-authored-by: yurakuratov <9271630+yurakuratov@users.noreply.github.com>
Co-authored-by: mak
Co-authored-by: Ihab-Asaad <56175482+Ihab-Asaad@users.noreply.github.com>
Co-authored-by: Aleksey Korshuk <48794610+AlekseyKorshuk@users.noreply.github.com>
Co-authored-by: dmitrijeuseew
Co-authored-by: Arij-Aladel <68355048+Arij-Aladel@users.noreply.github.com>
Co-authored-by: Arij Aladel
---
 Jenkinsfile | 25 +- README.md | 131 +- deeppavlov/__init__.py | 1 + deeppavlov/_meta.py | 2 +- .../configs/classifiers/boolqa_rubert.json | 28 +- .../entity_ranking_bert_eng_no_mention.json | 76 - .../entity_ranking_bert_rus_no_mention.json | 76 - .../classifiers/glue/glue_mnli_roberta.json | 1 - .../glue/glue_rte_roberta_mnli.json | 1 - .../glue_wnli_roberta.json} | 94 +- .../configs/classifiers/insults_kaggle.json | 155 - .../classifiers/insults_kaggle_bert.json | 64 +- .../classifiers/insults_kaggle_conv_bert.json | 153 - .../configs/classifiers/intents_dstc2.json | 156 - .../classifiers/intents_dstc2_bert.json | 121 - .../classifiers/intents_dstc2_big.json | 155 - .../classifiers/intents_sample_csv.json | 160 - .../classifiers/intents_sample_json.json | 155 - .../configs/classifiers/intents_snips.json | 141 - .../classifiers/intents_snips_big.json | 141 - .../classifiers/intents_snips_sklearn.json | 164 - .../intents_snips_tfidf_weighted.json | 182 - .../configs/classifiers/paraphraser_bert.json | 104 - .../classifiers/paraphraser_rubert.json | 36 +- deeppavlov/configs/classifiers/query_pr.json | 69 +- .../configs/classifiers/rel_ranking_bert.json | 77 - .../classifiers/rel_ranking_bert_rus.json | 76 - .../classifiers/relation_prediction_rus.json | 132 - .../classifiers/ru_obscenity_classifier.json | 30 - .../configs/classifiers/rusentiment_bert.json | 25 +- .../rusentiment_bigru_superconv.json | 165 - .../configs/classifiers/rusentiment_cnn.json | 167 - .../classifiers/rusentiment_convers_bert.json | 25 +- .../rusentiment_elmo_twitter_cnn.json | 170 - .../classifiers/sentiment_imdb_bert.json | 142 - .../classifiers/sentiment_imdb_conv_bert.json | 142 - .../classifiers/sentiment_sst_conv_bert.json | 21 +- .../classifiers/sentiment_sst_multi_bert.json | 135 - .../classifiers/sentiment_twitter.json | 29 +-
.../sentiment_twitter_bert_emb.json | 144 - .../sentiment_twitter_preproc.json | 159 - .../classifiers/sentiment_yelp_conv_bert.json | 149 - .../sentiment_yelp_multi_bert.json | 149 - .../configs/classifiers/sst_torch_swcnn.json | 148 - .../superglue/superglue_copa_roberta.json | 236 +- .../superglue/superglue_record_roberta.json | 2 +- .../configs/classifiers/topic_ag_news.json | 154 - .../classifiers/yahoo_convers_vs_info.json | 167 - .../yahoo_convers_vs_info_bert.json | 160 - .../en_ranker_pop_enwiki20180211.json | 6 +- .../en_ranker_tfidf_enwiki20161221.json | 80 - .../configs/elmo/elmo_1b_benchmark.json | 81 - .../configs/elmo/elmo_1b_benchmark_test.json | 79 - .../elmo_lm_ready4fine_tuning_ru_news.json | 83 - ...o_lm_ready4fine_tuning_ru_news_simple.json | 83 - .../elmo_lm_ready4fine_tuning_ru_twitter.json | 83 - ...m_ready4fine_tuning_ru_twitter_simple.json | 83 - .../elmo_lm_ready4fine_tuning_ru_wiki.json | 83 - ...o_lm_ready4fine_tuning_ru_wiki_simple.json | 83 - .../elmo/elmo_paraphraser_fine_tuning.json | 84 - .../embedder/bert_sentence_embedder.json | 6 +- .../configs/embedder/elmo_en_1billion.json | 36 - deeppavlov/configs/embedder/elmo_ru_news.json | 42 - .../configs/embedder/elmo_ru_twitter.json | 42 - deeppavlov/configs/embedder/elmo_ru_wiki.json | 42 - .../entity_detection_en.json | 46 + .../entity_detection_ru.json | 41 + .../entity_extraction_en.json | 23 + .../entity_extraction_ru.json | 23 + .../entity_extraction/entity_linking_en.json | 61 + .../entity_extraction/entity_linking_ru.json | 61 + .../configs/faq/tfidf_logreg_autofaq.json | 6 +- .../configs/faq/tfidf_logreg_en_faq.json | 14 +- deeppavlov/configs/go_bot/database_dstc2.json | 44 - deeppavlov/configs/go_bot/gobot_dstc2.json | 125 - .../configs/go_bot/gobot_dstc2_best.json | 133 - .../go_bot/gobot_dstc2_best_json_nlg.json | 133 - .../configs/go_bot/gobot_dstc2_minimal.json | 115 - .../configs/go_bot/gobot_md_yaml_minimal.json | 112 - .../configs/go_bot/gobot_simple_dstc2.json | 125 - .../intent_catcher/intent_catcher.json | 97 - .../configs/kbqa/entity_linking_eng.json | 89 - .../configs/kbqa/entity_linking_rus.json | 89 - deeppavlov/configs/kbqa/kbqa_cq.json | 184 - .../configs/kbqa/kbqa_cq_bert_ranker.json | 171 - deeppavlov/configs/kbqa/kbqa_cq_en.json | 94 + deeppavlov/configs/kbqa/kbqa_cq_mt_bert.json | 257 -- deeppavlov/configs/kbqa/kbqa_cq_online.json | 170 - .../configs/kbqa/kbqa_cq_online_mt_bert.json | 254 -- deeppavlov/configs/kbqa/kbqa_cq_ru.json | 116 + deeppavlov/configs/kbqa/kbqa_cq_rus.json | 205 - deeppavlov/configs/kbqa/kbqa_cq_sep.json | 175 - .../configs/kbqa/kbqa_entity_linking.json | 55 - .../configs/kbqa/kbqa_mt_bert_train.json | 255 -- .../BERT/morpho_ru_syntagrus_bert.json | 166 - .../morpho_tagger/UD2.0/morpho_ar.json | 173 - .../morpho_tagger/UD2.0/morpho_cs.json | 173 - .../morpho_tagger/UD2.0/morpho_de.json | 173 - .../morpho_tagger/UD2.0/morpho_en.json | 173 - .../morpho_tagger/UD2.0/morpho_es_ancora.json | 173 - .../morpho_tagger/UD2.0/morpho_fr.json | 173 - .../morpho_tagger/UD2.0/morpho_hi.json | 173 - .../morpho_tagger/UD2.0/morpho_hu.json | 173 - .../morpho_tagger/UD2.0/morpho_it.json | 173 - .../UD2.0/morpho_ru_syntagrus.json | 173 - .../UD2.0/morpho_ru_syntagrus_pymorphy.json | 193 - ...orpho_ru_syntagrus_pymorphy_lemmatize.json | 201 - .../morpho_tagger/UD2.0/morpho_tr.json | 174 - deeppavlov/configs/nemo/asr.json | 26 - deeppavlov/configs/nemo/asr_tts.json | 48 - deeppavlov/configs/nemo/tts.json | 27 - deeppavlov/configs/ner/conll2003_m1.json | 148 - 
.../ner/ner_bert_ent_and_type_rus.json | 119 - ...son => ner_case_agnostic_mdistilbert.json} | 35 +- ...t_torch.json => ner_collection3_bert.json} | 21 +- .../configs/ner/ner_collection3_m1.json | 134 - deeppavlov/configs/ner/ner_conll2003.json | 177 - .../configs/ner/ner_conll2003_bert.json | 137 +- deeppavlov/configs/ner/ner_conll2003_pos.json | 189 - .../configs/ner/ner_conll2003_torch_bert.json | 155 - deeppavlov/configs/ner/ner_dstc2.json | 126 - deeppavlov/configs/ner/ner_few_shot_ru.json | 104 - .../configs/ner/ner_few_shot_ru_simulate.json | 140 - deeppavlov/configs/ner/ner_kb_rus.json | 164 - .../ner/ner_lcquad_bert_ent_and_type.json | 119 - .../configs/ner/ner_lcquad_bert_probas.json | 119 - deeppavlov/configs/ner/ner_ontonotes.json | 165 - .../configs/ner/ner_ontonotes_bert.json | 65 +- .../configs/ner/ner_ontonotes_bert_emb.json | 122 - .../configs/ner/ner_ontonotes_bert_mult.json | 61 +- .../ner/ner_ontonotes_bert_probas.json | 107 - deeppavlov/configs/ner/ner_ontonotes_m1.json | 131 - deeppavlov/configs/ner/ner_rus.json | 177 - deeppavlov/configs/ner/ner_rus_bert.json | 140 +- .../configs/ner/ner_rus_bert_probas.json | 86 +- .../ner/ner_rus_convers_distilrubert_2L.json | 9 +- .../ner/ner_rus_convers_distilrubert_6L.json | 9 +- deeppavlov/configs/ner/slotfill_dstc2.json | 64 - .../configs/ner/slotfill_dstc2_raw.json | 54 - .../ner/slotfill_simple_dstc2_raw.json | 54 - .../configs/ner/slotfill_simple_rasa_raw.json | 43 - deeppavlov/configs/ner/vlsp2016_full.json | 170 - .../odqa/en_odqa_infer_enwiki20161221.json | 69 - .../configs/odqa/en_odqa_infer_wiki.json | 2 +- .../en_odqa_pop_infer_enwiki20180211.json | 4 +- .../configs/odqa/ru_odqa_infer_wiki.json | 2 +- .../odqa/ru_odqa_infer_wiki_retr_noans.json | 2 +- .../odqa/ru_odqa_infer_wiki_rubert.json | 70 - .../odqa/ru_odqa_infer_wiki_rubert_noans.json | 70 - .../tfidf_logreg_autofaq_psearch.json | 6 +- .../ranking/paraphrase_ident_paraphraser.json | 108 - ...paraphrase_ident_paraphraser_interact.json | 121 - .../configs/ranking/ranking_default.json | 106 - .../ranking/ranking_default_triplet.json | 108 - .../ranking/ranking_ubuntu_v2_bert_sep.json | 72 - .../ranking_ubuntu_v2_bert_sep_interact.json | 91 - .../ranking_ubuntu_v2_bert_uncased.json | 72 - .../configs/ranking/ranking_ubuntu_v2_mt.json | 107 - .../ranking_ubuntu_v2_mt_interact.json | 121 - ...ubuntu_v2_mt_word2vec_dam_transformer.json | 134 - .../ranking_ubuntu_v2_mt_word2vec_smn.json | 127 - .../ranking_ubuntu_v2_torch_bert_uncased.json | 2 +- deeppavlov/configs/ranking/rel_ranking.json | 88 - .../configs/ranking/rel_ranking_bert_en.json | 106 + .../configs/ranking/rel_ranking_bert_ru.json | 106 + .../regressors/translation_ranker.json | 105 + .../relation_extraction/re_docred.json | 10 +- .../configs/relation_extraction/re_rured.json | 2 +- .../sentseg_dailydialog.json | 130 - .../sentseg_dailydialog_bert.json} | 37 +- deeppavlov/configs/skills/aiml_skill.json | 44 - deeppavlov/configs/skills/dsl_skill.json | 40 - deeppavlov/configs/skills/rasa_skill.json | 39 - .../brillmoore_kartaslov_ru.json | 82 - .../brillmoore_kartaslov_ru_custom_vocab.json | 84 - .../brillmoore_kartaslov_ru_nolm.json | 77 - .../brillmoore_wikitypos_en.json | 4 + .../configs/squad/multi_squad_noans.json | 148 - .../squad/multi_squad_noans_infer.json | 140 - .../configs/squad/multi_squad_retr_noans.json | 159 - .../squad/multi_squad_ru_retr_noans.json | 159 - .../multi_squad_ru_retr_noans_rubert.json | 106 - ...ulti_squad_ru_retr_noans_rubert_infer.json | 70 - 
.../configs/squad/qa_multisberquad_bert.json | 108 + ...ad_torch_bert.json => qa_squad2_bert.json} | 39 +- deeppavlov/configs/squad/squad.json | 138 - deeppavlov/configs/squad/squad_bert.json | 62 +- .../configs/squad/squad_bert_infer.json | 75 - .../squad_bert_multilingual_freezed_emb.json | 66 - .../configs/squad/squad_bert_uncased.json | 103 - deeppavlov/configs/squad/squad_ru.json | 139 - deeppavlov/configs/squad/squad_ru_bert.json | 165 +- .../configs/squad/squad_ru_bert_infer.json | 78 - .../squad_ru_convers_distilrubert_2L.json | 18 +- ...quad_ru_convers_distilrubert_2L_infer.json | 76 - .../squad_ru_convers_distilrubert_6L.json | 18 +- ...quad_ru_convers_distilrubert_6L_infer.json | 76 - deeppavlov/configs/squad/squad_ru_rubert.json | 107 - .../configs/squad/squad_ru_rubert_infer.json | 78 - .../configs/squad/squad_ru_torch_bert.json | 175 - .../configs/squad/squad_torch_bert_infer.json | 69 - .../configs/squad/squad_zh_bert_mult.json | 118 - .../configs/squad/squad_zh_bert_zh.json | 118 - .../syntax/ru_syntagrus_joint_parsing.json | 33 - .../syntax/syntax_ru_syntagrus_bert.json | 183 - .../mt_bert/mt_bert_inference_tutorial.json | 139 - .../mt_bert/mt_bert_train_tutorial.json | 311 -- .../data/tools/train_set_generation.py | 177 - .../Dataset_generation_tutorial.ipynb | 806 ---- deeppavlov/contrib/examples/db.sqlite | Bin 24576 -> 0 bytes .../contrib/examples/dstc2-templates.txt | 46 - .../contrib/examples/dstc_slot_vals.json | 416 -- .../contrib/examples/generated_data.json | 164 - deeppavlov/core/commands/train.py | 28 +- deeppavlov/core/commands/utils.py | 36 +- deeppavlov/core/common/aliases.py | 47 + deeppavlov/core/common/base.py | 62 + deeppavlov/core/common/check_gpu.py | 38 - deeppavlov/core/common/file.py | 25 +- deeppavlov/core/common/log_events.py | 53 + deeppavlov/core/common/metrics_registry.json | 1 - deeppavlov/core/common/metrics_registry.py | 11 +- deeppavlov/core/common/params.py | 2 +- deeppavlov/core/common/registry.json | 147 +- .../core/common/requirements_registry.json | 386 +- deeppavlov/core/data/sqlite_database.py | 187 - deeppavlov/core/layers/keras_layers.py | 223 -- .../core/layers/tf_attention_mechanisms.py | 337 -- .../core/layers/tf_csoftmax_attention.py | 255 -- deeppavlov/core/layers/tf_layers.py | 952 ----- deeppavlov/core/models/keras_model.py | 206 - deeppavlov/core/models/tf_backend.py | 77 - deeppavlov/core/models/tf_model.py | 254 -- deeppavlov/core/models/torch_model.py | 44 +- deeppavlov/core/trainers/fit_trainer.py | 56 +- deeppavlov/core/trainers/nn_trainer.py | 36 +- deeppavlov/core/trainers/utils.py | 18 +- .../dataset_iterators/dialog_iterator.py | 123 - .../dstc2_intents_iterator.py | 85 - .../dataset_iterators/dstc2_ner_iterator.py | 102 - .../elmo_file_paths_iterator.py | 154 - .../dataset_iterators/file_paths_iterator.py | 74 - .../kvret_dialog_iterator.py | 77 - .../morphotagger_iterator.py | 120 - .../dataset_iterators/multitask_iterator.py | 119 +- .../ner_few_shot_iterator.py | 144 - .../snips_intents_iterator.py | 30 - .../dataset_iterators/snips_ner_iterator.py | 42 - .../dataset_iterators/squad_iterator.py | 13 +- deeppavlov/dataset_readers/dstc2_reader.py | 362 -- .../dataset_readers/file_paths_reader.py | 66 - .../dataset_readers/intent_catcher_reader.py | 55 - deeppavlov/dataset_readers/kbqa_reader.py | 48 - deeppavlov/dataset_readers/kvret_reader.py | 183 - .../dataset_readers/md_yaml_dialogs_reader.py | 663 ---- .../morphotagging_dataset_reader.py | 188 - .../dataset_readers/multitask_reader.py | 64 - 
deeppavlov/dataset_readers/rured_reader.py | 10 - deeppavlov/dataset_readers/siamese_reader.py | 59 - deeppavlov/dataset_readers/snips_reader.py | 93 - .../dataset_readers/squad_dataset_reader.py | 14 +- .../torchtext_classification_data_reader.py | 60 - .../dataset_readers/ubuntu_v2_mt_reader.py | 117 - deeppavlov/deep.py | 35 +- deeppavlov/metrics/accuracy.py | 18 - deeppavlov/models/bert/__init__.py | 0 deeppavlov/models/bert/bert_classifier.py | 243 -- deeppavlov/models/bert/bert_ranker.py | 467 --- .../models/bert/bert_sequence_tagger.py | 704 ---- deeppavlov/models/bert/bert_squad.py | 366 -- .../classifiers/keras_classification_model.py | 960 ----- .../classifiers/ru_obscenity_classifier.py | 144 - deeppavlov/models/doc_retrieval/pop_ranker.py | 2 +- deeppavlov/models/elmo/__init__.py | 0 deeppavlov/models/elmo/bilm_model.py | 510 --- deeppavlov/models/elmo/elmo.py | 601 --- deeppavlov/models/elmo/elmo2tfhub.py | 208 - deeppavlov/models/elmo/elmo_model.py | 730 ---- deeppavlov/models/elmo/train_utils.py | 244 -- deeppavlov/models/embedders/bow_embedder.py | 58 - deeppavlov/models/embedders/elmo_embedder.py | 314 -- deeppavlov/models/embedders/glove_embedder.py | 74 - .../entity_extraction}/__init__.py | 0 .../entity_detection_parser.py | 139 +- .../entity_extraction/entity_linking.py | 583 +++ .../models/entity_extraction/ner_chunker.py | 318 ++ deeppavlov/models/go_bot/__init__.py | 0 deeppavlov/models/go_bot/dto/__init__.py | 0 .../models/go_bot/dto/dataset_features.py | 258 -- .../models/go_bot/dto/shared_gobot_params.py | 24 - deeppavlov/models/go_bot/go_bot.py | 484 --- deeppavlov/models/go_bot/nlg/__init__.py | 0 deeppavlov/models/go_bot/nlg/dto/__init__.py | 0 .../go_bot/nlg/dto/batch_nlg_response.py | 7 - .../go_bot/nlg/dto/json_nlg_response.py | 25 - .../go_bot/nlg/dto/nlg_response_interface.py | 10 - .../go_bot/nlg/mock_json_nlg_manager.py | 151 - deeppavlov/models/go_bot/nlg/nlg_manager.py | 115 - .../go_bot/nlg/nlg_manager_interface.py | 52 - .../models/go_bot/nlg/templates/__init__.py | 0 .../models/go_bot/nlg/templates/templates.py | 186 - deeppavlov/models/go_bot/nlu/__init__.py | 0 deeppavlov/models/go_bot/nlu/dto/__init__.py | 0 .../models/go_bot/nlu/dto/nlu_response.py | 18 - .../go_bot/nlu/dto/nlu_response_interface.py | 5 - .../nlu/dto/text_vectorization_response.py | 9 - deeppavlov/models/go_bot/nlu/nlu_manager.py | 82 - .../go_bot/nlu/nlu_manager_interface.py | 17 - .../models/go_bot/nlu/tokens_vectorizer.py | 149 - deeppavlov/models/go_bot/policy/__init__.py | 0 .../models/go_bot/policy/dto/__init__.py | 0 .../models/go_bot/policy/dto/attn_params.py | 16 - .../policy/dto/digitized_policy_features.py | 5 - .../policy/dto/policy_network_params.py | 57 - .../go_bot/policy/dto/policy_prediction.py | 18 - .../models/go_bot/policy/policy_network.py | 455 --- deeppavlov/models/go_bot/tracker/__init__.py | 0 .../go_bot/tracker/dialogue_state_tracker.py | 279 -- .../models/go_bot/tracker/dto/__init__.py | 0 .../go_bot/tracker/dto/dst_knowledge.py | 12 - .../dto/tracker_knowledge_interface.py | 5 - .../go_bot/tracker/featurized_tracker.py | 270 -- .../go_bot/tracker/tracker_interface.py | 42 - deeppavlov/models/go_bot/wrapper.py | 49 - deeppavlov/models/intent_catcher/__init__.py | 0 .../models/intent_catcher/intent_catcher.py | 260 -- deeppavlov/models/kbqa/entity_linking.py | 422 -- deeppavlov/models/kbqa/kbqa_entity_linking.py | 434 -- deeppavlov/models/kbqa/query_generator.py | 102 +- .../models/kbqa/query_generator_base.py | 190 +- 
.../models/kbqa/query_generator_online.py | 190 - .../models/kbqa/rel_ranking_bert_infer.py | 190 - deeppavlov/models/kbqa/rel_ranking_infer.py | 200 +- deeppavlov/models/kbqa/sentence_answer.py | 11 + deeppavlov/models/kbqa/template_matcher.py | 9 +- deeppavlov/models/kbqa/tree_to_sparql.py | 73 +- deeppavlov/models/kbqa/type_define.py | 154 + deeppavlov/models/kbqa/utils.py | 58 +- deeppavlov/models/kbqa/wiki_parser.py | 351 +- deeppavlov/models/kbqa/wiki_parser_online.py | 107 - deeppavlov/models/morpho_tagger/__init__.py | 0 deeppavlov/models/morpho_tagger/__main__.py | 25 - deeppavlov/models/morpho_tagger/cells.py | 179 - deeppavlov/models/morpho_tagger/common.py | 293 -- .../models/morpho_tagger/common_tagger.py | 128 - deeppavlov/models/morpho_tagger/lemmatizer.py | 137 - .../models/morpho_tagger/morpho_tagger.py | 352 -- deeppavlov/models/multitask_bert/__init__.py | 0 .../models/multitask_bert/multitask_bert.py | 1152 ------ deeppavlov/models/nemo/__init__.py | 0 deeppavlov/models/nemo/asr.py | 193 - deeppavlov/models/nemo/common.py | 117 - deeppavlov/models/nemo/tts.py | 210 - deeppavlov/models/nemo/vocoder.py | 131 - deeppavlov/models/ner/NER_model.py | 317 -- deeppavlov/models/ner/__init__.py | 0 deeppavlov/models/ner/bio.py | 46 - deeppavlov/models/ner/network.py | 324 -- deeppavlov/models/ner/svm.py | 83 - .../assemble_embeddings_matrix.py | 93 - .../models/preprocessors/bert_preprocessor.py | 324 -- .../models/preprocessors/capitalization.py | 138 - .../models/preprocessors/char_splitter.py | 37 - .../models/preprocessors/ner_preprocessor.py | 83 - .../preprocessors/random_embeddings_matrix.py | 37 - .../preprocessors/russian_lemmatizer.py | 37 - .../preprocessors/siamese_preprocessor.py | 138 - .../preprocessors/squad_preprocessor.py | 443 +-- .../torch_transformers_preprocessor.py | 314 +- .../ranking/bilstm_gru_siamese_network.py | 110 - .../models/ranking/bilstm_siamese_network.py | 292 -- ...ention_matching_network_use_transformer.py | 403 -- .../models/ranking/keras_siamese_model.py | 123 - .../ranking/matching_models/__init__.py | 0 .../matching_models/dam_utils/__init__.py | 0 .../matching_models/dam_utils/layers.py | 555 --- .../matching_models/dam_utils/operations.py | 400 -- .../models/ranking/mpm_siamese_network.py | 180 - deeppavlov/models/ranking/rel_ranker.py | 146 - .../ranking/sequential_matching_network.py | 150 - deeppavlov/models/ranking/siamese_model.py | 135 - .../models/ranking/siamese_predictor.py | 146 - .../models/ranking/tf_base_matching_model.py | 162 - deeppavlov/models/slotfill/__init__.py | 0 deeppavlov/models/slotfill/slotfill.py | 130 - deeppavlov/models/slotfill/slotfill_raw.py | 181 - deeppavlov/models/squad/__init__.py | 0 deeppavlov/models/squad/squad.py | 326 -- deeppavlov/models/squad/utils.py | 214 - deeppavlov/models/syntax_parser/__init__.py | 0 deeppavlov/models/syntax_parser/joint.py | 142 - deeppavlov/models/syntax_parser/network.py | 345 -- deeppavlov/models/syntax_parser/parser.py | 47 - .../models/tokenizers/jieba_tokenizer.py | 68 - .../models/tokenizers/lazy_tokenizer.py | 37 - .../models/tokenizers/ru_sent_tokenizer.py | 47 - .../models/tokenizers/spacy_tokenizer.py | 1 + deeppavlov/models/torch_bert/crf.py | 28 + .../models/torch_bert/torch_bert_ranker.py | 46 +- .../torch_transformers_classifier.py | 51 +- .../torch_transformers_el_ranker.py | 445 +++ .../torch_transformers_sequence_tagger.py | 128 +- .../torch_bert/torch_transformers_squad.py | 321 +- .../models/vectorizers/word_vectorizer.py | 289 -- 
deeppavlov/requirements/aiml_skill.txt | 1 - deeppavlov/requirements/bert_dp.txt | 1 - deeppavlov/requirements/datasets.txt | 2 +- deeppavlov/requirements/en_core_web_sm.txt | 3 +- deeppavlov/requirements/faiss.txt | 1 - deeppavlov/requirements/fasttext.txt | 2 +- deeppavlov/requirements/gensim.txt | 1 - deeppavlov/requirements/jieba.txt | 1 - deeppavlov/requirements/kenlm.txt | 2 +- deeppavlov/requirements/lxml.txt | 2 +- deeppavlov/requirements/morpho_tagger.txt | 1 - deeppavlov/requirements/nemo-asr.txt | 7 - deeppavlov/requirements/nemo-tts.txt | 3 - deeppavlov/requirements/nemo.txt | 1 - deeppavlov/requirements/opt_einsum.txt | 2 +- deeppavlov/requirements/pytorch.txt | 1 + deeppavlov/requirements/pytorch14.txt | 2 - deeppavlov/requirements/pytorch16.txt | 2 - deeppavlov/requirements/rapidfuzz.txt | 2 +- deeppavlov/requirements/rasa_skill.txt | 1 - deeppavlov/requirements/ru_core_news_sm.txt | 2 + deeppavlov/requirements/sacremoses.txt | 1 + deeppavlov/requirements/slovnet.txt | 2 + deeppavlov/requirements/sortedcontainers.txt | 2 +- deeppavlov/requirements/spacy.txt | 1 - deeppavlov/requirements/syntax_parser.txt | 1 - deeppavlov/requirements/tf-gpu.txt | 1 - deeppavlov/requirements/tf-hub.txt | 1 - deeppavlov/requirements/tf.txt | 1 - deeppavlov/requirements/torchcrf.txt | 1 + deeppavlov/requirements/torchtext.txt | 1 - deeppavlov/requirements/transformers.txt | 2 +- deeppavlov/requirements/transformers28.txt | 1 - deeppavlov/requirements/udapi.txt | 2 +- deeppavlov/requirements/whapi.txt | 3 +- deeppavlov/requirements/xeger.txt | 1 - deeppavlov/skills/__init__.py | 0 deeppavlov/skills/aiml_skill/README.md | 6 - deeppavlov/skills/aiml_skill/__init__.py | 1 - deeppavlov/skills/aiml_skill/aiml_skill.py | 158 - deeppavlov/skills/dsl_skill/__init__.py | 3 - deeppavlov/skills/dsl_skill/context.py | 53 - deeppavlov/skills/dsl_skill/dsl_skill.py | 225 -- .../skills/dsl_skill/handlers/__init__.py | 0 .../skills/dsl_skill/handlers/handler.py | 68 - .../dsl_skill/handlers/regex_handler.py | 80 - deeppavlov/skills/dsl_skill/utils.py | 22 - deeppavlov/skills/rasa_skill/__init__.py | 1 - deeppavlov/skills/rasa_skill/rasa_skill.py | 269 -- deeppavlov/utils/alexa/__init__.py | 1 - deeppavlov/utils/alexa/request_parameters.py | 94 - deeppavlov/utils/alexa/server.py | 88 - deeppavlov/utils/alice/__init__.py | 1 - deeppavlov/utils/alice/request_parameters.py | 57 - deeppavlov/utils/alice/server.py | 65 - deeppavlov/utils/connector/__init__.py | 1 - deeppavlov/utils/connector/bot.py | 544 --- deeppavlov/utils/connector/conversation.py | 465 --- deeppavlov/utils/connector/ssl_tools.py | 216 - deeppavlov/utils/ms_bot_framework/__init__.py | 1 - deeppavlov/utils/ms_bot_framework/server.py | 60 - deeppavlov/utils/settings/log_config.json | 16 + deeppavlov/utils/settings/server_config.json | 31 - deeppavlov/utils/telegram/__init__.py | 1 - deeppavlov/utils/telegram/telegram_ui.py | 23 - .../ms_bot_framework/01_web_app_bot.png | Bin 45399 -> 0 bytes .../02_web_app_bot_settings.png | Bin 119789 -> 0 bytes .../ms_bot_framework/03_navigate_to_bot.png | Bin 131169 -> 0 bytes .../ms_bot_framework/04_bot_settings.png | Bin 164448 -> 0 bytes .../ms_bot_framework/05_bot_channels.png | Bin 153051 -> 0 bytes docs/_static/social/f_logo_RGB-Blue_58.png | Bin 2465 -> 0 bytes docs/_templates/footer.html | 1 - docs/apiref/core/common.rst | 8 + docs/apiref/core/data.rst | 2 - docs/apiref/core/models.rst | 6 - docs/apiref/dataset_iterators.rst | 19 - docs/apiref/dataset_readers.rst | 22 - docs/apiref/models/bert.rst | 63 - 
docs/apiref/models/classifiers.rst | 5 - docs/apiref/models/elmo.rst | 6 - docs/apiref/models/embedders.rst | 14 +- docs/apiref/models/entity_extraction.rst | 19 + docs/apiref/models/entity_linking.rst | 22 - docs/apiref/models/go_bot.rst | 17 - docs/apiref/models/intent_catcher.rst | 8 - docs/apiref/models/kbqa.rst | 15 +- docs/apiref/models/morpho_tagger.rst | 27 - docs/apiref/models/multitask_bert.rst | 58 - docs/apiref/models/nemo.rst | 32 - docs/apiref/models/ner.rst | 4 - docs/apiref/models/preprocessors.rst | 16 - docs/apiref/models/ranking.rst | 26 - docs/apiref/models/slotfill.rst | 8 - docs/apiref/models/squad.rst | 9 - docs/apiref/models/syntax_parser.rst | 16 - docs/apiref/models/tokenizers.rst | 4 - docs/apiref/models/torch_bert.rst | 4 - docs/apiref/models/vectorizers.rst | 11 - docs/apiref/skills.rst | 12 - docs/apiref/skills/aiml_skill.rst | 5 - docs/apiref/skills/dsl_skill.rst | 5 - docs/apiref/skills/rasa_skill.rst | 5 - docs/conf.py | 10 +- docs/devguides/contribution_guide.rst | 26 +- docs/features/models/bert.rst | 85 +- docs/features/models/classifiers.rst | 286 +- docs/features/models/entity_extraction.rst | 107 + docs/features/models/entity_linking.rst | 55 - docs/features/models/intent_catcher.rst | 83 - docs/features/models/kbqa.rst | 133 +- docs/features/models/morphotagger.rst | 684 ---- docs/features/models/multitask_bert.rst | 348 -- docs/features/models/nemo.rst | 164 - docs/features/models/ner.rst | 174 +- docs/features/models/neural_ranking.rst | 163 +- docs/features/models/slot_filling.rst | 264 -- docs/features/models/spelling_correction.rst | 7 +- docs/features/models/squad.rst | 90 +- docs/features/models/syntaxparser.rst | 170 - docs/features/models/tfidf_ranking.rst | 4 +- docs/features/overview.rst | 377 +- docs/features/pretrained_vectors.rst | 85 +- docs/features/skills/aiml_skill.rst | 44 - docs/features/skills/dsl_skill.rst | 42 - docs/features/skills/go_bot.rst | 640 --- docs/features/skills/odqa.rst | 46 +- docs/features/skills/rasa_skill.rst | 50 - docs/index.rst | 17 +- docs/integrations/amazon_alexa.rst | 202 - docs/integrations/ms_bot.rst | 104 - docs/integrations/rest_api.rst | 4 +- docs/integrations/socket_api.rst | 4 +- docs/integrations/telegram.rst | 39 - docs/integrations/yandex_alice.rst | 59 - docs/intro/choose_framework.rst | 135 - docs/intro/configuration.rst | 106 +- docs/intro/overview.rst | 7 +- docs/intro/quick_start.rst | 68 +- .../Pseudo-labeling for classification.ipynb | 210 - examples/README.md | 19 - examples/classification_tutorial.ipynb | 2961 -------------- examples/gobot_extended_tutorial.ipynb | 1387 ------- examples/gobot_formfilling_tutorial.ipynb | 1412 ------- examples/gobot_md_yaml_configs_tutorial.ipynb | 3491 ----------------- examples/gobot_tutorial.ipynb | 799 ---- examples/img/gobot_database.png | Bin 8149 -> 0 bytes examples/img/gobot_example.png | Bin 586959 -> 0 bytes examples/img/gobot_pipeline.png | Bin 84393 -> 0 bytes examples/img/gobot_policy.png | Bin 23464 -> 0 bytes examples/img/gobot_simple_example.png | Bin 112550 -> 0 bytes examples/img/gobot_simple_pipeline.png | Bin 63816 -> 0 bytes examples/img/gobot_simple_policy.png | Bin 22437 -> 0 bytes examples/img/gobot_simple_templates.png | Bin 5270 -> 0 bytes examples/img/gobot_slotfiller.png | Bin 4756 -> 0 bytes examples/img/gobot_templates.png | Bin 6118 -> 0 bytes examples/img/sc_loss_comparison.png | Bin 88085 -> 0 bytes examples/img/sc_ner_lr_cosine.png | Bin 16075 -> 0 bytes examples/img/sc_ner_lr_exponential.png | Bin 15694 -> 0 bytes 
examples/img/sc_ner_lr_linear.png | Bin 16102 -> 0 bytes examples/img/sc_ner_lr_linear2.png | Bin 16220 -> 0 bytes examples/img/sc_ner_lr_no.png | Bin 12288 -> 0 bytes examples/img/sc_ner_lr_onecycle.png | Bin 23861 -> 0 bytes examples/img/sc_ner_lr_polynomial.png | Bin 16309 -> 0 bytes examples/img/sc_ner_lr_polynomial1.png | Bin 15769 -> 0 bytes examples/img/sc_ner_lr_polynomial2.png | Bin 15410 -> 0 bytes examples/img/sc_ner_lr_sc.png | Bin 36522 -> 0 bytes examples/img/sc_ner_lr_sc1.png | Bin 33542 -> 0 bytes examples/img/sc_ner_lr_trapezoid.png | Bin 22307 -> 0 bytes examples/morphotagger_example.ipynb | 315 -- examples/super_convergence_tutorial.ipynb | 629 --- requirements.txt | 42 +- setup.py | 4 +- tests/test_aiml_skill.py | 37 - .../classifiers/intents_snips_bigru.json | 138 - .../classifiers/intents_snips_bilstm.json | 138 - .../intents_snips_bilstm_bilstm.json | 139 - .../classifiers/intents_snips_bilstm_cnn.json | 145 - .../intents_snips_bilstm_proj_layer.json | 140 - ...tents_snips_bilstm_self_add_attention.json | 141 - ...ents_snips_bilstm_self_mult_attention.json | 141 - .../classifiers/intents_snips_cnn_bilstm.json | 145 - .../en_ranker_pop_wiki_test.json | 6 +- tests/test_configs/nemo/tts2asr_test.json | 49 - .../odqa/en_odqa_infer_wiki_test.json | 78 - .../odqa/en_odqa_pop_infer_wiki_test.json | 83 - .../odqa/ru_odqa_infer_wiki_test.json | 82 - tests/test_dsl_skill.py | 109 - tests/test_quick_start.py | 300 +- tests/test_rasa_skill.py | 39 - tests/test_tf_layers.py | 230 -- utils/Docker/Dockerfile | 60 + .../models/ranking => utils/Docker}/README.md | 0 utils/Docker/cmd.sh | 16 + utils/Docker/docker-compose.yml | 56 + 598 files changed, 5852 insertions(+), 68049 deletions(-) delete mode 100644 deeppavlov/configs/classifiers/entity_ranking_bert_eng_no_mention.json delete mode 100644 deeppavlov/configs/classifiers/entity_ranking_bert_rus_no_mention.json rename deeppavlov/configs/classifiers/{insults_kaggle_bert_torch.json => glue/glue_wnli_roberta.json} (64%) delete mode 100644 deeppavlov/configs/classifiers/insults_kaggle.json delete mode 100644 deeppavlov/configs/classifiers/insults_kaggle_conv_bert.json delete mode 100644 deeppavlov/configs/classifiers/intents_dstc2.json delete mode 100644 deeppavlov/configs/classifiers/intents_dstc2_bert.json delete mode 100644 deeppavlov/configs/classifiers/intents_dstc2_big.json delete mode 100644 deeppavlov/configs/classifiers/intents_sample_csv.json delete mode 100644 deeppavlov/configs/classifiers/intents_sample_json.json delete mode 100644 deeppavlov/configs/classifiers/intents_snips.json delete mode 100644 deeppavlov/configs/classifiers/intents_snips_big.json delete mode 100644 deeppavlov/configs/classifiers/intents_snips_sklearn.json delete mode 100644 deeppavlov/configs/classifiers/intents_snips_tfidf_weighted.json delete mode 100644 deeppavlov/configs/classifiers/paraphraser_bert.json delete mode 100644 deeppavlov/configs/classifiers/rel_ranking_bert.json delete mode 100644 deeppavlov/configs/classifiers/rel_ranking_bert_rus.json delete mode 100644 deeppavlov/configs/classifiers/relation_prediction_rus.json delete mode 100644 deeppavlov/configs/classifiers/ru_obscenity_classifier.json delete mode 100644 deeppavlov/configs/classifiers/rusentiment_bigru_superconv.json delete mode 100644 deeppavlov/configs/classifiers/rusentiment_cnn.json delete mode 100644 deeppavlov/configs/classifiers/rusentiment_elmo_twitter_cnn.json delete mode 100644 deeppavlov/configs/classifiers/sentiment_imdb_bert.json delete mode 100644 
deeppavlov/configs/classifiers/sentiment_imdb_conv_bert.json delete mode 100644 deeppavlov/configs/classifiers/sentiment_sst_multi_bert.json delete mode 100644 deeppavlov/configs/classifiers/sentiment_twitter_bert_emb.json delete mode 100644 deeppavlov/configs/classifiers/sentiment_twitter_preproc.json delete mode 100644 deeppavlov/configs/classifiers/sentiment_yelp_conv_bert.json delete mode 100644 deeppavlov/configs/classifiers/sentiment_yelp_multi_bert.json delete mode 100644 deeppavlov/configs/classifiers/sst_torch_swcnn.json delete mode 100644 deeppavlov/configs/classifiers/topic_ag_news.json delete mode 100644 deeppavlov/configs/classifiers/yahoo_convers_vs_info.json delete mode 100644 deeppavlov/configs/classifiers/yahoo_convers_vs_info_bert.json delete mode 100644 deeppavlov/configs/doc_retrieval/en_ranker_tfidf_enwiki20161221.json delete mode 100644 deeppavlov/configs/elmo/elmo_1b_benchmark.json delete mode 100644 deeppavlov/configs/elmo/elmo_1b_benchmark_test.json delete mode 100644 deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_news.json delete mode 100644 deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_news_simple.json delete mode 100644 deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_twitter.json delete mode 100644 deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_twitter_simple.json delete mode 100644 deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_wiki.json delete mode 100644 deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_wiki_simple.json delete mode 100644 deeppavlov/configs/elmo/elmo_paraphraser_fine_tuning.json delete mode 100644 deeppavlov/configs/embedder/elmo_en_1billion.json delete mode 100644 deeppavlov/configs/embedder/elmo_ru_news.json delete mode 100644 deeppavlov/configs/embedder/elmo_ru_twitter.json delete mode 100644 deeppavlov/configs/embedder/elmo_ru_wiki.json create mode 100644 deeppavlov/configs/entity_extraction/entity_detection_en.json create mode 100644 deeppavlov/configs/entity_extraction/entity_detection_ru.json create mode 100644 deeppavlov/configs/entity_extraction/entity_extraction_en.json create mode 100644 deeppavlov/configs/entity_extraction/entity_extraction_ru.json create mode 100644 deeppavlov/configs/entity_extraction/entity_linking_en.json create mode 100644 deeppavlov/configs/entity_extraction/entity_linking_ru.json delete mode 100644 deeppavlov/configs/go_bot/database_dstc2.json delete mode 100644 deeppavlov/configs/go_bot/gobot_dstc2.json delete mode 100644 deeppavlov/configs/go_bot/gobot_dstc2_best.json delete mode 100644 deeppavlov/configs/go_bot/gobot_dstc2_best_json_nlg.json delete mode 100644 deeppavlov/configs/go_bot/gobot_dstc2_minimal.json delete mode 100644 deeppavlov/configs/go_bot/gobot_md_yaml_minimal.json delete mode 100644 deeppavlov/configs/go_bot/gobot_simple_dstc2.json delete mode 100644 deeppavlov/configs/intent_catcher/intent_catcher.json delete mode 100644 deeppavlov/configs/kbqa/entity_linking_eng.json delete mode 100644 deeppavlov/configs/kbqa/entity_linking_rus.json delete mode 100644 deeppavlov/configs/kbqa/kbqa_cq.json delete mode 100644 deeppavlov/configs/kbqa/kbqa_cq_bert_ranker.json create mode 100644 deeppavlov/configs/kbqa/kbqa_cq_en.json delete mode 100644 deeppavlov/configs/kbqa/kbqa_cq_mt_bert.json delete mode 100644 deeppavlov/configs/kbqa/kbqa_cq_online.json delete mode 100644 deeppavlov/configs/kbqa/kbqa_cq_online_mt_bert.json create mode 100644 deeppavlov/configs/kbqa/kbqa_cq_ru.json delete mode 100644 deeppavlov/configs/kbqa/kbqa_cq_rus.json delete mode 100644 
deeppavlov/configs/kbqa/kbqa_cq_sep.json delete mode 100644 deeppavlov/configs/kbqa/kbqa_entity_linking.json delete mode 100644 deeppavlov/configs/kbqa/kbqa_mt_bert_train.json delete mode 100644 deeppavlov/configs/morpho_tagger/BERT/morpho_ru_syntagrus_bert.json delete mode 100644 deeppavlov/configs/morpho_tagger/UD2.0/morpho_ar.json delete mode 100644 deeppavlov/configs/morpho_tagger/UD2.0/morpho_cs.json delete mode 100644 deeppavlov/configs/morpho_tagger/UD2.0/morpho_de.json delete mode 100644 deeppavlov/configs/morpho_tagger/UD2.0/morpho_en.json delete mode 100644 deeppavlov/configs/morpho_tagger/UD2.0/morpho_es_ancora.json delete mode 100644 deeppavlov/configs/morpho_tagger/UD2.0/morpho_fr.json delete mode 100644 deeppavlov/configs/morpho_tagger/UD2.0/morpho_hi.json delete mode 100644 deeppavlov/configs/morpho_tagger/UD2.0/morpho_hu.json delete mode 100644 deeppavlov/configs/morpho_tagger/UD2.0/morpho_it.json delete mode 100644 deeppavlov/configs/morpho_tagger/UD2.0/morpho_ru_syntagrus.json delete mode 100644 deeppavlov/configs/morpho_tagger/UD2.0/morpho_ru_syntagrus_pymorphy.json delete mode 100644 deeppavlov/configs/morpho_tagger/UD2.0/morpho_ru_syntagrus_pymorphy_lemmatize.json delete mode 100644 deeppavlov/configs/morpho_tagger/UD2.0/morpho_tr.json delete mode 100644 deeppavlov/configs/nemo/asr.json delete mode 100644 deeppavlov/configs/nemo/asr_tts.json delete mode 100644 deeppavlov/configs/nemo/tts.json delete mode 100644 deeppavlov/configs/ner/conll2003_m1.json delete mode 100644 deeppavlov/configs/ner/ner_bert_ent_and_type_rus.json rename deeppavlov/configs/ner/{ner_ontonotes_bert_torch.json => ner_case_agnostic_mdistilbert.json} (75%) rename deeppavlov/configs/ner/{ner_rus_bert_torch.json => ner_collection3_bert.json} (87%) delete mode 100644 deeppavlov/configs/ner/ner_collection3_m1.json delete mode 100644 deeppavlov/configs/ner/ner_conll2003.json delete mode 100644 deeppavlov/configs/ner/ner_conll2003_pos.json delete mode 100644 deeppavlov/configs/ner/ner_conll2003_torch_bert.json delete mode 100644 deeppavlov/configs/ner/ner_dstc2.json delete mode 100644 deeppavlov/configs/ner/ner_few_shot_ru.json delete mode 100644 deeppavlov/configs/ner/ner_few_shot_ru_simulate.json delete mode 100644 deeppavlov/configs/ner/ner_kb_rus.json delete mode 100644 deeppavlov/configs/ner/ner_lcquad_bert_ent_and_type.json delete mode 100644 deeppavlov/configs/ner/ner_lcquad_bert_probas.json delete mode 100644 deeppavlov/configs/ner/ner_ontonotes.json delete mode 100644 deeppavlov/configs/ner/ner_ontonotes_bert_emb.json delete mode 100644 deeppavlov/configs/ner/ner_ontonotes_bert_probas.json delete mode 100644 deeppavlov/configs/ner/ner_ontonotes_m1.json delete mode 100644 deeppavlov/configs/ner/ner_rus.json delete mode 100644 deeppavlov/configs/ner/slotfill_dstc2.json delete mode 100644 deeppavlov/configs/ner/slotfill_dstc2_raw.json delete mode 100644 deeppavlov/configs/ner/slotfill_simple_dstc2_raw.json delete mode 100644 deeppavlov/configs/ner/slotfill_simple_rasa_raw.json delete mode 100644 deeppavlov/configs/ner/vlsp2016_full.json delete mode 100644 deeppavlov/configs/odqa/en_odqa_infer_enwiki20161221.json delete mode 100644 deeppavlov/configs/odqa/ru_odqa_infer_wiki_rubert.json delete mode 100644 deeppavlov/configs/odqa/ru_odqa_infer_wiki_rubert_noans.json delete mode 100644 deeppavlov/configs/ranking/paraphrase_ident_paraphraser.json delete mode 100644 deeppavlov/configs/ranking/paraphrase_ident_paraphraser_interact.json delete mode 100644 deeppavlov/configs/ranking/ranking_default.json 
delete mode 100644 deeppavlov/configs/ranking/ranking_default_triplet.json delete mode 100644 deeppavlov/configs/ranking/ranking_ubuntu_v2_bert_sep.json delete mode 100644 deeppavlov/configs/ranking/ranking_ubuntu_v2_bert_sep_interact.json delete mode 100644 deeppavlov/configs/ranking/ranking_ubuntu_v2_bert_uncased.json delete mode 100644 deeppavlov/configs/ranking/ranking_ubuntu_v2_mt.json delete mode 100644 deeppavlov/configs/ranking/ranking_ubuntu_v2_mt_interact.json delete mode 100644 deeppavlov/configs/ranking/ranking_ubuntu_v2_mt_word2vec_dam_transformer.json delete mode 100644 deeppavlov/configs/ranking/ranking_ubuntu_v2_mt_word2vec_smn.json delete mode 100644 deeppavlov/configs/ranking/rel_ranking.json create mode 100644 deeppavlov/configs/ranking/rel_ranking_bert_en.json create mode 100644 deeppavlov/configs/ranking/rel_ranking_bert_ru.json create mode 100644 deeppavlov/configs/regressors/translation_ranker.json delete mode 100644 deeppavlov/configs/sentence_segmentation/sentseg_dailydialog.json rename deeppavlov/configs/{ner/ner_ontonotes_bert_mult_torch.json => sentence_segmentation/sentseg_dailydialog_bert.json} (74%) delete mode 100644 deeppavlov/configs/skills/aiml_skill.json delete mode 100644 deeppavlov/configs/skills/dsl_skill.json delete mode 100644 deeppavlov/configs/skills/rasa_skill.json delete mode 100644 deeppavlov/configs/spelling_correction/brillmoore_kartaslov_ru.json delete mode 100644 deeppavlov/configs/spelling_correction/brillmoore_kartaslov_ru_custom_vocab.json delete mode 100644 deeppavlov/configs/spelling_correction/brillmoore_kartaslov_ru_nolm.json delete mode 100644 deeppavlov/configs/squad/multi_squad_noans.json delete mode 100644 deeppavlov/configs/squad/multi_squad_noans_infer.json delete mode 100644 deeppavlov/configs/squad/multi_squad_retr_noans.json delete mode 100644 deeppavlov/configs/squad/multi_squad_ru_retr_noans.json delete mode 100644 deeppavlov/configs/squad/multi_squad_ru_retr_noans_rubert.json delete mode 100644 deeppavlov/configs/squad/multi_squad_ru_retr_noans_rubert_infer.json create mode 100644 deeppavlov/configs/squad/qa_multisberquad_bert.json rename deeppavlov/configs/squad/{squad_torch_bert.json => qa_squad2_bert.json} (83%) delete mode 100644 deeppavlov/configs/squad/squad.json delete mode 100644 deeppavlov/configs/squad/squad_bert_infer.json delete mode 100644 deeppavlov/configs/squad/squad_bert_multilingual_freezed_emb.json delete mode 100644 deeppavlov/configs/squad/squad_bert_uncased.json delete mode 100644 deeppavlov/configs/squad/squad_ru.json delete mode 100644 deeppavlov/configs/squad/squad_ru_bert_infer.json delete mode 100644 deeppavlov/configs/squad/squad_ru_convers_distilrubert_2L_infer.json delete mode 100644 deeppavlov/configs/squad/squad_ru_convers_distilrubert_6L_infer.json delete mode 100644 deeppavlov/configs/squad/squad_ru_rubert.json delete mode 100644 deeppavlov/configs/squad/squad_ru_rubert_infer.json delete mode 100644 deeppavlov/configs/squad/squad_ru_torch_bert.json delete mode 100644 deeppavlov/configs/squad/squad_torch_bert_infer.json delete mode 100644 deeppavlov/configs/squad/squad_zh_bert_mult.json delete mode 100644 deeppavlov/configs/squad/squad_zh_bert_zh.json delete mode 100644 deeppavlov/configs/syntax/ru_syntagrus_joint_parsing.json delete mode 100644 deeppavlov/configs/syntax/syntax_ru_syntagrus_bert.json delete mode 100644 deeppavlov/configs/tutorials/mt_bert/mt_bert_inference_tutorial.json delete mode 100644 deeppavlov/configs/tutorials/mt_bert/mt_bert_train_tutorial.json delete mode 100644 
deeppavlov/contrib/data/tools/train_set_generation.py delete mode 100644 deeppavlov/contrib/examples/Dataset_generation_tutorial.ipynb delete mode 100644 deeppavlov/contrib/examples/db.sqlite delete mode 100644 deeppavlov/contrib/examples/dstc2-templates.txt delete mode 100644 deeppavlov/contrib/examples/dstc_slot_vals.json delete mode 100644 deeppavlov/contrib/examples/generated_data.json create mode 100644 deeppavlov/core/common/aliases.py create mode 100644 deeppavlov/core/common/base.py delete mode 100644 deeppavlov/core/common/check_gpu.py create mode 100644 deeppavlov/core/common/log_events.py delete mode 100644 deeppavlov/core/data/sqlite_database.py delete mode 100644 deeppavlov/core/layers/keras_layers.py delete mode 100644 deeppavlov/core/layers/tf_attention_mechanisms.py delete mode 100644 deeppavlov/core/layers/tf_csoftmax_attention.py delete mode 100644 deeppavlov/core/layers/tf_layers.py delete mode 100644 deeppavlov/core/models/keras_model.py delete mode 100644 deeppavlov/core/models/tf_backend.py delete mode 100644 deeppavlov/core/models/tf_model.py delete mode 100644 deeppavlov/dataset_iterators/dialog_iterator.py delete mode 100644 deeppavlov/dataset_iterators/dstc2_intents_iterator.py delete mode 100644 deeppavlov/dataset_iterators/dstc2_ner_iterator.py delete mode 100644 deeppavlov/dataset_iterators/elmo_file_paths_iterator.py delete mode 100644 deeppavlov/dataset_iterators/file_paths_iterator.py delete mode 100644 deeppavlov/dataset_iterators/kvret_dialog_iterator.py delete mode 100644 deeppavlov/dataset_iterators/morphotagger_iterator.py delete mode 100644 deeppavlov/dataset_iterators/ner_few_shot_iterator.py delete mode 100644 deeppavlov/dataset_iterators/snips_intents_iterator.py delete mode 100644 deeppavlov/dataset_iterators/snips_ner_iterator.py delete mode 100644 deeppavlov/dataset_readers/dstc2_reader.py delete mode 100644 deeppavlov/dataset_readers/file_paths_reader.py delete mode 100644 deeppavlov/dataset_readers/intent_catcher_reader.py delete mode 100644 deeppavlov/dataset_readers/kbqa_reader.py delete mode 100644 deeppavlov/dataset_readers/kvret_reader.py delete mode 100644 deeppavlov/dataset_readers/md_yaml_dialogs_reader.py delete mode 100644 deeppavlov/dataset_readers/morphotagging_dataset_reader.py delete mode 100644 deeppavlov/dataset_readers/multitask_reader.py delete mode 100644 deeppavlov/dataset_readers/siamese_reader.py delete mode 100644 deeppavlov/dataset_readers/snips_reader.py delete mode 100644 deeppavlov/dataset_readers/torchtext_classification_data_reader.py delete mode 100644 deeppavlov/dataset_readers/ubuntu_v2_mt_reader.py delete mode 100644 deeppavlov/models/bert/__init__.py delete mode 100644 deeppavlov/models/bert/bert_classifier.py delete mode 100644 deeppavlov/models/bert/bert_ranker.py delete mode 100644 deeppavlov/models/bert/bert_sequence_tagger.py delete mode 100644 deeppavlov/models/bert/bert_squad.py delete mode 100644 deeppavlov/models/classifiers/keras_classification_model.py delete mode 100644 deeppavlov/models/classifiers/ru_obscenity_classifier.py delete mode 100644 deeppavlov/models/elmo/__init__.py delete mode 100644 deeppavlov/models/elmo/bilm_model.py delete mode 100644 deeppavlov/models/elmo/elmo.py delete mode 100644 deeppavlov/models/elmo/elmo2tfhub.py delete mode 100644 deeppavlov/models/elmo/elmo_model.py delete mode 100644 deeppavlov/models/elmo/train_utils.py delete mode 100644 deeppavlov/models/embedders/bow_embedder.py delete mode 100644 deeppavlov/models/embedders/elmo_embedder.py delete mode 100644 
deeppavlov/models/embedders/glove_embedder.py rename deeppavlov/{core/layers => models/entity_extraction}/__init__.py (100%) rename deeppavlov/models/{kbqa => entity_extraction}/entity_detection_parser.py (61%) create mode 100644 deeppavlov/models/entity_extraction/entity_linking.py create mode 100644 deeppavlov/models/entity_extraction/ner_chunker.py delete mode 100644 deeppavlov/models/go_bot/__init__.py delete mode 100644 deeppavlov/models/go_bot/dto/__init__.py delete mode 100644 deeppavlov/models/go_bot/dto/dataset_features.py delete mode 100644 deeppavlov/models/go_bot/dto/shared_gobot_params.py delete mode 100644 deeppavlov/models/go_bot/go_bot.py delete mode 100644 deeppavlov/models/go_bot/nlg/__init__.py delete mode 100644 deeppavlov/models/go_bot/nlg/dto/__init__.py delete mode 100644 deeppavlov/models/go_bot/nlg/dto/batch_nlg_response.py delete mode 100644 deeppavlov/models/go_bot/nlg/dto/json_nlg_response.py delete mode 100644 deeppavlov/models/go_bot/nlg/dto/nlg_response_interface.py delete mode 100644 deeppavlov/models/go_bot/nlg/mock_json_nlg_manager.py delete mode 100644 deeppavlov/models/go_bot/nlg/nlg_manager.py delete mode 100644 deeppavlov/models/go_bot/nlg/nlg_manager_interface.py delete mode 100644 deeppavlov/models/go_bot/nlg/templates/__init__.py delete mode 100644 deeppavlov/models/go_bot/nlg/templates/templates.py delete mode 100644 deeppavlov/models/go_bot/nlu/__init__.py delete mode 100644 deeppavlov/models/go_bot/nlu/dto/__init__.py delete mode 100644 deeppavlov/models/go_bot/nlu/dto/nlu_response.py delete mode 100644 deeppavlov/models/go_bot/nlu/dto/nlu_response_interface.py delete mode 100644 deeppavlov/models/go_bot/nlu/dto/text_vectorization_response.py delete mode 100644 deeppavlov/models/go_bot/nlu/nlu_manager.py delete mode 100644 deeppavlov/models/go_bot/nlu/nlu_manager_interface.py delete mode 100644 deeppavlov/models/go_bot/nlu/tokens_vectorizer.py delete mode 100644 deeppavlov/models/go_bot/policy/__init__.py delete mode 100644 deeppavlov/models/go_bot/policy/dto/__init__.py delete mode 100644 deeppavlov/models/go_bot/policy/dto/attn_params.py delete mode 100644 deeppavlov/models/go_bot/policy/dto/digitized_policy_features.py delete mode 100644 deeppavlov/models/go_bot/policy/dto/policy_network_params.py delete mode 100644 deeppavlov/models/go_bot/policy/dto/policy_prediction.py delete mode 100644 deeppavlov/models/go_bot/policy/policy_network.py delete mode 100644 deeppavlov/models/go_bot/tracker/__init__.py delete mode 100644 deeppavlov/models/go_bot/tracker/dialogue_state_tracker.py delete mode 100644 deeppavlov/models/go_bot/tracker/dto/__init__.py delete mode 100644 deeppavlov/models/go_bot/tracker/dto/dst_knowledge.py delete mode 100644 deeppavlov/models/go_bot/tracker/dto/tracker_knowledge_interface.py delete mode 100644 deeppavlov/models/go_bot/tracker/featurized_tracker.py delete mode 100644 deeppavlov/models/go_bot/tracker/tracker_interface.py delete mode 100644 deeppavlov/models/go_bot/wrapper.py delete mode 100644 deeppavlov/models/intent_catcher/__init__.py delete mode 100644 deeppavlov/models/intent_catcher/intent_catcher.py delete mode 100644 deeppavlov/models/kbqa/entity_linking.py delete mode 100644 deeppavlov/models/kbqa/kbqa_entity_linking.py delete mode 100644 deeppavlov/models/kbqa/query_generator_online.py delete mode 100644 deeppavlov/models/kbqa/rel_ranking_bert_infer.py create mode 100644 deeppavlov/models/kbqa/type_define.py delete mode 100644 deeppavlov/models/kbqa/wiki_parser_online.py delete mode 100644 
deeppavlov/models/morpho_tagger/__init__.py delete mode 100644 deeppavlov/models/morpho_tagger/__main__.py delete mode 100644 deeppavlov/models/morpho_tagger/cells.py delete mode 100644 deeppavlov/models/morpho_tagger/common.py delete mode 100644 deeppavlov/models/morpho_tagger/common_tagger.py delete mode 100644 deeppavlov/models/morpho_tagger/lemmatizer.py delete mode 100644 deeppavlov/models/morpho_tagger/morpho_tagger.py delete mode 100644 deeppavlov/models/multitask_bert/__init__.py delete mode 100644 deeppavlov/models/multitask_bert/multitask_bert.py delete mode 100644 deeppavlov/models/nemo/__init__.py delete mode 100644 deeppavlov/models/nemo/asr.py delete mode 100644 deeppavlov/models/nemo/common.py delete mode 100644 deeppavlov/models/nemo/tts.py delete mode 100644 deeppavlov/models/nemo/vocoder.py delete mode 100644 deeppavlov/models/ner/NER_model.py delete mode 100644 deeppavlov/models/ner/__init__.py delete mode 100644 deeppavlov/models/ner/bio.py delete mode 100644 deeppavlov/models/ner/network.py delete mode 100644 deeppavlov/models/ner/svm.py delete mode 100644 deeppavlov/models/preprocessors/assemble_embeddings_matrix.py delete mode 100644 deeppavlov/models/preprocessors/bert_preprocessor.py delete mode 100644 deeppavlov/models/preprocessors/capitalization.py delete mode 100644 deeppavlov/models/preprocessors/char_splitter.py delete mode 100644 deeppavlov/models/preprocessors/random_embeddings_matrix.py delete mode 100644 deeppavlov/models/preprocessors/russian_lemmatizer.py delete mode 100644 deeppavlov/models/preprocessors/siamese_preprocessor.py delete mode 100644 deeppavlov/models/ranking/bilstm_gru_siamese_network.py delete mode 100644 deeppavlov/models/ranking/bilstm_siamese_network.py delete mode 100644 deeppavlov/models/ranking/deep_attention_matching_network_use_transformer.py delete mode 100644 deeppavlov/models/ranking/keras_siamese_model.py delete mode 100644 deeppavlov/models/ranking/matching_models/__init__.py delete mode 100644 deeppavlov/models/ranking/matching_models/dam_utils/__init__.py delete mode 100644 deeppavlov/models/ranking/matching_models/dam_utils/layers.py delete mode 100644 deeppavlov/models/ranking/matching_models/dam_utils/operations.py delete mode 100644 deeppavlov/models/ranking/mpm_siamese_network.py delete mode 100644 deeppavlov/models/ranking/rel_ranker.py delete mode 100644 deeppavlov/models/ranking/sequential_matching_network.py delete mode 100644 deeppavlov/models/ranking/siamese_model.py delete mode 100644 deeppavlov/models/ranking/siamese_predictor.py delete mode 100644 deeppavlov/models/ranking/tf_base_matching_model.py delete mode 100644 deeppavlov/models/slotfill/__init__.py delete mode 100644 deeppavlov/models/slotfill/slotfill.py delete mode 100644 deeppavlov/models/slotfill/slotfill_raw.py delete mode 100644 deeppavlov/models/squad/__init__.py delete mode 100644 deeppavlov/models/squad/squad.py delete mode 100644 deeppavlov/models/squad/utils.py delete mode 100644 deeppavlov/models/syntax_parser/__init__.py delete mode 100644 deeppavlov/models/syntax_parser/joint.py delete mode 100644 deeppavlov/models/syntax_parser/network.py delete mode 100644 deeppavlov/models/syntax_parser/parser.py delete mode 100644 deeppavlov/models/tokenizers/jieba_tokenizer.py delete mode 100644 deeppavlov/models/tokenizers/lazy_tokenizer.py delete mode 100644 deeppavlov/models/tokenizers/ru_sent_tokenizer.py create mode 100644 deeppavlov/models/torch_bert/crf.py create mode 100644 deeppavlov/models/torch_bert/torch_transformers_el_ranker.py delete 
mode 100644 deeppavlov/models/vectorizers/word_vectorizer.py delete mode 100644 deeppavlov/requirements/aiml_skill.txt delete mode 100644 deeppavlov/requirements/bert_dp.txt delete mode 100644 deeppavlov/requirements/faiss.txt delete mode 100644 deeppavlov/requirements/gensim.txt delete mode 100644 deeppavlov/requirements/jieba.txt delete mode 100644 deeppavlov/requirements/morpho_tagger.txt delete mode 100644 deeppavlov/requirements/nemo-asr.txt delete mode 100644 deeppavlov/requirements/nemo-tts.txt delete mode 100644 deeppavlov/requirements/nemo.txt create mode 100644 deeppavlov/requirements/pytorch.txt delete mode 100644 deeppavlov/requirements/pytorch14.txt delete mode 100644 deeppavlov/requirements/pytorch16.txt delete mode 100644 deeppavlov/requirements/rasa_skill.txt create mode 100644 deeppavlov/requirements/ru_core_news_sm.txt create mode 100644 deeppavlov/requirements/sacremoses.txt create mode 100644 deeppavlov/requirements/slovnet.txt delete mode 100644 deeppavlov/requirements/spacy.txt delete mode 100644 deeppavlov/requirements/syntax_parser.txt delete mode 100644 deeppavlov/requirements/tf-gpu.txt delete mode 100644 deeppavlov/requirements/tf-hub.txt delete mode 100644 deeppavlov/requirements/tf.txt create mode 100644 deeppavlov/requirements/torchcrf.txt delete mode 100644 deeppavlov/requirements/torchtext.txt delete mode 100644 deeppavlov/requirements/transformers28.txt delete mode 100644 deeppavlov/requirements/xeger.txt delete mode 100644 deeppavlov/skills/__init__.py delete mode 100644 deeppavlov/skills/aiml_skill/README.md delete mode 100644 deeppavlov/skills/aiml_skill/__init__.py delete mode 100644 deeppavlov/skills/aiml_skill/aiml_skill.py delete mode 100644 deeppavlov/skills/dsl_skill/__init__.py delete mode 100644 deeppavlov/skills/dsl_skill/context.py delete mode 100644 deeppavlov/skills/dsl_skill/dsl_skill.py delete mode 100644 deeppavlov/skills/dsl_skill/handlers/__init__.py delete mode 100644 deeppavlov/skills/dsl_skill/handlers/handler.py delete mode 100644 deeppavlov/skills/dsl_skill/handlers/regex_handler.py delete mode 100644 deeppavlov/skills/dsl_skill/utils.py delete mode 100644 deeppavlov/skills/rasa_skill/__init__.py delete mode 100644 deeppavlov/skills/rasa_skill/rasa_skill.py delete mode 100644 deeppavlov/utils/alexa/__init__.py delete mode 100644 deeppavlov/utils/alexa/request_parameters.py delete mode 100644 deeppavlov/utils/alexa/server.py delete mode 100644 deeppavlov/utils/alice/__init__.py delete mode 100644 deeppavlov/utils/alice/request_parameters.py delete mode 100644 deeppavlov/utils/alice/server.py delete mode 100644 deeppavlov/utils/connector/bot.py delete mode 100644 deeppavlov/utils/connector/conversation.py delete mode 100644 deeppavlov/utils/connector/ssl_tools.py delete mode 100644 deeppavlov/utils/ms_bot_framework/__init__.py delete mode 100644 deeppavlov/utils/ms_bot_framework/server.py delete mode 100644 deeppavlov/utils/telegram/__init__.py delete mode 100644 deeppavlov/utils/telegram/telegram_ui.py delete mode 100644 docs/_static/ms_bot_framework/01_web_app_bot.png delete mode 100644 docs/_static/ms_bot_framework/02_web_app_bot_settings.png delete mode 100644 docs/_static/ms_bot_framework/03_navigate_to_bot.png delete mode 100644 docs/_static/ms_bot_framework/04_bot_settings.png delete mode 100644 docs/_static/ms_bot_framework/05_bot_channels.png delete mode 100644 docs/_static/social/f_logo_RGB-Blue_58.png delete mode 100644 docs/apiref/models/bert.rst delete mode 100644 docs/apiref/models/elmo.rst create mode 100644 
docs/apiref/models/entity_extraction.rst delete mode 100644 docs/apiref/models/entity_linking.rst delete mode 100644 docs/apiref/models/go_bot.rst delete mode 100644 docs/apiref/models/intent_catcher.rst delete mode 100644 docs/apiref/models/morpho_tagger.rst delete mode 100644 docs/apiref/models/multitask_bert.rst delete mode 100644 docs/apiref/models/nemo.rst delete mode 100644 docs/apiref/models/ner.rst delete mode 100644 docs/apiref/models/ranking.rst delete mode 100644 docs/apiref/models/slotfill.rst delete mode 100644 docs/apiref/models/squad.rst delete mode 100644 docs/apiref/models/syntax_parser.rst delete mode 100644 docs/apiref/skills.rst delete mode 100644 docs/apiref/skills/aiml_skill.rst delete mode 100644 docs/apiref/skills/dsl_skill.rst delete mode 100644 docs/apiref/skills/rasa_skill.rst create mode 100644 docs/features/models/entity_extraction.rst delete mode 100644 docs/features/models/entity_linking.rst delete mode 100644 docs/features/models/intent_catcher.rst delete mode 100644 docs/features/models/morphotagger.rst delete mode 100644 docs/features/models/multitask_bert.rst delete mode 100644 docs/features/models/nemo.rst delete mode 100644 docs/features/models/slot_filling.rst delete mode 100644 docs/features/models/syntaxparser.rst delete mode 100644 docs/features/skills/aiml_skill.rst delete mode 100644 docs/features/skills/dsl_skill.rst delete mode 100644 docs/features/skills/go_bot.rst delete mode 100644 docs/features/skills/rasa_skill.rst delete mode 100644 docs/integrations/amazon_alexa.rst delete mode 100644 docs/integrations/ms_bot.rst delete mode 100644 docs/integrations/telegram.rst delete mode 100644 docs/integrations/yandex_alice.rst delete mode 100644 docs/intro/choose_framework.rst delete mode 100644 examples/Pseudo-labeling for classification.ipynb delete mode 100644 examples/README.md delete mode 100644 examples/classification_tutorial.ipynb delete mode 100644 examples/gobot_extended_tutorial.ipynb delete mode 100644 examples/gobot_formfilling_tutorial.ipynb delete mode 100644 examples/gobot_md_yaml_configs_tutorial.ipynb delete mode 100644 examples/gobot_tutorial.ipynb delete mode 100644 examples/img/gobot_database.png delete mode 100644 examples/img/gobot_example.png delete mode 100644 examples/img/gobot_pipeline.png delete mode 100644 examples/img/gobot_policy.png delete mode 100644 examples/img/gobot_simple_example.png delete mode 100644 examples/img/gobot_simple_pipeline.png delete mode 100644 examples/img/gobot_simple_policy.png delete mode 100644 examples/img/gobot_simple_templates.png delete mode 100644 examples/img/gobot_slotfiller.png delete mode 100644 examples/img/gobot_templates.png delete mode 100644 examples/img/sc_loss_comparison.png delete mode 100644 examples/img/sc_ner_lr_cosine.png delete mode 100644 examples/img/sc_ner_lr_exponential.png delete mode 100644 examples/img/sc_ner_lr_linear.png delete mode 100644 examples/img/sc_ner_lr_linear2.png delete mode 100644 examples/img/sc_ner_lr_no.png delete mode 100644 examples/img/sc_ner_lr_onecycle.png delete mode 100644 examples/img/sc_ner_lr_polynomial.png delete mode 100644 examples/img/sc_ner_lr_polynomial1.png delete mode 100644 examples/img/sc_ner_lr_polynomial2.png delete mode 100644 examples/img/sc_ner_lr_sc.png delete mode 100644 examples/img/sc_ner_lr_sc1.png delete mode 100644 examples/img/sc_ner_lr_trapezoid.png delete mode 100644 examples/morphotagger_example.ipynb delete mode 100644 examples/super_convergence_tutorial.ipynb delete mode 100644 tests/test_aiml_skill.py delete 
mode 100644 tests/test_configs/classifiers/intents_snips_bigru.json delete mode 100644 tests/test_configs/classifiers/intents_snips_bilstm.json delete mode 100644 tests/test_configs/classifiers/intents_snips_bilstm_bilstm.json delete mode 100644 tests/test_configs/classifiers/intents_snips_bilstm_cnn.json delete mode 100644 tests/test_configs/classifiers/intents_snips_bilstm_proj_layer.json delete mode 100644 tests/test_configs/classifiers/intents_snips_bilstm_self_add_attention.json delete mode 100644 tests/test_configs/classifiers/intents_snips_bilstm_self_mult_attention.json delete mode 100644 tests/test_configs/classifiers/intents_snips_cnn_bilstm.json delete mode 100644 tests/test_configs/nemo/tts2asr_test.json delete mode 100644 tests/test_configs/odqa/en_odqa_infer_wiki_test.json delete mode 100644 tests/test_configs/odqa/en_odqa_pop_infer_wiki_test.json delete mode 100644 tests/test_configs/odqa/ru_odqa_infer_wiki_test.json delete mode 100644 tests/test_dsl_skill.py delete mode 100644 tests/test_rasa_skill.py delete mode 100644 tests/test_tf_layers.py create mode 100644 utils/Docker/Dockerfile rename {deeppavlov/models/ranking => utils/Docker}/README.md (100%) create mode 100755 utils/Docker/cmd.sh create mode 100644 utils/Docker/docker-compose.yml diff --git a/Jenkinsfile b/Jenkinsfile index 78f8407493..4b7271dd24 100644 --- a/Jenkinsfile +++ b/Jenkinsfile @@ -10,26 +10,15 @@ node('cuda-module') { stage('Setup') { env.TFHUB_CACHE_DIR="tfhub_cache" sh """ - virtualenv --python=python3.7 '.venv-$BUILD_NUMBER' - . '.venv-$BUILD_NUMBER/bin/activate' - pip install .[tests,docs] - pip install -r deeppavlov/requirements/tf-gpu.txt - rm -rf `find . -mindepth 1 -maxdepth 1 ! -name tests ! -name Jenkinsfile ! -name docs ! -name '.venv-$BUILD_NUMBER'` + EPOCH=\$(date +%s) docker-compose -f utils/Docker/docker-compose.yml -p $BUILD_TAG build """ } stage('Tests') { sh """ - . /etc/profile - module add cuda/10.0 - . .venv-$BUILD_NUMBER/bin/activate - - cd docs - make clean - make html - cd .. 
- - flake8 `python -c 'import deeppavlov; print(deeppavlov.__path__[0])'` --count --select=E9,F63,F7,F82 --show-source --statistics - pytest -v --disable-warnings + docker-compose -f utils/Docker/docker-compose.yml -p $BUILD_TAG up py36 py37 + docker-compose -f utils/Docker/docker-compose.yml -p $BUILD_TAG ps | grep Exit | grep -v 'Exit 0' && exit 1 + docker-compose -f utils/Docker/docker-compose.yml -p $BUILD_TAG up py38 py39 + docker-compose -f utils/Docker/docker-compose.yml -p $BUILD_TAG ps | grep Exit | grep -v 'Exit 0' && exit 1 || exit 0 """ currentBuild.result = 'SUCCESS' } @@ -39,6 +28,10 @@ node('cuda-module') { throw e } finally { + sh """ + docker-compose -f utils/Docker/docker-compose.yml -p $BUILD_TAG rm -f + docker network rm \$(echo $BUILD_TAG | awk '{print tolower(\$0)}')_default + """ emailext to: "\${DEFAULT_RECIPIENTS}", subject: "${env.JOB_NAME} - Build # ${currentBuild.number} - ${currentBuild.result}!", body: '${BRANCH_NAME} - ${BUILD_URL}', diff --git a/README.md b/README.md index bdb0a4022d..8a444935eb 100644 --- a/README.md +++ b/README.md @@ -1,10 +1,9 @@ [![License Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/deepmipt/DeepPavlov/blob/master/LICENSE) -![Python 3.6, 3.7](https://img.shields.io/badge/python-3.6%20%7C%203.7-green.svg) +![Python 3.6, 3.7, 3.8, 3.9](https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8%20%7C%203.9-green.svg) [![Downloads](https://pepy.tech/badge/deeppavlov)](https://pepy.tech/project/deeppavlov) -DeepPavlov is an open-source conversational AI library built on [TensorFlow](https://www.tensorflow.org/), [Keras](https://keras.io/) -and [PyTorch](https://pytorch.org/). +DeepPavlov is an open-source conversational AI library built on [PyTorch](https://pytorch.org/). 
DeepPavlov is designed for * development of production ready chat-bots and complex conversational systems, @@ -27,7 +26,7 @@ Please leave us [your feedback](https://forms.gle/i64fowQmiVhMMC7f9) on how we c **Models** -[Named Entity Recognition](http://docs.deeppavlov.ai/en/master/features/models/ner.html) | [Slot filling](http://docs.deeppavlov.ai/en/master/features/models/slot_filling.html) +[Named Entity Recognition](http://docs.deeppavlov.ai/en/master/features/models/ner.html) [Intent/Sentence Classification](http://docs.deeppavlov.ai/en/master/features/models/classifiers.html) | [Question Answering over Text (SQuAD)](http://docs.deeppavlov.ai/en/master/features/models/squad.html) @@ -35,17 +34,13 @@ Please leave us [your feedback](https://forms.gle/i64fowQmiVhMMC7f9) on how we c [Sentence Similarity/Ranking](http://docs.deeppavlov.ai/en/master/features/models/neural_ranking.html) | [TF-IDF Ranking](http://docs.deeppavlov.ai/en/master/features/models/tfidf_ranking.html) -[Morphological tagging](http://docs.deeppavlov.ai/en/master/features/models/morphotagger.html) | [Syntactic parsing](http://docs.deeppavlov.ai/en/master/features/models/syntaxparser.html) +[Automatic Spelling Correction](http://docs.deeppavlov.ai/en/master/features/models/spelling_correction.html) -[Automatic Spelling Correction](http://docs.deeppavlov.ai/en/master/features/models/spelling_correction.html) | [ELMo training and fine-tuning](http://docs.deeppavlov.ai/en/master/apiref/models/elmo.html) - -[Speech recognition and synthesis (ASR and TTS)](http://docs.deeppavlov.ai/en/master/features/models/nemo.html) based on [NVIDIA NeMo](https://nvidia.github.io/NeMo/index.html) - -[Entity Linking](http://docs.deeppavlov.ai/en/master/features/models/entity_linking.html) | [Multitask BERT](http://docs.deeppavlov.ai/en/master/features/models/multitask_bert.html) +[Entity Linking](http://docs.deeppavlov.ai/en/master/features/models/entity_linking.html) **Skills** -[Goal(Task)-oriented Bot](http://docs.deeppavlov.ai/en/master/features/skills/go_bot.html) | [Open Domain Questions Answering](http://docs.deeppavlov.ai/en/master/features/skills/odqa.html) +[Open Domain Questions Answering](http://docs.deeppavlov.ai/en/master/features/skills/odqa.html) [Frequently Asked Questions Answering](http://docs.deeppavlov.ai/en/master/features/skills/faq.html) @@ -63,15 +58,13 @@ Please leave us [your feedback](https://forms.gle/i64fowQmiVhMMC7f9) on how we c **Integrations** -[REST API](http://docs.deeppavlov.ai/en/master/integrations/rest_api.html) | [Socket API](http://docs.deeppavlov.ai/en/master/integrations/socket_api.html) | [Yandex Alice](http://docs.deeppavlov.ai/en/master/integrations/yandex_alice.html) +[REST API](http://docs.deeppavlov.ai/en/master/integrations/rest_api.html) | [Socket API](http://docs.deeppavlov.ai/en/master/integrations/socket_api.html) -[Telegram](http://docs.deeppavlov.ai/en/master/integrations/telegram.html) | [Microsoft Bot Framework](http://docs.deeppavlov.ai/en/master/integrations/ms_bot.html) - -[Amazon Alexa](http://docs.deeppavlov.ai/en/master/integrations/amazon_alexa.html) | [Amazon AWS](http://docs.deeppavlov.ai/en/master/integrations/aws_ec2.html) +[Amazon AWS](http://docs.deeppavlov.ai/en/master/integrations/aws_ec2.html) ## Installation -0. We support `Linux` and `Windows` platforms, `Python 3.6` and `Python 3.7` +0. 
We support the `Linux` platform, `Python 3.6`, `3.7`, `3.8` and `3.9` * **`Python 3.5` is not supported!** * **installation for `Windows` requires `Git`(for example, [git](https://git-scm.com/download/win)) and `Visual Studio 2015/2017` with `C++` build tools installed!** @@ -112,29 +105,8 @@ evaluate and infer it: #### GPU requirements -To run supported DeepPavlov models on GPU you should have [CUDA](https://developer.nvidia.com/cuda-toolkit) 10.0 -installed on your host machine and TensorFlow with GPU support (`tensorflow-gpu`) -installed in your python environment. Current supported TensorFlow version is 1.15.2. -Run - -``` -pip install tensorflow-gpu==1.15.2 -``` - -before installing model's package requirements to install supported `tensorflow-gpu` version. - - -Before making choice of an interface, install model's package requirements -(CLI): - -```bash -python -m deeppavlov install -``` - -* where `` is path to the chosen model's config file (e.g. - `deeppavlov/configs/ner/slotfill_dstc2.json`) or just name without - *.json* extension (e.g. `slotfill_dstc2`) - +To run supported DeepPavlov models on GPU you should have a [CUDA](https://developer.nvidia.com/cuda-toolkit) version compatible +with your GPU and with the [PyTorch version listed in the requirements](deeppavlov/requirements/pytorch.txt). ### Command line interface (CLI) @@ -172,10 +144,6 @@ python -m deeppavlov [-d] * `interact` to interact via CLI, * `riseapi` to run a REST API server (see [doc](http://docs.deeppavlov.ai/en/master/integrations/rest_api.html)), - * `telegram` to run as a Telegram bot (see - [doc](http://docs.deeppavlov.ai/en/master/integrations/telegram.html)), - * `msbot` to run a Miscrosoft Bot Framework server (see - [doc](http://docs.deeppavlov.ai/en/master/integrations/ms_bot.html)), * `predict` to get prediction for samples from *stdin* or from ** if `-f ` is specified. * `` specifies path (or name) of model's config file @@ -228,87 +196,12 @@ from deeppavlov import evaluate_model model = evaluate_model(, download=True) ``` -There are also available integrations with various messengers, see -[Telegram Bot doc page](http://docs.deeppavlov.ai/en/master/integrations/telegram.html) -and others in the Integrations section for more info.
- - -## Breaking Changes - -**Breaking changes in version 0.15.0** -- [bert_as_summarizer](https://github.com/deepmipt/DeepPavlov/pull/1391), [seq2seq_go_bot](https://github.com/deepmipt/DeepPavlov/pull/1434) and all deeppavlov.deprecated components were removed -- hyperparameter optimization by neural evolution was [removed](https://github.com/deepmipt/DeepPavlov/pull/1436) - -**Breaking changes in version 0.7.0** -- in dialog logger config file [dialog_logger_config.json](deeppavlov/utils/settings/dialog_logger_config.json) `agent_name` parameter was renamed to `logger_name`, - the default value was changed -- Agent, Skill, eCommerce Bot and Pattern Matching classes were moved to [deeppavlov.deprecated](deeppavlov/deprecated) -- [AIML Skill](http://docs.deeppavlov.ai/en/0.7.0/features/skills/aiml_skill.html), - [RASA Skill](http://docs.deeppavlov.ai/en/0.7.0/features/skills/rasa_skill.html), - [Yandex Alice](http://docs.deeppavlov.ai/en/0.7.0/integrations/yandex_alice.html), - [Amazon Alexa](http://docs.deeppavlov.ai/en/0.7.0/integrations/amazon_alexa.html), - [Microsoft Bot Framework](http://docs.deeppavlov.ai/en/0.7.0/integrations/ms_bot.html) and - [Telegram integration](http://docs.deeppavlov.ai/en/0.7.0/integrations/telegram.html) interfaces were changed -- `/start` and `/help` Telegram messages were moved from `models_info.json` to [server_config.json](deeppavlov/utils/settings/server_config.json) -- [risesocket](http://docs.deeppavlov.ai/en/0.7.0/integrations/socket_api.html) request and response format was changed -- [riseapi](http://docs.deeppavlov.ai/en/0.7.0/integrations/rest_api.html#advanced-configuration) and - [risesocket](http://docs.deeppavlov.ai/en/0.7.0/integrations/socket_api.html#advanced-configuration) model-specific - properties parametrization was changed - -**Breaking changes in version 0.6.0** -- [REST API](http://docs.deeppavlov.ai/en/0.6.0/integrations/rest_api.html): - - all models default endpoints were renamed to `/model` - - by default model arguments names are taken from `chainer.in` - [configuration parameter](http://docs.deeppavlov.ai/en/0.6.0/intro/configuration.html) instead of pre-set names - from a [settings file](http://docs.deeppavlov.ai/en/0.6.0/integrations/settings.html) - - swagger api endpoint moved from `/apidocs` to `/docs` -- when using `"max_proba": true` in - a [`proba2labels` component](http://docs.deeppavlov.ai/en/0.6.0/apiref/models/classifiers.html) for classification, - it will return single label for every batch element instead of a list. One can set `"top_n": 1` - to get batches of single item lists as before - -**Breaking changes in version 0.5.0** -- dependencies have to be reinstalled for most pipeline configurations -- models depending on `tensorflow` require `CUDA 10.0` to run on GPU instead of `CUDA 9.0` -- scikit-learn models have to be redownloaded or retrained - -**Breaking changes in version 0.4.0!** -- default target variable name for [neural evolution](https://docs.deeppavlov.ai/en/0.4.0/intro/hypersearch.html#parameters-evolution-for-deeppavlov-models) -was changed from `MODELS_PATH` to `MODEL_PATH`. - -**Breaking changes in version 0.3.0!** -- component option `fit_on_batch` in configuration files was removed and replaced with adaptive usage of the `fit_on` parameter. 
- -**Breaking changes in version 0.2.0!** -- `utils` module was moved from repository root in to `deeppavlov` module -- `ms_bot_framework_utils`,`server_utils`, `telegram utils` modules was renamed to `ms_bot_framework`, `server` and `telegram` correspondingly -- rename metric functions `exact_match` to `squad_v2_em` and `squad_f1` to `squad_v2_f1` -- replace dashes in configs name with underscores - -**Breaking changes in version 0.1.0!** -- As of `version 0.1.0` all models, embeddings and other downloaded data for provided configurations are - by default downloaded to the `.deeppavlov` directory in current user's home directory. - This can be changed on per-model basis by modifying - a `ROOT_PATH` [variable](http://docs.deeppavlov.ai/en/master/intro/configuration.html#variables) - or related fields one by one in model's configuration file. - -- In configuration files, for all features/models, dataset readers and iterators `"name"` and `"class"` fields are combined -into the `"class_name"` field. - -- `deeppavlov.core.commands.infer.build_model_from_config()` was renamed to `build_model` and can be imported from the - `deeppavlov` module directly. - -- The way arguments are passed to metrics functions during training and evaluation was changed and - [documented](http://docs.deeppavlov.ai/en/0.4.0/intro/config_description.html#metrics). ## License DeepPavlov is Apache 2.0 - licensed. -## The Team - -DeepPavlov is built and maintained by [Neural Networks and Deep Learning Lab](https://www.facebook.com/deepmipt/) -at [MIPT](https://mipt.ru/english/). +##

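As a quick sanity check of the PyTorch-based configs touched by this patch (for example `insults_kaggle_bert` below), here is a minimal usage sketch that relies only on the standard `deeppavlov` Python interface referenced in the README hunk above; the config-name resolution, the download step, and the exact labels returned are assumptions for illustration, not part of this diff.

```python
from deeppavlov import build_model

# Minimal sketch: build the insults_kaggle_bert pipeline by config name and
# download its pretrained files; assumes network access to files.deeppavlov.ai.
model = build_model('insults_kaggle_bert', download=True)

# The pipeline takes a batch of strings and returns a batch of class labels.
print(model(['you are stupid', 'have a nice day']))
```

The CLI equivalent would be `python -m deeppavlov interact insults_kaggle_bert -d`, following the `interact` action and `-d` download flag listed in the README excerpt above.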
diff --git a/deeppavlov/__init__.py b/deeppavlov/__init__.py index 6f7169f96a..f72041f40d 100644 --- a/deeppavlov/__init__.py +++ b/deeppavlov/__init__.py @@ -19,6 +19,7 @@ from .configs import configs from .core.commands.infer import build_model from .core.commands.train import train_evaluate_model_from_config +from .core.common.base import Element, Model from .core.common.chainer import Chainer from .core.common.log import init_logger from .download import deep_download diff --git a/deeppavlov/_meta.py b/deeppavlov/_meta.py index c6503adea1..abb2989682 100644 --- a/deeppavlov/_meta.py +++ b/deeppavlov/_meta.py @@ -1,4 +1,4 @@ -__version__ = '0.17.0' +__version__ = '1.0.0rc1' __author__ = 'Neural Networks and Deep Learning lab, MIPT' __description__ = 'An open source library for building end-to-end dialog systems and training chatbots.' __keywords__ = ['NLP', 'NER', 'SQUAD', 'Intents', 'Chatbot'] diff --git a/deeppavlov/configs/classifiers/boolqa_rubert.json b/deeppavlov/configs/classifiers/boolqa_rubert.json index 34045bfdb1..60ee9f090c 100644 --- a/deeppavlov/configs/classifiers/boolqa_rubert.json +++ b/deeppavlov/configs/classifiers/boolqa_rubert.json @@ -13,24 +13,21 @@ "in_y": ["y"], "pipe": [ { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/rubert_cased_L-12_H-768_A-12_v1/vocab.txt", + "class_name": "torch_transformers_preprocessor", + "vocab_file": "{TRANSFORMER}", "do_lower_case": false, "max_seq_length": 128, "in": ["text_a", "text_b"], "out": ["bert_features"] }, { - "class_name": "bert_classifier", + "class_name": "torch_transformers_classifier", "n_classes": 2, - "one_hot_labels": false, - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/rubert_cased_L-12_H-768_A-12_v1/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/rubert_cased_L-12_H-768_A-12_v1/bert_model.ckpt", + "pretrained_bert": "{TRANSFORMER}", "save_path": "{MODELS_PATH}/boolqa_rubert/model_rubert", "load_path": "{MODELS_PATH}/boolqa_rubert/model_rubert", - "keep_prob": 0.5, - "optimizer": "tf.train:AdamOptimizer", - "learning_rate": 2e-05, + "optimizer": "AdamW", + "optimizer_parameters": {"lr": 2e-05}, "learning_rate_drop_patience": 3, "learning_rate_drop_div": 2.0, "in": ["bert_features"], @@ -50,19 +47,14 @@ "log_every_n_epochs": 1, "evaluation_targets": ["valid", "train"], "show_examples": false, - "tensorboard_log_dir": "{MODELS_PATH}/boolqa_rubert/logs" + "class_name": "torch_trainer" }, "metadata": { "variables": { "ROOT_PATH": "~/.deeppavlov", "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/rubert_cased_L-12_H-768_A-12_v1.tar.gz", - "subdir": "{DOWNLOADS_PATH}/bert_models" - } - ] + "MODELS_PATH": "{ROOT_PATH}/models", + "TRANSFORMER": "DeepPavlov/rubert-base-cased" + } } } diff --git a/deeppavlov/configs/classifiers/entity_ranking_bert_eng_no_mention.json b/deeppavlov/configs/classifiers/entity_ranking_bert_eng_no_mention.json deleted file mode 100644 index 0c317cfad1..0000000000 --- a/deeppavlov/configs/classifiers/entity_ranking_bert_eng_no_mention.json +++ /dev/null @@ -1,76 +0,0 @@ -{ - "dataset_reader": { - "class_name": "paraphraser_reader", - "data_path": "{DOWNLOADS_PATH}/entity_ranking_bert_eng_no_mention", - "do_lower_case": false - }, - "dataset_iterator": { - "class_name": "siamese_iterator", - "seed": 243, - "len_valid": 500 - }, - "chainer": { - "in": ["text_a", "text_b"], - "in_y": ["y"], - "pipe": [ - { - "class_name": 
"bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": false, - "max_seq_length": 64, - "in": ["text_a", "text_b"], - "out": ["bert_features"] - }, - { - "class_name": "bert_classifier", - "n_classes": 2, - "return_probas": true, - "one_hot_labels": false, - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/bert_model.ckpt", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "keep_prob": 0.5, - "learning_rate": 2e-05, - "learning_rate_drop_patience": 2, - "learning_rate_drop_div": 2.0, - "in": ["bert_features"], - "in_y": ["y"], - "out": ["predictions"] - } - ], - "out": ["predictions"] - }, - "train": { - "batch_size": 32, - "pytest_max_batches": 2, - "metrics": ["f1", "acc"], - "validation_patience": 10, - "val_every_n_batches": 100, - "log_every_n_batches": 100, - "evaluation_targets": ["train", "valid", "test"], - "tensorboard_log_dir": "{MODEL_PATH}/" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/entity_ranking_bert_eng_no_mention" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/datasets/rel_ranking_bert_rus.tar.gz", - "subdir": "{DOWNLOADS_PATH}/rel_ranking_rus" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/rel_ranking_bert_rus.tar.gz", - "subdir": "{MODELS_PATH}/rel_ranking_bert_rus" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/entity_ranking_bert_rus_no_mention.json b/deeppavlov/configs/classifiers/entity_ranking_bert_rus_no_mention.json deleted file mode 100644 index 6dc5b247a3..0000000000 --- a/deeppavlov/configs/classifiers/entity_ranking_bert_rus_no_mention.json +++ /dev/null @@ -1,76 +0,0 @@ -{ - "dataset_reader": { - "class_name": "paraphraser_reader", - "data_path": "{DOWNLOADS_PATH}/entity_ranking_bert_rus_no_mention", - "do_lower_case": false - }, - "dataset_iterator": { - "class_name": "siamese_iterator", - "seed": 243, - "len_valid": 500 - }, - "chainer": { - "in": ["text_a", "text_b"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": false, - "max_seq_length": 64, - "in": ["text_a", "text_b"], - "out": ["bert_features"] - }, - { - "class_name": "bert_classifier", - "n_classes": 2, - "return_probas": true, - "one_hot_labels": false, - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/bert_model.ckpt", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "keep_prob": 0.5, - "learning_rate": 2e-05, - "learning_rate_drop_patience": 2, - "learning_rate_drop_div": 2.0, - "in": ["bert_features"], - "in_y": ["y"], - "out": ["predictions"] - } - ], - "out": ["predictions"] - }, - "train": { - "batch_size": 32, - "pytest_max_batches": 2, - "metrics": ["f1", "acc"], - "validation_patience": 10, - "val_every_n_batches": 100, - "log_every_n_batches": 100, - "evaluation_targets": ["train", "valid", "test"], - "tensorboard_log_dir": "{MODEL_PATH}/" - }, - "metadata": { - "variables": { - 
"ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/entity_ranking_bert_rus_no_mention" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/multi_cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/datasets/rel_ranking_bert_rus.tar.gz", - "subdir": "{DOWNLOADS_PATH}/rel_ranking_rus" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/rel_ranking_bert_rus.tar.gz", - "subdir": "{MODELS_PATH}/rel_ranking_bert_rus" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/glue/glue_mnli_roberta.json b/deeppavlov/configs/classifiers/glue/glue_mnli_roberta.json index 7ff348e303..16b20476c0 100644 --- a/deeppavlov/configs/classifiers/glue/glue_mnli_roberta.json +++ b/deeppavlov/configs/classifiers/glue/glue_mnli_roberta.json @@ -121,7 +121,6 @@ "log_every_n_batches": 250, "show_examples": false, "evaluation_targets": [ - "train", "valid" ], "class_name": "torch_trainer", diff --git a/deeppavlov/configs/classifiers/glue/glue_rte_roberta_mnli.json b/deeppavlov/configs/classifiers/glue/glue_rte_roberta_mnli.json index feb3f17ae5..6001c5cce7 100644 --- a/deeppavlov/configs/classifiers/glue/glue_rte_roberta_mnli.json +++ b/deeppavlov/configs/classifiers/glue/glue_rte_roberta_mnli.json @@ -121,7 +121,6 @@ "log_every_n_epochs": 1, "show_examples": false, "evaluation_targets": [ - "train", "valid" ], "class_name": "torch_trainer", diff --git a/deeppavlov/configs/classifiers/insults_kaggle_bert_torch.json b/deeppavlov/configs/classifiers/glue/glue_wnli_roberta.json similarity index 64% rename from deeppavlov/configs/classifiers/insults_kaggle_bert_torch.json rename to deeppavlov/configs/classifiers/glue/glue_wnli_roberta.json index a9ff62015e..34b300c4b8 100644 --- a/deeppavlov/configs/classifiers/insults_kaggle_bert_torch.json +++ b/deeppavlov/configs/classifiers/glue/glue_wnli_roberta.json @@ -1,17 +1,39 @@ { + "metadata": { + "variables": { + "ROOT_PATH": "~/.deeppavlov", + "BASE_MODEL": "roberta-large", + "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", + "MODELS_PATH": "{ROOT_PATH}/models", + "MODEL_PATH": "{MODELS_PATH}/classifiers/glue_wnli/{BASE_MODEL}" + }, + "download": [ + { + "url": "http://files.deeppavlov.ai/0.16/classifiers/glue_wnli_roberta.tar.gz", + "subdir": "{MODELS_PATH}" + } + ] + }, "dataset_reader": { - "class_name": "basic_classification_reader", - "x": "Comment", - "y": "Class", - "data_path": "{DOWNLOADS_PATH}/insults_data" + "class_name": "huggingface_dataset_reader", + "path": "glue", + "name": "wnli", + "train": "train", + "valid": "validation" }, "dataset_iterator": { - "class_name": "basic_classification_iterator", + "class_name": "huggingface_dataset_iterator", + "features": [ + "sentence1", + "sentence2" + ], + "label": "label", "seed": 42 }, "chainer": { "in": [ - "x" + "sentence1", + "sentence2" ], "in_y": [ "y" @@ -19,11 +41,14 @@ "pipe": [ { "class_name": "torch_transformers_preprocessor", - "vocab_file": "{TRANSFORMER}", - "do_lower_case": true, - "max_seq_length": 64, + "vocab_file": "{BASE_MODEL}", + "do_lower_case": false, + "max_seq_length": 192, + "truncation": "longest_first", + "padding": "longest", "in": [ - "x" + "sentence1", + "sentence2" ], "out": [ "bert_features" @@ -59,14 +84,14 @@ "class_name": "torch_transformers_classifier", "n_classes": "#classes_vocab.len", "return_probas": true, - "pretrained_bert": "{TRANSFORMER}", + "pretrained_bert": "{BASE_MODEL}", 
"save_path": "{MODEL_PATH}/model", "load_path": "{MODEL_PATH}/model", "optimizer": "AdamW", "optimizer_parameters": { "lr": 1e-05 }, - "learning_rate_drop_patience": 5, + "learning_rate_drop_patience": 3, "learning_rate_drop_div": 2.0, "in": [ "bert_features" @@ -103,47 +128,20 @@ ] }, "train": { - "epochs": 100, - "batch_size": 64, + "batch_size": 24, "metrics": [ - { - "name": "roc_auc", - "inputs": [ - "y_onehot", - "y_pred_probas" - ] - }, - "accuracy", - "f1_macro" + "accuracy" ], - "validation_patience": 5, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, + "epochs": 1, + "val_every_n_batches": 250, + "log_every_n_batches": 250, "show_examples": false, "evaluation_targets": [ "train", - "valid", - "test" + "valid" ], - "class_name": "torch_trainer" - }, - "metadata": { - "variables": { - "TRANSFORMER": "bert-base-uncased", - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/insults_kaggle_torch_bert" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/insults_data.tar.gz", - "subdir": "{DOWNLOADS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/insults_kaggle_torch_bert_v0.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] + "class_name": "torch_trainer", + "tensorboard_log_dir": "{MODEL_PATH}/", + "pytest_max_batches": 2 } } diff --git a/deeppavlov/configs/classifiers/insults_kaggle.json b/deeppavlov/configs/classifiers/insults_kaggle.json deleted file mode 100644 index 8627589eca..0000000000 --- a/deeppavlov/configs/classifiers/insults_kaggle.json +++ /dev/null @@ -1,155 +0,0 @@ -{ - "dataset_reader": { - "class_name": "basic_classification_reader", - "x": "Comment", - "y": "Class", - "data_path": "{DOWNLOADS_PATH}/insults_data" - }, - "dataset_iterator": { - "class_name": "basic_classification_iterator", - "seed": 42 - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": [ - "x" - ], - "out": [ - "x_prep" - ], - "class_name": "dirty_comments_preprocessor" - }, - { - "in": "x_prep", - "out": "x_tok", - "id": "my_tokenizer", - "class_name": "nltk_tokenizer", - "tokenizer": "wordpunct_tokenize" - }, - { - "in": "x_tok", - "out": "x_emb", - "id": "my_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/wordpunct_tok_reddit_comments_2017_11_300.bin", - "pad_zero": true - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "in": [ - "x_emb" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ], - "main": true, - "class_name": "keras_classification_model", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "embedding_size": "#my_embedder.dim", - "n_classes": "#classes_vocab.len", - "kernel_sizes_cnn": [ - 3, - 5, - 7 - ], - "filters_cnn": 256, - "optimizer": "Adam", - "learning_rate": 0.01, - "learning_rate_decay": 0.1, - "loss": "binary_crossentropy", - "last_layer_activation": "softmax", - "coef_reg_cnn": 1e-3, - "coef_reg_den": 1e-2, - "dropout_rate": 0.5, - "dense_size": 100, - "model_name": "cnn_model" - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - 
"in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "epochs": 1000, - "batch_size": 64, - "metrics": [ - "accuracy", - "f1_macro", - { - "name": "roc_auc", - "inputs": ["y_onehot", "y_pred_probas"] - } - ], - "validation_patience": 5, - "val_every_n_epochs": 5, - "log_every_n_epochs": 5, - "show_examples": false, - "evaluation_targets": [ - "train", - "valid", - "test" - ], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/insults_kaggle_v2" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/insults_data.tar.gz", - "subdir": "{DOWNLOADS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/reddit_fastText/wordpunct_tok_reddit_comments_2017_11_300.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/insults_kaggle_v2.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/insults_kaggle_bert.json b/deeppavlov/configs/classifiers/insults_kaggle_bert.json index d64f2363b4..3eebd753a3 100644 --- a/deeppavlov/configs/classifiers/insults_kaggle_bert.json +++ b/deeppavlov/configs/classifiers/insults_kaggle_bert.json @@ -18,9 +18,9 @@ ], "pipe": [ { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": false, + "class_name": "torch_transformers_preprocessor", + "vocab_file": "{TRANSFORMER}", + "do_lower_case": true, "max_seq_length": 64, "in": [ "x" @@ -37,48 +37,64 @@ ], "save_path": "{MODEL_PATH}/classes.dict", "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" + "in": [ + "y" + ], + "out": [ + "y_ids" + ] }, { - "in": "y_ids", - "out": "y_onehot", + "in": [ + "y_ids" + ], + "out": [ + "y_onehot" + ], "class_name": "one_hotter", "depth": "#classes_vocab.len", "single_vector": true }, { - "class_name": "bert_classifier", + "class_name": "torch_transformers_classifier", "n_classes": "#classes_vocab.len", "return_probas": true, - "one_hot_labels": true, - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/bert_model.ckpt", + "pretrained_bert": "{TRANSFORMER}", "save_path": "{MODEL_PATH}/model", "load_path": "{MODEL_PATH}/model", - "keep_prob": 0.5, - "learning_rate": 1e-05, + "optimizer": "AdamW", + "optimizer_parameters": { + "lr": 1e-05 + }, "learning_rate_drop_patience": 5, "learning_rate_drop_div": 2.0, "in": [ "bert_features" ], "in_y": [ - "y_onehot" + "y_ids" ], "out": [ "y_pred_probas" ] }, { - "in": "y_pred_probas", - "out": "y_pred_ids", + "in": [ + "y_pred_probas" + ], + "out": [ + "y_pred_ids" + ], "class_name": "proba2labels", "max_proba": true }, { - "in": "y_pred_ids", - "out": "y_pred_labels", + "in": [ + "y_pred_ids" + ], + "out": [ + "y_pred_labels" + ], "ref": "classes_vocab" } ], @@ -109,15 +125,15 @@ "valid", "test" ], - "class_name": "nn_trainer", - "tensorboard_log_dir": "{MODEL_PATH}/" + "class_name": "torch_trainer" }, "metadata": { "variables": { + "TRANSFORMER": "bert-base-uncased", "ROOT_PATH": "~/.deeppavlov", "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/insults_kaggle_v3" + 
"MODEL_PATH": "{MODELS_PATH}/classifiers/insults_kaggle_torch_bert" }, "download": [ { @@ -125,11 +141,7 @@ "subdir": "{DOWNLOADS_PATH}" }, { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/insults_kaggle_v3.tar.gz", + "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/insults_kaggle_torch_bert_v5.tar.gz", "subdir": "{MODELS_PATH}/classifiers" } ] diff --git a/deeppavlov/configs/classifiers/insults_kaggle_conv_bert.json b/deeppavlov/configs/classifiers/insults_kaggle_conv_bert.json deleted file mode 100644 index 01f13affca..0000000000 --- a/deeppavlov/configs/classifiers/insults_kaggle_conv_bert.json +++ /dev/null @@ -1,153 +0,0 @@ -{ - "dataset_reader": { - "class_name": "basic_classification_reader", - "x": "Comment", - "y": "Class", - "data_path": "{DOWNLOADS_PATH}/insults_data" - }, - "dataset_iterator": { - "class_name": "basic_classification_iterator", - "seed": 42 - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/conversational_cased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": false, - "max_seq_length": 64, - "in": [ - "x" - ], - "out": [ - "bert_features" - ] - }, - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": [ - "y" - ], - "out": [ - "y_ids" - ] - }, - { - "in": [ - "y_ids" - ], - "out": [ - "y_onehot" - ], - "class_name": "one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "class_name": "bert_classifier", - "n_classes": "#classes_vocab.len", - "return_probas": true, - "one_hot_labels": true, - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/conversational_cased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/conversational_cased_L-12_H-768_A-12/bert_model.ckpt", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "keep_prob": 0.5, - "learning_rate": 1e-05, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 2.0, - "in": [ - "bert_features" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ] - }, - { - "in": [ - "y_pred_probas" - ], - "out": [ - "y_pred_ids" - ], - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": [ - "y_pred_ids" - ], - "out": [ - "y_pred_labels" - ], - "ref": "classes_vocab" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "epochs": 100, - "batch_size": 64, - "metrics": [ - { - "name": "roc_auc", - "inputs": [ - "y_onehot", - "y_pred_probas" - ] - }, - "accuracy", - "f1_macro" - ], - "validation_patience": 5, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "evaluation_targets": [ - "train", - "valid", - "test" - ], - "class_name": "nn_trainer", - "tensorboard_log_dir": "{MODEL_PATH}/" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/insults_kaggle_v4" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/insults_data.tar.gz", - "subdir": "{DOWNLOADS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/conversational_cased_L-12_H-768_A-12.tar.gz", - "subdir": 
"{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/insults_kaggle_v4.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/intents_dstc2.json b/deeppavlov/configs/classifiers/intents_dstc2.json deleted file mode 100644 index 828c01d634..0000000000 --- a/deeppavlov/configs/classifiers/intents_dstc2.json +++ /dev/null @@ -1,156 +0,0 @@ -{ - "dataset_reader": { - "class_name": "dstc2_reader", - "data_path": "{DOWNLOADS_PATH}/dstc2" - }, - "dataset_iterator": { - "class_name": "dstc2_intents_iterator", - "seed": 42 - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids", - "special_tokens": [""] - }, - { - "in": "x", - "out": "x_tok", - "id": "my_tokenizer", - "class_name": "nltk_tokenizer", - "tokenizer": "wordpunct_tokenize" - }, - { - "in": "x_tok", - "out": "x_emb", - "id": "my_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/dstc2_fastText_model.bin", - "pad_zero": true - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "id": "my_one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "in": [ - "x_emb" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ], - "main": true, - "class_name": "keras_classification_model", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "embedding_size": "#my_embedder.dim", - "classes": "#classes_vocab.keys()", - "n_classes": "#classes_vocab.len", - "kernel_sizes_cnn": [ - 3, - 5, - 7 - ], - "filters_cnn": 512, - "optimizer": "Adam", - "learning_rate": 0.1, - "learning_rate_decay": 0.1, - "loss": "binary_crossentropy", - "coef_reg_cnn": 1e-4, - "coef_reg_den": 1e-4, - "dropout_rate": 0.5, - "dense_size": 100, - "model_name": "cnn_model" - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "confidence_threshold": 0.5 - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - }, - { - "ref": "my_one_hotter", - "in": "y_pred_ids", - "out": "y_pred_onehot" - } - ], - "out": [ - "y_pred_labels", - "y_pred_probas" - ] - }, - "train": { - "epochs": 1000, - "batch_size": 64, - "metrics": [ - { - "name": "sets_accuracy", - "inputs": [ - "y", - "y_pred_labels" - ] - }, - { - "name": "roc_auc", - "inputs": [ - "y_onehot", - "y_pred_probas" - ] - } - ], - "validation_patience": 5, - "val_every_n_epochs": 5, - "log_every_n_batches": 100, - "show_examples": false, - "evaluation_targets": [ - "train", - "valid", - "test" - ], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/intents_dstc2_v10" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/embeddings/dstc2_fastText_model.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/intents_dstc2_v10.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/intents_dstc2_bert.json b/deeppavlov/configs/classifiers/intents_dstc2_bert.json deleted file mode 100644 index 0143446b53..0000000000 --- 
a/deeppavlov/configs/classifiers/intents_dstc2_bert.json +++ /dev/null @@ -1,121 +0,0 @@ -{ - "dataset_reader": { - "class_name": "dstc2_reader", - "data_path": "{DOWNLOADS_PATH}/dstc2" - }, - "dataset_iterator": { - "class_name": "dstc2_intents_iterator", - "seed": 42 - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": ["y"], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids", - "special_tokens": [""] - }, - { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": false, - "max_seq_length": 64, - "in": ["x"], - "out": ["bert_features"] - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "id": "my_one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "class_name": "bert_classifier", - "n_classes": "#classes_vocab.len", - "return_probas": true, - "one_hot_labels": true, - "multilabel": true, - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/bert_model.ckpt", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "keep_prob": 0.5, - "learning_rate": 2e-05, - "learning_rate_drop_patience": 3, - "learning_rate_drop_div": 2.0, - "in": ["bert_features"], - "in_y": ["y_onehot"], - "out": ["y_pred_probas"] - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "confidence_threshold": 0.5 - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - }, - { - "ref": "my_one_hotter", - "in": "y_pred_ids", - "out": "y_pred_onehot" - } - ], - "out": ["y_pred_probas", "y_pred_labels"] - }, - "train": { - "metrics": [ - { - "name": "sets_accuracy", - "inputs": ["y", "y_pred_labels"] - }, - { - "name": "roc_auc", - "inputs": ["y_onehot", "y_pred_probas"] - } - ], - "show_examples": false, - "batch_size": 32, - "pytest_max_batches": 2, - "validation_patience": 10, - "val_every_n_batches": 100, - "log_every_n_batches": 100, - "validate_best": true, - "test_best": true, - "tensorboard_log_dir": "{MODEL_PATH}/logs" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/intents_dstc2_bert_v0" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/datasets/dstc2_v2.tar.gz", - "subdir": "{DOWNLOADS_PATH}/dstc2" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/intents_dstc2_bert_v0.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - - ] - } -} diff --git a/deeppavlov/configs/classifiers/intents_dstc2_big.json b/deeppavlov/configs/classifiers/intents_dstc2_big.json deleted file mode 100644 index d6a458dcab..0000000000 --- a/deeppavlov/configs/classifiers/intents_dstc2_big.json +++ /dev/null @@ -1,155 +0,0 @@ -{ - "dataset_reader": { - "class_name": "dstc2_reader", - "data_path": "{DOWNLOADS_PATH}/dstc2" - }, - "dataset_iterator": { - "class_name": "dstc2_intents_iterator", - "seed": 42 - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": 
[ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids", - "special_tokens": [""] - }, - { - "in": "x", - "out": "x_tok", - "id": "my_tokenizer", - "class_name": "nltk_tokenizer", - "tokenizer": "wordpunct_tokenize" - }, - { - "in": "x_tok", - "out": "x_emb", - "id": "my_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/wiki.en.bin", - "pad_zero": true - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "id": "my_one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "in": [ - "x_emb" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ], - "main": true, - "class_name": "keras_classification_model", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "embedding_size": "#my_embedder.dim", - "classes": "#classes_vocab.keys()", - "n_classes": "#classes_vocab.len", - "kernel_sizes_cnn": [ - 3, - 5, - 7 - ], - "filters_cnn": 512, - "optimizer": "Adam", - "learning_rate": 0.1, - "learning_rate_decay": 0.1, - "loss": "binary_crossentropy", - "coef_reg_cnn": 1e-4, - "coef_reg_den": 1e-4, - "dropout_rate": 0.5, - "dense_size": 100, - "model_name": "cnn_model" - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "confidence_threshold": 0.5 - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - }, - { - "ref": "my_one_hotter", - "in": "y_pred_ids", - "out": "y_pred_onehot" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "epochs": 1000, - "batch_size": 64, - "metrics": [ - { - "name": "sets_accuracy", - "inputs": [ - "y", - "y_pred_labels" - ] - }, - { - "name": "roc_auc", - "inputs": [ - "y_onehot", - "y_pred_probas" - ] - } - ], - "validation_patience": 5, - "val_every_n_epochs": 5, - "log_every_n_batches": 100, - "show_examples": false, - "evaluation_targets": [ - "train", - "valid", - "test" - ], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/intents_dstc2_v11" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/embeddings/wiki.en.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/intents_dstc2_v11.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/intents_sample_csv.json b/deeppavlov/configs/classifiers/intents_sample_csv.json deleted file mode 100644 index 4b01a2d301..0000000000 --- a/deeppavlov/configs/classifiers/intents_sample_csv.json +++ /dev/null @@ -1,160 +0,0 @@ -{ - "dataset": { - "type": "classification", - "format": "csv", - "sep": ",", - "header": 0, - "names": [ - "text", - "classes" - ], - "class_sep": ",", - "train": "sample.csv", - "data_path": "{DOWNLOADS_PATH}/sample", - "x": "text", - "y": "classes", - "url": "http://files.deeppavlov.ai/datasets/snips_intents/train.csv", - "seed": 42, - "field_to_split": "train", - "split_fields": [ - "train", - "valid" - ], - "split_proportions": [ - 0.9, - 0.1 - ] - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - 
}, - { - "in": "x", - "out": "x_tok", - "id": "my_tokenizer", - "class_name": "nltk_tokenizer", - "tokenizer": "wordpunct_tokenize" - }, - { - "in": "x_tok", - "out": "x_emb", - "id": "my_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/dstc2_fastText_model.bin", - "pad_zero": true - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "in": [ - "x_emb" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ], - "main": true, - "class_name": "keras_classification_model", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "embedding_size": "#my_embedder.dim", - "n_classes": "#classes_vocab.len", - "kernel_sizes_cnn": [ - 1, - 2, - 3 - ], - "filters_cnn": 256, - "optimizer": "Adam", - "learning_rate": 0.01, - "learning_rate_decay": 0.1, - "loss": "binary_crossentropy", - "coef_reg_cnn": 1e-4, - "coef_reg_den": 1e-4, - "dropout_rate": 0.5, - "dense_size": 100, - "model_name": "cnn_model" - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "epochs": 100, - "batch_size": 64, - "metrics": [ - "accuracy", - "f1_macro", - { - "name": "roc_auc", - "inputs": ["y_onehot", "y_pred_probas"] - } - ], - "validation_patience": 5, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "evaluation_targets": [ - "train", - "valid" - ], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/intents_snips_v9" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/snips_intents/train.csv", - "subdir": "{DOWNLOADS_PATH}/sample" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/embeddings/dstc2_fastText_model.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/intents_snips_v9.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/intents_sample_json.json b/deeppavlov/configs/classifiers/intents_sample_json.json deleted file mode 100644 index b87d3274be..0000000000 --- a/deeppavlov/configs/classifiers/intents_sample_json.json +++ /dev/null @@ -1,155 +0,0 @@ -{ - "dataset": { - "type": "classification", - "format": "json", - "orient": "records", - "lines": true, - "data_path": "{DOWNLOADS_PATH}/sample", - "train": "sample.json", - "x": "text", - "y": "intents", - "url": "http://files.deeppavlov.ai/datasets/snips_intents/train.json", - "seed": 42, - "field_to_split": "train", - "split_fields": [ - "train", - "valid" - ], - "split_proportions": [ - 0.9, - 0.1 - ] - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": "x", - "out": "x_tok", - "id": "my_tokenizer", - "class_name": "nltk_tokenizer", - "tokenizer": "wordpunct_tokenize" - }, - { - "in": "x_tok", - "out": "x_emb", - "id": "my_embedder", - "class_name": "fasttext", - "load_path": 
"{DOWNLOADS_PATH}/embeddings/dstc2_fastText_model.bin", - "pad_zero": true - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "in": [ - "x_emb" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ], - "main": true, - "class_name": "keras_classification_model", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "embedding_size": "#my_embedder.dim", - "n_classes": "#classes_vocab.len", - "kernel_sizes_cnn": [ - 1, - 2, - 3 - ], - "filters_cnn": 256, - "optimizer": "Adam", - "learning_rate": 0.01, - "learning_rate_decay": 0.1, - "loss": "binary_crossentropy", - "coef_reg_cnn": 1e-4, - "coef_reg_den": 1e-4, - "dropout_rate": 0.5, - "dense_size": 100, - "model_name": "cnn_model" - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "epochs": 100, - "batch_size": 64, - "metrics": [ - "accuracy", - "f1_macro", - { - "name": "roc_auc", - "inputs": ["y_onehot", "y_pred_probas"] - } - ], - "validation_patience": 5, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "evaluation_targets": [ - "train", - "valid" - ], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/intents_snips_v9" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/snips_intents/train.json", - "subdir": "{DOWNLOADS_PATH}/sample" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/embeddings/dstc2_fastText_model.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/intents_snips_v9.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/intents_snips.json b/deeppavlov/configs/classifiers/intents_snips.json deleted file mode 100644 index 5f0aa89cd0..0000000000 --- a/deeppavlov/configs/classifiers/intents_snips.json +++ /dev/null @@ -1,141 +0,0 @@ -{ - "dataset_reader": { - "class_name": "snips_reader", - "x": "text", - "y": "intents", - "data_path": "{DOWNLOADS_PATH}/snips" - }, - "dataset_iterator": { - "class_name": "snips_intents_iterator", - "seed": 42 - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "level": "token", - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": "x", - "out": "x_tok", - "id": "my_tokenizer", - "class_name": "nltk_tokenizer", - "tokenizer": "wordpunct_tokenize" - }, - { - "in": "x_tok", - "out": "x_emb", - "id": "my_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/dstc2_fastText_model.bin", - "pad_zero": true - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "in": [ - "x_emb" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ], - "main": true, - "class_name": "keras_classification_model", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "embedding_size": 
"#my_embedder.dim", - "n_classes": "#classes_vocab.len", - "kernel_sizes_cnn": [ - 1, - 2, - 3 - ], - "filters_cnn": 256, - "optimizer": "Adam", - "learning_rate": 0.01, - "learning_rate_decay": 0.1, - "loss": "binary_crossentropy", - "coef_reg_cnn": 1e-4, - "coef_reg_den": 1e-4, - "dropout_rate": 0.5, - "dense_size": 100, - "model_name": "cnn_model" - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "epochs": 1000, - "batch_size": 64, - "metrics": [ - "accuracy", - "f1_macro", - { - "name": "roc_auc", - "inputs": ["y_onehot", "y_pred_probas"] - } - ], - "validation_patience": 5, - "val_every_n_epochs": 5, - "log_every_n_epochs": 5, - "show_examples": false, - "evaluation_targets": [ - "train", - "valid" - ], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/intents_snips_v9" - }, - "download": [ -{ - "url": "http://files.deeppavlov.ai/deeppavlov_data/embeddings/dstc2_fastText_model.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/intents_snips_v9.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/intents_snips_big.json b/deeppavlov/configs/classifiers/intents_snips_big.json deleted file mode 100644 index 15b5adc648..0000000000 --- a/deeppavlov/configs/classifiers/intents_snips_big.json +++ /dev/null @@ -1,141 +0,0 @@ -{ - "dataset_reader": { - "class_name": "snips_reader", - "x": "text", - "y": "intents", - "data_path": "{DOWNLOADS_PATH}/snips" - }, - "dataset_iterator": { - "class_name": "snips_intents_iterator", - "seed": 42 - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "level": "token", - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": "x", - "out": "x_tok", - "id": "my_tokenizer", - "class_name": "nltk_tokenizer", - "tokenizer": "wordpunct_tokenize" - }, - { - "in": "x_tok", - "out": "x_emb", - "id": "my_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/wiki.en.bin", - "pad_zero": true - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "in": [ - "x_emb" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ], - "main": true, - "class_name": "keras_classification_model", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "embedding_size": "#my_embedder.dim", - "n_classes": "#classes_vocab.len", - "kernel_sizes_cnn": [ - 3, - 5, - 7 - ], - "filters_cnn": 256, - "optimizer": "Adam", - "learning_rate": 0.01, - "learning_rate_decay": 0.1, - "loss": "categorical_crossentropy", - "coef_reg_cnn": 1e-4, - "coef_reg_den": 1e-4, - "dropout_rate": 0.5, - "dense_size": 100, - "model_name": "cnn_model" - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - } - ], - "out": [ - "y_pred_labels" - ] - }, - 
"train": { - "epochs": 1000, - "batch_size": 64, - "metrics": [ - "accuracy", - "f1_macro", - { - "name": "roc_auc", - "inputs": ["y_onehot", "y_pred_probas"] - } - ], - "validation_patience": 5, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "evaluation_targets": [ - "train", - "valid" - ], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/intents_snips_v10" - }, - "download": [ -{ - "url": "http://files.deeppavlov.ai/deeppavlov_data/embeddings/wiki.en.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/intents_snips_v10.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/intents_snips_sklearn.json b/deeppavlov/configs/classifiers/intents_snips_sklearn.json deleted file mode 100644 index 7847aa3e15..0000000000 --- a/deeppavlov/configs/classifiers/intents_snips_sklearn.json +++ /dev/null @@ -1,164 +0,0 @@ -{ - "dataset_reader": { - "class_name": "snips_reader", - "x": "text", - "y": "intents", - "data_path": "{DOWNLOADS_PATH}/snips" - }, - "dataset_iterator": { - "class_name": "snips_intents_iterator", - "seed": 42 - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": [ - "x" - ], - "out": [ - "x_vec" - ], - "fit_on": [ - "x", - "y_ids" - ], - "id": "tfidf_vec", - "class_name": "sklearn_component", - "save_path": "{MODEL_PATH}/tfidf.pkl", - "load_path": "{MODEL_PATH}/tfidf.pkl", - "model_class": "sklearn.feature_extraction.text:TfidfVectorizer", - "infer_method": "transform", - "lowercase": true, - "analyzer": "word", - "ngram_range": [ - 1, - 5 - ], - "max_features": 10000, - "norm": null - }, - { - "in": [ - "x_vec" - ], - "out": [ - "x_sel" - ], - "fit_on": [ - "x_vec", - "y_ids" - ], - "id": "selector", - "class_name": "sklearn_component", - "save_path": "{MODEL_PATH}/selectkbest.pkl", - "load_path": "{MODEL_PATH}/selectkbest.pkl", - "model_class": "sklearn.feature_selection:SelectKBest", - "infer_method": "transform", - "score_func": "sklearn.feature_selection:chi2", - "k": 1000 - }, - { - "in": [ - "x_sel" - ], - "out": [ - "x_pca" - ], - "fit_on": [ - "x_sel" - ], - "id": "pca", - "class_name": "sklearn_component", - "save_path": "{MODEL_PATH}/pca.pkl", - "load_path": "{MODEL_PATH}/pca.pkl", - "model_class": "sklearn.decomposition:PCA", - "infer_method": "transform", - "n_components": 300 - }, - { - "class_name": "one_hotter", - "id": "onehotter", - "depth": "#classes_vocab.len", - "in": "y_ids", - "out": "y_onehot", - "single_vector": true - }, - { - "in": [ - "x_pca" - ], - "out": [ - "y_pred_onehot" - ], - "fit_on": [ - "x_pca", - "y_onehot" - ], - "class_name": "sklearn_component", - "main": true, - "save_path": "{MODEL_PATH}/model.pkl", - "load_path": "{MODEL_PATH}/model.pkl", - "model_class": "sklearn.neighbors:KNeighborsClassifier", - "infer_method": "predict", - "ensure_list_output": true - }, - { - "class_name": "proba2labels", - "in": "y_pred_onehot", - "out": "y_pred_ids", - "max_proba": true - }, - { - "ref": "classes_vocab", - "in": "y_pred_ids", - "out": "y_pred_labels" - } - ], - "out": [ - 
"y_pred_labels" - ] - }, - "train": { - "batch_size": 64, - "metrics": [ - "accuracy" - ], - "show_examples": false, - "evaluation_targets": [ - "train", - "valid" - ], - "class_name": "fit_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/intents_snips_sklearn_v11" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/intents_snips_sklearn_v11.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/intents_snips_tfidf_weighted.json b/deeppavlov/configs/classifiers/intents_snips_tfidf_weighted.json deleted file mode 100644 index b7f4e70712..0000000000 --- a/deeppavlov/configs/classifiers/intents_snips_tfidf_weighted.json +++ /dev/null @@ -1,182 +0,0 @@ -{ - "dataset_reader": { - "class_name": "snips_reader", - "x": "text", - "y": "intents", - "data_path": "{DOWNLOADS_PATH}/snips" - }, - "dataset_iterator": { - "class_name": "snips_intents_iterator", - "seed": 42 - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": [ - "x" - ], - "out": [ - "x_vec" - ], - "fit_on": [ - "x", - "y_ids" - ], - "id": "my_tfidf_vectorizer", - "class_name": "sklearn_component", - "save_path": "{MODEL_PATH}/tfidf.pkl", - "load_path": "{MODEL_PATH}/tfidf.pkl", - "model_class": "sklearn.feature_extraction.text:TfidfVectorizer", - "infer_method": "transform", - "lowercase": true, - "analyzer": "word" - }, - { - "in": [ - "x_vec" - ], - "out": [ - "x_sel" - ], - "fit_on": [ - "x_vec", - "y_ids" - ], - "id": "my_selector", - "class_name": "sklearn_component", - "save_path": "{MODEL_PATH}/selectkbest.pkl", - "load_path": "{MODEL_PATH}/selectkbest.pkl", - "model_class": "sklearn.feature_selection:SelectKBest", - "infer_method": "transform", - "score_func": "sklearn.feature_selection:chi2", - "k": 1000 - }, - { - "in": [ - "x_sel" - ], - "out": [ - "x_pca" - ], - "fit_on": [ - "x_sel" - ], - "id": "my_pca", - "class_name": "sklearn_component", - "save_path": "{MODEL_PATH}/pca.pkl", - "load_path": "{MODEL_PATH}/pca.pkl", - "model_class": "sklearn.decomposition:PCA", - "infer_method": "transform", - "n_components": 300 - }, - { - "in": "x", - "out": "x_tok", - "id": "my_tokenizer", - "class_name": "nltk_moses_tokenizer" - }, - { - "in": "x_tok", - "out": "x_emb", - "id": "my_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/wiki.en.bin", - "dim": 300 - }, - { - "class_name": "one_hotter", - "id": "my_onehotter", - "depth": "#classes_vocab.len", - "in": "y_ids", - "out": "y_onehot", - "single_vector": true - }, - { - "in": "x_tok", - "out": "x_weighted_emb", - "class_name": "tfidf_weighted", - "id": "my_weighted_embedder", - "embedder": "#my_embedder", - "tokenizer": "#my_tokenizer", - "vectorizer": "#my_tfidf_vectorizer", - "mean": true - }, - { - "in": [ - "x_pca", - "x_weighted_emb" - ], - "out": [ - "y_pred_ids" - ], - "fit_on": [ - "x_pca", - "x_weighted_emb", - "y_ids" - ], - "class_name": "sklearn_component", - "main": true, - "save_path": "{MODEL_PATH}/model.pkl", - "load_path": "{MODEL_PATH}/model.pkl", - "model_class": "sklearn.linear_model:LogisticRegression", - "infer_method": "predict", - 
"ensure_list_output": true - }, - { - "ref": "classes_vocab", - "in": "y_pred_ids", - "out": "y_pred_labels" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "batch_size": 64, - "metrics": [ - "accuracy" - ], - "show_examples": false, - "evaluation_targets": [ - "train", - "valid" - ], - "class_name": "fit_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/intents_snips_sklearn_v12" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/intents_snips_sklearn_v12.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/embeddings/wiki.en.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/paraphraser_bert.json b/deeppavlov/configs/classifiers/paraphraser_bert.json deleted file mode 100644 index 81da5ccb9d..0000000000 --- a/deeppavlov/configs/classifiers/paraphraser_bert.json +++ /dev/null @@ -1,104 +0,0 @@ -{ - "dataset_reader": { - "class_name": "paraphraser_reader", - "data_path": "{DOWNLOADS_PATH}/paraphraser_data", - "do_lower_case": false - }, - "dataset_iterator": { - "class_name": "siamese_iterator", - "seed": 243, - "len_valid": 500 - }, - "chainer": { - "in": [ - "text_a", - "text_b" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": false, - "max_seq_length": 64, - "in": [ - "text_a", - "text_b" - ], - "out": [ - "bert_features" - ] - }, - { - "class_name": "bert_classifier", - "n_classes": 2, - "one_hot_labels": false, - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/bert_model.ckpt", - "save_path": "{MODEL_PATH}/model_multi", - "load_path": "{MODEL_PATH}/model_multi", - "keep_prob": 0.5, - "learning_rate": 2e-05, - "learning_rate_drop_patience": 2, - "learning_rate_drop_div": 2.0, - "in": [ - "bert_features" - ], - "in_y": [ - "y" - ], - "out": [ - "predictions" - ] - } - ], - "out": [ - "predictions" - ] - }, - "train": { - "batch_size": 32, - "pytest_max_batches": 2, - "metrics": [ - "f1", - "acc" - ], - "validation_patience": 10, - "val_every_n_batches": 100, - "log_every_n_batches": 100, - "evaluation_targets": [ - "train", - "valid", - "test" - ], - "tensorboard_log_dir": "{MODEL_PATH}/" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/paraphraser_bert_v0" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/paraphraser.zip", - "subdir": "{DOWNLOADS_PATH}/paraphraser_data" - }, - { - "url": "http://files.deeppavlov.ai/datasets/paraphraser_gold.zip", - "subdir": "{DOWNLOADS_PATH}/paraphraser_data" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/multi_cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/paraphraser_bert_v0.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/classifiers/paraphraser_rubert.json b/deeppavlov/configs/classifiers/paraphraser_rubert.json index 
0d8f8adff8..bdc03382cd 100644 --- a/deeppavlov/configs/classifiers/paraphraser_rubert.json +++ b/deeppavlov/configs/classifiers/paraphraser_rubert.json @@ -14,24 +14,21 @@ "in_y": ["y"], "pipe": [ { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/rubert_cased_L-12_H-768_A-12_v1/vocab.txt", + "class_name": "torch_transformers_preprocessor", + "vocab_file": "{TRANSFORMER}", "do_lower_case": false, "max_seq_length": 64, "in": ["text_a", "text_b"], "out": ["bert_features"] }, { - "class_name": "bert_classifier", + "class_name": "torch_transformers_classifier", "n_classes": 2, - "one_hot_labels": false, - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/rubert_cased_L-12_H-768_A-12_v1/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/rubert_cased_L-12_H-768_A-12_v1/bert_model.ckpt", - "save_path": "{MODELS_PATH}/paraphraser_rubert/model_rubert", - "load_path": "{MODELS_PATH}/paraphraser_rubert/model_rubert", - "keep_prob": 0.5, - "optimizer": "tf.train:AdamOptimizer", - "learning_rate": 2e-05, + "pretrained_bert": "{TRANSFORMER}", + "save_path": "{MODEL_PATH}/model", + "load_path": "{MODEL_PATH}/model", + "optimizer": "AdamW", + "optimizer_parameters": {"lr": 2e-05}, "learning_rate_drop_patience": 3, "learning_rate_drop_div": 2.0, "in": ["bert_features"], @@ -49,15 +46,16 @@ "validation_patience": 7, "val_every_n_batches": 50, "log_every_n_batches": 50, - "validate_best": true, - "test_best": true, - "tensorboard_log_dir": "{MODELS_PATH}/paraphraser_rubert/logs" + "evaluation_targets": ["valid", "test"], + "class_name": "torch_trainer" }, "metadata": { "variables": { "ROOT_PATH": "~/.deeppavlov", "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" + "MODELS_PATH": "{ROOT_PATH}/models", + "MODEL_PATH": "{MODELS_PATH}/classifiers/paraphraser_rubert_torch", + "TRANSFORMER": "DeepPavlov/rubert-base-cased" }, "download": [ { @@ -69,12 +67,8 @@ "subdir": "{DOWNLOADS_PATH}/paraphraser_data" }, { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/rubert_cased_L-12_H-768_A-12_v1.tar.gz", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/paraphraser_rubert_v0.tar.gz", - "subdir": "{ROOT_PATH}/models" + "url": "http://files.deeppavlov.ai/v1/classifiers/paraphraser_rubert/paraphraser_rubert_v1.tar.gz", + "subdir": "{MODEL_PATH}" } ] } diff --git a/deeppavlov/configs/classifiers/query_pr.json b/deeppavlov/configs/classifiers/query_pr.json index f3fcee2e22..5f070b7141 100644 --- a/deeppavlov/configs/classifiers/query_pr.json +++ b/deeppavlov/configs/classifiers/query_pr.json @@ -1,9 +1,7 @@ { "dataset_reader": { - "class_name": "basic_classification_reader", - "x": "Question", - "y": "Class", - "data_path": "{DOWNLOADS_PATH}/query_prediction" + "class_name": "sq_reader", + "data_path": "{DOWNLOADS_PATH}/query_prediction/query_prediction_eng.pickle" }, "dataset_iterator": { "class_name": "basic_classification_iterator", @@ -14,8 +12,8 @@ "in_y": ["y"], "pipe": [ { - "class_name": "bert_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", + "class_name": "torch_transformers_preprocessor", + "vocab_file": "{TRANSFORMER}", "do_lower_case": false, "max_seq_length": 64, "in": ["x"], @@ -27,42 +25,40 @@ "fit_on": ["y"], "save_path": "{MODEL_PATH}/classes.dict", "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" + "in": ["y"], + "out": ["y_ids"] }, { - "in": "y_ids", - "out": "y_onehot", + "in": ["y_ids"], + "out": ["y_onehot"], 
"class_name": "one_hotter", "depth": "#classes_vocab.len", "single_vector": true }, { - "class_name": "bert_classifier", + "class_name": "torch_transformers_classifier", "n_classes": "#classes_vocab.len", "return_probas": true, - "one_hot_labels": true, - "bert_config_file": "{BERT_PATH}/bert_config.json", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", + "pretrained_bert": "{TRANSFORMER}", "save_path": "{MODEL_PATH}/model", "load_path": "{MODEL_PATH}/model", - "keep_prob": 0.5, - "learning_rate": 1e-05, + "optimizer": "AdamW", + "optimizer_parameters": {"lr": 1e-05}, "learning_rate_drop_patience": 5, "learning_rate_drop_div": 2.0, "in": ["bert_features"], - "in_y": ["y_onehot"], + "in_y": ["y_ids"], "out": ["y_pred_probas"] }, { - "in": "y_pred_probas", - "out": "y_pred_ids", + "in": ["y_pred_probas"], + "out": ["y_pred_ids"], "class_name": "proba2labels", "max_proba": true }, { - "in": "y_pred_ids", - "out": "y_pred_labels", + "in": ["y_pred_ids"], + "out": ["y_pred_labels"], "ref": "classes_vocab" } ], @@ -72,45 +68,36 @@ "epochs": 100, "batch_size": 64, "metrics": [ - "sets_accuracy", "f1_macro", + "accuracy", { "name": "roc_auc", "inputs": ["y_onehot", "y_pred_probas"] } ], - "validation_patience": 5, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, + "validation_patience": 10, + "val_every_n_batches": 100, + "log_every_n_batches": 100, "show_examples": false, "evaluation_targets": ["train", "valid", "test"], - "class_name": "nn_trainer", - "tensorboard_log_dir": "{MODEL_PATH}/" + "class_name": "torch_trainer" }, "metadata": { "variables": { + "TRANSFORMER": "haisongzhang/roberta-tiny-cased", "ROOT_PATH": "~/.deeppavlov", "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models_kbqa/cased_L-12_H-768_A-12", - "MODEL_PATH": "{MODELS_PATH}/classifiers/query_prediction" - }, - "labels": { - "telegram_utils": "IntentModel", - "server_utils": "KerasIntentModel" + "MODEL_PATH": "{MODELS_PATH}/classifiers/query_prediction_eng" }, "download": [ { - "url": "http://files.deeppavlov.ai/kbqa/datasets/query_prediction.tar.gz", - "subdir": "{DOWNLOADS_PATH}/query_prediction" + "url": "http://files.deeppavlov.ai/kbqa/wikidata/query_prediction_eng.tar.gz", + "subdir": "{MODELS_PATH}/classifiers/query_prediction_eng" }, { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models_kbqa" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/query_prediction.tar.gz", - "subdir": "{MODELS_PATH}/classifiers/query_prediction" + "url": "http://files.deeppavlov.ai/kbqa/wikidata/query_prediction_eng.pickle", + "subdir": "{DOWNLOADS_PATH}/query_prediction" } ] } diff --git a/deeppavlov/configs/classifiers/rel_ranking_bert.json b/deeppavlov/configs/classifiers/rel_ranking_bert.json deleted file mode 100644 index 0ac3b504ce..0000000000 --- a/deeppavlov/configs/classifiers/rel_ranking_bert.json +++ /dev/null @@ -1,77 +0,0 @@ -{ - "dataset_reader": { - "class_name": "paraphraser_reader", - "data_path": "{DOWNLOADS_PATH}/rel_ranking_bert", - "do_lower_case": false - }, - "dataset_iterator": { - "class_name": "siamese_iterator", - "seed": 243, - "len_valid": 500 - }, - "chainer": { - "in": ["text_a", "text_b"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "bert_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 64, - "in": ["text_a", "text_b"], - "out": ["bert_features"] - }, - { - "class_name": "bert_classifier", - 
"n_classes": 2, - "return_probas": true, - "one_hot_labels": false, - "bert_config_file": "{BERT_PATH}/bert_config.json", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "keep_prob": 0.5, - "learning_rate": 2e-05, - "learning_rate_drop_patience": 2, - "learning_rate_drop_div": 2.0, - "in": ["bert_features"], - "in_y": ["y"], - "out": ["predictions"] - } - ], - "out": ["predictions"] - }, - "train": { - "batch_size": 32, - "pytest_max_batches": 2, - "metrics": ["f1", "acc"], - "validation_patience": 10, - "val_every_n_batches": 100, - "log_every_n_batches": 100, - "evaluation_targets": ["train", "valid", "test"], - "tensorboard_log_dir": "{MODEL_PATH}/" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models_kbqa/cased_L-12_H-768_A-12", - "MODEL_PATH": "{MODELS_PATH}/rel_ranking_bert" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/kbqa/datasets/rel_ranking_bert.tar.gz", - "subdir": "{DOWNLOADS_PATH}/rel_ranking_bert" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models_kbqa" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/rel_ranking_bert.tar.gz", - "subdir": "{MODELS_PATH}/rel_ranking_bert" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/rel_ranking_bert_rus.json b/deeppavlov/configs/classifiers/rel_ranking_bert_rus.json deleted file mode 100644 index f3bcfd7ccb..0000000000 --- a/deeppavlov/configs/classifiers/rel_ranking_bert_rus.json +++ /dev/null @@ -1,76 +0,0 @@ -{ - "dataset_reader": { - "class_name": "paraphraser_reader", - "data_path": "{DOWNLOADS_PATH}/rel_ranking_rus", - "do_lower_case": false - }, - "dataset_iterator": { - "class_name": "siamese_iterator", - "seed": 243, - "len_valid": 500 - }, - "chainer": { - "in": ["text_a", "text_b"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": false, - "max_seq_length": 64, - "in": ["text_a", "text_b"], - "out": ["bert_features"] - }, - { - "class_name": "bert_classifier", - "n_classes": 2, - "return_probas": true, - "one_hot_labels": false, - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/bert_model.ckpt", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "keep_prob": 0.5, - "learning_rate": 2e-05, - "learning_rate_drop_patience": 2, - "learning_rate_drop_div": 2.0, - "in": ["bert_features"], - "in_y": ["y"], - "out": ["predictions"] - } - ], - "out": ["predictions"] - }, - "train": { - "batch_size": 32, - "pytest_max_batches": 2, - "metrics": ["f1", "acc"], - "validation_patience": 10, - "val_every_n_batches": 100, - "log_every_n_batches": 100, - "evaluation_targets": ["train", "valid", "test"], - "tensorboard_log_dir": "{MODEL_PATH}/" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/rel_ranking_bert_rus" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/multi_cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": 
"http://files.deeppavlov.ai/kbqa/datasets/rel_ranking_bert_rus.tar.gz", - "subdir": "{DOWNLOADS_PATH}/rel_ranking_rus" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/rel_ranking_bert_rus.tar.gz", - "subdir": "{MODELS_PATH}/rel_ranking_bert_rus" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/relation_prediction_rus.json b/deeppavlov/configs/classifiers/relation_prediction_rus.json deleted file mode 100644 index 24f16cc159..0000000000 --- a/deeppavlov/configs/classifiers/relation_prediction_rus.json +++ /dev/null @@ -1,132 +0,0 @@ -{ - "dataset_reader": { - "class_name": "basic_classification_reader", - "x": "Question", - "y": "Class", - "data_path": "{DOWNLOADS_PATH}/relation_prediction" - }, - "dataset_iterator": { - "class_name": "basic_classification_iterator", - "seed": 42 - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": ["y"], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": "x", - "out": "x_tok", - "id": "my_tokenizer", - "class_name": "nltk_tokenizer", - "tokenizer": "wordpunct_tokenize" - }, - { - "in": ["x_tok"], - "class_name": "str_lower", - "out": ["x_lower"] - }, - { - "in": "x_lower", - "out": "x_emb", - "id": "my_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/ft_native_300_ru_wiki_lenta_nltk_word_tokenize.bin" - }, - { - "in": "y_ids", - "out": ["y_onehot"], - "class_name": "one_hotter", - "depth": "#classes_vocab.len" - }, - { - "in": ["x_emb"], - "in_y": ["y_onehot"], - "out": ["y_pred_probas"], - "main": true, - "class_name": "keras_classification_model", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "embedding_size": "#my_embedder.dim", - "n_classes": "#classes_vocab.len", - "kernel_sizes_cnn": [1, 2, 3], - "filters_cnn": 256, - "optimizer": "Adam", - "learning_rate": 0.05, - "learning_rate_decay": 0.1, - "loss": "categorical_crossentropy", - "last_layer_activation": "softmax", - "text_size": 36, - "coef_reg_cnn": 1e-3, - "coef_reg_den": 1e-2, - "dropout_rate": 0.5, - "dense_size": 300, - "model_name": "cnn_model" - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "top_n": 5 - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - } - ], - "out": ["y_pred_probas", "y_pred_labels"] - }, - "train": { - "epochs": 1000, - "batch_size": 64, - "metrics": [ - "sets_accuracy", - "f1_macro", - { - "name": "roc_auc", - "inputs": ["y_onehot", "y_pred_probas"] - } - ], - "validation_patience": 5, - "val_every_n_epochs": 5, - "log_every_n_epochs": 5, - "show_examples": false, - "evaluation_targets": ["train", "valid", "test"], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/kbqa_mix_lowercase/relation_prediction" - }, - "labels": { - "telegram_utils": "IntentModel", - "server_utils": "KerasIntentModel" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/relation_prediction_rus.tar.gz", - "subdir": "{DOWNLOADS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/ft_native_300_ru_wiki_lenta_nltk_word_tokenize/ft_native_300_ru_wiki_lenta_nltk_word_tokenize.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": 
"http://files.deeppavlov.ai/deeppavlov_data/relation_prediction_rus.tar.gz", - "subdir": "{MODELS_PATH}/kbqa_mix_lowercase/relation_prediction" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/ru_obscenity_classifier.json b/deeppavlov/configs/classifiers/ru_obscenity_classifier.json deleted file mode 100644 index 2344a71b87..0000000000 --- a/deeppavlov/configs/classifiers/ru_obscenity_classifier.json +++ /dev/null @@ -1,30 +0,0 @@ -{ - "chainer": { - "in": [ - "text" - ], - "pipe": [ - { - "class_name": "ru_obscenity_classifier", - "data_path": "{DOWNLOADS_PATH}/obscenity_dataset/", - "in": "text", - "out": "flags_obscenity_or_not" - } - ], - "out": [ - "flags_obscenity_or_not" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/models/obscenity_classifier/ru_obscenity_dataset.zip", - "subdir": "{DOWNLOADS_PATH}/obscenity_dataset" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/classifiers/rusentiment_bert.json b/deeppavlov/configs/classifiers/rusentiment_bert.json index 9e29925500..f0d97a16c5 100644 --- a/deeppavlov/configs/classifiers/rusentiment_bert.json +++ b/deeppavlov/configs/classifiers/rusentiment_bert.json @@ -30,8 +30,8 @@ ], "pipe": [ { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/vocab.txt", + "class_name": "torch_transformers_preprocessor", + "vocab_file": "{TRANSFORMER}", "do_lower_case": false, "max_seq_length": 64, "in": [ @@ -60,16 +60,14 @@ "single_vector": true }, { - "class_name": "bert_classifier", + "class_name": "torch_transformers_classifier", "n_classes": "#classes_vocab.len", "return_probas": true, "one_hot_labels": true, - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/bert_model.ckpt", + "pretrained_bert": "{TRANSFORMER}", "save_path": "{MODEL_PATH}/model", "load_path": "{MODEL_PATH}/model", - "keep_prob": 0.5, - "learning_rate": 1e-05, + "optimizer_parameters": {"lr": 1e-05}, "learning_rate_drop_patience": 5, "learning_rate_drop_div": 2.0, "in": [ @@ -123,23 +121,20 @@ "valid", "test" ], - "tensorboard_log_dir": "{MODEL_PATH}/" + "class_name": "torch_trainer" }, "metadata": { "variables": { "ROOT_PATH": "~/.deeppavlov", "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/rusentiment_bert_v0/" + "MODEL_PATH": "{MODELS_PATH}/classifiers/rusentiment_bert_torch", + "TRANSFORMER": "bert-base-multilingual-cased" }, "download": [ { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/multi_cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/rusentiment_bert_v0.tar.gz", - "subdir": "{MODELS_PATH}/classifiers/" + "url": "http://files.deeppavlov.ai/v1/classifiers/rusentiment_bert/rusentiment_bert_torch.tar.gz", + "subdir": "{MODEL_PATH}" } ] } diff --git a/deeppavlov/configs/classifiers/rusentiment_bigru_superconv.json b/deeppavlov/configs/classifiers/rusentiment_bigru_superconv.json deleted file mode 100644 index ceff4b647a..0000000000 --- a/deeppavlov/configs/classifiers/rusentiment_bigru_superconv.json +++ /dev/null @@ -1,165 +0,0 @@ -{ - "dataset_reader": { - "class_name": "basic_classification_reader", - "x": "text", - "y": "label", - 
"data_path": "{DOWNLOADS_PATH}/rusentiment/", - "train": "rusentiment_random_posts.csv", - "test": "rusentiment_test.csv" - }, - "dataset_iterator": { - "class_name": "basic_classification_iterator", - "seed": 42, - "field_to_split": "train", - "split_seed": 23, - "split_fields": [ - "train", - "valid" - ], - "split_proportions": [ - 0.9, - 0.1 - ] - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": [ - "x" - ], - "out": [ - "x_prep" - ], - "class_name": "dirty_comments_preprocessor", - "remove_punctuation": false - }, - { - "in": "x_prep", - "out": "x_tok", - "id": "my_tokenizer", - "class_name": "nltk_tokenizer", - "tokenizer": "wordpunct_tokenize" - }, - { - "in": "x_tok", - "out": "x_emb", - "id": "my_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/ft_native_300_ru_twitter_nltk_word_tokenize.bin", - "dim": 300, - "pad_zero": true - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "in": [ - "x_emb" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ], - "main": true, - "class_name": "keras_classification_model", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "embedding_size": "#my_embedder.dim", - "n_classes": "#classes_vocab.len", - "units_gru": 256, - "optimizer": "Adam", - "learning_rate": 0.001, - "learning_rate_decay": "trapezoid", - "learning_rate_decay_batches": 10000, - "fit_batch_size": 64, - "fit_on": ["x_emb", "y_onehot"], - "momentum": [0.95, 0.55], - "momentum_decay": "trapezoid", - "momentum_decay_batches": 10000, - "loss": "categorical_crossentropy", - "last_layer_activation": "softmax", - "coef_reg_gru": 1e-6, - "coef_reg_den": 1e-6, - "dropout_rate": 0.2, - "rec_dropout_rate": 0.2, - "dense_size": 100, - "model_name": "bigru_with_max_aver_pool_model" - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "epochs": 100, - "batch_size": 64, - "metrics": [ - "f1_weighted", - "f1_macro", - "accuracy", - { - "name": "roc_auc", - "inputs": ["y_onehot", "y_pred_probas"] - } - ], - "validation_patience": 5, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "validate_best": true, - "test_best": true, - "tensorboard_log_dir": "{MODEL_PATH}/logs" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/rusentiment_v14" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/embeddings/ft_native_300_ru_twitter_nltk_word_tokenize.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/rusentiment_v14.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/rusentiment_cnn.json b/deeppavlov/configs/classifiers/rusentiment_cnn.json deleted file mode 100644 index 0706d803e7..0000000000 --- a/deeppavlov/configs/classifiers/rusentiment_cnn.json +++ /dev/null @@ -1,167 +0,0 @@ -{ - 
"dataset_reader": { - "class_name": "basic_classification_reader", - "x": "text", - "y": "label", - "data_path": "{DOWNLOADS_PATH}/rusentiment/", - "train": "rusentiment_random_posts.csv", - "test": "rusentiment_test.csv" - }, - "dataset_iterator": { - "class_name": "basic_classification_iterator", - "seed": 42, - "field_to_split": "train", - "split_fields": [ - "train", - "valid" - ], - "split_proportions": [ - 0.9, - 0.1 - ] - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": [ - "x" - ], - "out": [ - "x_prep" - ], - "class_name": "dirty_comments_preprocessor" - }, - { - "in": "x_prep", - "out": "x_tok", - "id": "my_tokenizer", - "class_name": "nltk_tokenizer", - "tokenizer": "wordpunct_tokenize" - }, - { - "in": "x_tok", - "out": "x_emb", - "id": "my_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/ft_native_300_ru_wiki_lenta_nltk_wordpunct_tokenize.bin", - "pad_zero": true - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "in": [ - "x_emb" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ], - "main": true, - "class_name": "keras_classification_model", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "embedding_size": "#my_embedder.dim", - "n_classes": "#classes_vocab.len", - "kernel_sizes_cnn": [ - 3, - 5, - 7 - ], - "filters_cnn": 256, - "optimizer": "Adam", - "learning_rate": [0.01, 1e-4], - "learning_rate_decay": "exponential", - "learning_rate_decay_batches": 5000, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 5.0, - "loss": "binary_crossentropy", - "last_layer_activation": "softmax", - "coef_reg_cnn": 1e-3, - "coef_reg_den": 1e-2, - "dropout_rate": 0.5, - "dense_size": 100, - "model_name": "cnn_model" - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "epochs": 100, - "batch_size": 64, - "metrics": [ - "f1_weighted", - "accuracy", - "f1_macro", - { - "name": "roc_auc", - "inputs": ["y_onehot", "y_pred_probas"] - } - ], - "validation_patience": 5, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "tensorboard_log_dir": "{MODEL_PATH}/logs", - "show_examples": false, - "evaluation_targets": [ - "train", - "valid", - "test" - ], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/rusentiment_v3" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/embeddings/ft_native_300_ru_wiki_lenta_nltk_wordpunct_tokenize/ft_native_300_ru_wiki_lenta_nltk_wordpunct_tokenize.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/rusentiment_v3.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/rusentiment_convers_bert.json b/deeppavlov/configs/classifiers/rusentiment_convers_bert.json index 74430e3a0a..5f69923294 100644 --- 
a/deeppavlov/configs/classifiers/rusentiment_convers_bert.json +++ b/deeppavlov/configs/classifiers/rusentiment_convers_bert.json @@ -30,8 +30,8 @@ ], "pipe": [ { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/ru_conversational_cased_L-12_H-768_A-12/vocab.txt", + "class_name": "torch_transformers_preprocessor", + "vocab_file": "{TRANSFORMER}", "do_lower_case": false, "max_seq_length": 64, "in": [ @@ -60,16 +60,14 @@ "single_vector": true }, { - "class_name": "bert_classifier", + "class_name": "torch_transformers_classifier", "n_classes": "#classes_vocab.len", "return_probas": true, "one_hot_labels": true, - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/ru_conversational_cased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/ru_conversational_cased_L-12_H-768_A-12/bert_model.ckpt", + "pretrained_bert": "{TRANSFORMER}", "save_path": "{MODEL_PATH}/model", "load_path": "{MODEL_PATH}/model", - "keep_prob": 0.5, - "learning_rate": 1e-05, + "optimizer_parameters": {"lr": 1e-05}, "learning_rate_drop_patience": 5, "learning_rate_drop_div": 2.0, "in": [ @@ -123,23 +121,20 @@ "valid", "test" ], - "tensorboard_log_dir": "{MODEL_PATH}/" + "class_name": "torch_trainer" }, "metadata": { "variables": { "ROOT_PATH": "~/.deeppavlov", "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/rusentiment_convers_bert_v0/" + "MODEL_PATH": "{MODELS_PATH}/classifiers/rusentiment_convers_bert_torch", + "TRANSFORMER": "DeepPavlov/rubert-base-cased-conversational" }, "download": [ { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/ru_conversational_cased_L-12_H-768_A-12.tar.gz", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/rusentiment_convers_bert_v0.tar.gz", - "subdir": "{MODELS_PATH}/classifiers/" + "url": "http://files.deeppavlov.ai/v1/classifiers/rusentiment_convers_bert/rusentiment_convers_bert_torch.tar.gz", + "subdir": "{MODEL_PATH}" } ] } diff --git a/deeppavlov/configs/classifiers/rusentiment_elmo_twitter_cnn.json b/deeppavlov/configs/classifiers/rusentiment_elmo_twitter_cnn.json deleted file mode 100644 index 1418b30dc2..0000000000 --- a/deeppavlov/configs/classifiers/rusentiment_elmo_twitter_cnn.json +++ /dev/null @@ -1,170 +0,0 @@ -{ - "dataset_reader": { - "class_name": "basic_classification_reader", - "x": "text", - "y": "label", - "data_path": "{DOWNLOADS_PATH}/rusentiment/", - "train": "rusentiment_random_posts.csv", - "test": "rusentiment_test.csv" - }, - "dataset_iterator": { - "class_name": "basic_classification_iterator", - "seed": 42, - "field_to_split": "train", - "split_seed": 23, - "split_fields": [ - "train", - "valid" - ], - "split_proportions": [ - 0.9, - 0.1 - ] - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": [ - "x" - ], - "out": [ - "x_prep" - ], - "class_name": "dirty_comments_preprocessor", - "remove_punctuation": false - }, - { - "in": "x_prep", - "out": "x_tok", - "id": "my_tokenizer", - "class_name": "nltk_tokenizer", - "tokenizer": "wordpunct_tokenize" - }, - { - "in": [ - "x_tok" - ], - "out": [ - "x_emb" - ], - "id": "my_embedder", - "class_name": "elmo_embedder", - "elmo_output_names": [ - "elmo" - ], - 
"mini_batch_size": 32, - "spec": "http://files.deeppavlov.ai/deeppavlov_data/elmo_ru-twitter_2013-01_2018-04_600k_steps.tar.gz", - "pad_zero": true - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "in": [ - "x_emb" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ], - "main": true, - "class_name": "keras_classification_model", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "embedding_size": "#my_embedder.dim", - "n_classes": "#classes_vocab.len", - "kernel_sizes_cnn": [ - 3, - 5, - 7 - ], - "filters_cnn": 256, - "optimizer": "Adam", - "learning_rate": 0.01, - "learning_rate_decay": 0.1, - "loss": "categorical_crossentropy", - "last_layer_activation": "softmax", - "coef_reg_cnn": 1e-3, - "coef_reg_den": 1e-2, - "dropout_rate": 0.5, - "dense_size": 100, - "model_name": "cnn_model" - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "epochs": 100, - "batch_size": 128, - "metrics": [ - "f1_weighted", - "f1_macro", - "accuracy", - { - "name": "roc_auc", - "inputs": ["y_onehot", "y_pred_probas"] - } - ], - "validation_patience": 5, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "evaluation_targets": [ - "train", - "valid", - "test" - ], - "tensorboard_log_dir": "{MODEL_PATH}/logs", - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/rusentiment_v10" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/rusentiment_v10.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/sentiment_imdb_bert.json b/deeppavlov/configs/classifiers/sentiment_imdb_bert.json deleted file mode 100644 index 8e62aefe8c..0000000000 --- a/deeppavlov/configs/classifiers/sentiment_imdb_bert.json +++ /dev/null @@ -1,142 +0,0 @@ -{ - "dataset_reader": { - "class_name": "imdb_reader", - "data_path": "{DOWNLOADS_PATH}/aclImdb" - }, - "dataset_iterator": { - "class_name": "basic_classification_iterator", - "seed": 42, - "split_seed": 23, - "field_to_split": "train", - "stratify": true, - "split_fields": [ - "train", - "valid" - ], - "split_proportions": [ - 0.9, - 0.1 - ] - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": false, - "max_seq_length": 450, - "in": [ - "x" - ], - "out": [ - "bert_features" - ] - }, - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "class_name": "bert_classifier", - "n_classes": "#classes_vocab.len", - "return_probas": true, - "one_hot_labels": true, - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": 
"{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/bert_model.ckpt", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "keep_prob": 0.5, - "learning_rate": 1e-05, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 2.0, - "in": [ - "bert_features" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ] - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "batch_size": 8, - "epochs": 100, - "metrics": [ - "f1_weighted", - "f1_macro", - { - "name": "roc_auc", - "inputs": [ - "y_onehot", - "y_pred_probas" - ] - } - ], - "show_examples": false, - "pytest_max_batches": 2, - "validation_patience": 5, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "evaluation_targets": [ - "train", - "valid", - "test" - ], - "tensorboard_log_dir": "{MODEL_PATH}/" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/sentiment_imdb_bert_v0/" - }, - "labels": { - "telegram_utils": "IntentModel", - "server_utils": "KerasIntentModel" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/sentiment_imdb_conv_bert.json b/deeppavlov/configs/classifiers/sentiment_imdb_conv_bert.json deleted file mode 100644 index 4e1a1287b5..0000000000 --- a/deeppavlov/configs/classifiers/sentiment_imdb_conv_bert.json +++ /dev/null @@ -1,142 +0,0 @@ -{ - "dataset_reader": { - "class_name": "imdb_reader", - "data_path": "{DOWNLOADS_PATH}/aclImdb" - }, - "dataset_iterator": { - "class_name": "basic_classification_iterator", - "seed": 42, - "split_seed": 23, - "field_to_split": "train", - "stratify": true, - "split_fields": [ - "train", - "valid" - ], - "split_proportions": [ - 0.9, - 0.1 - ] - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/conversational_cased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": false, - "max_seq_length": 450, - "in": [ - "x" - ], - "out": [ - "bert_features" - ] - }, - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "class_name": "bert_classifier", - "n_classes": "#classes_vocab.len", - "return_probas": true, - "one_hot_labels": true, - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/conversational_cased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/conversational_cased_L-12_H-768_A-12/bert_model.ckpt", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "keep_prob": 0.5, - "learning_rate": 1e-05, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 2.0, - "in": [ - "bert_features" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ] - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": 
"y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "batch_size": 8, - "epochs": 100, - "metrics": [ - "f1_weighted", - "f1_macro", - { - "name": "roc_auc", - "inputs": [ - "y_onehot", - "y_pred_probas" - ] - } - ], - "show_examples": false, - "pytest_max_batches": 2, - "validation_patience": 5, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "evaluation_targets": [ - "train", - "valid", - "test" - ], - "tensorboard_log_dir": "{MODEL_PATH}/" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/sentiment_imdb_conv_bert_v0/" - }, - "labels": { - "telegram_utils": "IntentModel", - "server_utils": "KerasIntentModel" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/conversational_cased_L-12_H-768_A-12.tar.gz", - "subdir": "{DOWNLOADS_PATH}/bert_models" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/sentiment_sst_conv_bert.json b/deeppavlov/configs/classifiers/sentiment_sst_conv_bert.json index f88b0ae9b7..f70523e41c 100644 --- a/deeppavlov/configs/classifiers/sentiment_sst_conv_bert.json +++ b/deeppavlov/configs/classifiers/sentiment_sst_conv_bert.json @@ -21,8 +21,8 @@ ], "pipe": [ { - "class_name": "bert_preprocessor", - "vocab_file": "{MODEL_PATH}/vocab.txt", + "class_name": "torch_transformers_preprocessor", + "vocab_file": "{TRANSFORMER}", "do_lower_case": false, "max_seq_length": 64, "in": [ @@ -51,15 +51,14 @@ "single_vector": true }, { - "class_name": "bert_classifier", + "class_name": "torch_transformers_classifier", "n_classes": "#classes_vocab.len", "return_probas": true, "one_hot_labels": true, - "bert_config_file": "{MODEL_PATH}/bert_config.json", + "pretrained_bert": "{TRANSFORMER}", "save_path": "{MODEL_PATH}/model", "load_path": "{MODEL_PATH}/model", - "keep_prob": 0.5, - "learning_rate": 1e-05, + "optimizer_parameters": {"lr": 1e-05}, "learning_rate_drop_patience": 5, "learning_rate_drop_div": 2.0, "in": [ @@ -111,15 +110,15 @@ "valid", "test" ], - "class_name": "nn_trainer", - "tensorboard_log_dir": "{MODEL_PATH}/" + "class_name": "torch_trainer" }, "metadata": { "variables": { "ROOT_PATH": "~/.deeppavlov", "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/sentiment_sst_bert_v2" + "MODEL_PATH": "{MODELS_PATH}/classifiers/sentiment_sst_bert_torch", + "TRANSFORMER": "DeepPavlov/bert-base-cased-conversational" }, "download": [ { @@ -127,8 +126,8 @@ "subdir": "{DOWNLOADS_PATH}" }, { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/sentiment_sst_bert_v2.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" + "url": "http://files.deeppavlov.ai/v1/classifiers/sentiment_sst_bert/sentiment_sst_bert_torch.tar.gz", + "subdir": "{MODEL_PATH}" } ] } diff --git a/deeppavlov/configs/classifiers/sentiment_sst_multi_bert.json b/deeppavlov/configs/classifiers/sentiment_sst_multi_bert.json deleted file mode 100644 index 95a46ad544..0000000000 --- a/deeppavlov/configs/classifiers/sentiment_sst_multi_bert.json +++ /dev/null @@ -1,135 +0,0 @@ -{ - "dataset_reader": { - "class_name": "basic_classification_reader", - "x": "text", - "y": "fine_grained_label", - "data_path": "{DOWNLOADS_PATH}/stanfordSentimentTreebank", - "train": "train_fine_grained.csv", - "valid": "valid_fine_grained.csv", - "test": "test_fine_grained.csv" - }, - "dataset_iterator": 
{ - "class_name": "basic_classification_iterator", - "seed": 42 - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "class_name": "bert_preprocessor", - "vocab_file": "{MODEL_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 64, - "in": [ - "x" - ], - "out": [ - "bert_features" - ] - }, - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "class_name": "bert_classifier", - "n_classes": "#classes_vocab.len", - "return_probas": true, - "one_hot_labels": true, - "bert_config_file": "{MODEL_PATH}/bert_config.json", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "keep_prob": 0.5, - "learning_rate": 1e-05, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 2.0, - "in": [ - "bert_features" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ] - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "epochs": 100, - "batch_size": 64, - "metrics": [ - "accuracy", - { - "name": "roc_auc", - "inputs": [ - "y_onehot", - "y_pred_probas" - ] - }, - "f1_macro" - ], - "validation_patience": 5, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "evaluation_targets": [ - "train", - "valid", - "test" - ], - "class_name": "nn_trainer", - "tensorboard_log_dir": "{MODEL_PATH}/" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/sentiment_sst_bert_v1" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/stanfordSentimentTreebank.zip", - "subdir": "{DOWNLOADS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/sentiment_sst_bert_v1.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/sentiment_twitter.json b/deeppavlov/configs/classifiers/sentiment_twitter.json index 0d02ec5927..304d766466 100644 --- a/deeppavlov/configs/classifiers/sentiment_twitter.json +++ b/deeppavlov/configs/classifiers/sentiment_twitter.json @@ -55,13 +55,13 @@ "x_emb" ], "in_y": [ - "y_onehot" + "y_ids" ], "out": [ "y_pred_probas" ], "main": true, - "class_name": "keras_classification_model", + "class_name": "torch_text_classification_model", "save_path": "{MODEL_PATH}/model", "load_path": "{MODEL_PATH}/model", "embedding_size": "#my_embedder.dim", @@ -72,15 +72,14 @@ 7 ], "filters_cnn": 256, - "optimizer": "Adam", - "learning_rate": 0.01, - "learning_rate_decay": 0.1, - "loss": "binary_crossentropy", - "last_layer_activation": "softmax", - "coef_reg_cnn": 1e-3, - "coef_reg_den": 1e-2, "dropout_rate": 0.5, - "dense_size": 100, + "dense_size": 64, + "optimizer": "SGD", + "optimizer_parameters": { + "lr": 0.0001, + "momentum": 0.9, + "weight_decay": 0.0001 + }, "model_name": "cnn_model" }, { @@ -101,7 +100,7 @@ }, "train": { "epochs": 100, - "batch_size": 64, + "batch_size": 128, "metrics": [ "accuracy", "f1_macro", @@ -119,14 +118,14 @@ "valid", "test" ], - "class_name": "nn_trainer" + 
"class_name": "torch_trainer" }, "metadata": { "variables": { "ROOT_PATH": "~/.deeppavlov", "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/sentiment_twitter_v6" + "MODEL_PATH": "{MODELS_PATH}/classifiers/sentiment_twitter_torch" }, "download": [ { @@ -138,8 +137,8 @@ "subdir": "{DOWNLOADS_PATH}/embeddings" }, { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/sentiment_twitter_v6.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" + "url": "http://files.deeppavlov.ai/v1/classifiers/sentiment_twitter/sentiment_twitter_torch.tar.gz", + "subdir": "{MODEL_PATH}" } ] } diff --git a/deeppavlov/configs/classifiers/sentiment_twitter_bert_emb.json b/deeppavlov/configs/classifiers/sentiment_twitter_bert_emb.json deleted file mode 100644 index 6a4fb9756a..0000000000 --- a/deeppavlov/configs/classifiers/sentiment_twitter_bert_emb.json +++ /dev/null @@ -1,144 +0,0 @@ -{ - "dataset_reader": { - "class_name": "basic_classification_reader", - "x": "Twit", - "y": "Class", - "data_path": "{DOWNLOADS_PATH}/sentiment_twitter_data" - }, - "dataset_iterator": { - "class_name": "basic_classification_iterator", - "seed": 42 - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "class_name": "transformers_bert_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 512, - "in": ["x"], - "out": ["tokens", "subword_tokens", "subword_tok_ids", "startofword_markers", "attention_mask"] - }, - { - "class_name": "transformers_bert_embedder", - "id": "my_embedder", - "bert_config_path": "{BERT_PATH}/bert_config.json", - "truncate": false, - "load_path": "{BERT_PATH}", - "in": ["subword_tok_ids", "startofword_markers", "attention_mask"], - "out": ["word_emb", "subword_emb", "max_emb", "mean_emb", "pooler_output"] - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "in": [ - "word_emb" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ], - "main": true, - "class_name": "keras_classification_model", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "embedding_size": "#my_embedder.dim", - "n_classes": "#classes_vocab.len", - "kernel_sizes_cnn": [ - 3, - 5, - 7 - ], - "filters_cnn": 256, - "optimizer": "Adam", - "learning_rate": 0.01, - "learning_rate_decay": 0.1, - "loss": "binary_crossentropy", - "last_layer_activation": "softmax", - "coef_reg_cnn": 1e-3, - "coef_reg_den": 1e-2, - "dropout_rate": 0.5, - "dense_size": 100, - "model_name": "cnn_model" - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "epochs": 100, - "batch_size": 64, - "metrics": [ - "accuracy", - "f1_macro", - { - "name": "roc_auc", - "inputs": ["y_onehot", "y_pred_probas"] - } - ], - "validation_patience": 5, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "evaluation_targets": [ - "valid", - "test" - ], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": 
"{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/sentiment_twitter_bert_emb", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models/rubert_cased_L-12_H-768_A-12_pt" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/sentiment_twitter_data.tar.gz", - "subdir": "{DOWNLOADS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/rubert_cased_L-12_H-768_A-12_pt.tar.gz", - "subdir": "{DOWNLOADS_PATH}/bert_models" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/sentiment_twitter_preproc.json b/deeppavlov/configs/classifiers/sentiment_twitter_preproc.json deleted file mode 100644 index 2d6a250958..0000000000 --- a/deeppavlov/configs/classifiers/sentiment_twitter_preproc.json +++ /dev/null @@ -1,159 +0,0 @@ -{ - "dataset_reader": { - "class_name": "basic_classification_reader", - "x": "Twit", - "y": "Class", - "data_path": "{DOWNLOADS_PATH}/sentiment_twitter_data" - }, - "dataset_iterator": { - "class_name": "basic_classification_iterator", - "seed": 42 - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": [ - "x" - ], - "out": [ - "x_prep" - ], - "class_name": "dirty_comments_preprocessor", - "delete_smile_brackets": true - }, - { - "in": "x_prep", - "out": "x_tok", - "id": "my_tokenizer", - "class_name": "nltk_tokenizer", - "tokenizer": "wordpunct_tokenize" - }, - { - "in": "x_tok", - "out": "x_emb", - "id": "my_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/ft_native_300_ru_wiki_lenta_nltk_wordpunct_tokenize.bin", - "pad_zero": true - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "in": [ - "x_emb" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ], - "main": true, - "class_name": "keras_classification_model", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "embedding_size": "#my_embedder.dim", - "n_classes": "#classes_vocab.len", - "kernel_sizes_cnn": [ - 3, - 5, - 7 - ], - "filters_cnn": 256, - "optimizer": "Adam", - "learning_rate": 0.01, - "learning_rate_decay": 0.1, - "loss": "binary_crossentropy", - "last_layer_activation": "softmax", - "coef_reg_cnn": 1e-3, - "coef_reg_den": 1e-2, - "dropout_rate": 0.5, - "dense_size": 100, - "model_name": "cnn_model" - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "epochs": 100, - "batch_size": 64, - "metrics": [ - "accuracy", - "f1_macro", - { - "name": "roc_auc", - "inputs": [ - "y_onehot", - "y_pred_probas" - ] - } - ], - "validation_patience": 5, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "evaluation_targets": [ - "train", - "valid", - "test" - ], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/sentiment_twitter_v7" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/sentiment_twitter_data.tar.gz", - 
"subdir": "{DOWNLOADS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/ft_native_300_ru_wiki_lenta_nltk_wordpunct_tokenize/ft_native_300_ru_wiki_lenta_nltk_wordpunct_tokenize.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/sentiment_twitter_v7.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/sentiment_yelp_conv_bert.json b/deeppavlov/configs/classifiers/sentiment_yelp_conv_bert.json deleted file mode 100644 index f1b1a40561..0000000000 --- a/deeppavlov/configs/classifiers/sentiment_yelp_conv_bert.json +++ /dev/null @@ -1,149 +0,0 @@ -{ - "dataset_reader": { - "class_name": "basic_classification_reader", - "x": "text", - "y": "label", - "data_path": "{DOWNLOADS_PATH}/yelp_review_full_csv", - "train": "train.csv", - "test": "test.csv", - "header": null, - "names": [ - "label", - "text" - ] - }, - "dataset_iterator": { - "class_name": "basic_classification_iterator", - "seed": 42, - "split_seed": 23, - "field_to_split": "train", - "split_fields": [ - "train", - "valid" - ], - "split_proportions": [ - 0.9, - 0.1 - ] - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "class_name": "bert_preprocessor", - "vocab_file": "{MODEL_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 256, - "in": [ - "x" - ], - "out": [ - "bert_features" - ] - }, - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "class_name": "bert_classifier", - "n_classes": "#classes_vocab.len", - "return_probas": true, - "one_hot_labels": true, - "bert_config_file": "{MODEL_PATH}/bert_config.json", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "keep_prob": 0.5, - "learning_rate": 1e-05, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 2.0, - "in": [ - "bert_features" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ] - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "epochs": 100, - "batch_size": 16, - "metrics": [ - "accuracy", - { - "name": "roc_auc", - "inputs": [ - "y_onehot", - "y_pred_probas" - ] - }, - "f1_macro" - ], - "validation_patience": 5, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "evaluation_targets": [ - "train", - "valid", - "test" - ], - "class_name": "nn_trainer", - "tensorboard_log_dir": "{MODEL_PATH}/" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/sentiment_yelp_bert_v2" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/yelp_review_full_csv.tar.gz", - "subdir": "{DOWNLOADS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/sentiment_yelp_bert_v2.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/sentiment_yelp_multi_bert.json 
b/deeppavlov/configs/classifiers/sentiment_yelp_multi_bert.json deleted file mode 100644 index d18dab7b05..0000000000 --- a/deeppavlov/configs/classifiers/sentiment_yelp_multi_bert.json +++ /dev/null @@ -1,149 +0,0 @@ -{ - "dataset_reader": { - "class_name": "basic_classification_reader", - "x": "text", - "y": "label", - "data_path": "{DOWNLOADS_PATH}/yelp_review_full_csv", - "train": "train.csv", - "test": "test.csv", - "header": null, - "names": [ - "label", - "text" - ] - }, - "dataset_iterator": { - "class_name": "basic_classification_iterator", - "seed": 42, - "split_seed": 23, - "field_to_split": "train", - "split_fields": [ - "train", - "valid" - ], - "split_proportions": [ - 0.9, - 0.1 - ] - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "class_name": "bert_preprocessor", - "vocab_file": "{MODEL_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 200, - "in": [ - "x" - ], - "out": [ - "bert_features" - ] - }, - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "class_name": "bert_classifier", - "n_classes": "#classes_vocab.len", - "return_probas": true, - "one_hot_labels": true, - "bert_config_file": "{MODEL_PATH}/bert_config.json", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "keep_prob": 0.5, - "learning_rate": 1e-05, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 2.0, - "in": [ - "bert_features" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ] - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "epochs": 100, - "batch_size": 16, - "metrics": [ - "accuracy", - { - "name": "roc_auc", - "inputs": [ - "y_onehot", - "y_pred_probas" - ] - }, - "f1_macro" - ], - "validation_patience": 5, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "evaluation_targets": [ - "train", - "valid", - "test" - ], - "class_name": "nn_trainer", - "tensorboard_log_dir": "{MODEL_PATH}/" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/sentiment_yelp_bert_v1" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/yelp_review_full_csv.tar.gz", - "subdir": "{DOWNLOADS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/sentiment_yelp_bert_v1.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/sst_torch_swcnn.json b/deeppavlov/configs/classifiers/sst_torch_swcnn.json deleted file mode 100644 index 9709d333fd..0000000000 --- a/deeppavlov/configs/classifiers/sst_torch_swcnn.json +++ /dev/null @@ -1,148 +0,0 @@ -{ - "dataset_reader": { - "class_name": "torchtext_classification_data_reader", - "data_path": "{DOWNLOADS_PATH}", - "dataset_title": "SST" - }, - "dataset_iterator": { - "class_name": "basic_classification_iterator", - "seed": 42 - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "id": 
"classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "level": "token", - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": "x", - "out": "x_tok", - "id": "my_tokenizer", - "class_name": "nltk_tokenizer", - "tokenizer": "wordpunct_tokenize" - }, - { - "in": "x_tok", - "out": "x_emb", - "id": "my_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/wiki.en.bin", - "pad_zero": true - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "in": [ - "x_emb" - ], - "in_y": [ - "y_ids" - ], - "out": [ - "y_pred_probas" - ], - "main": true, - "class_name": "torch_text_classification_model", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "embedding_size": "#my_embedder.dim", - "n_classes": "#classes_vocab.len", - "model_name": "cnn_model", - "kernel_sizes_cnn": [ - 3, - 5, - 7 - ], - "filters_cnn": 128, - "dropout_rate": 0.5, - "dense_size": 64, - "optimizer": "SGD", - "optimizer_parameters": { - "lr": 0.0001, - "momentum": 0.9, - "weight_decay": 0.0001 - }, - "lr_scheduler": "CyclicLR", - "lr_scheduler_parameters": { - "base_lr": 0.0001, - "max_lr": 0.001 - }, - "loss": "CrossEntropyLoss" - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "epochs": 100, - "batch_size": 64, - "metrics": [ - "accuracy", - { - "name": "roc_auc", - "inputs": [ - "y_onehot", - "y_pred_probas" - ] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "evaluation_targets": [ - "train", - "valid" - ], - "class_name": "torch_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/sst_torch_v0" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/embeddings/wiki.en.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/sst_torch_v0.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/superglue/superglue_copa_roberta.json b/deeppavlov/configs/classifiers/superglue/superglue_copa_roberta.json index 1a9fda443d..101f474412 100644 --- a/deeppavlov/configs/classifiers/superglue/superglue_copa_roberta.json +++ b/deeppavlov/configs/classifiers/superglue/superglue_copa_roberta.json @@ -1,97 +1,147 @@ { - "dataset_reader": { - "class_name": "huggingface_dataset_reader", - "path": "super_glue", - "name": "copa", - "train": "train", - "valid": "validation", - "test": "test" - }, - "dataset_iterator": { - "class_name": "huggingface_dataset_iterator", - "features": ["contexts", "choices"], - "label": "label", - "seed": 42 - }, - "chainer": { - "in": ["contexts_list", "choices_list"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "torch_transformers_multiplechoice_preprocessor", - "vocab_file": "{BASE_MODEL}", - "do_lower_case": false, - "max_seq_length": 64, - "in": ["contexts_list", "choices_list"], - "out": ["bert_features"] - }, - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": ["y"], - "save_path": 
"{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": ["y"], - "out": ["y_ids"] - }, - { - "in": ["y_ids"], - "out": ["y_onehot"], - "class_name": "one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "class_name": "torch_transformers_multiplechoice", - "n_classes": "#classes_vocab.len", - "return_probas": true, - "pretrained_bert": "{BASE_MODEL}", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "optimizer": "AdamW", - "optimizer_parameters": { - "lr": 2e-05 + "dataset_reader": { + "class_name": "huggingface_dataset_reader", + "path": "super_glue", + "name": "copa", + "train": "train", + "valid": "validation", + "test": "test" + }, + "dataset_iterator": { + "class_name": "huggingface_dataset_iterator", + "features": [ + "contexts", + "choices" + ], + "label": "label", + "seed": 42 + }, + "chainer": { + "in": [ + "contexts_list", + "choices_list" + ], + "in_y": [ + "y" + ], + "pipe": [ + { + "class_name": "torch_transformers_multiplechoice_preprocessor", + "vocab_file": "{BASE_MODEL}", + "do_lower_case": false, + "max_seq_length": 64, + "in": [ + "contexts_list", + "choices_list" + ], + "out": [ + "bert_features" + ] + }, + { + "id": "classes_vocab", + "class_name": "simple_vocab", + "fit_on": [ + "y" + ], + "save_path": "{MODEL_PATH}/classes.dict", + "load_path": "{MODEL_PATH}/classes.dict", + "in": [ + "y" + ], + "out": [ + "y_ids" + ] + }, + { + "in": [ + "y_ids" + ], + "out": [ + "y_onehot" + ], + "class_name": "one_hotter", + "depth": "#classes_vocab.len", + "single_vector": true + }, + { + "class_name": "torch_transformers_multiplechoice", + "n_classes": "#classes_vocab.len", + "return_probas": true, + "pretrained_bert": "{BASE_MODEL}", + "save_path": "{MODEL_PATH}/model", + "load_path": "{MODEL_PATH}/model", + "optimizer": "AdamW", + "optimizer_parameters": { + "lr": 2e-05 + }, + "learning_rate_drop_patience": 3, + "learning_rate_drop_div": 2.0, + "in": [ + "bert_features" + ], + "in_y": [ + "y_ids" + ], + "out": [ + "y_pred_probas" + ] + }, + { + "in": [ + "y_pred_probas" + ], + "out": [ + "y_pred_ids" + ], + "class_name": "proba2labels", + "max_proba": true + }, + { + "in": [ + "y_pred_ids" + ], + "out": [ + "y_pred_labels" + ], + "ref": "classes_vocab" + } + ], + "out": [ + "y_pred_labels" + ] + }, + "train": { + "batch_size": 16, + "metrics": [ + "accuracy" + ], + "validation_patience": 10, + "val_every_n_epochs": 1, + "log_every_n_epochs": 1, + "show_examples": false, + "evaluation_targets": [ + "train", + "valid" + ], + "class_name": "torch_trainer", + "tensorboard_log_dir": "{MODEL_PATH}/", + "pytest_max_batches": 2, + "pytest_batch_size": 2 + }, + "metadata": { + "variables": { + "BASE_MODEL": "roberta-large", + "ROOT_PATH": "~/.deeppavlov", + "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", + "MODELS_PATH": "{ROOT_PATH}/models", + "MODEL_PATH": "{MODELS_PATH}/classifiers/superglue_copa_{BASE_MODEL}" }, - "learning_rate_drop_patience": 3, - "learning_rate_drop_div": 2.0, - "in": ["bert_features"], - "in_y": ["y_ids"], - "out": ["y_pred_probas"] - }, - { - "in": ["y_pred_probas"], - "out": ["y_pred_ids"], - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": ["y_pred_ids"], - "out": ["y_pred_labels"], - "ref": "classes_vocab" - } - ], - "out": ["y_pred_labels"] - }, - "train": { - "batch_size": 16, - "metrics": ["accuracy"], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "evaluation_targets": ["train", "valid"], - 
"class_name": "torch_trainer", - "tensorboard_log_dir": "{MODEL_PATH}/", - "pytest_max_batches": 2, - "pytest_batch_size": 2 - }, - "metadata": { - "variables": { - "BASE_MODEL": "roberta-large", - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/superglue_copa_{BASE_MODEL}" + "download": [ + { + "url": "http://files.deeppavlov.ai/0.17/classifiers/superglue/superglue_copa_roberta.tar.gz", + "subdir": "{MODELS_PATH}" + } + ] } - } } diff --git a/deeppavlov/configs/classifiers/superglue/superglue_record_roberta.json b/deeppavlov/configs/classifiers/superglue/superglue_record_roberta.json index e537a098f3..c21bcf193e 100644 --- a/deeppavlov/configs/classifiers/superglue/superglue_record_roberta.json +++ b/deeppavlov/configs/classifiers/superglue/superglue_record_roberta.json @@ -13,7 +13,7 @@ "download": [ { "url": "http://files.deeppavlov.ai/0.17/classifiers/superglue/superglue_record_roberta.tar.gz", - "subdir": "{MODEL_PATH}" + "subdir": "{MODELS_PATH}" } ] }, diff --git a/deeppavlov/configs/classifiers/topic_ag_news.json b/deeppavlov/configs/classifiers/topic_ag_news.json deleted file mode 100644 index 0e56578b55..0000000000 --- a/deeppavlov/configs/classifiers/topic_ag_news.json +++ /dev/null @@ -1,154 +0,0 @@ -{ - "dataset_reader": { - "class_name": "basic_classification_reader", - "x": "text", - "y": "label", - "data_path": "{DOWNLOADS_PATH}/ag_news_data" - }, - "dataset_iterator": { - "class_name": "basic_classification_iterator" - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": [ - "x" - ], - "out": [ - "x_lower" - ], - "class_name": "str_lower" - }, - { - "in": "x_lower", - "out": "x_tok", - "id": "my_tokenizer", - "class_name": "nltk_tokenizer", - "tokenizer": "wordpunct_tokenize" - }, - { - "in": "x_tok", - "out": "x_emb", - "id": "my_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/wiki.en.bin", - "pad_zero": true - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "in": [ - "x_emb" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ], - "main": true, - "class_name": "keras_classification_model", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "embedding_size": "#my_embedder.dim", - "n_classes": "#classes_vocab.len", - "kernel_sizes_cnn": [ - 3, - 5, - 7 - ], - "filters_cnn": 256, - "optimizer": "Adam", - "learning_rate": 0.01, - "learning_rate_decay": 0.1, - "loss": "binary_crossentropy", - "coef_reg_cnn": 1e-4, - "coef_reg_den": 1e-4, - "dropout_rate": 0.5, - "dense_size": 100, - "last_layer_activation": "softmax", - "model_name": "cnn_model" - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "epochs": 100, - "batch_size": 64, - "metrics": [ - "accuracy", - "f1_macro", - { - "name": "roc_auc", - "inputs": ["y_onehot", "y_pred_probas"] - } - ], - "validation_patience": 5, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - 
"evaluation_targets": [ - "train", - "valid", - "test" - ], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/topic_ag_news_v3" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/ag_news_data.tar.gz", - "subdir": "{DOWNLOADS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/embeddings/wiki.en.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/topic_ag_news_v3.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/yahoo_convers_vs_info.json b/deeppavlov/configs/classifiers/yahoo_convers_vs_info.json deleted file mode 100644 index ec1cd427b0..0000000000 --- a/deeppavlov/configs/classifiers/yahoo_convers_vs_info.json +++ /dev/null @@ -1,167 +0,0 @@ -{ - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": [ - "x" - ], - "out": [ - "x_prep" - ], - "class_name": "dirty_comments_preprocessor", - "remove_punctuation": false - }, - { - "in": "x_prep", - "out": "x_tok", - "id": "my_tokenizer", - "class_name": "nltk_moses_tokenizer" - }, - { - "in": [ - "x_tok" - ], - "out": [ - "x_emb" - ], - "id": "my_embedder", - "class_name": "elmo_embedder", - "elmo_output_names": [ - "elmo" - ], - "mini_batch_size": 32, - "spec": "{DOWNLOADS_PATH}/embeddings/yahooo-sber-questions_epoches_n_15/", - "pad_zero": true - }, - { - "in": "y_ids", - "out": "y_onehot", - "class_name": "one_hotter", - "id": "my_one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "in": [ - "x_emb" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ], - "main": true, - "class_name": "keras_classification_model", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "embedding_size": "#my_embedder.dim", - "n_classes": "#classes_vocab.len", - "units_gru": 512, - "optimizer": "Adam", - "learning_rate": 0.001, - "learning_rate_decay": 0.001, - "loss": "categorical_crossentropy", - "coef_reg_gru": 1e-4, - "coef_reg_den": 1e-4, - "dropout_rate": 0.5, - "rec_dropout_rate": 0.5, - "dense_size": 100, - "model_name": "bigru_with_max_aver_pool_model", - "last_layer_activation": "softmax", - "restore_lr": false - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - }, - { - "ref": "my_one_hotter", - "in": "y_pred_ids", - "out": "y_pred_onehot" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "epochs": 100, - "batch_size": 32, - "metrics": [ - { - "name": "roc_auc", - "inputs": [ - "y_onehot", - "y_pred_probas" - ] - }, - { - "name": "accuracy", - "inputs": [ - "y", - "y_pred_labels" - ] - }, - { - "name": "f1_macro", - "inputs": [ - "y", - "y_pred_labels" - ] - } - ], - "validation_patience": 20, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": true, - "evaluation_targets": [ - "train", - "valid" - ], - "tensorboard_log_dir": "{MODEL_PATH}/", - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - 
"DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/yahoo_convers_vs_info_v2" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/embeddings/yahooo-sber-questions_epoches_n_15.tar.gz", - "subdir": "{DOWNLOADS_PATH}/embeddings/yahooo-sber-questions_epoches_n_15/" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/yahoo_convers_vs_info_v2.tar.gz", - "subdir": "{MODELS_PATH}/classifiers/" - } - ] - } -} diff --git a/deeppavlov/configs/classifiers/yahoo_convers_vs_info_bert.json b/deeppavlov/configs/classifiers/yahoo_convers_vs_info_bert.json deleted file mode 100644 index 7b0e79994d..0000000000 --- a/deeppavlov/configs/classifiers/yahoo_convers_vs_info_bert.json +++ /dev/null @@ -1,160 +0,0 @@ -{ - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/conversational_cased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": false, - "max_seq_length": 64, - "in": [ - "x" - ], - "out": [ - "bert_features" - ] - }, - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": [ - "y" - ], - "out": [ - "y_ids" - ] - }, - { - "in": [ - "y_ids" - ], - "out": [ - "y_onehot" - ], - "class_name": "one_hotter", - "id": "my_one_hotter", - "depth": "#classes_vocab.len", - "single_vector": true - }, - { - "class_name": "bert_classifier", - "n_classes": "#classes_vocab.len", - "return_probas": true, - "one_hot_labels": true, - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/conversational_cased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/conversational_cased_L-12_H-768_A-12/bert_model.ckpt", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "keep_prob": 0.5, - "learning_rate": 1e-05, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 2.0, - "in": [ - "bert_features" - ], - "in_y": [ - "y_onehot" - ], - "out": [ - "y_pred_probas" - ] - }, - { - "in": [ - "y_pred_probas" - ], - "out": [ - "y_pred_ids" - ], - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": [ - "y_pred_ids" - ], - "out": [ - "y_pred_labels" - ], - "ref": "classes_vocab" - }, - { - "ref": "my_one_hotter", - "in": [ - "y_pred_ids" - ], - "out": [ - "y_pred_onehot" - ] - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "epochs": 100, - "batch_size": 64, - "metrics": [ - { - "name": "roc_auc", - "inputs": [ - "y_onehot", - "y_pred_probas" - ] - }, - { - "name": "accuracy", - "inputs": [ - "y", - "y_pred_labels" - ] - }, - { - "name": "f1_macro", - "inputs": [ - "y", - "y_pred_labels" - ] - } - ], - "validation_patience": 20, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "evaluation_targets": [ - "train", - "valid" - ], - "tensorboard_log_dir": "{MODEL_PATH}/", - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/yahoo_convers_vs_info_v3" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/conversational_cased_L-12_H-768_A-12.tar.gz", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/classifiers/yahoo_convers_vs_info_v3.tar.gz", - 
"subdir": "{MODELS_PATH}/classifiers/" - } - ] - } -} diff --git a/deeppavlov/configs/doc_retrieval/en_ranker_pop_enwiki20180211.json b/deeppavlov/configs/doc_retrieval/en_ranker_pop_enwiki20180211.json index 20402495c1..24c4d566fe 100644 --- a/deeppavlov/configs/doc_retrieval/en_ranker_pop_enwiki20180211.json +++ b/deeppavlov/configs/doc_retrieval/en_ranker_pop_enwiki20180211.json @@ -56,7 +56,7 @@ { "class_name": "pop_ranker", "pop_dict_path": "{DOWNLOADS_PATH}/odqa/enwiki20180211_popularities.json", - "load_path": "{MODELS_PATH}/odqa/logreg_3features.joblib", + "load_path": "{MODELS_PATH}/odqa/logreg_3features_v2.joblib", "top_n": 10, "in": ["tfidf_doc_ids", "tfidf_doc_scores"], "out": ["pop_doc_ids", "pop_doc_scores"] @@ -88,8 +88,8 @@ "subdir": "{DOWNLOADS_PATH}" }, { - "url": "http://files.deeppavlov.ai/deeppavlov_data/pop_ranker.tar.gz", - "subdir": "{MODELS_PATH}" + "url": "http://files.deeppavlov.ai/deeppavlov_data/ranking/logreg_3features_v2.joblib", + "subdir": "{MODELS_PATH}/odqa" } ] } diff --git a/deeppavlov/configs/doc_retrieval/en_ranker_tfidf_enwiki20161221.json b/deeppavlov/configs/doc_retrieval/en_ranker_tfidf_enwiki20161221.json deleted file mode 100644 index 4fe8f63780..0000000000 --- a/deeppavlov/configs/doc_retrieval/en_ranker_tfidf_enwiki20161221.json +++ /dev/null @@ -1,80 +0,0 @@ -{ - "dataset_reader": { - "class_name": "odqa_reader", - "data_path": "{DOWNLOADS_PATH}/odqa/enwiki20161221", - "save_path": "{DOWNLOADS_PATH}/odqa/enwiki20161221.db", - "dataset_format": "wiki" - }, - "dataset_iterator": { - "class_name": "sqlite_iterator", - "shuffle": false, - "load_path": "{DOWNLOADS_PATH}/odqa/enwiki20161221.db" - }, - "chainer": { - "in": [ - "docs" - ], - "in_y": [ - "doc_ids", - "doc_nums" - ], - "out": [ - "tfidf_doc_ids" - ], - "pipe": [ - { - "class_name": "hashing_tfidf_vectorizer", - "id": "vectorizer", - "fit_on": [ - "docs", - "doc_ids", - "doc_nums" - ], - "save_path": "{MODELS_PATH}/odqa/enwiki20161221_tfidf_matrix.npz", - "load_path": "{MODELS_PATH}/odqa/enwiki20161221_tfidf_matrix.npz", - "tokenizer": { - "class_name": "stream_spacy_tokenizer", - "lemmas": true, - "ngram_range": [ - 1, - 2 - ] - } - }, - { - "class_name": "tfidf_ranker", - "top_n": 25, - "in": [ - "docs" - ], - "out": [ - "tfidf_doc_ids", - "tfidf_doc_scores" - ], - "vectorizer": "#vectorizer" - } - ] - }, - "train": { - "batch_size": 10000, - "evaluation_targets": [], - "class_name": "fit_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/wikipedia/enwiki20161221.tar.gz", - "subdir": "{DOWNLOADS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/en_odqa_enwiki20161221.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/elmo/elmo_1b_benchmark.json b/deeppavlov/configs/elmo/elmo_1b_benchmark.json deleted file mode 100644 index 806272b771..0000000000 --- a/deeppavlov/configs/elmo/elmo_1b_benchmark.json +++ /dev/null @@ -1,81 +0,0 @@ -{ - "dataset_reader": { - "class_name": "file_paths_reader", - "data_path": "{DOWNLOADS_PATH}/elmo-1b-benchmark/data/1-billion-word-language-modeling-benchmark-r13output/", - "train": "training-monolingual.tokenized.shuffled/*" - }, - "dataset_iterator": { - "class_name": "elmo_file_paths_iterator", - "seed": 31415, - "unroll_steps": 20, - "max_word_length": 50, - "n_gpus": 1, - "shuffle": false, - 
"bos": "", - "eos": "", - "save_path": "{MODELS_PATH}/elmo-1b-benchmark/vocab-2016-09-10.txt", - "load_path": "{MODELS_PATH}/elmo-1b-benchmark/vocab-2016-09-10.txt" - }, - "chainer": { - "in": [ - "x_char_ids" - ], - "in_y": [ - "y_token_ids" - ], - "pipe": [ - { - "class_name": "elmo_model", - "options_json_path": "{MODELS_PATH}/elmo-1b-benchmark/options.json", - "unroll_steps": 20, - "batch_size": 128, - "save_path": "{MODELS_PATH}/elmo-1b-benchmark/saves/model", - "load_path": "{MODELS_PATH}/elmo-1b-benchmark/saves/model", - "in": ["x_char_ids", "y_token_ids"], - "in_y": [], - "n_gpus": 1, - "out": ["loss"] - } - ], - "out": [ - "x_char_ids" - ] - }, - "train": { - "epochs": 20, - "batch_size": 128, - "log_every_n_batches": 100, - "val_every_n_epochs": 1, - "validation_patience": 4, - "metric_optimization": "minimize", - "metrics": [ - { - "name": "elmo_loss2ppl", - "inputs": ["loss"] - } - ], - "tensorboard_log_dir": "{MODELS_PATH}/elmo-1b-benchmark/logs", - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/1-billion-word-language-modeling-benchmark-r13output.tar.gz", - "subdir": "{DOWNLOADS_PATH}/elmo-1b-benchmark/data" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/original_elmo_configuration_and_vocab.tar.gz", - "subdir": "{MODELS_PATH}/elmo-1b-benchmark" - } - ] - } -} diff --git a/deeppavlov/configs/elmo/elmo_1b_benchmark_test.json b/deeppavlov/configs/elmo/elmo_1b_benchmark_test.json deleted file mode 100644 index 15af5b02ae..0000000000 --- a/deeppavlov/configs/elmo/elmo_1b_benchmark_test.json +++ /dev/null @@ -1,79 +0,0 @@ -{ - "dataset_reader": { - "class_name": "file_paths_reader", - "data_path": "{DOWNLOADS_PATH}/elmo-1b-benchmark_test/data/1-billion-word-language-modeling-benchmark-r13output/", - "train": "heldout-monolingual.tokenized.shuffled/news.en.heldout-00001-of-00050", - "test": "heldout-monolingual.tokenized.shuffled/news.en.heldout-00002-of-00050", - "valid": "heldout-monolingual.tokenized.shuffled/news.en.heldout-00003-of-00050" - }, - "dataset_iterator": { - "class_name": "elmo_file_paths_iterator", - "seed": 31415, - "unroll_steps": 20, - "max_word_length": 50, - "n_gpus": 1, - "shuffle": false, - "bos": "", - "eos": "", - "save_path": "{DOWNLOADS_PATH}/elmo-1b-benchmark_test/data/vocab-2016-09-10.txt", - "load_path": "{DOWNLOADS_PATH}/elmo-1b-benchmark_test/data/vocab-2016-09-10.txt" - }, - "chainer": { - "in": [ - "x_char_ids" - ], - "in_y": [ - "y_token_ids" - ], - "pipe": [ - { - "class_name": "elmo_model", - "options_json_path": "{DOWNLOADS_PATH}/elmo-1b-benchmark_test/options.json", - "unroll_steps": 20, - "batch_size": 128, - "save_path": "{MODELS_PATH}/elmo-1b-benchmark_test/saves/model", - "load_path": "{MODELS_PATH}/elmo-1b-benchmark_test/saves/model", - "in": ["x_char_ids", "y_token_ids"], - "in_y": [], - "n_gpus": 1, - "out": ["loss"] - } - ], - "out": [ - "x_char_ids" - ] - }, - "train": { - "epochs": 2, - "batch_size": 128, - "log_every_n_batches": 5, - "val_every_n_epochs": 1, - "validation_patience": 4, - "metric_optimization": "minimize", - "metrics": [ - { - "name": "elmo_loss2ppl", - "inputs": ["loss"] - } - ], - "tensorboard_log_dir": "{MODELS_PATH}/elmo-1b-benchmark_test/logs", - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - 
"metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/elmo-1b-benchmark_test.tar.gz", - "subdir": "{DOWNLOADS_PATH}" - } - ] - } -} diff --git a/deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_news.json b/deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_news.json deleted file mode 100644 index ecaa2afb39..0000000000 --- a/deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_news.json +++ /dev/null @@ -1,83 +0,0 @@ -{ - "dataset_reader": { - "class_name": "file_paths_reader", - "data_path": "{DOWNLOADS_PATH}/elmo-lm-ready4fine-example-data/data/", - "train": "train/*", - "valid": "heldout/*" - }, - "dataset_iterator": { - "class_name": "elmo_file_paths_iterator", - "seed": 31415, - "unroll_steps": 20, - "max_word_length": 50, - "n_gpus": 1, - "shuffle": false, - "bos": "", - "eos": "", - "save_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-news/vocab.txt", - "load_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-news/vocab.txt" - }, - "chainer": { - "in": [ - "x_char_ids" - ], - "in_y": [ - "y_token_ids" - ], - "pipe": [ - { - "class_name": "elmo_model", - "options_json_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-news/options.json", - "unroll_steps": 20, - "batch_size": 128, - "save_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-news/saves/model", - "load_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-news/saves/model", - "in": ["x_char_ids", "y_token_ids"], - "in_y": [], - "n_gpus": 1, - "out": ["loss"] - } - ], - "out": [ - "x_char_ids", - "y_token_ids" - ] - }, - "train": { - "epochs": 20, - "batch_size": 128, - "log_every_n_batches": 100, - "val_every_n_epochs": 1, - "validation_patience": 4, - "metric_optimization": "minimize", - "metrics": [ - { - "name": "elmo_loss2ppl", - "inputs": ["loss"] - } - ], - "tensorboard_log_dir": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-news/logs", - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/elmo-lm-ready4fine-example-data.tar.gz", - "subdir": "{DOWNLOADS_PATH}/" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/elmo-lm-ready4fine-tuning-ru-news.tar.gz", - "subdir": "{MODELS_PATH}/" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_news_simple.json b/deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_news_simple.json deleted file mode 100644 index f7a95f1238..0000000000 --- a/deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_news_simple.json +++ /dev/null @@ -1,83 +0,0 @@ -{ - "dataset_reader": { - "class_name": "file_paths_reader", - "data_path": "{DOWNLOADS_PATH}/elmo-lm-ready4fine-example-data/data/", - "train": "train/*", - "valid": "heldout/*" - }, - "dataset_iterator": { - "class_name": "elmo_file_paths_iterator", - "seed": 31415, - "unroll_steps": 20, - "max_word_length": 50, - "n_gpus": 1, - "shuffle": false, - "bos": "", - "eos": "", - "save_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-news-simple/vocab.txt", - "load_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-news-simple/vocab.txt" - }, - "chainer": { - "in": [ - "x_char_ids" - ], - "in_y": [ - "y_token_ids" - ], - "pipe": [ - { - "class_name": 
"elmo_model", - "options_json_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-news-simple/options.json", - "unroll_steps": 20, - "batch_size": 128, - "save_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-news-simple/saves/model", - "load_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-news-simple/saves/model", - "in": ["x_char_ids", "y_token_ids"], - "in_y": [], - "n_gpus": 1, - "out": ["loss"] - } - ], - "out": [ - "x_char_ids", - "y_token_ids" - ] - }, - "train": { - "epochs": 20, - "batch_size": 128, - "log_every_n_batches": 100, - "val_every_n_epochs": 1, - "validation_patience": 4, - "metric_optimization": "minimize", - "metrics": [ - { - "name": "elmo_loss2ppl", - "inputs": ["loss"] - } - ], - "tensorboard_log_dir": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-news-simple/logs", - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/elmo-lm-ready4fine-example-data.tar.gz", - "subdir": "{DOWNLOADS_PATH}/" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/elmo-lm-ready4fine-tuning-ru-news-simple.tar.gz", - "subdir": "{MODELS_PATH}/" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_twitter.json b/deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_twitter.json deleted file mode 100644 index 9a4a2f9007..0000000000 --- a/deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_twitter.json +++ /dev/null @@ -1,83 +0,0 @@ -{ - "dataset_reader": { - "class_name": "file_paths_reader", - "data_path": "{DOWNLOADS_PATH}/elmo-lm-ready4fine-example-data/data/", - "train": "train/*", - "valid": "heldout/*" - }, - "dataset_iterator": { - "class_name": "elmo_file_paths_iterator", - "seed": 31415, - "unroll_steps": 20, - "max_word_length": 50, - "n_gpus": 1, - "shuffle": false, - "bos": "", - "eos": "", - "save_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-twitter/vocab.txt", - "load_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-twitter/vocab.txt" - }, - "chainer": { - "in": [ - "x_char_ids" - ], - "in_y": [ - "y_token_ids" - ], - "pipe": [ - { - "class_name": "elmo_model", - "options_json_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-twitter/options.json", - "unroll_steps": 20, - "batch_size": 128, - "save_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-twitter/saves/model", - "load_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-twitter/saves/model", - "in": ["x_char_ids", "y_token_ids"], - "in_y": [], - "n_gpus": 1, - "out": ["loss"] - } - ], - "out": [ - "x_char_ids", - "y_token_ids" - ] - }, - "train": { - "epochs": 20, - "batch_size": 128, - "log_every_n_batches": 100, - "val_every_n_epochs": 1, - "validation_patience": 4, - "metric_optimization": "minimize", - "metrics": [ - { - "name": "elmo_loss2ppl", - "inputs": ["loss"] - } - ], - "tensorboard_log_dir": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-twitter/logs", - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/elmo-lm-ready4fine-example-data.tar.gz", - "subdir": "{DOWNLOADS_PATH}/" - }, - { - "url": 
"http://files.deeppavlov.ai/deeppavlov_data/elmo-lm-ready4fine-tuning-ru-twitter.tar.gz", - "subdir": "{MODELS_PATH}/" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_twitter_simple.json b/deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_twitter_simple.json deleted file mode 100644 index 6ffd491f07..0000000000 --- a/deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_twitter_simple.json +++ /dev/null @@ -1,83 +0,0 @@ -{ - "dataset_reader": { - "class_name": "file_paths_reader", - "data_path": "{DOWNLOADS_PATH}/elmo-lm-ready4fine-example-data/data/", - "train": "train/*", - "valid": "heldout/*" - }, - "dataset_iterator": { - "class_name": "elmo_file_paths_iterator", - "seed": 31415, - "unroll_steps": 20, - "max_word_length": 50, - "n_gpus": 1, - "shuffle": false, - "bos": "", - "eos": "", - "save_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-twitter-simple/vocab.txt", - "load_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-twitter-simple/vocab.txt" - }, - "chainer": { - "in": [ - "x_char_ids" - ], - "in_y": [ - "y_token_ids" - ], - "pipe": [ - { - "class_name": "elmo_model", - "options_json_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-twitter-simple/options.json", - "unroll_steps": 20, - "batch_size": 128, - "save_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-twitter-simple/saves/model", - "load_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-twitter-simple/saves/model", - "in": ["x_char_ids", "y_token_ids"], - "in_y": [], - "n_gpus": 1, - "out": ["loss"] - } - ], - "out": [ - "x_char_ids", - "y_token_ids" - ] - }, - "train": { - "epochs": 20, - "batch_size": 128, - "log_every_n_batches": 100, - "val_every_n_epochs": 1, - "validation_patience": 4, - "metric_optimization": "minimize", - "metrics": [ - { - "name": "elmo_loss2ppl", - "inputs": ["loss"] - } - ], - "tensorboard_log_dir": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-twitter-simple/logs", - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/elmo-lm-ready4fine-example-data.tar.gz", - "subdir": "{DOWNLOADS_PATH}/" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/elmo-lm-ready4fine-tuning-ru-twitter-simple.tar.gz", - "subdir": "{MODELS_PATH}/" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_wiki.json b/deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_wiki.json deleted file mode 100644 index c44e850215..0000000000 --- a/deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_wiki.json +++ /dev/null @@ -1,83 +0,0 @@ -{ - "dataset_reader": { - "class_name": "file_paths_reader", - "data_path": "{DOWNLOADS_PATH}/elmo-lm-ready4fine-example-data/data/", - "train": "train/*", - "valid": "heldout/*" - }, - "dataset_iterator": { - "class_name": "elmo_file_paths_iterator", - "seed": 31415, - "unroll_steps": 20, - "max_word_length": 50, - "n_gpus": 1, - "shuffle": false, - "bos": "", - "eos": "", - "save_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-wiki/vocab.txt", - "load_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-wiki/vocab.txt" - }, - "chainer": { - "in": [ - "x_char_ids" - ], - "in_y": [ - "y_token_ids" - ], - "pipe": [ - { - "class_name": "elmo_model", - "options_json_path": 
"{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-wiki/options.json", - "unroll_steps": 20, - "batch_size": 128, - "save_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-wiki/saves/model", - "load_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-wiki/saves/model", - "in": ["x_char_ids", "y_token_ids"], - "in_y": [], - "n_gpus": 1, - "out": ["loss"] - } - ], - "out": [ - "x_char_ids", - "y_token_ids" - ] - }, - "train": { - "epochs": 20, - "batch_size": 128, - "log_every_n_batches": 100, - "val_every_n_epochs": 1, - "validation_patience": 4, - "metric_optimization": "minimize", - "metrics": [ - { - "name": "elmo_loss2ppl", - "inputs": ["loss"] - } - ], - "tensorboard_log_dir": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-wiki/logs", - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/elmo-lm-ready4fine-example-data.tar.gz", - "subdir": "{DOWNLOADS_PATH}/" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/elmo-lm-ready4fine-tuning-ru-wiki.tar.gz", - "subdir": "{MODELS_PATH}/" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_wiki_simple.json b/deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_wiki_simple.json deleted file mode 100644 index c4188744e4..0000000000 --- a/deeppavlov/configs/elmo/elmo_lm_ready4fine_tuning_ru_wiki_simple.json +++ /dev/null @@ -1,83 +0,0 @@ -{ - "dataset_reader": { - "class_name": "file_paths_reader", - "data_path": "{DOWNLOADS_PATH}/elmo-lm-ready4fine-example-data/data/", - "train": "train/*", - "valid": "heldout/*" - }, - "dataset_iterator": { - "class_name": "elmo_file_paths_iterator", - "seed": 31415, - "unroll_steps": 20, - "max_word_length": 50, - "n_gpus": 1, - "shuffle": false, - "bos": "", - "eos": "", - "save_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-wiki-simple/vocab.txt", - "load_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-wiki-simple/vocab.txt" - }, - "chainer": { - "in": [ - "x_char_ids" - ], - "in_y": [ - "y_token_ids" - ], - "pipe": [ - { - "class_name": "elmo_model", - "options_json_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-wiki-simple/options.json", - "unroll_steps": 20, - "batch_size": 128, - "save_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-wiki-simple/saves/model", - "load_path": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-wiki-simple/saves/model", - "in": ["x_char_ids", "y_token_ids"], - "in_y": [], - "n_gpus": 1, - "out": ["loss"] - } - ], - "out": [ - "x_char_ids", - "y_token_ids" - ] - }, - "train": { - "epochs": 20, - "batch_size": 128, - "log_every_n_batches": 100, - "val_every_n_epochs": 1, - "validation_patience": 4, - "metric_optimization": "minimize", - "metrics": [ - { - "name": "elmo_loss2ppl", - "inputs": ["loss"] - } - ], - "tensorboard_log_dir": "{MODELS_PATH}/elmo-lm-ready4fine-tuning-ru-wiki-simple/logs", - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/elmo-lm-ready4fine-example-data.tar.gz", - "subdir": "{DOWNLOADS_PATH}/" - }, - { - "url": 
"http://files.deeppavlov.ai/deeppavlov_data/elmo-lm-ready4fine-tuning-ru-wiki-simple.tar.gz", - "subdir": "{MODELS_PATH}/" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/elmo/elmo_paraphraser_fine_tuning.json b/deeppavlov/configs/elmo/elmo_paraphraser_fine_tuning.json deleted file mode 100644 index fce6382ffd..0000000000 --- a/deeppavlov/configs/elmo/elmo_paraphraser_fine_tuning.json +++ /dev/null @@ -1,84 +0,0 @@ -{ - "dataset_reader": { - "class_name": "file_paths_reader", - "data_path": "{DOWNLOADS_PATH}/paraphraser_train_and_pretrain_texts/", - "train": "paraphraser_train_and_pretrain_texts_train.txt", - "valid": "paraphraser_train_and_pretrain_texts_valid.txt", - "test": "paraphraser_train_and_pretrain_texts_test.txt" - }, - "dataset_iterator": { - "class_name": "elmo_file_paths_iterator", - "seed": 31415, - "unroll_steps": 20, - "max_word_length": 50, - "n_gpus": 1, - "shuffle": false, - "bos": "", - "eos": "", - "save_path": "{MODELS_PATH}/elmo_news_wmt11-16-simple_reduce_vocab/vocab-2016-09-10.txt", - "load_path": "{MODELS_PATH}/elmo_news_wmt11-16-simple_reduce_vocab/vocab-2016-09-10.txt" - }, - "chainer": { - "in": [ - "x_char_ids" - ], - "in_y": [ - "y_token_ids" - ], - "pipe": [ - { - "class_name": "elmo_model", - "options_json_path": "{MODELS_PATH}/elmo_news_wmt11-16-simple_reduce_vocab/options.json", - "unroll_steps": 20, - "batch_size": 128, - "save_path": "{MODELS_PATH}/elmo_news_wmt11-16-simple_reduce_vocab/saves/model", - "load_path": "{MODELS_PATH}/elmo_news_wmt11-16-simple_reduce_vocab/saves/model", - "in": ["x_char_ids", "y_token_ids"], - "in_y": [], - "n_gpus": 1, - "out": ["loss"] - } - ], - "out": [ - "x_char_ids", - "y_token_ids" - ] - }, - "train": { - "epochs": 1, - "batch_size": 128, - "log_every_n_batches": 24, - "val_every_n_epochs": 1, - "validation_patience": 1, - "metric_optimization": "minimize", - "metrics": [ - { - "name": "elmo_loss2ppl", - "inputs": ["loss"] - } - ], - "tensorboard_log_dir": "{MODELS_PATH}/elmo_news_wmt11-16-simple_reduce_vocab/logs", - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/paraphraser_train_and_pretrain_texts.tar.gz", - "subdir": "{DOWNLOADS_PATH}/paraphraser_train_and_pretrain_texts" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/elmo_news_wmt11-16-simple_reduce_vocab.tar.gz", - "subdir": "{MODELS_PATH}/" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/embedder/bert_sentence_embedder.json b/deeppavlov/configs/embedder/bert_sentence_embedder.json index 348616ae27..b8c29a67f2 100644 --- a/deeppavlov/configs/embedder/bert_sentence_embedder.json +++ b/deeppavlov/configs/embedder/bert_sentence_embedder.json @@ -12,7 +12,7 @@ }, { "class_name": "transformers_bert_embedder", - "bert_config_path": "{BERT_PATH}/bert_config.json", + "bert_config_path": "{BERT_PATH}/config.json", "load_path": "{BERT_PATH}", "truncate": false, "in": ["subword_tok_ids", "startofword_markers", "attention_mask"], @@ -26,12 +26,12 @@ "variables": { "ROOT_PATH": "~/.deeppavlov", "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models/sentence_multi_cased_L-12_H-768_A-12_pt" + "BERT_PATH": "{DOWNLOADS_PATH}/bert_models/sentence_multi_cased_L-12_H-768_A-12_pt_v1" }, "labels": {}, "download": [ { - "url": 
"http://files.deeppavlov.ai/deeppavlov_data/bert/sentence_multi_cased_L-12_H-768_A-12_pt.tar.gz", + "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/sentence_multi_cased_L-12_H-768_A-12_pt_v1.tar.gz", "subdir": "{DOWNLOADS_PATH}/bert_models" } ] diff --git a/deeppavlov/configs/embedder/elmo_en_1billion.json b/deeppavlov/configs/embedder/elmo_en_1billion.json deleted file mode 100644 index c79d4908af..0000000000 --- a/deeppavlov/configs/embedder/elmo_en_1billion.json +++ /dev/null @@ -1,36 +0,0 @@ -{ - "chainer": { - "in": [ - "sentences" - ], - "pipe": [ - { - "in": ["sentences"], - "class_name": "lazy_tokenizer", - "out": ["tokens"] - }, - { - "class_name": "elmo_embedder", - "elmo_output_names": ["lstm_outputs1", "lstm_outputs2", "word_emb"], - "mini_batch_size": 32, - "in": [ - "tokens" - ], - "spec": "https://tfhub.dev/google/elmo/2", - "out": [ - "tokens_emb" - ] - } - ], - "out": [ - "tokens_emb" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - } - } -} diff --git a/deeppavlov/configs/embedder/elmo_ru_news.json b/deeppavlov/configs/embedder/elmo_ru_news.json deleted file mode 100644 index 86d78bfe52..0000000000 --- a/deeppavlov/configs/embedder/elmo_ru_news.json +++ /dev/null @@ -1,42 +0,0 @@ -{ - "chainer": { - "in": [ - "sentences" - ], - "pipe": [ - { - "in": ["sentences"], - "class_name": "lazy_tokenizer", - "out": ["tokens"] - }, - { - "class_name": "elmo_embedder", - "elmo_output_names": ["lstm_outputs1", "lstm_outputs2", "word_emb"], - "mini_batch_size": 32, - "in": [ - "tokens" - ], - "spec": "{DOWNLOADS_PATH}/embeddings/elmo_ru_news", - "out": [ - "tokens_emb" - ] - } - ], - "out": [ - "tokens_emb" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/elmo_ru-news_wmt11-16_1.5M_steps.tar.gz", - "subdir": "{DOWNLOADS_PATH}/embeddings/elmo_ru_news" - } - ] - } -} diff --git a/deeppavlov/configs/embedder/elmo_ru_twitter.json b/deeppavlov/configs/embedder/elmo_ru_twitter.json deleted file mode 100644 index df4c6013d4..0000000000 --- a/deeppavlov/configs/embedder/elmo_ru_twitter.json +++ /dev/null @@ -1,42 +0,0 @@ -{ - "chainer": { - "in": [ - "sentences" - ], - "pipe": [ - { - "in": ["sentences"], - "class_name": "lazy_tokenizer", - "out": ["tokens"] - }, - { - "class_name": "elmo_embedder", - "elmo_output_names": ["lstm_outputs1", "lstm_outputs2", "word_emb"], - "mini_batch_size": 32, - "in": [ - "tokens" - ], - "spec": "{DOWNLOADS_PATH}/embeddings/elmo_ru_tw", - "out": [ - "tokens_emb" - ] - } - ], - "out": [ - "tokens_emb" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/elmo_ru-twitter_2013-01_2018-04_600k_steps.tar.gz", - "subdir": "{DOWNLOADS_PATH}/embeddings/elmo_ru_tw" - } - ] - } -} diff --git a/deeppavlov/configs/embedder/elmo_ru_wiki.json b/deeppavlov/configs/embedder/elmo_ru_wiki.json deleted file mode 100644 index f234430e6f..0000000000 --- a/deeppavlov/configs/embedder/elmo_ru_wiki.json +++ /dev/null @@ -1,42 +0,0 @@ -{ - "chainer": { - "in": [ - "sentences" - ], - "pipe": [ - { - "in": ["sentences"], - "class_name": "lazy_tokenizer", - "out": ["tokens"] - }, - { - "class_name": 
"elmo_embedder", - "elmo_output_names": ["lstm_outputs1", "lstm_outputs2", "word_emb"], - "mini_batch_size": 32, - "in": [ - "tokens" - ], - "spec": "{DOWNLOADS_PATH}/embeddings/elmo_ru_wiki", - "out": [ - "tokens_emb" - ] - } - ], - "out": [ - "tokens_emb" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/elmo_ru-wiki_600k_steps.tar.gz", - "subdir": "{DOWNLOADS_PATH}/embeddings/elmo_ru_wiki" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/entity_extraction/entity_detection_en.json b/deeppavlov/configs/entity_extraction/entity_detection_en.json new file mode 100644 index 0000000000..2c45fc8703 --- /dev/null +++ b/deeppavlov/configs/entity_extraction/entity_detection_en.json @@ -0,0 +1,46 @@ +{ + "chainer": { + "in": ["x"], + "pipe": [ + { + "class_name": "ner_chunker", + "batch_size": 16, + "max_chunk_len" : 180, + "max_seq_len" : 300, + "vocab_file": "{TRANSFORMER}", + "in": ["x"], + "out": ["x_chunk", "chunk_nums", "chunk_sentences_offsets", "chunk_sentences"] + }, + { + "thres_proba": 0.05, + "o_tag": "O", + "tags_file": "{NER_PATH}/tag.dict", + "return_entities_with_tags": true, + "class_name": "entity_detection_parser", + "id": "edp" + }, + { + "class_name": "ner_chunk_model", + "ner": { + "config_path": "{CONFIGS_PATH}/ner/ner_ontonotes_bert.json", + "overwrite": { + "chainer.out": ["x_tokens", "tokens_offsets", "y_pred", "probas"] + } + }, + "ner_parser": "#edp", + "in": ["x_chunk", "chunk_nums", "chunk_sentences_offsets", "chunk_sentences"], + "out": ["entity_substr", "entity_offsets", "entity_positions", "tags", "sentences_offsets", "sentences", "probas"] + } + ], + "out": ["entity_substr", "entity_offsets", "entity_positions", "tags", "sentences_offsets", "sentences", "probas"] + }, + "metadata": { + "variables": { + "ROOT_PATH": "~/.deeppavlov", + "MODELS_PATH": "{ROOT_PATH}/models", + "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs", + "TRANSFORMER": "bert-base-cased", + "NER_PATH": "{MODELS_PATH}/ner_ontonotes_bert_torch_crf" + } + } +} diff --git a/deeppavlov/configs/entity_extraction/entity_detection_ru.json b/deeppavlov/configs/entity_extraction/entity_detection_ru.json new file mode 100644 index 0000000000..5ff48e3fd7 --- /dev/null +++ b/deeppavlov/configs/entity_extraction/entity_detection_ru.json @@ -0,0 +1,41 @@ +{ + "chainer": { + "in": ["x"], + "pipe": [ + { + "class_name": "ner_chunker", + "batch_size": 16, + "max_chunk_len" : 180, + "max_seq_len" : 300, + "vocab_file": "{TRANSFORMER}", + "in": ["x"], + "out": ["x_chunk", "chunk_nums", "chunk_sentences_offsets", "chunk_sentences"] + }, + { + "thres_proba": 0.05, + "o_tag": "O", + "tags_file": "{NER_PATH}/tag.dict", + "return_entities_with_tags": true, + "class_name": "entity_detection_parser", + "id": "edp" + }, + { + "class_name": "ner_chunk_model", + "ner": {"config_path": "{CONFIGS_PATH}/ner/ner_rus_bert_probas.json"}, + "ner_parser": "#edp", + "in": ["x_chunk", "chunk_nums", "chunk_sentences_offsets", "chunk_sentences"], + "out": ["entity_substr", "entity_offsets", "entity_positions", "tags", "sentences_offsets", "sentences", "probas"] + } + ], + "out": ["entity_substr", "entity_offsets", "entity_positions", "tags", "sentences_offsets", "sentences", "probas"] + }, + "metadata": { + "variables": { + "ROOT_PATH": "~/.deeppavlov", + "MODELS_PATH": "{ROOT_PATH}/models", + "CONFIGS_PATH": 
"{DEEPPAVLOV_PATH}/configs", + "TRANSFORMER": "DeepPavlov/rubert-base-cased", + "NER_PATH": "{MODELS_PATH}/wiki_ner_rus_bert" + } + } +} diff --git a/deeppavlov/configs/entity_extraction/entity_extraction_en.json b/deeppavlov/configs/entity_extraction/entity_extraction_en.json new file mode 100644 index 0000000000..188568dcb6 --- /dev/null +++ b/deeppavlov/configs/entity_extraction/entity_extraction_en.json @@ -0,0 +1,23 @@ +{ + "chainer": { + "in": ["x"], + "pipe": [ + { + "config_path": "{CONFIGS_PATH}/entity_extraction/entity_detection_en.json", + "in": ["x"], + "out": ["entity_substr", "entity_offsets", "entity_positions", "tags", "sentences_offsets", "sentences", "probas"] + }, + { + "config_path": "{CONFIGS_PATH}/entity_extraction/entity_linking_en.json", + "in": ["entity_substr", "tags", "sentences", "entity_offsets", "sentences_offsets"], + "out": ["entity_ids", "entity_conf", "entity_pages"] + } + ], + "out": ["entity_substr", "tags", "entity_offsets", "entity_ids", "entity_conf", "entity_pages"] + }, + "metadata": { + "variables": { + "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" + } + } +} diff --git a/deeppavlov/configs/entity_extraction/entity_extraction_ru.json b/deeppavlov/configs/entity_extraction/entity_extraction_ru.json new file mode 100644 index 0000000000..941a59a65f --- /dev/null +++ b/deeppavlov/configs/entity_extraction/entity_extraction_ru.json @@ -0,0 +1,23 @@ +{ + "chainer": { + "in": ["x"], + "pipe": [ + { + "config_path": "{CONFIGS_PATH}/entity_extraction/entity_detection_ru.json", + "in": ["x"], + "out": ["entity_substr", "entity_offsets", "entity_positions", "tags", "sentences_offsets", "sentences", "probas"] + }, + { + "config_path": "{CONFIGS_PATH}/entity_extraction/entity_linking_ru.json", + "in": ["entity_substr", "tags", "sentences", "entity_offsets", "sentences_offsets"], + "out": ["entity_ids", "entity_conf", "entity_pages"] + } + ], + "out": ["entity_substr", "tags", "entity_offsets", "entity_ids", "entity_conf", "entity_pages"] + }, + "metadata": { + "variables": { + "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" + } + } +} diff --git a/deeppavlov/configs/entity_extraction/entity_linking_en.json b/deeppavlov/configs/entity_extraction/entity_linking_en.json new file mode 100644 index 0000000000..9faeac0d8b --- /dev/null +++ b/deeppavlov/configs/entity_extraction/entity_linking_en.json @@ -0,0 +1,61 @@ +{ + "chainer": { + "in": ["entity_substr", "tags", "sentences", "entity_offsets", "sentences_offsets"], + "pipe": [ + { + "class_name": "torch_transformers_entity_ranker_infer", + "id": "entity_descr_ranking", + "pretrained_bert": "{TRANSFORMER}", + "encoder_weights_path": "{MODELS_PATH}/entity_linking_eng/encoder.pth.tar", + "bilinear_weights_path": "{MODELS_PATH}/entity_linking_eng/bilinear.pth.tar", + "special_token_id": 30522, + "emb_size": 512, + "block_size": 8 + }, + { + "class_name": "entity_linker", + "in": ["entity_substr", "tags", "sentences", "entity_offsets", "sentences_offsets"], + "out": ["entity_ids", "entity_conf", "entity_pages"], + "load_path": "{DOWNLOADS_PATH}/entity_linking_eng", + "entities_database_filename": "el_eng.db", + "entity_ranker": "#entity_descr_ranking", + "rank_in_runtime": true, + "num_entities_for_bert_ranking": 20, + "use_gpu": false, + "include_mention": false, + "num_entities_to_return": 3, + "lemmatize": true, + "use_descriptions": true, + "wikidata_file": "{DOWNLOADS_PATH}/wikidata/wikidata_lite.hdt", + "use_connections": true, + "use_tags": true, + "full_paragraph": true, + "return_confidences": true, + "lang": 
"ru" + } + ], + "out": ["entity_ids", "entity_conf", "entity_pages"] + }, + "metadata": { + "variables": { + "ROOT_PATH": "~/.deeppavlov", + "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", + "MODELS_PATH": "{ROOT_PATH}/models", + "TRANSFORMER": "prajjwal1/bert-small" + }, + "download": [ + { + "url": "http://files.deeppavlov.ai/deeppavlov_data/entity_linking/el_db_eng.tar.gz", + "subdir": "{DOWNLOADS_PATH}/entity_linking_eng" + }, + { + "url": "http://files.deeppavlov.ai/deeppavlov_data/entity_linking/el_ranker_eng.tar.gz", + "subdir": "{MODELS_PATH}/entity_linking_eng" + }, + { + "url": "http://files.deeppavlov.ai/kbqa/wikidata/wikidata_lite.tar.gz", + "subdir": "{DOWNLOADS_PATH}/wikidata" + } + ] + } +} diff --git a/deeppavlov/configs/entity_extraction/entity_linking_ru.json b/deeppavlov/configs/entity_extraction/entity_linking_ru.json new file mode 100644 index 0000000000..4b8589710c --- /dev/null +++ b/deeppavlov/configs/entity_extraction/entity_linking_ru.json @@ -0,0 +1,61 @@ +{ + "chainer": { + "in": ["entity_substr", "tags", "sentences", "entity_offsets", "sentences_offsets"], + "pipe": [ + { + "class_name": "torch_transformers_entity_ranker_infer", + "id": "entity_descr_ranking", + "pretrained_bert": "{TRANSFORMER}", + "encoder_weights_path": "{MODELS_PATH}/entity_linking_rus/encoder.pth.tar", + "bilinear_weights_path": "{MODELS_PATH}/entity_linking_rus/bilinear.pth.tar", + "special_token_id": 30522, + "emb_size": 264, + "block_size": 6 + }, + { + "class_name": "entity_linker", + "in": ["entity_substr", "tags", "sentences", "entity_offsets", "sentences_offsets"], + "out": ["entity_ids", "entity_conf", "entity_pages"], + "load_path": "{DOWNLOADS_PATH}/entity_linking_rus", + "entities_database_filename": "el_rus.db", + "entity_ranker": "#entity_descr_ranking", + "rank_in_runtime": true, + "num_entities_for_bert_ranking": 20, + "use_gpu": false, + "include_mention": false, + "num_entities_to_return": 3, + "lemmatize": true, + "use_descriptions": true, + "use_connections": true, + "use_tags": true, + "wikidata_file": "{DOWNLOADS_PATH}/wikidata/wikidata_lite.hdt", + "full_paragraph": true, + "return_confidences": true, + "lang": "ru" + } + ], + "out": ["entity_ids", "entity_conf", "entity_pages"] + }, + "metadata": { + "variables": { + "ROOT_PATH": "~/.deeppavlov", + "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", + "MODELS_PATH": "{ROOT_PATH}/models", + "TRANSFORMER": "DeepPavlov/distilrubert-tiny-cased-conversational-v1" + }, + "download": [ + { + "url": "http://files.deeppavlov.ai/deeppavlov_data/entity_linking/el_db_rus.tar.gz", + "subdir": "{DOWNLOADS_PATH}/entity_linking_rus" + }, + { + "url": "http://files.deeppavlov.ai/deeppavlov_data/entity_linking/el_ranker_rus.tar.gz", + "subdir": "{MODELS_PATH}/entity_linking_rus" + }, + { + "url": "http://files.deeppavlov.ai/kbqa/wikidata/wikidata_lite.tar.gz", + "subdir": "{DOWNLOADS_PATH}/wikidata" + } + ] + } +} diff --git a/deeppavlov/configs/faq/tfidf_logreg_autofaq.json b/deeppavlov/configs/faq/tfidf_logreg_autofaq.json index 9e2516fceb..a41ada103a 100644 --- a/deeppavlov/configs/faq/tfidf_logreg_autofaq.json +++ b/deeppavlov/configs/faq/tfidf_logreg_autofaq.json @@ -64,8 +64,8 @@ ], "class_name": "sklearn_component", "main": true, - "save_path": "{MODELS_PATH}/faq/tfidf_logreg_classifier_v2.pkl", - "load_path": "{MODELS_PATH}/faq/tfidf_logreg_classifier_v2.pkl", + "save_path": "{MODELS_PATH}/faq/tfidf_logreg_classifier_v4.pkl", + "load_path": "{MODELS_PATH}/faq/tfidf_logreg_classifier_v4.pkl", "model_class": 
"sklearn.linear_model:LogisticRegression", "infer_method": "predict_proba", "C": 1000, @@ -100,7 +100,7 @@ }, "download": [ { - "url": "http://files.deeppavlov.ai/faq/school/tfidf_logreg_classifier_v2.pkl", + "url": "http://files.deeppavlov.ai/faq/school/tfidf_logreg_classifier_v4.pkl", "subdir": "{MODELS_PATH}/faq" }, { diff --git a/deeppavlov/configs/faq/tfidf_logreg_en_faq.json b/deeppavlov/configs/faq/tfidf_logreg_en_faq.json index 6146bbb295..8abccda06e 100644 --- a/deeppavlov/configs/faq/tfidf_logreg_en_faq.json +++ b/deeppavlov/configs/faq/tfidf_logreg_en_faq.json @@ -36,8 +36,8 @@ ], "id": "tfidf_vec", "class_name": "sklearn_component", - "save_path": "{MODELS_PATH}/faq/mipt/en_mipt_faq_v4/tfidf.pkl", - "load_path": "{MODELS_PATH}/faq/mipt/en_mipt_faq_v4/tfidf.pkl", + "save_path": "{MODELS_PATH}/faq/mipt/en_mipt_faq_v5/tfidf.pkl", + "load_path": "{MODELS_PATH}/faq/mipt/en_mipt_faq_v5/tfidf.pkl", "model_class": "sklearn.feature_extraction.text:TfidfVectorizer", "infer_method": "transform" }, @@ -47,8 +47,8 @@ "fit_on": [ "y" ], - "save_path": "{MODELS_PATH}/faq/mipt/en_mipt_faq_v4/en_mipt_answers.dict", - "load_path": "{MODELS_PATH}/faq/mipt/en_mipt_faq_v4/en_mipt_answers.dict", + "save_path": "{MODELS_PATH}/faq/mipt/en_mipt_faq_v5/en_mipt_answers.dict", + "load_path": "{MODELS_PATH}/faq/mipt/en_mipt_faq_v5/en_mipt_answers.dict", "in": "y", "out": "y_ids" }, @@ -63,8 +63,8 @@ ], "class_name": "sklearn_component", "main": true, - "save_path": "{MODELS_PATH}/faq/mipt/en_mipt_faq_v4/logreg.pkl", - "load_path": "{MODELS_PATH}/faq/mipt/en_mipt_faq_v4/logreg.pkl", + "save_path": "{MODELS_PATH}/faq/mipt/en_mipt_faq_v5/logreg.pkl", + "load_path": "{MODELS_PATH}/faq/mipt/en_mipt_faq_v5/logreg.pkl", "model_class": "sklearn.linear_model:LogisticRegression", "infer_method": "predict_proba", "C": 1000, @@ -99,7 +99,7 @@ }, "download": [ { - "url": "http://files.deeppavlov.ai/faq/mipt/en_mipt_faq_v4.tar.gz", + "url": "http://files.deeppavlov.ai/faq/mipt/en_mipt_faq_v5.tar.gz", "subdir": "{MODELS_PATH}/faq/mipt" } ] diff --git a/deeppavlov/configs/go_bot/database_dstc2.json b/deeppavlov/configs/go_bot/database_dstc2.json deleted file mode 100644 index bcb153a6fd..0000000000 --- a/deeppavlov/configs/go_bot/database_dstc2.json +++ /dev/null @@ -1,44 +0,0 @@ -{ - "dataset_reader": { - "class_name": "dstc2_reader", - "data_path": "{DOWNLOADS_PATH}/dstc2_v3" - }, - "dataset_iterator": { - "class_name": "dialog_db_result_iterator" - }, - "chainer": { - "in": ["db_result"], - "in_y": [], - "out": [], - "pipe": [ - { - "id": "restaurant_database", - "class_name": "sqlite_database", - "fit_on": ["db_result"], - "table_name": "mytable", - "primary_keys": ["name"], - "save_path": "{DOWNLOADS_PATH}/dstc2_v3/resto.sqlite" - } - ] - }, - "train": { - "class_name": "fit_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/dstc2_v3.tar.gz", - "subdir": "{DOWNLOADS_PATH}/dstc2_v3" - } - ] - } -} diff --git a/deeppavlov/configs/go_bot/gobot_dstc2.json b/deeppavlov/configs/go_bot/gobot_dstc2.json deleted file mode 100644 index 2611af6f05..0000000000 --- a/deeppavlov/configs/go_bot/gobot_dstc2.json +++ /dev/null @@ -1,125 +0,0 @@ -{ - "dataset_reader": { - "class_name": "dstc2_reader", - "data_path": "{DATA_PATH}" - }, - "dataset_iterator": { - "class_name": "dialog_iterator" - }, - 
"chainer": { - "in": ["x"], - "in_y": ["y"], - "out": ["y_predicted"], - "pipe": [ - { - "class_name": "dialog_component_wrapper", - "component": { "class_name": "split_tokenizer" }, - "in": ["x"], - "out": ["x_tokens"] - }, - { - "id": "word_vocab", - "class_name": "simple_vocab", - "fit_on": ["x_tokens"], - "save_path": "{MODEL_PATH}/word.dict", - "load_path": "{MODEL_PATH}/word.dict" - }, - { - "class_name": "go_bot", - "load_path": "{MODEL_PATH}/model", - "save_path": "{MODEL_PATH}/model", - "in": ["x"], - "in_y": ["y"], - "out": ["y_predicted"], - "main": true, - "debug": false, - "learning_rate": 0.003, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 10.0, - "momentum": 0.95, - "optimizer": "tensorflow.train:AdamOptimizer", - "clip_norm": 2.0, - "dropout_rate": 0.4, - "l2_reg_coef": 3e-4, - "hidden_size": 128, - "dense_size": 160, - "word_vocab": "#word_vocab", - "database": { - "class_name": "sqlite_database", - "table_name": "mytable", - "primary_keys": ["name"], - "save_path": "{DOWNLOADS_PATH}/dstc2_v3/resto.sqlite" - }, - "nlg_manager": { - "class_name": "gobot_nlg_manager", - "template_path": "{DATA_PATH}/dstc2-templates.txt", - "template_type": "DualTemplate", - "api_call_action": "api_call" - }, - "api_call_action": "api_call", - "use_action_mask": false, - "slot_filler": { - "config_path": "{CONFIGS_PATH}/ner/slotfill_dstc2.json" - }, - "intent_classifier": null, - "embedder": { - "class_name": "glove", - "load_path": "{DOWNLOADS_PATH}/embeddings/glove.6B.100d.txt" - }, - "bow_embedder": { - "class_name": "bow", - "depth": "#word_vocab.__len__()", - "with_counts": true - }, - "tokenizer": { - "class_name": "stream_spacy_tokenizer", - "lowercase": false - }, - "tracker": { - "class_name": "featurized_tracker", - "slot_names": ["pricerange", "this", "area", "food", "name"] - } - } - ] - }, - "train": { - "epochs": 200, - "batch_size": 8, - - "metrics": ["per_item_dialog_accuracy"], - "validation_patience": 10, - "val_every_n_batches": 15, - - "log_every_n_batches": 15, - "show_examples": false, - "evaluation_targets": [ - "valid", - "test" - ], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "DATA_PATH": "{DOWNLOADS_PATH}/dstc2_v3", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/gobot_dstc2" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/gobot_dstc2_v9.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/glove.6B.100d.txt", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/datasets/dstc2_v3.tar.gz", - "subdir": "{DATA_PATH}" - } - ] - } -} diff --git a/deeppavlov/configs/go_bot/gobot_dstc2_best.json b/deeppavlov/configs/go_bot/gobot_dstc2_best.json deleted file mode 100644 index b13c680f9e..0000000000 --- a/deeppavlov/configs/go_bot/gobot_dstc2_best.json +++ /dev/null @@ -1,133 +0,0 @@ -{ - "dataset_reader": { - "class_name": "dstc2_reader", - "data_path": "{DSTC2_DATA_PATH}" - }, - "dataset_iterator": { - "class_name": "dialog_iterator" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "out": ["y_predicted"], - "pipe": [ - { - "class_name": "dialog_component_wrapper", - "component": { "class_name": "split_tokenizer" }, - "in": ["x"], - "out": ["x_tokens"] - }, - { - "id": "token_vocab", - "fit_on": ["x_tokens"], - "class_name": "simple_vocab", - "save_path": 
"{MODELS_PATH}/gobot_dstc2_best/word.dict", - "load_path": "{MODELS_PATH}/gobot_dstc2_best/word.dict" - }, - { - "id": "restaurant_database", - "class_name": "sqlite_database", - "table_name": "mytable", - "primary_keys": ["name"], - "save_path": "{DOWNLOADS_PATH}/dstc2_v3/resto.sqlite" - }, - { - "class_name": "go_bot", - "load_path": "{MODELS_PATH}/gobot_dstc2_best/model", - "save_path": "{MODELS_PATH}/gobot_dstc2_best/model", - "in": ["x"], - "in_y": ["y"], - "out": ["y_predicted"], - "main": true, - "debug": false, - "learning_rate": 3e-3, - "learning_rate_drop_patience": 10, - "learning_rate_drop_div": 4.0, - "momentum": 0.95, - "optimizer": "tensorflow.train:AdamOptimizer", - "clip_norm": 2.0, - "dropout_rate": 0.75, - "l2_reg_coef": 9e-4, - "hidden_size": 128, - "dense_size": 128, - "attention_mechanism": { - "type": "general", - "hidden_size": 32, - "action_as_key": true, - "intent_as_key": true, - "max_num_tokens": 100, - "projected_align": false - }, - "word_vocab": "#token_vocab", - "database": "#restaurant_database", - "nlg_manager": { - "class_name": "gobot_nlg_manager", - "template_path": "{DSTC2_DATA_PATH}/dstc2-templates.txt", - "template_type": "DualTemplate", - "api_call_action": "api_call" - }, - "use_action_mask": false, - "slot_filler": { - "config_path": "{CONFIGS_PATH}/ner/slotfill_dstc2.json" - }, - "intent_classifier": { - "config_path": "{CONFIGS_PATH}/classifiers/intents_dstc2.json" - }, - "embedder": { - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/wiki.en.bin" - }, - "bow_embedder": null, - "tokenizer": { - "class_name": "stream_spacy_tokenizer", - "lowercase": false - }, - "tracker": { - "class_name": "featurized_tracker", - "slot_names": ["pricerange", "this", "area", "food", "name"] - } - } - ] - }, - "train": { - "epochs": 100, - "batch_size": 8, - - "pytest_max_batches": 2, - - "metrics": ["per_item_dialog_accuracy"], - "validation_patience": 15, - "val_every_n_batches": 15, - - "log_every_n_batches": 15, - "show_examples": false, - "evaluation_targets": [ - "valid", - "test" - ], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs", - "DSTC2_DATA_PATH": "{DOWNLOADS_PATH}/dstc2_v3" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/gobot_dstc2_best_v4.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/datasets/dstc2_v3.tar.gz", - "subdir": "{DOWNLOADS_PATH}/dstc2_v3" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/embeddings/wiki.en.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} diff --git a/deeppavlov/configs/go_bot/gobot_dstc2_best_json_nlg.json b/deeppavlov/configs/go_bot/gobot_dstc2_best_json_nlg.json deleted file mode 100644 index 19202edb74..0000000000 --- a/deeppavlov/configs/go_bot/gobot_dstc2_best_json_nlg.json +++ /dev/null @@ -1,133 +0,0 @@ -{ - "dataset_reader": { - "class_name": "dstc2_reader", - "data_path": "{DSTC2_DATA_PATH}" - }, - "dataset_iterator": { - "class_name": "dialog_iterator" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "out": ["y_predicted"], - "pipe": [ - { - "class_name": "dialog_component_wrapper", - "component": { "class_name": "split_tokenizer" }, - "in": ["x"], - "out": ["x_tokens"] - }, - { - "id": "token_vocab", - "fit_on": ["x_tokens"], - "class_name": "simple_vocab", - "save_path": 
"{MODELS_PATH}/gobot_dstc2_best_json/word.dict", - "load_path": "{MODELS_PATH}/gobot_dstc2_best_json/word.dict" - }, - { - "id": "restaurant_database", - "class_name": "sqlite_database", - "table_name": "mytable", - "primary_keys": ["name"], - "save_path": "{DOWNLOADS_PATH}/dstc2/resto.sqlite" - }, - { - "class_name": "go_bot", - "load_path": "{MODELS_PATH}/gobot_dstc2_best_json/model", - "save_path": "{MODELS_PATH}/gobot_dstc2_best_json/model", - "in": ["x"], - "in_y": ["y"], - "out": ["y_predicted"], - "main": true, - "debug": false, - "learning_rate": 3e-3, - "learning_rate_drop_patience": 10, - "learning_rate_drop_div": 4.0, - "momentum": 0.95, - "optimizer": "tensorflow.train:AdamOptimizer", - "clip_norm": 2.0, - "dropout_rate": 0.75, - "l2_reg_coef": 9e-4, - "hidden_size": 128, - "dense_size": 128, - "attention_mechanism": { - "type": "general", - "hidden_size": 32, - "action_as_key": true, - "intent_as_key": true, - "max_num_tokens": 100, - "projected_align": false - }, - "word_vocab": "#token_vocab", - "database": "#restaurant_database", - "nlg_manager": { - "class_name": "gobot_json_nlg_manager", - "data_path": "{DSTC2_DATA_PATH}", - "actions2slots_path": "{DSTC2_DATA_PATH}/dstc2-actions2slots.json", - "api_call_action": "api_call" - }, - "use_action_mask": false, - "slot_filler": { - "config_path": "{CONFIGS_PATH}/ner/slotfill_dstc2.json" - }, - "intent_classifier": { - "config_path": "{CONFIGS_PATH}/classifiers/intents_dstc2.json" - }, - "embedder": { - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/wiki.en.bin" - }, - "bow_embedder": null, - "tokenizer": { - "class_name": "stream_spacy_tokenizer", - "lowercase": false - }, - "tracker": { - "class_name": "featurized_tracker", - "slot_names": ["pricerange", "this", "area", "food", "name"] - } - } - ] - }, - "train": { - "epochs": 100, - "batch_size": 8, - - "pytest_max_batches": 2, - - "metrics": ["per_item_action_accuracy"], - "validation_patience": 15, - "val_every_n_batches": 15, - - "log_every_n_batches": 15, - "show_examples": false, - "evaluation_targets": [ - "valid", - "test" - ], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs", - "DSTC2_DATA_PATH": "{DOWNLOADS_PATH}/dstc2_v3" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/gobot_dstc2_best_v4.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/datasets/dstc2_v3.tar.gz", - "subdir": "{DSTC2_DATA_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/embeddings/wiki.en.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} diff --git a/deeppavlov/configs/go_bot/gobot_dstc2_minimal.json b/deeppavlov/configs/go_bot/gobot_dstc2_minimal.json deleted file mode 100644 index 032b8e05ac..0000000000 --- a/deeppavlov/configs/go_bot/gobot_dstc2_minimal.json +++ /dev/null @@ -1,115 +0,0 @@ -{ - "dataset_reader": { - "class_name": "dstc2_reader", - "data_path": "{DATA_PATH}" - }, - "dataset_iterator": { - "class_name": "dialog_iterator" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "out": ["y_predicted"], - "pipe": [ - { - "class_name": "dialog_component_wrapper", - "component": { "class_name": "split_tokenizer" }, - "in": ["x"], - "out": ["x_tokens"] - }, - { - "id": "word_vocab", - "class_name": "simple_vocab", - "fit_on": ["x_tokens"], - "save_path": "{MODEL_PATH}/word.dict", - "load_path": 
"{MODEL_PATH}/word.dict" - }, - { - "class_name": "go_bot", - "load_path": "{MODEL_PATH}/model", - "save_path": "{MODEL_PATH}/model", - "in": ["x"], - "in_y": ["y"], - "out": ["y_predicted"], - "main": true, - "debug": false, - "learning_rate": 0.003, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 10.0, - "momentum": 0.95, - "optimizer": "tensorflow.train:AdamOptimizer", - "clip_norm": 2.0, - "dropout_rate": 0.4, - "l2_reg_coef": 3e-4, - "hidden_size": 128, - "dense_size": 160, - "word_vocab": "#word_vocab", - "database": null, - "nlg_manager": { - "class_name": "gobot_nlg_manager", - "template_path": "{DATA_PATH}/dstc2-templates.txt", - "template_type": "DualTemplate", - "api_call_action": "api_call" - }, - "api_call_action": null, - "use_action_mask": false, - "slot_filler": null, - "intent_classifier": null, - "embedder": { - "class_name": "glove", - "load_path": "{DOWNLOADS_PATH}/embeddings/glove.6B.100d.txt" - }, - "bow_embedder": null, - "tokenizer": { - "class_name": "stream_spacy_tokenizer", - "lowercase": false - }, - "tracker": { - "class_name": "featurized_tracker", - "slot_names": ["pricerange", "this", "area", "food", "name"] - } - } - ] - }, - "train": { - "epochs": 200, - "batch_size": 4, - - "metrics": ["per_item_dialog_accuracy"], - "validation_patience": 10, - "val_every_n_batches": 15, - - "log_every_n_batches": 15, - "log_on_k_batches": -1, - "show_examples": false, - "evaluation_targets": [ - "valid", - "test" - ], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "DATA_PATH": "{DOWNLOADS_PATH}/dstc2_v3", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/gobot_dstc2_minimal" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/gobot_dstc2_v9.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/glove.6B.100d.txt", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/datasets/dstc2_v3.tar.gz", - "subdir": "{DATA_PATH}" - } - ] - } -} diff --git a/deeppavlov/configs/go_bot/gobot_md_yaml_minimal.json b/deeppavlov/configs/go_bot/gobot_md_yaml_minimal.json deleted file mode 100644 index ae6a8158ae..0000000000 --- a/deeppavlov/configs/go_bot/gobot_md_yaml_minimal.json +++ /dev/null @@ -1,112 +0,0 @@ -{ - "dataset_reader": { - "class_name": "md_yaml_dialogs_reader", - "data_path": "{DATA_PATH}" - }, - "dataset_iterator": { - "class_name": "dialog_iterator" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "out": ["y_predicted"], - "pipe": [ - { - "class_name": "dialog_component_wrapper", - "component": { "class_name": "split_tokenizer" }, - "in": ["x"], - "out": ["x_tokens"] - }, - { - "id": "word_vocab", - "class_name": "simple_vocab", - "fit_on": ["x_tokens"], - "save_path": "{MODEL_PATH}/word.dict", - "load_path": "{MODEL_PATH}/word.dict" - }, - { - "class_name": "go_bot", - "load_path": "{MODEL_PATH}/model", - "save_path": "{MODEL_PATH}/model", - "in": ["x"], - "in_y": ["y"], - "out": ["y_predicted"], - "main": true, - "debug": false, - "learning_rate": 0.003, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 10.0, - "momentum": 0.95, - "optimizer": "tensorflow.train:AdamOptimizer", - "clip_norm": 2.0, - "dropout_rate": 0.4, - "l2_reg_coef": 3e-4, - "hidden_size": 128, - "dense_size": 160, - "word_vocab": "#word_vocab", - "database": null, - "nlg_manager": { - "class_name": 
"gobot_json_nlg_manager", - "data_path": "{DATA_PATH}", - "dataset_reader_class": "md_yaml_dialogs_reader", - "actions2slots_path": "{DATA_PATH}/dstc2-actions2slots.json", - "api_call_action": null - }, - "api_call_action": null, - "use_action_mask": false, - "slot_filler": null, - "intent_classifier": null, - "embedder": { - "class_name": "glove", - "load_path": "{DOWNLOADS_PATH}/embeddings/glove.6B.100d.txt" - }, - "bow_embedder": null, - "tokenizer": { - "class_name": "stream_spacy_tokenizer", - "lowercase": false - }, - "tracker": { - "class_name": "featurized_tracker", - "slot_names": [] - } - } - ] - }, - "train": { - "epochs": 200, - "batch_size": 4, - - "metrics": ["per_item_action_accuracy"], - "validation_patience": 10, - "val_every_n_batches": 15, - - "log_every_n_batches": 15, - "log_on_k_batches": -1, - "show_examples": false, - "evaluation_targets": [ - "valid", - "test" - ], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "DATA_PATH": "{DOWNLOADS_PATH}/gobot_md_yaml_minimal", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/gobot_md_yaml_minimal" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/embeddings/glove.6B.100d.txt", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/datasets/gobot_md_yaml_minimal.tar.gz", - "subdir": "{DATA_PATH}" - } - ] - } -} diff --git a/deeppavlov/configs/go_bot/gobot_simple_dstc2.json b/deeppavlov/configs/go_bot/gobot_simple_dstc2.json deleted file mode 100644 index 52093d0ecb..0000000000 --- a/deeppavlov/configs/go_bot/gobot_simple_dstc2.json +++ /dev/null @@ -1,125 +0,0 @@ -{ - "dataset_reader": { - "class_name": "dstc2_reader", - "data_path": "{DSTC2_DATA_PATH}" - }, - "dataset_iterator": { - "class_name": "dialog_iterator" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "out": ["y_predicted"], - "pipe": [ - { - "class_name": "dialog_component_wrapper", - "component": { "class_name": "split_tokenizer" }, - "in": ["x"], - "out": ["x_tokens"] - }, - { - "id": "word_vocab", - "class_name": "simple_vocab", - "fit_on": ["x_tokens"], - "save_path": "{MODEL_PATH}/word.dict", - "load_path": "{MODEL_PATH}/word.dict" - }, - { - "class_name": "go_bot", - "load_path": "{MODEL_PATH}/model", - "save_path": "{MODEL_PATH}/model", - "in": ["x"], - "in_y": ["y"], - "out": ["y_predicted"], - "main": true, - "debug": false, - "learning_rate": 0.003, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 10.0, - "momentum": 0.95, - "optimizer": "tensorflow.train:AdamOptimizer", - "clip_norm": 2.0, - "dropout_rate": 0.4, - "l2_reg_coef": 3e-4, - "hidden_size": 128, - "dense_size": 160, - "word_vocab": "#word_vocab", - "database": { - "class_name": "sqlite_database", - "table_name": "mytable", - "primary_keys": ["name"], - "save_path": "{DSTC2_DATA_PATH}/resto.sqlite" - }, - "nlg_manager": { - "class_name": "gobot_nlg_manager", - "template_path": "{DSTC2_DATA_PATH}/dstc2-templates.txt", - "template_type": "DualTemplate", - "api_call_action": "api_call" - }, - "api_call_action": "api_call", - "use_action_mask": false, - "slot_filler": { - "config_path": "{CONFIGS_PATH}/ner/slotfill_dstc2.json" - }, - "intent_classifier": null, - "embedder": { - "class_name": "glove", - "load_path": "{DOWNLOADS_PATH}/embeddings/glove.6B.100d.txt" - }, - "bow_embedder": { - "class_name": "bow", - "depth": "#word_vocab.__len__()", - "with_counts": true - }, 
- "tokenizer": { - "class_name": "stream_spacy_tokenizer", - "lowercase": false - }, - "tracker": { - "class_name": "featurized_tracker", - "slot_names": ["pricerange", "this", "area", "food", "name"] - } - } - ] - }, - "train": { - "epochs": 200, - "batch_size": 8, - - "metrics": ["per_item_dialog_accuracy"], - "validation_patience": 10, - "val_every_n_batches": 15, - - "log_every_n_batches": 15, - "show_examples": false, - "evaluation_targets": [ - "valid", - "test" - ], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "DSTC2_DATA_PATH": "{DOWNLOADS_PATH}/dstc2_v3", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/gobot_dstc2" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/gobot_dstc2_v9.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/glove.6B.100d.txt", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/datasets/dstc2_v3.tar.gz", - "subdir": "{DSTC2_DATA_PATH}" - } - ] - } -} diff --git a/deeppavlov/configs/intent_catcher/intent_catcher.json b/deeppavlov/configs/intent_catcher/intent_catcher.json deleted file mode 100644 index 0c527b2774..0000000000 --- a/deeppavlov/configs/intent_catcher/intent_catcher.json +++ /dev/null @@ -1,97 +0,0 @@ -{ - "dataset_reader": { - "class_name": "intent_catcher_reader", - "data_path": "{DOWNLOADS_PATH}/intent_catcher_data" - }, - "dataset_iterator": { - "class_name": "basic_classification_iterator", - "seed": 42 - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "id": "classes_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "save_path": "{MODEL_PATH}/classes.dict", - "load_path": "{MODEL_PATH}/classes.dict", - "in": "y", - "out": "y_ids" - }, - { - "in": [ - "x" - ], - "in_y": [ - "y_ids" - ], - "out": [ - "y_pred_probas" - ], - "class_name": "intent_catcher", - "embeddings": "use", - "limit": 10, - "multilabel": false, - "number_of_layers": 1, - "number_of_intents": 23, - "hidden_dim": 256, - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "mode": "train" - }, - { - "in": "y_pred_probas", - "out": "y_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_pred_ids", - "out": "y_pred_labels", - "ref": "classes_vocab" - } - ], - "out": [ - "y_pred_labels" - ] - }, - "train": { - "epochs": 25, - "batch_size": 100, - "metrics": [ - "accuracy", - "f1_macro" - ], - "validation_patience": 5, - "val_every_n_epochs": 5, - "log_every_n_epochs": 5, - "show_examples": false, - "evaluation_targets": [ - "valid", - "test" - ], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/classifiers/intent_catcher" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/intent_catcher/intent_catcher.tar.gz", - "subdir": "{MODELS_PATH}/classifiers" - } - ] - } -} diff --git a/deeppavlov/configs/kbqa/entity_linking_eng.json b/deeppavlov/configs/kbqa/entity_linking_eng.json deleted file mode 100644 index 3f337dee3d..0000000000 --- a/deeppavlov/configs/kbqa/entity_linking_eng.json +++ /dev/null @@ -1,89 +0,0 @@ -{ - "chainer": { - "in": ["documents"], - "pipe": [ - { - "class_name": "ner_chunker", - "id": "chunker" - }, - { 
- "thres_proba": 0.05, - "entity_tags": ["PERSON", "LOC", "ORG", "GPE", "EVENT", "WORK_OF_ART"], - "type_tag": "TYPE", - "o_tag": "O", - "tags_file": "{NER_PATH}/tag.dict", - "return_entities_with_tags": true, - "class_name": "entity_detection_parser", - "id": "edp" - }, - { - "class_name": "rel_ranking_bert_infer", - "id": "entity_descr_ranking", - "ranker": {"config_path": "{CONFIGS_PATH}/classifiers/entity_ranking_bert_eng_no_mention.json"}, - "batch_size": 100, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "rel_q2name_filename": "q_to_descr_en.pickle", - "rels_to_leave": 200 - }, - { - "class_name": "entity_linker", - "in": ["documents"], - "out": ["entity_substr_list", "entity_positions_list", "entity_ids_list"], - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "save_path": "{DOWNLOADS_PATH}/wikidata_eng", - "word_to_idlist_filename": "word_to_idlist_eng.pickle", - "entities_list_filename": "ent_list_eng.pickle", - "entities_ranking_filename": "entities_ranking_dict_eng.pickle", - "vectorizer_filename": "vectorizer_eng.pk", - "faiss_index_filename": "{DOWNLOADS_PATH}/wikidata_eng/faiss_vectors_eng.index", - "q_to_descr_filename": "q_to_descr_en.pickle", - "chunker": "#chunker", - "ner": {"config_path": "{CONFIGS_PATH}/ner/ner_ontonotes_bert_probas.json"}, - "ner_parser": "#edp", - "entity_ranker": "#entity_descr_ranking", - "num_faiss_candidate_entities": 10, - "num_entities_for_bert_ranking": 200, - "num_faiss_cells": 50, - "use_gpu": false, - "fit_vectorizer": false, - "max_tfidf_features": 500, - "include_mention": false, - "ngram_range": [2, 2], - "num_entities_to_return": 1, - "build_inverted_index": false, - "lemmatize": false, - "use_descriptions": true, - "use_prefix_tree": false, - "lang": "en" - } - ], - "out": ["entity_substr_list", "entity_positions_list", "entity_ids_list"] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs", - "NER_PATH": "{MODELS_PATH}/ner_ontonotes_bert" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/entity_linking_eng.tar.gz", - "subdir": "{DOWNLOADS_PATH}/wikidata_eng" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/ner_ontonotes_bert.tar.gz", - "subdir": "{MODELS_PATH}/ner_ontonotes_bert" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/entity_ranking_bert_eng_no_mention.tar.gz", - "subdir": "{MODELS_PATH}/entity_ranking_bert_eng_no_mention" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/q_to_descr_en.pickle", - "subdir": "{DOWNLOADS_PATH}/wikidata_eng" - } - ] - } -} diff --git a/deeppavlov/configs/kbqa/entity_linking_rus.json b/deeppavlov/configs/kbqa/entity_linking_rus.json deleted file mode 100644 index 8593501860..0000000000 --- a/deeppavlov/configs/kbqa/entity_linking_rus.json +++ /dev/null @@ -1,89 +0,0 @@ -{ - "chainer": { - "in": ["documents"], - "pipe": [ - { - "class_name": "ner_chunker", - "id": "chunker" - }, - { - "thres_proba": 0.05, - "entity_tags": ["PER", "LOC", "ORG"], - "type_tag": "TYPE", - "o_tag": "O", - "tags_file": "{NER_PATH}/tag.dict", - "return_entities_with_tags": true, - "class_name": "entity_detection_parser", - "id": "edp" - }, - { - "class_name": "rel_ranking_bert_infer", - "id": "entity_descr_ranking", - "ranker": {"config_path": "{CONFIGS_PATH}/classifiers/entity_ranking_bert_rus_no_mention.json"}, - "batch_size": 100, - "load_path": "{DOWNLOADS_PATH}/wikidata_rus", - "rel_q2name_filename": 
"q_to_descr_ru.pickle", - "rels_to_leave": 200 - }, - { - "class_name": "entity_linker", - "in": ["documents"], - "out": ["entity_substr_list", "entity_positions_list", "entity_ids_list"], - "load_path": "{DOWNLOADS_PATH}/wikidata_rus", - "save_path": "{DOWNLOADS_PATH}/wikidata_rus", - "word_to_idlist_filename": "word_to_idlist_rus.pickle", - "entities_list_filename": "ent_list_rus.pickle", - "entities_ranking_filename": "entities_ranking_dict_rus.pickle", - "vectorizer_filename": "vectorizer_rus.pk", - "faiss_index_filename": "{DOWNLOADS_PATH}/wikidata_rus/faiss_vectors_rus.index", - "q_to_descr_filename": "q_to_descr_ru.pickle", - "chunker": "#chunker", - "ner": {"config_path": "{CONFIGS_PATH}/ner/ner_rus_bert_probas.json"}, - "ner_parser": "#edp", - "entity_ranker": "#entity_descr_ranking", - "num_faiss_candidate_entities": 10, - "num_entities_for_bert_ranking": 200, - "num_faiss_cells": 1, - "use_gpu": false, - "fit_vectorizer": false, - "max_tfidf_features": 500, - "include_mention": false, - "ngram_range": [2, 2], - "num_entities_to_return": 1, - "build_inverted_index": false, - "lemmatize": true, - "use_descriptions": true, - "use_prefix_tree": false, - "lang": "ru" - } - ], - "out": ["entity_substr_list", "entity_positions_list", "entity_ids_list"] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs", - "NER_PATH": "{MODELS_PATH}/ner_rus_bert" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/entity_linking_rus.tar.gz", - "subdir": "{DOWNLOADS_PATH}/wikidata_rus" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/ner_rus_bert.tar.gz", - "subdir": "{MODELS_PATH}/ner_rus_bert" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/entity_ranking_bert_rus_no_mention.tar.gz", - "subdir": "{MODELS_PATH}/entity_ranking_bert_rus_no_mention" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/q_to_descr_ru.pickle", - "subdir": "{DOWNLOADS_PATH}/wikidata_rus" - } - ] - } -} diff --git a/deeppavlov/configs/kbqa/kbqa_cq.json b/deeppavlov/configs/kbqa/kbqa_cq.json deleted file mode 100644 index d87c20aa19..0000000000 --- a/deeppavlov/configs/kbqa/kbqa_cq.json +++ /dev/null @@ -1,184 +0,0 @@ -{ - "chainer": { - "in": ["x_init"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "question_sign_checker", - "in": ["x_init"], - "out": ["x"] - }, - { - "config_path": "{CONFIGS_PATH}/ner/ner_lcquad_bert_ent_and_type.json", - "in": ["x"], - "out": ["x_tokens", "y_pred"] - }, - { - "in": ["x_tokens", "y_pred"], - "out": ["entities", "types", "entities_pos"], - "entity_tags": ["E-TAG"], - "type_tag": "T-TAG", - "o_tag": "O-TAG", - "tags_file": "{NER_PATH}/tag.dict", - "class_name": "entity_detection_parser" - }, - { - "class_name": "wiki_parser", - "id": "wiki_p", - "wiki_filename": "{DOWNLOADS_PATH}/wikidata/wikidata.hdt", - "lang": "@en" - }, - { - "class_name": "template_matcher", - "id": "template_m", - "num_processors": 16, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "templates_filename": "templates_eng.json" - }, - { - "config_path": "{CONFIGS_PATH}/classifiers/query_pr.json", - "in": ["x"], - "out": ["template_type"] - }, - { - "class_name": "kbqa_entity_linker", - "id": "linker_entities", - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "inverted_index_filename": "inverted_index_eng.pickle", - "entities_list_filename": "entities_list.pickle", - "q2name_filename": "wiki_eng_q_to_name.pickle", - 
"who_entities_filename": "who_entities.pickle", - "build_inverted_index": false, - "use_descriptions": false, - "use_prefix_tree": false - }, - { - "class_name": "kbqa_entity_linker", - "id": "linker_types", - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "inverted_index_filename": "inverted_index_types_eng.pickle", - "entities_list_filename": "types_list.pickle", - "q2name_filename": "wiki_eng_q_to_name_types.pickle", - "build_inverted_index": false, - "use_descriptions": false, - "use_prefix_tree": false - }, - { - "class_name": "rel_ranking_infer", - "id": "rel_r_inf", - "ranker": {"config_path": "{CONFIGS_PATH}/ranking/rel_ranking.json"}, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "rel_q2name_filename": "wiki_dict_properties.pickle", - "rels_to_leave": 40 - }, - { - "class_name": "query_generator", - "id": "query_g", - "linker_entities": "#linker_entities", - "linker_types": "#linker_types", - "template_matcher": "#template_m", - "rel_ranker": "#rel_r_inf", - "wiki_parser": "#wiki_p", - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "rank_rels_filename_1": "rels_0.txt", - "rank_rels_filename_2": "rels_1.txt", - "sparql_queries_filename": "{DOWNLOADS_PATH}/wikidata/sparql_queries.json", - "entities_to_leave": 5, - "rels_to_leave": 10, - "in": ["x", "x", "template_type", "entities", "types"], - "out": ["candidate_answers"] - }, - { - "class_name": "rel_ranking_bert_infer", - "ranker": {"config_path": "{CONFIGS_PATH}/classifiers/rel_ranking_bert.json"}, - "wiki_parser": "#wiki_p", - "batch_size": 32, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "rel_q2name_filename": "wiki_dict_properties.pickle", - "in": ["x", "candidate_answers"], - "out": ["answers"] - } - ], - "out": ["answers"] - }, - "train": { - "epochs": 30, - "batch_size": 16, - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "y_pred"] - }, - { - "name": "ner_token_f1", - "inputs": ["y", "y_pred"] - } - ], - "validation_patience": 10, - "val_every_n_batches": 400, - - "log_every_n_batches": 400, - "tensorboard_log_dir": "{NER_PATH}/logs", - "show_examples": false, - "pytest_max_batches": 2, - "pytest_batch_size": 8, - "evaluation_targets": ["valid", "test"], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models_kbqa/cased_L-12_H-768_A-12", - "NER_PATH": "{MODELS_PATH}/ner_lcquad_ent_and_type", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" - }, - "labels": { - "telegram_utils": "NERCoNLL2003Model", - "server_utils": "NER" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/embeddings/reddit_fastText/wordpunct_tok_reddit_comments_2017_11_300.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models_kbqa" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/query_prediction.tar.gz", - "subdir": "{MODELS_PATH}/classifiers/query_prediction" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/ner_lcquad.tar.gz", - "subdir": "{MODELS_PATH}/ner_lcquad_ent_and_type" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/rel_ranking.tar.gz", - "subdir": "{MODELS_PATH}/rel_ranking" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/rel_ranking_bert.tar.gz", - "subdir": "{MODELS_PATH}/rel_ranking_bert" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/wiki_eng_files.tar.gz", - 
"subdir": "{DOWNLOADS_PATH}/wikidata_eng" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/sparql_queries.json", - "subdir": "{DOWNLOADS_PATH}/wikidata" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/wikidata.hdt", - "subdir": "{DOWNLOADS_PATH}/wikidata" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/wikidata.hdt.index.v1-1", - "subdir": "{DOWNLOADS_PATH}/wikidata" - } - ] - } -} diff --git a/deeppavlov/configs/kbqa/kbqa_cq_bert_ranker.json b/deeppavlov/configs/kbqa/kbqa_cq_bert_ranker.json deleted file mode 100644 index bc365b78a0..0000000000 --- a/deeppavlov/configs/kbqa/kbqa_cq_bert_ranker.json +++ /dev/null @@ -1,171 +0,0 @@ -{ - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "config_path": "{CONFIGS_PATH}/ner/ner_lcquad_bert_ent_and_type.json", - "in": ["x"], - "out": ["x_tokens", "y_pred"] - }, - { - "in": ["x_tokens", "y_pred"], - "out": ["entities", "types", "entities_pos"], - "entity_tags": ["E-TAG"], - "type_tag": "T-TAG", - "o_tag": "O-TAG", - "tags_file": "{NER_PATH}/tag.dict", - "class_name": "entity_detection_parser" - }, - { - "class_name": "wiki_parser", - "id": "wiki_p", - "wiki_filename": "{DOWNLOADS_PATH}/wikidata/wikidata.hdt", - "lang": "@en" - }, - { - "class_name": "template_matcher", - "id": "template_m", - "num_processors": 8, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "templates_filename": "templates_eng.json" - }, - { - "config_path": "{CONFIGS_PATH}/classifiers/query_pr.json", - "in": ["x"], - "out": ["template_type"] - }, - { - "class_name": "kbqa_entity_linker", - "id": "linker_entities", - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "inverted_index_filename": "inverted_index_eng.pickle", - "entities_list_filename": "entities_list.pickle", - "q2name_filename": "wiki_eng_q_to_name.pickle", - "who_entities_filename": "who_entities.pickle", - "use_hdt": false, - "wiki_parser": "#wiki_p", - "use_prefix_tree": false - }, - { - "class_name": "kbqa_entity_linker", - "id": "linker_types", - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "inverted_index_filename": "inverted_index_types_eng.pickle", - "entities_list_filename": "types_list.pickle", - "q2name_filename": "wiki_eng_q_to_name_types.pickle", - "use_hdt": false, - "wiki_parser": "#wiki_p", - "use_prefix_tree": false - }, - { - "class_name": "rel_ranking_bert_infer", - "id": "rel_r_inf", - "ranker": {"config_path": "{CONFIGS_PATH}/classifiers/rel_ranking_bert.json"}, - "wiki_parser": "#wiki_p", - "batch_size": 32, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "rel_q2name_filename": "wiki_dict_properties.pickle" - }, - { - "class_name": "query_generator", - "id": "query_g", - "linker_entities": "#linker_entities", - "linker_types": "#linker_types", - "template_matcher": "#template_m", - "rel_ranker": "#rel_r_inf", - "wiki_parser": "#wiki_p", - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "rank_rels_filename_1": "rels_0.txt", - "rank_rels_filename_2": "rels_1.txt", - "sparql_queries_filename": "{DOWNLOADS_PATH}/wikidata/sparql_queries.json", - "entities_to_leave": 5, - "rels_to_leave": 12, - "return_answers": true, - "in": ["x", "x", "template_type", "entities", "types"], - "out": ["answers"] - } - ], - "out": ["answers"] - }, - "train": { - "epochs": 30, - "batch_size": 16, - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "y_pred"] - }, - { - "name": "ner_token_f1", - "inputs": ["y", "y_pred"] - } - ], - "validation_patience": 10, - "val_every_n_batches": 400, - - "log_every_n_batches": 400, - "tensorboard_log_dir": 
"{NER_PATH}/logs", - "show_examples": false, - "pytest_max_batches": 2, - "pytest_batch_size": 8, - "evaluation_targets": ["valid", "test"], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models_kbqa/cased_L-12_H-768_A-12", - "NER_PATH": "{MODELS_PATH}/ner_lcquad_ent_and_type", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" - }, - "labels": { - "telegram_utils": "NERCoNLL2003Model", - "server_utils": "NER" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/embeddings/reddit_fastText/wordpunct_tok_reddit_comments_2017_11_300.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models_kbqa" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/query_prediction.tar.gz", - "subdir": "{MODELS_PATH}/classifiers/query_prediction" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/ner_lcquad.tar.gz", - "subdir": "{NER_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/rel_ranking.tar.gz", - "subdir": "{MODELS_PATH}/rel_ranking" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/rel_ranking_bert.tar.gz", - "subdir": "{MODELS_PATH}/rel_ranking_bert" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/wiki_eng_files.tar.gz", - "subdir": "{DOWNLOADS_PATH}/wikidata_eng" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/sparql_queries.json", - "subdir": "{DOWNLOADS_PATH}/wikidata" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/wikidata.hdt", - "subdir": "{DOWNLOADS_PATH}/wikidata" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/wikidata.hdt.index.v1-1", - "subdir": "{DOWNLOADS_PATH}/wikidata" - } - ] - } -} diff --git a/deeppavlov/configs/kbqa/kbqa_cq_en.json b/deeppavlov/configs/kbqa/kbqa_cq_en.json new file mode 100644 index 0000000000..3c18230646 --- /dev/null +++ b/deeppavlov/configs/kbqa/kbqa_cq_en.json @@ -0,0 +1,94 @@ +{ + "chainer": { + "in": ["x"], + "in_y": ["y"], + "pipe": [ + { + "class_name": "question_sign_checker", + "in": ["x"], + "out": ["x_punct"] + }, + { + "config_path": "{CONFIGS_PATH}/entity_extraction/entity_detection_en.json", + "in": ["x_punct"], + "out": ["entity_substr", "entity_offsets", "entity_positions", "tags", "sentences_offsets", "sentences", "probas"] + }, + { + "class_name": "answer_types_extractor", + "lang": "@en", + "types_filename": "{DOWNLOADS_PATH}/wikidata_eng/types_labels_dict_en.pickle", + "types_sets_filename": "{DOWNLOADS_PATH}/wikidata_eng/answer_types.pickle", + "in": ["x_punct", "entity_substr", "tags"], + "out": ["answer_types", "f_entity_substr", "f_tags"] + }, + { + "config_path": "{CONFIGS_PATH}/entity_extraction/entity_linking_en.json", + "id": "entity_linker" + }, + { + "class_name": "wiki_parser", + "id": "wiki_p", + "wiki_filename": "{DOWNLOADS_PATH}/wikidata/wikidata_lite.hdt", + "lang": "@en" + }, + { + "class_name": "template_matcher", + "id": "template_m", + "num_processors": 16, + "load_path": "{DOWNLOADS_PATH}/wikidata_eng", + "templates_filename": "templates_eng.json" + }, + { + "config_path": "{CONFIGS_PATH}/classifiers/query_pr.json", + "in": ["x_punct"], + "out": ["template_type"] + }, + { + "class_name": "rel_ranking_infer", + "id": "rel_r_inf", + "ranker": {"config_path": "{CONFIGS_PATH}/ranking/rel_ranking_bert_en.json"}, + "wiki_parser": "#wiki_p", + 
"batch_size": 32, + "return_all_possible_answers": true, + "return_answer_ids": false, + "rank_answers": true, + "load_path": "{DOWNLOADS_PATH}/wikidata_eng", + "rel_q2name_filename": "wiki_dict_properties_eng.pickle" + }, + { + "class_name": "query_generator", + "id": "query_g", + "entity_linker": "#entity_linker", + "template_matcher": "#template_m", + "rel_ranker": "#rel_r_inf", + "wiki_parser": "#wiki_p", + "load_path": "{DOWNLOADS_PATH}/wikidata", + "rank_rels_filename_1": "rels_0.txt", + "rank_rels_filename_2": "rels_1.txt", + "sparql_queries_filename": "{DOWNLOADS_PATH}/wikidata/sparql_queries.json", + "entities_to_leave": 5, + "rels_to_leave": 10, + "in": ["x_punct", "x_punct", "template_type", "f_entity_substr", "f_tags", "answer_types"], + "out": ["answers"] + } + ], + "out": ["answers"] + }, + "metadata": { + "variables": { + "ROOT_PATH": "~/.deeppavlov", + "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", + "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" + }, + "download": [ + { + "url": "http://files.deeppavlov.ai/kbqa/wikidata/queries_and_rels.tar.gz", + "subdir": "{DOWNLOADS_PATH}/wikidata" + }, + { + "url": "http://files.deeppavlov.ai/kbqa/wikidata/kbqa_files_en.tar.gz", + "subdir": "{DOWNLOADS_PATH}/wikidata_eng" + } + ] + } +} diff --git a/deeppavlov/configs/kbqa/kbqa_cq_mt_bert.json b/deeppavlov/configs/kbqa/kbqa_cq_mt_bert.json deleted file mode 100644 index e3d5bff600..0000000000 --- a/deeppavlov/configs/kbqa/kbqa_cq_mt_bert.json +++ /dev/null @@ -1,257 +0,0 @@ -{ - "chainer": { - "in": ["x"], - "pipe": [ - { - "id": "queries_vocab", - "class_name": "simple_vocab", - "save_path": "{MT_BERT_PATH}/query_prediction/classes.dict" - }, - { - "class_name": "bert_ner_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 512, - "max_subword_length": 15, - "token_maksing_prob": 0.0, - "in": ["x"], - "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "pred_subword_mask"] - }, - { - "class_name": "mask", - "in": ["x_subword_tokens"], - "out": ["x_subword_mask"] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "unk_token": ["O"], - "pad_with_zeros": true, - "save_path": "{MT_BERT_PATH}/ner/tag.dict", - "load_path": "{MT_BERT_PATH}/ner/tag.dict" - }, - { - "class_name": "mt_bert", - "id": "mt_bert_kbqa", - "inference_task_names": ["ner"], - "bert_config_file": "{BERT_PATH}/bert_config.json", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", - "load_path": "{MT_BERT_PATH}/model", - "save_path": "{MT_BERT_PATH}/model", - "tasks": { - "ner": { - "class_name": "mt_bert_seq_tagging_task", - "n_tags": "#tag_vocab.len", - "use_crf": false, - "return_probas": true, - "encoder_layer_ids": [-1] - }, - "query_prediction": { - "class_name": "mt_bert_classification_task", - "n_classes": "#queries_vocab.len", - "return_probas": true, - "one_hot_labels": true - }, - "rel_ranking": { - "class_name": "mt_bert_classification_task", - "n_classes": 2, - "return_probas": true, - "one_hot_labels": false - } - }, - "in": { - "x_subword_tok_ids": "x_subword_tok_ids", - "x_subword_mask": "x_subword_mask", - "pred_subword_mask": "pred_subword_mask" - }, - "out": ["tag_probas_ner"] - }, - { - "class_name": "bert_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 64, - "in": ["x"], - "out": ["bert_features_qr"] - }, - { - "class_name": "mt_bert_reuser", - "mt_bert": "#mt_bert_kbqa", - "task_names": "query_prediction", - "in": ["bert_features_qr"], - "out": ["template_probas"] - }, - { - "in": 
"template_probas", - "out": "template_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "ref": "queries_vocab", - "in": "template_ids", - "out": "template_type" - }, - { - "in": ["x_tokens", "tag_probas_ner"], - "out": ["entities", "types"], - "thres_proba": 0.95, - "entity_tags": ["ENTITY"], - "type_tag": "TYPE", - "o_tag": "O", - "tags_file": "{MT_BERT_PATH}/ner/tag.dict", - "class_name": "entity_detection_parser" - }, - { - "class_name": "wiki_parser", - "id": "wiki_p", - "wiki_filename": "{DOWNLOADS_PATH}/wikidata/wikidata.hdt" - }, - { - "class_name": "template_matcher", - "id": "template_m", - "num_processors": 8, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "templates_filename": "templates_eng.json" - }, - { - "class_name": "kbqa_entity_linker", - "id": "linker_entities", - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "inverted_index_filename": "inverted_index_eng.pickle", - "entities_list_filename": "entities_list.pickle", - "q2name_filename": "wiki_eng_q_to_name.pickle", - "who_entities_filename": "who_entities.pickle", - "use_hdt": false, - "wiki_parser": "#wiki_p", - "use_prefix_tree": false - }, - { - "class_name": "kbqa_entity_linker", - "id": "linker_types", - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "inverted_index_filename": "inverted_index_types_eng.pickle", - "entities_list_filename": "types_list.pickle", - "q2name_filename": "wiki_eng_q_to_name_types.pickle", - "use_hdt": false, - "wiki_parser": "#wiki_p", - "use_prefix_tree": false - }, - { - "class_name": "rel_ranking_infer", - "id": "rel_r_inf", - "ranker": {"config_path": "{CONFIGS_PATH}/ranking/rel_ranking.json"}, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "rel_q2name_filename": "wiki_dict_properties.pickle", - "rels_to_leave": 40 - }, - { - "class_name": "query_generator", - "id": "query_g", - "linker_entities": "#linker_entities", - "linker_types": "#linker_types", - "template_matcher": "#template_m", - "rel_ranker": "#rel_r_inf", - "wiki_parser": "#wiki_p", - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "rank_rels_filename_1": "rels_0.txt", - "rank_rels_filename_2": "rels_1.txt", - "sparql_queries_filename": "{DOWNLOADS_PATH}/wikidata/sparql_queries.json", - "entities_to_leave": 5, - "rels_to_leave": 10, - "in": ["x", "x", "template_type", "entities", "types"], - "out": ["candidate_rels_answers"] - }, - { - "class_name": "rel_ranking_bert_infer", - "bert_preprocessor": { - "class_name": "bert_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 64 - }, - "ranker":{ - "class_name": "mt_bert_reuser", - "mt_bert": "#mt_bert_kbqa", - "task_names": ["rel_ranking"] - }, - "wiki_parser": "#wiki_p", - "batch_size": 32, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "rel_q2name_filename": "wiki_dict_properties.pickle", - "use_mt_bert": true, - "in": ["x", "candidate_rels_answers"], - "out": ["answers"] - } - ], - "out": ["answers"] - }, - "train": { - "epochs": 30, - "batch_size": 16, - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "y_pred"] - }, - { - "name": "ner_token_f1", - "inputs": ["y", "y_pred"] - } - ], - "validation_patience": 10, - "val_every_n_batches": 400, - - "log_every_n_batches": 400, - "tensorboard_log_dir": "{MT_BERT_PATH}/logs", - "show_examples": false, - "pytest_max_batches": 2, - "pytest_batch_size": 8, - "evaluation_targets": ["valid", "test"], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - 
"MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models_kbqa/cased_L-12_H-768_A-12", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs", - "MT_BERT_PATH": "{MODELS_PATH}/mt_bert_kbqa" - }, - "labels": { - "telegram_utils": "KBQA_MT_BERT_MODEL", - "server_utils": "KBQA" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/embeddings/reddit_fastText/wordpunct_tok_reddit_comments_2017_11_300.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/mt_bert.tar.gz", - "subdir": "{MT_BERT_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/cased_L-12_H-768_A-12_vocab_config.tar.gz", - "subdir": "{BERT_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/wiki_eng_files.tar.gz", - "subdir": "{DOWNLOADS_PATH}/wikidata_eng" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/sparql_queries.json", - "subdir": "{DOWNLOADS_PATH}/wikidata" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/wikidata.hdt", - "subdir": "{DOWNLOADS_PATH}/wikidata" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/wikidata.hdt.index.v1-1", - "subdir": "{DOWNLOADS_PATH}/wikidata" - } - ] - } -} diff --git a/deeppavlov/configs/kbqa/kbqa_cq_online.json b/deeppavlov/configs/kbqa/kbqa_cq_online.json deleted file mode 100644 index 4918186abd..0000000000 --- a/deeppavlov/configs/kbqa/kbqa_cq_online.json +++ /dev/null @@ -1,170 +0,0 @@ -{ - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "config_path": "{CONFIGS_PATH}/ner/ner_lcquad_bert_ent_and_type.json", - "in": ["x"], - "out": ["x_tokens", "y_pred"] - }, - { - "in": ["x_tokens", "y_pred"], - "out": ["entities", "types", "entities_pos"], - "entity_tags": ["E-TAG"], - "type_tag": "T-TAG", - "o_tag": "O-TAG", - "tags_file": "{NER_PATH}/tag.dict", - "class_name": "entity_detection_parser" - }, - { - "class_name": "wiki_parser_online", - "id": "wiki_p", - "url": "https://query.wikidata.org/sparql" - }, - { - "class_name": "template_matcher", - "id": "template_m", - "num_processors": 8, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "templates_filename": "templates_eng.json" - }, - { - "config_path": "{CONFIGS_PATH}/classifiers/query_pr.json", - "in": ["x"], - "out": ["template_type"] - }, - { - "class_name": "kbqa_entity_linker", - "id": "linker_entities", - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "inverted_index_filename": "inverted_index_eng.pickle", - "entities_list_filename": "entities_list.pickle", - "q2name_filename": "wiki_eng_q_to_name.pickle", - "who_entities_filename": "who_entities.pickle", - "build_inverted_index": false, - "use_descriptions": false, - "use_prefix_tree": false - }, - { - "class_name": "kbqa_entity_linker", - "id": "linker_types", - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "inverted_index_filename": "inverted_index_types_eng.pickle", - "entities_list_filename": "types_list.pickle", - "q2name_filename": "wiki_eng_q_to_name_types.pickle", - "build_inverted_index": false, - "use_descriptions": false, - "use_prefix_tree": false - }, - { - "class_name": "rel_ranking_infer", - "id": "rel_r_inf", - "ranker": {"config_path": "{CONFIGS_PATH}/ranking/rel_ranking.json"}, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "rel_q2name_filename": "wiki_dict_properties.pickle", - "rels_to_leave": 40 - }, - { - "class_name": "query_generator_online", - "id": "query_g", - "linker_entities": "#linker_entities", - "linker_types": "#linker_types", - "template_matcher": "#template_m", - "rel_ranker": "#rel_r_inf", 
- "wiki_parser": "#wiki_p", - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "rank_rels_filename_1": "rels_0.txt", - "rank_rels_filename_2": "rels_1.txt", - "sparql_queries_filename": "{DOWNLOADS_PATH}/wikidata/sparql_queries.json", - "entities_to_leave": 5, - "rels_to_leave": 10, - "in": ["x", "x", "template_type", "entities", "types"], - "out": ["candidate_answers"] - }, - { - "class_name": "rel_ranking_bert_infer", - "ranker": {"config_path": "{CONFIGS_PATH}/classifiers/rel_ranking_bert.json"}, - "wiki_parser": "#wiki_p", - "batch_size": 32, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "rel_q2name_filename": "wiki_dict_properties.pickle", - "in": ["x", "candidate_answers"], - "out": ["answers"] - } - ], - "out": ["answers"] - }, - "train": { - "epochs": 30, - "batch_size": 16, - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "y_pred"] - }, - { - "name": "ner_token_f1", - "inputs": ["y", "y_pred"] - } - ], - "validation_patience": 10, - "val_every_n_batches": 400, - - "log_every_n_batches": 400, - "tensorboard_log_dir": "{NER_PATH}/logs", - "show_examples": false, - "pytest_max_batches": 2, - "pytest_batch_size": 8, - "evaluation_targets": ["valid", "test"], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models_kbqa/cased_L-12_H-768_A-12", - "NER_PATH": "{MODELS_PATH}/ner_lcquad_ent_and_type", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" - }, - "labels": { - "telegram_utils": "NERCoNLL2003Model", - "server_utils": "NER" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/embeddings/reddit_fastText/wordpunct_tok_reddit_comments_2017_11_300.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models_kbqa" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/query_prediction.tar.gz", - "subdir": "{MODELS_PATH}/classifiers/query_prediction" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/ner_lcquad.tar.gz", - "subdir": "{MODELS_PATH}/ner_lcquad_ent_and_type" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/rel_ranking.tar.gz", - "subdir": "{MODELS_PATH}/rel_ranking" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/rel_ranking_bert.tar.gz", - "subdir": "{MODELS_PATH}/rel_ranking_bert" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/wiki_eng_files.tar.gz", - "subdir": "{DOWNLOADS_PATH}/wikidata_eng" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/sparql_queries.json", - "subdir": "{DOWNLOADS_PATH}/wikidata" - } - ] - } -} diff --git a/deeppavlov/configs/kbqa/kbqa_cq_online_mt_bert.json b/deeppavlov/configs/kbqa/kbqa_cq_online_mt_bert.json deleted file mode 100644 index d1b0095d2c..0000000000 --- a/deeppavlov/configs/kbqa/kbqa_cq_online_mt_bert.json +++ /dev/null @@ -1,254 +0,0 @@ -{ - "chainer": { - "in": ["x"], - "pipe": [ - { - "id": "queries_vocab", - "class_name": "simple_vocab", - "save_path": "{MT_BERT_PATH}/query_prediction/classes.dict" - }, - { - "class_name": "bert_ner_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 512, - "max_subword_length": 15, - "token_maksing_prob": 0.0, - "in": ["x"], - "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "pred_subword_mask"] - }, - { - "class_name": "mask", - "in": ["x_subword_tokens"], - "out": 
["x_subword_mask"] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "unk_token": ["O"], - "pad_with_zeros": true, - "save_path": "{MT_BERT_PATH}/ner/tag.dict", - "load_path": "{MT_BERT_PATH}/ner/tag.dict" - }, - { - "class_name": "mt_bert", - "id": "mt_bert_kbqa", - "inference_task_names": ["ner"], - "bert_config_file": "{BERT_PATH}/bert_config.json", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", - "load_path": "{MT_BERT_PATH}/model", - "save_path": "{MT_BERT_PATH}/model", - "tasks": { - "ner": { - "class_name": "mt_bert_seq_tagging_task", - "n_tags": "#tag_vocab.len", - "use_crf": false, - "return_probas": true, - "encoder_layer_ids": [-1] - }, - "query_prediction": { - "class_name": "mt_bert_classification_task", - "n_classes": "#queries_vocab.len", - "return_probas": true, - "one_hot_labels": true - }, - "rel_ranking": { - "class_name": "mt_bert_classification_task", - "n_classes": 2, - "return_probas": true, - "one_hot_labels": false - } - }, - "in_distribution": {"ner": ["x_subword_tok_ids", "x_subword_mask", "pred_subword_mask"]}, - "in": { - "x_subword_tok_ids": "x_subword_tok_ids", - "x_subword_mask": "x_subword_mask", - "pred_subword_mask": "pred_subword_mask" - }, - "out": ["tag_probas_ner"] - }, - - { - "class_name": "bert_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 64, - "in": ["x"], - "out": ["bert_features_qr"] - }, - { - "class_name": "mt_bert_reuser", - "mt_bert": "#mt_bert_kbqa", - "task_names": [["query_prediction"]], - "in_distribution": {"query_prediction": 1}, - "in": ["bert_features_qr"], - "out": ["template_probas"] - }, - - { - "in": "template_probas", - "out": "template_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "ref": "queries_vocab", - "in": "template_ids", - "out": "template_type" - }, - { - "in": ["x_tokens", "tag_probas_ner"], - "out": ["entities", "types"], - "thres_proba": 0.95, - "entity_tags": ["ENTITY"], - "type_tag": "TYPE", - "o_tag": "O", - "tags_file": "{MT_BERT_PATH}/ner/tag.dict", - "class_name": "entity_detection_parser" - }, - { - "class_name": "wiki_parser_online", - "id": "wiki_p", - "url": "https://query.wikidata.org/sparql" - }, - { - "class_name": "template_matcher", - "id": "template_m", - "num_processors": 8, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "templates_filename": "templates_eng.json" - }, - { - "class_name": "kbqa_entity_linker", - "id": "linker_entities", - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "inverted_index_filename": "inverted_index_eng.pickle", - "entities_list_filename": "entities_list.pickle", - "q2name_filename": "wiki_eng_q_to_name.pickle", - "who_entities_filename": "who_entities.pickle", - "use_hdt": false, - "wiki_parser": "#wiki_p", - "use_prefix_tree": false - }, - { - "class_name": "kbqa_entity_linker", - "id": "linker_types", - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "inverted_index_filename": "inverted_index_types_eng.pickle", - "entities_list_filename": "types_list.pickle", - "q2name_filename": "wiki_eng_q_to_name_types.pickle", - "use_hdt": false, - "wiki_parser": "#wiki_p", - "use_prefix_tree": false - }, - { - "class_name": "rel_ranking_infer", - "id": "rel_r_inf", - "ranker": {"config_path": "{CONFIGS_PATH}/ranking/rel_ranking.json"}, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "rel_q2name_filename": "wiki_dict_properties.pickle", - "rels_to_leave": 40 - }, - { - "class_name": "query_generator_online", - "id": "query_g", - "linker_entities": "#linker_entities", - 
"linker_types": "#linker_types", - "template_matcher": "#template_m", - "rel_ranker": "#rel_r_inf", - "wiki_parser": "#wiki_p", - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "rank_rels_filename_1": "rels_0.txt", - "rank_rels_filename_2": "rels_1.txt", - "sparql_queries_filename": "{DOWNLOADS_PATH}/wikidata/sparql_queries.json", - "entities_to_leave": 5, - "rels_to_leave": 10, - "in": ["x", "x", "template_type", "entities", "types"], - "out": ["candidate_rels_answers"] - }, - { - "class_name": "rel_ranking_bert_infer", - "bert_preprocessor": { - "class_name": "bert_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 64 - }, - "ranker":{ - "class_name": "mt_bert_reuser", - "mt_bert": "#mt_bert_kbqa", - "task_names": "rel_ranking", - "in_distribution": {"rel_ranking": 1} - }, - "wiki_parser": "#wiki_p", - "batch_size": 32, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "rel_q2name_filename": "wiki_dict_properties.pickle", - "use_mt_bert": true, - "in": ["x", "candidate_rels_answers"], - "out": ["answers"] - } - ], - "out": ["answers"] - }, - "train": { - "epochs": 30, - "batch_size": 16, - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "y_pred"] - }, - { - "name": "ner_token_f1", - "inputs": ["y", "y_pred"] - } - ], - "validation_patience": 10, - "val_every_n_batches": 400, - - "log_every_n_batches": 400, - "tensorboard_log_dir": "{MT_BERT_PATH}/logs", - "show_examples": false, - "pytest_max_batches": 2, - "pytest_batch_size": 8, - "evaluation_targets": ["valid", "test"], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models_kbqa/cased_L-12_H-768_A-12", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs", - "MT_BERT_PATH": "{MODELS_PATH}/mt_bert_kbqa" - }, - "labels": { - "telegram_utils": "NERCoNLL2003Model", - "server_utils": "NER" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/embeddings/reddit_fastText/wordpunct_tok_reddit_comments_2017_11_300.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/cased_L-12_H-768_A-12_vocab_config.tar.gz", - "subdir": "{BERT_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/mt_bert.tar.gz", - "subdir": "{MT_BERT_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/wiki_eng_files.tar.gz", - "subdir": "{DOWNLOADS_PATH}/wikidata_eng" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/sparql_queries.json", - "subdir": "{DOWNLOADS_PATH}/wikidata" - } - ] - } -} diff --git a/deeppavlov/configs/kbqa/kbqa_cq_ru.json b/deeppavlov/configs/kbqa/kbqa_cq_ru.json new file mode 100644 index 0000000000..85f14eed91 --- /dev/null +++ b/deeppavlov/configs/kbqa/kbqa_cq_ru.json @@ -0,0 +1,116 @@ +{ + "chainer": { + "in": ["x"], + "in_y": ["y"], + "pipe": [ + { + "class_name": "question_sign_checker", + "in": ["x"], + "out": ["x_punct"] + }, + { + "config_path": "{CONFIGS_PATH}/entity_extraction/entity_detection_ru.json", + "in": ["x_punct"], + "out": ["entity_substr", "entity_offsets", "entity_positions", "tags", "sentences_offsets", "sentences", "probas"] + }, + { + "class_name": "answer_types_extractor", + "lang": "@ru", + "types_filename": "{DOWNLOADS_PATH}/wikidata_rus/types_labels_dict_ru.pickle", + "types_sets_filename": "{DOWNLOADS_PATH}/wikidata_rus/answer_types.pickle", + "in": ["x_punct", "entity_substr", "tags"], + "out": 
["answer_types", "f_entity_substr", "f_tags"] + }, + { + "config_path": "{CONFIGS_PATH}/entity_extraction/entity_linking_ru.json", + "id": "entity_linker" + }, + { + "class_name": "wiki_parser", + "id": "wiki_p", + "wiki_filename": "{DOWNLOADS_PATH}/wikidata/wikidata_lite.hdt", + "lang": "@ru" + }, + { + "class_name": "slovnet_syntax_parser", + "load_path": "{MODELS_PATH}/slovnet_syntax_parser", + "navec_filename": "{MODELS_PATH}/slovnet_syntax_parser/navec_news_v1_1B_250K_300d_100q.tar", + "syntax_parser_filename": "{MODELS_PATH}/slovnet_syntax_parser/slovnet_syntax_news_v1.tar", + "in": ["x_punct", "entity_offsets"], + "out": ["syntax_info"] + }, + { + "class_name": "ru_adj_to_noun", + "freq_dict_filename": "{DOWNLOADS_PATH}/wikidata_rus/freqrnc2011.csv", + "id": "adj2noun" + }, + { + "class_name": "tree_to_sparql", + "sparql_queries_filename": "{DOWNLOADS_PATH}/wikidata/sparql_queries.json", + "adj_to_noun": "#adj2noun", + "lang": "rus", + "in": ["syntax_info", "entity_positions"], + "out": ["x_sanitized", "query_nums", "entities_dict", "types_dict"] + }, + { + "class_name": "template_matcher", + "id": "template_m", + "num_processors": 8, + "load_path": "{DOWNLOADS_PATH}/wikidata_rus", + "templates_filename": "templates_rus.json" + }, + { + "class_name": "rel_ranking_infer", + "id": "rel_r_inf", + "ranker": {"config_path": "{CONFIGS_PATH}/ranking/rel_ranking_bert_ru.json"}, + "wiki_parser": "#wiki_p", + "batch_size": 32, + "return_all_possible_answers": true, + "return_answer_ids": false, + "load_path": "{DOWNLOADS_PATH}/wikidata_rus", + "rel_q2name_filename": "wiki_dict_properties_rus.pickle" + }, + { + "class_name": "query_generator", + "id": "query_g", + "entity_linker": "#entity_linker", + "template_matcher": "#template_m", + "rel_ranker": "#rel_r_inf", + "wiki_parser": "#wiki_p", + "load_path": "{DOWNLOADS_PATH}/wikidata", + "rank_rels_filename_1": "rels_0.txt", + "rank_rels_filename_2": "rels_1.txt", + "sparql_queries_filename": "{DOWNLOADS_PATH}/wikidata/sparql_queries.json", + "entities_to_leave": 9, + "rels_to_leave": 10, + "return_all_possible_answers": false, + "syntax_structure_known": true, + "in": ["x_punct", "x_sanitized", "query_nums", "f_entity_substr", "f_tags", "answer_types"], + "out": ["answers"] + } + ], + "out": ["answers"] + }, + "metadata": { + "variables": { + "ROOT_PATH": "~/.deeppavlov", + "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", + "MODELS_PATH": "{ROOT_PATH}/models", + "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" + }, + "download": [ + { + "url": "http://files.deeppavlov.ai/kbqa/wikidata/queries_and_rels.tar.gz", + "subdir": "{DOWNLOADS_PATH}/wikidata" + }, + { + "url": "http://files.deeppavlov.ai/kbqa/wikidata/kbqa_files_ru.tar.gz", + "subdir": "{DOWNLOADS_PATH}/wikidata_rus" + }, + { + "url": "http://files.deeppavlov.ai/deeppavlov_data/slovnet_syntax_parser.tar.gz", + "subdir": "{MODELS_PATH}/slovnet_syntax_parser" + } + ] + } +} diff --git a/deeppavlov/configs/kbqa/kbqa_cq_rus.json b/deeppavlov/configs/kbqa/kbqa_cq_rus.json deleted file mode 100644 index 2c62a33206..0000000000 --- a/deeppavlov/configs/kbqa/kbqa_cq_rus.json +++ /dev/null @@ -1,205 +0,0 @@ -{ - "chainer": { - "in": ["x_init"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "question_sign_checker", - "in": ["x_init"], - "out": ["x"] - }, - { - "config_path": "{CONFIGS_PATH}/ner/ner_bert_ent_and_type_rus.json", - "in": ["x"], - "out": ["x_tokens", "y_pred"] - }, - { - "in": ["x_tokens", "y_pred"], - "out": ["entities", "types", "entities_pos"], - "thres_proba": 0.4, - "entity_tags": 
["E-TAG"], - "type_tag": "T-TAG", - "o_tag": "O-TAG", - "tags_file": "{NER_PATH}/tag.dict", - "ignore_points": true, - "class_name": "entity_detection_parser" - }, - { - "class_name": "wiki_parser", - "id": "wiki_p", - "wiki_filename": "{DOWNLOADS_PATH}/wikidata/wikidata.hdt", - "lang": "@ru" - }, - { - "config_path": "{CONFIGS_PATH}/syntax/syntax_ru_syntagrus_bert.json", - "in": ["x"], - "out": ["syntax_info"] - }, - { - "class_name": "ru_adj_to_noun", - "freq_dict_filename": "{DOWNLOADS_PATH}/wikidata_rus/freqrnc2011.csv", - "id": "adj2noun" - }, - { - "class_name": "tree_to_sparql", - "sparql_queries_filename": "{DOWNLOADS_PATH}/wikidata/sparql_queries.json", - "adj_to_noun": "#adj2noun", - "lang": "rus", - "in": ["syntax_info", "entities_pos"], - "out": ["x_sanitized", "query_nums", "entities_dict", "types_dict"] - }, - { - "class_name": "template_matcher", - "id": "template_m", - "num_processors": 8, - "load_path": "{DOWNLOADS_PATH}/wikidata_rus", - "templates_filename": "templates_rus.json" - }, - { - "class_name": "rel_ranking_bert_infer", - "id": "entity_descr_ranking", - "ranker": {"config_path": "{CONFIGS_PATH}/classifiers/entity_ranking_bert_rus_no_mention.json"}, - "batch_size": 100, - "load_path": "{DOWNLOADS_PATH}/wikidata_rus", - "rel_q2name_filename": "q_to_descr_ru.pickle", - "rels_to_leave": 200 - }, - { - "class_name": "kbqa_entity_linker", - "id": "linker_entities", - "load_path": "{DOWNLOADS_PATH}/wikidata_rus", - "inverted_index_filename": "inverted_index_rus.pickle", - "entities_list_filename": "entities_list_rus.pickle", - "q2name_filename": "wiki_rus_q_to_name.pickle", - "entity_ranker": "#entity_descr_ranking", - "build_inverted_index": false, - "lemmatize": true, - "use_descriptions": true, - "include_mention": false, - "use_prefix_tree": false, - "lang": "ru" - }, - { - "class_name": "kbqa_entity_linker", - "id": "linker_types", - "load_path": "{DOWNLOADS_PATH}/wikidata_rus", - "inverted_index_filename": "inverted_index_types_rus.pickle", - "entities_list_filename": "types_list_rus.pickle", - "q2name_filename": "wiki_rus_q_to_name_types.pickle", - "build_inverted_index": false, - "lemmatize": true, - "use_descriptions": false, - "use_prefix_tree": false, - "lang": "ru" - }, - { - "class_name": "rel_ranking_bert_infer", - "id": "rel_r_inf", - "ranker": {"config_path": "{CONFIGS_PATH}/classifiers/rel_ranking_bert_rus.json"}, - "wiki_parser": "#wiki_p", - "batch_size": 32, - "return_all_possible_answers": false, - "return_answer_ids": false, - "load_path": "{DOWNLOADS_PATH}/wikidata_rus", - "rel_q2name_filename": "wiki_dict_properties_rus.pickle" - }, - { - "class_name": "query_generator", - "id": "query_g", - "linker_entities": "#linker_entities", - "linker_types": "#linker_types", - "template_matcher": "#template_m", - "rel_ranker": "#rel_r_inf", - "wiki_parser": "#wiki_p", - "load_path": "{DOWNLOADS_PATH}/wikidata_rus", - "rank_rels_filename_1": "rels_0.txt", - "rank_rels_filename_2": "rels_1.txt", - "sparql_queries_filename": "{DOWNLOADS_PATH}/wikidata/sparql_queries.json", - "entities_to_leave": 9, - "rels_to_leave": 10, - "return_answers": true, - "return_all_possible_answers": false, - "syntax_structure_known": true, - "in": ["x", "x_sanitized", "query_nums", "entities_dict", "types_dict"], - "out": ["answers"] - } - ], - "out": ["answers"] - }, - "train": { - "epochs": 30, - "batch_size": 16, - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "y_pred"] - }, - { - "name": "ner_token_f1", - "inputs": ["y", "y_pred"] - } - ], - "validation_patience": 10, 
- "val_every_n_batches": 400, - - "log_every_n_batches": 400, - "tensorboard_log_dir": "{NER_PATH}/logs", - "show_examples": false, - "pytest_max_batches": 2, - "pytest_batch_size": 8, - "evaluation_targets": ["valid", "test"], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12", - "NER_PATH": "{MODELS_PATH}/ner_ent_and_type_rus", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" - }, - "labels": { - "telegram_utils": "NERCoNLL2003Model", - "server_utils": "NER" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/multi_cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/ner_cq_rus.tar.gz", - "subdir": "{NER_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/rel_ranking_bert_rus.tar.gz", - "subdir": "{MODELS_PATH}/rel_ranking_bert_rus" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/entity_ranking_bert_rus_no_mention.tar.gz", - "subdir": "{MODELS_PATH}/entity_ranking_bert_rus_no_mention" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/syntax_parser/syntax_ru_syntagrus_bert.tar.gz", - "subdir": "{MODELS_PATH}/syntax_ru_syntagrus" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/wiki_rus_files.tar.gz", - "subdir": "{DOWNLOADS_PATH}/wikidata_rus" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/sparql_queries.json", - "subdir": "{DOWNLOADS_PATH}/wikidata" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/wikidata.hdt", - "subdir": "{DOWNLOADS_PATH}/wikidata" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/wikidata.hdt.index.v1-1", - "subdir": "{DOWNLOADS_PATH}/wikidata" - } - ] - } -} diff --git a/deeppavlov/configs/kbqa/kbqa_cq_sep.json b/deeppavlov/configs/kbqa/kbqa_cq_sep.json deleted file mode 100644 index 418522958e..0000000000 --- a/deeppavlov/configs/kbqa/kbqa_cq_sep.json +++ /dev/null @@ -1,175 +0,0 @@ -{ - "chainer": { - "in": ["x_init"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "question_sign_checker", - "in": ["x_init"], - "out": ["x"] - }, - { - "config_path": "{CONFIGS_PATH}/ner/ner_lcquad_bert_ent_and_type.json", - "in": ["x"], - "out": ["x_tokens", "y_pred"] - }, - { - "in": ["x_tokens", "y_pred"], - "out": ["entities", "types", "entities_pos"], - "entity_tags": ["E-TAG"], - "type_tag": "T-TAG", - "o_tag": "O-TAG", - "tags_file": "{NER_PATH}/tag.dict", - "class_name": "entity_detection_parser" - }, - { - "class_name": "template_matcher", - "id": "template_m", - "num_processors": 16, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "templates_filename": "templates_eng.json" - }, - { - "config_path": "{CONFIGS_PATH}/classifiers/query_pr.json", - "in": ["x"], - "out": ["template_type"] - }, - { - "class_name": "kbqa_entity_linker", - "id": "linker_entities", - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "inverted_index_filename": "inverted_index_eng.pickle", - "entities_list_filename": "entities_list.pickle", - "q2name_filename": "wiki_eng_q_to_name.pickle", - "who_entities_filename": "who_entities.pickle", - "build_inverted_index": false, - "use_descriptions": false, - "use_prefix_tree": false - }, - { - "class_name": "kbqa_entity_linker", - "id": "linker_types", - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "inverted_index_filename": "inverted_index_types_eng.pickle", - 
"entities_list_filename": "types_list.pickle", - "q2name_filename": "wiki_eng_q_to_name_types.pickle", - "build_inverted_index": false, - "use_descriptions": false, - "use_prefix_tree": false - }, - { - "class_name": "rel_ranking_infer", - "id": "rel_r_inf", - "ranker": {"config_path": "{CONFIGS_PATH}/ranking/rel_ranking.json"}, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "rel_q2name_filename": "wiki_dict_properties.pickle", - "rels_to_leave": 40 - }, - { - "class_name": "query_generator", - "id": "query_g", - "linker_entities": {"config_path": "{CONFIGS_PATH}/kbqa/kbqa_entity_linking.json"}, - "linker_types": "#linker_types", - "template_matcher": "#template_m", - "rel_ranker": "#rel_r_inf", - "wiki_parser": {"config_path": "{CONFIGS_PATH}/kbqa/wiki_parser.json"}, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "rank_rels_filename_1": "rels_0.txt", - "rank_rels_filename_2": "rels_1.txt", - "sparql_queries_filename": "{DOWNLOADS_PATH}/wikidata/sparql_queries.json", - "wiki_file_format": "pickle", - "entities_to_leave": 5, - "rels_to_leave": 10, - "in": ["x", "x", "template_type", "entities", "types"], - "out": ["candidate_answers"] - }, - { - "class_name": "rel_ranking_bert_infer", - "ranker": {"config_path": "{CONFIGS_PATH}/classifiers/rel_ranking_bert.json"}, - "wiki_parser": {"config_path": "{CONFIGS_PATH}/kbqa/wiki_parser.json"}, - "batch_size": 32, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "rel_q2name_filename": "wiki_dict_properties.pickle", - "in": ["x", "candidate_answers"], - "out": ["answers"] - } - ], - "out": ["answers"] - }, - "train": { - "epochs": 30, - "batch_size": 16, - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "y_pred"] - }, - { - "name": "ner_token_f1", - "inputs": ["y", "y_pred"] - } - ], - "validation_patience": 10, - "val_every_n_batches": 400, - - "log_every_n_batches": 400, - "tensorboard_log_dir": "{NER_PATH}/logs", - "show_examples": false, - "pytest_max_batches": 2, - "pytest_batch_size": 8, - "evaluation_targets": ["valid", "test"], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models_kbqa/cased_L-12_H-768_A-12", - "NER_PATH": "{MODELS_PATH}/ner_lcquad_ent_and_type", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" - }, - "labels": { - "telegram_utils": "NERCoNLL2003Model", - "server_utils": "NER" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/embeddings/reddit_fastText/wordpunct_tok_reddit_comments_2017_11_300.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models_kbqa" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/query_prediction.tar.gz", - "subdir": "{MODELS_PATH}/classifiers/query_prediction" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/ner_lcquad.tar.gz", - "subdir": "{MODELS_PATH}/ner_lcquad_ent_and_type" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/rel_ranking.tar.gz", - "subdir": "{MODELS_PATH}/rel_ranking" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/rel_ranking_bert.tar.gz", - "subdir": "{MODELS_PATH}/rel_ranking_bert" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/wiki_eng_files.tar.gz", - "subdir": "{DOWNLOADS_PATH}/wikidata_eng" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/sparql_queries.json", - "subdir": 
"{DOWNLOADS_PATH}/wikidata" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/wikidata_compr.pickle", - "subdir": "{DOWNLOADS_PATH}/wikidata" - } - ] - } -} diff --git a/deeppavlov/configs/kbqa/kbqa_entity_linking.json b/deeppavlov/configs/kbqa/kbqa_entity_linking.json deleted file mode 100644 index 821c67d9e4..0000000000 --- a/deeppavlov/configs/kbqa/kbqa_entity_linking.json +++ /dev/null @@ -1,55 +0,0 @@ -{ - "chainer": { - "in": ["entity_substr", "template", "context"], - "pipe": [ - { - "class_name": "rel_ranking_bert_infer", - "id": "entity_descr_ranking", - "ranker": {"config_path": "{CONFIGS_PATH}/classifiers/entity_ranking_bert_eng_no_mention.json"}, - "batch_size": 100, - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "rel_q2name_filename": "q_to_descr_en.pickle", - "rels_to_leave": 20 - }, - { - "class_name": "kbqa_entity_linker", - "in": ["entity_substr", "template", "context"], - "out": ["entity_ids", "confidences"], - "load_path": "{DOWNLOADS_PATH}/wikidata_eng", - "inverted_index_filename": "inverted_index_eng.pickle", - "entities_list_filename": "entities_list.pickle", - "q2name_filename": "wiki_eng_q_to_name.pickle", - "q2descr_filename": "q_to_descr_en.pickle", - "who_entities_filename": "who_entities.pickle", - "entity_ranker": "#entity_descr_ranking", - "build_inverted_index": false, - "use_descriptions": false, - "use_prefix_tree": false, - "num_entities_to_return": 2 - } - ], - "out": ["entity_ids", "confidences"] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/kbqa_entity_linking_eng.tar.gz", - "subdir": "{DOWNLOADS_PATH}/wikidata_eng" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/entity_ranking_bert_eng_no_mention.tar.gz", - "subdir": "{MODELS_PATH}/entity_ranking_bert_eng_no_mention" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/wikidata/q_to_descr_en.pickle", - "subdir": "{DOWNLOADS_PATH}/wikidata_eng" - } - ] - } -} diff --git a/deeppavlov/configs/kbqa/kbqa_mt_bert_train.json b/deeppavlov/configs/kbqa/kbqa_mt_bert_train.json deleted file mode 100644 index 7aef5caf22..0000000000 --- a/deeppavlov/configs/kbqa/kbqa_mt_bert_train.json +++ /dev/null @@ -1,255 +0,0 @@ -{ - "dataset_reader": { - "class_name": "multitask_reader", - "data_path": "null", - "tasks": { - "ner_lcquad": { - "reader_class_name": "sq_reader", - "data_path": "{DOWNLOADS_PATH}/lcquad/entity_and_type_detection_BIO.pickle" - }, - "query_prediction": { - "reader_class_name": "basic_classification_reader", - "x": "Question", - "y": "Class", - "data_path": "{DOWNLOADS_PATH}/query_prediction" - }, - "rel_ranking": { - "reader_class_name": "rel_ranking_reader", - "data_path": "{DOWNLOADS_PATH}/rel_ranking_bert", - "do_lower_case": false - } - } - }, - "dataset_iterator": { - "class_name": "multitask_iterator", - "tasks": { - "ner_lcquad": {"iterator_class_name": "data_learning_iterator"}, - "query_prediction": {"iterator_class_name": "basic_classification_iterator"}, - "rel_ranking": { - "iterator_class_name": "siamese_iterator", - "seed": 243, - "len_valid": 500 - } - } - }, - "chainer": { - "in": ["x_ner", "x_qr", "texts_a_b"], - "in_y": ["y_ner", "y_qr", "y_rr"], - "pipe": [ - { - "class_name": "input_splitter", - "keys_to_extract": [0, 1], - "in": ["texts_a_b"], - "out": ["text_a", "text_b"] - }, - { - "class_name": "bert_ner_preprocessor", - 
"vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 512, - "max_subword_length": 15, - "token_maksing_prob": 0.0, - "in": ["x_ner"], - "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "pred_subword_mask"] - }, - { - "class_name": "mask", - "in": ["x_subword_tokens"], - "out": ["x_subword_mask"] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "unk_token": ["O"], - "pad_with_zeros": true, - "save_path": "{MT_BERT_PATH}/ner/tag.dict", - "load_path": "{MT_BERT_PATH}/ner/tag.dict", - "fit_on": ["y_ner"], - "in": ["y_ner"], - "out": ["y_ner_ind"] - }, - - { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": false, - "max_seq_length": 64, - "in": ["x_qr"], - "out": ["bert_features_qr"] - }, - { - "id": "queries_vocab", - "class_name": "simple_vocab", - "fit_on": ["y_qr"], - "save_path": "{MT_BERT_PATH}/query_prediction/classes.dict", - "load_path": "{MT_BERT_PATH}/query_prediction/classes.dict", - "in": "y_qr", - "out": "y_qr_ids" - }, - { - "in": "y_qr_ids", - "out": "y_qr_onehot", - "class_name": "one_hotter", - "depth": "#queries_vocab.len", - "single_vector": true - }, - - { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": false, - "max_seq_length": 64, - "in": ["text_a", "text_b"], - "out": ["bert_features_rr"] - }, - { - "id": "mt_bert", - "class_name": "mt_bert", - "save_path": "{MT_BERT_PATH}/model", - "load_path": "{MT_BERT_PATH}/model", - "bert_config_file": "{BERT_PATH}/bert_config.json", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", - "attention_probs_keep_prob": 0.5, - "freeze_embeddings": false, - "body_learning_rate": 3e-5, - "min_body_learning_rate": 2e-7, - "learning_rate_drop_patience": 10, - "learning_rate_drop_div": 1.5, - "load_before_drop": true, - "optimizer": "tf.train:AdamOptimizer", - "clip_norm": 1.0, - "tasks": { - "ner": { - "class_name": "mt_bert_seq_tagging_task", - "n_tags": "#tag_vocab.len", - "use_crf": false, - "keep_prob": 0.5, - "return_probas": false, - "encoder_layer_ids": [-1], - "learning_rate": 1e-4 - }, - "query_prediction": { - "class_name": "mt_bert_classification_task", - "n_classes": "#queries_vocab.len", - "return_probas": true, - "one_hot_labels": true, - "keep_prob": 0.5, - "learning_rate": 1e-4 - }, - "rel_ranking": { - "class_name": "mt_bert_classification_task", - "n_classes": 2, - "return_probas": false, - "one_hot_labels": false, - "keep_prob": 0.5, - "learning_rate": 1e-4 - } - }, - "in_distribution": {"ner": 3, "query_prediction": 1, "rel_ranking": 1}, - "in": { - "x_subword_tok_ids": "x_subword_tok_ids", - "x_subword_mask": "x_subword_mask", - "pred_subword_mask": "pred_subword_mask", - "bert_features_qr": "bert_features_qr", - "bert_features_rr": "bert_features_rr" - }, - "in_y_distribution": {"ner": 1, "query_prediction": 1, "rel_ranking": 1}, - "in_y": { - "y_ner_ind": "y_ner_ind", - "y_qr_onehot": "y_qr_onehot", - "y_rr": "y_rr" - }, - "out": ["ner_ids", "qr_probas", "rr_preds"] - }, - { - "in": "ner_ids", - "out": "ner_labels", - "ref": "tag_vocab" - }, - { - "in": "qr_probas", - "out": "qr_ids", - "max_proba": true, - "class_name": "proba2labels" - }, - { - "in": "qr_ids", - "out": "qr_labels", - "ref": "queries_vocab" - } - ], - "out": ["ner_labels", "qr_labels", "rr_preds"] - }, - "train": { - "epochs": 5, - "batch_size": 32, - "metrics": [ - { - "name": "average__ner_f1__f1_macro__f1", - 
"inputs": ["y_ner", "ner_labels", "y_qr", "qr_labels", "y_rr", "rr_preds"] - }, - { - "name": "ner_f1", - "inputs": ["y_ner", "ner_labels"] - }, - { - "name": "f1_macro", - "inputs": ["y_qr", "qr_labels"] - }, - { - "name": "f1", - "inputs": ["y_rr", "rr_preds"] - } - ], - "validation_patience": 20, - "val_every_n_batches": 400, - - "log_every_n_batches": 400, - "tensorboard_log_dir": "{MT_BERT_PATH}/logs", - "show_examples": false, - "pytest_max_batches": 2, - "pytest_batch_size": 8, - "evaluation_targets": ["valid", "test"], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12", - "MT_BERT_PATH": "{MODELS_PATH}/mt_bert_kbqa", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/kbqa/datasets/query_prediction.tar.gz", - "subdir": "{DOWNLOADS_PATH}/query_prediction" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/datasets/rel_ranking_bert.tar.gz", - "subdir": "{DOWNLOADS_PATH}/rel_ranking_bert" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/mt_bert.tar.gz", - "subdir": "{MT_BERT_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/reddit_fastText/wordpunct_tok_reddit_comments_2017_11_300.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/rel_ranking.tar.gz", - "subdir": "{MODELS_PATH}/rel_ranking" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/datasets/entity_and_type_detection_BIO.pickle", - "subdir": "{DOWNLOADS_PATH}/lcquad" - } - ] - } -} diff --git a/deeppavlov/configs/morpho_tagger/BERT/morpho_ru_syntagrus_bert.json b/deeppavlov/configs/morpho_tagger/BERT/morpho_ru_syntagrus_bert.json deleted file mode 100644 index e7589c0e9e..0000000000 --- a/deeppavlov/configs/morpho_tagger/BERT/morpho_ru_syntagrus_bert.json +++ /dev/null @@ -1,166 +0,0 @@ -{ - "dataset_reader": { - "class_name": "morphotagger_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/UD2.3_source", - "language": "ru_syntagrus", - "data_types": [ - "train", "dev", "test" - ] - }, - "dataset_iterator": { - "class_name": "morphotagger_dataset" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "in": [ - "x" - ], - "class_name": "lazy_tokenizer", - "out": [ - "x_words" - ] - }, - { - "class_name": "bert_ner_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 512, - "max_subword_length": 15, - "subword_mask_mode": "last", - "token_masking_prob": 0.0, - "in": ["x_words"], - "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", "attention_mask"] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "min_freq": 3, - "fit_on": [ - "y" - ], - "in": ["y"], - "out": ["y_ind"], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "pad_with_zeros": true, - "save_path": "{WORK_PATH}/tag.dict", - "load_path": "{WORK_PATH}/tag.dict" - }, - { - "class_name": "bert_sequence_tagger", - "n_tags": "#tag_vocab.len", - "keep_prob": 0.1, - "bert_config_file": "{BERT_PATH}/bert_config.json", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", - "attention_probs_keep_prob": 0.5, - "use_crf": false, - "return_probas": false, - "encoder_layer_ids": [6, 7, 8, 9, 10, 11], - 
"optimizer": "tf.train:AdamOptimizer", - "learning_rate": 1e-3, - "bert_learning_rate": 2e-5, - "min_learning_rate": 1e-7, - "learning_rate_drop_patience": 30, - "learning_rate_drop_div": 1.5, - "load_before_drop": true, - "clip_norm": null, - "save_path": "{WORK_PATH}/model", - "load_path": "{WORK_PATH}/model", - "in": ["x_subword_tok_ids", "attention_mask", "startofword_markers"], - "in_y": ["y_ind"], - "out": ["y_predicted_ind"] - }, - { - "ref": "tag_vocab", - "in": ["y_predicted_ind"], - "out": ["y_predicted"] - }, - { - "in": [ - "x_words", - "y_predicted" - ], - "out": [ - "y_lemmas" - ], - "class_name": "UD_pymorphy_lemmatizer", - "end": "\n" - }, - { - "in": [ - "x_words", - "y_predicted", - "y_lemmas" - ], - "out": [ - "y_prettified" - ], - "id": "prettifier", - "class_name": "lemmatized_output_prettifier", - "end": "\n" - } - ], - "out": [ - "y_prettified" - ] - }, - "train": { - "epochs": 10, - "batch_size": 32, - "metrics": [ - { - "name": "per_token_accuracy", - "inputs": [ - "y", - "y_predicted" - ] - }, - { - "name": "accuracy", - "inputs": [ - "y", - "y_predicted" - ] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "val_every_n_batches": 300, - - "tensorboard_log_dir": "{WORK_PATH}/logs", - "show_examples": false, - "pytest_max_batches": 2, - "pytest_batch_size": 8, - "evaluation_targets": ["valid", "test"], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models/rubert_cased_L-12_H-768_A-12_v1", - "WORK_PATH": "{MODELS_PATH}/morpho_ru_syntagrus" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/morpho_tagger/BERT/morpho_ru_syntagrus_bert.tar.gz", - "subdir": "{WORK_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/rubert_cased_L-12_H-768_A-12_v1.tar.gz", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/morpho_tagger/UD2.3/ru_syntagrus.tar.gz", - "subdir": "{DOWNLOADS_PATH}/UD2.3_source/ru_syntagrus" - } - ] - } -} diff --git a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_ar.json b/deeppavlov/configs/morpho_tagger/UD2.0/morpho_ar.json deleted file mode 100644 index 1abd931d5d..0000000000 --- a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_ar.json +++ /dev/null @@ -1,173 +0,0 @@ -{ - "dataset_reader": { - "class_name": "morphotagger_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/UD2.0_source", - "language": "ar", - "data_types": [ - "train", - "dev", - "test" - ] - }, - "dataset_iterator": { - "class_name": "morphotagger_dataset" - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "in": [ - "x" - ], - "class_name": "lazy_tokenizer", - "out": [ - "x_tokens" - ] - }, - { - "id": "char_splitting_lowercase_preprocessor", - "class_name": "char_splitting_lowercase_preprocessor", - "in": [ - "x_tokens" - ], - "out": [ - "x_chars_lowered_marked" - ] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ar/tag.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ar/tag.dict" - }, - { - "id": "char_vocab", - "class_name": "simple_vocab", - "min_freq": 3, - "fit_on": [ - "x_chars_lowered_marked" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ar/char.dict", 
- "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ar/char.dict" - }, - { - "in": [ - "x_chars_lowered_marked" - ], - "in_y": [ - "y" - ], - "out": [ - "y_predicted" - ], - "class_name": "morpho_tagger", - "main": true, - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ar/model.hdf5", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ar/model.hdf5", - "tags": "#tag_vocab", - "symbols": "#char_vocab", - "verbose": 1, - "char_embeddings_size": 32, - "char_window_size": [ - 1, - 2, - 3, - 4, - 5, - 6, - 7 - ], - "word_lstm_units": 128, - "conv_dropout": 0.0, - "char_conv_layers": 1, - "char_highway_layers": 1, - "highway_dropout": 0.0, - "word_lstm_layers": 1, - "char_filter_multiple": 50, - "intermediate_dropout": 0.0, - "word_dropout": 0.2, - "lstm_dropout": 0.2, - "regularizer": 0.01 - }, - { - "in": [ - "x_tokens", - "y_predicted" - ], - "out": [ - "y_prettified" - ], - "id": "prettifier", - "class_name": "tag_output_prettifier", - "end": "\n" - } - ], - "out": [ - "y_prettified" - ] - }, - "train": { - "epochs": 50, - "batch_size": 32, - "metrics": [ - { - "name": "per_token_accuracy", - "inputs": [ - "y", - "y_predicted" - ] - }, - { - "name": "accuracy", - "inputs": [ - "y", - "y_predicted" - ] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "RESULTS_PATH": "{ROOT_PATH}/results" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/morpho_tagger/UD2.0/ar.tar.gz", - "subdir": "{MODELS_PATH}/morpho_tagger/UD2.0/ar" - }, - { - "url": "http://files.deeppavlov.ai/datasets/UD2.0_source/ar.tar.gz", - "subdir": "{DOWNLOADS_PATH}/UD2.0_source/ar" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_cs.json b/deeppavlov/configs/morpho_tagger/UD2.0/morpho_cs.json deleted file mode 100644 index 047a08cf24..0000000000 --- a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_cs.json +++ /dev/null @@ -1,173 +0,0 @@ -{ - "dataset_reader": { - "class_name": "morphotagger_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/UD2.0_source", - "language": "cs", - "data_types": [ - "train", - "dev", - "test" - ] - }, - "dataset_iterator": { - "class_name": "morphotagger_dataset" - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "in": [ - "x" - ], - "class_name": "lazy_tokenizer", - "out": [ - "x_tokens" - ] - }, - { - "id": "char_splitting_lowercase_preprocessor", - "class_name": "char_splitting_lowercase_preprocessor", - "in": [ - "x_tokens" - ], - "out": [ - "x_chars_lowered_marked" - ] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/cs/tag.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/cs/tag.dict" - }, - { - "id": "char_vocab", - "class_name": "simple_vocab", - "min_freq": 3, - "fit_on": [ - "x_chars_lowered_marked" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/cs/char.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/cs/char.dict" - }, - { - "in": [ - "x_chars_lowered_marked" - ], - "in_y": [ - "y" - ], - "out": [ - "y_predicted" - ], - "class_name": "morpho_tagger", - "main": true, - "save_path": 
"{MODELS_PATH}/morpho_tagger/UD2.0/cs/model.hdf5", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/cs/model.hdf5", - "tags": "#tag_vocab", - "symbols": "#char_vocab", - "verbose": 1, - "char_embeddings_size": 32, - "char_window_size": [ - 1, - 2, - 3, - 4, - 5, - 6, - 7 - ], - "word_lstm_units": 128, - "conv_dropout": 0.0, - "char_conv_layers": 1, - "char_highway_layers": 1, - "highway_dropout": 0.0, - "word_lstm_layers": 1, - "char_filter_multiple": 50, - "intermediate_dropout": 0.0, - "word_dropout": 0.2, - "lstm_dropout": 0.2, - "regularizer": 0.01 - }, - { - "in": [ - "x_tokens", - "y_predicted" - ], - "out": [ - "y_prettified" - ], - "id": "prettifier", - "class_name": "tag_output_prettifier", - "end": "\n" - } - ], - "out": [ - "y_prettified" - ] - }, - "train": { - "epochs": 50, - "batch_size": 32, - "metrics": [ - { - "name": "per_token_accuracy", - "inputs": [ - "y", - "y_predicted" - ] - }, - { - "name": "accuracy", - "inputs": [ - "y", - "y_predicted" - ] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "RESULTS_PATH": "{ROOT_PATH}/results" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/morpho_tagger/UD2.0/cs.tar.gz", - "subdir": "{MODELS_PATH}/morpho_tagger/UD2.0/cs" - }, - { - "url": "http://files.deeppavlov.ai/datasets/UD2.0_source/cs.tar.gz", - "subdir": "{DOWNLOADS_PATH}/UD2.0_source/cs" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_de.json b/deeppavlov/configs/morpho_tagger/UD2.0/morpho_de.json deleted file mode 100644 index c0c7aa19f1..0000000000 --- a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_de.json +++ /dev/null @@ -1,173 +0,0 @@ -{ - "dataset_reader": { - "class_name": "morphotagger_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/UD2.0_source", - "language": "de", - "data_types": [ - "train", - "dev", - "test" - ] - }, - "dataset_iterator": { - "class_name": "morphotagger_dataset" - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "in": [ - "x" - ], - "class_name": "lazy_tokenizer", - "out": [ - "x_tokens" - ] - }, - { - "id": "char_splitting_lowercase_preprocessor", - "class_name": "char_splitting_lowercase_preprocessor", - "in": [ - "x_tokens" - ], - "out": [ - "x_chars_lowered_marked" - ] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/de/tag.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/de/tag.dict" - }, - { - "id": "char_vocab", - "class_name": "simple_vocab", - "min_freq": 3, - "fit_on": [ - "x_chars_lowered_marked" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/de/char.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/de/char.dict" - }, - { - "in": [ - "x_chars_lowered_marked" - ], - "in_y": [ - "y" - ], - "out": [ - "y_predicted" - ], - "class_name": "morpho_tagger", - "main": true, - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/de/model.hdf5", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/de/model.hdf5", - "tags": "#tag_vocab", - "symbols": "#char_vocab", - "verbose": 1, - "char_embeddings_size": 32, - "char_window_size": [ - 1, - 2, 
- 3, - 4, - 5, - 6, - 7 - ], - "word_lstm_units": 128, - "conv_dropout": 0.0, - "char_conv_layers": 1, - "char_highway_layers": 1, - "highway_dropout": 0.0, - "word_lstm_layers": 1, - "char_filter_multiple": 50, - "intermediate_dropout": 0.0, - "word_dropout": 0.2, - "lstm_dropout": 0.2, - "regularizer": 0.01 - }, - { - "in": [ - "x_tokens", - "y_predicted" - ], - "out": [ - "y_prettified" - ], - "id": "prettifier", - "class_name": "tag_output_prettifier", - "end": "\n" - } - ], - "out": [ - "y_prettified" - ] - }, - "train": { - "epochs": 50, - "batch_size": 32, - "metrics": [ - { - "name": "per_token_accuracy", - "inputs": [ - "y", - "y_predicted" - ] - }, - { - "name": "accuracy", - "inputs": [ - "y", - "y_predicted" - ] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "RESULTS_PATH": "{ROOT_PATH}/results" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/morpho_tagger/UD2.0/de.tar.gz", - "subdir": "{MODELS_PATH}/morpho_tagger/UD2.0/de" - }, - { - "url": "http://files.deeppavlov.ai/datasets/UD2.0_source/de.tar.gz", - "subdir": "{DOWNLOADS_PATH}/UD2.0_source/de" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_en.json b/deeppavlov/configs/morpho_tagger/UD2.0/morpho_en.json deleted file mode 100644 index dd771f8216..0000000000 --- a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_en.json +++ /dev/null @@ -1,173 +0,0 @@ -{ - "dataset_reader": { - "class_name": "morphotagger_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/UD2.0_source", - "language": "en", - "data_types": [ - "train", - "dev", - "test" - ] - }, - "dataset_iterator": { - "class_name": "morphotagger_dataset" - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "in": [ - "x" - ], - "class_name": "lazy_tokenizer", - "out": [ - "x_tokens" - ] - }, - { - "id": "char_splitting_lowercase_preprocessor", - "class_name": "char_splitting_lowercase_preprocessor", - "in": [ - "x_tokens" - ], - "out": [ - "x_chars_lowered_marked" - ] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/en/tag.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/en/tag.dict" - }, - { - "id": "char_vocab", - "class_name": "simple_vocab", - "min_freq": 3, - "fit_on": [ - "x_chars_lowered_marked" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/en/char.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/en/char.dict" - }, - { - "in": [ - "x_chars_lowered_marked" - ], - "in_y": [ - "y" - ], - "out": [ - "y_predicted" - ], - "class_name": "morpho_tagger", - "main": true, - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/en/model.hdf5", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/en/model.hdf5", - "tags": "#tag_vocab", - "symbols": "#char_vocab", - "verbose": 1, - "char_embeddings_size": 32, - "char_window_size": [ - 1, - 2, - 3, - 4, - 5, - 6, - 7 - ], - "word_lstm_units": 128, - "conv_dropout": 0.0, - "char_conv_layers": 1, - "char_highway_layers": 1, - "highway_dropout": 0.0, - "word_lstm_layers": 1, - "char_filter_multiple": 50, - "intermediate_dropout": 0.0, - 
"word_dropout": 0.2, - "lstm_dropout": 0.2, - "regularizer": 0.01 - }, - { - "in": [ - "x_tokens", - "y_predicted" - ], - "out": [ - "y_prettified" - ], - "id": "prettifier", - "class_name": "tag_output_prettifier", - "end": "\n" - } - ], - "out": [ - "y_prettified" - ] - }, - "train": { - "epochs": 50, - "batch_size": 32, - "metrics": [ - { - "name": "per_token_accuracy", - "inputs": [ - "y", - "y_predicted" - ] - }, - { - "name": "accuracy", - "inputs": [ - "y", - "y_predicted" - ] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "RESULTS_PATH": "{ROOT_PATH}/results" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/morpho_tagger/UD2.0/en.tar.gz", - "subdir": "{MODELS_PATH}/morpho_tagger/UD2.0/en" - }, - { - "url": "http://files.deeppavlov.ai/datasets/UD2.0_source/en.tar.gz", - "subdir": "{DOWNLOADS_PATH}/UD2.0_source/en" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_es_ancora.json b/deeppavlov/configs/morpho_tagger/UD2.0/morpho_es_ancora.json deleted file mode 100644 index ce6c39f736..0000000000 --- a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_es_ancora.json +++ /dev/null @@ -1,173 +0,0 @@ -{ - "dataset_reader": { - "class_name": "morphotagger_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/UD2.0_source", - "language": "es_ancora", - "data_types": [ - "train", - "dev", - "test" - ] - }, - "dataset_iterator": { - "class_name": "morphotagger_dataset" - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "in": [ - "x" - ], - "class_name": "lazy_tokenizer", - "out": [ - "x_tokens" - ] - }, - { - "id": "char_splitting_lowercase_preprocessor", - "class_name": "char_splitting_lowercase_preprocessor", - "in": [ - "x_tokens" - ], - "out": [ - "x_chars_lowered_marked" - ] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/es_ancora/tag.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/es_ancora/tag.dict" - }, - { - "id": "char_vocab", - "class_name": "simple_vocab", - "min_freq": 3, - "fit_on": [ - "x_chars_lowered_marked" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/es_ancora/char.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/es_ancora/char.dict" - }, - { - "in": [ - "x_chars_lowered_marked" - ], - "in_y": [ - "y" - ], - "out": [ - "y_predicted" - ], - "class_name": "morpho_tagger", - "main": true, - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/es_ancora/model.hdf5", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/es_ancora/model.hdf5", - "tags": "#tag_vocab", - "symbols": "#char_vocab", - "verbose": 1, - "char_embeddings_size": 32, - "char_window_size": [ - 1, - 2, - 3, - 4, - 5, - 6, - 7 - ], - "word_lstm_units": 128, - "conv_dropout": 0.0, - "char_conv_layers": 1, - "char_highway_layers": 1, - "highway_dropout": 0.0, - "word_lstm_layers": 1, - "char_filter_multiple": 50, - "intermediate_dropout": 0.0, - "word_dropout": 0.2, - "lstm_dropout": 0.2, - "regularizer": 0.01 - }, - { - "in": [ - "x_tokens", - "y_predicted" - ], - "out": [ - "y_prettified" - ], - "id": "prettifier", - 
"class_name": "tag_output_prettifier", - "end": "\n" - } - ], - "out": [ - "y_prettified" - ] - }, - "train": { - "epochs": 50, - "batch_size": 32, - "metrics": [ - { - "name": "per_token_accuracy", - "inputs": [ - "y", - "y_predicted" - ] - }, - { - "name": "accuracy", - "inputs": [ - "y", - "y_predicted" - ] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "RESULTS_PATH": "{ROOT_PATH}/results" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/morpho_tagger/UD2.0/es_ancora.tar.gz", - "subdir": "{MODELS_PATH}/morpho_tagger/UD2.0/es_ancora" - }, - { - "url": "http://files.deeppavlov.ai/datasets/UD2.0_source/es_ancora.tar.gz", - "subdir": "{DOWNLOADS_PATH}/UD2.0_source/es_ancora" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_fr.json b/deeppavlov/configs/morpho_tagger/UD2.0/morpho_fr.json deleted file mode 100644 index 7c944e807a..0000000000 --- a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_fr.json +++ /dev/null @@ -1,173 +0,0 @@ -{ - "dataset_reader": { - "class_name": "morphotagger_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/UD2.0_source", - "language": "fr", - "data_types": [ - "train", - "dev", - "test" - ] - }, - "dataset_iterator": { - "class_name": "morphotagger_dataset" - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "in": [ - "x" - ], - "class_name": "lazy_tokenizer", - "out": [ - "x_tokens" - ] - }, - { - "id": "char_splitting_lowercase_preprocessor", - "class_name": "char_splitting_lowercase_preprocessor", - "in": [ - "x_tokens" - ], - "out": [ - "x_chars_lowered_marked" - ] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/fr/tag.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/fr/tag.dict" - }, - { - "id": "char_vocab", - "class_name": "simple_vocab", - "min_freq": 3, - "fit_on": [ - "x_chars_lowered_marked" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/fr/char.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/fr/char.dict" - }, - { - "in": [ - "x_chars_lowered_marked" - ], - "in_y": [ - "y" - ], - "out": [ - "y_predicted" - ], - "class_name": "morpho_tagger", - "main": true, - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/fr/model.hdf5", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/fr/model.hdf5", - "tags": "#tag_vocab", - "symbols": "#char_vocab", - "verbose": 1, - "char_embeddings_size": 32, - "char_window_size": [ - 1, - 2, - 3, - 4, - 5, - 6, - 7 - ], - "word_lstm_units": 128, - "conv_dropout": 0.0, - "char_conv_layers": 1, - "char_highway_layers": 1, - "highway_dropout": 0.0, - "word_lstm_layers": 1, - "char_filter_multiple": 50, - "intermediate_dropout": 0.0, - "word_dropout": 0.2, - "lstm_dropout": 0.2, - "regularizer": 0.01 - }, - { - "in": [ - "x_tokens", - "y_predicted" - ], - "out": [ - "y_prettified" - ], - "id": "prettifier", - "class_name": "tag_output_prettifier", - "end": "\n" - } - ], - "out": [ - "y_prettified" - ] - }, - "train": { - "epochs": 50, - "batch_size": 32, - "metrics": [ - { - "name": "per_token_accuracy", - "inputs": [ - "y", - 
"y_predicted" - ] - }, - { - "name": "accuracy", - "inputs": [ - "y", - "y_predicted" - ] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "RESULTS_PATH": "{ROOT_PATH}/results" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/morpho_tagger/UD2.0/fr.tar.gz", - "subdir": "{MODELS_PATH}/morpho_tagger/UD2.0/fr" - }, - { - "url": "http://files.deeppavlov.ai/datasets/UD2.0_source/fr.tar.gz", - "subdir": "{DOWNLOADS_PATH}/UD2.0_source/fr" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_hi.json b/deeppavlov/configs/morpho_tagger/UD2.0/morpho_hi.json deleted file mode 100644 index ff10e2e4ba..0000000000 --- a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_hi.json +++ /dev/null @@ -1,173 +0,0 @@ -{ - "dataset_reader": { - "class_name": "morphotagger_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/UD2.0_source", - "language": "hi", - "data_types": [ - "train", - "dev", - "test" - ] - }, - "dataset_iterator": { - "class_name": "morphotagger_dataset" - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "in": [ - "x" - ], - "class_name": "lazy_tokenizer", - "out": [ - "x_tokens" - ] - }, - { - "id": "char_splitting_lowercase_preprocessor", - "class_name": "char_splitting_lowercase_preprocessor", - "in": [ - "x_tokens" - ], - "out": [ - "x_chars_lowered_marked" - ] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/hi/tag.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/hi/tag.dict" - }, - { - "id": "char_vocab", - "class_name": "simple_vocab", - "min_freq": 3, - "fit_on": [ - "x_chars_lowered_marked" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/hi/char.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/hi/char.dict" - }, - { - "in": [ - "x_chars_lowered_marked" - ], - "in_y": [ - "y" - ], - "out": [ - "y_predicted" - ], - "class_name": "morpho_tagger", - "main": true, - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/hi/model.hdf5", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/hi/model.hdf5", - "tags": "#tag_vocab", - "symbols": "#char_vocab", - "verbose": 1, - "char_embeddings_size": 32, - "char_window_size": [ - 1, - 2, - 3, - 4, - 5, - 6, - 7 - ], - "word_lstm_units": 128, - "conv_dropout": 0.0, - "char_conv_layers": 1, - "char_highway_layers": 1, - "highway_dropout": 0.0, - "word_lstm_layers": 1, - "char_filter_multiple": 50, - "intermediate_dropout": 0.0, - "word_dropout": 0.2, - "lstm_dropout": 0.2, - "regularizer": 0.01 - }, - { - "in": [ - "x_tokens", - "y_predicted" - ], - "out": [ - "y_prettified" - ], - "id": "prettifier", - "class_name": "tag_output_prettifier", - "end": "\n" - } - ], - "out": [ - "y_prettified" - ] - }, - "train": { - "epochs": 50, - "batch_size": 32, - "metrics": [ - { - "name": "per_token_accuracy", - "inputs": [ - "y", - "y_predicted" - ] - }, - { - "name": "accuracy", - "inputs": [ - "y", - "y_predicted" - ] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", 
- "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "RESULTS_PATH": "{ROOT_PATH}/results" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/morpho_tagger/UD2.0/hi.tar.gz", - "subdir": "{MODELS_PATH}/morpho_tagger/UD2.0/hi" - }, - { - "url": "http://files.deeppavlov.ai/datasets/UD2.0_source/hi.tar.gz", - "subdir": "{DOWNLOADS_PATH}/UD2.0_source/hi" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_hu.json b/deeppavlov/configs/morpho_tagger/UD2.0/morpho_hu.json deleted file mode 100644 index 6e399a3a43..0000000000 --- a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_hu.json +++ /dev/null @@ -1,173 +0,0 @@ -{ - "dataset_reader": { - "class_name": "morphotagger_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/UD2.0_source", - "language": "hu", - "data_types": [ - "train", - "dev", - "test" - ] - }, - "dataset_iterator": { - "class_name": "morphotagger_dataset" - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "in": [ - "x" - ], - "class_name": "lazy_tokenizer", - "out": [ - "x_tokens" - ] - }, - { - "id": "char_splitting_lowercase_preprocessor", - "class_name": "char_splitting_lowercase_preprocessor", - "in": [ - "x_tokens" - ], - "out": [ - "x_chars_lowered_marked" - ] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/hu/tag.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/hu/tag.dict" - }, - { - "id": "char_vocab", - "class_name": "simple_vocab", - "min_freq": 3, - "fit_on": [ - "x_chars_lowered_marked" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/hu/char.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/hu/char.dict" - }, - { - "in": [ - "x_chars_lowered_marked" - ], - "in_y": [ - "y" - ], - "out": [ - "y_predicted" - ], - "class_name": "morpho_tagger", - "main": true, - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/hu/model.hdf5", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/hu/model.hdf5", - "tags": "#tag_vocab", - "symbols": "#char_vocab", - "verbose": 1, - "char_embeddings_size": 32, - "char_window_size": [ - 1, - 2, - 3, - 4, - 5, - 6, - 7 - ], - "word_lstm_units": 128, - "conv_dropout": 0.0, - "char_conv_layers": 1, - "char_highway_layers": 1, - "highway_dropout": 0.0, - "word_lstm_layers": 1, - "char_filter_multiple": 50, - "intermediate_dropout": 0.0, - "word_dropout": 0.2, - "lstm_dropout": 0.2, - "regularizer": 0.01 - }, - { - "in": [ - "x_tokens", - "y_predicted" - ], - "out": [ - "y_prettified" - ], - "id": "prettifier", - "class_name": "tag_output_prettifier", - "end": "\n" - } - ], - "out": [ - "y_prettified" - ] - }, - "train": { - "epochs": 50, - "batch_size": 32, - "metrics": [ - { - "name": "per_token_accuracy", - "inputs": [ - "y", - "y_predicted" - ] - }, - { - "name": "accuracy", - "inputs": [ - "y", - "y_predicted" - ] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "RESULTS_PATH": "{ROOT_PATH}/results" - }, - "download": [ - { - "url": 
"http://files.deeppavlov.ai/deeppavlov_data/morpho_tagger/UD2.0/hu.tar.gz", - "subdir": "{MODELS_PATH}/morpho_tagger/UD2.0/hu" - }, - { - "url": "http://files.deeppavlov.ai/datasets/UD2.0_source/hu.tar.gz", - "subdir": "{DOWNLOADS_PATH}/UD2.0_source/hu" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_it.json b/deeppavlov/configs/morpho_tagger/UD2.0/morpho_it.json deleted file mode 100644 index a84510a2e2..0000000000 --- a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_it.json +++ /dev/null @@ -1,173 +0,0 @@ -{ - "dataset_reader": { - "class_name": "morphotagger_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/UD2.0_source", - "language": "it", - "data_types": [ - "train", - "dev", - "test" - ] - }, - "dataset_iterator": { - "class_name": "morphotagger_dataset" - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "in": [ - "x" - ], - "class_name": "lazy_tokenizer", - "out": [ - "x_tokens" - ] - }, - { - "id": "char_splitting_lowercase_preprocessor", - "class_name": "char_splitting_lowercase_preprocessor", - "in": [ - "x_tokens" - ], - "out": [ - "x_chars_lowered_marked" - ] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/it/tag.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/it/tag.dict" - }, - { - "id": "char_vocab", - "class_name": "simple_vocab", - "min_freq": 3, - "fit_on": [ - "x_chars_lowered_marked" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/it/char.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/it/char.dict" - }, - { - "in": [ - "x_chars_lowered_marked" - ], - "in_y": [ - "y" - ], - "out": [ - "y_predicted" - ], - "class_name": "morpho_tagger", - "main": true, - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/it/model.hdf5", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/it/model.hdf5", - "tags": "#tag_vocab", - "symbols": "#char_vocab", - "verbose": 1, - "char_embeddings_size": 32, - "char_window_size": [ - 1, - 2, - 3, - 4, - 5, - 6, - 7 - ], - "word_lstm_units": 128, - "conv_dropout": 0.0, - "char_conv_layers": 1, - "char_highway_layers": 1, - "highway_dropout": 0.0, - "word_lstm_layers": 1, - "char_filter_multiple": 50, - "intermediate_dropout": 0.0, - "word_dropout": 0.2, - "lstm_dropout": 0.2, - "regularizer": 0.01 - }, - { - "in": [ - "x_tokens", - "y_predicted" - ], - "out": [ - "y_prettified" - ], - "id": "prettifier", - "class_name": "tag_output_prettifier", - "end": "\n" - } - ], - "out": [ - "y_prettified" - ] - }, - "train": { - "epochs": 50, - "batch_size": 32, - "metrics": [ - { - "name": "per_token_accuracy", - "inputs": [ - "y", - "y_predicted" - ] - }, - { - "name": "accuracy", - "inputs": [ - "y", - "y_predicted" - ] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "RESULTS_PATH": "{ROOT_PATH}/results" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/morpho_tagger/UD2.0/it.tar.gz", - "subdir": "{MODELS_PATH}/morpho_tagger/UD2.0/it" - }, - { - "url": "http://files.deeppavlov.ai/datasets/UD2.0_source/it.tar.gz", - "subdir": 
"{DOWNLOADS_PATH}/UD2.0_source/it" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_ru_syntagrus.json b/deeppavlov/configs/morpho_tagger/UD2.0/morpho_ru_syntagrus.json deleted file mode 100644 index b5ec00c85e..0000000000 --- a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_ru_syntagrus.json +++ /dev/null @@ -1,173 +0,0 @@ -{ - "dataset_reader": { - "class_name": "morphotagger_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/UD2.0_source", - "language": "ru_syntagrus", - "data_types": [ - "train", - "dev", - "test" - ] - }, - "dataset_iterator": { - "class_name": "morphotagger_dataset" - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "in": [ - "x" - ], - "class_name": "lazy_tokenizer", - "out": [ - "x_tokens" - ] - }, - { - "id": "char_splitting_lowercase_preprocessor", - "class_name": "char_splitting_lowercase_preprocessor", - "in": [ - "x_tokens" - ], - "out": [ - "x_chars_lowered_marked" - ] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/tag.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/tag.dict" - }, - { - "id": "char_vocab", - "class_name": "simple_vocab", - "min_freq": 3, - "fit_on": [ - "x_chars_lowered_marked" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/char.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/char.dict" - }, - { - "in": [ - "x_chars_lowered_marked" - ], - "in_y": [ - "y" - ], - "out": [ - "y_predicted" - ], - "class_name": "morpho_tagger", - "main": true, - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/model.hdf5", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/model.hdf5", - "tags": "#tag_vocab", - "symbols": "#char_vocab", - "verbose": 1, - "char_embeddings_size": 32, - "char_window_size": [ - 1, - 2, - 3, - 4, - 5, - 6, - 7 - ], - "word_lstm_units": 128, - "conv_dropout": 0.0, - "char_conv_layers": 1, - "char_highway_layers": 1, - "highway_dropout": 0.0, - "word_lstm_layers": 1, - "char_filter_multiple": 50, - "intermediate_dropout": 0.0, - "word_dropout": 0.2, - "lstm_dropout": 0.2, - "regularizer": 0.01 - }, - { - "in": [ - "x_tokens", - "y_predicted" - ], - "out": [ - "y_prettified" - ], - "id": "prettifier", - "class_name": "tag_output_prettifier", - "end": "\n" - } - ], - "out": [ - "y_prettified" - ] - }, - "train": { - "epochs": 50, - "batch_size": 32, - "metrics": [ - { - "name": "per_token_accuracy", - "inputs": [ - "y", - "y_predicted" - ] - }, - { - "name": "accuracy", - "inputs": [ - "y", - "y_predicted" - ] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "RESULTS_PATH": "{ROOT_PATH}/results" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/morpho_tagger/UD2.0/ru_syntagrus.tar.gz", - "subdir": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus" - }, - { - "url": "http://files.deeppavlov.ai/datasets/UD2.0_source/ru_syntagrus.tar.gz", - "subdir": "{DOWNLOADS_PATH}/UD2.0_source/ru_syntagrus" - } - ] - } -} \ No newline at end of file diff --git 
a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_ru_syntagrus_pymorphy.json b/deeppavlov/configs/morpho_tagger/UD2.0/morpho_ru_syntagrus_pymorphy.json deleted file mode 100644 index ef67338faa..0000000000 --- a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_ru_syntagrus_pymorphy.json +++ /dev/null @@ -1,193 +0,0 @@ -{ - "dataset_reader": { - "class_name": "morphotagger_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/UD2.0_source", - "language": "ru_syntagrus", - "data_types": [ - "train", - "dev", - "test" - ] - }, - "dataset_iterator": { - "class_name": "morphotagger_dataset" - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "in": [ - "x" - ], - "class_name": "lazy_tokenizer", - "out": [ - "x_tokens" - ] - }, - { - "id": "char_splitting_lowercase_preprocessor", - "class_name": "char_splitting_lowercase_preprocessor", - "in": [ - "x_tokens" - ], - "out": [ - "x_chars_lowered_marked" - ] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/tag.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/tag.dict" - }, - { - "id": "char_vocab", - "class_name": "simple_vocab", - "min_freq": 3, - "fit_on": [ - "x_chars_lowered_marked" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/char.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/char.dict" - }, - { - "id": "pymorphy_vectorizer", - "class_name": "pymorphy_vectorizer", - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/tags_russian.txt", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/tags_russian.txt", - "max_pymorphy_variants": 5, - "in": [ - "x_tokens" - ], - "out": [ - "x_possible_tags" - ] - }, - { - "in": [ - "x_chars_lowered_marked", - "x_possible_tags" - ], - "in_y": [ - "y" - ], - "out": [ - "y_predicted" - ], - "class_name": "morpho_tagger", - "main": true, - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/model_pymorphy.hdf5", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/model_pymorphy.hdf5", - "tags": "#tag_vocab", - "symbols": "#char_vocab", - "verbose": 1, - "char_embeddings_size": 32, - "char_window_size": [ - 1, - 2, - 3, - 4, - 5, - 6, - 7 - ], - "word_lstm_units": 128, - "conv_dropout": 0.0, - "char_conv_layers": 1, - "char_highway_layers": 1, - "highway_dropout": 0.0, - "word_lstm_layers": 1, - "char_filter_multiple": 50, - "intermediate_dropout": 0.0, - "word_dropout": 0.2, - "lstm_dropout": 0.2, - "regularizer": 0.01, - "word_vectorizers": [ - [ - "#pymorphy_vectorizer.dim", - 128 - ] - ] - }, - { - "in": [ - "x_tokens", - "y_predicted" - ], - "out": [ - "y_prettified" - ], - "id": "prettifier", - "class_name": "tag_output_prettifier", - "end": "\n" - } - ], - "out": [ - "y_prettified" - ] - }, - "train": { - "epochs": 50, - "batch_size": 32, - "metrics": [ - { - "name": "per_token_accuracy", - "inputs": [ - "y", - "y_predicted" - ] - }, - { - "name": "accuracy", - "inputs": [ - "y", - "y_predicted" - ] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "RESULTS_PATH": "{ROOT_PATH}/results" - }, - 
"download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/morpho_tagger/UD2.0/ru_syntagrus.tar.gz", - "subdir": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus" - }, - { - "url": "http://files.deeppavlov.ai/datasets/UD2.0_source/ru_syntagrus.tar.gz", - "subdir": "{DOWNLOADS_PATH}/UD2.0_source/ru_syntagrus" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_ru_syntagrus_pymorphy_lemmatize.json b/deeppavlov/configs/morpho_tagger/UD2.0/morpho_ru_syntagrus_pymorphy_lemmatize.json deleted file mode 100644 index da40a4f2a5..0000000000 --- a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_ru_syntagrus_pymorphy_lemmatize.json +++ /dev/null @@ -1,201 +0,0 @@ -{ - "dataset_reader": { - "class_name": "morphotagger_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/UD2.0_source", - "language": "ru_syntagrus", - "data_types": [ - "train", - "dev", - "test" - ] - }, - "dataset_iterator": { - "class_name": "morphotagger_dataset" - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "in": [ - "x" - ], - "class_name": "lazy_tokenizer", - "language": "russian", - "out": [ - "x_tokens" - ] - }, - { - "id": "char_splitting_lowercase_preprocessor", - "class_name": "char_splitting_lowercase_preprocessor", - "in": [ - "x_tokens" - ], - "out": [ - "x_chars_lowered_marked" - ] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/tag.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/tag.dict" - }, - { - "id": "char_vocab", - "class_name": "simple_vocab", - "min_freq": 3, - "fit_on": [ - "x_chars_lowered_marked" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/char.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/char.dict" - }, - { - "id": "pymorphy_vectorizer", - "class_name": "pymorphy_vectorizer", - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/tags_russian.txt", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/tags_russian.txt", - "max_pymorphy_variants": 5, - "in": [ - "x_tokens" - ], - "out": [ - "x_possible_tags" - ] - }, - { - "in": [ - "x_chars_lowered_marked", - "x_possible_tags" - ], - "in_y": [ - "y" - ], - "out": [ - "y_predicted" - ], - "class_name": "morpho_tagger", - "main": true, - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/model_pymorphy.hdf5", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/model_pymorphy.hdf5", - "tags": "#tag_vocab", - "symbols": "#char_vocab", - "verbose": 1, - "char_embeddings_size": 32, - "char_window_size": [ - 1, - 2, - 3, - 4, - 5, - 6, - 7 - ], - "word_lstm_units": 128, - "conv_dropout": 0.0, - "char_conv_layers": 1, - "char_highway_layers": 1, - "highway_dropout": 0.0, - "word_lstm_layers": 1, - "char_filter_multiple": 50, - "intermediate_dropout": 0.0, - "word_dropout": 0.2, - "lstm_dropout": 0.2, - "regularizer": 0.01, - "word_vectorizers": [ - [ - "#pymorphy_vectorizer.dim", - 128 - ] - ] - }, - { - "in": [ - "x_tokens", - "y_predicted" - ], - "out": [ - "y_lemmas" - ], - "class_name": "UD_pymorphy_lemmatizer", - "end": "\n" - }, - { - "in": [ - "x_tokens", - "y_predicted", - "y_lemmas" - ], - "out": [ - "y_prettified" - ], - "id": "prettifier", - "class_name": "lemmatized_output_prettifier", - "end": "\n" - } - ], - "out": [ - "y_prettified" - ] - }, - 
"train": { - "epochs": 50, - "batch_size": 32, - "metrics": [ - { - "name": "per_token_accuracy", - "inputs": [ - "y", - "y_predicted" - ] - }, - { - "name": "accuracy", - "inputs": [ - "y", - "y_predicted" - ] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1 - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "RESULTS_PATH": "{ROOT_PATH}/results" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/morpho_tagger/UD2.0/ru_syntagrus.tar.gz", - "subdir": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus" - }, - { - "url": "http://files.deeppavlov.ai/datasets/UD2.0_source/ru_syntagrus.tar.gz", - "subdir": "{DOWNLOADS_PATH}/UD2.0_source/ru_syntagrus" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_tr.json b/deeppavlov/configs/morpho_tagger/UD2.0/morpho_tr.json deleted file mode 100644 index e7887f1560..0000000000 --- a/deeppavlov/configs/morpho_tagger/UD2.0/morpho_tr.json +++ /dev/null @@ -1,174 +0,0 @@ -{ - "dataset_reader": { - "class_name": "morphotagger_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/UD2.0_source", - "language": "tr", - "data_types": [ - "train", - "dev", - "test" - ] - }, - "dataset_iterator": { - "class_name": "morphotagger_dataset", - "min_train_fraction": 0.9 - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "in": [ - "x" - ], - "class_name": "lazy_tokenizer", - "out": [ - "x_tokens" - ] - }, - { - "id": "char_splitting_lowercase_preprocessor", - "class_name": "char_splitting_lowercase_preprocessor", - "in": [ - "x_tokens" - ], - "out": [ - "x_chars_lowered_marked" - ] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "fit_on": [ - "y" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/tr/tag.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/tr/tag.dict" - }, - { - "id": "char_vocab", - "class_name": "simple_vocab", - "min_freq": 3, - "fit_on": [ - "x_chars_lowered_marked" - ], - "special_tokens": [ - "PAD", - "BEGIN", - "END" - ], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/tr/char.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/tr/char.dict" - }, - { - "in": [ - "x_chars_lowered_marked" - ], - "in_y": [ - "y" - ], - "out": [ - "y_predicted" - ], - "class_name": "morpho_tagger", - "main": true, - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/tr/model.hdf5", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/tr/model.hdf5", - "tags": "#tag_vocab", - "symbols": "#char_vocab", - "verbose": 1, - "char_embeddings_size": 32, - "char_window_size": [ - 1, - 2, - 3, - 4, - 5, - 6, - 7 - ], - "word_lstm_units": 128, - "conv_dropout": 0.0, - "char_conv_layers": 1, - "char_highway_layers": 1, - "highway_dropout": 0.0, - "word_lstm_layers": 1, - "char_filter_multiple": 50, - "intermediate_dropout": 0.0, - "word_dropout": 0.2, - "lstm_dropout": 0.2, - "regularizer": 0.01 - }, - { - "in": [ - "x_tokens", - "y_predicted" - ], - "out": [ - "y_prettified" - ], - "id": "prettifier", - "class_name": "tag_output_prettifier", - "end": "\n" - } - ], - "out": [ - "y_prettified" - ] - }, - "train": { - "epochs": 50, - "batch_size": 32, - "metrics": [ - { - "name": "per_token_accuracy", - "inputs": [ - "y", - "y_predicted" - ] - }, - { - "name": "accuracy", - "inputs": [ - "y", - "y_predicted" - ] - } - ], - "validation_patience": 10, - 
"val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "RESULTS_PATH": "{ROOT_PATH}/results" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/morpho_tagger/UD2.0/tr.tar.gz", - "subdir": "{MODELS_PATH}/morpho_tagger/UD2.0/tr" - }, - { - "url": "http://files.deeppavlov.ai/datasets/UD2.0_source/tr.tar.gz", - "subdir": "{DOWNLOADS_PATH}/UD2.0_source/tr" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/nemo/asr.json b/deeppavlov/configs/nemo/asr.json deleted file mode 100644 index 410e0ac560..0000000000 --- a/deeppavlov/configs/nemo/asr.json +++ /dev/null @@ -1,26 +0,0 @@ -{ - "chainer": { - "in": "speech", - "pipe": [ - { - "class_name": "nemo_asr", - "nemo_params_path": "{NEMO_PATH}/quartznet15x5/quartznet15x5.yaml", - "load_path": "{NEMO_PATH}/quartznet15x5", - "in": ["speech"], - "out": ["text"] - } - ], - "out": ["text"] - }, - "metadata": { - "variables": { - "NEMO_PATH": "~/.deeppavlov/models/nemo" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/nemo/quartznet15x5.tar.gz", - "subdir": "{NEMO_PATH}" - } - ] - } -} diff --git a/deeppavlov/configs/nemo/asr_tts.json b/deeppavlov/configs/nemo/asr_tts.json deleted file mode 100644 index 8ecc10c304..0000000000 --- a/deeppavlov/configs/nemo/asr_tts.json +++ /dev/null @@ -1,48 +0,0 @@ -{ - "chainer": { - "in": "speech_in_encoded", - "pipe": [ - { - "class_name": "base64_decode_bytesIO", - "in": ["speech_in_encoded"], - "out": ["speech_in"] - }, - { - "class_name": "nemo_asr", - "nemo_params_path": "{NEMO_PATH}/quartznet15x5/quartznet15x5.yaml", - "load_path": "{NEMO_PATH}/quartznet15x5", - "in": ["speech_in"], - "out": ["text"] - }, - { - "class_name": "nemo_tts", - "nemo_params_path": "{TTS_PATH}/tacotron2_waveglow.yaml", - "load_path": "{TTS_PATH}", - "in": ["text"], - "out": ["speech_out"] - }, - { - "class_name": "bytesIO_encode_base64", - "in": ["speech_out"], - "out": ["speech_out_encoded"] - } - ], - "out": ["text", "speech_out_encoded"] - }, - "metadata": { - "variables": { - "NEMO_PATH": "~/.deeppavlov/models/nemo", - "TTS_PATH": "{NEMO_PATH}/tacotron2_waveglow" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/nemo/quartznet15x5.tar.gz", - "subdir": "{NEMO_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/nemo/tacotron2_waveglow.tar.gz", - "subdir": "{NEMO_PATH}" - } - ] - } -} diff --git a/deeppavlov/configs/nemo/tts.json b/deeppavlov/configs/nemo/tts.json deleted file mode 100644 index 6cbac9a043..0000000000 --- a/deeppavlov/configs/nemo/tts.json +++ /dev/null @@ -1,27 +0,0 @@ -{ - "chainer": { - "in": ["text", "filepath"], - "pipe": [ - { - "class_name": "nemo_tts", - "nemo_params_path": "{TTS_PATH}/tacotron2_waveglow.yaml", - "load_path": "{TTS_PATH}", - "in": ["text", "filepath"], - "out": ["saved_path"] - } - ], - "out": ["saved_path"] - }, - "metadata": { - "variables": { - "NEMO_PATH": "~/.deeppavlov/models/nemo", - "TTS_PATH": "{NEMO_PATH}/tacotron2_waveglow" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/nemo/tacotron2_waveglow.tar.gz", - "subdir": "{NEMO_PATH}" - } - ] - } -} diff --git a/deeppavlov/configs/ner/conll2003_m1.json b/deeppavlov/configs/ner/conll2003_m1.json deleted file mode 100644 index c792ca336c..0000000000 --- 
a/deeppavlov/configs/ner/conll2003_m1.json +++ /dev/null @@ -1,148 +0,0 @@ -{ - "dataset_reader": { - "class_name": "conll2003_reader", - "data_path": "{DOWNLOADS_PATH}/conll2003/", - "dataset_name": "conll2003", - "provide_pos": true, - "provide_chunk": false, - "iobes": true - }, - "dataset_iterator": { - "class_name": "data_learning_iterator" - }, - "chainer": { - "in": ["x", "pos"], - "in_y": ["y"], - "pipe": [ - { - "in": ["x"], - "out": ["x_tokens"], - "class_name": "lazy_tokenizer" - }, - { - "in": ["pos"], - "out": ["pos_tokens"], - "class_name": "lazy_tokenizer" - }, - { - "in": ["x_tokens"], - "out": ["x_lower", "sent_lengths", "x_tokens_elmo"], - "class_name": "ner_preprocessor", - "get_x_padded_for_elmo": true - }, - { - "in": ["x_lower"], - "out": ["x_tok_ind"], - "fit_on": ["x_lower"], - "class_name": "ner_vocab", - "id": "word_vocab", - "save_path": "{MODELS_PATH}/word.dict", - "load_path": "{MODELS_PATH}/word.dict" - }, - { - "in": ["pos_tokens"], - "out": ["pos_ind"], - "fit_on": ["pos_tokens"], - "class_name": "ner_vocab", - "id": "pos_vocab", - "save_path": "{MODELS_PATH}/pos.dict", - "load_path": "{MODELS_PATH}/pos.dict" - }, - { - "in": ["y"], - "out": ["y_ind"], - "fit_on": ["y"], - "class_name": "ner_vocab", - "id": "tag_vocab", - "save_path": "{MODELS_PATH}/tag.dict", - "load_path": "{MODELS_PATH}/tag.dict" - }, - { - "in": ["x_tokens"], - "out": ["x_char_ind"], - "fit_on": ["x_tokens"], - "class_name": "ner_vocab", - "char_level": true, - "id": "char_vocab", - "save_path": "{MODELS_PATH}/char.dict", - "load_path": "{MODELS_PATH}/char.dict" - }, - { - "in":[ - "sent_lengths", - "x_tok_ind", - "pos_ind", - "x_char_ind", - "x_tokens_elmo" - ], - "in_y": ["y_ind"], - "out": ["y_predicted"], - "class_name": "hybrid_ner_model", - "n_tags": "#tag_vocab.len", - "word_emb_path": "{DOWNLOADS_PATH}/embeddings/glove.6B.100d.txt", - "word_emb_name": "glove", - "word_dim": 100, - "word_vocab": "#word_vocab", - "char_vocab_size": "#char_vocab.len", - "pos_vocab_size": "#pos_vocab.len", - "pos_dim": 40, - "char_dim": 100, - "elmo_dim": 128, - "lstm_hidden_size": 256, - "save_path": "{MODELS_PATH}/conll2003_m1", - "load_path": "{MODELS_PATH}/conll2003_m1", - "learning_rate": 1e-3, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 10, - "dropout_keep_prob": 0.7 - }, - { - "in": ["y_predicted"], - "out": ["tags"], - "class_name": "convert_ids2tags", - "id2tag": "#tag_vocab.i2t" - } - ], - "out": ["x_tokens", "tags"] - }, - "train": { - "epochs": 100, - "batch_size": 64, - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "tags"] - }, - { - "name": "ner_token_f1", - "inputs": ["y", "tags"] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models/conll2003_m1" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ner_conll2003_m1.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/glove.6B.100d.txt", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} diff --git a/deeppavlov/configs/ner/ner_bert_ent_and_type_rus.json b/deeppavlov/configs/ner/ner_bert_ent_and_type_rus.json deleted file mode 100644 index f7c3f6fcc7..0000000000 --- a/deeppavlov/configs/ner/ner_bert_ent_and_type_rus.json +++ /dev/null @@ -1,119 
+0,0 @@ -{ - "dataset_reader": { - "class_name": "sq_reader", - "data_path": "{DOWNLOADS_PATH}/lcquad/entity_and_type_detection_rus.pickle" - }, - "dataset_iterator": { - "class_name": "data_learning_iterator" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "bert_ner_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 512, - "max_subword_length": 15, - "token_maksing_prob": 0.0, - "in": ["x"], - "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "pred_subword_mask"] - }, - { - "class_name": "mask", - "in": ["x_subword_tokens"], - "out": ["x_subword_mask"] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "unk_token": ["O"], - "pad_with_zeros": true, - "save_path": "{NER_PATH}/tag.dict", - "load_path": "{NER_PATH}/tag.dict", - "fit_on": ["y"], - "in": ["y"], - "out": ["y_ind"] - }, - { - "class_name": "bert_sequence_tagger", - "n_tags": "#tag_vocab.len", - "keep_prob": 0.1, - "bert_config_file": "{BERT_PATH}/bert_config.json", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", - "attention_probs_keep_prob": 0.5, - "use_crf": false, - "return_probas": true, - "ema_decay": 0.9, - "encoder_layer_ids": [-1], - "optimizer": "tf.train:AdamOptimizer", - "learning_rate": 1e-3, - "bert_learning_rate": 2e-5, - "min_learning_rate": 1e-7, - "learning_rate_drop_patience": 30, - "learning_rate_drop_div": 1.5, - "load_before_drop": true, - "clip_norm": 1.0, - "save_path": "{NER_PATH}/model", - "load_path": "{NER_PATH}/model", - "in": ["x_subword_tok_ids", "x_subword_mask", "pred_subword_mask"], - "in_y": ["y_ind"], - "out": ["y_pred"] - } - ], - "out": ["x_tokens", "y_pred"] - }, - "train": { - "epochs": 30, - "batch_size": 16, - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "y_pred"] - }, - { - "name": "ner_token_f1", - "inputs": ["y", "y_pred"] - } - ], - "validation_patience": 5, - "val_every_n_batches": 400, - - "log_every_n_batches": 400, - "tensorboard_log_dir": "{NER_PATH}/logs", - "show_examples": false, - "pytest_max_batches": 2, - "pytest_batch_size": 8, - "evaluation_targets": ["valid", "test"], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12", - "NER_PATH": "{MODELS_PATH}/ner_ent_and_type_rus" - }, - "labels": { - "telegram_utils": "NERCoNLL2003Model", - "server_utils": "NER" - }, - "download": [ - - { - "url": "http://files.deeppavlov.ai/kbqa/datasets/entity_and_type_detection_rus.pickle", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/multi_cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/ner_cq_rus.tar.gz", - "subdir": "{MODELS_PATH}/ner_ent_and_type_rus" - } - ] - } -} diff --git a/deeppavlov/configs/ner/ner_ontonotes_bert_torch.json b/deeppavlov/configs/ner/ner_case_agnostic_mdistilbert.json similarity index 75% rename from deeppavlov/configs/ner/ner_ontonotes_bert_torch.json rename to deeppavlov/configs/ner/ner_case_agnostic_mdistilbert.json index ce9360ee80..f62bab103d 100644 --- a/deeppavlov/configs/ner/ner_ontonotes_bert_torch.json +++ b/deeppavlov/configs/ner/ner_case_agnostic_mdistilbert.json @@ -1,8 +1,7 @@ { "dataset_reader": { "class_name": "conll2003_reader", - "data_path": "{DOWNLOADS_PATH}/ontonotes/", - "dataset_name": 
"ontonotes", + "dataset_name": "conll2003", "provide_pos": false }, "dataset_iterator": { @@ -20,7 +19,7 @@ "max_subword_length": 15, "token_masking_prob": 0.0, "in": ["x"], - "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", "attention_mask"] + "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", "attention_mask", "tokens_offsets"] }, { "id": "tag_vocab", @@ -38,7 +37,7 @@ "n_tags": "#tag_vocab.len", "pretrained_bert": "{TRANSFORMER}", "attention_probs_keep_prob": 0.5, - "return_probas": false, + "use_crf": true, "encoder_layer_ids": [-1], "optimizer": "AdamW", "optimizer_parameters": { @@ -49,14 +48,14 @@ }, "clip_norm": 1.0, "min_learning_rate": 1e-07, - "learning_rate_drop_patience": 30, + "learning_rate_drop_patience": 20, "learning_rate_drop_div": 1.5, "load_before_drop": true, "save_path": "{MODEL_PATH}/model", "load_path": "{MODEL_PATH}/model", "in": ["x_subword_tok_ids", "attention_mask", "startofword_markers"], "in_y": ["y_ind"], - "out": ["y_pred_ind"] + "out": ["y_pred_ind", "probas"] }, { "ref": "tag_vocab", @@ -67,8 +66,8 @@ "out": ["x_tokens", "y_pred"] }, "train": { - "epochs": 30, - "batch_size": 10, + "epochs": 50, + "batch_size": 8, "metrics": [ { "name": "ner_f1", @@ -80,27 +79,27 @@ } ], "validation_patience": 100, - "val_every_n_batches": 20, - "log_every_n_batches": 20, + "val_every_n_batches": 50, + "log_every_n_batches": 50, "show_examples": false, "pytest_max_batches": 2, "pytest_batch_size": 8, - "evaluation_targets": ["valid", "test"], + "evaluation_targets": ["test", "valid"], "class_name": "torch_trainer" }, "metadata": { "variables": { "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "TRANSFORMER": "bert-base-cased", - "MODEL_PATH": "{MODELS_PATH}/ner_ontonotes_bert_torch/{TRANSFORMER}" + "DOWNLOADS_PATH": "~/.deeppavlov/downloads", + "MODELS_PATH": "~/.deeppavlov/models", + "TRANSFORMER": "distilbert-base-multilingual-cased", + "MODEL_PATH": "{MODELS_PATH}/ner/ner_case_agnostic_mdistilbert" }, "download": [ { - "url": "http://files.deeppavlov.ai/v1/ner/ner_ontonotes_bert_torch.tar.gz", - "subdir": "{ROOT_PATH}/models" + "url": "http://files.deeppavlov.ai/v1/ner/ner_case_agnostic_mdistilbert.tar.gz", + "subdir": "{MODELS_PATH}" } ] } -} \ No newline at end of file +} diff --git a/deeppavlov/configs/ner/ner_rus_bert_torch.json b/deeppavlov/configs/ner/ner_collection3_bert.json similarity index 87% rename from deeppavlov/configs/ner/ner_rus_bert_torch.json rename to deeppavlov/configs/ner/ner_collection3_bert.json index 8a4c51ff5f..0ef9f98d54 100644 --- a/deeppavlov/configs/ner/ner_rus_bert_torch.json +++ b/deeppavlov/configs/ner/ner_collection3_bert.json @@ -1,9 +1,11 @@ { "dataset_reader": { "class_name": "conll2003_reader", - "data_path": "{DOWNLOADS_PATH}/total_rus/", - "dataset_name": "collection_rus", - "provide_pos": false + "data_path": "{DOWNLOADS_PATH}/collection3/", + "dataset_name": "collection3", + "provide_pos": false, + "provide_chunk": false, + "iobes": true }, "dataset_iterator": { "class_name": "data_learning_iterator" @@ -31,7 +33,8 @@ "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", - "attention_mask" + "attention_mask", + "tokens_offsets" ] }, { @@ -58,7 +61,6 @@ "n_tags": "#tag_vocab.len", "pretrained_bert": "{TRANSFORMER}", "attention_probs_keep_prob": 0.5, - "return_probas": false, "encoder_layer_ids": [ -1 ], @@ -88,7 +90,8 @@ "y_ind" ], "out": [ - "y_pred_ind" + "y_pred_ind", + "probas" ] }, 
{ @@ -143,12 +146,12 @@ "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", "MODELS_PATH": "{ROOT_PATH}/models", "TRANSFORMER": "DeepPavlov/rubert-base-cased", - "MODEL_PATH": "{MODELS_PATH}/ner_rus_bert_torch" + "MODEL_PATH": "{MODELS_PATH}/ner_rus_bert_coll3_torch" }, "download": [ { - "url": "http://files.deeppavlov.ai/0.16/ner/ner_rus_bert_torch.tar.gz", - "subdir": "{MODELS_PATH}" + "url": "http://files.deeppavlov.ai/v1/ner/ner_rus_bert_coll3_torch.tar.gz", + "subdir": "{MODEL_PATH}" } ] } diff --git a/deeppavlov/configs/ner/ner_collection3_m1.json b/deeppavlov/configs/ner/ner_collection3_m1.json deleted file mode 100644 index 0662c521cc..0000000000 --- a/deeppavlov/configs/ner/ner_collection3_m1.json +++ /dev/null @@ -1,134 +0,0 @@ -{ - "dataset_reader": { - "class_name": "conll2003_reader", - "data_path": "{DOWNLOADS_PATH}/collection3/", - "dataset_name": "collection3", - "provide_pos": false, - "provide_chunk": false, - "iobes": true - }, - "dataset_iterator": { - "class_name": "data_learning_iterator" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "in": ["x"], - "out": ["x_tokens"], - "class_name": "lazy_tokenizer" - }, - { - "in": ["x_tokens"], - "out": ["x_lower", "sent_lengths", "x_tokens_elmo"], - "class_name": "ner_preprocessor", - "id": "ner_preprocessor", - "get_x_padded_for_elmo": true, - "get_x_cap_padded": false - }, - { - "in": ["x_lower"], - "out": ["x_tok_ind"], - "fit_on": ["x_lower"], - "class_name": "ner_vocab", - "id": "word_vocab", - "save_path": "{MODELS_PATH}/word.dict", - "load_path": "{MODELS_PATH}/word.dict" - }, - { - "in": ["y"], - "out": ["y_ind"], - "fit_on": ["y"], - "class_name": "ner_vocab", - "id": "tag_vocab", - "save_path": "{MODELS_PATH}/tag.dict", - "load_path": "{MODELS_PATH}/tag.dict" - }, - { - "in": ["x_tokens"], - "out": ["x_char_ind"], - "fit_on": ["x_tokens"], - "class_name": "ner_vocab", - "char_level": true, - "id": "char_vocab", - "save_path": "{MODELS_PATH}/char.dict", - "load_path": "{MODELS_PATH}/char.dict" - }, - { - "in":[ - "sent_lengths", - "x_tok_ind", - "x_char_ind", - "x_tokens_elmo" - ], - "in_y": ["y_ind"], - "out": ["y_predicted"], - "class_name": "hybrid_ner_model", - "n_tags": "#tag_vocab.len", - "word_emb_path": "{DOWNLOADS_PATH}/embeddings/lenta_lower_100.bin", - "word_emb_name": "fasttext", - "word_dim": 100, - "word_vocab": "#word_vocab", - "char_vocab_size": "#char_vocab.len", - "char_dim": 100, - "elmo_dim": 128, - "elmo_hub_path": "http://files.deeppavlov.ai/deeppavlov_data/elmo_ru-news_wmt11-16_1.5M_steps.tar.gz", - "lstm_hidden_size": 256, - "save_path": "{MODELS_PATH}/collection3", - "load_path": "{MODELS_PATH}/collection3", - "learning_rate": 1e-3, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 10, - "dropout_keep_prob": 0.7 - }, - { - "in": ["y_predicted"], - "out": ["tags"], - "class_name": "convert_ids2tags", - "id2tag": "#tag_vocab.i2t" - } - ], - "out": ["x_tokens", "tags"] - }, - "train": { - "epochs": 100, - "batch_size": 64, - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "tags"] - }, - { - "name": "ner_token_f1", - "inputs": ["y", "tags"] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models/collection3" - }, - "download": [ - { - "url": 
"http://files.deeppavlov.ai/deeppavlov_data/ner_collection3_m1.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/lenta_lower_100.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} diff --git a/deeppavlov/configs/ner/ner_conll2003.json b/deeppavlov/configs/ner/ner_conll2003.json deleted file mode 100644 index 25510db208..0000000000 --- a/deeppavlov/configs/ner/ner_conll2003.json +++ /dev/null @@ -1,177 +0,0 @@ -{ - "dataset_reader": { - "class_name": "conll2003_reader", - "data_path": "{DOWNLOADS_PATH}/conll2003/", - "dataset_name": "conll2003", - "provide_pos": false - }, - "dataset_iterator": { - "class_name": "data_learning_iterator" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "in": ["x"], - "class_name": "lazy_tokenizer", - "out": ["x_tokens"] - }, - { - "in": ["x_tokens"], - "class_name": "str_lower", - "out": ["x_lower"] - }, - { - "in": ["x_lower"], - "class_name": "sanitizer", - "nums": true, - "out": ["x_san"] - }, - { - "in": ["x_san"], - "id": "word_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": true, - "special_tokens": [""], - "fit_on": ["x_san"], - "save_path": "{NER_PATH}/word.dict", - "load_path": "{NER_PATH}/word.dict", - "out": ["x_tok_ind"] - }, - { - "in": ["y"], - "id": "tag_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": true, - "fit_on": ["y"], - "save_path": "{NER_PATH}/tag.dict", - "load_path": "{NER_PATH}/tag.dict", - "out": ["y_ind"] - }, - { - "in": ["x_tokens"], - "class_name": "char_splitter", - "out": ["x_char"] - }, - { - "in": ["x_char"], - "id": "char_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": true, - "fit_on": ["x_char"], - "save_path": "{NER_PATH}/char.dict", - "load_path": "{NER_PATH}/char.dict", - "out": ["x_char_ind"] - }, - { - "in": ["x_tokens"], - "class_name": "mask", - "out": ["mask"] - }, - { - "in": ["x_san"], - "id": "glove_emb", - "class_name": "glove", - "pad_zero": true, - "load_path": "{DOWNLOADS_PATH}/embeddings/glove.6B.100d.txt", - "out": ["x_emb"] - }, - { - "id": "embeddings", - "class_name": "emb_mat_assembler", - "embedder": "#glove_emb", - "vocab": "#word_vocab" - }, - { - "id": "embeddings_char", - "class_name": "emb_mat_assembler", - "character_level": true, - "emb_dim": 32, - "embedder": "#glove_emb", - "vocab": "#char_vocab" - }, - { - "id": "capitalization", - "class_name": "capitalization_featurizer", - "in": ["x_tokens"], - "out": ["cap"] - }, - { - "in": ["x_emb", "mask", "x_char_ind", "cap"], - "in_y": ["y_ind"], - "out": ["y_predicted"], - "class_name": "ner", - "main": true, - "token_emb_dim": "#glove_emb.dim", - "n_hidden_list": [128], - "net_type": "rnn", - "cell_type": "lstm", - "use_cudnn_rnn": true, - "n_tags": "#tag_vocab.len", - "capitalization_dim": "#capitalization.dim", - "char_emb_dim": "#embeddings_char.dim", - "save_path": "{NER_PATH}/model_no_pos", - "load_path": "{NER_PATH}/model_no_pos", - "char_emb_mat": "#embeddings_char.emb_mat", - "two_dense_on_top": true, - "use_crf": true, - "embeddings_dropout": true, - "top_dropout": true, - "intra_layer_dropout": true, - "l2_reg": 0, - "learning_rate": 1e-2, - "dropout_keep_prob": 0.7 - }, - { - "ref": "tag_vocab", - "in": ["y_predicted"], - "out": ["tags"] - } - ], - "out": ["x_tokens", "tags"] - }, - "train": { - "epochs": 100, - "batch_size": 64, - - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "tags"] - }, - { - "name": "ner_token_f1", - "inputs": ["y", "tags"] - } - ], - "validation_patience": 7, - "val_every_n_epochs": 1, - - 
"log_every_n_epochs": 1, - "show_examples": false, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "NER_PATH": "{MODELS_PATH}/ner_conll2003" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ner_conll2003_v5.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/glove.6B.100d.txt", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} diff --git a/deeppavlov/configs/ner/ner_conll2003_bert.json b/deeppavlov/configs/ner/ner_conll2003_bert.json index 2314b8b875..dbe5530e01 100644 --- a/deeppavlov/configs/ner/ner_conll2003_bert.json +++ b/deeppavlov/configs/ner/ner_conll2003_bert.json @@ -9,62 +9,104 @@ "class_name": "data_learning_iterator" }, "chainer": { - "in": ["x"], - "in_y": ["y"], + "in": [ + "x" + ], + "in_y": [ + "y" + ], "pipe": [ { - "class_name": "bert_ner_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", + "class_name": "torch_transformers_ner_preprocessor", + "vocab_file": "{TRANSFORMER}", "do_lower_case": false, "max_seq_length": 512, "max_subword_length": 15, "token_masking_prob": 0.0, - "in": ["x"], - "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", "attention_mask"] + "in": [ + "x" + ], + "out": [ + "x_tokens", + "x_subword_tokens", + "x_subword_tok_ids", + "startofword_markers", + "attention_mask", + "tokens_offsets" + ] }, { "id": "tag_vocab", "class_name": "simple_vocab", - "unk_token": ["O"], + "unk_token": [ + "O" + ], "pad_with_zeros": true, - "save_path": "{NER_PATH}/tag.dict", - "load_path": "{NER_PATH}/tag.dict", - "fit_on": ["y"], - "in": ["y"], - "out": ["y_ind"] + "save_path": "{MODEL_PATH}/tag.dict", + "load_path": "{MODEL_PATH}/tag.dict", + "fit_on": [ + "y" + ], + "in": [ + "y" + ], + "out": [ + "y_ind" + ] }, { - "class_name": "bert_sequence_tagger", + "class_name": "torch_transformers_sequence_tagger", "n_tags": "#tag_vocab.len", - "keep_prob": 0.1, - "bert_config_file": "{BERT_PATH}/bert_config.json", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", + "pretrained_bert": "{TRANSFORMER}", "attention_probs_keep_prob": 0.5, "use_crf": true, - "return_probas": false, - "ema_decay": 0.9, - "encoder_layer_ids": [-1], - "optimizer": "tf.train:AdamOptimizer", - "learning_rate": 1e-3, - "bert_learning_rate": 2e-5, - "min_learning_rate": 1e-7, + "encoder_layer_ids": [ + -1 + ], + "optimizer": "AdamW", + "optimizer_parameters": { + "lr": 2e-05, + "weight_decay": 1e-06, + "betas": [ + 0.9, + 0.999 + ], + "eps": 1e-06 + }, + "clip_norm": 1.0, + "min_learning_rate": 1e-07, "learning_rate_drop_patience": 30, "learning_rate_drop_div": 1.5, "load_before_drop": true, - "clip_norm": 1.0, - "save_path": "{NER_PATH}/model", - "load_path": "{NER_PATH}/model", - "in": ["x_subword_tok_ids", "attention_mask", "startofword_markers"], - "in_y": ["y_ind"], - "out": ["y_pred_ind"] + "save_path": "{MODEL_PATH}/model", + "load_path": "{MODEL_PATH}/model", + "in": [ + "x_subword_tok_ids", + "attention_mask", + "startofword_markers" + ], + "in_y": [ + "y_ind" + ], + "out": [ + "y_pred_ind", + "probas" + ] }, { "ref": "tag_vocab", - "in": ["y_pred_ind"], - "out": ["y_pred"] + "in": [ + "y_pred_ind" + ], + "out": [ + "y_pred" + ] } ], - "out": ["x_tokens", "y_pred"] + "out": [ + "x_tokens", + "y_pred" + ] }, "train": { "epochs": 30, @@ -72,40 +114,43 @@ "metrics": [ { "name": "ner_f1", - 
"inputs": ["y", "y_pred"] + "inputs": [ + "y", + "y_pred" + ] }, { "name": "ner_token_f1", - "inputs": ["y", "y_pred"] + "inputs": [ + "y", + "y_pred" + ] } ], "validation_patience": 100, "val_every_n_batches": 20, - "log_every_n_batches": 20, - "tensorboard_log_dir": "{NER_PATH}/logs", "show_examples": false, "pytest_max_batches": 2, "pytest_batch_size": 8, - "evaluation_targets": ["valid", "test"], - "class_name": "nn_trainer" + "evaluation_targets": [ + "valid", + "test" + ], + "class_name": "torch_trainer" }, "metadata": { "variables": { "ROOT_PATH": "~/.deeppavlov", "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12", - "NER_PATH": "{MODELS_PATH}/ner_conll2003_bert" + "TRANSFORMER": "bert-base-cased", + "MODEL_PATH": "{MODELS_PATH}/ner_conll2003_torch_bert_crf" }, "download": [ { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ner_conll2003_bert_v1.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" + "url": "http://files.deeppavlov.ai/v1/ner/ner_conll2003_bert_torch_crf.tar.gz", + "subdir": "{MODEL_PATH}" } ] } diff --git a/deeppavlov/configs/ner/ner_conll2003_pos.json b/deeppavlov/configs/ner/ner_conll2003_pos.json deleted file mode 100644 index 3ddd6ab55d..0000000000 --- a/deeppavlov/configs/ner/ner_conll2003_pos.json +++ /dev/null @@ -1,189 +0,0 @@ -{ - "dataset_reader": { - "class_name": "conll2003_reader", - "data_path": "{DOWNLOADS_PATH}/conll2003/", - "dataset_name": "conll2003", - "provide_pos": true - }, - "dataset_iterator": { - "class_name": "data_learning_iterator" - }, - "chainer": { - "in": ["x", "pos"], - "in_y": ["y"], - "pipe": [ - { - "in": ["x"], - "class_name": "lazy_tokenizer", - "out": ["x_tokens"] - }, - { - "in": ["x_tokens"], - "class_name": "str_lower", - "out": ["x_lower"] - }, - { - "in": ["x_lower"], - "class_name": "sanitizer", - "nums": true, - "out": ["x_san"] - }, - { - "in": ["x_san"], - "id": "word_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": true, - "special_tokens": [""], - "fit_on": ["x_san"], - "save_path": "{MODELS_PATH}/ner_conll2003/word.dict", - "load_path": "{MODELS_PATH}/ner_conll2003/word.dict", - "out": ["x_tok_ind"] - }, - { - "in": ["pos"], - "id": "pos_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": true, - "fit_on": ["pos"], - "save_path": "{MODELS_PATH}/ner_conll2003/pos.dict", - "load_path": "{MODELS_PATH}/ner_conll2003/pos.dict", - "out": ["pos_ind"] - }, - { - "in": ["pos_ind"], - "class_name": "one_hotter", - "depth": "#pos_vocab.len", - "pad_zeros": true, - "out": ["pos_one_hot"] - }, - { - "in": ["y"], - "id": "tag_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": true, - "fit_on": ["y"], - "save_path": "{MODELS_PATH}/ner_conll2003/tag.dict", - "load_path": "{MODELS_PATH}/ner_conll2003/tag.dict", - "out": ["y_ind"] - }, - { - "in": ["x_tokens"], - "class_name": "char_splitter", - "out": ["x_char"] - }, - { - "in": ["x_char"], - "id": "char_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": true, - "fit_on": ["x_char"], - "save_path": "{MODELS_PATH}/ner_conll2003/char.dict", - "load_path": "{MODELS_PATH}/ner_conll2003/char.dict", - "out": ["x_char_ind"] - }, - { - "in": ["x_tokens"], - "class_name": "mask", - "out": ["mask"] - }, - { - "in": ["x_san"], - "id": "glove_emb", - "class_name": "glove", - "pad_zero": true, - "load_path": 
"{DOWNLOADS_PATH}/embeddings/glove.6B.100d.txt", - - "out": ["x_emb"] - }, - { - "id": "embeddings", - "class_name": "emb_mat_assembler", - "embedder": "#glove_emb", - "vocab": "#word_vocab" - }, - { - "id": "embeddings_char", - "class_name": "emb_mat_assembler", - "character_level": true, - "emb_dim": 32, - "embedder": "#glove_emb", - "vocab": "#char_vocab" - }, - { - "id": "capitalization", - "class_name": "capitalization_featurizer", - "in": ["x_tokens"], - "out": ["cap"] - }, - { - "in": ["x_emb", "mask", "x_char_ind", "cap", "pos_one_hot"], - "in_y": ["y_ind"], - "out": ["y_predicted"], - "class_name": "ner", - "main": true, - "token_emb_dim": "#glove_emb.dim", - "n_hidden_list": [128], - "net_type": "rnn", - "cell_type": "lstm", - "use_cudnn_rnn": true, - "n_tags": "#tag_vocab.len", - "capitalization_dim": "#capitalization.dim", - "char_emb_dim": "#embeddings_char.dim", - "pos_features_dim": "#pos_vocab.len", - "save_path": "{MODELS_PATH}/ner_conll2003/model", - "load_path": "{MODELS_PATH}/ner_conll2003/model", - "char_emb_mat": "#embeddings_char.emb_mat", - "two_dense_on_top": true, - "use_crf": true, - "use_batch_norm": true, - "embeddings_dropout": true, - "top_dropout": true, - "intra_layer_dropout": true, - "l2_reg": 0, - "learning_rate": 1e-2, - "dropout_keep_prob": 0.7 - }, - { - "ref": "tag_vocab", - "in": ["y_predicted"], - "out": ["tags"] - } - ], - - "out": ["x_tokens", "tags"] - }, - "train": { - "epochs": 100, - "batch_size": 64, - - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "tags"] - } - ], - "validation_patience": 7, - "val_every_n_epochs": 1, - - "log_every_n_epochs": 1, - "show_examples": false, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/embeddings/glove.6B.100d.txt", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/ner/ner_conll2003_torch_bert.json b/deeppavlov/configs/ner/ner_conll2003_torch_bert.json deleted file mode 100644 index 21d338dcff..0000000000 --- a/deeppavlov/configs/ner/ner_conll2003_torch_bert.json +++ /dev/null @@ -1,155 +0,0 @@ -{ - "dataset_reader": { - "class_name": "conll2003_reader", - "data_path": "{DOWNLOADS_PATH}/conll2003/", - "dataset_name": "conll2003", - "provide_pos": false - }, - "dataset_iterator": { - "class_name": "data_learning_iterator" - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "y" - ], - "pipe": [ - { - "class_name": "torch_transformers_ner_preprocessor", - "vocab_file": "{TRANSFORMER}", - "do_lower_case": false, - "max_seq_length": 512, - "max_subword_length": 15, - "token_masking_prob": 0.0, - "in": [ - "x" - ], - "out": [ - "x_tokens", - "x_subword_tokens", - "x_subword_tok_ids", - "startofword_markers", - "attention_mask" - ] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "unk_token": [ - "O" - ], - "pad_with_zeros": true, - "save_path": "{MODEL_PATH}/tag.dict", - "load_path": "{MODEL_PATH}/tag.dict", - "fit_on": [ - "y" - ], - "in": [ - "y" - ], - "out": [ - "y_ind" - ] - }, - { - "class_name": "torch_transformers_sequence_tagger", - "n_tags": "#tag_vocab.len", - "pretrained_bert": "{TRANSFORMER}", - "attention_probs_keep_prob": 0.5, - "return_probas": false, - "encoder_layer_ids": [ - -1 - ], - "optimizer": "AdamW", - "optimizer_parameters": { - "lr": 2e-05, - 
"weight_decay": 1e-06, - "betas": [ - 0.9, - 0.999 - ], - "eps": 1e-06 - }, - "clip_norm": 1.0, - "min_learning_rate": 1e-07, - "learning_rate_drop_patience": 30, - "learning_rate_drop_div": 1.5, - "load_before_drop": true, - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "in": [ - "x_subword_tok_ids", - "attention_mask", - "startofword_markers" - ], - "in_y": [ - "y_ind" - ], - "out": [ - "y_pred_ind" - ] - }, - { - "ref": "tag_vocab", - "in": [ - "y_pred_ind" - ], - "out": [ - "y_pred" - ] - } - ], - "out": [ - "x_tokens", - "y_pred" - ] - }, - "train": { - "epochs": 30, - "batch_size": 16, - "metrics": [ - { - "name": "ner_f1", - "inputs": [ - "y", - "y_pred" - ] - }, - { - "name": "ner_token_f1", - "inputs": [ - "y", - "y_pred" - ] - } - ], - "validation_patience": 100, - "val_every_n_batches": 20, - "log_every_n_batches": 20, - "show_examples": false, - "pytest_max_batches": 2, - "pytest_batch_size": 8, - "evaluation_targets": [ - "valid", - "test" - ], - "class_name": "torch_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "TRANSFORMER": "bert-base-cased", - "MODEL_PATH": "{MODELS_PATH}/ner_conll2003_torch_bert/{TRANSFORMER}" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/0.16/ner/ner_conll2003_torch_bert.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} diff --git a/deeppavlov/configs/ner/ner_dstc2.json b/deeppavlov/configs/ner/ner_dstc2.json deleted file mode 100644 index 4f35d4b530..0000000000 --- a/deeppavlov/configs/ner/ner_dstc2.json +++ /dev/null @@ -1,126 +0,0 @@ -{ - "dataset_reader": { - "class_name": "dstc2_reader", - "data_path": "{DATA_PATH}" - }, - "dataset_iterator": { - "class_name": "dstc2_ner_iterator", - "slot_values_path": "{SLOT_VALS_PATH}" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "in": ["x"], - "class_name": "lazy_tokenizer", - "out": ["x_tokens"] - }, - { - "in": ["x_tokens"], - "class_name": "str_lower", - "out": ["x_lower"] - }, - { - "in": ["x_lower"], - "id": "word_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": true, - "fit_on": ["x_lower"], - "save_path": "{MODEL_PATH}/word.dict", - "load_path": "{MODEL_PATH}/word.dict", - "out": ["x_tok_ind"] - }, - { - "class_name": "random_emb_mat", - "id": "embeddings", - "vocab_len": "#word_vocab.len", - "emb_dim": 100 - }, - { - "in": ["y"], - "id": "tag_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": true, - "fit_on": ["y"], - "save_path": "{MODEL_PATH}/tag.dict", - "load_path": "{MODEL_PATH}/tag.dict", - "out": ["y_ind"] - }, - { - "in": ["x_tokens"], - "class_name": "mask", - "out": ["mask"] - }, - { - "in": ["x_tok_ind", "mask"], - "in_y": ["y_ind"], - "out": ["y_predicted"], - "class_name": "ner", - "main": true, - "token_emb_mat": "#embeddings.emb_mat", - "n_hidden_list": [64, 64], - "net_type": "cnn", - "n_tags": "#tag_vocab.len", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "embeddings_dropout": true, - "top_dropout": true, - "intra_layer_dropout": false, - "use_batch_norm": true, - "learning_rate": 1e-2, - "dropout_keep_prob": 0.5 - }, - { - "ref": "tag_vocab", - "in": ["y_predicted"], - "out": ["tags"] - } - ], - "out": ["x_tokens", "tags"] - }, - "train": { - "epochs": 100, - "batch_size": 64, - - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "tags"] - }, - { - "name": "per_token_accuracy", - "inputs": ["y", "tags"] - } - ], - "validation_patience": 5, - 
"val_every_n_epochs": 5, - - "log_every_n_batches": 100, - "show_examples": false, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DATA_PATH": "{ROOT_PATH}/downloads/dstc2", - "SLOT_VALS_PATH": "{DATA_PATH}/dstc_slot_vals.json", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/slotfill_dstc2" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/dstc_slot_vals.tar.gz", - "subdir": "{DATA_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/slotfill_dstc2.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} diff --git a/deeppavlov/configs/ner/ner_few_shot_ru.json b/deeppavlov/configs/ner/ner_few_shot_ru.json deleted file mode 100644 index ad60b46567..0000000000 --- a/deeppavlov/configs/ner/ner_few_shot_ru.json +++ /dev/null @@ -1,104 +0,0 @@ -{ - "deeppavlov_root": ".", - "dataset_reader": { - "class_name": "conll2003_reader", - "data_path": "{DOWNLOADS_PATH}/ner_few_shot_data/" - }, - "dataset_iterator": { - "class_name": "data_learning_iterator" - }, - "chainer": { - "in": ["x"], - "in_y": ["tags"], - "pipe": [ - { - "in": ["x"], - "class_name": "lazy_tokenizer", - "out": ["x_tokens"] - }, - { - "in": ["tags"], - "id": "tag_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": false, - "fit_on": ["tags"], - "save_path": "{MODELS_PATH}/ner_fs/tag.dict", - "load_path": "{MODELS_PATH}/ner_fs/tag.dict", - "out": ["tag_indices"] - }, - { - "class_name": "elmo_embedder", - "elmo_output_names": ["lstm_outputs1", "lstm_outputs2", "word_emb"], - "mini_batch_size": 32, - "in": ["x_tokens"], - "spec": "{DOWNLOADS_PATH}/embeddings/elmo_ru_news", - "out": [ - "tokens_emb" - ] - }, - { - "class_name": "ner_svm", - "in": "tokens_emb", - "out": "tag_indices", - "fit_on": ["tokens_emb", "tag_indices"], - "save_path": "{MODELS_PATH}/ner_fs/model", - "load_path": "{MODELS_PATH}/ner_fs/model" - }, - { - "ref": "tag_vocab", - "in": ["tag_indices"], - "out": ["tags_hat"] - }, - { - "class_name": "ner_bio_converter", - "in": ["tags_hat"], - "out": ["tags_bio_hat"] - }, - { - "class_name": "ner_bio_converter", - "in": ["tags"], - "out": ["tags_bio"] - } - - ], - - "out": ["x_tokens", "tags_bio_hat"] - }, - "train": { - "epochs": 100, - "batch_size": 64, - "metrics": [ - { - "name": "ner_f1", - "inputs": [ - "tags_bio", - "tags_bio_hat" - ] - } - ], - "validation_patience": 7, - "val_every_n_epochs": 1, - - "log_every_n_epochs": 1, - "show_examples": false, - "tensorboard_log_dir": "{MODELS_PATH}/ner_fs/logs", - "class_name": "fit_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/elmo_ru-news_wmt11-16_1.5M_steps.tar.gz", - "subdir": "{DOWNLOADS_PATH}/embeddings/elmo_ru_news" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/ner/ner_few_shot_ru_simulate.json b/deeppavlov/configs/ner/ner_few_shot_ru_simulate.json deleted file mode 100644 index cb58707224..0000000000 --- a/deeppavlov/configs/ner/ner_few_shot_ru_simulate.json +++ /dev/null @@ -1,140 +0,0 @@ -{ - "dataset_reader": { - "class_name": "conll2003_reader", - "dataset_name": "collection_rus", - "data_path": "{DOWNLOADS_PATH}/ner_few_shot_data/" - }, - "dataset_iterator": { - "class_name": "ner_few_shot_iterator", - "target_tag": 
"PER" - }, - "chainer": { - "in": [ - "x" - ], - "in_y": [ - "tags" - ], - "pipe": [ - { - "in": [ - "x" - ], - "class_name": "lazy_tokenizer", - "out": [ - "x_tokens" - ] - }, - { - "in": [ - "tags" - ], - "id": "tag_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": false, - "fit_on": [ - "tags" - ], - "save_path": "{MODELS_PATH}/ner_fs/tag.dict", - "load_path": "{MODELS_PATH}/ner_fs/tag.dict", - "out": [ - "tag_indices" - ] - }, - { - "class_name": "elmo_embedder", - "elmo_output_names": [ - "lstm_outputs1", - "lstm_outputs2", - "word_emb" - ], - "mini_batch_size": 32, - "in": [ - "x_tokens" - ], - "spec": "{DOWNLOADS_PATH}/embeddings/elmo_ru_news", - "out": [ - "tokens_emb" - ] - }, - { - "class_name": "ner_svm", - "in": "tokens_emb", - "out": "tag_indices", - "fit_on": [ - "tokens_emb", - "tag_indices" - ], - "save_path": "{MODELS_PATH}/ner_fs/model", - "load_path": "{MODELS_PATH}/ner_fs/model" - }, - { - "ref": "tag_vocab", - "in": [ - "tag_indices" - ], - "out": [ - "tags_hat" - ] - }, - { - "class_name": "ner_bio_converter", - "in": [ - "tags_hat" - ], - "out": [ - "tags_bio_hat" - ] - }, - { - "class_name": "ner_bio_converter", - "in": [ - "tags" - ], - "out": [ - "tags_bio" - ] - } - ], - "out": [ - "x_tokens", - "tags_bio_hat" - ] - }, - "train": { - "epochs": 100, - "batch_size": 64, - "metrics": [ - { - "name": "ner_f1", - "inputs": [ - "tags_bio", - "tags_bio_hat" - ] - } - ], - "validation_patience": 7, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "tensorboard_log_dir": "{MODELS_PATH}/ner_fs/logs", - "class_name": "fit_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/elmo_ru-news_wmt11-16_1.5M_steps.tar.gz", - "subdir": "{DOWNLOADS_PATH}/embeddings/elmo_ru_news" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/ner/ner_kb_rus.json b/deeppavlov/configs/ner/ner_kb_rus.json deleted file mode 100644 index 1bef6b87b1..0000000000 --- a/deeppavlov/configs/ner/ner_kb_rus.json +++ /dev/null @@ -1,164 +0,0 @@ -{ - "dataset_reader": { - "class_name": "sq_reader", - "data_path": "{DOWNLOADS_PATH}/ner_sq/SQ_rus_dataset_zs.pckl" - }, - "dataset_iterator": { - "class_name": "data_learning_iterator" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "in": ["x"], - "class_name": "lazy_tokenizer", - "out": ["x_tokens"] - }, - { - "in": ["x_tokens"], - "class_name": "str_lower", - "out": ["x_lower"] - }, - { - "in": ["x_lower"], - "class_name": "sanitizer", - "nums": true, - "out": ["x_san"] - }, - { - "in": ["x_san"], - "id": "word_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": true, - "special_tokens": [""], - "fit_on": ["x_san"], - "save_path": "{MODEL_PATH}/ner/word.dict", - "load_path": "{MODEL_PATH}/ner/word.dict", - "out": ["x_tok_ind"] - }, - { - "in": ["y"], - "id": "tag_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": true, - "fit_on": ["y"], - "save_path": "{MODEL_PATH}/ner/tag.dict", - "load_path": "{MODEL_PATH}/ner/tag.dict", - "out": ["y_ind"] - }, - { - "in": ["x_tokens"], - "class_name": "char_splitter", - "out": ["x_char"] - }, - { - "in": ["x_char"], - "id": "char_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": true, - "fit_on": ["x_char"], - "save_path": "{MODEL_PATH}/ner/char.dict", - "load_path": 
"{MODEL_PATH}/ner/char.dict", - "out": ["x_char_ind"] - }, - { - "in": ["x_san"], - "id": "embedder", - "class_name": "fasttext", - "pad_zero": true, - "load_path": "{DOWNLOADS_PATH}/embeddings/lenta_lower_100.bin", - "out": ["x_emb"] - }, - { - "in": ["x_tokens"], - "class_name": "mask", - "out": ["mask"] - }, - { - "class_name": "random_emb_mat", - "id": "embeddings", - "vocab_len": "#word_vocab.len", - "emb_dim": 100 - }, - { - "class_name": "random_emb_mat", - "id": "embeddings_char", - "vocab_len": "#char_vocab.len", - "emb_dim": 100 - }, - { - "in": ["x_emb", "mask", "x_char_ind"], - "in_y": ["y_ind"], - "out": ["y_predicted"], - "class_name": "ner", - "main": true, - "n_hidden_list": [128], - "net_type": "rnn", - "cell_type": "lstm", - "use_cudnn_rnn": true, - "n_tags": "#tag_vocab.len", - "token_emb_dim": "#embedder.dim", - "char_emb_dim": 100, - "save_path": "{MODEL_PATH}/ner/model", - "load_path": "{MODEL_PATH}/ner/model", - "char_emb_mat": "#embeddings_char.emb_mat", - "use_crf": true, - "use_batch_norm": true, - "embeddings_dropout": true, - "top_dropout": true, - "intra_layer_dropout": true, - "l2_reg": 0, - "learning_rate": 1e-2, - "dropout_keep_prob": 0.7 - }, - { - "ref": "tag_vocab", - "in": ["y_predicted"], - "out": ["tags"] - } - ], - - "out": ["x_tokens", "tags"] - }, - "train": { - "epochs": 100, - "batch_size": 64, - - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "tags"] - } - ], - "validation_patience": 7, - "val_every_n_epochs": 1, - - "log_every_n_epochs": 1, - "show_examples": false, - "tensorboard_log_dir": "{MODEL_PATH}/ner/logs" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/kbqa_mix_lowercase" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ner_kb_rus.tar.gz", - "subdir": "{MODEL_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/lenta_lower_100.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/datasets/SQ_rus_dataset_zs.pckl", - "subdir": "{DOWNLOADS_PATH}/ner_sq" - } - ] - } -} - diff --git a/deeppavlov/configs/ner/ner_lcquad_bert_ent_and_type.json b/deeppavlov/configs/ner/ner_lcquad_bert_ent_and_type.json deleted file mode 100644 index 0010f48b4f..0000000000 --- a/deeppavlov/configs/ner/ner_lcquad_bert_ent_and_type.json +++ /dev/null @@ -1,119 +0,0 @@ -{ - "dataset_reader": { - "class_name": "sq_reader", - "data_path": "{DOWNLOADS_PATH}/lcquad/entity_and_type_detection.pickle" - }, - "dataset_iterator": { - "class_name": "data_learning_iterator" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "bert_ner_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 512, - "max_subword_length": 15, - "token_maksing_prob": 0.0, - "in": ["x"], - "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "pred_subword_mask"] - }, - { - "class_name": "mask", - "in": ["x_subword_tokens"], - "out": ["x_subword_mask"] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "unk_token": ["O"], - "pad_with_zeros": true, - "save_path": "{NER_PATH}/tag.dict", - "load_path": "{NER_PATH}/tag.dict", - "fit_on": ["y"], - "in": ["y"], - "out": ["y_ind"] - }, - { - "class_name": "bert_sequence_tagger", - "n_tags": "#tag_vocab.len", - "keep_prob": 0.1, - "bert_config_file": "{BERT_PATH}/bert_config.json", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", - 
"attention_probs_keep_prob": 0.5, - "use_crf": false, - "return_probas": true, - "ema_decay": 0.9, - "encoder_layer_ids": [-1], - "optimizer": "tf.train:AdamOptimizer", - "learning_rate": 1e-3, - "bert_learning_rate": 2e-5, - "min_learning_rate": 1e-7, - "learning_rate_drop_patience": 30, - "learning_rate_drop_div": 1.5, - "load_before_drop": true, - "clip_norm": 1.0, - "save_path": "{NER_PATH}/model", - "load_path": "{NER_PATH}/model", - "in": ["x_subword_tok_ids", "x_subword_mask", "pred_subword_mask"], - "in_y": ["y_ind"], - "out": ["y_pred_ind"] - } - ], - "out": ["x_tokens", "y_pred_ind"] - }, - "train": { - "epochs": 30, - "batch_size": 16, - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "y_pred"] - }, - { - "name": "ner_token_f1", - "inputs": ["y", "y_pred"] - } - ], - "validation_patience": 10, - "val_every_n_batches": 400, - - "log_every_n_batches": 400, - "tensorboard_log_dir": "{NER_PATH}/logs", - "show_examples": false, - "pytest_max_batches": 2, - "pytest_batch_size": 8, - "evaluation_targets": ["valid", "test"], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models_kbqa/cased_L-12_H-768_A-12", - "NER_PATH": "{MODELS_PATH}/ner_lcquad_ent_and_type" - }, - "labels": { - "telegram_utils": "NERCoNLL2003Model", - "server_utils": "NER" - }, - "download": [ - - { - "url": "http://files.deeppavlov.ai/kbqa/datasets/entity_and_type_detection.pickle", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models_kbqa" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/ner_lcquad.tar.gz", - "subdir": "{MODELS_PATH}/ner_lcquad_ent_and_type" - } - ] - } -} diff --git a/deeppavlov/configs/ner/ner_lcquad_bert_probas.json b/deeppavlov/configs/ner/ner_lcquad_bert_probas.json deleted file mode 100644 index 60c4febd57..0000000000 --- a/deeppavlov/configs/ner/ner_lcquad_bert_probas.json +++ /dev/null @@ -1,119 +0,0 @@ -{ - "dataset_reader": { - "class_name": "sq_reader", - "data_path": "{DOWNLOADS_PATH}/lcquad/entity_detection_dataset.pickle" - }, - "dataset_iterator": { - "class_name": "data_learning_iterator" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "bert_ner_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 512, - "max_subword_length": 15, - "token_maksing_prob": 0.0, - "in": ["x"], - "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "pred_subword_mask"] - }, - { - "class_name": "mask", - "in": ["x_subword_tokens"], - "out": ["x_subword_mask"] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "unk_token": ["O"], - "pad_with_zeros": true, - "save_path": "{NER_PATH}/tag.dict", - "load_path": "{NER_PATH}/tag.dict", - "fit_on": ["y"], - "in": ["y"], - "out": ["y_ind"] - }, - { - "class_name": "bert_sequence_tagger", - "n_tags": "#tag_vocab.len", - "keep_prob": 0.1, - "bert_config_file": "{BERT_PATH}/bert_config.json", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", - "attention_probs_keep_prob": 0.5, - "use_crf": false, - "return_probas": true, - "ema_decay": 0.9, - "encoder_layer_ids": [-1], - "optimizer": "tf.train:AdamOptimizer", - "learning_rate": 1e-3, - "bert_learning_rate": 2e-5, - "min_learning_rate": 1e-7, - "learning_rate_drop_patience": 30, - "learning_rate_drop_div": 1.5, - 
"load_before_drop": true, - "clip_norm": 1.0, - "save_path": "{NER_PATH}/model", - "load_path": "{NER_PATH}/model", - "in": ["x_subword_tok_ids", "x_subword_mask", "pred_subword_mask"], - "in_y": ["y_ind"], - "out": ["y_pred_ind"] - } - ], - "out": ["x_tokens", "y_pred_ind"] - }, - "train": { - "epochs": 30, - "batch_size": 16, - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "y_pred"] - }, - { - "name": "ner_token_f1", - "inputs": ["y", "y_pred"] - } - ], - "validation_patience": 10, - "val_every_n_batches": 400, - - "log_every_n_batches": 400, - "tensorboard_log_dir": "{NER_PATH}/logs", - "show_examples": false, - "pytest_max_batches": 2, - "pytest_batch_size": 8, - "evaluation_targets": ["valid", "test"], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12", - "NER_PATH": "{MODELS_PATH}/ner_lcquad" - }, - "labels": { - "telegram_utils": "NERCoNLL2003Model", - "server_utils": "NER" - }, - "download": [ - - { - "url": "http://files.deeppavlov.ai/kbqa/datasets/entity_detection_dataset.pickle", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/ner_lcquad.tar.gz", - "subdir": "{MODELS_PATH}/ner_lcquad" - } - ] - } -} diff --git a/deeppavlov/configs/ner/ner_ontonotes.json b/deeppavlov/configs/ner/ner_ontonotes.json deleted file mode 100644 index ca0827eacf..0000000000 --- a/deeppavlov/configs/ner/ner_ontonotes.json +++ /dev/null @@ -1,165 +0,0 @@ -{ - "dataset_reader": { - "class_name": "conll2003_reader", - "data_path": "{DOWNLOADS_PATH}/ontonotes", - "dataset_name": "ontonotes" - }, - "dataset_iterator": { - "class_name": "data_learning_iterator", - "seed": 42 - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "in": ["x"], - "class_name": "lazy_tokenizer", - "out": ["x_tokens"] - }, - { - "in": ["x_tokens"], - "class_name": "str_lower", - "out": ["x_lower"] - }, - { - "in": ["x_lower"], - "class_name": "sanitizer", - "nums": true, - "out": ["x_san"] - }, - { - "in": ["y"], - "id": "tag_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": true, - "fit_on": ["y"], - "save_path": "{MODEL_PATH}/tag.dict", - "load_path": "{MODEL_PATH}/tag.dict", - "out": ["y_ind"] - }, - { - "in": ["x_tokens"], - "class_name": "char_splitter", - "out": ["x_char"] - }, - { - "in": ["x_char"], - "id": "char_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": true, - "fit_on": ["x_char"], - "save_path": "{MODEL_PATH}/char.dict", - "load_path": "{MODEL_PATH}/char.dict", - "out": ["x_char_ind"] - }, - { - "in": ["x_tokens"], - "class_name": "mask", - "out": ["mask"] - }, - { - "in": ["x_san"], - "id": "glove_emb", - "class_name": "glove", - "pad_zero": true, - "load_path": "{DOWNLOADS_PATH}/embeddings/glove.6B.100d.txt", - - "out": ["x_emb"] - }, - { - "id": "embeddings_char", - "class_name": "emb_mat_assembler", - "character_level": true, - "emb_dim": 32, - "embedder": "#glove_emb", - "vocab": "#char_vocab" - }, - { - "id": "capitalization", - "class_name": "capitalization_featurizer", - "in": ["x_tokens"], - "out": ["cap"] - }, - { - "in": ["x_emb", "mask", "x_char_ind", "cap"], - "in_y": ["y_ind"], - "out": ["y_predicted"], - "class_name": "ner", - "main": true, - "token_emb_dim": "#glove_emb.dim", - 
"n_hidden_list": [256, 256, 256], - "net_type": "rnn", - "cell_type": "lstm", - "use_cudnn_rnn": true, - "n_tags": "#tag_vocab.len", - "capitalization_dim": "#capitalization.dim", - "char_emb_dim": "#embeddings_char.dim", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "char_emb_mat": "#embeddings_char.emb_mat", - "two_dense_on_top": true, - "use_crf": true, - "use_batch_norm": true, - "embeddings_dropout": true, - "top_dropout": true, - "intra_layer_dropout": false, - "l2_reg": 0, - "learning_rate": 3e-3, - "learning_rate_drop_patience": 3, - "dropout_keep_prob": 0.7 - }, - { - "ref": "tag_vocab", - "in": ["y_predicted"], - "out": ["tags"] - } - ], - - "out": ["x_tokens", "tags"] - }, - "train": { - "epochs": 100, - "batch_size": 64, - - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "tags"] - }, - { - "name": "ner_token_f1", - "inputs": ["y", "tags"] - } - ], - "validation_patience": 7, - "val_every_n_epochs": 1, - - "log_every_n_batches": -1, - "tensorboard_log_dir": "{MODEL_PATH}/logs", - "show_examples": false, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/ner_ontonotes" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ner_ontonotes_v3_cpu_compatible.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/glove.6B.100d.txt", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} diff --git a/deeppavlov/configs/ner/ner_ontonotes_bert.json b/deeppavlov/configs/ner/ner_ontonotes_bert.json index 7b67e977f0..0e7cb5c55e 100644 --- a/deeppavlov/configs/ner/ner_ontonotes_bert.json +++ b/deeppavlov/configs/ner/ner_ontonotes_bert.json @@ -13,50 +13,50 @@ "in_y": ["y"], "pipe": [ { - "class_name": "bert_ner_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", + "class_name": "torch_transformers_ner_preprocessor", + "vocab_file": "{TRANSFORMER}", "do_lower_case": false, "max_seq_length": 512, "max_subword_length": 15, "token_masking_prob": 0.0, "in": ["x"], - "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", "attention_mask"] + "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", "attention_mask", "tokens_offsets"] }, { "id": "tag_vocab", "class_name": "simple_vocab", "unk_token": ["O"], "pad_with_zeros": true, - "save_path": "{NER_PATH}/tag.dict", - "load_path": "{NER_PATH}/tag.dict", + "save_path": "{MODEL_PATH}/tag.dict", + "load_path": "{MODEL_PATH}/tag.dict", "fit_on": ["y"], "in": ["y"], "out": ["y_ind"] }, { - "class_name": "bert_sequence_tagger", + "class_name": "torch_transformers_sequence_tagger", "n_tags": "#tag_vocab.len", - "keep_prob": 0.1, - "bert_config_file": "{BERT_PATH}/bert_config.json", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", + "pretrained_bert": "{TRANSFORMER}", "attention_probs_keep_prob": 0.5, "use_crf": true, - "return_probas": false, - "ema_decay": 0.9, "encoder_layer_ids": [-1], - "weight_decay_rate": 1e-6, - "learning_rate": 1e-2, - "bert_learning_rate": 2e-5, - "min_learning_rate": 1e-7, - "learning_rate_drop_patience": 30, - "learning_rate_drop_div": 2, - "load_before_drop": false, + "optimizer": "AdamW", + "optimizer_parameters": { + "lr": 2e-05, + "weight_decay": 1e-06, + "betas": [0.9, 0.999], + "eps": 1e-06 + }, "clip_norm": 1.0, - "save_path": "{NER_PATH}/model", - 
"load_path": "{NER_PATH}/model", + "min_learning_rate": 1e-07, + "learning_rate_drop_patience": 30, + "learning_rate_drop_div": 1.5, + "load_before_drop": true, + "save_path": "{MODEL_PATH}/model", + "load_path": "{MODEL_PATH}/model", "in": ["x_subword_tok_ids", "attention_mask", "startofword_markers"], "in_y": ["y_ind"], - "out": ["y_pred_ind"] + "out": ["y_pred_ind", "probas"] }, { "ref": "tag_vocab", @@ -68,7 +68,7 @@ }, "train": { "epochs": 30, - "batch_size": 16, + "batch_size": 60, "metrics": [ { "name": "ner_f1", @@ -80,33 +80,26 @@ } ], "validation_patience": 100, - "val_every_n_batches": 40, - - "log_every_n_batches": 40, - "tensorboard_log_dir": "{NER_PATH}/logs", + "val_every_n_batches": 20, + "log_every_n_batches": 20, "show_examples": false, "pytest_max_batches": 2, "pytest_batch_size": 8, "evaluation_targets": ["valid", "test"], - "class_name": "nn_trainer" + "class_name": "torch_trainer" }, "metadata": { "variables": { "ROOT_PATH": "~/.deeppavlov", "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12", - "NER_PATH": "{MODELS_PATH}/ner_ontonotes_bert" + "TRANSFORMER": "bert-base-cased", + "MODEL_PATH": "{MODELS_PATH}/ner_ontonotes_bert_torch_crf" }, "download": [ - - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ner_ontonotes_bert_v1.tar.gz", - "subdir": "{MODELS_PATH}" - }, { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" + "url": "http://files.deeppavlov.ai/v1/ner/ner_ontonotes_bert_torch_crf.tar.gz", + "subdir": "{MODEL_PATH}" } ] } diff --git a/deeppavlov/configs/ner/ner_ontonotes_bert_emb.json b/deeppavlov/configs/ner/ner_ontonotes_bert_emb.json deleted file mode 100644 index 513af21f5f..0000000000 --- a/deeppavlov/configs/ner/ner_ontonotes_bert_emb.json +++ /dev/null @@ -1,122 +0,0 @@ -{ - "dataset_reader": { - "class_name": "conll2003_reader", - "data_path": "{DOWNLOADS_PATH}/ontonotes", - "dataset_name": "ontonotes" - }, - "dataset_iterator": { - "class_name": "data_learning_iterator", - "seed": 42 - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "transformers_bert_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 512, - "in": ["x"], - "out": ["x_tokens", "subword_tokens", "subword_tok_ids", "startofword_markers", "attention_mask"] - }, - { - "in": ["y"], - "id": "tag_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": true, - "fit_on": ["y"], - "save_path": "{MODEL_PATH}/tag.dict", - "load_path": "{MODEL_PATH}/tag.dict", - "out": ["y_ind"] - }, - { - "in": ["x_tokens"], - "class_name": "mask", - "out": ["mask"] - }, - { - "class_name": "transformers_bert_embedder", - "id": "embedder", - "bert_config_path": "{BERT_PATH}/bert_config.json", - "truncate": false, - "load_path": "{BERT_PATH}", - "in": ["subword_tok_ids", "startofword_markers", "attention_mask"], - "out": ["x_emb", "subword_emb", "max_emb", "mean_emb", "pooler_output"] - }, - { - "in": ["x_emb", "mask"], - "in_y": ["y_ind"], - "out": ["y_predicted"], - "class_name": "ner", - "main": true, - "token_emb_dim": "#embedder.dim", - "n_hidden_list": [256, 256, 256], - "net_type": "rnn", - "cell_type": "lstm", - "use_cudnn_rnn": true, - "n_tags": "#tag_vocab.len", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "two_dense_on_top": true, - "use_crf": true, - "use_batch_norm": true, - "embeddings_dropout": true, - 
"top_dropout": true, - "intra_layer_dropout": false, - "l2_reg": 0, - "learning_rate": 3e-3, - "learning_rate_drop_patience": 3, - "dropout_keep_prob": 0.7 - }, - { - "ref": "tag_vocab", - "in": ["y_predicted"], - "out": ["tags"] - } - ], - - "out": ["x_tokens", "tags"] - }, - "train": { - "epochs": 100, - "batch_size": 64, - - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "tags"] - }, - { - "name": "ner_token_f1", - "inputs": ["y", "tags"] - } - ], - "validation_patience": 7, - "val_every_n_epochs": 1, - - "log_every_n_batches": -1, - "tensorboard_log_dir": "{MODEL_PATH}/logs", - "show_examples": false, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/ner_ontonotes_bert_emb", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12_pt" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/multi_cased_L-12_H-768_A-12_pt.tar.gz", - "subdir": "{DOWNLOADS_PATH}/bert_models" - } - ] - } -} diff --git a/deeppavlov/configs/ner/ner_ontonotes_bert_mult.json b/deeppavlov/configs/ner/ner_ontonotes_bert_mult.json index da6138d1a2..c18e9e89ae 100644 --- a/deeppavlov/configs/ner/ner_ontonotes_bert_mult.json +++ b/deeppavlov/configs/ner/ner_ontonotes_bert_mult.json @@ -13,50 +13,50 @@ "in_y": ["y"], "pipe": [ { - "class_name": "bert_ner_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", + "class_name": "torch_transformers_ner_preprocessor", + "vocab_file": "{TRANSFORMER}", "do_lower_case": false, "max_seq_length": 512, "max_subword_length": 15, "token_masking_prob": 0.0, "in": ["x"], - "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", "attention_mask"] + "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", "attention_mask", "tokens_offsets"] }, { "id": "tag_vocab", "class_name": "simple_vocab", "unk_token": ["O"], "pad_with_zeros": true, - "save_path": "{NER_PATH}/tag.dict", - "load_path": "{NER_PATH}/tag.dict", + "save_path": "{MODEL_PATH}/tag.dict", + "load_path": "{MODEL_PATH}/tag.dict", "fit_on": ["y"], "in": ["y"], "out": ["y_ind"] }, { - "class_name": "bert_sequence_tagger", + "class_name": "torch_transformers_sequence_tagger", "n_tags": "#tag_vocab.len", - "keep_prob": 0.1, - "bert_config_file": "{BERT_PATH}/bert_config.json", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", + "pretrained_bert": "{TRANSFORMER}", "attention_probs_keep_prob": 0.5, "use_crf": true, - "return_probas": false, - "ema_decay": 0.9, "encoder_layer_ids": [-1], - "weight_decay_rate": 1e-6, - "learning_rate": 1e-2, - "bert_learning_rate": 2e-5, - "min_learning_rate": 1e-7, + "optimizer": "AdamW", + "optimizer_parameters": { + "lr": 2e-05, + "weight_decay": 1e-06, + "betas": [0.9, 0.999], + "eps": 1e-06 + }, + "clip_norm": 1.0, + "min_learning_rate": 1e-07, "learning_rate_drop_patience": 30, "learning_rate_drop_div": 1.5, - "load_before_drop": false, - "clip_norm": 1.0, - "save_path": "{NER_PATH}/model", - "load_path": "{NER_PATH}/model", + "load_before_drop": true, + "save_path": "{MODEL_PATH}/model", + "load_path": "{MODEL_PATH}/model", "in": ["x_subword_tok_ids", "attention_mask", "startofword_markers"], "in_y": ["y_ind"], - "out": ["y_pred_ind"] + "out": ["y_pred_ind", "probas"] }, { "ref": "tag_vocab", @@ -68,7 +68,7 @@ }, "train": { "epochs": 30, - "batch_size": 16, + "batch_size": 
10, "metrics": [ { "name": "ner_f1", @@ -81,32 +81,25 @@ ], "validation_patience": 100, "val_every_n_batches": 20, - "log_every_n_batches": 20, - "tensorboard_log_dir": "{NER_PATH}/logs", + "show_examples": false, "pytest_max_batches": 2, "pytest_batch_size": 8, - "show_examples": false, "evaluation_targets": ["valid", "test"], - "class_name": "nn_trainer" + "class_name": "torch_trainer" }, "metadata": { "variables": { "ROOT_PATH": "~/.deeppavlov", "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12", - "NER_PATH": "{MODELS_PATH}/ner_ontonotes_bert_mult" + "TRANSFORMER": "bert-base-multilingual-cased", + "MODEL_PATH": "{MODELS_PATH}/ner_ontonotes_torch_bert_mult_crf" }, "download": [ - - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ner_ontonotes_bert_mult_v1.tar.gz", - "subdir": "{MODELS_PATH}" - }, { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/multi_cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" + "url": "http://files.deeppavlov.ai/v1/ner/ner_ontonotes_bert_mult_torch_crf.tar.gz", + "subdir": "{MODEL_PATH}" } ] } diff --git a/deeppavlov/configs/ner/ner_ontonotes_bert_probas.json b/deeppavlov/configs/ner/ner_ontonotes_bert_probas.json deleted file mode 100644 index 9b1912fdbb..0000000000 --- a/deeppavlov/configs/ner/ner_ontonotes_bert_probas.json +++ /dev/null @@ -1,107 +0,0 @@ -{ - "dataset_reader": { - "class_name": "conll2003_reader", - "data_path": "{DOWNLOADS_PATH}/ontonotes/", - "dataset_name": "ontonotes", - "provide_pos": false - }, - "dataset_iterator": { - "class_name": "data_learning_iterator" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "bert_ner_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 512, - "max_subword_length": 15, - "token_masking_prob": 0.0, - "in": ["x"], - "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", "attention_mask"] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "unk_token": ["O"], - "pad_with_zeros": true, - "save_path": "{NER_PATH}/tag.dict", - "load_path": "{NER_PATH}/tag.dict", - "fit_on": ["y"], - "in": ["y"], - "out": ["y_ind"] - }, - { - "class_name": "bert_sequence_tagger", - "n_tags": "#tag_vocab.len", - "keep_prob": 0.1, - "bert_config_file": "{BERT_PATH}/bert_config.json", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", - "attention_probs_keep_prob": 0.5, - "use_crf": true, - "return_probas": true, - "ema_decay": 0.9, - "encoder_layer_ids": [-1], - "weight_decay_rate": 1e-6, - "learning_rate": 1e-2, - "bert_learning_rate": 2e-5, - "min_learning_rate": 1e-7, - "learning_rate_drop_patience": 30, - "learning_rate_drop_div": 2, - "load_before_drop": false, - "clip_norm": 1.0, - "save_path": "{NER_PATH}/model", - "load_path": "{NER_PATH}/model", - "in": ["x_subword_tok_ids", "attention_mask", "startofword_markers"], - "in_y": ["y_ind"], - "out": ["y_pred"] - } - ], - "out": ["x_tokens", "y_pred"] - }, - "train": { - "epochs": 30, - "batch_size": 16, - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "y_pred"] - }, - { - "name": "ner_token_f1", - "inputs": ["y", "y_pred"] - } - ], - "validation_patience": 100, - "val_every_n_batches": 40, - - "log_every_n_batches": 40, - "tensorboard_log_dir": "{NER_PATH}/logs", - "show_examples": false, - "pytest_max_batches": 2, - "pytest_batch_size": 8, - "evaluation_targets": ["valid", "test"], - "class_name": 
"nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12", - "NER_PATH": "{MODELS_PATH}/ner_ontonotes_bert" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ner_ontonotes_bert_v1.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - } - ] - } -} diff --git a/deeppavlov/configs/ner/ner_ontonotes_m1.json b/deeppavlov/configs/ner/ner_ontonotes_m1.json deleted file mode 100644 index 6e4e85d66e..0000000000 --- a/deeppavlov/configs/ner/ner_ontonotes_m1.json +++ /dev/null @@ -1,131 +0,0 @@ -{ - "dataset_reader": { - "class_name": "conll2003_reader", - "data_path": "{DOWNLOADS_PATH}/ontonotes/", - "dataset_name": "ontonotes", - "provide_pos": false, - "provide_chunk": false, - "iobes": true - }, - "dataset_iterator": { - "class_name": "data_learning_iterator" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "in": ["x"], - "out": ["x_tokens"], - "class_name": "lazy_tokenizer" - }, - { - "in": ["x_tokens"], - "out": ["x_lower", "sent_lengths", "x_tokens_elmo"], - "class_name": "ner_preprocessor", - "get_x_padded_for_elmo": true - }, - { - "in": ["x_lower"], - "out": ["x_tok_ind"], - "fit_on": ["x_lower"], - "class_name": "ner_vocab", - "id": "word_vocab", - "save_path": "{MODEL_PATH}/word.dict", - "load_path": "{MODEL_PATH}/word.dict" - }, - { - "in": ["y"], - "out": ["y_ind"], - "fit_on": ["y"], - "class_name": "ner_vocab", - "id": "tag_vocab", - "save_path": "{MODEL_PATH}/tag.dict", - "load_path": "{MODEL_PATH}/tag.dict" - }, - { - "in": ["x_tokens"], - "out": ["x_char_ind"], - "fit_on": ["x_tokens"], - "class_name": "ner_vocab", - "char_level": true, - "id": "char_vocab", - "save_path": "{MODEL_PATH}/char.dict", - "load_path": "{MODEL_PATH}/char.dict" - }, - { - "in":[ - "sent_lengths", - "x_tok_ind", - "x_char_ind", - "x_tokens_elmo" - ], - "in_y": ["y_ind"], - "out": ["y_predicted"], - "class_name": "hybrid_ner_model", - "n_tags": "#tag_vocab.len", - "word_emb_path": "{DOWNLOADS_PATH}/embeddings/glove.6B.100d.txt", - "word_emb_name": "glove", - "word_dim": 100, - "word_vocab": "#word_vocab", - "char_vocab_size": "#char_vocab.len", - "char_dim": 100, - "elmo_dim": 128, - "lstm_hidden_size": 256, - "save_path": "{MODEL_PATH}/ontonotes", - "load_path": "{MODEL_PATH}/ontonotes", - "learning_rate": 1e-3, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 10, - "dropout_keep_prob": 0.7 - }, - { - "in": ["y_predicted"], - "out": ["tags"], - "class_name": "convert_ids2tags", - "id2tag": "#tag_vocab.i2t" - } - ], - "out": ["x_tokens", "tags"] - }, - "train": { - "epochs": 100, - "batch_size": 64, - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "tags"] - }, - { - "name": "ner_token_f1", - "inputs": ["y", "tags"] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODEL_PATH": "{ROOT_PATH}/models/ontonotes" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ner_ontonotes_m1.tar.gz", - "subdir": "{MODEL_PATH}" - }, - { - "url": 
"http://files.deeppavlov.ai/embeddings/glove.6B.100d.txt", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} diff --git a/deeppavlov/configs/ner/ner_rus.json b/deeppavlov/configs/ner/ner_rus.json deleted file mode 100644 index b6546f706a..0000000000 --- a/deeppavlov/configs/ner/ner_rus.json +++ /dev/null @@ -1,177 +0,0 @@ -{ - "dataset_reader": { - "class_name": "conll2003_reader", - "data_path": "{DOWNLOADS_PATH}/total_rus/", - "dataset_name": "collection_rus", - "provide_pos": false - }, - "dataset_iterator": { - "class_name": "data_learning_iterator" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "in": ["x"], - "class_name": "lazy_tokenizer", - "out": ["x_tokens"] - }, - { - "in": ["x_tokens"], - "class_name": "str_lower", - "out": ["x_lower"] - }, - { - "in": ["x_lower"], - "class_name": "sanitizer", - "nums": true, - "out": ["x_san"] - }, - { - "in": ["x_san"], - "id": "word_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": true, - "special_tokens": [""], - "fit_on": ["x_san"], - "save_path": "{NER_PATH}/word.dict", - "load_path": "{NER_PATH}/word.dict", - "out": ["x_tok_ind"] - }, - { - "in": ["y"], - "id": "tag_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": true, - "fit_on": ["y"], - "save_path": "{NER_PATH}/tag.dict", - "load_path": "{NER_PATH}/tag.dict", - "out": ["y_ind"] - }, - { - "in": ["x_tokens"], - "class_name": "char_splitter", - "out": ["x_char"] - }, - { - "in": ["x_char"], - "id": "char_vocab", - "class_name": "simple_vocab", - "pad_with_zeros": true, - "fit_on": ["x_char"], - "save_path": "{NER_PATH}/char.dict", - "load_path": "{NER_PATH}/char.dict", - "out": ["x_char_ind"] - }, - { - "in": ["x_san"], - "id": "embedder", - "class_name": "fasttext", - "pad_zero": true, - "load_path": "{DOWNLOADS_PATH}/embeddings/lenta_lower_100.bin", - "out": ["x_emb"] - }, - { - "in": ["x_tokens"], - "class_name": "mask", - "out": ["mask"] - }, - { - "class_name": "random_emb_mat", - "id": "embeddings", - "vocab_len": "#word_vocab.len", - "emb_dim": 100 - }, - { - "class_name": "random_emb_mat", - "id": "embeddings_char", - "vocab_len": "#char_vocab.len", - "emb_dim": 100 - }, - { - "id": "capitalization", - "class_name": "capitalization_featurizer", - "in": ["x_tokens"], - "out": ["cap"] - }, - { - "in": ["x_emb", "mask", "x_char_ind", "cap"], - "in_y": ["y_ind"], - "out": ["y_predicted"], - "class_name": "ner", - "main": true, - "n_hidden_list": [128], - "net_type": "rnn", - "cell_type": "lstm", - "use_cudnn_rnn": true, - "n_tags": "#tag_vocab.len", - "capitalization_dim": "#capitalization.dim", - "token_emb_dim": "#embedder.dim", - "char_emb_dim": 100, - "save_path": "{NER_PATH}/model", - "load_path": "{NER_PATH}/model", - "char_emb_mat": "#embeddings_char.emb_mat", - "use_crf": true, - "use_batch_norm": true, - "embeddings_dropout": true, - "top_dropout": true, - "intra_layer_dropout": true, - "l2_reg": 0, - "learning_rate": 1e-2, - "dropout_keep_prob": 0.7 - }, - { - "ref": "tag_vocab", - "in": ["y_predicted"], - "out": ["tags"] - } - ], - - "out": ["x_tokens", "tags"] - }, - "train": { - "epochs": 100, - "batch_size": 64, - - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "tags"] - }, - { - "name": "ner_token_f1", - "inputs": ["y", "tags"] - } - ], - "validation_patience": 7, - "val_every_n_epochs": 1, - - "log_every_n_epochs": 1, - "show_examples": false, - "tensorboard_log_dir": "{NER_PATH}/logs", - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - 
"ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "NER_PATH": "{MODELS_PATH}/ner_rus" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ner_rus_v3_cpu_compatible.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/lenta_lower_100.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} diff --git a/deeppavlov/configs/ner/ner_rus_bert.json b/deeppavlov/configs/ner/ner_rus_bert.json index 9a00116886..799f9f3fda 100644 --- a/deeppavlov/configs/ner/ner_rus_bert.json +++ b/deeppavlov/configs/ner/ner_rus_bert.json @@ -9,103 +9,147 @@ "class_name": "data_learning_iterator" }, "chainer": { - "in": ["x"], - "in_y": ["y"], + "in": [ + "x" + ], + "in_y": [ + "y" + ], "pipe": [ { - "class_name": "bert_ner_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", + "class_name": "torch_transformers_ner_preprocessor", + "vocab_file": "{TRANSFORMER}", "do_lower_case": false, "max_seq_length": 512, "max_subword_length": 15, "token_masking_prob": 0.0, - "in": ["x"], - "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", "attention_mask"] + "in": [ + "x" + ], + "out": [ + "x_tokens", + "x_subword_tokens", + "x_subword_tok_ids", + "startofword_markers", + "attention_mask", + "tokens_offsets" + ] }, { "id": "tag_vocab", "class_name": "simple_vocab", - "unk_token": ["O"], + "unk_token": [ + "O" + ], "pad_with_zeros": true, - "save_path": "{NER_PATH}/tag.dict", - "load_path": "{NER_PATH}/tag.dict", - "fit_on": ["y"], - "in": ["y"], - "out": ["y_ind"] + "save_path": "{MODEL_PATH}/tag.dict", + "load_path": "{MODEL_PATH}/tag.dict", + "fit_on": [ + "y" + ], + "in": [ + "y" + ], + "out": [ + "y_ind" + ] }, { - "class_name": "bert_sequence_tagger", + "class_name": "torch_transformers_sequence_tagger", "n_tags": "#tag_vocab.len", - "keep_prob": 0.1, - "bert_config_file": "{BERT_PATH}/bert_config.json", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", + "pretrained_bert": "{TRANSFORMER}", "attention_probs_keep_prob": 0.5, - "use_crf": true, - "ema_decay": 0.9, - "return_probas": false, - "encoder_layer_ids": [-1], - "optimizer": "tf.train:AdamOptimizer", - "learning_rate": 1e-3, - "bert_learning_rate": 2e-5, - "min_learning_rate": 1e-7, + "encoder_layer_ids": [ + -1 + ], + "optimizer": "AdamW", + "optimizer_parameters": { + "lr": 2e-05, + "weight_decay": 1e-06, + "betas": [ + 0.9, + 0.999 + ], + "eps": 1e-06 + }, + "clip_norm": 1.0, + "min_learning_rate": 1e-07, "learning_rate_drop_patience": 30, "learning_rate_drop_div": 1.5, "load_before_drop": true, - "clip_norm": null, - "save_path": "{NER_PATH}/model", - "load_path": "{NER_PATH}/model", - "in": ["x_subword_tok_ids", "attention_mask", "startofword_markers"], - "in_y": ["y_ind"], - "out": ["y_pred_ind"] + "save_path": "{MODEL_PATH}/model", + "load_path": "{MODEL_PATH}/model", + "in": [ + "x_subword_tok_ids", + "attention_mask", + "startofword_markers" + ], + "in_y": [ + "y_ind" + ], + "out": [ + "y_pred_ind", + "probas" + ] }, { "ref": "tag_vocab", - "in": ["y_pred_ind"], - "out": ["y_pred"] + "in": [ + "y_pred_ind" + ], + "out": [ + "y_pred" + ] } ], - "out": ["x_tokens", "y_pred"] + "out": [ + "x_tokens", + "y_pred" + ] }, "train": { "epochs": 30, - "batch_size": 16, + "batch_size": 10, "metrics": [ { "name": "ner_f1", - "inputs": ["y", "y_pred"] + "inputs": [ + "y", + "y_pred" + ] }, { "name": "ner_token_f1", - "inputs": ["y", "y_pred"] + "inputs": [ + "y", + "y_pred" + ] } ], 
"validation_patience": 100, "val_every_n_batches": 20, - "log_every_n_batches": 20, - "tensorboard_log_dir": "{NER_PATH}/logs", "show_examples": false, "pytest_max_batches": 2, "pytest_batch_size": 8, - "evaluation_targets": ["valid", "test"], - "class_name": "nn_trainer" + "evaluation_targets": [ + "valid", + "test" + ], + "class_name": "torch_trainer" }, "metadata": { "variables": { "ROOT_PATH": "~/.deeppavlov", "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models/rubert_cased_L-12_H-768_A-12_v1", - "NER_PATH": "{MODELS_PATH}/ner_rus_bert" + "TRANSFORMER": "DeepPavlov/rubert-base-cased", + "MODEL_PATH": "{MODELS_PATH}/ner_rus_bert_torch" }, "download": [ { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ner_rus_bert_v1.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/rubert_cased_L-12_H-768_A-12_v1.tar.gz", - "subdir": "{DOWNLOADS_PATH}/bert_models" + "url": "http://files.deeppavlov.ai/v1/ner/ner_rus_bert_torch_new.tar.gz", + "subdir": "{MODEL_PATH}" } ] } diff --git a/deeppavlov/configs/ner/ner_rus_bert_probas.json b/deeppavlov/configs/ner/ner_rus_bert_probas.json index 8e0189dee1..ae382ced1e 100644 --- a/deeppavlov/configs/ner/ner_rus_bert_probas.json +++ b/deeppavlov/configs/ner/ner_rus_bert_probas.json @@ -1,9 +1,7 @@ { "dataset_reader": { - "class_name": "conll2003_reader", - "data_path": "{DOWNLOADS_PATH}/news_ner/", - "dataset_name": "collection_rus", - "provide_pos": false + "class_name": "sq_reader", + "data_path": "{DOWNLOADS_PATH}/wiki_ner_rus/wikipedia_dataset.pickle" }, "dataset_iterator": { "class_name": "data_learning_iterator" @@ -13,94 +11,108 @@ "in_y": ["y"], "pipe": [ { - "class_name": "bert_ner_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", + "class_name": "torch_transformers_ner_preprocessor", + "vocab_file": "{TRANSFORMER}", "do_lower_case": false, "max_seq_length": 512, "max_subword_length": 15, "token_masking_prob": 0.0, "in": ["x"], - "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", "attention_mask"] + "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", "attention_mask", "tokens_offsets"] }, { "id": "tag_vocab", "class_name": "simple_vocab", "unk_token": ["O"], "pad_with_zeros": true, - "save_path": "{NER_PATH}/tag.dict", - "load_path": "{NER_PATH}/tag.dict", + "save_path": "{MODEL_PATH}/tag.dict", + "load_path": "{MODEL_PATH}/tag.dict", "fit_on": ["y"], "in": ["y"], "out": ["y_ind"] }, { - "class_name": "bert_sequence_tagger", + "class_name": "torch_transformers_sequence_tagger", "n_tags": "#tag_vocab.len", - "keep_prob": 0.1, - "bert_config_file": "{BERT_PATH}/bert_config.json", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", + "pretrained_bert": "{TRANSFORMER}", "attention_probs_keep_prob": 0.5, - "use_crf": true, - "ema_decay": 0.9, - "return_probas": true, "encoder_layer_ids": [-1], - "optimizer": "tf.train:AdamOptimizer", - "learning_rate": 1e-3, - "bert_learning_rate": 2e-5, - "min_learning_rate": 1e-7, + "optimizer": "AdamW", + "optimizer_parameters": { + "lr": 2e-05, + "weight_decay": 1e-06, + "betas": [ + 0.9, + 0.999 + ], + "eps": 1e-06 + }, + "clip_norm": 1.0, + "min_learning_rate": 1e-07, "learning_rate_drop_patience": 30, "learning_rate_drop_div": 1.5, "load_before_drop": true, - "clip_norm": null, - "save_path": "{NER_PATH}/model", - "load_path": "{NER_PATH}/model", + "save_path": "{MODEL_PATH}/model", + "load_path": "{MODEL_PATH}/model", 
"in": ["x_subword_tok_ids", "attention_mask", "startofword_markers"], "in_y": ["y_ind"], + "out": ["y_pred_ind", "probas"] + }, + { + "ref": "tag_vocab", + "in": ["y_pred_ind"], "out": ["y_pred"] } ], - "out": ["x_tokens", "y_pred"] + "out": ["x_tokens", "tokens_offsets", "y_pred", "probas"] }, "train": { "epochs": 30, - "batch_size": 16, + "batch_size": 10, "metrics": [ { "name": "ner_f1", - "inputs": ["y", "y_pred"] + "inputs": [ + "y", + "y_pred" + ] }, { "name": "ner_token_f1", - "inputs": ["y", "y_pred"] + "inputs": [ + "y", + "y_pred" + ] } ], "validation_patience": 100, "val_every_n_batches": 20, - "log_every_n_batches": 20, - "tensorboard_log_dir": "{NER_PATH}/logs", "show_examples": false, "pytest_max_batches": 2, "pytest_batch_size": 8, - "evaluation_targets": ["valid", "test"], - "class_name": "nn_trainer" + "evaluation_targets": [ + "valid", + "test" + ], + "class_name": "torch_trainer" }, "metadata": { "variables": { "ROOT_PATH": "~/.deeppavlov", "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models/rubert_cased_L-12_H-768_A-12_v1", - "NER_PATH": "{MODELS_PATH}/ner_rus_bert" + "TRANSFORMER": "DeepPavlov/rubert-base-cased", + "MODEL_PATH": "{MODELS_PATH}/wiki_ner_rus_bert" }, "download": [ { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ner_rus_bert_v1.tar.gz", - "subdir": "{MODELS_PATH}" + "url": "http://files.deeppavlov.ai/deeppavlov_data/rus_dream_entity_detection/wiki_ner_rus_bert.tar.gz", + "subdir": "{MODELS_PATH}/wiki_ner_rus_bert" }, { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/rubert_cased_L-12_H-768_A-12_v1.tar.gz", - "subdir": "{DOWNLOADS_PATH}/bert_models" + "url": "http://files.deeppavlov.ai/datasets/wiki_ner_rus/wiki_ner_rus_dataset.tar.gz", + "subdir": "{DOWNLOADS_PATH}/wiki_ner_rus" } ] } diff --git a/deeppavlov/configs/ner/ner_rus_convers_distilrubert_2L.json b/deeppavlov/configs/ner/ner_rus_convers_distilrubert_2L.json index 15c931c1eb..b125f9ffaf 100644 --- a/deeppavlov/configs/ner/ner_rus_convers_distilrubert_2L.json +++ b/deeppavlov/configs/ner/ner_rus_convers_distilrubert_2L.json @@ -31,7 +31,8 @@ "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", - "attention_mask" + "attention_mask", + "tokens_offsets" ] }, { @@ -58,8 +59,7 @@ "n_tags": "#tag_vocab.len", "pretrained_bert": "{TRANSFORMER}", "attention_probs_keep_prob": 0.11, - "hidden_keep_prob": 0.11, - "return_probas": false, + "hidden_keep_prob": 0.11, "encoder_layer_ids": [ -1 ], @@ -89,7 +89,8 @@ "y_ind" ], "out": [ - "y_pred_ind" + "y_pred_ind", + "probas" ] }, { diff --git a/deeppavlov/configs/ner/ner_rus_convers_distilrubert_6L.json b/deeppavlov/configs/ner/ner_rus_convers_distilrubert_6L.json index b2534426a6..d414d732d3 100644 --- a/deeppavlov/configs/ner/ner_rus_convers_distilrubert_6L.json +++ b/deeppavlov/configs/ner/ner_rus_convers_distilrubert_6L.json @@ -31,7 +31,8 @@ "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", - "attention_mask" + "attention_mask", + "tokens_offsets" ] }, { @@ -58,8 +59,7 @@ "n_tags": "#tag_vocab.len", "pretrained_bert": "{TRANSFORMER}", "attention_probs_keep_prob": 0.44, - "hidden_keep_prob": 0.89, - "return_probas": false, + "hidden_keep_prob": 0.89, "encoder_layer_ids": [ -1 ], @@ -89,7 +89,8 @@ "y_ind" ], "out": [ - "y_pred_ind" + "y_pred_ind", + "probas" ] }, { diff --git a/deeppavlov/configs/ner/slotfill_dstc2.json b/deeppavlov/configs/ner/slotfill_dstc2.json deleted file mode 100644 index e1df2f26fd..0000000000 --- 
a/deeppavlov/configs/ner/slotfill_dstc2.json +++ /dev/null @@ -1,64 +0,0 @@ -{ - "dataset_reader": { - "class_name": "dstc2_reader", - "data_path": "{DATA_PATH}" - }, - "dataset_iterator": { - "class_name": "dstc2_ner_iterator", - "slot_values_path": "{SLOT_VALS_PATH}" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "in": ["x"], - "class_name": "lazy_tokenizer", - "out": ["x_tokens"] - }, - { - "in": ["x_tokens"], - "config_path": "{NER_CONFIG_PATH}", - "out": ["x_tokens", "tags"] - }, - - { - "in": ["x_tokens", "tags"], - "class_name": "dstc_slotfilling", - "threshold": 0.8, - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "out": ["slots"] - } - ], - "out": ["slots"] - }, - "train": { - "metrics": ["slots_accuracy"], - "class_name": "fit_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "NER_CONFIG_PATH": "{DEEPPAVLOV_PATH}/configs/ner/ner_dstc2.json", - "DATA_PATH": "{ROOT_PATH}/downloads/dstc2", - "SLOT_VALS_PATH": "{DATA_PATH}/dstc_slot_vals.json", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/slotfill_dstc2" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/dstc_slot_vals.tar.gz", - "subdir": "{DATA_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/slotfill_dstc2.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} diff --git a/deeppavlov/configs/ner/slotfill_dstc2_raw.json b/deeppavlov/configs/ner/slotfill_dstc2_raw.json deleted file mode 100644 index 9138d99c01..0000000000 --- a/deeppavlov/configs/ner/slotfill_dstc2_raw.json +++ /dev/null @@ -1,54 +0,0 @@ -{ - "dataset_reader": { - "class_name": "dstc2_reader", - "data_path": "{DATA_PATH}" - }, - "dataset_iterator": { - "class_name": "dstc2_ner_iterator", - "slot_values_path": "{SLOT_VALS_PATH}" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "in": ["x"], - "class_name": "lazy_tokenizer", - "out": ["x_tokens"] - }, - { - "in": ["x_tokens"], - "class_name": "str_lower", - "out": ["x_lower"] - }, - { - "in": ["x_lower"], - "class_name": "slotfill_raw", - "save_path": "{SLOT_VALS_PATH}", - "load_path": "{SLOT_VALS_PATH}", - "out": ["slots"] - } - ], - "out": ["slots"] - }, - "train": { - "metrics": ["slots_accuracy"], - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DATA_PATH": "{ROOT_PATH}/downloads/dstc2", - "SLOT_VALS_PATH": "{DATA_PATH}/dstc_slot_vals.json" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/dstc_slot_vals.tar.gz", - "subdir": "{DATA_PATH}" - } - ] - } -} diff --git a/deeppavlov/configs/ner/slotfill_simple_dstc2_raw.json b/deeppavlov/configs/ner/slotfill_simple_dstc2_raw.json deleted file mode 100644 index d6f9750e34..0000000000 --- a/deeppavlov/configs/ner/slotfill_simple_dstc2_raw.json +++ /dev/null @@ -1,54 +0,0 @@ -{ - "dataset_reader": { - "class_name": "simple_dstc2_reader", - "data_path": "{DATA_PATH}" - }, - "dataset_iterator": { - "class_name": "dstc2_ner_iterator", - "slot_values_path": "{SLOT_VALS_PATH}" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "in": ["x"], - "class_name": "lazy_tokenizer", - "out": ["x_tokens"] - }, - { - "in": ["x_tokens"], - "class_name": "str_lower", - "out": ["x_lower"] - }, - { - "in": ["x_lower"], - "class_name": "slotfill_raw", - "save_path": "{SLOT_VALS_PATH}", - "load_path": "{SLOT_VALS_PATH}", - "out": ["slots"] - } - ], - "out": 
["slots"] - }, - "train": { - "metrics": ["slots_accuracy"], - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DATA_PATH": "{ROOT_PATH}/downloads/simple-dstc2", - "SLOT_VALS_PATH": "{DATA_PATH}/dstc_slot_vals.json" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/dstc_slot_vals.tar.gz", - "subdir": "{DATA_PATH}" - } - ] - } -} diff --git a/deeppavlov/configs/ner/slotfill_simple_rasa_raw.json b/deeppavlov/configs/ner/slotfill_simple_rasa_raw.json deleted file mode 100644 index 1365ebe7f4..0000000000 --- a/deeppavlov/configs/ner/slotfill_simple_rasa_raw.json +++ /dev/null @@ -1,43 +0,0 @@ -{ - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "in": ["x"], - "class_name": "lazy_tokenizer", - "out": ["x_tokens"] - }, - { - "in": ["x_tokens"], - "class_name": "str_lower", - "out": ["x_lower"] - }, - { - "in": ["x_lower"], - "class_name": "slotfill_raw_rasa", - "save_path": "{DATA_PATH}", - "load_path": "{DATA_PATH}", - "out": ["slots"] - } - ], - "out": ["slots"] - }, - "train": { - "metrics": [], - "evaluation_targets": [] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "DATA_PATH": "{DOWNLOADS_PATH}/rasa_configs_reader" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/dp_minimal_rasa_demo.tar.gz", - "subdir": "{DATA_PATH}" - } - ] - } -} diff --git a/deeppavlov/configs/ner/vlsp2016_full.json b/deeppavlov/configs/ner/vlsp2016_full.json deleted file mode 100644 index ec8d10ffbe..0000000000 --- a/deeppavlov/configs/ner/vlsp2016_full.json +++ /dev/null @@ -1,170 +0,0 @@ -{ - "dataset_reader": { - "class_name": "conll2003_reader", - "data_path": "{DOWNLOADS_PATH}/vlsp2016/", - "dataset_name": "vlsp2016", - "provide_pos": true, - "provide_chunk": true, - "iobes": true - }, - "dataset_iterator": { - "class_name": "data_learning_iterator" - }, - "chainer": { - "in": ["x", "pos", "chunk"], - "in_y": ["y"], - "pipe": [ - { - "in": ["x"], - "out": ["x_tokens"], - "class_name": "lazy_tokenizer" - }, - { - "in": ["pos"], - "out": ["pos_tokens"], - "class_name": "lazy_tokenizer" - }, - { - "in": ["chunk"], - "out": ["chunk_tokens"], - "class_name": "lazy_tokenizer" - }, - { - "in": ["x_tokens"], - "out": ["x_lower", "sent_lengths"], - "class_name": "ner_preprocessor", - "id": "ner_preprocessor", - "get_x_padded_for_elmo": false, - "get_x_cap_padded": false - }, - { - "in": ["x_lower"], - "out": ["x_tok_ind"], - "fit_on": ["x_lower"], - "class_name": "ner_vocab", - "id": "word_vocab", - "save_path": "{MODELS_PATH}/word.dict", - "load_path": "{MODELS_PATH}/word.dict" - }, - { - "in": ["pos_tokens"], - "out": ["pos_ind"], - "fit_on": ["pos_tokens"], - "class_name": "ner_vocab", - "id": "pos_vocab", - "save_path": "{MODELS_PATH}/pos.dict", - "load_path": "{MODELS_PATH}/pos.dict" - }, - { - "in": ["chunk_tokens"], - "out": ["chunk_ind"], - "fit_on": ["chunk_tokens"], - "class_name": "ner_vocab", - "id": "chunk_vocab", - "save_path": "{MODELS_PATH}/chunk.dict", - "load_path": "{MODELS_PATH}/chunk.dict" - }, - { - "in": ["y"], - "out": ["y_ind"], - "fit_on": ["y"], - "class_name": "ner_vocab", - "id": "tag_vocab", - "save_path": "{MODELS_PATH}/tag.dict", - "load_path": "{MODELS_PATH}/tag.dict" - }, - { - "in": ["x_tokens"], - "out": ["x_char"], - "class_name": "char_splitter" - }, - { - "in": ["x_tokens"], - "out": ["x_char_ind"], - "fit_on": ["x_tokens"], - "class_name": "ner_vocab", - "char_level": 
true, - "id": "char_vocab", - "save_path": "{MODELS_PATH}/char.dict", - "load_path": "{MODELS_PATH}/char.dict" - }, - { - "in":[ - "sent_lengths", - "x_tok_ind", - "pos_ind", - "chunk_ind", - "x_char_ind" - ], - "in_y": ["y_ind"], - "out": ["y_predicted"], - "class_name": "hybrid_ner_model", - "n_tags": "#tag_vocab.len", - "word_emb_path": "{DOWNLOADS_PATH}/embeddings/baomoi.bin", - "word_emb_name": "baomoi", - "word_dim": 300, - "word_vocab": "#word_vocab", - "char_vocab_size": "#char_vocab.len", - "pos_vocab_size": "#pos_vocab.len", - "chunk_vocab_size": "#chunk_vocab.len", - "pos_dim": 40, - "chunk_dim": 40, - "char_dim": 100, - "lstm_hidden_size": 256, - "save_path": "{MODELS_PATH}/vlsp2016_full", - "load_path": "{MODELS_PATH}/vlsp2016_full", - "learning_rate": 1e-3, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 10, - "dropout_keep_prob": 0.7 - }, - { - "in": ["y_predicted"], - "out": ["tags"], - "class_name": "convert_ids2tags", - "id2tag": "#tag_vocab.i2t" - } - ], - "out": ["x_tokens", "tags"] - }, - "train": { - "epochs": 100, - "batch_size": 64, - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y", "tags"] - }, - { - "name": "ner_token_f1", - "inputs": ["y", "tags"] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models/vlsp2016_full" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ner_vlsp2016_full.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/baomoi.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} diff --git a/deeppavlov/configs/odqa/en_odqa_infer_enwiki20161221.json b/deeppavlov/configs/odqa/en_odqa_infer_enwiki20161221.json deleted file mode 100644 index 7b011f13d4..0000000000 --- a/deeppavlov/configs/odqa/en_odqa_infer_enwiki20161221.json +++ /dev/null @@ -1,69 +0,0 @@ -{ - "chainer": { - "in": [ - "question_raw" - ], - "out": [ - "best_answer" - ], - "pipe": [ - { - "config_path": "{CONFIGS_PATH}/doc_retrieval/en_ranker_tfidf_enwiki20161221.json", - "in": [ - "question_raw" - ], - "out": [ - "tfidf_doc_ids" - ] - }, - { - "class_name": "wiki_sqlite_vocab", - "in": [ - "tfidf_doc_ids" - ], - "out": [ - "tfidf_doc_text" - ], - "join_docs": false, - "shuffle": false, - "load_path": "{DOWNLOADS_PATH}/odqa/enwiki20161221.db" - }, - { - "class_name": "document_chunker", - "in": ["tfidf_doc_text"], - "out": ["chunks"], - "flatten_result": true, - "paragraphs": true - }, - { - "class_name": "string_multiplier", - "in": ["question_raw", "chunks"], - "out":["questions"] - }, - { - "class_name": "logit_ranker", - "batch_size": 10, - "squad_model": {"config_path": "{CONFIGS_PATH}/squad/multi_squad_noans_infer.json"}, - "sort_noans": true, - "in": [ - "chunks", - "questions" - ], - "out": [ - "best_answer", - "best_answer_score" - ] - } - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" - }, - "download": [ - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/odqa/en_odqa_infer_wiki.json b/deeppavlov/configs/odqa/en_odqa_infer_wiki.json index dcaee5cf19..93c801c6b8 100644 --- 
a/deeppavlov/configs/odqa/en_odqa_infer_wiki.json +++ b/deeppavlov/configs/odqa/en_odqa_infer_wiki.json @@ -31,7 +31,7 @@ { "class_name": "logit_ranker", "batch_size": 64, - "squad_model": {"config_path": "{CONFIGS_PATH}/squad/multi_squad_noans_infer.json"}, + "squad_model": {"config_path": "{CONFIGS_PATH}/squad/qa_squad2_bert.json"}, "sort_noans": true, "in": ["chunks", "questions"], "out": ["answer", "answer_score", "answer_place"] diff --git a/deeppavlov/configs/odqa/en_odqa_pop_infer_enwiki20180211.json b/deeppavlov/configs/odqa/en_odqa_pop_infer_enwiki20180211.json index 82e5730644..6cf9bff88a 100644 --- a/deeppavlov/configs/odqa/en_odqa_pop_infer_enwiki20180211.json +++ b/deeppavlov/configs/odqa/en_odqa_pop_infer_enwiki20180211.json @@ -49,7 +49,7 @@ "class_name": "logit_ranker", "batch_size": 10, "squad_model": { - "config_path": "{CONFIGS_PATH}/squad/multi_squad_noans_infer.json" + "config_path": "{CONFIGS_PATH}/squad/qa_squad2_bert.json" }, "sort_noans": true, "in": [ @@ -78,4 +78,4 @@ "download": [ ] } -} \ No newline at end of file +} diff --git a/deeppavlov/configs/odqa/ru_odqa_infer_wiki.json b/deeppavlov/configs/odqa/ru_odqa_infer_wiki.json index c84e02125c..464415af66 100644 --- a/deeppavlov/configs/odqa/ru_odqa_infer_wiki.json +++ b/deeppavlov/configs/odqa/ru_odqa_infer_wiki.json @@ -43,7 +43,7 @@ { "class_name": "logit_ranker", "batch_size": 10, - "squad_model": {"config_path": "{CONFIGS_PATH}/squad/squad_ru.json"}, + "squad_model": {"config_path": "{CONFIGS_PATH}/squad/squad_ru_bert.json"}, "sort_noans": true, "in": [ "chunks", diff --git a/deeppavlov/configs/odqa/ru_odqa_infer_wiki_retr_noans.json b/deeppavlov/configs/odqa/ru_odqa_infer_wiki_retr_noans.json index b9c12d682c..293e73e204 100644 --- a/deeppavlov/configs/odqa/ru_odqa_infer_wiki_retr_noans.json +++ b/deeppavlov/configs/odqa/ru_odqa_infer_wiki_retr_noans.json @@ -43,7 +43,7 @@ { "class_name": "logit_ranker", "batch_size": 10, - "squad_model": {"config_path": "{CONFIGS_PATH}/squad/multi_squad_ru_retr_noans.json"}, + "squad_model": {"config_path": "{CONFIGS_PATH}/squad/qa_multisberquad_bert.json"}, "sort_noans": true, "in": [ "chunks", diff --git a/deeppavlov/configs/odqa/ru_odqa_infer_wiki_rubert.json b/deeppavlov/configs/odqa/ru_odqa_infer_wiki_rubert.json deleted file mode 100644 index 934ff0f9a6..0000000000 --- a/deeppavlov/configs/odqa/ru_odqa_infer_wiki_rubert.json +++ /dev/null @@ -1,70 +0,0 @@ -{ - "chainer": { - "in": [ - "question_raw" - ], - "out": [ - "best_answer" - ], - "pipe": [ - { - "config_path": "{CONFIGS_PATH}/doc_retrieval/ru_ranker_tfidf_wiki.json", - "in": [ - "question_raw" - ], - "out": [ - "tfidf_doc_ids" - ] - }, - { - "class_name": "wiki_sqlite_vocab", - "in": [ - "tfidf_doc_ids" - ], - "out": [ - "tfidf_doc_text" - ], - "join_docs": false, - "shuffle": false, - "load_path": "{DOWNLOADS_PATH}/odqa/ruwiki.db" - }, - { - "class_name": "document_chunker", - "in": ["tfidf_doc_text"], - "out": ["chunks"], - "flatten_result": true, - "paragraphs": true, - "tokens_limit": 100000 - }, - { - "class_name": "string_multiplier", - "in": ["question_raw", "chunks"], - "out":["questions"] - }, - { - "class_name": "logit_ranker", - "batch_size": 10000, - "squad_model": {"config_path": "{CONFIGS_PATH}/squad/squad_ru_rubert_infer.json"}, - "sort_noans": true, - "in": [ - "chunks", - "questions" - ], - "out": [ - "best_answer", - "best_answer_score" - ] - } - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": 
"{ROOT_PATH}/models", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" - }, - "download": [ - ] - } -} diff --git a/deeppavlov/configs/odqa/ru_odqa_infer_wiki_rubert_noans.json b/deeppavlov/configs/odqa/ru_odqa_infer_wiki_rubert_noans.json deleted file mode 100644 index 756e28cb97..0000000000 --- a/deeppavlov/configs/odqa/ru_odqa_infer_wiki_rubert_noans.json +++ /dev/null @@ -1,70 +0,0 @@ -{ - "chainer": { - "in": [ - "question_raw" - ], - "out": [ - "best_answer", "best_answer_score" - ], - "pipe": [ - { - "config_path": "{CONFIGS_PATH}/doc_retrieval/ru_ranker_tfidf_wiki.json", - "in": [ - "question_raw" - ], - "out": [ - "tfidf_doc_ids" - ] - }, - { - "class_name": "wiki_sqlite_vocab", - "in": [ - "tfidf_doc_ids" - ], - "out": [ - "tfidf_doc_text" - ], - "join_docs": false, - "shuffle": false, - "load_path": "{DOWNLOADS_PATH}/odqa/ruwiki.db" - }, - { - "class_name": "document_chunker", - "in": ["tfidf_doc_text"], - "out": ["chunks"], - "flatten_result": true, - "paragraphs": true, - "tokens_limit": 100000 - }, - { - "class_name": "string_multiplier", - "in": ["question_raw", "chunks"], - "out":["questions"] - }, - { - "class_name": "logit_ranker", - "batch_size": 10000, - "squad_model": {"config_path": "{CONFIGS_PATH}/squad/multi_squad_ru_retr_noans_rubert_infer.json"}, - "sort_noans": true, - "in": [ - "chunks", - "questions" - ], - "out": [ - "best_answer", - "best_answer_score" - ] - } - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" - }, - "download": [ - ] - } -} diff --git a/deeppavlov/configs/paramsearch/tfidf_logreg_autofaq_psearch.json b/deeppavlov/configs/paramsearch/tfidf_logreg_autofaq_psearch.json index f793916514..bf65d82229 100644 --- a/deeppavlov/configs/paramsearch/tfidf_logreg_autofaq_psearch.json +++ b/deeppavlov/configs/paramsearch/tfidf_logreg_autofaq_psearch.json @@ -53,8 +53,8 @@ ], "class_name": "sklearn_component", "main": true, - "save_path": "{MODELS_PATH}/faq/tfidf_logreg_classifier_v3.pkl", - "load_path": "{MODELS_PATH}/faq/tfidf_logreg_classifier_v3.pkl", + "save_path": "{MODELS_PATH}/faq/tfidf_logreg_classifier_v4.pkl", + "load_path": "{MODELS_PATH}/faq/tfidf_logreg_classifier_v4.pkl", "model_class": "sklearn.linear_model:LogisticRegression", "infer_method": "predict", "C": { @@ -94,7 +94,7 @@ }, "download": [ { - "url": "http://files.deeppavlov.ai/faq/school/tfidf_logreg_classifier_v3.pkl", + "url": "http://files.deeppavlov.ai/faq/school/tfidf_logreg_classifier_v4.pkl", "subdir": "{MODELS_PATH}/faq" }, { diff --git a/deeppavlov/configs/ranking/paraphrase_ident_paraphraser.json b/deeppavlov/configs/ranking/paraphrase_ident_paraphraser.json deleted file mode 100644 index d9cb4c5ec6..0000000000 --- a/deeppavlov/configs/ranking/paraphrase_ident_paraphraser.json +++ /dev/null @@ -1,108 +0,0 @@ -{ - "dataset_reader": { - "class_name": "paraphraser_reader", - "data_path": "{DOWNLOADS_PATH}/paraphraser_data" - }, - "dataset_iterator": { - "class_name": "siamese_iterator", - "seed": 243 - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "id": "preproc", - "class_name": "siamese_preprocessor", - "use_matrix": false, - "max_sequence_length": 28, - "fit_on": ["x"], - "in": ["x"], - "out": ["x_proc"], - "sent_vocab": { - "id": "siam_sent_vocab", - "class_name": "simple_vocab", - "save_path": "{MODELS_PATH}/paraphraser_vocabs/sent.dict", - "load_path": "{MODELS_PATH}/paraphraser_vocabs/sent.dict" - }, 
- "tokenizer": { - "class_name": "nltk_tokenizer" - }, - "vocab": { - "id": "siam_vocab", - "class_name": "simple_vocab", - "save_path": "{MODELS_PATH}/paraphraser_vocabs/tok.dict", - "load_path": "{MODELS_PATH}/paraphraser_vocabs/tok.dict" - }, - "embedder": { - "id": "siam_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/ft_native_300_ru_wiki_lenta_lower_case.bin" - } - }, - { - "id": "embeddings", - "class_name": "emb_mat_assembler", - "embedder": "#siam_embedder", - "vocab": "#siam_vocab" - }, - { - "in": ["x_proc"], - "in_y": ["y"], - "out": ["y_predicted"], - "class_name": "mpm_nn", - "len_vocab": "#siam_vocab.len", - "use_matrix": "#preproc.use_matrix", - "attention": true, - "max_sequence_length": "#preproc.max_sequence_length", - "emb_matrix": "#embeddings.emb_mat", - "embedding_dim": "#siam_embedder.dim", - "seed": 243, - "hidden_dim": 200, - "learning_rate": 1e-3, - "triplet_loss": false, - "batch_size": 256, - "save_path": "{MODELS_PATH}/paraphraser_model/model_weights.h5", - "load_path": "{MODELS_PATH}/paraphraser_model/model_weights.h5", - "preprocess": "#preproc.__call__" - } - ], - "out": ["y_predicted"] - }, - "train": { - "epochs": 200, - "batch_size": 256, - "pytest_max_batches": 2, - "train_metrics": ["f1", "acc", "log_loss"], - "metrics": ["f1", "acc", "log_loss"], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_batches": 24, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/paraphraser.zip", - "subdir": "{DOWNLOADS_PATH}/paraphraser_data" - }, - { - "url": "http://files.deeppavlov.ai/datasets/paraphraser_gold.zip", - "subdir": "{DOWNLOADS_PATH}/paraphraser_data" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/ft_native_300_ru_wiki_lenta_lower_case/ft_native_300_ru_wiki_lenta_lower_case.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/ranking/paraphrase_ident_paraphraser_interact.json b/deeppavlov/configs/ranking/paraphrase_ident_paraphraser_interact.json deleted file mode 100644 index 4c6e3fa28b..0000000000 --- a/deeppavlov/configs/ranking/paraphrase_ident_paraphraser_interact.json +++ /dev/null @@ -1,121 +0,0 @@ -{ - "dataset_reader": { - "class_name": "paraphraser_reader", - "data_path": "{DOWNLOADS_PATH}/paraphraser_data" - }, - "dataset_iterator": { - "class_name": "siamese_iterator", - "seed": 243 - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "id": "preproc", - "class_name": "siamese_preprocessor", - "use_matrix": false, - "max_sequence_length": 28, - "fit_on": ["x"], - "in": ["x"], - "out": ["x_proc"], - "sent_vocab": { - "id": "siam_sent_vocab", - "class_name": "simple_vocab", - "save_path": "{MODELS_PATH}/paraphraser_vocabs/sent.dict", - "load_path": "{MODELS_PATH}/paraphraser_vocabs/sent.dict" - }, - "tokenizer": { - "class_name": "nltk_tokenizer" - }, - "vocab": { - "id": "siam_vocab", - "class_name": "simple_vocab", - "save_path": "{MODELS_PATH}/paraphraser_vocabs/tok.dict", - "load_path": "{MODELS_PATH}/paraphraser_vocabs/tok.dict" - }, - "embedder": { - "id": "siam_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/ft_native_300_ru_wiki_lenta_lower_case.bin" - } - }, - { - "id": "embeddings", - "class_name": 
"emb_mat_assembler", - "embedder": "#siam_embedder", - "vocab": "#siam_vocab" - }, - { - "id": "model", - "class_name": "mpm_nn", - "len_vocab": "#siam_vocab.len", - "use_matrix": "#preproc.use_matrix", - "attention": true, - "max_sequence_length": "#preproc.max_sequence_length", - "emb_matrix": "#embeddings.emb_mat", - "embedding_dim": "#siam_embedder.dim", - "seed": 243, - "hidden_dim": 200, - "learning_rate": 1e-3, - "triplet_loss": false, - "batch_size": 256, - "save_path": "{MODELS_PATH}/paraphraser_model/model_weights.h5", - "load_path": "{MODELS_PATH}/paraphraser_model/model_weights.h5", - "preprocess": "#preproc.__call__" - }, - { - "in": ["x_proc"], - "in_y": ["y"], - "out": ["y_predicted"], - "class_name": "siamese_predictor", - "model": "#model", - "ranking": false, - "attention": true, - "batch_size": "#model.batch_size", - "preproc_func": "#preproc.__call__" - } - ], - "out": ["y_predicted"] - }, - "train": { - "epochs": 200, - "batch_size": 256, - "pytest_max_batches": 2, - "train_metrics": ["f1", "acc", "log_loss"], - "metrics": ["f1", "acc", "log_loss"], - "validation_patience": 10, - "val_every_n_epochs": 5, - "log_every_n_batches": 12, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/paraphrase_ident_paraphraser.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/datasets/paraphraser.zip", - "subdir": "{DOWNLOADS_PATH}/paraphraser_data" - }, - { - "url": "http://files.deeppavlov.ai/datasets/paraphraser_gold.zip", - "subdir": "{DOWNLOADS_PATH}/paraphraser_data" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/ft_native_300_ru_wiki_lenta_lower_case/ft_native_300_ru_wiki_lenta_lower_case.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/ranking/ranking_default.json b/deeppavlov/configs/ranking/ranking_default.json deleted file mode 100644 index 8d3ac4f15f..0000000000 --- a/deeppavlov/configs/ranking/ranking_default.json +++ /dev/null @@ -1,106 +0,0 @@ -{ - "dataset_reader": { - "class_name": "siamese_reader", - "data_path": "{DOWNLOADS_PATH}/default_ranking_data" - }, - "dataset_iterator": { - "class_name": "siamese_iterator" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "id": "preproc", - "class_name": "siamese_preprocessor", - "use_matrix": false, - "num_ranking_samples": 10, - "max_sequence_length": 50, - "fit_on": ["x"], - "in": ["x"], - "out": ["x_proc"], - "sent_vocab": { - "id": "siam_sent_vocab", - "class_name": "simple_vocab", - "save_path": "{MODELS_PATH}/default_ranking_vocabs/sent.dict", - "load_path": "{MODELS_PATH}/default_ranking_vocabs/sent.dict" - }, - "tokenizer": { - "class_name": "split_tokenizer" - }, - "vocab": { - "id": "siam_vocab", - "class_name": "simple_vocab", - "save_path": "{MODELS_PATH}/default_ranking_vocabs/tok.dict", - "load_path": "{MODELS_PATH}/default_ranking_vocabs/tok.dict" - }, - "embedder": { - "id": "siam_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/wiki.ru.bin" - } - }, - { - "id": "embeddings", - "class_name": "emb_mat_assembler", - "embedder": "#siam_embedder", - "vocab": "#siam_vocab" - }, - { - "in": ["x_proc"], - "in_y": ["y"], - "out": ["y_predicted"], - "class_name": "bilstm_nn", - "len_vocab": 
"#siam_vocab.len", - "use_matrix": "#preproc.use_matrix", - "max_sequence_length": "#preproc.max_sequence_length", - "emb_matrix": "#embeddings.emb_mat", - "embedding_dim": "#siam_embedder.dim", - "seed": 243, - "reccurent": "bilstm", - "max_pooling": true, - "shared_weights": true, - "hidden_dim": 100, - "learning_rate": 1e-3, - "batch_size": 256, - "save_path": "{MODELS_PATH}/default_ranking_model/model_weights.h5", - "load_path": "{MODELS_PATH}/default_ranking_model/model_weights.h5", - "preprocess": "#preproc.__call__", - "interact_pred_num": 3 - } - ], - "out": ["y_predicted"] - }, - "train": { - "epochs": 10, - "batch_size": 256, - "pytest_max_batches": 2, - "train_metrics": ["f1", "acc"], - "metrics": ["r@1", "r@2", "r@5", "rank_response"], - "validation_patience": 3, - "val_every_n_epochs": 1, - "log_every_n_batches": 1, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/default_ranking_data.tar.gz", - "subdir": "{DOWNLOADS_PATH}/default_ranking_data" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/embeddings/wiki.ru.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/ranking/ranking_default_triplet.json b/deeppavlov/configs/ranking/ranking_default_triplet.json deleted file mode 100644 index fdfeb74621..0000000000 --- a/deeppavlov/configs/ranking/ranking_default_triplet.json +++ /dev/null @@ -1,108 +0,0 @@ -{ - "dataset_reader": { - "class_name": "siamese_reader", - "data_path": "{DOWNLOADS_PATH}/default_ranking_data_triplet" - }, - "dataset_iterator": { - "class_name": "siamese_iterator" - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "id": "preproc", - "class_name": "siamese_preprocessor", - "use_matrix": false, - "num_ranking_samples": 10, - "max_sequence_length": 50, - "fit_on": ["x"], - "in": ["x"], - "out": ["x_proc"], - "sent_vocab": { - "id": "siam_sent_vocab", - "class_name": "simple_vocab", - "save_path": "{MODELS_PATH}/default_ranking_vocabs/sent.dict", - "load_path": "{MODELS_PATH}/default_ranking_vocabs/sent.dict" - }, - "tokenizer": { - "class_name": "split_tokenizer" - }, - "vocab": { - "id": "siam_vocab", - "class_name": "simple_vocab", - "save_path": "{MODELS_PATH}/default_ranking_vocabs/tok.dict", - "load_path": "{MODELS_PATH}/default_ranking_vocabs/tok.dict" - }, - "embedder": { - "id": "siam_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/wiki.ru.bin" - } - }, - { - "id": "embeddings", - "class_name": "emb_mat_assembler", - "embedder": "#siam_embedder", - "vocab": "#siam_vocab" - }, - { - "in": ["x_proc"], - "in_y": ["y"], - "out": ["y_predicted"], - "class_name": "bilstm_nn", - "len_vocab": "#siam_vocab.len", - "use_matrix": "#preproc.use_matrix", - "max_sequence_length": "#preproc.max_sequence_length", - "emb_matrix": "#embeddings.emb_mat", - "embedding_dim": "#siam_embedder.dim", - "seed": 243, - "reccurent": "bilstm", - "max_pooling": true, - "shared_weights": true, - "hidden_dim": 100, - "triplet_loss": true, - "hard_triplets": false, - "learning_rate": 1e-3, - "batch_size": 256, - "save_path": "{MODELS_PATH}/default_ranking_model/model_weights.h5", - "load_path": "{MODELS_PATH}/default_ranking_model/model_weights.h5", - "preprocess": "#preproc.__call__", - 
"interact_pred_num": 3 - } - ], - "out": ["y_predicted"] - }, - "train": { - "epochs": 10, - "batch_size": 256, - "pytest_max_batches": 2, - "train_metrics": ["f1", "acc"], - "metrics": ["r@1", "r@2", "r@5", "rank_response"], - "validation_patience": 3, - "val_every_n_epochs": 1, - "log_every_n_batches": 1, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/default_ranking_data_triplet.tar.gz", - "subdir": "{DOWNLOADS_PATH}/default_ranking_data_triplet" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/embeddings/wiki.ru.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/ranking/ranking_ubuntu_v2_bert_sep.json b/deeppavlov/configs/ranking/ranking_ubuntu_v2_bert_sep.json deleted file mode 100644 index 200ac499bd..0000000000 --- a/deeppavlov/configs/ranking/ranking_ubuntu_v2_bert_sep.json +++ /dev/null @@ -1,72 +0,0 @@ -{ - "dataset_reader": { - "class_name": "ubuntu_v2_reader", - "data_path": "{DOWNLOADS_PATH}/ubuntu_v2_data", - "positive_samples": true - }, - "dataset_iterator": { - "class_name": "siamese_iterator", - "seed": 243 - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "bert_sep_ranker_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/uncased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": true, - "max_seq_length": 128, - "in": ["x"], - "out": ["bert_features"] - }, - { - "class_name": "bert_sep_ranker", - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/uncased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/uncased_L-12_H-768_A-12/bert_model.ckpt", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "learning_rate": 2e-05, - "in": ["bert_features"], - "in_y": ["y"], - "out": ["predictions"] - } - ], - "out": ["predictions"] - }, - "train": { - "batch_size": 16, - "pytest_max_batches": 2, - "train_metrics": [], - "metrics": ["r@1", "r@2", "r@5"], - "validation_patience": 1, - "val_every_n_batches": -1, - "val_every_n_epochs": 1, - "log_every_n_batches": -1, - "validate_best": true, - "test_best": true, - "tensorboard_log_dir": "{MODEL_PATH}/" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/ubuntu_v2_uncased_bert_sep_model" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/ubuntu_v2_data.tar.gz", - "subdir": "{DOWNLOADS_PATH}/ubuntu_v2_data" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/uncased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ubuntu_v2_uncased_bert_sep_model.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/ranking/ranking_ubuntu_v2_bert_sep_interact.json b/deeppavlov/configs/ranking/ranking_ubuntu_v2_bert_sep_interact.json deleted file mode 100644 index 8884dcfc24..0000000000 --- a/deeppavlov/configs/ranking/ranking_ubuntu_v2_bert_sep_interact.json +++ /dev/null @@ -1,91 +0,0 @@ -{ - "dataset_reader": { - "class_name": "ubuntu_v2_reader", - "data_path": "{DOWNLOADS_PATH}/ubuntu_v2_data", - "positive_samples": true 
- }, - "dataset_iterator": { - "class_name": "siamese_iterator", - "seed": 243 - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "response_base_loader", - "id": "loader", - "save_path": "{MODEL_PATH}", - "load_path": "{MODEL_PATH}" - }, - { - "class_name": "bert_sep_ranker_predictor_preprocessor", - "id": "preproc", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/uncased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": true, - "max_seq_length": 128, - "resps": "#loader.resps", - "resp_vecs": "#loader.resp_vecs", - "conts": "#loader.conts", - "cont_vecs": "#loader.cont_vecs", - "in": ["x"], - "out": ["bert_features"] - }, - { - "class_name": "bert_sep_ranker_predictor", - "resps": "#loader.resps", - "resp_vecs": "#loader.resp_vecs", - "resp_features": "#preproc.resp_features", - "conts": "#loader.conts", - "cont_vecs": "#loader.cont_vecs", - "cont_features": "#preproc.cont_features", - "interact_mode": 3, - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/uncased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/uncased_L-12_H-768_A-12/bert_model.ckpt", - "save_path": "{MODEL_PATH}", - "load_path": "{MODEL_PATH}/model", - "learning_rate": 2e-05, - "in": ["bert_features"], - "in_y": ["y"], - "out": ["predictions"] - } - ], - "out": ["predictions"] - }, - "train": { - "batch_size": 16, - "pytest_max_batches": 2, - "train_metrics": [], - "metrics": ["r@1", "r@2", "r@5"], - "validation_patience": 1, - "val_every_n_batches": -1, - "val_every_n_epochs": 1, - "log_every_n_batches": -1, - "validate_best": true, - "test_best": true, - "tensorboard_log_dir": "{MODEL_PATH}/" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/ubuntu_v2_uncased_bert_sep_predictor_model" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/ubuntu_v2_data.tar.gz", - "subdir": "{DOWNLOADS_PATH}/ubuntu_v2_data" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/uncased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ubuntu_v2_uncased_bert_sep_predictor_model.tar.gz", - "subdir": "{MODELS_PATH}" - } - - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/ranking/ranking_ubuntu_v2_bert_uncased.json b/deeppavlov/configs/ranking/ranking_ubuntu_v2_bert_uncased.json deleted file mode 100644 index 266fe02630..0000000000 --- a/deeppavlov/configs/ranking/ranking_ubuntu_v2_bert_uncased.json +++ /dev/null @@ -1,72 +0,0 @@ -{ - "dataset_reader": { - "class_name": "ubuntu_v2_reader", - "data_path": "{DOWNLOADS_PATH}/ubuntu_v2_data" - }, - "dataset_iterator": { - "class_name": "siamese_iterator", - "seed": 243 - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "bert_ranker_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/uncased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": true, - "max_seq_length": 128, - "in": ["x"], - "out": ["bert_features"] - }, - { - "class_name": "bert_ranker", - "one_hot_labels": false, - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/uncased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/uncased_L-12_H-768_A-12/bert_model.ckpt", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "learning_rate": 2e-05, - "in": ["bert_features"], - "in_y": ["y"], - "out": ["predictions"] - } - 
], - "out": ["predictions"] - }, - "train": { - "batch_size": 32, - "pytest_max_batches": 2, - "train_metrics": [], - "metrics": ["r@1", "r@2", "r@5"], - "validation_patience": 1, - "val_every_n_batches": -1, - "val_every_n_epochs": 1, - "log_every_n_batches": -1, - "validate_best": true, - "test_best": true, - "tensorboard_log_dir": "{MODEL_PATH}/" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/ubuntu_v2_uncased_bert_model" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/ubuntu_v2_data.tar.gz", - "subdir": "{DOWNLOADS_PATH}/ubuntu_v2_data" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/uncased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ubuntu_v2_uncased_bert_model.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/ranking/ranking_ubuntu_v2_mt.json b/deeppavlov/configs/ranking/ranking_ubuntu_v2_mt.json deleted file mode 100644 index 499bd3d3dc..0000000000 --- a/deeppavlov/configs/ranking/ranking_ubuntu_v2_mt.json +++ /dev/null @@ -1,107 +0,0 @@ -{ - "dataset_reader": { - "class_name": "ubuntu_v2_mt_reader", - "data_path": "{DOWNLOADS_PATH}/ubuntu_v2_data", - "num_context_turns": "{NUM_CONTEXT_TURNS}" - }, - "dataset_iterator": { - "class_name": "siamese_iterator", - "seed": 243 - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "id": "preproc", - "class_name": "siamese_preprocessor", - "use_matrix": true, - "num_ranking_samples": 10, - "num_context_turns": "{NUM_CONTEXT_TURNS}", - "max_sequence_length": 50, - "fit_on": ["x"], - "in": ["x"], - "out": ["x_proc"], - "sent_vocab": { - "id": "siam_sent_vocab", - "class_name": "simple_vocab", - "save_path": "{MODELS_PATH}/ubuntu_v2_vocabs/sent.dict", - "load_path": "{MODELS_PATH}/ubuntu_v2_vocabs/sent.dict" - }, - "tokenizer": { - "class_name": "nltk_tokenizer" - }, - "vocab": { - "id": "siam_vocab", - "class_name": "simple_vocab", - "save_path": "{MODELS_PATH}/ubuntu_v2_mt_vocabs/tok.dict", - "load_path": "{MODELS_PATH}/ubuntu_v2_mt_vocabs/tok.dict" - }, - "embedder": { - "id": "siam_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/wiki.en.bin" - } - }, - { - "id": "embeddings", - "class_name": "emb_mat_assembler", - "embedder": "#siam_embedder", - "vocab": "#siam_vocab" - }, - { - "in": ["x_proc"], - "in_y": ["y"], - "out": ["y_predicted"], - "class_name": "bilstm_gru_nn", - "use_matrix": "#preproc.use_matrix", - "num_context_turns": "{NUM_CONTEXT_TURNS}", - "len_vocab": "#siam_vocab.len", - "max_sequence_length": "#preproc.max_sequence_length", - "embedding_dim": "#siam_embedder.dim", - "emb_matrix": "#embeddings.emb_mat", - "seed": 243, - "hidden_dim": 300, - "learning_rate": 1e-3, - "triplet_loss": false, - "batch_size": 256, - "save_path": "{MODELS_PATH}/ubuntu_v2_mt_model/model_weights.h5", - "load_path": "{MODELS_PATH}/ubuntu_v2_mt_model/model_weights.h5" - } - ], - "out": ["y_predicted"] - }, - "train": { - "epochs": 200, - "batch_size": 256, - "pytest_max_batches": 2, - "train_metrics": [], - "metrics": ["r@1", "rank_response"], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_batches": 1000, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - 
"DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "NUM_CONTEXT_TURNS": 10 - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/ubuntu_v2_data.tar.gz", - "subdir": "{DOWNLOADS_PATH}/ubuntu_v2_data" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/embeddings/wiki.en.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/ranking/ranking_ubuntu_v2_mt_interact.json b/deeppavlov/configs/ranking/ranking_ubuntu_v2_mt_interact.json deleted file mode 100644 index 3ece2399dd..0000000000 --- a/deeppavlov/configs/ranking/ranking_ubuntu_v2_mt_interact.json +++ /dev/null @@ -1,121 +0,0 @@ -{ - "dataset_reader": { - "class_name": "ubuntu_v2_mt_reader", - "data_path": "{DOWNLOADS_PATH}/ubuntu_v2_data", - "num_context_turns": "{NUM_CONTEXT_TURNS}" - }, - "dataset_iterator": { - "class_name": "siamese_iterator", - "seed": 243 - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "id": "preproc", - "class_name": "siamese_preprocessor", - "use_matrix": true, - "num_ranking_samples": 10, - "num_context_turns": "{NUM_CONTEXT_TURNS}", - "max_sequence_length": 50, - "fit_on": ["x"], - "in": ["x"], - "out": ["x_proc"], - "sent_vocab": { - "id": "siam_sent_vocab", - "class_name": "simple_vocab", - "save_path": "{MODELS_PATH}/ubuntu_v2_vocabs/sent.dict", - "load_path": "{MODELS_PATH}/ubuntu_v2_vocabs/sent.dict" - }, - "tokenizer": { - "class_name": "nltk_tokenizer" - }, - "vocab": { - "id": "siam_vocab", - "class_name": "simple_vocab", - "save_path": "{MODELS_PATH}/ubuntu_v2_mt_vocabs/tok.dict", - "load_path": "{MODELS_PATH}/ubuntu_v2_mt_vocabs/tok.dict" - }, - "embedder": { - "id": "siam_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/wiki.en.bin" - } - }, - { - "id": "embeddings", - "class_name": "emb_mat_assembler", - "embedder": "#siam_embedder", - "vocab": "#siam_vocab" - }, - { - "id": "model", - "class_name": "bilstm_gru_nn", - "use_matrix": "#preproc.use_matrix", - "num_context_turns": "{NUM_CONTEXT_TURNS}", - "len_vocab": "#siam_vocab.len", - "max_sequence_length": "#preproc.max_sequence_length", - "embedding_dim": "#siam_embedder.dim", - "emb_matrix": "#embeddings.emb_mat", - "seed": 243, - "hidden_dim": 300, - "learning_rate": 1e-3, - "triplet_loss": false, - "batch_size": 256, - "save_path": "{MODELS_PATH}/ubuntu_v2_mt_model/model_weights.h5", - "load_path": "{MODELS_PATH}/ubuntu_v2_mt_model/model_weights.h5" - }, - { - "in": ["x_proc"], - "in_y": ["y"], - "out": ["y_predicted"], - "class_name": "siamese_predictor", - "model": "#model", - "num_context_turns": "{NUM_CONTEXT_TURNS}", - "batch_size": "#model.batch_size", - "responses": "#siam_sent_vocab", - "preproc_func": "#preproc.__call__" - } - ], - "out": ["y_predicted"] - }, - "train": { - "epochs": 200, - "batch_size": 256, - "pytest_max_batches": 2, - "train_metrics": [], - "metrics": ["r@1", "rank_response"], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_batches": 1000, - "class_name": "nn_trainer", - "evaluation_targets": [ - "valid", - "test" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "NUM_CONTEXT_TURNS": 10 - - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ubuntu_v2_mt_ranking.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/datasets/ubuntu_v2_data.tar.gz", - 
"subdir": "{DOWNLOADS_PATH}/ubuntu_v2_data" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/embeddings/wiki.en.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/ranking/ranking_ubuntu_v2_mt_word2vec_dam_transformer.json b/deeppavlov/configs/ranking/ranking_ubuntu_v2_mt_word2vec_dam_transformer.json deleted file mode 100644 index fb7d8aa31f..0000000000 --- a/deeppavlov/configs/ranking/ranking_ubuntu_v2_mt_word2vec_dam_transformer.json +++ /dev/null @@ -1,134 +0,0 @@ -{ - "info": "The config is for training or evaluation of DAM_USE-T on Ubuntu Dialogue Corpus v2 using prepared Word2vec embeddings", - "dataset_reader": { - "class_name": "ubuntu_v2_mt_reader", - "data_path": "{DOWNLOADS_PATH}/ubuntu_v2_data_clean", - "num_context_turns": "{NUM_CONTEXT_TURNS}", - "padding": "pre" - }, - "dataset_iterator": { - "class_name": "siamese_iterator", - "shuffle": true, - "seed": 243 - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "split_tokenizer", - "id": "tok_1" - }, - { - "class_name": "simple_vocab", - "special_tokens": ["", ""], - "unk_token": "", - "fit_on": ["x"], - "id": "vocab_1", - "save_path": "{MODELS_PATH}/ubuntu_v2_mt_word2vec_dam_transformer/vocabs/int_tok.dict", - "load_path": "{MODELS_PATH}/ubuntu_v2_mt_word2vec_dam_transformer/vocabs/int_tok.dict" - }, - { - "id": "word2vec_embedder", - "class_name": "glove", - "dim": 200, - "load_path": "{DOWNLOADS_PATH}/embeddings/v2_ubuntu_w2v_vectors.txt" - }, - { - "id": "preproc", - "class_name": "siamese_preprocessor", - "save_path": "{MODELS_PATH}/ubuntu_v2_mt_word2vec_dam_transformer/preproc/tok.dict", - "load_path": "{MODELS_PATH}/ubuntu_v2_mt_word2vec_dam_transformer/preproc/tok.dict", - "num_ranking_samples": 10, - "num_context_turns": "{NUM_CONTEXT_TURNS}", - "max_sequence_length": 50, - "embedding_dim": 200, - "add_raw_text": true, - "fit_on": ["x"], - "in": ["x"], - "out": ["x_proc"], - "tokenizer": { - "ref": "tok_1", - "notes": "use defined tokenizer" - }, - "vocab": { - "ref": "vocab_1", - "notes": "use vocab built for tokenized data" - } - }, - { - "id": "embeddings", - "class_name": "emb_mat_assembler", - "embedder": "#word2vec_embedder", - "vocab": "#vocab_1" - }, - { - "in": ["x_proc"], - "in_y": ["y"], - "out": ["y_predicted"], - "class_name": "dam_nn_use_transformer", - "stack_num": 5, - "is_positional": true, - "num_context_turns": "{NUM_CONTEXT_TURNS}", - "max_sequence_length": "#preproc.max_sequence_length", - "embedding_dim": "#word2vec_embedder.dim", - "emb_matrix": "#embeddings.emb_mat", - "learning_rate": 1e-3, - "batch_size": 100, - "seed": 65, - "decay_steps": 1000, - "save_path": "{MODELS_PATH}/ubuntu_v2_mt_word2vec_dam_transformer/model_dam/model", - "load_path": "{MODELS_PATH}/ubuntu_v2_mt_word2vec_dam_transformer/model_dam/model" - } - ], - "out": [ - "y_predicted" - ] - }, - "train": { - "class_name": "nn_trainer", - "epochs": 8, - "batch_size": 100, - "shuffle": true, - "pytest_max_batches": 2, - "train_metrics": [], - "validate_best": true, - "test_best": true, - "metrics": [ - "r@1", - "r@2", - "r@5", - "rank_response" - ], - "validation_patience": 1, - "val_every_n_epochs": 1, - "log_every_n_batches": 100, - "evaluation_targets": [ - "valid", - "test" - ], - "tensorboard_log_dir": "{MODELS_PATH}/ubuntu_v2_mt_word2vec_dam_transformer/logs_dam/" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": 
"{ROOT_PATH}/models", - "NUM_CONTEXT_TURNS": 10 - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ubuntu_v2_mt_word2vec_dam_transformer.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/datasets/ubuntu_v2_data_clean.tar.gz", - "subdir": "{DOWNLOADS_PATH}/ubuntu_v2_data_clean" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/v2_ubuntu_w2v_vectors.txt.tar.gz", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} diff --git a/deeppavlov/configs/ranking/ranking_ubuntu_v2_mt_word2vec_smn.json b/deeppavlov/configs/ranking/ranking_ubuntu_v2_mt_word2vec_smn.json deleted file mode 100644 index e6ef4cdd5e..0000000000 --- a/deeppavlov/configs/ranking/ranking_ubuntu_v2_mt_word2vec_smn.json +++ /dev/null @@ -1,127 +0,0 @@ -{ - "info": "The config is for training or evaluation of SMN on Ubuntu Dialogue Corpus v2 using prepared Word2vec embeddings", - "dataset_reader": { - "class_name": "ubuntu_v2_mt_reader", - "data_path": "{DOWNLOADS_PATH}/ubuntu_v2_data_clean", - "num_context_turns": "{NUM_CONTEXT_TURNS}", - "padding": "pre" - }, - "dataset_iterator": { - "class_name": "siamese_iterator", - "shuffle": true, - "seed": 243 - }, - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "split_tokenizer", - "id": "tok_1" - }, - { - "class_name": "simple_vocab", - "special_tokens": ["", ""], - "unk_token": "", - "fit_on": ["x"], - "id": "vocab_1", - "save_path": "{MODELS_PATH}/ubuntu_v2_mt_word2vec_smn/vocabs/int_tok.dict", - "load_path": "{MODELS_PATH}/ubuntu_v2_mt_word2vec_smn/vocabs/int_tok.dict" - }, - { - "id": "word2vec_embedder", - "class_name": "glove", - "dim": 200, - "load_path": "{DOWNLOADS_PATH}/embeddings/v2_ubuntu_w2v_vectors.txt" - }, - { - "id": "preproc", - "class_name": "siamese_preprocessor", - "save_path": "{MODELS_PATH}/ubuntu_v2_mt_word2vec_smn/preproc/tok.dict", - "load_path": "{MODELS_PATH}/ubuntu_v2_mt_word2vec_smn/preproc/tok.dict", - "num_ranking_samples": 10, - "num_context_turns": "{NUM_CONTEXT_TURNS}", - "max_sequence_length": 50, - "embedding_dim": 200, - "fit_on": ["x"], - "in": ["x"], - "out": ["x_proc"], - "tokenizer": { - "ref": "tok_1", - "notes": "use defined tokenizer" - }, - "vocab": { - "ref": "vocab_1", - "notes": "use vocab built for tokenized data" - } - }, - { - "id": "embeddings", - "class_name": "emb_mat_assembler", - "embedder": "#word2vec_embedder", - "vocab": "#vocab_1" - }, - { - "in": ["x_proc"], - "in_y": ["y"], - "out": ["y_predicted"], - "class_name": "smn_nn", - "num_context_turns": "{NUM_CONTEXT_TURNS}", - "max_sequence_length": "#preproc.max_sequence_length", - "embedding_dim": "#word2vec_embedder.dim", - "emb_matrix": "#embeddings.emb_mat", - "learning_rate": 1e-3, - "batch_size": 500, - "seed": 65, - "save_path": "{MODELS_PATH}/ubuntu_v2_mt_word2vec_smn/model_smn/model", - "load_path": "{MODELS_PATH}/ubuntu_v2_mt_word2vec_smn/model_smn/model" - } - ], - "out": [ - "y_predicted" - ] - }, - "train": { - "class_name": "nn_trainer", - "epochs": 8, - "batch_size": 500, - "pytest_max_batches": 2, - "train_metrics": [], - "metrics": [ - "r@1", - "r@2", - "r@5", - "rank_response" - ], - "validation_patience": 3, - "val_every_n_epochs": 1, - "log_every_n_batches": 100, - "evaluation_targets": [ - "valid", - "test" - ], - "tensorboard_log_dir": "{MODELS_PATH}/ubuntu_v2_mt_word2vec_smn/logs_smn/" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - 
"NUM_CONTEXT_TURNS": 10 - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ubuntu_v2_mt_word2vec_smn.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/datasets/ubuntu_v2_data_clean.tar.gz", - "subdir": "{DOWNLOADS_PATH}/ubuntu_v2_data_clean" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/v2_ubuntu_w2v_vectors.txt.tar.gz", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/ranking/ranking_ubuntu_v2_torch_bert_uncased.json b/deeppavlov/configs/ranking/ranking_ubuntu_v2_torch_bert_uncased.json index 67d76ed6ef..4887abda31 100644 --- a/deeppavlov/configs/ranking/ranking_ubuntu_v2_torch_bert_uncased.json +++ b/deeppavlov/configs/ranking/ranking_ubuntu_v2_torch_bert_uncased.json @@ -89,7 +89,7 @@ "subdir": "{DOWNLOADS_PATH}/ubuntu_v2_data" }, { - "url": "http://files.deeppavlov.ai/deeppavlov_data/ubuntu_v2_uncased_torch_bert_model_v0.tar.gz", + "url": "http://files.deeppavlov.ai/deeppavlov_data/ubuntu_v2_uncased_torch_bert_model_v2.tar.gz", "subdir": "{MODELS_PATH}" } ] diff --git a/deeppavlov/configs/ranking/rel_ranking.json b/deeppavlov/configs/ranking/rel_ranking.json deleted file mode 100644 index 22e7008b14..0000000000 --- a/deeppavlov/configs/ranking/rel_ranking.json +++ /dev/null @@ -1,88 +0,0 @@ -{ - "dataset_reader": { - "class_name": "paraphraser_reader", - "data_path": "{DOWNLOADS_PATH}/rel_ranking", - "do_lower_case": false - }, - "dataset_iterator": { - "class_name": "siamese_iterator", - "seed": 243, - "len_valid": 500 - }, - "chainer": { - "in": ["text_a", "text_b"], - "in_y": ["y"], - "pipe": [ - { - "in": "text_a", - "out": "question_tok", - "id": "my_tokenizer", - "class_name": "nltk_tokenizer", - "tokenizer": "wordpunct_tokenize" - }, - { - "in": "text_b", - "out": "rel_tok", - "id": "my_tokenizer", - "class_name": "nltk_tokenizer", - "tokenizer": "wordpunct_tokenize" - }, - { - "id": "ft_embedder", - "class_name": "fasttext", - "load_path": "{DOWNLOADS_PATH}/embeddings/wordpunct_tok_reddit_comments_2017_11_300.bin", - "pad_zero": true - }, - { - "in": ["question_tok", "rel_tok"], - "out": ["question_emb", "rel_emb"], - "class_name": "two_sentences_emb", - "embedder": "#ft_embedder" - }, - { - "class_name": "rel_ranker", - "return_probas": true, - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "learning_rate": 1e-3, - "dropout_keep_prob": 0.7, - "in": ["question_emb", "rel_emb"], - "in_y": ["y"], - "out": ["predictions"] - } - ], - "out": ["predictions"] - }, - "train": { - "batch_size": 50, - "pytest_max_batches": 2, - "metrics": ["f1", "acc"], - "validation_patience": 5, - "val_every_n_batches": 5000, - "log_every_n_batches": 5000, - "evaluation_targets": ["train", "valid", "test"], - "tensorboard_log_dir": "{MODEL_PATH}/" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/rel_ranking" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/kbqa/datasets/rel_ranking.tar.gz", - "subdir": "{DOWNLOADS_PATH}/rel_ranking" - }, - { - "url": "http://files.deeppavlov.ai/kbqa/models/rel_ranking.tar.gz", - "subdir": "{MODELS_PATH}/rel_ranking" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/reddit_fastText/wordpunct_tok_reddit_comments_2017_11_300.bin", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} diff --git a/deeppavlov/configs/ranking/rel_ranking_bert_en.json 
b/deeppavlov/configs/ranking/rel_ranking_bert_en.json new file mode 100644 index 0000000000..ae836ebcc9 --- /dev/null +++ b/deeppavlov/configs/ranking/rel_ranking_bert_en.json @@ -0,0 +1,106 @@ +{ + "dataset_reader": { + "class_name": "sq_reader", + "data_path": "{DOWNLOADS_PATH}/rel_ranking_eng/lcquad_rel_ranking.pickle" + }, + "dataset_iterator": { + "class_name": "basic_classification_iterator", + "seed": 42 + }, + "chainer": { + "in": ["question", "rel_list"], + "in_y": ["y"], + "pipe": [ + { + "class_name": "rel_ranking_preprocessor", + "vocab_file": "{TRANSFORMER}", + "do_lower_case": true, + "max_seq_length": 64, + "add_special_tokens": ["", "", ""], + "in": ["question", "rel_list"], + "out": ["bert_features"] + }, + { + "id": "classes_vocab", + "class_name": "simple_vocab", + "fit_on": ["y"], + "save_path": "{MODEL_PATH}/classes.dict", + "load_path": "{MODEL_PATH}/classes.dict", + "in": ["y"], + "out": ["y_ids"] + }, + { + "in": ["y_ids"], + "out": ["y_onehot"], + "class_name": "one_hotter", + "depth": "#classes_vocab.len", + "single_vector": true + }, + { + "class_name": "torch_transformers_classifier", + "n_classes": "#classes_vocab.len", + "return_probas": "true", + "num_special_tokens": 3, + "pretrained_bert": "{TRANSFORMER}", + "save_path": "{MODEL_PATH}/model", + "load_path": "{MODEL_PATH}/model", + "optimizer": "AdamW", + "optimizer_parameters": {"lr": 1e-05}, + "learning_rate_drop_patience": 5, + "learning_rate_drop_div": 2.0, + "in": ["bert_features"], + "in_y": ["y_ids"], + "out": ["y_pred_probas"] + }, + { + "in": ["y_pred_probas"], + "out": ["y_pred_ids"], + "class_name": "proba2labels", + "max_proba": true + }, + { + "in": ["y_pred_ids"], + "out": ["y_pred_labels"], + "ref": "classes_vocab" + } + ], + "out": ["y_pred_probas"] + }, + "train": { + "epochs": 3, + "batch_size": 30, + "metrics": [ + { + "name": "roc_auc", + "inputs": ["y_onehot", "y_pred_probas"] + }, + "accuracy", + "f1_macro" + ], + "validation_patience": 5, + "val_every_n_batches": 100, + "log_every_n_batches": 100, + "show_examples": false, + "evaluation_targets": ["train", "valid", "test"], + "class_name": "torch_trainer" + }, + "metadata": { + "variables": { + "ROOT_PATH": "~/.deeppavlov", + "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", + "MODELS_PATH": "{ROOT_PATH}/models", + "TRANSFORMER": "haisongzhang/roberta-tiny-cased", + "MODEL_PATH": "{MODELS_PATH}/classifiers/rel_ranking_bert_eng_torch" + }, + "download": [ + { + "url": "http://files.deeppavlov.ai/kbqa/wikidata/rel_ranking_bert_eng_torch.tar.gz", + "subdir": "{MODEL_PATH}" + }, + { + "url": "http://files.deeppavlov.ai/kbqa/wikidata/lcquad_rel_ranking.pickle", + "subdir": "{DOWNLOADS_PATH}/rel_ranking_eng" + } + ] + } +} diff --git a/deeppavlov/configs/ranking/rel_ranking_bert_ru.json b/deeppavlov/configs/ranking/rel_ranking_bert_ru.json new file mode 100644 index 0000000000..8bc7209c03 --- /dev/null +++ b/deeppavlov/configs/ranking/rel_ranking_bert_ru.json @@ -0,0 +1,106 @@ +{ + "dataset_reader": { + "class_name": "sq_reader", + "data_path": "{DOWNLOADS_PATH}/rel_ranking_rus/rubq_rel_ranking.pickle" + }, + "dataset_iterator": { + "class_name": "basic_classification_iterator", + "seed": 42 + }, + "chainer": { + "in": ["question", "rel_list"], + "in_y": ["y"], + "pipe": [ + { + "class_name": "rel_ranking_preprocessor", + "vocab_file": "{TRANSFORMER}", + "do_lower_case": true, + "max_seq_length": 64, + "add_special_tokens": ["", "", ""], + "in": ["question", "rel_list"], + "out": ["bert_features"] + }, + { + "id": "classes_vocab", + "class_name": 
"simple_vocab", + "fit_on": ["y"], + "save_path": "{MODEL_PATH}/classes.dict", + "load_path": "{MODEL_PATH}/classes.dict", + "in": ["y"], + "out": ["y_ids"] + }, + { + "in": ["y_ids"], + "out": ["y_onehot"], + "class_name": "one_hotter", + "depth": "#classes_vocab.len", + "single_vector": true + }, + { + "class_name": "torch_transformers_classifier", + "n_classes": "#classes_vocab.len", + "return_probas": "true", + "num_special_tokens": 3, + "pretrained_bert": "{TRANSFORMER}", + "save_path": "{MODEL_PATH}/model", + "load_path": "{MODEL_PATH}/model", + "optimizer": "AdamW", + "optimizer_parameters": {"lr": 1e-05}, + "learning_rate_drop_patience": 5, + "learning_rate_drop_div": 2.0, + "in": ["bert_features"], + "in_y": ["y_ids"], + "out": ["y_pred_probas"] + }, + { + "in": ["y_pred_probas"], + "out": ["y_pred_ids"], + "class_name": "proba2labels", + "max_proba": true + }, + { + "in": ["y_pred_ids"], + "out": ["y_pred_labels"], + "ref": "classes_vocab" + } + ], + "out": ["y_pred_probas"] + }, + "train": { + "epochs": 3, + "batch_size": 30, + "metrics": [ + { + "name": "roc_auc", + "inputs": ["y_onehot", "y_pred_probas"] + }, + "accuracy", + "f1_macro" + ], + "validation_patience": 5, + "val_every_n_batches": 100, + "log_every_n_batches": 100, + "show_examples": false, + "evaluation_targets": ["train", "valid", "test"], + "class_name": "torch_trainer" + }, + "metadata": { + "variables": { + "ROOT_PATH": "~/.deeppavlov", + "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", + "MODELS_PATH": "{ROOT_PATH}/models", + "TRANSFORMER": "DeepPavlov/distilrubert-tiny-cased-conversational", + "MODEL_PATH": "{MODELS_PATH}/classifiers/rel_ranking_bert_rus_torch" + }, + "download": [ + { + "url": "http://files.deeppavlov.ai/kbqa/wikidata/rel_ranking_bert_rus_torch.tar.gz", + "subdir": "{MODEL_PATH}" + }, + { + "url": "http://files.deeppavlov.ai/kbqa/wikidata/rubq_rel_ranking.pickle", + "subdir": "{DOWNLOADS_PATH}/rel_ranking_rus" + } + ] + } +} diff --git a/deeppavlov/configs/regressors/translation_ranker.json b/deeppavlov/configs/regressors/translation_ranker.json new file mode 100644 index 0000000000..161a6ad2c5 --- /dev/null +++ b/deeppavlov/configs/regressors/translation_ranker.json @@ -0,0 +1,105 @@ +{ + "metadata": + { + "variables": { + "BASE_MODEL": "cointegrated/LaBSE-en-ru", + "ROOT_PATH": "~/.deeppavlov", + "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", + "MODELS_PATH": "{ROOT_PATH}/models", + "MODEL_PATH": "{MODELS_PATH}/classifiers/ranker_labse", + "SEED": 42 + }, + "download": [ + { + "url": "http://files.deeppavlov.ai/v1/tmp/translation_ranker.tar.gz", + "subdir": "{MODELS_PATH}" + } + ] + }, + "dataset_iterator": { + "class_name": "huggingface_dataset_iterator", + "features": [ + "source", + "hypothesis" + ], + "label": "agg_score", + "seed": "{SEED}", + "use_label_name": false + }, + "chainer": { + "in": [ + "source", + "hypothesis" + ], + "in_y": [ + "score" + ], + "pipe": [ + { + "class_name": "torch_transformers_preprocessor", + "vocab_file": "{BASE_MODEL}", + "do_lower_case": false, + "max_seq_length": 256, + "in": [ + "source", + "hypothesis" + ], + "out": [ + "bert_features" + ] + }, + { + "class_name": "torch_transformers_classifier", + "n_classes": 1, + "return_probas": false, + "pretrained_bert": "{BASE_MODEL}", + "save_path": "{MODEL_PATH}/model", + "load_path": "{MODEL_PATH}/model", + "optimizer": "AdamW", + "optimizer_parameters": { + "lr": 2e-06, + "weight_decay": 0.1 + }, + "learning_rate_drop_patience": 3, + "learning_rate_drop_div": 2.0, + "in": [ + "bert_features" + ], + "in_y": [ + 
"score" + ], + "out": [ + "pred_score" + ] + } + ], + "out": [ + "pred_score" + ] + }, + "train": { + "batch_size": 32, + "metrics": [ + { + "name": "mean_squared_error", + "inputs": [ + "score", + "pred_score" + ] + } + ], + "validation_patience": 10, + "val_every_n_epochs": 1, + "log_every_n_epochs": 1, + "show_examples": false, + "class_name": "torch_trainer", + "evaluation_targets": [ + "train", + "valid" + ], + "metric_optimization": "minimize", + "tensorboard_log_dir": "{MODEL_PATH}/", + "pytest_max_batches": 2, + "pytest_batch_size": 2 + } +} diff --git a/deeppavlov/configs/relation_extraction/re_docred.json b/deeppavlov/configs/relation_extraction/re_docred.json index 717cabde15..e0b3a4841f 100644 --- a/deeppavlov/configs/relation_extraction/re_docred.json +++ b/deeppavlov/configs/relation_extraction/re_docred.json @@ -17,7 +17,7 @@ "in": ["tokens", "entity_pos", "entity_tags"], "out": ["input_ids", "attention_mask", "upd_entity_pos", "upd_entity_tags", "nf_samples"], "class_name": "re_preprocessor", - "vocab_file": "bert-base-uncased", + "vocab_file": "bert-base-cased", "default_tag": "PER" }, { @@ -30,7 +30,7 @@ "model_name": "re_model", "n_classes": 97, "num_ner_tags": 6, - "pretrained_bert": "bert-base-uncased", + "pretrained_bert": "bert-base-cased", "return_probas": true }, { @@ -45,13 +45,13 @@ }, "train": { "epochs": 50, - "batch_size": 16, + "batch_size": 30, "log_every_n_batches": 100, "train_metrics": ["f1_weighted", "acc"], "evaluation_targets": ["valid", "train"], "metrics": ["f1_weighted", "acc"], "validation_patience": 50, - "val_every_n_batches": 100, + "val_every_n_batches": 200, "show_examples": false, "class_name": "torch_trainer" }, @@ -68,7 +68,7 @@ "subdir": "{DOWNLOADS_PATH}/docred" }, { - "url": "http://files.deeppavlov.ai/deeppavlov_data/relation_extraction/re_docred_model.tar.gz", + "url": "http://files.deeppavlov.ai/deeppavlov_data/relation_extraction/re_docred_model_v1.tar.gz", "subdir": "{MODELS_PATH}/re_docred" }, { diff --git a/deeppavlov/configs/relation_extraction/re_rured.json b/deeppavlov/configs/relation_extraction/re_rured.json index cd84ec532f..b7254fdb87 100644 --- a/deeppavlov/configs/relation_extraction/re_rured.json +++ b/deeppavlov/configs/relation_extraction/re_rured.json @@ -70,7 +70,7 @@ "subdir": "{DOWNLOADS_PATH}/rured" }, { - "url": "http://files.deeppavlov.ai/deeppavlov_data/relation_extraction/re_rured_model.tar.gz", + "url": "http://files.deeppavlov.ai/deeppavlov_data/relation_extraction/re_rured_model_v1.tar.gz", "subdir": "{MODELS_PATH}/re_rured" } ] diff --git a/deeppavlov/configs/sentence_segmentation/sentseg_dailydialog.json b/deeppavlov/configs/sentence_segmentation/sentseg_dailydialog.json deleted file mode 100644 index 3e3737ee67..0000000000 --- a/deeppavlov/configs/sentence_segmentation/sentseg_dailydialog.json +++ /dev/null @@ -1,130 +0,0 @@ -{ - "dataset_reader": { - "class_name": "conll2003_reader", - "data_path": "{DOWNLOADS_PATH}/dailydialog/", - "dataset_name": "dailydialog" - }, - "dataset_iterator": { - "class_name": "data_learning_iterator" - }, - "chainer": { - "in": ["x"], - "in_y": ["y_tokens"], - "pipe": [ - { - "in": ["x"], - "out": ["x_tokens"], - "class_name": "lazy_tokenizer" - }, - { - "in": ["x_tokens"], - "out": ["x_lower", "sent_lengths", "x_tokens_elmo"], - "class_name": "ner_preprocessor", - "get_x_padded_for_elmo": true - }, - { - "in": ["x_lower"], - "out": ["x_tok_ind"], - "fit_on": ["x_lower"], - "class_name": "ner_vocab", - "id": "word_vocab", - "save_path": "{MODELS_PATH}/word.dict", - 
"load_path": "{MODELS_PATH}/word.dict" - }, - { - "in": ["x_tokens"], - "out": ["x_char_ind"], - "fit_on": ["x_tokens"], - "class_name": "ner_vocab", - "char_level": true, - "id": "char_vocab", - "save_path": "{MODELS_PATH}/char.dict", - "load_path": "{MODELS_PATH}/char.dict" - }, - { - "in": ["y_tokens"], - "out": ["y_ind"], - "fit_on": ["y_tokens"], - "class_name": "ner_vocab", - "id": "tag_vocab", - "save_path": "{MODELS_PATH}/tag.dict", - "load_path": "{MODELS_PATH}/tag.dict" - }, - { - "in": [ - "sent_lengths", - "x_tok_ind", - "x_char_ind", - "x_tokens_elmo" - ], - "in_y": ["y_ind"], - "out": ["y_predicted"], - "class_name": "hybrid_ner_model", - "n_tags": "#tag_vocab.len", - "word_emb_path": "{DOWNLOADS_PATH}/embeddings/glove.6B.100d.txt", - "word_emb_name": "glove", - "word_dim": 100, - "word_vocab": "#word_vocab", - "char_vocab_size": "#char_vocab.len", - "char_dim": 64, - "elmo_dim": 100, - "lstm_hidden_size": 256, - "save_path": "{MODELS_PATH}/sentseg_dailydialog", - "load_path": "{MODELS_PATH}/sentseg_dailydialog", - "learning_rate": 1e-3, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 10, - "dropout_keep_prob": 0.7 - }, - { - "in": ["y_predicted"], - "out": ["tags"], - "class_name": "convert_ids2tags", - "id2tag": "#tag_vocab.i2t" - }, - { - "in": ["x_tokens","tags"], - "out": "punctuated_sents", - "class_name": "sentseg_restore_sent" - } - ], - "out": ["x", "punctuated_sents"] - }, - "train": { - "epochs": 100, - "batch_size": 50, - "metrics": [ - { - "name": "ner_f1", - "inputs": ["y_tokens", "tags"] - }, - { - "name": "ner_token_f1", - "inputs": ["y_tokens", "tags"] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "log_every_n_epochs": 1, - "show_examples": false, - "class_name": "nn_trainer", - "evaluation_targets": ["valid", "test"] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models/sentseg_dailydialog" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/sentseg_dailydialog.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/glove.6B.100d.txt", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} diff --git a/deeppavlov/configs/ner/ner_ontonotes_bert_mult_torch.json b/deeppavlov/configs/sentence_segmentation/sentseg_dailydialog_bert.json similarity index 74% rename from deeppavlov/configs/ner/ner_ontonotes_bert_mult_torch.json rename to deeppavlov/configs/sentence_segmentation/sentseg_dailydialog_bert.json index ddced871ac..f0ad27c558 100644 --- a/deeppavlov/configs/ner/ner_ontonotes_bert_mult_torch.json +++ b/deeppavlov/configs/sentence_segmentation/sentseg_dailydialog_bert.json @@ -1,9 +1,8 @@ { "dataset_reader": { "class_name": "conll2003_reader", - "data_path": "{DOWNLOADS_PATH}/ontonotes/", - "dataset_name": "ontonotes", - "provide_pos": false + "data_path": "{DOWNLOADS_PATH}/dailydialog/", + "dataset_name": "dailydialog" }, "dataset_iterator": { "class_name": "data_learning_iterator" @@ -15,12 +14,12 @@ { "class_name": "torch_transformers_ner_preprocessor", "vocab_file": "{TRANSFORMER}", - "do_lower_case": false, + "do_lower_case": true, "max_seq_length": 512, "max_subword_length": 15, "token_masking_prob": 0.0, "in": ["x"], - "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", "attention_mask"] + "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", "attention_mask", "tokens_offsets"] }, { "id": 
"tag_vocab", @@ -38,7 +37,6 @@ "n_tags": "#tag_vocab.len", "pretrained_bert": "{TRANSFORMER}", "attention_probs_keep_prob": 0.5, - "return_probas": false, "encoder_layer_ids": [-1], "optimizer": "AdamW", "optimizer_parameters": { @@ -49,26 +47,31 @@ }, "clip_norm": 1.0, "min_learning_rate": 1e-07, - "learning_rate_drop_patience": 30, + "learning_rate_drop_patience": 6, "learning_rate_drop_div": 1.5, "load_before_drop": true, "save_path": "{MODEL_PATH}/model", "load_path": "{MODEL_PATH}/model", "in": ["x_subword_tok_ids", "attention_mask", "startofword_markers"], "in_y": ["y_ind"], - "out": ["y_pred_ind"] + "out": ["y_pred_ind", "probas"] }, { "ref": "tag_vocab", "in": ["y_pred_ind"], "out": ["y_pred"] + }, + { + "in": ["x_tokens", "y_pred"], + "out": "punctuated_sents", + "class_name": "sentseg_restore_sent" } ], - "out": ["x_tokens", "y_pred"] + "out": ["x_tokens", "punctuated_sents"] }, "train": { "epochs": 30, - "batch_size": 10, + "batch_size": 30, "metrics": [ { "name": "ner_f1", @@ -79,9 +82,9 @@ "inputs": ["y", "y_pred"] } ], - "validation_patience": 100, - "val_every_n_batches": 20, - "log_every_n_batches": 20, + "validation_patience": 20, + "val_every_n_batches": 100, + "log_every_n_batches": 100, "show_examples": false, "pytest_max_batches": 2, "pytest_batch_size": 8, @@ -93,13 +96,13 @@ "ROOT_PATH": "~/.deeppavlov", "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", "MODELS_PATH": "{ROOT_PATH}/models", - "TRANSFORMER": "bert-base-multilingual-cased", - "MODEL_PATH": "{MODELS_PATH}/ner_ontonotes_bert_mult_torch/{TRANSFORMER}" + "TRANSFORMER": "bert-base-uncased", + "MODEL_PATH": "{MODELS_PATH}/sentseg_dailydialog_bert" }, "download": [ { - "url": "http://files.deeppavlov.ai/v1/ner/ner_ontonotes_bert_mult_torch.tar.gz", - "subdir": "{ROOT_PATH}/models" + "url": "http://files.deeppavlov.ai/deeppavlov_data/sentseg_dailydialog_bert.tar.gz", + "subdir": "{MODEL_PATH}" } ] } diff --git a/deeppavlov/configs/skills/aiml_skill.json b/deeppavlov/configs/skills/aiml_skill.json deleted file mode 100644 index 5a454fa4da..0000000000 --- a/deeppavlov/configs/skills/aiml_skill.json +++ /dev/null @@ -1,44 +0,0 @@ -{ - "chainer": { - "in": [ - "utterances_batch", - "states_batch" - ], - "out": [ - "responses_batch", - "confidences_batch", - "output_states_batch" - ], - "pipe": [ - { - "class_name": "aiml_skill", - "path_to_aiml_scripts": "{DOWNLOADS_PATH}/aiml_scripts", - "positive_confidence": 0.66, - "null_response": "I don't know", - "null_confidence": 0.33, - "in": [ - "utterances_batch", - "states_batch" - ], - "out": [ - "responses_batch", - "confidences_batch", - "output_states_batch" - ] - } - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/aiml_skill/aiml_scripts.tar.gz", - "subdir": "{DOWNLOADS_PATH}" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/skills/dsl_skill.json b/deeppavlov/configs/skills/dsl_skill.json deleted file mode 100644 index 296c0708ee..0000000000 --- a/deeppavlov/configs/skills/dsl_skill.json +++ /dev/null @@ -1,40 +0,0 @@ -{ - "chainer": { - "in": [ - "utterances_batch", - "user_ids_batch" - ], - "out": [ - "responses_batch", - "confidences_batch" - ], - "pipe": [ - { - "class_name": "ru_tokenizer", - "in": "utterances_batch", - "lowercase": true, - "out": "utterance_tokens_batch" - }, - { - "class_name": "DSLSkill", - "on_invalid_command": "Sorry, I do not understand you", - 
"null_confidence": 0.0, - "in": [ - "utterance_tokens_batch", - "user_ids_batch" - ], - "out": [ - "responses_batch", - "confidences_batch" - ] - } - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - } - } -} \ No newline at end of file diff --git a/deeppavlov/configs/skills/rasa_skill.json b/deeppavlov/configs/skills/rasa_skill.json deleted file mode 100644 index 22936c660d..0000000000 --- a/deeppavlov/configs/skills/rasa_skill.json +++ /dev/null @@ -1,39 +0,0 @@ -{ - "chainer": { - "in": [ - "utterances" - ], - "out": [ - "responses_batch", - "confidences_batch" - ], - "pipe": [ - { - "class_name": "rasa_skill", - "path_to_models": "{PROJECT_ROOT}/models", - "in": [ - "utterances" - ], - "out": [ - "responses_batch", - "confidences_batch", - "output_states_batch" - ] - } - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "PROJECT_ROOT": "{DOWNLOADS_PATH}/rasa_tutorial_project" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/rasa_skill/rasa_tutorial_project.tar.gz", - "subdir": "{DOWNLOADS_PATH}" - } - ] - } -} diff --git a/deeppavlov/configs/spelling_correction/brillmoore_kartaslov_ru.json b/deeppavlov/configs/spelling_correction/brillmoore_kartaslov_ru.json deleted file mode 100644 index d24b70d8e4..0000000000 --- a/deeppavlov/configs/spelling_correction/brillmoore_kartaslov_ru.json +++ /dev/null @@ -1,82 +0,0 @@ -{ - "dataset_reader": { - "class_name": "typos_kartaslov_reader", - "data_path": "{DOWNLOADS_PATH}" - }, - "dataset_iterator": { - "class_name": "typos_iterator", - "test_ratio": 0.02 - }, - "chainer":{ - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "str_lower", - "id": "lower", - "in": ["x"], - "out": ["x_lower"] - }, - { - "class_name": "nltk_moses_tokenizer", - "id": "tokenizer", - "in": ["x_lower"], - "out": ["x_tokens"] - }, - { - "ref": "tokenizer", - "in": ["y"], - "out": ["y_tokens"] - }, - { - "fit_on": ["x_tokens", "y_tokens"], - "in": ["x_tokens"], - "out": ["tokens_candidates"], - "class_name": "spelling_error_model", - "window": 1, - "candidates_count": 4, - "dictionary": { - "class_name": "russian_words_vocab", - "data_dir": "{DOWNLOADS_PATH}/vocabs" - }, - "save_path": "{MODELS_PATH}/error_model/error_model_ru.tsv", - "load_path": "{MODELS_PATH}/error_model/error_model_ru.tsv" - }, - { - "class_name": "kenlm_elector", - "in": ["tokens_candidates"], - "out": ["y_predicted_tokens"], - "load_path": "{DOWNLOADS_PATH}/language_models/ru_wiyalen_no_punkt.arpa.binary" - }, - { - "ref": "tokenizer", - "in": ["y_predicted_tokens"], - "out": ["y_predicted"] - } - ], - "out": ["y_predicted"] - }, - "train": { - "evaluation_targets": [ - "test" - ], - "class_name": "fit_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/error_model.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/lang_models/ru_wiyalen_no_punkt.arpa.binary.gz", - "subdir": "{DOWNLOADS_PATH}/language_models" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/spelling_correction/brillmoore_kartaslov_ru_custom_vocab.json b/deeppavlov/configs/spelling_correction/brillmoore_kartaslov_ru_custom_vocab.json deleted file mode 
100644 index 46694d2205..0000000000 --- a/deeppavlov/configs/spelling_correction/brillmoore_kartaslov_ru_custom_vocab.json +++ /dev/null @@ -1,84 +0,0 @@ -{ - "dataset_reader": { - "class_name": "typos_kartaslov_reader", - "data_path": "{DOWNLOADS_PATH}" - }, - "dataset_iterator": { - "class_name": "typos_iterator", - "test_ratio": 0.02 - }, - "chainer":{ - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "str_lower", - "id": "lower", - "in": ["x"], - "out": ["x_lower"] - }, - { - "class_name": "nltk_moses_tokenizer", - "id": "tokenizer", - "in": ["x_lower"], - "out": ["x_tokens"] - }, - { - "ref": "tokenizer", - "in": ["y"], - "out": ["y_tokens"] - }, - { - "fit_on": ["x_tokens", "y_tokens"], - "in": ["x_tokens"], - "out": ["tokens_candidates"], - "class_name": "spelling_error_model", - "window": 1, - "candidates_count": 4, - "dictionary": { - "class_name": "static_dictionary", - "dictionary_name": "compreno_words", - "data_dir": "{DOWNLOADS_PATH}/vocabs", - "raw_dictionary_path": "./compreno_wordforms.txt" - }, - "save_path": "{MODELS_PATH}/error_model/error_model_ru.tsv", - "load_path": "{MODELS_PATH}/error_model/error_model_ru.tsv" - }, - { - "class_name": "kenlm_elector", - "in": ["tokens_candidates"], - "out": ["y_predicted_tokens"], - "load_path": "{DOWNLOADS_PATH}/language_models/ru_wiyalen_no_punkt.arpa.binary" - }, - { - "ref": "tokenizer", - "in": ["y_predicted_tokens"], - "out": ["y_predicted"] - } - ], - "out": ["y_predicted"] - }, - "train": { - "evaluation_targets": [ - "test" - ], - "class_name": "fit_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/error_model.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/lang_models/ru_wiyalen_no_punkt.arpa.binary.gz", - "subdir": "{DOWNLOADS_PATH}/language_models" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/spelling_correction/brillmoore_kartaslov_ru_nolm.json b/deeppavlov/configs/spelling_correction/brillmoore_kartaslov_ru_nolm.json deleted file mode 100644 index 6aa7de9c85..0000000000 --- a/deeppavlov/configs/spelling_correction/brillmoore_kartaslov_ru_nolm.json +++ /dev/null @@ -1,77 +0,0 @@ -{ - "dataset_reader": { - "class_name": "typos_kartaslov_reader", - "data_path": "{DOWNLOADS_PATH}" - }, - "dataset_iterator": { - "class_name": "typos_iterator", - "test_ratio": 0.02 - }, - "chainer":{ - "in": ["x"], - "in_y": ["y"], - "pipe": [ - { - "class_name": "str_lower", - "id": "lower", - "in": ["x"], - "out": ["x_lower"] - }, - { - "class_name": "nltk_moses_tokenizer", - "id": "tokenizer", - "in": ["x_lower"], - "out": ["x_tokens"] - }, - { - "ref": "tokenizer", - "in": ["y"], - "out": ["y_tokens"] - }, - { - "fit_on": ["x_tokens", "y_tokens"], - "in": ["x_tokens"], - "out": ["tokens_candidates"], - "class_name": "spelling_error_model", - "window": 1, - "candidates_count": 1, - "dictionary": { - "class_name": "russian_words_vocab", - "data_dir": "{DOWNLOADS_PATH}/vocabs" - }, - "save_path": "{MODELS_PATH}/error_model/error_model_ru.tsv", - "load_path": "{MODELS_PATH}/error_model/error_model_ru.tsv" - }, - { - "class_name": "top1_elector", - "in": ["tokens_candidates"], - "out": ["y_predicted_tokens"] - }, - { - "ref": "tokenizer", - "in": ["y_predicted_tokens"], - "out": ["y_predicted"] - } - ], - "out": ["y_predicted"] - }, - "train": { - "evaluation_targets": [ - 
"test" - ], - "class_name": "fit_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/error_model.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/spelling_correction/brillmoore_wikitypos_en.json b/deeppavlov/configs/spelling_correction/brillmoore_wikitypos_en.json index 60ed162888..73b9e43a39 100644 --- a/deeppavlov/configs/spelling_correction/brillmoore_wikitypos_en.json +++ b/deeppavlov/configs/spelling_correction/brillmoore_wikitypos_en.json @@ -73,6 +73,10 @@ { "url": "http://files.deeppavlov.ai/lang_models/en_wiki_no_punkt.arpa.binary.gz", "subdir": "{DOWNLOADS_PATH}/language_models" + }, + { + "url": "http://files.deeppavlov.ai/datasets/wiktionary/wikipedia_100K_vocab.tar.gz", + "subdir": "{DOWNLOADS_PATH}/vocabs" } ] } diff --git a/deeppavlov/configs/squad/multi_squad_noans.json b/deeppavlov/configs/squad/multi_squad_noans.json deleted file mode 100644 index c423b47036..0000000000 --- a/deeppavlov/configs/squad/multi_squad_noans.json +++ /dev/null @@ -1,148 +0,0 @@ -{ - "dataset_reader": { - "class_name": "squad_dataset_reader", - "dataset": "MultiSQuAD", - "data_path": "{DOWNLOADS_PATH}/multi_squad/" - }, - "dataset_iterator": { - "class_name": "multi_squad_iterator", - "seed": 1337, - "shuffle": true, - "with_answer_rate": 0.666 - }, - "chainer": { - "in": ["context_raw", "question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], - "pipe": [ - { - "class_name": "squad_preprocessor", - "id": "squad_prepr", - "context_limit": 400, - "question_limit": 150, - "char_limit": 16, - "in": ["context_raw", "question_raw"], - "out": ["context", "context_tokens", "context_chars", - "c_r2p", "c_p2r", "question", - "question_tokens", "question_chars", "spans"] - }, - { - "class_name": "squad_ans_preprocessor", - "id": "squad_ans_prepr", - "in": ["ans_raw", "ans_raw_start", "c_r2p", "spans"], - "out": ["ans", "ans_start", "ans_end"] - }, - { - "class_name": "squad_vocab_embedder", - "id": "vocab_embedder", - "level": "token", - "emb_folder": "{DOWNLOADS_PATH}/embeddings/", - "emb_url": "http://files.deeppavlov.ai/embeddings/wiki-news-300d-1M.vec", - "save_path": "{MODELS_PATH}/multi_squad_model_noans/emb/vocab_embedder.pckl", - "load_path": "{MODELS_PATH}/multi_squad_model_noans/emb/vocab_embedder.pckl", - "context_limit": "#squad_prepr.context_limit", - "question_limit": "#squad_prepr.question_limit", - "char_limit": "#squad_prepr.char_limit", - "fit_on": ["context_tokens", "question_tokens"], - "in": ["context_tokens", "question_tokens"], - "out": ["context_tokens_idxs", "question_tokens_idxs"] - }, - { - "class_name": "squad_vocab_embedder", - "id": "char_vocab_embedder", - "level": "char", - "emb_folder": "{DOWNLOADS_PATH}/embeddings/", - "emb_url": "http://files.deeppavlov.ai/embeddings/wiki-news-300d-1M-char.vec", - "save_path": "{MODELS_PATH}/multi_squad_model_noans/emb/char_vocab_embedder.pckl", - "load_path": "{MODELS_PATH}/multi_squad_model_noans/emb/char_vocab_embedder.pckl", - "context_limit": "#squad_prepr.context_limit", - "question_limit": "#squad_prepr.question_limit", - "char_limit": "#squad_prepr.char_limit", - "fit_on": ["context_chars", "question_chars"], - "in": ["context_chars", "question_chars"], - "out": ["context_chars_idxs", "question_chars_idxs"] - }, - { - "class_name": "squad_model", - "id": "squad", - "word_emb": 
"#vocab_embedder.emb_mat", - "char_emb": "#char_vocab_embedder.emb_mat", - "context_limit": "#squad_prepr.context_limit", - "question_limit": "#squad_prepr.question_limit", - "char_limit": "#squad_prepr.char_limit", - "train_char_emb": true, - "char_hidden_size": 100, - "encoder_hidden_size": 75, - "attention_hidden_size": 75, - "learning_rate": 0.1, - "min_learning_rate": 0.001, - "learning_rate_patience": 5, - "keep_prob": 0.7, - "grad_clip": 5.0, - "weight_decay": 1.0, - "noans_token": true, - "save_path": "{MODELS_PATH}/multi_squad_model_noans/model", - "load_path": "{MODELS_PATH}/multi_squad_model_noans/model", - "in": { - "c_tokens": "context_tokens_idxs", - "c_chars": "context_chars_idxs", - "q_tokens": "question_tokens_idxs", - "q_chars": "question_chars_idxs" - }, - "in_y": { - "y1s": "ans_start", - "y2s": "ans_end" - }, - "out": ["ans_start_predicted", "ans_end_predicted", "prob", "score"] - }, - { - "class_name": "squad_ans_postprocessor", - "id": "squad_ans_postprepr", - "in": ["ans_start_predicted", "ans_end_predicted", "context_raw", "c_p2r", "spans"], - "out": ["ans_predicted", "ans_start_predicted", "ans_end_predicted"] - } - ], - "out": ["ans_predicted", "ans_start_predicted", "score"] - }, - "train": { - "show_examples": false, - "log_every_n_batches": 250, - "val_every_n_epochs": 1, - "batch_size": 64, - "pytest_max_batches": 2, - "validation_patience": 10, - "metrics": [ - { - "name": "squad_v2_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v2_em", - "inputs": ["ans_raw", "ans_predicted"] - } - ], - "tensorboard_log_dir": "{MODELS_PATH}/multi_squad_model_noans/logs", - "evaluation_targets": ["valid"], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/multi_squad_model_noans_1.1.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/wiki-news-300d-1M.vec", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/wiki-news-300d-1M-char.vec", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/squad/multi_squad_noans_infer.json b/deeppavlov/configs/squad/multi_squad_noans_infer.json deleted file mode 100644 index 99338627e3..0000000000 --- a/deeppavlov/configs/squad/multi_squad_noans_infer.json +++ /dev/null @@ -1,140 +0,0 @@ -{ - "dataset_reader": { - "class_name": "squad_dataset_reader", - "dataset": "MultiSQuAD", - "data_path": "{DOWNLOADS_PATH}/multi_squad/" - }, - "dataset_iterator": { - "class_name": "multi_squad_iterator", - "seed": 1337, - "shuffle": true, - "with_answer_rate": 0.666 - }, - "chainer": { - "in": ["context_raw", "question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], - "pipe": [ - { - "class_name": "squad_preprocessor", - "id": "squad_prepr", - "context_limit": 4000, - "question_limit": 150, - "char_limit": 16, - "in": ["context_raw", "question_raw"], - "out": ["context", "context_tokens", "context_chars", - "c_r2p", "c_p2r", "question", - "question_tokens", "question_chars", "spans"] - }, - { - "class_name": "squad_ans_preprocessor", - "id": "squad_ans_prepr", - "in": ["ans_raw", "ans_raw_start", "c_r2p", "spans"], - "out": ["ans", "ans_start", "ans_end"] - }, - { - "class_name": "squad_vocab_embedder", - "id": "vocab_embedder", - "level": "token", - "emb_folder": 
"{DOWNLOADS_PATH}/embeddings/", - "emb_url": "http://files.deeppavlov.ai/embeddings/wiki-news-300d-1M.vec", - "save_path": "{MODELS_PATH}/multi_squad_model_noans/emb/vocab_embedder.pckl", - "load_path": "{MODELS_PATH}/multi_squad_model_noans/emb/vocab_embedder.pckl", - "context_limit": "#squad_prepr.context_limit", - "question_limit": "#squad_prepr.question_limit", - "char_limit": "#squad_prepr.char_limit", - "fit_on": ["context_tokens", "question_tokens"], - "in": ["context_tokens", "question_tokens"], - "out": ["context_tokens_idxs", "question_tokens_idxs"] - }, - { - "class_name": "squad_vocab_embedder", - "id": "char_vocab_embedder", - "level": "char", - "emb_folder": "{DOWNLOADS_PATH}/embeddings/", - "emb_url": "http://files.deeppavlov.ai/embeddings/wiki-news-300d-1M-char.vec", - "save_path": "{MODELS_PATH}/multi_squad_model_noans/emb/char_vocab_embedder.pckl", - "load_path": "{MODELS_PATH}/multi_squad_model_noans/emb/char_vocab_embedder.pckl", - "context_limit": "#squad_prepr.context_limit", - "question_limit": "#squad_prepr.question_limit", - "char_limit": "#squad_prepr.char_limit", - "fit_on": ["context_chars", "question_chars"], - "in": ["context_chars", "question_chars"], - "out": ["context_chars_idxs", "question_chars_idxs"] - }, - { - "class_name": "squad_model", - "id": "squad", - "word_emb": "#vocab_embedder.emb_mat", - "char_emb": "#char_vocab_embedder.emb_mat", - "context_limit": "#squad_prepr.context_limit", - "question_limit": "#squad_prepr.question_limit", - "char_limit": "#squad_prepr.char_limit", - "train_char_emb": true, - "char_hidden_size": 100, - "encoder_hidden_size": 75, - "attention_hidden_size": 75, - "learning_rate": 0.1, - "min_learning_rate": 0.001, - "learning_rate_patience": 5, - "keep_prob": 0.7, - "grad_clip": 5.0, - "weight_decay": 1.0, - "noans_token": true, - "save_path": "{MODELS_PATH}/multi_squad_model_noans/model", - "load_path": "{MODELS_PATH}/multi_squad_model_noans/model", - "in": { - "c_tokens": "context_tokens_idxs", - "c_chars": "context_chars_idxs", - "q_tokens": "question_tokens_idxs", - "q_chars": "question_chars_idxs" - }, - "in_y": { - "y1s": "ans_start", - "y2s": "ans_end" - }, - "out": ["ans_start_predicted", "ans_end_predicted", "prob", "score"] - }, - { - "class_name": "squad_ans_postprocessor", - "id": "squad_ans_postprepr", - "in": ["ans_start_predicted", "ans_end_predicted", "context_raw", "c_p2r", "spans"], - "out": ["ans_predicted", "ans_start_predicted", "ans_end_predicted"] - } - ], - "out": ["ans_predicted", "ans_start_predicted", "score"] - }, - "train": { - "show_examples": false, - "log_every_n_batches": 250, - "val_every_n_epochs": 1, - "batch_size": 64, - "pytest_max_batches": 2, - "validation_patience": 10, - "metrics": [ - { - "name": "squad_v2_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v2_em", - "inputs": ["ans_raw", "ans_predicted"] - } - ], - "tensorboard_log_dir": "{MODELS_PATH}/multi_squad_model_noans/logs", - "evaluation_targets": ["valid"], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/multi_squad_model_noans_1.1.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/squad/multi_squad_retr_noans.json b/deeppavlov/configs/squad/multi_squad_retr_noans.json deleted file mode 100644 index c0fd6cfe7a..0000000000 --- 
a/deeppavlov/configs/squad/multi_squad_retr_noans.json +++ /dev/null @@ -1,159 +0,0 @@ -{ - "dataset_reader": { - "class_name": "multi_squad_dataset_reader", - "dataset": "MultiSQuADRetr", - "data_path": "{DOWNLOADS_PATH}/multi_squad_retr/" - }, - "dataset_iterator": { - "class_name": "multi_squad_retr_iterator", - "seed": 1337, - "shuffle": false, - "with_answer_rate": 0.333 - }, - "chainer": { - "in": ["context_raw", "question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], - "pipe": [ - { - "class_name": "squad_preprocessor", - "id": "squad_prepr", - "context_limit": 500, - "question_limit": 150, - "char_limit": 16, - "in": ["context_raw", "question_raw"], - "out": ["context", "context_tokens", "context_chars", - "c_r2p", "c_p2r", "question", - "question_tokens", "question_chars", "spans"] - }, - { - "class_name": "squad_ans_preprocessor", - "id": "squad_ans_prepr", - "in": ["ans_raw", "ans_raw_start", "c_r2p", "spans"], - "out": ["ans", "ans_start", "ans_end"] - }, - { - "class_name": "squad_vocab_embedder", - "id": "vocab_embedder", - "level": "token", - "emb_folder": "{DOWNLOADS_PATH}/embeddings/", - "emb_url": "http://files.deeppavlov.ai/embeddings/wiki-news-300d-1M.vec", - "save_path": "{MODELS_PATH}/multi_squad_model_noans/emb/vocab_embedder.pckl", - "load_path": "{MODELS_PATH}/multi_squad_model_noans/emb/vocab_embedder.pckl", - "context_limit": "#squad_prepr.context_limit", - "question_limit": "#squad_prepr.question_limit", - "char_limit": "#squad_prepr.char_limit", - "fit_on": ["context_tokens", "question_tokens"], - "in": ["context_tokens", "question_tokens"], - "out": ["context_tokens_idxs", "question_tokens_idxs"] - }, - { - "class_name": "squad_vocab_embedder", - "id": "char_vocab_embedder", - "level": "char", - "emb_folder": "{DOWNLOADS_PATH}/embeddings/", - "emb_url": "http://files.deeppavlov.ai/embeddings/wiki-news-300d-1M-char.vec", - "save_path": "{MODELS_PATH}/multi_squad_model_noans/emb/char_vocab_embedder.pckl", - "load_path": "{MODELS_PATH}/multi_squad_model_noans/emb/char_vocab_embedder.pckl", - "context_limit": "#squad_prepr.context_limit", - "question_limit": "#squad_prepr.question_limit", - "char_limit": "#squad_prepr.char_limit", - "fit_on": ["context_chars", "question_chars"], - "in": ["context_chars", "question_chars"], - "out": ["context_chars_idxs", "question_chars_idxs"] - }, - { - "class_name": "squad_model", - "id": "squad", - "word_emb": "#vocab_embedder.emb_mat", - "char_emb": "#char_vocab_embedder.emb_mat", - "context_limit": "#squad_prepr.context_limit", - "question_limit": "#squad_prepr.question_limit", - "char_limit": "#squad_prepr.char_limit", - "train_char_emb": true, - "char_hidden_size": 100, - "encoder_hidden_size": 75, - "attention_hidden_size": 75, - "learning_rate": 0.1, - "min_learning_rate": 0.001, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 2.0, - "optimizer": "tf.train:AdadeltaOptimizer", - "momentum": 0.95, - "keep_prob": 0.7, - "grad_clip": 5.0, - "weight_decay": 1.0, - "noans_token": true, - "save_path": "{MODELS_PATH}/multi_squad_retr_model_noans/model", - "load_path": "{MODELS_PATH}/multi_squad_retr_model_noans/model", - "in": { - "c_tokens": "context_tokens_idxs", - "c_chars": "context_chars_idxs", - "q_tokens": "question_tokens_idxs", - "q_chars": "question_chars_idxs" - }, - "in_y": { - "y1s": "ans_start", - "y2s": "ans_end" - }, - "out": ["ans_start_predicted", "ans_end_predicted", "prob", "score"] - }, - { - "class_name": "squad_ans_postprocessor", - "id": "squad_ans_postprepr", - "in":
["ans_start_predicted", "ans_end_predicted", "context_raw", "c_p2r", "spans"], - "out": ["ans_predicted", "ans_start_predicted", "ans_end_predicted"] - } - ], - "out": ["ans_predicted", "ans_start_predicted", "score"] - }, - "train": { - "show_examples": false, - "test_best": false, - "validate_best": true, - "log_every_n_batches": 250, - "val_every_n_epochs": 1, - "batch_size": 64, - "pytest_max_batches": 2, - "validation_patience": 10, - "metrics": [ - { - "name": "squad_v1_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v1_em", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v2_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v2_em", - "inputs": ["ans_raw", "ans_predicted"] - } - ], - "tensorboard_log_dir": "{MODELS_PATH}/multi_squad_retr_model_noans/logs" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/multi_squad_model_noans_1.1.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/wiki-news-300d-1M.vec", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/wiki-news-300d-1M-char.vec", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} diff --git a/deeppavlov/configs/squad/multi_squad_ru_retr_noans.json b/deeppavlov/configs/squad/multi_squad_ru_retr_noans.json deleted file mode 100644 index 3b28beb9e6..0000000000 --- a/deeppavlov/configs/squad/multi_squad_ru_retr_noans.json +++ /dev/null @@ -1,159 +0,0 @@ -{ - "dataset_reader": { - "class_name": "multi_squad_dataset_reader", - "dataset": "MultiSQuADRuRetr", - "data_path": "{DOWNLOADS_PATH}/multi_squad_ru_retr/" - }, - "dataset_iterator": { - "class_name": "multi_squad_retr_iterator", - "seed": 1337, - "shuffle": false, - "with_answer_rate": 0.666 - }, - "chainer": { - "in": ["context_raw", "question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], - "pipe": [ - { - "class_name": "squad_preprocessor", - "id": "squad_prepr", - "context_limit": 500, - "question_limit": 150, - "char_limit": 16, - "in": ["context_raw", "question_raw"], - "out": ["context", "context_tokens", "context_chars", - "c_r2p", "c_p2r", "question", - "question_tokens", "question_chars", "spans"] - }, - { - "class_name": "squad_ans_preprocessor", - "id": "squad_ans_prepr", - "in": ["ans_raw", "ans_raw_start", "c_r2p", "spans"], - "out": ["ans", "ans_start", "ans_end"] - }, - { - "class_name": "squad_vocab_embedder", - "id": "vocab_embedder", - "level": "token", - "emb_folder": "{DOWNLOADS_PATH}/embeddings/", - "emb_url": "http://files.deeppavlov.ai/embeddings/wiki-news-300d-1M.vec", - "save_path": "{MODELS_PATH}/multi_squad_retr_model_ru_noans/emb/vocab_embedder.pckl", - "load_path": "{MODELS_PATH}/multi_squad_retr_model_ru_noans/emb/vocab_embedder.pckl", - "context_limit": "#squad_prepr.context_limit", - "question_limit": "#squad_prepr.question_limit", - "char_limit": "#squad_prepr.char_limit", - "fit_on": ["context_tokens", "question_tokens"], - "in": ["context_tokens", "question_tokens"], - "out": ["context_tokens_idxs", "question_tokens_idxs"] - }, - { - "class_name": "squad_vocab_embedder", - "id": "char_vocab_embedder", - "level": "char", - "emb_folder": "{DOWNLOADS_PATH}/embeddings/", - "emb_url": "http://files.deeppavlov.ai/embeddings/wiki-news-300d-1M-char.vec", - "save_path":
"{MODELS_PATH}/multi_squad_retr_model_ru_noans/emb/char_vocab_embedder.pckl", - "load_path": "{MODELS_PATH}/multi_squad_retr_model_ru_noans/emb/char_vocab_embedder.pckl", - "context_limit": "#squad_prepr.context_limit", - "question_limit": "#squad_prepr.question_limit", - "char_limit": "#squad_prepr.char_limit", - "fit_on": ["context_chars", "question_chars"], - "in": ["context_chars", "question_chars"], - "out": ["context_chars_idxs", "question_chars_idxs"] - }, - { - "class_name": "squad_model", - "id": "squad", - "word_emb": "#vocab_embedder.emb_mat", - "char_emb": "#char_vocab_embedder.emb_mat", - "context_limit": "#squad_prepr.context_limit", - "question_limit": "#squad_prepr.question_limit", - "char_limit": "#squad_prepr.char_limit", - "train_char_emb": true, - "char_hidden_size": 100, - "encoder_hidden_size": 75, - "attention_hidden_size": 75, - "learning_rate": 0.5, - "min_learning_rate": 0.001, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 2.0, - "optimizer": "tf.train:AdadeltaOptimizer", - "momentum": 0.95, - "keep_prob": 0.7, - "grad_clip": 5.0, - "weight_decay": 1.0, - "noans_token": true, - "save_path": "{MODELS_PATH}/multi_squad_retr_model_ru_noans/model", - "load_path": "{MODELS_PATH}/multi_squad_retr_model_ru_noans/model", - "in": { - "c_tokens": "context_tokens_idxs", - "c_chars": "context_chars_idxs", - "q_tokens": "question_tokens_idxs", - "q_chars": "question_chars_idxs" - }, - "in_y": { - "y1s": "ans_start", - "y2s": "ans_end" - }, - "out": ["ans_start_predicted", "ans_end_predicted", "prob", "score"] - }, - { - "class_name": "squad_ans_postprocessor", - "id": "squad_ans_postprepr", - "in": ["ans_start_predicted", "ans_end_predicted", "context_raw", "c_p2r", "spans"], - "out": ["ans_predicted", "ans_start_predicted", "ans_end_predicted"] - } - ], - "out": ["ans_predicted", "ans_start_predicted", "score"] - }, - "train": { - "show_examples": false, - "test_best": false, - "validate_best": true, - "log_every_n_batches": 25, - "val_every_n_epochs": 1, - "batch_size": 64, - "pytest_max_batches": 2, - "validation_patience": 10, - "metrics": [ - { - "name": "squad_v1_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v1_em", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v2_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v2_em", - "inputs": ["ans_raw", "ans_predicted"] - } - ], - "tensorboard_log_dir": "{MODELS_PATH}/multi_squad_retr_model_ru_noans/logs" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/multi_squad_model_ru_1.0.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/wiki-news-300d-1M.vec", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/wiki-news-300d-1M-char.vec", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} diff --git a/deeppavlov/configs/squad/multi_squad_ru_retr_noans_rubert.json b/deeppavlov/configs/squad/multi_squad_ru_retr_noans_rubert.json deleted file mode 100644 index cb78714e49..0000000000 --- a/deeppavlov/configs/squad/multi_squad_ru_retr_noans_rubert.json +++ /dev/null @@ -1,106 +0,0 @@ -{ - "dataset_reader": { - "class_name": "multi_squad_dataset_reader", - "dataset": "MultiSQuADRuRetrClean", - "url": "http://files.deeppavlov.ai/datasets/multi_squad_ru_retr_clean.tar.gz", -
"data_path": "{DOWNLOADS_PATH}/multi_squad_ru_retr_clean/" - }, - "dataset_iterator": { - "class_name": "multi_squad_retr_iterator", - "seed": 1337, - "shuffle": false, - "with_answer_rate": 0.666 - }, - "chainer": { - "in": ["context_raw", "question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], - "pipe": [ - { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/rubert_cased_L-12_H-768_A-12_v1/vocab.txt", - "do_lower_case": false, - "max_seq_length": 384, - "in": ["question_raw", "context_raw"], - "out": ["bert_features"] - }, - { - "class_name": "squad_bert_mapping", - "do_lower_case": false, - "in": ["context_raw", "bert_features"], - "out": ["subtok2chars", "char2subtoks"] - }, - { - "class_name": "squad_bert_ans_preprocessor", - "do_lower_case": false, - "in": ["ans_raw", "ans_raw_start","char2subtoks"], - "out": ["ans", "ans_start", "ans_end"] - }, - { - "class_name": "squad_bert_model", - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/rubert_cased_L-12_H-768_A-12_v1/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/rubert_cased_L-12_H-768_A-12_v1/bert_model.ckpt", - "save_path": "{MODELS_PATH}/multi_squad_ru_retr_bert/model_rubert_noans", - "load_path": "{MODELS_PATH}/multi_squad_ru_retr_bert/model_rubert_noans", - "keep_prob": 0.5, - "learning_rate": 2e-05, - "learning_rate_drop_patience": 3, - "learning_rate_drop_div": 2.0, - "in": ["bert_features"], - "in_y": ["ans_start", "ans_end"], - "out": ["ans_start_predicted", "ans_end_predicted", "logits", "score"] - }, - { - "class_name": "squad_bert_ans_postprocessor", - "in": ["ans_start_predicted", "ans_end_predicted", "context_raw", "bert_features", "subtok2chars"], - "out": ["ans_predicted", "ans_start_predicted", "ans_end_predicted"] - } - ], - "out": ["ans_predicted", "ans_start_predicted", "score"] - }, - "train": { - "show_examples": false, - "test_best": false, - "validate_best": true, - "log_every_n_batches": 250, - "val_every_n_batches": 500, - "batch_size": 17, - "pytest_max_batches": 2, - "validation_patience": 10, - "metrics": [ - { - "name": "squad_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_f1_1.1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "exact_match", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "exact_match_1.1", - "inputs": ["ans_raw", "ans_predicted"] - } - ], - "tensorboard_log_dir": "{MODELS_PATH}/multi_squad_ru_retr_bert/logs_rubert" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/rubert_cased_L-12_H-768_A-12_v1.tar.gz", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/multi_squad_ru_retr_rubert.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/squad/multi_squad_ru_retr_noans_rubert_infer.json b/deeppavlov/configs/squad/multi_squad_ru_retr_noans_rubert_infer.json deleted file mode 100644 index d17891ae5a..0000000000 --- a/deeppavlov/configs/squad/multi_squad_ru_retr_noans_rubert_infer.json +++ /dev/null @@ -1,70 +0,0 @@ -{ - "dataset_reader": { - "class_name": "squad_dataset_reader", - "dataset": "SberSQuADClean", - "url": "http://files.deeppavlov.ai/datasets/sber_squad_clean-v1.1.tar.gz", - "data_path": "{DOWNLOADS_PATH}/squad_ru_clean/" - }, - "dataset_iterator": { - "class_name": 
"squad_iterator", - "seed": 1337, - "shuffle": true - }, - "chainer": { - "in": ["context_raw", "question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], - "pipe": [ - { - "class_name": "squad_bert_infer", - "lang": "ru", - "batch_size": 128, - "squad_model_config": "{CONFIGS_PATH}/squad/multi_squad_ru_retr_noans_rubert.json", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/rubert_cased_L-12_H-768_A-12_v1/vocab.txt", - "do_lower_case": false, - "max_seq_length": 256, - "in": ["context_raw", "question_raw"], - "out": ["ans_predicted", "ans_start_predicted", "score"] - } - ], - "out": ["ans_predicted", "ans_start_predicted", "score"] - }, - "train": { - "show_examples": false, - "test_best": false, - "validate_best": true, - "log_every_n_batches": 250, - "val_every_n_batches": 500, - "batch_size": 10, - "pytest_max_batches": 2, - "validation_patience": 10, - "metrics": [ - { - "name": "squad_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "exact_match", - "inputs": ["ans_raw", "ans_predicted"] - } - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/rubert_cased_L-12_H-768_A-12_v1.tar.gz", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/multi_squad_ru_retr_rubert.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} - diff --git a/deeppavlov/configs/squad/qa_multisberquad_bert.json b/deeppavlov/configs/squad/qa_multisberquad_bert.json new file mode 100644 index 0000000000..0055094943 --- /dev/null +++ b/deeppavlov/configs/squad/qa_multisberquad_bert.json @@ -0,0 +1,108 @@ +{ + "dataset_reader": { + "class_name": "multi_squad_dataset_reader", + "dataset": "MultiSQuADRuRetrClean", + "url": "http://files.deeppavlov.ai/datasets/multi_squad_ru_retr_clean.tar.gz", + "data_path": "{DOWNLOADS_PATH}/multi_squad_ru_retr_clean/" + }, + "dataset_iterator": { + "class_name": "multi_squad_retr_iterator", + "seed": 1337, + "shuffle": false, + "with_answer_rate": 0.666 + }, + "chainer": { + "in": ["context_raw", "question_raw"], + "in_y": ["ans_raw", "ans_raw_start"], + "pipe": [ + { + "class_name": "torch_squad_transformers_preprocessor", + "vocab_file": "{TRANSFORMER}", + "do_lower_case": "{LOWERCASE}", + "max_seq_length": 384, + "in": ["question_raw", "context_raw"], + "out": ["bert_features", "subtokens", "split_context"] + }, + { + "class_name": "squad_bert_mapping", + "do_lower_case": "{LOWERCASE}", + "in": ["split_context", "bert_features", "subtokens"], + "out": ["subtok2chars", "char2subtoks"] + }, + { + "class_name": "squad_bert_ans_preprocessor", + "do_lower_case": "{LOWERCASE}", + "in": ["ans_raw", "ans_raw_start", "char2subtoks"], + "out": ["ans", "ans_start", "ans_end"] + }, + { + "class_name": "torch_transformers_squad", + "pretrained_bert": "{TRANSFORMER}", + "save_path": "{MODEL_PATH}/model", + "load_path": "{MODEL_PATH}/model", + "optimizer": "AdamW", + "optimizer_parameters": { + "lr": 2e-05, + "weight_decay": 0.01, + "betas": [0.9, 0.999], + "eps": 1e-06 + }, + "learning_rate_drop_patience": 3, + "learning_rate_drop_div": 2.0, + "in": ["bert_features"], + "in_y": ["ans_start", "ans_end"], + "out": ["ans_start_predicted", "ans_end_predicted", "logits", "scores", "inds"] + }, + { + "class_name": "squad_bert_ans_postprocessor", + "in": ["ans_start_predicted", "ans_end_predicted", 
"split_context", "subtok2chars", "subtokens", "inds"], + "out": ["ans_predicted", "ans_start_predicted", "ans_end_predicted"] + } + ], + "out": ["ans_predicted", "ans_start_predicted", "scores"] + }, + "train": { + "show_examples": false, + "evaluation_targets": ["valid"], + "log_every_n_batches": 250, + "val_every_n_batches": 500, + "batch_size": 20, + "valid_batch_size": 64, + "validation_patience": 10, + "metrics": [ + { + "name": "squad_v1_f1", + "inputs": ["ans", "ans_predicted"] + }, + { + "name": "squad_v1_em", + "inputs": ["ans", "ans_predicted"] + }, + { + "name": "squad_v2_f1", + "inputs": ["ans", "ans_predicted"] + }, + { + "name": "squad_v2_em", + "inputs": ["ans", "ans_predicted"] + } + ], + "class_name": "torch_trainer" + }, + "metadata": { + "variables": { + "LOWERCASE": false, + "TRANSFORMER": "DeepPavlov/rubert-base-cased", + "ROOT_PATH": "~/.deeppavlov", + "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", + "MODELS_PATH": "{ROOT_PATH}/models", + "MODEL_PATH": "{MODELS_PATH}/multi_squad_ru_torch_bert_retr_noans/{TRANSFORMER}" + }, + "download": [ + { + "url": "http://files.deeppavlov.ai/v1/squad/multi_squad_ru_torch_bert_retr_noans.tar.gz", + "subdir": "{MODEL_PATH}" + } + ] + } +} diff --git a/deeppavlov/configs/squad/squad_torch_bert.json b/deeppavlov/configs/squad/qa_squad2_bert.json similarity index 83% rename from deeppavlov/configs/squad/squad_torch_bert.json rename to deeppavlov/configs/squad/qa_squad2_bert.json index 32b104c96a..20d9e23cb5 100644 --- a/deeppavlov/configs/squad/squad_torch_bert.json +++ b/deeppavlov/configs/squad/qa_squad2_bert.json @@ -1,7 +1,8 @@ { "dataset_reader": { "class_name": "squad_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/squad/" + "dataset": "SQuAD2.0", + "data_path": "{DOWNLOADS_PATH}/squad2/" }, "dataset_iterator": { "class_name": "squad_iterator", @@ -23,21 +24,21 @@ "vocab_file": "{TRANSFORMER}", "do_lower_case": "{LOWERCASE}", "max_seq_length": 384, - "return_tokens": true, "in": [ "question_raw", "context_raw" ], "out": [ "bert_features", - "subtokens" + "subtokens", + "split_context" ] }, { "class_name": "squad_bert_mapping", "do_lower_case": "{LOWERCASE}", "in": [ - "context_raw", + "split_context", "bert_features", "subtokens" ], @@ -65,6 +66,7 @@ "pretrained_bert": "{TRANSFORMER}", "save_path": "{MODEL_PATH}/model", "load_path": "{MODEL_PATH}/model", + "torch_seed": 1, "optimizer": "AdamW", "optimizer_parameters": { "lr": 2e-05, @@ -75,6 +77,7 @@ ], "eps": 1e-06 }, + "random_seed": 1, "learning_rate_drop_patience": 2, "learning_rate_drop_div": 2.0, "in": [ @@ -87,7 +90,9 @@ "out": [ "ans_start_predicted", "ans_end_predicted", - "logits" + "logits", + "scores", + "inds" ] }, { @@ -95,10 +100,10 @@ "in": [ "ans_start_predicted", "ans_end_predicted", - "context_raw", - "bert_features", + "split_context", "subtok2chars", - "subtokens" + "subtokens", + "inds" ], "out": [ "ans_predicted", @@ -110,7 +115,7 @@ "out": [ "ans_predicted", "ans_start_predicted", - "logits" + "scores" ] }, "train": { @@ -118,9 +123,11 @@ "evaluation_targets": [ "valid" ], - "log_every_n_batches": 250, + "log_every_n_batches": 50, "val_every_n_batches": 500, - "batch_size": 10, + "batch_size": 20, + "valid_batch_size": 60, + "valid_batch_size": 32, "pytest_max_batches": 2, "pytest_batch_size": 5, "validation_patience": 10, @@ -158,17 +165,17 @@ }, "metadata": { "variables": { - "LOWERCASE": true, - "TRANSFORMER": "bert-base-uncased", + "LOWERCASE": false, + "TRANSFORMER": "bert-base-cased", "ROOT_PATH": "~/.deeppavlov", "DOWNLOADS_PATH": 
"{ROOT_PATH}/downloads", "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/squad_torch_bert/{TRANSFORMER}" + "MODEL_PATH": "{MODELS_PATH}/squad2_bert" }, "download": [ { - "url": "http://files.deeppavlov.ai/v1/squad/squad_torch_bert.tar.gz", - "subdir": "{ROOT_PATH}/models" + "url": "http://files.deeppavlov.ai/v1/squad/squad2_bert.tar.gz", + "subdir": "{MODEL_PATH}" } ] } diff --git a/deeppavlov/configs/squad/squad.json b/deeppavlov/configs/squad/squad.json deleted file mode 100644 index 451b0bb2a9..0000000000 --- a/deeppavlov/configs/squad/squad.json +++ /dev/null @@ -1,138 +0,0 @@ -{ - "dataset_reader": { - "class_name": "squad_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/squad/" - }, - "dataset_iterator": { - "class_name": "squad_iterator", - "seed": 1337, - "shuffle": true - }, - "chainer": { - "in": ["context_raw", "question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], - "pipe": [ - { - "class_name": "squad_preprocessor", - "id": "squad_prepr", - "context_limit": 400, - "question_limit": 150, - "char_limit": 16, - "in": ["context_raw", "question_raw"], - "out": ["context", "context_tokens", "context_chars", - "c_r2p", "c_p2r", "question", - "question_tokens", "question_chars", "spans"] - }, - { - "class_name": "squad_ans_preprocessor", - "id": "squad_ans_prepr", - "in": ["ans_raw", "ans_raw_start", "c_r2p", "spans"], - "out": ["ans", "ans_start", "ans_end"] - }, - { - "class_name": "squad_vocab_embedder", - "id": "vocab_embedder", - "level": "token", - "emb_folder": "{DOWNLOADS_PATH}/embeddings/", - "emb_url": "http://files.deeppavlov.ai/embeddings/wiki-news-300d-1M.vec", - "save_path": "{MODELS_PATH}/squad_model/emb/vocab_embedder.pckl", - "load_path": "{MODELS_PATH}/squad_model/emb/vocab_embedder.pckl", - "context_limit": "#squad_prepr.context_limit", - "question_limit": "#squad_prepr.question_limit", - "char_limit": "#squad_prepr.char_limit", - "fit_on": ["context_tokens", "question_tokens"], - "in": ["context_tokens", "question_tokens"], - "out": ["context_tokens_idxs", "question_tokens_idxs"] - }, - { - "class_name": "squad_vocab_embedder", - "id": "char_vocab_embedder", - "level": "char", - "emb_folder": "{DOWNLOADS_PATH}/embeddings/", - "emb_url": "http://files.deeppavlov.ai/embeddings/wiki-news-300d-1M-char.vec", - "save_path": "{MODELS_PATH}/squad_model/emb/char_vocab_embedder.pckl", - "load_path": "{MODELS_PATH}/squad_model/emb/char_vocab_embedder.pckl", - "context_limit": "#squad_prepr.context_limit", - "question_limit": "#squad_prepr.question_limit", - "char_limit": "#squad_prepr.char_limit", - "fit_on": ["context_chars", "question_chars"], - "in": ["context_chars", "question_chars"], - "out": ["context_chars_idxs", "question_chars_idxs"] - }, - { - "class_name": "squad_model", - "id": "squad", - "word_emb": "#vocab_embedder.emb_mat", - "char_emb": "#char_vocab_embedder.emb_mat", - "context_limit": "#squad_prepr.context_limit", - "question_limit": "#squad_prepr.question_limit", - "char_limit": "#squad_prepr.char_limit", - "train_char_emb": true, - "char_hidden_size": 100, - "encoder_hidden_size": 75, - "attention_hidden_size": 75, - "min_learning_rate": 0.001, - "keep_prob": 0.7, - "clip_norm": 5.0, - "learning_rate": 0.5, - "learning_rate_drop_patience": 5, - "learning_rate_drop_div": 2.0, - "optimizer": "tf.train:AdadeltaOptimizer", - "momentum": 0.95, - "save_path": "{MODELS_PATH}/squad_model/model", - "load_path": "{MODELS_PATH}/squad_model/model", - "in": ["context_tokens_idxs", "context_chars_idxs", "question_tokens_idxs", 
"question_chars_idxs"], - "in_y": ["ans_start", "ans_end"], - "out": ["ans_start_predicted", "ans_end_predicted", "logits"] - }, - { - "class_name": "squad_ans_postprocessor", - "id": "squad_ans_postprepr", - "in": ["ans_start_predicted", "ans_end_predicted", "context_raw", "c_p2r", "spans"], - "out": ["ans_predicted", "ans_start_predicted", "ans_end_predicted"] - } - ], - "out": ["ans_predicted", "ans_start_predicted", "logits"] - }, - "train": { - "show_examples": false, - "log_every_n_batches": 250, - "val_every_n_epochs": 1, - "batch_size": 50, - "pytest_max_batches": 2, - "validation_patience": 10, - "metrics": [ - { - "name": "squad_v1_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v1_em", - "inputs": ["ans_raw", "ans_predicted"] - } - ], - "evaluation_targets": ["valid"], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/squad_model_1.4_cpu_compatible.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/wiki-news-300d-1M.vec", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/wiki-news-300d-1M-char.vec", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/squad/squad_bert.json b/deeppavlov/configs/squad/squad_bert.json index 18435be6f1..e67c361570 100644 --- a/deeppavlov/configs/squad/squad_bert.json +++ b/deeppavlov/configs/squad/squad_bert.json @@ -13,51 +13,55 @@ "in_y": ["ans_raw", "ans_raw_start"], "pipe": [ { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": false, + "class_name": "torch_squad_transformers_preprocessor", + "vocab_file": "{TRANSFORMER}", + "do_lower_case": "{LOWERCASE}", "max_seq_length": 384, "in": ["question_raw", "context_raw"], - "out": ["bert_features"] + "out": ["bert_features", "subtokens", "split_context"] }, { "class_name": "squad_bert_mapping", - "do_lower_case": false, - "in": ["context_raw", "bert_features"], + "do_lower_case": "{LOWERCASE}", + "in": ["split_context", "bert_features", "subtokens"], "out": ["subtok2chars", "char2subtoks"] }, { "class_name": "squad_bert_ans_preprocessor", - "do_lower_case": false, - "in": ["ans_raw", "ans_raw_start","char2subtoks"], + "do_lower_case": "{LOWERCASE}", + "in": ["ans_raw", "ans_raw_start", "char2subtoks"], "out": ["ans", "ans_start", "ans_end"] }, { - "class_name": "squad_bert_model", - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/bert_model.ckpt", - "save_path": "{MODELS_PATH}/squad_bert/model", - "load_path": "{MODELS_PATH}/squad_bert/model", - "keep_prob": 0.5, - "learning_rate": 2e-05, + "class_name": "torch_transformers_squad", + "pretrained_bert": "{TRANSFORMER}", + "save_path": "{MODEL_PATH}/model", + "load_path": "{MODEL_PATH}/model", + "optimizer": "AdamW", + "optimizer_parameters": { + "lr": 2e-05, + "weight_decay": 0.01, + "betas": [0.9, 0.999], + "eps": 1e-06 + }, "learning_rate_drop_patience": 2, "learning_rate_drop_div": 2.0, + "batch_size": 10, "in": ["bert_features"], "in_y": ["ans_start", "ans_end"], - "out": ["ans_start_predicted", "ans_end_predicted", "logits"] + "out": ["ans_start_predicted", 
"ans_end_predicted", "logits", "scores", "inds"] }, { "class_name": "squad_bert_ans_postprocessor", - "in": ["ans_start_predicted", "ans_end_predicted", "context_raw", "bert_features", "subtok2chars"], + "in": ["ans_start_predicted", "ans_end_predicted", "split_context", "subtok2chars", "subtokens", "inds"], "out": ["ans_predicted", "ans_start_predicted", "ans_end_predicted"] } ], - "out": ["ans_predicted", "ans_start_predicted", "logits"] + "out": ["ans_predicted", "ans_start_predicted", "scores"] }, "train": { "show_examples": false, - "test_best": false, - "validate_best": true, + "evaluation_targets": ["valid"], "log_every_n_batches": 250, "val_every_n_batches": 500, "batch_size": 10, @@ -82,24 +86,22 @@ "inputs": ["ans", "ans_predicted"] } ], - "tensorboard_log_dir": "{MODELS_PATH}/squad_bert/logs" + "class_name": "torch_trainer" }, "metadata": { "variables": { + "LOWERCASE": false, + "TRANSFORMER": "bert-base-cased", "ROOT_PATH": "~/.deeppavlov", "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" + "MODELS_PATH": "{ROOT_PATH}/models", + "MODEL_PATH": "{MODELS_PATH}/squad_torch_bert/cased/{TRANSFORMER}" }, "download": [ { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/squad_bert.tar.gz", - "subdir": "{MODELS_PATH}" + "url": "http://files.deeppavlov.ai/v1/squad/squad_torch_bert_cased.tar.gz", + "subdir": "{MODEL_PATH}" } - ] + ] } } - diff --git a/deeppavlov/configs/squad/squad_bert_infer.json b/deeppavlov/configs/squad/squad_bert_infer.json deleted file mode 100644 index dcc5747d31..0000000000 --- a/deeppavlov/configs/squad/squad_bert_infer.json +++ /dev/null @@ -1,75 +0,0 @@ -{ - "dataset_reader": { - "class_name": "squad_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/squad/" - }, - "dataset_iterator": { - "class_name": "squad_iterator", - "seed": 1337, - "shuffle": true - }, - "chainer": { - "in": ["context_raw", "question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], - "pipe": [ - { - "class_name": "squad_bert_infer", - "batch_size": 10, - "squad_model_config": "{CONFIGS_PATH}/squad/squad_bert.json", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": false, - "max_seq_length": 512, - "in": ["context_raw", "question_raw"], - "out": ["ans_predicted", "ans_start_predicted", "logits"] - } - ], - "out": ["ans_predicted", "ans_start_predicted", "logits"] - }, - "train": { - "show_examples": false, - "test_best": false, - "validate_best": true, - "log_every_n_batches": 250, - "val_every_n_batches": 500, - "batch_size": 10, - "pytest_max_batches": 2, - "validation_patience": 10, - "metrics": [ - { - "name": "squad_v1_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v1_em", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v2_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v2_em", - "inputs": ["ans_raw", "ans_predicted"] - } - ], - "tensorboard_log_dir": "{MODELS_PATH}/squad_bert/logs" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" - }, - "download": [{ - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": 
"http://files.deeppavlov.ai/deeppavlov_data/squad_bert.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} - diff --git a/deeppavlov/configs/squad/squad_bert_multilingual_freezed_emb.json b/deeppavlov/configs/squad/squad_bert_multilingual_freezed_emb.json deleted file mode 100644 index ed3a89c02a..0000000000 --- a/deeppavlov/configs/squad/squad_bert_multilingual_freezed_emb.json +++ /dev/null @@ -1,66 +0,0 @@ -{ - "chainer": { - "in": ["context_raw", "question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], - "pipe": [ - { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": false, - "max_seq_length": 384, - "in": ["question_raw", "context_raw"], - "out": ["bert_features"] - }, - { - "class_name": "squad_bert_mapping", - "do_lower_case": false, - "in": ["context_raw", "bert_features"], - "out": ["subtok2chars", "char2subtoks"] - }, - { - "class_name": "squad_bert_ans_preprocessor", - "do_lower_case": false, - "in": ["ans_raw", "ans_raw_start","char2subtoks"], - "out": ["ans", "ans_start", "ans_end"] - }, - { - "class_name": "squad_bert_model", - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/bert_model.ckpt", - "save_path": "{MODELS_PATH}/squad_bert/model_multi_freezed", - "load_path": "{MODELS_PATH}/squad_bert/model_multi_freezed", - "keep_prob": 0.5, - "learning_rate": 2e-05, - "learning_rate_drop_patience": 2, - "learning_rate_drop_div": 2.0, - "in": ["bert_features"], - "in_y": ["ans_start", "ans_end"], - "out": ["ans_start_predicted", "ans_end_predicted", "logits"] - }, - { - "class_name": "squad_bert_ans_postprocessor", - "in": ["ans_start_predicted", "ans_end_predicted", "context_raw", "bert_features", "subtok2chars"], - "out": ["ans_predicted", "ans_start_predicted", "ans_end_predicted"] - } - ], - "out": ["ans_predicted", "ans_start_predicted", "logits"] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/multi_cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/squad_bert_mult_freezed.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} - diff --git a/deeppavlov/configs/squad/squad_bert_uncased.json b/deeppavlov/configs/squad/squad_bert_uncased.json deleted file mode 100644 index 5542458965..0000000000 --- a/deeppavlov/configs/squad/squad_bert_uncased.json +++ /dev/null @@ -1,103 +0,0 @@ -{ - "dataset_reader": { - "class_name": "squad_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/squad/" - }, - "dataset_iterator": { - "class_name": "squad_iterator", - "seed": 1337, - "shuffle": true - }, - "chainer": { - "in": ["context_raw", "question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], - "pipe": [ - { - "class_name": "bert_preprocessor", - "id": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/uncased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": "{lowercase}", - "max_seq_length": 384, - "in": ["question_raw", "context_raw"], - "out": ["bert_features"] - }, - { - "class_name": "squad_bert_mapping", - "do_lower_case": "{lowercase}", - "in": ["context_raw", "bert_features"], - "out": ["subtok2chars", "char2subtoks"] - }, - { - "class_name": "squad_bert_ans_preprocessor", - "do_lower_case": 
"{lowercase}", - "in": ["ans_raw", "ans_raw_start","char2subtoks"], - "out": ["ans", "ans_start", "ans_end"] - }, - { - "class_name": "squad_bert_model", - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/uncased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/uncased_L-12_H-768_A-12/bert_model.ckpt", - "save_path": "{MODELS_PATH}/squad_bert/uncased_model", - "load_path": "{MODELS_PATH}/squad_bert/uncased_model", - "keep_prob": 0.5, - "learning_rate": 2e-05, - "learning_rate_drop_patience": 2, - "learning_rate_drop_div": 2.0, - "in": ["bert_features"], - "in_y": ["ans_start", "ans_end"], - "out": ["ans_start_predicted", "ans_end_predicted", "logits"] - }, - { - "class_name": "squad_bert_ans_postprocessor", - "in": ["ans_start_predicted", "ans_end_predicted", "context_raw", "bert_features", "subtok2chars"], - "out": ["ans_predicted", "ans_start_predicted", "ans_end_predicted"] - } - ], - "out": ["ans_predicted", "ans_start_predicted", "logits"] - }, - "train": { - "show_examples": false, - "test_best": false, - "validate_best": true, - "log_every_n_batches": 250, - "val_every_n_batches": 500, - "batch_size": 10, - "pytest_max_batches": 2, - "pytest_batch_size": 5, - "validation_patience": 10, - "metrics": [ - { - "name": "squad_v1_f1", - "inputs": ["ans", "ans_predicted"] - }, - { - "name": "squad_v1_em", - "inputs": ["ans", "ans_predicted"] - }, - { - "name": "squad_v2_f1", - "inputs": ["ans", "ans_predicted"] - }, - { - "name": "squad_v2_em", - "inputs": ["ans", "ans_predicted"] - } - ], - "tensorboard_log_dir": "{MODELS_PATH}/squad_bert/uncased_logs" - }, - "metadata": { - "variables": { - "lowercase": true, - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/uncased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - } - ] - } -} - diff --git a/deeppavlov/configs/squad/squad_ru.json b/deeppavlov/configs/squad/squad_ru.json deleted file mode 100644 index 2d66da3143..0000000000 --- a/deeppavlov/configs/squad/squad_ru.json +++ /dev/null @@ -1,139 +0,0 @@ -{ - "dataset_reader": { - "class_name": "squad_dataset_reader", - "dataset": "SberSQuAD", - "data_path": "{DOWNLOADS_PATH}/squad_ru/" - }, - "dataset_iterator": { - "class_name": "squad_iterator", - "seed": 1337, - "shuffle": true - }, - "chainer": { - "in": ["context_raw", "question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], - "pipe": [ - { - "class_name": "squad_preprocessor", - "id": "squad_prepr", - "context_limit": 400, - "question_limit": 50, - "char_limit": 16, - "in": ["context_raw", "question_raw"], - "out": ["context", "context_tokens", "context_chars", - "c_r2p", "c_p2r", "question", - "question_tokens", "question_chars", "spans"] - }, - { - "class_name": "squad_ans_preprocessor", - "id": "squad_ans_prepr", - "in": ["ans_raw", "ans_raw_start", "c_r2p", "spans"], - "out": ["ans", "ans_start", "ans_end"] - }, - { - "class_name": "squad_vocab_embedder", - "id": "vocab_embedder", - "level": "token", - "emb_folder": "{DOWNLOADS_PATH}/embeddings/", - "emb_url": "http://files.deeppavlov.ai/embeddings/ft_native_300_ru_wiki_lenta_nltk_word_tokenize/ft_native_300_ru_wiki_lenta_nltk_word_tokenize.vec", - "save_path": "{MODELS_PATH}/squad_model_ru/emb/vocab_embedder.pckl", - "load_path": "{MODELS_PATH}/squad_model_ru/emb/vocab_embedder.pckl", - "context_limit": "#squad_prepr.context_limit", - "question_limit": 
"#squad_prepr.question_limit", - "char_limit": "#squad_prepr.char_limit", - "fit_on": ["context_tokens", "question_tokens"], - "in": ["context_tokens", "question_tokens"], - "out": ["context_tokens_idxs", "question_tokens_idxs"] - }, - { - "class_name": "squad_vocab_embedder", - "id": "char_vocab_embedder", - "level": "char", - "emb_folder": "{DOWNLOADS_PATH}/embeddings/", - "emb_url": "http://files.deeppavlov.ai/embeddings/ft_native_300_ru_wiki_lenta_nltk_word_tokenize-char.vec", - "save_path": "{MODELS_PATH}/squad_model_ru/emb/char_vocab_embedder.pckl", - "load_path": "{MODELS_PATH}/squad_model_ru/emb/char_vocab_embedder.pckl", - "context_limit": "#squad_prepr.context_limit", - "question_limit": "#squad_prepr.question_limit", - "char_limit": "#squad_prepr.char_limit", - "fit_on": ["context_chars", "question_chars"], - "in": ["context_chars", "question_chars"], - "out": ["context_chars_idxs", "question_chars_idxs"] - }, - { - "class_name": "squad_model", - "id": "squad", - "word_emb": "#vocab_embedder.emb_mat", - "char_emb": "#char_vocab_embedder.emb_mat", - "context_limit": "#squad_prepr.context_limit", - "question_limit": "#squad_prepr.question_limit", - "char_limit": "#squad_prepr.char_limit", - "train_char_emb": true, - "char_hidden_size": 100, - "encoder_hidden_size": 75, - "attention_hidden_size": 75, - "keep_prob": 0.6, - "clip_norm": 5.0, - "learning_rate": 0.5, - "learning_rate_drop_patience": 2, - "learning_rate_drop_div": 2.0, - "min_learning_rate": 0.001, - "optimizer": "tf.train:AdadeltaOptimizer", - "momentum": 0.95, - "save_path": "{MODELS_PATH}/squad_model_ru/model", - "load_path": "{MODELS_PATH}/squad_model_ru/model", - "in": ["context_tokens_idxs", "context_chars_idxs", "question_tokens_idxs", "question_chars_idxs"], - "in_y": ["ans_start", "ans_end"], - "out": ["ans_start_predicted", "ans_end_predicted", "logits"] - }, - { - "class_name": "squad_ans_postprocessor", - "id": "squad_ans_postprepr", - "in": ["ans_start_predicted", "ans_end_predicted", "context_raw", "c_p2r", "spans"], - "out": ["ans_predicted", "ans_start_predicted", "ans_end_predicted"] - } - ], - "out": ["ans_predicted", "ans_start_predicted", "logits"] - }, - "train": { - "show_examples": false, - "log_every_n_batches": 250, - "val_every_n_epochs": 1, - "batch_size": 50, - "pytest_max_batches": 2, - "validation_patience": 10, - "metrics": [ - { - "name": "squad_v1_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v1_em", - "inputs": ["ans_raw", "ans_predicted"] - } - ], - "evaluation_targets": ["valid"], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/squad_model_ru_1.4_cpu_compatible.tar.gz", - "subdir": "{MODELS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/ft_native_300_ru_wiki_lenta_nltk_word_tokenize/ft_native_300_ru_wiki_lenta_nltk_word_tokenize.vec", - "subdir": "{DOWNLOADS_PATH}/embeddings" - }, - { - "url": "http://files.deeppavlov.ai/embeddings/ft_native_300_ru_wiki_lenta_nltk_word_tokenize-char.vec", - "subdir": "{DOWNLOADS_PATH}/embeddings" - } - ] - } -} \ No newline at end of file diff --git a/deeppavlov/configs/squad/squad_ru_bert.json b/deeppavlov/configs/squad/squad_ru_bert.json index 7b105b47ef..dcfa165314 100644 --- a/deeppavlov/configs/squad/squad_ru_bert.json +++ b/deeppavlov/configs/squad/squad_ru_bert.json @@ -11,98 +11,167 @@ 
"shuffle": true }, "chainer": { - "in": ["context_raw", "question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], + "in": [ + "context_raw", + "question_raw" + ], + "in_y": [ + "ans_raw", + "ans_raw_start" + ], "pipe": [ { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": "{lowercase}", + "class_name": "torch_squad_transformers_preprocessor", + "vocab_file": "{TRANSFORMER}", + "do_lower_case": "{LOWERCASE}", "max_seq_length": 384, - "in": ["question_raw", "context_raw"], - "out": ["bert_features"] + "in": [ + "question_raw", + "context_raw" + ], + "out": [ + "bert_features", + "subtokens", + "split_context" + ] }, { "class_name": "squad_bert_mapping", - "do_lower_case": "{lowercase}", - "in": ["context_raw", "bert_features"], - "out": ["subtok2chars", "char2subtoks"] + "do_lower_case": "{LOWERCASE}", + "in": [ + "split_context", + "bert_features", + "subtokens" + ], + "out": [ + "subtok2chars", + "char2subtoks" + ] }, { "class_name": "squad_bert_ans_preprocessor", - "do_lower_case": "{lowercase}", - "in": ["ans_raw", "ans_raw_start","char2subtoks"], - "out": ["ans", "ans_start", "ans_end"] + "do_lower_case": "{LOWERCASE}", + "in": [ + "ans_raw", + "ans_raw_start", + "char2subtoks" + ], + "out": [ + "ans", + "ans_start", + "ans_end" + ] }, { - "class_name": "squad_bert_model", - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/bert_model.ckpt", - "save_path": "{MODELS_PATH}/squad_ru_bert/model_multi", - "load_path": "{MODELS_PATH}/squad_ru_bert/model_multi", - "keep_prob": 0.5, - "learning_rate": 2e-05, + "class_name": "torch_transformers_squad", + "pretrained_bert": "{TRANSFORMER}", + "save_path": "{MODEL_PATH}/model", + "load_path": "{MODEL_PATH}/model", + "optimizer": "AdamW", + "optimizer_parameters": { + "lr": 2e-05, + "weight_decay": 0.01, + "betas": [ + 0.9, + 0.999 + ], + "eps": 1e-06 + }, "learning_rate_drop_patience": 3, "learning_rate_drop_div": 2.0, - "in": ["bert_features"], - "in_y": ["ans_start", "ans_end"], - "out": ["ans_start_predicted", "ans_end_predicted", "logits", "score"] + "in": [ + "bert_features" + ], + "in_y": [ + "ans_start", + "ans_end" + ], + "out": [ + "ans_start_predicted", + "ans_end_predicted", + "logits", + "scores", + "inds" + ] }, { "class_name": "squad_bert_ans_postprocessor", - "in": ["ans_start_predicted", "ans_end_predicted", "context_raw", "bert_features", "subtok2chars"], - "out": ["ans_predicted", "ans_start_predicted", "ans_end_predicted"] + "in": [ + "ans_start_predicted", + "ans_end_predicted", + "split_context", + "subtok2chars", + "subtokens", + "inds" + ], + "out": [ + "ans_predicted", + "ans_start_predicted", + "ans_end_predicted" + ] } ], - "out": ["ans_predicted", "ans_start_predicted", "logits"] + "out": [ + "ans_predicted", + "ans_start_predicted", + "scores" + ] }, "train": { "show_examples": false, - "test_best": false, - "validate_best": true, + "evaluation_targets": [ + "valid" + ], "log_every_n_batches": 250, "val_every_n_batches": 500, "batch_size": 10, - "pytest_max_batches": 2, - "pytest_batch_size": 5, "validation_patience": 10, "metrics": [ { - "name": "squad_v2_f1", - "inputs": ["ans", "ans_predicted"] + "name": "squad_v1_f1", + "inputs": [ + "ans", + "ans_predicted" + ] }, { - "name": "squad_v2_em", - "inputs": ["ans", "ans_predicted"] + "name": "squad_v1_em", + "inputs": [ + "ans", + "ans_predicted" + ] }, { - 
"name": "squad_v1_f1", - "inputs": ["ans", "ans_predicted"] + "name": "squad_v2_f1", + "inputs": [ + "ans", + "ans_predicted" + ] }, { - "name": "squad_v1_em", - "inputs": ["ans", "ans_predicted"] + "name": "squad_v2_em", + "inputs": [ + "ans", + "ans_predicted" + ] } ], - "tensorboard_log_dir": "{MODELS_PATH}/squad_ru_bert/logs" + "class_name": "torch_trainer" }, "metadata": { "variables": { - "lowercase": false, + "LOWERCASE": false, + "TRANSFORMER": "DeepPavlov/rubert-base-cased", "ROOT_PATH": "~/.deeppavlov", "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" + "MODELS_PATH": "{ROOT_PATH}/models", + "MODEL_PATH": "{MODELS_PATH}/squad_ru_torch_bert/{TRANSFORMER}" }, "download": [ { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/multi_cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/squad_ru_bert.tar.gz", + "url": "http://files.deeppavlov.ai/v1/squad/squad_ru_torch_bert.tar.gz", "subdir": "{MODELS_PATH}" } - ] + ] } } - diff --git a/deeppavlov/configs/squad/squad_ru_bert_infer.json b/deeppavlov/configs/squad/squad_ru_bert_infer.json deleted file mode 100644 index 83cc2cdd68..0000000000 --- a/deeppavlov/configs/squad/squad_ru_bert_infer.json +++ /dev/null @@ -1,78 +0,0 @@ -{ - "dataset_reader": { - "class_name": "squad_dataset_reader", - "dataset": "SberSQuADClean", - "url": "http://files.deeppavlov.ai/datasets/sber_squad_clean-v1.1.tar.gz", - "data_path": "{DOWNLOADS_PATH}/squad_ru_clean/" - }, - "dataset_iterator": { - "class_name": "squad_iterator", - "seed": 1337, - "shuffle": true - }, - "chainer": { - "in": ["context_raw", "question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], - "pipe": [ - { - "class_name": "squad_bert_infer", - "lang": "ru", - "batch_size": 10, - "squad_model_config": "{CONFIGS_PATH}/squad/squad_ru_bert.json", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": false, - "max_seq_length": 512, - "in": ["context_raw", "question_raw"], - "out": ["ans_predicted", "ans_start_predicted", "logits", "score"] - } - ], - "out": ["ans_predicted", "ans_start_predicted", "logits"] - }, - "train": { - "show_examples": false, - "test_best": false, - "validate_best": true, - "log_every_n_batches": 250, - "val_every_n_batches": 500, - "batch_size": 10, - "pytest_max_batches": 2, - "validation_patience": 10, - "metrics": [ - { - "name": "squad_v1_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v1_em", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v2_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v2_em", - "inputs": ["ans_raw", "ans_predicted"] - } - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/multi_cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/squad_ru_bert.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} - diff --git a/deeppavlov/configs/squad/squad_ru_convers_distilrubert_2L.json b/deeppavlov/configs/squad/squad_ru_convers_distilrubert_2L.json index f278ad9627..1594b65ae1 100644 --- a/deeppavlov/configs/squad/squad_ru_convers_distilrubert_2L.json +++ 
b/deeppavlov/configs/squad/squad_ru_convers_distilrubert_2L.json @@ -26,21 +26,21 @@ "vocab_file": "{TRANSFORMER}", "do_lower_case": "{lowercase}", "max_seq_length": 384, - "return_tokens": true, "in": [ "question_raw", "context_raw" ], "out": [ "bert_features", - "subtokens" + "subtokens", + "split_context" ] }, { "class_name": "squad_bert_mapping", "do_lower_case": "{lowercase}", "in": [ - "context_raw", + "split_context", "bert_features", "subtokens" ], @@ -86,7 +86,9 @@ "out": [ "ans_start_predicted", "ans_end_predicted", - "logits" + "logits", + "scores", + "inds" ] }, { @@ -94,10 +96,10 @@ "in": [ "ans_start_predicted", "ans_end_predicted", - "context_raw", - "bert_features", + "split_context", "subtok2chars", - "subtokens" + "subtokens", + "inds" ], "out": [ "ans_predicted", @@ -109,7 +111,7 @@ "out": [ "ans_predicted", "ans_start_predicted", - "logits" + "scores" ] }, "train": { diff --git a/deeppavlov/configs/squad/squad_ru_convers_distilrubert_2L_infer.json b/deeppavlov/configs/squad/squad_ru_convers_distilrubert_2L_infer.json deleted file mode 100644 index 9202d83ba8..0000000000 --- a/deeppavlov/configs/squad/squad_ru_convers_distilrubert_2L_infer.json +++ /dev/null @@ -1,76 +0,0 @@ -{ - "dataset_reader": { - "class_name": "squad_dataset_reader", - "dataset": "SberSQuADClean", - "url": "http://files.deeppavlov.ai/datasets/sber_squad_clean-v1.1.tar.gz", - "data_path": "{DOWNLOADS_PATH}/squad_ru_clean/" - }, - "dataset_iterator": { - "class_name": "squad_iterator", - "seed": 1337, - "shuffle": true - }, - "chainer": { - "in": ["context_raw", "question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], - "pipe": [ - { - "class_name": "torch_transformers_squad_infer", - "lang": "ru", - "batch_size": 128, - "squad_model_config": "{CONFIGS_PATH}/squad/squad_ru_convers_distilrubert_2L.json", - "vocab_file": "{TRANSFORMER}", - "do_lower_case": "{lowercase}", - "max_seq_length": 256, - "in": ["context_raw", "question_raw"], - "out": ["ans_predicted", "ans_start_predicted", "logits"] - } - ], - "out": ["ans_predicted", "ans_start_predicted", "logits"] - }, - "train": { - "show_examples": false, - "evaluation_targets": [ - "valid" - ], - "log_every_n_batches": 250, - "val_every_n_batches": 500, - "batch_size": 10, - "validation_patience": 10, - "metrics": [ - { - "name": "squad_v2_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v2_em", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v1_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v1_em", - "inputs": ["ans_raw", "ans_predicted"] - } - ] - }, - "metadata": { - "variables": { - "lowercase": false, - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "TRANSFORMER": "DeepPavlov/distilrubert-tiny-cased-conversational", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/squad_ru_convers_distilrubert_2L", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/squad_ru_convers_distilrubert_2L.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} diff --git a/deeppavlov/configs/squad/squad_ru_convers_distilrubert_6L.json b/deeppavlov/configs/squad/squad_ru_convers_distilrubert_6L.json index 8ca10a28f7..1fa989377b 100644 --- a/deeppavlov/configs/squad/squad_ru_convers_distilrubert_6L.json +++ b/deeppavlov/configs/squad/squad_ru_convers_distilrubert_6L.json @@ -26,21 +26,21 @@ "vocab_file": "{TRANSFORMER}", "do_lower_case": "{lowercase}", "max_seq_length": 384, - 
"return_tokens": true, "in": [ "question_raw", "context_raw" ], "out": [ "bert_features", - "subtokens" + "subtokens", + "split_context" ] }, { "class_name": "squad_bert_mapping", "do_lower_case": "{lowercase}", "in": [ - "context_raw", + "split_context", "bert_features", "subtokens" ], @@ -86,7 +86,9 @@ "out": [ "ans_start_predicted", "ans_end_predicted", - "logits" + "logits", + "scores", + "inds" ] }, { @@ -94,10 +96,10 @@ "in": [ "ans_start_predicted", "ans_end_predicted", - "context_raw", - "bert_features", + "split_context", "subtok2chars", - "subtokens" + "subtokens", + "inds" ], "out": [ "ans_predicted", @@ -109,7 +111,7 @@ "out": [ "ans_predicted", "ans_start_predicted", - "logits" + "scores" ] }, "train": { diff --git a/deeppavlov/configs/squad/squad_ru_convers_distilrubert_6L_infer.json b/deeppavlov/configs/squad/squad_ru_convers_distilrubert_6L_infer.json deleted file mode 100644 index 5c6171311c..0000000000 --- a/deeppavlov/configs/squad/squad_ru_convers_distilrubert_6L_infer.json +++ /dev/null @@ -1,76 +0,0 @@ -{ - "dataset_reader": { - "class_name": "squad_dataset_reader", - "dataset": "SberSQuADClean", - "url": "http://files.deeppavlov.ai/datasets/sber_squad_clean-v1.1.tar.gz", - "data_path": "{DOWNLOADS_PATH}/squad_ru_clean/" - }, - "dataset_iterator": { - "class_name": "squad_iterator", - "seed": 1337, - "shuffle": true - }, - "chainer": { - "in": ["context_raw", "question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], - "pipe": [ - { - "class_name": "torch_transformers_squad_infer", - "lang": "ru", - "batch_size": 128, - "squad_model_config": "{CONFIGS_PATH}/squad/squad_ru_convers_distilrubert_6L.json", - "vocab_file": "{TRANSFORMER}", - "do_lower_case": "{lowercase}", - "max_seq_length": 256, - "in": ["context_raw", "question_raw"], - "out": ["ans_predicted", "ans_start_predicted", "logits"] - } - ], - "out": ["ans_predicted", "ans_start_predicted", "logits"] - }, - "train": { - "show_examples": false, - "evaluation_targets": [ - "valid" - ], - "log_every_n_batches": 250, - "val_every_n_batches": 500, - "batch_size": 10, - "validation_patience": 10, - "metrics": [ - { - "name": "squad_v2_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v2_em", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v1_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v1_em", - "inputs": ["ans_raw", "ans_predicted"] - } - ] - }, - "metadata": { - "variables": { - "lowercase": false, - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "TRANSFORMER": "DeepPavlov/distilrubert-base-cased-conversational", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/squad_ru_convers_distilrubert_6L", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/squad_ru_convers_distilrubert_6L.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} diff --git a/deeppavlov/configs/squad/squad_ru_rubert.json b/deeppavlov/configs/squad/squad_ru_rubert.json deleted file mode 100644 index e8070409da..0000000000 --- a/deeppavlov/configs/squad/squad_ru_rubert.json +++ /dev/null @@ -1,107 +0,0 @@ -{ - "dataset_reader": { - "class_name": "squad_dataset_reader", - "dataset": "SberSQuADClean", - "url": "http://files.deeppavlov.ai/datasets/sber_squad_clean-v1.1.tar.gz", - "data_path": "{DOWNLOADS_PATH}/squad_ru_clean/" - }, - "dataset_iterator": { - "class_name": "squad_iterator", - "seed": 1337, - "shuffle": true - }, - "chainer": { - "in": ["context_raw", 
"question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], - "pipe": [ - { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/rubert_cased_L-12_H-768_A-12_v1/vocab.txt", - "do_lower_case": "{lowercase}", - "max_seq_length": 384, - "in": ["question_raw", "context_raw"], - "out": ["bert_features"] - }, - { - "class_name": "squad_bert_mapping", - "do_lower_case": "{lowercase}", - "in": ["context_raw", "bert_features"], - "out": ["subtok2chars", "char2subtoks"] - }, - { - "class_name": "squad_bert_ans_preprocessor", - "do_lower_case": "{lowercase}", - "in": ["ans_raw", "ans_raw_start","char2subtoks"], - "out": ["ans", "ans_start", "ans_end"] - }, - { - "class_name": "squad_bert_model", - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/rubert_cased_L-12_H-768_A-12_v1/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/rubert_cased_L-12_H-768_A-12_v1/bert_model.ckpt", - "save_path": "{MODELS_PATH}/squad_ru_bert/model_rubert", - "load_path": "{MODELS_PATH}/squad_ru_bert/model_rubert", - "keep_prob": 0.5, - "learning_rate": 2e-05, - "learning_rate_drop_patience": 3, - "learning_rate_drop_div": 2.0, - "in": ["bert_features"], - "in_y": ["ans_start", "ans_end"], - "out": ["ans_start_predicted", "ans_end_predicted", "logits", "score"] - }, - { - "class_name": "squad_bert_ans_postprocessor", - "in": ["ans_start_predicted", "ans_end_predicted", "context_raw", "bert_features", "subtok2chars"], - "out": ["ans_predicted", "ans_start_predicted", "ans_end_predicted"] - } - ], - "out": ["ans_predicted", "ans_start_predicted", "logits"] - }, - "train": { - "show_examples": false, - "test_best": false, - "validate_best": true, - "log_every_n_batches": 250, - "val_every_n_batches": 500, - "batch_size": 10, - "pytest_max_batches": 2, - "pytest_batch_size": 5, - "validation_patience": 10, - "metrics": [ - { - "name": "squad_v2_f1", - "inputs": ["ans", "ans_predicted"] - }, - { - "name": "squad_v2_em", - "inputs": ["ans", "ans_predicted"] - }, - { - "name": "squad_v1_f1", - "inputs": ["ans", "ans_predicted"] - }, - { - "name": "squad_v1_em", - "inputs": ["ans", "ans_predicted"] - } - ], - "tensorboard_log_dir": "{MODELS_PATH}/squad_ru_bert/logs_rubert" - }, - "metadata": { - "variables": { - "lowercase": false, - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/rubert_cased_L-12_H-768_A-12_v1.tar.gz", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/squad_model_ru_rubert.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} diff --git a/deeppavlov/configs/squad/squad_ru_rubert_infer.json b/deeppavlov/configs/squad/squad_ru_rubert_infer.json deleted file mode 100644 index 5ea0c6e3e4..0000000000 --- a/deeppavlov/configs/squad/squad_ru_rubert_infer.json +++ /dev/null @@ -1,78 +0,0 @@ -{ - "dataset_reader": { - "class_name": "squad_dataset_reader", - "dataset": "SberSQuADClean", - "url": "http://files.deeppavlov.ai/datasets/sber_squad_clean-v1.1.tar.gz", - "data_path": "{DOWNLOADS_PATH}/squad_ru_clean/" - }, - "dataset_iterator": { - "class_name": "squad_iterator", - "seed": 1337, - "shuffle": true - }, - "chainer": { - "in": ["context_raw", "question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], - "pipe": [ - { - "class_name": "squad_bert_infer", - "lang": "ru", - "batch_size": 128, - "squad_model_config": "{CONFIGS_PATH}/squad/squad_ru_rubert.json", - "vocab_file": 
"{DOWNLOADS_PATH}/bert_models/rubert_cased_L-12_H-768_A-12_v1/vocab.txt", - "do_lower_case": false, - "max_seq_length": 256, - "in": ["context_raw", "question_raw"], - "out": ["ans_predicted", "ans_start_predicted", "logits"] - } - ], - "out": ["ans_predicted", "ans_start_predicted", "logits"] - }, - "train": { - "show_examples": false, - "test_best": false, - "validate_best": true, - "log_every_n_batches": 250, - "val_every_n_batches": 500, - "batch_size": 10, - "pytest_max_batches": 2, - "validation_patience": 10, - "metrics": [ - { - "name": "squad_v2_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v2_em", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v1_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v1_em", - "inputs": ["ans_raw", "ans_predicted"] - } - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/rubert_cased_L-12_H-768_A-12_v1.tar.gz", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/squad_model_ru_rubert.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} - diff --git a/deeppavlov/configs/squad/squad_ru_torch_bert.json b/deeppavlov/configs/squad/squad_ru_torch_bert.json deleted file mode 100644 index 029777626e..0000000000 --- a/deeppavlov/configs/squad/squad_ru_torch_bert.json +++ /dev/null @@ -1,175 +0,0 @@ -{ - "dataset_reader": { - "class_name": "squad_dataset_reader", - "dataset": "SberSQuADClean", - "url": "http://files.deeppavlov.ai/datasets/sber_squad_clean-v1.1.tar.gz", - "data_path": "{DOWNLOADS_PATH}/squad_ru_clean/" - }, - "dataset_iterator": { - "class_name": "squad_iterator", - "seed": 1337, - "shuffle": true - }, - "chainer": { - "in": [ - "context_raw", - "question_raw" - ], - "in_y": [ - "ans_raw", - "ans_raw_start" - ], - "pipe": [ - { - "class_name": "torch_squad_transformers_preprocessor", - "vocab_file": "{TRANSFORMER}", - "do_lower_case": "{LOWERCASE}", - "max_seq_length": 384, - "return_tokens": true, - "in": [ - "question_raw", - "context_raw" - ], - "out": [ - "bert_features", - "subtokens" - ] - }, - { - "class_name": "squad_bert_mapping", - "do_lower_case": "{LOWERCASE}", - "in": [ - "context_raw", - "bert_features", - "subtokens" - ], - "out": [ - "subtok2chars", - "char2subtoks" - ] - }, - { - "class_name": "squad_bert_ans_preprocessor", - "do_lower_case": "{LOWERCASE}", - "in": [ - "ans_raw", - "ans_raw_start", - "char2subtoks" - ], - "out": [ - "ans", - "ans_start", - "ans_end" - ] - }, - { - "class_name": "torch_transformers_squad", - "pretrained_bert": "{TRANSFORMER}", - "save_path": "{MODEL_PATH}/model", - "load_path": "{MODEL_PATH}/model", - "optimizer": "AdamW", - "optimizer_parameters": { - "lr": 2e-05, - "weight_decay": 0.01, - "betas": [ - 0.9, - 0.999 - ], - "eps": 1e-06 - }, - "learning_rate_drop_patience": 3, - "learning_rate_drop_div": 2.0, - "in": [ - "bert_features" - ], - "in_y": [ - "ans_start", - "ans_end" - ], - "out": [ - "ans_start_predicted", - "ans_end_predicted", - "logits" - ] - }, - { - "class_name": "squad_bert_ans_postprocessor", - "in": [ - "ans_start_predicted", - "ans_end_predicted", - "context_raw", - "bert_features", - "subtok2chars", - "subtokens" - ], - "out": [ - "ans_predicted", - "ans_start_predicted", - "ans_end_predicted" - ] - } - ], - "out": [ - 
"ans_predicted", - "ans_start_predicted", - "logits" - ] - }, - "train": { - "show_examples": false, - "evaluation_targets": [ - "valid" - ], - "log_every_n_batches": 250, - "val_every_n_batches": 500, - "batch_size": 10, - "validation_patience": 10, - "metrics": [ - { - "name": "squad_v1_f1", - "inputs": [ - "ans", - "ans_predicted" - ] - }, - { - "name": "squad_v1_em", - "inputs": [ - "ans", - "ans_predicted" - ] - }, - { - "name": "squad_v2_f1", - "inputs": [ - "ans", - "ans_predicted" - ] - }, - { - "name": "squad_v2_em", - "inputs": [ - "ans", - "ans_predicted" - ] - } - ], - "class_name": "torch_trainer" - }, - "metadata": { - "variables": { - "LOWERCASE": false, - "TRANSFORMER": "DeepPavlov/rubert-base-cased", - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/squad_ru_torch_bert/{TRANSFORMER}" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/v1/squad/squad_ru_torch_bert.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} diff --git a/deeppavlov/configs/squad/squad_torch_bert_infer.json b/deeppavlov/configs/squad/squad_torch_bert_infer.json deleted file mode 100644 index 62398a515e..0000000000 --- a/deeppavlov/configs/squad/squad_torch_bert_infer.json +++ /dev/null @@ -1,69 +0,0 @@ -{ - "dataset_reader": { - "class_name": "squad_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/squad/" - }, - "dataset_iterator": { - "class_name": "squad_iterator", - "seed": 1337, - "shuffle": true - }, - "chainer": { - "in": ["context_raw", "question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], - "pipe": [ - { - "class_name": "torch_transformers_squad_infer", - "batch_size": 10, - "squad_model_config": "{CONFIGS_PATH}/squad/squad_torch_bert.json", - "vocab_file": "bert-base-cased", - "do_lower_case": false, - "max_seq_length": 384, - "in": ["context_raw", "question_raw"], - "out": ["ans_predicted", "ans_start_predicted", "logits"] - } - ], - "out": ["ans_predicted", "ans_start_predicted", "logits"] - }, - "train": { - "show_examples": false, - "evaluation_targets": [ - "valid" - ], - "log_every_n_batches": 250, - "val_every_n_batches": 500, - "batch_size": 10, - "pytest_max_batches": 2, - "validation_patience": 10, - "metrics": [ - { - "name": "squad_v1_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v1_em", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v2_f1", - "inputs": ["ans_raw", "ans_predicted"] - }, - { - "name": "squad_v2_em", - "inputs": ["ans_raw", "ans_predicted"] - } - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "TRANSFORMER": "bert-base-cased", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models/squad_torch_bert/{TRANSFORMER}", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs" - }, - "download": [ - ] - } -} - diff --git a/deeppavlov/configs/squad/squad_zh_bert_mult.json b/deeppavlov/configs/squad/squad_zh_bert_mult.json deleted file mode 100644 index 50cac7569f..0000000000 --- a/deeppavlov/configs/squad/squad_zh_bert_mult.json +++ /dev/null @@ -1,118 +0,0 @@ -{ - "dataset_reader": { - "class_name": "squad_dataset_reader", - "dataset": "SQuAD", - "url": "http://files.deeppavlov.ai/datasets/DRCD.tar.gz", - "data_path": "{DOWNLOADS_PATH}/DRCD_train/" - }, - "dataset_iterator": { - "class_name": "squad_iterator", - "seed": 1337, - "shuffle": true - }, - "chainer": { - "in": ["context_raw", "question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], - "pipe": [ - { - "class_name": 
"bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/vocab.txt", - "do_lower_case": "{lowercase}", - "max_seq_length": 384, - "in": ["question_raw", "context_raw"], - "out": ["bert_features"] - }, - { - "class_name": "squad_bert_mapping", - "do_lower_case": "{lowercase}", - "in": ["context_raw", "bert_features"], - "out": ["subtok2chars", "char2subtoks"] - }, - { - "class_name": "squad_bert_ans_preprocessor", - "do_lower_case": "{lowercase}", - "in": ["ans_raw", "ans_raw_start","char2subtoks"], - "out": ["ans", "ans_start", "ans_end"] - }, - { - "class_name": "squad_bert_model", - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/multi_cased_L-12_H-768_A-12/bert_model.ckpt", - "save_path": "{MODELS_PATH}/squad_zh_bert/model_multi", - "load_path": "{MODELS_PATH}/squad_zh_bert/model_multi", - "keep_prob": 0.5, - "learning_rate": 2e-05, - "learning_rate_drop_patience": 3, - "learning_rate_drop_div": 2.0, - "in": ["bert_features"], - "in_y": ["ans_start", "ans_end"], - "out": ["ans_start_predicted", "ans_end_predicted", "logits", "score"] - }, - { - "class_name": "squad_bert_ans_postprocessor", - "in": ["ans_start_predicted", "ans_end_predicted", "context_raw", "bert_features", "subtok2chars"], - "out": ["ans_predicted", "ans_start_predicted", "ans_end_predicted"] - }, - { - "in": "ans", - "out": "ans_tok", - "class_name": "jieba_tokenizer" - }, - { - "in": "ans_predicted", - "out": "ans_predicted_tok", - "class_name": "jieba_tokenizer" - } - ], - "out": ["ans_predicted", "ans_start_predicted", "logits"] - }, - "train": { - "show_examples": false, - "test_best": false, - "validate_best": true, - "log_every_n_batches": 250, - "val_every_n_batches": 500, - "batch_size": 8, - "pytest_max_batches": 2, - "pytest_batch_size": 5, - "validation_patience": 10, - "metrics": [ - { - "name": "squad_v2_f1", - "inputs": ["ans_tok", "ans_predicted_tok"] - }, - { - "name": "squad_v2_em", - "inputs": ["ans_tok", "ans_predicted_tok"] - }, - { - "name": "squad_v1_f1", - "inputs": ["ans_tok", "ans_predicted_tok"] - }, - { - "name": "squad_v1_em", - "inputs": ["ans_tok", "ans_predicted_tok"] - } - ], - "tensorboard_log_dir": "{MODELS_PATH}/squad_zh_bert/logs" - }, - "metadata": { - "variables": { - "lowercase": false, - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/multi_cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/squad_zh.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} - diff --git a/deeppavlov/configs/squad/squad_zh_bert_zh.json b/deeppavlov/configs/squad/squad_zh_bert_zh.json deleted file mode 100644 index 5864236cf9..0000000000 --- a/deeppavlov/configs/squad/squad_zh_bert_zh.json +++ /dev/null @@ -1,118 +0,0 @@ -{ - "dataset_reader": { - "class_name": "squad_dataset_reader", - "dataset": "SQuAD", - "url": "http://files.deeppavlov.ai/datasets/DRCD.tar.gz", - "data_path": "{DOWNLOADS_PATH}/DRCD_train/" - }, - "dataset_iterator": { - "class_name": "squad_iterator", - "seed": 1337, - "shuffle": true - }, - "chainer": { - "in": ["context_raw", "question_raw"], - "in_y": ["ans_raw", "ans_raw_start"], - "pipe": [ - { - "class_name": "bert_preprocessor", - "vocab_file": "{DOWNLOADS_PATH}/bert_models/chinese_L-12_H-768_A-12/vocab.txt", - 
"do_lower_case": "{lowercase}", - "max_seq_length": 384, - "in": ["question_raw", "context_raw"], - "out": ["bert_features"] - }, - { - "class_name": "squad_bert_mapping", - "do_lower_case": "{lowercase}", - "in": ["context_raw", "bert_features"], - "out": ["subtok2chars", "char2subtoks"] - }, - { - "class_name": "squad_bert_ans_preprocessor", - "do_lower_case": "{lowercase}", - "in": ["ans_raw", "ans_raw_start","char2subtoks"], - "out": ["ans", "ans_start", "ans_end"] - }, - { - "class_name": "squad_bert_model", - "bert_config_file": "{DOWNLOADS_PATH}/bert_models/chinese_L-12_H-768_A-12/bert_config.json", - "pretrained_bert": "{DOWNLOADS_PATH}/bert_models/chinese_L-12_H-768_A-12/bert_model.ckpt", - "save_path": "{MODELS_PATH}/squad_zh_bert/model_zh", - "load_path": "{MODELS_PATH}/squad_zh_bert/model_zh", - "keep_prob": 0.5, - "learning_rate": 2e-05, - "learning_rate_drop_patience": 3, - "learning_rate_drop_div": 2.0, - "in": ["bert_features"], - "in_y": ["ans_start", "ans_end"], - "out": ["ans_start_predicted", "ans_end_predicted", "logits", "score"] - }, - { - "class_name": "squad_bert_ans_postprocessor", - "in": ["ans_start_predicted", "ans_end_predicted", "context_raw", "bert_features", "subtok2chars"], - "out": ["ans_predicted", "ans_start_predicted", "ans_end_predicted"] - }, - { - "in": "ans", - "out": "ans_tok", - "class_name": "jieba_tokenizer" - }, - { - "in": "ans_predicted", - "out": "ans_predicted_tok", - "class_name": "jieba_tokenizer" - } - ], - "out": ["ans_predicted", "ans_start_predicted", "logits"] - }, - "train": { - "show_examples": false, - "test_best": false, - "validate_best": true, - "log_every_n_batches": 250, - "val_every_n_batches": 500, - "batch_size": 8, - "pytest_max_batches": 2, - "pytest_batch_size": 5, - "validation_patience": 10, - "metrics": [ - { - "name": "squad_v2_f1", - "inputs": ["ans_tok", "ans_predicted_tok"] - }, - { - "name": "squad_v2_em", - "inputs": ["ans_tok", "ans_predicted_tok"] - }, - { - "name": "squad_v1_f1", - "inputs": ["ans_tok", "ans_predicted_tok"] - }, - { - "name": "squad_v1_em", - "inputs": ["ans_tok", "ans_predicted_tok"] - } - ], - "tensorboard_log_dir": "{MODELS_PATH}/squad_zh_bert/logs" - }, - "metadata": { - "variables": { - "lowercase": false, - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/chinese_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/squad_model_zh_zhbert.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } -} - diff --git a/deeppavlov/configs/syntax/ru_syntagrus_joint_parsing.json b/deeppavlov/configs/syntax/ru_syntagrus_joint_parsing.json deleted file mode 100644 index 739a09433c..0000000000 --- a/deeppavlov/configs/syntax/ru_syntagrus_joint_parsing.json +++ /dev/null @@ -1,33 +0,0 @@ -{ - "chainer": { - "in": [ - "x_words" - ], - "pipe": [ - { - "id": "main", - "class_name": "joint_tagger_parser", - "tagger": {"config_path": "{CONFIGS_PATH}/morpho_tagger/BERT/morpho_ru_syntagrus_bert.json"}, - "parser": {"config_path": "{CONFIGS_PATH}/syntax/syntax_ru_syntagrus_bert.json"}, - "to_output_string": true, - "in": [ - "x_words" - ], - "out": [ - "y_parsed" - ] - } - ], - "out": [ - "y_parsed" - ] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "CONFIGS_PATH": "{DEEPPAVLOV_PATH}/configs", - "MODELS_PATH": "{ROOT_PATH}/models" 
- } - } -} diff --git a/deeppavlov/configs/syntax/syntax_ru_syntagrus_bert.json b/deeppavlov/configs/syntax/syntax_ru_syntagrus_bert.json deleted file mode 100644 index 86244bee5f..0000000000 --- a/deeppavlov/configs/syntax/syntax_ru_syntagrus_bert.json +++ /dev/null @@ -1,183 +0,0 @@ -{ - "dataset_reader": { - "class_name": "morphotagger_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/UD2.3_source", - "language": "ru_syntagrus", - "data_types": [ - "train", "dev", "test" - ], - "read_syntax": true - }, - "dataset_iterator": { - "class_name": "morphotagger_dataset" - }, - "chainer": { - "in": ["x"], - "in_y": ["y_tags", "y_heads", "y_deps"], - "pipe": [ - { - "in": [ - "x" - ], - "class_name": "lazy_tokenizer", - "out": [ - "x_words" - ] - }, - { - "class_name": "bert_ner_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 512, - "max_subword_length": 15, - "subword_mask_mode": "last", - "token_masking_prob": 0.0, - "in": ["x_words"], - "out": ["x_tokens", "x_subword_tokens", "x_subword_tok_ids", "startofword_markers", "attention_mask"] - }, - { - "id": "dep_vocab", - "class_name": "simple_vocab", - "min_freq": 3, - "fit_on": [ - "y_deps" - ], - "in": ["y_deps"], - "out": ["y_deps_indexes"], - "special_tokens": [ - "PAD" - ], - "pad_with_zeros": true, - "save_path": "{WORK_PATH}/deps.dict", - "load_path": "{WORK_PATH}/deps.dict" - }, - { - "class_name": "bert_syntax_parser", - "n_deps": "#dep_vocab.len", - "state_size": 384, - "keep_prob": 0.1, - "bert_config_file": "{BERT_PATH}/bert_config.json", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", - "attention_probs_keep_prob": 0.5, - "use_crf": false, - "return_probas": true, - "encoder_layer_ids": [6, 7, 8, 9, 10, 11], - "optimizer": "tf.train:AdamOptimizer", - "learning_rate": 1e-3, - "bert_learning_rate": 2e-5, - "min_learning_rate": 1e-7, - "use_birnn": true, - "learning_rate_drop_patience": 30, - "learning_rate_drop_div": 1.5, - "load_before_drop": true, - "clip_norm": null, - "save_path": "{WORK_PATH}/model_joint", - "load_path": "{WORK_PATH}/model_joint", - "in": ["x_subword_tok_ids", "attention_mask", "startofword_markers"], - "in_y": ["y_heads", "y_deps_indexes"], - "out": ["y_predicted_heads_probs", "y_predicted_deps_indexes"] - }, - { - "class_name": "chu_liu_edmonds_transformer", - "in": ["y_predicted_heads_probs"], - "out": ["y_predicted_heads"] - }, - { - "ref": "dep_vocab", - "in": ["y_predicted_deps_indexes"], - "out": ["y_predicted_deps"] - }, - { - "in": [ - "x_words", - "y_predicted_heads", - "y_predicted_deps" - ], - "out": [ - "y_prettified" - ], - "id": "dependency_output_prettifier", - "class_name": "dependency_output_prettifier", - "end": "\n" - } - ], - "out": [ - "y_prettified" - ] - }, - "train": { - "epochs": 10, - "batch_size": 32, - "metrics": [ - { - "name": "multitask_token_accuracy", - "alias": "LAS", - "inputs": [ - "y_deps", - "y_heads", - "y_predicted_deps", - "y_predicted_heads" - ] - }, - { - "name": "multitask_sequence_accuracy", - "alias": "sentence_LAS", - "inputs": [ - "y_deps", - "y_heads", - "y_predicted_deps", - "y_predicted_heads" - ] - }, - { - "name": "per_token_accuracy", - "alias": "UAS", - "inputs": [ - "y_heads", - "y_predicted_heads" - ] - }, - { - "name": "accuracy", - "alias": "sentence_UAS", - "inputs": [ - "y_heads", - "y_predicted_heads" - ] - } - ], - "validation_patience": 10, - "val_every_n_epochs": 1, - "val_every_n_batches": 300, - - "tensorboard_log_dir": "{WORK_PATH}/logs", - "show_examples": false, - "pytest_max_batches": 
2, - "pytest_batch_size": 8, - "evaluation_targets": ["valid", "test"], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models/rubert_cased_L-12_H-768_A-12_v1", - "WORK_PATH": "{MODELS_PATH}/syntax_ru_syntagrus" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/syntax_parser/syntax_ru_syntagrus_bert.tar.gz", - "subdir": "{WORK_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/rubert_cased_L-12_H-768_A-12_v1.tar.gz", - "subdir": "{DOWNLOADS_PATH}/bert_models" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/morpho_tagger/UD2.3/ru_syntagrus.tar.gz", - "subdir": "{DOWNLOADS_PATH}/UD2.3_source/ru_syntagrus" - } - ] - } -} diff --git a/deeppavlov/configs/tutorials/mt_bert/mt_bert_inference_tutorial.json b/deeppavlov/configs/tutorials/mt_bert/mt_bert_inference_tutorial.json deleted file mode 100644 index 9cb64a6fd3..0000000000 --- a/deeppavlov/configs/tutorials/mt_bert/mt_bert_inference_tutorial.json +++ /dev/null @@ -1,139 +0,0 @@ -{ - "chainer": { - "in": ["x"], - "pipe": [ - { - "class_name": "bert_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 64, - "in": [ - "x" - ], - "out": [ - "bert_features" - ] - }, - { - "id": "classes_vocab_insults", - "class_name": "simple_vocab", - "save_path": "{INSULTS_PATH}/classes.dict", - "load_path": "{INSULTS_PATH}/classes.dict" - }, - { - "id": "classes_vocab_sentiment", - "class_name": "simple_vocab", - "save_path": "{SENTIMENT_PATH}/classes.dict", - "load_path": "{SENTIMENT_PATH}/classes.dict" - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "unk_token": ["O"], - "pad_with_zeros": true, - "save_path": "{NER_PATH}/tag.dict", - "load_path": "{NER_PATH}/tag.dict" - }, - { - "class_name": "bert_ner_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 512, - "max_subword_length": 15, - "token_masking_prob": 0.0, - "in": ["x"], - "out": [ - "x_ner_tokens", - "x_ner_subword_tokens", - "x_ner_subword_tok_ids", - "ner_startofword_markers", - "ner_attention_mask"] - }, - - { - "id": "mt_bert", - "class_name": "mt_bert", - "inference_task_names": "ner", - "bert_config_file": "{BERT_PATH}/bert_config.json", - "save_path": "{MT_BERT_PATH}/model", - "load_path": "{MT_BERT_PATH}/model", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", - "tasks": { - "insults": { - "class_name": "mt_bert_classification_task", - "n_classes": "#classes_vocab_insults.len", - "return_probas": true, - "one_hot_labels": true - }, - "sentiment": { - "class_name": "mt_bert_classification_task", - "n_classes": "#classes_vocab_sentiment.len", - "return_probas": true, - "one_hot_labels": true - }, - "ner": { - "class_name": "mt_bert_seq_tagging_task", - "n_tags": "#tag_vocab.len", - "return_probas": false, - "use_crf": true, - "encoder_layer_ids": [-1] - } - }, - "in": ["x_ner_subword_tok_ids", "ner_attention_mask", "ner_startofword_markers"], - "out": ["y_ner_pred_ind"] - }, - - { - "class_name": "mt_bert_reuser", - "mt_bert": "#mt_bert", - "task_names": [["insults", "sentiment"]], - "in_distribution": {"insults": 1, "sentiment": 1}, - "in": ["bert_features", "bert_features"], - "out": ["y_insults_pred_probas", "y_sentiment_pred_probas"] - }, - - { - "in": "y_insults_pred_probas", - "out": "y_insults_pred_ids", - "class_name": "proba2labels", - 
"max_proba": true - }, - { - "in": "y_insults_pred_ids", - "out": "y_insults_pred_labels", - "ref": "classes_vocab_insults" - }, - - { - "in": "y_sentiment_pred_probas", - "out": "y_sentiment_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_sentiment_pred_ids", - "out": "y_sentiment_pred_labels", - "ref": "classes_vocab_sentiment" - }, - - { - "ref": "tag_vocab", - "in": ["y_ner_pred_ind"], - "out": ["y_ner_pred"] - } - ], - "out": ["y_insults_pred_labels", "y_sentiment_pred_labels", "y_ner_pred"] - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12", - "MT_BERT_PATH": "{MODELS_PATH}/mt_bert_tutorial", - "INSULTS_PATH": "{MT_BERT_PATH}/insults", - "SENTIMENT_PATH": "{MT_BERT_PATH}/sentiment", - "NER_PATH": "{MT_BERT_PATH}/ner" - } - } -} diff --git a/deeppavlov/configs/tutorials/mt_bert/mt_bert_train_tutorial.json b/deeppavlov/configs/tutorials/mt_bert/mt_bert_train_tutorial.json deleted file mode 100644 index b6a30ad6e7..0000000000 --- a/deeppavlov/configs/tutorials/mt_bert/mt_bert_train_tutorial.json +++ /dev/null @@ -1,311 +0,0 @@ -{ - "dataset_reader": { - "class_name": "multitask_reader", - "data_path": "null", - "tasks": { - "insults": { - "reader_class_name": "basic_classification_reader", - "x": "Comment", - "y": "Class", - "data_path": "{DOWNLOADS_PATH}/insults_data" - }, - "sentiment": { - "reader_class_name": "basic_classification_reader", - "x": "text", - "y": "label", - "data_path": "{DOWNLOADS_PATH}/yelp_review_full_csv", - "train": "train.csv", - "test": "test.csv", - "header": null, - "names": [ - "label", - "text" - ] - }, - "ner": { - "reader_class_name": "conll2003_reader", - "data_path": "{DOWNLOADS_PATH}/conll2003/", - "dataset_name": "conll2003", - "provide_pos": false - } - } - }, - "dataset_iterator": { - "class_name": "multitask_iterator", - "tasks": { - "insults": { - "iterator_class_name": "basic_classification_iterator", - "seed": 42 - }, - "sentiment": { - "iterator_class_name": "basic_classification_iterator", - "seed": 42, - "split_seed": 23, - "field_to_split": "train", - "split_fields": [ - "train", - "valid" - ], - "split_proportions": [ - 0.9, - 0.1 - ] - }, - "ner": {"iterator_class_name": "data_learning_iterator"} - } - }, - "chainer": { - "in": ["x_insults", "x_sentiment", "x_ner"], - "in_y": ["y_insults", "y_sentiment", "y_ner"], - "pipe": [ - { - "class_name": "bert_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 64, - "in": [ - "x_insults" - ], - "out": [ - "bert_features_insults" - ] - }, - { - "id": "classes_vocab_insults", - "class_name": "simple_vocab", - "fit_on": [ - "y_insults" - ], - "save_path": "{INSULTS_PATH}/classes.dict", - "load_path": "{INSULTS_PATH}/classes.dict", - "in": "y_insults", - "out": "y_insults_ids" - }, - { - "in": "y_insults_ids", - "out": "y_insults_onehot", - "class_name": "one_hotter", - "depth": "#classes_vocab_insults.len", - "single_vector": true - }, - - { - "class_name": "bert_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 200, - "in": [ - "x_sentiment" - ], - "out": [ - "bert_features_sentiment" - ] - }, - { - "id": "classes_vocab_sentiment", - "class_name": "simple_vocab", - "fit_on": [ - "y_sentiment" - ], - "save_path": "{SENTIMENT_PATH}/classes.dict", - "load_path": "{SENTIMENT_PATH}/classes.dict", - 
"in": "y_sentiment", - "out": "y_sentiment_ids" - }, - { - "in": "y_sentiment_ids", - "out": "y_sentiment_onehot", - "class_name": "one_hotter", - "depth": "#classes_vocab_sentiment.len", - "single_vector": true - }, - - { - "class_name": "bert_ner_preprocessor", - "vocab_file": "{BERT_PATH}/vocab.txt", - "do_lower_case": false, - "max_seq_length": 512, - "max_subword_length": 15, - "token_masking_prob": 0.0, - "in": ["x_ner"], - "out": [ - "x_ner_tokens", - "x_ner_subword_tokens", - "x_ner_subword_tok_ids", - "ner_startofword_markers", - "ner_attention_mask"] - }, - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "unk_token": ["O"], - "pad_with_zeros": true, - "save_path": "{NER_PATH}/tag.dict", - "load_path": "{NER_PATH}/tag.dict", - "fit_on": ["y_ner"], - "in": ["y_ner"], - "out": ["y_ner_ind"] - }, - - { - "id": "mt_bert", - "class_name": "mt_bert", - "save_path": "{MT_BERT_PATH}/model", - "load_path": "{MT_BERT_PATH}/model", - "bert_config_file": "{BERT_PATH}/bert_config.json", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", - "attention_probs_keep_prob": 0.5, - "body_learning_rate": 3e-5, - "min_body_learning_rate": 2e-7, - "learning_rate_drop_patience": 10, - "learning_rate_drop_div": 1.5, - "load_before_drop": true, - "optimizer": "tf.train:AdamOptimizer", - "clip_norm": 1.0, - "tasks": { - "insults": { - "class_name": "mt_bert_classification_task", - "n_classes": "#classes_vocab_insults.len", - "keep_prob": 0.5, - "return_probas": true, - "learning_rate": 1e-3, - "one_hot_labels": true - }, - "sentiment": { - "class_name": "mt_bert_classification_task", - "n_classes": "#classes_vocab_sentiment.len", - "return_probas": true, - "one_hot_labels": true, - "keep_prob": 0.5, - "learning_rate": 1e-3 - }, - "ner": { - "class_name": "mt_bert_seq_tagging_task", - "n_tags": "#tag_vocab.len", - "return_probas": false, - "keep_prob": 0.5, - "learning_rate": 1e-3, - "use_crf": true, - "encoder_layer_ids": [-1] - } - }, - "in_distribution": {"insults": 1, "sentiment": 1, "ner": 3}, - "in": [ - "bert_features_insults", - "bert_features_sentiment", - "x_ner_subword_tok_ids", - "ner_attention_mask", - "ner_startofword_markers"], - "in_y_distribution": {"insults": 1, "sentiment": 1, "ner": 1}, - "in_y": ["y_insults_onehot", "y_sentiment_onehot", "y_ner_ind"], - "out": ["y_insults_pred_probas", "y_sentiment_pred_probas", "y_ner_pred_ind"] - }, - - { - "in": "y_insults_pred_probas", - "out": "y_insults_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_insults_pred_ids", - "out": "y_insults_pred_labels", - "ref": "classes_vocab_insults" - }, - - { - "in": "y_sentiment_pred_probas", - "out": "y_sentiment_pred_ids", - "class_name": "proba2labels", - "max_proba": true - }, - { - "in": "y_sentiment_pred_ids", - "out": "y_sentiment_pred_labels", - "ref": "classes_vocab_sentiment" - }, - - { - "ref": "tag_vocab", - "in": ["y_ner_pred_ind"], - "out": ["y_ner_pred"] - } - ], - "out": ["y_insults_pred_labels", "y_sentiment_pred_labels", "y_ner_pred"] - }, - "train": { - "epochs": 30, - "batch_size": 16, - "metrics": [ - { - "name": "average__roc_auc__roc_auc__ner_f1", - "inputs": [ - "y_insults_onehot", - "y_insults_pred_probas", - "y_sentiment_onehot", - "y_sentiment_pred_probas", - "y_ner", - "y_ner_pred" - ] - }, - { - "name": "roc_auc", - "inputs": [ - "y_insults_onehot", - "y_insults_pred_probas" - ] - }, - { - "name": "accuracy", - "inputs": [ - "y_sentiment_onehot", - "y_sentiment_pred_probas" - ] - }, - { - "name": "ner_f1", - "inputs": ["y_ner", 
"y_ner_pred"] - }, - { - "name": "ner_token_f1", - "inputs": ["y_ner", "y_ner_pred"] - } - ], - "validation_patience": 100, - "val_every_n_batches": 20, - - "log_every_n_batches": 20, - "tensorboard_log_dir": "{MT_BERT_PATH}/logs", - "show_examples": false, - "pytest_max_batches": 2, - "pytest_batch_size": 8, - "evaluation_targets": ["valid", "test"], - "class_name": "nn_trainer" - }, - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12", - "MT_BERT_PATH": "{MODELS_PATH}/mt_bert_tutorial", - "INSULTS_PATH": "{MT_BERT_PATH}/insults", - "SENTIMENT_PATH": "{MT_BERT_PATH}/sentiment", - "NER_PATH": "{MT_BERT_PATH}/ner" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/insults_data.tar.gz", - "subdir": "{DOWNLOADS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/datasets/yelp_review_full_csv.tar.gz", - "subdir": "{DOWNLOADS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - } - ] - } -} diff --git a/deeppavlov/contrib/data/tools/train_set_generation.py b/deeppavlov/contrib/data/tools/train_set_generation.py deleted file mode 100644 index 23021fda2c..0000000000 --- a/deeppavlov/contrib/data/tools/train_set_generation.py +++ /dev/null @@ -1,177 +0,0 @@ -import json -import deeppavlov.models.go_bot.nlg.templates.templates as templ -from deeppavlov.models.go_bot.nlg.templates.templates import DefaultTemplate -from logging import getLogger -from deeppavlov.models.slotfill.slotfill_raw import SlotFillingComponent -from deeppavlov.core.data.sqlite_database import Sqlite3Database -from deeppavlov.models.go_bot.tracker.dialogue_state_tracker import DialogueStateTracker -from deeppavlov.models.go_bot.nlu.dto.nlu_response import NLUResponse -from typing import List -import re -import itertools -log = getLogger(__name__) - - -class TrainSetGeneration(): - """ - Generates train dataset in the DSTC2 format via a command line. 
- - Args: - template_path = path to the go_bot template (in case of restaurant gobot, it is 'dstc2-templates.txt') - slot_path = path to the slot values file (in case of restaurant gobot, it is 'dstc_slot_vals.json') - save_path = path where the generated dataset is saved - db_path = path to the populated database from which results are retrieved - db_primary_key = the primary key of the database - """ - - def __init__(self, - template_path: str, - slot_path: str, - save_path: str, - db_path: str, - db_primary_key: List[str] = ['name']): - self.templates = templ.Templates(DefaultTemplate()).load(template_path) - self.slotfiller = SlotFillingComponent(load_path=slot_path, save_path=slot_path) - self.save_path = save_path - self.database = Sqlite3Database(db_path, db_primary_key) - self.ds_tracker = DialogueStateTracker(api_call_id=0, slot_names=list(self.slotfiller._slot_vals.keys()), - n_actions=len(self.templates.actions), - hidden_size=128, - database=self.database) - self.slots = list(set(list(itertools.chain.from_iterable(map(lambda x: re.findall(r"#(\w+)", x.text), self.templates.templates))) - + list(self.slotfiller._slot_vals.keys()))) - self.utters = [] - self.dialogs = [] - self.slots_history = {} - - - def get_id_input(self, - prompt: str, - valid_vals: List[int]) -> int: - # for neat output - print('\n' + '*' * 10) - idx = -1 - while idx == -1: - try: - idx = int(input('[INPUT] ' + prompt)) - if idx not in valid_vals: - print('[INFO] please input a valid integer in: ', valid_vals) - idx = -1 - except ValueError: - print('[INFO] please enter integer value') - return idx - - def save_dialogs(self) -> None: - from pathlib import Path - with open(Path(self.save_path), 'w', encoding='utf8') as f: - print('[INFO] saving the dialogs and exiting...') - json.dump(self.dialogs, f) - - def add_and_reset_utters(self) -> None: - if self.utters: - self.dialogs.append(self.utters) - self.utters = [] - self.slots_history = {} - self.ds_tracker.reset_state() - else: - self.dialogs.append([]) - - - - def get_user_input(self) -> None: - text = input('[INFO] write a user sentence: ') - has_slot = self.get_id_input(prompt = 'type 1 if your sentence has a slot, else 0: ', - valid_vals = [0, 1]) - slots = [] - if has_slot: - - while has_slot: - for i, key in enumerate(self.slots): - print(i, key) - idx = self.get_id_input(prompt = 'type slot category number from the list: ', - valid_vals = list(range(len(self.slots)))) - slot_category = self.slots[idx] - if slot_category in self.slotfiller._slot_vals: - id2key = {} - for i, key in enumerate(self.slotfiller._slot_vals[slot_category]): - print(i, key) - id2key[i] = key - idx = self.get_id_input(prompt = 'type slot subcategory number from the list: ', - valid_vals = list(range(len(id2key)))) - sub_category = id2key[idx] - else: - sub_category = '' - - slots.append([slot_category, sub_category]) - has_slot = self.get_id_input(prompt = 'type 1 if you want to add more slots, else 0: ', - valid_vals = [0, 1]) - - user_input = {'speaker': 1, - 'text': text, - 'slots': slots} - - print(user_input) - self.update_slots_history(slots) - self.utters.append(user_input) - - def update_slots_history(self, slots: List[List[str]]) -> None: - for slot, val in slots: - self.slots_history[slot] = val - - def start_generation(self) -> None: - - while True: - turn = self.get_id_input(prompt = 'choose turn (1 for user, 2 for bot, 3 to start a new dialog or 10 for saving and exit): ', - valid_vals = [1, 2, 3, 10]) - print('\n' + '*' * 10) - if turn == 1: -
self.get_user_input() - elif turn == 2: - self.get_bot_output() - elif turn == 3: - self.add_and_reset_utters() - elif turn == 10: - self.add_and_reset_utters() - self.save_dialogs() - return - - - def get_bot_output(self) -> None: - - print('[INFO] current slot vals are: ', self.slots_history) - for i, act in enumerate(self.templates.actions): - print(i, act) - id = int(input('type template number from the list: ')) - # slots in the chosen template - template_slots = re.findall(r"#(\w+)", self.templates.templates[id].text) - slots = [[slot, self.slots_history[slot]] for slot in template_slots if slot in self.slots_history and slot in self.slotfiller._slot_vals] - # slots that are missing in the current slots history - missing_slots = [st for st in template_slots if st not in slots] - # get missing slots from the db - if missing_slots and self.ds_tracker.db_result: - for slot in missing_slots: - slots.append([slot, self.ds_tracker.db_result[slot]]) - text = self.templates.templates[id].generate_text(slots).strip() - print('[INFO] generated response is: ', text) - # make db call if 'api_call' - if 'api_call' in self.templates.templates[id].text: - nlu_response = NLUResponse(slots, None, None) - self.ds_tracker.update_state(nlu_response) - self.ds_tracker.make_api_call() - print('[INFO] the result of the db call is: ', self.ds_tracker.db_result) - - bot_output = {'speaker': 2, - 'text': text, - 'db_result': json.dumps(self.ds_tracker.db_result), - 'slots': slots, - 'act': self.templates.actions[id]} - else: - bot_output = {'speaker': 2, - 'text': text, - 'slots': slots, - 'act': self.templates.actions[id]} - self.utters.append(bot_output) - - - - diff --git a/deeppavlov/contrib/examples/Dataset_generation_tutorial.ipynb b/deeppavlov/contrib/examples/Dataset_generation_tutorial.ipynb deleted file mode 100644 index 9ba1b7ddb8..0000000000 --- a/deeppavlov/contrib/examples/Dataset_generation_tutorial.ipynb +++ /dev/null @@ -1,806 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "This is an example tutorial to generate a training dataset from scratch.\n", - "\n", - "The example here creates the first dialog from the dstc2 train dataset, but in a similar fashion, you can create any dataset that fits your purpose.\n", - "\n", - "To start, you will need:\n", - "1. A DSTC2-style template file. See the downloaded dstc2-templates.txt for a reference. You can create a new one with your own templates.\n", - "2. Slot values that you need to provide in JSON format. See dstc_slot_vals.json as a reference.\n", - "3. An SQLite database instance with a table that matches 1 and 2 above. Spend some time to see how these two relate to the database. 
Again, downloaded db.sqlite is a good starting poin.\n", - "\n", - "Once you have these, you are set to start your own dataset generation\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!pip install deeppavlov\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "from deeppavlov.contrib import examples\n", - "from deeppavlov.contrib.data.tools.train_set_generation import TrainSetGeneration\n", - "\n", - "template_fn = \"dstc2-templates.txt\"\n", - "slot_fn = \"dstc_slot_vals.json\"\n", - "db_fn = \"db.sqlite\"\n", - "\n", - "template_path = os.path.join(examples.__path__._path[0], template_fn)\n", - "slot_path = os.path.join(examples.__path__._path[0], slot_fn)\n", - "db_path = os.path.join(examples.__path__._path[0], db_fn)\n", - "\n", - "trainsetgen = TrainSetGeneration(template_path = template_path,\n", - " slot_path = slot_path,\n", - " save_path = \"generated_data.json\",\n", - " db_path = db_path)" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "**********\n", - "[INPUT] choose turn (1 for user, 2 for bot, 3 to start a new dialog or 10 for saving and exit): 2\n", - "\n", - "**********\n", - "[INFO] current slot vals are: {}\n", - "0 api_call\n", - "1 bye\n", - "2 canthear\n", - "3 canthelp_area\n", - "4 canthelp_area_food\n", - "5 canthelp_area_food_pricerange\n", - "6 canthelp_area_pricerange\n", - "7 canthelp_food\n", - "8 canthelp_food_pricerange\n", - "9 confirm-domain\n", - "10 expl-conf_area\n", - "11 expl-conf_food\n", - "12 expl-conf_pricerange\n", - "13 impl-conf_area+impl-conf_pricerange+request_food\n", - "14 impl-conf_food+impl-conf_pricerange+request_area\n", - "15 impl-conf_food+request_area\n", - "16 inform_addr+inform_food+offer_name\n", - "17 inform_addr+inform_phone+inform_pricerange+offer_name\n", - "18 inform_addr+inform_phone+offer_name\n", - "19 inform_addr+inform_postcode+offer_name\n", - "20 inform_addr+inform_pricerange+offer_name\n", - "21 inform_addr+offer_name\n", - "22 inform_area+inform_food+inform_pricerange+offer_name\n", - "23 inform_area+inform_food+offer_name\n", - "24 inform_area+inform_phone+offer_name\n", - "25 inform_area+inform_postcode+offer_name\n", - "26 inform_area+inform_pricerange+offer_name\n", - "27 inform_area+offer_name\n", - "28 inform_food+inform_pricerange+offer_name\n", - "29 inform_food+offer_name\n", - "30 inform_phone+inform_postcode+offer_name\n", - "31 inform_phone+inform_pricerange+offer_name\n", - "32 inform_phone+offer_name\n", - "33 inform_postcode+inform_pricerange+offer_name\n", - "34 inform_postcode+offer_name\n", - "35 inform_pricerange+offer_name\n", - "36 offer_name\n", - "37 repeat\n", - "38 reqmore\n", - "39 request_area\n", - "40 request_food\n", - "41 request_pricerange\n", - "42 select_area\n", - "43 select_food\n", - "44 select_pricerange\n", - "45 welcomemsg\n", - "type template number from the list: 45\n", - "[INFO] generated response is: Hello, welcome to the Cambridge restaurant system. You can ask for restaurants by area, price range or food type. 
How may I help you?\n", - "\n", - "**********\n", - "[INPUT] choose turn (1 for user, 2 for bot, 3 to start a new dialog or 10 for saving and exit): cheap restaurant\n", - "[INFO] please enter integer value\n", - "[INPUT] choose turn (1 for user, 2 for bot, 3 to start a new dialog or 10 for saving and exit): 1\n", - "\n", - "**********\n", - "[INFO] write a user sentence: cheap restaurant\n", - "\n", - "**********\n", - "[INPUT] type 1 if your sentence has a slot, else 0: 1\n", - "0 phone\n", - "1 postcode\n", - "2 this\n", - "3 name\n", - "4 addr\n", - "5 food\n", - "6 area\n", - "7 pricerange\n", - "\n", - "**********\n", - "[INPUT] type slot category number from the list: 7\n", - "0 moderate\n", - "1 expensive\n", - "2 cheap\n", - "3 dontcare\n", - "\n", - "**********\n", - "[INPUT] type slot subcategory number from the list: 2\n", - "\n", - "**********\n", - "[INPUT] type 1 if you want to add more slots, else 0: 0\n", - "{'speaker': 1, 'text': 'cheap restaurant', 'slots': [['pricerange', 'cheap']]}\n", - "\n", - "**********\n", - "[INPUT] choose turn (1 for user, 2 for bot, 3 to start a new dialog or 10 for saving and exit): 2\n", - "\n", - "**********\n", - "[INFO] current slot vals are: {'pricerange': 'cheap'}\n", - "0 api_call\n", - "1 bye\n", - "2 canthear\n", - "3 canthelp_area\n", - "4 canthelp_area_food\n", - "5 canthelp_area_food_pricerange\n", - "6 canthelp_area_pricerange\n", - "7 canthelp_food\n", - "8 canthelp_food_pricerange\n", - "9 confirm-domain\n", - "10 expl-conf_area\n", - "11 expl-conf_food\n", - "12 expl-conf_pricerange\n", - "13 impl-conf_area+impl-conf_pricerange+request_food\n", - "14 impl-conf_food+impl-conf_pricerange+request_area\n", - "15 impl-conf_food+request_area\n", - "16 inform_addr+inform_food+offer_name\n", - "17 inform_addr+inform_phone+inform_pricerange+offer_name\n", - "18 inform_addr+inform_phone+offer_name\n", - "19 inform_addr+inform_postcode+offer_name\n", - "20 inform_addr+inform_pricerange+offer_name\n", - "21 inform_addr+offer_name\n", - "22 inform_area+inform_food+inform_pricerange+offer_name\n", - "23 inform_area+inform_food+offer_name\n", - "24 inform_area+inform_phone+offer_name\n", - "25 inform_area+inform_postcode+offer_name\n", - "26 inform_area+inform_pricerange+offer_name\n", - "27 inform_area+offer_name\n", - "28 inform_food+inform_pricerange+offer_name\n", - "29 inform_food+offer_name\n", - "30 inform_phone+inform_postcode+offer_name\n", - "31 inform_phone+inform_pricerange+offer_name\n", - "32 inform_phone+offer_name\n", - "33 inform_postcode+inform_pricerange+offer_name\n", - "34 inform_postcode+offer_name\n", - "35 inform_pricerange+offer_name\n", - "36 offer_name\n", - "37 repeat\n", - "38 reqmore\n", - "39 request_area\n", - "40 request_food\n", - "41 request_pricerange\n", - "42 select_area\n", - "43 select_food\n", - "44 select_pricerange\n", - "45 welcomemsg\n", - "type template number from the list: 40\n", - "[INFO] generated response is: What kind of food would you like?\n", - "\n", - "**********\n", - "[INPUT] choose turn (1 for user, 2 for bot, 3 to start a new dialog or 10 for saving and exit): 1\n", - "\n", - "**********\n", - "[INFO] write a user sentence: any\n", - "\n", - "**********\n", - "[INPUT] type 1 if your sentence has a slot, else 0: 1\n", - "0 phone\n", - "1 postcode\n", - "2 this\n", - "3 name\n", - "4 addr\n", - "5 food\n", - "6 area\n", - "7 pricerange\n", - "\n", - "**********\n", - "[INPUT] type slot category number from the list: 2\n", - "0 dontcare\n", - "\n", - "**********\n", - "[INPUT] type slot 
subcategory number from the list: 0\n", - "\n", - "**********\n", - "[INPUT] type 1 if you want to add more slots, else 0: 0\n", - "{'speaker': 1, 'text': 'any', 'slots': [['this', 'dontcare']]}\n", - "\n", - "**********\n", - "[INPUT] choose turn (1 for user, 2 for bot, 3 to start a new dialog or 10 for saving and exit): 2\n", - "\n", - "**********\n", - "[INFO] current slot vals are: {'pricerange': 'cheap', 'this': 'dontcare'}\n", - "0 api_call\n", - "1 bye\n", - "2 canthear\n", - "3 canthelp_area\n", - "4 canthelp_area_food\n", - "5 canthelp_area_food_pricerange\n", - "6 canthelp_area_pricerange\n", - "7 canthelp_food\n", - "8 canthelp_food_pricerange\n", - "9 confirm-domain\n", - "10 expl-conf_area\n", - "11 expl-conf_food\n", - "12 expl-conf_pricerange\n", - "13 impl-conf_area+impl-conf_pricerange+request_food\n", - "14 impl-conf_food+impl-conf_pricerange+request_area\n", - "15 impl-conf_food+request_area\n", - "16 inform_addr+inform_food+offer_name\n", - "17 inform_addr+inform_phone+inform_pricerange+offer_name\n", - "18 inform_addr+inform_phone+offer_name\n", - "19 inform_addr+inform_postcode+offer_name\n", - "20 inform_addr+inform_pricerange+offer_name\n", - "21 inform_addr+offer_name\n", - "22 inform_area+inform_food+inform_pricerange+offer_name\n", - "23 inform_area+inform_food+offer_name\n", - "24 inform_area+inform_phone+offer_name\n", - "25 inform_area+inform_postcode+offer_name\n", - "26 inform_area+inform_pricerange+offer_name\n", - "27 inform_area+offer_name\n", - "28 inform_food+inform_pricerange+offer_name\n", - "29 inform_food+offer_name\n", - "30 inform_phone+inform_postcode+offer_name\n", - "31 inform_phone+inform_pricerange+offer_name\n", - "32 inform_phone+offer_name\n", - "33 inform_postcode+inform_pricerange+offer_name\n", - "34 inform_postcode+offer_name\n", - "35 inform_pricerange+offer_name\n", - "36 offer_name\n", - "37 repeat\n", - "38 reqmore\n", - "39 request_area\n", - "40 request_food\n", - "41 request_pricerange\n", - "42 select_area\n", - "43 select_food\n", - "44 select_pricerange\n", - "45 welcomemsg\n", - "type template number from the list: 39\n", - "[INFO] generated response is: What part of town do you have in mind?\n", - "\n", - "**********\n", - "[INPUT] choose turn (1 for user, 2 for bot, 3 to start a new dialog or 10 for saving and exit): 1\n", - "\n", - "**********\n", - "[INFO] write a user sentence: south\n", - "\n", - "**********\n", - "[INPUT] type 1 if your sentence has a slot, else 0: 1\n", - "0 phone\n", - "1 postcode\n", - "2 this\n", - "3 name\n", - "4 addr\n", - "5 food\n", - "6 area\n", - "7 pricerange\n", - "\n", - "**********\n", - "[INPUT] type slot category number from the list: 6\n", - "0 south\n", - "1 east\n", - "2 dontcare\n", - "3 north\n", - "4 west\n", - "5 centre\n", - "\n", - "**********\n", - "[INPUT] type slot subcategory number from the list: 0\n", - "\n", - "**********\n", - "[INPUT] type 1 if you want to add more slots, else 0: 0\n", - "{'speaker': 1, 'text': 'south', 'slots': [['area', 'south']]}\n", - "\n", - "**********\n", - "[INPUT] choose turn (1 for user, 2 for bot, 3 to start a new dialog or 10 for saving and exit): 2\n", - "\n", - "**********\n", - "[INFO] current slot vals are: {'pricerange': 'cheap', 'this': 'dontcare', 'area': 'south'}\n", - "0 api_call\n", - "1 bye\n", - "2 canthear\n", - "3 canthelp_area\n", - "4 canthelp_area_food\n", - "5 canthelp_area_food_pricerange\n", - "6 canthelp_area_pricerange\n", - "7 canthelp_food\n", - "8 canthelp_food_pricerange\n", - "9 confirm-domain\n", - "10 
expl-conf_area\n", - "11 expl-conf_food\n", - "12 expl-conf_pricerange\n", - "13 impl-conf_area+impl-conf_pricerange+request_food\n", - "14 impl-conf_food+impl-conf_pricerange+request_area\n", - "15 impl-conf_food+request_area\n", - "16 inform_addr+inform_food+offer_name\n", - "17 inform_addr+inform_phone+inform_pricerange+offer_name\n", - "18 inform_addr+inform_phone+offer_name\n", - "19 inform_addr+inform_postcode+offer_name\n", - "20 inform_addr+inform_pricerange+offer_name\n", - "21 inform_addr+offer_name\n", - "22 inform_area+inform_food+inform_pricerange+offer_name\n", - "23 inform_area+inform_food+offer_name\n", - "24 inform_area+inform_phone+offer_name\n", - "25 inform_area+inform_postcode+offer_name\n", - "26 inform_area+inform_pricerange+offer_name\n", - "27 inform_area+offer_name\n", - "28 inform_food+inform_pricerange+offer_name\n", - "29 inform_food+offer_name\n", - "30 inform_phone+inform_postcode+offer_name\n", - "31 inform_phone+inform_pricerange+offer_name\n", - "32 inform_phone+offer_name\n", - "33 inform_postcode+inform_pricerange+offer_name\n", - "34 inform_postcode+offer_name\n", - "35 inform_pricerange+offer_name\n", - "36 offer_name\n", - "37 repeat\n", - "38 reqmore\n", - "39 request_area\n", - "40 request_food\n", - "41 request_pricerange\n", - "42 select_area\n", - "43 select_food\n", - "44 select_pricerange\n", - "45 welcomemsg\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "type template number from the list: 0\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "INFO in 'deeppavlov.models.go_bot.tracker.dialogue_state_tracker'['dialogue_state_tracker'] at line 102: Made api_call with {'area': 'south', 'pricerange': 'cheap'}, got 2 results.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[INFO] generated response is: Api_call area=\"south\" food=\"#food\" pricerange=\"cheap\"\tapi_call area=\"south\" food=\"#food\" pricerange=\"cheap\"\n", - "[INFO] the result of the db call is: {'food': 'chinese', 'pricerange': 'cheap', 'area': 'south', 'postcode': 'c.b 1, 7 d.y', 'phone': '01223 244277', 'addr': 'cambridge leisure park clifton way cherry hinton', 'name': 'the lucky star'}\n", - "\n", - "**********\n", - "[INPUT] choose turn (1 for user, 2 for bot, 3 to start a new dialog or 10 for saving and exit): 2\n", - "\n", - "**********\n", - "[INFO] current slot vals are: {'pricerange': 'cheap', 'this': 'dontcare', 'area': 'south'}\n", - "0 api_call\n", - "1 bye\n", - "2 canthear\n", - "3 canthelp_area\n", - "4 canthelp_area_food\n", - "5 canthelp_area_food_pricerange\n", - "6 canthelp_area_pricerange\n", - "7 canthelp_food\n", - "8 canthelp_food_pricerange\n", - "9 confirm-domain\n", - "10 expl-conf_area\n", - "11 expl-conf_food\n", - "12 expl-conf_pricerange\n", - "13 impl-conf_area+impl-conf_pricerange+request_food\n", - "14 impl-conf_food+impl-conf_pricerange+request_area\n", - "15 impl-conf_food+request_area\n", - "16 inform_addr+inform_food+offer_name\n", - "17 inform_addr+inform_phone+inform_pricerange+offer_name\n", - "18 inform_addr+inform_phone+offer_name\n", - "19 inform_addr+inform_postcode+offer_name\n", - "20 inform_addr+inform_pricerange+offer_name\n", - "21 inform_addr+offer_name\n", - "22 inform_area+inform_food+inform_pricerange+offer_name\n", - "23 inform_area+inform_food+offer_name\n", - "24 inform_area+inform_phone+offer_name\n", - "25 inform_area+inform_postcode+offer_name\n", - "26 inform_area+inform_pricerange+offer_name\n", - "27 inform_area+offer_name\n", - "28 
inform_food+inform_pricerange+offer_name\n", - "29 inform_food+offer_name\n", - "30 inform_phone+inform_postcode+offer_name\n", - "31 inform_phone+inform_pricerange+offer_name\n", - "32 inform_phone+offer_name\n", - "33 inform_postcode+inform_pricerange+offer_name\n", - "34 inform_postcode+offer_name\n", - "35 inform_pricerange+offer_name\n", - "36 offer_name\n", - "37 repeat\n", - "38 reqmore\n", - "39 request_area\n", - "40 request_food\n", - "41 request_pricerange\n", - "42 select_area\n", - "43 select_food\n", - "44 select_pricerange\n", - "45 welcomemsg\n", - "type template number from the list: 23\n", - "[INFO] generated response is: The lucky star is a nice place in the south of town serving tasty chinese food.\n", - "\n", - "**********\n", - "[INPUT] choose turn (1 for user, 2 for bot, 3 to start a new dialog or 10 for saving and exit): 1\n", - "\n", - "**********\n", - "[INFO] write a user sentence: address\n", - "\n", - "**********\n", - "[INPUT] type 1 if your sentence has a slot, else 0: 1\n", - "0 phone\n", - "1 postcode\n", - "2 this\n", - "3 name\n", - "4 addr\n", - "5 food\n", - "6 area\n", - "7 pricerange\n", - "\n", - "**********\n", - "[INPUT] type slot category number from the list: 4\n", - "\n", - "**********\n", - "[INPUT] type 1 if you want to add more slots, else 0: 0\n", - "{'speaker': 1, 'text': 'address', 'slots': [['addr', '']]}\n", - "\n", - "**********\n", - "[INPUT] choose turn (1 for user, 2 for bot, 3 to start a new dialog or 10 for saving and exit): 2\n", - "\n", - "**********\n", - "[INFO] current slot vals are: {'pricerange': 'cheap', 'this': 'dontcare', 'area': 'south', 'addr': ''}\n", - "0 api_call\n", - "1 bye\n", - "2 canthear\n", - "3 canthelp_area\n", - "4 canthelp_area_food\n", - "5 canthelp_area_food_pricerange\n", - "6 canthelp_area_pricerange\n", - "7 canthelp_food\n", - "8 canthelp_food_pricerange\n", - "9 confirm-domain\n", - "10 expl-conf_area\n", - "11 expl-conf_food\n", - "12 expl-conf_pricerange\n", - "13 impl-conf_area+impl-conf_pricerange+request_food\n", - "14 impl-conf_food+impl-conf_pricerange+request_area\n", - "15 impl-conf_food+request_area\n", - "16 inform_addr+inform_food+offer_name\n", - "17 inform_addr+inform_phone+inform_pricerange+offer_name\n", - "18 inform_addr+inform_phone+offer_name\n", - "19 inform_addr+inform_postcode+offer_name\n", - "20 inform_addr+inform_pricerange+offer_name\n", - "21 inform_addr+offer_name\n", - "22 inform_area+inform_food+inform_pricerange+offer_name\n", - "23 inform_area+inform_food+offer_name\n", - "24 inform_area+inform_phone+offer_name\n", - "25 inform_area+inform_postcode+offer_name\n", - "26 inform_area+inform_pricerange+offer_name\n", - "27 inform_area+offer_name\n", - "28 inform_food+inform_pricerange+offer_name\n", - "29 inform_food+offer_name\n", - "30 inform_phone+inform_postcode+offer_name\n", - "31 inform_phone+inform_pricerange+offer_name\n", - "32 inform_phone+offer_name\n", - "33 inform_postcode+inform_pricerange+offer_name\n", - "34 inform_postcode+offer_name\n", - "35 inform_pricerange+offer_name\n", - "36 offer_name\n", - "37 repeat\n", - "38 reqmore\n", - "39 request_area\n", - "40 request_food\n", - "41 request_pricerange\n", - "42 select_area\n", - "43 select_food\n", - "44 select_pricerange\n", - "45 welcomemsg\n", - "type template number from the list: 21\n", - "[INFO] generated response is: Sure, the lucky star is on cambridge leisure park clifton way cherry hinton.\n", - "\n", - "**********\n", - "[INPUT] choose turn (1 for user, 2 for bot, 3 to start a new dialog or 10 
for saving and exit): 1\n", - "\n", - "**********\n", - "[INFO] write a user sentence: phone number\n", - "\n", - "**********\n", - "[INPUT] type 1 if your sentence has a slot, else 0: 1\n", - "0 phone\n", - "1 postcode\n", - "2 this\n", - "3 name\n", - "4 addr\n", - "5 food\n", - "6 area\n", - "7 pricerange\n", - "\n", - "**********\n", - "[INPUT] type slot category number from the list: 0\n", - "\n", - "**********\n", - "[INPUT] type 1 if you want to add more slots, else 0: 0\n", - "{'speaker': 1, 'text': 'phone number', 'slots': [['phone', '']]}\n", - "\n", - "**********\n", - "[INPUT] choose turn (1 for user, 2 for bot, 3 to start a new dialog or 10 for saving and exit): 2\n", - "\n", - "**********\n", - "[INFO] current slot vals are: {'pricerange': 'cheap', 'this': 'dontcare', 'area': 'south', 'addr': '', 'phone': ''}\n", - "0 api_call\n", - "1 bye\n", - "2 canthear\n", - "3 canthelp_area\n", - "4 canthelp_area_food\n", - "5 canthelp_area_food_pricerange\n", - "6 canthelp_area_pricerange\n", - "7 canthelp_food\n", - "8 canthelp_food_pricerange\n", - "9 confirm-domain\n", - "10 expl-conf_area\n", - "11 expl-conf_food\n", - "12 expl-conf_pricerange\n", - "13 impl-conf_area+impl-conf_pricerange+request_food\n", - "14 impl-conf_food+impl-conf_pricerange+request_area\n", - "15 impl-conf_food+request_area\n", - "16 inform_addr+inform_food+offer_name\n", - "17 inform_addr+inform_phone+inform_pricerange+offer_name\n", - "18 inform_addr+inform_phone+offer_name\n", - "19 inform_addr+inform_postcode+offer_name\n", - "20 inform_addr+inform_pricerange+offer_name\n", - "21 inform_addr+offer_name\n", - "22 inform_area+inform_food+inform_pricerange+offer_name\n", - "23 inform_area+inform_food+offer_name\n", - "24 inform_area+inform_phone+offer_name\n", - "25 inform_area+inform_postcode+offer_name\n", - "26 inform_area+inform_pricerange+offer_name\n", - "27 inform_area+offer_name\n", - "28 inform_food+inform_pricerange+offer_name\n", - "29 inform_food+offer_name\n", - "30 inform_phone+inform_postcode+offer_name\n", - "31 inform_phone+inform_pricerange+offer_name\n", - "32 inform_phone+offer_name\n", - "33 inform_postcode+inform_pricerange+offer_name\n", - "34 inform_postcode+offer_name\n", - "35 inform_pricerange+offer_name\n", - "36 offer_name\n", - "37 repeat\n", - "38 reqmore\n", - "39 request_area\n", - "40 request_food\n", - "41 request_pricerange\n", - "42 select_area\n", - "43 select_food\n", - "44 select_pricerange\n", - "45 welcomemsg\n", - "type template number from the list: 32\n", - "[INFO] generated response is: The phone number of the lucky star is 01223 244277.\tThe phone number of the lucky star is dontcare.\n", - "\n", - "**********\n", - "[INPUT] choose turn (1 for user, 2 for bot, 3 to start a new dialog or 10 for saving and exit): 1\n", - "\n", - "**********\n", - "[INFO] write a user sentence: thank you good bye\n", - "\n", - "**********\n", - "[INPUT] type 1 if your sentence has a slot, else 0: 0\n", - "{'speaker': 1, 'text': 'thank you good bye', 'slots': []}\n", - "\n", - "**********\n", - "[INPUT] choose turn (1 for user, 2 for bot, 3 to start a new dialog or 10 for saving and exit): 2\n", - "\n", - "**********\n", - "[INFO] current slot vals are: {'pricerange': 'cheap', 'this': 'dontcare', 'area': 'south', 'addr': '', 'phone': ''}\n", - "0 api_call\n", - "1 bye\n", - "2 canthear\n", - "3 canthelp_area\n", - "4 canthelp_area_food\n", - "5 canthelp_area_food_pricerange\n", - "6 canthelp_area_pricerange\n", - "7 canthelp_food\n", - "8 canthelp_food_pricerange\n", - "9 
confirm-domain\n", - "10 expl-conf_area\n", - "11 expl-conf_food\n", - "12 expl-conf_pricerange\n", - "13 impl-conf_area+impl-conf_pricerange+request_food\n", - "14 impl-conf_food+impl-conf_pricerange+request_area\n", - "15 impl-conf_food+request_area\n", - "16 inform_addr+inform_food+offer_name\n", - "17 inform_addr+inform_phone+inform_pricerange+offer_name\n", - "18 inform_addr+inform_phone+offer_name\n", - "19 inform_addr+inform_postcode+offer_name\n", - "20 inform_addr+inform_pricerange+offer_name\n", - "21 inform_addr+offer_name\n", - "22 inform_area+inform_food+inform_pricerange+offer_name\n", - "23 inform_area+inform_food+offer_name\n", - "24 inform_area+inform_phone+offer_name\n", - "25 inform_area+inform_postcode+offer_name\n", - "26 inform_area+inform_pricerange+offer_name\n", - "27 inform_area+offer_name\n", - "28 inform_food+inform_pricerange+offer_name\n", - "29 inform_food+offer_name\n", - "30 inform_phone+inform_postcode+offer_name\n", - "31 inform_phone+inform_pricerange+offer_name\n", - "32 inform_phone+offer_name\n", - "33 inform_postcode+inform_pricerange+offer_name\n", - "34 inform_postcode+offer_name\n", - "35 inform_pricerange+offer_name\n", - "36 offer_name\n", - "37 repeat\n", - "38 reqmore\n", - "39 request_area\n", - "40 request_food\n", - "41 request_pricerange\n", - "42 select_area\n", - "43 select_food\n", - "44 select_pricerange\n", - "45 welcomemsg\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "type template number from the list: 1\n", - "[INFO] generated response is: You are welcome!\n", - "\n", - "**********\n", - "[INPUT] choose turn (1 for user, 2 for bot, 3 to start a new dialog or 10 for saving and exit): 10\n", - "\n", - "**********\n", - "[INFO] saving the dialogs and exiting...\n" - ] - } - ], - "source": [ - "trainsetgen.start_generation()" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "import json\n", - "\n", - "with open(trainsetgen.save_path) as f:\n", - " dialogs = json.load(f)" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[ [ { 'act': 'welcomemsg',\n", - " 'slots': [],\n", - " 'speaker': 2,\n", - " 'text': 'Hello, welcome to the Cambridge restaurant system. You '\n", - " 'can ask for restaurants by area, price range or food '\n", - " 'type. 
How may I help you?'},\n", - " { 'slots': [['pricerange', 'cheap']],\n", - " 'speaker': 1,\n", - " 'text': 'cheap restaurant'},\n", - " { 'act': 'request_food',\n", - " 'slots': [],\n", - " 'speaker': 2,\n", - " 'text': 'What kind of food would you like?'},\n", - " {'slots': [['this', 'dontcare']], 'speaker': 1, 'text': 'any'},\n", - " { 'act': 'request_area',\n", - " 'slots': [],\n", - " 'speaker': 2,\n", - " 'text': 'What part of town do you have in mind?'},\n", - " {'slots': [['area', 'south']], 'speaker': 1, 'text': 'south'},\n", - " { 'act': 'api_call',\n", - " 'db_result': '{\"food\": \"chinese\", \"pricerange\": \"cheap\", \"area\": '\n", - " '\"south\", \"postcode\": \"c.b 1, 7 d.y\", \"phone\": \"01223 '\n", - " '244277\", \"addr\": \"cambridge leisure park clifton way '\n", - " 'cherry hinton\", \"name\": \"the lucky star\"}',\n", - " 'slots': [ ['area', 'south'],\n", - " ['pricerange', 'cheap'],\n", - " ['area', 'south'],\n", - " ['pricerange', 'cheap']],\n", - " 'speaker': 2,\n", - " 'text': 'Api_call area=\"south\" food=\"#food\" pricerange=\"cheap\"\\t'\n", - " 'api_call area=\"south\" food=\"#food\" pricerange=\"cheap\"'},\n", - " { 'act': 'inform_area+inform_food+offer_name',\n", - " 'slots': [ ['area', 'south'],\n", - " ['name', 'the lucky star'],\n", - " ['area', 'south'],\n", - " ['food', 'chinese']],\n", - " 'speaker': 2,\n", - " 'text': 'The lucky star is a nice place in the south of town '\n", - " 'serving tasty chinese food.'},\n", - " {'slots': [['addr', '']], 'speaker': 1, 'text': 'address'},\n", - " { 'act': 'inform_addr+offer_name',\n", - " 'slots': [ ['name', 'the lucky star'],\n", - " [ 'addr',\n", - " 'cambridge leisure park clifton way cherry '\n", - " 'hinton']],\n", - " 'speaker': 2,\n", - " 'text': 'Sure, the lucky star is on cambridge leisure park clifton '\n", - " 'way cherry hinton.'},\n", - " {'slots': [['phone', '']], 'speaker': 1, 'text': 'phone number'},\n", - " { 'act': 'inform_phone+offer_name',\n", - " 'slots': [ ['name', 'the lucky star'],\n", - " ['phone', '01223 244277'],\n", - " ['name', 'the lucky star']],\n", - " 'speaker': 2,\n", - " 'text': 'The phone number of the lucky star is 01223 244277.\\tThe '\n", - " 'phone number of the lucky star is dontcare.'},\n", - " {'slots': [], 'speaker': 1, 'text': 'thank you good bye'},\n", - " {'act': 'bye', 'slots': [], 'speaker': 2, 'text': 'You are welcome!'}]]\n" - ] - } - ], - "source": [ - "from pprint import pprint\n", - "\n", - "pprint(dialogs, indent=4)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.3" - }, - "pycharm": { - "stem_cell": { - "cell_type": "raw", - "source": [], - "metadata": { - "collapsed": false - } - } - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} \ No newline at end of file diff --git a/deeppavlov/contrib/examples/db.sqlite b/deeppavlov/contrib/examples/db.sqlite deleted file mode 100644 index f05cdbfa322d509cebd1b5607d83711b53f3a6ce..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001 literal 24576 zcmeHPTZ|j$bzUwd?JTdhX?eYt*E_OJuh)`RbK!7E%j@;p-mGQIm(|6xE!pxx*9+2Pj$uZ37p1=u6TDFwmk1P{1%yATKRYBo76OHg#&C z=y&EmyhtgK071dEB`tT!|MLCkod2BfoHG}8&gp%PZ<y7MPQ1+6oDxMQv{|6 
zOc9tOFhyXBz!ZTg0>2puymxP6ZsFLm#CzZ9vsy>%9z_3{eeV3$=EW`k;^wpGws`M% z{Nqj2Z18>UX8$DLv-G-VF{7n4a<9%!Mg5GX{;`@ z4){IoVD;4Ok%i4;i5cB!XgBTa9Ypol*r0DZudjJ4xh8twB!MMIcpk=|+5FGx+avie z^FPn6XFtmxWd1GlVTMotAbm0QAF1C-otXcN`Lo==bKl}t=02IbHpd`w400`!w9I`D=fLJGQ(~EOte`&pJ9Yw3|K6u=N|7 zZ4UbF`dW<_Px30?SUV6zNs{@JD3xTP&bl>AZvg#{rrQHc<9p276s1ea$c}ciz%TUgq0t z9d}+?5z3W{SQRm%O>qw0IkD)B1c}?_)R4hYs{!Pt$k$W+tHQvH**Z^4d z>yH4QqkyOC4eYOeA8(tIy|Os z;D_PK)H1XS>auzAg7kLZ;ugyep};jFv}*P z2xq;v%&jay=AJ8dwFYd4g)`Kc;oup{o3qnSDA8qE!FDYZCtt@{^v0;P(NryFL9BIl z@gZ)}k%uK%B1p9_4jOaCK{?4+cx$cY?nWw=s*0kLgc|!=r{gM16Z(P0;qWZB+huLm zd3Om?hGlP_9+I-ynnm7T+x5km)Uf2}hfO-qU}hriUjo#l@csSr(-*qgeSIDFD*b!vG%RSxR^$5iVTzMjjl>+ z31&-BU_0u47-}|b*1KA(ZLmxCaf<}W<|FQ~4jj=jMP4H_uF5j6D6*_5!FcS3W(**d z!9`)22du-}JxG;#)U|)zRqqbE9n8=vNmcI*bNnz#Av%Nl-T`1^R`XtNmCVH16Y;Cb znRiI%thMX@5y&MK_mU(!wzs2MHh%!K5Y@YexobwQC04*8lU|EMA{3iL#{J=&KZ#^W zmQTs0u@=yko#>30?r~3g{rKdQ7WqMNdz6K0xePHHdwCE_hHkMAYu$|t72;W68op3t z^4VFt>4JkJsj8}bK{U0Fev{vC!~VM0X$M9a`epQSXEZ#Qp=GgV6Ib>AUEW0N!xP0! zYHy8s6Tyg7RQfbse(r>2vC13TKAcAwyI~s}e`{!H-pacZZ!dd77q|?QCazQ}k|Nz^ zs)K=dPweHmMM96Qx`#3{G%QCIK9vt%sfuf&1TRZ(wFAA5+M>8H^Ag2aDh4*Zk1fq%a6pBCD>Hl02q;bRZ$eNjE%;( z<#~z`TY$^Hh0~)jWW4IOTok4Aa_Bd~ZPuz=ucXb|8hV5+o_ilVr@*v?~ zliBR~R;H{dQn?&KXV?~#mO}7*X8*l8Zr&OGy~LTWc2lqa5^)1AU{#P>6K~0w*!j+V z7(qZ;t`GFS8QCMbH&^Fi!s$9&iMp^cTfzL>MurCb40lKlHz&mb0fh(M2fe=MQ5R_()sJ-X4Opa?N!hA*E8K zED5s2!(>tHHF^`*Vcax+V3crTo3J-2NZ&hgQe;x^V2i3FTW&$oFJj71gq zk$2}+M2Mo$#2|Fr)lJKQM;W&rwC1jE+fF=S*7v@eaL?+(g^6e7_)RS0B#WhLMT*dQ?S!ky^_z&z5AWcwLn|r#-JbjLOnN>2K>9Q8&r<)Gc|Y6Bf0%lQ z`#k@0zBu=@)W!5lW;G+HzCQP5_N~k}=jU=7_hR};N>2YI^UtYY%s-s@GIf9Mquig) z{~q^Y{(I>kamUjy&i`j#PyHx;BlZ2veVN1a^1PM#K~7E0q@K?5xux7e{;Rp4aux2U zxy0O$GkW^Jv!7)Dm1|_qXP(b~p896)+4(=@()mPoWB%{cm$@J2K1+X;`SbkUxyRG* zXMZ@qIbY9yJO5Gs^Ze7f^ZCEY{Xy#E`H$!SDfEp)T$rW!JxLEKA?K(|Nu_~gW>jMxj0iRmCKdB@_r;(?m zhOL3!V1?v~MBf6@(Q&u|de6JZl4lZbklGj!G3zq#V9qV`8m+L#>U*C1SaW{*j z;CY(KM%c#pkvVGZ5bm@zyEAAx+v$MW9MrYfQpv@{-T`?0gI5=la-yrxGm}Ms-K-C~FFusqOhkw$lYIYB z63#kjxZfmY+}(bjPqfe>yJB(xKg?SZW$S@=pOxs;%tGP};UoFhT}qjyc4 zJx{~dkYU?1;q*cQo+CNC1S-nwPV^l>B%!Foe6%|S;Zyum>s3WC-oAl=t31euNLe)D-Sk$zJIna$04=2wiAn?`!g-&KfzV{Kj@HX;4Ptfdr zwi}Q+tE7OWPe={0cJSi}7?0D6M)PrujYe2VH38gC@G z^e|b;vFV@VG!)6fXpV+=_KaWoIhAFSH9r;d5KZI=P&9q9M7s+U5DyO7;2>vkjO=ZA zE-+kS`1k;ch)bX@x<<@r%kQV(xD59KT_X`{`aWM*L-P12sn-zEdoLZzC^=hj=kZC^ zduXFR@p(6C$|Q2~E)v3kPR!FH;-p`WCLHDJvb6jNo0f6zB#$WcIIuKX ziWZd~49EW{MqqmyL$*5$?_~uE`}}X`mvaA*`*527kFzSHc4M0VpXUGFDvW9Vf13ZF z=Kn)~VVeJ^yTeP?P4oY}T%P9t5wOQ{dei)WBf`4`4*N9!j|x>;K_+{e|DWdngCbeK zmTj8&V>U?;O5(m^<{rp;O6^ldsKwHv6^NT;eYi zug`peL4QSG=XbcHR2+8ZIO;X^SV@OhP+mnjkXPo9qE=C@pdNZ$ZAVCsUww^RUS2FN zDM|~4$SB4g)PjMBY7KYS2FPpl>uA(Kk>M$rY0&LRn0QqzM6wJs>{nhz1sKX0*B2*L zfQ3__v_nGqi0u_KiD(p{df%}H#V%+6<||ZnQ!H%A{eiVNTDtEw2*|v#)`ZfC)$;h! 
z#?bzqm)&)5RGs269!0j`5F<5G;kpt0ideT-s4R<;Qmt58iz>mr^o+xpxgQvxNqF-d zw?rsyE&Aoqla9dhTOH(b1+{ohz|Y&D$;}S$luAaXax)YAFS$eSoxc6hR3Sd_pckrQ zNwxcHK+DN!tWd|nz(S&@kL&~`(ndEnlKi;%qFYSK&+L)01`Pw#WF$|*P zQcm;rb=Ov&S=wcgSF}}(A2m{SU&4X`Zh36Uy`LVv{FHlU8%rGxf)VMI8(OS^7Ag@# z@~G*g5QyrZ{XUoIVv)3RQmrqXK~n{Ub=_+z0t~D-ELalQhpJi{fA#r!+QyTv)}P(< ziX-D<5NbV2fKf17rBpcj1|v1+VSUBqbnm$;X+p6mAM@-%WWjV7&S2-u++)nM^WWc zRaC{>6d@yH-YB`Syy~|v4(UP!Ums{T&bbF|d8DBF!KKZ^QL+)^$uP=HF_vJ2j znLACAYPHHM5~_;FHc3*sxtqj)M{+I0c5%-{M{(E=JEUxlpl)8Ml+iaTp!;qT*3p`P z%c83)+kVTDzvN-XgX~H;OzP!qnw`BdPE)+H;DX><%Z-JxH?QZ-ihRuve`NGsiGo;> zdGb5Fvgc3_Vbc2MrPG8)q3C-m6HigVeU8y?BXF*d;-=nTTqExeO0zu1y9-4_kSc;8 zt7AiupirWI3S&{DgE(Z$<9!<}tk4%L6+v>UzWs&}zuUlj9eQeh*~N{MuAuEHLxL7_ z+4jS3D(VOf-6u=yGrfZ}I9jCRMeGq0FRUI29gEjbI5A3b;}~mBiXfcGrxr8w$D1^7dBE39dIBukqcZwBPE*ARAEw+^zAE8>Z|T~UGSXW zV8)=M!mHBv`mWG#j>hRSx+|kK72d|8KZ;hfn2(cyp(|v2jDXRmyzXhQ^IKLbGTbK- z9pIBAh}#y&Q2u{`%g*F~H@}+uEccDv>FnQTuVvGjk2AZOyVD=1pG|$9`lHm!{HOEp z&p*ojjQcj%#M`G|Qv{|6Oc9tOFhyXB!2cHnF1KCEWqWS5znzHio-8@izB=^5Dl(=C1edG&ral zMp?u#r-yv2AcAqn8tH+!+JF;{)vt>~ECH?3kKcQKTm{+_j=CuFc)o}2Fd4D%b)=PI ztM0}>1-}tE9$0x3hL6$MM3pL<`-)~>7xq<7Ze|7=P!S=im8-B1*Mu0XU zq#xVEah=g33q?~I8>C?|-r%wdHy*Y7=u@C}Ug}qHo;bs-E>+h@7``Yfk-#PB4q1N} z8$zsAIN?nhY>0y7U`7eVOvPyhjyICDuDR^f=9xP%hS!4!B}jM*2Vw+z)=G6uSU;gF zrS=XoNsx}M)1#RrPe7bVCzwYTK=y*|vcV3HCk;k=T=chH`giN}o#Jz*avYKRsMWf4zUDa#YjHQIpl|G)_fKi+K2?p5V<_Q)93t~uJzj5F2 zrQ-QelhmE2@VcQcdE#Cs^cBdCHbpxy4kX_!JoP&|b|K;OP+?0*591_<*2 diff --git a/deeppavlov/contrib/examples/dstc2-templates.txt b/deeppavlov/contrib/examples/dstc2-templates.txt deleted file mode 100644 index 6ca6d3560d..0000000000 --- a/deeppavlov/contrib/examples/dstc2-templates.txt +++ /dev/null @@ -1,46 +0,0 @@ -api_call api_call area="#area" food="#food" pricerange="#pricerange" api_call area="#area" food="#food" pricerange="#pricerange" -bye You are welcome! -canthear Sorry, I can't hear you. -canthelp_area I'm sorry but there is no #area american restaurant in the #area of town. -canthelp_area_food Sorry there is no #food restaurant in the #area of town. -canthelp_area_food_pricerange Sorry there is no #pricerange restaurant in the #area of town serving #food food. -canthelp_area_pricerange Sorry there is no #pricerange restaurant in the #area of town serving #area american food. -canthelp_food I am sorry but there is no #food restaurant that matches your request. -canthelp_food_pricerange Sorry there is no #food restaurant in the #pricerange price range. -confirm-domain You are looking for a restaurant is that right? -expl-conf_area Did you say you are looking for a restaurant in the #area of town? Ok, a restaurant in any part of town is that right? -expl-conf_food You are looking for a #food restaurant right? You are looking for a restaurant serving any kind of food right? -expl-conf_pricerange Let me confirm, You are looking for a restaurant in the #pricerange price range right? Let me confirm, You are looking for a restaurant and you dont care about the price range right? -impl-conf_area+impl-conf_pricerange+request_food There are restaurants in the #pricerange price range and the #area of town. What type of food would you like? -impl-conf_food+impl-conf_pricerange+request_area There are restaurants serving #food in the #pricerange price range. What area would you like? -impl-conf_food+request_area There are restaurants serving #food food. What area do you want? -inform_addr+inform_food+offer_name #name is on #addr and serves tasty #food food. 
-inform_addr+inform_phone+inform_pricerange+offer_name #name is on #addr. Its phone number is #phone, and it is in the #pricerange pricerange. -inform_addr+inform_phone+offer_name The phone number of #name is #phone and it is on #addr. -inform_addr+inform_postcode+offer_name #name is on #addr, #postcode. -inform_addr+inform_pricerange+offer_name #name is on #addr, and it is in the #pricerange price range. -inform_addr+offer_name Sure, #name is on #addr. -inform_area+inform_food+inform_pricerange+offer_name #name is a great restaurant serving #pricerange #food food in the #area of town. -inform_area+inform_food+offer_name #name is a nice place in the #area of town serving tasty #food food. -inform_area+inform_phone+offer_name The phone number of #name is #phone and it is in the #area part of town. -inform_area+inform_postcode+offer_name #name is in the #area, at #postcode. -inform_area+inform_pricerange+offer_name #name is a nice place in the #area of town and the prices are #pricerange. -inform_area+offer_name #name is in the #area part of town. -inform_food+inform_pricerange+offer_name #name serves #food food in the #pricerange price range. -inform_food+offer_name #name serves #food food. -inform_phone+inform_postcode+offer_name The phone number of #name is #phone and its postcode is #postcode. -inform_phone+inform_pricerange+offer_name The phone number of #name is #phone and it is in the #pricerange price range. -inform_phone+offer_name The phone number of #name is #phone. The phone number of #name is dontcare. -inform_postcode+inform_pricerange+offer_name #name is in the #pricerange price range, and their post code is #postcode. -inform_postcode+offer_name The post code of #name is #postcode. -inform_pricerange+offer_name The price range at #name is #pricerange. -offer_name #name is a great restaurant. -repeat Sorry I am a bit confused; please tell me again what you are looking for. -reqmore Can I help you with anything else? -request_area What part of town do you have in mind? -request_food What kind of food would you like? -request_pricerange Would you like something in the cheap, moderate, or expensive price range? -select_area Sorry would you like something in the #area or in the #area. Sorry would you like the #area of town or you dont care. -select_food Sorry would you like #food or #food food? Sorry would you like #food food or you dont care. -select_pricerange Sorry would you like something in the #pricerange price range or in the #pricerange price range. Sorry would you like something in the #pricerange price range or you dont care. -welcomemsg Hello, welcome to the Cambridge restaurant system. You can ask for restaurants by area, price range or food type. How may I help you? 
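Each line of the deleted dstc2-templates.txt above pairs an action name with its response template; "#slot" placeholders are filled from the tracked slot values, and a few actions (api_call, inform_phone+offer_name, the expl-conf_* and select_* acts) keep two tab-separated variants, which is why some generated responses above contain a literal "\t". Below is a minimal standalone sketch of the placeholder convention, for orientation only: the removed tutorial relied on deeppavlov's Templates/DefaultTemplate classes, which additionally handle the variants and capitalize the response.

    # Illustrative sketch of the "#slot" placeholder convention from dstc2-templates.txt.
    # This is not the library implementation, just the substitution idea.
    import re

    template = "#name is a nice place in the #area of town serving tasty #food food."
    slot_values = {"name": "the lucky star", "area": "south", "food": "chinese"}

    def fill_template(template: str, slots: dict) -> str:
        # Replace every "#slot" placeholder with its value; unknown slots stay as-is.
        return re.sub(r"#(\w+)", lambda m: slots.get(m.group(1), m.group(0)), template)

    print(fill_template(template, slot_values))
    # -> the lucky star is a nice place in the south of town serving tasty chinese food.
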
diff --git a/deeppavlov/contrib/examples/dstc_slot_vals.json b/deeppavlov/contrib/examples/dstc_slot_vals.json deleted file mode 100644 index 047f682fe3..0000000000 --- a/deeppavlov/contrib/examples/dstc_slot_vals.json +++ /dev/null @@ -1,416 +0,0 @@ -{ - "food": { - "caribbean": [ - "carraibean", - "carribean", - "caribbean" - ], - "kosher": [ - "kosher" - ], - "tuscan": [ - "tuscan" - ], - "french": [ - "french" - ], - "mexican": [ - "mexican" - ], - "japanese": [ - "japanese" - ], - "thai": [ - "tailand", - "thai" - ], - "bistro": [ - "bistro" - ], - "swedish": [ - "swedish" - ], - "lebanese": [ - "lebanese" - ], - "indonesian": [ - "endonesian", - "indonesian" - ], - "halal": [ - "halal" - ], - "crossover": [ - "class over", - "cross over", - "crossover" - ], - "chinese": [ - "chinses", - "chineese", - "chinese" - ], - "british": [ - "british" - ], - "austrian": [ - "austrian" - ], - "greek": [ - "greek" - ], - "vietnamese": [ - "vietnam", - "vietnamese" - ], - "fusion": [ - "fusion" - ], - "italian": [ - "itailian", - "italian" - ], - "persian": [ - "persian" - ], - "indian": [ - "india", - "indian" - ], - "welsh": [ - "welsh" - ], - "gastropub": [ - "gastro pub", - "gastropub" - ], - "australian": [ - "australian" - ], - "brazilian": [ - "brazilian" - ], - "cuban": [ - "cuban" - ], - "moroccan": [ - "moroccon", - "moroccan" - ], - "korean": [ - "korea", - "korean" - ], - "english": [ - "english" - ], - "world": [ - "world" - ], - "irish": [ - "irish" - ], - "swiss": [ - "swiss" - ], - "modern european": [ - "modern europane", - "modern europone", - "modern euorpean", - "modern european" - ], - "turkish": [ - "turkiesh", - "turkish" - ], - "russian": [ - "russian" - ], - "catalan": [ - "catalin", - "kitalian", - "catalanian", - "catalonian", - "katalian", - "catalan" - ], - "polynesian": [ - "polynesian" - ], - "cantonese": [ - "cantonates", - "cantonales", - "cantonese" - ], - "barbeque": [ - "barbecue", - "barbeque" - ], - "scottish": [ - "scottish" - ], - "portuguese": [ - "portugeuse", - "portugese", - "portuguese" - ], - "german": [ - "german" - ], - "north american": [ - "american", - "north american" - ], - "afghan": [ - "afghan" - ], - "vegetarian": [ - "vegetarian" - ], - "jamaican": [ - "jamcian", - "jamian", - "jamaican" - ], - "australasian": [ - "australian asian", - "austria asian", - "australasian" - ], - "venetian": [ - "venetian" - ], - "hungarian": [ - "hungarian" - ], - "belgian": [ - "belgium", - "belgian" - ], - "asian oriental": [ - "asian ori", - "asian", - "oriental", - "asian oriental" - ], - "basque": [ - "bask", - "basque" - ], - "international": [ - "international" - ], - "corsica": [ - "corsica" - ], - "canapes": [ - "canope", - "canopy", - "canapes" - ], - "traditional": [ - "traditional" - ], - "creative": [ - "creative" - ], - "malaysian": [ - "malaysian" - ], - "polish": [ - "polish" - ], - "european": [ - "european", - "european" - ], - "dontcare": [ - "any food", - "any kind", - "any type of food", - "any type", - "dontcare" - ], - "eritrean": [ - "airatarin", - "airitran", - "air tran", - "airitran", - "eartrain", - "earatrain", - "arotrian", - "airtran", - "eirtrean", - "eritrean" - ], - "scandinavian": [ - "scandanavian", - "scandanavian", - "scandinavian" - ], - "christmas": [ - "christmas" - ], - "unusual": [ - "unusual" - ], - "singaporean": [ - "signaporian", - "signapore", - "singapore", - "singaporean" - ], - "steakhouse": [ - "steak house", - "steak", - "steakhouse" - ], - "african": [ - "african" - ], - "danish": [ - "danish" - ], - 
"spanish": [ - "spanish" - ], - "seafood": [ - "sea food", - "seafood" - ], - "romanian": [ - "romanian" - ], - "mediterranean": [ - "medterranean", - "mediterranean" - ] - }, - "pricerange": { - "moderate": [ - "moderately", - "medium", - "moderat", - "moderately", - "derately", - "modreately", - "moderate" - ], - "expensive": [ - "high priced", - "expensive" - ], - "cheap": [ - "cheap" - ], - "dontcare": [ - "any price", - "i dont care about the price", - "i dont care about price", - "any range is fine", - "any range", - "dontcare" - ] - }, - "area": { - "south": [ - "south" - ], - "east": [ - "east" - ], - "dontcare": [ - "any area", - "any address", - "any part", - "other parts", - "anywhere", - "any where", - "other parts", - "other areas", - "dontcare" - ], - "north": [ - "north" - ], - "west": [ - "west" - ], - "centre": [ - "center", - "central", - "downtown", - "down town", - "centre" - ] - }, - "this": { - "dontcare": [ - "dont care", - "doesnt matter", - "any fine", - "any noise", - "any of town", - "noise anything", - "any type", - "anything", - "any thing", - "what ever", - "does not matter", - "dont mind", - "dont know", - "any type", - "any kind", - "dont matter", - "dontcare" - ] - }, - "name": { - "prezzo": [ - "prezzo" - ], - "restaurant two two": [ - "restaurant two two" - ], - "la margherita": [ - "la margherita" - ], - "golden wok": [ - "golden wok" - ], - "nandos": [ - "nandos" - ], - "da vinci pizzeria": [ - "da vinci pizzeria" - ], - "rice house": [ - "rice house" - ], - "curry prince": [ - "curry prince" - ], - "hk fusion": [ - "hk fusion" - ], - "pizza hut city centre": [ - "pizza hutt", - "pizza hut", - "pizza hut city centre" - ], - "venue": [ - "venue" - ], - "chiquito restaurant bar": [ - "chiquito", - "chiquito restaurant bar" - ], - "bloomsbury restaurant": [ - "bloomsbury", - "bloomsbury restaurant" - ], - "the missing sock": [ - "the missing sock" - ] - } -} \ No newline at end of file diff --git a/deeppavlov/contrib/examples/generated_data.json b/deeppavlov/contrib/examples/generated_data.json deleted file mode 100644 index 3e8b03139d..0000000000 --- a/deeppavlov/contrib/examples/generated_data.json +++ /dev/null @@ -1,164 +0,0 @@ -[ - [ - { - "speaker": 2, - "text": "Hello, welcome to the Cambridge restaurant system. You can ask for restaurants by area, price range or food type. 
How may I help you?", - "slots": [], - "act": "welcomemsg" - }, - { - "speaker": 1, - "text": "cheap restaurant", - "slots": [ - [ - "pricerange", - "cheap" - ] - ] - }, - { - "speaker": 2, - "text": "What kind of food would you like?", - "slots": [], - "act": "request_food" - }, - { - "speaker": 1, - "text": "any", - "slots": [ - [ - "this", - "dontcare" - ] - ] - }, - { - "speaker": 2, - "text": "What part of town do you have in mind?", - "slots": [], - "act": "request_area" - }, - { - "speaker": 1, - "text": "south", - "slots": [ - [ - "area", - "south" - ] - ] - }, - { - "speaker": 2, - "text": "Api_call area=\"south\" food=\"#food\" pricerange=\"cheap\"\tapi_call area=\"south\" food=\"#food\" pricerange=\"cheap\"", - "db_result": "{\"food\": \"chinese\", \"pricerange\": \"cheap\", \"area\": \"south\", \"postcode\": \"c.b 1, 7 d.y\", \"phone\": \"01223 244277\", \"addr\": \"cambridge leisure park clifton way cherry hinton\", \"name\": \"the lucky star\"}", - "slots": [ - [ - "area", - "south" - ], - [ - "pricerange", - "cheap" - ], - [ - "area", - "south" - ], - [ - "pricerange", - "cheap" - ] - ], - "act": "api_call" - }, - { - "speaker": 2, - "text": "The lucky star is a nice place in the south of town serving tasty chinese food.", - "slots": [ - [ - "area", - "south" - ], - [ - "name", - "the lucky star" - ], - [ - "area", - "south" - ], - [ - "food", - "chinese" - ] - ], - "act": "inform_area+inform_food+offer_name" - }, - { - "speaker": 1, - "text": "address", - "slots": [ - [ - "addr", - "" - ] - ] - }, - { - "speaker": 2, - "text": "Sure, the lucky star is on cambridge leisure park clifton way cherry hinton.", - "slots": [ - [ - "name", - "the lucky star" - ], - [ - "addr", - "cambridge leisure park clifton way cherry hinton" - ] - ], - "act": "inform_addr+offer_name" - }, - { - "speaker": 1, - "text": "phone number", - "slots": [ - [ - "phone", - "" - ] - ] - }, - { - "speaker": 2, - "text": "The phone number of the lucky star is 01223 244277.\tThe phone number of the lucky star is dontcare.", - "slots": [ - [ - "name", - "the lucky star" - ], - [ - "phone", - "01223 244277" - ], - [ - "name", - "the lucky star" - ] - ], - "act": "inform_phone+offer_name" - }, - { - "speaker": 1, - "text": "thank you good bye", - "slots": [] - }, - { - "speaker": 2, - "text": "You are welcome!", - "slots": [], - "act": "bye" - } - ] -] \ No newline at end of file diff --git a/deeppavlov/core/commands/train.py b/deeppavlov/core/commands/train.py index c3466fc403..d444df5c09 100644 --- a/deeppavlov/core/commands/train.py +++ b/deeppavlov/core/commands/train.py @@ -70,7 +70,6 @@ def train_evaluate_model_from_config(config: Union[str, Path, dict], iterator: Union[DataLearningIterator, DataFittingIterator] = None, *, to_train: bool = True, evaluation_targets: Optional[Iterable[str]] = None, - to_validate: Optional[bool] = None, download: bool = False, start_epoch_num: Optional[int] = None, recursive: bool = False) -> Dict[str, Dict[str, float]]: @@ -98,23 +97,13 @@ def train_evaluate_model_from_config(config: Union[str, Path, dict], if 'train' not in config: log.warning('Train config is missing. Populating with default values') - train_config = config.get('train') + train_config = config.get('train', {}) if start_epoch_num is not None: train_config['start_epoch_num'] = start_epoch_num - if 'evaluation_targets' not in train_config and ('validate_best' in train_config - or 'test_best' in train_config): - log.warning('"validate_best" and "test_best" parameters are deprecated.' 
- ' Please, use "evaluation_targets" list instead') + trainer_class = get_model(train_config.pop('class_name', 'torch_trainer')) - train_config['evaluation_targets'] = [] - if train_config.pop('validate_best', True): - train_config['evaluation_targets'].append('valid') - if train_config.pop('test_best', True): - train_config['evaluation_targets'].append('test') - - trainer_class = get_model(train_config.pop('class_name', 'nn_trainer')) trainer = trainer_class(config['chainer'], **train_config) if to_train: @@ -123,18 +112,7 @@ def train_evaluate_model_from_config(config: Union[str, Path, dict], res = {} if iterator is not None: - if to_validate is not None: - if evaluation_targets is None: - log.warning('"to_validate" parameter is deprecated and will be removed in future versions.' - ' Please, use "evaluation_targets" list instead') - evaluation_targets = ['test'] - if to_validate: - evaluation_targets.append('valid') - else: - log.warning('Both "evaluation_targets" and "to_validate" parameters are specified.' - ' "to_validate" is deprecated and will be ignored') - - res = trainer.evaluate(iterator, evaluation_targets, print_reports=True) + res = trainer.evaluate(iterator, evaluation_targets) trainer.get_chainer().destroy() res = {k: v['metrics'] for k, v in res.items()} diff --git a/deeppavlov/core/commands/utils.py b/deeppavlov/core/commands/utils.py index 1543591835..6824ab05e9 100644 --- a/deeppavlov/core/commands/utils.py +++ b/deeppavlov/core/commands/utils.py @@ -14,7 +14,7 @@ import os from copy import deepcopy from pathlib import Path -from typing import Union, Dict, TypeVar +from typing import Any, Union, Dict, TypeVar, Optional from deeppavlov.core.common.file import read_json, find_config from deeppavlov.core.common.registry import inverted_registry @@ -90,11 +90,41 @@ def _update_requirements(config: dict) -> dict: return response -def parse_config(config: Union[str, Path, dict]) -> dict: - """Apply variables' values to all its properties""" +def _overwrite(data: Any, value: Any, nested_keys: list) -> None: + """Changes ``data`` nested key value to ``value`` using ``nested_keys`` as nested keys list. + + Example: + >>> x = {'a': [None, {'b': 2}]} + >>> _overwrite(x, 42, ['a', 1, 'b']) + >>> x + {'a': [None, {'b': 42}]} + + """ + key = nested_keys.pop(0) + if not nested_keys: + data[key] = value + else: + _overwrite(data[key], value, nested_keys) + + +def parse_config(config: Union[str, Path, dict], overwrite: Optional[dict] = None) -> dict: + """Apply metadata.variables values to placeholders inside config and update nested configs using overwrite parameter + + Args: + config: Config to parse. + overwrite: If not None - key-value pairs of nested keys and values to overwrite config. + For {'chainer.pipe.0.class_name': 'simple_vocab'} it will update config + config['chainer']['pipe'][0]['class_name'] = 'simple_vocab'. 
+ + """ if isinstance(config, (str, Path)): config = read_json(find_config(config)) + if overwrite is not None: + for key, value in overwrite.items(): + items = [int(item) if item.isdigit() else item for item in key.split('.')] + _overwrite(config, value, items) + updated_config = _update_requirements(config) variables, variables_exact = _get_variables_from_config(updated_config) diff --git a/deeppavlov/core/common/aliases.py b/deeppavlov/core/common/aliases.py new file mode 100644 index 0000000000..6b50826e5d --- /dev/null +++ b/deeppavlov/core/common/aliases.py @@ -0,0 +1,47 @@ +# Copyright 2017 Neural Networks and Deep Learning lab, MIPT +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +ALIASES = { + 'kbqa_cq': 'kbqa_cq_en', + 'kbqa_cq_online': 'kbqa_cq_en', + 'kbqa_cq_rus': 'kbqa_cq_ru', + 'multi_squad_noans': 'qa_squad2_bert', + 'multi_squad_noans_infer': 'qa_squad2_bert', + 'multi_squad_retr_noans': 'qa_squad2_bert', + 'ner_collection3_m1': 'ner_collection3_bert', + 'ner_conll2003': 'ner_conll2003_bert', + 'ner_conll2003_torch_bert': 'ner_conll2003_bert', + 'ner_dstc2': 'ner_conll2003_bert', + 'ner_few_shot_ru': 'ner_rus_bert', + 'ner_few_shot_ru_simulate': 'ner_rus_bert', + 'ner_ontonotes': 'ner_ontonotes_bert', + 'ner_ontonotes_bert_emb': 'ner_ontonotes_bert', + 'ner_ontonotes_bert_mult_torch': 'ner_ontonotes_bert_mult', + 'ner_ontonotes_bert_torch': 'ner_ontonotes_bert', + 'ner_rus': 'ner_rus_bert', + 'paraphraser_bert': 'paraphraser_rubert', + 'ru_odqa_infer_wiki_rubert': 'ru_odqa_infer_wiki', + 'sentseg_dailydialog': 'sentseg_dailydialog_bert', + 'squad': 'squad_bert', + 'squad_bert_infer': 'squad_bert', + 'squad_bert_multilingual_freezed_emb': 'squad_bert', + 'squad_ru': 'squad_ru_bert', + 'squad_ru_bert_infer': 'squad_ru_bert', + 'squad_ru_convers_distilrubert_2L_infer': 'squad_ru_convers_distilrubert_2L', + 'squad_ru_convers_distilrubert_6L_infer': 'squad_ru_convers_distilrubert_6L', + 'squad_ru_rubert': 'squad_ru_bert', + 'squad_ru_rubert_infer': 'squad_ru_bert', + 'squad_torch_bert': 'squad_bert', + 'squad_torch_bert_infer': 'squad_bert' +} diff --git a/deeppavlov/core/common/base.py b/deeppavlov/core/common/base.py new file mode 100644 index 0000000000..e18d548d05 --- /dev/null +++ b/deeppavlov/core/common/base.py @@ -0,0 +1,62 @@ +# Copyright 2021 Neural Networks and Deep Learning lab, MIPT +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
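The overwrite argument added to parse_config above takes dot-separated nested keys, with purely numeric segments treated as list indices. A small sketch of that convention follows, using the same key as the docstring example; the starting config dict and its original class_name value are invented for illustration.

    # Sketch of the dotted-key convention behind parse_config(config, overwrite=...).
    # The toy config and its original class_name are illustrative only.
    config = {"chainer": {"pipe": [{"class_name": "old_vocab"}]}}
    overwrite = {"chainer.pipe.0.class_name": "simple_vocab"}

    for dotted_key, value in overwrite.items():
        keys = [int(k) if k.isdigit() else k for k in dotted_key.split(".")]
        node = config
        for key in keys[:-1]:       # walk down to the parent container
            node = node[key]
        node[keys[-1]] = value      # assign the leaf value

    print(config)
    # -> {'chainer': {'pipe': [{'class_name': 'simple_vocab'}]}}
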
+ +from types import FunctionType +from typing import List, Optional, Union + +from deeppavlov.core.common.chainer import Chainer +from deeppavlov.core.models.component import Component + + +class Element: + """DeepPavlov model pipeline element.""" + def __init__(self, component: Union[Component, FunctionType], + x: Optional[Union[str, list]] = None, + out: Optional[Union[str, list]] = None, + y: Optional[Union[str, list]] = None, + main: bool = False) -> None: + """ + Args: + component: Pipeline component object. + x: Names of the component inference inputs. Output from other pipeline elements with such names will be fed + to the input of this component. + out: Names of the component inference outputs. Component outputs can be fed to other pipeline elements + using this names. + y: Names of additional inputs (targets) for component training and evaluation. + main: Set True if this is the main component. Main component is trained during model training process. + """ + self.component = component + self.x = x + self.y = y + self.out = out + self.main = main + + +class Model(Chainer): + """Builds a component pipeline to train and infer models.""" + def __init__(self, x: Optional[Union[str, list]] = None, + out: Optional[Union[str, list]] = None, + y: Optional[Union[str, list]] = None, + pipe: Optional[List[Element]] = None) -> None: + """ + Args: + x: Names of pipeline inference inputs. + out: Names of pipeline inference outputs. + y: Names of additional inputs (targets) for pipeline training and evaluation. + pipe: List of pipeline elements. + """ + super().__init__(in_x=x, out_params=out, in_y=y) + if pipe is not None: + for element in pipe: + self.append(element.component, element.x, element.out, element.y, element.main) diff --git a/deeppavlov/core/common/check_gpu.py b/deeppavlov/core/common/check_gpu.py deleted file mode 100644 index d768417785..0000000000 --- a/deeppavlov/core/common/check_gpu.py +++ /dev/null @@ -1,38 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
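The Element and Model wrappers introduced above in deeppavlov/core/common/base.py let a pipeline be assembled directly in Python rather than through a JSON config. A minimal sketch follows, assuming a plain function is an acceptable component (Element's type hint allows FunctionType) and that the resulting Model is invoked like any other Chainer; both toy components are invented for illustration.

    # Minimal sketch of the Python pipeline wrappers (Element/Model) added in base.py.
    from deeppavlov.core.common.base import Element, Model

    def lowercase(batch):
        # toy component: lowercase every string in the batch
        return [text.lower() for text in batch]

    def tokenize(batch):
        # toy component: whitespace tokenization
        return [text.split() for text in batch]

    model = Model(
        x=["x"],
        out=["tokens"],
        pipe=[
            Element(component=lowercase, x=["x"], out=["lowercased"]),
            Element(component=tokenize, x=["lowercased"], out=["tokens"]),
        ],
    )

    print(model(["Hello DeepPavlov"]))  # expected: [['hello', 'deeppavlov']]
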
- -from logging import getLogger - -import tensorflow as tf -from tensorflow.python.client import device_lib - -log = getLogger(__name__) - -_gpu_available = None - - -def check_gpu_existence(): - r"""Return True if at least one GPU is available""" - global _gpu_available - if _gpu_available is None: - sess_config = tf.ConfigProto() - sess_config.gpu_options.allow_growth = True - try: - with tf.Session(config=sess_config): - device_list = device_lib.list_local_devices() - _gpu_available = any(device.device_type == 'GPU' for device in device_list) - except AttributeError as e: - log.warning(f'Got an AttributeError `{e}`, assuming documentation building') - _gpu_available = False - return _gpu_available diff --git a/deeppavlov/core/common/file.py b/deeppavlov/core/common/file.py index 6079c68203..212fe3d7c0 100644 --- a/deeppavlov/core/common/file.py +++ b/deeppavlov/core/common/file.py @@ -19,18 +19,33 @@ from pathlib import Path from typing import Union, Any -from ruamel.yaml import YAML +from deeppavlov.core.common.aliases import ALIASES log = getLogger(__name__) +_red_text, _reset_text_color, _sharp_line = "\x1b[31;20m", "\x1b[0m", '#'*80 +DEPRECATOIN_MSG = f"{_red_text}\n\n{_sharp_line}\n" \ + "# The model '{0}' has been removed from the DeepPavlov configs.\n" \ + "# The model '{1}' is used instead.\n" \ + "# To disable this message please switch to '{1}'.\n" \ + "# Automatic name resolving will be disabled in the next release,\n" \ + "# and if you try to use '{0}' you will get an ERROR.\n" \ + f"{_sharp_line}{_reset_text_color}\n" + def find_config(pipeline_config_path: Union[str, Path]) -> Path: + if pipeline_config_path in ALIASES: + new_pipeline_config_path = ALIASES[pipeline_config_path] + log.warning(DEPRECATOIN_MSG.format(pipeline_config_path, new_pipeline_config_path)) + pipeline_config_path = new_pipeline_config_path + if not Path(pipeline_config_path).is_file(): configs = [c for c in Path(__file__).parent.parent.parent.glob(f'configs/**/{pipeline_config_path}.json') if str(c.with_suffix('')).endswith(pipeline_config_path)] # a simple way to not allow * and ? if configs: - log.info(f"Interpreting '{pipeline_config_path}' as '{configs[0]}'") + log.debug(f"Interpreting '{pipeline_config_path}' as '{configs[0]}'") pipeline_config_path = configs[0] + return Path(pipeline_config_path) @@ -52,9 +67,3 @@ def save_pickle(data: dict, fpath: Union[str, Path]) -> None: def load_pickle(fpath: Union[str, Path]) -> Any: with open(fpath, 'rb') as fin: return pickle.load(fin) - - -def read_yaml(fpath: Union[str, Path]) -> dict: - yaml = YAML(typ="safe") - with open(fpath, encoding='utf8') as fin: - return yaml.load(fin) diff --git a/deeppavlov/core/common/log_events.py b/deeppavlov/core/common/log_events.py new file mode 100644 index 0000000000..f6d3c88cbb --- /dev/null +++ b/deeppavlov/core/common/log_events.py @@ -0,0 +1,53 @@ +# Copyright 2019 Neural Networks and Deep Learning lab, MIPT +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
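The ALIASES table from aliases.py together with the find_config change above keeps removed config names resolvable for one more release: the old name is mapped to its replacement and the red deprecation banner defined above is logged. A short sketch of how that looks from user code; the printed path is indicative only and assumes the squad_bert config ships with this version.

    # Sketch: resolving a removed config name through the new ALIASES mapping.
    from deeppavlov.core.common.file import find_config

    config_path = find_config("squad")  # logs the deprecation banner, then resolves to 'squad_bert'
    print(config_path)                  # e.g. .../deeppavlov/configs/squad/squad_bert.json
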
+ +from logging import getLogger +from typing import Optional +from deeppavlov.core.commands.utils import expand_path + +log = getLogger(__name__) + + +class TBWriter: + def __init__(self, tensorboard_log_dir: str): + # TODO: After adding wandb logger, create common parent class for both loggers + from torch.utils.tensorboard import SummaryWriter + tensorboard_log_dir = expand_path(tensorboard_log_dir) + self.tb_train_writer = SummaryWriter(str(tensorboard_log_dir / 'train_log')) + self.tb_valid_writer = SummaryWriter(str(tensorboard_log_dir / 'valid_log')) + + # TODO: find how to write Summary + def write_train(self, tag, scalar_value, global_step): + self.tb_train_writer.add_scalar(tag, scalar_value, global_step) + + def write_valid(self, tag, scalar_value, global_step): + self.tb_valid_writer.add_scalar(tag, scalar_value, global_step) + + def flush(self): + self.tb_train_writer.flush() + self.tb_valid_writer.flush() + + +def get_tb_writer(tensorboard_log_dir: Optional[str]) -> Optional[TBWriter]: + try: + if tensorboard_log_dir is not None: + tb_writer = TBWriter(tensorboard_log_dir) + else: + tb_writer = None + except ImportError: + log.error('Failed to import SummaryWriter from torch.utils.tensorboard. Failed to initialize Tensorboard ' + 'logger. Install appropriate Pytorch version to use this logger or remove tensorboard_log_dir ' + 'parameter from the train parameters list in the configuration file.') + tb_writer = None + return tb_writer diff --git a/deeppavlov/core/common/metrics_registry.json b/deeppavlov/core/common/metrics_registry.json index f41fa24f41..c1f1a6c7a0 100644 --- a/deeppavlov/core/common/metrics_registry.json +++ b/deeppavlov/core/common/metrics_registry.json @@ -20,7 +20,6 @@ "ner_f1": "deeppavlov.metrics.fmeasure:ner_f1", "ner_token_f1": "deeppavlov.metrics.fmeasure:ner_token_f1", "pearson_correlation": "deeppavlov.metrics.correlation:pearson_correlation", - "per_item_action_accuracy": "deeppavlov.metrics.accuracy:per_item_action_accuracy", "per_item_bleu": "deeppavlov.metrics.bleu:per_item_bleu", "per_item_dialog_accuracy": "deeppavlov.metrics.accuracy:per_item_dialog_accuracy", "per_item_dialog_bleu": "deeppavlov.metrics.bleu:per_item_dialog_bleu", diff --git a/deeppavlov/core/common/metrics_registry.py b/deeppavlov/core/common/metrics_registry.py index 55d7d34f9d..4c64533aaa 100644 --- a/deeppavlov/core/common/metrics_registry.py +++ b/deeppavlov/core/common/metrics_registry.py @@ -34,11 +34,13 @@ def fn_from_str(name: str) -> Callable[..., Any]: """Returns a function object with the name given in string.""" try: module_name, fn_name = name.split(':') + return getattr(importlib.import_module(module_name), fn_name) except ValueError: raise ConfigError('Expected function description in a `module.submodules:function_name` form, but got `{}`' .format(name)) - - return getattr(importlib.import_module(module_name), fn_name) + except AttributeError: + # noinspection PyUnboundLocalVariable + raise ConfigError(f"Incorrect metric: '{module_name}' has no attribute '{fn_name}'.") def register_metric(metric_name: str) -> Callable[..., Any]: @@ -57,6 +59,5 @@ def decorate(fn): def get_metric_by_name(name: str) -> Callable[..., Any]: """Returns a metric callable with a corresponding name.""" - if name not in _REGISTRY: - raise ConfigError(f'"{name}" is not registered as a metric') - return fn_from_str(_REGISTRY[name]) + name = _REGISTRY.get(name, name) + return fn_from_str(name) diff --git a/deeppavlov/core/common/params.py b/deeppavlov/core/common/params.py index 
e74bb594a7..0abd8644ba 100644 --- a/deeppavlov/core/common/params.py +++ b/deeppavlov/core/common/params.py @@ -77,7 +77,7 @@ def from_params(params: Dict, mode: str = 'infer', serialized: Any = None, **kwa from deeppavlov.core.commands.infer import build_model refs = _refs.copy() _refs.clear() - config = parse_config(expand_path(config_params['config_path'])) + config = parse_config(expand_path(config_params['config_path']), config_params.get('overwrite')) model = build_model(config, serialized=serialized) _refs.clear() _refs.update(refs) diff --git a/deeppavlov/core/common/registry.json b/deeppavlov/core/common/registry.json index 0e5891e11f..42f0df484e 100644 --- a/deeppavlov/core/common/registry.json +++ b/deeppavlov/core/common/registry.json @@ -1,108 +1,35 @@ { - "UD_pymorphy_lemmatizer": "deeppavlov.models.morpho_tagger.lemmatizer:UDPymorphyLemmatizer", - "aiml_skill": "deeppavlov.skills.aiml_skill.aiml_skill:AIMLSkill", + "answer_types_extractor": "deeppavlov.models.kbqa.type_define:AnswerTypesExtractor", "api_requester": "deeppavlov.models.api_requester.api_requester:ApiRequester", "api_router": "deeppavlov.models.api_requester.api_router:ApiRouter", - "base64_decode_bytesIO": "deeppavlov.models.nemo.common:ascii_to_bytes_io", "basic_classification_iterator": "deeppavlov.dataset_iterators.basic_classification_iterator:BasicClassificationDatasetIterator", "basic_classification_reader": "deeppavlov.dataset_readers.basic_classification_reader:BasicClassificationDatasetReader", - "bert_classifier": "deeppavlov.models.bert.bert_classifier:BertClassifierModel", - "bert_ner_preprocessor": "deeppavlov.models.preprocessors.bert_preprocessor:BertNerPreprocessor", - "bert_preprocessor": "deeppavlov.models.preprocessors.bert_preprocessor:BertPreprocessor", - "bert_ranker": "deeppavlov.models.bert.bert_ranker:BertRankerModel", - "bert_ranker_preprocessor": "deeppavlov.models.preprocessors.bert_preprocessor:BertRankerPreprocessor", - "bert_sep_ranker": "deeppavlov.models.bert.bert_ranker:BertSepRankerModel", - "bert_sep_ranker_predictor": "deeppavlov.models.bert.bert_ranker:BertSepRankerPredictor", - "bert_sep_ranker_predictor_preprocessor": "deeppavlov.models.preprocessors.bert_preprocessor:BertSepRankerPredictorPreprocessor", - "bert_sep_ranker_preprocessor": "deeppavlov.models.preprocessors.bert_preprocessor:BertSepRankerPreprocessor", - "bert_sequence_network": "deeppavlov.models.bert.bert_sequence_tagger:BertSequenceNetwork", - "bert_sequence_tagger": "deeppavlov.models.bert.bert_sequence_tagger:BertSequenceTagger", - "bert_syntax_parser": "deeppavlov.models.syntax_parser.network:BertSyntaxParser", - "bilstm_gru_nn": "deeppavlov.models.ranking.bilstm_gru_siamese_network:BiLSTMGRUSiameseNetwork", - "bilstm_nn": "deeppavlov.models.ranking.bilstm_siamese_network:BiLSTMSiameseNetwork", "boolqa_reader": "deeppavlov.dataset_readers.boolqa_reader:BoolqaReader", - "bow": "deeppavlov.models.embedders.bow_embedder:BoWEmbedder", - "bytesIO_encode_base64": "deeppavlov.models.nemo.common:bytes_io_to_ascii", - "capitalization_featurizer": "deeppavlov.models.preprocessors.capitalization:CapitalizationPreprocessor", - "char_splitter": "deeppavlov.models.preprocessors.char_splitter:CharSplitter", - "char_splitting_lowercase_preprocessor": "deeppavlov.models.preprocessors.capitalization:CharSplittingLowercasePreprocessor", - "chu_liu_edmonds_transformer": "deeppavlov.models.syntax_parser.parser:ChuLiuEdmonds", "conll2003_reader": "deeppavlov.dataset_readers.conll2003_reader:Conll2003DatasetReader", - 
"convert_ids2tags": "deeppavlov.models.preprocessors.ner_preprocessor:ConvertIds2Tags", "cos_sim_classifier": "deeppavlov.models.classifiers.cos_sim_classifier:CosineSimilarityClassifier", - "dam_nn_use_transformer": "deeppavlov.models.ranking.deep_attention_matching_network_use_transformer:DAMNetworkUSETransformer", "data_fitting_iterator": "deeppavlov.core.data.data_fitting_iterator:DataFittingIterator", "data_learning_iterator": "deeppavlov.core.data.data_learning_iterator:DataLearningIterator", - "dependency_output_prettifier": "deeppavlov.models.morpho_tagger.common:DependencyOutputPrettifier", - "dialog_component_wrapper": "deeppavlov.models.go_bot.wrapper:DialogComponentWrapper", - "dialog_db_result_iterator": "deeppavlov.dataset_iterators.dialog_iterator:DialogDBResultDatasetIterator", - "dialog_indexing_iterator": "deeppavlov.dataset_iterators.dialog_iterator:DialogDatasetIndexingIterator", - "dialog_iterator": "deeppavlov.dataset_iterators.dialog_iterator:DialogDatasetIterator", - "dictionary_vectorizer": "deeppavlov.models.vectorizers.word_vectorizer:DictionaryVectorizer", "dirty_comments_preprocessor": "deeppavlov.models.preprocessors.dirty_comments_preprocessor:DirtyCommentsPreprocessor", "docred_reader": "deeppavlov.dataset_readers.docred_reader:DocREDDatasetReader", "document_chunker": "deeppavlov.models.preprocessors.odqa_preprocessors:DocumentChunker", - "dstc2_intents_iterator": "deeppavlov.dataset_iterators.dstc2_intents_iterator:Dstc2IntentsDatasetIterator", - "dstc2_ner_iterator": "deeppavlov.dataset_iterators.dstc2_ner_iterator:Dstc2NerDatasetIterator", - "dstc2_reader": "deeppavlov.dataset_readers.dstc2_reader:DSTC2DatasetReader", - "dstc_slotfilling": "deeppavlov.models.slotfill.slotfill:DstcSlotFillingNetwork", - "elmo_embedder": "deeppavlov.models.embedders.elmo_embedder:ELMoEmbedder", - "elmo_file_paths_iterator": "deeppavlov.dataset_iterators.elmo_file_paths_iterator:ELMoFilePathsIterator", - "elmo_model": "deeppavlov.models.elmo.elmo:ELMo", - "emb_mat_assembler": "deeppavlov.models.preprocessors.assemble_embeddings_matrix:EmbeddingsMatrixAssembler", - "entity_detection_parser": "deeppavlov.models.kbqa.entity_detection_parser:EntityDetectionParser", - "entity_linker": "deeppavlov.models.kbqa.entity_linking:EntityLinker", + "entity_detection_parser": "deeppavlov.models.entity_extraction.entity_detection_parser:EntityDetectionParser", + "entity_linker": "deeppavlov.models.entity_extraction.entity_linking:EntityLinker", "faq_reader": "deeppavlov.dataset_readers.faq_reader:FaqDatasetReader", "fasttext": "deeppavlov.models.embedders.fasttext_embedder:FasttextEmbedder", - "featurized_tracker": "deeppavlov.models.go_bot.tracker.featurized_tracker:FeaturizedTracker", - "file_paths_iterator": "deeppavlov.dataset_iterators.file_paths_iterator:FilePathsIterator", - "file_paths_reader": "deeppavlov.dataset_readers.file_paths_reader:FilePathsReader", "fit_trainer": "deeppavlov.core.trainers.fit_trainer:FitTrainer", - "glove": "deeppavlov.models.embedders.glove_embedder:GloVeEmbedder", - "go_bot": "deeppavlov.models.go_bot.go_bot:GoalOrientedBot", - "gobot_json_nlg_manager": "deeppavlov.models.go_bot.nlg.mock_json_nlg_manager:MockJSONNLGManager", - "gobot_nlg_manager": "deeppavlov.models.go_bot.nlg.nlg_manager:NLGManager", "hashing_tfidf_vectorizer": "deeppavlov.models.vectorizers.hashing_tfidf_vectorizer:HashingTfIdfVectorizer", "huggingface_dataset_iterator": "deeppavlov.dataset_iterators.huggingface_dataset_iterator:HuggingFaceDatasetIterator", 
"huggingface_dataset_reader": "deeppavlov.dataset_readers.huggingface_dataset_reader:HuggingFaceDatasetReader", - "hybrid_ner_model": "deeppavlov.models.ner.NER_model:HybridNerModel", "imdb_reader": "deeppavlov.dataset_readers.imdb_reader:ImdbReader", - "input_splitter": "deeppavlov.models.multitask_bert.multitask_bert:InputSplitter", - "jieba_tokenizer": "deeppavlov.models.tokenizers.jieba_tokenizer:JiebaTokenizer", - "joint_tagger_parser": "deeppavlov.models.syntax_parser.joint:JointTaggerParser", - "kbqa_entity_linker": "deeppavlov.models.kbqa.kbqa_entity_linking:KBEntityLinker", - "kbqa_reader": "deeppavlov.dataset_readers.kbqa_reader:KBQAReader", "kenlm_elector": "deeppavlov.models.spelling_correction.electors.kenlm_elector:KenlmElector", - "keras_classification_model": "deeppavlov.models.classifiers.keras_classification_model:KerasClassificationModel", - "kvret_dialog_iterator": "deeppavlov.dataset_iterators.kvret_dialog_iterator:KvretDialogDatasetIterator", - "kvret_reader": "deeppavlov.dataset_readers.kvret_reader:KvretDatasetReader", - "lazy_tokenizer": "deeppavlov.models.tokenizers.lazy_tokenizer:LazyTokenizer", - "lemmatized_output_prettifier": "deeppavlov.models.morpho_tagger.common:LemmatizedOutputPrettifier", "line_reader": "deeppavlov.dataset_readers.line_reader:LineReader", "logit_ranker": "deeppavlov.models.doc_retrieval.logit_ranker:LogitRanker", "mask": "deeppavlov.models.preprocessors.mask:Mask", - "md_yaml_dialogs_reader": "deeppavlov.dataset_readers.md_yaml_dialogs_reader:MD_YAML_DialogsDatasetReader", - "morpho_tagger": "deeppavlov.models.morpho_tagger.morpho_tagger:MorphoTagger", - "morphotagger_dataset": "deeppavlov.dataset_iterators.morphotagger_iterator:MorphoTaggerDatasetIterator", - "morphotagger_dataset_reader": "deeppavlov.dataset_readers.morphotagging_dataset_reader:MorphotaggerDatasetReader", - "mpm_nn": "deeppavlov.models.ranking.mpm_siamese_network:MPMSiameseNetwork", - "mt_bert": "deeppavlov.models.multitask_bert.multitask_bert:MultiTaskBert", - "mt_bert_classification_task": "deeppavlov.models.multitask_bert.multitask_bert:MTBertClassificationTask", - "mt_bert_reuser": "deeppavlov.models.multitask_bert.multitask_bert:MTBertReUser", - "mt_bert_seq_tagging_task": "deeppavlov.models.multitask_bert.multitask_bert:MTBertSequenceTaggingTask", "multi_squad_dataset_reader": "deeppavlov.dataset_readers.squad_dataset_reader:MultiSquadDatasetReader", "multi_squad_iterator": "deeppavlov.dataset_iterators.squad_iterator:MultiSquadIterator", "multi_squad_retr_iterator": "deeppavlov.dataset_iterators.squad_iterator:MultiSquadRetrIterator", - "multitask_iterator": "deeppavlov.dataset_iterators.multitask_iterator:MultiTaskIterator", - "multitask_reader": "deeppavlov.dataset_readers.multitask_reader:MultiTaskReader", - "nemo_asr": "deeppavlov.models.nemo.asr:NeMoASR", - "nemo_tts": "deeppavlov.models.nemo.tts:NeMoTTS", - "ner": "deeppavlov.models.ner.network:NerNetwork", - "ner_bio_converter": "deeppavlov.models.ner.bio:BIOMarkupRestorer", - "ner_chunker": "deeppavlov.models.kbqa.entity_linking:NerChunker", - "ner_few_shot_iterator": "deeppavlov.dataset_iterators.ner_few_shot_iterator:NERFewShotIterator", - "ner_preprocessor": "deeppavlov.models.preprocessors.ner_preprocessor:NerPreprocessor", - "ner_svm": "deeppavlov.models.ner.svm:SVMTagger", + "ner_chunk_model": "deeppavlov.models.entity_extraction.ner_chunker:NerChunkModel", + "ner_chunker": "deeppavlov.models.entity_extraction.ner_chunker:NerChunker", "ner_vocab": 
"deeppavlov.models.preprocessors.ner_preprocessor:NerVocab", "nltk_moses_tokenizer": "deeppavlov.models.tokenizers.nltk_moses_tokenizer:NLTKMosesTokenizer", "nltk_tokenizer": "deeppavlov.models.tokenizers.nltk_tokenizer:NLTKTokenizer", @@ -113,100 +40,70 @@ "paraphraser_reader": "deeppavlov.dataset_readers.paraphraser_reader:ParaphraserReader", "pop_ranker": "deeppavlov.models.doc_retrieval.pop_ranker:PopRanker", "proba2labels": "deeppavlov.models.classifiers.proba2labels:Proba2Labels", - "pymorphy_russian_lemmatizer": "deeppavlov.models.preprocessors.russian_lemmatizer:PymorphyRussianLemmatizer", - "pymorphy_vectorizer": "deeppavlov.models.vectorizers.word_vectorizer:PymorphyVectorizer", "query_generator": "deeppavlov.models.kbqa.query_generator:QueryGenerator", - "query_generator_online": "deeppavlov.models.kbqa.query_generator_online:QueryGeneratorOnline", - "question_sign_checker": "deeppavlov.models.kbqa.entity_detection_parser:QuestionSignChecker", - "random_emb_mat": "deeppavlov.models.preprocessors.random_embeddings_matrix:RandomEmbeddingsMatrix", - "rasa_skill": "deeppavlov.skills.rasa_skill.rasa_skill:RASASkill", - "rel_ranker": "deeppavlov.models.ranking.rel_ranker:RelRanker", - "rel_ranking_bert_infer": "deeppavlov.models.kbqa.rel_ranking_bert_infer:RelRankerBertInfer", + "question_sign_checker": "deeppavlov.models.entity_extraction.entity_detection_parser:question_sign_checker", + "re_classifier": "deeppavlov.models.relation_extraction.relation_extraction_bert:REBertModel", + "re_postprocessor": "deeppavlov.models.preprocessors.re_preprocessor:REPostprocessor", + "re_preprocessor": "deeppavlov.models.preprocessors.re_preprocessor:REPreprocessor", "rel_ranking_infer": "deeppavlov.models.kbqa.rel_ranking_infer:RelRankerInfer", + "rel_ranking_preprocessor": "deeppavlov.models.preprocessors.torch_transformers_preprocessor:RelRankingPreprocessor", "rel_ranking_reader": "deeppavlov.dataset_readers.rel_ranking_reader:ParaphraserReader", - "re_postprocessor": "deeppavlov.models.preprocessors.re_preprocessor:REPostprocessor", - "re_classifier": "deeppavlov.models.relation_extraction.relation_extraction_bert:REBertModel", "response_base_loader": "deeppavlov.models.preprocessors.response_base_loader:ResponseBaseLoader", "ru_adj_to_noun": "deeppavlov.models.kbqa.tree_to_sparql:RuAdjToNoun", - "ru_obscenity_classifier": "deeppavlov.models.classifiers.ru_obscenity_classifier:RuObscenityClassifier", - "ru_sent_tokenizer": "deeppavlov.models.tokenizers.ru_sent_tokenizer:RuSentTokenizer", "ru_tokenizer": "deeppavlov.models.tokenizers.ru_tokenizer:RussianTokenizer", "rured_reader": "deeppavlov.dataset_readers.rured_reader:RuREDDatasetReader", "russian_words_vocab": "deeppavlov.vocabs.typos:RussianWordsVocab", "sanitizer": "deeppavlov.models.preprocessors.sanitizer:Sanitizer", "sentseg_restore_sent": "deeppavlov.models.preprocessors.sentseg_preprocessor:SentSegRestoreSent", "siamese_iterator": "deeppavlov.dataset_iterators.siamese_iterator:SiameseIterator", - "siamese_predictor": "deeppavlov.models.ranking.siamese_predictor:SiamesePredictor", - "siamese_preprocessor": "deeppavlov.models.preprocessors.siamese_preprocessor:SiamesePreprocessor", - "siamese_reader": "deeppavlov.dataset_readers.siamese_reader:SiameseReader", - "simple_dstc2_reader": "deeppavlov.dataset_readers.dstc2_reader:SimpleDSTC2DatasetReader", "simple_vocab": "deeppavlov.core.data.simple_vocab:SimpleVocabulary", "sklearn_component": "deeppavlov.models.sklearn.sklearn_component:SklearnComponent", - "slotfill_raw": 
"deeppavlov.models.slotfill.slotfill_raw:SlotFillingComponent", - "slotfill_raw_rasa": "deeppavlov.models.slotfill.slotfill_raw:RASA_SlotFillingComponent", - "smn_nn": "deeppavlov.models.ranking.sequential_matching_network:SMNNetwork", - "snips_intents_iterator": "deeppavlov.dataset_iterators.snips_intents_iterator:SnipsIntentIterator", - "snips_ner_iterator": "deeppavlov.dataset_iterators.snips_ner_iterator:SnipsNerIterator", - "snips_reader": "deeppavlov.dataset_readers.snips_reader:SnipsReader", + "slovnet_syntax_parser": "deeppavlov.models.kbqa.tree_to_sparql:SlovnetSyntaxParser", "spelling_error_model": "deeppavlov.models.spelling_correction.brillmoore.error_model:ErrorModel", "spelling_levenshtein": "deeppavlov.models.spelling_correction.levenshtein.searcher_component:LevenshteinSearcherComponent", "split_tokenizer": "deeppavlov.models.tokenizers.split_tokenizer:SplitTokenizer", "sq_reader": "deeppavlov.dataset_readers.sq_reader:OntonotesReader", - "sqlite_database": "deeppavlov.core.data.sqlite_database:Sqlite3Database", "sqlite_iterator": "deeppavlov.dataset_iterators.sqlite_iterator:SQLiteDataIterator", - "squad_ans_postprocessor": "deeppavlov.models.preprocessors.squad_preprocessor:SquadAnsPostprocessor", - "squad_ans_preprocessor": "deeppavlov.models.preprocessors.squad_preprocessor:SquadAnsPreprocessor", "squad_bert_ans_postprocessor": "deeppavlov.models.preprocessors.squad_preprocessor:SquadBertAnsPostprocessor", "squad_bert_ans_preprocessor": "deeppavlov.models.preprocessors.squad_preprocessor:SquadBertAnsPreprocessor", - "squad_bert_infer": "deeppavlov.models.bert.bert_squad:BertSQuADInferModel", "squad_bert_mapping": "deeppavlov.models.preprocessors.squad_preprocessor:SquadBertMappingPreprocessor", - "squad_bert_model": "deeppavlov.models.bert.bert_squad:BertSQuADModel", "squad_dataset_reader": "deeppavlov.dataset_readers.squad_dataset_reader:SquadDatasetReader", "squad_iterator": "deeppavlov.dataset_iterators.squad_iterator:SquadIterator", - "squad_model": "deeppavlov.models.squad.squad:SquadModel", - "squad_preprocessor": "deeppavlov.models.preprocessors.squad_preprocessor:SquadPreprocessor", - "squad_vocab_embedder": "deeppavlov.models.preprocessors.squad_preprocessor:SquadVocabEmbedder", "static_dictionary": "deeppavlov.vocabs.typos:StaticDictionary", "str_lower": "deeppavlov.models.preprocessors.str_lower:str_lower", "str_token_reverser": "deeppavlov.models.preprocessors.str_token_reverser:StrTokenReverser", "str_utf8_encoder": "deeppavlov.models.preprocessors.str_utf8_encoder:StrUTF8Encoder", "stream_spacy_tokenizer": "deeppavlov.models.tokenizers.spacy_tokenizer:StreamSpacyTokenizer", "string_multiplier": "deeppavlov.models.preprocessors.odqa_preprocessors:StringMultiplier", - "tag_output_prettifier": "deeppavlov.models.morpho_tagger.common:TagOutputPrettifier", "template_matcher": "deeppavlov.models.kbqa.template_matcher:TemplateMatcher", "tfidf_ranker": "deeppavlov.models.doc_retrieval.tfidf_ranker:TfidfRanker", "tfidf_weighted": "deeppavlov.models.embedders.tfidf_weighted_embedder:TfidfWeightedEmbedder", "top1_elector": "deeppavlov.models.spelling_correction.electors.top1_elector:TopOneElector", - "torch_squad_transformers_preprocessor": "deeppavlov.models.preprocessors.torch_transformers_preprocessor:TorchSquadTransformersPreprocessor", - "torch_transformers_ner_preprocessor": "deeppavlov.models.preprocessors.torch_transformers_preprocessor:TorchTransformersNerPreprocessor", - "torch_transformers_multiplechoice_preprocessor": 
"deeppavlov.models.preprocessors.torch_transformers_preprocessor:TorchTransformersMultiplechoicePreprocessor", - "torch_transformers_multiplechoice": "deeppavlov.models.torch_bert.torch_transformers_multiplechoice:TorchTransformersMultiplechoiceModel", "torch_bert_ranker": "deeppavlov.models.torch_bert.torch_bert_ranker:TorchBertRankerModel", "torch_bert_ranker_preprocessor": "deeppavlov.models.preprocessors.torch_transformers_preprocessor:TorchBertRankerPreprocessor", - "torch_transformers_sequence_tagger": "deeppavlov.models.torch_bert.torch_transformers_sequence_tagger:TorchTransformersSequenceTagger", - "torch_transformers_squad_infer": "deeppavlov.models.torch_bert.torch_transformers_squad:TorchTransformersSquadInfer", - "torch_transformers_squad": "deeppavlov.models.torch_bert.torch_transformers_squad:TorchTransformersSquad", - "torch_text_classification_model": "deeppavlov.models.classifiers.torch_classification_model:TorchTextClassificationModel", "torch_record_postprocessor": "deeppavlov.models.preprocessors.torch_transformers_preprocessor:TorchRecordPostprocessor", + "torch_squad_transformers_preprocessor": "deeppavlov.models.preprocessors.torch_transformers_preprocessor:TorchSquadTransformersPreprocessor", + "torch_text_classification_model": "deeppavlov.models.classifiers.torch_classification_model:TorchTextClassificationModel", "torch_trainer": "deeppavlov.core.trainers.torch_trainer:TorchTrainer", "torch_transformers_classifier": "deeppavlov.models.torch_bert.torch_transformers_classifier:TorchTransformersClassifierModel", + "torch_transformers_el_ranker": "deeppavlov.models.torch_bert.torch_transformers_el_ranker:TorchTransformersElRanker", + "torch_transformers_entity_ranker_infer": "deeppavlov.models.torch_bert.torch_transformers_el_ranker:TorchTransformersEntityRankerInfer", + "torch_transformers_entity_ranker_preprocessor": "deeppavlov.models.preprocessors.torch_transformers_preprocessor:TorchTransformersEntityRankerPreprocessor", + "torch_transformers_multiplechoice": "deeppavlov.models.torch_bert.torch_transformers_multiplechoice:TorchTransformersMultiplechoiceModel", + "torch_transformers_multiplechoice_preprocessor": "deeppavlov.models.preprocessors.torch_transformers_preprocessor:TorchTransformersMultiplechoicePreprocessor", + "torch_transformers_ner_preprocessor": "deeppavlov.models.preprocessors.torch_transformers_preprocessor:TorchTransformersNerPreprocessor", "torch_transformers_preprocessor": "deeppavlov.models.preprocessors.torch_transformers_preprocessor:TorchTransformersPreprocessor", - "re_preprocessor": "deeppavlov.models.preprocessors.re_preprocessor:REPreprocessor", - "torchtext_classification_data_reader": "deeppavlov.dataset_readers.torchtext_classification_data_reader:TorchtextClassificationDataReader", + "torch_transformers_sequence_tagger": "deeppavlov.models.torch_bert.torch_transformers_sequence_tagger:TorchTransformersSequenceTagger", + "torch_transformers_squad": "deeppavlov.models.torch_bert.torch_transformers_squad:TorchTransformersSquad", "transformers_bert_embedder": "deeppavlov.models.embedders.transformers_embedder:TransformersBertEmbedder", "transformers_bert_preprocessor": "deeppavlov.models.preprocessors.transformers_preprocessor:TransformersBertPreprocessor", "tree_to_sparql": "deeppavlov.models.kbqa.tree_to_sparql:TreeToSparql", - "two_sentences_emb": "deeppavlov.models.ranking.rel_ranker:TwoSentencesEmbedder", "typos_custom_reader": "deeppavlov.dataset_readers.typos_reader:TyposCustom", "typos_iterator": 
"deeppavlov.dataset_iterators.typos_iterator:TyposDatasetIterator", "typos_kartaslov_reader": "deeppavlov.dataset_readers.typos_reader:TyposKartaslov", "typos_wikipedia_reader": "deeppavlov.dataset_readers.typos_reader:TyposWikipedia", - "ubuntu_v2_mt_reader": "deeppavlov.dataset_readers.ubuntu_v2_mt_reader:UbuntuV2MTReader", "ubuntu_v2_reader": "deeppavlov.dataset_readers.ubuntu_v2_reader:UbuntuV2Reader", "wiki_parser": "deeppavlov.models.kbqa.wiki_parser:WikiParser", - "wiki_parser_online": "deeppavlov.models.kbqa.wiki_parser_online:WikiParserOnline", "wiki_sqlite_vocab": "deeppavlov.vocabs.wiki_sqlite:WikiSQLiteVocab", - "wikitionary_100K_vocab": "deeppavlov.vocabs.typos:Wiki100KDictionary", - "intent_catcher_reader": "deeppavlov.dataset_readers.intent_catcher_reader:IntentCatcherReader", - "intent_catcher": "deeppavlov.models.intent_catcher.intent_catcher:IntentCatcher" + "wikitionary_100K_vocab": "deeppavlov.vocabs.typos:Wiki100KDictionary" } diff --git a/deeppavlov/core/common/requirements_registry.json b/deeppavlov/core/common/requirements_registry.json index 25754cab2d..d65eba771e 100644 --- a/deeppavlov/core/common/requirements_registry.json +++ b/deeppavlov/core/common/requirements_registry.json @@ -1,363 +1,167 @@ { - "UD_pymorphy_lemmatizer": [ - "{DEEPPAVLOV_PATH}/requirements/morpho_tagger.txt", - "{DEEPPAVLOV_PATH}/requirements/tf.txt" - ], - "aiml_skill": [ - "{DEEPPAVLOV_PATH}/requirements/aiml_skill.txt" - ], - "bert_classifier": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt" - ], - "bert_ner_preprocessor": [ - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt", - "{DEEPPAVLOV_PATH}/requirements/tf.txt" - ], - "bert_preprocessor": [ - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt", - "{DEEPPAVLOV_PATH}/requirements/tf.txt" - ], - "bert_ranker": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt" - ], - "bert_ranker_preprocessor": [ - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt", - "{DEEPPAVLOV_PATH}/requirements/tf.txt" - ], - "bert_sep_ranker": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt" - ], - "bert_sep_ranker_predictor": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt" - ], - "bert_sep_ranker_predictor_preprocessor": [ - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt", - "{DEEPPAVLOV_PATH}/requirements/tf.txt" - ], - "bert_sep_ranker_preprocessor": [ - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt", - "{DEEPPAVLOV_PATH}/requirements/tf.txt" - ], - "bert_sequence_network": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt" - ], - "bert_sequence_tagger": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt" - ], - "bert_syntax_parser": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt" - ], - "chu_liu_edmonds_transformer": [ - "{DEEPPAVLOV_PATH}/requirements/syntax_parser.txt" - ], - "dam_nn_use_transformer": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/tf-hub.txt" - ], - "dependency_output_prettifier": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt" - ], - "dictionary_vectorizer": [ - "{DEEPPAVLOV_PATH}/requirements/morpho_tagger.txt", - "{DEEPPAVLOV_PATH}/requirements/tf.txt" + "answer_types_extractor": [ + "{DEEPPAVLOV_PATH}/requirements/en_core_web_sm.txt", + "{DEEPPAVLOV_PATH}/requirements/ru_core_news_sm.txt" ], - "dstc_slotfilling": [ + "entity_linker": [ + 
"{DEEPPAVLOV_PATH}/requirements/hdt.txt", "{DEEPPAVLOV_PATH}/requirements/rapidfuzz.txt" ], - "elmo_embedder": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/tf-hub.txt" - ], - "elmo_model": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/tf-hub.txt" - ], "fasttext": [ "{DEEPPAVLOV_PATH}/requirements/fasttext.txt" ], - "glove": [ - "{DEEPPAVLOV_PATH}/requirements/gensim.txt" - ], - "go_bot": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt" - ], - "hybrid_ner_model": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/tf-hub.txt", - "{DEEPPAVLOV_PATH}/requirements/gensim.txt" - ], - "input_splitter": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt" - ], - "jieba_tokenizer": [ - "{DEEPPAVLOV_PATH}/requirements/jieba.txt" + "huggingface_dataset_iterator": [ + "{DEEPPAVLOV_PATH}/requirements/datasets.txt" ], - "joint_tagger_parser": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt" + "huggingface_dataset_reader": [ + "{DEEPPAVLOV_PATH}/requirements/datasets.txt" ], "kenlm_elector": [ "{DEEPPAVLOV_PATH}/requirements/kenlm.txt" ], - "keras_classification_model": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt" - ], - "lemmatized_output_prettifier": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt" - ], - "morpho_tagger": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt" - ], - "mpm_nn": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt" - ], - "mt_bert": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt" - ], - "mt_bert_classification_task": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt" + "ner_chunk_model": [ + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", + "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], - "mt_bert_reuser": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt" + "ner_chunker": [ + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", + "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], - "mt_bert_seq_tagging_task": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt" + "nltk_moses_tokenizer": [ + "{DEEPPAVLOV_PATH}/requirements/sacremoses.txt" ], - "ner": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt" + "query_generator": [ + "{DEEPPAVLOV_PATH}/requirements/en_core_web_sm.txt", + "{DEEPPAVLOV_PATH}/requirements/hdt.txt", + "{DEEPPAVLOV_PATH}/requirements/rapidfuzz.txt", + "{DEEPPAVLOV_PATH}/requirements/whapi.txt" ], - "pymorphy_vectorizer": [ - "{DEEPPAVLOV_PATH}/requirements/morpho_tagger.txt", - "{DEEPPAVLOV_PATH}/requirements/tf.txt" + "re_classifier": [ + "{DEEPPAVLOV_PATH}/requirements/opt_einsum.txt", + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", + "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], - "rasa_skill": [ - "{DEEPPAVLOV_PATH}/requirements/rasa_skill.txt", - "{DEEPPAVLOV_PATH}/requirements/tf.txt" + "re_postprocessor": [ + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", + "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], - "rel_ranker": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt" + "re_preprocessor": [ + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", + "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], "rel_ranking_infer": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt" - ], - "siamese_predictor": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt" - ], - "smn_nn": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt" - ], - "squad_bert_infer": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - 
"{DEEPPAVLOV_PATH}/requirements/bert_dp.txt" - ], - "squad_bert_model": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt" - ], - "squad_model": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt" - ], - "stream_spacy_tokenizer": [ - "{DEEPPAVLOV_PATH}/requirements/spacy.txt", - "{DEEPPAVLOV_PATH}/requirements/en_core_web_sm.txt" - ], - "tag_output_prettifier": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt" + "{DEEPPAVLOV_PATH}/requirements/en_core_web_sm.txt", + "{DEEPPAVLOV_PATH}/requirements/hdt.txt" ], - "two_sentences_emb": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt" + "rel_ranking_preprocessor": [ + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", + "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], - "bilstm_gru_nn": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt" + "ru_adj_to_noun": [ + "{DEEPPAVLOV_PATH}/requirements/udapi.txt" ], - "wiki_parser": [ - "{DEEPPAVLOV_PATH}/requirements/hdt.txt" + "russian_words_vocab": [ + "{DEEPPAVLOV_PATH}/requirements/lxml.txt" ], - "bilstm_nn": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt" + "slovnet_syntax_parser": [ + "{DEEPPAVLOV_PATH}/requirements/slovnet.txt" ], - "typos_wikipedia_reader": [ + "spelling_error_model": [ "{DEEPPAVLOV_PATH}/requirements/lxml.txt" ], + "spelling_levenshtein": [ + "{DEEPPAVLOV_PATH}/requirements/sortedcontainers.txt" + ], "static_dictionary": [ "{DEEPPAVLOV_PATH}/requirements/lxml.txt" ], - "base64_decode_bytesIO": [ - "{DEEPPAVLOV_PATH}/requirements/nemo.txt" + "stream_spacy_tokenizer": [ + "{DEEPPAVLOV_PATH}/requirements/en_core_web_sm.txt" ], - "wikitionary_100K_vocab": [ - "{DEEPPAVLOV_PATH}/requirements/lxml.txt" + "torch_bert_ranker": [ + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", + "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], - "huggingface_dataset_iterator": [ - "{DEEPPAVLOV_PATH}/requirements/datasets.txt" + "torch_bert_ranker_preprocessor": [ + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", + "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], - "bytesIO_encode_base64": [ - "{DEEPPAVLOV_PATH}/requirements/nemo.txt" + "torch_record_postprocessor": [ + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", + "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], - "typos_custom_reader": [ - "{DEEPPAVLOV_PATH}/requirements/lxml.txt" + "torch_squad_transformers_preprocessor": [ + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", + "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], "torch_text_classification_model": [ - "{DEEPPAVLOV_PATH}/requirements/pytorch16.txt" + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt" ], - "huggingface_dataset_reader": [ - "{DEEPPAVLOV_PATH}/requirements/datasets.txt" + "torch_transformers_classifier": [ + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", + "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], - "tree_to_sparql": [ - "{DEEPPAVLOV_PATH}/requirements/udapi.txt" + "torch_transformers_el_ranker": [ + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", + "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], - "torch_squad_bert_model": [ - "{DEEPPAVLOV_PATH}/requirements/pytorch16.txt", + "torch_transformers_entity_ranker_infer": [ + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], - "torch_transformers_preprocessor": [ - "{DEEPPAVLOV_PATH}/requirements/pytorch16.txt", + "torch_transformers_entity_ranker_preprocessor": [ + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], - "torch_squad_transformers_preprocessor": [ - 
"{DEEPPAVLOV_PATH}/requirements/pytorch16.txt", + "torch_transformers_multiplechoice": [ + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], "torch_transformers_multiplechoice_preprocessor": [ - "{DEEPPAVLOV_PATH}/requirements/pytorch16.txt", + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], - "torch_transformers_multiplechoice": [ - "{DEEPPAVLOV_PATH}/requirements/pytorch16.txt", + "torch_transformers_ner_preprocessor": [ + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], - "torch_bert_ranker": [ - "{DEEPPAVLOV_PATH}/requirements/pytorch16.txt", - "{DEEPPAVLOV_PATH}/requirements/transformers28.txt" - ], - "torch_transformers_classifier": [ - "{DEEPPAVLOV_PATH}/requirements/pytorch16.txt", + "torch_transformers_preprocessor": [ + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], "torch_transformers_sequence_tagger": [ - "{DEEPPAVLOV_PATH}/requirements/pytorch16.txt", + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", + "{DEEPPAVLOV_PATH}/requirements/torchcrf.txt", "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], - "ru_adj_to_noun": [ - "{DEEPPAVLOV_PATH}/requirements/udapi.txt" - ], - "transformers_bert_embedder": [ - "{DEEPPAVLOV_PATH}/requirements/pytorch16.txt", + "torch_transformers_squad": [ + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], - "torch_transformers_ner_preprocessor": [ - "{DEEPPAVLOV_PATH}/requirements/pytorch16.txt", + "transformers_bert_embedder": [ + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], - "torch_bert_ranker_preprocessor": [ - "{DEEPPAVLOV_PATH}/requirements/pytorch16.txt", - "{DEEPPAVLOV_PATH}/requirements/transformers28.txt" - ], "transformers_bert_preprocessor": [ + "{DEEPPAVLOV_PATH}/requirements/pytorch.txt", "{DEEPPAVLOV_PATH}/requirements/transformers.txt" ], - "spelling_levenshtein": [ - "{DEEPPAVLOV_PATH}/requirements/sortedcontainers.txt" + "tree_to_sparql": [ + "{DEEPPAVLOV_PATH}/requirements/udapi.txt" ], - "typos_kartaslov_reader": [ + "typos_custom_reader": [ "{DEEPPAVLOV_PATH}/requirements/lxml.txt" ], - "torch_squad_bert_infer": [ - "{DEEPPAVLOV_PATH}/requirements/pytorch16.txt", - "{DEEPPAVLOV_PATH}/requirements/transformers.txt" - ], - "nemo_asr": [ - "{DEEPPAVLOV_PATH}/requirements/pytorch14.txt", - "{DEEPPAVLOV_PATH}/requirements/nemo.txt", - "{DEEPPAVLOV_PATH}/requirements/nemo-asr.txt" - ], - "nemo_tts": [ - "{DEEPPAVLOV_PATH}/requirements/pytorch14.txt", - "{DEEPPAVLOV_PATH}/requirements/nemo.txt", - "{DEEPPAVLOV_PATH}/requirements/nemo-asr.txt", - "{DEEPPAVLOV_PATH}/requirements/transformers28.txt", - "{DEEPPAVLOV_PATH}/requirements/nemo-tts.txt" - ], - "spelling_error_model": [ + "typos_kartaslov_reader": [ "{DEEPPAVLOV_PATH}/requirements/lxml.txt" ], - "torchtext_classification_data_reader": [ - "{DEEPPAVLOV_PATH}/requirements/torchtext.txt" - ], - "russian_words_vocab": [ + "typos_wikipedia_reader": [ "{DEEPPAVLOV_PATH}/requirements/lxml.txt" ], - "query_generator": [ - "{DEEPPAVLOV_PATH}/requirements/hdt.txt", - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt", - "{DEEPPAVLOV_PATH}/requirements/spacy.txt", - "{DEEPPAVLOV_PATH}/requirements/en_core_web_sm.txt", - "{DEEPPAVLOV_PATH}/requirements/whapi.txt", - "{DEEPPAVLOV_PATH}/requirements/faiss.txt" - ], - "kbqa_entity_linker": [ - 
"{DEEPPAVLOV_PATH}/requirements/rapidfuzz.txt", - "{DEEPPAVLOV_PATH}/requirements/hdt.txt", - "{DEEPPAVLOV_PATH}/requirements/sortedcontainers.txt", - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt", - "{DEEPPAVLOV_PATH}/requirements/spacy.txt", - "{DEEPPAVLOV_PATH}/requirements/en_core_web_sm.txt" - ], - "rel_ranking_bert_infer": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/hdt.txt", - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt", - "{DEEPPAVLOV_PATH}/requirements/spacy.txt", - "{DEEPPAVLOV_PATH}/requirements/en_core_web_sm.txt" - ], - "query_generator_online": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/hdt.txt", - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt", - "{DEEPPAVLOV_PATH}/requirements/spacy.txt", - "{DEEPPAVLOV_PATH}/requirements/en_core_web_sm.txt", - "{DEEPPAVLOV_PATH}/requirements/whapi.txt", - "{DEEPPAVLOV_PATH}/requirements/faiss.txt" - ], - "ner_chunker": [ - "{DEEPPAVLOV_PATH}/requirements/faiss.txt", - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/hdt.txt", - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt", - "{DEEPPAVLOV_PATH}/requirements/spacy.txt", - "{DEEPPAVLOV_PATH}/requirements/en_core_web_sm.txt" - ], - "intent_catcher": [ - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/tf-hub.txt", - "{DEEPPAVLOV_PATH}/requirements/xeger.txt" - ], - "entity_linker": [ - "{DEEPPAVLOV_PATH}/requirements/faiss.txt", - "{DEEPPAVLOV_PATH}/requirements/tf.txt", - "{DEEPPAVLOV_PATH}/requirements/hdt.txt", - "{DEEPPAVLOV_PATH}/requirements/bert_dp.txt", - "{DEEPPAVLOV_PATH}/requirements/spacy.txt", - "{DEEPPAVLOV_PATH}/requirements/en_core_web_sm.txt" - ], - "re_preprocessor": [ - "{DEEPPAVLOV_PATH}/requirements/pytorch16.txt", - "{DEEPPAVLOV_PATH}/requirements/transformers.txt" + "wiki_parser": [ + "{DEEPPAVLOV_PATH}/requirements/hdt.txt" ], - "re_classifier": [ - "{DEEPPAVLOV_PATH}/requirements/opt_einsum.txt" + "wikitionary_100K_vocab": [ + "{DEEPPAVLOV_PATH}/requirements/lxml.txt" ] } diff --git a/deeppavlov/core/data/sqlite_database.py b/deeppavlov/core/data/sqlite_database.py deleted file mode 100644 index 144cda2fbe..0000000000 --- a/deeppavlov/core/data/sqlite_database.py +++ /dev/null @@ -1,187 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import sqlite3 -from logging import getLogger -from typing import List, Dict - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.estimator import Estimator - -log = getLogger(__name__) - - -@register('sqlite_database') -class Sqlite3Database(Estimator): - """ - Loads and trains sqlite table of any items (with name ``table_name`` - and path ``save_path``). - - Primary (unique) keys must be specified, all other keys are infered from data. - Batch here is a list of dictionaries, where each dictionary corresponds to an item. 
- If an item doesn't contain values for all keys, then missing values will be stored - with ``unknown_value``. - - Parameters: - save_path: sqlite database path. - primary_keys: list of table primary keys' names. - keys: all table keys' names. - table_name: name of the sqlite table. - unknown_value: value assigned to missing item values. - **kwargs: parameters passed to parent - :class:`~deeppavlov.core.models.estimator.Estimator` class. - """ - - def __init__(self, - save_path: str, - primary_keys: List[str], - keys: List[str] = None, - table_name: str = "mytable", - unknown_value: str = 'UNK', - *args, **kwargs) -> None: - super().__init__(save_path=save_path, *args, **kwargs) - - self.primary_keys = primary_keys - if not self.primary_keys: - raise ValueError("Primary keys list can't be empty") - self.tname = table_name - self.keys = keys - self.unknown_value = unknown_value - - self.conn = sqlite3.connect(str(self.save_path), - check_same_thread=False) - self.cursor = self.conn.cursor() - if self._check_if_table_exists(): - log.info(f"Loading database from {self.save_path}.") - if not self.keys: - self.keys = self._get_keys() - else: - log.info(f"Initializing empty database on {self.save_path}.") - - def __call__(self, batch: List[Dict], - order_by: str = None, - ascending: bool = False) -> List[List[Dict]]: - order = 'ASC' if ascending else 'DESC' - if not self._check_if_table_exists(): - log.warning("Database is empty, call fit() before using.") - return [[] for i in range(len(batch))] - return [self._search(b, order_by=order_by, order=order) for b in batch] - - def _check_if_table_exists(self): - self.cursor.execute(f"SELECT name FROM sqlite_master" - f" WHERE type='table'" - f" AND name='{self.tname}';") - return bool(self.cursor.fetchall()) - - def _search(self, kv=None, order_by=None, order=''): - order_expr = f" ORDER BY {order_by} {order}" if order_by else '' - if kv: - keys, values = zip(*kv.items()) - where_expr = " AND ".join(f"{k}=?" for k in keys) - self.cursor.execute(f"SELECT * FROM {self.tname} WHERE {where_expr}" + order_expr, values) - else: - self.cursor.execute(f"SELECT * FROM {self.tname}" + order_expr) - return [self._wrap_selection(s) for s in self.cursor.fetchall()] - - def _wrap_selection(self, selection): - if not self.keys: - self.keys = self._get_keys() - return {f: v for f, v in zip(self.keys, selection)} - - def _get_keys(self): - self.cursor.execute(f"PRAGMA table_info({self.tname});") - return [info[1] for info in self.cursor] - - def _get_types(self): - self.cursor.execute(f"PRAGMA table_info({self.tname});") - return {info[1]: info[2] for info in self.cursor} - - def fit(self, data: List[Dict]) -> None: - if not self._check_if_table_exists(): - self.keys = self.keys or [key for key in data[0]] - # because in the next line we assume that in the first dict there are all (!) 
necessary keys: - types = ('integer' if isinstance(data[0][k], int) else 'text' for k in self.keys) - self._create_table(self.keys, types) - elif not self.keys: - self.keys = self._get_keys() - - self._insert_many(data) - - def _create_table(self, keys, types): - if any(pk not in keys for pk in self.primary_keys): - raise ValueError(f"Primary keys must be from {keys}.") - new_types = (f"{k} {t} primary key" - if k in self.primary_keys else f"{k} {t}" - for k, t in zip(keys, types)) - new_types_joined = ', '.join(new_types) - self.cursor.execute(f"CREATE TABLE IF NOT EXISTS {self.tname}" - f" ({new_types_joined})") - log.info(f"Created table with keys {self._get_types()}.") - - def _insert_many(self, data): - to_insert = {} - to_update = {} - for kv in filter(None, data): - primary_values = tuple(kv[pk] for pk in self.primary_keys) - record = tuple(kv.get(k, self.unknown_value) for k in self.keys) - curr_record = self._get_record(primary_values) - if curr_record: - if primary_values in to_update: - curr_record = to_update[primary_values] - if curr_record != record: - to_update[primary_values] = record - else: - to_insert[primary_values] = record - - if to_insert: - fformat = ','.join(['?'] * len(self.keys)) - self.cursor.executemany(f"INSERT into {self.tname}" + - f" VALUES ({fformat})", - to_insert.values()) - if to_update: - for record in to_update.values(): - self._update_one(record) - - self.conn.commit() - - def _get_record(self, primary_values): - ffields = ", ".join(self.keys) or "*" - where_expr = " AND ".join(f"{pk}=?" for pk in self.primary_keys) - fetched = self.cursor.execute(f"SELECT {ffields} FROM {self.tname}" + - f" WHERE {where_expr}", primary_values).fetchone() - if not fetched: - return None - return fetched - - def _update_one(self, record): - set_values, where_values = [], [] - set_fields, where_fields = [], [] - for k, v in zip(self.keys, record): - if k in self.primary_keys: - where_fields.append(f"{k}=?") - where_values.append(v) - else: - set_fields.append(f"{k}=?") - set_values.append(v) - set_expr = ", ".join(set_fields) - where_expr = " AND ".join(where_fields) - self.cursor.execute(f"UPDATE {self.tname}" + - f" SET {set_expr}" + - f" WHERE {where_expr}", set_values+where_values) - - def save(self): - pass - - def load(self): - pass diff --git a/deeppavlov/core/layers/keras_layers.py b/deeppavlov/core/layers/keras_layers.py deleted file mode 100644 index 29635537c6..0000000000 --- a/deeppavlov/core/layers/keras_layers.py +++ /dev/null @@ -1,223 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from tensorflow.keras import backend as K -from tensorflow.keras.activations import softmax -from tensorflow.keras.layers import Dense, Reshape, Concatenate, Lambda, Layer, Multiply - - -def expand_tile(units, axis): - """ - Expand and tile tensor along given axis - - Args: - units: tf tensor with dimensions [batch_size, time_steps, n_input_features] - axis: axis along which expand and tile. 
Must be 1 or 2 - - """ - assert axis in (1, 2) - n_time_steps = K.int_shape(units)[1] - repetitions = [1, 1, 1, 1] - repetitions[axis] = n_time_steps - if axis == 1: - expanded = Reshape(target_shape=((1,) + K.int_shape(units)[1:]))(units) - else: - expanded = Reshape(target_shape=(K.int_shape(units)[1:2] + (1,) + K.int_shape(units)[2:]))(units) - return K.tile(expanded, repetitions) - - -def additive_self_attention(units, n_hidden=None, n_output_features=None, activation=None): - """ - Compute additive self attention for time series of vectors (with batch dimension) - the formula: score(h_i, h_j) = - v is a learnable vector of n_hidden dimensionality, - W_1 and W_2 are learnable [n_hidden, n_input_features] matrices - - Args: - units: tf tensor with dimensionality [batch_size, time_steps, n_input_features] - n_hidden: number of2784131 units in hidden representation of similarity measure - n_output_features: number of features in output dense layer - activation: activation at the output - - Returns: - output: self attended tensor with dimensionality [batch_size, time_steps, n_output_features] - """ - n_input_features = K.int_shape(units)[2] - if n_hidden is None: - n_hidden = n_input_features - if n_output_features is None: - n_output_features = n_input_features - exp1 = Lambda(lambda x: expand_tile(x, axis=1))(units) - exp2 = Lambda(lambda x: expand_tile(x, axis=2))(units) - units_pairs = Concatenate(axis=3)([exp1, exp2]) - query = Dense(n_hidden, activation="tanh")(units_pairs) - attention = Dense(1, activation=lambda x: softmax(x, axis=2))(query) - attended_units = Lambda(lambda x: K.sum(attention * x, axis=2))(exp1) - output = Dense(n_output_features, activation=activation)(attended_units) - return output - - -def multiplicative_self_attention(units, n_hidden=None, n_output_features=None, activation=None): - """ - Compute multiplicative self attention for time series of vectors (with batch dimension) - the formula: score(h_i, h_j) = , W_1 and W_2 are learnable matrices - with dimensionality [n_hidden, n_input_features] - - Args: - units: tf tensor with dimensionality [batch_size, time_steps, n_input_features] - n_hidden: number of units in hidden representation of similarity measure - n_output_features: number of features in output dense layer - activation: activation at the output - - Returns: - output: self attended tensor with dimensionality [batch_size, time_steps, n_output_features] - """ - n_input_features = K.int_shape(units)[2] - if n_hidden is None: - n_hidden = n_input_features - if n_output_features is None: - n_output_features = n_input_features - exp1 = Lambda(lambda x: expand_tile(x, axis=1))(units) - exp2 = Lambda(lambda x: expand_tile(x, axis=2))(units) - queries = Dense(n_hidden)(exp1) - keys = Dense(n_hidden)(exp2) - scores = Lambda(lambda x: K.sum(queries * x, axis=3, keepdims=True))(keys) - attention = Lambda(lambda x: softmax(x, axis=2))(scores) - mult = Multiply()([attention, exp1]) - attended_units = Lambda(lambda x: K.sum(x, axis=2))(mult) - output = Dense(n_output_features, activation=activation)(attended_units) - return output - - -class MatchingLayer(Layer): - def __init__(self, output_dim, **kwargs): - self.output_dim = output_dim - self.W = [] - super().__init__(**kwargs) - - def build(self, input_shape): - assert isinstance(input_shape, list) - self.W = [] - for i in range(self.output_dim): - self.W.append(self.add_weight(name='kernel', - shape=(1, input_shape[0][-1]), - initializer='uniform', - trainable=True)) - super().build(input_shape) # Be sure to 
call this at the end - - def compute_output_shape(self, input_shape): - assert isinstance(input_shape, list) - shape_a, shape_b = input_shape - return [(shape_a[0], shape_a[1], self.output_dim), (shape_a[0], shape_a[1], self.output_dim)] - - -class FullMatchingLayer(MatchingLayer): - - def call(self, x, **kwargs): - assert isinstance(x, list) - inp_a, inp_b = x - last_state = K.expand_dims(inp_b[:, -1, :], 1) - m = [] - for i in range(self.output_dim): - outp_a = inp_a * self.W[i] - outp_last = last_state * self.W[i] - outp_a = K.l2_normalize(outp_a, -1) - outp_last = K.l2_normalize(outp_last, -1) - outp = K.batch_dot(outp_a, outp_last, axes=[2, 2]) - m.append(outp) - if self.output_dim > 1: - persp = K.concatenate(m, 2) - else: - persp = m[0] - return [persp, persp] - - -class MaxpoolingMatchingLayer(MatchingLayer): - - def call(self, x, **kwargs): - assert isinstance(x, list) - inp_a, inp_b = x - m = [] - for i in range(self.output_dim): - outp_a = inp_a * self.W[i] - outp_b = inp_b * self.W[i] - outp_a = K.l2_normalize(outp_a, -1) - outp_b = K.l2_normalize(outp_b, -1) - outp = K.batch_dot(outp_a, outp_b, axes=[2, 2]) - outp = K.max(outp, -1, keepdims=True) - m.append(outp) - if self.output_dim > 1: - persp = K.concatenate(m, 2) - else: - persp = m[0] - return [persp, persp] - - -class AttentiveMatchingLayer(MatchingLayer): - - def call(self, x, **kwargs): - assert isinstance(x, list) - inp_a, inp_b = x - - outp_a = K.l2_normalize(inp_a, -1) - outp_b = K.l2_normalize(inp_b, -1) - alpha = K.batch_dot(outp_b, outp_a, axes=[1, 1]) - alpha = K.l2_normalize(alpha, 1) - hmean = K.batch_dot(outp_b, alpha, axes=[2, 1]) - kcon = K.eye(K.int_shape(inp_a)[1], dtype='float32') - - m = [] - for i in range(self.output_dim): - outp_a = inp_a * self.W[i] - outp_hmean = hmean * self.W[i] - outp_a = K.l2_normalize(outp_a, -1) - outp_hmean = K.l2_normalize(outp_hmean, -1) - outp = K.batch_dot(outp_hmean, outp_a, axes=[2, 2]) - outp = K.sum(outp * kcon, -1, keepdims=True) - m.append(outp) - if self.output_dim > 1: - persp = K.concatenate(m, 2) - else: - persp = m[0] - return [persp, persp] - - -class MaxattentiveMatchingLayer(MatchingLayer): - - def call(self, x, **kwargs): - assert isinstance(x, list) - inp_a, inp_b = x - - outp_a = K.l2_normalize(inp_a, -1) - outp_b = K.l2_normalize(inp_b, -1) - alpha = K.batch_dot(outp_b, outp_a, axes=[2, 2]) - alpha = K.l2_normalize(alpha, 1) - alpha = K.one_hot(K.argmax(alpha, 1), K.int_shape(inp_a)[1]) - hmax = K.batch_dot(alpha, outp_b, axes=[1, 1]) - kcon = K.eye(K.int_shape(inp_a)[1], dtype='float32') - - m = [] - for i in range(self.output_dim): - outp_a = inp_a * self.W[i] - outp_hmax = hmax * self.W[i] - outp_a = K.l2_normalize(outp_a, -1) - outp_hmax = K.l2_normalize(outp_hmax, -1) - outp = K.batch_dot(outp_hmax, outp_a, axes=[2, 2]) - outp = K.sum(outp * kcon, -1, keepdims=True) - m.append(outp) - if self.output_dim > 1: - persp = K.concatenate(m, 2) - else: - persp = m[0] - return [persp, persp] diff --git a/deeppavlov/core/layers/tf_attention_mechanisms.py b/deeppavlov/core/layers/tf_attention_mechanisms.py deleted file mode 100644 index c75f6f3282..0000000000 --- a/deeppavlov/core/layers/tf_attention_mechanisms.py +++ /dev/null @@ -1,337 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger - -import tensorflow as tf -from tensorflow.contrib.layers import xavier_initializer as xav - -from deeppavlov.core.layers import tf_csoftmax_attention as csoftmax_attention - -log = getLogger(__name__) - - -def general_attention(key, context, hidden_size, projected_align=False): - """ It is a implementation of the Luong et al. attention mechanism with general score. Based on the paper: - https://arxiv.org/abs/1508.04025 "Effective Approaches to Attention-based Neural Machine Translation" - Args: - key: A tensorflow tensor with dimensionality [None, None, key_size] - context: A tensorflow tensor with dimensionality [None, None, max_num_tokens, token_size] - hidden_size: Number of units in hidden representation - projected_align: Using bidirectional lstm for hidden representation of context. - If true, beetween input and attention mechanism insert layer of bidirectional lstm with dimensionality [hidden_size]. - If false, bidirectional lstm is not used. - Returns: - output: Tensor at the output with dimensionality [None, None, hidden_size] - """ - - if hidden_size % 2 != 0: - raise ValueError("hidden size must be dividable by two") - batch_size = tf.shape(context)[0] - max_num_tokens, token_size = context.get_shape().as_list()[-2:] - r_context = tf.reshape(context, shape=[-1, max_num_tokens, token_size]) - - # projected_key: [None, None, hidden_size] - projected_key = \ - tf.layers.dense(key, hidden_size, kernel_initializer=xav()) - r_projected_key = tf.reshape(projected_key, shape=[-1, hidden_size, 1]) - - lstm_fw_cell = tf.nn.rnn_cell.LSTMCell(hidden_size // 2) - lstm_bw_cell = tf.nn.rnn_cell.LSTMCell(hidden_size // 2) - (output_fw, output_bw), states = \ - tf.nn.bidirectional_dynamic_rnn(cell_fw=lstm_fw_cell, - cell_bw=lstm_bw_cell, - inputs=r_context, - dtype=tf.float32) - # bilstm_output: [-1, max_num_tokens, hidden_size] - bilstm_output = tf.concat([output_fw, output_bw], -1) - - attn = tf.nn.softmax(tf.matmul(bilstm_output, r_projected_key), dim=1) - - if projected_align: - log.info("Using projected attention alignment") - t_context = tf.transpose(bilstm_output, [0, 2, 1]) - output = tf.reshape(tf.matmul(t_context, attn), - shape=[batch_size, -1, hidden_size]) - else: - log.info("Using without projected attention alignment") - t_context = tf.transpose(r_context, [0, 2, 1]) - output = tf.reshape(tf.matmul(t_context, attn), - shape=[batch_size, -1, token_size]) - return output - - -def light_general_attention(key, context, hidden_size, projected_align=False): - """ It is a implementation of the Luong et al. attention mechanism with general score. Based on the paper: - https://arxiv.org/abs/1508.04025 "Effective Approaches to Attention-based Neural Machine Translation" - Args: - key: A tensorflow tensor with dimensionality [None, None, key_size] - context: A tensorflow tensor with dimensionality [None, None, max_num_tokens, token_size] - hidden_size: Number of units in hidden representation - projected_align: Using dense layer for hidden representation of context. 
- If true, between input and attention mechanism insert a dense layer with dimensionality [hidden_size]. - If false, a dense layer is not used. - Returns: - output: Tensor at the output with dimensionality [None, None, hidden_size] - """ - batch_size = tf.shape(context)[0] - max_num_tokens, token_size = context.get_shape().as_list()[-2:] - r_context = tf.reshape(context, shape=[-1, max_num_tokens, token_size]) - - # projected_key: [None, None, hidden_size] - projected_key = tf.layers.dense(key, hidden_size, kernel_initializer=xav()) - r_projected_key = tf.reshape(projected_key, shape=[-1, hidden_size, 1]) - - # projected context: [None, None, hidden_size] - projected_context = \ - tf.layers.dense(r_context, hidden_size, kernel_initializer=xav()) - - attn = tf.nn.softmax(tf.matmul(projected_context, r_projected_key), dim=1) - - if projected_align: - log.info("Using projected attention alignment") - t_context = tf.transpose(projected_context, [0, 2, 1]) - output = tf.reshape(tf.matmul(t_context, attn), - shape=[batch_size, -1, hidden_size]) - else: - log.info("Using without projected attention alignment") - t_context = tf.transpose(r_context, [0, 2, 1]) - output = tf.reshape(tf.matmul(t_context, attn), - shape=[batch_size, -1, token_size]) - return output - - -def cs_general_attention(key, context, hidden_size, depth, projected_align=False): - """ It is a implementation of the Luong et al. attention mechanism with general score and the constrained softmax (csoftmax). - Based on the papers: - https://arxiv.org/abs/1508.04025 "Effective Approaches to Attention-based Neural Machine Translation" - https://andre-martins.github.io/docs/emnlp2017_final.pdf "Learning What's Easy: Fully Differentiable Neural Easy-First Taggers" - Args: - key: A tensorflow tensor with dimensionality [None, None, key_size] - context: A tensorflow tensor with dimensionality [None, None, max_num_tokens, token_size] - hidden_size: Number of units in hidden representation - depth: Number of csoftmax usages - projected_align: Using bidirectional lstm for hidden representation of context. - If true, beetween input and attention mechanism insert layer of bidirectional lstm with dimensionality [hidden_size]. - If false, bidirectional lstm is not used. 
- Returns: - output: Tensor at the output with dimensionality [None, None, depth * hidden_size] - """ - if hidden_size % 2 != 0: - raise ValueError("hidden size must be dividable by two") - key_size = tf.shape(key)[-1] - batch_size = tf.shape(context)[0] - max_num_tokens, token_size = context.get_shape().as_list()[-2:] - r_context = tf.reshape(context, shape=[-1, max_num_tokens, token_size]) - # projected_context: [None, max_num_tokens, token_size] - projected_context = tf.layers.dense(r_context, token_size, - kernel_initializer=xav(), - name='projected_context') - - lstm_fw_cell = tf.nn.rnn_cell.LSTMCell(hidden_size // 2) - lstm_bw_cell = tf.nn.rnn_cell.LSTMCell(hidden_size // 2) - (output_fw, output_bw), states = \ - tf.nn.bidirectional_dynamic_rnn(cell_fw=lstm_fw_cell, - cell_bw=lstm_bw_cell, - inputs=projected_context, - dtype=tf.float32) - # bilstm_output: [-1, max_num_tokens, hidden_size] - bilstm_output = tf.concat([output_fw, output_bw], -1) - h_state_for_sketch = bilstm_output - - if projected_align: - log.info("Using projected attention alignment") - h_state_for_attn_alignment = bilstm_output - aligned_h_state = csoftmax_attention.attention_gen_block( - h_state_for_sketch, h_state_for_attn_alignment, key, depth) - output = \ - tf.reshape(aligned_h_state, shape=[batch_size, -1, depth * hidden_size]) - else: - log.info("Using without projected attention alignment") - h_state_for_attn_alignment = projected_context - aligned_h_state = csoftmax_attention.attention_gen_block( - h_state_for_sketch, h_state_for_attn_alignment, key, depth) - output = \ - tf.reshape(aligned_h_state, shape=[batch_size, -1, depth * token_size]) - return output - - -def bahdanau_attention(key, context, hidden_size, projected_align=False): - """ It is a implementation of the Bahdanau et al. attention mechanism. Based on the paper: - https://arxiv.org/abs/1409.0473 "Neural Machine Translation by Jointly Learning to Align and Translate" - Args: - key: A tensorflow tensor with dimensionality [None, None, key_size] - context: A tensorflow tensor with dimensionality [None, None, max_num_tokens, token_size] - hidden_size: Number of units in hidden representation - projected_align: Using bidirectional lstm for hidden representation of context. - If true, beetween input and attention mechanism insert layer of bidirectional lstm with dimensionality [hidden_size]. - If false, bidirectional lstm is not used. 
- Returns: - output: Tensor at the output with dimensionality [None, None, hidden_size] - """ - if hidden_size % 2 != 0: - raise ValueError("hidden size must be dividable by two") - batch_size = tf.shape(context)[0] - max_num_tokens, token_size = context.get_shape().as_list()[-2:] - r_context = tf.reshape(context, shape=[-1, max_num_tokens, token_size]) - - # projected_key: [None, None, hidden_size] - projected_key = tf.layers.dense(key, hidden_size, kernel_initializer=xav()) - r_projected_key = \ - tf.tile(tf.reshape(projected_key, shape=[-1, 1, hidden_size]), - [1, max_num_tokens, 1]) - - lstm_fw_cell = tf.nn.rnn_cell.LSTMCell(hidden_size // 2) - lstm_bw_cell = tf.nn.rnn_cell.LSTMCell(hidden_size // 2) - (output_fw, output_bw), states = \ - tf.nn.bidirectional_dynamic_rnn(cell_fw=lstm_fw_cell, - cell_bw=lstm_bw_cell, - inputs=r_context, - dtype=tf.float32) - - # bilstm_output: [-1,self.max_num_tokens,_n_hidden] - bilstm_output = tf.concat([output_fw, output_bw], -1) - concat_h_state = tf.concat([r_projected_key, output_fw, output_bw], -1) - projected_state = \ - tf.layers.dense(concat_h_state, hidden_size, use_bias=False, - kernel_initializer=xav()) - score = \ - tf.layers.dense(tf.tanh(projected_state), units=1, use_bias=False, - kernel_initializer=xav()) - - attn = tf.nn.softmax(score, dim=1) - - if projected_align: - log.info("Using projected attention alignment") - t_context = tf.transpose(bilstm_output, [0, 2, 1]) - output = tf.reshape(tf.matmul(t_context, attn), - shape=[batch_size, -1, hidden_size]) - else: - log.info("Using without projected attention alignment") - t_context = tf.transpose(r_context, [0, 2, 1]) - output = tf.reshape(tf.matmul(t_context, attn), - shape=[batch_size, -1, token_size]) - return output - - -def light_bahdanau_attention(key, context, hidden_size, projected_align=False): - """ It is a implementation of the Bahdanau et al. attention mechanism. Based on the paper: - https://arxiv.org/abs/1409.0473 "Neural Machine Translation by Jointly Learning to Align and Translate" - Args: - key: A tensorflow tensor with dimensionality [None, None, key_size] - context: A tensorflow tensor with dimensionality [None, None, max_num_tokens, token_size] - hidden_size: Number of units in hidden representation - projected_align: Using dense layer for hidden representation of context. - If true, between input and attention mechanism insert a dense layer with dimensionality [hidden_size]. - If false, a dense layer is not used. 
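The additive ("concat") score used by bahdanau_attention above reduces to score_t = v . tanh(W [key; h_t]). A single-example NumPy sketch with the BiLSTM encoder and batching omitted; the function and argument names are illustrative:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bahdanau_attention_step(key, context, W, v):
    """key: [K]; context: [T, F]; W: [K + F, H]; v: [H]."""
    tiled_key = np.tile(key, (context.shape[0], 1))     # repeat the key for every token
    hidden = np.tanh(np.concatenate([tiled_key, context], axis=-1) @ W)
    attn = softmax(hidden @ v)                          # score_t = v . tanh(W [key; h_t])
    return attn @ context                               # attention-weighted sum over tokens

out = bahdanau_attention_step(np.random.randn(4), np.random.randn(6, 10),
                              np.random.randn(14, 8), np.random.randn(8))
print(out.shape)  # (10,)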
- Returns: - output: Tensor at the output with dimensionality [None, None, hidden_size] - """ - batch_size = tf.shape(context)[0] - max_num_tokens, token_size = context.get_shape().as_list()[-2:] - r_context = tf.reshape(context, shape=[-1, max_num_tokens, token_size]) - - # projected_key: [None, None, hidden_size] - projected_key = tf.layers.dense(key, hidden_size, kernel_initializer=xav()) - r_projected_key = \ - tf.tile(tf.reshape(projected_key, shape=[-1, 1, hidden_size]), - [1, max_num_tokens, 1]) - - # projected_context: [None, max_num_tokens, hidden_size] - projected_context = \ - tf.layers.dense(r_context, hidden_size, kernel_initializer=xav()) - concat_h_state = tf.concat([projected_context, r_projected_key], -1) - - projected_state = \ - tf.layers.dense(concat_h_state, hidden_size, use_bias=False, - kernel_initializer=xav()) - score = \ - tf.layers.dense(tf.tanh(projected_state), units=1, use_bias=False, - kernel_initializer=xav()) - - attn = tf.nn.softmax(score, dim=1) - - if projected_align: - log.info("Using projected attention alignment") - t_context = tf.transpose(projected_context, [0, 2, 1]) - output = tf.reshape(tf.matmul(t_context, attn), - shape=[batch_size, -1, hidden_size]) - else: - log.info("Using without projected attention alignment") - t_context = tf.transpose(r_context, [0, 2, 1]) - output = tf.reshape(tf.matmul(t_context, attn), - shape=[batch_size, -1, token_size]) - return output - - -def cs_bahdanau_attention(key, context, hidden_size, depth, projected_align=False): - """ It is a implementation of the Bahdanau et al. attention mechanism. Based on the papers: - https://arxiv.org/abs/1409.0473 "Neural Machine Translation by Jointly Learning to Align and Translate" - https://andre-martins.github.io/docs/emnlp2017_final.pdf "Learning What's Easy: Fully Differentiable Neural Easy-First Taggers" - Args: - key: A tensorflow tensor with dimensionality [None, None, key_size] - context: A tensorflow tensor with dimensionality [None, None, max_num_tokens, token_size] - hidden_size: Number of units in hidden representation - depth: Number of csoftmax usages - projected_align: Using bidirectional lstm for hidden representation of context. - If true, beetween input and attention mechanism insert layer of bidirectional lstm with dimensionality [hidden_size]. - If false, bidirectional lstm is not used. 
- Returns: - output: Tensor at the output with dimensionality [None, None, depth * hidden_size] - """ - if hidden_size % 2 != 0: - raise ValueError("hidden size must be dividable by two") - batch_size = tf.shape(context)[0] - max_num_tokens, token_size = context.get_shape().as_list()[-2:] - - r_context = tf.reshape(context, shape=[-1, max_num_tokens, token_size]) - # projected context: [None, max_num_tokens, token_size] - projected_context = tf.layers.dense(r_context, token_size, - kernel_initializer=xav(), - name='projected_context') - - # projected_key: [None, None, hidden_size] - projected_key = tf.layers.dense(key, hidden_size, kernel_initializer=xav(), - name='projected_key') - r_projected_key = \ - tf.tile(tf.reshape(projected_key, shape=[-1, 1, hidden_size]), - [1, max_num_tokens, 1]) - - lstm_fw_cell = tf.nn.rnn_cell.LSTMCell(hidden_size // 2) - lstm_bw_cell = tf.nn.rnn_cell.LSTMCell(hidden_size // 2) - (output_fw, output_bw), states = \ - tf.nn.bidirectional_dynamic_rnn(cell_fw=lstm_fw_cell, - cell_bw=lstm_bw_cell, - inputs=projected_context, - dtype=tf.float32) - - # bilstm_output: [-1, max_num_tokens, hidden_size] - bilstm_output = tf.concat([output_fw, output_bw], -1) - concat_h_state = tf.concat([r_projected_key, output_fw, output_bw], -1) - - if projected_align: - log.info("Using projected attention alignment") - h_state_for_attn_alignment = bilstm_output - aligned_h_state = csoftmax_attention.attention_bah_block( - concat_h_state, h_state_for_attn_alignment, depth) - output = \ - tf.reshape(aligned_h_state, shape=[batch_size, -1, depth * hidden_size]) - else: - log.info("Using without projected attention alignment") - h_state_for_attn_alignment = projected_context - aligned_h_state = csoftmax_attention.attention_bah_block( - concat_h_state, h_state_for_attn_alignment, depth) - output = \ - tf.reshape(aligned_h_state, shape=[batch_size, -1, depth * token_size]) - return output diff --git a/deeppavlov/core/layers/tf_csoftmax_attention.py b/deeppavlov/core/layers/tf_csoftmax_attention.py deleted file mode 100644 index 764a727dc2..0000000000 --- a/deeppavlov/core/layers/tf_csoftmax_attention.py +++ /dev/null @@ -1,255 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import tensorflow as tf - - -def csoftmax_for_slice(input): - """ It is a implementation of the constrained softmax (csoftmax) for slice. - Based on the paper: - https://andre-martins.github.io/docs/emnlp2017_final.pdf "Learning What's Easy: Fully Differentiable Neural Easy-First Taggers" (page 4) - Args: - input: A list of [input tensor, cumulative attention]. 
- Returns: - output: A list of [csoftmax results, masks] - """ - - [ten, u] = input - - shape_t = ten.shape - shape_u = u.shape - - ten -= tf.reduce_mean(ten) - q = tf.exp(ten) - active = tf.ones_like(u, dtype=tf.int32) - mass = tf.constant(0, dtype=tf.float32) - found = tf.constant(True, dtype=tf.bool) - - def loop(q_, mask, mass_, found_): - q_list = tf.dynamic_partition(q_, mask, 2) - condition_indices = tf.dynamic_partition(tf.range(tf.shape(q_)[0]), mask, 2) # 0 element it False, - # 1 element if true - - p = q_list[1] * (1.0 - mass_) / tf.reduce_sum(q_list[1]) - p_new = tf.dynamic_stitch(condition_indices, [q_list[0], p]) - - # condition verification and mask modification - less_mask = tf.cast(tf.less(u, p_new), tf.int32) # 0 when u is bigger than p, 1 when u is less than p - condition_indices = tf.dynamic_partition(tf.range(tf.shape(p_new)[0]), less_mask, - 2) # 0 when u is bigger than p, 1 when u is less than p - - split_p_new = tf.dynamic_partition(p_new, less_mask, 2) - split_u = tf.dynamic_partition(u, less_mask, 2) - - alpha = tf.dynamic_stitch(condition_indices, [split_p_new[0], split_u[1]]) - mass_ += tf.reduce_sum(split_u[1]) - - mask = mask * (tf.ones_like(less_mask) - less_mask) - - found_ = tf.cond(tf.equal(tf.reduce_sum(less_mask), 0), - lambda: False, - lambda: True) - - alpha = tf.reshape(alpha, q_.shape) - - return alpha, mask, mass_, found_ - - (csoft, mask_, _, _) = tf.while_loop(cond=lambda _0, _1, _2, f: f, - body=loop, - loop_vars=(q, active, mass, found)) - - return [csoft, mask_] - - -def csoftmax(tensor, inv_cumulative_att): - """ It is a implementation of the constrained softmax (csoftmax). - Based on the paper: - https://andre-martins.github.io/docs/emnlp2017_final.pdf "Learning What's Easy: Fully Differentiable Neural Easy-First Taggers" - Args: - tensor: A tensorflow tensor is score. This tensor have dimensionality [None, n_tokens] - inv_cumulative_att: A inverse cumulative attention tensor with dimensionality [None, n_tokens] - Returns: - cs: Tensor at the output with dimensionality [None, n_tokens] - """ - shape_ten = tensor.shape - shape_cum = inv_cumulative_att.shape - - merge_tensor = [tensor, inv_cumulative_att] - cs, _ = tf.map_fn(csoftmax_for_slice, merge_tensor, dtype=[tf.float32, tf.float32]) # [bs, L] - return cs - - -def attention_gen_step(hidden_for_sketch, hidden_for_attn_alignment, sketch, key, cum_att): - """ It is a implementation one step of block of the Luong et al. attention mechanism with general score and the constrained softmax (csoftmax). - Based on the papers: - https://arxiv.org/abs/1508.04025 "Effective Approaches to Attention-based Neural Machine Translation" - https://andre-martins.github.io/docs/emnlp2017_final.pdf "Learning What's Easy: Fully Differentiable Neural Easy-First Taggers" - Args: - hidden_for_sketch: A tensorflow tensor for a sketch computing. This tensor have dimensionality [None, max_num_tokens, sketch_hidden_size] - hidden_for_attn_alignment: A tensorflow tensor is aligned for output during a performing. This tensor have dimensionality [None, max_num_tokens, hidden_size_for_attn_alignment] - sketch: A previous step sketch tensor for a sketch computing. 
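What csoftmax_for_slice computes, stripped of the tf.while_loop control flow: a softmax whose entries are additionally capped by the inverse cumulative attention, with the clipped mass redistributed among the remaining tokens. A minimal NumPy sketch, assuming the upper bounds sum to at least 1 (names are illustrative):

import numpy as np

def csoftmax(scores, upper_bounds):
    """Constrained softmax: like softmax(scores) but with alpha_i <= upper_bounds[i]
    and sum(alpha) == 1 (assumes upper_bounds.sum() >= 1)."""
    q = np.exp(scores - scores.mean())
    alpha = np.zeros_like(q)
    active = np.ones_like(q, dtype=bool)     # tokens whose bound is not yet hit
    mass = 0.0                               # probability mass already fixed at a bound
    while True:
        p = q * (1.0 - mass) / q[active].sum()
        over = active & (p > upper_bounds)   # tokens that would exceed their bound
        if not over.any():
            alpha[active] = p[active]
            return alpha
        alpha[over] = upper_bounds[over]     # clamp violators to their bound
        mass += upper_bounds[over].sum()
        active &= ~over

print(csoftmax(np.array([2.0, 1.0, 0.5]), np.array([0.3, 0.9, 0.9])))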
This tensor have dimensionality [None, sketch_hidden_size] - key: A tensorflow tensor with dimensionality [None, None, key_size] - cum_att: A cumulative attention tensor with dimensionality [None, max_num_tokens] - Returns: - next_sketch: Tensor of the current step sketch with dimensionality [None, sketch_hidden_size] - att: Tensor of the current step attention with dimensionality [None, max_num_tokens] - aligned_hidden_sketch: Tensor of aligned hidden state of current step with dimensionality [None, hidden_size_for_attn_alignment] - """ - with tf.name_scope('attention_step'): - sketch_dims = hidden_for_sketch.get_shape().as_list() - batch_size = sketch_dims[0] - num_tokens = sketch_dims[1] - hidden_size = sketch_dims[2] - attn_alignment_dims = hidden_for_attn_alignment.get_shape().as_list() - attn_alignment_hidden_size = attn_alignment_dims[2] - - repeated_sketch = tf.tile(tf.reshape(sketch, [-1, 1, hidden_size]), (1, num_tokens, 1)) - concat_mem = tf.concat([hidden_for_sketch, repeated_sketch], -1) - - concat_mem = tf.reshape(concat_mem, [-1, num_tokens, 2 * hidden_size]) # dirty trick - reduce_mem = tf.layers.dense(concat_mem, hidden_size) - - projected_key = tf.layers.dense(key, hidden_size) - t_key = tf.reshape(projected_key, [-1, hidden_size, 1]) - - score = tf.reshape(tf.matmul(reduce_mem, t_key), [-1, num_tokens]) - - inv_cum_att = tf.reshape(tf.ones_like(cum_att) - cum_att, [-1, num_tokens]) - att = csoftmax(score, inv_cum_att) - - t_reduce_mem = tf.transpose(reduce_mem, [0, 2, 1]) - t_hidden_for_attn_alignment = tf.transpose(hidden_for_attn_alignment, [0, 2, 1]) - - r_att = tf.reshape(att, [-1, num_tokens, 1]) - - next_sketch = tf.squeeze(tf.matmul(t_reduce_mem, r_att), -1) - aligned_hidden_sketch = tf.squeeze(tf.matmul(t_hidden_for_attn_alignment, r_att), -1) - return next_sketch, att, aligned_hidden_sketch - - -def attention_gen_block(hidden_for_sketch, hidden_for_attn_alignment, key, attention_depth): - """ It is a implementation of the Luong et al. attention mechanism with general score and the constrained softmax (csoftmax). - Based on the papers: - https://arxiv.org/abs/1508.04025 "Effective Approaches to Attention-based Neural Machine Translation" - https://andre-martins.github.io/docs/emnlp2017_final.pdf "Learning What's Easy: Fully Differentiable Neural Easy-First Taggers" - Args: - hidden_for_sketch: A tensorflow tensor for a sketch computing. This tensor have dimensionality [None, max_num_tokens, sketch_hidden_size] - hidden_for_attn_alignment: A tensorflow tensor is aligned for output during a performing. 
This tensor have dimensionality [None, max_num_tokens, hidden_size_for_attn_alignment] - key: A tensorflow tensor with dimensionality [None, None, key_size] - attention_depth: Number of usage csoftmax - Returns: - final_aligned_hiddens: Tensor at the output with dimensionality [1, attention_depth, hidden_size_for_attn_alignment] - """ - with tf.name_scope('attention_block'): - sketch_dims = tf.shape(hidden_for_sketch) - batch_size = sketch_dims[0] - num_tokens = sketch_dims[1] - hidden_size = sketch_dims[2] - - attn_alignment_dims = tf.shape(hidden_for_attn_alignment) - attn_alignment_hidden_size = attn_alignment_dims[2] - - sketches = [tf.zeros(shape=[batch_size, hidden_size], dtype=tf.float32)] - aligned_hiddens = [] - cum_att = tf.zeros(shape=[batch_size, num_tokens]) # cumulative attention - for i in range(attention_depth): - sketch, cum_att_, aligned_hidden = attention_gen_step(hidden_for_sketch, hidden_for_attn_alignment, - sketches[-1], key, cum_att) - sketches.append(sketch) # sketch - aligned_hiddens.append(aligned_hidden) # sketch - cum_att += cum_att_ - final_aligned_hiddens = tf.reshape(tf.transpose(tf.stack(aligned_hiddens), [1, 0, 2]), - [1, attention_depth, attn_alignment_hidden_size]) - return final_aligned_hiddens - - -def attention_bah_step(hidden_for_sketch, hidden_for_attn_alignment, sketch, cum_att): - """ It is a implementation one step of block of the Bahdanau et al. attention mechanism with concat score and the constrained softmax (csoftmax). - Based on the papers: - https://arxiv.org/abs/1409.0473 "Neural Machine Translation by Jointly Learning to Align and Translate" - https://andre-martins.github.io/docs/emnlp2017_final.pdf "Learning What's Easy: Fully Differentiable Neural Easy-First Taggers" - Args: - hidden_for_sketch: A tensorflow tensor for a sketch computing. This tensor have dimensionality [None, max_num_tokens, sketch_hidden_size] - hidden_for_attn_alignment: A tensorflow tensor is aligned for output during a performing. This tensor have dimensionality [None, max_num_tokens, hidden_size_for_attn_alignment] - sketch: A previous step sketch tensor for a sketch computing. 
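A schematic of the loop shared by attention_gen_block and attention_bah_block above: each of the `attention_depth` steps may only spend attention mass that earlier steps left unused, so the cumulative attention per token never exceeds 1. The toy step below is only a stand-in for attention_gen_step / attention_bah_step and exists to show the bookkeeping:

import numpy as np

def toy_step(remaining):
    # stand-in for attention_gen_step / attention_bah_step: spend a share of
    # whatever attention mass is still available
    return remaining / max(remaining.sum(), 1e-8) * min(remaining.sum(), 1.0)

num_tokens, depth = 5, 3
cum_att = np.zeros(num_tokens)
for _ in range(depth):
    att = toy_step(1.0 - cum_att)        # csoftmax keeps att within the remaining budget
    cum_att += att
    assert (cum_att <= 1.0 + 1e-6).all()
print(cum_att)                            # never exceeds 1 for any token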
This tensor have dimensionality [None, sketch_hidden_size] - key: A tensorflow tensor with dimensionality [None, None, key_size] - cum_att: A cumulative attention tensor with dimensionality [None, max_num_tokens] - Returns: - next_sketch: Tensor of the current step sketch with dimensionality [None, sketch_hidden_size] - att: Tensor of the current step attention with dimensionality [None, max_num_tokens] - aligned_hidden_sketch: Tensor of aligned hidden state of current step with dimensionality [None, hidden_size_for_attn_alignment] - """ - with tf.name_scope('attention_step'): - sketch_dims = hidden_for_sketch.get_shape().as_list() - batch_size = sketch_dims[0] - num_tokens = sketch_dims[1] - hidden_size = sketch_dims[2] - attn_alignment_dims = hidden_for_attn_alignment.get_shape().as_list() - attn_alignment_hidden_size = attn_alignment_dims[2] - - repeated_sketch = tf.tile(tf.reshape(sketch, [-1, 1, hidden_size]), (1, num_tokens, 1)) - concat_mem = tf.concat([hidden_for_sketch, repeated_sketch], -1) - - concat_mem = tf.reshape(concat_mem, [-1, num_tokens, 2 * hidden_size]) # dirty trick - reduce_mem = tf.layers.dense(concat_mem, hidden_size) - - score = tf.squeeze(tf.layers.dense(reduce_mem, units=1, - use_bias=False), -1) - inv_cum_att = tf.reshape(tf.ones_like(cum_att) - cum_att, [-1, num_tokens]) - att = csoftmax(score, inv_cum_att) - - t_reduce_mem = tf.transpose(reduce_mem, [0, 2, 1]) - t_hidden_for_attn_alignment = tf.transpose(hidden_for_attn_alignment, [0, 2, 1]) - - r_att = tf.reshape(att, [-1, num_tokens, 1]) - - next_sketch = tf.squeeze(tf.matmul(t_reduce_mem, r_att), -1) - aligned_hidden_sketch = tf.squeeze(tf.matmul(t_hidden_for_attn_alignment, r_att), -1) - return next_sketch, att, aligned_hidden_sketch - - -def attention_bah_block(hidden_for_sketch, hidden_for_attn_alignment, attention_depth): - """ It is a implementation of the Bahdanau et al. attention mechanism with concat score and the constrained softmax (csoftmax). - Based on the papers: - https://arxiv.org/abs/1409.0473 "Neural Machine Translation by Jointly Learning to Align and Translate" - https://andre-martins.github.io/docs/emnlp2017_final.pdf "Learning What's Easy: Fully Differentiable Neural Easy-First Taggers" - Args: - hidden_for_sketch: A tensorflow tensor for a sketch computing. This tensor have dimensionality [None, max_num_tokens, sketch_hidden_size] - hidden_for_attn_alignment: A tensorflow tensor is aligned for output during a performing. 
This tensor have dimensionality [None, max_num_tokens, hidden_size_for_attn_alignment] - key: A tensorflow tensor with dimensionality [None, None, key_size] - attention_depth: Number of usage csoftmax - Returns: - final_aligned_hiddens: Tensor at the output with dimensionality [1, attention_depth, hidden_size_for_attn_alignment] - """ - with tf.name_scope('attention_block'): - sketch_dims = tf.shape(hidden_for_sketch) - batch_size = sketch_dims[0] - num_tokens = sketch_dims[1] - hidden_size = sketch_dims[2] - - attn_alignment_dims = tf.shape(hidden_for_attn_alignment) - attn_alignment_hidden_size = attn_alignment_dims[2] - - sketches = [tf.zeros(shape=[batch_size, hidden_size], dtype=tf.float32)] - aligned_hiddens = [] - cum_att = tf.zeros(shape=[batch_size, num_tokens]) # cumulative attention - for i in range(attention_depth): - sketch, cum_att_, aligned_hidden = attention_bah_step(hidden_for_sketch, hidden_for_attn_alignment, - sketches[-1], cum_att) - sketches.append(sketch) # sketch - aligned_hiddens.append(aligned_hidden) # sketch - cum_att += cum_att_ - final_aligned_hiddens = tf.reshape(tf.transpose(tf.stack(aligned_hiddens), [1, 0, 2]), - [1, attention_depth, attn_alignment_hidden_size]) - return final_aligned_hiddens diff --git a/deeppavlov/core/layers/tf_layers.py b/deeppavlov/core/layers/tf_layers.py deleted file mode 100644 index 20158d4614..0000000000 --- a/deeppavlov/core/layers/tf_layers.py +++ /dev/null @@ -1,952 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger -from typing import List, Union - -import numpy as np -import tensorflow as tf - -from deeppavlov.core.common.check_gpu import check_gpu_existence - -log = getLogger(__name__) - -INITIALIZER = tf.orthogonal_initializer - - -# INITIALIZER = xavier_initializer - - -def stacked_cnn(units: tf.Tensor, - n_hidden_list: List, - filter_width=3, - use_batch_norm=False, - use_dilation=False, - training_ph=None, - add_l2_losses=False): - """ Number of convolutional layers stacked on top of each other - - Args: - units: a tensorflow tensor with dimensionality [None, n_tokens, n_features] - n_hidden_list: list with number of hidden units at the ouput of each layer - filter_width: width of the kernel in tokens - use_batch_norm: whether to use batch normalization between layers - use_dilation: use power of 2 dilation scheme [1, 2, 4, 8 .. ] for layers 1, 2, 3, 4 ... - training_ph: boolean placeholder determining whether is training phase now or not. 
- It is used only for batch normalization to determine whether to use - current batch average (std) or memory stored average (std) - add_l2_losses: whether to add l2 losses on network kernels to - tf.GraphKeys.REGULARIZATION_LOSSES or not - - Returns: - units: tensor at the output of the last convolutional layer - """ - l2_reg = tf.nn.l2_loss if add_l2_losses else None - for n_layer, n_hidden in enumerate(n_hidden_list): - if use_dilation: - dilation_rate = 2 ** n_layer - else: - dilation_rate = 1 - units = tf.layers.conv1d(units, - n_hidden, - filter_width, - padding='same', - dilation_rate=dilation_rate, - kernel_initializer=INITIALIZER(), - kernel_regularizer=l2_reg) - if use_batch_norm: - assert training_ph is not None - units = tf.layers.batch_normalization(units, training=training_ph) - units = tf.nn.relu(units) - return units - - -def dense_convolutional_network(units: tf.Tensor, - n_hidden_list: List, - filter_width=3, - use_dilation=False, - use_batch_norm=False, - training_ph=None): - """ Densely connected convolutional layers. Based on the paper: - [Gao 17] https://arxiv.org/abs/1608.06993 - - Args: - units: a tensorflow tensor with dimensionality [None, n_tokens, n_features] - n_hidden_list: list with number of hidden units at the ouput of each layer - filter_width: width of the kernel in tokens - use_batch_norm: whether to use batch normalization between layers - use_dilation: use power of 2 dilation scheme [1, 2, 4, 8 .. ] for layers 1, 2, 3, 4 ... - training_ph: boolean placeholder determining whether is training phase now or not. - It is used only for batch normalization to determine whether to use - current batch average (std) or memory stored average (std) - Returns: - units: tensor at the output of the last convolutional layer - with dimensionality [None, n_tokens, n_hidden_list[-1]] - """ - units_list = [units] - for n_layer, n_filters in enumerate(n_hidden_list): - total_units = tf.concat(units_list, axis=-1) - if use_dilation: - dilation_rate = 2 ** n_layer - else: - dilation_rate = 1 - units = tf.layers.conv1d(total_units, - n_filters, - filter_width, - dilation_rate=dilation_rate, - padding='same', - kernel_initializer=INITIALIZER()) - if use_batch_norm: - units = tf.layers.batch_normalization(units, training=training_ph) - units = tf.nn.relu(units) - units_list.append(units) - return units - - -def bi_rnn(units: tf.Tensor, - n_hidden: Union[List, int], - cell_type='gru', - seq_lengths=None, - trainable_initial_states=False, - use_peepholes=False, - name='Bi-'): - """ Bi directional recurrent neural network. GRU or LSTM - - Args: - units: a tensorflow tensor with dimensionality [None, n_tokens, n_features] - n_hidden: list with number of hidden units in the output of each layer if - cell_type is 'lstm' and int if cell_type is 'gru'. - seq_lengths: length of sequences for different length sequences in batch - can be None for maximum length as a length for every sample in the batch - cell_type: 'lstm' or 'gru' - trainable_initial_states: whether to create a special trainable variable - to initialize the hidden states of the network or use just zeros - use_peepholes: whether to use peephole connections (only 'lstm' case affected) - name: what variable_scope to use for the network parameters - - Returns: - units: a tuple of tensors at the output of the last recurrent layer - with dimensionality [None, n_tokens, n_hidden[-1]] if cell_type is 'lstm' and - with dimensionality [None, n_tokens, n_hidden] if cell_type is 'gru'. 
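The "power of 2 dilation scheme" mentioned in the convolutional helpers above grows the receptive field exponentially with depth: with kernel width k and dilation rates 1, 2, 4, ..., the receptive field after n layers is 1 + (k - 1) * (2**n - 1). A one-line check:

def receptive_field(filter_width, n_layers):
    # kernel width k, dilations 1, 2, ..., 2**(n-1)
    return 1 + (filter_width - 1) * (2 ** n_layers - 1)

print([receptive_field(3, n) for n in range(1, 5)])  # [3, 7, 15, 31]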
- The tensors contain the outputs of forward and backward passes of - the birnn correspondingly. - last_units: tensor of last hidden states for GRU and tuple - of last hidden stated and last cell states for LSTM - dimensionality of cell states and hidden states are - similar and equal to [B x 2 * H], where B - batch - size and H is number of hidden units - """ - - with tf.variable_scope(name + '_' + cell_type.upper()): - if cell_type == 'gru': - forward_cell = tf.nn.rnn_cell.GRUCell(n_hidden, kernel_initializer=INITIALIZER()) - backward_cell = tf.nn.rnn_cell.GRUCell(n_hidden, kernel_initializer=INITIALIZER()) - if trainable_initial_states: - initial_state_fw = tf.tile(tf.get_variable('init_fw_h', [1, n_hidden]), (tf.shape(units)[0], 1)) - initial_state_bw = tf.tile(tf.get_variable('init_bw_h', [1, n_hidden]), (tf.shape(units)[0], 1)) - else: - initial_state_fw = initial_state_bw = None - elif cell_type == 'lstm': - forward_cell = tf.nn.rnn_cell.LSTMCell(n_hidden, use_peepholes=use_peepholes, initializer=INITIALIZER()) - backward_cell = tf.nn.rnn_cell.LSTMCell(n_hidden, use_peepholes=use_peepholes, initializer=INITIALIZER()) - if trainable_initial_states: - initial_state_fw = tf.nn.rnn_cell.LSTMStateTuple( - tf.tile(tf.get_variable('init_fw_c', [1, n_hidden]), (tf.shape(units)[0], 1)), - tf.tile(tf.get_variable('init_fw_h', [1, n_hidden]), (tf.shape(units)[0], 1))) - initial_state_bw = tf.nn.rnn_cell.LSTMStateTuple( - tf.tile(tf.get_variable('init_bw_c', [1, n_hidden]), (tf.shape(units)[0], 1)), - tf.tile(tf.get_variable('init_bw_h', [1, n_hidden]), (tf.shape(units)[0], 1))) - else: - initial_state_fw = initial_state_bw = None - else: - raise RuntimeError('cell_type must be either "gru" or "lstm"s') - (rnn_output_fw, rnn_output_bw), (fw, bw) = \ - tf.nn.bidirectional_dynamic_rnn(forward_cell, - backward_cell, - units, - dtype=tf.float32, - sequence_length=seq_lengths, - initial_state_fw=initial_state_fw, - initial_state_bw=initial_state_bw) - kernels = [var for var in forward_cell.trainable_variables + - backward_cell.trainable_variables if 'kernel' in var.name] - for kernel in kernels: - tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, tf.nn.l2_loss(kernel)) - return (rnn_output_fw, rnn_output_bw), (fw, bw) - - -def stacked_bi_rnn(units: tf.Tensor, - n_hidden_list: List, - cell_type='gru', - seq_lengths=None, - use_peepholes=False, - name='RNN_layer'): - """ Stackted recurrent neural networks GRU or LSTM - - Args: - units: a tensorflow tensor with dimensionality [None, n_tokens, n_features] - n_hidden_list: list with number of hidden units at the ouput of each layer - seq_lengths: length of sequences for different length sequences in batch - can be None for maximum length as a length for every sample in the batch - cell_type: 'lstm' or 'gru' - use_peepholes: whether to use peephole connections (only 'lstm' case affected) - name: what variable_scope to use for the network parameters - Returns: - units: tensor at the output of the last recurrent layer - with dimensionality [None, n_tokens, n_hidden_list[-1]] - last_units: tensor of last hidden states for GRU and tuple - of last hidden stated and last cell states for LSTM - dimensionality of cell states and hidden states are - similar and equal to [B x 2 * H], where B - batch - size and H is number of hidden units - """ - for n, n_hidden in enumerate(n_hidden_list): - with tf.variable_scope(name + '_' + str(n)): - if cell_type == 'gru': - forward_cell = tf.nn.rnn_cell.GRUCell(n_hidden) - backward_cell = 
tf.nn.rnn_cell.GRUCell(n_hidden) - elif cell_type == 'lstm': - forward_cell = tf.nn.rnn_cell.LSTMCell(n_hidden, use_peepholes=use_peepholes) - backward_cell = tf.nn.rnn_cell.LSTMCell(n_hidden, use_peepholes=use_peepholes) - else: - raise RuntimeError('cell_type must be either gru or lstm') - - (rnn_output_fw, rnn_output_bw), (fw, bw) = \ - tf.nn.bidirectional_dynamic_rnn(forward_cell, - backward_cell, - units, - dtype=tf.float32, - sequence_length=seq_lengths) - units = tf.concat([rnn_output_fw, rnn_output_bw], axis=2) - if cell_type == 'gru': - last_units = tf.concat([fw, bw], axis=1) - else: - (c_fw, h_fw), (c_bw, h_bw) = fw, bw - c = tf.concat([c_fw, c_bw], axis=1) - h = tf.concat([h_fw, h_bw], axis=1) - last_units = (h, c) - return units, last_units - - -def u_shape(units: tf.Tensor, - n_hidden_list: List, - filter_width=7, - use_batch_norm=False, - training_ph=None): - """ Network architecture inspired by One Hundred layer Tiramisu. - https://arxiv.org/abs/1611.09326. U-Net like. - - Args: - units: a tensorflow tensor with dimensionality [None, n_tokens, n_features] - n_hidden_list: list with number of hidden units at the ouput of each layer - filter_width: width of the kernel in tokens - use_batch_norm: whether to use batch normalization between layers - training_ph: boolean placeholder determining whether is training phase now or not. - It is used only for batch normalization to determine whether to use - current batch average (std) or memory stored average (std) - Returns: - units: tensor at the output of the last convolutional layer - with dimensionality [None, n_tokens, n_hidden_list[-1]] - """ - - # Bread Crumbs - units_for_skip_conn = [] - conv_net_params = {'filter_width': filter_width, - 'use_batch_norm': use_batch_norm, - 'training_ph': training_ph} - - # Go down the rabbit hole - for n_hidden in n_hidden_list: - units = stacked_cnn(units, [n_hidden], **conv_net_params) - units_for_skip_conn.append(units) - units = tf.layers.max_pooling1d(units, pool_size=2, strides=2, padding='same') - - units = stacked_cnn(units, [n_hidden], **conv_net_params) - - # Up to the sun light - for down_step, n_hidden in enumerate(n_hidden_list[::-1]): - units = tf.expand_dims(units, axis=2) - units = tf.layers.conv2d_transpose(units, n_hidden, filter_width, strides=(2, 1), padding='same') - units = tf.squeeze(units, axis=2) - - # Skip connection - skip_units = units_for_skip_conn[-(down_step + 1)] - if skip_units.get_shape().as_list()[-1] != n_hidden: - skip_units = tf.layers.dense(skip_units, n_hidden) - units = skip_units + units - - units = stacked_cnn(units, [n_hidden], **conv_net_params) - return units - - -def stacked_highway_cnn(units: tf.Tensor, - n_hidden_list: List, - filter_width=3, - use_batch_norm=False, - use_dilation=False, - training_ph=None): - """ Highway convolutional network. Skip connection with gating - mechanism. - - Args: - units: a tensorflow tensor with dimensionality [None, n_tokens, n_features] - n_hidden_list: list with number of hidden units at the output of each layer - filter_width: width of the kernel in tokens - use_batch_norm: whether to use batch normalization between layers - use_dilation: use power of 2 dilation scheme [1, 2, 4, 8 .. ] for layers 1, 2, 3, 4 ... - training_ph: boolean placeholder determining whether is training phase now or not. 
- It is used only for batch normalization to determine whether to use - current batch average (std) or memory stored average (std) - Returns: - units: tensor at the output of the last convolutional layer - with dimensionality [None, n_tokens, n_hidden_list[-1]] - """ - - for n_layer, n_hidden in enumerate(n_hidden_list): - input_units = units - # Projection if needed - if input_units.get_shape().as_list()[-1] != n_hidden: - input_units = tf.layers.dense(input_units, n_hidden) - if use_dilation: - dilation_rate = 2 ** n_layer - else: - dilation_rate = 1 - units = tf.layers.conv1d(units, - n_hidden, - filter_width, - padding='same', - dilation_rate=dilation_rate, - kernel_initializer=INITIALIZER()) - if use_batch_norm: - units = tf.layers.batch_normalization(units, training=training_ph) - sigmoid_gate = tf.layers.dense(input_units, 1, activation=tf.sigmoid, kernel_initializer=INITIALIZER()) - input_units = sigmoid_gate * input_units + (1 - sigmoid_gate) * units - input_units = tf.nn.relu(input_units) - units = input_units - return units - - -def embedding_layer(token_indices=None, - token_embedding_matrix=None, - n_tokens=None, - token_embedding_dim=None, - name: str = None, - trainable=True): - """ Token embedding layer. Create matrix of for token embeddings. - Can be initialized with given matrix (for example pre-trained - with word2ve algorithm - - Args: - token_indices: token indices tensor of type tf.int32 - token_embedding_matrix: matrix of embeddings with dimensionality - [n_tokens, embeddings_dimension] - n_tokens: total number of unique tokens - token_embedding_dim: dimensionality of embeddings, typical 100..300 - name: embedding matrix name (variable name) - trainable: whether to set the matrix trainable or not - - Returns: - embedded_tokens: tf tensor of size [B, T, E], where B - batch size - T - number of tokens, E - token_embedding_dim - """ - if token_embedding_matrix is not None: - tok_mat = token_embedding_matrix - if trainable: - Warning('Matrix of embeddings is passed to the embedding_layer, ' - 'possibly there is a pre-trained embedding matrix. ' - 'Embeddings paramenters are set to Trainable!') - else: - tok_mat = np.random.randn(n_tokens, token_embedding_dim).astype(np.float32) / np.sqrt(token_embedding_dim) - tok_emb_mat = tf.Variable(tok_mat, name=name, trainable=trainable) - embedded_tokens = tf.nn.embedding_lookup(tok_emb_mat, token_indices) - return embedded_tokens - - -def character_embedding_network(char_placeholder: tf.Tensor, - n_characters: int = None, - emb_mat: np.array = None, - char_embedding_dim: int = None, - filter_widths=(3, 4, 5, 7), - highway_on_top=False): - """ Characters to vector. Every sequence of characters (token) - is embedded to vector space with dimensionality char_embedding_dim - Convolution plus max_pooling is used to obtain vector representations - of words. 
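The gating in stacked_highway_cnn above in one formula: the layer output is a convex combination of the (possibly projected) input and the convolution output, y = g * x + (1 - g) * conv(x), with g = sigmoid(dense(x)). A scalar toy example of the combination (names are illustrative):

import numpy as np

def highway(x, transformed, gate_logit):
    g = 1.0 / (1.0 + np.exp(-gate_logit))   # sigmoid gate in (0, 1)
    return g * x + (1.0 - g) * transformed

print(highway(np.array([1.0, 2.0]), np.array([0.0, 0.0]), gate_logit=0.0))  # halfway between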
- - Args: - char_placeholder: placeholder of int32 type with dimensionality [B, T, C] - B - batch size (can be None) - T - Number of tokens (can be None) - C - number of characters (can be None) - n_characters: total number of unique characters - emb_mat: if n_characters is not provided the emb_mat should be provided - it is a numpy array with dimensions [V, E], where V - vocabulary size - and E - embeddings dimension - char_embedding_dim: dimensionality of characters embeddings - filter_widths: array of width of kernel in convolutional embedding network - used in parallel - - Returns: - embeddings: tf.Tensor with dimensionality [B, T, F], - where F is dimensionality of embeddings - """ - if emb_mat is None: - emb_mat = np.random.randn(n_characters, char_embedding_dim).astype(np.float32) / np.sqrt(char_embedding_dim) - else: - char_embedding_dim = emb_mat.shape[1] - char_emb_var = tf.Variable(emb_mat, trainable=True) - with tf.variable_scope('Char_Emb_Network'): - # Character embedding layer - c_emb = tf.nn.embedding_lookup(char_emb_var, char_placeholder) - - # Character embedding network - conv_results_list = [] - for filter_width in filter_widths: - conv_results_list.append(tf.layers.conv2d(c_emb, - char_embedding_dim, - (1, filter_width), - padding='same', - kernel_initializer=INITIALIZER)) - units = tf.concat(conv_results_list, axis=3) - units = tf.reduce_max(units, axis=2) - if highway_on_top: - sigmoid_gate = tf.layers.dense(units, - 1, - activation=tf.sigmoid, - kernel_initializer=INITIALIZER, - kernel_regularizer=tf.nn.l2_loss) - deeper_units = tf.layers.dense(units, - tf.shape(units)[-1], - kernel_initializer=INITIALIZER, - kernel_regularizer=tf.nn.l2_loss) - units = sigmoid_gate * units + (1 - sigmoid_gate) * deeper_units - units = tf.nn.relu(units) - return units - - -def expand_tile(units, axis): - """Expand and tile tensor along given axis - Args: - units: tf tensor with dimensions [batch_size, time_steps, n_input_features] - axis: axis along which expand and tile. 
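character_embedding_network above boils down to: embed the characters, convolve along the character axis with several filter widths in parallel, then max-pool over character positions so every token gets a fixed-size vector. A NumPy sketch for a single token, with the optional highway head omitted and purely illustrative shapes:

import numpy as np

def char_cnn_token(char_ids, emb, filters):
    """char_ids: [C]; emb: [n_chars, E]; filters: list of [width, E, F] kernels."""
    x = emb[char_ids]                                       # [C, E]
    pooled = []
    for w in filters:
        width = w.shape[0]
        convs = [np.einsum('ce,cef->f', x[i:i + width], w)  # valid 1-d convolution
                 for i in range(len(x) - width + 1)]
        pooled.append(np.max(convs, axis=0))                # max over character positions
    return np.concatenate(pooled)                           # [sum of F over all filters]

emb = np.random.randn(30, 4)
filters = [np.random.randn(3, 4, 5), np.random.randn(4, 4, 5)]
print(char_cnn_token(np.array([1, 5, 7, 2, 9]), emb, filters).shape)  # (10,)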
Must be 1 or 2 - - """ - assert axis in (1, 2) - n_time_steps = tf.shape(units)[1] - repetitions = [1, 1, 1, 1] - repetitions[axis] = n_time_steps - return tf.tile(tf.expand_dims(units, axis), repetitions) - - -def additive_self_attention(units, n_hidden=None, n_output_features=None, activation=None): - """ Computes additive self attention for time series of vectors (with batch dimension) - the formula: score(h_i, h_j) = - v is a learnable vector of n_hidden dimensionality, - W_1 and W_2 are learnable [n_hidden, n_input_features] matrices - - Args: - units: tf tensor with dimensionality [batch_size, time_steps, n_input_features] - n_hidden: number of units in hidden representation of similarity measure - n_output_features: number of features in output dense layer - activation: activation at the output - - Returns: - output: self attended tensor with dimensionality [batch_size, time_steps, n_output_features] - """ - n_input_features = units.get_shape().as_list()[2] - if n_hidden is None: - n_hidden = n_input_features - if n_output_features is None: - n_output_features = n_input_features - units_pairs = tf.concat([expand_tile(units, 1), expand_tile(units, 2)], 3) - query = tf.layers.dense(units_pairs, n_hidden, activation=tf.tanh, kernel_initializer=INITIALIZER()) - attention = tf.nn.softmax(tf.layers.dense(query, 1), dim=2) - attended_units = tf.reduce_sum(attention * expand_tile(units, 1), axis=2) - output = tf.layers.dense(attended_units, n_output_features, activation, kernel_initializer=INITIALIZER()) - return output - - -def multiplicative_self_attention(units, n_hidden=None, n_output_features=None, activation=None): - """ Computes multiplicative self attention for time series of vectors (with batch dimension) - the formula: score(h_i, h_j) = , W_1 and W_2 are learnable matrices - with dimensionality [n_hidden, n_input_features], where stands for a and b - dot product - - Args: - units: tf tensor with dimensionality [batch_size, time_steps, n_input_features] - n_hidden: number of units in hidden representation of similarity measure - n_output_features: number of features in output dense layer - activation: activation at the output - - Returns: - output: self attended tensor with dimensionality [batch_size, time_steps, n_output_features] - """ - n_input_features = units.get_shape().as_list()[2] - if n_hidden is None: - n_hidden = n_input_features - if n_output_features is None: - n_output_features = n_input_features - queries = tf.layers.dense(expand_tile(units, 1), n_hidden, kernel_initializer=INITIALIZER()) - keys = tf.layers.dense(expand_tile(units, 2), n_hidden, kernel_initializer=INITIALIZER()) - scores = tf.reduce_sum(queries * keys, axis=3, keep_dims=True) - attention = tf.nn.softmax(scores, dim=2) - attended_units = tf.reduce_sum(attention * expand_tile(units, 1), axis=2) - output = tf.layers.dense(attended_units, n_output_features, activation, kernel_initializer=INITIALIZER()) - return output - - -def cudnn_gru(units, n_hidden, n_layers=1, trainable_initial_states=False, - seq_lengths=None, input_initial_h=None, name='cudnn_gru', reuse=False): - """ Fast CuDNN GRU implementation - - Args: - units: tf.Tensor with dimensions [B x T x F], where - B - batch size - T - number of tokens - F - features - - n_hidden: dimensionality of hidden state - trainable_initial_states: whether to create a special trainable variable - to initialize the hidden states of the network or use just zeros - seq_lengths: tensor of sequence lengths with dimension [B] - n_layers: number of layers - 
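Multiplicative self-attention from above for a single sequence, without batching and without the final dense projection: score(h_i, h_j) = <W1 h_i, W2 h_j>, a softmax over j for every i, and each position replaced by its attention-weighted sum of all positions. A NumPy sketch with illustrative names:

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multiplicative_self_attention(h, W1, W2):
    """h: [T, F]; W1, W2: [F, H]."""
    scores = (h @ W1) @ (h @ W2).T        # [T, T], entry (i, j) = <W1 h_i, W2 h_j>
    attn = softmax(scores, axis=-1)       # attend over j for every query position i
    return attn @ h                       # [T, F]

h = np.random.randn(6, 8)
print(multiplicative_self_attention(h, np.random.randn(8, 5), np.random.randn(8, 5)).shape)  # (6, 8)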
input_initial_h: initial hidden state, tensor - name: name of the variable scope to use - reuse:whether to reuse already initialized variable - - Returns: - h - all hidden states along T dimension, - tf.Tensor with dimensionality [B x T x F] - h_last - last hidden state, tf.Tensor with dimensionality [B x H] - """ - with tf.variable_scope(name, reuse=reuse): - gru = tf.contrib.cudnn_rnn.CudnnGRU(num_layers=n_layers, - num_units=n_hidden) - - if trainable_initial_states: - init_h = tf.get_variable('init_h', [n_layers, 1, n_hidden]) - init_h = tf.tile(init_h, (1, tf.shape(units)[0], 1)) - else: - init_h = tf.zeros([n_layers, tf.shape(units)[0], n_hidden]) - - initial_h = input_initial_h or init_h - - h, h_last = gru(tf.transpose(units, (1, 0, 2)), (initial_h,)) - h = tf.transpose(h, (1, 0, 2)) - h_last = tf.squeeze(h_last, axis=0)[-1] # extract last layer state - - # Extract last states if they are provided - if seq_lengths is not None: - indices = tf.stack([tf.range(tf.shape(h)[0]), seq_lengths - 1], axis=1) - h_last = tf.gather_nd(h, indices) - - return h, h_last - - -def cudnn_compatible_gru(units, n_hidden, n_layers=1, trainable_initial_states=False, - seq_lengths=None, input_initial_h=None, name='cudnn_gru', reuse=False): - """ CuDNN Compatible GRU implementation. - It should be used to load models saved with CudnnGRUCell to run on CPU. - - Args: - units: tf.Tensor with dimensions [B x T x F], where - B - batch size - T - number of tokens - F - features - - n_hidden: dimensionality of hidden state - trainable_initial_states: whether to create a special trainable variable - to initialize the hidden states of the network or use just zeros - seq_lengths: tensor of sequence lengths with dimension [B] - n_layers: number of layers - input_initial_h: initial hidden state, tensor - name: name of the variable scope to use - reuse:whether to reuse already initialized variable - - Returns: - h - all hidden states along T dimension, - tf.Tensor with dimensionality [B x T x F] - h_last - last hidden state, tf.Tensor with dimensionality [B x H] - """ - with tf.variable_scope(name, reuse=reuse): - - if trainable_initial_states: - init_h = tf.get_variable('init_h', [n_layers, 1, n_hidden]) - init_h = tf.tile(init_h, (1, tf.shape(units)[0], 1)) - else: - init_h = tf.zeros([n_layers, tf.shape(units)[0], n_hidden]) - - initial_h = input_initial_h or init_h - - with tf.variable_scope('cudnn_gru', reuse=reuse): - def single_cell(): return tf.contrib.cudnn_rnn.CudnnCompatibleGRUCell(n_hidden) - - cell = tf.nn.rnn_cell.MultiRNNCell([single_cell() for _ in range(n_layers)]) - - units = tf.transpose(units, (1, 0, 2)) - - h, h_last = tf.nn.dynamic_rnn(cell=cell, inputs=units, time_major=True, - initial_state=tuple(tf.unstack(initial_h, axis=0))) - h = tf.transpose(h, (1, 0, 2)) - - h_last = h_last[-1] # h_last is tuple: n_layers x batch_size x n_hidden - - # Extract last states if they are provided - if seq_lengths is not None: - indices = tf.stack([tf.range(tf.shape(h)[0]), seq_lengths - 1], axis=1) - h_last = tf.gather_nd(h, indices) - - return h, h_last - - -def cudnn_gru_wrapper(units, n_hidden, n_layers=1, trainable_initial_states=False, - seq_lengths=None, input_initial_h=None, name='cudnn_gru', reuse=False): - if check_gpu_existence(): - return cudnn_gru(units, n_hidden, n_layers, trainable_initial_states, - seq_lengths, input_initial_h, name, reuse) - - log.info('\nWarning! tf.contrib.cudnn_rnn.CudnnCompatibleGRUCell is used. 
' - 'It is okay for inference mode, but ' - 'if you train your model with this cell it could NOT be used with ' - 'tf.contrib.cudnn_rnn.CudnnGRUCell later. ' - ) - - return cudnn_compatible_gru(units, n_hidden, n_layers, trainable_initial_states, - seq_lengths, input_initial_h, name, reuse) - - -def cudnn_lstm(units, n_hidden, n_layers=1, trainable_initial_states=None, seq_lengths=None, initial_h=None, - initial_c=None, name='cudnn_lstm', reuse=False): - """ Fast CuDNN LSTM implementation - - Args: - units: tf.Tensor with dimensions [B x T x F], where - B - batch size - T - number of tokens - F - features - n_hidden: dimensionality of hidden state - n_layers: number of layers - trainable_initial_states: whether to create a special trainable variable - to initialize the hidden states of the network or use just zeros - seq_lengths: tensor of sequence lengths with dimension [B] - initial_h: optional initial hidden state, masks trainable_initial_states - if provided - initial_c: optional initial cell state, masks trainable_initial_states - if provided - name: name of the variable scope to use - reuse:whether to reuse already initialized variable - - - Returns: - h - all hidden states along T dimension, - tf.Tensor with dimensionality [B x T x F] - h_last - last hidden state, tf.Tensor with dimensionality [B x H] - where H - number of hidden units - c_last - last cell state, tf.Tensor with dimensionality [B x H] - where H - number of hidden units - """ - with tf.variable_scope(name, reuse=reuse): - lstm = tf.contrib.cudnn_rnn.CudnnLSTM(num_layers=n_layers, - num_units=n_hidden) - if trainable_initial_states: - init_h = tf.get_variable('init_h', [n_layers, 1, n_hidden]) - init_h = tf.tile(init_h, (1, tf.shape(units)[0], 1)) - init_c = tf.get_variable('init_c', [n_layers, 1, n_hidden]) - init_c = tf.tile(init_c, (1, tf.shape(units)[0], 1)) - else: - init_h = init_c = tf.zeros([n_layers, tf.shape(units)[0], n_hidden]) - - initial_h = initial_h or init_h - initial_c = initial_c or init_c - - h, (h_last, c_last) = lstm(tf.transpose(units, (1, 0, 2)), (initial_h, initial_c)) - h = tf.transpose(h, (1, 0, 2)) - h_last = h_last[-1] - c_last = c_last[-1] - - # Extract last states if they are provided - if seq_lengths is not None: - indices = tf.stack([tf.range(tf.shape(h)[0]), seq_lengths - 1], axis=1) - h_last = tf.gather_nd(h, indices) - - return h, (h_last, c_last) - - -def cudnn_compatible_lstm(units, n_hidden, n_layers=1, trainable_initial_states=None, seq_lengths=None, initial_h=None, - initial_c=None, name='cudnn_lstm', reuse=False): - """ CuDNN Compatible LSTM implementation. - It should be used to load models saved with CudnnLSTMCell to run on CPU. 
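The seq_lengths handling shared by the CuDNN helpers above picks, for every sample, the hidden state at position seq_length - 1 instead of the state at the padded end of the sequence. The same gather expressed in NumPy:

import numpy as np

h = np.random.randn(3, 7, 4)                          # [batch, time, hidden]
seq_lengths = np.array([7, 2, 5])
h_last = h[np.arange(h.shape[0]), seq_lengths - 1]    # [batch, hidden], last valid state
print(h_last.shape)  # (3, 4)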
- - Args: - units: tf.Tensor with dimensions [B x T x F], where - B - batch size - T - number of tokens - F - features - n_hidden: dimensionality of hidden state - n_layers: number of layers - trainable_initial_states: whether to create a special trainable variable - to initialize the hidden states of the network or use just zeros - seq_lengths: tensor of sequence lengths with dimension [B] - initial_h: optional initial hidden state, masks trainable_initial_states - if provided - initial_c: optional initial cell state, masks trainable_initial_states - if provided - name: name of the variable scope to use - reuse:whether to reuse already initialized variable - - - Returns: - h - all hidden states along T dimension, - tf.Tensor with dimensionality [B x T x F] - h_last - last hidden state, tf.Tensor with dimensionality [B x H] - where H - number of hidden units - c_last - last cell state, tf.Tensor with dimensionality [B x H] - where H - number of hidden units - """ - - with tf.variable_scope(name, reuse=reuse): - if trainable_initial_states: - init_h = tf.get_variable('init_h', [n_layers, 1, n_hidden]) - init_h = tf.tile(init_h, (1, tf.shape(units)[0], 1)) - init_c = tf.get_variable('init_c', [n_layers, 1, n_hidden]) - init_c = tf.tile(init_c, (1, tf.shape(units)[0], 1)) - else: - init_h = init_c = tf.zeros([n_layers, tf.shape(units)[0], n_hidden]) - - initial_h = initial_h or init_h - initial_c = initial_c or init_c - - with tf.variable_scope('cudnn_lstm', reuse=reuse): - def single_cell(): return tf.contrib.cudnn_rnn.CudnnCompatibleLSTMCell(n_hidden) - - cell = tf.nn.rnn_cell.MultiRNNCell([single_cell() for _ in range(n_layers)]) - - units = tf.transpose(units, (1, 0, 2)) - - init = tuple([tf.nn.rnn_cell.LSTMStateTuple(ic, ih) for ih, ic in - zip(tf.unstack(initial_h, axis=0), tf.unstack(initial_c, axis=0))]) - - h, state = tf.nn.dynamic_rnn(cell=cell, inputs=units, time_major=True, initial_state=init) - - h = tf.transpose(h, (1, 0, 2)) - h_last = state[-1].h - c_last = state[-1].c - - # Extract last states if they are provided - if seq_lengths is not None: - indices = tf.stack([tf.range(tf.shape(h)[0]), seq_lengths - 1], axis=1) - h_last = tf.gather_nd(h, indices) - - return h, (h_last, c_last) - - -def cudnn_lstm_wrapper(units, n_hidden, n_layers=1, trainable_initial_states=None, seq_lengths=None, initial_h=None, - initial_c=None, name='cudnn_lstm', reuse=False): - if check_gpu_existence(): - return cudnn_lstm(units, n_hidden, n_layers, trainable_initial_states, - seq_lengths, initial_h, initial_c, name, reuse) - - log.info('\nWarning! tf.contrib.cudnn_rnn.CudnnCompatibleLSTMCell is used. ' - 'It is okay for inference mode, but ' - 'if you train your model with this cell it could NOT be used with ' - 'tf.contrib.cudnn_rnn.CudnnLSTMCell later. 
' - ) - - return cudnn_compatible_lstm(units, n_hidden, n_layers, trainable_initial_states, - seq_lengths, initial_h, initial_c, name, reuse) - - -def cudnn_bi_gru(units, - n_hidden, - seq_lengths=None, - n_layers=1, - trainable_initial_states=False, - name='cudnn_bi_gru', - reuse=False): - """ Fast CuDNN Bi-GRU implementation - - Args: - units: tf.Tensor with dimensions [B x T x F], where - B - batch size - T - number of tokens - F - features - n_hidden: dimensionality of hidden state - seq_lengths: number of tokens in each sample in the batch - n_layers: number of layers - trainable_initial_states: whether to create a special trainable variable - to initialize the hidden states of the network or use just zeros - name: name of the variable scope to use - reuse:whether to reuse already initialized variable - - - Returns: - h - all hidden states along T dimension, - tf.Tensor with dimensionality [B x T x F] - h_last - last hidden state, tf.Tensor with dimensionality [B x H * 2] - where H - number of hidden units - """ - - with tf.variable_scope(name, reuse=reuse): - if seq_lengths is None: - seq_lengths = tf.ones([tf.shape(units)[0]], dtype=tf.int32) * tf.shape(units)[1] - with tf.variable_scope('Forward'): - h_fw, h_last_fw = cudnn_gru_wrapper(units, - n_hidden, - n_layers=n_layers, - trainable_initial_states=trainable_initial_states, - seq_lengths=seq_lengths, - reuse=reuse) - - with tf.variable_scope('Backward'): - reversed_units = tf.reverse_sequence(units, seq_lengths=seq_lengths, seq_dim=1, batch_dim=0) - h_bw, h_last_bw = cudnn_gru_wrapper(reversed_units, - n_hidden, - n_layers=n_layers, - trainable_initial_states=trainable_initial_states, - seq_lengths=seq_lengths, - reuse=reuse) - h_bw = tf.reverse_sequence(h_bw, seq_lengths=seq_lengths, seq_dim=1, batch_dim=0) - - return (h_fw, h_bw), (h_last_fw, h_last_bw) - - -def cudnn_bi_lstm(units, - n_hidden, - seq_lengths=None, - n_layers=1, - trainable_initial_states=False, - name='cudnn_bi_gru', - reuse=False): - """ Fast CuDNN Bi-LSTM implementation - - Args: - units: tf.Tensor with dimensions [B x T x F], where - B - batch size - T - number of tokens - F - features - n_hidden: dimensionality of hidden state - seq_lengths: number of tokens in each sample in the batch - n_layers: number of layers - trainable_initial_states: whether to create a special trainable variable - to initialize the hidden states of the network or use just zeros - name: name of the variable scope to use - reuse:whether to reuse already initialized variable - - Returns: - h - all hidden states along T dimension, - tf.Tensor with dimensionality [B x T x F] - h_last - last hidden state, tf.Tensor with dimensionality [B x H * 2] - where H - number of hidden units - c_last - last cell state, tf.Tensor with dimensionality [B x H * 2] - where H - number of hidden units - """ - with tf.variable_scope(name, reuse=reuse): - if seq_lengths is None: - seq_lengths = tf.ones([tf.shape(units)[0]], dtype=tf.int32) * tf.shape(units)[1] - with tf.variable_scope('Forward'): - h_fw, (h_fw_last, c_fw_last) = cudnn_lstm_wrapper(units, - n_hidden, - n_layers=n_layers, - trainable_initial_states=trainable_initial_states, - seq_lengths=seq_lengths) - - with tf.variable_scope('Backward'): - reversed_units = tf.reverse_sequence(units, seq_lengths=seq_lengths, seq_dim=1, batch_dim=0) - h_bw, (h_bw_last, c_bw_last) = cudnn_lstm_wrapper(reversed_units, - n_hidden, - n_layers=n_layers, - trainable_initial_states=trainable_initial_states, - seq_lengths=seq_lengths) - - h_bw = 
tf.reverse_sequence(h_bw, seq_lengths=seq_lengths, seq_dim=1, batch_dim=0) - return (h_fw, h_bw), ((h_fw_last, c_fw_last), (h_bw_last, c_bw_last)) - - -def cudnn_stacked_bi_gru(units, - n_hidden, - seq_lengths=None, - n_stacks=2, - keep_prob=1.0, - concat_stacked_outputs=False, - trainable_initial_states=False, - name='cudnn_stacked_bi_gru', - reuse=False): - """ Fast CuDNN Stacked Bi-GRU implementation - - Args: - units: tf.Tensor with dimensions [B x T x F], where - B - batch size - T - number of tokens - F - features - n_hidden: dimensionality of hidden state - seq_lengths: number of tokens in each sample in the batch - n_stacks: number of stacked Bi-GRU - keep_prob: dropout keep_prob between Bi-GRUs (intra-layer dropout) - concat_stacked_outputs: return last Bi-GRU output or concat outputs from every Bi-GRU, - trainable_initial_states: whether to create a special trainable variable - to initialize the hidden states of the network or use just zeros - name: name of the variable scope to use - reuse: whether to reuse already initialized variable - - - Returns: - h - all hidden states along T dimension, - tf.Tensor with dimensionality [B x T x ((n_hidden * 2) * n_stacks)] - """ - if seq_lengths is None: - seq_lengths = tf.ones([tf.shape(units)[0]], dtype=tf.int32) * tf.shape(units)[1] - - outputs = [units] - - with tf.variable_scope(name, reuse=reuse): - for n in range(n_stacks): - - if n == 0: - inputs = outputs[-1] - else: - inputs = variational_dropout(outputs[-1], keep_prob=keep_prob) - - (h_fw, h_bw), _ = cudnn_bi_gru(inputs, n_hidden, seq_lengths, - n_layers=1, - trainable_initial_states=trainable_initial_states, - name='{}_cudnn_bi_gru'.format(n), - reuse=reuse) - - outputs.append(tf.concat([h_fw, h_bw], axis=2)) - - if concat_stacked_outputs: - return tf.concat(outputs[1:], axis=2) - - return outputs[-1] - - -def variational_dropout(units, keep_prob, fixed_mask_dims=(1,)): - """ Dropout with the same drop mask for all fixed_mask_dims - - Args: - units: a tensor, usually with shapes [B x T x F], where - B - batch size - T - tokens dimension - F - feature dimension - keep_prob: keep probability - fixed_mask_dims: in these dimensions the mask will be the same - - Returns: - dropped units tensor - """ - units_shape = tf.shape(units) - noise_shape = [units_shape[n] for n in range(len(units.shape))] - for dim in fixed_mask_dims: - noise_shape[dim] = 1 - return tf.nn.dropout(units, rate=1 - keep_prob, noise_shape=noise_shape) diff --git a/deeppavlov/core/models/keras_model.py b/deeppavlov/core/models/keras_model.py deleted file mode 100644 index a60f561a1b..0000000000 --- a/deeppavlov/core/models/keras_model.py +++ /dev/null @@ -1,206 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
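The variational_dropout helper above draws a single dropout mask per sample and feature and reuses it along the token dimension. A minimal NumPy sketch of the same masking idea, assuming illustrative shapes and keep probability rather than values from any config:

import numpy as np

def variational_dropout_mask(batch, features, keep_prob, rng):
    # One mask of shape [B x 1 x F]; broadcasting repeats it over all T steps,
    # so every token of a sequence is dropped in the same feature positions.
    return rng.binomial(1, keep_prob, size=(batch, 1, features)) / keep_prob

rng = np.random.default_rng(0)
units = np.ones((2, 5, 3))                                   # [B x T x F]
dropped = units * variational_dropout_mask(2, 3, 0.5, rng)
assert (dropped[:, 0] == dropped[:, 1]).all()                # same mask at every time step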
- -from abc import abstractmethod -from logging import getLogger - -import tensorflow.compat.v1 as tf -from tensorflow.keras import backend as K -from overrides import overrides - -from deeppavlov.core.models.lr_scheduled_model import LRScheduledModel -from deeppavlov.core.models.nn_model import NNModel -from deeppavlov.core.models.tf_backend import TfModelMeta - -log = getLogger(__name__) - - -class KerasModel(NNModel, metaclass=TfModelMeta): - """ - Builds Keras model with TensorFlow backend. - - Attributes: - epochs_done: number of epochs that were done - batches_seen: number of epochs that were seen - train_examples_seen: number of training samples that were seen - sess: tf session - """ - - def __init__(self, **kwargs) -> None: - """ - Initialize model using keyword parameters - - Args: - kwargs: Dictionary with model parameters - """ - self.epochs_done = 0 - self.batches_seen = 0 - self.train_examples_seen = 0 - - super().__init__(save_path=kwargs.get("save_path"), - load_path=kwargs.get("load_path"), - mode=kwargs.get("mode")) - - @staticmethod - def _config_session(): - """ - Configure session for particular device - - Returns: - tensorflow.Session - """ - config = tf.ConfigProto() - config.gpu_options.allow_growth = True - config.gpu_options.visible_device_list = '0' - return tf.Session(config=config) - - @abstractmethod - def load(self, *args, **kwargs) -> None: - pass - - @abstractmethod - def save(self, *args, **kwargs) -> None: - pass - - def process_event(self, event_name: str, data: dict) -> None: - """ - Process event after epoch - Args: - event_name: whether event is send after epoch or batch. - Set of values: ``"after_epoch", "after_batch"`` - data: event data (dictionary) - - Returns: - None - """ - if event_name == "after_epoch": - self.epochs_done = data["epochs_done"] - self.batches_seen = data["batches_seen"] - self.train_examples_seen = data["train_examples_seen"] - return - - -class LRScheduledKerasModel(LRScheduledModel, KerasModel): - """ - KerasModel enhanced with optimizer, learning rate and momentum - management and search. 
- """ - - def __init__(self, **kwargs): - """ - Initialize model with given parameters - - Args: - **kwargs: dictionary of parameters - """ - self.opt = kwargs - KerasModel.__init__(self, **kwargs) - if not(isinstance(kwargs.get("learning_rate"), float) and isinstance(kwargs.get("learning_rate_decay"), float)): - LRScheduledModel.__init__(self, **kwargs) - - @abstractmethod - def get_optimizer(self): - """ - Return an instance of keras optimizer - """ - pass - - @overrides - def _init_learning_rate_variable(self): - """ - Initialize learning rate - - Returns: - None - """ - return None - - @overrides - def _init_momentum_variable(self): - """ - Initialize momentum - - Returns: - None - """ - return None - - @overrides - def get_learning_rate_variable(self): - """ - Extract value of learning rate from optimizer - - Returns: - learning rate value - """ - return self.get_optimizer().lr - - @overrides - def get_momentum_variable(self): - """ - Extract values of momentum variables from optimizer - - Returns: - optimizer's `rho` or `beta_1` - """ - optimizer = self.get_optimizer() - if hasattr(optimizer, 'rho'): - return optimizer.rho - elif hasattr(optimizer, 'beta_1'): - return optimizer.beta_1 - return None - - @overrides - def _update_graph_variables(self, learning_rate: float = None, momentum: float = None): - """ - Update graph variables setting giving `learning_rate` and `momentum` - - Args: - learning_rate: learning rate value to be set in graph (set if not None) - momentum: momentum value to be set in graph (set if not None) - - Returns: - None - """ - if learning_rate is not None: - K.set_value(self.get_learning_rate_variable(), learning_rate) - # log.info(f"Learning rate = {learning_rate}") - if momentum is not None: - K.set_value(self.get_momentum_variable(), momentum) - # log.info(f"Momentum = {momentum}") - - def process_event(self, event_name: str, data: dict): - """ - Process event after epoch - Args: - event_name: whether event is send after epoch or batch. - Set of values: ``"after_epoch", "after_batch"`` - data: event data (dictionary) - - Returns: - None - """ - if (isinstance(self.opt.get("learning_rate", None), float) and - isinstance(self.opt.get("learning_rate_decay", None), float)): - pass - else: - if event_name == 'after_train_log': - if (self.get_learning_rate_variable() is not None) and ('learning_rate' not in data): - data['learning_rate'] = float(K.get_value(self.get_learning_rate_variable())) - # data['learning_rate'] = self._lr - if (self.get_momentum_variable() is not None) and ('momentum' not in data): - data['momentum'] = float(K.get_value(self.get_momentum_variable())) - # data['momentum'] = self._mom - else: - super().process_event(event_name, data) diff --git a/deeppavlov/core/models/tf_backend.py b/deeppavlov/core/models/tf_backend.py deleted file mode 100644 index f6d9bc018c..0000000000 --- a/deeppavlov/core/models/tf_backend.py +++ /dev/null @@ -1,77 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -from abc import ABCMeta -from functools import wraps - -import tensorflow.compat.v1 as tf -from six import with_metaclass - - -def _graph_wrap(func, graph): - """Constructs function encapsulated in the graph.""" - - @wraps(func) - def _wrapped(*args, **kwargs): - with graph.as_default(): - return func(*args, **kwargs) - - return _wrapped - - -def _keras_wrap(func, session): - """Constructs function encapsulated in the graph and the session.""" - @wraps(func) - def _wrapped(*args, **kwargs): - with session.graph.as_default(): - tf.keras.backend.set_session(session) - return func(*args, **kwargs) - - return _wrapped - - -class TfModelMeta(with_metaclass(type, ABCMeta)): - """Metaclass that helps all child classes to have their own graph and session.""" - - def __call__(cls, *args, **kwargs): - obj = cls.__new__(cls, *args, **kwargs) - from .keras_model import KerasModel - if issubclass(cls, KerasModel): - from tensorflow.keras import backend as K - if K.backend() != 'tensorflow': - obj.__init__(*args, **kwargs) - return obj - - K.clear_session() - obj.graph = tf.Graph() - with obj.graph.as_default(): - if hasattr(cls, '_config_session'): - obj.sess = cls._config_session() - else: - obj.sess = tf.Session() - else: - obj.graph = tf.Graph() - - for meth in dir(obj): - if meth == '__class__': - continue - attr = getattr(obj, meth) - if callable(attr): - if issubclass(cls, KerasModel): - wrapped_attr = _keras_wrap(attr, obj.sess) - else: - wrapped_attr = _graph_wrap(attr, obj.graph) - setattr(obj, meth, wrapped_attr) - obj.__init__(*args, **kwargs) - return obj diff --git a/deeppavlov/core/models/tf_model.py b/deeppavlov/core/models/tf_model.py deleted file mode 100644 index 39d867165b..0000000000 --- a/deeppavlov/core/models/tf_model.py +++ /dev/null @@ -1,254 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
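TfModelMeta above wraps every bound method at construction time so that calls run inside the instance's own graph and session. A framework-free sketch of that wrapping pattern; the ContextMeta class and the calls list standing in for a per-instance graph are assumptions made only for illustration:

from functools import wraps

class ContextMeta(type):
    def __call__(cls, *args, **kwargs):
        obj = cls.__new__(cls)
        obj.calls = []                          # stand-in for a per-instance graph/session
        for name in dir(obj):
            attr = getattr(obj, name)
            if callable(attr) and not name.startswith('__'):
                @wraps(attr)
                def wrapped(*a, _attr=attr, _obj=obj, **kw):
                    _obj.calls.append(_attr.__name__)   # "enter" the instance context
                    return _attr(*a, **kw)
                setattr(obj, name, wrapped)
        obj.__init__(*args, **kwargs)
        return obj

class Model(metaclass=ContextMeta):
    def predict(self, x):
        return x * 2

m = Model()
assert m.predict(3) == 6 and m.calls == ['predict']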
- -from collections import defaultdict -from logging import getLogger -from pathlib import Path -from typing import Iterable, Union, Tuple, Optional - -import numpy as np -import tensorflow as tf -from overrides import overrides -from tensorflow.python.ops import variables - -from deeppavlov.core.common.errors import ConfigError -from deeppavlov.core.common.registry import cls_from_str -from deeppavlov.core.models.lr_scheduled_model import LRScheduledModel -from deeppavlov.core.models.nn_model import NNModel -from deeppavlov.core.models.tf_backend import TfModelMeta - -log = getLogger(__name__) - - -class TFModel(NNModel, metaclass=TfModelMeta): - """Parent class for all components using TensorFlow.""" - - sess: tf.Session - - def __init__(self, *args, **kwargs) -> None: - super().__init__(*args, **kwargs) - - def load(self, exclude_scopes: tuple = ('Optimizer',), path: Union[Path, str] = None) -> None: - """Load model parameters from self.load_path""" - if not hasattr(self, 'sess'): - raise RuntimeError('Your TensorFlow model {} must' - ' have sess attribute!'.format(self.__class__.__name__)) - path = path or self.load_path - path = str(Path(path).resolve()) - # Check presence of the model files - if tf.train.checkpoint_exists(path): - log.info('[loading model from {}]'.format(path)) - # Exclude optimizer variables from saved variables - var_list = self._get_saveable_variables(exclude_scopes) - saver = tf.train.Saver(var_list) - saver.restore(self.sess, path) - - def deserialize(self, weights: Iterable[Tuple[str, np.ndarray]]) -> None: - assign_ops = [] - feed_dict = {} - for var_name, value in weights: - var = self.sess.graph.get_tensor_by_name(var_name) - value = np.asarray(value) - assign_placeholder = tf.placeholder(var.dtype, shape=value.shape) - assign_op = tf.assign(var, assign_placeholder) - assign_ops.append(assign_op) - feed_dict[assign_placeholder] = value - self.sess.run(assign_ops, feed_dict=feed_dict) - - def save(self, exclude_scopes: tuple = ('Optimizer',)) -> None: - """Save model parameters to self.save_path""" - if not hasattr(self, 'sess'): - raise RuntimeError('Your TensorFlow model {} must' - ' have sess attribute!'.format(self.__class__.__name__)) - path = str(self.save_path.resolve()) - log.info('[saving model to {}]'.format(path)) - var_list = self._get_saveable_variables(exclude_scopes) - saver = tf.train.Saver(var_list) - saver.save(self.sess, path) - - def serialize(self) -> Tuple[Tuple[str, np.ndarray], ...]: - tf_vars = tf.global_variables() - values = self.sess.run(tf_vars) - return tuple(zip([var.name for var in tf_vars], values)) - - @staticmethod - def _get_saveable_variables(exclude_scopes=tuple()): - # noinspection PyProtectedMember - all_vars = variables._all_saveable_objects() - vars_to_train = [var for var in all_vars if all(sc not in var.name for sc in exclude_scopes)] - return vars_to_train - - @staticmethod - def _get_trainable_variables(exclude_scopes=tuple()): - all_vars = tf.global_variables() - vars_to_train = [var for var in all_vars if all(sc not in var.name for sc in exclude_scopes)] - return vars_to_train - - def get_train_op(self, - loss, - learning_rate, - optimizer=None, - clip_norm=None, - learnable_scopes=None, - optimizer_scope_name=None, - **kwargs): - """ - Get train operation for given loss - - Args: - loss: loss, tf tensor or scalar - learning_rate: scalar or placeholder. - clip_norm: clip gradients norm by clip_norm. - learnable_scopes: which scopes are trainable (None for all). 
- optimizer: instance of tf.train.Optimizer, default Adam. - **kwargs: parameters passed to tf.train.Optimizer object - (scalars or placeholders). - - Returns: - train_op - """ - if optimizer_scope_name is None: - opt_scope = tf.variable_scope('Optimizer') - else: - opt_scope = tf.variable_scope(optimizer_scope_name) - with opt_scope: - if learnable_scopes is None: - variables_to_train = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES) - else: - variables_to_train = [] - for scope_name in learnable_scopes: - variables_to_train.extend(tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=scope_name)) - - if optimizer is None: - optimizer = tf.train.AdamOptimizer - - # For batch norm it is necessary to update running averages - extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) - with tf.control_dependencies(extra_update_ops): - - def clip_if_not_none(grad): - if grad is not None: - return tf.clip_by_norm(grad, clip_norm) - - opt = optimizer(learning_rate, **kwargs) - grads_and_vars = opt.compute_gradients(loss, var_list=variables_to_train) - if clip_norm is not None: - grads_and_vars = [(clip_if_not_none(grad), var) - for grad, var in grads_and_vars] - train_op = opt.apply_gradients(grads_and_vars) - return train_op - - @staticmethod - def print_number_of_parameters(): - """ - Print number of *trainable* parameters in the network - """ - log.info('Number of parameters: ') - variables = tf.trainable_variables() - blocks = defaultdict(int) - for var in variables: - # Get the top level scope name of variable - block_name = var.name.split('/')[0] - number_of_parameters = np.prod(var.get_shape().as_list()) - blocks[block_name] += number_of_parameters - for block_name, cnt in blocks.items(): - log.info("{} - {}.".format(block_name, cnt)) - total_num_parameters = np.sum(list(blocks.values())) - log.info('Total number of parameters equal {}'.format(total_num_parameters)) - - def destroy(self): - if hasattr(self, 'sess'): - for k in list(self.sess.graph.get_all_collection_keys()): - self.sess.graph.clear_collection(k) - super().destroy() - - -class LRScheduledTFModel(TFModel, LRScheduledModel): - """ - TFModel enhanced with optimizer, learning rate and momentum - management and search. 
- """ - - def __init__(self, - optimizer: str = 'AdamOptimizer', - clip_norm: float = None, - momentum: float = None, - **kwargs) -> None: - TFModel.__init__(self, **kwargs) - - try: - self._optimizer = cls_from_str(optimizer) - except Exception: - self._optimizer = getattr(tf.train, optimizer.split(':')[-1]) - if not issubclass(self._optimizer, tf.train.Optimizer): - raise ConfigError("`optimizer` should be tensorflow.train.Optimizer subclass") - self._clip_norm = clip_norm - - LRScheduledModel.__init__(self, momentum=momentum, **kwargs) - - @overrides - def _init_learning_rate_variable(self): - return tf.Variable(self._lr or 0., dtype=tf.float32, name='learning_rate') - - @overrides - def _init_momentum_variable(self): - return tf.Variable(self._mom or 0., dtype=tf.float32, name='momentum') - - @overrides - def _update_graph_variables(self, learning_rate=None, momentum=None): - if learning_rate is not None: - self.sess.run(tf.assign(self._lr_var, learning_rate)) - # log.info(f"Learning rate = {learning_rate}") - if momentum is not None: - self.sess.run(tf.assign(self._mom_var, momentum)) - # log.info(f"Momentum = {momentum}") - - def get_train_op(self, - loss, - learning_rate: Union[float, tf.placeholder] = None, - optimizer: tf.train.Optimizer = None, - momentum: Union[float, tf.placeholder] = None, - clip_norm: float = None, - **kwargs): - if learning_rate is not None: - kwargs['learning_rate'] = learning_rate - else: - kwargs['learning_rate'] = self._lr_var - kwargs['optimizer'] = optimizer or self.get_optimizer() - kwargs['clip_norm'] = clip_norm or self._clip_norm - - momentum_param = 'momentum' - if kwargs['optimizer'] == tf.train.AdamOptimizer: - momentum_param = 'beta1' - elif kwargs['optimizer'] == tf.train.AdadeltaOptimizer: - momentum_param = 'rho' - - if momentum is not None: - kwargs[momentum_param] = momentum - elif self.get_momentum() is not None: - kwargs[momentum_param] = self._mom_var - return TFModel.get_train_op(self, loss, **kwargs) - - def get_optimizer(self): - return self._optimizer - - def load(self, - exclude_scopes: Optional[Iterable] = ('Optimizer', - 'learning_rate', - 'momentum'), - **kwargs): - return super().load(exclude_scopes=exclude_scopes, **kwargs) - - def process_event(self, *args, **kwargs): - LRScheduledModel.process_event(self, *args, **kwargs) diff --git a/deeppavlov/core/models/torch_model.py b/deeppavlov/core/models/torch_model.py index 67bfee27ab..43c83658c4 100644 --- a/deeppavlov/core/models/torch_model.py +++ b/deeppavlov/core/models/torch_model.py @@ -126,6 +126,10 @@ def init_from_opt(self, model_func: str) -> None: else: raise AttributeError("Model is not defined.") + @property + def is_data_parallel(self) -> bool: + return isinstance(self.model, torch.nn.DataParallel) + @overrides def load(self, fname: Optional[str] = None, *args, **kwargs) -> None: """Load model from `fname` (if `fname` is not given, use `self.load_path`) to `self.model` along with @@ -143,7 +147,7 @@ def load(self, fname: Optional[str] = None, *args, **kwargs) -> None: if fname is not None: self.load_path = fname - model_func = getattr(self, self.opt.get("model_name"), None) + model_func = getattr(self, self.opt.get("model_name", ""), None) if self.load_path: log.info(f"Load path {self.load_path} is given.") @@ -157,18 +161,29 @@ def load(self, fname: Optional[str] = None, *args, **kwargs) -> None: log.info(f"Initializing `{self.__class__.__name__}` from saved.") # firstly, initialize with random weights and previously saved parameters - self.init_from_opt(model_func) 
+ if model_func: + self.init_from_opt(model_func) # now load the weights, optimizer from saved log.info(f"Loading weights from {weights_path}.") checkpoint = torch.load(weights_path, map_location=self.device) - self.model.load_state_dict(checkpoint["model_state_dict"], strict=False) - self.optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) + model_state = checkpoint["model_state_dict"] + optimizer_state = checkpoint["optimizer_state_dict"] + + # load a multi-gpu model on a single device + if not self.is_data_parallel and any(["module." in key for key in list(model_state.keys())]): + model_state = {key.replace("module.", ""): val for key, val in model_state.items()} + + if torch.cuda.device_count() > 1: + self.model.module.load_state_dict(model_state) + else: + self.model.load_state_dict(model_state) + self.optimizer.load_state_dict(optimizer_state) self.epochs_done = checkpoint.get("epochs_done", 0) - else: + elif model_func: log.info(f"Init from scratch. Load path {weights_path} does not exist.") self.init_from_opt(model_func) - else: + elif model_func: log.info(f"Init from scratch. Load path {self.load_path} is not provided.") self.init_from_opt(model_func) @@ -194,11 +209,18 @@ def save(self, fname: Optional[str] = None, *args, **kwargs) -> None: weights_path = Path(fname).with_suffix(f".pth.tar") log.info(f"Saving model to {weights_path}.") # move the model to `cpu` before saving to provide consistency - torch.save({ - "model_state_dict": self.model.cpu().state_dict(), - "optimizer_state_dict": self.optimizer.state_dict(), - "epochs_done": self.epochs_done - }, weights_path) + if torch.cuda.device_count() > 1: + torch.save({ + "model_state_dict": self.model.module.cpu().state_dict(), + "optimizer_state_dict": self.optimizer.state_dict(), + "epochs_done": self.epochs_done + }, weights_path) + else: + torch.save({ + "model_state_dict": self.model.cpu().state_dict(), + "optimizer_state_dict": self.optimizer.state_dict(), + "epochs_done": self.epochs_done + }, weights_path) # return it back to device (necessary if it was on `cuda`) self.model.to(self.device) diff --git a/deeppavlov/core/trainers/fit_trainer.py b/deeppavlov/core/trainers/fit_trainer.py index 0378560564..758dea532f 100644 --- a/deeppavlov/core/trainers/fit_trainer.py +++ b/deeppavlov/core/trainers/fit_trainer.py @@ -17,11 +17,9 @@ import time from itertools import islice from logging import getLogger -from pathlib import Path from typing import Tuple, Dict, Union, Optional, Iterable, Any, Collection from deeppavlov.core.commands.infer import build_model -from deeppavlov.core.commands.utils import expand_path from deeppavlov.core.common.chainer import Chainer from deeppavlov.core.common.params import from_params from deeppavlov.core.common.registry import register @@ -31,6 +29,7 @@ from deeppavlov.core.trainers.utils import Metric, parse_metrics, prettify_metrics, NumpyArrayEncoder log = getLogger(__name__) +report_log = getLogger('train_report') @register('fit_trainer') @@ -50,8 +49,6 @@ class FitTrainer: evaluation_targets: data types on which to evaluate trained pipeline (default is ``('valid', 'test')``) show_examples: a flag used to print inputs, expected outputs and predicted outputs for the last batch in evaluation logs (default is ``False``) - tensorboard_log_dir: path to a directory where tensorboard logs can be stored, ignored if None - (default is ``None``) max_test_batches: maximum batches count for pipeline testing and evaluation, ignored if negative (default is ``-1``) **kwargs: additional parameters 
whose names will be logged but otherwise ignored @@ -61,7 +58,6 @@ def __init__(self, chainer_config: dict, *, batch_size: int = -1, metrics: Iterable[Union[str, dict]] = ('accuracy',), evaluation_targets: Iterable[str] = ('valid', 'test'), show_examples: bool = False, - tensorboard_log_dir: Optional[Union[str, Path]] = None, max_test_batches: int = -1, **kwargs) -> None: if kwargs: @@ -72,23 +68,7 @@ def __init__(self, chainer_config: dict, *, batch_size: int = -1, self.metrics = parse_metrics(metrics, self._chainer.in_y, self._chainer.out_params) self.evaluation_targets = tuple(evaluation_targets) self.show_examples = show_examples - self.max_test_batches = None if max_test_batches < 0 else max_test_batches - - self.tensorboard_log_dir: Optional[Path] = tensorboard_log_dir - if tensorboard_log_dir is not None: - try: - # noinspection PyPackageRequirements - # noinspection PyUnresolvedReferences - import tensorflow - except ImportError: - log.warning('TensorFlow could not be imported, so tensorboard log directory' - f'`{self.tensorboard_log_dir}` will be ignored') - self.tensorboard_log_dir = None - else: - self.tensorboard_log_dir = expand_path(tensorboard_log_dir) - self._tf = tensorflow - self._built = False self._saved = False self._loaded = False @@ -110,37 +90,15 @@ def fit_chainer(self, iterator: Union[DataFittingIterator, DataLearningIterator] targets = [targets] if self.batch_size > 0 and callable(getattr(component, 'partial_fit', None)): - writer = None - for i, (x, y) in enumerate(iterator.gen_batches(self.batch_size, shuffle=False)): preprocessed = self._chainer.compute(x, y, targets=targets) # noinspection PyUnresolvedReferences - result = component.partial_fit(*preprocessed) - - if result is not None and self.tensorboard_log_dir is not None: - if writer is None: - writer = self._tf.summary.FileWriter(str(self.tensorboard_log_dir / - f'partial_fit_{component_index}_log')) - for name, score in result.items(): - summary = self._tf.Summary() - summary.value.add(tag='partial_fit/' + name, simple_value=score) - writer.add_summary(summary, i) - writer.flush() + component.partial_fit(*preprocessed) else: preprocessed = self._chainer.compute(*iterator.get_instances(), targets=targets) if len(targets) == 1: preprocessed = [preprocessed] - result: Optional[Dict[str, Iterable[float]]] = component.fit(*preprocessed) - - if result is not None and self.tensorboard_log_dir is not None: - writer = self._tf.summary.FileWriter(str(self.tensorboard_log_dir / - f'fit_log_{component_index}')) - for name, scores in result.items(): - for i, score in enumerate(scores): - summary = self._tf.Summary() - summary.value.add(tag='fit/' + name, simple_value=score) - writer.add_summary(summary, i) - writer.flush() + component.fit(*preprocessed) component.save() @@ -240,15 +198,14 @@ def test(self, data: Iterable[Tuple[Collection[Any], Collection[Any]]], return report - def evaluate(self, iterator: DataLearningIterator, evaluation_targets: Optional[Iterable[str]] = None, *, - print_reports: bool = True) -> Dict[str, dict]: + def evaluate(self, iterator: DataLearningIterator, + evaluation_targets: Optional[Iterable[str]] = None) -> Dict[str, dict]: """ Run :meth:`test` on multiple data types using provided data iterator Args: iterator: :class:`~deeppavlov.core.data.data_learning_iterator.DataLearningIterator` used for evaluation evaluation_targets: iterable of data types to evaluate on - print_reports: a flag used to print evaluation reports as json lines Returns: a dictionary with data types as keys and 
evaluation reports as values @@ -263,7 +220,6 @@ def evaluate(self, iterator: DataLearningIterator, evaluation_targets: Optional[ data_gen = iterator.gen_batches(self.batch_size, data_type=data_type, shuffle=False) report = self.test(data_gen) res[data_type] = report - if print_reports: - print(json.dumps({data_type: report}, ensure_ascii=False, cls=NumpyArrayEncoder)) + report_log.info(json.dumps({data_type: report}, ensure_ascii=False, cls=NumpyArrayEncoder)) return res diff --git a/deeppavlov/core/trainers/nn_trainer.py b/deeppavlov/core/trainers/nn_trainer.py index 6f6fd8b4bf..f642d752e0 100644 --- a/deeppavlov/core/trainers/nn_trainer.py +++ b/deeppavlov/core/trainers/nn_trainer.py @@ -25,8 +25,9 @@ from deeppavlov.core.data.data_learning_iterator import DataLearningIterator from deeppavlov.core.trainers.fit_trainer import FitTrainer from deeppavlov.core.trainers.utils import parse_metrics, NumpyArrayEncoder - +from deeppavlov.core.common.log_events import get_tb_writer log = getLogger(__name__) +report_log = getLogger('train_report') @register('nn_trainer') @@ -105,8 +106,7 @@ def __init__(self, chainer_config: dict, *, log_every_n_batches: int = -1, log_every_n_epochs: int = -1, log_on_k_batches: int = 1, **kwargs) -> None: super().__init__(chainer_config, batch_size=batch_size, metrics=metrics, evaluation_targets=evaluation_targets, - show_examples=show_examples, tensorboard_log_dir=tensorboard_log_dir, - max_test_batches=max_test_batches, **kwargs) + show_examples=show_examples, max_test_batches=max_test_batches, **kwargs) if train_metrics is None: self.train_metrics = self.metrics else: @@ -145,10 +145,7 @@ def _improved(op): self.last_result = {} self.losses = [] self.start_time: Optional[float] = None - - if self.tensorboard_log_dir is not None: - self.tb_train_writer = self._tf.summary.FileWriter(str(self.tensorboard_log_dir / 'train_log')) - self.tb_valid_writer = self._tf.summary.FileWriter(str(self.tensorboard_log_dir / 'valid_log')) + self.tb_writer = get_tb_writer(tensorboard_log_dir) def save(self) -> None: if self._loaded: @@ -174,14 +171,13 @@ def _validate(self, iterator: DataLearningIterator, metrics = list(report['metrics'].items()) - if tensorboard_tag is not None and self.tensorboard_log_dir is not None: - summary = self._tf.Summary() - for name, score in metrics: - summary.value.add(tag=f'{tensorboard_tag}/{name}', simple_value=score) + if tensorboard_tag is not None and self.tb_writer is not None: if tensorboard_index is None: tensorboard_index = self.train_batches_seen - self.tb_valid_writer.add_summary(summary, tensorboard_index) - self.tb_valid_writer.flush() + for name, score in metrics: + self.tb_writer.write_valid(tag=f'{tensorboard_tag}/{name}', scalar_value=score, + global_step=tensorboard_index) + self.tb_writer.flush() m_name, score = metrics[0] @@ -217,7 +213,7 @@ def _validate(self, iterator: DataLearningIterator, self._send_event(event_name='after_validation', data=report) report = {'valid': report} - print(json.dumps(report, ensure_ascii=False, cls=NumpyArrayEncoder)) + report_log.info(json.dumps(report, ensure_ascii=False, cls=NumpyArrayEncoder)) self.validation_number += 1 def _log(self, iterator: DataLearningIterator, @@ -246,18 +242,16 @@ def _log(self, iterator: DataLearningIterator, self.losses.clear() metrics.append(('loss', report['loss'])) - if metrics and self.tensorboard_log_dir is not None: - summary = self._tf.Summary() - + if metrics and self.tb_writer is not None: for name, score in metrics: - 
summary.value.add(tag=f'{tensorboard_tag}/{name}', simple_value=score) - self.tb_train_writer.add_summary(summary, tensorboard_index) - self.tb_train_writer.flush() + self.tb_writer.write_train(tag=f'{tensorboard_tag}/{name}', scalar_value=score, + global_step=tensorboard_index) + self.tb_writer.flush() self._send_event(event_name='after_train_log', data=report) report = {'train': report} - print(json.dumps(report, ensure_ascii=False, cls=NumpyArrayEncoder)) + report_log.info(json.dumps(report, ensure_ascii=False, cls=NumpyArrayEncoder)) def _send_event(self, event_name: str, data: Optional[dict] = None) -> None: report = { diff --git a/deeppavlov/core/trainers/utils.py b/deeppavlov/core/trainers/utils.py index f15e940626..27e1d07647 100644 --- a/deeppavlov/core/trainers/utils.py +++ b/deeppavlov/core/trainers/utils.py @@ -13,10 +13,11 @@ # limitations under the License. from collections import OrderedDict, namedtuple +from functools import partial from json import JSONEncoder from typing import List, Tuple, Union, Iterable -import numpy +import numpy as np from deeppavlov.core.common.metrics_registry import get_metric_by_name @@ -29,16 +30,17 @@ def parse_metrics(metrics: Iterable[Union[str, dict]], in_y: List[str], out_vars if isinstance(metric, str): metric = {'name': metric, 'alias': metric} - metric_name = metric['name'] - alias = metric.get('alias', metric_name) + metric_name = metric.pop('name') + alias = metric.pop('alias', metric_name) f = get_metric_by_name(metric_name) - inputs = metric.get('inputs', in_y + out_vars) + inputs = metric.pop('inputs', in_y + out_vars) if isinstance(inputs, str): inputs = [inputs] - metrics_functions.append(Metric(metric_name, f, inputs, alias)) + metrics_functions.append(Metric(metric_name, partial(f, **metric), inputs, alias)) + return metrics_functions @@ -56,6 +58,10 @@ def prettify_metrics(metrics: List[Tuple[str, float]], precision: int = 4) -> Or class NumpyArrayEncoder(JSONEncoder): def default(self, obj): - if isinstance(obj, numpy.ndarray): + if isinstance(obj, np.ndarray): return obj.tolist() + elif isinstance(obj, np.integer): + return int(obj) + elif isinstance(obj, np.floating): + return float(obj) return JSONEncoder.default(self, obj) diff --git a/deeppavlov/dataset_iterators/dialog_iterator.py b/deeppavlov/dataset_iterators/dialog_iterator.py deleted file mode 100644 index a60991ca50..0000000000 --- a/deeppavlov/dataset_iterators/dialog_iterator.py +++ /dev/null @@ -1,123 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from overrides import overrides - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.data_learning_iterator import DataLearningIterator - - -@register('dialog_indexing_iterator') -class DialogDatasetIndexingIterator(DataLearningIterator): - """ - Iterates over dialog data, - generates batches where one sample is one dialog. - Assigns unique index value to each turn item of each dialog. 
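With the parse_metrics change above, whatever stays in a metric's config after name, alias and inputs are popped is bound to the metric function as keyword arguments through functools.partial. A self-contained sketch of that binding; the f1 function and its average parameter are placeholders, not entries of the DeepPavlov metrics registry:

from functools import partial

def f1(y_true, y_predicted, average="binary"):
    return f"f1[{average}] over {len(y_true)} samples"    # placeholder metric body

metric_cfg = {"name": "f1", "alias": "f1_macro", "inputs": ["y", "y_pred"], "average": "macro"}
name = metric_cfg.pop("name")
alias = metric_cfg.pop("alias", name)
inputs = metric_cfg.pop("inputs")
bound = partial(f1, **metric_cfg)                          # leftover keys become kwargs
print(alias, inputs, bound([1, 0, 1], [1, 1, 1]))          # f1_macro ['y', 'y_pred'] f1[macro] over 3 samples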
- - A subclass of :class:`~deeppavlov.core.data.data_learning_iterator.DataLearningIterator`. - - Attributes: - train: list of training dialogs (tuples ``(context, response)``) - valid: list of validation dialogs (tuples ``(context, response)``) - test: list of dialogs used for testing (tuples ``(context, response)``) - """ - - Xs_LABEL = 'x' - Ys_LABEL = 'y' - - @overrides - def preprocess(self, data, *args, **kwargs): - dialogs = [] - prev_resp_act = None - for x, y in data: - if x.get('episode_done'): - del x['episode_done'] - prev_resp_act = None - dialogs.append(([], [])) - x['prev_resp_act'] = prev_resp_act - prev_resp_act = y['act'] - - dialogue_label = str(len(dialogs)) - dialogue_x_item_label = str(len(dialogs[-1][0])) - dialogue_y_item_label = str(len(dialogs[-1][1])) - - x_item_full_label = f"{self.Xs_LABEL}_{dialogue_label}_{dialogue_x_item_label}" - y_item_full_label = f"{self.Ys_LABEL}_{dialogue_label}_{dialogue_y_item_label}" - - x['indexed_value'] = x_item_full_label - y['indexed_value'] = y_item_full_label - - x['dialogue_label'] = dialogue_label - y['dialogue_label'] = dialogue_label - - dialogs[-1][0].append(x) - dialogs[-1][1].append(y) - return dialogs - - -@register('dialog_iterator') -class DialogDatasetIterator(DataLearningIterator): - """ - Iterates over dialog data, - generates batches where one sample is one dialog. - - A subclass of :class:`~deeppavlov.core.data.data_learning_iterator.DataLearningIterator`. - - Attributes: - train: list of training dialogs (tuples ``(context, response)``) - valid: list of validation dialogs (tuples ``(context, response)``) - test: list of dialogs used for testing (tuples ``(context, response)``) - """ - - @overrides - def preprocess(self, data, *args, **kwargs): - dialogs = [] - prev_resp_act = None - for x, y in data: - if x.get('episode_done'): - del x['episode_done'] - prev_resp_act = None - dialogs.append(([], [])) - x['prev_resp_act'] = prev_resp_act - prev_resp_act = y['act'] - dialogs[-1][0].append(x) - dialogs[-1][1].append(y) - return dialogs - - -@register('dialog_db_result_iterator') -class DialogDBResultDatasetIterator(DataLearningIterator): - """ - Iterates over dialog data, - outputs list of all ``'db_result'`` fields (if present). - - The class helps to build a list of all ``'db_result'`` values present in a dataset. - - Inherits key methods and attributes from :class:`~deeppavlov.core.data.data_learning_iterator.DataLearningIterator`. - - Attributes: - train: list of tuples ``(db_result dictionary, '')`` from "train" data - valid: list of tuples ``(db_result dictionary, '')`` from "valid" data - test: list of tuples ``(db_result dictionary, '')`` from "test" data - """ - - @staticmethod - def _db_result(data): - x, y = data - if 'db_result' in x: - return x['db_result'] - - @overrides - def preprocess(self, data, *args, **kwargs): - return [(r, "") for r in filter(None, map(self._db_result, data))] diff --git a/deeppavlov/dataset_iterators/dstc2_intents_iterator.py b/deeppavlov/dataset_iterators/dstc2_intents_iterator.py deleted file mode 100644 index 3ad34bee4c..0000000000 --- a/deeppavlov/dataset_iterators/dstc2_intents_iterator.py +++ /dev/null @@ -1,85 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - - -from logging import getLogger -from typing import List - -from deeppavlov.core.common.registry import register -from deeppavlov.dataset_iterators.basic_classification_iterator import BasicClassificationDatasetIterator - -log = getLogger(__name__) - - -@register('dstc2_intents_iterator') -class Dstc2IntentsDatasetIterator(BasicClassificationDatasetIterator): - """ - Class gets data dictionary from DSTC2DatasetReader instance, construct intents from act and slots, \ - merge fields if necessary, split a field if necessary - - Args: - data: dictionary of data with fields "train", "valid" and "test" (or some of them) - fields_to_merge: list of fields (out of ``"train", "valid", "test"``) to merge - merged_field: name of field (out of ``"train", "valid", "test"``) to which save merged fields - field_to_split: name of field (out of ``"train", "valid", "test"``) to split - split_fields: list of fields (out of ``"train", "valid", "test"``) to which save splitted field - split_proportions: list of corresponding proportions for splitting - seed: random seed - shuffle: whether to shuffle examples in batches - *args: arguments - **kwargs: arguments - - Attributes: - data: dictionary of data with fields "train", "valid" and "test" (or some of them) - """ - - def __init__(self, data: dict, - fields_to_merge: List[str] = None, merged_field: str = None, - field_to_split: str = None, split_fields: List[str] = None, split_proportions: List[float] = None, - seed: int = None, shuffle: bool = True, - *args, **kwargs): - """ - Initialize dataset using data from DatasetReader, - merges and splits fields according to the given parameters - """ - super().__init__(data, fields_to_merge, merged_field, - field_to_split, split_fields, split_proportions, - seed=seed, shuffle=shuffle) - - new_data = dict() - new_data['train'] = [] - new_data['valid'] = [] - new_data['test'] = [] - - for field in ['train', 'valid', 'test']: - for turn in self.data[field]: - reply = turn[0] - curr_intents = [] - if reply['intents']: - for intent in reply['intents']: - for slot in intent['slots']: - if slot[0] == 'slot': - curr_intents.append(intent['act'] + '_' + slot[1]) - else: - curr_intents.append(intent['act'] + '_' + slot[0]) - if len(intent['slots']) == 0: - curr_intents.append(intent['act']) - else: - if reply['text']: - curr_intents.append('unknown') - else: - continue - new_data[field].append((reply['text'], curr_intents)) - - self.data = new_data diff --git a/deeppavlov/dataset_iterators/dstc2_ner_iterator.py b/deeppavlov/dataset_iterators/dstc2_ner_iterator.py deleted file mode 100644 index 7b12721497..0000000000 --- a/deeppavlov/dataset_iterators/dstc2_ner_iterator.py +++ /dev/null @@ -1,102 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
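The removed Dstc2IntentsDatasetIterator above composes intent labels from each act and its slots. A worked sketch of that naming rule on a made-up reply; the dictionary below is illustrative, not a real DSTC2 record:

reply = {"text": "i want cheap food in the north",
         "intents": [{"act": "inform", "slots": [["pricerange", "cheap"], ["area", "north"]]},
                     {"act": "request", "slots": [["slot", "phone"]]},
                     {"act": "ack", "slots": []}]}

labels = []
for intent in reply["intents"]:
    for slot in intent["slots"]:
        # A first element equal to 'slot' marks a requested slot name;
        # otherwise the first element is the slot being informed.
        labels.append(intent["act"] + "_" + (slot[1] if slot[0] == "slot" else slot[0]))
    if not intent["slots"]:
        labels.append(intent["act"])

print(labels)   # ['inform_pricerange', 'inform_area', 'request_phone', 'ack']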
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import json -import logging -from typing import List, Tuple, Dict, Any - -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.data_learning_iterator import DataLearningIterator - -logger = logging.getLogger(__name__) - - -@register('dstc2_ner_iterator') -class Dstc2NerDatasetIterator(DataLearningIterator): - """ - Iterates over data for DSTC2 NER task. Dataset takes a dict with fields 'train', 'test', 'valid'. A list of samples - (pairs x, y) is stored in each field. - - Args: - data: list of (x, y) pairs, samples from the dataset: x as well as y can be a tuple of different input features. - dataset_path: path to dataset - seed: value for random seed - shuffle: whether to shuffle the data - """ - - def __init__(self, - data: Dict[str, List[Tuple]], - slot_values_path: str, - seed: int = None, - shuffle: bool = False): - # TODO: include slot vals to dstc2.tar.gz - with expand_path(slot_values_path).open(encoding='utf8') as f: - self._slot_vals = json.load(f) - super().__init__(data, seed, shuffle) - - def preprocess(self, - data: List[Tuple[Any, Any]], - *args, **kwargs) -> List[Tuple[Any, Any]]: - processed_data = list() - processed_texts = dict() - for x, y in data: - text = x['text'] - if not text.strip(): - continue - intents = [] - if 'intents' in x: - intents = x['intents'] - elif 'slots' in x: - intents = [x] - # aggregate slots from different intents - slots = list() - for intent in intents: - current_slots = intent.get('slots', []) - for slot_type, slot_val in current_slots: - if not self._slot_vals or (slot_type in self._slot_vals): - slots.append((slot_type, slot_val,)) - # remove duplicate pairs (text, slots) - if (text in processed_texts) and (slots in processed_texts[text]): - continue - processed_texts[text] = processed_texts.get(text, []) + [slots] - - processed_data.append(self._add_bio_markup(text, slots)) - return processed_data - - def _add_bio_markup(self, - utterance: str, - slots: List[Tuple[str, str]]) -> Tuple[List, List]: - tokens = utterance.split() - n_toks = len(tokens) - tags = ['O' for _ in range(n_toks)] - for n in range(n_toks): - for slot_type, slot_val in slots: - for entity in self._slot_vals[slot_type].get(slot_val, - [slot_val]): - slot_tokens = entity.split() - slot_len = len(slot_tokens) - if n + slot_len <= n_toks and \ - self._is_equal_sequences(tokens[n: n + slot_len], - slot_tokens): - tags[n] = 'B-' + slot_type - for k in range(1, slot_len): - tags[n + k] = 'I-' + slot_type - break - return tokens, tags - - @staticmethod - def _is_equal_sequences(seq1, seq2): - equality_list = [tok1 == tok2 for tok1, tok2 in zip(seq1, seq2)] - return all(equality_list) diff --git a/deeppavlov/dataset_iterators/elmo_file_paths_iterator.py b/deeppavlov/dataset_iterators/elmo_file_paths_iterator.py deleted file mode 100644 index a887fe8b4c..0000000000 --- a/deeppavlov/dataset_iterators/elmo_file_paths_iterator.py +++ /dev/null @@ -1,154 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the 
"License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger -from pathlib import Path -from typing import Tuple, Iterator, Optional, Dict, List, Union - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.simple_vocab import SimpleVocabulary -from deeppavlov.core.data.utils import chunk_generator -from deeppavlov.dataset_iterators.file_paths_iterator import FilePathsIterator -from deeppavlov.models.preprocessors.str_utf8_encoder import StrUTF8Encoder - -log = getLogger(__name__) - - -@register('elmo_file_paths_iterator') -class ELMoFilePathsIterator(FilePathsIterator): - """Dataset iterator for tokenized datasets like 1 Billion Word Benchmark - It gets lists of file paths from the data dictionary and returns batches of lines from each file. - - Args: - data: dict with keys ``'train'``, ``'valid'`` and ``'test'`` and values - load_path: path to the vocabulary to be load from - seed: random seed for data shuffling - shuffle: whether to shuffle data during batching - unroll_steps: number of unrolling steps - n_gpus: number of gpu to use - max_word_length: max length of word - bos: tag of begin of sentence - eos: tag of end of sentence - - """ - - def __init__(self, - data: Dict[str, List[Union[str, Path]]], - load_path: Union[str, Path], - seed: Optional[int] = None, - shuffle: bool = True, - unroll_steps: Optional[int] = None, - n_gpus: Optional[int] = None, - max_word_length: Optional[int] = None, - bos: str = "", - eos: str = "", - *args, **kwargs) -> None: - self.unroll_steps = unroll_steps - self.n_gpus = n_gpus - self.bos = bos - self.eos = eos - self.str_utf8_encoder = StrUTF8Encoder( - max_word_length=max_word_length, - pad_special_char_use=True, - word_boundary_special_char_use=True, - sentence_boundary_special_char_use=False, - reversed_sentense_tokens=False, - bos=self.bos, - eos=self.eos, - save_path=load_path, - load_path=load_path, - ) - self.simple_vocab = SimpleVocabulary( - min_freq=2, - special_tokens=[self.eos, self.bos, ""], - unk_token="", - freq_drop_load=True, - save_path=load_path, - load_path=load_path, - ) - super().__init__(data, seed, shuffle, *args, **kwargs) - - def _line2ids(self, line): - line = [self.bos] + line.split() + [self.eos] - - char_ids = self.str_utf8_encoder(line) - reversed_char_ids = list(reversed(char_ids)) - char_ids = char_ids[:-1] - reversed_char_ids = reversed_char_ids[:-1] - - token_ids = self.simple_vocab(line) - reversed_token_ids = list(reversed(token_ids)) - token_ids = token_ids[1:] - reversed_token_ids = reversed_token_ids[1:] - - return char_ids, reversed_char_ids, token_ids, reversed_token_ids - - def _line_generator(self, shard_generator): - for shard in shard_generator: - line_generator = chunk_generator(shard, 1) - for line in line_generator: - line = line[0] - char_ids, reversed_char_ids, token_ids, reversed_token_ids = \ - self._line2ids(line) - yield char_ids, reversed_char_ids, token_ids, reversed_token_ids - - @staticmethod - def _batch_generator(line_generator, batch_size, unroll_steps): - batch = [[[] for i in range(4)] for i in 
range(batch_size)] - stream = [[[] for i in range(4)] for i in range(batch_size)] - - try: - while True: - for batch_item, stream_item in zip(batch, stream): - while len(stream_item[0]) < unroll_steps: - line = next(line_generator) - for sti, lni in zip(stream_item, line): - sti.extend(lni) - for sti, bchi in zip(stream_item, batch_item): - _b = sti[:unroll_steps] - _s = sti[unroll_steps:] - bchi.clear() - _b = _b - bchi.extend(_b) - - sti.clear() - sti.extend(_s) - char_ids, reversed_char_ids, token_ids, reversed_token_ids = \ - zip(*batch) - yield char_ids, reversed_char_ids, token_ids, reversed_token_ids - except StopIteration: - pass - - def gen_batches(self, batch_size: int, data_type: str = 'train', shuffle: Optional[bool] = None) \ - -> Iterator[Tuple[str, str]]: - if shuffle is None: - shuffle = self.shuffle - - tgt_data = self.data[data_type] - shard_generator = self._shard_generator(tgt_data, shuffle=shuffle) - line_generator = self._line_generator(shard_generator) - - if data_type == 'train': - unroll_steps = self.unroll_steps - n_gpus = self.n_gpus - else: - unroll_steps = 1 - batch_size = 256 - n_gpus = 1 - - batch_generator = self._batch_generator(line_generator, batch_size * n_gpus, unroll_steps) - - for char_ids, reversed_char_ids, token_ids, reversed_token_ids in batch_generator: - batch = [(char_ids, reversed_char_ids), (token_ids, reversed_token_ids)] - yield batch diff --git a/deeppavlov/dataset_iterators/file_paths_iterator.py b/deeppavlov/dataset_iterators/file_paths_iterator.py deleted file mode 100644 index 9d8769f8b2..0000000000 --- a/deeppavlov/dataset_iterators/file_paths_iterator.py +++ /dev/null @@ -1,74 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger -from pathlib import Path -from typing import Tuple, Iterator, Optional, Dict, List, Union - -import numpy as np - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.data_learning_iterator import DataLearningIterator -from deeppavlov.core.data.utils import chunk_generator - -log = getLogger(__name__) - - -@register('file_paths_iterator') -class FilePathsIterator(DataLearningIterator): - """Dataset iterator for datasets like 1 Billion Word Benchmark. - It gets lists of file paths from the data dictionary and returns lines from each file. 
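The _batch_generator above keeps one running token stream per batch slot and emits fixed windows of unroll_steps items, refilling each slot from the line generator as needed. A simplified sketch of that windowing, using plain integer ids instead of the character and token id tuples:

def window_batches(line_generator, batch_size, unroll_steps):
    streams = [[] for _ in range(batch_size)]
    try:
        while True:
            batch = []
            for stream in streams:
                while len(stream) < unroll_steps:      # refill this slot's buffer
                    stream.extend(next(line_generator))
                batch.append(stream[:unroll_steps])    # emit a fixed-size window
                del stream[:unroll_steps]
            yield batch
    except StopIteration:                              # source exhausted, drop the tail
        return

lines = iter([[1, 2, 3], [4, 5], [6, 7, 8, 9], [10, 11, 12]])
print(list(window_batches(lines, batch_size=2, unroll_steps=3)))
# [[[1, 2, 3], [4, 5, 6]], [[10, 11, 12], [7, 8, 9]]]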
- - Args: - data: dict with keys ``'train'``, ``'valid'`` and ``'test'`` and values - seed: random seed for data shuffling - shuffle: whether to shuffle data during batching - - """ - - def __init__(self, - data: Dict[str, List[Union[str, Path]]], - seed: Optional[int] = None, - shuffle: bool = True, - *args, **kwargs) -> None: - self.seed = seed - self.np_random = np.random.RandomState(seed) - super().__init__(data, seed, shuffle, *args, **kwargs) - - def _shard_generator(self, shards: List[Union[str, Path]], shuffle: bool = False) -> List[str]: - shards_to_choose = list(shards) - if shuffle: - self.np_random.shuffle(shards_to_choose) - for shard in shards_to_choose: - log.info(f'Loaded shard from {shard}') - with open(shard, encoding='utf-8') as f: - lines = f.readlines() - if shuffle: - self.np_random.shuffle(lines) - yield lines - - def gen_batches(self, batch_size: int, data_type: str = 'train', shuffle: Optional[bool] = None) \ - -> Iterator[Tuple[str, str]]: - if shuffle is None: - shuffle = self.shuffle - - tgt_data = self.data[data_type] - shard_generator = self._shard_generator(tgt_data, shuffle=shuffle) - - for shard in shard_generator: - if not (batch_size): - bs = len(shard) - lines_generator = chunk_generator(shard, bs) - for lines in lines_generator: - yield (lines, [None] * len(lines)) diff --git a/deeppavlov/dataset_iterators/kvret_dialog_iterator.py b/deeppavlov/dataset_iterators/kvret_dialog_iterator.py deleted file mode 100644 index c2147c63a2..0000000000 --- a/deeppavlov/dataset_iterators/kvret_dialog_iterator.py +++ /dev/null @@ -1,77 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from overrides import overrides - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.data_learning_iterator import DataLearningIterator - - -@register('kvret_dialog_iterator') -class KvretDialogDatasetIterator(DataLearningIterator): - """ - Inputs data from :class:`~deeppavlov.dataset_readers.dstc2_reader.DSTC2DatasetReader`, constructs dialog history for each turn, generates batches (one sample is a turn). - - Inherits key methods and attributes from :class:`~deeppavlov.core.data.data_learning_iterator.DataLearningIterator`. 
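gen_batches above reads one shard at a time and cuts its lines into fixed-size chunks via chunk_generator; a rough stand-in for that helper, written here only for illustration, could look like this:

from itertools import islice

def chunk_lines(lines, batch_size):
    it = iter(lines)
    while True:
        chunk = list(islice(it, batch_size))       # next batch_size lines, or fewer at the end
        if not chunk:
            return
        yield chunk

shard = [f"sentence {i}" for i in range(7)]
for batch in chunk_lines(shard, batch_size=3):
    print((batch, [None] * len(batch)))            # (lines, dummy targets), as yielded above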
- - Attributes: - train: list of "train" ``(context, response)`` tuples - valid: list of "valid" ``(context, response)`` tuples - test: list of "test" ``(context, response)`` tuples - """ - - # TODO: write custom batch_generator: order of utterances from one dialogue is presumed - @staticmethod - def _dialogs(data): - dialogs = [] - history = [] - task = None - for x, y in data: - if x.get('episode_done'): - # history = [] - history = "" - dialogs.append((([], [], [], [], []), ([], []))) - task = y['task'] - # history.append((x, y)) - history = history + ' ' + x['text'] + ' ' + y['text'] - # x['history'] = history[:-1] - x['history'] = history[:-len(x['text']) - len(y['text']) - 2] - dialogs[-1][0][0].append(x['text']) - dialogs[-1][0][1].append(x['dialog_id']) - dialogs[-1][0][2].append(x['history']) - dialogs[-1][0][3].append(x.get('kb_columns', None)) - dialogs[-1][0][4].append(x.get('kb_items', None)) - dialogs[-1][1][0].append(y['text']) - dialogs[-1][1][1].append(task) - return dialogs - - @overrides - def preprocess(self, data, *args, **kwargs): - utters = [] - history = [] - for x, y in data: - if x.get('episode_done'): - # x_hist, y_hist = [], [] - history = "" - # x_hist.append(x['text']) - # y_hist.append(y['text']) - history = history + ' ' + x['text'] + ' ' + y['text'] - # x['x_hist'] = x_hist[:-1] - # x['y_hist'] = y_hist[:-1] - x['history'] = history[:-len(x['text']) - len(y['text']) - 2] - x_tuple = (x['text'], x['dialog_id'], x['history'], - x['kb_columns'], x['kb_items']) - y_tuple = (y['text'], y['task']['intent']) - utters.append((x_tuple, y_tuple)) - return utters diff --git a/deeppavlov/dataset_iterators/morphotagger_iterator.py b/deeppavlov/dataset_iterators/morphotagger_iterator.py deleted file mode 100644 index 40af273f51..0000000000 --- a/deeppavlov/dataset_iterators/morphotagger_iterator.py +++ /dev/null @@ -1,120 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import random -from typing import Tuple, List, Dict, Any, Iterator - -import numpy as np - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.data_learning_iterator import DataLearningIterator -from deeppavlov.models.preprocessors.capitalization import process_word - - -def preprocess_data(data: List[Tuple[List[str], List[str]]], to_lower: bool = True, - append_case: str = "first") -> List[Tuple[List[Tuple[str]], List[str]]]: - """Processes all words in data using - :func:`~deeppavlov.dataset_iterators.morphotagger_iterator.process_word`. 
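In the removed KvretDialogDatasetIterator above, the running history string is first extended with the current turn and then sliced back, so each x['history'] covers only the preceding turns. A small worked example of that slice with made-up utterances:

history = ""
turns = [("where is the nearest gas station", "it is on main street"),
         ("how far is it", "about two miles away")]

for user, system in turns:
    history = history + " " + user + " " + system
    # Drop the current user and system utterances plus the two joining spaces,
    # leaving only the turns that came before.
    visible_history = history[:-len(user) - len(system) - 2]
    print(repr(visible_history))
# ''
# ' where is the nearest gas station it is on main street'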
- - Args: - data: a list of pairs (words, tags), each pair corresponds to a single sentence - to_lower: whether to lowercase - append_case: whether to add case mark - - Returns: - a list of preprocessed sentences - """ - new_data = [] - for words, tags in data: - new_words = [process_word(word, to_lower=to_lower, append_case=append_case) - for word in words] - # tags could also be processed in future - new_tags = tags - new_data.append((new_words, new_tags)) - return new_data - - -@register('morphotagger_dataset') -class MorphoTaggerDatasetIterator(DataLearningIterator): - """ - Iterates over data for Morphological Tagging. - A subclass of :class:`~deeppavlov.core.data.data_learning_iterator.DataLearningIterator`. - - Args: - seed: random seed for data shuffling - shuffle: whether to shuffle data during batching - validation_split: the fraction of validation data - (is used only if there is no `valid` subset in `data`) - min_train_fraction: minimal fraction of train data in train+dev dataset, - For fair comparison with UD Pipe it is set to 0.9 for UD experiments. - It is actually used only for Turkish data. - """ - - def __init__(self, data: Dict[str, List[Tuple[Any, Any]]], seed: int = None, - shuffle: bool = True, min_train_fraction: float = 0.0, - validation_split: float = 0.2) -> None: - self.validation_split = validation_split - self.min_train_fraction = min_train_fraction - super().__init__(data, seed, shuffle) - - def split(self, *args, **kwargs) -> None: - """ - Splits the `train` part to `train` and `valid`, if no `valid` part is specified. - Moves deficient data from `valid` to `train` if both parts are given, - but `train` subset is too small. - """ - if len(self.valid) == 0: - if self.shuffle: - self.random.shuffle(self.train) - L = int(len(self.train) * (1.0 - self.validation_split)) - self.train, self.valid = self.train[:L], self.train[L:] - elif self.min_train_fraction > 0.0: - train_length = len(self.train) - valid_length = len(self.valid) - gap = int(self.min_train_fraction * (train_length + valid_length)) - train_length - if gap > 0: - self.train.extend(self.valid[:gap]) - self.valid = self.valid[gap:] - return - - def gen_batches(self, batch_size: int, data_type: str = 'train', - shuffle: bool = None, return_indexes: bool = False) -> Iterator[tuple]: - """Generate batches of inputs and expected output to train neural networks - Args: - batch_size: number of samples in batch - data_type: can be either 'train', 'test', or 'valid' - shuffle: whether to shuffle dataset before batching - return_indexes: whether to return indexes of batch elements in initial dataset - Yields: - a tuple of a batch of inputs and a batch of expected outputs. - If `return_indexes` is True, also yields indexes of batch elements. 
- """ - if shuffle is None: - shuffle = self.shuffle - data = self.data[data_type] - lengths = [len(x[0]) for x in data] - indexes = np.argsort(lengths) - L = len(data) - if batch_size < 0: - batch_size = L - starts = list(range(0, L, batch_size)) - if shuffle: - self.random.shuffle(starts) - for start in starts: - indexes_to_yield = indexes[start:start + batch_size] - data_to_yield = tuple(list(x) for x in zip(*([data[i] for i in indexes_to_yield]))) - if return_indexes: - yield indexes_to_yield, data_to_yield - else: - yield data_to_yield diff --git a/deeppavlov/dataset_iterators/multitask_iterator.py b/deeppavlov/dataset_iterators/multitask_iterator.py index 0168bc3927..61e4939b5a 100644 --- a/deeppavlov/dataset_iterators/multitask_iterator.py +++ b/deeppavlov/dataset_iterators/multitask_iterator.py @@ -12,130 +12,15 @@ # See the License for the specific language governing permissions and # limitations under the License. -import copy import math from logging import getLogger -from typing import Iterator, Optional, Tuple, Union +from typing import Optional -from deeppavlov.core.common.registry import register -from deeppavlov.core.common.params import from_params from deeppavlov.core.data.data_learning_iterator import DataLearningIterator log = getLogger(__name__) -@register('multitask_iterator') -class MultiTaskIterator: - """ - Class merges data from several dataset iterators. When used for batch generation batches from - merged dataset iterators are united into one batch. If sizes of merged datasets are different - smaller datasets are repeated until their size becomes equal to the largest dataset. - - Args: - data: dictionary which keys are task names and values are dictionaries with fields - ``"train", "valid", "test"``. - tasks: dictionary which keys are task names and values are init params of dataset iterators. - - Attributes: - data: dictionary of data with fields "train", "valid" and "test" (or some of them) - """ - - def __init__(self, data: dict, tasks: dict): - self.task_iterators = {} - for task_name, task_iterator_params in tasks.items(): - task_iterator_params = copy.deepcopy(task_iterator_params) - task_iterator_params['class_name'] = task_iterator_params['iterator_class_name'] - del task_iterator_params['iterator_class_name'] - self.task_iterators[task_name] = from_params(task_iterator_params, data=data[task_name]) - - self.train = self._extract_data_type('train') - self.valid = self._extract_data_type('valid') - self.test = self._extract_data_type('test') - self.data = { - 'train': self.train, - 'valid': self.valid, - 'test': self.test, - 'all': self._unite_dataset_parts(self.train, self.valid, self.test) - } - - def _extract_data_type(self, data_type): - dataset_part = {} - for task, iterator in self.task_iterators.items(): - dataset_part[task] = getattr(iterator, data_type) - return dataset_part - - @staticmethod - def _unite_dataset_parts(*dataset_parts): - united = {} - for ds_part in dataset_parts: - for task, data in ds_part.items(): - if task not in united: - united[task] = data - else: - united[task] = united[task] + data - return united - - def gen_batches(self, batch_size: int, data_type: str = 'train', - shuffle: bool = None) -> Iterator[Tuple[tuple, tuple]]: - """Generate batches and expected output to train neural networks. Batches from task iterators - are united into one batch. Every element of the largest dataset is used once whereas smaller - datasets are repeated until their size is equal to the largest dataset. 
- - Args: - batch_size: number of samples in batch - data_type: can be either 'train', 'test', or 'valid' - shuffle: whether to shuffle dataset before batching - - Yields: - a tuple of a batch of inputs and a batch of expected outputs. Inputs and outputs are tuples. - Element of inputs or outputs is a tuple which elements are x values of merged tasks in the order - tasks are present in `tasks` argument of `__init__` method. - """ - max_task_data_len = max([len(iter_.data[data_type]) for iter_ in self.task_iterators.values()]) - - size_of_last_batch = max_task_data_len % batch_size - if size_of_last_batch == 0: - size_of_last_batch = batch_size - - n_batches = math.ceil(max_task_data_len / batch_size) - for task_batches in zip( - *[RepeatBatchGenerator(iter_, batch_size, data_type, shuffle, n_batches, size_of_last_batch) for - iter_ in self.task_iterators.values()] - ): - x_instances, y_instances = [], [] - for task_batch in task_batches: - x_instances.append(task_batch[0]) - y_instances.append(task_batch[1]) - b = (tuple(zip(*x_instances)), tuple(zip(*y_instances))) - yield b - - def get_instances(self, data_type: str = 'train'): - """Returns a tuple of inputs and outputs from all datasets. Lengths of inputs and outputs are equal to - the size of the largest dataset. Smaller datasets are repeated until their sizes are equal to the - size of the largest dataset. - - Args: - data_type: can be either 'train', 'test', or 'valid' - - Returns: - a tuple of all inputs for a data type and all expected outputs for a data type - """ - max_task_data_len = max( - [len(iter_.get_instances(data_type)[0]) for iter_ in self.task_iterators.values()]) - x_instances = [] - y_instances = [] - for task_name, iter_ in self.task_iterators.items(): - x, y = iter_.get_instances(data_type) - n_repeats = math.ceil(max_task_data_len / len(x)) - x *= n_repeats - y *= n_repeats - x_instances.append(x[:max_task_data_len]) - y_instances.append(y[:max_task_data_len]) - - instances = (tuple(zip(*x_instances)), tuple(zip(*y_instances))) - return instances - - class RepeatBatchGenerator: """Repeating dataset. If there is not enough elements in the dataset to form another batch, elements for the batch are drawn in the beginning of the dataset. Optionally dataset is reshuffled before a repeat. @@ -150,7 +35,7 @@ class RepeatBatchGenerator: """ def __init__( self, - dataset_iterator: Union[MultiTaskIterator, DataLearningIterator], + dataset_iterator: DataLearningIterator, batch_size: int, data_type: str, shuffle: bool, diff --git a/deeppavlov/dataset_iterators/ner_few_shot_iterator.py b/deeppavlov/dataset_iterators/ner_few_shot_iterator.py deleted file mode 100644 index 52e1fa38c1..0000000000 --- a/deeppavlov/dataset_iterators/ner_few_shot_iterator.py +++ /dev/null @@ -1,144 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
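# --- Illustrative sketch, not part of the patch above ---
# The removed MultiTaskIterator united per-task batches, repeating smaller datasets
# until they matched the largest one. The helper and toy data below are hypothetical
# and only mimic that behaviour with plain Python.
import math

def merge_task_batches(datasets, batch_size):
    """datasets: dict mapping a task name to a list of (x, y) samples."""
    max_len = max(len(samples) for samples in datasets.values())
    repeated = {}
    for name, samples in datasets.items():
        # repeat smaller datasets so every task contributes to every batch
        n_repeats = math.ceil(max_len / len(samples))
        repeated[name] = (samples * n_repeats)[:max_len]
    for start in range(0, max_len, batch_size):
        chunk = {name: samples[start:start + batch_size] for name, samples in repeated.items()}
        # batch elements are tuples of per-task values, in the order tasks are listed
        xs = tuple(zip(*[tuple(x for x, _ in chunk[name]) for name in datasets]))
        ys = tuple(zip(*[tuple(y for _, y in chunk[name]) for name in datasets]))
        yield xs, ys

# toy usage: the 2-sample "cls" task is repeated to match the 5-sample "ner" task
batches = list(merge_task_batches(
    {"ner": [("tok", "tag")] * 5, "cls": [("txt", "lbl")] * 2}, batch_size=2))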
- -import re -from typing import List, Dict, Tuple, Any, Iterator, Optional - -import numpy as np - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.data_learning_iterator import DataLearningIterator - - -@register('ner_few_shot_iterator') -class NERFewShotIterator(DataLearningIterator): - """Dataset iterator for simulating few-shot Named Entity Recognition setting. - - Args: - data: list of (x, y) pairs for every data type in ``'train'``, ``'valid'`` and ``'test'`` - seed: random seed for data shuffling - shuffle: whether to shuffle data during batching - target_tag: the tag of interest. For this tag the few-shot setting will be simulated - filter_bi: whether to filter BIO markup or not - n_train_samples: number of training samples in the few shot setting. The validation and the test sets will be - the same - remove_not_targets: whether to replace all non target tags with `O` tag or not. - """ - - def __init__(self, - data: Dict[str, List[Tuple[Any, Any]]], - seed: int = None, - shuffle: bool = True, - target_tag: str = None, - filter_bi: bool = True, - n_train_samples: int = 20, - remove_not_targets: bool = True, - *args, **kwargs) -> None: - super(NERFewShotIterator, self).__init__(data=data, seed=seed, shuffle=shuffle) - self.target_tag = target_tag - self.filter_bi = filter_bi - self.n_train_samples = n_train_samples - self.remove_not_targets = remove_not_targets - if self.target_tag is None: - raise RuntimeError('You must provide a target tag to NERFewShotIterator!') - - self.n_samples = len(self.train) - - if self.remove_not_targets: - self._remove_not_target_tags() - - if self.filter_bi: - for key in self.data: - for n, (x, y) in enumerate(self.data[key]): - self.data[key][n] = [x, [re.sub('(B-|I-)', '', tag) for tag in y]] - - self.tag_map = np.zeros(self.n_samples, dtype=bool) - for n, (toks, tags) in enumerate(self.data['train']): - if self.filter_bi: - self.tag_map[n] = any(self.target_tag == tag for tag in tags if len(tag) > 2) - else: - self.tag_map[n] = any(self.target_tag == tag[2:] for tag in tags if len(tag) > 2) - - self.marked_nums = None - self.unmarked_nums = None - self._sample_marked() - - def _sample_marked(self): - np.zeros(len(self.data['train']), dtype=bool) - n_marked = 0 - self.marked_mask = np.zeros(self.n_samples, dtype=bool) - while n_marked < self.n_train_samples: - is_picked = True - while is_picked: - n = np.random.randint(self.n_samples) - if not self.marked_mask[n]: - is_picked = False - self.marked_mask[n] = True - if self.tag_map[n]: - n_marked += 1 - - self.marked_nums = np.arange(self.n_samples)[self.marked_mask] - self.unmarked_nums = np.arange(self.n_samples)[~self.marked_mask] - - def _remove_not_target_tags(self): - if self.remove_not_targets: - for key in self.data: - for n, (x, y) in enumerate(self.data[key]): - tags = [] - for tag in y: - if tag.endswith('-' + self.target_tag): - tags.append(tag) - else: - tags.append('O') - self.data[key][n] = [x, tags] - - def get_instances(self, data_type: str = 'train') -> Tuple[List[List[str]], List[List[str]]]: - """Get all data for a selected data type - - Args: - data_type (str): can be either ``'train'``, ``'test'``, ``'valid'`` or ``'all'`` - - Returns: - a tuple of all inputs for a data type and all expected outputs for a data type - """ - - if data_type == 'train': - samples = [self.data[data_type][i] for i in self.marked_nums] - else: - samples = self.data[data_type][:] - - x, y = list(zip(*samples)) - - return x, y - - def gen_batches(self, batch_size: int, - 
data_type: str = 'train', - shuffle: Optional[bool] = None) -> Iterator[Tuple[List[List[str]], List[List[str]]]]: - x, y = self.get_instances(data_type) - data_len = len(x) - - if data_len == 0: - return - - order = list(range(data_len)) - if shuffle is None and self.shuffle: - self.random.shuffle(order) - elif shuffle: - self.random.shuffle(order) - - if batch_size < 0: - batch_size = data_len - - for i in range((data_len - 1) // batch_size + 1): - yield tuple(zip(*[(x[o], y[o]) for o in order[i * batch_size:(i + 1) * batch_size]])) diff --git a/deeppavlov/dataset_iterators/snips_intents_iterator.py b/deeppavlov/dataset_iterators/snips_intents_iterator.py deleted file mode 100644 index 2a881634ac..0000000000 --- a/deeppavlov/dataset_iterators/snips_intents_iterator.py +++ /dev/null @@ -1,30 +0,0 @@ -# Copyright 2019 Alexey Romanov -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from overrides import overrides - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.data_learning_iterator import DataLearningIterator - - -@register('snips_intents_iterator') -class SnipsIntentIterator(DataLearningIterator): - @overrides - def preprocess(self, data, *args, **kwargs): - result = [] - for query in data: - text = ''.join(part['text'] for part in query['data']) - intent = query['intent'] - result.append((text, intent)) - return result diff --git a/deeppavlov/dataset_iterators/snips_ner_iterator.py b/deeppavlov/dataset_iterators/snips_ner_iterator.py deleted file mode 100644 index 2186ebbaa9..0000000000 --- a/deeppavlov/dataset_iterators/snips_ner_iterator.py +++ /dev/null @@ -1,42 +0,0 @@ -# Copyright 2019 Alexey Romanov -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
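# --- Illustrative sketch, not part of the patch above ---
# The removed snips_intents_iterator flattened Snips-style query records into
# (text, intent) pairs; the record below is made up for illustration.
sample_query = {
    "intent": "GetWeather",
    "data": [
        {"text": "what is the weather in "},
        {"text": "Berlin", "entity": "city"},
        {"text": " tomorrow"},
    ],
}

def snips_query_to_sample(query):
    # join all text parts and pair the result with the intent label,
    # mirroring SnipsIntentIterator.preprocess
    text = "".join(part["text"] for part in query["data"])
    return text, query["intent"]

assert snips_query_to_sample(sample_query) == ("what is the weather in Berlin tomorrow", "GetWeather")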
- -import nltk -from overrides import overrides - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.data_learning_iterator import DataLearningIterator - - -@register('snips_ner_iterator') -class SnipsNerIterator(DataLearningIterator): - @overrides - def preprocess(self, data, *args, **kwargs): - result = [] - for query in data: - query = query['data'] - words = [] - slots = [] - for part in query: - part_words = nltk.tokenize.wordpunct_tokenize(part['text']) - entity = part.get('entity', None) - if entity: - slots.append('B-' + entity) - slots += ['I-' + entity] * (len(part_words) - 1) - else: - slots += ['O'] * len(part_words) - words += part_words - - result.append((words, slots)) - return result diff --git a/deeppavlov/dataset_iterators/squad_iterator.py b/deeppavlov/dataset_iterators/squad_iterator.py index c7300799f8..8328ba0243 100644 --- a/deeppavlov/dataset_iterators/squad_iterator.py +++ b/deeppavlov/dataset_iterators/squad_iterator.py @@ -25,7 +25,8 @@ @register('squad_iterator') class SquadIterator(DataLearningIterator): """SquadIterator allows to iterate over examples in SQuAD-like datasets. - SquadIterator is used to train :class:`~deeppavlov.models.squad.squad.SquadModel`. + SquadIterator is used to train + :class:`~deeppavlov.models.torch_bert.torch_transformers_squad:TorchTransformersSquad`. It extracts ``context``, ``question``, ``answer_text`` and ``answer_start`` position from dataset. Example from a dataset is a tuple of ``(context, question)`` and ``(answer_text, answer_start)`` @@ -58,9 +59,13 @@ def preprocess(self, data: Dict[str, Any], *args, **kwargs) -> \ q = qa['question'] ans_text = [] ans_start = [] - for answer in qa['answers']: - ans_text.append(answer['text']) - ans_start.append(answer['answer_start']) + if qa['answers']: + for answer in qa['answers']: + ans_text.append(answer['text']) + ans_start.append(answer['answer_start']) + else: + ans_text = [''] + ans_start = [-1] cqas.append(((context, q), (ans_text, ans_start))) return cqas diff --git a/deeppavlov/dataset_readers/dstc2_reader.py b/deeppavlov/dataset_readers/dstc2_reader.py deleted file mode 100644 index 55127f297a..0000000000 --- a/deeppavlov/dataset_readers/dstc2_reader.py +++ /dev/null @@ -1,362 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, softwaredata -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - - -import copy -import json -from logging import getLogger -from pathlib import Path -from typing import Dict, List - -from overrides import overrides - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.dataset_reader import DatasetReader -from deeppavlov.core.data.utils import download_decompress, mark_done - -log = getLogger(__name__) - - -@register('dstc2_reader') -class DSTC2DatasetReader(DatasetReader): - """ - Contains labelled dialogs from Dialog State Tracking Challenge 2 - (http://camdial.org/~mh521/dstc/). - - There've been made the following modifications to the original dataset: - - 1. 
added api calls to restaurant database - - - example: ``{"text": "api_call area=\"south\" food=\"dontcare\" - pricerange=\"cheap\"", "dialog_acts": ["api_call"]}``. - - 2. new actions - - - bot dialog actions were concatenated into one action - (example: ``{"dialog_acts": ["ask", "request"]}`` -> - ``{"dialog_acts": ["ask_request"]}``) - - - if a slot key was associated with the dialog action, the new act - was a concatenation of an act and a slot key (example: - ``{"dialog_acts": ["ask"], "slot_vals": ["area"]}`` -> - ``{"dialog_acts": ["ask_area"]}``) - - 3. new train/dev/test split - - - original dstc2 consisted of three different MDP policies, the original - train and dev datasets (consisting of two policies) were merged and - randomly split into train/dev/test - - 4. minor fixes - - - fixed several dialogs, where actions were wrongly annotated - - uppercased first letter of bot responses - - unified punctuation for bot responses - """ - - url = 'http://files.deeppavlov.ai/datasets/dstc2_v2.tar.gz' - - @staticmethod - def _data_fname(datatype): - assert datatype in ('trn', 'val', 'tst'), "wrong datatype name" - return f"dstc2-{datatype}.jsonlist" - - @classmethod - @overrides - def read(self, data_path: str, dialogs: bool = False) -> Dict[str, List]: - """ - Downloads ``'dstc2_v2.tar.gz'`` archive from ipavlov internal server, - decompresses and saves files to ``data_path``. - - Parameters: - data_path: path to save DSTC2 dataset - dialogs: flag which indicates whether to output list of turns or - list of dialogs - - Returns: - dictionary that contains ``'train'`` field with dialogs from - ``'dstc2-trn.jsonlist'``, ``'valid'`` field with dialogs from - ``'dstc2-val.jsonlist'`` and ``'test'`` field with dialogs from - ``'dstc2-tst.jsonlist'``. Each field is a list of tuples ``(x_i, y_i)``. 
- """ - required_files = (self._data_fname(dt) for dt in ('trn', 'val', 'tst')) - if not all(Path(data_path, f).exists() for f in required_files): - log.info(f"[downloading data from {self.url} to {data_path}]") - download_decompress(self.url, data_path) - mark_done(data_path) - - data = { - 'train': self._read_from_file( - Path(data_path, self._data_fname('trn')), dialogs), - 'valid': self._read_from_file( - Path(data_path, self._data_fname('val')), dialogs), - 'test': self._read_from_file( - Path(data_path, self._data_fname('tst')), dialogs) - } - return data - - @classmethod - def _read_from_file(cls, file_path, dialogs=False): - """Returns data from single file""" - log.info(f"[loading dialogs from {file_path}]") - - utterances, responses, dialog_indices = \ - cls._get_turns(cls._iter_file(file_path), with_indices=True) - - data = list(map(cls._format_turn, zip(utterances, responses))) - - if dialogs: - return [data[idx['start']:idx['end']] for idx in dialog_indices] - return data - - @staticmethod - def _format_turn(turn): - turn_x, turn_y = turn - x = {'text': turn_x['text'], - 'intents': turn_x['dialog_acts']} - if turn_x.get('db_result') is not None: - x['db_result'] = turn_x['db_result'] - if turn_x.get('episode_done'): - x['episode_done'] = True - y = {'text': turn_y['text'], - 'act': turn_y['dialog_acts'][0]['act']} - return (x, y) - - @staticmethod - def _iter_file(file_path): - for ln in open(file_path, 'rt', encoding='utf8'): - if ln.strip(): - yield json.loads(ln) - else: - yield {} - - @staticmethod - def _get_turns(data, with_indices=False): - utterances = [] - responses = [] - dialog_indices = [] - n = 0 - num_dialog_utter, num_dialog_resp = 0, 0 - episode_done = True - for turn in data: - if not turn: - if num_dialog_utter != num_dialog_resp: - raise RuntimeError("Datafile in the wrong format.") - episode_done = True - n += num_dialog_utter - dialog_indices.append({ - 'start': n - num_dialog_utter, - 'end': n, - }) - num_dialog_utter, num_dialog_resp = 0, 0 - else: - speaker = turn.pop('speaker') - if speaker == 1: - if episode_done: - turn['episode_done'] = True - utterances.append(turn) - num_dialog_utter += 1 - elif speaker == 2: - if num_dialog_utter - 1 == num_dialog_resp: - responses.append(turn) - elif num_dialog_utter - 1 < num_dialog_resp: - if episode_done: - responses.append(turn) - utterances.append({ - "text": "", - "dialog_acts": [], - "episode_done": True} - ) - else: - new_turn = copy.deepcopy(utterances[-1]) - if 'db_result' not in responses[-1]: - raise RuntimeError(f"Every api_call action" - f" should have db_result," - f" turn = {responses[-1]}") - new_turn['db_result'] = responses[-1].pop('db_result') - utterances.append(new_turn) - responses.append(turn) - num_dialog_utter += 1 - else: - raise RuntimeError("there cannot be two successive turns of" - " speaker 1") - num_dialog_resp += 1 - else: - raise RuntimeError("Only speakers 1 and 2 are supported") - episode_done = False - - if with_indices: - return utterances, responses, dialog_indices - return utterances, responses - - -@register('simple_dstc2_reader') -class SimpleDSTC2DatasetReader(DatasetReader): - """ - Contains labelled dialogs from Dialog State Tracking Challenge 2 - (http://camdial.org/~mh521/dstc/). - - There've been made the following modifications to the original dataset: - - 1. added api calls to restaurant database - - - example: ``{"text": "api_call area=\"south\" food=\"dontcare\" - pricerange=\"cheap\"", "dialog_acts": ["api_call"]}``. - - 2. 
new actions - - - bot dialog actions were concatenated into one action - (example: ``{"dialog_acts": ["ask", "request"]}`` -> - ``{"dialog_acts": ["ask_request"]}``) - - - if a slot key was associated with the dialog action, the new act - was a concatenation of an act and a slot key (example: - ``{"dialog_acts": ["ask"], "slot_vals": ["area"]}`` -> - ``{"dialog_acts": ["ask_area"]}``) - - 3. new train/dev/test split - - - original dstc2 consisted of three different MDP policies, the original - train and dev datasets (consisting of two policies) were merged and - randomly split into train/dev/test - - 4. minor fixes - - - fixed several dialogs, where actions were wrongly annotated - - uppercased first letter of bot responses - - unified punctuation for bot responses - """ - - url = 'http://files.deeppavlov.ai/datasets/simple_dstc2.tar.gz' - - @staticmethod - def _data_fname(datatype): - assert datatype in ('trn', 'val', 'tst'), "wrong datatype name" - return f"simple-dstc2-{datatype}.json" - - @classmethod - @overrides - def read(self, data_path: str, dialogs: bool = False, encoding = 'utf-8') -> Dict[str, List]: - """ - Downloads ``'simple_dstc2.tar.gz'`` archive from internet, - decompresses and saves files to ``data_path``. - - Parameters: - data_path: path to save DSTC2 dataset - dialogs: flag which indicates whether to output list of turns or - list of dialogs - - Returns: - dictionary that contains ``'train'`` field with dialogs from - ``'simple-dstc2-trn.json'``, ``'valid'`` field with dialogs - from ``'simple-dstc2-val.json'`` and ``'test'`` field with - dialogs from ``'simple-dstc2-tst.json'``. - Each field is a list of tuples ``(user turn, system turn)``. - """ - required_files = (self._data_fname(dt) for dt in ('trn', 'val', 'tst')) - if not all(Path(data_path, f).exists() for f in required_files): - log.info(f"{[Path(data_path, f) for f in required_files]}]") - log.info(f"[downloading data from {self.url} to {data_path}]") - download_decompress(self.url, data_path) - mark_done(data_path) - - data = { - 'train': self._read_from_file( - Path(data_path, self._data_fname('trn')), dialogs, encoding), - 'valid': self._read_from_file( - Path(data_path, self._data_fname('val')), dialogs, encoding), - 'test': self._read_from_file( - Path(data_path, self._data_fname('tst')), dialogs, encoding) - } - log.info(f"There are {len(data['train'])} samples in train split.") - log.info(f"There are {len(data['valid'])} samples in valid split.") - log.info(f"There are {len(data['test'])} samples in test split.") - return data - - @classmethod - def _read_from_file(cls, file_path: str, dialogs: bool = False, encoding = 'utf-8'): - """Returns data from single file""" - log.info(f"[loading dialogs from {file_path}]") - - utterances, responses, dialog_indices = \ - cls._get_turns(json.load(open(file_path, mode = 'rt', encoding = encoding)), with_indices=True) - - data = list(map(cls._format_turn, zip(utterances, responses))) - - if dialogs: - return [data[idx['start']:idx['end']] for idx in dialog_indices] - return data - - @staticmethod - def _format_turn(turn): - turn_x, turn_y = turn - x = {'text': turn_x['text']} - y = {'text': turn_y['text'], - 'act': turn_y['act']} - if 'act' in turn_x: - x['intents'] = turn_x['act'] - if 'episode_done' in turn_x: - x['episode_done'] = turn_x['episode_done'] - if turn_x.get('db_result') is not None: - x['db_result'] = turn_x['db_result'] - if turn_x.get('slots'): - x['slots'] = turn_x['slots'] - if turn_y.get('slots'): - y['slots'] = turn_y['slots'] - return (x, 
y) - - @staticmethod - def _get_turns(data, with_indices=False): - n = 0 - utterances, responses, dialog_indices = [], [], [] - for dialog in data: - cur_n_utter, cur_n_resp = 0, 0 - for i, turn in enumerate(dialog): - speaker = turn.pop('speaker') - if speaker == 1: - if i == 0: - turn['episode_done'] = True - utterances.append(turn) - cur_n_utter += 1 - elif speaker == 2: - responses.append(turn) - cur_n_resp += 1 - if cur_n_utter not in range(cur_n_resp - 2, cur_n_resp + 1): - raise RuntimeError("Datafile has wrong format.") - if cur_n_utter != cur_n_resp: - if i == 0: - new_utter = { - "text": "", - "episode_done": True - } - else: - new_utter = copy.deepcopy(utterances[-1]) - if 'db_result' not in responses[-2]: - raise RuntimeError("Every api_call action" - " should have db_result") - db_result = responses[-2].pop('db_result') - new_utter['db_result'] = db_result - utterances.append(new_utter) - cur_n_utter += 1 - if cur_n_utter != cur_n_resp: - raise RuntimeError("Datafile has wrong format.") - n += cur_n_utter - dialog_indices.append({ - 'start': n - cur_n_utter, - 'end': n, - }) - - if with_indices: - return utterances, responses, dialog_indices - return utterances, responses diff --git a/deeppavlov/dataset_readers/file_paths_reader.py b/deeppavlov/dataset_readers/file_paths_reader.py deleted file mode 100644 index 4e6cbae299..0000000000 --- a/deeppavlov/dataset_readers/file_paths_reader.py +++ /dev/null @@ -1,66 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, softwaredata -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger -from pathlib import Path -from typing import Dict, Optional, Union - -from overrides import overrides - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.dataset_reader import DatasetReader - -log = getLogger(__name__) - - -@register('file_paths_reader') -class FilePathsReader(DatasetReader): - """Find all file paths by a data path glob""" - - @overrides - def read(self, data_path: Union[str, Path], train: Optional[str] = None, - valid: Optional[str] = None, test: Optional[str] = None, - *args, **kwargs) -> Dict: - """ - Find all file paths by a data path glob - - Args: - data_path: directory with data - train: data path glob relative to data_path - valid: data path glob relative to data_path - test: data path glob relative to data_path - - Returns: - A dictionary containing training, validation and test parts of the dataset obtainable via ``train``, - ``valid`` and ``test`` keys. - """ - - dataset = dict() - dataset["train"] = self._get_files(data_path, train) - dataset["valid"] = self._get_files(data_path, valid) - dataset["test"] = self._get_files(data_path, test) - return dataset - - def _get_files(self, data_path, tgt): - if tgt is not None: - paths = Path(data_path).resolve().glob(tgt) - files = [file for file in paths if Path(file).is_file()] - paths_info = Path(data_path, tgt).absolute().as_posix() - if not files: - raise Exception(f"Not find files. 
Data path '{paths_info}' does not exist or does not hold files!") - else: - log.info(f"Found {len(files)} files located '{paths_info}'.") - else: - files = [] - return files diff --git a/deeppavlov/dataset_readers/intent_catcher_reader.py b/deeppavlov/dataset_readers/intent_catcher_reader.py deleted file mode 100644 index d916273db9..0000000000 --- a/deeppavlov/dataset_readers/intent_catcher_reader.py +++ /dev/null @@ -1,55 +0,0 @@ -# Copyright 2018 Neural Networks and Deep Learning lab, MIPT -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# http://www.apache.org/licenses/LICENSE-2.0 -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from json import load -from logging import getLogger -from pathlib import Path -from typing import Dict, List, Tuple - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.dataset_reader import DatasetReader - -log = getLogger(__file__) - - -@register('intent_catcher_reader') -class IntentCatcherReader(DatasetReader): - """Reader for Intent Catcher dataset in json format""" - - def read(self, data_path: str, *args, **kwargs) -> Dict[str, List[Tuple[str, str]]]: - data_types = ["train", "valid", "test"] - - train_file = kwargs.get('train', 'train.json') - - if not Path(data_path, train_file).exists(): - raise Exception( - "data path {} does not exist or is empty!".format( - data_path)) - - data = {"train": [], - "valid": [], - "test": []} - - for data_type in data_types: - file_name = kwargs.get(data_type, '{}.{}'.format(data_type, "json")) - if file_name is None: - continue - - file = Path(data_path).joinpath(file_name) - if file.exists(): - with open(file) as fp: - file = load(fp) - for label in file: - data[data_type].extend([(phrase, label) for phrase in file[label]]) - else: - log.warning("Cannot find {} file".format(file)) - - return data diff --git a/deeppavlov/dataset_readers/kbqa_reader.py b/deeppavlov/dataset_readers/kbqa_reader.py deleted file mode 100644 index 0dea282fe5..0000000000 --- a/deeppavlov/dataset_readers/kbqa_reader.py +++ /dev/null @@ -1,48 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
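# --- Illustrative sketch, not part of the patch above ---
# The removed intent_catcher_reader expected a json file mapping each intent label to a
# list of phrases and flattened it into (phrase, label) pairs; the content below is made up.
import json

train_json = json.loads('{"greet": ["hello", "hi there"], "bye": ["goodbye"]}')

samples = [(phrase, label) for label, phrases in train_json.items() for phrase in phrases]
# samples == [("hello", "greet"), ("hi there", "greet"), ("goodbye", "bye")]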
- -from pathlib import Path - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.dataset_reader import DatasetReader -from deeppavlov.core.data.utils import download_decompress - - -@register('kbqa_reader') -class KBQAReader(DatasetReader): - """Class to read test set of questions and answers for knowledge base question answering""" - - def read(self, data_path: str): - data_path = Path(data_path) - files = list(data_path.glob('*.txt')) - test_set_filename = "test_set_with_answers.txt" - if test_set_filename not in {file_path.name for file_path in files}: - url = 'http://files.deeppavlov.ai/kbqa/test_set_with_answers.zip' - data_path.mkdir(exist_ok=True, parents=True) - download_decompress(url, data_path) - dataset = {} - - dataset["test"] = self.parse_ner_file(data_path / test_set_filename) - dataset["train"] = [] - dataset["valid"] = [] - return dataset - - def parse_ner_file(self, file_name: Path): - samples = [] - with file_name.open(encoding='utf8') as f: - for line in f: - line_split = line.strip('\n').split('\t') - samples.append((line_split[0], tuple(line_split[1:]))) - - return samples diff --git a/deeppavlov/dataset_readers/kvret_reader.py b/deeppavlov/dataset_readers/kvret_reader.py deleted file mode 100644 index c218dc4298..0000000000 --- a/deeppavlov/dataset_readers/kvret_reader.py +++ /dev/null @@ -1,183 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, softwaredata -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import json -from logging import getLogger -from pathlib import Path -from typing import Dict, List - -from overrides import overrides - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.dataset_reader import DatasetReader -from deeppavlov.core.data.utils import download_decompress, mark_done - -log = getLogger(__name__) - - -@register('kvret_reader') -class KvretDatasetReader(DatasetReader): - """ - A New Multi-Turn, Multi-Domain, Task-Oriented Dialogue Dataset. - - Stanford NLP released a corpus of 3,031 multi-turn dialogues in three distinct domains appropriate for an in-car assistant: calendar scheduling, weather information retrieval, and point-of-interest navigation. The dialogues are grounded through knowledge bases ensuring that they are versatile in their natural language without being completely free form. - - For details see https://nlp.stanford.edu/blog/a-new-multi-turn-multi-domain-task-oriented-dialogue-dataset/. - """ - - url = 'http://files.deeppavlov.ai/datasets/kvret_public.tar.gz' - - @staticmethod - def _data_fname(datatype): - assert datatype in ('train', 'dev', 'test'), "wrong datatype name" - return 'kvret_{}_public.json'.format(datatype) - - @classmethod - @overrides - def read(self, data_path: str, dialogs: bool = False) -> Dict[str, List]: - """ - Downloads ``'kvrest_public.tar.gz'``, decompresses, saves files to ``data_path``. 
- - Parameters: - data_path: path to save data - dialogs: flag indices whether to output list of turns or list of dialogs - - Returns: - dictionary with ``'train'`` containing dialogs from ``'kvret_train_public.json'``, ``'valid'`` containing dialogs from ``'kvret_valid_public.json'``, ``'test'`` containing dialogs from ``'kvret_test_public.json'``. Each fields is a list of tuples ``(x_i, y_i)``. - """ - - required_files = (self._data_fname(dt) for dt in ('train', 'dev', 'test')) - if not all(Path(data_path, f).exists() for f in required_files): - log.info('[downloading dstc2 from {} to {}]'.format(self.url, data_path)) - download_decompress(self.url, data_path) - mark_done(data_path) - - data = { - 'train': self._read_from_file( - Path(data_path, self._data_fname('train')), dialogs), - 'valid': self._read_from_file( - Path(data_path, self._data_fname('dev')), dialogs), - 'test': self._read_from_file( - Path(data_path, self._data_fname('test')), dialogs) - } - return data - - @classmethod - def _read_from_file(cls, file_path, dialogs=False): - """Returns data from single file""" - log.info("[loading dialogs from {}]".format(file_path)) - - utterances, responses, dialog_indices = \ - cls._get_turns(cls._iter_file(file_path), with_indices=True) - - data = list(map(cls._format_turn, zip(utterances, responses))) - - if dialogs: - return [data[idx['start']:idx['end']] for idx in dialog_indices] - return data - - @staticmethod - def _format_turn(turn): - x = {'text': turn[0]['utterance'], - 'dialog_id': turn[0]['dialog_id'], - 'kb_columns': turn[0]['kb_columns'], - 'kb_items': turn[0]['kb_items'], - 'requested': turn[0].get('requested', {}), - 'slots': turn[0].get('slots', {})} - if turn[0].get('episode_done') is not None: - x['episode_done'] = turn[0]['episode_done'] - y = {'text': turn[1]['utterance'], - 'task': turn[0]['task'], - 'requested': turn[1].get('requested', {}), - 'slots': turn[1].get('slots', {})} - return (x, y) - - @staticmethod - def _check_dialog(dialog): - # TODO: manually fix bad dialogs - driver = True - for turn in dialog: - if turn['turn'] not in ('driver', 'assistant'): - raise RuntimeError("Dataset wrong format: `turn` key value is" - " either `driver` or `assistant`.") - if driver and turn['turn'] != 'driver': - log.debug("Turn is expected to by driver's, but it's {}'s" \ - .format(turn['turn'])) - return False - if not driver and turn['turn'] != 'assistant': - log.debug("Turn is expected to be assistant's but it's {}'s" \ - .format(turn['turn'])) - return False - driver = not driver - # if not driver: - # log.debug("Last turn is expected to be by assistant") - # return False - return True - - @staticmethod - def _filter_duplicates(dialog): - last_turn, last_utter = None, None - for turn in dialog: - curr_turn, curr_utter = turn['turn'], turn['data']['utterance'] - if (curr_turn != last_turn) or (curr_utter != last_utter): - yield turn - last_turn, last_utter = curr_turn, curr_utter - - @classmethod - def _iter_file(cls, file_path): - with open(file_path, 'rt', encoding='utf8') as f: - data = json.load(f) - for i, sample in enumerate(data): - dialog = list(cls._filter_duplicates(sample['dialogue'])) - if cls._check_dialog(dialog): - yield dialog, sample['scenario'] - else: - log.warning("Skipping {}th dialogue with uuid={}: wrong format." 
\ - .format(i, sample['scenario']['uuid'])) - - @staticmethod - def _get_turns(data, with_indices=False): - utterances, responses, dialog_indices = [], [], [] - for dialog, scenario in data: - for i, turn in enumerate(dialog): - replica = turn['data'] - if i == 0: - replica['episode_done'] = True - if turn['turn'] == 'driver': - replica['task'] = scenario['task'] - replica['dialog_id'] = scenario['uuid'] - replica['kb_columns'] = scenario['kb']['column_names'] - replica['kb_items'] = scenario['kb']['items'] - utterances.append(replica) - else: - responses.append(replica) - - # if last replica was by driver - if len(responses) != len(utterances): - utterances[-1]['end_dialogue'] = False - responses.append({'utterance': '', 'end_dialogue': True}) - - last_utter = responses[-1]['utterance'] - if last_utter and not last_utter[-1].isspace(): - last_utter += ' ' - responses[-1]['utterance'] = last_utter + 'END_OF_DIALOGUE' - - dialog_indices.append({ - 'start': len(utterances), - 'end': len(utterances) + len(dialog), - }) - - if with_indices: - return utterances, responses, dialog_indices - return utterances, responses diff --git a/deeppavlov/dataset_readers/md_yaml_dialogs_reader.py b/deeppavlov/dataset_readers/md_yaml_dialogs_reader.py deleted file mode 100644 index 29a1b3f699..0000000000 --- a/deeppavlov/dataset_readers/md_yaml_dialogs_reader.py +++ /dev/null @@ -1,663 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, softwaredata -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - - -import json -import os -import re -import tempfile -from collections import defaultdict -from logging import getLogger -from overrides import overrides -from pathlib import Path -from typing import Dict, List, Tuple, Union, Any, Optional - -from deeppavlov.core.common.file import read_yaml -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.dataset_reader import DatasetReader -from deeppavlov.dataset_readers.dstc2_reader import DSTC2DatasetReader - - -SLOT2VALUE_PAIRS_TUPLE = Tuple[Tuple[str, Any], ...] 
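# --- Illustrative sketch, not part of the patch above ---
# SLOT2VALUE_PAIRS_TUPLE serves as a hashable, order-independent key built from
# (slot_name, slot_value) pairs; the slot names below are made-up examples.
def make_slots_key(slots):
    # sorting makes the key independent of the order in which slots were mentioned
    return tuple(sorted((name, value) for name, value in slots))

key_a = make_slots_key([("pricerange", "cheap"), ("food", "italian")])
key_b = make_slots_key([("food", "italian"), ("pricerange", "cheap")])
assert key_a == key_b == (("food", "italian"), ("pricerange", "cheap"))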
- -log = getLogger(__name__) - - -class DomainKnowledge: - """the DTO-like class to store the domain knowledge from the domain yaml config.""" - - def __init__(self, domain_knowledge_di: Dict): - self.known_entities: List = domain_knowledge_di.get("entities", []) - self.known_intents: List = domain_knowledge_di.get("intents", []) - self.known_actions: List = domain_knowledge_di.get("actions", []) - self.known_slots: Dict = domain_knowledge_di.get("slots", {}) - self.response_templates: Dict = domain_knowledge_di.get("responses", {}) - self.session_config: Dict = domain_knowledge_di.get("session_config", {}) - self.forms: Dict = domain_knowledge_di.get("forms", {}) - - @classmethod - def from_yaml(cls, domain_yml_fpath: Union[str, Path] = "domain.yml"): - """ - Parses domain.yml domain config file into the DomainKnowledge object - Args: - domain_yml_fpath: path to the domain config file, defaults to domain.yml - Returns: - the loaded DomainKnowledge obect - """ - return cls(read_yaml(domain_yml_fpath)) - - - -@register('md_yaml_dialogs_reader') -class MD_YAML_DialogsDatasetReader(DatasetReader): - """ - Reads dialogs from dataset composed of ``stories.md``, ``nlu.md``, ``domain.yml`` . - - ``stories.md`` is to provide the dialogues dataset for model to train on. The dialogues - are represented as user messages labels and system response messages labels: (not texts, just action labels). - This is so to distinguish the NLU-NLG tasks from the actual dialogues storytelling experience: one - should be able to describe just the scripts of dialogues to the system. - - ``nlu.md`` is contrariwise to provide the NLU training set irrespective of the dialogues scripts. - - ``domain.yml`` is to desribe the task-specific domain and serves two purposes: - provide the NLG templates and provide some specific configuration of the NLU - """ - - _USER_SPEAKER_ID = 1 - _SYSTEM_SPEAKER_ID = 2 - - VALID_DATATYPES = ('trn', 'val', 'tst') - - NLU_FNAME = "nlu.md" - DOMAIN_FNAME = "domain.yml" - - @classmethod - def _data_fname(cls, datatype: str) -> str: - assert datatype in cls.VALID_DATATYPES, f"wrong datatype name: {datatype}" - return f"stories-{datatype}.md" - - @classmethod - @overrides - def read(cls, data_path: str, dialogs: bool = False, ignore_slots: bool = False) -> Dict[str, List]: - """ - Parameters: - data_path: path to read dataset from - dialogs: flag which indicates whether to output list of turns or - list of dialogs - ignore_slots: whether to ignore slots information provided in stories.md or not - - Returns: - dictionary that contains - ``'train'`` field with dialogs from ``'stories-trn.md'``, - ``'valid'`` field with dialogs from ``'stories-val.md'`` and - ``'test'`` field with dialogs from ``'stories-tst.md'``. - Each field is a list of tuples ``(x_i, y_i)``. 
- """ - domain_fname = cls.DOMAIN_FNAME - nlu_fname = cls.NLU_FNAME - stories_fnames = tuple(cls._data_fname(dt) for dt in cls.VALID_DATATYPES) - required_fnames = stories_fnames + (nlu_fname, domain_fname) - for required_fname in required_fnames: - required_path = Path(data_path, required_fname) - if not required_path.exists(): - log.error(f"INSIDE MLU_MD_DialogsDatasetReader.read(): " - f"{required_fname} not found with path {required_path}") - - domain_path = Path(data_path, domain_fname) - domain_knowledge = DomainKnowledge.from_yaml(domain_path) - intent2slots2text, slot_name2text2value = cls._read_intent2text_mapping(Path(data_path, nlu_fname), - domain_knowledge, ignore_slots) - - short2long_subsample_name = {"trn": "train", - "val": "valid", - "tst": "test"} - - data = {short2long_subsample_name[subsample_name_short]: - cls._read_story(Path(data_path, cls._data_fname(subsample_name_short)), - dialogs, domain_knowledge, intent2slots2text, slot_name2text2value, - ignore_slots=ignore_slots) - for subsample_name_short in cls.VALID_DATATYPES} - - return data - - @classmethod - def _read_intent2text_mapping(cls, nlu_fpath: Path, domain_knowledge: DomainKnowledge, ignore_slots: bool = False) \ - -> Tuple[Dict[str, Dict[SLOT2VALUE_PAIRS_TUPLE, List]], - Dict[str, Dict[str, str]]]: - - slots_markup_pattern = r"\[" + \ - r"(?P.*?)" + \ - r"\]" + \ - r"\(" + \ - r"(?P.*?)" + \ - r"\)" - - intent2slots2text = defaultdict(lambda: defaultdict(list)) - slot_name2text2value = defaultdict(lambda: defaultdict(list)) - - curr_intent_name = None - - with open(nlu_fpath) as nlu_f: - for line in nlu_f: - if line.startswith("##"): - # lines starting with ## are starting section describing new intent type - curr_intent_name = line.strip("##").strip().split("intent:", 1)[-1] - - if line.strip().startswith('-'): - # lines starting with - are listing the examples of intent texts of the current intent type - intent_text_w_markup = line.strip().strip('-').strip() - line_slots_found = re.finditer(slots_markup_pattern, intent_text_w_markup) - if ignore_slots: - line_slots_found = [] - - curr_char_ix = 0 - intent_text_without_markup = '' - cleaned_text_slots = [] # intent text can contain slots highlighted - for line_slot in line_slots_found: - line_slot_l_span, line_slot_r_span = line_slot.span() - # intent w.o. markup for "some [entity](entity_example) text" is "some entity text" - # so we should remove brackets and the parentheses content - intent_text_without_markup += intent_text_w_markup[curr_char_ix:line_slot_l_span] - - slot_value_text = str(line_slot["slot_value"]) - slot_name = line_slot["slot_name"] - slot_value = slot_value_text - if ':' in slot_name: - slot_name, slot_value = slot_name.split(':', 1) # e.g. [moderately](price:moderate) - - assert slot_name in domain_knowledge.known_slots, f"{slot_name} from {nlu_fpath}" + \ - " was not listed as slot " + \ - "in domain knowledge config" - - slot_value_new_l_span = len(intent_text_without_markup) # l span in cleaned text - slot_value_new_r_span = slot_value_new_l_span + len(slot_value_text) # r span in cleaned text - # intent w.o. 
markup for "some [entity](entity_example) text" is "some entity text" - # so we should remove brackets and the parentheses content - intent_text_without_markup += slot_value_text - - cleaned_text_slots.append((slot_name, slot_value)) - - slot_name2text2value[slot_name][slot_value_text].append(slot_value) - - curr_char_ix = line_slot_r_span - intent_text_without_markup += intent_text_w_markup[curr_char_ix: len(intent_text_w_markup)] - - slots_key = tuple(sorted((slot[0], slot[1]) for slot in cleaned_text_slots)) - intent2slots2text[curr_intent_name][slots_key].append({"text": intent_text_without_markup, - "slots_di": cleaned_text_slots, - "slots": slots_key}) - - # defaultdict behavior is no more needed - intent2slots2text = {k: dict(v) for k, v in intent2slots2text.items()} - slot_name2text2value = dict(slot_name2text2value) - - return intent2slots2text, slot_name2text2value - - @classmethod - def _read_story(cls, - story_fpath: Path, - dialogs: bool, - domain_knowledge: DomainKnowledge, - intent2slots2text: Dict[str, Dict[SLOT2VALUE_PAIRS_TUPLE, List]], - slot_name2text2value: Dict[str, Dict[str, str]], - ignore_slots: bool = False) \ - -> Union[List[List[Tuple[Dict[str, bool], Dict[str, Any]]]], List[Tuple[Dict[str, bool], Dict[str, Any]]]]: - """ - Reads stories from the specified path converting them to go-bot format on the fly. - - Args: - story_fpath: path to the file containing the stories dataset - dialogs: flag which indicates whether to output list of turns or - list of dialogs - domain_knowledge: the domain knowledge, usually inferred from domain.yml - intent2slots2text: the mapping allowing given the intent class and - slotfilling values of utterance, restore utterance text. - slot_name2text2value: the mapping of possible slot values spellings to the values themselves. - Returns: - stories read as if it was done with DSTC2DatasetReader._read_from_file() - """ - log.debug(f"BEFORE MLU_MD_DialogsDatasetReader._read_story(): " - f"story_fpath={story_fpath}, " - f"dialogs={dialogs}, " - f"domain_knowledge={domain_knowledge}, " - f"intent2slots2text={intent2slots2text}, " - f"slot_name2text2value={slot_name2text2value}") - - default_system_start = { - "speaker": cls._SYSTEM_SPEAKER_ID, - "text": "start", - "dialog_acts": [{"act": "start", "slots": []}]} - default_system_goodbye = { - "text": "goodbye :(", - "dialog_acts": [{"act": "utter_goodbye", "slots": []}], - "speaker": cls._SYSTEM_SPEAKER_ID} # TODO infer from dataset - - stories_parsed = {} - - curr_story_title = None - curr_story_utters_batch = [] - nonlocal_curr_story_bad = False # can be modified as a nonlocal variable - - def process_user_utter(line: str) -> List[List[Dict[str, Any]]]: - """ - given the stories.md user line, returns the batch of all the dstc2 ways to represent it - Args: - line: the system line to generate dstc2 versions for - - Returns: - all the possible dstc2 versions of the passed story line - """ - nonlocal intent2slots2text, slot_name2text2value, curr_story_utters_batch, nonlocal_curr_story_bad - try: - possible_user_utters = cls.augment_user_turn(intent2slots2text, line, slot_name2text2value) - # dialogs MUST start with system replics - for curr_story_utters in curr_story_utters_batch: - if not curr_story_utters: - curr_story_utters.append(default_system_start) - - utters_to_append_batch = [] - for user_utter in possible_user_utters: - utters_to_append_batch.append([user_utter]) - - except KeyError: - log.debug(f"INSIDE MLU_MD_DialogsDatasetReader._read_story(): " - f"Skipping story w. 
line {line} because of no NLU candidates found") - nonlocal_curr_story_bad = True - utters_to_append_batch = [] - return utters_to_append_batch - - def process_system_utter(line: str) -> List[List[Dict[str, Any]]]: - """ - given the stories.md system line, returns the batch of all the dstc2 ways to represent it - Args: - line: the system line to generate dstc2 versions for - - Returns: - all the possible dstc2 versions of the passed story line - """ - nonlocal intent2slots2text, domain_knowledge, curr_story_utters_batch, nonlocal_curr_story_bad - system_action = cls.parse_system_turn(domain_knowledge, line) - system_action_name = system_action.get("dialog_acts")[0].get("act") - - for curr_story_utters in curr_story_utters_batch: - if cls.last_turn_is_systems_turn(curr_story_utters): - # deal with consecutive system actions by inserting the last user replics in between - curr_story_utters.append(cls.get_last_users_turn(curr_story_utters)) - - def parse_form_name(story_line: str) -> str: - """ - if the line (in stories.md utterance format) contains a form name, return it - Args: - story_line: line to extract form name from - - Returns: - the extracted form name or None if no form name found - """ - form_name = None - if story_line.startswith("form"): - form_di = json.loads(story_line[len("form"):]) - form_name = form_di["name"] - return form_name - - if system_action_name.startswith("form"): - form_name = parse_form_name(system_action_name) - augmented_utters = cls.augment_form(form_name, domain_knowledge, intent2slots2text) - - utters_to_append_batch = [[]] - for user_utter in augmented_utters: - new_curr_story_utters_batch = [] - for curr_story_utters in utters_to_append_batch: - possible_extensions = process_story_line(user_utter) - for possible_extension in possible_extensions: - new_curr_story_utters = curr_story_utters.copy() - new_curr_story_utters.extend(possible_extension) - new_curr_story_utters_batch.append(new_curr_story_utters) - utters_to_append_batch = new_curr_story_utters_batch - else: - utters_to_append_batch = [[system_action]] - return utters_to_append_batch - - def process_story_line(line: str) -> List[List[Dict[str, Any]]]: - """ - given the stories.md line, returns the batch of all the dstc2 ways to represent it - Args: - line: the line to generate dstc2 versions - - Returns: - all the possible dstc2 versions of the passed story line - """ - if line.startswith('*'): - utters_to_extend_with_batch = process_user_utter(line) - elif line.startswith('-'): - utters_to_extend_with_batch = process_system_utter(line) - else: - # todo raise an exception - utters_to_extend_with_batch = [] - return utters_to_extend_with_batch - - story_file = open(story_fpath) - for line in story_file: - line = line.strip() - if not line: - continue - if line.startswith('#'): - # #... 
marks the beginning of new story - if curr_story_utters_batch and curr_story_utters_batch[0] and curr_story_utters_batch[0][-1]["speaker"] == cls._USER_SPEAKER_ID: - for curr_story_utters in curr_story_utters_batch: - curr_story_utters.append(default_system_goodbye) # dialogs MUST end with system replics - - if not nonlocal_curr_story_bad: - for curr_story_utters_ix, curr_story_utters in enumerate(curr_story_utters_batch): - stories_parsed[curr_story_title+f"_{curr_story_utters_ix}"] = curr_story_utters - - curr_story_title = line.strip('#') - curr_story_utters_batch = [[]] - nonlocal_curr_story_bad = False - else: - new_curr_story_utters_batch = [] - possible_extensions = process_story_line(line) - for curr_story_utters in curr_story_utters_batch: - for user_utter in possible_extensions: - new_curr_story_utters = curr_story_utters.copy() - new_curr_story_utters.extend(user_utter) - new_curr_story_utters_batch.append(new_curr_story_utters) - curr_story_utters_batch = new_curr_story_utters_batch - # curr_story_utters.extend(process_story_line(line)) - story_file.close() - - if not nonlocal_curr_story_bad: - for curr_story_utters_ix, curr_story_utters in enumerate(curr_story_utters_batch): - stories_parsed[curr_story_title + f"_{curr_story_utters_ix}"] = curr_story_utters - - tmp_f = tempfile.NamedTemporaryFile(delete=False, mode='w', encoding="utf-8") - for story_id, story in stories_parsed.items(): - for replics in story: - print(json.dumps(replics), file=tmp_f) - print(file=tmp_f) - tmp_f.close() - # noinspection PyProtectedMember - gobot_formatted_stories = DSTC2DatasetReader._read_from_file(tmp_f.name, dialogs=dialogs) - os.remove(tmp_f.name) - - log.debug(f"AFTER MLU_MD_DialogsDatasetReader._read_story(): " - f"story_fpath={story_fpath}, " - f"dialogs={dialogs}, " - f"domain_knowledge={domain_knowledge}, " - f"intent2slots2text={intent2slots2text}, " - f"slot_name2text2value={slot_name2text2value}") - - return gobot_formatted_stories - - @classmethod - def augment_form(cls, form_name: str, domain_knowledge: DomainKnowledge, intent2slots2text: Dict) -> List[str]: - """ - Replaced the form mention in stories.md with the actual turns relevant to the form - Args: - form_name: the name of form to generate turns for - domain_knowledge: the domain knowledge (see domain.yml in RASA) relevant to the processed config - intent2slots2text: the mapping of intents and particular slots onto text - - Returns: - the story turns relevant to the passed form - """ - form = domain_knowledge.forms[form_name] # todo handle keyerr - augmended_story = [] - for slot_name, slot_info_li in form.items(): - if slot_info_li and slot_info_li[0].get("type", '') == "from_entity": - # we only handle from_entity slots - known_responses = list(domain_knowledge.response_templates) - known_intents = list(intent2slots2text.keys()) - augmended_story.extend(cls.augment_slot(known_responses, known_intents, slot_name, form_name)) - return augmended_story - - @classmethod - def augment_slot(cls, known_responses: List[str], known_intents: List[str], slot_name: str, form_name: str) \ - -> List[str]: - """ - Given the slot name, generates a sequence of system turn asking for a slot and user' turn providing this slot - - Args: - known_responses: responses known to the system from domain.yml - known_intents: intents known to the system from domain.yml - slot_name: the name of the slot to augment for - form_name: the name of the form for which the turn is augmented - - Returns: - the list of stories.md alike turns - """ - 
ask_slot_act_name = cls.get_augmented_ask_slot_utter(form_name, known_responses, slot_name) - inform_slot_user_utter = cls.get_augmented_ask_intent_utter(known_intents, slot_name) - - return [f"- {ask_slot_act_name}", f"* {inform_slot_user_utter}"] - - @classmethod - def get_augmented_ask_intent_utter(cls, known_intents: List[str], slot_name: str) -> Optional[str]: - """ - if the system knows the inform_{slot} intent, return this intent name, otherwise return None - Args: - known_intents: intents known to the system - slot_name: the slot to look inform intent for - - Returns: - the slot informing intent or None - """ - inform_slot_user_utter_hypothesis = f"inform_{slot_name}" - if inform_slot_user_utter_hypothesis in known_intents: - inform_slot_user_utter = inform_slot_user_utter_hypothesis - else: - # todo raise an exception - inform_slot_user_utter = None - pass - return inform_slot_user_utter - - @classmethod - def get_augmented_ask_slot_utter(cls, form_name: str, known_responses: List[str], slot_name: str): - """ - if the system knows the ask_{slot} action, return this action name, otherwise return None - Args: - form_name: the name of the currently processed form - known_responses: actions known to the system - slot_name: the slot to look asking action for - - Returns: - the slot asking action or None - """ - ask_slot_act_name_hypothesis1 = f"utter_ask_{form_name}_{slot_name}" - ask_slot_act_name_hypothesis2 = f"utter_ask_{slot_name}" - if ask_slot_act_name_hypothesis1 in known_responses: - ask_slot_act_name = ask_slot_act_name_hypothesis1 - elif ask_slot_act_name_hypothesis2 in known_responses: - ask_slot_act_name = ask_slot_act_name_hypothesis2 - else: - # todo raise an exception - ask_slot_act_name = None - pass - return ask_slot_act_name - - @classmethod - def get_last_users_turn(cls, curr_story_utters: List[Dict]) -> Dict: - """ - Given the dstc2 story, return the last user utterance from it - Args: - curr_story_utters: the dstc2-formatted stoyr - - Returns: - the last user utterance from the passed story - """ - *_, last_user_utter = filter(lambda x: x["speaker"] == cls._USER_SPEAKER_ID, curr_story_utters) - return last_user_utter - - @classmethod - def last_turn_is_systems_turn(cls, curr_story_utters): - return curr_story_utters and curr_story_utters[-1]["speaker"] == cls._SYSTEM_SPEAKER_ID - - @classmethod - def parse_system_turn(cls, domain_knowledge: DomainKnowledge, line: str) -> Dict: - """ - Given the RASA stories.md line, returns the dstc2-formatted json (dict) for this line - Args: - domain_knowledge: the domain knowledge relevant to the processed stories config (from which line is taken) - line: the story system step representing line from stories.md - - Returns: - the dstc2-formatted passed turn - """ - # system actions are started in dataset with - - system_action_name = line.strip('-').strip() - curr_action_text = cls._system_action2text(domain_knowledge, system_action_name) - system_action = {"speaker": cls._SYSTEM_SPEAKER_ID, - "text": curr_action_text, - "dialog_acts": [{"act": system_action_name, "slots": []}]} - if system_action_name.startswith("action"): - system_action["db_result"] = {} - return system_action - - @classmethod - def augment_user_turn(cls, intent2slots2text, line: str, slot_name2text2value) -> List[Dict[str, Any]]: - """ - given the turn information generate all the possible stories representing it - Args: - intent2slots2text: the intents and slots to natural language utterances mapping known to the system - line: the line representing used 
utterance in stories.md format - slot_name2text2value: the slot names to values mapping known o the system - - Returns: - the batch of all the possible dstc2 representations of the passed intent - """ - # user actions are started in dataset with * - user_action, slots_dstc2formatted = cls._parse_user_intent(line) - slots_actual_values = cls._clarify_slots_values(slot_name2text2value, slots_dstc2formatted) - slots_to_exclude, slots_used_values, action_for_text = cls._choose_slots_for_whom_exists_text( - intent2slots2text, slots_actual_values, - user_action) - possible_user_response_infos = cls._user_action2text(intent2slots2text, action_for_text, slots_used_values) - possible_user_utters = [] - for user_response_info in possible_user_response_infos: - user_utter = {"speaker": cls._USER_SPEAKER_ID, - "text": user_response_info["text"], - "dialog_acts": [{"act": user_action, "slots": user_response_info["slots"]}], - "slots to exclude": slots_to_exclude} - possible_user_utters.append(user_utter) - return possible_user_utters - - @staticmethod - def _choose_slots_for_whom_exists_text(intent2slots2text: Dict[str, Dict[SLOT2VALUE_PAIRS_TUPLE, List]], - slots_actual_values: SLOT2VALUE_PAIRS_TUPLE, - user_action: str) -> Tuple[List, SLOT2VALUE_PAIRS_TUPLE, str]: - """ - - Args: - intent2slots2text: the mapping of intents and slots to natural language utterances representing them - slots_actual_values: the slot values information to look utterance for - user_action: the intent to look utterance for - - Returns: - the slots ommitted to find an NLU candidate, the slots represented in the candidate, the intent name used - """ - possible_keys = [k for k in intent2slots2text.keys() if user_action in k] - possible_keys = possible_keys + [user_action] - possible_keys = sorted(possible_keys, key=lambda action_s: action_s.count('+')) - for possible_action_key in possible_keys: - if intent2slots2text[possible_action_key].get(slots_actual_values): - slots_used_values = slots_actual_values - slots_to_exclude = [] - return slots_to_exclude, slots_used_values, possible_action_key - else: - slots_lazy_key = set(e[0] for e in slots_actual_values) - slots_lazy_key -= {"intent"} - fake_keys = [] - for known_key in intent2slots2text[possible_action_key].keys(): - if slots_lazy_key.issubset(set(e[0] for e in known_key)): - fake_keys.append(known_key) - break - - if fake_keys: - slots_used_values = sorted(fake_keys, key=lambda elem: (len(set(slots_actual_values) ^ set(elem)), - len([e for e in elem - if e[0] not in slots_lazy_key])) - )[0] - - slots_to_exclude = [e[0] for e in slots_used_values if e[0] not in slots_lazy_key] - return slots_to_exclude, slots_used_values, possible_action_key - - raise KeyError("no possible NLU candidates found") - - @staticmethod - def _clarify_slots_values(slot_name2text2value: Dict[str, Dict[str, Any]], - slots_dstc2formatted: List[List]) -> SLOT2VALUE_PAIRS_TUPLE: - slots_key = [] - for slot_name, slot_value in slots_dstc2formatted: - slot_actual_value = slot_name2text2value.get(slot_name, {}).get(slot_value, slot_value) - slots_key.append((slot_name, slot_actual_value)) - slots_key = tuple(sorted(slots_key)) - return slots_key - - @staticmethod - def _parse_user_intent(line: str, ignore_slots=False) -> Tuple[str, List[List]]: - """ - Given the intent line in RASA stories.md format, return the name of the intent and slots described with this line - Args: - line: the line to parse - ignore_slots: whether to ignore slots information - - Returns: - the pair of the intent name and slots 
([[slot name, slot value],.. ]) info - """ - intent = line.strip('*').strip() - if '{' not in intent: - intent = intent + "{}" # the prototypical intent is "intent_name{slot1: value1, slotN: valueN}" - user_action, slots_info = intent.split('{', 1) - slots_info = json.loads('{' + slots_info) - slots_dstc2formatted = [[slot_name, slot_value] for slot_name, slot_value in slots_info.items()] - if ignore_slots: - slots_dstc2formatted = dict() - return user_action, slots_dstc2formatted - - @staticmethod - def _user_action2text(intent2slots2text: Dict[str, Dict[SLOT2VALUE_PAIRS_TUPLE, List]], - user_action: str, - slots_li: Optional[SLOT2VALUE_PAIRS_TUPLE] = None) -> List[str]: - """ - given the user intent, return the text representing this intent with passed slots - Args: - intent2slots2text: the mapping of intents and slots to natural language utterances - user_action: the name of intent to generate text for - slots_li: the slot values to provide - - Returns: - the text of utterance relevant to the passed intent and slots - """ - if slots_li is None: - slots_li = tuple() - return intent2slots2text[user_action][slots_li] - - @staticmethod - def _system_action2text(domain_knowledge: DomainKnowledge, system_action: str) -> str: - """ - given the system action name return the relevant template text - Args: - domain_knowledge: the domain knowledge relevant to the currently processed config - system_action: the name of the action to get intent for - - Returns: - template relevant to the passed action - """ - possible_system_responses = domain_knowledge.response_templates.get(system_action, - [{"text": system_action}]) - - response_text = possible_system_responses[0]["text"] - response_text = re.sub(r"(\w+)\=\{(.*?)\}", r"#\2", response_text) # TODO: straightforward regex string - - return response_text diff --git a/deeppavlov/dataset_readers/morphotagging_dataset_reader.py b/deeppavlov/dataset_readers/morphotagging_dataset_reader.py deleted file mode 100644 index 4402022252..0000000000 --- a/deeppavlov/dataset_readers/morphotagging_dataset_reader.py +++ /dev/null @@ -1,188 +0,0 @@ -# Copyright 2018 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
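# --- Editorial illustration (not part of either file in this patch): the stories.md
# user lines consumed by the RASA reader removed above pair an intent name with
# optional JSON-encoded slots, and _parse_user_intent splits the two parts.
# The sample line and slot values below are hypothetical.
import json

def sketch_parse_user_intent(line):
    intent = line.strip('*').strip()
    if '{' not in intent:
        intent += "{}"
    user_action, slots_info = intent.split('{', 1)
    slots = json.loads('{' + slots_info)
    return user_action, [[name, value] for name, value in slots.items()]

# sketch_parse_user_intent('* inform{"food": "italian"}')
# -> ('inform', [['food', 'italian']])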
- -import sys -from logging import getLogger -from pathlib import Path -from typing import Dict, List, Union, Tuple, Optional - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.dataset_reader import DatasetReader -from deeppavlov.core.data.utils import download_decompress, mark_done - -WORD_COLUMN, POS_COLUMN, TAG_COLUMN = 1, 3, 5 -HEAD_COLUMN, DEP_COLUMN = 6, 7 - -log = getLogger(__name__) - - -def get_language(filepath: str) -> str: - """Extracts language from typical UD filename - """ - return filepath.split("-")[0] - - -def read_infile(infile: Union[Path, str], *, from_words=False, - word_column: int = WORD_COLUMN, pos_column: int = POS_COLUMN, - tag_column: int = TAG_COLUMN, head_column: int = HEAD_COLUMN, - dep_column: int = DEP_COLUMN, max_sents: int = -1, - read_only_words: bool = False, read_syntax: bool = False) -> List[Tuple[List, Union[List, None]]]: - """Reads input file in CONLL-U format - - Args: - infile: a path to a file - word_column: column containing words (default=1) - pos_column: column containing part-of-speech labels (default=3) - tag_column: column containing fine-grained tags (default=5) - head_column: column containing syntactic head position (default=6) - dep_column: column containing syntactic dependency label (default=7) - max_sents: maximal number of sentences to read - read_only_words: whether to read only words - read_syntax: whether to return ``heads`` and ``deps`` alongside ``tags``. Ignored if read_only_words is ``True`` - - Returns: - a list of sentences. Each item contains a word sequence and an output sequence. - The output sentence is ``None``, if ``read_only_words`` is ``True``, - a single list of word tags if ``read_syntax`` is False, - and a list of the form [``tags``, ``heads``, ``deps``] in case ``read_syntax`` is ``True``. 
- - """ - answer, curr_word_sent, curr_tag_sent = [], [], [] - curr_head_sent, curr_dep_sent = [], [] - # read_syntax = read_syntax and read_only_words - if from_words: - word_column, read_only_words = 0, True - if infile is not sys.stdin: - fin = open(infile, "r", encoding="utf8") - else: - fin = sys.stdin - for line in fin: - line = line.strip() - if line.startswith("#"): - continue - if line == "": - if len(curr_word_sent) > 0: - if read_only_words: - curr_tag_sent = None - elif read_syntax: - curr_tag_sent = [curr_tag_sent, curr_head_sent, curr_dep_sent] - answer.append((curr_word_sent, curr_tag_sent)) - curr_tag_sent, curr_word_sent = [], [] - curr_head_sent, curr_dep_sent = [], [] - if len(answer) == max_sents: - break - continue - splitted = line.split("\t") - index = splitted[0] - if not from_words and not index.isdigit(): - continue - curr_word_sent.append(splitted[word_column]) - if not read_only_words: - pos, tag = splitted[pos_column], splitted[tag_column] - tag = pos if tag == "_" else "{},{}".format(pos, tag) - curr_tag_sent.append(tag) - if read_syntax: - curr_head_sent.append(int(splitted[head_column])) - curr_dep_sent.append(splitted[dep_column]) - if len(curr_word_sent) > 0: - if read_only_words: - curr_tag_sent = None - elif read_syntax: - curr_tag_sent = [curr_tag_sent, curr_head_sent, curr_dep_sent] - answer.append((curr_word_sent, curr_tag_sent)) - if infile is not sys.stdin: - fin.close() - return answer - - -@register('morphotagger_dataset_reader') -class MorphotaggerDatasetReader(DatasetReader): - """Class to read training datasets in UD format""" - - URL = 'http://files.deeppavlov.ai/datasets/UD2.0_source/' - - def read(self, data_path: Union[List, str], - language: Optional[str] = None, - data_types: Optional[List[str]] = None, - **kwargs) -> Dict[str, List]: - """Reads UD dataset from data_path. - - Args: - data_path: can be either - 1. a directory containing files. The file for data_type 'mode' - is then data_path / {language}-ud-{mode}.conllu - 2. 
a list of files, containing the same number of items as data_types - language: a language to detect filename when it is not given - data_types: which dataset parts among 'train', 'dev', 'test' are returned - - Returns: - a dictionary containing dataset fragments (see ``read_infile``) for given data types - """ - if data_types is None: - data_types = ["train", "dev"] - elif isinstance(data_types, str): - data_types = list(data_types) - for data_type in data_types: - if data_type not in ["train", "dev", "test"]: - raise ValueError("Unknown data_type: {}, only train, dev and test " - "datatypes are allowed".format(data_type)) - if isinstance(data_path, str): - data_path = Path(data_path) - if isinstance(data_path, Path): - if data_path.exists(): - is_file = data_path.is_file() - else: - is_file = (len(data_types) == 1) - if is_file: - # path to a single file - data_path, reserve_data_path = [data_path], None - else: - # path to data directory - if language is None: - raise ValueError("You must implicitly provide language " - "when providing data directory as source") - reserve_data_path = data_path - data_path = [data_path / "{}-ud-{}.conllu".format(language, mode) - for mode in data_types] - reserve_data_path = [ - reserve_data_path / language / "{}-ud-{}.conllu".format(language, mode) - for mode in data_types] - else: - data_path = [Path(data_path) for data_path in data_path] - reserve_data_path = None - if len(data_path) != len(data_types): - raise ValueError("The number of input files in data_path and data types " - "in data_types must be equal") - has_missing_files = any(not filepath.exists() for filepath in data_path) - if has_missing_files and reserve_data_path is not None: - has_missing_files = any(not filepath.exists() for filepath in reserve_data_path) - if not has_missing_files: - data_path = reserve_data_path - if has_missing_files: - # Files are downloaded from the Web repository - dir_path = data_path[0].parent - language = language or get_language(data_path[0].parts[-1]) - url = self.URL + "{}.tar.gz".format(language) - log.info('[downloading data from {} to {}]'.format(url, dir_path)) - dir_path.mkdir(exist_ok=True, parents=True) - download_decompress(url, dir_path) - mark_done(dir_path) - data = {} - for mode, filepath in zip(data_types, data_path): - if mode == "dev": - mode = "valid" -# if mode == "test": -# kwargs["read_only_words"] = True - data[mode] = read_infile(filepath, **kwargs) - return data diff --git a/deeppavlov/dataset_readers/multitask_reader.py b/deeppavlov/dataset_readers/multitask_reader.py deleted file mode 100644 index 593ffc7f66..0000000000 --- a/deeppavlov/dataset_readers/multitask_reader.py +++ /dev/null @@ -1,64 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
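# --- Editorial illustration (not part of either file in this patch): the CoNLL-U row
# layout expected by the removed read_infile above. Column 1 holds the word form,
# column 3 the POS tag, column 5 the fine-grained tag (merged as "POS,TAG" when it is
# not "_"), columns 6 and 7 the syntactic head and dependency label; '#' lines are
# skipped and blank lines separate sentences. The sample sentence is hypothetical.
conllu_sample = (
    "# sent_id = 1\n"
    "1\tJohn\tJohn\tPROPN\t_\tNumber=Sing\t2\tnsubj\t_\t_\n"
    "2\tsleeps\tsleep\tVERB\t_\tNumber=Sing|Person=3\t0\troot\t_\t_\n"
)
words, tags = [], []
for row in conllu_sample.splitlines():
    if not row or row.startswith("#"):
        continue
    cols = row.split("\t")
    if not cols[0].isdigit():
        continue
    words.append(cols[1])
    tags.append(cols[3] if cols[5] == "_" else f"{cols[3]},{cols[5]}")
# words == ['John', 'sleeps']
# tags == ['PROPN,Number=Sing', 'VERB,Number=Sing|Person=3']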
- -import copy -import pickle -from logging import getLogger -from pathlib import Path -from typing import Dict - -from deeppavlov.core.common.params import from_params -from deeppavlov.core.common.registry import get_model, register -from deeppavlov.core.data.dataset_reader import DatasetReader - - -log = getLogger(__name__) - - -@register('multitask_reader') -class MultiTaskReader(DatasetReader): - """Class to read several datasets simultaneuosly""" - - def read(self, data_path, tasks: Dict[str, Dict[str, str]]): - """Creates dataset readers for tasks and returns what task dataset readers `read()` methods return. - - Args: - data_path: can be anything since it is not used. `data_path` is present because it is - required in train.py script. - tasks: dictionary which keys are task names and values are dictionaries with `DatasetReader` - subclasses specs. `DatasetReader` specs are provided in the same format as "dataset_reader" - in the model config except for "class_name" field which has to be named "reader_class_name". - ```json - "tasks": { - "query_prediction": { - "reader_class_name": "basic_classification_reader", - "x": "Question", - "y": "Class", - "data_path": "{DOWNLOADS_PATH}/query_prediction" - } - } - ``` - - Returns: - dictionary which keys are task names and values are what task readers `read()` methods returned. - """ - data = {} - for task_name, reader_params in tasks.items(): - reader_params = copy.deepcopy(reader_params) - tasks[task_name] = from_params({"class_name": reader_params['reader_class_name']}) - del reader_params['reader_class_name'] - reader_params['data_path'] = Path(reader_params['data_path']).expanduser() - data[task_name] = tasks[task_name].read(**reader_params) - return data - diff --git a/deeppavlov/dataset_readers/rured_reader.py b/deeppavlov/dataset_readers/rured_reader.py index e415e4193d..8c717a9579 100644 --- a/deeppavlov/dataset_readers/rured_reader.py +++ b/deeppavlov/dataset_readers/rured_reader.py @@ -5,7 +5,6 @@ from pathlib import Path from logging import getLogger from overrides import overrides -import matplotlib.pyplot as plt from deeppavlov.core.common.registry import register from deeppavlov.core.data.dataset_reader import DatasetReader @@ -68,8 +67,6 @@ def read(self, data_path: str, rel2id: Dict = None) -> Dict[str, List[Tuple]]: data = {"train": train_data, "valid": dev_data, "test": test_data} - self.draw_plot() - return data def process_rured_file(self, data: List[Dict], num_neg_samples: str) -> Tuple[List, Dict]: @@ -158,13 +155,6 @@ def label_to_one_hot(self, label: int) -> List[int]: relation[label] = 1 return relation - def draw_plot(self) -> None: - """ Make plots with NER tags """ - ner_stat_sorted = dict(list(reversed(sorted(self.ner_stat.items(), key=lambda item: item[1])))) - plt.bar(ner_stat_sorted.keys(), ner_stat_sorted.values()) - plt.xticks(rotation=270) - plt.show() - @staticmethod def add_default_rel_dict(): """ Creates a default relation to relation if dictionary with RuRED relations """ diff --git a/deeppavlov/dataset_readers/siamese_reader.py b/deeppavlov/dataset_readers/siamese_reader.py deleted file mode 100644 index 0b7553c6c2..0000000000 --- a/deeppavlov/dataset_readers/siamese_reader.py +++ /dev/null @@ -1,59 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import csv -from pathlib import Path -from typing import Dict, List, Tuple - -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.dataset_reader import DatasetReader - - -@register('siamese_reader') -class SiameseReader(DatasetReader): - """The class to read dataset for ranking or paraphrase identification with Siamese networks.""" - - def read(self, data_path: str, **kwargs) -> Dict[str, List[Tuple[List[str], int]]]: - """Read the dataset for ranking or paraphrase identification with Siamese networks. - - Args: - data_path: A path to a folder with dataset files. - """ - - dataset = {'train': None, 'valid': None, 'test': None} - data_path = expand_path(data_path) - train_fname = data_path / 'train.csv' - valid_fname = data_path / 'valid.csv' - test_fname = data_path / 'test.csv' - dataset["train"] = self._preprocess_data_train(train_fname) - dataset["valid"] = self._preprocess_data_valid_test(valid_fname) - dataset["test"] = self._preprocess_data_valid_test(test_fname) - return dataset - - def _preprocess_data_train(self, fname: Path) -> List[Tuple[List[str], int]]: - data = [] - with open(fname, 'r') as f: - reader = csv.reader(f, delimiter='\t') - for el in reader: - data.append((el[:2], int(el[2]))) - return data - - def _preprocess_data_valid_test(self, fname: Path) -> List[Tuple[List[str], int]]: - data = [] - with open(fname, 'r') as f: - reader = csv.reader(f, delimiter='\t') - for el in reader: - data.append((el, 1)) - return data diff --git a/deeppavlov/dataset_readers/snips_reader.py b/deeppavlov/dataset_readers/snips_reader.py deleted file mode 100644 index 7041df6aa7..0000000000 --- a/deeppavlov/dataset_readers/snips_reader.py +++ /dev/null @@ -1,93 +0,0 @@ -# Copyright 2019 Alexey Romanov -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import json -from logging import getLogger -from pathlib import Path -from typing import List, Dict, Any, Optional - -from overrides import overrides - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.dataset_reader import DatasetReader -from deeppavlov.core.data.utils import download_decompress, mark_done, is_done - -log = getLogger(__name__) - - -@register('snips_reader') -class SnipsReader(DatasetReader): - """The class to download and read Snips NLU Benchmark dataset (custom intents section). - - See https://github.com/snipsco/nlu-benchmark. 
- """ - - # noinspection PyAttributeOutsideInit - @overrides - def read(self, data_path: str, queries_per_intent: Optional[int] = None, test_validate_split: float = 0.5, - *args, **kwargs) -> \ - Dict[str, List[Dict[str, Any]]]: - """ - Each query in the output has the following form: - { 'intent': intent_name, - 'data': [ { 'text': text, ('entity': slot_name)? } ] - } - - Args: - data_path: A path to a folder with dataset files. - queries_per_intent: Number of queries to load for each intent. None to load all. - If the requested number is greater than available in file, all queries are returned. - test_validate_split: Proportion of `_validate` files to be used as test dataset (since Snips - is split into training and validation sets without a separate test set). - """ - data_path = Path(data_path) - intents = ['AddToPlaylist', 'BookRestaurant', 'GetWeather', 'PlayMusic', - 'RateBook', 'SearchCreativeWork', 'SearchScreeningEvent'] - - if not is_done(data_path): - url = 'http://files.deeppavlov.ai/datasets/snips.tar.gz' - log.info('[downloading data from {} to {}]'.format(url, data_path)) - download_decompress(url, data_path) - mark_done(data_path) - - use_full_file = queries_per_intent is None or queries_per_intent > 70 - training_data = [] - validation_data = [] - test_data = [] - - for intent in intents: - intent_path = data_path / intent - train_file_name = f"train_{intent}{'_full' if use_full_file else ''}.json" - validate_file_name = f"validate_{intent}.json" - - train_queries = self._load_file(intent_path / train_file_name, intent, queries_per_intent) - validate_queries = self._load_file(intent_path / validate_file_name, intent, queries_per_intent) - num_test_queries = round(len(validate_queries) * test_validate_split) - - training_data.extend(train_queries) - validation_data.extend(validate_queries[num_test_queries:]) - test_data.extend(validate_queries[:num_test_queries]) - - return {'train': training_data, 'valid': validation_data, 'test': test_data} - - @staticmethod - def _load_file(path: Path, intent: str, num_queries: Optional[int]): - with path.open(encoding='latin_1') as f: - data = json.load(f) - - # restrict number of queries - queries = data[intent][:num_queries] - for query in queries: - query['intent'] = intent - return queries diff --git a/deeppavlov/dataset_readers/squad_dataset_reader.py b/deeppavlov/dataset_readers/squad_dataset_reader.py index 2a4ef9d2bb..078cf46d8b 100644 --- a/deeppavlov/dataset_readers/squad_dataset_reader.py +++ b/deeppavlov/dataset_readers/squad_dataset_reader.py @@ -30,6 +30,10 @@ class SquadDatasetReader(DatasetReader): SQuAD: Stanford Question Answering Dataset https://rajpurkar.github.io/SQuAD-explorer/ + + SQuAD2.0: + Stanford Question Answering Dataset, version 2.0 + https://rajpurkar.github.io/SQuAD-explorer/ SberSQuAD: Dataset from SDSJ Task B @@ -46,6 +50,7 @@ class SquadDatasetReader(DatasetReader): url_squad = 'http://files.deeppavlov.ai/datasets/squad-v1.1.tar.gz' url_sber_squad = 'http://files.deeppavlov.ai/datasets/sber_squad-v1.1.tar.gz' url_multi_squad = 'http://files.deeppavlov.ai/datasets/multiparagraph_squad.tar.gz' + url_squad2 = 'http://files.deeppavlov.ai/datasets/squad-v2.0.tar.gz' def read(self, dir_path: str, dataset: Optional[str] = 'SQuAD', url: Optional[str] = None, *args, **kwargs) \ -> Dict[str, Dict[str, Any]]: @@ -70,11 +75,16 @@ def read(self, dir_path: str, dataset: Optional[str] = 'SQuAD', url: Optional[st self.url = self.url_sber_squad elif dataset == 'MultiSQuAD': self.url = self.url_multi_squad + elif 
dataset == 'SQuAD2.0': + self.url = self.url_squad2 else: raise RuntimeError(f'Dataset {dataset} is unknown') dir_path = Path(dir_path) - required_files = [f'{dt}-v1.1.json' for dt in ['train', 'dev']] + if dataset == "SQuAD2.0": + required_files = [f'{dt}-v2.0.json' for dt in ['train', 'dev']] + else: + required_files = [f'{dt}-v1.1.json' for dt in ['train', 'dev']] dir_path.mkdir(parents=True, exist_ok=True) if not all((dir_path / f).exists() for f in required_files): @@ -84,7 +94,7 @@ def read(self, dir_path: str, dataset: Optional[str] = 'SQuAD', url: Optional[st for f in required_files: with dir_path.joinpath(f).open('r', encoding='utf8') as fp: data = json.load(fp) - if f == 'dev-v1.1.json': + if f in {'dev-v1.1.json', 'dev-v2.0.json'}: dataset['valid'] = data else: dataset['train'] = data diff --git a/deeppavlov/dataset_readers/torchtext_classification_data_reader.py b/deeppavlov/dataset_readers/torchtext_classification_data_reader.py deleted file mode 100644 index b7b4f5319f..0000000000 --- a/deeppavlov/dataset_readers/torchtext_classification_data_reader.py +++ /dev/null @@ -1,60 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import random -from logging import getLogger -from typing import Optional - -import torchtext -import torchtext.datasets as torch_texts -from overrides import overrides - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.dataset_reader import DatasetReader - -log = getLogger(__name__) - - -@register("torchtext_classification_data_reader") -class TorchtextClassificationDataReader(DatasetReader): - """Class initializes datasets as an attribute of `torchtext.datasets`. - Raw texts and string labels are re-assigned to common deeppavlov format of data which will be given to iterator. - """ - @overrides - def read(self, data_path: str, dataset_title: str, - splits: list = ["train", "valid", "test"], valid_portion: Optional[float] = None, - split_seed: int = 42, *args, **kwargs) -> dict: - - if hasattr(torch_texts, dataset_title) and callable(getattr(torch_texts, dataset_title)): - log.info(f"Dataset {dataset_title} is used as an attribute of `torchtext.datasets`.") - _text = torchtext.data.RawField() - _label = torchtext.data.RawField() - data_splits = getattr(torch_texts, dataset_title).splits(_text, _label, root=data_path) - assert len(data_splits) == len(splits) - data_splits = dict(zip(splits, data_splits)) - - if "valid" not in splits and valid_portion is not None: - log.info("Valid not in `splits` and `valid_portion` is given. 
Split `train` to `train` and `valid`") - data_splits["train"], data_splits["valid"] = data_splits["train"].split( - 1 - valid_portion, random_state=random.seed(split_seed)) - else: - raise NotImplementedError(f"Dataset {dataset_title} was not found.") - - data = {} - for data_field in data_splits: - data[data_field] = [] - for sample in data_splits[data_field].examples: - data[data_field].append((vars(sample)["text"], vars(sample)["label"])) - log.info(f"For field {data_field} found {len(data[data_field])} samples.") - return data diff --git a/deeppavlov/dataset_readers/ubuntu_v2_mt_reader.py b/deeppavlov/dataset_readers/ubuntu_v2_mt_reader.py deleted file mode 100644 index 57b779bd11..0000000000 --- a/deeppavlov/dataset_readers/ubuntu_v2_mt_reader.py +++ /dev/null @@ -1,117 +0,0 @@ -# Copyright 2018 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import csv -from pathlib import Path -from typing import List, Tuple, Union, Dict - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.dataset_reader import DatasetReader - - -@register('ubuntu_v2_mt_reader') -class UbuntuV2MTReader(DatasetReader): - """The class to read the Ubuntu V2 dataset from csv files taking into account multi-turn dialogue ``context``. - - Please, see https://github.com/rkadlec/ubuntu-ranking-dataset-creator. - - Args: - data_path: A path to a folder with dataset csv files. - num_context_turns: A maximum number of dialogue ``context`` turns. - padding: "post" or "pre" context sentences padding - """ - - def read(self, data_path: str, - num_context_turns: int = 1, - padding: str = "post", - *args, **kwargs) -> Dict[str, List[Tuple[List[str], int]]]: - """Read the Ubuntu V2 dataset from csv files taking into account multi-turn dialogue ``context``. - - Args: - data_path: A path to a folder with dataset csv files. - num_context_turns: A maximum number of dialogue ``context`` turns. 
- padding: "post" or "pre" context sentences padding - - Returns: - Dictionary with keys "train", "valid", "test" and parts of the dataset as their values - """ - - self.num_turns = num_context_turns - self.padding = padding - dataset = {'train': None, 'valid': None, 'test': None} - train_fname = Path(data_path) / 'train.csv' - valid_fname = Path(data_path) / 'valid.csv' - test_fname = Path(data_path) / 'test.csv' - self.sen2int_vocab = {} - self.classes_vocab_train = {} - self.classes_vocab_valid = {} - self.classes_vocab_test = {} - dataset["train"] = self.preprocess_data_train(train_fname) - dataset["valid"] = self.preprocess_data_validation(valid_fname) - dataset["test"] = self.preprocess_data_validation(test_fname) - return dataset - - def preprocess_data_train(self, train_fname: Union[Path, str]) -> List[Tuple[List[str], int]]: - contexts = [] - responses = [] - labels = [] - with open(train_fname, 'r') as f: - reader = csv.reader(f) - next(reader) - for el in reader: - contexts.append(self._expand_context(el[0].split('__eot__'), padding=self.padding)) - responses.append(el[1]) - labels.append(int(el[2])) - data = [el[0] + [el[1]] for el in zip(contexts, responses)] - data = list(zip(data, labels)) - return data - - def preprocess_data_validation(self, fname: Union[Path, str]) -> List[Tuple[List[str], int]]: - contexts = [] - responses = [] - with open(fname, 'r') as f: - reader = csv.reader(f) - next(reader) - for el in reader: - contexts.append(self._expand_context(el[0].split('__eot__'), padding=self.padding)) - responses.append(el[1:]) - data = [el[0] + el[1] for el in zip(contexts, responses)] - data = [(el, 1) for el in data] # NOTE: labels are useless here actually... - return data - - def _expand_context(self, context: List[str], padding: str) -> List[str]: - """ - Align context length by using pre/post padding of empty sentences up to ``self.num_turns`` sentences - or by reducing the number of context sentences to ``self.num_turns`` sentences. 
- - Args: - context (List[str]): list of raw context sentences - padding (str): "post" or "pre" context sentences padding - - Returns: - List[str]: list of ``self.num_turns`` context sentences - """ - if padding == "post": - sent_list = context - res = sent_list + (self.num_turns - len(sent_list)) * \ - [''] if len(sent_list) < self.num_turns else sent_list[:self.num_turns] - return res - elif padding == "pre": - # context[-(self.num_turns + 1):-1] because the last item of `context` is always '' (empty string) - sent_list = context[-(self.num_turns + 1):-1] - if len(sent_list) <= self.num_turns: - tmp = sent_list[:] - sent_list = [''] * (self.num_turns - len(sent_list)) - sent_list.extend(tmp) - return sent_list diff --git a/deeppavlov/deep.py b/deeppavlov/deep.py index 489e0932cf..52fc62677b 100644 --- a/deeppavlov/deep.py +++ b/deeppavlov/deep.py @@ -21,21 +21,17 @@ from deeppavlov.core.common.file import find_config from deeppavlov.download import deep_download from deeppavlov.utils.agent import start_rabbit_service -from deeppavlov.utils.alexa import start_alexa_server -from deeppavlov.utils.alice import start_alice_server -from deeppavlov.utils.ms_bot_framework import start_ms_bf_server from deeppavlov.utils.pip_wrapper import install_from_config from deeppavlov.utils.server import start_model_server from deeppavlov.utils.socket import start_socket_server -from deeppavlov.utils.telegram import interact_model_by_telegram log = getLogger(__name__) parser = argparse.ArgumentParser() parser.add_argument("mode", help="select a mode, train or interact", type=str, - choices={'train', 'evaluate', 'interact', 'predict', 'telegram', 'msbot', 'alexa', 'alice', - 'riseapi', 'risesocket', 'agent-rabbit', 'download', 'install', 'crossval'}) + choices={'train', 'evaluate', 'interact', 'predict', 'riseapi', 'risesocket', 'agent-rabbit', + 'download', 'install', 'crossval'}) parser.add_argument("config_path", help="path to a pipeline json config", type=str) parser.add_argument("-e", "--start-epoch-num", dest="start_epoch_num", default=None, @@ -48,11 +44,6 @@ parser.add_argument("--folds", help="number of folds", type=int, default=5) -parser.add_argument("-t", "--token", default=None, help="telegram bot token", type=str) - -parser.add_argument("-i", "--ms-id", default=None, help="microsoft bot framework app id", type=str) -parser.add_argument("-s", "--ms-secret", default=None, help="microsoft bot framework app secret", type=str) - parser.add_argument("--https", action="store_true", default=None, help="run model in https mode") parser.add_argument("--key", default=None, help="ssl key", type=str) parser.add_argument("--cert", default=None, help="ssl certificate", type=str) @@ -87,28 +78,6 @@ def main(): train_evaluate_model_from_config(pipeline_config_path, to_train=False, start_epoch_num=args.start_epoch_num) elif args.mode == 'interact': interact_model(pipeline_config_path) - elif args.mode == 'telegram': - interact_model_by_telegram(model_config=pipeline_config_path, token=args.token) - elif args.mode == 'msbot': - start_ms_bf_server(model_config=pipeline_config_path, - app_id=args.ms_id, - app_secret=args.ms_secret, - port=args.port, - https=args.https, - ssl_key=args.key, - ssl_cert=args.cert) - elif args.mode == 'alexa': - start_alexa_server(model_config=pipeline_config_path, - port=args.port, - https=args.https, - ssl_key=args.key, - ssl_cert=args.cert) - elif args.mode == 'alice': - start_alice_server(model_config=pipeline_config_path, - port=args.port, - https=args.https, - ssl_key=args.key, - 
ssl_cert=args.cert) elif args.mode == 'riseapi': start_model_server(pipeline_config_path, args.https, args.key, args.cert, port=args.port) elif args.mode == 'risesocket': diff --git a/deeppavlov/metrics/accuracy.py b/deeppavlov/metrics/accuracy.py index 58ca2ac0a7..560d92ee71 100644 --- a/deeppavlov/metrics/accuracy.py +++ b/deeppavlov/metrics/accuracy.py @@ -19,7 +19,6 @@ import numpy as np from deeppavlov.core.common.metrics_registry import register_metric -from deeppavlov.models.go_bot.nlg.dto.json_nlg_response import JSONNLGResponse @register_metric('accuracy') @@ -157,23 +156,6 @@ def per_item_dialog_accuracy(y_true, y_predicted: List[List[str]]): return correct / examples_len if examples_len else 0 -@register_metric("per_item_action_accuracy") -def per_item_action_accuracy(dialogs_true, dialog_jsons_predicted: List[List[JSONNLGResponse]]): - # todo metric classes??? - # todo oop instead of serialization/deserialization - utterances_actions_true = [utterance['act'] - for dialog in dialogs_true - for utterance in dialog] - - utterances_actions_predicted: Iterable[JSONNLGResponse] = itertools.chain(*dialog_jsons_predicted) - examples_len = len(utterances_actions_true) - correct = sum([y1.strip().lower() == '+'.join(y2.actions_tuple).lower() - for y1, y2 in zip(utterances_actions_true, utterances_actions_predicted)]) # todo ugly - return correct / examples_len if examples_len else 0 - -# endregion go-bot metrics - - @register_metric('acc') def round_accuracy(y_true, y_predicted): """ diff --git a/deeppavlov/models/bert/__init__.py b/deeppavlov/models/bert/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/bert/bert_classifier.py b/deeppavlov/models/bert/bert_classifier.py deleted file mode 100644 index e33d7bb35d..0000000000 --- a/deeppavlov/models/bert/bert_classifier.py +++ /dev/null @@ -1,243 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger -from typing import List, Dict, Union - -import tensorflow as tf -from bert_dp.modeling import BertConfig, BertModel -from bert_dp.optimization import AdamWeightDecayOptimizer -from bert_dp.preprocessing import InputFeatures - -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.tf_model import LRScheduledTFModel - -logger = getLogger(__name__) - - -@register('bert_classifier') -class BertClassifierModel(LRScheduledTFModel): - """Bert-based model for text classification. - - It uses output from [CLS] token and predicts labels using linear transformation. 
- - Args: - bert_config_file: path to Bert configuration file - n_classes: number of classes - keep_prob: dropout keep_prob for non-Bert layers - one_hot_labels: set True if one-hot encoding for labels is used - multilabel: set True if it is multi-label classification - return_probas: set True if return class probabilites instead of most probable label needed - attention_probs_keep_prob: keep_prob for Bert self-attention layers - hidden_keep_prob: keep_prob for Bert hidden layers - optimizer: name of tf.train.* optimizer or None for `AdamWeightDecayOptimizer` - num_warmup_steps: - weight_decay_rate: L2 weight decay for `AdamWeightDecayOptimizer` - pretrained_bert: pretrained Bert checkpoint - min_learning_rate: min value of learning rate if learning rate decay is used - """ - - # TODO: add warmup - # TODO: add head-only pre-training - def __init__(self, bert_config_file, n_classes, keep_prob, - one_hot_labels=False, multilabel=False, return_probas=False, - attention_probs_keep_prob=None, hidden_keep_prob=None, - optimizer=None, num_warmup_steps=None, weight_decay_rate=0.01, - pretrained_bert=None, min_learning_rate=1e-06, **kwargs) -> None: - super().__init__(**kwargs) - - self.return_probas = return_probas - self.n_classes = n_classes - self.min_learning_rate = min_learning_rate - self.keep_prob = keep_prob - self.one_hot_labels = one_hot_labels - self.multilabel = multilabel - self.optimizer = optimizer - self.num_warmup_steps = num_warmup_steps - self.weight_decay_rate = weight_decay_rate - - if self.multilabel and not self.one_hot_labels: - raise RuntimeError('Use one-hot encoded labels for multilabel classification!') - - if self.multilabel and not self.return_probas: - raise RuntimeError('Set return_probas to True for multilabel classification!') - - self.bert_config = BertConfig.from_json_file(str(expand_path(bert_config_file))) - - if attention_probs_keep_prob is not None: - self.bert_config.attention_probs_dropout_prob = 1.0 - attention_probs_keep_prob - if hidden_keep_prob is not None: - self.bert_config.hidden_dropout_prob = 1.0 - hidden_keep_prob - - self.sess_config = tf.ConfigProto(allow_soft_placement=True) - self.sess_config.gpu_options.allow_growth = True - self.sess = tf.Session(config=self.sess_config) - - self._init_graph() - - self._init_optimizer() - - self.sess.run(tf.global_variables_initializer()) - - if pretrained_bert is not None: - pretrained_bert = str(expand_path(pretrained_bert)) - - if tf.train.checkpoint_exists(pretrained_bert) \ - and not (self.load_path and tf.train.checkpoint_exists(str(self.load_path.resolve()))): - logger.info('[initializing model with Bert from {}]'.format(pretrained_bert)) - # Exclude optimizer and classification variables from saved variables - var_list = self._get_saveable_variables( - exclude_scopes=('Optimizer', 'learning_rate', 'momentum', 'output_weights', 'output_bias')) - saver = tf.train.Saver(var_list) - saver.restore(self.sess, pretrained_bert) - - if self.load_path is not None: - self.load() - - def _init_graph(self): - self._init_placeholders() - - self.bert = BertModel(config=self.bert_config, - is_training=self.is_train_ph, - input_ids=self.input_ids_ph, - input_mask=self.input_masks_ph, - token_type_ids=self.token_types_ph, - use_one_hot_embeddings=False, - ) - - output_layer = self.bert.get_pooled_output() - hidden_size = output_layer.shape[-1].value - - output_weights = tf.get_variable( - "output_weights", [self.n_classes, hidden_size], - initializer=tf.truncated_normal_initializer(stddev=0.02)) - - output_bias = 
tf.get_variable( - "output_bias", [self.n_classes], initializer=tf.zeros_initializer()) - - with tf.variable_scope("loss"): - output_layer = tf.nn.dropout(output_layer, keep_prob=self.keep_prob_ph) - logits = tf.matmul(output_layer, output_weights, transpose_b=True) - logits = tf.nn.bias_add(logits, output_bias) - - if self.one_hot_labels: - one_hot_labels = self.y_ph - else: - one_hot_labels = tf.one_hot(self.y_ph, depth=self.n_classes, dtype=tf.float32) - - self.y_predictions = tf.argmax(logits, axis=-1) - if not self.multilabel: - log_probs = tf.nn.log_softmax(logits, axis=-1) - self.y_probas = tf.nn.softmax(logits, axis=-1) - per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1) - self.loss = tf.reduce_mean(per_example_loss) - else: - self.y_probas = tf.nn.sigmoid(logits) - self.loss = tf.reduce_mean( - tf.nn.sigmoid_cross_entropy_with_logits(labels=one_hot_labels, logits=logits)) - - def _init_placeholders(self): - self.input_ids_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='ids_ph') - self.input_masks_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='masks_ph') - self.token_types_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='token_types_ph') - - if not self.one_hot_labels: - self.y_ph = tf.placeholder(shape=(None,), dtype=tf.int32, name='y_ph') - else: - self.y_ph = tf.placeholder(shape=(None, self.n_classes), dtype=tf.float32, name='y_ph') - - self.learning_rate_ph = tf.placeholder_with_default(0.0, shape=[], name='learning_rate_ph') - self.keep_prob_ph = tf.placeholder_with_default(1.0, shape=[], name='keep_prob_ph') - self.is_train_ph = tf.placeholder_with_default(False, shape=[], name='is_train_ph') - - def _init_optimizer(self): - with tf.variable_scope('Optimizer'): - self.global_step = tf.get_variable('global_step', shape=[], dtype=tf.int32, - initializer=tf.constant_initializer(0), trainable=False) - # default optimizer for Bert is Adam with fixed L2 regularization - if self.optimizer is None: - - self.train_op = self.get_train_op(self.loss, learning_rate=self.learning_rate_ph, - optimizer=AdamWeightDecayOptimizer, - weight_decay_rate=self.weight_decay_rate, - beta_1=0.9, - beta_2=0.999, - epsilon=1e-6, - exclude_from_weight_decay=["LayerNorm", "layer_norm", "bias"] - ) - else: - self.train_op = self.get_train_op(self.loss, learning_rate=self.learning_rate_ph) - - if self.optimizer is None: - new_global_step = self.global_step + 1 - self.train_op = tf.group(self.train_op, [self.global_step.assign(new_global_step)]) - - def _build_feed_dict(self, input_ids, input_masks, token_types, y=None): - feed_dict = { - self.input_ids_ph: input_ids, - self.input_masks_ph: input_masks, - self.token_types_ph: token_types, - } - if y is not None: - feed_dict.update({ - self.y_ph: y, - self.learning_rate_ph: max(self.get_learning_rate(), self.min_learning_rate), - self.keep_prob_ph: self.keep_prob, - self.is_train_ph: True, - }) - - return feed_dict - - def train_on_batch(self, features: List[InputFeatures], y: Union[List[int], List[List[int]]]) -> Dict: - """Train model on given batch. - This method calls train_op using features and y (labels). 
- - Args: - features: batch of InputFeatures - y: batch of labels (class id or one-hot encoding) - - Returns: - dict with loss and learning_rate values - - """ - input_ids = [f.input_ids for f in features] - input_masks = [f.input_mask for f in features] - input_type_ids = [f.input_type_ids for f in features] - - feed_dict = self._build_feed_dict(input_ids, input_masks, input_type_ids, y) - - _, loss = self.sess.run([self.train_op, self.loss], feed_dict=feed_dict) - return {'loss': loss, 'learning_rate': feed_dict[self.learning_rate_ph]} - - def __call__(self, features: List[InputFeatures]) -> Union[List[int], List[List[float]]]: - """Make prediction for given features (texts). - - Args: - features: batch of InputFeatures - - Returns: - predicted classes or probabilities of each class - - """ - input_ids = [f.input_ids for f in features] - input_masks = [f.input_mask for f in features] - input_type_ids = [f.input_type_ids for f in features] - - feed_dict = self._build_feed_dict(input_ids, input_masks, input_type_ids) - if not self.return_probas: - pred = self.sess.run(self.y_predictions, feed_dict=feed_dict) - else: - pred = self.sess.run(self.y_probas, feed_dict=feed_dict) - return pred diff --git a/deeppavlov/models/bert/bert_ranker.py b/deeppavlov/models/bert/bert_ranker.py deleted file mode 100644 index c4d26be4ae..0000000000 --- a/deeppavlov/models/bert/bert_ranker.py +++ /dev/null @@ -1,467 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import re -from collections import OrderedDict -from logging import getLogger -from operator import itemgetter -from typing import List, Dict, Union - -import numpy as np -import tensorflow as tf -from bert_dp.modeling import BertConfig, BertModel -from bert_dp.optimization import AdamWeightDecayOptimizer -from bert_dp.preprocessing import InputFeatures - -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.tf_model import LRScheduledTFModel -from deeppavlov.models.bert.bert_classifier import BertClassifierModel - -logger = getLogger(__name__) - - -@register('bert_ranker') -class BertRankerModel(BertClassifierModel): - """BERT-based model for interaction-based text ranking. - - Linear transformation is trained over the BERT pooled output from [CLS] token. - Predicted probabilities of classes are used as a similarity measure for ranking. 
- - Args: - bert_config_file: path to Bert configuration file - n_classes: number of classes - keep_prob: dropout keep_prob for non-Bert layers - return_probas: set True if class probabilities are returned instead of the most probable label - """ - - def __init__(self, bert_config_file, n_classes=2, keep_prob=0.9, return_probas=True, **kwargs) -> None: - super().__init__(bert_config_file=bert_config_file, n_classes=n_classes, - keep_prob=keep_prob, return_probas=return_probas, **kwargs) - - def train_on_batch(self, features_li: List[List[InputFeatures]], y: Union[List[int], List[List[int]]]) -> Dict: - """Train the model on the given batch. - - Args: - features_li: list with the single element containing the batch of InputFeatures - y: batch of labels (class id or one-hot encoding) - - Returns: - dict with loss and learning rate values - """ - - features = features_li[0] - input_ids = [f.input_ids for f in features] - input_masks = [f.input_mask for f in features] - input_type_ids = [f.input_type_ids for f in features] - - feed_dict = self._build_feed_dict(input_ids, input_masks, input_type_ids, y) - - _, loss = self.sess.run([self.train_op, self.loss], feed_dict=feed_dict) - return {'loss': loss, 'learning_rate': feed_dict[self.learning_rate_ph]} - - def __call__(self, features_li: List[List[InputFeatures]]) -> Union[List[int], List[List[float]]]: - """Calculate scores for the given context over candidate responses. - - Args: - features_li: list of elements where each element contains the batch of features - for contexts with particular response candidates - - Returns: - predicted scores for contexts over response candidates - """ - - if len(features_li) == 1 and len(features_li[0]) == 1: - msg = "It is not intended to use the {} in the interact mode.".format(self.__class__) - logger.error(msg) - return [msg] - - predictions = [] - for features in features_li: - input_ids = [f.input_ids for f in features] - input_masks = [f.input_mask for f in features] - input_type_ids = [f.input_type_ids for f in features] - - feed_dict = self._build_feed_dict(input_ids, input_masks, input_type_ids) - if not self.return_probas: - pred = self.sess.run(self.y_predictions, feed_dict=feed_dict) - else: - pred = self.sess.run(self.y_probas, feed_dict=feed_dict) - predictions.append(pred[:, 1]) - if len(features_li) == 1: - predictions = predictions[0] - else: - predictions = np.hstack([np.expand_dims(el, 1) for el in predictions]) - return predictions - - -@register('bert_sep_ranker') -class BertSepRankerModel(LRScheduledTFModel): - """BERT-based model for representation-based text ranking. - - BERT pooled output from [CLS] token is used to get a separate representation of a context and a response. - Similarity measure is calculated as cosine similarity between these representations. 
- - Args: - bert_config_file: path to Bert configuration file - keep_prob: dropout keep_prob for non-Bert layers - attention_probs_keep_prob: keep_prob for Bert self-attention layers - hidden_keep_prob: keep_prob for Bert hidden layers - optimizer: name of tf.train.* optimizer or None for ``AdamWeightDecayOptimizer`` - weight_decay_rate: L2 weight decay for ``AdamWeightDecayOptimizer`` - pretrained_bert: pretrained Bert checkpoint - min_learning_rate: min value of learning rate if learning rate decay is used - """ - - def __init__(self, bert_config_file, keep_prob=0.9, - attention_probs_keep_prob=None, hidden_keep_prob=None, - optimizer=None, weight_decay_rate=0.01, - pretrained_bert=None, min_learning_rate=1e-06, **kwargs) -> None: - super().__init__(**kwargs) - - self.min_learning_rate = min_learning_rate - self.keep_prob = keep_prob - self.optimizer = optimizer - self.weight_decay_rate = weight_decay_rate - - self.bert_config = BertConfig.from_json_file(str(expand_path(bert_config_file))) - - if attention_probs_keep_prob is not None: - self.bert_config.attention_probs_dropout_prob = 1.0 - attention_probs_keep_prob - if hidden_keep_prob is not None: - self.bert_config.hidden_dropout_prob = 1.0 - hidden_keep_prob - - self.sess_config = tf.ConfigProto(allow_soft_placement=True) - self.sess_config.gpu_options.allow_growth = True - self.sess = tf.Session(config=self.sess_config) - - self._init_graph() - - self._init_optimizer() - - if pretrained_bert is not None: - pretrained_bert = str(expand_path(pretrained_bert)) - - if tf.train.checkpoint_exists(pretrained_bert) \ - and not (self.load_path and tf.train.checkpoint_exists(str(self.load_path.resolve()))): - logger.info('[initializing model with Bert from {}]'.format(pretrained_bert)) - # Exclude optimizer and classification variables from saved variables - var_list = self._get_saveable_variables( - exclude_scopes=('Optimizer', 'learning_rate', 'momentum', 'output_weights', 'output_bias')) - assignment_map = self.get_variables_to_restore(var_list, pretrained_bert) - tf.train.init_from_checkpoint(pretrained_bert, assignment_map) - - self.sess.run(tf.global_variables_initializer()) - - if self.load_path is not None: - self.load() - - @classmethod - def get_variables_to_restore(cls, tvars, init_checkpoint): - """Determine correspondence of checkpoint variables to current variables.""" - - assignment_map = OrderedDict() - graph_names = [] - for var in tvars: - name = var.name - m = re.match("^(.*):\\d+$", name) - if m is not None: - name = m.group(1) - graph_names.append(name) - ckpt_names = [el[0] for el in tf.train.list_variables(init_checkpoint)] - for u in ckpt_names: - for v in graph_names: - if u in v: - assignment_map[u] = v - return assignment_map - - def _init_graph(self): - self._init_placeholders() - - with tf.variable_scope("model"): - model_a = BertModel( - config=self.bert_config, - is_training=self.is_train_ph, - input_ids=self.input_ids_a_ph, - input_mask=self.input_masks_a_ph, - token_type_ids=self.token_types_a_ph, - use_one_hot_embeddings=False) - - with tf.variable_scope("model", reuse=True): - model_b = BertModel( - config=self.bert_config, - is_training=self.is_train_ph, - input_ids=self.input_ids_b_ph, - input_mask=self.input_masks_b_ph, - token_type_ids=self.token_types_b_ph, - use_one_hot_embeddings=False) - - output_layer_a = model_a.get_pooled_output() - output_layer_b = model_b.get_pooled_output() - - with tf.variable_scope("loss"): - output_layer_a = tf.nn.dropout(output_layer_a, keep_prob=self.keep_prob_ph) - 
output_layer_b = tf.nn.dropout(output_layer_b, keep_prob=self.keep_prob_ph) - output_layer_a = tf.nn.l2_normalize(output_layer_a, axis=1) - output_layer_b = tf.nn.l2_normalize(output_layer_b, axis=1) - embeddings = tf.concat([output_layer_a, output_layer_b], axis=0) - labels = tf.concat([self.y_ph, self.y_ph], axis=0) - self.loss = tf.contrib.losses.metric_learning.triplet_semihard_loss(labels, embeddings) - logits = tf.multiply(output_layer_a, output_layer_b) - self.y_probas = tf.reduce_sum(logits, 1) - self.pooled_out = output_layer_a - - def _init_placeholders(self): - self.input_ids_a_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='ids_a_ph') - self.input_masks_a_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='masks_a_ph') - self.token_types_a_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='token_a_types_ph') - self.input_ids_b_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='ids_b_ph') - self.input_masks_b_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='masks_b_ph') - self.token_types_b_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='token_types_b_ph') - self.y_ph = tf.placeholder(shape=(None,), dtype=tf.int32, name='y_ph') - self.learning_rate_ph = tf.placeholder_with_default(0.0, shape=[], name='learning_rate_ph') - self.keep_prob_ph = tf.placeholder_with_default(1.0, shape=[], name='keep_prob_ph') - self.is_train_ph = tf.placeholder_with_default(False, shape=[], name='is_train_ph') - - def _init_optimizer(self): - with tf.variable_scope('Optimizer'): - self.global_step = tf.get_variable('global_step', shape=[], dtype=tf.int32, - initializer=tf.constant_initializer(0), trainable=False) - # default optimizer for Bert is Adam with fixed L2 regularization - if self.optimizer is None: - - self.train_op = self.get_train_op(self.loss, learning_rate=self.learning_rate_ph, - optimizer=AdamWeightDecayOptimizer, - weight_decay_rate=self.weight_decay_rate, - beta_1=0.9, - beta_2=0.999, - epsilon=1e-6, - exclude_from_weight_decay=["LayerNorm", "layer_norm", "bias"] - ) - else: - self.train_op = self.get_train_op(self.loss, learning_rate=self.learning_rate_ph) - - if self.optimizer is None: - new_global_step = self.global_step + 1 - self.train_op = tf.group(self.train_op, [self.global_step.assign(new_global_step)]) - - def _build_feed_dict(self, input_ids_a, input_masks_a, token_types_a, - input_ids_b, input_masks_b, token_types_b, y=None): - feed_dict = { - self.input_ids_a_ph: input_ids_a, - self.input_masks_a_ph: input_masks_a, - self.token_types_a_ph: token_types_a, - self.input_ids_b_ph: input_ids_b, - self.input_masks_b_ph: input_masks_b, - self.token_types_b_ph: token_types_b, - } - if y is not None: - feed_dict.update({ - self.y_ph: y, - self.learning_rate_ph: max(self.get_learning_rate(), self.min_learning_rate), - self.keep_prob_ph: self.keep_prob, - self.is_train_ph: True, - }) - - return feed_dict - - def train_on_batch(self, features_li: List[List[InputFeatures]], y: Union[List[int], List[List[int]]]) -> Dict: - """Train the model on the given batch. 
- - Args: - features_li: list with two elements, one containing the batch of context features - and the other containing the batch of response features - y: batch of labels (class id or one-hot encoding) - - Returns: - dict with loss and learning rate values - """ - - input_ids_a = [f.input_ids for f in features_li[0]] - input_masks_a = [f.input_mask for f in features_li[0]] - input_type_ids_a = [f.input_type_ids for f in features_li[0]] - input_ids_b = [f.input_ids for f in features_li[1]] - input_masks_b = [f.input_mask for f in features_li[1]] - input_type_ids_b = [f.input_type_ids for f in features_li[1]] - - feed_dict = self._build_feed_dict(input_ids_a, input_masks_a, input_type_ids_a, - input_ids_b, input_masks_b, input_type_ids_b, y) - - _, loss = self.sess.run([self.train_op, self.loss], feed_dict=feed_dict) - return {'loss': loss, 'learning_rate': feed_dict[self.learning_rate_ph]} - - def __call__(self, features_li: List[List[InputFeatures]]) -> Union[List[int], List[List[float]]]: - """Calculate scores for the given context over candidate responses. - - Args: - features_li: list of elements where the first element represents the context batch of features - and the rest of elements represent response candidates batches of features - - Returns: - predicted scores for contexts over response candidates - """ - - if len(features_li) == 1 and len(features_li[0]) == 1: - msg = "It is not intended to use the {} in the interact mode.".format(self.__class__) - logger.error(msg) - return [msg] - - predictions = [] - input_ids_a = [f.input_ids for f in features_li[0]] - input_masks_a = [f.input_mask for f in features_li[0]] - input_type_ids_a = [f.input_type_ids for f in features_li[0]] - for features in features_li[1:]: - input_ids_b = [f.input_ids for f in features] - input_masks_b = [f.input_mask for f in features] - input_type_ids_b = [f.input_type_ids for f in features] - - feed_dict = self._build_feed_dict(input_ids_a, input_masks_a, input_type_ids_a, - input_ids_b, input_masks_b, input_type_ids_b) - pred = self.sess.run(self.y_probas, feed_dict=feed_dict) - predictions.append(pred) - if len(features_li) == 1: - predictions = predictions[0] - else: - predictions = np.hstack([np.expand_dims(el, 1) for el in predictions]) - return predictions - - -@register('bert_sep_ranker_predictor') -class BertSepRankerPredictor(BertSepRankerModel): - """Bert-based model for ranking and receiving a text response. - - BERT pooled output from [CLS] token is used to get a separate representation of a context and a response. - A similarity score is calculated as cosine similarity between these representations. - Based on this similarity score the text response is retrieved provided some base - with possible responses (and corresponding contexts). - Contexts of responses are used additionaly to get the best possible result of retrieval from the base. 
- - Args: - bert_config_file: path to Bert configuration file - interact_mode: mode setting a policy to retrieve the response from the base - batch_size: batch size for building response (and context) vectors over the base - keep_prob: dropout keep_prob for non-Bert layers - resps: list of strings containing the base of text responses - resp_vecs: BERT vector respresentations of ``resps``, if is ``None`` it will be build - resp_features: features of ``resps`` to build their BERT vector representations - conts: list of strings containing the base of text contexts - cont_vecs: BERT vector respresentations of ``conts``, if is ``None`` it will be build - cont_features: features of ``conts`` to build their BERT vector representations - """ - - def __init__(self, bert_config_file, interact_mode=0, batch_size=32, - resps=None, resp_features=None, resp_vecs=None, - conts=None, cont_features=None, cont_vecs=None, **kwargs) -> None: - super().__init__(bert_config_file=bert_config_file, - **kwargs) - - self.interact_mode = interact_mode - self.batch_size = batch_size - self.resps = resps - self.resp_vecs = resp_vecs - self.resp_features = resp_features - self.conts = conts - self.cont_vecs = cont_vecs - self.cont_features = cont_features - - if self.resps is not None and self.resp_vecs is None: - logger.info("Building BERT vector representations for the response base...") - self.resp_features = [resp_features[0][i * self.batch_size: (i + 1) * self.batch_size] - for i in range(len(resp_features[0]) // batch_size + 1)] - self.resp_vecs = self._get_predictions(self.resp_features) - self.resp_vecs /= np.linalg.norm(self.resp_vecs, axis=1, keepdims=True) - np.save(self.save_path / "resp_vecs", self.resp_vecs) - - if self.conts is not None and self.cont_vecs is None: - logger.info("Building BERT vector representations for the context base...") - self.cont_features = [cont_features[0][i * self.batch_size: (i + 1) * self.batch_size] - for i in range(len(cont_features[0]) // batch_size + 1)] - self.cont_vecs = self._get_predictions(self.cont_features) - self.cont_vecs /= np.linalg.norm(self.cont_vecs, axis=1, keepdims=True) - np.save(self.save_path / "cont_vecs", self.resp_vecs) - - def train_on_batch(self, features, y): - pass - - def __call__(self, features_li): - """Get the context vector representation and retrieve the text response from the database. - - Uses cosine similarity scores over vectors of responses (and corresponding contexts) from the base. - Based on these scores retrieves the text response from the base. 
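# [Editor's note / illustration, not part of the original file] A small sketch of the
# simplest retrieval policy over a pre-built base, analogous to the interact_mode == 0
# branch of _retrieve_db_response below. `ctx_vec`, `resp_vecs` and `resps` are
# hypothetical placeholders for the context vectors, response vectors and response texts.
import numpy as np

ctx_vec = np.random.rand(2, 768)                       # batch of context representations
ctx_vec /= np.linalg.norm(ctx_vec, axis=1, keepdims=True)
resp_vecs = np.random.rand(1000, 768)                  # pre-built response base
resp_vecs /= np.linalg.norm(resp_vecs, axis=1, keepdims=True)
resps = [f"response {i}" for i in range(1000)]

scores = ctx_vec @ resp_vecs.T                         # cosine similarities, (batch, base)
best_ids = scores.argmax(axis=1)
answers = [resps[i] for i in best_ids]
best_scores = [float(scores[i, best_ids[i]]) for i in range(len(best_ids))]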
- - Args: - features_li: list of elements where elements represent context batches of features - - Returns: - text response with the highest similarity score and its similarity score from the response base - """ - - pred = self._get_predictions(features_li) - return self._retrieve_db_response(pred) - - def _get_predictions(self, features_li): - """Get BERT vector representations for a list of feature batches.""" - - pred = [] - for features in features_li: - input_ids = [f.input_ids for f in features] - input_masks = [f.input_mask for f in features] - input_type_ids = [f.input_type_ids for f in features] - feed_dict = self._build_feed_dict(input_ids, input_masks, input_type_ids, - input_ids, input_masks, input_type_ids) - p = self.sess.run(self.pooled_out, feed_dict=feed_dict) - if len(p.shape) == 1: - p = np.expand_dims(p, 0) - p /= np.linalg.norm(p, axis=1, keepdims=True) - pred.append(p) - return np.vstack(pred) - - def _retrieve_db_response(self, ctx_vec): - """Retrieve a text response from the base based on the policy determined by ``interact_mode``. - - Uses cosine similarity scores over vectors of responses (and corresponding contexts) from the base. - """ - - bs = ctx_vec.shape[0] - if self.interact_mode == 0: - s = ctx_vec @ self.resp_vecs.T - ids = np.argmax(s, 1) - rsp = [[self.resps[ids[i]] for i in range(bs)], [s[i][ids[i]] for i in range(bs)]] - if self.interact_mode == 1: - sr = (ctx_vec @ self.resp_vecs.T + 1) / 2 - sc = (ctx_vec @ self.cont_vecs.T + 1) / 2 - ids = np.argsort(sr, 1)[:, -10:] - sc = [sc[i, ids[i]] for i in range(bs)] - ids = [sorted(zip(ids[i], sc[i]), key=itemgetter(1), reverse=True) for i in range(bs)] - sc = [list(map(lambda x: x[1], ids[i])) for i in range(bs)] - ids = [list(map(lambda x: x[0], ids[i])) for i in range(bs)] - rsp = [[self.resps[ids[i][0]] for i in range(bs)], [float(sc[i][0]) for i in range(bs)]] - if self.interact_mode == 2: - sr = (ctx_vec @ self.resp_vecs.T + 1) / 2 - sc = (ctx_vec @ self.cont_vecs.T + 1) / 2 - ids = np.argsort(sc, 1)[:, -10:] - sr = [sr[i, ids[i]] for i in range(bs)] - ids = [sorted(zip(ids[i], sr[i]), key=itemgetter(1), reverse=True) for i in range(bs)] - sr = [list(map(lambda x: x[1], ids[i])) for i in range(bs)] - ids = [list(map(lambda x: x[0], ids[i])) for i in range(bs)] - rsp = [[self.resps[ids[i][0]] for i in range(bs)], [float(sr[i][0]) for i in range(bs)]] - if self.interact_mode == 3: - sr = (ctx_vec @ self.resp_vecs.T + 1) / 2 - sc = (ctx_vec @ self.cont_vecs.T + 1) / 2 - s = (sr + sc) / 2 - ids = np.argmax(s, 1) - rsp = [[self.resps[ids[i]] for i in range(bs)], [float(s[i][ids[i]]) for i in range(bs)]] - # remove special tokens if they are presented - rsp = [[el.replace('__eou__', '').replace('__eot__', '').strip() for el in rsp[0]], rsp[1]] - return rsp diff --git a/deeppavlov/models/bert/bert_sequence_tagger.py b/deeppavlov/models/bert/bert_sequence_tagger.py deleted file mode 100644 index 4d3d8ad75f..0000000000 --- a/deeppavlov/models/bert/bert_sequence_tagger.py +++ /dev/null @@ -1,704 +0,0 @@ -# Copyright 2019 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger -from typing import List, Union, Dict, Optional - -import numpy as np -import tensorflow as tf -from bert_dp.modeling import BertConfig, BertModel -from bert_dp.optimization import AdamWeightDecayOptimizer - -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.core.common.registry import register -from deeppavlov.core.layers.tf_layers import bi_rnn -from deeppavlov.core.models.tf_model import LRScheduledTFModel - -log = getLogger(__name__) - - -def token_from_subtoken(units: tf.Tensor, mask: tf.Tensor) -> tf.Tensor: - """ Assemble token level units from subtoken level units - - Args: - units: tf.Tensor of shape [batch_size, SUBTOKEN_seq_length, n_features] - mask: mask of token beginnings. For example: for tokens - - [[``[CLS]`` ``My``, ``capybara``, ``[SEP]``], - [``[CLS]`` ``Your``, ``aar``, ``##dvark``, ``is``, ``awesome``, ``[SEP]``]] - - the mask will be - - [[0, 1, 1, 0, 0, 0, 0], - [0, 1, 1, 0, 1, 1, 0]] - - Returns: - word_level_units: Units assembled from ones in the mask. For the - example above this units will correspond to the following - - [[``My``, ``capybara``], - [``Your`, ``aar``, ``is``, ``awesome``,]] - - the shape of this tensor will be [batch_size, TOKEN_seq_length, n_features] - """ - shape = tf.cast(tf.shape(units), tf.int64) - batch_size = shape[0] - nf = shape[2] - nf_int = units.get_shape().as_list()[-1] - - # number of TOKENS in each sentence - token_seq_lengths = tf.cast(tf.reduce_sum(mask, 1), tf.int64) - # for a matrix m = - # [[1, 1, 1], - # [0, 1, 1], - # [1, 0, 0]] - # it will be - # [3, 2, 1] - - n_words = tf.reduce_sum(token_seq_lengths) - # n_words -> 6 - - max_token_seq_len = tf.cast(tf.reduce_max(token_seq_lengths), tf.int64) - # max_token_seq_len -> 3 - - idxs = tf.where(mask) - # for the matrix mentioned above - # tf.where(mask) -> - # [[0, 0], - # [0, 1] - # [0, 2], - # [1, 1], - # [1, 2] - # [2, 0]] - - sample_ids_in_batch = tf.pad(idxs[:, 0], [[1, 0]]) - # for indices - # [[0, 0], - # [0, 1] - # [0, 2], - # [1, 1], - # [1, 2], - # [2, 0]] - # it is - # [0, 0, 0, 0, 1, 1, 2] - # padding is for computing change from one sample to another in the batch - - a = tf.cast(tf.not_equal(sample_ids_in_batch[1:], sample_ids_in_batch[:-1]), tf.int64) - # for the example above the result of this statement equals - # [0, 0, 0, 1, 0, 1] - # so data samples begin in 3rd and 5th positions (the indexes of ones) - - # transforming sample start masks to the sample starts themselves - q = a * tf.cast(tf.range(n_words), tf.int64) - # [0, 0, 0, 3, 0, 5] - count_to_substract = tf.pad(tf.boolean_mask(q, q), [(1, 0)]) - # [0, 3, 5] - - new_word_indices = tf.cast(tf.range(n_words), tf.int64) - tf.gather(count_to_substract, tf.cumsum(a)) - # tf.range(n_words) -> [0, 1, 2, 3, 4, 5] - # tf.cumsum(a) -> [0, 0, 0, 1, 1, 2] - # tf.gather(count_to_substract, tf.cumsum(a)) -> [0, 0, 0, 3, 3, 5] - # new_word_indices -> [0, 1, 2, 3, 4, 5] - [0, 0, 0, 3, 3, 5] = [0, 1, 2, 0, 1, 0] - # new_word_indices is the concatenation of range(word_len(sentence)) - # for all sentences in units - - n_total_word_elements = tf.cast(batch_size * max_token_seq_len, tf.int32) - word_indices_flat = tf.cast(idxs[:, 0] * max_token_seq_len + new_word_indices, tf.int32) - x_mask = tf.reduce_sum(tf.one_hot(word_indices_flat, n_total_word_elements), 0) - x_mask = tf.cast(x_mask, tf.bool) - # to get absolute indices we add max_token_seq_len: - # idxs[:, 0] 
* max_token_seq_len -> [0, 0, 0, 1, 1, 2] * 2 = [0, 0, 0, 3, 3, 6] - # word_indices_flat -> [0, 0, 0, 3, 3, 6] + [0, 1, 2, 0, 1, 0] = [0, 1, 2, 3, 4, 6] - # total number of words in the batch (including paddings) - # batch_size * max_token_seq_len -> 3 * 3 = 9 - # tf.one_hot(...) -> - # [[1. 0. 0. 0. 0. 0. 0. 0. 0.] - # [0. 1. 0. 0. 0. 0. 0. 0. 0.] - # [0. 0. 1. 0. 0. 0. 0. 0. 0.] - # [0. 0. 0. 1. 0. 0. 0. 0. 0.] - # [0. 0. 0. 0. 1. 0. 0. 0. 0.] - # [0. 0. 0. 0. 0. 0. 1. 0. 0.]] - # x_mask -> [1, 1, 1, 1, 1, 0, 1, 0, 0] - - full_range = tf.cast(tf.range(batch_size * max_token_seq_len), tf.int32) - # full_range -> [0, 1, 2, 3, 4, 5, 6, 7, 8] - nonword_indices_flat = tf.boolean_mask(full_range, tf.math.logical_not(x_mask)) - # # y_idxs -> [5, 7, 8] - - # get a sequence of units corresponding to the start subtokens of the words - # size: [n_words, n_features] - elements = tf.gather_nd(units, idxs) - - # prepare zeros for paddings - # size: [batch_size * TOKEN_seq_length - n_words, n_features] - paddings = tf.zeros(tf.stack([tf.reduce_sum(max_token_seq_len - token_seq_lengths), - nf], 0), tf.float32) - - tensor_flat = tf.dynamic_stitch([word_indices_flat, nonword_indices_flat], - [elements, paddings]) - # tensor_flat -> [x, x, x, x, x, 0, x, 0, 0] - - tensor = tf.reshape(tensor_flat, tf.stack([batch_size, max_token_seq_len, nf_int], 0)) - # tensor -> [[x, x, x], - # [x, x, 0], - # [x, 0, 0]] - - return tensor - - -@register('bert_sequence_network') -class BertSequenceNetwork(LRScheduledTFModel): - """ - Basic class for BERT-based sequential architectures. - - Args: - keep_prob: dropout keep_prob for non-Bert layers - bert_config_file: path to Bert configuration file - pretrained_bert: pretrained Bert checkpoint - attention_probs_keep_prob: keep_prob for Bert self-attention layers - hidden_keep_prob: keep_prob for Bert hidden layers - encoder_layer_ids: list of averaged layers from Bert encoder (layer ids) - optimizer: name of tf.train.* optimizer or None for `AdamWeightDecayOptimizer` - weight_decay_rate: L2 weight decay for `AdamWeightDecayOptimizer` - encoder_dropout: dropout probability of encoder output layer - ema_decay: what exponential moving averaging to use for network parameters, value from 0.0 to 1.0. - Values closer to 1.0 put weight on the parameters history and values closer to 0.0 corresponds put weight - on the current parameters. - ema_variables_on_cpu: whether to put EMA variables to CPU. 
It may save a lot of GPU memory - freeze_embeddings: set True to not train input embeddings set True to - not train input embeddings set True to not train input embeddings - learning_rate: learning rate of BERT head - bert_learning_rate: learning rate of BERT body - min_learning_rate: min value of learning rate if learning rate decay is used - learning_rate_drop_patience: how many validations with no improvements to wait - learning_rate_drop_div: the divider of the learning rate after `learning_rate_drop_patience` unsuccessful - validations - load_before_drop: whether to load best model before dropping learning rate or not - clip_norm: clip gradients by norm - """ - - def __init__(self, - keep_prob: float, - bert_config_file: str, - pretrained_bert: str = None, - attention_probs_keep_prob: float = None, - hidden_keep_prob: float = None, - encoder_layer_ids: List[int] = (-1,), - encoder_dropout: float = 0.0, - optimizer: str = None, - weight_decay_rate: float = 1e-6, - ema_decay: float = None, - ema_variables_on_cpu: bool = True, - freeze_embeddings: bool = False, - learning_rate: float = 1e-3, - bert_learning_rate: float = 2e-5, - min_learning_rate: float = 1e-07, - learning_rate_drop_patience: int = 20, - learning_rate_drop_div: float = 2.0, - load_before_drop: bool = True, - clip_norm: float = 1.0, - **kwargs) -> None: - super().__init__(learning_rate=learning_rate, - learning_rate_drop_div=learning_rate_drop_div, - learning_rate_drop_patience=learning_rate_drop_patience, - load_before_drop=load_before_drop, - clip_norm=clip_norm, - **kwargs) - self.keep_prob = keep_prob - self.encoder_layer_ids = encoder_layer_ids - self.encoder_dropout = encoder_dropout - self.optimizer = optimizer - self.weight_decay_rate = weight_decay_rate - self.ema_decay = ema_decay - self.ema_variables_on_cpu = ema_variables_on_cpu - self.freeze_embeddings = freeze_embeddings - self.bert_learning_rate_multiplier = bert_learning_rate / learning_rate - self.min_learning_rate = min_learning_rate - - self.bert_config = BertConfig.from_json_file(str(expand_path(bert_config_file))) - - if attention_probs_keep_prob is not None: - self.bert_config.attention_probs_dropout_prob = 1.0 - attention_probs_keep_prob - if hidden_keep_prob is not None: - self.bert_config.hidden_dropout_prob = 1.0 - hidden_keep_prob - - self.sess_config = tf.ConfigProto(allow_soft_placement=True) - self.sess_config.gpu_options.allow_growth = True - self.sess = tf.Session(config=self.sess_config) - - self._init_graph() - - self._init_optimizer() - - self.sess.run(tf.global_variables_initializer()) - - if pretrained_bert is not None: - pretrained_bert = str(expand_path(pretrained_bert)) - - if tf.train.checkpoint_exists(pretrained_bert) \ - and not (self.load_path and tf.train.checkpoint_exists(str(self.load_path.resolve()))): - log.info('[initializing model with Bert from {}]'.format(pretrained_bert)) - # Exclude optimizer and classification variables from saved variables - var_list = self._get_saveable_variables( - exclude_scopes=('Optimizer', 'learning_rate', 'momentum', 'ner', 'EMA')) - saver = tf.train.Saver(var_list) - saver.restore(self.sess, pretrained_bert) - - if self.load_path is not None: - self.load() - - if self.ema: - self.sess.run(self.ema.init_op) - - def _init_graph(self) -> None: - self.seq_lengths = tf.reduce_sum(self.y_masks_ph, axis=1) - - self.bert = BertModel(config=self.bert_config, - is_training=self.is_train_ph, - input_ids=self.input_ids_ph, - input_mask=self.input_masks_ph, - token_type_ids=self.token_types_ph, - 
use_one_hot_embeddings=False) - - with tf.variable_scope('ner'): - layer_weights = tf.get_variable('layer_weights_', - shape=len(self.encoder_layer_ids), - initializer=tf.ones_initializer(), - trainable=True) - layer_mask = tf.ones_like(layer_weights) - layer_mask = tf.nn.dropout(layer_mask, self.encoder_keep_prob_ph) - layer_weights *= layer_mask - # to prevent zero division - mask_sum = tf.maximum(tf.reduce_sum(layer_mask), 1.0) - layer_weights = tf.unstack(layer_weights / mask_sum) - # TODO: may be stack and reduce_sum is faster - units = sum(w * l for w, l in zip(layer_weights, self.encoder_layers())) - units = tf.nn.dropout(units, keep_prob=self.keep_prob_ph) - return units - - def _get_tag_mask(self) -> tf.Tensor: - """ - Returns: tag_mask, - a mask that selects positions corresponding to word tokens (not padding and `CLS`) - """ - max_length = tf.reduce_max(self.seq_lengths) - one_hot_max_len = tf.one_hot(self.seq_lengths - 1, max_length) - tag_mask = tf.cumsum(one_hot_max_len[:, ::-1], axis=1)[:, ::-1] - return tag_mask - - def encoder_layers(self): - """ - Returns: the output of BERT layers specfied in ``self.encoder_layers_ids`` - """ - return [self.bert.all_encoder_layers[i] for i in self.encoder_layer_ids] - - def _init_placeholders(self) -> None: - self.input_ids_ph = tf.placeholder(shape=(None, None), - dtype=tf.int32, - name='token_indices_ph') - self.input_masks_ph = tf.placeholder(shape=(None, None), - dtype=tf.int32, - name='token_mask_ph') - self.token_types_ph = \ - tf.placeholder_with_default(tf.zeros_like(self.input_ids_ph, dtype=tf.int32), - shape=self.input_ids_ph.shape, - name='token_types_ph') - self.learning_rate_ph = tf.placeholder_with_default(0.0, shape=[], name='learning_rate_ph') - self.keep_prob_ph = tf.placeholder_with_default(1.0, shape=[], name='keep_prob_ph') - self.encoder_keep_prob_ph = tf.placeholder_with_default(1.0, shape=[], name='encoder_keep_prob_ph') - self.is_train_ph = tf.placeholder_with_default(False, shape=[], name='is_train_ph') - - def _init_optimizer(self) -> None: - with tf.variable_scope('Optimizer'): - self.global_step = tf.get_variable('global_step', - shape=[], - dtype=tf.int32, - initializer=tf.constant_initializer(0), - trainable=False) - # default optimizer for Bert is Adam with fixed L2 regularization - - if self.optimizer is None: - self.train_op = \ - self.get_train_op(self.loss, - learning_rate=self.learning_rate_ph, - optimizer=AdamWeightDecayOptimizer, - weight_decay_rate=self.weight_decay_rate, - beta_1=0.9, - beta_2=0.999, - epsilon=1e-6, - optimizer_scope_name='Optimizer', - exclude_from_weight_decay=["LayerNorm", - "layer_norm", - "bias", - "EMA"]) - else: - self.train_op = self.get_train_op(self.loss, - learning_rate=self.learning_rate_ph, - optimizer_scope_name='Optimizer') - - if self.optimizer is None: - with tf.variable_scope('Optimizer'): - new_global_step = self.global_step + 1 - self.train_op = tf.group(self.train_op, [self.global_step.assign(new_global_step)]) - - if self.ema_decay is not None: - _vars = self._get_trainable_variables(exclude_scopes=["Optimizer", - "LayerNorm", - "layer_norm", - "bias", - "learning_rate", - "momentum"]) - - self.ema = ExponentialMovingAverage(self.ema_decay, - variables_on_cpu=self.ema_variables_on_cpu) - self.train_op = self.ema.build(self.train_op, _vars, name="EMA") - else: - self.ema = None - - def get_train_op(self, loss: tf.Tensor, learning_rate: Union[tf.Tensor, float], **kwargs) -> tf.Operation: - assert "learnable_scopes" not in kwargs, "learnable scopes unsupported" - 
# train_op for bert variables - kwargs['learnable_scopes'] = ('bert/encoder', 'bert/embeddings') - if self.freeze_embeddings: - kwargs['learnable_scopes'] = ('bert/encoder',) - bert_learning_rate = learning_rate * self.bert_learning_rate_multiplier - bert_train_op = super().get_train_op(loss, - bert_learning_rate, - **kwargs) - # train_op for ner head variables - kwargs['learnable_scopes'] = ('ner',) - head_train_op = super().get_train_op(loss, - learning_rate, - **kwargs) - return tf.group(bert_train_op, head_train_op) - - def _build_basic_feed_dict(self, input_ids: tf.Tensor, input_masks: tf.Tensor, - token_types: Optional[tf.Tensor]=None, train: bool=False) -> dict: - """Fills the feed_dict with the tensors defined in the basic class. - You need to update this dict by the values of output placeholders - and class-specific network inputs in your derived class. - """ - feed_dict = { - self.input_ids_ph: input_ids, - self.input_masks_ph: input_masks, - } - if token_types is not None: - feed_dict[self.token_types_ph] = token_types - if train: - feed_dict.update({ - self.learning_rate_ph: max(self.get_learning_rate(), self.min_learning_rate), - self.keep_prob_ph: self.keep_prob, - self.encoder_keep_prob_ph: 1.0 - self.encoder_dropout, - self.is_train_ph: True, - }) - - return feed_dict - - def _build_feed_dict(self, input_ids, input_masks, token_types=None, *args, **kwargs): - raise NotImplementedError("You must implement _build_feed_dict in your derived class.") - - def train_on_batch(self, - input_ids: Union[List[List[int]], np.ndarray], - input_masks: Union[List[List[int]], np.ndarray], - *args, **kwargs) -> Dict[str, float]: - """ - - Args: - input_ids: batch of indices of subwords - input_masks: batch of masks which determine what should be attended - args: arguments passed to _build_feed_dict - and corresponding to additional input - and output tensors of the derived class. - kwargs: keyword arguments passed to _build_feed_dict - and corresponding to additional input - and output tensors of the derived class. - - Returns: - dict with fields 'loss', 'head_learning_rate', and 'bert_learning_rate' - """ - feed_dict = self._build_feed_dict(input_ids, input_masks, *args, **kwargs) - - if self.ema: - self.sess.run(self.ema.switch_to_train_op) - _, loss, lr = self.sess.run([self.train_op, self.loss, self.learning_rate_ph], - feed_dict=feed_dict) - return {'loss': loss, - 'head_learning_rate': float(lr), - 'bert_learning_rate': float(lr) * self.bert_learning_rate_multiplier} - - def __call__(self, - input_ids: Union[List[List[int]], np.ndarray], - input_masks: Union[List[List[int]], np.ndarray], - **kwargs) -> Union[List[List[int]], List[np.ndarray]]: - raise NotImplementedError("You must implement method __call__ in your derived class.") - - def save(self, exclude_scopes=('Optimizer', 'EMA/BackupVariables')) -> None: - if self.ema: - self.sess.run(self.ema.switch_to_train_op) - return super().save(exclude_scopes=exclude_scopes) - - def load(self, - exclude_scopes=('Optimizer', - 'learning_rate', - 'momentum', - 'EMA/BackupVariables'), - **kwargs) -> None: - return super().load(exclude_scopes=exclude_scopes, **kwargs) - - -@register('bert_sequence_tagger') -class BertSequenceTagger(BertSequenceNetwork): - """BERT-based model for text tagging. It predicts a label for every token (not subtoken) in the text. - You can use it for sequence labeling tasks, such as morphological tagging or named entity recognition. 
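# [Editor's note / illustration, not part of the original file] The tagger consumes a
# y_masks input marking the first subtoken of every word (see token_from_subtoken above).
# Below is one way such a mask could be built from wordpiece-style subtokens; the sentence
# is the same made-up example used in the token_from_subtoken docstring.
subtokens = ['[CLS]', 'Your', 'aar', '##dvark', 'is', 'awesome', '[SEP]']

y_mask = [int(not tok.startswith('##') and tok not in ('[CLS]', '[SEP]'))
          for tok in subtokens]
# y_mask -> [0, 1, 1, 0, 1, 1, 0]: tag predictions are taken only at the word-initial
# subtoken positions, so every word (not subword) gets exactly one label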
- See :class:`deeppavlov.models.bert.bert_sequence_tagger.BertSequenceNetwork` - for the description of inherited parameters. - - Args: - n_tags: number of distinct tags - use_crf: whether to use CRF on top or not - use_birnn: whether to use bidirection rnn after BERT layers. - For NER and morphological tagging we usually set it to `False` as otherwise the model overfits - birnn_cell_type: the type of Bidirectional RNN. Either `lstm` or `gru` - birnn_hidden_size: number of hidden units in the BiRNN layer in each direction - return_probas: set this to `True` if you need the probabilities instead of raw answers - """ - - def __init__(self, - n_tags: List[str], - keep_prob: float, - bert_config_file: str, - pretrained_bert: str = None, - attention_probs_keep_prob: float = None, - hidden_keep_prob: float = None, - use_crf=False, - encoder_layer_ids: List[int] = (-1,), - encoder_dropout: float = 0.0, - optimizer: str = None, - weight_decay_rate: float = 1e-6, - use_birnn: bool = False, - birnn_cell_type: str = 'lstm', - birnn_hidden_size: int = 128, - ema_decay: float = None, - ema_variables_on_cpu: bool = True, - return_probas: bool = False, - freeze_embeddings: bool = False, - learning_rate: float = 1e-3, - bert_learning_rate: float = 2e-5, - min_learning_rate: float = 1e-07, - learning_rate_drop_patience: int = 20, - learning_rate_drop_div: float = 2.0, - load_before_drop: bool = True, - clip_norm: float = 1.0, - **kwargs) -> None: - self.n_tags = n_tags - self.use_crf = use_crf - self.use_birnn = use_birnn - self.birnn_cell_type = birnn_cell_type - self.birnn_hidden_size = birnn_hidden_size - self.return_probas = return_probas - super().__init__(keep_prob=keep_prob, - bert_config_file=bert_config_file, - pretrained_bert=pretrained_bert, - attention_probs_keep_prob=attention_probs_keep_prob, - hidden_keep_prob=hidden_keep_prob, - encoder_layer_ids=encoder_layer_ids, - encoder_dropout=encoder_dropout, - optimizer=optimizer, - weight_decay_rate=weight_decay_rate, - ema_decay=ema_decay, - ema_variables_on_cpu=ema_variables_on_cpu, - freeze_embeddings=freeze_embeddings, - learning_rate=learning_rate, - bert_learning_rate=bert_learning_rate, - min_learning_rate=min_learning_rate, - learning_rate_drop_div=learning_rate_drop_div, - learning_rate_drop_patience=learning_rate_drop_patience, - load_before_drop=load_before_drop, - clip_norm=clip_norm, - **kwargs) - - def _init_graph(self) -> None: - self._init_placeholders() - - units = super()._init_graph() - - with tf.variable_scope('ner'): - if self.use_birnn: - units, _ = bi_rnn(units, - self.birnn_hidden_size, - cell_type=self.birnn_cell_type, - seq_lengths=self.seq_lengths, - name='birnn') - units = tf.concat(units, -1) - # TODO: maybe add one more layer? 
- logits = tf.layers.dense(units, units=self.n_tags, name="output_dense") - - self.logits = token_from_subtoken(logits, self.y_masks_ph) - - # CRF - if self.use_crf: - transition_params = tf.get_variable('Transition_Params', - shape=[self.n_tags, self.n_tags], - initializer=tf.zeros_initializer()) - log_likelihood, transition_params = \ - tf.contrib.crf.crf_log_likelihood(self.logits, - self.y_ph, - self.seq_lengths, - transition_params) - loss_tensor = -log_likelihood - self._transition_params = transition_params - - self.y_predictions = tf.argmax(self.logits, -1) - self.y_probas = tf.nn.softmax(self.logits, axis=2) - - with tf.variable_scope("loss"): - tag_mask = self._get_tag_mask() - y_mask = tf.cast(tag_mask, tf.float32) - if self.use_crf: - self.loss = tf.reduce_mean(loss_tensor) - else: - self.loss = tf.losses.sparse_softmax_cross_entropy(labels=self.y_ph, - logits=self.logits, - weights=y_mask) - - def _init_placeholders(self) -> None: - super()._init_placeholders() - self.y_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='y_ph') - self.y_masks_ph = tf.placeholder(shape=(None, None), - dtype=tf.int32, - name='y_mask_ph') - - def _decode_crf(self, feed_dict: Dict[tf.Tensor, np.ndarray]) -> List[np.ndarray]: - logits, trans_params, mask, seq_lengths = self.sess.run([self.logits, - self._transition_params, - self.y_masks_ph, - self.seq_lengths], - feed_dict=feed_dict) - # iterate over the sentences because no batching in viterbi_decode - y_pred = [] - for logit, sequence_length in zip(logits, seq_lengths): - logit = logit[:int(sequence_length)] # keep only the valid steps - viterbi_seq, viterbi_score = tf.contrib.crf.viterbi_decode(logit, trans_params) - y_pred += [viterbi_seq] - return y_pred - - def _build_feed_dict(self, input_ids, input_masks, y_masks, y=None): - feed_dict = self._build_basic_feed_dict(input_ids, input_masks, train=(y is not None)) - feed_dict[self.y_masks_ph] = y_masks - if y is not None: - feed_dict[self.y_ph] = y - return feed_dict - - def __call__(self, - input_ids: Union[List[List[int]], np.ndarray], - input_masks: Union[List[List[int]], np.ndarray], - y_masks: Union[List[List[int]], np.ndarray]) -> Union[List[List[int]], List[np.ndarray]]: - """ Predicts tag indices for a given subword tokens batch - - Args: - input_ids: indices of the subwords - input_masks: mask that determines where to attend and where not to - y_masks: mask which determines the first subword units in the the word - - Returns: - Label indices or class probabilities for each token (not subtoken) - - """ - feed_dict = self._build_feed_dict(input_ids, input_masks, y_masks) - if self.ema: - self.sess.run(self.ema.switch_to_test_op) - if not self.return_probas: - if self.use_crf: - pred = self._decode_crf(feed_dict) - else: - pred, seq_lengths = self.sess.run([self.y_predictions, self.seq_lengths], feed_dict=feed_dict) - pred = [p[:l] for l, p in zip(seq_lengths, pred)] - else: - pred = self.sess.run(self.y_probas, feed_dict=feed_dict) - return pred - - -class ExponentialMovingAverage: - def __init__(self, - decay: float = 0.999, - variables_on_cpu: bool = True) -> None: - self.decay = decay - self.ema = tf.train.ExponentialMovingAverage(decay=decay) - self.var_device_name = '/cpu:0' if variables_on_cpu else None - self.train_mode = None - - def build(self, - minimize_op: tf.Tensor, - update_vars: List[tf.Variable] = None, - name: str = "EMA") -> tf.Tensor: - with tf.variable_scope(name): - if update_vars is None: - update_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES) 
- - with tf.control_dependencies([minimize_op]): - minimize_op = self.ema.apply(update_vars) - - with tf.device(self.var_device_name): - # Make backup variables - with tf.variable_scope('BackupVariables'): - backup_vars = [tf.get_variable(var.op.name, - dtype=var.value().dtype, - trainable=False, - initializer=var.initialized_value()) - for var in update_vars] - - def ema_to_weights(): - return tf.group(*(tf.assign(var, self.ema.average(var).read_value()) - for var in update_vars)) - - def save_weight_backups(): - return tf.group(*(tf.assign(bck, var.read_value()) - for var, bck in zip(update_vars, backup_vars))) - - def restore_weight_backups(): - return tf.group(*(tf.assign(var, bck.read_value()) - for var, bck in zip(update_vars, backup_vars))) - - train_switch_op = restore_weight_backups() - with tf.control_dependencies([save_weight_backups()]): - test_switch_op = ema_to_weights() - - self.train_switch_op = train_switch_op - self.test_switch_op = test_switch_op - self.do_nothing_op = tf.no_op() - - return minimize_op - - @property - def init_op(self) -> tf.Operation: - self.train_mode = False - return self.test_switch_op - - @property - def switch_to_train_op(self) -> tf.Operation: - assert self.train_mode is not None, "ema variables aren't initialized" - if not self.train_mode: - # log.info("switching to train mode") - self.train_mode = True - return self.train_switch_op - return self.do_nothing_op - - @property - def switch_to_test_op(self) -> tf.Operation: - assert self.train_mode is not None, "ema variables aren't initialized" - if self.train_mode: - # log.info("switching to test mode") - self.train_mode = False - return self.test_switch_op - return self.do_nothing_op diff --git a/deeppavlov/models/bert/bert_squad.py b/deeppavlov/models/bert/bert_squad.py deleted file mode 100644 index 53b8313bba..0000000000 --- a/deeppavlov/models/bert/bert_squad.py +++ /dev/null @@ -1,366 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import json -import math -from logging import getLogger -from typing import List, Tuple, Optional, Dict - -import numpy as np -import tensorflow as tf -from bert_dp.modeling import BertConfig, BertModel -from bert_dp.optimization import AdamWeightDecayOptimizer -from bert_dp.preprocessing import InputFeatures -from bert_dp.tokenization import FullTokenizer - -from deeppavlov import build_model -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.estimator import Component -from deeppavlov.core.models.tf_model import LRScheduledTFModel -from deeppavlov.models.squad.utils import softmax_mask - -logger = getLogger(__name__) - - -@register('squad_bert_model') -class BertSQuADModel(LRScheduledTFModel): - """Bert-based model for SQuAD-like problem setting: - It predicts start and end position of answer for given question and context. - - [CLS] token is used as no_answer. 
If model selects [CLS] token as most probable - answer, it means that there is no answer in given context. - - Start and end position of answer are predicted by linear transformation - of Bert outputs. - - Args: - bert_config_file: path to Bert configuration file - keep_prob: dropout keep_prob for non-Bert layers - attention_probs_keep_prob: keep_prob for Bert self-attention layers - hidden_keep_prob: keep_prob for Bert hidden layers - optimizer: name of tf.train.* optimizer or None for `AdamWeightDecayOptimizer` - weight_decay_rate: L2 weight decay for `AdamWeightDecayOptimizer` - pretrained_bert: pretrained Bert checkpoint - min_learning_rate: min value of learning rate if learning rate decay is used - """ - - def __init__(self, bert_config_file: str, - keep_prob: float, - attention_probs_keep_prob: Optional[float] = None, - hidden_keep_prob: Optional[float] = None, - optimizer: Optional[str] = None, - weight_decay_rate: Optional[float] = 0.01, - pretrained_bert: Optional[str] = None, - min_learning_rate: float = 1e-06, **kwargs) -> None: - super().__init__(**kwargs) - - self.min_learning_rate = min_learning_rate - self.keep_prob = keep_prob - self.optimizer = optimizer - self.weight_decay_rate = weight_decay_rate - - self.bert_config = BertConfig.from_json_file(str(expand_path(bert_config_file))) - - if attention_probs_keep_prob is not None: - self.bert_config.attention_probs_dropout_prob = 1.0 - attention_probs_keep_prob - if hidden_keep_prob is not None: - self.bert_config.hidden_dropout_prob = 1.0 - hidden_keep_prob - - self.sess_config = tf.ConfigProto(allow_soft_placement=True) - self.sess_config.gpu_options.allow_growth = True - self.sess = tf.Session(config=self.sess_config) - - self._init_graph() - - self._init_optimizer() - - self.sess.run(tf.global_variables_initializer()) - - if pretrained_bert is not None: - pretrained_bert = str(expand_path(pretrained_bert)) - - if tf.train.checkpoint_exists(pretrained_bert) \ - and not (self.load_path and tf.train.checkpoint_exists(str(self.load_path.resolve()))): - logger.info('[initializing model with Bert from {}]'.format(pretrained_bert)) - var_list = self._get_saveable_variables( - exclude_scopes=('Optimizer', 'learning_rate', 'momentum', 'squad')) - saver = tf.train.Saver(var_list) - saver.restore(self.sess, pretrained_bert) - - if self.load_path is not None: - self.load() - - def _init_graph(self): - self._init_placeholders() - - seq_len = tf.shape(self.input_ids_ph)[-1] - self.y_st = tf.one_hot(self.y_st_ph, depth=seq_len) - self.y_end = tf.one_hot(self.y_end_ph, depth=seq_len) - - self.bert = BertModel(config=self.bert_config, - is_training=self.is_train_ph, - input_ids=self.input_ids_ph, - input_mask=self.input_masks_ph, - token_type_ids=self.token_types_ph, - use_one_hot_embeddings=False, - ) - - last_layer = self.bert.get_sequence_output() - hidden_size = last_layer.get_shape().as_list()[-1] - bs = tf.shape(last_layer)[0] - - with tf.variable_scope('squad'): - output_weights = tf.get_variable('output_weights', [2, hidden_size], - initializer=tf.truncated_normal_initializer(stddev=0.02)) - output_bias = tf.get_variable('output_bias', [2], initializer=tf.zeros_initializer()) - - last_layer_rs = tf.reshape(last_layer, [-1, hidden_size]) - - logits = tf.matmul(last_layer_rs, output_weights, transpose_b=True) - logits = tf.nn.bias_add(logits, output_bias) - logits = tf.reshape(logits, [bs, -1, 2]) - logits = tf.transpose(logits, [2, 0, 1]) - - logits_st, logits_end = tf.unstack(logits, axis=0) - - logit_mask = self.token_types_ph - 
# [CLS] token is used as no answer - mask = tf.concat([tf.ones((bs, 1), dtype=tf.int32), tf.zeros((bs, seq_len - 1), dtype=tf.int32)], axis=-1) - logit_mask = logit_mask + mask - - logits_st = softmax_mask(logits_st, logit_mask) - logits_end = softmax_mask(logits_end, logit_mask) - start_probs = tf.nn.softmax(logits_st) - end_probs = tf.nn.softmax(logits_end) - - outer = tf.matmul(tf.expand_dims(start_probs, axis=2), tf.expand_dims(end_probs, axis=1)) - outer_logits = tf.exp(tf.expand_dims(logits_st, axis=2) + tf.expand_dims(logits_end, axis=1)) - - context_max_len = tf.reduce_max(tf.reduce_sum(self.token_types_ph, axis=1)) - - max_ans_length = tf.cast(tf.minimum(20, context_max_len), tf.int64) - outer = tf.matrix_band_part(outer, 0, max_ans_length) - outer_logits = tf.matrix_band_part(outer_logits, 0, max_ans_length) - - self.yp_score = 1 - tf.nn.softmax(logits_st)[:, 0] * tf.nn.softmax(logits_end)[:, 0] - - self.start_probs = start_probs - self.end_probs = end_probs - self.start_pred = tf.argmax(tf.reduce_max(outer, axis=2), axis=1) - self.end_pred = tf.argmax(tf.reduce_max(outer, axis=1), axis=1) - self.yp_logits = tf.reduce_max(tf.reduce_max(outer_logits, axis=2), axis=1) - - with tf.variable_scope("loss"): - loss_st = tf.nn.softmax_cross_entropy_with_logits(logits=logits_st, labels=self.y_st) - loss_end = tf.nn.softmax_cross_entropy_with_logits(logits=logits_end, labels=self.y_end) - self.loss = tf.reduce_mean(loss_st + loss_end) - - def _init_placeholders(self): - self.input_ids_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='ids_ph') - self.input_masks_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='masks_ph') - self.token_types_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='token_types_ph') - - self.y_st_ph = tf.placeholder(shape=(None,), dtype=tf.int32, name='y_st_ph') - self.y_end_ph = tf.placeholder(shape=(None,), dtype=tf.int32, name='y_end_ph') - - self.learning_rate_ph = tf.placeholder_with_default(0.0, shape=[], name='learning_rate_ph') - self.keep_prob_ph = tf.placeholder_with_default(1.0, shape=[], name='keep_prob_ph') - self.is_train_ph = tf.placeholder_with_default(False, shape=[], name='is_train_ph') - - def _init_optimizer(self): - with tf.variable_scope('Optimizer'): - self.global_step = tf.get_variable('global_step', shape=[], dtype=tf.int32, - initializer=tf.constant_initializer(0), trainable=False) - # default optimizer for Bert is Adam with fixed L2 regularization - if self.optimizer is None: - - self.train_op = self.get_train_op(self.loss, learning_rate=self.learning_rate_ph, - optimizer=AdamWeightDecayOptimizer, - weight_decay_rate=self.weight_decay_rate, - beta_1=0.9, - beta_2=0.999, - epsilon=1e-6, - exclude_from_weight_decay=["LayerNorm", "layer_norm", "bias"] - ) - else: - self.train_op = self.get_train_op(self.loss, learning_rate=self.learning_rate_ph) - - if self.optimizer is None: - new_global_step = self.global_step + 1 - self.train_op = tf.group(self.train_op, [self.global_step.assign(new_global_step)]) - - def _build_feed_dict(self, input_ids, input_masks, token_types, y_st=None, y_end=None): - feed_dict = { - self.input_ids_ph: input_ids, - self.input_masks_ph: input_masks, - self.token_types_ph: token_types, - } - if y_st is not None and y_end is not None: - feed_dict.update({ - self.y_st_ph: y_st, - self.y_end_ph: y_end, - self.learning_rate_ph: max(self.get_learning_rate(), self.min_learning_rate), - self.keep_prob_ph: self.keep_prob, - self.is_train_ph: True, - }) - - return feed_dict - - def 
train_on_batch(self, features: List[InputFeatures], y_st: List[List[int]], y_end: List[List[int]]) -> Dict: - """Train model on given batch. - This method calls train_op using features and labels from y_st and y_end - - Args: - features: batch of InputFeatures instances - y_st: batch of lists of ground truth answer start positions - y_end: batch of lists of ground truth answer end positions - - Returns: - dict with loss and learning_rate values - - """ - input_ids = [f.input_ids for f in features] - input_masks = [f.input_mask for f in features] - input_type_ids = [f.input_type_ids for f in features] - - y_st = [x[0] for x in y_st] - y_end = [x[0] for x in y_end] - - feed_dict = self._build_feed_dict(input_ids, input_masks, input_type_ids, y_st, y_end) - - _, loss = self.sess.run([self.train_op, self.loss], feed_dict=feed_dict) - return {'loss': loss, 'learning_rate': feed_dict[self.learning_rate_ph]} - - def __call__(self, features: List[InputFeatures]) -> Tuple[List[int], List[int], List[float], List[float]]: - """get predictions using features as input - - Args: - features: batch of InputFeatures instances - - Returns: - predictions: start, end positions, logits for answer and no_answer score - - """ - input_ids = [f.input_ids for f in features] - input_masks = [f.input_mask for f in features] - input_type_ids = [f.input_type_ids for f in features] - - feed_dict = self._build_feed_dict(input_ids, input_masks, input_type_ids) - st, end, logits, scores = self.sess.run([self.start_pred, self.end_pred, self.yp_logits, self.yp_score], - feed_dict=feed_dict) - return st, end, logits.tolist(), scores.tolist() - - -@register('squad_bert_infer') -class BertSQuADInferModel(Component): - """This model wraps BertSQuADModel to make predictions on longer than 512 tokens sequences. - - It splits context on chunks with `max_seq_length - 3 - len(question)` length, preserving sentences boundaries. - - It reassembles batches with chunks instead of full contexts to optimize performance, e.g.,: - batch_size = 5 - number_of_contexts == 2 - number of first context chunks == 8 - number of second context chunks == 2 - - we will create two batches with 5 chunks - - For each context the best answer is selected via logits or scores from BertSQuADModel. 
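# [Editor's note / illustration, not part of the original file] The chunking arithmetic
# described above, with made-up lengths. math.ceil and np.array_split mirror the logic in
# __call__ below; three subtoken positions are reserved for [CLS] and the two [SEP] tokens.
import math
import numpy as np

max_seq_length = 512
question_subtokens_len = 20
context_subtokens_len = 1500
sentences = [f"Sentence number {i}." for i in range(60)]   # sentence-tokenized context

max_chunk_len = max_seq_length - question_subtokens_len - 3          # 489 subtokens
number_of_chunks = math.ceil(context_subtokens_len / max_chunk_len)  # 4 chunks
chunks = [' '.join(part) for part in np.array_split(sentences, number_of_chunks)]
# each chunk is paired with the same question; the answer with the highest logit/score
# across the chunks is returned for the original long context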
- - - Args: - squad_model_config: path to DeepPavlov BertSQuADModel config file - vocab_file: path to Bert vocab file - do_lower_case: set True if lowercasing is needed - max_seq_length: max sequence length in subtokens, including [SEP] and [CLS] tokens - batch_size: size of batch to use during inference - lang: either `en` or `ru`, it is used to select sentence tokenizer - - """ - - def __init__(self, squad_model_config: str, - vocab_file: str, - do_lower_case: bool, - max_seq_length: int = 512, - batch_size: int = 10, - lang='en', **kwargs) -> None: - config = json.load(open(squad_model_config)) - config['chainer']['pipe'][0]['max_seq_length'] = max_seq_length - self.model = build_model(config) - self.max_seq_length = max_seq_length - vocab_file = str(expand_path(vocab_file)) - self.tokenizer = FullTokenizer(vocab_file=vocab_file, do_lower_case=do_lower_case) - self.batch_size = batch_size - - if lang == 'en': - from nltk import sent_tokenize - self.sent_tokenizer = sent_tokenize - elif lang == 'ru': - from ru_sent_tokenize import ru_sent_tokenize - self.sent_tokenizer = ru_sent_tokenize - else: - raise RuntimeError('en and ru languages are supported only') - - def __call__(self, contexts: List[str], questions: List[str], **kwargs) -> Tuple[List[str], List[int], List[float]]: - """get predictions for given contexts and questions - - Args: - contexts: batch of contexts - questions: batch of questions - - Returns: - predictions: answer, answer start position, logits or scores - - """ - batch_indices = [] - contexts_to_predict = [] - questions_to_predict = [] - predictions = {} - for i, (context, question) in enumerate(zip(contexts, questions)): - context_subtokens = self.tokenizer.tokenize(context) - question_subtokens = self.tokenizer.tokenize(question) - max_chunk_len = self.max_seq_length - len(question_subtokens) - 3 - if 0 < max_chunk_len < len(context_subtokens): - number_of_chunks = math.ceil(len(context_subtokens) / max_chunk_len) - sentences = self.sent_tokenizer(context) - for chunk in np.array_split(sentences, number_of_chunks): - contexts_to_predict += [' '.join(chunk)] - questions_to_predict += [question] - batch_indices += [i] - else: - contexts_to_predict += [context] - questions_to_predict += [question] - batch_indices += [i] - - for j in range(0, len(contexts_to_predict), self.batch_size): - c_batch = contexts_to_predict[j: j + self.batch_size] - q_batch = questions_to_predict[j: j + self.batch_size] - ind_batch = batch_indices[j: j + self.batch_size] - a_batch, a_st_batch, logits_batch = self.model(c_batch, q_batch) - for a, a_st, logits, ind in zip(a_batch, a_st_batch, logits_batch, ind_batch): - if ind in predictions: - predictions[ind] += [(a, a_st, logits)] - else: - predictions[ind] = [(a, a_st, logits)] - - answers, answer_starts, logits = [], [], [] - for ind in sorted(predictions.keys()): - prediction = predictions[ind] - best_answer_ind = np.argmax([p[2] for p in prediction]) - answers += [prediction[best_answer_ind][0]] - answer_starts += [prediction[best_answer_ind][1]] - logits += [prediction[best_answer_ind][2]] - - return answers, answer_starts, logits diff --git a/deeppavlov/models/classifiers/keras_classification_model.py b/deeppavlov/models/classifiers/keras_classification_model.py deleted file mode 100644 index fe4ced95c3..0000000000 --- a/deeppavlov/models/classifiers/keras_classification_model.py +++ /dev/null @@ -1,960 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the 
"License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from copy import deepcopy -from logging import getLogger -from pathlib import Path -from typing import List, Tuple, Optional, Generator, Union - -import numpy as np -import tensorflow.keras -from overrides import overrides -from tensorflow.keras import backend as K -from tensorflow.keras.layers import (Conv1D, Dropout, Dense, Input, BatchNormalization, GlobalMaxPooling1D, - MaxPooling1D, concatenate, Activation, Reshape, - GlobalAveragePooling1D, LSTM, GRU, Bidirectional) -from tensorflow.keras.models import Model -from tensorflow.keras.regularizers import l2 - -from deeppavlov.core.common.errors import ConfigError -from deeppavlov.core.common.file import save_json, read_json -from deeppavlov.core.common.registry import register -from deeppavlov.core.layers.keras_layers import additive_self_attention, multiplicative_self_attention -from deeppavlov.core.models.keras_model import LRScheduledKerasModel - -log = getLogger(__name__) - - -@register('keras_classification_model') -class KerasClassificationModel(LRScheduledKerasModel): - """ - Class implements Keras model for classification task for multi-class multi-labeled data. - - Args: - embedding_size: embedding_size from embedder in pipeline - n_classes: number of considered classes - model_name: particular method of this class to initialize model configuration - optimizer: function name from keras.optimizers - loss: function name from keras.losses. - last_layer_activation: parameter that determines activation function after classification layer. - For multi-label classification use `sigmoid`, - otherwise, `softmax`. - restore_lr: in case of loading pre-trained model \ - whether to init learning rate with the final learning rate value from saved opt - classes: list or generator of considered classes - text_size: maximal length of text in tokens (words), - longer texts are cut, - shorter ones are padded with zeros (pre-padding) - padding: ``pre`` or ``post`` padding to use - - Attributes: - opt: dictionary with all model parameters - n_classes: number of considered classes - model: keras model itself - epochs_done: number of epochs that were done - batches_seen: number of epochs that were seen - train_examples_seen: number of training samples that were seen - sess: tf session - optimizer: keras.optimizers instance - classes: list of considered classes - padding: ``pre`` or ``post`` padding to use - """ - - def __init__(self, embedding_size: int, n_classes: int, - model_name: str, optimizer: str = "Adam", loss: str = "binary_crossentropy", - learning_rate: Union[None, float, List[float]] = None, - learning_rate_decay: Optional[Union[float, str]] = 0., - last_layer_activation: str = "sigmoid", - restore_lr: bool = False, - classes: Optional[Union[list, Generator]] = None, - text_size: Optional[int] = None, - padding: Optional[str] = "pre", - **kwargs): - """ - Initialize model using parameters - from opt dictionary (from config), if model is being initialized from saved. 
- """ - if learning_rate is None and isinstance(learning_rate_decay, float): - learning_rate = 0.01 - elif learning_rate is None and learning_rate_decay is None: - learning_rate = 0.01 - learning_rate_decay = 0. - elif isinstance(learning_rate, float) and "learning_rate_drop_patience" in kwargs: - learning_rate_decay = "no" - - if classes is not None: - classes = list(classes) - - given_opt = {"embedding_size": embedding_size, - "n_classes": n_classes, - "model_name": model_name, - "optimizer": optimizer, - "loss": loss, - "learning_rate": learning_rate, - "learning_rate_decay": learning_rate_decay, - "last_layer_activation": last_layer_activation, - "restore_lr": restore_lr, - "classes": classes, - "text_size": text_size, - "padding": padding, - **kwargs} - self.opt = deepcopy(given_opt) - self.model = None - self.optimizer = None - - super().__init__(**given_opt) - - if classes is not None: - self.classes = self.opt.get("classes") - - self.n_classes = self.opt.get('n_classes') - if self.n_classes == 0: - raise ConfigError("Please, provide vocabulary with considered classes.") - - self.load() - - summary = ['Model was successfully initialized!', 'Model summary:'] - self.model.summary(print_fn=summary.append) - log.info('\n'.join(summary)) - - @overrides - def get_optimizer(self): - return self.model.optimizer - - def pad_texts(self, sentences: List[List[np.ndarray]]) -> Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]: - """ - Cut and pad tokenized texts to self.opt["text_size"] tokens - - Args: - sentences: list of lists of tokens - - Returns: - array of embedded texts - """ - pad = np.zeros(self.opt['embedding_size']) - cut_batch = [sen[:self.opt['text_size']] for sen in sentences] - if self.opt["padding"] == "pre": - cut_batch = [[pad] * (self.opt['text_size'] - len(tokens)) + list(tokens) for tokens in cut_batch] - elif self.opt["padding"] == "post": - cut_batch = [list(tokens) + [pad] * (self.opt['text_size'] - len(tokens)) for tokens in cut_batch] - else: - raise ConfigError("Padding type {} is not acceptable".format(self.opt['padding'])) - return np.asarray(cut_batch) - - def check_input(self, texts: List[List[np.ndarray]]) -> np.ndarray: - """ - Check and convert input to array of tokenized embedded samples - - Args: - texts: list of tokenized embedded text samples - - Returns: - array of tokenized embedded texts samples that are cut and padded - """ - if self.opt["text_size"] is not None: - features = self.pad_texts(texts) - else: - if len(texts[0]): - features = np.array(texts) - else: - features = np.zeros((1, 1, self.opt["embedding_size"])) - - return features - - def train_on_batch(self, texts: List[List[np.ndarray]], labels: list) -> Union[float, List[float]]: - """ - Train the model on the given batch - - Args: - texts: list of tokenized embedded text samples - labels: list of labels - - Returns: - metrics values on the given batch - """ - features = self.check_input(texts) - - metrics_values = self.model.train_on_batch(features, np.array(labels)) - return metrics_values - - def __call__(self, data: List[List[np.ndarray]]) -> List[List[float]]: - """ - Infer on the given data - - Args: - data: list of tokenized text samples - - Returns: - for each sentence: - vector of probabilities to belong with each class - or list of labels sentence belongs with - """ - features = self.check_input(data) - return self.model.predict(features) - - def init_model_from_scratch(self, model_name: str) -> Model: - """ - Initialize uncompiled model from scratch with given params - - Args: - 
model_name: name of model function described as a method of this class - - Returns: - compiled model with given network and learning parameters - """ - log.info(f'[initializing `{self.__class__.__name__}` from scratch as {model_name}]') - model_func = getattr(self, model_name, None) - if callable(model_func): - model = model_func(**self.opt) - else: - raise AttributeError("Model {} is not defined".format(model_name)) - - return model - - def _load(self, model_name: str) -> None: - """ - Initialize uncompiled model from saved params and weights - - Args: - model_name: name of model function described as a method of this class - - Returns: - model with loaded weights and network parameters from files - but compiled with given learning parameters - """ - if self.load_path: - if isinstance(self.load_path, Path) and not self.load_path.parent.is_dir(): - raise ConfigError("Provided load path is incorrect!") - - opt_path = Path("{}_opt.json".format(str(self.load_path.resolve()))) - weights_path = Path("{}.h5".format(str(self.load_path.resolve()))) - - if opt_path.exists() and weights_path.exists(): - - log.info("[initializing `{}` from saved]".format(self.__class__.__name__)) - - self.opt["final_learning_rate"] = read_json(opt_path).get("final_learning_rate") - - model_func = getattr(self, model_name, None) - if callable(model_func): - model = model_func(**self.opt) - else: - raise AttributeError("Model {} is not defined".format(model_name)) - - log.info("[loading weights from {}]".format(weights_path.name)) - try: - model.load_weights(str(weights_path)) - except ValueError: - raise ConfigError("Some non-changeable parameters of neural network differ" - " from given pre-trained model") - - self.model = model - - return None - else: - self.model = self.init_model_from_scratch(model_name) - return None - else: - log.warning("No `load_path` is provided for {}".format(self.__class__.__name__)) - self.model = self.init_model_from_scratch(model_name) - return None - - def compile(self, model: Model, optimizer_name: str, loss_name: str, - learning_rate: Optional[Union[float, List[float]]], - learning_rate_decay: Optional[Union[float, str]]) -> Model: - """ - Compile model with given optimizer and loss - - Args: - model: keras uncompiled model - optimizer_name: name of optimizer from keras.optimizers - loss_name: loss function name (from keras.losses) - learning_rate: learning rate. - learning_rate_decay: learning rate decay. - - Returns: - - """ - optimizer_func = getattr(tensorflow.keras.optimizers, optimizer_name, None) - if callable(optimizer_func): - if isinstance(learning_rate, float) and isinstance(learning_rate_decay, float): - # in this case decay will be either given in config or, by default, learning_rate_decay=0. 
- self.optimizer = optimizer_func(lr=learning_rate, decay=learning_rate_decay) - else: - self.optimizer = optimizer_func() - else: - raise AttributeError("Optimizer {} is not defined in `tensorflow.keras.optimizers`".format(optimizer_name)) - - loss_func = getattr(tensorflow.keras.losses, loss_name, None) - if callable(loss_func): - loss = loss_func - else: - raise AttributeError("Loss {} is not defined".format(loss_name)) - - model.compile(optimizer=self.optimizer, - loss=loss) - return model - - @overrides - def load(self, model_name: Optional[str] = None) -> None: - - model_name = model_name or self.opt.get('model_name') - self._load(model_name=model_name) - # in case of pre-trained after loading in self.opt we have stored parameters - # now we can restore lear rate if needed - if self.opt.get("restore_lr", False) and ("final_learning_rate" in self.opt): - self.opt["learning_rate"] = self.opt["final_learning_rate"] - - self.model = self.compile(self.model, - optimizer_name=self.opt["optimizer"], - loss_name=self.opt["loss"], - learning_rate=self.opt["learning_rate"], - learning_rate_decay=self.opt["learning_rate_decay"]) - - @overrides - def save(self, fname: str = None) -> None: - """ - Save the model parameters into <>_opt.json (or <>_opt.json) - and model weights into <>.h5 (or <>.h5) - Args: - fname: file_path to save model. If not explicitly given seld.opt["ser_file"] will be used - - Returns: - None - """ - if not fname: - fname = self.save_path - else: - fname = Path(fname).resolve() - - if not fname.parent.is_dir(): - raise ConfigError("Provided save path is incorrect!") - else: - opt_path = f"{fname}_opt.json" - weights_path = f"{fname}.h5" - log.info(f"[saving model to {opt_path}]") - self.model.save_weights(weights_path) - - # if model was loaded from one path and saved to another one - # then change load_path to save_path for config - self.opt["epochs_done"] = self.epochs_done - if isinstance(self.opt.get("learning_rate", None), float): - self.opt["final_learning_rate"] = (K.eval(self.optimizer.lr) / - (1. + K.eval(self.optimizer.decay) * self.batches_seen)) - - if self.opt.get("load_path") and self.opt.get("save_path"): - if self.opt.get("save_path") != self.opt.get("load_path"): - self.opt["load_path"] = str(self.opt["save_path"]) - save_json(self.opt, opt_path) - - # noinspection PyUnusedLocal - def cnn_model(self, kernel_sizes_cnn: List[int], filters_cnn: int, dense_size: int, - coef_reg_cnn: float = 0., coef_reg_den: float = 0., dropout_rate: float = 0., - input_projection_size: Optional[int] = None, **kwargs) -> Model: - """ - Build un-compiled model of shallow-and-wide CNN. - - Args: - kernel_sizes_cnn: list of kernel sizes of convolutions. - filters_cnn: number of filters for convolutions. - dense_size: number of units for dense layer. - coef_reg_cnn: l2-regularization coefficient for convolutions. - coef_reg_den: l2-regularization coefficient for dense layers. - dropout_rate: dropout rate used after convolutions and between dense layers. - input_projection_size: if not None, adds Dense layer (with ``relu`` activation) - right after input layer to the size ``input_projection_size``. - Useful for input dimentionaliry recuction. Default: ``None``. 
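# Illustrative sketch, not part of the patch: the effective learning rate under
# Keras' time-based decay, which save() above stores as "final_learning_rate"
# so that restore_lr can resume training from it. The numbers are assumptions
# for demonstration only.
def effective_learning_rate(initial_lr, decay, batches_seen):
    # Legacy Keras optimizers decay the rate as lr / (1 + decay * iterations).
    return initial_lr / (1.0 + decay * batches_seen)

# e.g. lr=0.01 with decay=1e-4 after 10_000 batches gives 0.005
assert abs(effective_learning_rate(0.01, 1e-4, 10_000) - 0.005) < 1e-12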
- kwargs: other non-used parameters - - Returns: - keras.models.Model: uncompiled instance of Keras Model - """ - inp = Input(shape=(self.opt['text_size'], self.opt['embedding_size'])) - output = inp - - if input_projection_size is not None: - output = Dense(input_projection_size, activation='relu')(output) - - outputs = [] - for i in range(len(kernel_sizes_cnn)): - output_i = Conv1D(filters_cnn, kernel_size=kernel_sizes_cnn[i], - activation=None, - kernel_regularizer=l2(coef_reg_cnn), - padding='same')(output) - output_i = BatchNormalization()(output_i) - output_i = Activation('relu')(output_i) - output_i = GlobalMaxPooling1D()(output_i) - outputs.append(output_i) - - output = concatenate(outputs, axis=1) - - output = Dropout(rate=dropout_rate)(output) - output = Dense(dense_size, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - output = BatchNormalization()(output) - output = Activation('relu')(output) - output = Dropout(rate=dropout_rate)(output) - output = Dense(self.n_classes, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - output = BatchNormalization()(output) - act_output = Activation(self.opt.get("last_layer_activation", "sigmoid"))(output) - model = Model(inputs=inp, outputs=act_output) - return model - - # noinspection PyUnusedLocal - def dcnn_model(self, kernel_sizes_cnn: List[int], filters_cnn: List[int], dense_size: int, - coef_reg_cnn: float = 0., coef_reg_den: float = 0., dropout_rate: float = 0., - input_projection_size: Optional[int] = None, **kwargs) -> Model: - """ - Build un-compiled model of deep CNN. - - Args: - kernel_sizes_cnn: list of kernel sizes of convolutions. - filters_cnn: number of filters for convolutions. - dense_size: number of units for dense layer. - coef_reg_cnn: l2-regularization coefficient for convolutions. - coef_reg_den: l2-regularization coefficient for dense layers. - dropout_rate: dropout rate used after convolutions and between dense layers. - input_projection_size: if not None, adds Dense layer (with ``relu`` activation) - right after input layer to the size ``input_projection_size``. - Useful for input dimentionaliry recuction. Default: ``None``. 
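# Illustrative sketch, not part of the patch: a minimal shallow-and-wide CNN
# over pre-embedded text -- parallel Conv1D branches with different kernel sizes
# followed by global max pooling, in the spirit of cnn_model above. Batch
# normalization and regularizers are omitted for brevity; all sizes and the
# function name are assumptions for demonstration only.
from tensorflow.keras import layers, Model

def shallow_wide_cnn(text_size=64, embedding_size=100, kernel_sizes=(3, 5, 7),
                     filters=128, dense_size=100, n_classes=2):
    inp = layers.Input(shape=(text_size, embedding_size))
    branches = []
    for k in kernel_sizes:
        x = layers.Conv1D(filters, kernel_size=k, padding="same", activation="relu")(inp)
        branches.append(layers.GlobalMaxPooling1D()(x))  # one vector per branch
    x = layers.concatenate(branches)
    x = layers.Dense(dense_size, activation="relu")(x)
    out = layers.Dense(n_classes, activation="sigmoid")(x)
    return Model(inputs=inp, outputs=out)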
- kwargs: other non-used parameters - - Returns: - keras.models.Model: uncompiled instance of Keras Model - """ - inp = Input(shape=(self.opt['text_size'], self.opt['embedding_size'])) - output = inp - - if input_projection_size is not None: - output = Dense(input_projection_size, activation='relu')(output) - - for i in range(len(kernel_sizes_cnn)): - output = Conv1D(filters_cnn[i], kernel_size=kernel_sizes_cnn[i], - activation=None, - kernel_regularizer=l2(coef_reg_cnn), - padding='same')(output) - output = BatchNormalization()(output) - output = Activation('relu')(output) - output = MaxPooling1D()(output) - - output = GlobalMaxPooling1D()(output) - output = Dropout(rate=dropout_rate)(output) - output = Dense(dense_size, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - output = BatchNormalization()(output) - output = Activation('relu')(output) - output = Dropout(rate=dropout_rate)(output) - output = Dense(self.n_classes, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - output = BatchNormalization()(output) - act_output = Activation(self.opt.get("last_layer_activation", "sigmoid"))(output) - model = Model(inputs=inp, outputs=act_output) - return model - - # noinspection PyUnusedLocal - def cnn_model_max_and_aver_pool(self, kernel_sizes_cnn: List[int], filters_cnn: int, dense_size: int, - coef_reg_cnn: float = 0., coef_reg_den: float = 0., dropout_rate: float = 0., - input_projection_size: Optional[int] = None, **kwargs) -> Model: - """ - Build un-compiled model of shallow-and-wide CNN where average pooling after convolutions is replaced with - concatenation of average and max poolings. - - Args: - kernel_sizes_cnn: list of kernel sizes of convolutions. - filters_cnn: number of filters for convolutions. - dense_size: number of units for dense layer. - coef_reg_cnn: l2-regularization coefficient for convolutions. Default: ``0.0``. - coef_reg_den: l2-regularization coefficient for dense layers. Default: ``0.0``. - dropout_rate: dropout rate used after convolutions and between dense layers. Default: ``0.0``. - input_projection_size: if not None, adds Dense layer (with ``relu`` activation) - right after input layer to the size ``input_projection_size``. - Useful for input dimentionaliry recuction. Default: ``None``. 
- kwargs: other non-used parameters - - Returns: - keras.models.Model: uncompiled instance of Keras Model - """ - - inp = Input(shape=(self.opt['text_size'], self.opt['embedding_size'])) - output = inp - - if input_projection_size is not None: - output = Dense(input_projection_size, activation='relu')(output) - - outputs = [] - for i in range(len(kernel_sizes_cnn)): - output_i = Conv1D(filters_cnn, kernel_size=kernel_sizes_cnn[i], - activation=None, - kernel_regularizer=l2(coef_reg_cnn), - padding='same')(output) - output_i = BatchNormalization()(output_i) - output_i = Activation('relu')(output_i) - output_i_0 = GlobalMaxPooling1D()(output_i) - output_i_1 = GlobalAveragePooling1D()(output_i) - output_i = concatenate([output_i_0, output_i_1]) - outputs.append(output_i) - - output = concatenate(outputs, axis=1) - - output = Dropout(rate=dropout_rate)(output) - output = Dense(dense_size, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - output = BatchNormalization()(output) - output = Activation('relu')(output) - output = Dropout(rate=dropout_rate)(output) - output = Dense(self.n_classes, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - output = BatchNormalization()(output) - act_output = Activation(self.opt.get("last_layer_activation", "sigmoid"))(output) - model = Model(inputs=inp, outputs=act_output) - return model - - # noinspection PyUnusedLocal - def bilstm_model(self, units_lstm: int, dense_size: int, - coef_reg_lstm: float = 0., coef_reg_den: float = 0., - dropout_rate: float = 0., rec_dropout_rate: float = 0., - input_projection_size: Optional[int] = None, **kwargs) -> Model: - """ - Build un-compiled BiLSTM. - - Args: - units_lstm (int): number of units for LSTM. - dense_size (int): number of units for dense layer. - coef_reg_lstm (float): l2-regularization coefficient for LSTM. Default: ``0.0``. - coef_reg_den (float): l2-regularization coefficient for dense layers. Default: ``0.0``. - dropout_rate (float): dropout rate to be used after BiLSTM and between dense layers. Default: ``0.0``. - rec_dropout_rate (float): dropout rate for LSTM. Default: ``0.0``. - input_projection_size: if not None, adds Dense layer (with ``relu`` activation) - right after input layer to the size ``input_projection_size``. - Useful for input dimentionaliry recuction. Default: ``None``. 
- kwargs: other non-used parameters - - Returns: - keras.models.Model: uncompiled instance of Keras Model - """ - - inp = Input(shape=(self.opt['text_size'], self.opt['embedding_size'])) - output = inp - - if input_projection_size is not None: - output = Dense(input_projection_size, activation='relu')(output) - - output = Bidirectional(LSTM(units_lstm, activation='tanh', - return_sequences=True, - kernel_regularizer=l2(coef_reg_lstm), - dropout=dropout_rate, - recurrent_dropout=rec_dropout_rate))(output) - - output = GlobalMaxPooling1D()(output) - output = Dropout(rate=dropout_rate)(output) - output = Dense(dense_size, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - output = Activation('relu')(output) - output = Dropout(rate=dropout_rate)(output) - output = Dense(self.n_classes, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - act_output = Activation(self.opt.get("last_layer_activation", "sigmoid"))(output) - model = Model(inputs=inp, outputs=act_output) - return model - - # noinspection PyUnusedLocal - def bilstm_bilstm_model(self, units_lstm_1: int, units_lstm_2: int, dense_size: int, - coef_reg_lstm: float = 0., coef_reg_den: float = 0., - dropout_rate: float = 0., rec_dropout_rate: float = 0., - input_projection_size: Optional[int] = None, **kwargs) -> Model: - """ - Build un-compiled two-layers BiLSTM. - - Args: - units_lstm_1: number of units for the first LSTM layer. - units_lstm_2: number of units for the second LSTM layer. - dense_size: number of units for dense layer. - coef_reg_lstm: l2-regularization coefficient for LSTM. Default: ``0.0``. - coef_reg_den: l2-regularization coefficient for dense layers. Default: ``0.0``. - dropout_rate: dropout rate to be used after BiLSTM and between dense layers. Default: ``0.0``. - rec_dropout_rate: dropout rate for LSTM. Default: ``0.0``. - input_projection_size: if not None, adds Dense layer (with ``relu`` activation) - right after input layer to the size ``input_projection_size``. - Useful for input dimentionaliry recuction. Default: ``None``. 
- kwargs: other non-used parameters - - Returns: - keras.models.Model: uncompiled instance of Keras Model - """ - - inp = Input(shape=(self.opt['text_size'], self.opt['embedding_size'])) - output = inp - - if input_projection_size is not None: - output = Dense(input_projection_size, activation='relu')(output) - - output = Bidirectional(LSTM(units_lstm_1, activation='tanh', - return_sequences=True, - kernel_regularizer=l2(coef_reg_lstm), - dropout=dropout_rate, - recurrent_dropout=rec_dropout_rate))(output) - - output = Dropout(rate=dropout_rate)(output) - - output = Bidirectional(LSTM(units_lstm_2, activation='tanh', - return_sequences=True, - kernel_regularizer=l2(coef_reg_lstm), - dropout=dropout_rate, - recurrent_dropout=rec_dropout_rate))(output) - - output = GlobalMaxPooling1D()(output) - output = Dropout(rate=dropout_rate)(output) - output = Dense(dense_size, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - output = Activation('relu')(output) - output = Dropout(rate=dropout_rate)(output) - output = Dense(self.n_classes, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - act_output = Activation(self.opt.get("last_layer_activation", "sigmoid"))(output) - model = Model(inputs=inp, outputs=act_output) - return model - - # noinspection PyUnusedLocal - def bilstm_cnn_model(self, units_lstm: int, kernel_sizes_cnn: List[int], filters_cnn: int, dense_size: int, - coef_reg_lstm: float = 0., coef_reg_cnn: float = 0., coef_reg_den: float = 0., - dropout_rate: float = 0., rec_dropout_rate: float = 0., - input_projection_size: Optional[int] = None, **kwargs) -> Model: - """ - Build un-compiled BiLSTM-CNN. - - Args: - units_lstm: number of units for LSTM. - kernel_sizes_cnn: list of kernel sizes of convolutions. - filters_cnn: number of filters for convolutions. - dense_size: number of units for dense layer. - coef_reg_lstm: l2-regularization coefficient for LSTM. Default: ``0.0``. - coef_reg_cnn: l2-regularization coefficient for convolutions. Default: ``0.0``. - coef_reg_den: l2-regularization coefficient for dense layers. Default: ``0.0``. - dropout_rate: dropout rate to be used after BiLSTM and between dense layers. Default: ``0.0``. - rec_dropout_rate: dropout rate for LSTM. Default: ``0.0``. - input_projection_size: if not None, adds Dense layer (with ``relu`` activation) - right after input layer to the size ``input_projection_size``. - Useful for input dimentionaliry recuction. Default: ``None``. 
- kwargs: other non-used parameters - - Returns: - keras.models.Model: uncompiled instance of Keras Model - """ - - inp = Input(shape=(self.opt['text_size'], self.opt['embedding_size'])) - output = inp - - if input_projection_size is not None: - output = Dense(input_projection_size, activation='relu')(output) - - output = Bidirectional(LSTM(units_lstm, activation='tanh', - return_sequences=True, - kernel_regularizer=l2(coef_reg_lstm), - dropout=dropout_rate, - recurrent_dropout=rec_dropout_rate))(output) - - output = Reshape(target_shape=(self.opt['text_size'], 2 * units_lstm))(output) - outputs = [] - for i in range(len(kernel_sizes_cnn)): - output_i = Conv1D(filters_cnn, - kernel_size=kernel_sizes_cnn[i], - activation=None, - kernel_regularizer=l2(coef_reg_cnn), - padding='same')(output) - output_i = BatchNormalization()(output_i) - output_i = Activation('relu')(output_i) - output_i = GlobalMaxPooling1D()(output_i) - outputs.append(output_i) - - output = concatenate(outputs, axis=1) - output = Dropout(rate=dropout_rate)(output) - output = Dense(dense_size, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - output = Activation('relu')(output) - output = Dropout(rate=dropout_rate)(output) - output = Dense(self.n_classes, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - act_output = Activation(self.opt.get("last_layer_activation", "sigmoid"))(output) - model = Model(inputs=inp, outputs=act_output) - return model - - # noinspection PyUnusedLocal - def cnn_bilstm_model(self, kernel_sizes_cnn: List[int], filters_cnn: int, units_lstm: int, dense_size: int, - coef_reg_cnn: float = 0., coef_reg_lstm: float = 0., coef_reg_den: float = 0., - dropout_rate: float = 0., rec_dropout_rate: float = 0., - input_projection_size: Optional[int] = None, **kwargs) -> Model: - """ - Build un-compiled BiLSTM-CNN. - - Args: - kernel_sizes_cnn: list of kernel sizes of convolutions. - filters_cnn: number of filters for convolutions. - units_lstm: number of units for LSTM. - dense_size: number of units for dense layer. - coef_reg_cnn: l2-regularization coefficient for convolutions. Default: ``0.0``. - coef_reg_lstm: l2-regularization coefficient for LSTM. Default: ``0.0``. - coef_reg_den: l2-regularization coefficient for dense layers. Default: ``0.0``. - dropout_rate: dropout rate to be used after BiLSTM and between dense layers. Default: ``0.0``. - rec_dropout_rate: dropout rate for LSTM. Default: ``0.0``. - input_projection_size: if not None, adds Dense layer (with ``relu`` activation) - right after input layer to the size ``input_projection_size``. - Useful for input dimentionaliry recuction. Default: ``None``. 
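# Illustrative sketch, not part of the patch: stacking convolutions on top of a
# BiLSTM, as bilstm_cnn_model above does -- the BiLSTM returns one
# 2*units_lstm vector per token, and parallel Conv1D branches with different
# kernel sizes pool over that sequence. Sizes and the function name are
# assumptions for demonstration only.
from tensorflow.keras import layers, Model

def bilstm_cnn(text_size=64, embedding_size=100, units_lstm=64,
               kernel_sizes=(3, 5), filters=128, n_classes=2):
    inp = layers.Input(shape=(text_size, embedding_size))
    seq = layers.Bidirectional(layers.LSTM(units_lstm, return_sequences=True))(inp)
    branches = [layers.GlobalMaxPooling1D()(
        layers.Conv1D(filters, k, padding="same", activation="relu")(seq))
        for k in kernel_sizes]
    x = layers.concatenate(branches)
    out = layers.Dense(n_classes, activation="sigmoid")(x)
    return Model(inputs=inp, outputs=out)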
- kwargs: other non-used parameters - - Returns: - keras.models.Model: uncompiled instance of Keras Model - """ - - inp = Input(shape=(self.opt['text_size'], self.opt['embedding_size'])) - output = inp - - if input_projection_size is not None: - output = Dense(input_projection_size, activation='relu')(output) - - outputs = [] - for i in range(len(kernel_sizes_cnn)): - output_i = Conv1D(filters_cnn, kernel_size=kernel_sizes_cnn[i], - activation=None, - kernel_regularizer=l2(coef_reg_cnn), - padding='same')(output) - output_i = BatchNormalization()(output_i) - output_i = Activation('relu')(output_i) - output_i = MaxPooling1D()(output_i) - outputs.append(output_i) - - output = concatenate(outputs, axis=-1) - output = Dropout(rate=dropout_rate)(output) - - output = Bidirectional(LSTM(units_lstm, activation='tanh', - return_sequences=True, - kernel_regularizer=l2(coef_reg_lstm), - dropout=dropout_rate, - recurrent_dropout=rec_dropout_rate))(output) - - output = GlobalMaxPooling1D()(output) - output = Dropout(rate=dropout_rate)(output) - output = Dense(dense_size, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - output = Activation('relu')(output) - output = Dropout(rate=dropout_rate)(output) - output = Dense(self.n_classes, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - act_output = Activation(self.opt.get("last_layer_activation", "sigmoid"))(output) - model = Model(inputs=inp, outputs=act_output) - return model - - # noinspection PyUnusedLocal - def bilstm_self_add_attention_model(self, units_lstm: int, dense_size: int, self_att_hid: int, self_att_out: int, - coef_reg_lstm: float = 0., coef_reg_den: float = 0., - dropout_rate: float = 0., rec_dropout_rate: float = 0., - input_projection_size: Optional[int] = None, **kwargs) -> Model: - """ - Method builds uncompiled model of BiLSTM with self additive attention. - - Args: - units_lstm: number of units for LSTM. - self_att_hid: number of hidden units in self-attention - self_att_out: number of output units in self-attention - dense_size: number of units for dense layer. - coef_reg_lstm: l2-regularization coefficient for LSTM. Default: ``0.0``. - coef_reg_den: l2-regularization coefficient for dense layers. Default: ``0.0``. - dropout_rate: dropout rate to be used after BiLSTM and between dense layers. Default: ``0.0``. - rec_dropout_rate: dropout rate for LSTM. Default: ``0.0``. - input_projection_size: if not None, adds Dense layer (with ``relu`` activation) - right after input layer to the size ``input_projection_size``. - Useful for input dimentionaliry recuction. Default: ``None``. 
- kwargs: other non-used parameters - - Returns: - keras.models.Model: uncompiled instance of Keras Model - """ - - inp = Input(shape=(self.opt['text_size'], self.opt['embedding_size'])) - output = inp - - if input_projection_size is not None: - output = Dense(input_projection_size, activation='relu')(output) - - output = Bidirectional(LSTM(units_lstm, activation='tanh', - return_sequences=True, - kernel_regularizer=l2(coef_reg_lstm), - dropout=dropout_rate, - recurrent_dropout=rec_dropout_rate))(output) - - output = MaxPooling1D(pool_size=2, strides=3)(output) - - output = additive_self_attention(output, n_hidden=self_att_hid, - n_output_features=self_att_out) - output = GlobalMaxPooling1D()(output) - output = Dropout(rate=dropout_rate)(output) - output = Dense(dense_size, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - output = Activation('relu')(output) - output = Dropout(rate=dropout_rate)(output) - output = Dense(self.n_classes, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - act_output = Activation(self.opt.get("last_layer_activation", "sigmoid"))(output) - model = Model(inputs=inp, outputs=act_output) - return model - - # noinspection PyUnusedLocal - def bilstm_self_mult_attention_model(self, units_lstm: int, dense_size: int, self_att_hid: int, self_att_out: int, - coef_reg_lstm: float = 0., coef_reg_den: float = 0., - dropout_rate: float = 0., rec_dropout_rate: float = 0., - input_projection_size: Optional[int] = None, **kwargs) -> Model: - """ - Method builds uncompiled model of BiLSTM with self multiplicative attention. - - Args: - units_lstm: number of units for LSTM. - self_att_hid: number of hidden units in self-attention - self_att_out: number of output units in self-attention - dense_size: number of units for dense layer. - coef_reg_lstm: l2-regularization coefficient for LSTM. Default: ``0.0``. - coef_reg_den: l2-regularization coefficient for dense layers. Default: ``0.0``. - dropout_rate: dropout rate to be used after BiLSTM and between dense layers. Default: ``0.0``. - rec_dropout_rate: dropout rate for LSTM. Default: ``0.0``. - input_projection_size: if not None, adds Dense layer (with ``relu`` activation) - right after input layer to the size ``input_projection_size``. - Useful for input dimentionaliry recuction. Default: ``None``. 
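# Illustrative sketch, not part of the patch: a generic multiplicative
# self-attention step over a token sequence, in the spirit of the
# additive/multiplicative self-attention layers used above. This is a
# simplified stand-in, not DeepPavlov's exact keras_layers implementation;
# shapes are assumptions for demonstration only.
import numpy as np

def multiplicative_self_attention(x, w):
    # x: (tokens, dim); w: (dim, dim) bilinear weight matrix.
    scores = x @ w @ x.T                                   # (tokens, tokens)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x                                      # attention-weighted tokens

tokens, dim = 5, 16
out = multiplicative_self_attention(np.random.randn(tokens, dim),
                                    np.random.randn(dim, dim) * 0.1)
assert out.shape == (tokens, dim)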
- kwargs: other non-used parameters - - Returns: - keras.models.Model: uncompiled instance of Keras Model - """ - - inp = Input(shape=(self.opt['text_size'], self.opt['embedding_size'])) - output = inp - - if input_projection_size is not None: - output = Dense(input_projection_size, activation='relu')(output) - - output = Bidirectional(LSTM(units_lstm, activation='tanh', - return_sequences=True, - kernel_regularizer=l2(coef_reg_lstm), - dropout=dropout_rate, - recurrent_dropout=rec_dropout_rate))(output) - - output = MaxPooling1D(pool_size=2, strides=3)(output) - - output = multiplicative_self_attention(output, n_hidden=self_att_hid, - n_output_features=self_att_out) - output = GlobalMaxPooling1D()(output) - output = Dropout(rate=dropout_rate)(output) - output = Dense(dense_size, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - output = Activation('relu')(output) - output = Dropout(rate=dropout_rate)(output) - output = Dense(self.n_classes, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - act_output = Activation(self.opt.get("last_layer_activation", "sigmoid"))(output) - model = Model(inputs=inp, outputs=act_output) - return model - - # noinspection PyUnusedLocal - def bigru_model(self, units_gru: int, dense_size: int, - coef_reg_lstm: float = 0., coef_reg_den: float = 0., - dropout_rate: float = 0., rec_dropout_rate: float = 0., - input_projection_size: Optional[int] = None, **kwargs) -> Model: - """ - Method builds uncompiled model BiGRU. - - Args: - units_gru: number of units for GRU. - dense_size: number of units for dense layer. - coef_reg_lstm: l2-regularization coefficient for GRU. Default: ``0.0``. - coef_reg_den: l2-regularization coefficient for dense layers. Default: ``0.0``. - dropout_rate: dropout rate to be used after BiGRU and between dense layers. Default: ``0.0``. - rec_dropout_rate: dropout rate for GRU. Default: ``0.0``. - input_projection_size: if not None, adds Dense layer (with ``relu`` activation) - right after input layer to the size ``input_projection_size``. - Useful for input dimentionaliry recuction. Default: ``None``. - kwargs: other non-used parameters - - Returns: - keras.models.Model: uncompiled instance of Keras Model - """ - - inp = Input(shape=(self.opt['text_size'], self.opt['embedding_size'])) - output = inp - - if input_projection_size is not None: - output = Dense(input_projection_size, activation='relu')(output) - - output = Bidirectional(GRU(units_gru, activation='tanh', - return_sequences=True, - kernel_regularizer=l2(coef_reg_lstm), - dropout=dropout_rate, - recurrent_dropout=rec_dropout_rate))(output) - - output = GlobalMaxPooling1D()(output) - output = Dropout(rate=dropout_rate)(output) - output = Dense(dense_size, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - output = Activation('relu')(output) - output = Dropout(rate=dropout_rate)(output) - output = Dense(self.n_classes, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - act_output = Activation(self.opt.get("last_layer_activation", "sigmoid"))(output) - model = Model(inputs=inp, outputs=act_output) - return model - - # noinspection PyUnusedLocal - def bigru_with_max_aver_pool_model(self, units_gru: int, dense_size: int, - coef_reg_gru: float = 0., coef_reg_den: float = 0., - dropout_rate: float = 0., rec_dropout_rate: float = 0., - **kwargs) -> Model: - """ - Method builds uncompiled model Bidirectional GRU with concatenation of max and average pooling after BiGRU. - - Args: - units_gru: number of units for GRU. 
- dense_size: number of units for dense layer. - coef_reg_gru: l2-regularization coefficient for GRU. Default: ``0.0``. - coef_reg_den: l2-regularization coefficient for dense layers. Default: ``0.0``. - dropout_rate: dropout rate to be used after BiGRU and between dense layers. Default: ``0.0``. - rec_dropout_rate: dropout rate for GRU. Default: ``0.0``. - kwargs: other non-used parameters - - Returns: - keras.models.Model: uncompiled instance of Keras Model - """ - - inp = Input(shape=(self.opt['text_size'], self.opt['embedding_size'])) - - output = Dropout(rate=dropout_rate)(inp) - - output, state1, state2 = Bidirectional(GRU(units_gru, activation='tanh', - return_sequences=True, - return_state=True, - kernel_regularizer=l2(coef_reg_gru), - dropout=dropout_rate, - recurrent_dropout=rec_dropout_rate))(output) - - output1 = GlobalMaxPooling1D()(output) - output2 = GlobalAveragePooling1D()(output) - - output = concatenate([output1, output2, state1, state2]) - - output = Dropout(rate=dropout_rate)(output) - output = Dense(dense_size, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - output = Activation('relu')(output) - output = Dropout(rate=dropout_rate)(output) - output = Dense(self.n_classes, activation=None, - kernel_regularizer=l2(coef_reg_den))(output) - act_output = Activation(self.opt.get("last_layer_activation", "sigmoid"))(output) - model = Model(inputs=inp, outputs=act_output) - return model diff --git a/deeppavlov/models/classifiers/ru_obscenity_classifier.py b/deeppavlov/models/classifiers/ru_obscenity_classifier.py deleted file mode 100644 index 6c17ae2ae8..0000000000 --- a/deeppavlov/models/classifiers/ru_obscenity_classifier.py +++ /dev/null @@ -1,144 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import json -import re -from logging import getLogger -from pathlib import Path -from typing import List, Union - -import pymorphy2 - -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.estimator import Component - -log = getLogger(__name__) - - -@register("ru_obscenity_classifier") -class RuObscenityClassifier(Component): - """Rule-Based model that decides whether the sentence is obscene or not, - for Russian language - - Args: - data_path: a directory where the required files are stored. 
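# Illustrative sketch, not part of the patch: pooling a bidirectional GRU by
# concatenating global max pooling, global average pooling and the two final
# hidden states, as bigru_with_max_aver_pool_model above does. Sizes and the
# function name are assumptions for demonstration only.
from tensorflow.keras import layers, Model

def bigru_max_avg_pool(text_size=64, embedding_size=100, units_gru=128, n_classes=2):
    inp = layers.Input(shape=(text_size, embedding_size))
    # return_state=True yields the sequence plus the final forward/backward states.
    seq, state_fwd, state_bwd = layers.Bidirectional(
        layers.GRU(units_gru, return_sequences=True, return_state=True))(inp)
    pooled = layers.concatenate([
        layers.GlobalMaxPooling1D()(seq),
        layers.GlobalAveragePooling1D()(seq),
        state_fwd,
        state_bwd,
    ])
    out = layers.Dense(n_classes, activation="sigmoid")(pooled)
    return Model(inputs=inp, outputs=out)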
- next files are required: - -'obscenity_words.json' — file that stores list of obscenity words - -'obscenity_words_exception.json' — file that stores list of not obscenity words, - but which are detects by algorithm as obscenity(for fixing this situation) - -'obscenity_words_extended.json' — file that stores list of obscenity words, - in which user can add additional obscenity words - - Attributes: - obscenity_words: list of russian obscenity words - obscenity_words_extended: list of russian obscenity words - obscenity_words_exception: list of words on that model makes mistake that they are obscene - regexp: reg exp that finds various obscene words - regexp2: reg exp that finds various obscene words - morph: pymorphy2.MorphAnalyzer object - word_pattern: reg exp that finds words in text - """ - - def _get_patterns(self): - PATTERN_1 = r''.join(( - r'\w{0,5}[хx]([хx\s\!@#\$%\^&*+-\|\/]{0,6})', - r'[уy]([уy\s\!@#\$%\^&*+-\|\/]{0,6})[ёiлeеюийя]\w{0,7}|\w{0,6}[пp]', - r'([пp\s\!@#\$%\^&*+-\|\/]{0,6})[iие]([iие\s\!@#\$%\^&*+-\|\/]{0,6})', - r'[3зс]([3зс\s\!@#\$%\^&*+-\|\/]{0,6})[дd]\w{0,10}|[сcs][уy]', - r'([уy\!@#\$%\^&*+-\|\/]{0,6})[4чkк]\w{1,3}|\w{0,4}[bб]', - r'([bб\s\!@#\$%\^&*+-\|\/]{0,6})[lл]([lл\s\!@#\$%\^&*+-\|\/]{0,6})', - r'[yя]\w{0,10}|\w{0,8}[её][bб][лске@eыиаa][наи@йвл]\w{0,8}|\w{0,4}[еe]', - r'([еe\s\!@#\$%\^&*+-\|\/]{0,6})[бb]([бb\s\!@#\$%\^&*+-\|\/]{0,6})', - r'[uу]([uу\s\!@#\$%\^&*+-\|\/]{0,6})[н4ч]\w{0,4}|\w{0,4}[еeё]', - r'([еeё\s\!@#\$%\^&*+-\|\/]{0,6})[бb]([бb\s\!@#\$%\^&*+-\|\/]{0,6})', - r'[нn]([нn\s\!@#\$%\^&*+-\|\/]{0,6})[уy]\w{0,4}|\w{0,4}[еe]', - r'([еe\s\!@#\$%\^&*+-\|\/]{0,6})[бb]([бb\s\!@#\$%\^&*+-\|\/]{0,6})', - r'[оoаa@]([оoаa@\s\!@#\$%\^&*+-\|\/]{0,6})[тnнt]\w{0,4}|\w{0,10}[ё]', - r'([ё\!@#\$%\^&*+-\|\/]{0,6})[б]\w{0,6}|\w{0,4}[pп]', - r'([pп\s\!@#\$%\^&*+-\|\/]{0,6})[иeеi]([иeеi\s\!@#\$%\^&*+-\|\/]{0,6})', - r'[дd]([дd\s\!@#\$%\^&*+-\|\/]{0,6})[oоаa@еeиi]', - r'([oоаa@еeиi\s\!@#\$%\^&*+-\|\/]{0,6})[рr]\w{0,12}', - )) - - PATTERN_2 = r'|'.join(( - r"(\b[сs]{1}[сsц]{0,1}[uуy](?:[ч4]{0,1}[иаakк][^ц])\w*\b)", - r"(\b(?!пло|стра|[тл]и)(\w(?!(у|пло)))*[хx][уy](й|йа|[еeё]|и|я|ли|ю)(?!га)\w*\b)", - r"(\b(п[oо]|[нз][аa])*[хx][eе][рp]\w*\b)", - r"(\b[мm][уy][дd]([аa][кk]|[oо]|и)\w*\b)", - r"(\b\w*д[рp](?:[oо][ч4]|[аa][ч4])(?!л)\w*\b)", - r"(\b(?!(?:кило)?[тм]ет)(?!смо)[а-яa-z]*(? 
None: - log.info(f"Initializing `{self.__class__.__name__}`") - - data_path = expand_path(data_path) - with open(data_path / 'obscenity_words.json', encoding="utf-8") as f: - self.obscenity_words = set(json.load(f)) - with open(data_path / 'obscenity_words_exception.json', encoding="utf-8") as f: - self.obscenity_words_exception = set(json.load(f)) - if (data_path / 'obscenity_words_extended.json').exists(): - with open(data_path / 'obscenity_words_extended.json', encoding="utf-8") as f: - self.obscenity_words_extended = set(json.load(f)) - self.obscenity_words.update(self.obscenity_words_extended) - - PATTERN_1, PATTERN_2 = self._get_patterns() - self.regexp = re.compile(PATTERN_1, re.U | re.I) - self.regexp2 = re.compile(PATTERN_2, re.U | re.I) - self.morph = pymorphy2.MorphAnalyzer() - self.word_pattern = re.compile(r'[А-яЁё]+') - - def _check_obscenity(self, text: str) -> bool: - for word in self.word_pattern.findall(text): - if len(word) < 3: - continue - word = word.lower() - word.replace('ё', 'е') - normal_word = self.morph.parse(word)[0].normal_form - if normal_word in self.obscenity_words_exception \ - or word in self.obscenity_words_exception: - continue - if normal_word in self.obscenity_words \ - or word in self.obscenity_words \ - or bool(self.regexp.findall(normal_word)) \ - or bool(self.regexp.findall(word)) \ - or bool(self.regexp2.findall(normal_word)) \ - or bool(self.regexp2.findall(word)): - return True - return False - - def __call__(self, texts: List[str]) -> List[bool]: - """It decides whether text is obscene or not - - Args: - texts: list of texts, for which it needs to decide they are obscene or not - - Returns: - list of bool: True is for obscene text, False is for not obscene text - """ - decisions = list(map(self._check_obscenity, texts)) - return decisions diff --git a/deeppavlov/models/doc_retrieval/pop_ranker.py b/deeppavlov/models/doc_retrieval/pop_ranker.py index 280805dc48..f27938e811 100644 --- a/deeppavlov/models/doc_retrieval/pop_ranker.py +++ b/deeppavlov/models/doc_retrieval/pop_ranker.py @@ -17,7 +17,7 @@ from typing import List, Any, Tuple import numpy as np -from sklearn.externals import joblib +import joblib from deeppavlov.core.commands.utils import expand_path from deeppavlov.core.common.file import read_json diff --git a/deeppavlov/models/elmo/__init__.py b/deeppavlov/models/elmo/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/elmo/bilm_model.py b/deeppavlov/models/elmo/bilm_model.py deleted file mode 100644 index cc7eacb8b0..0000000000 --- a/deeppavlov/models/elmo/bilm_model.py +++ /dev/null @@ -1,510 +0,0 @@ -# originally based on https://github.com/allenai/bilm-tf/blob/master/bilm/training.py - -# Modifications copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
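# Illustrative sketch, not part of the patch: the core of a rule-based
# obscenity check like the one removed above -- lemmatize each word with
# pymorphy2, skip known exceptions, then test vocabulary membership with a
# regular-expression fallback. The tiny word sets and the fallback pattern are
# placeholders, not the project's actual resources.
import re
import pymorphy2

morph = pymorphy2.MorphAnalyzer()
obscene_lemmas = {"..."}          # placeholder for obscenity_words.json
exceptions = {"..."}              # placeholder for obscenity_words_exception.json
fallback_re = re.compile(r"$^")   # placeholder for the PATTERN_1 / PATTERN_2 regexps
word_re = re.compile(r"[А-яЁё]+")

def is_obscene(text: str) -> bool:
    for word in word_re.findall(text):
        word = word.lower().replace("ё", "е")
        lemma = morph.parse(word)[0].normal_form
        if lemma in exceptions or word in exceptions:
            continue
        if lemma in obscene_lemmas or word in obscene_lemmas \
                or fallback_re.search(lemma) or fallback_re.search(word):
            return True
    return False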
- -import numpy as np -import tensorflow as tf - -DTYPE = 'float32' -DTYPE_INT = 'int64' - -tf.logging.set_verbosity(tf.logging.INFO) - - -class LanguageModel(object): - """ - A class to build the tensorflow computational graph for NLMs - - All hyperparameters and model configuration is specified in a dictionary - of 'options'. - - is_training is a boolean used to control behavior of dropout layers - and softmax. Set to False for testing. - - The LSTM cell is controlled by the 'lstm' key in options - Here is an example: - - 'lstm': { - 'cell_clip': 5, - 'dim': 4096, - 'n_layers': 2, - 'proj_clip': 5, - 'projection_dim': 512, - 'use_skip_connections': True}, - - 'projection_dim' is assumed token embedding size and LSTM output size. - 'dim' is the hidden state size. - Set 'dim' == 'projection_dim' to skip a projection layer. - """ - - def __init__(self, options, is_training): - self.options = options - self.is_training = is_training - self.bidirectional = options.get('bidirectional', False) - - # use word or char inputs? - self.char_inputs = 'char_cnn' in self.options - - # for the loss function - self.share_embedding_softmax = options.get( - 'share_embedding_softmax', False) - if self.char_inputs and self.share_embedding_softmax: - raise ValueError("Sharing softmax and embedding weights requires " - "word input") - - self.sample_softmax = options.get('sample_softmax', True) - - self._build() - - def _build_word_embeddings(self): - n_tokens_vocab = self.options['n_tokens_vocab'] - batch_size = self.options['batch_size'] - unroll_steps = self.options['unroll_steps'] - - # LSTM options - projection_dim = self.options['lstm']['projection_dim'] - - # the input token_ids and word embeddings - self.token_ids = tf.placeholder(DTYPE_INT, - shape=(batch_size, unroll_steps), - name='token_ids') - # the word embeddings - with tf.device("/cpu:0"): - self.embedding_weights = tf.get_variable( - "embedding", [n_tokens_vocab, projection_dim], - dtype=DTYPE, - ) - self.embedding = tf.nn.embedding_lookup(self.embedding_weights, - self.token_ids) - - # if a bidirectional LM then make placeholders for reverse - # model and embeddings - if self.bidirectional: - self.token_ids_reverse = tf.placeholder(DTYPE_INT, - shape=(batch_size, unroll_steps), - name='token_ids_reverse') - with tf.device("/cpu:0"): - self.embedding_reverse = tf.nn.embedding_lookup( - self.embedding_weights, self.token_ids_reverse) - - def _build_word_char_embeddings(self): - """ - options contains key 'char_cnn': { - - 'n_characters': 262, - - # includes the start / end characters - 'max_characters_per_token': 50, - - 'filters': [ - [1, 32], - [2, 32], - [3, 64], - [4, 128], - [5, 256], - [6, 512], - [7, 512] - ], - 'activation': 'tanh', - - # for the character embedding - 'embedding': {'dim': 16} - - # for highway layers - # if omitted, then no highway layers - 'n_highway': 2, - } - """ - batch_size = self.options['batch_size'] - unroll_steps = self.options['unroll_steps'] - projection_dim = self.options['lstm']['projection_dim'] - - cnn_options = self.options['char_cnn'] - filters = cnn_options['filters'] - n_filters = sum(f[1] for f in filters) - max_chars = cnn_options['max_characters_per_token'] - char_embed_dim = cnn_options['embedding']['dim'] - n_chars = cnn_options['n_characters'] - if n_chars != 261: - raise Exception("Set n_characters=261 for training see a \ - https://github.com/allenai/bilm-tf/blob/master/README.md") - if cnn_options['activation'] == 'tanh': - activation = tf.nn.tanh - elif cnn_options['activation'] == 'relu': - 
activation = tf.nn.relu - - # the input character ids - self.tokens_characters = tf.placeholder(DTYPE_INT, - shape=(batch_size, unroll_steps, max_chars), - name='tokens_characters') - # the character embeddings - with tf.device("/cpu:0"): - self.embedding_weights = tf.get_variable("char_embed", [n_chars, char_embed_dim], - dtype=DTYPE, - initializer=tf.random_uniform_initializer(-1.0, 1.0)) - # shape (batch_size, unroll_steps, max_chars, embed_dim) - self.char_embedding = tf.nn.embedding_lookup(self.embedding_weights, - self.tokens_characters) - - if self.bidirectional: - self.tokens_characters_reverse = tf.placeholder(DTYPE_INT, - shape=(batch_size, unroll_steps, max_chars), - name='tokens_characters_reverse') - self.char_embedding_reverse = tf.nn.embedding_lookup( - self.embedding_weights, self.tokens_characters_reverse) - - # the convolutions - def make_convolutions(inp, reuse): - with tf.variable_scope('CNN', reuse=reuse): - convolutions = [] - for i, (width, num) in enumerate(filters): - if cnn_options['activation'] == 'relu': - # He initialization for ReLU activation - # with char embeddings init between -1 and 1 - # w_init = tf.random_normal_initializer( - # mean=0.0, - # stddev=np.sqrt(2.0 / (width * char_embed_dim)) - # ) - - # Kim et al 2015, +/- 0.05 - w_init = tf.random_uniform_initializer( - minval=-0.05, maxval=0.05) - elif cnn_options['activation'] == 'tanh': - # glorot init - w_init = tf.random_normal_initializer( - mean=0.0, - stddev=np.sqrt(1.0 / (width * char_embed_dim)) - ) - w = tf.get_variable( - "W_cnn_%s" % i, - [1, width, char_embed_dim, num], - initializer=w_init, - dtype=DTYPE) - b = tf.get_variable( - "b_cnn_%s" % i, [num], dtype=DTYPE, - initializer=tf.constant_initializer(0.0)) - - conv = tf.nn.conv2d(inp, w, - strides=[1, 1, 1, 1], - padding="VALID") + b - # now max pool - conv = tf.nn.max_pool(conv, [1, 1, max_chars - width + 1, 1], - [1, 1, 1, 1], 'VALID') - - # activation - conv = activation(conv) - conv = tf.squeeze(conv, squeeze_dims=[2]) - - convolutions.append(conv) - - return tf.concat(convolutions, 2) - - # for first model, this is False, for others it's True - reuse = tf.get_variable_scope().reuse - embedding = make_convolutions(self.char_embedding, reuse) - - self.token_embedding_layers = [embedding] - - if self.bidirectional: - # re-use the CNN weights from forward pass - embedding_reverse = make_convolutions( - self.char_embedding_reverse, True) - - # for highway and projection layers: - # reshape from (batch_size, n_tokens, dim) to - n_highway = cnn_options.get('n_highway') - use_highway = n_highway is not None and n_highway > 0 - use_proj = n_filters != projection_dim - - if use_highway or use_proj: - embedding = tf.reshape(embedding, [-1, n_filters]) - if self.bidirectional: - embedding_reverse = tf.reshape(embedding_reverse, - [-1, n_filters]) - - # set up weights for projection - if use_proj: - assert n_filters > projection_dim - with tf.variable_scope('CNN_proj'): - W_proj_cnn = tf.get_variable( - "W_proj", [n_filters, projection_dim], - initializer=tf.random_normal_initializer( - mean=0.0, stddev=np.sqrt(1.0 / n_filters)), - dtype=DTYPE) - b_proj_cnn = tf.get_variable( - "b_proj", [projection_dim], - initializer=tf.constant_initializer(0.0), - dtype=DTYPE) - - # apply highways layers - def high(x, ww_carry, bb_carry, ww_tr, bb_tr): - carry_gate = tf.nn.sigmoid(tf.matmul(x, ww_carry) + bb_carry) - transform_gate = tf.nn.relu(tf.matmul(x, ww_tr) + bb_tr) - return carry_gate * transform_gate + (1.0 - carry_gate) * x - - if use_highway: - 
highway_dim = n_filters - - for i in range(n_highway): - with tf.variable_scope('CNN_high_%s' % i): - W_carry = tf.get_variable( - 'W_carry', [highway_dim, highway_dim], - # glorit init - initializer=tf.random_normal_initializer( - mean=0.0, stddev=np.sqrt(1.0 / highway_dim)), - dtype=DTYPE) - b_carry = tf.get_variable( - 'b_carry', [highway_dim], - initializer=tf.constant_initializer(-2.0), - dtype=DTYPE) - W_transform = tf.get_variable( - 'W_transform', [highway_dim, highway_dim], - initializer=tf.random_normal_initializer( - mean=0.0, stddev=np.sqrt(1.0 / highway_dim)), - dtype=DTYPE) - b_transform = tf.get_variable( - 'b_transform', [highway_dim], - initializer=tf.constant_initializer(0.0), - dtype=DTYPE) - - embedding = high(embedding, W_carry, b_carry, - W_transform, b_transform) - if self.bidirectional: - embedding_reverse = high(embedding_reverse, - W_carry, b_carry, - W_transform, b_transform) - self.token_embedding_layers.append(tf.reshape(embedding, - [batch_size, unroll_steps, highway_dim])) - - # finally project down to projection dim if needed - if use_proj: - embedding = tf.matmul(embedding, W_proj_cnn) + b_proj_cnn - if self.bidirectional: - embedding_reverse = tf.matmul(embedding_reverse, W_proj_cnn) \ - + b_proj_cnn - self.token_embedding_layers.append( - tf.reshape(embedding, [batch_size, unroll_steps, projection_dim]) - ) - - # reshape back to (batch_size, tokens, dim) - if use_highway or use_proj: - shp = [batch_size, unroll_steps, projection_dim] - embedding = tf.reshape(embedding, shp) - if self.bidirectional: - embedding_reverse = tf.reshape(embedding_reverse, shp) - - # at last assign attributes for remainder of the model - self.embedding = embedding - if self.bidirectional: - self.embedding_reverse = embedding_reverse - - def _build(self): - # size of input options - batch_size = self.options['batch_size'] - - # LSTM options - lstm_dim = self.options['lstm']['dim'] - projection_dim = self.options['lstm']['projection_dim'] - n_lstm_layers = self.options['lstm'].get('n_layers', 1) - dropout = self.options['dropout'] - keep_prob = 1.0 - dropout - - if self.char_inputs: - self._build_word_char_embeddings() - else: - self._build_word_embeddings() - - # now the LSTMs - # these will collect the initial states for the forward - # (and reverse LSTMs if we are doing bidirectional) - self.init_lstm_state = [] - self.final_lstm_state = [] - - # get the LSTM inputs - if self.bidirectional: - lstm_inputs = [self.embedding, self.embedding_reverse] - else: - lstm_inputs = [self.embedding] - - # now compute the LSTM outputs - cell_clip = self.options['lstm'].get('cell_clip') - proj_clip = self.options['lstm'].get('proj_clip') - - use_skip_connections = self.options['lstm'].get('use_skip_connections') - - lstm_outputs = [] - for lstm_num, lstm_input in enumerate(lstm_inputs): - lstm_cells = [] - for i in range(n_lstm_layers): - if projection_dim < lstm_dim: - # are projecting down output - lstm_cell = tf.nn.rnn_cell.LSTMCell( - lstm_dim, num_proj=projection_dim, - cell_clip=cell_clip, proj_clip=proj_clip) - else: - lstm_cell = tf.nn.rnn_cell.LSTMCell( - lstm_dim, - cell_clip=cell_clip, proj_clip=proj_clip) - - if use_skip_connections: - # ResidualWrapper adds inputs to outputs - if i == 0: - # don't add skip connection from token embedding to - # 1st layer output - pass - else: - # add a skip connection - lstm_cell = tf.nn.rnn_cell.ResidualWrapper(lstm_cell) - - # add dropout - if self.is_training: - lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell, - input_keep_prob=keep_prob) 
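# Illustrative sketch, not part of the patch: the highway layer applied on top
# of the char-CNN above, y = g * relu(x @ W_t + b_t) + (1 - g) * x with
# g = sigmoid(x @ W_c + b_c). Written in NumPy so it stands alone; shapes and
# the random weights are assumptions for demonstration only.
import numpy as np

def highway(x, w_carry, b_carry, w_transform, b_transform):
    carry_gate = 1.0 / (1.0 + np.exp(-(x @ w_carry + b_carry)))
    transform = np.maximum(x @ w_transform + b_transform, 0.0)
    return carry_gate * transform + (1.0 - carry_gate) * x

dim = 8
x = np.random.randn(4, dim)
# b_carry starts negative (here -2.0, as in the initializer above), so the gate
# is close to 0 at first and the layer initially behaves like the identity.
y = highway(x, np.random.randn(dim, dim) * 0.1, np.full(dim, -2.0),
            np.random.randn(dim, dim) * 0.1, np.zeros(dim))
assert y.shape == x.shape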
- - lstm_cells.append(lstm_cell) - - if n_lstm_layers > 1: - lstm_cell = tf.nn.rnn_cell.MultiRNNCell(lstm_cells) - else: - lstm_cell = lstm_cells[0] - - with tf.control_dependencies([lstm_input]): - self.init_lstm_state.append( - lstm_cell.zero_state(batch_size, DTYPE)) - # NOTE: this variable scope is for backward compatibility - # with existing models... - if self.bidirectional: - with tf.variable_scope('RNN_%s' % lstm_num): - _lstm_output_unpacked, final_state = tf.nn.static_rnn( - lstm_cell, - tf.unstack(lstm_input, axis=1), - initial_state=self.init_lstm_state[-1]) - else: - _lstm_output_unpacked, final_state = tf.nn.static_rnn( - lstm_cell, - tf.unstack(lstm_input, axis=1), - initial_state=self.init_lstm_state[-1]) - self.final_lstm_state.append(final_state) - - # (batch_size * unroll_steps, 512) - lstm_output_flat = tf.reshape( - tf.stack(_lstm_output_unpacked, axis=1), [-1, projection_dim]) - if self.is_training: - # add dropout to output - lstm_output_flat = tf.nn.dropout(lstm_output_flat, keep_prob) - tf.add_to_collection('lstm_output_embeddings', _lstm_output_unpacked) - - lstm_outputs.append(lstm_output_flat) - - self._build_loss(lstm_outputs) - - def _build_loss(self, lstm_outputs): - """ - Create: - self.total_loss: total loss op for training - self.softmax_W, softmax_b: the softmax variables - self.next_token_id / _reverse: placeholders for gold input - - """ - batch_size = self.options['batch_size'] - unroll_steps = self.options['unroll_steps'] - - n_tokens_vocab = self.options['n_tokens_vocab'] - - # DEFINE next_token_id and *_reverse placeholders for the gold input - def _get_next_token_placeholders(suffix): - name = 'next_token_id' + suffix - id_placeholder = tf.placeholder(DTYPE_INT, - shape=(batch_size, unroll_steps), - name=name) - return id_placeholder - - # get the window and weight placeholders - self.next_token_id = _get_next_token_placeholders('') - if self.bidirectional: - self.next_token_id_reverse = _get_next_token_placeholders( - '_reverse') - - # DEFINE THE SOFTMAX VARIABLES - # get the dimension of the softmax weights - # softmax dimension is the size of the output projection_dim - softmax_dim = self.options['lstm']['projection_dim'] - - # the output softmax variables -- they are shared if bidirectional - if self.share_embedding_softmax: - # softmax_W is just the embedding layer - self.softmax_W = self.embedding_weights - - with tf.variable_scope('softmax'), tf.device('/cpu:0'): - # Glorit init (std=(1.0 / sqrt(fan_in)) - softmax_init = tf.random_normal_initializer(0.0, 1.0 / np.sqrt(softmax_dim)) - if not self.share_embedding_softmax: - self.softmax_W = tf.get_variable( - 'W', [n_tokens_vocab, softmax_dim], - dtype=DTYPE, - initializer=softmax_init - ) - self.softmax_b = tf.get_variable( - 'b', [n_tokens_vocab], - dtype=DTYPE, - initializer=tf.constant_initializer(0.0)) - - # now calculate losses - # loss for each direction of the LSTM - self.individual_train_losses = [] - self.individual_eval_losses = [] - - if self.bidirectional: - next_ids = [self.next_token_id, self.next_token_id_reverse] - else: - next_ids = [self.next_token_id] - - for id_placeholder, lstm_output_flat in zip(next_ids, lstm_outputs): - # flatten the LSTM output and next token id gold to shape: - # (batch_size * unroll_steps, softmax_dim) - # Flatten and reshape the token_id placeholders - next_token_id_flat = tf.reshape(id_placeholder, [-1, 1]) - - with tf.control_dependencies([lstm_output_flat]): - sampled_losses = tf.nn.sampled_softmax_loss(self.softmax_W, self.softmax_b, - 
next_token_id_flat, lstm_output_flat, - self.options['n_negative_samples_batch'], - self.options['n_tokens_vocab'], - num_true=1) - - # get the full softmax loss - output_scores = tf.matmul( - lstm_output_flat, - tf.transpose(self.softmax_W) - ) + self.softmax_b - # NOTE: tf.nn.sparse_softmax_cross_entropy_with_logits - # expects unnormalized output since it performs the - # softmax internally - losses = tf.nn.sparse_softmax_cross_entropy_with_logits( - logits=output_scores, - labels=tf.squeeze(next_token_id_flat, squeeze_dims=[1]) - ) - sampled_losses = tf.reshape(sampled_losses, [self.options['batch_size'], -1]) - losses = tf.reshape(losses, [self.options['batch_size'], -1]) - self.individual_train_losses.append(tf.reduce_mean(sampled_losses, axis=1)) - self.individual_eval_losses.append(tf.reduce_mean(losses, axis=1)) - - # now make the total loss -- it's the train of the individual losses - if self.bidirectional: - self.total_train_loss = 0.5 * (self.individual_train_losses[0] + self.individual_train_losses[1]) - self.total_eval_loss = 0.5 * (self.individual_eval_losses[0] + self.individual_eval_losses[1]) - else: - self.total_train_loss = self.individual_train_losses[0] - self.total_eval_loss = self.individual_eval_losses[0] diff --git a/deeppavlov/models/elmo/elmo.py b/deeppavlov/models/elmo/elmo.py deleted file mode 100644 index f197ae7c15..0000000000 --- a/deeppavlov/models/elmo/elmo.py +++ /dev/null @@ -1,601 +0,0 @@ -# originally based on https://github.com/allenai/bilm-tf/blob/master/bilm/training.py - -# Modifications copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import copy -import json -from logging import getLogger -from typing import Optional, List - -import numpy as np -import tensorflow as tf -from overrides import overrides - -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.nn_model import NNModel -from deeppavlov.models.elmo.bilm_model import LanguageModel -from deeppavlov.models.elmo.elmo2tfhub import export2hub -from deeppavlov.models.elmo.train_utils import average_gradients, clip_grads, safely_str2int, dump_weights - -log = getLogger(__name__) - - -@register('elmo_model') -class ELMo(NNModel): - """ - The :class:`~deeppavlov.models.elmo.elmo.ELMo` is a deep contextualized word representation that models both - complex characteristics of word use (e.g., syntax and semantics), and how these uses vary across linguistic - contexts (i.e., to model polysemy). - - You can use this component for LM training, fine tuning, dumping ELMo to a hdf5 file and wrapping it to - the tensorflow hub. - - - Parameters: - options_json_path: Path to the json configure. - char_cnn: Options of char_cnn. 
For example {"activation":"relu","embedding":{"dim":16}, - "filters":[[1,32],[2,32],[3,64],[4,128],[5,256],[6,512],[7,1024]],"max_characters_per_token":50, - "n_characters":261,"n_highway":2} - bidirectional: Whether to use bidirectional or not. - unroll_steps: Number of unrolling steps. - n_tokens_vocab: A size of a vocabulary. - lstm: Options of lstm. It is a dict of "cell_clip":int, "dim":int, "n_layers":int, "proj_clip":int, - "projection_dim":int, "use_skip_connections":bool - dropout: Probability of keeping the network state, values from 0 to 1. - n_negative_samples_batch: Whether to use negative samples batch or not. Number of batch samples. - all_clip_norm_val: Clip the gradients. - initial_accumulator_value: Whether to use dropout between layers or not. - learning_rate: Learning rate to use during the training (usually from 0.1 to 0.0001) - n_gpus: Number of gpu to use. - seed: Random seed. - batch_size: A size of a train batch. - load_epoch_num: An index of loading epoch. - epoch_load_path: An epoch loading path relative to save_path. - epoch_save_path: An epoch saving path relative to save_path. - If epoch_save_path is None then epoch_save_path = epoch_load_path. - dumps_save_path: A dump saving path relative to save_path. - tf_hub_save_path: A tf_hub saving path relative to save_path. - - To train ELMo representations from a paper `Deep contextualized word representations - `__ you can use multiple GPUs by set ``n_gpus`` parameter. - - You can explicitly specify the path to a json file with hyperparameters of ELMo used to train by - ``options_json_path`` parameter. - The json file must be the same as the json file from `original ELMo implementation - `__. You can define the architecture using the separate parameters. - - Saving the model will take place in directories with some structure, see below example: - - {MODELS_PATH}/ - elmo_model/ - saves/ - epochs/ - 1/, 2/, .... # directories of epochs - dumps/ - weights_epoch_n_1.hdf5, weights_epoch_n_2.hdf5, .... # hdf5 files of dumped ELMo weights - hubs/ - tf_hub_model_epoch_n_1/, tf_hub_model_epoch_n_2/, .... # directories of tensorflow hub wrapped - ELMo - - Intermediate checkpoints saved to `saves` directory. - To specify load/save paths use ``load_epoch_num``, ``epoch_load_path``, ``epoch_save_path``, ``dumps_save_path``, - ``tf_hub_save_path``. - - Dumping and tf_hub wrapping of ELMo occurs after each epoch. - - For learning the LM model dataset like 1 Billion Word Benchmark dataset is needed. - Examples of how datasets should look like you can learn from the configs of the examples below. - - Vocabulary file is a text file, with one token per line, separated by newlines. - Each token in the vocabulary is cached as the appropriate 50 character id sequence once. - It is recommended to always include the special and tokens (case sensitive) in the vocabulary file. - - For fine-tuning of LM on specific data, it is enough to save base model to path - ``{MODELS_PATH}/elmo_model/saves/epochs/0/`` and start training. - - Also for fine-tuning of LM on specific data, you can use pre-trained model for russian language on different - datasets. - - - LM model pre-trained on `ru-news` dataset ( lines = 63M, tokens = 946M, size = 12GB ), model is available by - :config:`elmo_lm_ready4fine_tuning_ru_news ` configuration file - or :config:`elmo_lm_ready4fine_tuning_ru_news_simple ` - configuration file. 
- - LM model pre-trained on `ru-twitter` dataset ( lines = 104M, tokens = 810M, size = 8.5GB ), model is available by - :config:`elmo_lm_ready4fine_tuning_ru_twitter ` configuration file - or :config:`elmo_lm_ready4fine_tuning_ru_twitter_simple ` - configuration file. - - LM model pre-trained on `ru-wiki` dataset ( lines = 1M, tokens = 386M, size = 5GB ), model is available by - :config:`elmo_lm_ready4fine_tuning_ru_wiki ` configuration file - or :config:`elmo_lm_ready4fine_tuning_ru_wiki_simple ` - configuration file. - - `simple` configuration file is a configuration of a model without special tags of output - vocab used for first training. - - .. note:: - - You need to download about **4 GB** also by default about **32 GB** of RAM and **10 GB** of GPU memory - required to running the :config:`elmo_lm_ready4fine_tuning_ru_* ` - on one GPU. - - After training you can use ``{MODELS_PATH}/elmo_model/saves/hubs/tf_hub_model_epoch_n_*/`` - as a ``ModuleSpec`` by using `TensorFlow Hub `__ or by - DeepPavlov :class:`~deeppavlov.models.embedders.elmo_embedder.ELMoEmbedder`. - - More about the ELMo model you can get from `original ELMo implementation - `__. - - - If some required packages are missing, install all the requirements by running in command line: - - .. code:: bash - - python -m deeppavlov install - - where ```` is a path to one of the :config:`provided config files ` - or its name without an extension, for example : - - .. code:: bash - - python -m deeppavlov install elmo_1b_benchmark_test - - Examples: - For a quick start, you can run test training of the test model on small data by this command from bash: - - .. code:: bash - - python -m deeppavlov train deeppavlov/configs/elmo/elmo_1b_benchmark_test.json -d - - To download the prepared `1 Billion Word Benchmark dataset `__ and - start a training model use this command from bash: - - .. note:: - - You need to download about **2 GB** also by default about **10 GB** of RAM and **10 GB** of GPU memory - required to running :config:`elmo_1b_benchmark ` on one GPU. - - .. code:: bash - - python -m deeppavlov train deeppavlov/configs/elmo/elmo_1b_benchmark.json -d - - To fine-tune ELMo as LM model on `1 Billion Word Benchmark dataset `__ - use commands from bash : - - .. 
code:: bash - - # download the prepared 1 Billion Word Benchmark dataset - python -m deeppavlov download deeppavlov/configs/elmo/elmo_1b_benchmark.json - # copy model checkpoint, network configuration, vocabulary of pre-trained LM model - mkdir -p ${MODELS_PATH}/elmo-1b-benchmark/saves/epochs/0 - cp my_ckpt.data-00000-of-00001 ${MODELS_PATH}/elmo-1b-benchmark/saves/epochs/0/model.data-00000-of-00001 - cp my_ckpt.index ${MODELS_PATH}/elmo-1b-benchmark/saves/epochs/0/model.index - cp my_ckpt.meta ${MODELS_PATH}/elmo-1b-benchmark/saves/epochs/0/model.meta - cp checkpoint ${MODELS_PATH}/elmo-1b-benchmark/saves/epochs/0/checkpoint - cp my_options.json ${MODELS_PATH}/elmo-1b-benchmark/options.json - cp my_vocab {MODELS_PATH}/elmo-1b-benchmark/vocab-2016-09-10.txt - # start a fine-tuning - python -m deeppavlov train deeppavlov/configs/elmo/elmo_1b_benchmark.json - - After training you can use the ELMo model from tf_hub wrapper by - `TensorFlow Hub `__ or by - DeepPavlov :class:`~deeppavlov.models.embedders.elmo_embedder.ELMoEmbedder`: - - >>> from deeppavlov.models.embedders.elmo_embedder import ELMoEmbedder - >>> spec = f"{MODELS_PATH}/elmo-1b-benchmark_test/saves/hubs/tf_hub_model_epoch_n_1/" - >>> elmo = ELMoEmbedder(spec) - >>> elmo([['вопрос', 'жизни', 'Вселенной', 'и', 'вообще', 'всего'], ['42']]) - array([[ 0.00719104, 0.08544601, -0.07179783, ..., 0.10879009, - -0.18630421, -0.2189409 ], - [ 0.16325025, -0.04736076, 0.12354863, ..., -0.1889013 , - 0.04972512, 0.83029324]], dtype=float32) - - """ - - def __init__(self, - options_json_path: Optional[str] = None, # Configure by json file - char_cnn: Optional[dict] = None, # Net architecture by direct params, use for overwrite a json arch. - bidirectional: Optional[bool] = None, - unroll_steps: Optional[int] = None, - n_tokens_vocab: Optional[int] = None, - lstm: Optional[dict] = None, - dropout: Optional[float] = None, # Regularization - n_negative_samples_batch: Optional[int] = None, # Train options - all_clip_norm_val: Optional[float] = None, - initial_accumulator_value: float = 1.0, - learning_rate: float = 2e-1, # For AdagradOptimizer - n_gpus: int = 1, # TODO: Add cpu supporting - seed: Optional[int] = None, # Other - batch_size: int = 128, # Data params - load_epoch_num: Optional[int] = None, - epoch_load_path: str = 'epochs', - epoch_save_path: Optional[str] = None, - dumps_save_path: str = 'dumps', - tf_hub_save_path: str = 'hubs', - **kwargs) -> None: - - # ================ Checking input args ================= - if not (options_json_path or (char_cnn and bidirectional and unroll_steps - and n_tokens_vocab and lstm and dropout and - n_negative_samples_batch and all_clip_norm_val - )): - raise Warning('Use options_json_path or/and direct params to set net architecture.') - self.options = self._load_options(options_json_path) - self._update_arch_options(char_cnn, bidirectional, unroll_steps, n_tokens_vocab, lstm) - self._update_other_options(dropout, n_negative_samples_batch, all_clip_norm_val) - - # Special options - self.options['learning_rate'] = learning_rate - self.options['initial_accumulator_value'] = initial_accumulator_value - self.options['seed'] = seed - self.options['n_gpus'] = n_gpus - self.options['batch_size'] = batch_size - - self.permanent_options = self.options - - self.train_options = {} - self.valid_options = {'batch_size': 256, 'unroll_steps': 1, 'n_gpus': 1} - self.model_mode = '' - - tf.set_random_seed(seed) - np.random.seed(seed) - - super().__init__(**kwargs) - - self.epoch_load_path = epoch_load_path - - if 
load_epoch_num is None: - load_epoch_num = self._get_epoch_from(self.epoch_load_path, None) - - if epoch_save_path is None: - self.epoch_save_path = self.epoch_load_path - - self.save_epoch_num = self._get_epoch_from(self.epoch_save_path) - - self.dumps_save_path = dumps_save_path - self.tf_hub_save_path = tf_hub_save_path - - self._build_model(train=False, epoch=load_epoch_num) - - self.save() - # after building the model and saving to the specified save path - # change the way to load intermediate checkpoints - self.load_path = self.save_path - - def _load_options(self, options_json_path): - if options_json_path: - options_json_path = expand_path(options_json_path) - with open(options_json_path, 'r') as fin: - options = json.load(fin) - else: - options = {} - return options - - def _update_arch_options(self, char_cnn, bidirectional, unroll_steps, n_tokens_vocab, lstm): - if char_cnn is not None: - self.options['char_cnn'] = char_cnn - if bidirectional is not None: - self.options['bidirectional'] = bidirectional - if unroll_steps is not None: - self.options['unroll_steps'] = unroll_steps - if n_tokens_vocab is not None: - self.options['n_tokens_vocab'] = n_tokens_vocab - if lstm is not None: - self.options['lstm'] = lstm - - def _update_other_options(self, dropout, n_negative_samples_batch, all_clip_norm_val): - if dropout is not None: - self.options['dropout'] = dropout - if n_negative_samples_batch is not None: - self.options['n_negative_samples_batch'] = n_negative_samples_batch - if all_clip_norm_val is not None: - self.options['all_clip_norm_val'] = all_clip_norm_val - - def _get_epoch_from(self, epoch_load_path, default=0): - path = self.load_path - path = path.parent / epoch_load_path - candidates = path.resolve().glob('[0-9]*') - candidates = list(safely_str2int(i.parts[-1]) for i in candidates - if safely_str2int(i.parts[-1]) is not None) - epoch_num = max(candidates, default=default) - return epoch_num - - def _build_graph(self, graph, train=True): - with graph.as_default(): - with tf.device('/cpu:0'): - init_step = 0 - global_step = tf.get_variable( - 'global_step', [], - initializer=tf.constant_initializer(init_step), trainable=False) - self.global_step = global_step - # set up the optimizer - opt = tf.train.AdagradOptimizer(learning_rate=self.options['learning_rate'], - initial_accumulator_value=1.0) - - # calculate the gradients on each GPU - tower_grads = [] - models = [] - loss = tf.get_variable( - 'train_perplexity', [], - initializer=tf.constant_initializer(0.0), trainable=False) - for k in range(self.options['n_gpus']): - with tf.device('/gpu:%d' % k): - with tf.variable_scope('lm', reuse=k > 0): - # calculate the loss for one model replica and get - # lstm states - model = LanguageModel(self.options, True) - total_train_loss = model.total_train_loss - total_eval_loss = model.total_eval_loss - models.append(model) - # get gradients - grads = opt.compute_gradients( - tf.reduce_mean(total_train_loss) * self.options['unroll_steps'], - aggregation_method=tf.AggregationMethod.EXPERIMENTAL_TREE, - ) - tower_grads.append(grads) - # # keep track of loss across all GPUs - if train: - loss += total_train_loss - else: - loss += total_eval_loss - - # calculate the mean of each gradient across all GPUs - grads = average_gradients(tower_grads, self.options['batch_size'], self.options) - grads, _ = clip_grads(grads, self.options, True, global_step) - loss = loss / self.options['n_gpus'] - train_op = opt.apply_gradients(grads, global_step=global_step) - return models, train_op, loss, 
graph - - def _init_session(self): - sess_config = tf.ConfigProto(allow_soft_placement=True) - sess_config.gpu_options.allow_growth = True - - self.sess = tf.Session(config=sess_config) - self.sess.run(tf.global_variables_initializer()) - - batch_size = self.options['batch_size'] - unroll_steps = self.options['unroll_steps'] - - # get the initial lstm states - init_state_tensors = [] - final_state_tensors = [] - for model in self.models: - init_state_tensors.extend(model.init_lstm_state) - final_state_tensors.extend(model.final_lstm_state) - - char_inputs = 'char_cnn' in self.options - if char_inputs: - max_chars = self.options['char_cnn']['max_characters_per_token'] - - if not char_inputs: - feed_dict = { - model.token_ids: - np.zeros([batch_size, unroll_steps], dtype=np.int64) - for model in self.models - } - else: - feed_dict = { - model.tokens_characters: - np.zeros([batch_size, unroll_steps, max_chars], - dtype=np.int32) - for model in self.models - } - - if self.options['bidirectional']: - if not char_inputs: - feed_dict.update({ - model.token_ids_reverse: - np.zeros([batch_size, unroll_steps], dtype=np.int64) - for model in self.models - }) - else: - feed_dict.update({ - model.tokens_characters_reverse: - np.zeros([batch_size, unroll_steps, max_chars], - dtype=np.int32) - for model in self.models - }) - - init_state_values = self.sess.run(init_state_tensors, feed_dict=feed_dict) - return init_state_values, init_state_tensors, final_state_tensors - - def _fill_feed_dict(self, - char_ids_batches, - reversed_char_ids_batches, - token_ids_batches=None, - reversed_token_ids_batches=None): - # init state tensors - feed_dict = {t: v for t, v in zip(self.init_state_tensors, self.init_state_values)} - - for k, model in enumerate(self.models): - start = k * self.options['batch_size'] - end = (k + 1) * self.options['batch_size'] - - # character inputs - char_ids = char_ids_batches[start:end] # get char_ids - - feed_dict[model.tokens_characters] = char_ids - - if self.options['bidirectional']: - feed_dict[model.tokens_characters_reverse] = \ - reversed_char_ids_batches[start:end] # get tokens_characters_reverse - - if token_ids_batches is not None: - feed_dict[model.next_token_id] = token_ids_batches[start:end] # get next_token_id - if self.options['bidirectional']: - feed_dict[model.next_token_id_reverse] = \ - reversed_token_ids_batches[start:end] # get next_token_id_reverse - - return feed_dict - - def __call__(self, x, y, *args, **kwargs) -> List[float]: - if len(args) != 0: - return [] - char_ids_batches, reversed_char_ids_batches = x - token_ids_batches, reversed_token_ids_batches = y - - feed_dict = self._fill_feed_dict(char_ids_batches, reversed_char_ids_batches, token_ids_batches, - reversed_token_ids_batches) - - with self.graph.as_default(): - loss, self.init_state_values = self.sess.run([self.loss, self.final_state_tensors], feed_dict) - return loss - - @overrides - def load(self, epoch: Optional[int] = None) -> None: - """Load model parameters from self.load_path""" - path = self.load_path - if epoch is not None: - path = path.parent / self.epoch_save_path / str(epoch) / path.parts[-1] - path.resolve() - log.info(f'[loading {epoch} epoch]') - - # path.parent.mkdir(parents=True, exist_ok=True) - path = str(path) - - # Check presence of the model files - if tf.train.checkpoint_exists(path): - log.info(f'[loading model from {path}]') - with self.graph.as_default(): - saver = tf.train.Saver() - saver.restore(self.sess, path) - else: - log.info(f'[A checkpoint not found in {path}]') - - 
@overrides - def save(self, epoch: Optional[int] = None) -> None: - """Save model parameters to self.save_path""" - path = self.save_path - if epoch is not None: - path = path.parent / self.epoch_save_path / str(epoch) / path.parts[-1] - path.resolve() - log.info(f'[saving {epoch} epoch]') - - path.parent.mkdir(parents=True, exist_ok=True) - path = str(path) - - log.info(f'[saving model to {path}]') - with self.graph.as_default(): - saver = tf.train.Saver() - saver.save(self.sess, path) - - def train_on_batch(self, - x_char_ids: list, - y_token_ids: list) -> List[float]: - """ - This method is called by trainer to make one training step on one batch. - - Args: - x_char_ids: a batch of char_ids - y_token_ids: a batch of token_ids - - Returns: - value of loss function on batch - """ - - char_ids_batches, reversed_char_ids_batches = x_char_ids - token_ids_batches, reversed_token_ids_batches = y_token_ids - - feed_dict = self._fill_feed_dict(char_ids_batches, reversed_char_ids_batches, - token_ids_batches, reversed_token_ids_batches) - - with self.graph.as_default(): - loss, _, self.init_state_values = self.sess.run([self.loss, self.train_op, self.final_state_tensors], - feed_dict) - - return np.mean(loss) - - def _build_model(self, train: bool, epoch: Optional[int] = None, **kwargs): - - if hasattr(self, 'sess'): - self.sess.close() - - self.options = copy.deepcopy(self.permanent_options) - - if train: - self.options.update(self.train_options) - self.options.update(kwargs) - - self.models, self.train_op, self.loss, self.graph = self._build_graph(tf.Graph()) - else: - self.options.update(self.valid_options) - self.options.update(kwargs) - - self.models, self.train_op, self.loss, self.graph = self._build_graph(tf.Graph(), - train=False) - - with self.graph.as_default(): - self.init_state_values, self.init_state_tensors, self.final_state_tensors = \ - self._init_session() - self.load(epoch) - - def process_event(self, event_name, data): - if event_name == 'before_train' and self.model_mode != 'train': - self._build_model(train=True) - self.model_mode = 'train' - elif event_name == 'before_validation' and self.model_mode != 'validation': - epoch = self.save_epoch_num + int(data['epochs_done']) - self.save(epoch) - self.save() - self.elmo_export(epoch) - - self._build_model(train=False) - self.model_mode = 'validation' - - def elmo_export(self, epoch: Optional[int] = None) -> None: - """ - Dump the trained weights from a model to a HDF5 file and export a TF-Hub module. 
- """ - if hasattr(self, 'sess'): - self.sess.close() - path = self.save_path - if epoch: - from_path = path.parent / self.epoch_save_path / str(epoch) / path.parts[-1] - weights_to_path = path.parent / self.dumps_save_path / f'weights_epoch_n_{epoch}.hdf5' - tf_hub_to_path = path.parent / self.tf_hub_save_path / f'tf_hub_model_epoch_n_{epoch}' - from_path.resolve() - weights_to_path.resolve() - tf_hub_to_path.resolve() - log.info(f'[exporting {epoch} epoch]') - else: - from_path = path - weights_to_path = path.parent / self.dumps_save_path / 'weights.hdf5' - tf_hub_to_path = path.parent / self.tf_hub_save_path / 'tf_hub_model' - - weights_to_path.parent.mkdir(parents=True, exist_ok=True) - tf_hub_to_path.parent.mkdir(parents=True, exist_ok=True) - - # Check presence of the model files - if tf.train.checkpoint_exists(str(from_path)): - dump_weights(from_path.parent, weights_to_path, self.permanent_options) - - options = copy.deepcopy(self.permanent_options) - options['char_cnn']['n_characters'] = 262 - export2hub(weights_to_path, tf_hub_to_path, options) - - def destroy(self) -> None: - """ - Delete model from memory - - Returns: - None - """ - if hasattr(self, 'sess'): - for k in list(self.sess.graph.get_all_collection_keys()): - self.sess.graph.clear_collection(k) - super().destroy() diff --git a/deeppavlov/models/elmo/elmo2tfhub.py b/deeppavlov/models/elmo/elmo2tfhub.py deleted file mode 100644 index a304bf6837..0000000000 --- a/deeppavlov/models/elmo/elmo2tfhub.py +++ /dev/null @@ -1,208 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import shutil - -import numpy as np -import tensorflow as tf -import tensorflow_hub as hub - -from deeppavlov.models.elmo.elmo_model import BidirectionalLanguageModel, weight_layers - - -def make_module_spec(options, weight_file): - """Makes a module spec. - - Args: - options: LM hyperparameters. - weight_file: location of the hdf5 file with LM weights. - - Returns: - A module spec object used for constructing a TF-Hub module. 
- """ - - def module_fn(): - """Spec function for a token embedding module.""" - # init - _bos_id = 256 - _eos_id = 257 - _bow_id = 258 - _eow_id = 259 - _pad_id = 260 - - _max_word_length = 50 - _parallel_iterations = 10 - _max_batch_size = 1024 - - id_dtype = tf.int32 - id_nptype = np.int32 - max_word_length = tf.constant(_max_word_length, dtype=id_dtype, name='max_word_length') - - version = tf.constant('from_dp_1', dtype=tf.string, name='version') - - # the charcter representation of the begin/end of sentence characters - def _make_bos_eos(c): - r = np.zeros([_max_word_length], dtype=id_nptype) - r[:] = _pad_id - r[0] = _bow_id - r[1] = c - r[2] = _eow_id - return tf.constant(r, dtype=id_dtype) - - bos_ids = _make_bos_eos(_bos_id) - eos_ids = _make_bos_eos(_eos_id) - - def token2ids(token): - with tf.name_scope("token2ids_preprocessor"): - char_ids = tf.decode_raw(token, tf.uint8, name='decode_raw2get_char_ids') - char_ids = tf.cast(char_ids, tf.int32, name='cast2int_token') - char_ids = tf.strided_slice(char_ids, [0], [max_word_length - 2], - [1], name='slice2resized_token') - ids_num = tf.shape(char_ids)[0] - fill_ids_num = (_max_word_length - 2) - ids_num - pads = tf.fill([fill_ids_num], _pad_id) - bow_token_eow_pads = tf.concat([[_bow_id], char_ids, [_eow_id], pads], - 0, name='concat2bow_token_eow_pads') - return bow_token_eow_pads - - def sentence_tagging_and_padding(sen_dim): - with tf.name_scope("sentence_tagging_and_padding_preprocessor"): - sen = sen_dim[0] - dim = sen_dim[1] - extra_dim = tf.shape(sen)[0] - dim - sen = tf.slice(sen, [0, 0], [dim, max_word_length], name='slice2sen') - - bos_sen_eos = tf.concat([[bos_ids], sen, [eos_ids]], 0, name='concat2bos_sen_eos') - bos_sen_eos_plus_one = bos_sen_eos + 1 - bos_sen_eos_pads = tf.pad(bos_sen_eos_plus_one, [[0, extra_dim], [0, 0]], - "CONSTANT", name='pad2bos_sen_eos_pads') - return bos_sen_eos_pads - - # Input placeholders to the biLM. - tokens = tf.placeholder(shape=(None, None), dtype=tf.string, name='ph2tokens') - sequence_len = tf.placeholder(shape=(None,), dtype=tf.int32, name='ph2sequence_len') - - tok_shape = tf.shape(tokens) - line_tokens = tf.reshape(tokens, shape=[-1], name='reshape2line_tokens') - - with tf.device('/cpu:0'): - tok_ids = tf.map_fn( - token2ids, - line_tokens, - dtype=tf.int32, back_prop=False, parallel_iterations=_parallel_iterations, - name='map_fn2get_tok_ids') - - tok_ids = tf.reshape(tok_ids, [tok_shape[0], tok_shape[1], -1], name='reshape2tok_ids') - with tf.device('/cpu:0'): - sen_ids = tf.map_fn( - sentence_tagging_and_padding, - (tok_ids, sequence_len), - dtype=tf.int32, back_prop=False, parallel_iterations=_parallel_iterations, - name='map_fn2get_sen_ids') - - # Build the biLM graph. 
- bilm = BidirectionalLanguageModel(options, str(weight_file), - max_batch_size=_max_batch_size) - - embeddings_op = bilm(sen_ids) - - # Get an op to compute ELMo (weighted average of the internal biLM layers) - elmo_output = weight_layers('elmo_output', embeddings_op, l2_coef=0.0) - - weighted_op = elmo_output['weighted_op'] - mean_op = elmo_output['mean_op'] - word_emb = elmo_output['word_emb'] - lstm_outputs1 = elmo_output['lstm_outputs1'] - lstm_outputs2 = elmo_output['lstm_outputs2'] - - hub.add_signature("tokens", {"tokens": tokens, "sequence_len": sequence_len}, - {"elmo": weighted_op, - "default": mean_op, - "word_emb": word_emb, - "lstm_outputs1": lstm_outputs1, - "lstm_outputs2": lstm_outputs2, - "version": version}) - - # #########################Next signature############################# # - - # Input placeholders to the biLM. - def_strings = tf.placeholder(shape=(None), dtype=tf.string) - def_tokens_sparse = tf.string_split(def_strings) - def_tokens_dense = tf.sparse_to_dense(sparse_indices=def_tokens_sparse.indices, - output_shape=def_tokens_sparse.dense_shape, - sparse_values=def_tokens_sparse.values, - default_value='' - ) - def_mask = tf.not_equal(def_tokens_dense, '') - def_int_mask = tf.cast(def_mask, dtype=tf.int32) - def_sequence_len = tf.reduce_sum(def_int_mask, axis=-1) - - def_tok_shape = tf.shape(def_tokens_dense) - def_line_tokens = tf.reshape(def_tokens_dense, shape=[-1], name='reshape2line_tokens') - - with tf.device('/cpu:0'): - def_tok_ids = tf.map_fn( - token2ids, - def_line_tokens, - dtype=tf.int32, back_prop=False, parallel_iterations=_parallel_iterations, - name='map_fn2get_tok_ids') - - def_tok_ids = tf.reshape(def_tok_ids, [def_tok_shape[0], def_tok_shape[1], -1], name='reshape2tok_ids') - with tf.device('/cpu:0'): - def_sen_ids = tf.map_fn( - sentence_tagging_and_padding, - (def_tok_ids, def_sequence_len), - dtype=tf.int32, back_prop=False, parallel_iterations=_parallel_iterations, - name='map_fn2get_sen_ids') - - # Get ops to compute the LM embeddings. 
- def_embeddings_op = bilm(def_sen_ids) - - # Get an op to compute ELMo (weighted average of the internal biLM layers) - def_elmo_output = weight_layers('elmo_output', def_embeddings_op, l2_coef=0.0, reuse=True) - - def_weighted_op = def_elmo_output['weighted_op'] - def_mean_op = def_elmo_output['mean_op'] - def_word_emb = def_elmo_output['word_emb'] - def_lstm_outputs1 = def_elmo_output['lstm_outputs1'] - def_lstm_outputs2 = def_elmo_output['lstm_outputs2'] - - hub.add_signature("default", {"strings": def_strings}, - {"elmo": def_weighted_op, - "default": def_mean_op, - "word_emb": def_word_emb, - "lstm_outputs1": def_lstm_outputs1, - "lstm_outputs2": def_lstm_outputs2, - "version": version}) - - return hub.create_module_spec(module_fn) - - -def export2hub(weight_file, hub_dir, options): - """Exports a TF-Hub module - """ - - spec = make_module_spec(options, str(weight_file)) - - try: - with tf.Graph().as_default(): - module = hub.Module(spec) - - with tf.Session() as sess: - sess.run(tf.global_variables_initializer()) - if hub_dir.exists(): - shutil.rmtree(hub_dir) - module.export(str(hub_dir), sess) - finally: - pass diff --git a/deeppavlov/models/elmo/elmo_model.py b/deeppavlov/models/elmo/elmo_model.py deleted file mode 100644 index 8e475dcedb..0000000000 --- a/deeppavlov/models/elmo/elmo_model.py +++ /dev/null @@ -1,730 +0,0 @@ -# originally based on https://github.com/allenai/bilm-tf/blob/master/bilm/model.py - -# Modifications copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import h5py -import numpy as np -import tensorflow as tf - -DTYPE = 'float32' - - -class BidirectionalLanguageModel(object): - def __init__( - self, - options: dict, - weight_file: str, - use_character_inputs=True, - embedding_weight_file=None, - max_batch_size=128): - """ - Creates the language model computational graph and loads weights - - Two options for input type: - (1) To use character inputs (paired with Batcher) - pass use_character_inputs=True, and ids_placeholder - of shape (None, None, max_characters_per_token) - to __call__ - (2) To use token ids as input (paired with TokenBatcher), - pass use_character_inputs=False and ids_placeholder - of shape (None, None) to __call__. 
- In this case, embedding_weight_file is also required input - - options_file: location of the json formatted file with - LM hyperparameters - weight_file: location of the hdf5 file with LM weights - use_character_inputs: if True, then use character ids as input, - otherwise use token ids - max_batch_size: the maximum allowable batch size - """ - if not use_character_inputs: - if embedding_weight_file is None: - raise ValueError( - "embedding_weight_file is required input with " - "not use_character_inputs" - ) - - self._options = options - self._weight_file = weight_file - self._embedding_weight_file = embedding_weight_file - self._use_character_inputs = use_character_inputs - self._max_batch_size = max_batch_size - - self._ops = {} - self._graphs = {} - - def __call__(self, ids_placeholder): - """ - Given the input character ids (or token ids), returns a dictionary - with tensorflow ops: - - {'lm_embeddings': embedding_op, - 'lengths': sequence_lengths_op, - 'mask': op to compute mask} - - embedding_op computes the LM embeddings and is shape - (None, 3, None, 1024) - lengths_op computes the sequence lengths and is shape (None, ) - mask computes the sequence mask and is shape (None, None) - - ids_placeholder: a tf.placeholder of type int32. - If use_character_inputs=True, it is shape - (None, None, max_characters_per_token) and holds the input - character ids for a batch - If use_character_input=False, it is shape (None, None) and - holds the input token ids for a batch - """ - if ids_placeholder in self._ops: - # have already created ops for this placeholder, just return them - ret = self._ops[ids_placeholder] - - else: - # need to create the graph - if len(self._ops) == 0: - # first time creating the graph, don't reuse variables - lm_graph = BidirectionalLanguageModelGraph( - self._options, - self._weight_file, - ids_placeholder, - embedding_weight_file=self._embedding_weight_file, - use_character_inputs=self._use_character_inputs, - max_batch_size=self._max_batch_size) - else: - with tf.variable_scope('', reuse=True): - lm_graph = BidirectionalLanguageModelGraph( - self._options, - self._weight_file, - ids_placeholder, - embedding_weight_file=self._embedding_weight_file, - use_character_inputs=self._use_character_inputs, - max_batch_size=self._max_batch_size) - - ops = self._build_ops(lm_graph) - self._ops[ids_placeholder] = ops - self._graphs[ids_placeholder] = lm_graph - ret = ops - - return ret - - def _build_ops(self, lm_graph): - with tf.control_dependencies([lm_graph.update_state_op]): - # get the LM embeddings - token_embeddings = lm_graph.embedding - layers = [ - tf.concat([token_embeddings, token_embeddings], axis=2) - ] - - n_lm_layers = len(lm_graph.lstm_outputs['forward']) - for i in range(n_lm_layers): - layers.append( - tf.concat( - [lm_graph.lstm_outputs['forward'][i], - lm_graph.lstm_outputs['backward'][i]], - axis=-1 - ) - ) - - # The layers include the BOS/EOS tokens. 
Remove them - sequence_length_wo_bos_eos = lm_graph.sequence_lengths - 2 - layers_without_bos_eos = [] - for layer in layers: - layer_wo_bos_eos = layer[:, 1:, :] - layer_wo_bos_eos = tf.reverse_sequence( - layer_wo_bos_eos, - lm_graph.sequence_lengths - 1, - seq_axis=1, - batch_axis=0, - ) - layer_wo_bos_eos = layer_wo_bos_eos[:, 1:, :] - layer_wo_bos_eos = tf.reverse_sequence( - layer_wo_bos_eos, - sequence_length_wo_bos_eos, - seq_axis=1, - batch_axis=0, - ) - layers_without_bos_eos.append(layer_wo_bos_eos) - - # concatenate the layers - lm_embeddings = tf.concat( - [tf.expand_dims(t, axis=1) for t in layers_without_bos_eos], - axis=1 - ) - - # get the mask op without bos/eos. - # tf doesn't support reversing boolean tensors, so cast - # to int then back - mask_wo_bos_eos = tf.cast(lm_graph.mask[:, 1:], 'int32') - mask_wo_bos_eos = tf.reverse_sequence( - mask_wo_bos_eos, - lm_graph.sequence_lengths - 1, - seq_axis=1, - batch_axis=0, - ) - mask_wo_bos_eos = mask_wo_bos_eos[:, 1:] - mask_wo_bos_eos = tf.reverse_sequence( - mask_wo_bos_eos, - sequence_length_wo_bos_eos, - seq_axis=1, - batch_axis=0, - ) - mask_wo_bos_eos = tf.cast(mask_wo_bos_eos, 'bool') - - return { - 'lm_embeddings': lm_embeddings, - 'lengths': sequence_length_wo_bos_eos, - 'token_embeddings': lm_graph.embedding, - 'mask': mask_wo_bos_eos, - } - - -def _pretrained_initializer(varname, weight_file, embedding_weight_file=None): - """ - We'll stub out all the initializers in the pretrained LM with - a function that loads the weights from the file - """ - weight_name_map = {} - for i in range(2): - for j in range(8): # if we decide to add more layers - root = 'RNN_{}/RNN/MultiRNNCell/Cell{}'.format(i, j) - weight_name_map[root + '/rnn/lstm_cell/kernel'] = \ - root + '/LSTMCell/W_0' - weight_name_map[root + '/rnn/lstm_cell/bias'] = \ - root + '/LSTMCell/B' - weight_name_map[root + '/rnn/lstm_cell/projection/kernel'] = \ - root + '/LSTMCell/W_P_0' - - # convert the graph name to that in the checkpoint - varname_in_file = varname[5:] - if varname_in_file.startswith('RNN'): - varname_in_file = weight_name_map[varname_in_file] - - if varname_in_file == 'embedding': - with h5py.File(embedding_weight_file, 'r') as fin: - # Have added a special 0 index for padding not present - # in the original model. - embed_weights = fin[varname_in_file][...] - weights = np.zeros( - (embed_weights.shape[0] + 1, embed_weights.shape[1]), - dtype=DTYPE - ) - weights[1:, :] = embed_weights - else: - with h5py.File(weight_file, 'r') as fin: - if varname_in_file == 'char_embed': - # Have added a special 0 index for padding not present - # in the original model. - char_embed_weights = fin[varname_in_file][...] - weights = np.zeros( - (char_embed_weights.shape[0] + 1, - char_embed_weights.shape[1]), - dtype=DTYPE - ) - weights[1:, :] = char_embed_weights - else: - weights = fin[varname_in_file][...] 
- - # Tensorflow initializers are callables that accept a shape parameter - # and some optional kwargs - def ret(shape, **kwargs): - if list(shape) != list(weights.shape): - raise ValueError( - "Invalid shape initializing {0}, got {1}, expected {2}".format( - varname_in_file, shape, weights.shape) - ) - return weights - - return ret - - -class BidirectionalLanguageModelGraph(object): - """ - Creates the computational graph and holds the ops necessary for runnint - a bidirectional language model - """ - - def __init__(self, options, weight_file, ids_placeholder, - use_character_inputs=True, embedding_weight_file=None, - max_batch_size=128): - - self.options = options - self._max_batch_size = max_batch_size - self.ids_placeholder = ids_placeholder - self.use_character_inputs = use_character_inputs - - # this custom_getter will make all variables not trainable and - # override the default initializer - def custom_getter(getter, name, *args, **kwargs): - kwargs['trainable'] = False - kwargs['initializer'] = _pretrained_initializer( - name, weight_file, embedding_weight_file - ) - return getter(name, *args, **kwargs) - - if embedding_weight_file is not None: - # get the vocab size - with h5py.File(embedding_weight_file, 'r') as fin: - # +1 for padding - self._n_tokens_vocab = fin['embedding'].shape[0] + 1 - else: - self._n_tokens_vocab = None - - with tf.variable_scope('bilm', custom_getter=custom_getter): - self._build() - - def _build(self): - if self.use_character_inputs: - self._build_word_char_embeddings() - else: - self._build_word_embeddings() - self._build_lstms() - - def _build_word_char_embeddings(self): - """ - options contains key 'char_cnn': { - - 'n_characters': 262, - - # includes the start / end characters - 'max_characters_per_token': 50, - - 'filters': [ - [1, 32], - [2, 32], - [3, 64], - [4, 128], - [5, 256], - [6, 512], - [7, 512] - ], - 'activation': 'tanh', - - # for the character embedding - 'embedding': {'dim': 16} - - # for highway layers - # if omitted, then no highway layers - 'n_highway': 2, - } - """ - projection_dim = self.options['lstm']['projection_dim'] - - cnn_options = self.options['char_cnn'] - filters = cnn_options['filters'] - n_filters = sum(f[1] for f in filters) - max_chars = cnn_options['max_characters_per_token'] - char_embed_dim = cnn_options['embedding']['dim'] - n_chars = cnn_options['n_characters'] - if n_chars != 262: - raise Exception("Set n_characters=262 after training see a \ - https://github.com/allenai/bilm-tf/blob/master/README.md") - - if cnn_options['activation'] == 'tanh': - activation = tf.nn.tanh - elif cnn_options['activation'] == 'relu': - activation = tf.nn.relu - - # the character embeddings - with tf.device("/cpu:0"): - self.embedding_weights = tf.get_variable("char_embed", [n_chars, char_embed_dim], - dtype=DTYPE, - initializer=tf.random_uniform_initializer(-1.0, 1.0)) - # shape (batch_size, unroll_steps, max_chars, embed_dim) - self.char_embedding = tf.nn.embedding_lookup(self.embedding_weights, - self.ids_placeholder) - - # the convolutions - def make_convolutions(inp): - with tf.variable_scope('CNN'): - convolutions = [] - for i, (width, num) in enumerate(filters): - if cnn_options['activation'] == 'relu': - # He initialization for ReLU activation - # with char embeddings init between -1 and 1 - # w_init = tf.random_normal_initializer( - # mean=0.0, - # stddev=np.sqrt(2.0 / (width * char_embed_dim)) - # ) - - # Kim et al 2015, +/- 0.05 - w_init = tf.random_uniform_initializer( - minval=-0.05, maxval=0.05) - elif 
cnn_options['activation'] == 'tanh': - # glorot init - w_init = tf.random_normal_initializer( - mean=0.0, - stddev=np.sqrt(1.0 / (width * char_embed_dim)) - ) - w = tf.get_variable( - "W_cnn_%s" % i, - [1, width, char_embed_dim, num], - initializer=w_init, - dtype=DTYPE) - b = tf.get_variable( - "b_cnn_%s" % i, [num], dtype=DTYPE, - initializer=tf.constant_initializer(0.0)) - - conv = tf.nn.conv2d(inp, w, - strides=[1, 1, 1, 1], - padding="VALID") + b - # now max pool - conv = tf.nn.max_pool(conv, [1, 1, max_chars - width + 1, 1], - [1, 1, 1, 1], 'VALID') - - # activation - conv = activation(conv) - conv = tf.squeeze(conv, squeeze_dims=[2]) - - convolutions.append(conv) - - return tf.concat(convolutions, 2) - - embedding = make_convolutions(self.char_embedding) - - # for highway and projection layers - n_highway = cnn_options.get('n_highway') - use_highway = n_highway is not None and n_highway > 0 - use_proj = n_filters != projection_dim - - if use_highway or use_proj: - # reshape from (batch_size, n_tokens, dim) to (-1, dim) - batch_size_n_tokens = tf.shape(embedding)[0:2] - embedding = tf.reshape(embedding, [-1, n_filters]) - - # set up weights for projection - if use_proj: - assert n_filters > projection_dim - with tf.variable_scope('CNN_proj'): - W_proj_cnn = tf.get_variable( - "W_proj", [n_filters, projection_dim], - initializer=tf.random_normal_initializer( - mean=0.0, stddev=np.sqrt(1.0 / n_filters)), - dtype=DTYPE) - b_proj_cnn = tf.get_variable( - "b_proj", [projection_dim], - initializer=tf.constant_initializer(0.0), - dtype=DTYPE) - - # apply highways layers - def high(x, ww_carry, bb_carry, ww_tr, bb_tr): - carry_gate = tf.nn.sigmoid(tf.matmul(x, ww_carry) + bb_carry) - transform_gate = tf.nn.relu(tf.matmul(x, ww_tr) + bb_tr) - return carry_gate * transform_gate + (1.0 - carry_gate) * x - - if use_highway: - highway_dim = n_filters - - for i in range(n_highway): - with tf.variable_scope('CNN_high_%s' % i): - W_carry = tf.get_variable( - 'W_carry', [highway_dim, highway_dim], - # glorit init - initializer=tf.random_normal_initializer( - mean=0.0, stddev=np.sqrt(1.0 / highway_dim)), - dtype=DTYPE) - b_carry = tf.get_variable( - 'b_carry', [highway_dim], - initializer=tf.constant_initializer(-2.0), - dtype=DTYPE) - W_transform = tf.get_variable( - 'W_transform', [highway_dim, highway_dim], - initializer=tf.random_normal_initializer( - mean=0.0, stddev=np.sqrt(1.0 / highway_dim)), - dtype=DTYPE) - b_transform = tf.get_variable( - 'b_transform', [highway_dim], - initializer=tf.constant_initializer(0.0), - dtype=DTYPE) - - embedding = high(embedding, W_carry, b_carry, - W_transform, b_transform) - - # finally project down if needed - if use_proj: - embedding = tf.matmul(embedding, W_proj_cnn) + b_proj_cnn - - # reshape back to (batch_size, tokens, dim) - if use_highway or use_proj: - shp = tf.concat([batch_size_n_tokens, [projection_dim]], axis=0) - embedding = tf.reshape(embedding, shp) - - # at last assign attributes for remainder of the model - self.embedding = embedding - - def _build_word_embeddings(self): - projection_dim = self.options['lstm']['projection_dim'] - - # the word embeddings - with tf.device("/cpu:0"): - self.embedding_weights = tf.get_variable( - "embedding", [self._n_tokens_vocab, projection_dim], - dtype=DTYPE, - ) - self.embedding = tf.nn.embedding_lookup(self.embedding_weights, - self.ids_placeholder) - - def _build_lstms(self): - # now the LSTMs - # these will collect the initial states for the forward - # (and reverse LSTMs if we are doing bidirectional) - - 
# parse the options - lstm_dim = self.options['lstm']['dim'] - projection_dim = self.options['lstm']['projection_dim'] - n_lstm_layers = self.options['lstm'].get('n_layers', 1) - cell_clip = self.options['lstm'].get('cell_clip') - proj_clip = self.options['lstm'].get('proj_clip') - use_skip_connections = self.options['lstm']['use_skip_connections'] - - # the sequence lengths from input mask - if self.use_character_inputs: - mask = tf.reduce_any(self.ids_placeholder > 0, axis=2) - else: - mask = self.ids_placeholder > 0 - sequence_lengths = tf.reduce_sum(tf.cast(mask, tf.int32), axis=1) - batch_size = tf.shape(sequence_lengths)[0] - - # for each direction, we'll store tensors for each layer - self.lstm_outputs = {'forward': [], 'backward': []} - self.lstm_state_sizes = {'forward': [], 'backward': []} - self.lstm_init_states = {'forward': [], 'backward': []} - self.lstm_final_states = {'forward': [], 'backward': []} - - update_ops = [] - for direction in ['forward', 'backward']: - if direction == 'forward': - layer_input = self.embedding - else: - layer_input = tf.reverse_sequence( - self.embedding, - sequence_lengths, - seq_axis=1, - batch_axis=0 - ) - - for i in range(n_lstm_layers): - if projection_dim < lstm_dim: - # are projecting down output - lstm_cell = tf.nn.rnn_cell.LSTMCell( - lstm_dim, num_proj=projection_dim, - cell_clip=cell_clip, proj_clip=proj_clip) - else: - lstm_cell = tf.nn.rnn_cell.LSTMCell(lstm_dim, - cell_clip=cell_clip, proj_clip=proj_clip) - - if use_skip_connections: - # ResidualWrapper adds inputs to outputs - if i == 0: - # don't add skip connection from token embedding to - # 1st layer output - pass - else: - # add a skip connection - lstm_cell = tf.nn.rnn_cell.ResidualWrapper(lstm_cell) - - # collect the input state, run the dynamic rnn, collect - # the output - # the LSTMs are stateful. 
To support multiple batch sizes, - # we'll allocate size for states up to max_batch_size, - # then use the first batch_size entries for each batch - init_states = [ - tf.Variable( - tf.zeros([self._max_batch_size, dim]), - trainable=False - ) - for dim in lstm_cell.state_size - ] - batch_init_states = [ - state[:batch_size, :] for state in init_states - ] - - if direction == 'forward': - i_direction = 0 - else: - i_direction = 1 - variable_scope_name = 'RNN_{0}/RNN/MultiRNNCell/Cell{1}'.format( - i_direction, i) - with tf.variable_scope(variable_scope_name): - layer_output, final_state = tf.nn.dynamic_rnn( - lstm_cell, - layer_input, - sequence_length=sequence_lengths, - initial_state=tf.nn.rnn_cell.LSTMStateTuple( - *batch_init_states), - ) - - self.lstm_state_sizes[direction].append(lstm_cell.state_size) - self.lstm_init_states[direction].append(init_states) - self.lstm_final_states[direction].append(final_state) - if direction == 'forward': - self.lstm_outputs[direction].append(layer_output) - else: - self.lstm_outputs[direction].append( - tf.reverse_sequence( - layer_output, - sequence_lengths, - seq_axis=1, - batch_axis=0 - ) - ) - - with tf.control_dependencies([layer_output]): - # update the initial states - for i in range(2): - new_state = tf.concat( - [final_state[i][:batch_size, :], - init_states[i][batch_size:, :]], axis=0) - state_update_op = tf.assign(init_states[i], new_state) - update_ops.append(state_update_op) - - layer_input = layer_output - - self.mask = mask - self.sequence_lengths = sequence_lengths - self.update_state_op = tf.group(*update_ops) - - -def weight_layers(name, bilm_ops, l2_coef=None, - use_top_only=False, do_layer_norm=False, reuse=False): - """ - Weight the layers of a biLM with trainable scalar weights to - compute ELMo representations. - - For each output layer, this returns two ops. The first computes - a layer specific weighted average of the biLM layers, and - the second the l2 regularizer loss term. - The regularization terms are also add to tf.GraphKeys.REGULARIZATION_LOSSES - - Input: - name = a string prefix used for the trainable variable names - bilm_ops = the tensorflow ops returned to compute internal - representations from a biLM. This is the return value - from BidirectionalLanguageModel(...)(ids_placeholder) - l2_coef: the l2 regularization coefficient $\lambda$. - Pass None or 0.0 for no regularization. - use_top_only: if True, then only use the top layer. - do_layer_norm: if True, then apply layer normalization to each biLM - layer before normalizing - reuse: reuse an aggregation variable scope. - - Output: - { - 'weighted_op': op to compute weighted average for output, - 'regularization_op': op to compute regularization term - } - """ - - def _l2_regularizer(weights): - if l2_coef is not None: - return l2_coef * tf.reduce_sum(tf.square(weights)) - else: - return 0.0 - - # Get ops for computing LM embeddings and mask - lm_embeddings = bilm_ops['lm_embeddings'] - mask = bilm_ops['mask'] - - n_lm_layers = int(lm_embeddings.get_shape()[1]) - lm_dim = int(lm_embeddings.get_shape()[3]) - # import pdb; pdb.set_trace() - - with tf.control_dependencies([lm_embeddings, mask]): - # Cast the mask and broadcast for layer use. 
- mask_float = tf.cast(mask, 'float32') - broadcast_mask = tf.expand_dims(mask_float, axis=-1) - - def _do_ln(x): - # do layer normalization excluding the mask - x_masked = x * broadcast_mask - N = tf.reduce_sum(mask_float) * lm_dim - mean = tf.reduce_sum(x_masked) / N - variance = tf.reduce_sum(((x_masked - mean) * broadcast_mask) ** 2) / N - return tf.nn.batch_normalization( - x, mean, variance, None, None, 1E-12 - ) - - if use_top_only: - layers = tf.split(lm_embeddings, n_lm_layers, axis=1) - # just the top layer - sum_pieces = tf.squeeze(layers[-1], squeeze_dims=1) - # no regularization - reg = 0.0 - else: - with tf.variable_scope("aggregation", reuse=reuse): - W = tf.get_variable( - '{}_ELMo_W'.format(name), - shape=(n_lm_layers,), - initializer=tf.zeros_initializer, - regularizer=_l2_regularizer, - trainable=True, - ) - - # normalize the weights - normed_weights = tf.split( - tf.nn.softmax(W + 1.0 / n_lm_layers), n_lm_layers - ) - # split LM layers - layers = tf.split(lm_embeddings, n_lm_layers, axis=1) - - # compute the weighted, normalized LM activations - pieces = [] - for w, t in zip(normed_weights, layers): - if do_layer_norm: - pieces.append(w * _do_ln(tf.squeeze(t, squeeze_dims=1))) - else: - pieces.append(w * tf.squeeze(t, squeeze_dims=1)) - sum_pieces = tf.add_n(pieces) - - # get the regularizer - reg = [ - r for r in tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES) - if r.name.find('{}_ELMo_W/'.format(name)) >= 0 - ] - if len(reg) != 1: - raise ValueError - - # scale the weighted sum by gamma - - with tf.variable_scope("aggregation", reuse=reuse): - gamma = tf.get_variable( - '{}_ELMo_gamma'.format(name), - shape=(1,), - initializer=tf.ones_initializer, - regularizer=None, - trainable=True, - ) - - weighted_lm_layers = sum_pieces * gamma - weighted_lm_layers_masked = sum_pieces * broadcast_mask - - weighted_lm_layers_sum = tf.reduce_sum(weighted_lm_layers_masked, 1) - - mask_sum = tf.reduce_sum(mask_float, 1) - mask_sum = tf.maximum(mask_sum, [1]) - - weighted_lm_layers_mean = weighted_lm_layers_sum / tf.expand_dims(mask_sum, - 1) - - word_emb_2n = tf.squeeze(layers[0], [1]) - word_emb_1n = tf.slice(word_emb_2n, [0, 0, 0], [-1, -1, lm_dim // 2]) # to 512 - lstm_outputs1 = tf.squeeze(layers[1], [1]) - lstm_outputs2 = tf.squeeze(layers[2], [1]) - - ret = {'weighted_op': weighted_lm_layers, - 'mean_op': weighted_lm_layers_mean, - 'regularization_op': reg, - 'word_emb': word_emb_1n, - 'lstm_outputs1': lstm_outputs1, - 'lstm_outputs2': lstm_outputs2, } - - return ret diff --git a/deeppavlov/models/elmo/train_utils.py b/deeppavlov/models/elmo/train_utils.py deleted file mode 100644 index 4be3c7f4d3..0000000000 --- a/deeppavlov/models/elmo/train_utils.py +++ /dev/null @@ -1,244 +0,0 @@ -# originally based on https://github.com/allenai/bilm-tf/blob/master/bilm/training.py - -# Modifications copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import re - -import h5py -import tensorflow as tf - -from deeppavlov.models.elmo.bilm_model import LanguageModel - -tf.logging.set_verbosity(tf.logging.INFO) - - -def average_gradients(tower_grads, batch_size, options): - # calculate average gradient for each shared variable across all GPUs - average_grads = [] - for grad_and_vars in zip(*tower_grads): - # Note that each grad_and_vars looks like the following: - # ((grad0_gpu0, var0_gpu0), ... , (grad0_gpuN, var0_gpuN)) - # We need to average the gradients across each GPU. - - g0, v0 = grad_and_vars[0] - - if g0 is None: - # no gradient for this variable, skip it - average_grads.append((g0, v0)) - continue - - if isinstance(g0, tf.IndexedSlices): - # If the gradient is type IndexedSlices then this is a sparse - # gradient with attributes indices and values. - # To average, need to concat them individually then create - # a new IndexedSlices object. - indices = [] - values = [] - for g, v in grad_and_vars: - indices.append(g.indices) - values.append(g.values) - all_indices = tf.concat(indices, 0) - avg_values = tf.concat(values, 0) / len(grad_and_vars) - # deduplicate across indices - av, ai = _deduplicate_indexed_slices(avg_values, all_indices) - grad = tf.IndexedSlices(av, ai, dense_shape=g0.dense_shape) - - else: - # a normal tensor can just do a simple average - grads = [] - for g, v in grad_and_vars: - # Add 0 dimension to the gradients to represent the tower. - expanded_g = tf.expand_dims(g, 0) - # Append on a 'tower' dimension which we will average over - grads.append(expanded_g) - - # Average over the 'tower' dimension. - grad = tf.concat(grads, 0) - grad = tf.reduce_mean(grad, 0) - - # the Variables are redundant because they are shared - # across towers. So.. just return the first tower's pointer to - # the Variable. - v = grad_and_vars[0][1] - grad_and_var = (grad, v) - - average_grads.append(grad_and_var) - - assert len(average_grads) == len(list(zip(*tower_grads))) - - return average_grads - - -def summary_gradient_updates(grads, opt, lr): - """get summary ops for the magnitude of gradient updates""" - - # strategy: - # make a dict of variable name -> [variable, grad, adagrad slot] - vars_grads = {} - for v in tf.trainable_variables(): - vars_grads[v.name] = [v, None, None] - for g, v in grads: - vars_grads[v.name][1] = g - vars_grads[v.name][2] = opt.get_slot(v, 'accumulator') - - # now make summaries - ret = [] - for vname, (v, g, a) in vars_grads.items(): - - if g is None: - continue - - if isinstance(g, tf.IndexedSlices): - # a sparse gradient - only take norm of params that are updated - updates = lr * g.values - if a is not None: - updates /= tf.sqrt(tf.gather(a, g.indices)) - else: - updates = lr * g - if a is not None: - updates /= tf.sqrt(a) - - values_norm = tf.sqrt(tf.reduce_sum(v * v)) + 1.0e-7 - updates_norm = tf.sqrt(tf.reduce_sum(updates * updates)) - ret.append(tf.summary.scalar('UPDATE/' + vname.replace(":", "_"), updates_norm / values_norm)) - - return ret - - -def _deduplicate_indexed_slices(values, indices): - """Sums `values` associated with any non-unique `indices`. - Args: - values: A `Tensor` with rank >= 1. - indices: A one-dimensional integer `Tensor`, indexing into the first - dimension of `values` (as in an IndexedSlices object). - Returns: - A tuple of (`summed_values`, `unique_indices`) where `unique_indices` is a - de-duplicated version of `indices` and `summed_values` contains the sum of - `values` slices associated with each unique index. 
- """ - unique_indices, new_index_positions = tf.unique(indices) - summed_values = tf.unsorted_segment_sum(values, - new_index_positions, - tf.shape(unique_indices)[0]) - return (summed_values, unique_indices) - - -def clip_by_global_norm_summary(t_list, clip_norm, norm_name, variables): - # wrapper around tf.clip_by_global_norm that also does summary ops of norms - - # compute norms - # use global_norm with one element to handle IndexedSlices vs dense - norms = [tf.global_norm([t]) for t in t_list] - - # summary ops before clipping - summary_ops = [] - for ns, v in zip(norms, variables): - name = 'norm_pre_clip/' + v.name.replace(":", "_") - summary_ops.append(tf.summary.scalar(name, ns)) - - # clip - clipped_t_list, tf_norm = tf.clip_by_global_norm(t_list, clip_norm) - - # summary ops after clipping - norms_post = [tf.global_norm([t]) for t in clipped_t_list] - for ns, v in zip(norms_post, variables): - name = 'norm_post_clip/' + v.name.replace(":", "_") - summary_ops.append(tf.summary.scalar(name, ns)) - - summary_ops.append(tf.summary.scalar(norm_name, tf_norm)) - - return clipped_t_list, tf_norm, summary_ops - - -def clip_grads(grads, options, do_summaries, global_step): - # grads = [(grad1, var1), (grad2, var2), ...] - def _clip_norms(grad_and_vars, val, name): - # grad_and_vars is a list of (g, v) pairs - grad_tensors = [g for g, v in grad_and_vars] - vv = [v for g, v in grad_and_vars] - scaled_val = val - if do_summaries: - clipped_tensors, g_norm, so = clip_by_global_norm_summary( - grad_tensors, scaled_val, name, vv) - else: - so = [] - clipped_tensors, g_norm = tf.clip_by_global_norm( - grad_tensors, scaled_val) - - ret = [] - for t, (g, v) in zip(clipped_tensors, grad_and_vars): - ret.append((t, v)) - - return ret, so - - all_clip_norm_val = options['all_clip_norm_val'] - ret, summary_ops = _clip_norms(grads, all_clip_norm_val, 'norm_grad') - - assert len(ret) == len(grads) - - return ret, summary_ops - - -def safely_str2int(in_str: str): - try: - i = int(in_str) - except ValueError: - i = None - return i - - -def dump_weights(tf_save_dir, outfile, options): - """ - Dump the trained weights from a model to a HDF5 file. - """ - - def _get_outname(tf_name): - outname = re.sub(':0$', '', tf_name) - outname = outname.lstrip('lm/') - outname = re.sub('/rnn/', '/RNN/', outname) - outname = re.sub('/multi_rnn_cell/', '/MultiRNNCell/', outname) - outname = re.sub('/cell_', '/Cell', outname) - outname = re.sub('/lstm_cell/', '/LSTMCell/', outname) - if '/RNN/' in outname: - if 'projection' in outname: - outname = re.sub('projection/kernel', 'W_P_0', outname) - else: - outname = re.sub('/kernel', '/W_0', outname) - outname = re.sub('/bias', '/B', outname) - return outname - - ckpt_file = tf.train.latest_checkpoint(tf_save_dir) - - config = tf.ConfigProto(allow_soft_placement=True) - with tf.Graph().as_default(): - with tf.Session(config=config) as sess: - with tf.variable_scope('lm'): - LanguageModel(options, False) # Create graph - # we use the "Saver" class to load the variables - loader = tf.train.Saver() - loader.restore(sess, ckpt_file) - - with h5py.File(outfile, 'w') as fout: - for v in tf.trainable_variables(): - if v.name.find('softmax') >= 0: - # don't dump these - continue - outname = _get_outname(v.name) - # print("Saving variable {0} with name {1}".format( - # v.name, outname)) - shape = v.get_shape().as_list() - dset = fout.create_dataset(outname, shape, dtype='float32') - values = sess.run([v])[0] - dset[...] 
= values diff --git a/deeppavlov/models/embedders/bow_embedder.py b/deeppavlov/models/embedders/bow_embedder.py deleted file mode 100644 index cb31247abe..0000000000 --- a/deeppavlov/models/embedders/bow_embedder.py +++ /dev/null @@ -1,58 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from typing import List - -import numpy as np - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component - - -@register('bow') -class BoWEmbedder(Component): - """ - Performs one-hot encoding of tokens based on a pre-built vocabulary of tokens. - - Parameters: - depth: size of output numpy vector. - with_counts: flag denotes whether to use binary encoding (with zeros and ones), - or to use counts as token representation. - - Example: - .. code:: python - - >>> bow = BoWEmbedder(depth=3) - - >>> bow([[0, 1], [1], []) - [array([1, 1, 0], dtype=int32), - array([0, 1, 0], dtype=int32), - array([0, 0, 0], dtype=int32)] - """ - - def __init__(self, depth: int, with_counts: bool = False, **kwargs) -> None: - self.depth = depth - self.with_counts = with_counts - - def _encode(self, token_indices: List[int]) -> np.ndarray: - bow = np.zeros([self.depth], dtype=np.int32) - for idx in token_indices: - if self.with_counts: - bow[idx] += 1 - else: - bow[idx] = 1 - return bow - - def __call__(self, batch: List[List[int]]) -> List[np.ndarray]: - return [self._encode(sample) for sample in batch] diff --git a/deeppavlov/models/embedders/elmo_embedder.py b/deeppavlov/models/embedders/elmo_embedder.py deleted file mode 100644 index 09990ce648..0000000000 --- a/deeppavlov/models/embedders/elmo_embedder.py +++ /dev/null @@ -1,314 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import sys -from logging import getLogger -from typing import Iterator, List, Union, Optional - -import numpy as np -import tensorflow as tf -import tensorflow_hub as hub -from overrides import overrides - -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.utils import zero_pad, chunk_generator -from deeppavlov.core.models.component import Component -from deeppavlov.core.models.tf_backend import TfModelMeta - -log = getLogger(__name__) - - -@register('elmo_embedder') -class ELMoEmbedder(Component, metaclass=TfModelMeta): - """ - ``ELMo`` (Embeddings from Language Models) representations are pre-trained contextual representations from - large-scale bidirectional language models. See a paper `Deep contextualized word representations - `__ for more information about the algorithm and a detailed analysis. - - Parameters: - spec: A ``ModuleSpec`` defining the Module to instantiate or a path where to load a ``ModuleSpec`` from via - ``tenserflow_hub.load_module_spec`` by using `TensorFlow Hub `__. - elmo_output_names: A list of output ELMo. You can use combination of - ``["word_emb", "lstm_outputs1", "lstm_outputs2","elmo"]`` and you can use separately ``["default"]``. - - Where, - - * ``word_emb`` - CNN embedding (default dim 512) - * ``lstm_outputs*`` - ouputs of lstm (default dim 1024) - * ``elmo`` - weighted sum of cnn and lstm outputs (default dim 1024) - * ``default`` - mean ``elmo`` vector for sentence (default dim 1024) - - See `TensorFlow Hub `__ for more information about it. - dim: Can be used for output embeddings dimensionality reduction if elmo_output_names != ['default'] - pad_zero: Whether to use pad samples or not. - concat_last_axis: A boolean that enables/disables last axis concatenation. It is not used for - ``elmo_output_names = ["default"]``. - max_token: The number limitation of words per a batch line. - mini_batch_size: It is used to reduce the memory requirements of the device. - - - If some required packages are missing, install all the requirements by running in command line: - - .. code:: bash - - python -m deeppavlov install - - where ```` is a path to one of the :config:`provided config files ` - or its name without an extension, for example : - - .. code:: bash - - python -m deeppavlov install elmo_ru-news - - Examples: - >>> from deeppavlov.models.embedders.elmo_embedder import ELMoEmbedder - >>> elmo = ELMoEmbedder("http://files.deeppavlov.ai/deeppavlov_data/elmo_ru-news_wmt11-16_1.5M_steps.tar.gz") - >>> elmo([['вопрос', 'жизни', 'Вселенной', 'и', 'вообще', 'всего'], ['42']]) - array([[ 0.00719104, 0.08544601, -0.07179783, ..., 0.10879009, - -0.18630421, -0.2189409 ], - [ 0.16325025, -0.04736076, 0.12354863, ..., -0.1889013 , - 0.04972512, 0.83029324]], dtype=float32) - - You can use ELMo models from DeepPavlov as usual `TensorFlow Hub Module - `_. 
- - >>> import tensorflow as tf - >>> import tensorflow_hub as hub - >>> elmo = hub.Module("http://files.deeppavlov.ai/deeppavlov_data/elmo_ru-news_wmt11-16_1.5M_steps.tar.gz", - trainable=True) - >>> sess = tf.Session() - >>> sess.run(tf.global_variables_initializer()) - >>> embeddings = elmo(["это предложение", "word"], signature="default", as_dict=True)["elmo"] - >>> sess.run(embeddings) - array([[[ 0.05817392, 0.22493343, -0.19202903, ..., -0.14448944, - -0.12425567, 1.0148407 ], - [ 0.53596294, 0.2868537 , 0.28028542, ..., -0.08028372, - 0.49089077, 0.75939953]], - [[ 0.3433637 , 1.0031182 , -0.1597258 , ..., 1.2442509 , - 0.61029315, 0.43388373], - [ 0.05370751, 0.02260921, 0.01074906, ..., 0.08748816, - -0.0066415 , -0.01344293]]], dtype=float32) - - TensorFlow Hub module also supports tokenized sentences in the following format. - - >>> tokens_input = [["мама", "мыла", "раму"], ["рама", "", ""]] - >>> tokens_length = [3, 1] - >>> embeddings = elmo( - inputs={ - "tokens": tokens_input, - "sequence_len": tokens_length - }, - signature="tokens", - as_dict=True)["elmo"] - >>> sess.run(embeddings) - array([[[ 0.6040001 , -0.16130011, 0.56478846, ..., -0.00376141, - -0.03820051, 0.26321286], - [ 0.01834148, 0.17055789, 0.5311495 , ..., -0.5675535 , - 0.62669843, -0.05939034], - [ 0.3242596 , 0.17909613, 0.01657108, ..., 0.1866098 , - 0.7392496 , 0.08285746]], - [[ 1.1322289 , 0.19077688, -0.17811403, ..., 0.42973226, - 0.23391506, -0.01294377], - [ 0.05370751, 0.02260921, 0.01074906, ..., 0.08748816, - -0.0066415 , -0.01344293], - [ 0.05370751, 0.02260921, 0.01074906, ..., 0.08748816, - -0.0066415 , -0.01344293]]], dtype=float32) - - You can also get ``hub.text_embedding_column`` like described `here - `_. - - - """ - - def __init__(self, spec: str, elmo_output_names: Optional[List] = None, - dim: Optional[int] = None, pad_zero: bool = False, - concat_last_axis: bool = True, max_token: Optional[int] = None, - mini_batch_size: int = 32, **kwargs) -> None: - - self.spec = spec if '://' in spec else str(expand_path(spec)) - - self.elmo_output_dims = {'word_emb': 512, - 'lstm_outputs1': 1024, - 'lstm_outputs2': 1024, - 'elmo': 1024, - 'default': 1024} - elmo_output_names = elmo_output_names or ['default'] - self.elmo_output_names = elmo_output_names - elmo_output_names_set = set(self.elmo_output_names) - if elmo_output_names_set - set(self.elmo_output_dims.keys()): - log.error(f'Incorrect elmo_output_names = {elmo_output_names} . 
You can use either ["default"] or some of' - '["word_emb", "lstm_outputs1", "lstm_outputs2","elmo"]') - sys.exit(1) - - if elmo_output_names_set - {'default'} and elmo_output_names_set - {"word_emb", "lstm_outputs1", - "lstm_outputs2", "elmo"}: - log.error('Incompatible conditions: you can use either ["default"] or list of ' - '["word_emb", "lstm_outputs1", "lstm_outputs2","elmo"] ') - sys.exit(1) - - self.pad_zero = pad_zero - self.concat_last_axis = concat_last_axis - self.max_token = max_token - self.mini_batch_size = mini_batch_size - self.elmo_outputs, self.sess, self.tokens_ph, self.tokens_length_ph = self._load() - self.dim = self._get_dims(self.elmo_output_names, dim, concat_last_axis) - - def _get_dims(self, elmo_output_names, in_dim, concat_last_axis): - dims = [self.elmo_output_dims[elmo_output_name] for elmo_output_name in elmo_output_names] - if concat_last_axis: - dims = in_dim if in_dim else sum(dims) - else: - if in_dim: - log.warning(f"[ dim = {in_dim} is not used, because the elmo_output_names has more than one element.]") - return dims - - def _load(self): - """ - Load a ELMo TensorFlow Hub Module from a self.spec. - - Returns: - ELMo pre-trained model wrapped in TenserFlow Hub Module. - """ - elmo_module = hub.Module(self.spec, trainable=False) - - sess_config = tf.ConfigProto() - sess_config.gpu_options.allow_growth = True - sess = tf.Session(config=sess_config) - - tokens_ph = tf.placeholder(shape=(None, None), dtype=tf.string, name='tokens') - tokens_length_ph = tf.placeholder(shape=(None,), dtype=tf.int32, name='tokens_length') - - elmo_outputs = elmo_module(inputs={"tokens": tokens_ph, - "sequence_len": tokens_length_ph}, - signature="tokens", - as_dict=True) - - sess.run(tf.global_variables_initializer()) - - return elmo_outputs, sess, tokens_ph, tokens_length_ph - - def _fill_batch(self, batch): - """ - Fill batch correct values. - - Args: - batch: A list of tokenized text samples. - - Returns: - batch: A list of tokenized text samples. - """ - - if not batch: - empty_vec = np.zeros(self.dim, dtype=np.float32) - return [empty_vec] if 'default' in self.elmo_output_names else [[empty_vec]] - - filled_batch = [] - for batch_line in batch: - batch_line = batch_line if batch_line else [''] - filled_batch.append(batch_line) - - batch = filled_batch - - if self.max_token: - batch = [batch_line[:self.max_token] for batch_line in batch] - tokens_length = [len(batch_line) for batch_line in batch] - tokens_length_max = max(tokens_length) - batch = [batch_line + [''] * (tokens_length_max - len(batch_line)) for batch_line in batch] - - return batch, tokens_length - - def _mini_batch_fit(self, batch: List[List[str]], *args, **kwargs) -> Union[List[np.ndarray], np.ndarray]: - """ - Embed sentences from a batch. - - Args: - batch: A list of tokenized text samples. - - Returns: - A batch of ELMo embeddings. 
- """ - batch, tokens_length = self._fill_batch(batch) - - elmo_outputs = self.sess.run(self.elmo_outputs, - feed_dict={self.tokens_ph: batch, - self.tokens_length_ph: tokens_length}) - - if 'default' in self.elmo_output_names: - elmo_output_values = elmo_outputs['default'] - dim0, dim1 = elmo_output_values.shape - if self.dim != dim1: - shape = (dim0, self.dim if isinstance(self.dim, int) else self.dim[0]) - elmo_output_values = np.resize(elmo_output_values, shape) - else: - elmo_output_values = [elmo_outputs[elmo_output_name] for elmo_output_name in self.elmo_output_names] - elmo_output_values = np.concatenate(elmo_output_values, axis=-1) - - dim0, dim1, dim2 = elmo_output_values.shape - if self.concat_last_axis and self.dim != dim2: - shape = (dim0, dim1, self.dim) - elmo_output_values = np.resize(elmo_output_values, shape) - - elmo_output_values = [elmo_output_values_line[:length_line] - for length_line, elmo_output_values_line in zip(tokens_length, elmo_output_values)] - - if not self.concat_last_axis: - slice_indexes = np.cumsum(self.dim).tolist()[:-1] - elmo_output_values = [[np.array_split(vec, slice_indexes) for vec in tokens] - for tokens in elmo_output_values] - - return elmo_output_values - - @overrides - def __call__(self, batch: List[List[str]], - *args, **kwargs) -> Union[List[np.ndarray], np.ndarray]: - """ - Embed sentences from a batch. - - Args: - batch: A list of tokenized text samples. - - Returns: - A batch of ELMo embeddings. - """ - if len(batch) > self.mini_batch_size: - batch_gen = chunk_generator(batch, self.mini_batch_size) - elmo_output_values = [] - for mini_batch in batch_gen: - mini_batch_out = self._mini_batch_fit(mini_batch, *args, **kwargs) - elmo_output_values.extend(mini_batch_out) - else: - elmo_output_values = self._mini_batch_fit(batch, *args, **kwargs) - - if self.pad_zero: - elmo_output_values = zero_pad(elmo_output_values) - - return elmo_output_values - - def __iter__(self) -> Iterator: - """ - Iterate over all words from a ELMo model vocabulary. - The ELMo model vocabulary consists of ``['', '', '']``. - - Returns: - An iterator of three elements ``['', '', '']``. - """ - - yield from ['', '', ''] - - def destroy(self): - if hasattr(self, 'sess'): - for k in list(self.sess.graph.get_all_collection_keys()): - self.sess.graph.clear_collection(k) - super().destroy() diff --git a/deeppavlov/models/embedders/glove_embedder.py b/deeppavlov/models/embedders/glove_embedder.py deleted file mode 100644 index 1bf3ecd742..0000000000 --- a/deeppavlov/models/embedders/glove_embedder.py +++ /dev/null @@ -1,74 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-import pickle -from logging import getLogger -from typing import Iterator - -import numpy as np -from gensim.models import KeyedVectors -from overrides import overrides - -from deeppavlov.core.common.registry import register -from deeppavlov.models.embedders.abstract_embedder import Embedder - -log = getLogger(__name__) - - -@register('glove') -class GloVeEmbedder(Embedder): - """ - Class implements GloVe embedding model - - Args: - load_path: path where to load pre-trained embedding model from - pad_zero: whether to pad samples or not - - Attributes: - model: GloVe model instance - tok2emb: dictionary with already embedded tokens - dim: dimension of embeddings - pad_zero: whether to pad sequence of tokens with zeros or not - load_path: path with pre-trained GloVe model - """ - - def _get_word_vector(self, w: str) -> np.ndarray: - return self.model[w] - - def load(self) -> None: - """ - Load dict of embeddings from given file - """ - log.info(f"[loading GloVe embeddings from `{self.load_path}`]") - if not self.load_path.exists(): - log.warning(f'{self.load_path} does not exist, cannot load embeddings from it!') - return - self.model = KeyedVectors.load_word2vec_format(str(self.load_path)) - self.dim = self.model.vector_size - - @overrides - def __iter__(self) -> Iterator[str]: - """ - Iterate over all words from GloVe model vocabulary - - Returns: - iterator - """ - yield from self.model.vocab - - def serialize(self) -> bytes: - return pickle.dumps(self.model, protocol=4) - - def deserialize(self, data: bytes) -> None: - self.model = pickle.loads(data) - self.dim = self.model.vector_size diff --git a/deeppavlov/core/layers/__init__.py b/deeppavlov/models/entity_extraction/__init__.py similarity index 100% rename from deeppavlov/core/layers/__init__.py rename to deeppavlov/models/entity_extraction/__init__.py diff --git a/deeppavlov/models/kbqa/entity_detection_parser.py b/deeppavlov/models/entity_extraction/entity_detection_parser.py similarity index 61% rename from deeppavlov/models/kbqa/entity_detection_parser.py rename to deeppavlov/models/entity_extraction/entity_detection_parser.py index d223af8ad9..7719122c3d 100644 --- a/deeppavlov/models/kbqa/entity_detection_parser.py +++ b/deeppavlov/models/entity_extraction/entity_detection_parser.py @@ -12,8 +12,8 @@ # See the License for the specific language governing permissions and # limitations under the License. -from typing import List, Tuple, Union, Dict from collections import defaultdict +from typing import List, Tuple, Union, Dict import numpy as np @@ -23,44 +23,28 @@ @register('question_sign_checker') -class QuestionSignChecker(Component): - """This class adds question sign if it is absent or replaces dot with question sign""" - - def __init__(self, **kwargs): - pass - - def __call__(self, questions: List[str]) -> List[str]: - questions_sanitized = [] - for question in questions: - if not question.endswith('?'): - if question.endswith('.'): - question = question[:-1] + '?' - else: - question += '?' - questions_sanitized.append(question) - return questions_sanitized +def question_sign_checker(questions: List[str]) -> List[str]: + """Adds question sign if it is absent or replaces dots in the end with question sign.""" + return [question if question.endswith('?') else f'{question.rstrip(".")}?' 
for question in questions] @register('entity_detection_parser') class EntityDetectionParser(Component): """This class parses probabilities of tokens to be a token from the entity substring.""" - def __init__(self, entity_tags: List[str], type_tag: str, o_tag: str, tags_file: str, ignore_points: bool = False, + def __init__(self, o_tag: str, tags_file: str, entity_tags: List[str] = None, ignore_points: bool = False, return_entities_with_tags: bool = False, thres_proba: float = 0.8, **kwargs): """ - Args: - entity_tags: tags for entities - type_tag: tag for types o_tag: tag for tokens which are neither entities nor types tags_file: filename with NER tags + entity_tags: tags for entities ignore_points: whether to consider points as separate symbols return_entities_with_tags: whether to return a dict of tags (keys) and list of entity substrings (values) or simply a list of entity substrings thres_proba: if the probability of the tag is less than thres_proba, we assign the tag as 'O' """ self.entity_tags = entity_tags - self.type_tag = type_tag self.o_tag = o_tag self.ignore_points = ignore_points self.return_entities_with_tags = return_entities_with_tags @@ -68,22 +52,24 @@ def __init__(self, entity_tags: List[str], type_tag: str, o_tag: str, tags_file: self.tag_ind_dict = {} with open(str(expand_path(tags_file))) as fl: tags = [line.split('\t')[0] for line in fl.readlines()] + if self.entity_tags is None: + self.entity_tags = list( + {tag.split('-')[1] for tag in tags if len(tag.split('-')) > 1}.difference({self.o_tag})) + self.entity_prob_ind = {entity_tag: [i for i, tag in enumerate(tags) if entity_tag in tag] for entity_tag in self.entity_tags} - self.type_prob_ind = [i for i, tag in enumerate(tags) if self.type_tag in tag] - self.et_prob_ind = [i for tag, ind in self.entity_prob_ind.items() for i in ind] + self.type_prob_ind + self.tags_ind = {tag: i for i, tag in enumerate(tags)} + self.et_prob_ind = [i for tag, ind in self.entity_prob_ind.items() for i in ind] for entity_tag, tag_ind in self.entity_prob_ind.items(): for ind in tag_ind: self.tag_ind_dict[ind] = entity_tag - for ind in self.type_prob_ind: - self.tag_ind_dict[ind] = self.type_tag self.tag_ind_dict[0] = self.o_tag - def __call__(self, question_tokens: List[List[str]], token_probas: List[List[List[float]]]) -> \ + def __call__(self, question_tokens_batch: List[List[str]], tokens_info_batch: List[List[List[float]]], + tokens_probas_batch: np.ndarray) -> \ Tuple[List[Union[List[str], Dict[str, List[str]]]], List[List[str]], List[Union[List[int], Dict[str, List[List[int]]]]]]: """ - Args: question_tokens: tokenized questions token_probas: list of probabilities of question tokens @@ -93,30 +79,28 @@ def __call__(self, question_tokens: List[List[str]], token_probas: List[List[Lis Batch of lists of token indices in the text which correspond to entities """ entities_batch = [] - types_batch = [] positions_batch = [] - for tokens, probas in zip(question_tokens, token_probas): - tags, tag_probas = self.tags_from_probas(probas) - entities, types, positions = self.entities_from_tags(tokens, tags, tag_probas) + probas_batch = [] + for tokens, tokens_info, probas in zip(question_tokens_batch, tokens_info_batch, tokens_probas_batch): + entities, positions, entities_probas = self.entities_from_tags(tokens, tokens_info, probas) entities_batch.append(entities) - types_batch.append(types) positions_batch.append(positions) - return entities_batch, types_batch, positions_batch + probas_batch.append(entities_probas) + return entities_batch, 
positions_batch, probas_batch - def tags_from_probas(self, probas): + def tags_from_probas(self, tokens, probas): """ This method makes a list of tags from a list of probas for tags - Args: + tokens: text tokens list probas: probabilities for tokens to belong to particular tags - Returns: list of tags for tokens list of probabilities of these tags """ tags = [] tag_probas = [] - for proba in probas: + for token, proba in zip(tokens, probas): tag_num = np.argmax(proba) if tag_num in self.et_prob_ind: if proba[tag_num] < self.thres_proba: @@ -132,12 +116,10 @@ def entities_from_tags(self, tokens, tags, tag_probas): """ This method makes lists of substrings corresponding to entities and entity types and a list of indices of tokens which correspond to entities - Args: tokens: list of tokens of the text tags: list of tags for tokens tag_probas: list of probabilities of tags - Returns: list of entity substrings (or a dict of tags (keys) and entity substrings (values)) list of substrings for entity types @@ -145,62 +127,63 @@ def entities_from_tags(self, tokens, tags, tag_probas): and list of indices of entity tokens) """ entities_dict = defaultdict(list) - entity_types = [] entity_dict = defaultdict(list) entity_positions_dict = defaultdict(list) entities_positions_dict = defaultdict(list) - entity_type = [] - types_probas = [] - type_proba = [] + entities_probas_dict = defaultdict(list) + entity_probas_dict = defaultdict(list) replace_tokens = [(' - ', '-'), ("'s", ''), (' .', ''), ('{', ''), ('}', ''), (' ', ' '), ('"', "'"), ('(', ''), (')', '')] cnt = 0 - for n, (tok, tag, proba) in enumerate(zip(tokens, tags, tag_probas)): - if tag in self.entity_tags: - if self.ignore_points: - if len(tok) == 1 and n < len(tokens) - 1 and tokens[n + 1] == ".": - entity_dict[tag].append(f"{tok}.") - else: - entity_dict[tag].append(tok) - else: - entity_dict[tag].append(tok) - entity_positions_dict[tag].append(cnt) - - elif tag == self.type_tag: - entity_type.append(tok) - type_proba.append(proba) - elif self.ignore_points and tok == "." 
and n > 0 and len(tokens[n - 1]) == 1: - cnt -= 1 + for n, (tok, tag, probas) in enumerate(zip(tokens, tags, tag_probas)): + if tag.split('-')[-1] in self.entity_tags: + f_tag = tag.split("-")[-1] + if tag.startswith("B-") and any(entity_dict.values()): + for c_tag, entity in entity_dict.items(): + entity = ' '.join(entity) + for old, new in replace_tokens: + entity = entity.replace(old, new) + if entity: + entities_dict[c_tag].append(entity) + entities_positions_dict[c_tag].append(entity_positions_dict[c_tag]) + cur_probas = entity_probas_dict[c_tag] + entities_probas_dict[c_tag].append(round(sum(cur_probas) / len(cur_probas), 4)) + entity_dict[c_tag] = [] + entity_positions_dict[c_tag] = [] + entity_probas_dict[c_tag] = [] + + entity_dict[f_tag].append(tok) + entity_positions_dict[f_tag].append(cnt) + entity_probas_dict[f_tag].append(probas[self.tags_ind[tag]]) + elif any(entity_dict.values()): for tag, entity in entity_dict.items(): + c_tag = tag.split("-")[-1] entity = ' '.join(entity) for old, new in replace_tokens: entity = entity.replace(old, new) if entity: - entities_dict[tag].append(entity) - entities_positions_dict[tag].append(entity_positions_dict[tag]) - entity_dict[tag] = [] - entity_positions_dict[tag] = [] - elif len(entity_type) > 0: - entity_type = ' '.join(entity_type) - for old, new in replace_tokens: - entity_type = entity_type.replace(old, new) - entity_types.append(entity_type) - entity_type = [] - types_probas.append(np.mean(type_proba)) - type_proba = [] + entities_dict[c_tag].append(entity) + entities_positions_dict[c_tag].append(entity_positions_dict[c_tag]) + cur_probas = entity_probas_dict[c_tag] + entities_probas_dict[c_tag].append(round(sum(cur_probas) / len(cur_probas), 4)) + + entity_dict[c_tag] = [] + entity_positions_dict[c_tag] = [] + entity_probas_dict[c_tag] = [] cnt += 1 - if entity_types: - entity_types = sorted(zip(entity_types, types_probas), key=lambda x: x[1], reverse=True) - entity_types = [entity_type[0] for entity_type in entity_types] - entities_list = [entity for tag, entities in entities_dict.items() for entity in entities] entities_positions_list = [position for tag, positions in entities_positions_dict.items() for position in positions] + entities_probas_list = [proba for tag, probas in entities_probas_dict.items() for proba in probas] + + entities_dict = dict(entities_dict) + entities_positions_dict = dict(entities_positions_dict) + entities_probas_dict = dict(entities_probas_dict) if self.return_entities_with_tags: - return entities_dict, entity_types, entities_positions_dict + return entities_dict, entities_positions_dict, entities_probas_dict else: - return entities_list, entity_types, entities_positions_list + return entities_list, entities_positions_list, entities_probas_list diff --git a/deeppavlov/models/entity_extraction/entity_linking.py b/deeppavlov/models/entity_extraction/entity_linking.py new file mode 100644 index 0000000000..00441fce1a --- /dev/null +++ b/deeppavlov/models/entity_extraction/entity_linking.py @@ -0,0 +1,583 @@ +# Copyright 2017 Neural Networks and Deep Learning lab, MIPT +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import re +import sqlite3 +from logging import getLogger +from typing import List, Dict, Tuple, Union, Any +from collections import defaultdict + +import pymorphy2 +from hdt import HDTDocument +from nltk.corpus import stopwords +from rapidfuzz import fuzz + +from deeppavlov.core.common.registry import register +from deeppavlov.core.models.component import Component +from deeppavlov.core.models.serializable import Serializable +from deeppavlov.core.commands.utils import expand_path + +log = getLogger(__name__) + + +@register("entity_linker") +class EntityLinker(Component, Serializable): + """ + Class for linking of entity substrings in the document to entities in Wikidata + """ + + def __init__( + self, + load_path: str, + entities_database_filename: str, + entity_ranker=None, + num_entities_for_bert_ranking: int = 50, + wikidata_file: str = None, + num_entities_to_return: int = 10, + max_text_len: int = 300, + lang: str = "en", + use_descriptions: bool = True, + use_tags: bool = False, + lemmatize: bool = False, + full_paragraph: bool = False, + use_connections: bool = False, + max_paragraph_len: int = 250, + **kwargs, + ) -> None: + """ + + Args: + load_path: path to folder with inverted index files + entities_database_filename: file with sqlite database with Wikidata entities index + entity_ranker: deeppavlov.models.torch_bert.torch_transformers_el_ranker.TorchTransformersEntityRankerInfer + num_entities_for_bert_ranking: number of candidate entities for BERT ranking using description and context + wikidata_file: .hdt file with Wikidata graph + num_entities_to_return: number of candidate entities for the substring which are returned + max_text_len: max length of context for entity ranking by description + lang: russian or english + use_description: whether to perform entity ranking by context and description + use_tags: whether to use ner tags for entity filtering + lemmatize: whether to lemmatize tokens + full_paragraph: whether to use full paragraph for entity ranking by context and description + use_connections: whether to ranking entities by number of connections in Wikidata + max_paragraph_len: maximum length of paragraph for ranking by context and description + **kwargs: + """ + super().__init__(save_path=None, load_path=load_path) + self.morph = pymorphy2.MorphAnalyzer() + self.lemmatize = lemmatize + self.entities_database_filename = entities_database_filename + self.num_entities_for_bert_ranking = num_entities_for_bert_ranking + self.wikidata_file = wikidata_file + self.entity_ranker = entity_ranker + self.num_entities_to_return = num_entities_to_return + self.max_text_len = max_text_len + self.lang = f"@{lang}" + if self.lang == "@en": + self.stopwords = set(stopwords.words("english")) + elif self.lang == "@ru": + self.stopwords = set(stopwords.words("russian")) + self.use_descriptions = use_descriptions + self.use_connections = use_connections + self.max_paragraph_len = max_paragraph_len + self.use_tags = use_tags + self.full_paragraph = full_paragraph + self.re_tokenizer = re.compile(r"[\w']+|[^\w ]") + self.not_found_str = "not in wiki" + + self.load() + + def 
load(self) -> None: + self.conn = sqlite3.connect(str(self.load_path / self.entities_database_filename)) + self.cur = self.conn.cursor() + self.wikidata = None + if self.wikidata_file: + self.wikidata = HDTDocument(str(expand_path(self.wikidata_file))) + + def save(self) -> None: + pass + + def __call__( + self, + entity_substr_batch: List[List[str]], + entity_tags_batch: List[List[str]] = None, + sentences_batch: List[List[str]] = None, + entity_offsets_batch: List[List[List[int]]] = None, + sentences_offsets_batch: List[List[Tuple[int, int]]] = None, + ) -> Tuple[Union[List[List[List[str]]], List[List[str]]], Union[List[List[List[Any]]], List[List[Any]]], + Union[List[List[List[str]]], List[List[str]]]]: + if (not sentences_offsets_batch or sentences_offsets_batch[0] is None) and sentences_batch is not None \ + or not isinstance(sentences_offsets_batch[0][0], (list, tuple)): + sentences_offsets_batch = [] + for sentences_list in sentences_batch: + sentences_offsets_list = [] + start = 0 + for sentence in sentences_list: + end = start + len(sentence) + sentences_offsets_list.append([start, end]) + start = end + 1 + sentences_offsets_batch.append(sentences_offsets_list) + + if entity_tags_batch is None or not entity_tags_batch[0]: + entity_tags_batch = [["" for _ in entity_substr_list] for entity_substr_list in entity_substr_batch] + else: + entity_tags_batch = [[tag.upper() for tag in entity_tags] for entity_tags in entity_tags_batch] + + if sentences_batch is None: + sentences_batch = [[] for _ in entity_substr_batch] + sentences_offsets_batch = [[] for _ in entity_substr_batch] + + log.debug(f"sentences_batch {sentences_batch}") + if (not entity_offsets_batch and sentences_batch) or not entity_offsets_batch[0] \ + or not isinstance(entity_offsets_batch[0][0], (list, tuple)): + entity_offsets_batch = [] + for entity_substr_list, sentences_list in zip(entity_substr_batch, sentences_batch): + text = " ".join(sentences_list).lower() + log.debug(f"text {text}") + entity_offsets_list = [] + for entity_substr in entity_substr_list: + st_offset = text.find(entity_substr.lower()) + end_offset = st_offset + len(entity_substr) + entity_offsets_list.append([st_offset, end_offset]) + entity_offsets_batch.append(entity_offsets_list) + + entity_ids_batch, entity_conf_batch, entity_pages_batch = [], [], [] + for (entity_substr_list, entity_offsets_list, entity_tags_list, sentences_list, sentences_offsets_list,) in zip( + entity_substr_batch, + entity_offsets_batch, + entity_tags_batch, + sentences_batch, + sentences_offsets_batch, + ): + entity_ids_list, entity_conf_list, entity_pages_list = self.link_entities( + entity_substr_list, + entity_offsets_list, + entity_tags_list, + sentences_list, + sentences_offsets_list, + ) + log.debug(f"entity_ids_list {entity_ids_list} entity_conf_list {entity_conf_list}") + entity_ids_batch.append(entity_ids_list) + entity_conf_batch.append(entity_conf_list) + entity_pages_batch.append(entity_pages_list) + return entity_ids_batch, entity_conf_batch, entity_pages_batch + + def link_entities( + self, + entity_substr_list: List[str], + entity_offsets_list: List[List[int]], + entity_tags_list: List[str], + sentences_list: List[str], + sentences_offsets_list: List[List[int]], + ) -> Tuple[Union[List[List[str]], List[str]], Union[List[List[Any]], List[Any]], Union[List[List[str]], List[str]]]: + log.debug( + f"entity_substr_list {entity_substr_list} entity_tags_list {entity_tags_list} " + f"entity_offsets_list {entity_offsets_list}" + ) + entity_ids_list, conf_list, 
pages_list = [], [], [] + if entity_substr_list: + entities_scores_list = [] + cand_ent_scores_list = [] + entity_substr_split_list = [ + [word for word in entity_substr.split(" ") if word not in self.stopwords and len(word) > 0] + for entity_substr in entity_substr_list + ] + for entity_substr, entity_substr_split, tag in zip( + entity_substr_list, entity_substr_split_list, entity_tags_list + ): + cand_ent_scores = [] + if len(entity_substr) > 1: + entity_substr_split_lemm = [self.morph.parse(tok)[0].normal_form for tok in entity_substr_split] + cand_ent_init = self.find_exact_match(entity_substr, tag) + if not cand_ent_init or entity_substr_split != entity_substr_split_lemm: + cand_ent_init = self.find_fuzzy_match(entity_substr_split, tag) + + for entity in cand_ent_init: + entities_scores = list(cand_ent_init[entity]) + entities_scores = sorted(entities_scores, key=lambda x: (x[0], x[1]), reverse=True) + cand_ent_scores.append((entity, entities_scores[0])) + cand_ent_scores = sorted(cand_ent_scores, key=lambda x: (x[1][0], x[1][1]), reverse=True) + + cand_ent_scores = cand_ent_scores[:self.num_entities_for_bert_ranking] + cand_ent_scores_list.append(cand_ent_scores) + entity_ids = [elem[0] for elem in cand_ent_scores] + entities_scores_list.append({ent: score for ent, score in cand_ent_scores}) + entity_ids_list.append(entity_ids) + + if self.use_connections: + entity_ids_list = [] + entities_with_conn_scores_list = self.rank_by_connections(cand_ent_scores_list) + for entities_with_conn_scores in entities_with_conn_scores_list: + entity_ids = [elem[0] for elem in entities_with_conn_scores] + entity_ids_list.append(entity_ids) + + entity_descr_list = [] + pages_dict = {} + for entity_ids in entity_ids_list: + entity_descrs = [] + for entity_id in entity_ids: + res = self.cur.execute("SELECT * FROM entity_labels WHERE entity='{}';".format(entity_id)) + entity_info = res.fetchall() + if entity_info: + ( + cur_entity_id, + cur_entity_label, + cur_entity_descr, + cur_entity_page, + ) = entity_info[0] + entity_descrs.append(cur_entity_descr) + pages_dict[cur_entity_id] = cur_entity_page + else: + entity_descrs.append("") + entity_descr_list.append(entity_descrs) + if self.use_descriptions: + substr_lens = [len(entity_substr.split()) for entity_substr in entity_substr_list] + entity_ids_list, conf_list = self.rank_by_description( + entity_substr_list, + entity_offsets_list, + entity_ids_list, + entity_descr_list, + entities_scores_list, + sentences_list, + sentences_offsets_list, + substr_lens, + ) + if self.num_entities_to_return == 1: + pages_list = [pages_dict.get(entity_ids, "") for entity_ids in entity_ids_list] + else: + pages_list = [[pages_dict.get(entity_id, "") for entity_id in entity_ids] + for entity_ids in entity_ids_list] + + return entity_ids_list, conf_list, pages_list + + def process_cand_ent(self, cand_ent_init, entities_and_ids, entity_substr_split, tag): + if self.use_tags: + for cand_entity_title, cand_entity_id, cand_entity_rels, cand_tag, *_ in entities_and_ids: + if not tag or tag == cand_tag: + substr_score = self.calc_substr_score(cand_entity_title, entity_substr_split) + cand_ent_init[cand_entity_id].add((substr_score, cand_entity_rels)) + if not cand_ent_init: + for cand_entity_title, cand_entity_id, cand_entity_rels, cand_tag, *_ in entities_and_ids: + substr_score = self.calc_substr_score(cand_entity_title, entity_substr_split) + cand_ent_init[cand_entity_id].add((substr_score, cand_entity_rels)) + else: + for cand_entity_title, cand_entity_id, cand_entity_rels, 
*_ in entities_and_ids: + substr_score = self.calc_substr_score(cand_entity_title, entity_substr_split) + cand_ent_init[cand_entity_id].add((substr_score, cand_entity_rels)) + return cand_ent_init + + def find_exact_match(self, entity_substr, tag): + entity_substr_split = entity_substr.split() + cand_ent_init = defaultdict(set) + res = self.cur.execute("SELECT * FROM inverted_index WHERE title MATCH '{}';".format(entity_substr)) + entities_and_ids = res.fetchall() + if entities_and_ids: + cand_ent_init = self.process_cand_ent(cand_ent_init, entities_and_ids, entity_substr_split, tag) + if entity_substr.startswith("the "): + entity_substr = entity_substr.split("the ")[1] + entity_substr_split = entity_substr_split[1:] + res = self.cur.execute("SELECT * FROM inverted_index WHERE title MATCH '{}';".format(entity_substr)) + entities_and_ids = res.fetchall() + cand_ent_init = self.process_cand_ent(cand_ent_init, entities_and_ids, entity_substr_split, tag) + if self.lang == "@ru": + entity_substr_split_lemm = [self.morph.parse(tok)[0].normal_form for tok in entity_substr_split] + entity_substr_lemm = " ".join(entity_substr_split_lemm) + if entity_substr_lemm != entity_substr: + res = self.cur.execute( + "SELECT * FROM inverted_index WHERE title MATCH '{}';".format(entity_substr_lemm) + ) + entities_and_ids = res.fetchall() + if entities_and_ids: + cand_ent_init = self.process_cand_ent( + cand_ent_init, entities_and_ids, entity_substr_split_lemm, tag + ) + return cand_ent_init + + def find_fuzzy_match(self, entity_substr_split, tag): + if self.lang == "@ru": + entity_substr_split_lemm = [self.morph.parse(tok)[0].normal_form for tok in entity_substr_split] + else: + entity_substr_split_lemm = entity_substr_split + cand_ent_init = defaultdict(set) + for word in entity_substr_split: + res = self.cur.execute("SELECT * FROM inverted_index WHERE title MATCH '{}';".format(word)) + part_entities_and_ids = res.fetchall() + cand_ent_init = self.process_cand_ent(cand_ent_init, part_entities_and_ids, entity_substr_split, tag) + if self.lang == "@ru": + word_lemm = self.morph.parse(word)[0].normal_form + if word != word_lemm: + res = self.cur.execute("SELECT * FROM inverted_index WHERE title MATCH '{}';".format(word_lemm)) + part_entities_and_ids = res.fetchall() + cand_ent_init = self.process_cand_ent( + cand_ent_init, + part_entities_and_ids, + entity_substr_split_lemm, + tag + ) + return cand_ent_init + + def morph_parse(self, word): + morph_parse_tok = self.morph.parse(word)[0] + normal_form = morph_parse_tok.normal_form + return normal_form + + def calc_substr_score(self, cand_entity_title, entity_substr_split): + label_tokens = cand_entity_title.split() + cnt = 0.0 + for ent_tok in entity_substr_split: + found = False + for label_tok in label_tokens: + if label_tok == ent_tok: + found = True + break + if found: + cnt += 1.0 + else: + for label_tok in label_tokens: + if label_tok[:2] == ent_tok[:2]: + fuzz_score = fuzz.ratio(label_tok, ent_tok) + if fuzz_score >= 80.0 and not found: + cnt += fuzz_score * 0.01 + break + substr_score = round(cnt / max(len(label_tokens), len(entity_substr_split)), 3) + if len(label_tokens) == 2 and len(entity_substr_split) == 1: + if entity_substr_split[0] == label_tokens[1]: + substr_score = 0.5 + elif entity_substr_split[0] == label_tokens[0]: + substr_score = 0.3 + return substr_score + + def rank_by_connections(self, cand_ent_scores_list: List[List[Union[str, Tuple[str, str]]]]): + entities_for_ranking_list = [] + for entities_scores in cand_ent_scores_list: + 
entities_for_ranking = [] + if entities_scores: + max_score = entities_scores[0][1][0] + for entity, scores in entities_scores: + if scores[0] == max_score: + entities_for_ranking.append(entity) + entities_for_ranking_list.append(entities_for_ranking) + + entities_sets_list = [] + for entities_scores in cand_ent_scores_list: + entities_sets_list.append({entity for entity, scores in entities_scores}) + + entities_conn_scores_list = [] + for entities_scores in cand_ent_scores_list: + cur_entity_dict = {} + for entity, scores in entities_scores: + cur_entity_dict[entity] = 0 + entities_conn_scores_list.append(cur_entity_dict) + + entities_objects_list, entities_triplets_list = [], [] + for entities_scores in cand_ent_scores_list: + cur_objects_dict, cur_triplets_dict = {}, {} + for entity, scores in entities_scores: + objects, triplets = set(), set() + tr, cnt = self.wikidata.search_triples(f"http://we/{entity}", "", "") + for triplet in tr: + objects.add(triplet[2].split("/")[-1]) + triplets.add((triplet[1].split("/")[-1], triplet[2].split("/")[-1])) + cur_objects_dict[entity] = objects + cur_triplets_dict[entity] = triplets + entities_objects_list.append(cur_objects_dict) + entities_triplets_list.append(cur_triplets_dict) + + already_ranked = {i: False for i in range(len(entities_for_ranking_list))} + + for i in range(len(entities_for_ranking_list)): + for entity1 in entities_for_ranking_list[i]: + for j in range(len(entities_for_ranking_list)): + if i != j and not already_ranked[j]: + inters = entities_objects_list[i][entity1].intersection(entities_sets_list[j]) + if inters: + entities_conn_scores_list[i][entity1] += len(inters) + for entity2 in inters: + entities_conn_scores_list[j][entity2] += len(inters) + already_ranked[j] = True + else: + for entity2 in entities_triplets_list[j]: + inters = entities_triplets_list[i][entity1].intersection( + entities_triplets_list[j][entity2] + ) + inters = {elem for elem in inters if elem[0] != "P31"} + if inters: + prev_score1 = entities_conn_scores_list[i].get(entity1, 0) + prev_score2 = entities_conn_scores_list[j].get(entity2, 0) + entities_conn_scores_list[i][entity1] = max(len(inters), prev_score1) + entities_conn_scores_list[j][entity2] = max(len(inters), prev_score2) + + entities_with_conn_scores_list = [] + for i in range(len(entities_conn_scores_list)): + entities_with_conn_scores_list.append( + sorted( + list(entities_conn_scores_list[i].items()), + key=lambda x: x[1], + reverse=True, + ) + ) + return entities_with_conn_scores_list + + def rank_by_description( + self, + entity_substr_list: List[str], + entity_offsets_list: List[List[int]], + cand_ent_list: List[List[str]], + cand_ent_descr_list: List[List[str]], + entities_scores_list: List[Dict[str, Tuple[int, float]]], + sentences_list: List[str], + sentences_offsets_list: List[List[int]], + substr_lens: List[int], + ) -> Tuple[Union[List[List[str]], List[str]], Union[List[List[Any]], List[Any]]]: + entity_ids_list = [] + conf_list = [] + contexts = [] + for ( + entity_substr, + (entity_start_offset, entity_end_offset), + candidate_entities, + ) in zip(entity_substr_list, entity_offsets_list, cand_ent_list): + sentence = "" + rel_start_offset = 0 + rel_end_offset = 0 + found_sentence_num = 0 + for num, (sent, (sent_start_offset, sent_end_offset)) in enumerate( + zip(sentences_list, sentences_offsets_list) + ): + if entity_start_offset >= sent_start_offset and entity_end_offset <= sent_end_offset: + sentence = sent + found_sentence_num = num + rel_start_offset = entity_start_offset - 
sent_start_offset + rel_end_offset = entity_end_offset - sent_start_offset + break + context = "" + if sentence: + start_of_sentence = 0 + end_of_sentence = len(sentence) + if len(sentence) > self.max_text_len: + start_of_sentence = max(rel_start_offset - self.max_text_len // 2, 0) + end_of_sentence = min(rel_end_offset + self.max_text_len // 2, len(sentence)) + context = ( + sentence[start_of_sentence:rel_start_offset] + "[ENT]" + sentence[ + rel_end_offset:end_of_sentence] + ) + if self.full_paragraph: + cur_sent_len = len(re.findall(self.re_tokenizer, context)) + first_sentence_num = found_sentence_num + last_sentence_num = found_sentence_num + context = [context] + while True: + added = False + if last_sentence_num < len(sentences_list) - 1: + last_sentence_len = len( + re.findall( + self.re_tokenizer, + sentences_list[last_sentence_num + 1], + ) + ) + if cur_sent_len + last_sentence_len < self.max_paragraph_len: + context.append(sentences_list[last_sentence_num + 1]) + cur_sent_len += last_sentence_len + last_sentence_num += 1 + added = True + if first_sentence_num > 0: + first_sentence_len = len( + re.findall( + self.re_tokenizer, + sentences_list[first_sentence_num - 1], + ) + ) + if cur_sent_len + first_sentence_len < self.max_paragraph_len: + context = [sentences_list[first_sentence_num - 1]] + context + cur_sent_len += first_sentence_len + first_sentence_num -= 1 + added = True + if not added: + break + context = " ".join(context) + + log.debug(f"rank, context: {context}") + contexts.append(context) + + scores_list = self.entity_ranker(contexts, cand_ent_list, cand_ent_descr_list) + for (entity_substr, candidate_entities, substr_len, entities_scores, scores,) in zip( + entity_substr_list, + cand_ent_list, + substr_lens, + entities_scores_list, + scores_list, + ): + log.debug(f"len candidate entities {len(candidate_entities)}") + entities_with_scores = [ + ( + entity, + round(entities_scores.get(entity, (0.0, 0))[0], 2), + entities_scores.get(entity, (0.0, 0))[1], + round(float(score), 2), + ) + for entity, score in scores + ] + log.debug(f"len entities with scores {len(entities_with_scores)}") + entities_with_scores = sorted(entities_with_scores, key=lambda x: (x[1], x[3], x[2]), reverse=True) + log.debug(f"--- entities_with_scores {entities_with_scores}") + + if not entities_with_scores: + top_entities = [self.not_found_str] + top_conf = [(0.0, 0, 0.0)] + elif entities_with_scores and substr_len == 1 and entities_with_scores[0][1] < 1.0: + top_entities = [self.not_found_str] + top_conf = [(0.0, 0, 0.0)] + elif entities_with_scores and ( + entities_with_scores[0][1] < 0.3 + or (entities_with_scores[0][3] < 0.13 and entities_with_scores[0][2] < 20) + or (entities_with_scores[0][3] < 0.3 and entities_with_scores[0][2] < 4) + or entities_with_scores[0][1] < 0.6 + ): + top_entities = [self.not_found_str] + top_conf = [(0.0, 0, 0.0)] + else: + top_entities = [score[0] for score in entities_with_scores] + top_conf = [score[1:] for score in entities_with_scores] + + log.debug(f"--- top_entities {top_entities} top_conf {top_conf}") + + high_conf_entities = [] + high_conf_nums = [] + for elem_num, (entity, conf) in enumerate(zip(top_entities, top_conf)): + if len(conf) == 3 and conf[0] == 1.0 and conf[1] > 50 and conf[2] > 0.3: + new_conf = list(conf) + if new_conf[1] > 55: + new_conf[2] = 1.0 + new_conf = tuple(new_conf) + high_conf_entities.append((entity,) + new_conf) + high_conf_nums.append(elem_num) + + high_conf_entities = sorted(high_conf_entities, key=lambda x: (x[1], x[3], x[2]), 
reverse=True) + for n, elem_num in enumerate(high_conf_nums): + if 0 <= elem_num - n < len(top_entities): + del top_entities[elem_num - n] + del top_conf[elem_num - n] + + top_entities = [elem[0] for elem in high_conf_entities] + top_entities + top_conf = [elem[1:] for elem in high_conf_entities] + top_conf + + log.debug(f"top entities {top_entities} top_conf {top_conf}") + + if self.num_entities_to_return == 1 and top_entities: + entity_ids_list.append(top_entities[0]) + conf_list.append(top_conf[0]) + else: + entity_ids_list.append(top_entities[: self.num_entities_to_return]) + conf_list.append(top_conf[: self.num_entities_to_return]) + return entity_ids_list, conf_list diff --git a/deeppavlov/models/entity_extraction/ner_chunker.py b/deeppavlov/models/entity_extraction/ner_chunker.py new file mode 100644 index 0000000000..e72037311b --- /dev/null +++ b/deeppavlov/models/entity_extraction/ner_chunker.py @@ -0,0 +1,318 @@ +# Copyright 2017 Neural Networks and Deep Learning lab, MIPT +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import re +from logging import getLogger +from string import punctuation +from typing import List, Tuple + +from nltk import sent_tokenize +from transformers import AutoTokenizer + +from deeppavlov.core.common.registry import register +from deeppavlov.core.models.component import Component +from deeppavlov.core.common.chainer import Chainer +from deeppavlov.models.entity_extraction.entity_detection_parser import EntityDetectionParser + +log = getLogger(__name__) + + +@register('ner_chunker') +class NerChunker(Component): + """ + Class to split documents into chunks of max_chunk_len symbols so that the length will not exceed + maximal sequence length to feed into BERT + """ + + def __init__(self, vocab_file: str, max_seq_len: int = 400, lowercase: bool = False, max_chunk_len: int = 180, + batch_size: int = 2, **kwargs): + """ + Args: + max_chunk_len: maximal length of chunks into which the document is split + batch_size: how many chunks are in batch + """ + self.max_seq_len = max_seq_len + self.max_chunk_len = max_chunk_len + self.batch_size = batch_size + self.re_tokenizer = re.compile(r"[\w']+|[^\w ]") + self.tokenizer = AutoTokenizer.from_pretrained(vocab_file, + do_lower_case=True) + self.punct_ext = punctuation + " " + "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" + self.russian_letters = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя" + self.lowercase = lowercase + + def __call__(self, docs_batch: List[str]) -> Tuple[List[List[str]], List[List[int]], + List[List[List[Tuple[int, int]]]], List[List[List[str]]]]: + """ + This method splits each document in the batch into chunks wuth the maximal length of max_chunk_len + + Args: + docs_batch: batch of documents + Returns: + batch of lists of document chunks for each document + batch of lists of numbers of documents which correspond to chunks + """ + text_batch_list, nums_batch_list, sentences_offsets_batch_list, sentences_batch_list = [], [], [], [] + text_batch, nums_batch, sentences_offsets_batch, 
sentences_batch = [], [], [], [] + for n, doc in enumerate(docs_batch): + if self.lowercase: + doc = doc.lower() + start = 0 + text = "" + sentences_list = [] + sentences_offsets_list = [] + cur_len = 0 + doc_pieces = doc.split("\n") + doc_pieces = [self.sanitize(doc_piece) for doc_piece in doc_pieces] + doc_pieces = [doc_piece for doc_piece in doc_pieces if len(doc_piece) > 1] + if doc_pieces: + sentences = [] + for doc_piece in doc_pieces: + sentences += sent_tokenize(doc_piece) + for sentence in sentences: + sentence_tokens = re.findall(self.re_tokenizer, sentence) + sentence_len = sum([len(self.tokenizer.encode_plus(token, add_special_tokens=False)["input_ids"]) + for token in sentence_tokens]) + if cur_len + sentence_len < self.max_seq_len: + text += f"{sentence} " + cur_len += sentence_len + end = start + len(sentence) + sentences_offsets_list.append((start, end)) + sentences_list.append(sentence) + start = end + 1 + else: + text = text.strip() + if text: + text_batch.append(text) + sentences_offsets_batch.append(sentences_offsets_list) + sentences_batch.append(sentences_list) + nums_batch.append(n) + + if sentence_len < self.max_seq_len: + text = f"{sentence} " + cur_len = sentence_len + start = 0 + end = start + len(sentence) + sentences_offsets_list = [(start, end)] + sentences_list = [sentence] + start = end + 1 + else: + text = "" + sentence_chunks = sentence.split(" ") + for chunk in sentence_chunks: + chunk_tokens = re.findall(self.re_tokenizer, chunk) + chunk_len = sum([len(self.tokenizer.encode_plus(token, + add_special_tokens=False)["input_ids"]) + for token in chunk_tokens]) + if cur_len + chunk_len < self.max_seq_len: + text += f"{chunk} " + cur_len += chunk_len + 1 + end = start + len(chunk) + sentences_offsets_list.append((start, end)) + sentences_list.append(chunk) + start = end + 1 + else: + text = text.strip() + if text: + text_batch.append(text) + sentences_offsets_batch.append(sentences_offsets_list) + sentences_batch.append(sentences_list) + nums_batch.append(n) + + text = f"{chunk} " + cur_len = chunk_len + start = 0 + end = start + len(chunk) + sentences_offsets_list = [(start, end)] + sentences_list = [chunk] + start = end + 1 + + text = text.strip().strip(",") + if text: + text_batch.append(text) + nums_batch.append(n) + sentences_offsets_batch.append(sentences_offsets_list) + sentences_batch.append(sentences_list) + else: + text_batch.append("а") + nums_batch.append(n) + sentences_offsets_batch.append([(0, len(doc))]) + sentences_batch.append([doc]) + + num_batches = len(text_batch) // self.batch_size + int(len(text_batch) % self.batch_size > 0) + for jj in range(num_batches): + text_batch_list.append(text_batch[jj * self.batch_size:(jj + 1) * self.batch_size]) + nums_batch_list.append(nums_batch[jj * self.batch_size:(jj + 1) * self.batch_size]) + sentences_offsets_batch_list.append( + sentences_offsets_batch[jj * self.batch_size:(jj + 1) * self.batch_size]) + sentences_batch_list.append(sentences_batch[jj * self.batch_size:(jj + 1) * self.batch_size]) + + return text_batch_list, nums_batch_list, sentences_offsets_batch_list, sentences_batch_list + + def sanitize(self, text): + text_len = len(text) + + if text_len > 0 and text[text_len - 1] not in {'.', '!', '?'}: + i = text_len - 1 + while text[i] in self.punct_ext and i > 0: + i -= 1 + if (text[i] in {'.', '!', '?'} and text[i - 1].lower() in self.russian_letters) or \ + (i > 1 and text[i] in {'.', '!', '?'} and text[i - 1] in '"' and text[ + i - 2].lower() in self.russian_letters): + break + + text = 
text[:i + 1] + text = re.sub(r'\s+', ' ', text) + return text + + +@register('ner_chunk_model') +class NerChunkModel(Component): + """ + Class for linking of entity substrings in the document to entities in Wikidata + """ + + def __init__(self, ner: Chainer, + ner_parser: EntityDetectionParser, + **kwargs) -> None: + """ + Args: + ner: config for entity detection + ner_parser: component deeppavlov.models.entity_extraction.entity_detection_parser + **kwargs: + """ + self.ner = ner + self.ner_parser = ner_parser + + def __call__(self, text_batch_list: List[List[str]], + nums_batch_list: List[List[int]], + sentences_offsets_batch_list: List[List[List[Tuple[int, int]]]], + sentences_batch_list: List[List[List[str]]] + ): + """ + Args: + text_batch_list: list of document chunks + nums_batch_list: nums of documents + sentences_offsets_batch_list: indices of start and end symbols of sentences in text + sentences_batch_list: list of sentences from texts + Returns: + doc_entity_substr_batch: entity substrings + doc_entity_offsets_batch: indices of start and end symbols of entities in text + doc_tags_batch: entity tags (PER, LOC, ORG) + doc_sentences_offsets_batch: indices of start and end symbols of sentences in text + doc_sentences_batch: list of sentences from texts + """ + entity_substr_batch_list, entity_offsets_batch_list, entity_positions_batch_list, tags_batch_list, \ + entity_probas_batch_list, text_len_batch_list, text_tokens_len_batch_list = [], [], [], [], [], [], [] + for text_batch, sentences_offsets_batch, sentences_batch in \ + zip(text_batch_list, sentences_offsets_batch_list, sentences_batch_list): + text_batch = [text.replace("\xad", " ") for text in text_batch] + + ner_tokens_batch, ner_tokens_offsets_batch, ner_probas_batch, probas_batch = self.ner(text_batch) + entity_substr_batch, entity_positions_batch, entity_probas_batch = \ + self.ner_parser(ner_tokens_batch, ner_probas_batch, probas_batch) + + entity_pos_tags_probas_batch = [[(entity_substr.lower(), entity_substr_positions, tag, entity_proba) + for tag, entity_substr_list in entity_substr_dict.items() + for entity_substr, entity_substr_positions, entity_proba in + zip(entity_substr_list, entity_positions_dict[tag], + entity_probas_dict[tag])] + for entity_substr_dict, entity_positions_dict, entity_probas_dict in + zip(entity_substr_batch, entity_positions_batch, entity_probas_batch)] + + entity_substr_batch, entity_offsets_batch, entity_positions_batch, tags_batch, \ + probas_batch = [], [], [], [], [] + for entity_pos_tags_probas, ner_tokens_offsets_list in \ + zip(entity_pos_tags_probas_batch, ner_tokens_offsets_batch): + if entity_pos_tags_probas: + entity_offsets_list = [] + entity_substr_list, entity_positions_list, tags_list, probas_list = zip(*entity_pos_tags_probas) + for entity_positions in entity_positions_list: + start_offset = ner_tokens_offsets_list[entity_positions[0]][0] + end_offset = ner_tokens_offsets_list[entity_positions[-1]][1] + entity_offsets_list.append((start_offset, end_offset)) + else: + entity_substr_list, entity_offsets_list, entity_positions_list = [], [], [] + tags_list, probas_list = [], [] + entity_substr_batch.append(list(entity_substr_list)) + entity_offsets_batch.append(list(entity_offsets_list)) + entity_positions_batch.append(list(entity_positions_list)) + tags_batch.append(list(tags_list)) + probas_batch.append(list(probas_list)) + + entity_substr_batch_list.append(entity_substr_batch) + tags_batch_list.append(tags_batch) + entity_offsets_batch_list.append(entity_offsets_batch) + 
entity_positions_batch_list.append(entity_positions_batch) + entity_probas_batch_list.append(probas_batch) + text_len_batch_list.append([len(text) for text in text_batch]) + text_tokens_len_batch_list.append([len(ner_tokens) for ner_tokens in ner_tokens_batch]) + + doc_entity_substr_batch, doc_tags_batch, doc_entity_offsets_batch, doc_probas_batch = [], [], [], [] + doc_entity_positions_batch, doc_sentences_offsets_batch, doc_sentences_batch = [], [], [] + doc_entity_substr, doc_tags, doc_probas, doc_entity_offsets, doc_entity_positions = [], [], [], [], [] + doc_sentences_offsets, doc_sentences = [], [] + cur_doc_num = 0 + text_len_sum = 0 + text_tokens_len_sum = 0 + for entity_substr_batch, tags_batch, probas_batch, entity_offsets_batch, entity_positions_batch, \ + sentences_offsets_batch, sentences_batch, text_len_batch, text_tokens_len_batch, nums_batch in \ + zip(entity_substr_batch_list, tags_batch_list, entity_probas_batch_list, entity_offsets_batch_list, + entity_positions_batch_list, sentences_offsets_batch_list, sentences_batch_list, + text_len_batch_list, text_tokens_len_batch_list, nums_batch_list): + for entity_substr_list, tag_list, probas_list, entity_offsets_list, entity_positions_list, \ + sentences_offsets_list, sentences_list, text_len, text_tokens_len, doc_num in \ + zip(entity_substr_batch, tags_batch, probas_batch, entity_offsets_batch, entity_positions_batch, + sentences_offsets_batch, sentences_batch, text_len_batch, text_tokens_len_batch, nums_batch): + if doc_num == cur_doc_num: + doc_entity_substr += entity_substr_list + doc_tags += tag_list + doc_probas += probas_list + doc_entity_offsets += [(start_offset + text_len_sum, end_offset + text_len_sum) + for start_offset, end_offset in entity_offsets_list] + doc_sentences_offsets += [(start_offset + text_len_sum, end_offset + text_len_sum) + for start_offset, end_offset in sentences_offsets_list] + doc_entity_positions += [[pos + text_tokens_len_sum for pos in positions] + for positions in entity_positions_list] + doc_sentences += sentences_list + text_len_sum += text_len + 1 + text_tokens_len_sum += text_tokens_len + else: + doc_entity_substr_batch.append(doc_entity_substr) + doc_tags_batch.append(doc_tags) + doc_probas_batch.append(doc_probas) + doc_entity_offsets_batch.append(doc_entity_offsets) + doc_entity_positions_batch.append(doc_entity_positions) + doc_sentences_offsets_batch.append(doc_sentences_offsets) + doc_sentences_batch.append(doc_sentences) + doc_entity_substr = entity_substr_list + doc_tags = tag_list + doc_probas = probas_list + doc_entity_offsets = entity_offsets_list + doc_sentences_offsets = sentences_offsets_list + doc_sentences = sentences_list + cur_doc_num = doc_num + text_len_sum = text_len + 1 + text_tokens_len_sum = text_tokens_len + + doc_entity_substr_batch.append(doc_entity_substr) + doc_tags_batch.append(doc_tags) + doc_probas_batch.append(doc_probas) + doc_entity_offsets_batch.append(doc_entity_offsets) + doc_entity_positions_batch.append(doc_entity_positions) + doc_sentences_offsets_batch.append(doc_sentences_offsets) + doc_sentences_batch.append(doc_sentences) + + return doc_entity_substr_batch, doc_entity_offsets_batch, doc_entity_positions_batch, doc_tags_batch, \ + doc_sentences_offsets_batch, doc_sentences_batch, doc_probas_batch diff --git a/deeppavlov/models/go_bot/__init__.py b/deeppavlov/models/go_bot/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/go_bot/dto/__init__.py b/deeppavlov/models/go_bot/dto/__init__.py deleted 
file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/go_bot/dto/dataset_features.py b/deeppavlov/models/go_bot/dto/dataset_features.py deleted file mode 100644 index bb5a673684..0000000000 --- a/deeppavlov/models/go_bot/dto/dataset_features.py +++ /dev/null @@ -1,258 +0,0 @@ -from typing import List - -import numpy as np - - -# todo remove boilerplate duplications -# todo comments -# todo logging -# todo naming -from deeppavlov.models.go_bot.nlu.dto.nlu_response import NLUResponse -from deeppavlov.models.go_bot.policy.dto.digitized_policy_features import DigitizedPolicyFeatures -from deeppavlov.models.go_bot.tracker.dto.dst_knowledge import DSTKnowledge - -from copy import deepcopy - - -class UtteranceFeatures: - """ - the DTO-like class storing the training features of a single utterance of a dialog - (to feed the GO-bot policy model) - """ - - action_mask: np.ndarray - attn_key: np.ndarray - tokens_embeddings_padded: np.ndarray - features: np.ndarray - - def __init__(self, - nlu_response: NLUResponse, - tracker_knowledge: DSTKnowledge, - features: DigitizedPolicyFeatures): - self.action_mask = features.action_mask - self.attn_key = features.attn_key - - tokens_vectorized = nlu_response.tokens_vectorized # todo proper oop - self.tokens_embeddings_padded = tokens_vectorized.tokens_embeddings_padded - self.features = features.concat_feats - - -class UtteranceTarget: - """ - the DTO-like class storing the training target of a single utterance of a dialog - (to feed the GO-bot policy model) - """ - action_id: int - - def __init__(self, action_id): - self.action_id = action_id - - -class UtteranceDataEntry: - """ - the DTO-like class storing both the training features and target - of a single utterance of a dialog (to feed the GO-bot policy model) - """ - features: UtteranceFeatures - target: UtteranceTarget - - def __init__(self, features, target): - self.features = features - self.target = target - - @staticmethod - def from_features_and_target(features: UtteranceFeatures, target: UtteranceTarget): - return UtteranceDataEntry(deepcopy(features), deepcopy(target)) - - @staticmethod - def from_features(features: UtteranceFeatures): - return UtteranceDataEntry(deepcopy(features), UtteranceTarget(None)) - - -class DialogueFeatures: - """ - the DTO-like class storing both the training features - of a dialog (to feed the GO-bot policy model) - """ - action_masks: List[np.ndarray] - attn_keys: List[np.ndarray] - tokens_embeddings_paddeds: List[np.ndarray] - featuress: List[np.ndarray] - - def __init__(self): - self.action_masks = [] - self.attn_keys = [] - self.tokens_embeddings_paddeds = [] - self.featuress = [] - - def append(self, utterance_features: UtteranceFeatures): - self.action_masks.append(utterance_features.action_mask) - self.attn_keys.append(utterance_features.attn_key) - self.tokens_embeddings_paddeds.append(utterance_features.tokens_embeddings_padded) - self.featuress.append(utterance_features.features) - - def __len__(self): - return len(self.featuress) - - -class DialogueTargets: - """ - the DTO-like class storing both the training targets - of a dialog (to feed the GO-bot policy model) - """ - action_ids: List[int] - - def __init__(self): - self.action_ids = [] - - def append(self, utterance_target: UtteranceTarget): - self.action_ids.append(utterance_target.action_id) - - def __len__(self): - return len(self.action_ids) - - -class DialogueDataEntry: - """ - the DTO-like class storing both the training features and targets - of a dialog (to feed the GO-bot 
policy model) - """ - features: DialogueFeatures - targets: DialogueTargets - - def __init__(self): - self.features = DialogueFeatures() - self.targets = DialogueTargets() - - def append(self, utterance_features: UtteranceDataEntry): - self.features.append(utterance_features.features) - self.targets.append(utterance_features.target) - - def __len__(self): - return len(self.features) - - -class PaddedDialogueFeatures(DialogueFeatures): - """ - the DTO-like class storing both the **padded to some specified length** training features - of a dialog (to feed the GO-bot policy model) - """ - padded_dialogue_length_mask: List[int] - - def __init__(self, dialogue_features: DialogueFeatures, sequence_length): - super().__init__() - - padding_length = sequence_length - len(dialogue_features) - - self.padded_dialogue_length_mask = [1] * len(dialogue_features) + [0] * padding_length - - self.action_masks = dialogue_features.action_masks + \ - [np.zeros_like(dialogue_features.action_masks[0])] * padding_length - - self.attn_keys = dialogue_features.attn_keys + [np.zeros_like(dialogue_features.attn_keys[0])] * padding_length - - self.tokens_embeddings_paddeds = dialogue_features.tokens_embeddings_paddeds + \ - [np.zeros_like( - dialogue_features.tokens_embeddings_paddeds[0])] * padding_length - - self.featuress = dialogue_features.featuress + [np.zeros_like(dialogue_features.featuress[0])] * padding_length - - -class PaddedDialogueTargets(DialogueTargets): - """ - the DTO-like class storing both the **padded to some specified length** training targets - of a dialog (to feed the GO-bot policy model) - """ - def __init__(self, dialogue_targets: DialogueTargets, sequence_length): - super().__init__() - - padding_length = sequence_length - len(dialogue_targets) - self.action_ids = dialogue_targets.action_ids + [0] * padding_length - - -class PaddedDialogueDataEntry(DialogueDataEntry): - """ - the DTO-like class storing both the **padded to some specified length** training features and targets - of a dialog (to feed the GO-bot policy model) - """ - features: PaddedDialogueFeatures - targets: PaddedDialogueTargets - - def __init__(self, dialogue_data_entry: DialogueDataEntry, sequence_length): - super().__init__() - - self.features = PaddedDialogueFeatures(dialogue_data_entry.features, sequence_length) - self.targets = PaddedDialogueTargets(dialogue_data_entry.targets, sequence_length) - - -class BatchDialoguesFeatures: - """ - the DTO-like class storing both the training features - of a batch of dialogues. 
(to feed the GO-bot policy model) - """ - b_action_masks: List[List[np.ndarray]] - b_attn_keys: List[List[np.ndarray]] - b_tokens_embeddings_paddeds: List[List[np.ndarray]] - b_featuress: List[List[np.ndarray]] - b_padded_dialogue_length_mask: List[List[int]] - max_dialogue_length: int - - def __init__(self, max_dialogue_length): - self.b_action_masks = [] - self.b_attn_keys = [] - self.b_tokens_embeddings_paddeds = [] - self.b_featuress = [] - self.b_padded_dialogue_length_mask = [] - self.max_dialogue_length = max_dialogue_length - - def append(self, padded_dialogue_features: PaddedDialogueFeatures): - self.b_action_masks.append(padded_dialogue_features.action_masks) - self.b_attn_keys.append(padded_dialogue_features.attn_keys) - self.b_tokens_embeddings_paddeds.append(padded_dialogue_features.tokens_embeddings_paddeds) - self.b_featuress.append(padded_dialogue_features.featuress) - self.b_padded_dialogue_length_mask.append(padded_dialogue_features.padded_dialogue_length_mask) - - def __len__(self): - return len(self.b_featuress) - - -class BatchDialoguesTargets: - """ - the DTO-like class storing both the training targets - of a batch of dialogues. (to feed the GO-bot policy model) - """ - b_action_ids: List[List[int]] - max_dialogue_length: int - - def __init__(self, max_dialogue_length): - self.b_action_ids = [] - self.max_dialogue_length = max_dialogue_length - - def append(self, padded_dialogue_targets: PaddedDialogueTargets): - self.b_action_ids.append(padded_dialogue_targets.action_ids) - - def __len__(self): - return len(self.b_action_ids) - - -class BatchDialoguesDataset: - """ - the DTO-like class storing both the training features and target - of a batch of dialogues. (to feed the GO-bot policy model) - Handles the dialogues padding. - """ - features: BatchDialoguesFeatures - targets: BatchDialoguesTargets - - def __init__(self, max_dialogue_length): - self.features = BatchDialoguesFeatures(max_dialogue_length) - self.targets = BatchDialoguesTargets(max_dialogue_length) - self.max_dialogue_length = max_dialogue_length - - def append(self, dialogue_features: DialogueDataEntry): - padded_dialogue_features = PaddedDialogueDataEntry(dialogue_features, self.max_dialogue_length) - self.features.append(padded_dialogue_features.features) - self.targets.append(padded_dialogue_features.targets) - - def __len__(self): - return len(self.features) diff --git a/deeppavlov/models/go_bot/dto/shared_gobot_params.py b/deeppavlov/models/go_bot/dto/shared_gobot_params.py deleted file mode 100644 index 0472c37333..0000000000 --- a/deeppavlov/models/go_bot/dto/shared_gobot_params.py +++ /dev/null @@ -1,24 +0,0 @@ -from deeppavlov.models.go_bot.nlu.nlu_manager import NLUManagerInterface -from deeppavlov.models.go_bot.nlg.nlg_manager import NLGManagerInterface -from deeppavlov.models.go_bot.tracker.featurized_tracker import FeaturizedTracker - - -# todo logging -class SharedGoBotParams: - """the DTO-like class to share the params used in various parts of the GO-bot pipeline.""" - # possibly useful: seems like the params reflect only "real-world" knowledge. 
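A minimal standalone sketch of the zero-padding and length-mask logic implemented by the dialogue-batch DTOs above (PaddedDialogueFeatures and BatchDialoguesDataset); the feature dimension and the toy dialogues are invented for illustration:

    import numpy as np

    # Each dialogue is a list of per-utterance feature vectors; the batch is
    # padded with zero vectors up to the longest dialogue and a 0/1 mask marks
    # the real utterances, as in PaddedDialogueFeatures above.
    dialogues = [
        [np.ones(4), np.ones(4) * 2],  # dialogue of 2 utterances
        [np.ones(4) * 3],              # dialogue of 1 utterance
    ]
    max_len = max(len(d) for d in dialogues)

    b_features, b_mask = [], []
    for dialogue in dialogues:
        pad = max_len - len(dialogue)
        b_features.append(dialogue + [np.zeros_like(dialogue[0])] * pad)
        b_mask.append([1] * len(dialogue) + [0] * pad)

    print(np.array(b_features).shape)  # (2, 2, 4): batch x max dialogue length x feature dim
    print(b_mask)                      # [[1, 1], [1, 0]]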
- num_actions: int - num_intents: int - num_tracker_features: int - - def __init__(self, num_actions: int, num_intents: int, num_tracker_features: int): - self.num_actions = num_actions - self.num_intents = num_intents - self.num_tracker_features = num_tracker_features - - @staticmethod - def from_configured(nlg_manager: NLGManagerInterface, nlu_manager: NLUManagerInterface, tracker: FeaturizedTracker): - """builds the params object given some GO-bot units that are already configured""" - return SharedGoBotParams(nlg_manager.num_of_known_actions(), - nlu_manager.num_of_known_intents(), - tracker.num_features) diff --git a/deeppavlov/models/go_bot/go_bot.py b/deeppavlov/models/go_bot/go_bot.py deleted file mode 100644 index ce47cf2577..0000000000 --- a/deeppavlov/models/go_bot/go_bot.py +++ /dev/null @@ -1,484 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger -from typing import Dict, Any, List, Optional, Union, Tuple - -import numpy as np - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component -from deeppavlov.core.models.nn_model import NNModel -from deeppavlov.models.go_bot.nlg.dto.nlg_response_interface import NLGResponseInterface -from deeppavlov.models.go_bot.nlu.dto.text_vectorization_response import TextVectorizationResponse -from deeppavlov.models.go_bot.nlu.tokens_vectorizer import TokensVectorizer -from deeppavlov.models.go_bot.dto.dataset_features import UtteranceDataEntry, DialogueDataEntry, \ - BatchDialoguesDataset, UtteranceFeatures, UtteranceTarget, BatchDialoguesFeatures -from deeppavlov.models.go_bot.dto.shared_gobot_params import SharedGoBotParams -from deeppavlov.models.go_bot.nlg.nlg_manager import NLGManagerInterface -from deeppavlov.models.go_bot.nlu.nlu_manager import NLUManager -from deeppavlov.models.go_bot.policy.policy_network import PolicyNetwork, PolicyNetworkParams -from deeppavlov.models.go_bot.policy.dto.policy_prediction import PolicyPrediction -from deeppavlov.models.go_bot.tracker.featurized_tracker import FeaturizedTracker -from deeppavlov.models.go_bot.tracker.dialogue_state_tracker import DialogueStateTracker, MultipleUserStateTrackersPool -from pathlib import Path - -log = getLogger(__name__) - - -# todo logging -@register("go_bot") -class GoalOrientedBot(NNModel): - """ - The dialogue bot is based on https://arxiv.org/abs/1702.03274, which - introduces Hybrid Code Networks that combine an RNN with domain-specific - knowledge and system action templates. - - The network handles dialogue policy management. - Inputs features of an utterance and predicts label of a bot action - (classification task). - - An LSTM with a dense layer for input features and a dense layer for it's output. - Softmax is used as an output activation function. - - Todo: - add docstring for trackers. - - Parameters: - tokenizer: one of tokenizers from - :doc:`deeppavlov.models.tokenizers ` module. 
- tracker: dialogue state tracker from - :doc:`deeppavlov.models.go_bot.tracker `. - hidden_size: size of rnn hidden layer. - dropout_rate: probability of weights dropping out. - l2_reg_coef: l2 regularization weight (applied to input and output layer). - dense_size: rnn input size. - attention_mechanism: describes attention applied to embeddings of input tokens. - - * **type** – type of attention mechanism, possible values are ``'general'``, ``'bahdanau'``, - ``'light_general'``, ``'light_bahdanau'``, ``'cs_general'`` and ``'cs_bahdanau'``. - * **hidden_size** – attention hidden state size. - * **max_num_tokens** – maximum number of input tokens. - * **depth** – number of averages used in constrained attentions - (``'cs_bahdanau'`` or ``'cs_general'``). - * **action_as_key** – whether to use action from previous time step as key - to attention. - * **intent_as_key** – use utterance intents as attention key or not. - * **projected_align** – whether to use output projection. - network_parameters: dictionary with network parameters (for compatibility with release 0.1.1, - deprecated in the future) - - word_vocab: vocabulary of input word tokens - (:class:`~deeppavlov.core.data.simple_vocab.SimpleVocabulary` recommended). - bow_embedder: instance of one-hot word encoder - :class:`~deeppavlov.models.embedders.bow_embedder.BoWEmbedder`. - embedder: one of embedders from - :doc:`deeppavlov.models.embedders ` module. - slot_filler: component that outputs slot values for a given utterance - (:class:`~deeppavlov.models.slotfill.slotfill.DstcSlotFillingNetwork` - recommended). - intent_classifier: component that outputs intents probability - distribution for a given utterance ( - :class:`~deeppavlov.models.classifiers.keras_classification_model.KerasClassificationModel` - recommended). - database: database that will be used during inference to perform - ``api_call_action`` actions and get ``'db_result'`` result ( - :class:`~deeppavlov.core.data.sqlite_database.Sqlite3Database` - recommended). - use_action_mask: if ``True``, network output will be applied with a mask - over allowed actions. - debug: whether to display debug output. 
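An illustrative attention_mechanism value matching the parameter description above; the numbers are placeholders, not recommended settings:

    attention_mechanism = {
        "type": "general",         # or 'bahdanau', 'light_*', 'cs_*'
        "hidden_size": 32,         # attention hidden state size
        "max_num_tokens": 100,     # maximum number of input tokens
        "action_as_key": True,     # use previous action as attention key
        "intent_as_key": True,     # use utterance intents as attention key
        "projected_align": False,  # whether to use output projection
        # "depth": 3,              # only used by the constrained 'cs_*' types
    }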
- """ - - DEFAULT_USER_ID = 1 - POLICY_DIR_NAME = "policy" - - def __init__(self, - tokenizer: Component, - tracker: FeaturizedTracker, - nlg_manager: NLGManagerInterface, - save_path: str, - hidden_size: int = 128, - dropout_rate: float = 0., - l2_reg_coef: float = 0., - dense_size: int = None, - attention_mechanism: dict = None, - network_parameters: Optional[Dict[str, Any]] = None, - load_path: str = None, - word_vocab: Component = None, - bow_embedder: Component = None, - embedder: Component = None, - slot_filler: Component = None, - intent_classifier: Component = None, - database: Component = None, - use_action_mask: bool = False, - debug: bool = False, - **kwargs) -> None: - self.use_action_mask = use_action_mask # todo not supported actually - super().__init__(save_path=save_path, load_path=load_path, **kwargs) - - self.debug = debug - - policy_network_params = PolicyNetworkParams(hidden_size, dropout_rate, l2_reg_coef, - dense_size, attention_mechanism, network_parameters) - - self.nlu_manager = NLUManager(tokenizer, slot_filler, intent_classifier) # todo move to separate pipeline unit - self.nlg_manager = nlg_manager - self.data_handler = TokensVectorizer(debug, word_vocab, bow_embedder, embedder) - - # todo make mor abstract - self.dialogue_state_tracker = DialogueStateTracker.from_gobot_params(tracker, self.nlg_manager, - policy_network_params, database) - # todo make mor abstract - self.multiple_user_state_tracker = MultipleUserStateTrackersPool(base_tracker=self.dialogue_state_tracker) - - tokens_dims = self.data_handler.get_dims() - features_params = SharedGoBotParams.from_configured(self.nlg_manager, self.nlu_manager, - self.dialogue_state_tracker) - policy_save_path = Path(save_path, self.POLICY_DIR_NAME) - policy_load_path = Path(load_path, self.POLICY_DIR_NAME) - - self.policy = PolicyNetwork(policy_network_params, tokens_dims, features_params, - policy_load_path, policy_save_path, **kwargs) - - self.dialogues_cached_features = dict() - - self.reset() - - def prepare_dialogues_batches_training_data(self, - batch_dialogues_utterances_contexts_info: List[List[dict]], - batch_dialogues_utterances_responses_info: List[ - List[dict]]) -> BatchDialoguesDataset: - """ - Parse the passed dialogue information to the dialogue information object. 
- - Args: - batch_dialogues_utterances_contexts_info: the dictionary containing - the dialogue utterances training information - batch_dialogues_utterances_responses_info: the dictionary containing - the dialogue utterances responses training information - - Returns: - the dialogue data object containing the numpy-vectorized features and target extracted - from the utterance data - - """ - # todo naming, docs, comments - max_dialogue_length = max(len(dialogue_info_entry) - for dialogue_info_entry in batch_dialogues_utterances_contexts_info) # for padding - - batch_dialogues_dataset = BatchDialoguesDataset(max_dialogue_length) - for dialogue_utterances_info in zip(batch_dialogues_utterances_contexts_info, - batch_dialogues_utterances_responses_info): - dialogue_index_value = dialogue_utterances_info[0][0].get("dialogue_label") - - if dialogue_index_value and dialogue_index_value in self.dialogues_cached_features.keys(): - dialogue_training_data = self.dialogues_cached_features[dialogue_index_value] - else: - dialogue_training_data = self.prepare_dialogue_training_data(*dialogue_utterances_info) - if dialogue_index_value: - self.dialogues_cached_features[dialogue_index_value] = dialogue_training_data - - batch_dialogues_dataset.append(dialogue_training_data) - - return batch_dialogues_dataset - - def prepare_dialogue_training_data(self, - dialogue_utterances_contexts_info: List[dict], - dialogue_utterances_responses_info: List[dict]) -> DialogueDataEntry: - """ - Parse the passed dialogue information to the dialogue information object. - - Args: - dialogue_utterances_contexts_info: the dictionary containing the dialogue utterances training information - dialogue_utterances_responses_info: the dictionary containing - the dialogue utterances responses training information - - Returns: - the dialogue data object containing the numpy-vectorized features and target extracted - from the utterance data - - """ - dialogue_training_data = DialogueDataEntry() - # we started to process new dialogue so resetting the dialogue state tracker. - # simplification of this logic is planned; there is a todo - self.dialogue_state_tracker.reset_state() - for context, response in zip(dialogue_utterances_contexts_info, dialogue_utterances_responses_info): - - utterance_training_data = self.prepare_utterance_training_data(context, response) - dialogue_training_data.append(utterance_training_data) - - # to correctly track the dialogue state - # we inform the tracker with the ground truth response info - # just like the tracker remembers the predicted response actions when real-time inference - self.dialogue_state_tracker.update_previous_action(utterance_training_data.target.action_id) - - if self.debug: - log.debug(f"True response = '{response['text']}'.") - if utterance_training_data.features.action_mask[utterance_training_data.target.action_id] != 1.: - log.warning("True action forbidden by action mask.") - return dialogue_training_data - - def prepare_utterance_training_data(self, - utterance_context_info_dict: dict, - utterance_response_info_dict: dict) -> UtteranceDataEntry: - """ - Parse the passed utterance information to the utterance information object. 
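A compressed, self-contained approximation of the per-dialogue feature cache used by prepare_dialogues_batches_training_data() above; the label and the compute callback are dummies:

    cache = {}

    def get_dialogue_features(dialogue_label, compute):
        # Features are recomputed only for dialogues whose "dialogue_label" is
        # unknown; labelled dialogues are then served from the cache.
        if dialogue_label is not None and dialogue_label in cache:
            return cache[dialogue_label]
        features = compute()
        if dialogue_label is not None:
            cache[dialogue_label] = features
        return features

    print(get_dialogue_features("dlg-1", compute=lambda: [0.1, 0.2]))  # computed
    print(get_dialogue_features("dlg-1", compute=lambda: [9.9, 9.9]))  # cached: [0.1, 0.2]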
- - Args: - utterance_context_info_dict: the dictionary containing the utterance training information - utterance_response_info_dict: the dictionary containing the utterance response training information - - Returns: - the utterance data object containing the numpy-vectorized features and target extracted - from the utterance data - - """ - # todo naming, docs, comments - text = utterance_context_info_dict['text'] - - # if there already were db lookups in this utterance - # we inform the tracker with these lookups info - # just like the tracker remembers the db interaction results when real-time inference - # todo: not obvious logic - self.dialogue_state_tracker.update_ground_truth_db_result_from_context(utterance_context_info_dict) - - utterance_features = self.extract_features_from_utterance_text(text, self.dialogue_state_tracker) - - action_id = self.nlg_manager.get_action_id(utterance_response_info_dict['act']) - utterance_target = UtteranceTarget(action_id) - - utterance_data_entry = UtteranceDataEntry.from_features_and_target(utterance_features, utterance_target) - return utterance_data_entry - - def extract_features_from_utterance_text(self, text, tracker, keep_tracker_state=False) -> UtteranceFeatures: - """ - Extract ML features for the input text and the respective tracker. - Features are aggregated from the - * NLU; - * text BOW-encoding&embedding; - * tracker memory. - - Args: - text: the text to infer to - tracker: the tracker that tracks the dialogue from which the text is taken - keep_tracker_state: if True, the tracker state will not be updated during the prediction. - Used to keep tracker's state intact when predicting the action - to perform right after the api call action is predicted and performed. - - Returns: - the utterance features object containing the numpy-vectorized features extracted from the utterance - """ - # todo comments - - nlu_response = self.nlu_manager.nlu(text) - - # region text BOW-encoding and embedding | todo: to nlu - # todo move vectorization to NLU - tokens_bow_encoded = self.data_handler.bow_encode_tokens(nlu_response.tokens) - - tokens_embeddings_padded = np.array([], dtype=np.float32) - tokens_aggregated_embedding = np.array([], dtype=np.float32) - if self.policy.has_attn(): - attn_window_size = self.policy.get_attn_window_size() - # todo: this is ugly and caused by complicated nn configuration algorithm - attn_config_token_dim = self.policy.get_attn_hyperparams().token_size - tokens_embeddings_padded = self.data_handler.calc_tokens_embeddings(attn_window_size, - attn_config_token_dim, - nlu_response.tokens) - else: - tokens_aggregated_embedding = self.data_handler.calc_tokens_mean_embedding(nlu_response.tokens) - nlu_response.set_tokens_vectorized(TextVectorizationResponse( - tokens_bow_encoded, - tokens_aggregated_embedding, - tokens_embeddings_padded)) - # endregion text BOW-encoding and embedding | todo: to nlu - - if not keep_tracker_state: - tracker.update_state(nlu_response) - - tracker_knowledge = tracker.get_current_knowledge() - - digitized_policy_features = self.policy.digitize_features(nlu_response, tracker_knowledge) - - return UtteranceFeatures(nlu_response, tracker_knowledge, digitized_policy_features) - - def _infer(self, user_utterance_text: str, user_tracker: DialogueStateTracker, - keep_tracker_state=False) -> Tuple[BatchDialoguesFeatures, PolicyPrediction]: - """ - Predict the action to perform in response to given text. 
- - Args: - user_utterance_text: the user input text passed to the system - user_tracker: the tracker that tracks the dialogue with the input-provided user - keep_tracker_state: if True, the tracker state will not be updated during the prediction. - Used to keep tracker's state intact when predicting the action to perform right after - the api call action - - Returns: - the features data object containing features fed to the model on inference and the model's prediction info - """ - utterance_features = self.extract_features_from_utterance_text(user_utterance_text, user_tracker, - keep_tracker_state) - - utterance_data_entry = UtteranceDataEntry.from_features(utterance_features) - - # region pack an utterance to batch to further get features in batched form - dialogue_data_entry = DialogueDataEntry() - dialogue_data_entry.append(utterance_data_entry) - # batch is single dialogue of 1 utterance => dialogue length = 1 - utterance_batch_data_entry = BatchDialoguesDataset(max_dialogue_length=1) - utterance_batch_data_entry.append(dialogue_data_entry) - # endregion pack an utterance to batch to further get features in batched form - utterance_batch_features = utterance_batch_data_entry.features - - # as for RNNs: output, hidden_state < - RNN(output, hidden_state) - hidden_cells_state, hidden_cells_output = user_tracker.network_state[0], user_tracker.network_state[1] - policy_prediction = self.policy(utterance_batch_features, - hidden_cells_state, - hidden_cells_output, - prob=True) - - return utterance_batch_features, policy_prediction - - def __call__(self, batch: Union[List[List[dict]], List[str]], - user_ids: Optional[List] = None) -> Union[List[NLGResponseInterface], - List[List[NLGResponseInterface]]]: - if isinstance(batch[0], list): - # batch is a list of *completed* dialogues, infer on them to calculate metrics - # user ids are ignored here: the single tracker is used and is reset after each dialogue inference - # todo unify tracking: no need to distinguish tracking strategies on dialogues and realtime - res = [] - for dialogue in batch: - dialogue: List[dict] - res.append(self._calc_inferences_for_dialogue(dialogue)) - else: - # batch is a list of utterances possibly came from different users: real-time inference - res = [] - if not user_ids: - user_ids = [self.DEFAULT_USER_ID] * len(batch) - for user_id, user_text in zip(user_ids, batch): - user_text: str - res.append(self._realtime_infer(user_id, user_text)) - - return res - - def _realtime_infer(self, user_id, user_text) -> List[NLGResponseInterface]: - # realtime inference logic - # - # we have the pool of trackers, each one tracks the dialogue with its own user - # (1 to 1 mapping: each user has his own tracker and vice versa) - - user_tracker = self.multiple_user_state_tracker.get_or_init_tracker(user_id) - responses = [] - - # todo remove duplication - - # predict the action to perform (e.g. response smth or call the api) - utterance_batch_features, policy_prediction = self._infer(user_text, user_tracker) - user_tracker.update_previous_action(policy_prediction.predicted_action_ix) - user_tracker.network_state = policy_prediction.get_network_state() - - # tracker says we need to say smth to user. 
we - # * calculate the slotfilled state: - # for each slot that is relevant to dialogue we fill this slot value if possible - # * generate text for the predicted speech action: - # using the pattern provided for the action; - # the slotfilled state provides info to encapsulate to the pattern - tracker_slotfilled_state = user_tracker.fill_current_state_with_db_results() - resp = self.nlg_manager.decode_response(utterance_batch_features, - policy_prediction, - tracker_slotfilled_state) - responses.append(resp) - - if policy_prediction.predicted_action_ix == self.nlg_manager.get_api_call_action_id(): - # tracker says we need to make an api call. - # we 1) perform the api call and 2) predict what to do next - user_tracker.make_api_call() - utterance_batch_features, policy_prediction = self._infer(user_text, user_tracker, - keep_tracker_state=True) - user_tracker.update_previous_action(policy_prediction.predicted_action_ix) - user_tracker.network_state = policy_prediction.get_network_state() - - # tracker says we need to say smth to user. we - # * calculate the slotfilled state: - # for each slot that is relevant to dialogue we fill this slot value if possible - # * generate text for the predicted speech action: - # using the pattern provided for the action; - # the slotfilled state provides info to encapsulate to the pattern - tracker_slotfilled_state = user_tracker.fill_current_state_with_db_results() - resp = self.nlg_manager.decode_response(utterance_batch_features, - policy_prediction, - tracker_slotfilled_state) - responses.append(resp) - - return responses - - def _calc_inferences_for_dialogue(self, contexts: List[dict]) -> List[NLGResponseInterface]: - # infer on each dialogue utterance - # e.g. to calculate inference score via comparing the inferred predictions with the ground truth utterance - # todo we provide the tracker with both predicted and ground truth response actions info. is this ok? - # todo (response to ^) this should be used only on internal evaluations - # todo warning. 
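A condensed, runnable view of the two-step behaviour of _realtime_infer() above: when the first predicted action is the api-call action, the call is made and the policy is queried once more; predict and make_api_call are dummy stand-ins:

    API_CALL_ID = 0

    def respond(predict, make_api_call):
        # One response is always produced; an api-call prediction triggers the
        # call and a second prediction, mirroring _realtime_infer() above.
        responses = [predict()]
        if responses[-1] == API_CALL_ID:
            make_api_call()
            responses.append(predict())
        return responses

    actions = iter([API_CALL_ID, 3])
    print(respond(predict=lambda: next(actions), make_api_call=lambda: print("api call made")))
    # prints "api call made", then [0, 3]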
- res = [] - self.dialogue_state_tracker.reset_state() - for context in contexts: - if context.get('prev_resp_act') is not None: - # if there already were responses to user - # we inform the tracker with these responses info - # just like the tracker remembers the predicted response actions when real-time inference - previous_action_id = self.nlg_manager.get_action_id(context['prev_resp_act']) - self.dialogue_state_tracker.update_previous_action(previous_action_id) - - # if there already were db lookups - # we inform the tracker with these lookups info - # just like the tracker remembers the db interaction results when real-time inference - self.dialogue_state_tracker.update_ground_truth_db_result_from_context(context) - - utterance_batch_features, policy_prediction = self._infer(context['text'], self.dialogue_state_tracker) - self.dialogue_state_tracker.update_previous_action(policy_prediction.predicted_action_ix) # see above todo - self.dialogue_state_tracker.network_state = policy_prediction.get_network_state() - - # todo fix naming: fill_current_state_with_db_results & update_ground_truth_db_result_from_context are alike - tracker_slotfilled_state = self.dialogue_state_tracker.fill_current_state_with_db_results() - resp = self.nlg_manager.decode_response(utterance_batch_features, - policy_prediction, - tracker_slotfilled_state) - res.append(resp) - return res - - def train_on_batch(self, - batch_dialogues_utterances_features: List[List[dict]], - batch_dialogues_utterances_targets: List[List[dict]]) -> dict: - batch_dialogues_dataset = self.prepare_dialogues_batches_training_data(batch_dialogues_utterances_features, - batch_dialogues_utterances_targets) - return self.policy.train_on_batch(batch_dialogues_dataset.features, - batch_dialogues_dataset.targets) - - def reset(self, user_id: Union[None, str, int] = None) -> None: - # WARNING: this method is confusing. 
todo - # the multiple_user_state_tracker is applicable only to the realtime inference scenario - # so the tracker used to calculate metrics on dialogues is never reset by this method - # (but that tracker usually is reset before each dialogue inference) - self.multiple_user_state_tracker.reset(user_id) - if self.debug: - log.debug("Bot reset.") - - def load(self, *args, **kwargs) -> None: - self.policy.load() - super().load(*args, **kwargs) - - def save(self, *args, **kwargs) -> None: - super().save(*args, **kwargs) - self.policy.save() diff --git a/deeppavlov/models/go_bot/nlg/__init__.py b/deeppavlov/models/go_bot/nlg/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/go_bot/nlg/dto/__init__.py b/deeppavlov/models/go_bot/nlg/dto/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/go_bot/nlg/dto/batch_nlg_response.py b/deeppavlov/models/go_bot/nlg/dto/batch_nlg_response.py deleted file mode 100644 index 48eb525ebc..0000000000 --- a/deeppavlov/models/go_bot/nlg/dto/batch_nlg_response.py +++ /dev/null @@ -1,7 +0,0 @@ -from typing import Container -from deeppavlov.models.go_bot.nlg.dto.nlg_response_interface import NLGResponseInterface - - -class BatchNLGResponse: - def __init__(self, nlg_responses: Container[NLGResponseInterface]): - self.responses: Container[NLGResponseInterface] = nlg_responses diff --git a/deeppavlov/models/go_bot/nlg/dto/json_nlg_response.py b/deeppavlov/models/go_bot/nlg/dto/json_nlg_response.py deleted file mode 100644 index 399e388266..0000000000 --- a/deeppavlov/models/go_bot/nlg/dto/json_nlg_response.py +++ /dev/null @@ -1,25 +0,0 @@ -from deeppavlov.models.go_bot.nlg.dto.nlg_response_interface import NLGObjectResponseInterface - - -class JSONNLGResponse(NLGObjectResponseInterface): - """ - The NLG output unit that stores slot values and predicted actions info. 
- """ - def __init__(self, slot_values: dict, actions_tuple: tuple): - self.slot_values = slot_values - self.actions_tuple = actions_tuple - - def to_serializable_dict(self) -> dict: - return {'+'.join(self.actions_tuple): self.slot_values} - -class VerboseJSONNLGResponse(JSONNLGResponse): - - @staticmethod - def from_json_nlg_response(json_nlg_response: JSONNLGResponse) -> "VerboseJSONNLGResponse": - verbose_json_nlg_response = VerboseJSONNLGResponse(json_nlg_response.slot_values, - json_nlg_response.actions_tuple) - return verbose_json_nlg_response - - def get_nlu_info(self): - intent_name = "start" if self.actions_tuple[0] == "start" else self.actions_tuple[0][len("utter_"):].split('{')[0] - return {"intent": intent_name} diff --git a/deeppavlov/models/go_bot/nlg/dto/nlg_response_interface.py b/deeppavlov/models/go_bot/nlg/dto/nlg_response_interface.py deleted file mode 100644 index 80669018e5..0000000000 --- a/deeppavlov/models/go_bot/nlg/dto/nlg_response_interface.py +++ /dev/null @@ -1,10 +0,0 @@ -from abc import ABCMeta -from typing import Tuple - - -class NLGObjectResponseInterface(metaclass=ABCMeta): - def to_serializable_dict(self) -> dict: - raise NotImplementedError(f"to_serializable_dict() not implemented in {self.__class__.__name__}") - - -NLGResponseInterface = Tuple[NLGObjectResponseInterface, str] diff --git a/deeppavlov/models/go_bot/nlg/mock_json_nlg_manager.py b/deeppavlov/models/go_bot/nlg/mock_json_nlg_manager.py deleted file mode 100644 index 655712ab21..0000000000 --- a/deeppavlov/models/go_bot/nlg/mock_json_nlg_manager.py +++ /dev/null @@ -1,151 +0,0 @@ -import json -from itertools import combinations -from pathlib import Path -from typing import Union, Dict, List, Tuple - -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.core.common.errors import ConfigError -from deeppavlov.core.common.registry import register, get_model -from deeppavlov.dataset_readers.dstc2_reader import DSTC2DatasetReader -from deeppavlov.models.go_bot.dto.dataset_features import BatchDialoguesFeatures -from deeppavlov.models.go_bot.nlg.dto.json_nlg_response import JSONNLGResponse, VerboseJSONNLGResponse -from deeppavlov.models.go_bot.nlg.nlg_manager import log -from deeppavlov.models.go_bot.nlg.nlg_manager_interface import NLGManagerInterface -from deeppavlov.models.go_bot.policy.dto.policy_prediction import PolicyPrediction - - -@register("gobot_json_nlg_manager") -class MockJSONNLGManager(NLGManagerInterface): - - # todo inheritance - # todo force a2id, id2a mapping to be persistent for same configs - - def __init__(self, - actions2slots_path: Union[str, Path], - api_call_action: str, - data_path: Union[str, Path], - dataset_reader_class="dstc2_reader", - debug=False): - self.debug = debug - - if self.debug: - log.debug(f"BEFORE {self.__class__.__name__} init(): " - f"actions2slots_path={actions2slots_path}, " - f"api_call_action={api_call_action}, debug={debug}") - - self._dataset_reader = get_model(dataset_reader_class) - - individual_actions2slots = self._load_actions2slots_mapping(actions2slots_path) - possible_actions_combinations_tuples = sorted( - set(actions_combination_tuple - for actions_combination_tuple - in self._extract_actions_combinations(data_path)), - key=lambda x: '+'.join(x)) - - self.action_tuples2ids = {action_tuple: action_tuple_idx - for action_tuple_idx, action_tuple - in enumerate(possible_actions_combinations_tuples)} # todo: typehint tuples somehow - self.ids2action_tuples = {v: k for k, v in self.action_tuples2ids.items()} - - 
self.action_tuples_ids2slots = {} # todo: typehint tuples somehow - for actions_combination_tuple in possible_actions_combinations_tuples: - actions_combination_slots = set(slot - for action in actions_combination_tuple - for slot in individual_actions2slots.get(action, [])) - actions_combination_tuple_id = self.action_tuples2ids[actions_combination_tuple] - self.action_tuples_ids2slots[actions_combination_tuple_id] = actions_combination_slots - - self._api_call_id = -1 - if api_call_action is not None: - api_call_action_as_tuple = (api_call_action,) - self._api_call_id = self.action_tuples2ids[api_call_action_as_tuple] - - if self.debug: - log.debug(f"AFTER {self.__class__.__name__} init(): " - f"actions2slots_path={actions2slots_path}, " - f"api_call_action={api_call_action}, debug={debug}") - - def get_api_call_action_id(self) -> int: - """ - Returns: - an ID corresponding to the api call action - """ - return self._api_call_id - - def _extract_actions_combinations(self, dataset_path: Union[str, Path]): - dataset_path = expand_path(dataset_path) - dataset = self._dataset_reader.read(data_path=dataset_path, dialogs=True, ignore_slots=True) - actions_combinations = set() - for dataset_split in dataset.values(): - for dialogue in dataset_split: - for user_input, system_response in dialogue: - actions_tuple = tuple(system_response["act"].split('+')) - actions_combinations.add(actions_tuple) - return actions_combinations - - @staticmethod - def _load_actions2slots_mapping(actions2slots_json_path) -> Dict[str, str]: - actions2slots_json_path = expand_path(actions2slots_json_path) - if actions2slots_json_path.exists(): - with open(actions2slots_json_path, encoding="utf-8") as actions2slots_json_f: - actions2slots = json.load(actions2slots_json_f) - else: - actions2slots = dict() - log.info(f"INSIDE {__class__.__name__} _load_actions2slots_mapping(): " - f"actions2slots_json_path={actions2slots_json_path} DOES NOT EXIST. " - f"initialized actions2slots mapping with an empty one: {str(actions2slots)}") - return actions2slots - - def get_action_id(self, action_text: Union[str, Tuple[str, ...]]) -> int: - """ - Looks up for an ID corresponding to the passed action text. - - Args: - action_text: the text for which an ID needs to be returned. - Returns: - an ID corresponding to the passed action text - """ - if isinstance(action_text, str): - actions_tuple = tuple(action_text.split('+')) - else: - actions_tuple = action_text - return self.action_tuples2ids[actions_tuple] # todo unhandled exception when not found - - def decode_response(self, - utterance_batch_features: BatchDialoguesFeatures, - policy_prediction: PolicyPrediction, - tracker_slotfilled_state: dict) -> JSONNLGResponse: - """ - Converts the go-bot inference objects to the single output object. - - Args: - utterance_batch_features: utterance features extracted in go-bot that - policy_prediction: policy model prediction (predicted action) - tracker_slotfilled_state: tracker knowledge before the NLG is performed - - Returns: - The NLG output unit that stores slot values and predicted actions info. 
- """ - slots_to_log = self.action_tuples_ids2slots[policy_prediction.predicted_action_ix] - - slots_values = {slot_name: tracker_slotfilled_state.get(slot_name, "unk") for slot_name in slots_to_log} - actions_tuple = self.ids2action_tuples[policy_prediction.predicted_action_ix] - - response = JSONNLGResponse(slots_values, actions_tuple) - verbose_response = VerboseJSONNLGResponse.from_json_nlg_response(response) - verbose_response.policy_prediction = policy_prediction - return verbose_response - - def num_of_known_actions(self) -> int: - """ - Returns: - the number of actions known to the NLG module - """ - return len(self.action_tuples2ids.keys()) - - def known_actions(self) -> List: - """ - Returns: - the list of actions known to the NLG module - """ - return list(self.action_tuples2ids.keys()) diff --git a/deeppavlov/models/go_bot/nlg/nlg_manager.py b/deeppavlov/models/go_bot/nlg/nlg_manager.py deleted file mode 100644 index dbceb122f3..0000000000 --- a/deeppavlov/models/go_bot/nlg/nlg_manager.py +++ /dev/null @@ -1,115 +0,0 @@ -import re -from logging import getLogger -from pathlib import Path -from typing import Union, List - -from deeppavlov.core.commands.utils import expand_path -import deeppavlov.models.go_bot.nlg.templates.templates as go_bot_templates -from deeppavlov.core.common.registry import register -from deeppavlov.models.go_bot.dto.dataset_features import BatchDialoguesFeatures -from deeppavlov.models.go_bot.nlg.nlg_manager_interface import NLGManagerInterface -from deeppavlov.models.go_bot.policy.dto.policy_prediction import PolicyPrediction - -log = getLogger(__name__) - - -# todo add the ability to configure nlg loglevel in config (now the setting is shared across all the GO-bot) -# todo add each method input-output logging when proper loglevel level specified - - -@register("gobot_nlg_manager") -class NLGManager(NLGManagerInterface): - """ - NLGManager is a unit of the go-bot pipeline that handles the generation of text - when the pattern is chosen among the known patterns and the named-entities-values-like knowledge is provided. - (the whole go-bot pipeline is as follows: NLU, dialogue-state-tracking&policy-NN, NLG) - - Parameters: - template_path: file with mapping between actions and text templates - for response generation. - template_type: type of used response templates in string format. - api_call_action: label of the action that corresponds to database api call - (it must be present in your ``template_path`` file), during interaction - it will be used to get ``'db_result'`` from ``database``. - debug: whether to display debug output. 
- """ - - def __init__(self, template_path: Union[str, Path], template_type: str, api_call_action: str, debug=False): - self.debug = debug - if self.debug: - log.debug(f"BEFORE {self.__class__.__name__} init(): " - f"template_path={template_path}, template_type={template_type}, " - f"api_call_action={api_call_action}, debug={debug}") - - template_path = expand_path(template_path) - template_type = getattr(go_bot_templates, template_type) - self.templates = go_bot_templates.Templates(template_type).load(template_path) - - self._api_call_id = -1 - if api_call_action is not None: - self._api_call_id = self.templates.actions.index(api_call_action) - - if self.debug: - log.debug(f"AFTER {self.__class__.__name__} init(): " - f"template_path={template_path}, template_type={template_type}, " - f"api_call_action={api_call_action}, debug={debug}") - - def get_action_id(self, action_text: str) -> int: - """ - Looks up for an ID relevant to the passed action text in the list of known actions and their ids. - - Args: - action_text: the text for which an ID needs to be returned. - Returns: - an ID corresponding to the passed action text - """ - return self.templates.actions.index(action_text) # todo unhandled exception when not found - - def get_api_call_action_id(self) -> int: - """ - Returns: - an ID corresponding to the api call action - """ - return self._api_call_id - - def decode_response(self, - utterance_batch_features: BatchDialoguesFeatures, - policy_prediction: PolicyPrediction, - tracker_slotfilled_state) -> str: - # todo: docstring - - action_text = self._generate_slotfilled_text_for_action(policy_prediction.predicted_action_ix, - tracker_slotfilled_state) - # in api calls replace unknown slots to "dontcare" - if policy_prediction.predicted_action_ix == self._api_call_id: - action_text = re.sub("#([A-Za-z]+)", "dontcare", action_text).lower() - return action_text - - def _generate_slotfilled_text_for_action(self, action_id: int, slots: dict) -> str: - """ - Generate text for the predicted speech action using the pattern provided for the action. - The slotfilled state provides info to encapsulate to the pattern. - - Args: - action_id: the id of action to generate text for. - slots: the slots and their known values. usually received from dialogue state tracker. - - Returns: - the text generated for the passed action id and slot values. 
- """ - text = self.templates.templates[action_id].generate_text(slots) - return text - - def num_of_known_actions(self) -> int: - """ - Returns: - the number of actions known to the NLG module - """ - return len(self.templates) - - def known_actions(self) -> List[str]: - """ - Returns: - the list of actions known to the NLG module - """ - return self.templates.actions diff --git a/deeppavlov/models/go_bot/nlg/nlg_manager_interface.py b/deeppavlov/models/go_bot/nlg/nlg_manager_interface.py deleted file mode 100644 index 0e060a31c1..0000000000 --- a/deeppavlov/models/go_bot/nlg/nlg_manager_interface.py +++ /dev/null @@ -1,52 +0,0 @@ -from abc import ABCMeta, abstractmethod -from typing import List - -from deeppavlov.models.go_bot.dto.dataset_features import BatchDialoguesFeatures -from deeppavlov.models.go_bot.nlg.dto.nlg_response_interface import NLGResponseInterface -from deeppavlov.models.go_bot.policy.dto.policy_prediction import PolicyPrediction - - -class NLGManagerInterface(metaclass=ABCMeta): - - @abstractmethod - def get_action_id(self, action_text) -> int: - """ - Looks up for an ID relevant to the passed action text in the list of known actions and their ids. - - Args: - action_text: the text for which an ID needs to be returned. - Returns: - an ID corresponding to the passed action text - """ - pass - - @abstractmethod - def get_api_call_action_id(self) -> int: - """ - Returns: - an ID corresponding to the api call action - """ - pass - - @abstractmethod - def decode_response(self, - utterance_batch_features: BatchDialoguesFeatures, - policy_prediction: PolicyPrediction, - tracker_slotfilled_state) -> NLGResponseInterface: - # todo: docstring - pass - - @abstractmethod - def num_of_known_actions(self) -> int: - """ - Returns: - the number of actions known to the NLG module - """ - pass - - @abstractmethod - def known_actions(self) -> List: - """ - Returns: - the list of actions known to the NLG module - """ diff --git a/deeppavlov/models/go_bot/nlg/templates/__init__.py b/deeppavlov/models/go_bot/nlg/templates/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/go_bot/nlg/templates/templates.py b/deeppavlov/models/go_bot/nlg/templates/templates.py deleted file mode 100644 index 549bdd60fb..0000000000 --- a/deeppavlov/models/go_bot/nlg/templates/templates.py +++ /dev/null @@ -1,186 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import copy -import re -from abc import ABCMeta, abstractmethod - - -class Template(metaclass=ABCMeta): - - @abstractmethod - def from_str(cls, s): - return cls(s) # TODO move deserialization logic onto separate class, smth like serialization proxy or factory - - -class DefaultTemplate(Template): - - def __init__(self, text=""): - self.text = text - - @classmethod - def from_str(cls, s): - return cls(s) - - def update(self, text=""): - self.text = self.text or text - - def __contains__(self, t): - return t.text == self.text - - def __eq__(self, other): - if isinstance(other, self.__class__): - return self.text == other.text - return False - - def __hash__(self): - """Override the default hash behavior (that returns the id)""" - return hash(self.text) - - def __str__(self): - return self.text - - def generate_text(self, slots=[]): - t = copy.copy(self.text) - if isinstance(slots, dict): - slots = slots.items() - for slot, value in slots: - t = t.replace('#' + slot, value, 1) - if t: - t = t[0].upper() + t[1:] - return t - - -class DualTemplate(Template): - - def __init__(self, default="", dontcare=""): - self.default = default - self.dontcare = dontcare - - @property - def dontcare_slots(self): - default_slots = self._slots(self.default) - dontcare_slots = self._slots(self.dontcare) - return default_slots - dontcare_slots - - @staticmethod - def _slots(text): - return set(re.findall('#(\w+)', text)) - - @classmethod - def from_str(cls, s): - return cls(*s.split('\t', 1)) - - def update(self, default="", dontcare=""): - self.default = self.default or default - self.dontcare = self.dontcare or dontcare - - def __contains__(self, t): - return t.default and (t.default == self.default) \ - or t.dontcare and (t.dontcare == self.dontcare) - - def __eq__(self, other): - if isinstance(other, self.__class__): - return (self.default == other.default) \ - and (self.dontcare == other.dontcare) - return False - - def __hash__(self): - """Override the default hash behavior (that returns the id)""" - return hash(self.default + '\t' + self.dontcare) - - def __str__(self): - return self.default + '\t' + self.dontcare - - def generate_text(self, slots): - t = copy.copy(self.default) - if isinstance(slots, dict): - slots = slots.items() - dontcare_slots = (s[0] for s in slots if s[1] == 'dontcare') - if self.dontcare and self.dontcare_slots.issubset(dontcare_slots): - t = copy.copy(self.dontcare) - for slot, value in slots: - t = t.replace('#' + slot, value, 1) - if t: - t = t[0].upper() + t[1:] - return t - - -class Templates: - - def __init__(self, ttype): - self.ttype = ttype - self.act2templ = {} - self.templ2act = {} - self._actions = [] - self._templates = [] - - def __contains__(self, key): - """If key is an str, returns whether the key is in the actions. - If key is a Template, returns if the key is templates. - """ - if isinstance(key, str): - return key in self.act2templ - elif isinstance(key, Template): - return key in self.templ2act - - def __getitem__(self, key): - """If key is an str, returns corresponding template. - If key is a Template, return corresponding action. - If does not exist, return None. 
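A compact re-statement of what DefaultTemplate.generate_text() above does with a slots dictionary; the template string and slot values are invented:

    def generate_text(template, slots):
        # '#slot' placeholders are replaced once each and the first character
        # is upper-cased, as in DefaultTemplate.generate_text() above.
        text = template
        for slot, value in slots.items():
            text = text.replace('#' + slot, value, 1)
        return text[0].upper() + text[1:] if text else text

    print(generate_text("#food restaurant in the #area of town",
                        {"food": "thai", "area": "north"}))
    # Thai restaurant in the north of town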
- """ - if isinstance(key, str): - return self.act2templ[key] - elif isinstance(key, Template): - return self.templ2act[key] - - def __len__(self): - return len(self.act2templ) - - def __str__(self): - return str(self.act2templ) - - def __setitem__(self, key, value): - """If the key is not in the dictionary, add it.""" - key = str(key) - if key not in self.act2templ: - self.act2templ[key] = value - self.templ2act[value] = key - self._actions = [] - self._templates = [] - - @property - def actions(self): - if not self._actions: - self._actions = sorted(self.act2templ.keys()) - return self._actions - - @property - def templates(self): - if not self._templates: - self._templates = [self.act2templ[a] for a in self.actions] - return self._templates - - def load(self, filename): - with open(filename, 'r', encoding='utf8') as fp: - for ln in fp: - act, template = ln.strip('\n').split('\t', 1) - self.__setitem__(act, self.ttype.from_str(template)) - return self - - def save(self, filename): - with open(filename, 'w', encoding='utf8') as outfile: - for act in sorted(self.actions): - template = self.__getitem__(act) - outfile.write('{}\t{}\n'.format(act, template)) diff --git a/deeppavlov/models/go_bot/nlu/__init__.py b/deeppavlov/models/go_bot/nlu/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/go_bot/nlu/dto/__init__.py b/deeppavlov/models/go_bot/nlu/dto/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/go_bot/nlu/dto/nlu_response.py b/deeppavlov/models/go_bot/nlu/dto/nlu_response.py deleted file mode 100644 index 7570aef386..0000000000 --- a/deeppavlov/models/go_bot/nlu/dto/nlu_response.py +++ /dev/null @@ -1,18 +0,0 @@ -from typing import Any, Dict, Tuple, List, Union, Optional - -from deeppavlov.models.go_bot.nlu.dto.nlu_response_interface import NLUResponseInterface -from deeppavlov.models.go_bot.nlu.dto.text_vectorization_response import TextVectorizationResponse - - -class NLUResponse(NLUResponseInterface): - """ - Stores the go-bot NLU knowledge: extracted slots and intents info, embedding and bow vectors. - """ - def __init__(self, slots, intents, tokens): - self.slots: Union[List[Tuple[str, Any]], Dict[str, Any]] = slots - self.intents = intents - self.tokens = tokens - self.tokens_vectorized: Optional[TextVectorizationResponse] = None - - def set_tokens_vectorized(self, tokens_vectorized): - self.tokens_vectorized = tokens_vectorized diff --git a/deeppavlov/models/go_bot/nlu/dto/nlu_response_interface.py b/deeppavlov/models/go_bot/nlu/dto/nlu_response_interface.py deleted file mode 100644 index 2a6364907c..0000000000 --- a/deeppavlov/models/go_bot/nlu/dto/nlu_response_interface.py +++ /dev/null @@ -1,5 +0,0 @@ -from abc import ABCMeta - - -class NLUResponseInterface(metaclass=ABCMeta): - pass diff --git a/deeppavlov/models/go_bot/nlu/dto/text_vectorization_response.py b/deeppavlov/models/go_bot/nlu/dto/text_vectorization_response.py deleted file mode 100644 index 71de3eb4b6..0000000000 --- a/deeppavlov/models/go_bot/nlu/dto/text_vectorization_response.py +++ /dev/null @@ -1,9 +0,0 @@ -class TextVectorizationResponse: - """ - Stores the BOW-encodings and (padded or aggregated e.g. averaged) embeddings for text. 
- """ - - def __init__(self, tokens_bow_encoded, tokens_aggregated_embedding, tokens_embeddings_padded): - self.tokens_bow_encoded = tokens_bow_encoded - self.tokens_aggregated_embedding = tokens_aggregated_embedding - self.tokens_embeddings_padded = tokens_embeddings_padded diff --git a/deeppavlov/models/go_bot/nlu/nlu_manager.py b/deeppavlov/models/go_bot/nlu/nlu_manager.py deleted file mode 100644 index e18d74b48f..0000000000 --- a/deeppavlov/models/go_bot/nlu/nlu_manager.py +++ /dev/null @@ -1,82 +0,0 @@ -from logging import getLogger -from typing import List - -from deeppavlov import Chainer -from deeppavlov.models.go_bot.nlu.dto.nlu_response import NLUResponse -from deeppavlov.models.go_bot.nlu.nlu_manager_interface import NLUManagerInterface - -log = getLogger(__name__) - - -# todo add the ability to configure nlu loglevel in config (now the setting is shared across all the GO-bot) -# todo add each method input-output logging when proper loglevel level specified - - -class NLUManager(NLUManagerInterface): - """ - NLUManager is a unit of the go-bot pipeline that handles the understanding of text. - Given the text it provides tokenization, intents extraction and the slots extraction. - (the whole go-bot pipeline is as follows: NLU, dialogue-state-tracking&policy-NN, NLG) - """ - - def __init__(self, tokenizer, slot_filler, intent_classifier, debug=False): - self.debug = debug - if self.debug: - log.debug(f"BEFORE {self.__class__.__name__} init(): " - f"tokenizer={tokenizer}, slot_filler={slot_filler}, " - f"intent_classifier={intent_classifier}, debug={debug}") - # todo type hints - self.tokenizer = tokenizer - self.slot_filler = slot_filler - self.intent_classifier = intent_classifier - self.intents = [] - if isinstance(self.intent_classifier, Chainer): - self.intents = self.intent_classifier.get_main_component().classes - - if self.debug: - log.debug(f"AFTER {self.__class__.__name__} init(): " - f"tokenizer={tokenizer}, slot_filler={slot_filler}, " - f"intent_classifier={intent_classifier}, debug={debug}") - - def nlu(self, text: str) -> NLUResponse: - """ - Extracts slot values and intents from text. 
- - Args: - text: text to extract knowledge from - - Returns: - an object storing the extracted slos and intents info - """ - # todo meaningful type hints - tokens = self._tokenize_single_text_entry(text) - - slots = None - if callable(self.slot_filler): - slots = self._extract_slots_from_tokenized_text_entry(tokens) - - intents = [] - if callable(self.intent_classifier): - intents = self._extract_intents_from_tokenized_text_entry(tokens) - - return NLUResponse(slots, intents, tokens) - - def _extract_intents_from_tokenized_text_entry(self, tokens: List[str]): - # todo meaningful type hints, relies on unannotated intent classifier - intent_features = self.intent_classifier([' '.join(tokens)])[1][0] - return intent_features - - def _extract_slots_from_tokenized_text_entry(self, tokens: List[str]): - # todo meaningful type hints, relies on unannotated slot filler - return self.slot_filler([tokens])[0] - - def _tokenize_single_text_entry(self, text: str): - # todo meaningful type hints, relies on unannotated tokenizer - return self.tokenizer([text.lower().strip()])[0] - - def num_of_known_intents(self) -> int: - """ - Returns: - the number of intents known to the NLU module - """ - return len(self.intents) diff --git a/deeppavlov/models/go_bot/nlu/nlu_manager_interface.py b/deeppavlov/models/go_bot/nlu/nlu_manager_interface.py deleted file mode 100644 index 2d9cc49183..0000000000 --- a/deeppavlov/models/go_bot/nlu/nlu_manager_interface.py +++ /dev/null @@ -1,17 +0,0 @@ -from abc import ABCMeta, abstractmethod - -from deeppavlov.models.go_bot.nlu.dto.nlu_response_interface import NLUResponseInterface - - -class NLUManagerInterface(metaclass=ABCMeta): - @abstractmethod - def nlu(self, text) -> NLUResponseInterface: - pass - - @abstractmethod - def num_of_known_intents(self) -> int: - """ - Returns: - the number of intents known to the NLU module - """ - pass diff --git a/deeppavlov/models/go_bot/nlu/tokens_vectorizer.py b/deeppavlov/models/go_bot/nlu/tokens_vectorizer.py deleted file mode 100644 index 8ced116fce..0000000000 --- a/deeppavlov/models/go_bot/nlu/tokens_vectorizer.py +++ /dev/null @@ -1,149 +0,0 @@ -from logging import getLogger -from typing import List, Optional - -import numpy as np - -log = getLogger(__name__) - - -# todo logging -class TokensVectorRepresentationParams: - """the DTO-like class to transfer TokenVectorizer's vectorizers dimensions""" - - def __init__(self, embedding_dim: Optional[int], bow_dim: Optional[int]): - self.embedding_dim = embedding_dim - self.bow_dim = bow_dim - - -class TokensVectorizer: - """ - the TokensVectorizer class is used in the NLU part of deeppavlov go-bot pipeline. - (for more info on NLU logic see the NLUManager --- the go-bot NLU main class) - - TokensVectorizer is manages the BOW tokens encoding and tokens embedding. - Both BOW encoder and embedder are optional and have to be pre-trained: - this class wraps their usage but not training. - """ - - def __init__(self, debug, word_vocab=None, bow_embedder=None, embedder=None): - # todo adequate type hints - self.debug = debug - self.word_vocab = word_vocab # TODO: isn't it passed with bow embedder? 
- self.bow_embedder = bow_embedder - self.embedder = embedder - - def _use_bow_encoder(self) -> bool: - """ - Returns: - is BOW encoding enabled in the TokensVectorizer - """ - return callable(self.bow_embedder) - - def _embed_tokens(self, tokens: List[str], mean_embeddings: bool) -> Optional[np.ndarray]: - """ - Args: - tokens: list of tokens to embed - mean_embeddings: if True, will return the mean vector of calculated embeddings sequence. - otherwise will return the calculated embeddings sequence. - - Returns: - the (maybe averaged vector of) calculated embeddings sequence and None if embedder is disabled. - """ - tokens_embedded = np.array([], dtype=np.float32) - if callable(self.embedder): - tokens_embedded = self.embedder([tokens], mean=mean_embeddings)[0] - return tokens_embedded - - def bow_encode_tokens(self, tokens: List[str]) -> np.ndarray: - """ - Args: - tokens: list of tokens to BOW encode - - Returns: - if uses BOW encoder, returns np array with BOW encoding for tokens. - Otherwise returns an empty list. - """ - bow_features = np.array([], dtype=np.float32) - if self._use_bow_encoder(): - tokens_idx = self.word_vocab(tokens) - bow_features = self.bow_embedder([tokens_idx])[0] - bow_features = bow_features.astype(np.float32) - return bow_features - - @staticmethod - def _standard_normal_like(source_vector: np.ndarray) -> np.ndarray: - """ - Args: - source_vector: the vector of which to follow the result shape - - Returns: - the standard normal distribution of the shape of the source vector - """ - vector_dim = source_vector.shape[0] - return np.random.normal(loc=0.0, scale=1 / vector_dim, size=vector_dim) - - @staticmethod - def _pad_sequence_to_size(out_sequence_length: int, token_dim: int, tokens_embedded: np.ndarray) -> np.ndarray: - """ - Pad the passed vectors sequence to the specified length. - - Args: - out_sequence_length: the length to pad sequence to - token_dim: the shape of output embedding - tokens_embedded: some sequence of vectors - - Returns: - the padded sequence of vectors - """ - out_sequence_length = out_sequence_length - len(tokens_embedded) - padding = np.zeros(shape=(out_sequence_length, token_dim), dtype=np.float32) - if tokens_embedded: - emb_context = np.concatenate((padding, np.array(tokens_embedded))) - else: - emb_context = padding - return emb_context - - def calc_tokens_mean_embedding(self, tokens: List[str]) -> np.ndarray: - """ - Args: - tokens: list of tokens to embed - - Returns: - the average vector of embeddings sequence - or if avg is zeros then the standard normal distributed random vector instead. - None if embedder is disabled. - """ - tokens_embedded = self._embed_tokens(tokens, True) - # random embedding instead of zeros - if tokens_embedded.size != 0 and np.all(tokens_embedded < 1e-20): - # TODO: size != 0 not pythonic - tokens_embedded = np.fabs(self._standard_normal_like(tokens_embedded)) - return tokens_embedded - - def calc_tokens_embeddings(self, output_sequence_length: int, token_dim: int, tokens: List[str]) -> np.ndarray: - """ - Calculate embeddings of passed tokens. 
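The sequence padding done by _pad_sequence_to_size() above, shown standalone; the window size and embedding dimension are arbitrary example values:

    import numpy as np

    def pad_left(tokens_embedded, out_len, token_dim):
        # Token embeddings are left-padded with zero vectors up to the
        # attention window size, as in _pad_sequence_to_size() above.
        padding = np.zeros((out_len - len(tokens_embedded), token_dim), dtype=np.float32)
        if len(tokens_embedded):
            return np.concatenate((padding, np.array(tokens_embedded)))
        return padding

    emb = [np.ones(3), np.ones(3) * 2]
    print(pad_left(emb, out_len=5, token_dim=3).shape)  # (5, 3); the first 3 rows are zeros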
- Args: - output_sequence_length: the length of sequence to output - token_dim: the shape of output embedding - tokens: list of tokens to embed - - Returns: - the padded sequence of calculated embeddings - """ - tokens_embedded = self._embed_tokens(tokens, False) - if tokens_embedded is not None: - emb_context = self._pad_sequence_to_size(output_sequence_length, token_dim, tokens_embedded) - else: - emb_context = np.array([], dtype=np.float32) - return emb_context - - def get_dims(self) -> TokensVectorRepresentationParams: - """ - Returns: - the TokensVectorRepresentationParams with embedder and BOW encoder output dimensions. - None instead of the missing dim if BOW encoder or embedder are missing. - """ - embedder_dim = self.embedder.dim if self.embedder else None - bow_encoder_dim = len(self.word_vocab) if self.bow_embedder else None - return TokensVectorRepresentationParams(embedder_dim, bow_encoder_dim) diff --git a/deeppavlov/models/go_bot/policy/__init__.py b/deeppavlov/models/go_bot/policy/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/go_bot/policy/dto/__init__.py b/deeppavlov/models/go_bot/policy/dto/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/go_bot/policy/dto/attn_params.py b/deeppavlov/models/go_bot/policy/dto/attn_params.py deleted file mode 100644 index dc4e677578..0000000000 --- a/deeppavlov/models/go_bot/policy/dto/attn_params.py +++ /dev/null @@ -1,16 +0,0 @@ -from typing import NamedTuple - - -class GobotAttnParams(NamedTuple): - """ - the DTO-like class that stores the attention mechanism configuration params. - """ - max_num_tokens: int - hidden_size: int - token_size: int - key_size: int - type_: str - projected_align: bool - depth: int - action_as_key: bool - intent_as_key: bool diff --git a/deeppavlov/models/go_bot/policy/dto/digitized_policy_features.py b/deeppavlov/models/go_bot/policy/dto/digitized_policy_features.py deleted file mode 100644 index 6ee23096ec..0000000000 --- a/deeppavlov/models/go_bot/policy/dto/digitized_policy_features.py +++ /dev/null @@ -1,5 +0,0 @@ -class DigitizedPolicyFeatures: - def __init__(self, attn_key, concat_feats, action_mask): - self.attn_key = attn_key - self.concat_feats = concat_feats - self.action_mask = action_mask diff --git a/deeppavlov/models/go_bot/policy/dto/policy_network_params.py b/deeppavlov/models/go_bot/policy/dto/policy_network_params.py deleted file mode 100644 index 2bc1c196e6..0000000000 --- a/deeppavlov/models/go_bot/policy/dto/policy_network_params.py +++ /dev/null @@ -1,57 +0,0 @@ -from logging import getLogger - -log = getLogger(__name__) - - -class PolicyNetworkParams: - """ - The class to deal with the overcomplicated structure of the GO-bot configs. - It is initialized from the config-as-is and performs all the conflicting parameters resolution internally. 
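For a fixed-size network input the vectorizer above left-pads the (possibly empty) sequence of token embeddings with zero vectors up to the requested sequence length. A self-contained sketch of that padding step with NumPy, using invented dimensions:

```python
import numpy as np


def pad_embeddings(tokens_embedded: np.ndarray,
                   out_sequence_length: int,
                   token_dim: int) -> np.ndarray:
    """Left-pad a (n_tokens, token_dim) matrix with zero rows up to out_sequence_length."""
    n_missing = out_sequence_length - len(tokens_embedded)
    padding = np.zeros((n_missing, token_dim), dtype=np.float32)
    if tokens_embedded.size:
        return np.concatenate((padding, tokens_embedded.astype(np.float32)))
    return padding


emb = np.random.rand(3, 5).astype(np.float32)   # 3 tokens, 5-dim embeddings
padded = pad_embeddings(emb, out_sequence_length=10, token_dim=5)
print(padded.shape)  # (10, 5): 7 zero rows followed by the 3 real embedding rows
```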
- """ - # todo remove the complex config logic - UNSUPPORTED = ["obs_size"] - DEPRECATED = ["end_learning_rate", "decay_steps", "decay_power"] - - def __init__(self, - hidden_size, - dropout_rate, - l2_reg_coef, - dense_size, - attention_mechanism, - network_parameters): - self.hidden_size = hidden_size - self.dropout_rate = dropout_rate - self.l2_reg_coef = l2_reg_coef - self.dense_size = dense_size - self.attention_mechanism = attention_mechanism - self.network_parameters = network_parameters or {} - - self.log_deprecated_params(self.network_parameters.keys()) - - def get_hidden_size(self): - return self.network_parameters.get("hidden_size", self.hidden_size) - - def get_action_size(self): - return self.network_parameters.get("action_size") - - def get_dropout_rate(self): - return self.network_parameters.get("dropout_rate", self.dropout_rate) - - def get_l2_reg_coef(self): - return self.network_parameters.get("l2_reg_coef", self.l2_reg_coef) - - def get_dense_size(self): - return self.network_parameters.get("dense_size", self.dense_size) or self.hidden_size # todo :( - - def get_learning_rate(self): - return self.network_parameters.get("learning_rate", None) - - def get_attn_params(self): - return self.network_parameters.get('attention_mechanism', self.attention_mechanism) - - def log_deprecated_params(self, network_parameters): - if any(p in network_parameters for p in self.DEPRECATED): - log.warning(f"parameters {self.DEPRECATED} are deprecated," - f" for learning rate schedule documentation see" - f" deeppavlov.core.models.lr_scheduled_tf_model" - f" or read a github tutorial on super convergence.") diff --git a/deeppavlov/models/go_bot/policy/dto/policy_prediction.py b/deeppavlov/models/go_bot/policy/dto/policy_prediction.py deleted file mode 100644 index c25b7eef82..0000000000 --- a/deeppavlov/models/go_bot/policy/dto/policy_prediction.py +++ /dev/null @@ -1,18 +0,0 @@ -from typing import Tuple - -import numpy as np - - -class PolicyPrediction: - """ - Used to store policy model predictions and hidden values. 
- """ - def __init__(self, probs, prediction, hidden_outs, cell_state): - self.probs = probs - self.prediction = prediction - self.hidden_outs = hidden_outs - self.cell_state = cell_state - self.predicted_action_ix = np.argmax(probs) - - def get_network_state(self) -> Tuple: - return self.cell_state, self.hidden_outs diff --git a/deeppavlov/models/go_bot/policy/policy_network.py b/deeppavlov/models/go_bot/policy/policy_network.py deleted file mode 100644 index 1e3483203d..0000000000 --- a/deeppavlov/models/go_bot/policy/policy_network.py +++ /dev/null @@ -1,455 +0,0 @@ -import json -from typing import Tuple, Optional -from logging import getLogger - -import numpy as np -import tensorflow as tf - -from deeppavlov.core.common.errors import ConfigError -from deeppavlov.core.layers import tf_attention_mechanisms as am, tf_layers - -# noinspection PyUnresolvedReferences -from tensorflow.contrib.layers import xavier_initializer as xav - -from deeppavlov.core.models.tf_model import LRScheduledTFModel -from deeppavlov.models.go_bot.nlu.dto.nlu_response import NLUResponse - -from deeppavlov.models.go_bot.nlu.tokens_vectorizer import TokensVectorRepresentationParams -from deeppavlov.models.go_bot.dto.dataset_features import BatchDialoguesFeatures, BatchDialoguesTargets - -# todo -from deeppavlov.models.go_bot.dto.shared_gobot_params import SharedGoBotParams -from deeppavlov.models.go_bot.policy.dto.attn_params import GobotAttnParams -from deeppavlov.models.go_bot.policy.dto.digitized_policy_features import DigitizedPolicyFeatures -from deeppavlov.models.go_bot.policy.dto.policy_network_params import PolicyNetworkParams -from deeppavlov.models.go_bot.policy.dto.policy_prediction import PolicyPrediction -from deeppavlov.models.go_bot.tracker.dto.dst_knowledge import DSTKnowledge - -log = getLogger(__name__) - - -class PolicyNetwork(LRScheduledTFModel): - """ - the Policy Network is a ML model whose goal is to choose the right system response when in dialogue with user. 
- """ - - GRAPH_PARAMS = ["hidden_size", "action_size", "dense_size", "attention_params"] - SERIALIZABLE_FIELDS = ["hidden_size", "action_size", "dense_size", "dropout_rate", "l2_reg_coef", - "attention_params"] - - def __init__(self, network_params_passed: PolicyNetworkParams, - tokens_dims: TokensVectorRepresentationParams, - features_params: SharedGoBotParams, - load_path, - save_path, - debug=False, - **kwargs): - self.debug = debug - if self.debug: - log.debug(f"BEFORE {self.__class__.__name__} init(): " - f"network_params_passed={network_params_passed}, tokens_dims={tokens_dims}, " - f"features_params={features_params}, load_path={load_path}, save_path={save_path}, " - f"debug={debug}, kwargs={kwargs}") - if network_params_passed.get_learning_rate(): - kwargs['learning_rate'] = network_params_passed.get_learning_rate() # todo :( - - super().__init__(load_path=load_path, save_path=save_path, **kwargs) - - self.hidden_size = network_params_passed.get_hidden_size() - self.action_size = features_params.num_actions - self.dropout_rate = network_params_passed.get_dropout_rate() - self.l2_reg_coef = network_params_passed.get_l2_reg_coef() - self.dense_size = network_params_passed.get_dense_size() - - attn_params_passed = network_params_passed.get_attn_params() - self.attention_params = self.configure_attn(attn_params_passed, tokens_dims, features_params) # todo :( - self.input_size = self.calc_input_size(tokens_dims, features_params, self.attention_params) # todo :( - - if self.debug: - log.debug(f"INSIDE {self.__class__.__name__} init(). calculated NN hyperparams: " - f"attention_params={self.attention_params}, " - f"hidden_size={self.hidden_size}, action_size={self.action_size}, " - f"dropout_rate={self.dropout_rate}, l2_reg_coef={self.l2_reg_coef}, " - f"dense_size={self.dense_size}, input_size={self.input_size}") - - self._build_graph() - if self.debug: - log.debug(f"INSIDE {self.__class__.__name__} init(). build graph done.") - self.sess = tf.Session() - self.sess.run(tf.global_variables_initializer()) - if self.debug: - log.debug(f"INSIDE {self.__class__.__name__} init(). " - f"Session() initialization and global_variables_initializer() done.") - - if self.train_checkpoint_exists(): - log.info( - f"INSIDE {self.__class__.__name__} init(). Initializing {self.__class__.__name__} from checkpoint.") - self.load() - else: - log.info(f"INSIDE {self.__class__.__name__} init(). 
Initializing {self.__class__.__name__} from scratch.") - - if self.debug: - log.debug(f"AFTER {self.__class__.__name__} init(): " - f"network_params_passed={network_params_passed}, tokens_dims={tokens_dims}, " - f"features_params={features_params}, load_path={load_path}, save_path={save_path}, " - f"debug={debug}, kwargs={kwargs}") - - @staticmethod - def calc_input_size(tokens_dims: TokensVectorRepresentationParams, - shared_go_bot_params: SharedGoBotParams, - attention_params: Optional[GobotAttnParams]) -> int: - """ - Args: - tokens_dims: the tokens vectors dimensions - shared_go_bot_params: GO-bot hyperparams used in various parts of the pipeline - attention_params: the params of attention mechanism of the network for which input size is calculated - - Returns: - the calculated input shape of policy network - """ - input_size = 6 + shared_go_bot_params.num_tracker_features + shared_go_bot_params.num_actions # todo: why 6 - if tokens_dims.bow_dim: - input_size += tokens_dims.bow_dim - if tokens_dims.embedding_dim: - input_size += tokens_dims.embedding_dim - if shared_go_bot_params.num_intents: - input_size += shared_go_bot_params.num_intents - if attention_params is not None: - input_size -= attention_params.token_size - - return input_size - - @staticmethod - def configure_attn(attn: dict, - tokens_dims: TokensVectorRepresentationParams, - features_params: SharedGoBotParams): - # todo store params in proper class objects not in dicts, requires serialization logic update - - if not attn: - return None - - token_size = tokens_dims.embedding_dim # todo sync with nn params - action_as_key = attn.get('action_as_key', False) - intent_as_key = attn.get('intent_as_key', False) - key_size = PolicyNetwork.calc_attn_key_size(features_params, action_as_key, intent_as_key) - - gobot_attn_params = GobotAttnParams(max_num_tokens=attn.get("max_num_tokens"), - hidden_size=attn.get("hidden_size"), - token_size=token_size, - key_size=key_size, - type_=attn.get("type"), - projected_align=attn.get("projected_align"), - depth=attn.get("depth"), - action_as_key=action_as_key, - intent_as_key=intent_as_key) - - return gobot_attn_params - - @staticmethod - def calc_attn_key_size(shared_go_bot_params: SharedGoBotParams, action_as_key: bool, intent_as_key: bool) -> int: - """ - Args: - shared_go_bot_params: GO-bot hyperparams used in various parts of the pipeline - action_as_key: True if actions are part of attention keys - intent_as_key: True if intents are part of attention keys - - Returns: - the calculated attention key shape of policy network - """ - # True if actions are part of attention keys -- actually *the last predicted action* - - possible_key_size = 0 - if action_as_key: - possible_key_size += shared_go_bot_params.num_actions - if intent_as_key and shared_go_bot_params.num_intents: - possible_key_size += shared_go_bot_params.num_intents - possible_key_size = possible_key_size or 1 # todo rewrite - return possible_key_size - - def calc_attn_key(self, nlu_response: NLUResponse, tracker_knowledge: DSTKnowledge): - """ - Args: - nlu_response: nlu analysis output, currently only intents data is used - tracker_knowledge: one-hot-encoded previous executed action - - Returns: - vector representing an attention key - """ - # todo dto-like class for the attn features? 
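`calc_input_size` adds the six fixed context features to the tracker, action, BOW, embedding and intent dimensions, and subtracts the per-token embedding size when attention consumes the token sequence separately. A worked numeric trace of that arithmetic, with invented dimensions:

```python
# invented dimensions, only to trace the arithmetic of calc_input_size above
num_tracker_features = 12
num_actions = 16
bow_dim = 300
embedding_dim = 100
num_intents = 10
attention_token_size = 100   # equals embedding_dim when attention is enabled

input_size = 6 + num_tracker_features + num_actions
input_size += bow_dim + embedding_dim + num_intents
input_size -= attention_token_size   # tokens go through attention, not the flat input

print(input_size)  # 6 + 12 + 16 + 300 + 100 + 10 - 100 = 344
```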
- - attn_key = np.array([], dtype=np.float32) - - if self.attention_params: - if self.attention_params.action_as_key: - attn_key = np.hstack((attn_key, tracker_knowledge.tracker_prev_action)) - if self.attention_params.intent_as_key: - attn_key = np.hstack((attn_key, nlu_response.intents)) - if len(attn_key) == 0: - attn_key = np.array([1], dtype=np.float32) - return attn_key - - @staticmethod - def stack_features(nlu_response: NLUResponse, - tracker_knowledge: DSTKnowledge): - return np.hstack((nlu_response.tokens_vectorized.tokens_bow_encoded, - nlu_response.tokens_vectorized.tokens_aggregated_embedding, - nlu_response.intents, - tracker_knowledge.state_features, - tracker_knowledge.context_features, - tracker_knowledge.tracker_prev_action)) - - @staticmethod - def calc_action_mask(tracker_knowledge: DSTKnowledge): - # mask is used to prevent tracker from predicting the api call twice - # via logical AND of action candidates and mask - # todo: seems to be an efficient idea but the intuition beyond this whole hack is not obvious - mask = np.ones(tracker_knowledge.n_actions, dtype=np.float32) - - if np.any(tracker_knowledge.tracker_prev_action): - prev_act_id = np.argmax(tracker_knowledge.tracker_prev_action) - if prev_act_id == tracker_knowledge.api_call_id: - mask[prev_act_id] = 0. - - return mask - - def digitize_features(self, - nlu_response: NLUResponse, - tracker_knowledge: DSTKnowledge) -> DigitizedPolicyFeatures: - attn_key = self.calc_attn_key(nlu_response, tracker_knowledge) - concat_feats = self.stack_features(nlu_response, tracker_knowledge) - action_mask = tracker_knowledge.action_mask - - return DigitizedPolicyFeatures(attn_key, concat_feats, action_mask) - - def _build_graph(self) -> None: - self._add_placeholders() - - _logits, self._state = self._build_body() - - # probabilities normalization : elemwise multiply with action mask - _logits_exp = tf.multiply(tf.exp(_logits), self._action_mask) - _logits_exp_sum = tf.expand_dims(tf.reduce_sum(_logits_exp, -1), -1) - self._probs = tf.squeeze(_logits_exp / _logits_exp_sum, name='probs') - - # loss, train and predict operations - self._prediction = tf.argmax(self._probs, axis=-1, name='prediction') - - # _weights = tf.expand_dims(self._utterance_mask, -1) - # TODO: try multiplying logits to action_mask - onehots = tf.one_hot(self._action, self.action_size) - _loss_tensor = tf.nn.softmax_cross_entropy_with_logits_v2( - logits=_logits, labels=onehots - ) - # multiply with batch utterance mask - _loss_tensor = tf.multiply(_loss_tensor, self._utterance_mask) - self._loss = tf.reduce_mean(_loss_tensor, name='loss') - self._loss += self.l2_reg_coef * tf.losses.get_regularization_loss() - self._train_op = self.get_train_op(self._loss) - - def _add_placeholders(self) -> None: - self._dropout_keep_prob = tf.placeholder_with_default(1.0, shape=[], name='dropout_prob') - - self._features = tf.placeholder(tf.float32, [None, None, self.input_size], name='features') - - self._action = tf.placeholder(tf.int32, [None, None], name='ground_truth_action') - - self._action_mask = tf.placeholder(tf.float32, [None, None, self.action_size], name='action_mask') - - self._utterance_mask = tf.placeholder(tf.float32, shape=[None, None], name='utterance_mask') - - self._batch_size = tf.shape(self._features)[0] - - zero_state = tf.zeros([self._batch_size, self.hidden_size], dtype=tf.float32) - _initial_state_c = tf.placeholder_with_default(zero_state, shape=[None, self.hidden_size]) - _initial_state_h = tf.placeholder_with_default(zero_state, shape=[None, 
self.hidden_size]) - self._initial_state = tf.nn.rnn_cell.LSTMStateTuple(_initial_state_c, _initial_state_h) - - if self.attention_params: - _emb_context_shape = [None, None, self.attention_params.max_num_tokens, - self.attention_params.token_size] - self._emb_context = tf.placeholder(tf.float32, _emb_context_shape, name='emb_context') - self._key = tf.placeholder(tf.float32, [None, None, self.attention_params.key_size], name='key') - - def _build_body(self) -> Tuple[tf.Tensor, tf.Tensor]: - # input projection - _units = tf.layers.dense(self._features, self.dense_size, - kernel_regularizer=tf.nn.l2_loss, kernel_initializer=xav()) - - if self.attention_params: - _attn_output = self._build_attn_body() - _units = tf.concat([_units, _attn_output], -1) - - _units = tf_layers.variational_dropout(_units, keep_prob=self._dropout_keep_prob) - - # recurrent network unit - _lstm_cell = tf.nn.rnn_cell.LSTMCell(self.hidden_size) - _utter_lengths = tf.cast(tf.reduce_sum(self._utterance_mask, axis=-1), tf.int32) - - # _output: [batch_size, max_time, hidden_size] - # _state: tuple of two [batch_size, hidden_size] - _output, _state = tf.nn.dynamic_rnn(_lstm_cell, _units, - time_major=False, initial_state=self._initial_state, - sequence_length=_utter_lengths) - - _output = tf.reshape(_output, (self._batch_size, -1, self.hidden_size)) - _output = tf_layers.variational_dropout(_output, keep_prob=self._dropout_keep_prob) - # output projection - _logits = tf.layers.dense(_output, self.action_size, - kernel_regularizer=tf.nn.l2_loss, kernel_initializer=xav(), name='logits') - return _logits, _state - - def _build_attn_body(self): - attn_scope = f"attention_params/{self.attention_params.type_}" - with tf.variable_scope(attn_scope): - if self.attention_params.type_ == 'general': - _attn_output = am.general_attention(self._key, self._emb_context, - hidden_size=self.attention_params.hidden_size, - projected_align=self.attention_params.projected_align) - elif self.attention_params.type_ == 'bahdanau': - _attn_output = am.bahdanau_attention(self._key, self._emb_context, - hidden_size=self.attention_params.hidden_size, - projected_align=self.attention_params.projected_align) - elif self.attention_params.type_ == 'cs_general': - _attn_output = am.cs_general_attention(self._key, self._emb_context, - hidden_size=self.attention_params.hidden_size, - depth=self.attention_params.depth, - projected_align=self.attention_params.projected_align) - elif self.attention_params.type_ == 'cs_bahdanau': - _attn_output = am.cs_bahdanau_attention(self._key, self._emb_context, - hidden_size=self.attention_params.hidden_size, - depth=self.attention_params.depth, - projected_align=self.attention_params.projected_align) - elif self.attention_params.type_ == 'light_general': - _attn_output = am.light_general_attention(self._key, self._emb_context, - hidden_size=self.attention_params.hidden_size, - projected_align=self.attention_params.projected_align) - elif self.attention_params.type_ == 'light_bahdanau': - _attn_output = am.light_bahdanau_attention(self._key, self._emb_context, - hidden_size=self.attention_params.hidden_size, - projected_align=self.attention_params.projected_align) - else: - raise ValueError("wrong value for attention mechanism type") - return _attn_output - - def train_checkpoint_exists(self): - return tf.train.checkpoint_exists(str(self.load_path.resolve())) - - def get_attn_hyperparams(self) -> Optional[GobotAttnParams]: - attn_hyperparams = None - if self.attention_params: - attn_hyperparams = self.attention_params - 
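The probability normalisation in `_build_graph` above multiplies the exponentiated logits by the action mask before normalising, so a masked action can never be predicted. A small NumPy sketch of the same masked softmax, with toy logits:

```python
import numpy as np

# toy logits for 4 actions and a mask that forbids action 2
# (mirrors the exp(logits) * action_mask normalisation in _build_graph)
logits = np.array([1.0, 2.0, 3.0, 0.5], dtype=np.float32)
action_mask = np.array([1.0, 1.0, 0.0, 1.0], dtype=np.float32)

masked_exp = np.exp(logits) * action_mask
probs = masked_exp / masked_exp.sum()

print(probs)            # probability of the masked action is exactly 0
print(probs.argmax())   # the predicted action index never points at a masked action
```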
return attn_hyperparams - - def has_attn(self): - """ - Returns: - True if the model has an attention mechanism - """ - return self.attention_params is not None - - def get_attn_window_size(self): - """ - Returns: - the length of the window the model looks with attn if the attention mechanism is configured. - if the model has no attention mechanism returns None. - """ - return self.attention_params.max_num_tokens if self.has_attn() else None - - def __call__(self, batch_dialogues_features: BatchDialoguesFeatures, - states_c: np.ndarray, states_h: np.ndarray, prob: bool = False, - *args, **kwargs) -> PolicyPrediction: - - states_c = [[states_c]] # list of list aka batch of dialogues - states_h = [[states_h]] # list of list aka batch of dialogues - - feed_dict = { - self._dropout_keep_prob: 1., - self._initial_state: (states_c, states_h), - self._utterance_mask: batch_dialogues_features.b_padded_dialogue_length_mask, - self._features: batch_dialogues_features.b_featuress, - self._action_mask: batch_dialogues_features.b_action_masks - } - if self.attention_params: - feed_dict[self._emb_context] = batch_dialogues_features.b_tokens_embeddings_paddeds - feed_dict[self._key] = batch_dialogues_features.b_attn_keys - - probs, prediction, state = self.sess.run([self._probs, self._prediction, self._state], feed_dict=feed_dict) - - policy_prediction = PolicyPrediction(probs, prediction, state[0], state[1]) - - return policy_prediction - - def train_on_batch(self, - batch_dialogues_features: BatchDialoguesFeatures, - batch_dialogues_targets: BatchDialoguesTargets) -> dict: - - feed_dict = { - self._dropout_keep_prob: 1., - self._utterance_mask: batch_dialogues_features.b_padded_dialogue_length_mask, - self._features: batch_dialogues_features.b_featuress, - self._action: batch_dialogues_targets.b_action_ids, - self._action_mask: batch_dialogues_features.b_action_masks - } - - if self.attention_params: - feed_dict[self._emb_context] = batch_dialogues_features.b_tokens_embeddings_paddeds - feed_dict[self._key] = batch_dialogues_features.b_attn_keys - - _, loss_value, prediction = self.sess.run([self._train_op, self._loss, self._prediction], feed_dict=feed_dict) - - return {'loss': loss_value, - 'learning_rate': self.get_learning_rate(), - 'momentum': self.get_momentum()} - - def load(self, *args, **kwargs) -> None: - # todo move load_nn_params here? - self._load_nn_params() - super().load(*args, **kwargs) - - def _load_nn_params(self) -> None: - if self.debug: - log.debug(f"BEFORE {self.__class__.__name__} _load_nn_params()") - - path = str(self.load_path.with_suffix('.json').resolve()) - - if self.debug: - log.debug(f"INSIDE {self.__class__.__name__} _load_nn_params(): path={path}") - # log.info(f"[loading parameters from {path}]") - with open(path, 'r', encoding='utf8') as fp: - params = json.load(fp) - if self.debug: - log.debug(f"INSIDE {self.__class__.__name__} _load_nn_params(): " - f"params={params}, GRAPH_PARAMS={self.GRAPH_PARAMS}") - - for p in self.GRAPH_PARAMS: - if self.__getattribute__(p) != params.get(p) and p not in {'attn', - 'attention_mechanism', 'attention_params'}: - # todo backward-compatible attention serialization - raise ConfigError(f"`{p}` parameter must be equal to saved" - f" model parameter value `{params.get(p)}`," - f" but is equal to `{self.__getattribute__(p)}`") - - if self.debug: - log.debug(f"AFTER {self.__class__.__name__} _load_nn_params()") - - def save(self, *args, **kwargs) -> None: - super().save(*args, **kwargs) - # todo move save_nn_params here? 
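Alongside the TensorFlow checkpoint the policy network keeps its serializable hyperparameters in a `.json` file and refuses to load a checkpoint whose saved values contradict the current configuration. A minimal sketch of that save-and-validate round trip, with an illustrative path and parameter names:

```python
import json
from pathlib import Path

GRAPH_PARAMS = ["hidden_size", "action_size", "dense_size"]


def save_nn_params(path: Path, params: dict) -> None:
    path.with_suffix(".json").write_text(json.dumps(params), encoding="utf8")


def load_and_check_nn_params(path: Path, current: dict) -> dict:
    saved = json.loads(path.with_suffix(".json").read_text(encoding="utf8"))
    for p in GRAPH_PARAMS:
        if current.get(p) != saved.get(p):
            raise ValueError(f"`{p}` mismatch: saved {saved.get(p)}, configured {current.get(p)}")
    return saved


model_path = Path("/tmp/policy_network")  # illustrative path
save_nn_params(model_path, {"hidden_size": 128, "action_size": 16, "dense_size": 64})
print(load_and_check_nn_params(model_path, {"hidden_size": 128, "action_size": 16, "dense_size": 64}))
```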
- self._save_nn_params() - - def _save_nn_params(self) -> None: - if self.debug: - log.debug(f"BEFORE {self.__class__.__name__} _save_nn_params()") - - path = str(self.save_path.with_suffix('.json').resolve()) - if self.debug: - log.debug(f"INSIDE {self.__class__.__name__} _save_nn_params(): path={path}") - nn_params = {opt: self.__getattribute__(opt) for opt in self.SERIALIZABLE_FIELDS} - if self.debug: - log.debug(f"INSIDE {self.__class__.__name__} _save_nn_params(): nn_params={nn_params}") - # log.info(f"[saving parameters to {path}]") - with open(path, 'w', encoding='utf8') as fp: - json.dump(nn_params, fp) - - if self.debug: - log.debug(f"AFTER {self.__class__.__name__} _save_nn_params()") diff --git a/deeppavlov/models/go_bot/tracker/__init__.py b/deeppavlov/models/go_bot/tracker/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/go_bot/tracker/dialogue_state_tracker.py b/deeppavlov/models/go_bot/tracker/dialogue_state_tracker.py deleted file mode 100644 index 9a0cb32c49..0000000000 --- a/deeppavlov/models/go_bot/tracker/dialogue_state_tracker.py +++ /dev/null @@ -1,279 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger -from pathlib import Path -from typing import List, Union, Optional, Dict, Tuple, Any - -import numpy as np - -from deeppavlov.core.models.component import Component -from deeppavlov.models.go_bot.nlg.nlg_manager import NLGManagerInterface -from deeppavlov.models.go_bot.policy.dto.policy_network_params import PolicyNetworkParams -from deeppavlov.models.go_bot.tracker.dto.dst_knowledge import DSTKnowledge -from deeppavlov.models.go_bot.tracker.featurized_tracker import FeaturizedTracker - -log = getLogger(__name__) - - -class DialogueStateTracker(FeaturizedTracker): - def get_current_knowledge(self) -> DSTKnowledge: - state_features = self.get_features() - context_features = self.calc_context_features() - knowledge = DSTKnowledge(self.prev_action, - state_features, context_features, - self.api_call_id, - self.n_actions, - self.calc_action_mask()) - return knowledge - - def __init__(self, - slot_names, - n_actions: int, - api_call_id: int, - hidden_size: int, - database: Component = None, - domain_yml_path: Optional[Union[str, Path]]=None, - stories_yml_path: Optional[Union[str, Path]]=None, - **kwargs) -> None: - super().__init__(slot_names, domain_yml_path, stories_yml_path, **kwargs) - self.hidden_size = hidden_size - self.database = database - self.n_actions = n_actions - self.api_call_id = api_call_id - self.ffill_act_ids2req_slots_ids: Dict[int, List[int]] = dict() - self.ffill_act_ids2aqd_slots_ids: Dict[int, List[int]] = dict() - self.reset_state() - - @staticmethod - def from_gobot_params(parent_tracker: FeaturizedTracker, - nlg_manager: NLGManagerInterface, - policy_network_params: PolicyNetworkParams, - database: Component): - slot_names = parent_tracker.slot_names - - # region set formfilling info - act2act_id = 
{a_text: nlg_manager.get_action_id(a_text) for a_text in nlg_manager.known_actions()} - action_id2aqd_slots_ids, action_id2req_slots_ids = DialogueStateTracker.extract_reqiured_acquired_slots_ids_mapping( - act2act_id, slot_names, nlg_manager, parent_tracker) - - # todo why so ugly and duplicated in multiple users tracker - dialogue_state_tracker = DialogueStateTracker(slot_names, nlg_manager.num_of_known_actions(), - nlg_manager.get_api_call_action_id(), - policy_network_params.hidden_size, - database, - parent_tracker.domain_yml_path, - parent_tracker.stories_path) - - dialogue_state_tracker.ffill_act_ids2req_slots_ids = action_id2req_slots_ids - dialogue_state_tracker.ffill_act_ids2aqd_slots_ids = action_id2aqd_slots_ids - - # endregion set formfilling info - return dialogue_state_tracker - - @staticmethod - def extract_reqiured_acquired_slots_ids_mapping(act2act_id: Dict, - slot_names: List, - nlg_manager: NLGManagerInterface, - parent_tracker: FeaturizedTracker) -> Tuple[Dict[str, np.ndarray], Dict[str, np.ndarray]]: - """ - get the required and acquired slots information for each known action in the -Hot Encoding form - Args: - act2act_id: the mapping of actions onto their ids - slot_names: the names of slots known to the tracker - nlg_manager: the NLG manager used in system - parent_tracker: the tracker to take required and acquired slots information from - - Returns: - the dicts providing np.array masks of required and acquired slots for each known action - """ - action_id2aqd_slots_ids = dict() # aqd stands for acquired - action_id2req_slots_ids = dict() - for act in nlg_manager.known_actions(): - act_id = act2act_id[act] - - action_id2req_slots_ids[act_id] = np.zeros(len(slot_names), dtype=np.float32) - action_id2aqd_slots_ids[act_id] = np.zeros(len(slot_names), dtype=np.float32) - - if isinstance(act, tuple): - acts = act - else: - acts = [act] - - for act in acts: - for slot_name_i, slot_name in enumerate(parent_tracker.action_names2required_slots.get(act, [])): - slot_ix_in_tracker = slot_names.index(slot_name) - action_id2req_slots_ids[act_id][slot_ix_in_tracker] = 1. - for slot_name_i, slot_name in enumerate(parent_tracker.action_names2acquired_slots.get(act, [])): - slot_ix_in_tracker = slot_names.index(slot_name) - action_id2aqd_slots_ids[act_id][slot_ix_in_tracker] = 1. - return action_id2aqd_slots_ids, action_id2req_slots_ids - - def reset_state(self): - super().reset_state() - self.db_result = None - self.current_db_result = None - self.prev_action = np.zeros(self.n_actions, dtype=np.float32) - self._reset_network_state() - - def _reset_network_state(self): - self.network_state = ( - np.zeros([1, self.hidden_size], dtype=np.float32), - np.zeros([1, self.hidden_size], dtype=np.float32) - ) - - def update_previous_action(self, prev_act_id: int) -> None: - self.prev_action *= 0. - self.prev_action[prev_act_id] = 1. 
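The mapping extraction above turns, for each known action, the names of its required and acquired slots into fixed-length one-hot masks over the tracker's slot list. A stripped-down sketch of the same encoding, with invented actions and slot names:

```python
import numpy as np

slot_names = ["cuisine", "area", "pricerange"]

# hypothetical formfilling info: which slots an action needs / acquires
action2required = {"api_call": ["cuisine", "area"]}
action2acquired = {"utter_ask_area": ["area"]}


def slots_to_mask(names, slot_names):
    mask = np.zeros(len(slot_names), dtype=np.float32)
    for name in names:
        mask[slot_names.index(name)] = 1.0
    return mask


required_masks = {act: slots_to_mask(slots, slot_names) for act, slots in action2required.items()}
acquired_masks = {act: slots_to_mask(slots, slot_names) for act, slots in action2acquired.items()}

print(required_masks["api_call"])        # [1. 1. 0.]
print(acquired_masks["utter_ask_area"])  # [0. 1. 0.]
```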
- - # todo oserikov это стоит переписать - def update_ground_truth_db_result_from_context(self, context: Dict[str, Any]): - self.current_db_result = context.get('db_result', None) - self._update_db_result() - - def make_api_call(self) -> None: - slots = self.get_state() - db_results = [] - if self.database is not None: - - # filter slot keys with value equal to 'dontcare' as - # there is no such value in database records - # and remove unknown slot keys (for example, 'this' in dstc2 tracker) - db_slots = { - s: v for s, v in slots.items() if v != 'dontcare' and s in self.database.keys - } - - db_results = self.database([db_slots])[0] - - # filter api results if there are more than one - # TODO: add sufficient criteria for database results ranking - if len(db_results) > 1: - db_results = [r for r in db_results if r != self.db_result] - else: - log.warning("No database specified.") - - log.info(f"Made api_call with {slots}, got {len(db_results)} results.") - self.current_db_result = {} if not db_results else db_results[0] - self._update_db_result() - - def calc_action_mask(self) -> np.ndarray: - mask = np.ones(self.n_actions, dtype=np.float32) - - if np.any(self.prev_action): - prev_act_id = np.argmax(self.prev_action) - if prev_act_id == self.api_call_id: - mask[prev_act_id] = 0. - - for act_id in range(self.n_actions): - required_slots_mask = self.ffill_act_ids2req_slots_ids[act_id] - acquired_slots_mask = self.ffill_act_ids2aqd_slots_ids[act_id] - act_req_slots_fulfilled = np.equal((required_slots_mask * self._binary_features()), required_slots_mask) - act_requirements_not_fulfilled = np.invert(act_req_slots_fulfilled)# if act_req_slots_fulfilled != [] else np.array([]) - ack_slot_is_already_known = np.equal((acquired_slots_mask * self._binary_features()), acquired_slots_mask) - - if any(act_requirements_not_fulfilled) or (all(ack_slot_is_already_known) and any(acquired_slots_mask)): - mask[act_id] = 0. - - return mask - - def calc_context_features(self): - # todo некрасиво - current_db_result = self.current_db_result - db_result = self.db_result - dst_state = self.get_state() - - result_matches_state = 0. - if current_db_result is not None: - matching_items = dst_state.items() - result_matches_state = all(v == db_result.get(s) - for s, v in matching_items - if v != 'dontcare') * 1. - context_features = np.array([ - bool(current_db_result) * 1., - (current_db_result == {}) * 1., - (db_result is None) * 1., - bool(db_result) * 1., - (db_result == {}) * 1., - result_matches_state - ], dtype=np.float32) - return context_features - - def _update_db_result(self): - if self.current_db_result is not None: - self.db_result = self.current_db_result - - def fill_current_state_with_db_results(self) -> dict: - slots = self.get_state() - if self.db_result: - for k, v in self.db_result.items(): - slots[k] = str(v) - return slots - - -class MultipleUserStateTrackersPool(object): - def __init__(self, base_tracker: DialogueStateTracker): - self._ids_to_trackers = {} - self.base_tracker = base_tracker - - def check_new_user(self, user_id: int) -> bool: - return user_id in self._ids_to_trackers - - def get_user_tracker(self, user_id: int) -> DialogueStateTracker: - if not self.check_new_user(user_id): - raise RuntimeError(f"The user with {user_id} ID is not being tracked") - - tracker = self._ids_to_trackers[user_id] - - # TODO: understand why setting current_db_result to None is necessary - tracker.current_db_result = None - return tracker - - def new_tracker(self): - # todo deprecated and never used? 
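`calc_context_features` above encodes the database lookup state into six features: whether the latest lookup returned anything, whether it was empty, whether any result is stored at all, and whether the stored result still matches the user's constraints. A standalone sketch of that vector, with invented slot values:

```python
import numpy as np

dst_state = {"cuisine": "italian", "area": "centre"}
current_db_result = {"name": "Roma", "cuisine": "italian", "area": "centre"}
db_result = current_db_result   # last non-empty lookup that was kept

result_matches_state = 0.0
if current_db_result is not None:
    result_matches_state = float(all(v == db_result.get(s)
                                     for s, v in dst_state.items()
                                     if v != "dontcare"))

context_features = np.array([
    bool(current_db_result),   # lookup returned something
    current_db_result == {},   # lookup returned nothing
    db_result is None,         # no result has ever been stored
    bool(db_result),           # a non-empty result is stored
    db_result == {},           # an empty result is stored
    result_matches_state,      # stored result still satisfies the user's constraints
], dtype=np.float32)

print(context_features)  # [1. 0. 0. 1. 0. 1.]
```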
- tracker = DialogueStateTracker(self.base_tracker.slot_names, self.base_tracker.n_actions, - self.base_tracker.api_call_id, self.base_tracker.hidden_size, - self.base_tracker.database) - return tracker - - def get_or_init_tracker(self, user_id: int): - if not self.check_new_user(user_id): - self.init_new_tracker(user_id, self.base_tracker) - - return self.get_user_tracker(user_id) - - def init_new_tracker(self, user_id: int, tracker_entity: DialogueStateTracker) -> None: - # TODO: implement a better way to init a tracker - # todo deprecated. The whole class should follow AbstractFactory or Pool pattern? - tracker = DialogueStateTracker( - tracker_entity.slot_names, - tracker_entity.n_actions, - tracker_entity.api_call_id, - tracker_entity.hidden_size, - tracker_entity.database, - tracker_entity.domain_yml_path, - tracker_entity.stories_path - ) - tracker.ffill_act_ids2req_slots_ids = tracker_entity.ffill_act_ids2req_slots_ids - tracker.ffill_act_ids2aqd_slots_ids = tracker_entity.ffill_act_ids2aqd_slots_ids - - self._ids_to_trackers[user_id] = tracker - - def reset(self, user_id: int = None) -> None: - if user_id is not None and not self.check_new_user(user_id): - raise RuntimeError(f"The user with {user_id} ID is not being tracked") - - if user_id is not None: - self._ids_to_trackers[user_id].reset_state() - else: - self._ids_to_trackers.clear() diff --git a/deeppavlov/models/go_bot/tracker/dto/__init__.py b/deeppavlov/models/go_bot/tracker/dto/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/go_bot/tracker/dto/dst_knowledge.py b/deeppavlov/models/go_bot/tracker/dto/dst_knowledge.py deleted file mode 100644 index 6fe0837a77..0000000000 --- a/deeppavlov/models/go_bot/tracker/dto/dst_knowledge.py +++ /dev/null @@ -1,12 +0,0 @@ -from deeppavlov.models.go_bot.tracker.dto.tracker_knowledge_interface import TrackerKnowledgeInterface - - -# todo naming -class DSTKnowledge(TrackerKnowledgeInterface): - def __init__(self, tracker_prev_action, state_features, context_features, api_call_id, n_actions, action_mask): - self.tracker_prev_action = tracker_prev_action - self.state_features = state_features - self.context_features = context_features - self.api_call_id = api_call_id - self.n_actions = n_actions - self.action_mask = action_mask diff --git a/deeppavlov/models/go_bot/tracker/dto/tracker_knowledge_interface.py b/deeppavlov/models/go_bot/tracker/dto/tracker_knowledge_interface.py deleted file mode 100644 index cd20f358fd..0000000000 --- a/deeppavlov/models/go_bot/tracker/dto/tracker_knowledge_interface.py +++ /dev/null @@ -1,5 +0,0 @@ -from abc import ABCMeta - - -class TrackerKnowledgeInterface(metaclass=ABCMeta): - pass diff --git a/deeppavlov/models/go_bot/tracker/featurized_tracker.py b/deeppavlov/models/go_bot/tracker/featurized_tracker.py deleted file mode 100644 index ec1314036b..0000000000 --- a/deeppavlov/models/go_bot/tracker/featurized_tracker.py +++ /dev/null @@ -1,270 +0,0 @@ -import json -from pathlib import Path -from typing import List, Iterator, Union, Optional, Dict, Tuple - -import numpy as np - -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.core.common.file import read_yaml -from deeppavlov.core.common.registry import register -from deeppavlov.dataset_readers.md_yaml_dialogs_reader import DomainKnowledge, MD_YAML_DialogsDatasetReader -from deeppavlov.models.go_bot.nlu.dto.nlu_response import NLUResponse -from deeppavlov.models.go_bot.tracker.dto.tracker_knowledge_interface import 
TrackerKnowledgeInterface -from deeppavlov.models.go_bot.tracker.tracker_interface import TrackerInterface - - -@register('featurized_tracker') -class FeaturizedTracker(TrackerInterface): - """ - Tracker that overwrites slots with new values. - Features are binary features (slot is present/absent) plus difference features - (slot value is (the same)/(not the same) as before last update) and count - features (sum of present slots and sum of changed during last update slots). - - Parameters: - slot_names: list of slots that should be tracked. - actions_required_acquired_slots_path: (optional) path to json-file with mapping - of actions to slots that should be filled to allow for action to be executed - """ - - def get_current_knowledge(self) -> TrackerKnowledgeInterface: - raise NotImplementedError("Featurized tracker lacks get_current_knowledge() method. " - "To be improved in future versions.") - - def __init__(self, - slot_names: List[str], - # actions_required_acquired_slots_path: Optional[Union[str, Path]]=None, - domain_yml_path: Optional[Union[str, Path]]=None, - stories_yml_path: Optional[Union[str, Path]]=None, - **kwargs) -> None: - self.slot_names = list(slot_names) - self.domain_yml_path = domain_yml_path - self.stories_path = stories_yml_path - self.action_names2required_slots, self.action_names2acquired_slots =\ - self._load_actions2slots_formfilling_info_from(domain_yml_path, stories_yml_path) - self.history = [] - self.current_features = None - - @property - def state_size(self) -> int: - return len(self.slot_names) - - @property - def num_features(self) -> int: - return self.state_size * 3 + 3 - - def update_state(self, nlu_response: NLUResponse): - slots = nlu_response.slots - - if isinstance(slots, list): - self.history.extend(self._filter(slots)) - - elif isinstance(slots, dict): - for slot, value in self._filter(slots.items()): - self.history.append((slot, value)) - - prev_state = self.get_state() - bin_feats = self._binary_features() - diff_feats = self._diff_features(prev_state) - new_feats = self._new_features(prev_state) - - self.current_features = np.hstack(( - bin_feats, - diff_feats, - new_feats, - np.sum(bin_feats), - np.sum(diff_feats), - np.sum(new_feats)) - ) - - def get_state(self): - # lasts = {} - # for slot, value in self.history: - # lasts[slot] = value - # return lasts - return dict(self.history) - - def reset_state(self): - self.history = [] - self.current_features = np.zeros(self.num_features, dtype=np.float32) - - def get_features(self): - return self.current_features - - def _filter(self, slots) -> Iterator: - return filter(lambda s: s[0] in self.slot_names, slots) - - def _binary_features(self) -> np.ndarray: - feats = np.zeros(self.state_size, dtype=np.float32) - lasts = self.get_state() - for i, slot in enumerate(self.slot_names): - if slot in lasts: - feats[i] = 1. - return feats - - def _diff_features(self, state) -> np.ndarray: - feats = np.zeros(self.state_size, dtype=np.float32) - curr_state = self.get_state() - - for i, slot in enumerate(self.slot_names): - if slot in curr_state and slot in state and curr_state[slot] != state[slot]: - feats[i] = 1. - - return feats - - def _new_features(self, state) -> np.ndarray: - feats = np.zeros(self.state_size, dtype=np.float32) - curr_state = self.get_state() - - for i, slot in enumerate(self.slot_names): - if slot in curr_state and slot not in state: - feats[i] = 1. 
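The tracker's feature vector concatenates, per slot, a presence bit, a changed-value bit and a newly-filled bit, plus the three corresponding sums, which is why `num_features == 3 * state_size + 3`. A compact numeric sketch with invented slots and updates:

```python
import numpy as np

slot_names = ["cuisine", "area", "pricerange"]
prev_state = {"cuisine": "italian"}
curr_state = {"cuisine": "chinese", "area": "centre"}

bin_feats = np.array([float(s in curr_state) for s in slot_names], dtype=np.float32)
diff_feats = np.array([float(s in curr_state and s in prev_state
                             and curr_state[s] != prev_state[s]) for s in slot_names],
                      dtype=np.float32)
new_feats = np.array([float(s in curr_state and s not in prev_state) for s in slot_names],
                     dtype=np.float32)

features = np.hstack((bin_feats, diff_feats, new_feats,
                      bin_feats.sum(), diff_feats.sum(), new_feats.sum()))

print(features)        # [1. 1. 0.  1. 0. 0.  0. 1. 0.  2. 1. 1.]
print(features.shape)  # (12,) == 3 * len(slot_names) + 3
```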
- - return feats - - def _load_actions2slots_formfilling_info_from_json(self, - actions_required_acquired_slots_path: Optional[Union[str, Path]] = None)\ - -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]: - """ - loads the formfilling mapping of actions onto the required slots from the json of the following structure: - {action1: {"required": [required_slot_name_1], "acquired": [acquired_slot_name_1, acquired_slot_name_2]}, - action2: {"required": [required_slot_name_21, required_slot_name_22], "acquired": [acquired_slot_name_21]}, - ..} - Returns: - the dictionary represented by the passed json - """ - actions_required_acquired_slots_path = expand_path(actions_required_acquired_slots_path) - with open(actions_required_acquired_slots_path, encoding="utf-8") as actions2slots_json_f: - actions2slots = json.load(actions2slots_json_f) - actions2required_slots = {act: act_slots["required"] for act, act_slots in actions2slots.items()} - actions2acquired_slots = {act: act_slots["acquired"] for act, act_slots in actions2slots.items()} - return actions2required_slots, actions2acquired_slots - - def _load_actions2slots_formfilling_info_from(self, - domain_yml_path: Optional[Union[str, Path]], - stories_yml_path: Optional[Union[str, Path]])\ - -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]: - """ - loads the formfilling mapping of actions onto the required slots from the domain.yml form description: - - restaurant_form: - cuisine: - - type: from_entity - entity: cuisine - num_people: - - type: from_entity - entity: number - - Returns: - the dictionary represented by the passed json - """ - if domain_yml_path is None or stories_yml_path is None: - return {}, {} - - domain_yml_path = expand_path(domain_yml_path) - domain_knowledge: DomainKnowledge = DomainKnowledge.from_yaml(domain_yml_path) - potential_api_or_db_actions = domain_knowledge.known_actions - forms = domain_knowledge.forms - form_names = list(forms.keys()) - - # todo migrate to rasa2.0 - def read_md_story(story_path: Union[Path, str]) -> Dict[str, List[Dict]]: - """ - given the path to stories.md naively read steps from it. 
ToDo use MDYAML reader - Args: - story_path: the path to stories.md - - Returns: - the dict containing info on all the stories used - """ - story_f = open(story_path, 'r') - stories_li = [] - curr_story = None - for line in story_f: - line = line.strip() - if not line: continue; - if line.startswith("#"): - if curr_story is not None: - stories_li.append(curr_story) - story_name = line.strip('#').strip() - curr_story = {"story": story_name, "steps": []} - elif line.startswith("*"): - # user turn - step = {"intent": line.strip('*').strip()} - curr_story["steps"].append(step) - elif line.startswith('-'): - # system turn - step = {"action": line.strip('-').strip()} - curr_story["steps"].append(step) - if curr_story is not None: - stories_li.append(curr_story) - story_f.close() - stories_di = {"stories": stories_li} - return stories_di - - stories_md_path = expand_path(stories_yml_path) - stories_yml_di = read_md_story(stories_md_path) - prev_forms = [] - action2forms = {} - for story in stories_yml_di["stories"]: - story_name = story["story"] - story_steps = story["steps"] - for step in story_steps: - if "action" not in step.keys(): - continue - - curr_action = step["action"] - if curr_action.startswith("form"): - curr_action = json.loads(curr_action[len("form"):])["name"] - print(curr_action) - if curr_action in form_names: - prev_forms.append(curr_action) - if curr_action in potential_api_or_db_actions: - action2forms[curr_action] = prev_forms - prev_forms = [] - - def get_slots(system_utter: str, form_name: str) -> List[str]: - """ - Given the utterance story line, extract slots information from it - Args: - system_utter: the utterance story line - form_name: the form we are filling - - Returns: - the slots extracted from the line - """ - slots = [] - if system_utter.startswith(f"utter_ask_{form_name}_"): - slots.append(system_utter[len(f"utter_ask_{form_name}_"):]) - elif system_utter.startswith(f"utter_ask_"): - slots.append(system_utter[len(f"utter_ask_"):]) - else: - # todo: raise an exception - pass - return slots - - actions2acquired_slots = {utter.strip('-').strip(): get_slots(utter.strip('-').strip(), form_name) - for form_name, form in forms.items() - for utter in - MD_YAML_DialogsDatasetReader.augment_form(form_name, domain_knowledge, {}) - if utter.strip().startswith("-")} - forms2acquired_slots = {form_name: self._get_form_acquired_slots(form) for form_name, form in forms.items()} - actions2required_slots = {act: {slot - for form in forms - for slot in forms2acquired_slots[form]} - for act, forms in action2forms.items()} - return actions2required_slots, actions2acquired_slots - - def _get_form_acquired_slots(self, form: Dict) -> List[str]: - """ - given the form, return the slots that are acquired with this form - Args: - form: form to extract acquired slots from - - Returns: - the slots acquired from the passed form - """ - acquired_slots = [slot_name - for slot_name, slot_info_li in form.items() - if slot_info_li and slot_info_li[0].get("type", '') == "from_entity"] - return acquired_slots diff --git a/deeppavlov/models/go_bot/tracker/tracker_interface.py b/deeppavlov/models/go_bot/tracker/tracker_interface.py deleted file mode 100644 index a311d3da2a..0000000000 --- a/deeppavlov/models/go_bot/tracker/tracker_interface.py +++ /dev/null @@ -1,42 +0,0 @@ -from abc import ABCMeta, abstractmethod -from typing import Any, Dict - -import numpy as np - -from deeppavlov.models.go_bot.nlu.dto.nlu_response_interface import NLUResponseInterface -from 
deeppavlov.models.go_bot.tracker.dto.tracker_knowledge_interface import TrackerKnowledgeInterface - - -class TrackerInterface(metaclass=ABCMeta): - """ - An abstract class for trackers: a model that holds a dialogue state and - generates state features. - """ - - @abstractmethod - def update_state(self, nlu_response: NLUResponseInterface) -> None: - """Updates dialogue state with new ``slots``, calculates features.""" - pass - - @abstractmethod - def get_state(self) -> Dict[str, Any]: - """ - Returns: - Dict[str, Any]: dictionary with current slots and their values.""" - pass - - @abstractmethod - def reset_state(self) -> None: - """Resets dialogue state""" - pass - - @abstractmethod - def get_features(self) -> np.ndarray: - """ - Returns: - np.ndarray[float]: numpy array with calculates state features.""" - pass - - @abstractmethod - def get_current_knowledge(self) -> TrackerKnowledgeInterface: - pass diff --git a/deeppavlov/models/go_bot/wrapper.py b/deeppavlov/models/go_bot/wrapper.py deleted file mode 100644 index 2a61f6ba5a..0000000000 --- a/deeppavlov/models/go_bot/wrapper.py +++ /dev/null @@ -1,49 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from typing import Iterable - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component - - -@register('dialog_component_wrapper') -class DialogComponentWrapper(Component): - - def __init__(self, component: Component, **kwargs): - self.component = component - - @staticmethod - def _get_text(utter): - return utter['text'] - - def __call__(self, batch): - out = [] - if isinstance(batch[0], Iterable) and not isinstance(batch[0], str): - for dialog in batch: - res = self.component([self._get_text(utter) for utter in dialog]) - out.append(res) - else: - out = self.component(batch) - return out - - def fit(self, data): - self.component.fit([self._get_text(utter) - for dialog in data for utter in dialog]) - - def save(self, *args, **kwargs): - self.component.save(*args, **kwargs) - - def load(self, *args, **kwargs): - self.component.load(*args, **kwargs) diff --git a/deeppavlov/models/intent_catcher/__init__.py b/deeppavlov/models/intent_catcher/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/intent_catcher/intent_catcher.py b/deeppavlov/models/intent_catcher/intent_catcher.py deleted file mode 100644 index 87d6d4162a..0000000000 --- a/deeppavlov/models/intent_catcher/intent_catcher.py +++ /dev/null @@ -1,260 +0,0 @@ -# Copyright 2020 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
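`DialogComponentWrapper` above lets a plain text-in/text-out component be applied to batches of whole dialogues, where each utterance is a dict with a `text` key. A toy illustration of that dispatch; the wrapped component here is a made-up upper-caser:

```python
from collections.abc import Iterable


def upper_component(batch):  # stand-in for any text-level component
    return [t.upper() for t in batch]


def wrapped(batch):
    # mirrors DialogComponentWrapper.__call__: dialogues are lists of {'text': ...} dicts
    if isinstance(batch[0], Iterable) and not isinstance(batch[0], str):
        return [upper_component([utt["text"] for utt in dialog]) for dialog in batch]
    return upper_component(batch)


dialogs = [[{"text": "hi"}, {"text": "book a table"}]]
print(wrapped(dialogs))            # [['HI', 'BOOK A TABLE']]
print(wrapped(["plain strings"]))  # ['PLAIN STRINGS']
```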
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import json -import os -import re -from logging import getLogger -from pathlib import Path -from typing import Union, List - -import numpy as np -import tensorflow as tf -import tensorflow_hub as tfhub -from overrides import overrides -from xeger import Xeger - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.nn_model import NNModel - -log = getLogger(__name__) - - -@register("intent_catcher") -class IntentCatcher(NNModel): - """Class for IntentCatcher Chainer's pipeline components.""" - - def __init__(self, save_path: Union[str, Path], load_path: Union[str, Path], - embeddings : str = 'use', limit : int = 10, multilabel : bool = False, - number_of_layers : int = 0, number_of_intents : int = 1, - hidden_dim : int = 256, mode : str = 'train', **kwargs) -> None: - """Initializes IntentCatcher model. - - This model is mainly used for user intent detection in conversational systems. - It provides some BERT-based embeddings for start and then fits a number - of dense layers upon them for labels prediction. - The main feature is that the user can provide regular expressions - instead of actual phrases, and the model will derive phrases from it, - thus making construction of the dataset easy and fast. - The number of phrases generated from regexp is control by `limit` parameter. - - Args: - save_path: Path to a directory with pretrained classifier and regexps for IntentCatcher. - load_path: Path to a directory with pretrained classifier and regexps for IntentCatcher. - embeddings: Input embeddings type. Provided embeddings are: USE and USE Large. - limit: Maximum number of phrases, that are generated from input regexps. - multilabel: Whether the task should be multilabel prediction or multiclass. - number_of_layers: Number of hidden dense layers, that come after embeddings. - number_of_intents: Number of output labels. - hidden_dim: Dimension of hidden dense layers, that come after embeddings. - mode: Train or infer mode. If infer - tries to load data from load_path. - **kwargs: Additional parameters whose names will be logged but otherwise ignored. - - """ - super(IntentCatcher, self).__init__(save_path=save_path, load_path=load_path, **kwargs) - if kwargs: - log.info(f'{self.__class__.__name__} got additional init parameters {list(kwargs)} that will be ignored') - urls = { - 'use':"https://tfhub.dev/google/universal-sentence-encoder/2", - 'use_large':"https://tfhub.dev/google/universal-sentence-encoder-large/2" - } - if embeddings not in urls: - raise Exception(f"Provided embeddings type `{embeddings}` is not available. 
Available embeddings are: use, use_large.") - self.limit = limit - embedder = tfhub.Module(urls[embeddings]) - self.sentences = tf.placeholder(dtype=tf.string) - self.embedded = embedder(self.sentences) - mode = mode.lower().strip() - if mode == 'infer': - self.load() - elif mode == 'train': - log.info("Initializing NN") - self.regexps = set() - self.classifier = self._config_nn(number_of_layers, multilabel, hidden_dim, number_of_intents) - else: - raise Exception(f"Provided mode `{mode}` is not supported!") - log.info("Configuring session") - self.session = self._config_session() - - @staticmethod - def _config_session(): - """ - Configure session for particular device - - Returns: - tensorflow.Session - """ - config = tf.ConfigProto() - config.gpu_options.allow_growth = True - # config.gpu_options.visible_device_list = '0' - session = tf.Session(config=config) - session.run(tf.global_variables_initializer()) - session.run(tf.tables_initializer()) - return session - - def _config_nn(self, number_of_layers, multilabel, hidden_dim, number_of_intents) -> tf.keras.Model: - """ - Initialize Neural Network upon embeddings. - - Returns: - tf.keras.Model - """ - if number_of_layers == 0: - layers = [ - tf.keras.layers.Dense( - units=number_of_intents, - activation='softmax' if not multilabel else 'sigmoid' - ) - ] - elif number_of_layers > 0: - layers = [ - tf.keras.layers.Dense( - units=hidden_dim, - activation='relu' - ) - ] - for i in range(number_of_layers-2): - layers.append( - tf.keras.layers.Dense( - units=hidden_dim, - activation='relu' - ) - ) - layers.append( - tf.keras.layers.Dense( - units=number_of_intents, - activation='softmax' if not multilabel else 'sigmoid' - ) - ) - elif number_of_layers < 0: - raise Exception("Number of layers should be >= 0") - classifier = tf.keras.Sequential(layers=layers) - classifier.compile( - optimizer='adam', - loss='sparse_categorical_crossentropy' if not multilabel else 'binary_crossentropy' - ) - return classifier - - def train_on_batch(self, x: list, y: list) -> List[float]: - """ - Train classifier on batch of data. - - Args: - x: List of input sentences - y: List of input encoded labels - - Returns: - List[float]: list of losses. - """ - assert len(x) == len(y), "Number of labels is not equal to the number of sentences" - try: - regexps = {(re.compile(s), l) for s, l in zip(x, y)} - except Exception as e: - log.error(f"Some sentences are not a consitent regular expressions") - raise e - xeger = Xeger(self.limit) - self.regexps = self.regexps.union(regexps) - generated_x = [] - generated_y = [] - for s, l in zip(x, y): # generate samples and add regexp - gx = {xeger.xeger(s) for _ in range(self.limit)} - generated_x.extend(gx) - generated_y.extend([l for i in range(len(gx))]) - log.info(f"Original number of samples: {len(y)}, generated samples: {len(generated_y)}") - embedded_x = self.session.run(self.embedded, feed_dict={self.sentences:generated_x}) # actual trainig - loss = self.classifier.train_on_batch(embedded_x, generated_y) - return loss - - def process_event(self, event_name, data): - pass - - def __call__(self, x: List[str]) -> List[int]: - """ - Predict probabilities. - - Args: - x: list of input sentences. - Returns: - list of probabilities. - """ - return self._predict_proba(x) - - def _predict_label(self, sentences: List[str]) -> List[int]: - """ - Predict labels. - - Args: - x: list of input sentences. - Returns: - list of labels. 
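Training phrases for IntentCatcher are not written by hand: each labelled regular expression is expanded into up to `limit` concrete sentences with `xeger`, and the dense classifier is fit on those generated sentences. A minimal sketch of that expansion step; the patterns and labels are invented, and the `xeger` package is assumed to be installed:

```python
from xeger import Xeger

limit = 5
xeger = Xeger(limit)

labelled_patterns = [
    (r"(hello|hi)( there)?", 0),   # greeting intent
    (r"book a table for \d", 1),   # booking intent
]

generated_x, generated_y = [], []
for pattern, label in labelled_patterns:
    # a set keeps only distinct generated phrases, as in train_on_batch above
    phrases = {xeger.xeger(pattern) for _ in range(limit)}
    generated_x.extend(phrases)
    generated_y.extend([label] * len(phrases))

print(list(zip(generated_x, generated_y)))
```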
- """ - labels = [None for i in range(len(sentences))] - indx = [] - for i, s in enumerate(sentences): - for reg, l in self.regexps: - if reg.fullmatch(s): - labels[i] = l - if not labels[i]: - indx.append(i) - sentences_to_nn = [sentences[i] for i in indx] - x = self.session.run(self.embedded, feed_dict={self.sentences:sentences_to_nn}) - nn_predictions = self.classifier.predict_classes(x) - for i, l in enumerate(nn_predictions): - labels[indx[i]] = l - return labels - - def _predict_proba(self, x: List[str]) -> List[float]: - """ - Predict probabilities. Used in __call__. - - Args: - x: list of input sentences. - Returns: - list of probabilities - """ - x_embedded = self.session.run(self.embedded, feed_dict={self.sentences:x}) - probs = self.classifier.predict_proba(x_embedded) - _, num_labels = probs.shape - for i, s in enumerate(x): - for reg, l in self.regexps: - if reg.fullmatch(s): - probs[i] = np.zeros(num_labels) - probs[i, l] = 1.0 - return probs - - @overrides - def save(self) -> None: - """ - Save classifier parameters and regexps to self.save_path. - """ - log.info("Saving model {} and regexps to {}".format(self.__class__.__name__, self.save_path)) - save_path = Path(self.save_path) - if not save_path.exists(): - if save_path.parent.exists() and save_path.parent / "model" == save_path: - os.mkdir(save_path.parent / "model") - self.classifier.save(self.save_path / Path('nn.h5')) - regexps = [{"regexp":reg.pattern, "label":str(l)} for reg, l in self.regexps] - with open(self.save_path / Path('regexps.json'), 'w') as fp: - json.dump(regexps, fp) - - @overrides - def load(self) -> None: - """ - Load classifier parameters and regexps from self.load_path. - """ - log.info("Loading model {} and regexps from {}".format(self.__class__.__name__, self.save_path)) - self.classifier = tf.keras.models.load_model(self.load_path / Path("nn.h5")) - with open(self.load_path / Path('regexps.json')) as fp: - self.regexps = json.load(fp) - self.regexps = [(re.compile(d['regexp']), int(d['label'])) for d in self.regexps] diff --git a/deeppavlov/models/kbqa/entity_linking.py b/deeppavlov/models/kbqa/entity_linking.py deleted file mode 100644 index 43e8b18083..0000000000 --- a/deeppavlov/models/kbqa/entity_linking.py +++ /dev/null @@ -1,422 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
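At inference time the stored regexps act as a hard override in `_predict_proba`: if a sentence fully matches one of them, its probability row is replaced with a one-hot vector for that label, regardless of what the neural classifier predicted. A small sketch of that post-processing, with made-up probabilities:

```python
import re

import numpy as np

regexps = [(re.compile(r"(hello|hi)( there)?"), 0)]

sentences = ["hi there", "book a table for 2"]
# pretend these probabilities came out of the dense classifier
probs = np.array([[0.4, 0.6],
                  [0.3, 0.7]], dtype=np.float32)

num_labels = probs.shape[1]
for i, sentence in enumerate(sentences):
    for pattern, label in regexps:
        if pattern.fullmatch(sentence):
            probs[i] = np.zeros(num_labels, dtype=np.float32)
            probs[i, label] = 1.0   # a regexp match wins over the network output

print(probs)  # [[1. 0.], [0.3 0.7]]
```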
- -from logging import getLogger -from typing import List, Dict, Tuple -from collections import defaultdict - -import numpy as np -import pymorphy2 -import faiss -from nltk.corpus import stopwords -from nltk import sent_tokenize -from sklearn.feature_extraction.text import TfidfVectorizer - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component -from deeppavlov.core.common.chainer import Chainer -from deeppavlov.core.models.serializable import Serializable -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.core.common.file import load_pickle, save_pickle -from deeppavlov.models.kbqa.entity_detection_parser import EntityDetectionParser -from deeppavlov.models.kbqa.rel_ranking_bert_infer import RelRankerBertInfer - -log = getLogger(__name__) - - -@register('ner_chunker') -class NerChunker(Component): - """ - Class to split documents into chunks of max_chunk_len symbols so that the length will not exceed - maximal sequence length to feed into BERT - """ - - def __init__(self, max_chunk_len: int = 300, batch_size: int = 30, **kwargs): - """ - - Args: - max_chunk_len: maximal length of chunks into which the document is split - batch_size: how many chunks are in batch - """ - self.max_chunk_len = max_chunk_len - self.batch_size = batch_size - - def __call__(self, docs_batch: List[str]) -> Tuple[List[List[str]], List[List[int]]]: - """ - This method splits each document in the batch into chunks wuth the maximal length of max_chunk_len - - Args: - docs_batch: batch of documents - - Returns: - batch of lists of document chunks for each document - batch of lists of numbers of documents which correspond to chunks - """ - text_batch_list = [] - text_batch = [] - nums_batch_list = [] - nums_batch = [] - count_texts = 0 - text = "" - curr_doc = 0 - for n, doc in enumerate(docs_batch): - sentences = sent_tokenize(doc) - for sentence in sentences: - if len(text) + len(sentence) < self.max_chunk_len and n == curr_doc: - text += f"{sentence} " - else: - if count_texts < self.batch_size: - text_batch.append(text.strip()) - if n == curr_doc: - nums_batch.append(n) - else: - nums_batch.append(n - 1) - count_texts += 1 - else: - text_batch_list.append(text_batch) - text_batch = [] - nums_batch_list.append(nums_batch) - nums_batch = [n] - count_texts = 0 - curr_doc = n - text = f"{sentence} " - - if text: - text_batch.append(text.strip()) - text_batch_list.append(text_batch) - nums_batch.append(len(docs_batch) - 1) - nums_batch_list.append(nums_batch) - - return text_batch_list, nums_batch_list - - -@register('entity_linker') -class EntityLinker(Component, Serializable): - """ - Class for linking of entity substrings in the document to entities in Wikidata - """ - - def __init__(self, load_path: str, - word_to_idlist_filename: str, - entities_list_filename: str, - entities_ranking_filename: str, - vectorizer_filename: str, - faiss_index_filename: str, - chunker: NerChunker = None, - ner: Chainer = None, - ner_parser: EntityDetectionParser = None, - entity_ranker: RelRankerBertInfer = None, - num_faiss_candidate_entities: int = 20, - num_entities_for_bert_ranking: int = 50, - num_faiss_cells: int = 50, - use_gpu: bool = True, - save_path: str = None, - fit_vectorizer: bool = False, - max_tfidf_features: int = 1000, - include_mention: bool = False, - ngram_range: List[int] = None, - num_entities_to_return: int = 10, - lang: str = "ru", - use_descriptions: bool = True, - lemmatize: bool = False, - **kwargs) -> None: - """ - - Args: - 
load_path: path to folder with inverted index files - word_to_idlist_filename: file with dict of words (keys) and start and end indices in - entities_list filename of the corresponding entity ids - entities_list_filename: file with the list of entity ids from the knowledge base - entities_ranking_filename: file with dict of entity ids (keys) and number of relations in Wikidata - for entities - vectorizer_filename: filename with TfidfVectorizer data - faiss_index_filename: file with Faiss index of words - chunker: component deeppavlov.models.kbqa.ner_chunker - ner: config for entity detection - ner_parser: component deeppavlov.models.kbqa.entity_detection_parser - entity_ranker: component deeppavlov.models.kbqa.rel_ranking_bert_infer - num_faiss_candidate_entities: number of nearest neighbors for the entity substring from the text - num_entities_for_bert_ranking: number of candidate entities for BERT ranking using description and context - num_faiss_cells: number of Voronoi cells for Faiss index - use_gpu: whether to use GPU for faster search of candidate entities - save_path: path to folder with inverted index files - fit_vectorizer: whether to build index with Faiss library - max_tfidf_features: maximal number of features for TfidfVectorizer - include_mention: whether to leave entity mention in the context (during BERT ranking) - ngram_range: char ngrams range for TfidfVectorizer - num_entities_to_return: number of candidate entities for the substring which are returned - lang: russian or english - use_description: whether to perform entity ranking by context and description - lemmatize: whether to lemmatize tokens - **kwargs: - """ - super().__init__(save_path=save_path, load_path=load_path) - self.morph = pymorphy2.MorphAnalyzer() - self.lemmatize = lemmatize - self.word_to_idlist_filename = word_to_idlist_filename - self.entities_list_filename = entities_list_filename - self.entities_ranking_filename = entities_ranking_filename - self.vectorizer_filename = vectorizer_filename - self.faiss_index_filename = faiss_index_filename - self.num_entities_for_bert_ranking = num_entities_for_bert_ranking - self.num_faiss_candidate_entities = num_faiss_candidate_entities - self.num_faiss_cells = num_faiss_cells - self.use_gpu = use_gpu - self.chunker = chunker - self.ner = ner - self.ner_parser = ner_parser - self.entity_ranker = entity_ranker - self.fit_vectorizer = fit_vectorizer - self.max_tfidf_features = max_tfidf_features - self.include_mention = include_mention - self.ngram_range = ngram_range - self.num_entities_to_return = num_entities_to_return - self.lang_str = f"@{lang}" - if self.lang_str == "@en": - self.stopwords = set(stopwords.words("english")) - elif self.lang_str == "@ru": - self.stopwords = set(stopwords.words("russian")) - self.use_descriptions = use_descriptions - - self.load() - - if self.fit_vectorizer: - self.vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=tuple(self.ngram_range), - max_features=self.max_tfidf_features, max_df=0.85) - self.vectorizer.fit(self.word_list) - self.matrix = self.vectorizer.transform(self.word_list) - self.dense_matrix = self.matrix.toarray() - if self.num_faiss_cells > 1: - quantizer = faiss.IndexFlatIP(self.max_tfidf_features) - self.faiss_index = faiss.IndexIVFFlat(quantizer, self.max_tfidf_features, self.num_faiss_cells) - self.faiss_index.train(self.dense_matrix.astype(np.float32)) - else: - self.faiss_index = faiss.IndexFlatIP(self.max_tfidf_features) - if self.use_gpu: - res = faiss.StandardGpuResources() - self.faiss_index = 
faiss.index_cpu_to_gpu(res, 0, self.faiss_index) - self.faiss_index.add(self.dense_matrix.astype(np.float32)) - self.save_vectorizers_data() - - def load(self) -> None: - self.word_to_idlist = load_pickle(self.load_path / self.word_to_idlist_filename) - self.entities_list = load_pickle(self.load_path / self.entities_list_filename) - self.word_list = list(self.word_to_idlist.keys()) - self.entities_ranking_dict = load_pickle(self.load_path / self.entities_ranking_filename) - if not self.fit_vectorizer: - self.vectorizer = load_pickle(self.load_path / self.vectorizer_filename) - self.faiss_index = faiss.read_index(str(expand_path(self.faiss_index_filename))) - if self.use_gpu: - res = faiss.StandardGpuResources() - self.faiss_index = faiss.index_cpu_to_gpu(res, 0, self.faiss_index) - - def save(self) -> None: - pass - - def save_vectorizers_data(self) -> None: - save_pickle(self.vectorizer, self.save_path / self.vectorizer_filename) - faiss.write_index(self.faiss_index, str(expand_path(self.faiss_index_filename))) - - def __call__(self, docs_batch: List[str]): - """ - - Args: - docs_batch: batch of documents - Returns: - batch of lists of candidate entity ids - """ - text_batch_list, nums_batch_list = self.chunker(docs_batch) - entity_ids_batch_list = [] - entity_substr_batch_list = [] - entity_positions_batch_list = [] - text_len_batch_list = [] - for text_batch in text_batch_list: - entity_ids_batch = [] - ner_tokens_batch, ner_probas_batch = self.ner(text_batch) - entity_substr_batch, _, entity_positions_batch = self.ner_parser(ner_tokens_batch, ner_probas_batch) - log.debug(f"entity_substr_batch {entity_substr_batch}") - log.debug(f"entity_positions_batch {entity_positions_batch}") - entity_substr_batch = [[entity_substr.lower() for tag, entity_substr_list in entity_substr_dict.items() - for entity_substr in entity_substr_list] - for entity_substr_dict in entity_substr_batch] - entity_positions_batch = [[entity_positions for tag, entity_positions_list in entity_positions_dict.items() - for entity_positions in entity_positions_list] - for entity_positions_dict in entity_positions_batch] - log.debug(f"entity_substr_batch {entity_substr_batch}") - log.debug(f"entity_positions_batch {entity_positions_batch}") - for entity_substr_list, entity_positions_list, context_tokens in \ - zip(entity_substr_batch, entity_positions_batch, ner_tokens_batch): - entity_ids_list = [] - if entity_substr_list: - entity_ids_list = self.link_entities(entity_substr_list, entity_positions_list, context_tokens) - entity_ids_batch.append(entity_ids_list) - entity_ids_batch_list.append(entity_ids_batch) - entity_substr_batch_list.append(entity_substr_batch) - entity_positions_batch_list.append(entity_positions_batch) - text_len_batch_list.append([len(text) for text in ner_tokens_batch]) - - doc_entity_ids_batch = [] - doc_entity_substr_batch = [] - doc_entity_positions_batch = [] - doc_entity_ids = [] - doc_entity_substr = [] - doc_entity_positions = [] - cur_doc_num = 0 - text_len_sum = 0 - for entity_ids_batch, entity_substr_batch, entity_positions_batch, text_len_batch, nums_batch in \ - zip(entity_ids_batch_list, entity_substr_batch_list, entity_positions_batch_list, - text_len_batch_list, nums_batch_list): - for entity_ids, entity_substr, entity_positions, text_len, doc_num in \ - zip(entity_ids_batch, entity_substr_batch, entity_positions_batch, text_len_batch, nums_batch): - if doc_num == cur_doc_num: - doc_entity_ids += entity_ids - doc_entity_substr += entity_substr - doc_entity_positions += [[pos + 
text_len_sum for pos in entity_position] - for entity_position in entity_positions] - text_len_sum += text_len - else: - doc_entity_ids_batch.append(doc_entity_ids) - doc_entity_substr_batch.append(doc_entity_substr) - doc_entity_positions_batch.append(doc_entity_positions) - doc_entity_ids = entity_ids - doc_entity_substr = entity_substr - doc_entity_positions = entity_positions - cur_doc_num = doc_num - text_len_sum = 0 - doc_entity_ids_batch.append(doc_entity_ids) - doc_entity_substr_batch.append(doc_entity_substr) - doc_entity_positions_batch.append(doc_entity_positions) - - return doc_entity_substr_batch, doc_entity_positions_batch, doc_entity_ids_batch - - def link_entities(self, entity_substr_list: List[str], entity_positions_list: List[List[int]] = None, - context_tokens: List[str] = None) -> List[List[str]]: - log.debug(f"context_tokens {context_tokens}") - log.debug(f"entity substr list {entity_substr_list}") - log.debug(f"entity positions list {entity_positions_list}") - entity_ids_list = [] - if entity_substr_list: - entity_substr_list = [[word for word in entity_substr.split(' ') - if word not in self.stopwords and len(word) > 0] - for entity_substr in entity_substr_list] - words_and_indices = [(self.morph_parse(word), i) for i, entity_substr in enumerate(entity_substr_list) - for word in entity_substr] - substr_lens = [len(entity_substr) for entity_substr in entity_substr_list] - log.debug(f"words and indices {words_and_indices}") - words, indices = zip(*words_and_indices) - words = list(words) - indices = list(indices) - log.debug(f"words {words}") - log.debug(f"indices {indices}") - ent_substr_tfidfs = self.vectorizer.transform(words).toarray().astype(np.float32) - D, I = self.faiss_index.search(ent_substr_tfidfs, self.num_faiss_candidate_entities) - candidate_entities_dict = defaultdict(list) - for ind_list, scores_list, index in zip(I, D, indices): - if self.num_faiss_cells > 1: - scores_list = [1.0 - score for score in scores_list] - candidate_entities = {} - for ind, score in zip(ind_list, scores_list): - start_ind, end_ind = self.word_to_idlist[self.word_list[ind]] - for entity in self.entities_list[start_ind:end_ind]: - if entity in candidate_entities: - if score > candidate_entities[entity]: - candidate_entities[entity] = score - else: - candidate_entities[entity] = score - candidate_entities_dict[index] += [(entity, cand_entity_len, score) - for (entity, cand_entity_len), score in candidate_entities.items()] - log.debug(f"{index} candidate_entities {[self.word_list[ind] for ind in ind_list[:10]]}") - candidate_entities_total = list(candidate_entities_dict.values()) - candidate_entities_total = [self.sum_scores(candidate_entities, substr_len) - for candidate_entities, substr_len in - zip(candidate_entities_total, substr_lens)] - log.debug(f"length candidate entities list {len(candidate_entities_total)}") - candidate_entities_list = [] - entities_scores_list = [] - for candidate_entities in candidate_entities_total: - log.debug(f"candidate_entities before ranking {candidate_entities[:10]}") - candidate_entities = [candidate_entity + (self.entities_ranking_dict.get(candidate_entity[0], 0),) - for candidate_entity in candidate_entities] - candidate_entities_str = '\n'.join([str(candidate_entity) for candidate_entity in candidate_entities]) - candidate_entities = sorted(candidate_entities, key=lambda x: (x[1], x[2]), reverse=True) - log.debug(f"candidate_entities {candidate_entities[:10]}") - entities_scores = {entity: (substr_score, pop_score) - for entity, 
substr_score, pop_score in candidate_entities} - candidate_entities = [candidate_entity[0] for candidate_entity - in candidate_entities][:self.num_entities_for_bert_ranking] - log.debug(f"candidate_entities {candidate_entities[:10]}") - candidate_entities_list.append(candidate_entities) - if self.num_entities_to_return == 1: - entity_ids_list.append(candidate_entities[0]) - else: - entity_ids_list.append(candidate_entities[:self.num_entities_to_return]) - entities_scores_list.append(entities_scores) - if self.use_descriptions: - entity_ids_list = self.rank_by_description(entity_positions_list, candidate_entities_list, - entities_scores_list, context_tokens) - - return entity_ids_list - - def morph_parse(self, word): - morph_parse_tok = self.morph.parse(word)[0] - normal_form = morph_parse_tok.normal_form - return normal_form - - def sum_scores(self, candidate_entities: List[Tuple[str, int]], substr_len: int) -> List[Tuple[str, float]]: - entities_with_scores_sum = defaultdict(int) - for entity in candidate_entities: - entities_with_scores_sum[(entity[0], entity[1])] += entity[2] - - entities_with_scores = {} - for (entity, cand_entity_len), scores_sum in entities_with_scores_sum.items(): - score = min(scores_sum, cand_entity_len) / max(substr_len, cand_entity_len) - if entity in entities_with_scores: - if score > entities_with_scores[entity]: - entities_with_scores[entity] = score - else: - entities_with_scores[entity] = score - entities_with_scores = list(entities_with_scores.items()) - - return entities_with_scores - - def rank_by_description(self, entity_positions_list: List[List[int]], - candidate_entities_list: List[List[str]], - entities_scores_list: List[Dict[str, Tuple[int, float]]], - context_tokens: List[str]) -> List[List[str]]: - entity_ids_list = [] - for entity_pos, candidate_entities, entities_scores in zip(entity_positions_list, candidate_entities_list, - entities_scores_list): - log.debug(f"entity_pos {entity_pos}") - log.debug(f"candidate_entities {candidate_entities[:10]}") - if self.include_mention: - context = ' '.join(context_tokens[:entity_pos[0]] + ["[ENT]"] + - context_tokens[entity_pos[0]:entity_pos[-1] + 1] + ["[ENT]"] + - context_tokens[entity_pos[-1] + 1:]) - else: - context = ' '.join(context_tokens[:entity_pos[0]] + ["[ENT]"] + context_tokens[entity_pos[-1] + 1:]) - log.debug(f"context {context}") - log.debug(f"len candidate entities {len(candidate_entities)}") - scores = self.entity_ranker.rank_rels(context, candidate_entities) - entities_with_scores = [(entity, round(entities_scores[entity][0], 2), entities_scores[entity][1], - round(score, 2)) for entity, score in scores] - log.debug(f"len entities with scores {len(entities_with_scores)}") - entities_with_scores = [entity for entity in entities_with_scores if entity[3] > 0.1] - entities_with_scores = sorted(entities_with_scores, key=lambda x: (x[1], x[3], x[2]), reverse=True) - log.debug(f"entities_with_scores {entities_with_scores}") - top_entities = [score[0] for score in entities_with_scores] - if self.num_entities_to_return == 1: - entity_ids_list.append(top_entities[0]) - else: - entity_ids_list.append(top_entities[:self.num_entities_to_return]) - return entity_ids_list diff --git a/deeppavlov/models/kbqa/kbqa_entity_linking.py b/deeppavlov/models/kbqa/kbqa_entity_linking.py deleted file mode 100644 index 6ca147f076..0000000000 --- a/deeppavlov/models/kbqa/kbqa_entity_linking.py +++ /dev/null @@ -1,434 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the 
Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import re -import sqlite3 -from logging import getLogger -from typing import List, Dict, Tuple, Optional, Any -from collections import defaultdict, Counter - -import nltk -import pymorphy2 -from nltk.corpus import stopwords -from rapidfuzz import fuzz -from hdt import HDTDocument - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component -from deeppavlov.core.models.serializable import Serializable -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.core.common.file import load_pickle, save_pickle -from deeppavlov.models.spelling_correction.levenshtein.levenshtein_searcher import LevenshteinSearcher -from deeppavlov.models.kbqa.rel_ranking_bert_infer import RelRankerBertInfer - -log = getLogger(__name__) - - -@register('kbqa_entity_linker') -class KBEntityLinker(Component, Serializable): - """ - This class extracts from the knowledge base candidate entities for the entity mentioned in the question and then - extracts triplets from Wikidata for the extracted entity. Candidate entities are searched in the dictionary - where keys are titles and aliases of Wikidata entities and values are lists of tuples (entity_title, entity_id, - number_of_relations). First candidate entities are searched in the dictionary by keys where the keys are - entities extracted from the question, if nothing is found entities are searched in the dictionary using - Levenstein distance between the entity and keys (titles) in the dictionary. - """ - - def __init__(self, load_path: str, - inverted_index_filename: str, - entities_list_filename: str, - q2name_filename: str, - types_dict_filename: Optional[str] = None, - who_entities_filename: Optional[str] = None, - save_path: str = None, - q2descr_filename: str = None, - descr_rank_score_thres: float = 0.01, - freq_dict_filename: Optional[str] = None, - entity_ranker: RelRankerBertInfer = None, - build_inverted_index: bool = False, - kb_format: str = "hdt", - kb_filename: str = None, - label_rel: str = None, - descr_rel: str = None, - aliases_rels: List[str] = None, - sql_table_name: str = None, - sql_column_names: List[str] = None, - lang: str = "en", - use_descriptions: bool = False, - include_mention: bool = False, - num_entities_to_return: int = 5, - lemmatize: bool = False, - use_prefix_tree: bool = False, - **kwargs) -> None: - """ - - Args: - load_path: path to folder with inverted index files - inverted_index_filename: file with dict of words (keys) and entities containing these words - entities_list_filename: file with the list of entities from the knowledge base - q2name_filename: file which maps entity id to name - types_dict_filename: file with types of entities - who_entities_filename: file with the list of entities in Wikidata, which can be answers to questions - with "Who" pronoun, i.e. humans, literary characters etc. 
- save_path: path where to save inverted index files - q2descr_filename: name of file which maps entity id to description - descr_rank_score_thres: if the score of the entity description is less than threshold, the entity is not - added to output list - freq_dict_filename: filename with frequences dictionary of Russian words - entity_ranker: component deeppavlov.models.kbqa.rel_ranker_bert_infer - build_inverted_index: if "true", inverted index of entities of the KB will be built - kb_format: "hdt" or "sqlite3" - kb_filename: file with the knowledge base, which will be used for building of inverted index - label_rel: relation in the knowledge base which connects entity ids and entity titles - descr_rel: relation in the knowledge base which connects entity ids and entity descriptions - aliases_rels: list of relations which connect entity ids and entity aliases - sql_table_name: name of the table with the KB if the KB is in sqlite3 format - sql_column_names: names of columns with subject, relation and object - lang: language used - use_descriptions: whether to use context and descriptions of entities for entity ranking - include_mention: whether to leave or delete entity mention from the sentence before passing to BERT ranker - num_entities_to_return: how many entities for each substring the system returns - lemmatize: whether to lemmatize tokens of extracted entity - use_prefix_tree: whether to use prefix tree for search of entities with typos in entity labels - **kwargs: - """ - super().__init__(save_path=save_path, load_path=load_path) - self.morph = pymorphy2.MorphAnalyzer() - self.lemmatize = lemmatize - self.use_prefix_tree = use_prefix_tree - self.inverted_index_filename = inverted_index_filename - self.entities_list_filename = entities_list_filename - self.build_inverted_index = build_inverted_index - self.q2name_filename = q2name_filename - self.types_dict_filename = types_dict_filename - self.who_entities_filename = who_entities_filename - self.q2descr_filename = q2descr_filename - self.descr_rank_score_thres = descr_rank_score_thres - self.freq_dict_filename = freq_dict_filename - self.kb_format = kb_format - self.kb_filename = kb_filename - self.label_rel = label_rel - self.aliases_rels = aliases_rels - self.descr_rel = descr_rel - self.sql_table_name = sql_table_name - self.sql_column_names = sql_column_names - self.inverted_index: Optional[Dict[str, List[Tuple[str]]]] = None - self.entities_index: Optional[List[str]] = None - self.q2name: Optional[List[Tuple[str]]] = None - self.types_dict: Optional[Dict[str, List[str]]] = None - self.lang_str = f"@{lang}" - if self.lang_str == "@en": - self.stopwords = set(stopwords.words("english")) - elif self.lang_str == "@ru": - self.stopwords = set(stopwords.words("russian")) - self.re_tokenizer = re.compile(r"[\w']+|[^\w ]") - self.entity_ranker = entity_ranker - self.use_descriptions = use_descriptions - self.include_mention = include_mention - self.num_entities_to_return = num_entities_to_return - if self.use_descriptions and self.entity_ranker is None: - raise ValueError("No entity ranker is provided!") - - if self.use_prefix_tree: - alphabet = "!#%\&'()+,-./0123456789:;?ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz½¿ÁÄ" + \ - "ÅÆÇÉÎÓÖ×ÚßàáâãäåæçèéêëíîïðñòóôöøùúûüýāăąćČčĐėęěĞğĩīİıŁłńňŌōőřŚśşŠšťũūůŵźŻżŽžơưșȚțəʻ" + \ - "ʿΠΡβγБМавдежикмностъяḤḥṇṬṭầếờợ–‘’Ⅲ−∗" - dictionary_words = list(self.inverted_index.keys()) - self.searcher = LevenshteinSearcher(alphabet, dictionary_words) - - if self.build_inverted_index: - if self.kb_format 
== "hdt": - self.doc = HDTDocument(str(expand_path(self.kb_filename))) - elif self.kb_format == "sqlite3": - self.conn = sqlite3.connect(str(expand_path(self.kb_filename))) - self.cursor = self.conn.cursor() - else: - raise ValueError(f'unsupported kb_format value {self.kb_format}') - self.inverted_index_builder() - self.save() - else: - self.load() - - def load_freq_dict(self, freq_dict_filename: str): - with open(str(expand_path(freq_dict_filename)), 'r') as fl: - lines = fl.readlines() - pos_freq_dict = defaultdict(list) - for line in lines: - line_split = line.strip('\n').split('\t') - if re.match("[\d]+\.[\d]+", line_split[2]): - pos_freq_dict[line_split[1]].append((line_split[0], float(line_split[2]))) - nouns_with_freq = pos_freq_dict["s"] - self.nouns_dict = {noun: freq for noun, freq in nouns_with_freq} - - def load(self) -> None: - self.inverted_index = load_pickle(self.load_path / self.inverted_index_filename) - self.entities_list = load_pickle(self.load_path / self.entities_list_filename) - self.q2name = load_pickle(self.load_path / self.q2name_filename) - if self.who_entities_filename: - self.who_entities = load_pickle(self.load_path / self.who_entities_filename) - if self.freq_dict_filename: - self.load_freq_dict(self.freq_dict_filename) - if self.types_dict_filename: - self.types_dict = load_pickle(self.load_path / self.types_dict_filename) - - def save(self) -> None: - save_pickle(self.inverted_index, self.save_path / self.inverted_index_filename) - save_pickle(self.entities_list, self.save_path / self.entities_list_filename) - save_pickle(self.q2name, self.save_path / self.q2name_filename) - if self.q2descr_filename is not None: - save_pickle(self.q2descr, self.save_path / self.q2descr_filename) - - def __call__(self, entity_substr_batch: List[List[str]], - templates_batch: List[str] = None, - context_batch: List[str] = None, - entity_types_batch: List[List[List[str]]] = None) -> Tuple[ - List[List[List[str]]], List[List[List[float]]]]: - entity_ids_batch = [] - confidences_batch = [] - if templates_batch is None: - templates_batch = ["" for _ in entity_substr_batch] - if context_batch is None: - context_batch = ["" for _ in entity_substr_batch] - if entity_types_batch is None: - entity_types_batch = [[[] for _ in entity_substr_list] for entity_substr_list in entity_substr_batch] - for entity_substr_list, template_found, context, entity_types_list in \ - zip(entity_substr_batch, templates_batch, context_batch, entity_types_batch): - entity_ids_list = [] - confidences_list = [] - for entity_substr, entity_types in zip(entity_substr_list, entity_types_list): - entity_ids, confidences = self.link_entity(entity_substr, context, template_found, entity_types) - if self.num_entities_to_return == 1: - if entity_ids: - entity_ids_list.append(entity_ids[0]) - confidences_list.append(confidences[0]) - else: - entity_ids_list.append("") - confidences_list.append(0.0) - else: - entity_ids_list.append(entity_ids[:self.num_entities_to_return]) - confidences_list.append(confidences[:self.num_entities_to_return]) - entity_ids_batch.append(entity_ids_list) - confidences_batch.append(confidences_list) - - return entity_ids_batch, confidences_batch - - def link_entity(self, entity: str, context: Optional[str] = None, template_found: Optional[str] = None, - entity_types: List[str] = None, cut_entity: bool = False) -> Tuple[List[str], List[float]]: - confidences = [] - if not entity: - entities_ids = ['None'] - else: - candidate_entities = self.candidate_entities_inverted_index(entity) - if 
entity_types and self.types_dict: - entity_types = set(entity_types) - candidate_entities = [entity for entity in candidate_entities if - self.types_dict.get(entity[1], set()).intersection(entity_types)] - if cut_entity and candidate_entities and len(entity.split()) > 1 and candidate_entities[0][3] == 1: - entity = self.cut_entity_substr(entity) - candidate_entities = self.candidate_entities_inverted_index(entity) - candidate_entities, candidate_names = self.candidate_entities_names(entity, candidate_entities) - entities_ids, confidences, srtd_cand_ent = self.sort_found_entities(candidate_entities, - candidate_names, entity, context) - if template_found: - entities_ids = self.filter_entities(entities_ids, template_found) - - return entities_ids, confidences - - def cut_entity_substr(self, entity: str): - word_tokens = nltk.word_tokenize(entity.lower()) - word_tokens = [word for word in word_tokens if word not in self.stopwords] - normal_form_tokens = [self.morph.parse(word)[0].normal_form for word in word_tokens] - words_with_freq = [(word, self.nouns_dict.get(word, 0.0)) for word in normal_form_tokens] - words_with_freq = sorted(words_with_freq, key=lambda x: x[1]) - return words_with_freq[0][0] - - def candidate_entities_inverted_index(self, entity: str) -> List[Tuple[Any, Any, Any]]: - word_tokens = nltk.word_tokenize(entity.lower()) - word_tokens = [word for word in word_tokens if word not in self.stopwords] - candidate_entities = [] - - candidate_entities_for_tokens = [] - for tok in word_tokens: - candidate_entities_for_tok = set() - if len(tok) > 1: - found = False - if tok in self.inverted_index: - candidate_entities_for_tok = set(self.inverted_index[tok]) - found = True - - if self.lemmatize: - if self.lang_str == "@ru": - morph_parse_tok = self.morph.parse(tok)[0] - lemmatized_tok = morph_parse_tok.normal_form - if self.lang_str == "@en": - lemmatized_tok = self.lemmatizer.lemmatize(tok) - - if lemmatized_tok != tok and lemmatized_tok in self.inverted_index: - candidate_entities_for_tok = \ - candidate_entities_for_tok.union(set(self.inverted_index[lemmatized_tok])) - found = True - - if not found and self.use_prefix_tree: - words_with_levens_1 = self.searcher.search(tok, d=1) - for word in words_with_levens_1: - candidate_entities_for_tok = \ - candidate_entities_for_tok.union(set(self.inverted_index[word[0]])) - candidate_entities_for_tokens.append(candidate_entities_for_tok) - - for candidate_entities_for_tok in candidate_entities_for_tokens: - candidate_entities += list(candidate_entities_for_tok) - candidate_entities = Counter(candidate_entities).most_common() - candidate_entities = [(entity_num, self.entities_list[entity_num], entity_freq, count) for \ - (entity_num, entity_freq), count in candidate_entities] - - return candidate_entities - - def sort_found_entities(self, candidate_entities: List[Tuple[int, str, int]], - candidate_names: List[List[str]], - entity: str, - context: str = None) -> Tuple[List[str], List[float], List[Tuple[str, str, int, int]]]: - entities_ratios = [] - for candidate, entity_names in zip(candidate_entities, candidate_names): - entity_num, entity_id, num_rels, tokens_matched = candidate - fuzz_ratio = max([fuzz.ratio(name.lower(), entity) for name in entity_names]) - entities_ratios.append((entity_num, entity_id, tokens_matched, fuzz_ratio, num_rels)) - - srtd_with_ratios = sorted(entities_ratios, key=lambda x: (x[2], x[3], x[4]), reverse=True) - if self.use_descriptions: - log.debug(f"context {context}") - id_to_score = {entity_id: 
(tokens_matched, score) for _, entity_id, tokens_matched, score, _ in - srtd_with_ratios[:30]} - entity_ids = [entity_id for _, entity_id, _, _, _ in srtd_with_ratios[:30]] - scores = self.entity_ranker.rank_rels(context, entity_ids) - entities_with_scores = [(entity_id, id_to_score[entity_id][0], id_to_score[entity_id][1], score) for - entity_id, score in scores] - entities_with_scores = sorted(entities_with_scores, key=lambda x: (x[1], x[2], x[3]), reverse=True) - entities_with_scores = [entity for entity in entities_with_scores if \ - (entity[3] > self.descr_rank_score_thres or entity[2] == 100.0)] - log.debug(f"entities_with_scores {entities_with_scores[:10]}") - entity_ids = [entity for entity, _, _, _ in entities_with_scores] - confidences = [score for _, _, _, score in entities_with_scores] - else: - entity_ids = [ent[1] for ent in srtd_with_ratios] - confidences = [float(ent[2]) * 0.01 for ent in srtd_with_ratios] - - return entity_ids, confidences, srtd_with_ratios - - def candidate_entities_names(self, entity: str, - candidate_entities: List[Tuple[int, str, int]]) -> Tuple[List[Tuple[int, str, int]], - List[List[str]]]: - entity_length = len(entity) - candidate_names = [] - candidate_entities_filter = [] - for candidate in candidate_entities: - entity_num = candidate[0] - entity_names = [] - - entity_names_found = self.q2name[entity_num] - if len(entity_names_found[0]) < 6 * entity_length: - entity_name = entity_names_found[0] - entity_names.append(entity_name) - if len(entity_names_found) > 1: - for alias in entity_names_found[1:]: - entity_names.append(alias) - candidate_names.append(entity_names) - candidate_entities_filter.append(candidate) - - return candidate_entities_filter, candidate_names - - def inverted_index_builder(self) -> None: - log.debug("building inverted index") - entities_set = set() - id_to_label_dict = defaultdict(list) - id_to_descr_dict = {} - label_to_id_dict = {} - label_triplets = [] - alias_triplets_list = [] - descr_triplets = [] - if self.kb_format == "hdt": - label_triplets, c = self.doc.search_triples("", self.label_rel, "") - if self.aliases_rels is not None: - for alias_rel in self.aliases_rels: - alias_triplets, c = self.doc.search_triples("", alias_rel, "") - alias_triplets_list.append(alias_triplets) - if self.descr_rel is not None: - descr_triplets, c = self.doc.search_triples("", self.descr_rel, "") - - if self.kb_format == "sqlite3": - subject, relation, obj = self.sql_column_names - query = f'SELECT {subject}, {relation}, {obj} FROM {self.sql_table_name} ' \ - f'WHERE {relation} = "{self.label_rel}";' - res = self.cursor.execute(query) - label_triplets = res.fetchall() - if self.aliases_rels is not None: - for alias_rel in self.aliases_rels: - query = f'SELECT {subject}, {relation}, {obj} FROM {self.sql_table_name} ' \ - f'WHERE {relation} = "{alias_rel}";' - res = self.cursor.execute(query) - alias_triplets = res.fetchall() - alias_triplets_list.append(alias_triplets) - if self.descr_rel is not None: - query = f'SELECT {subject}, {relation}, {obj} FROM {self.sql_table_name} ' \ - f'WHERE {relation} = "{self.descr_rel}";' - res = self.cursor.execute(query) - descr_triplets = res.fetchall() - - for triplets in [label_triplets] + alias_triplets_list: - for triplet in triplets: - entities_set.add(triplet[0]) - if triplet[2].endswith(self.lang_str): - label = triplet[2].replace(self.lang_str, '').replace('"', '') - id_to_label_dict[triplet[0]].append(label) - label_to_id_dict[label] = triplet[0] - - for triplet in descr_triplets: - 
entities_set.add(triplet[0]) - if triplet[2].endswith(self.lang_str): - descr = triplet[2].replace(self.lang_str, '').replace('"', '') - id_to_descr_dict[triplet[0]].append(descr) - - popularities_dict = {} - for entity in entities_set: - if self.kb_format == "hdt": - all_triplets, number_of_triplets = self.doc.search_triples(entity, "", "") - popularities_dict[entity] = number_of_triplets - if self.kb_format == "sqlite3": - subject, relation, obj = self.sql_column_names - query = f'SELECT COUNT({obj}) FROM {self.sql_table_name} WHERE {subject} = "{entity}";' - res = self.cursor.execute(query) - popularities_dict[entity] = res.fetchall()[0][0] - - entities_dict = {entity: n for n, entity in enumerate(entities_set)} - - inverted_index = defaultdict(list) - for label in label_to_id_dict: - tokens = re.findall(self.re_tokenizer, label.lower()) - for tok in tokens: - if len(tok) > 1 and tok not in self.stopwords: - inverted_index[tok].append((entities_dict[label_to_id_dict[label]], - popularities_dict[label_to_id_dict[label]])) - self.inverted_index = dict(inverted_index) - self.entities_list = list(entities_set) - self.q2name = [id_to_label_dict[entity] for entity in self.entities_list] - self.q2descr = [] - if id_to_descr_dict: - self.q2descr = [id_to_descr_dict[entity] for entity in self.entities_list] - - def filter_entities(self, entities: List[str], template_found: str) -> List[str]: - if template_found in ["who is xxx?", "who was xxx?"]: - entities = [entity for entity in entities if entity in self.who_entities] - if template_found in ["what is xxx?", "what was xxx?"]: - entities = [entity for entity in entities if entity not in self.who_entities] - return entities diff --git a/deeppavlov/models/kbqa/query_generator.py b/deeppavlov/models/kbqa/query_generator.py index e46500f7b7..2fedd1eee0 100644 --- a/deeppavlov/models/kbqa/query_generator.py +++ b/deeppavlov/models/kbqa/query_generator.py @@ -14,20 +14,19 @@ import itertools import re +from collections import namedtuple, OrderedDict from logging import getLogger -from typing import Tuple, List, Optional, Union, Dict, Any -from collections import namedtuple, defaultdict +from typing import Tuple, List, Optional, Union, Dict, Any, Set -import numpy as np import nltk +import numpy as np from deeppavlov.core.common.registry import register -from deeppavlov.models.kbqa.wiki_parser import WikiParser +from deeppavlov.models.kbqa.query_generator_base import QueryGeneratorBase from deeppavlov.models.kbqa.rel_ranking_infer import RelRankerInfer -from deeppavlov.models.kbqa.rel_ranking_bert_infer import RelRankerBertInfer from deeppavlov.models.kbqa.utils import \ extract_year, extract_number, order_of_answers_sorting, make_combs, fill_query -from deeppavlov.models.kbqa.query_generator_base import QueryGeneratorBase +from deeppavlov.models.kbqa.wiki_parser import WikiParser log = getLogger(__name__) @@ -39,12 +38,11 @@ class QueryGenerator(QueryGeneratorBase): """ def __init__(self, wiki_parser: WikiParser, - rel_ranker: Union[RelRankerInfer, RelRankerBertInfer], + rel_ranker: RelRankerInfer, entities_to_leave: int = 5, rels_to_leave: int = 7, max_comb_num: int = 10000, - return_all_possible_answers: bool = False, - return_answers: bool = False, *args, **kwargs) -> None: + return_all_possible_answers: bool = False, *args, **kwargs) -> None: """ Args: @@ -54,7 +52,6 @@ def __init__(self, wiki_parser: WikiParser, rels_to_leave: how many relations to leave after relation ranking max_comb_num: the maximum number of combinations of candidate 
entities and relations return_all_possible_answers: whether to return all found answers - return_answers: whether to return answers or candidate answers **kwargs: """ self.wiki_parser = wiki_parser @@ -63,45 +60,49 @@ def __init__(self, wiki_parser: WikiParser, self.rels_to_leave = rels_to_leave self.max_comb_num = max_comb_num self.return_all_possible_answers = return_all_possible_answers - self.return_answers = return_answers self.replace_tokens = [("wdt:p31", "wdt:P31"), ("pq:p580", "pq:P580"), ("pq:p582", "pq:P582"), ("pq:p585", "pq:P585"), ("pq:p1545", "pq:P1545")] super().__init__(wiki_parser=self.wiki_parser, rel_ranker=self.rel_ranker, entities_to_leave=self.entities_to_leave, rels_to_leave=self.rels_to_leave, - return_answers=self.return_answers, *args, **kwargs) + *args, **kwargs) def __call__(self, question_batch: List[str], question_san_batch: List[str], template_type_batch: Union[List[List[str]], List[str]], entities_from_ner_batch: List[List[str]], - types_from_ner_batch: List[List[str]]) -> List[Union[List[Tuple[str, Any]], List[str]]]: + entity_tags_batch: List[List[str]], + answer_types_batch: List[Set[str]]) -> List[str]: candidate_outputs_batch = [] template_answers_batch = [] - for question, question_sanitized, template_type, entities_from_ner, types_from_ner in \ - zip(question_batch, question_san_batch, template_type_batch, - entities_from_ner_batch, types_from_ner_batch): - candidate_outputs, template_answer = self.find_candidate_answers(question, question_sanitized, - template_type, entities_from_ner, - types_from_ner) + templates_nums_batch = [] + log.debug(f"kbqa inputs {question_batch} {entities_from_ner_batch} {template_type_batch} {entity_tags_batch}") + for question, question_sanitized, template_type, entities_from_ner, entity_tags_list, answer_types in \ + zip(question_batch, question_san_batch, template_type_batch, entities_from_ner_batch, + entity_tags_batch, answer_types_batch): + if template_type == "-1": + template_type = "7" + candidate_outputs, template_answer, templates_nums = \ + self.find_candidate_answers(question, question_sanitized, template_type, entities_from_ner, + entity_tags_list, answer_types) candidate_outputs_batch.append(candidate_outputs) template_answers_batch.append(template_answer) - if self.return_answers: - answers = self.rel_ranker(question_batch, candidate_outputs_batch, entities_from_ner_batch, - template_answers_batch) - log.debug(f"(__call__)answers: {answers}") - if not answers: - answers = ["Not Found"] - return answers - else: - log.debug(f"(__call__)candidate_outputs_batch: {[output[:5] for output in candidate_outputs_batch]}") - return candidate_outputs_batch + templates_nums_batch.append(templates_nums) + + answers = self.rel_ranker(question_batch, candidate_outputs_batch, entities_from_ner_batch, + template_answers_batch) + log.debug(f"(__call__)answers: {answers}") + if not answers: + answers = ["Not Found" for _ in question_batch] + return answers def query_parser(self, question: str, query_info: Dict[str, str], entities_and_types_select: List[str], entity_ids: List[List[str]], type_ids: List[List[str]], - rels_from_template: Optional[List[Tuple[str]]] = None) -> List[List[Union[Tuple[Any, ...], Any]]]: + answer_types: Set[str], + rels_from_template: Optional[List[Tuple[str]]] = None) -> Union[ + List[Dict[str, Union[Union[Tuple[Any, ...], List[Any]], Any]]], List[Dict[str, Any]]]: question_tokens = nltk.word_tokenize(question) query = query_info["query_template"].lower() for old_tok, new_tok in self.replace_tokens: 
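The max_comb_num parameter documented above caps how many (entity, type, relation) combinations query_parser will try; the hunks that follow show the actual enumeration over itertools.product. A rough, self-contained sketch of that capping idea using plain itertools (the project's make_combs/fill_query helpers are not reproduced here, and the template plus Wikidata-style ids are purely illustrative):

from itertools import islice, product


def capped_combinations(entity_candidates, type_candidates, rel_candidates, max_comb_num=10000):
    """Yield at most max_comb_num (entity, type, relation) combinations;
    the inputs are assumed to be pre-ranked, so the best ones come first."""
    return islice(product(entity_candidates, type_candidates, rel_candidates), max_comb_num)


template = "SELECT ?x WHERE {{ wd:{e} wdt:{r} ?x }}"
for entity, _type, rel in capped_combinations(["Q159", "Q649"], [None], ["P36", "P17"], max_comb_num=3):
    print(template.format(e=entity, r=rel))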
@@ -131,9 +132,10 @@ def query_parser(self, question: str, query_info: Dict[str, str], else: rels = [self.find_top_rels(question, entity_ids, triplet_info) for triplet_info in triplet_info_list] + rels = [[rel for rel in rel_list] for rel_list in rels] log.debug(f"(query_parser)rels: {rels}") rels_from_query = [triplet[1] for triplet in query_triplets if triplet[1].startswith('?')] - answer_ent = re.findall("select [\(]?([\S]+) ", query) + answer_ent = re.findall(r"select [\(]?([\S]+) ", query) order_info_nt = namedtuple("order_info", ["variable", "sorting_order"]) order_variable = re.findall("order by (asc|desc)\((.*)\)", query) if order_variable: @@ -161,8 +163,6 @@ def query_parser(self, question: str, query_info: Dict[str, str], filter_info.append((unk_prop, prop_type)) log.debug(f"(query_parser)filter_from_query: {filter_from_query}") rel_combs = make_combs(rels, permut=False) - import datetime - start_time = datetime.datetime.now() entity_positions, type_positions = [elem.split('_') for elem in entities_and_types_select.split(' ')] log.debug(f"entity_positions {entity_positions}, type_positions {type_positions}") selected_entity_ids = [entity_ids[int(pos) - 1] for pos in entity_positions if int(pos) > 0] @@ -175,10 +175,6 @@ def query_parser(self, question: str, query_info: Dict[str, str], parser_info_list = [] confidences_list = [] all_combs_list = list(itertools.product(entity_combs, type_combs, rel_combs)) - if self.wiki_file_format == "pickle": - total_entities_list = list(itertools.chain.from_iterable(selected_entity_ids)) + \ - list(itertools.chain.from_iterable(selected_type_ids)) - parse_res = self.wiki_parser(["parse_triplets"], [total_entities_list]) for comb_num, combs in enumerate(all_combs_list): confidence = np.prod([score for rel, score in combs[2][:-1]]) confidences_list.append(confidence) @@ -186,14 +182,19 @@ def query_parser(self, question: str, query_info: Dict[str, str], fill_query(query_hdt_elem, combs[0], combs[1], combs[2]) for query_hdt_elem in query_sequence] if comb_num == 0: log.debug(f"\n__________________________\nfilled query: {query_hdt_seq}\n__________________________\n") - queries_list.append((rels_from_query + answer_ent, query_hdt_seq, filter_info, order_info, return_if_found)) + if comb_num > 0: + answer_types = [] + queries_list.append( + (rels_from_query + answer_ent, query_hdt_seq, filter_info, order_info, answer_types, rel_types, + return_if_found)) + parser_info_list.append("query_execute") if comb_num == self.max_comb_num: break candidate_outputs = [] candidate_outputs_list = self.wiki_parser(parser_info_list, queries_list) - if self.use_api_requester and isinstance(candidate_outputs_list, list) and candidate_outputs_list: + if self.use_wp_api_requester and isinstance(candidate_outputs_list, list) and candidate_outputs_list: candidate_outputs_list = candidate_outputs_list[0] if isinstance(candidate_outputs_list, list) and candidate_outputs_list: @@ -203,18 +204,27 @@ def query_parser(self, question: str, query_info: Dict[str, str], for combs, confidence, candidate_output in zip(all_combs_list, confidences_list, candidate_outputs_list): candidate_outputs += [[combs[0]] + [rel for rel, score in combs[2][:-1]] + output + [confidence] for output in candidate_output] + if self.return_all_possible_answers: - candidate_outputs_dict = defaultdict(list) + candidate_outputs_dict = OrderedDict() for candidate_output in candidate_outputs: - candidate_outputs_dict[(tuple(candidate_output[0]), - 
tuple(candidate_output[1:-2]))].append(candidate_output[-2:]) + candidate_output_key = (tuple(candidate_output[0]), tuple(candidate_output[1:-2])) + if candidate_output_key not in candidate_outputs_dict: + candidate_outputs_dict[candidate_output_key] = [] + candidate_outputs_dict[candidate_output_key].append(candidate_output[-2:]) candidate_outputs = [] for (candidate_entity_comb, candidate_rel_comb), candidate_output in candidate_outputs_dict.items(): - candidate_outputs.append(list(candidate_rel_comb) + - [tuple([ans for ans, conf in candidate_output]), candidate_output[0][1]]) + candidate_outputs.append({"entities": candidate_entity_comb, + "relations": list(candidate_rel_comb), + "answers": tuple([ans for ans, conf in candidate_output]), + "rel_conf": candidate_output[0][1] + }) else: - candidate_outputs = [output[1:] for output in candidate_outputs] - log.debug(f"(query_parser)loop time: {datetime.datetime.now() - start_time}") + candidate_outputs = [{"entities": f_entities, + "relations": f_relations, + "answers": f_answers, + "rel_conf": f_rel_conf + } for f_entities, *f_relations, f_answers, f_rel_conf in candidate_outputs] log.debug(f"(query_parser)final outputs: {candidate_outputs[:3]}") return candidate_outputs diff --git a/deeppavlov/models/kbqa/query_generator_base.py b/deeppavlov/models/kbqa/query_generator_base.py index 91786e37ae..772abd6cc5 100644 --- a/deeppavlov/models/kbqa/query_generator_base.py +++ b/deeppavlov/models/kbqa/query_generator_base.py @@ -12,20 +12,20 @@ # See the License for the specific language governing permissions and # limitations under the License. +import json from logging import getLogger -from typing import Tuple, List, Optional, Union, Any +from typing import Tuple, List, Optional, Union, Any, Set -from whapi import search, get_html from bs4 import BeautifulSoup +from whapi import search, get_html +from deeppavlov.core.commands.utils import expand_path +from deeppavlov.core.common.file import read_json from deeppavlov.core.models.component import Component from deeppavlov.core.models.serializable import Serializable -from deeppavlov.core.common.file import read_json -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.models.kbqa.template_matcher import TemplateMatcher -from deeppavlov.models.kbqa.entity_linking import EntityLinker +from deeppavlov.models.entity_extraction.entity_linking import EntityLinker from deeppavlov.models.kbqa.rel_ranking_infer import RelRankerInfer -from deeppavlov.models.kbqa.rel_ranking_bert_infer import RelRankerBertInfer +from deeppavlov.models.kbqa.template_matcher import TemplateMatcher log = getLogger(__name__) @@ -37,46 +37,44 @@ class QueryGeneratorBase(Component, Serializable): """ def __init__(self, template_matcher: TemplateMatcher, - linker_entities: EntityLinker, - linker_types: EntityLinker, - rel_ranker: Union[RelRankerInfer, RelRankerBertInfer], + entity_linker: EntityLinker, + rel_ranker: RelRankerInfer, load_path: str, rank_rels_filename_1: str, rank_rels_filename_2: str, sparql_queries_filename: str, wiki_parser=None, - wiki_file_format: str = "hdt", entities_to_leave: int = 5, rels_to_leave: int = 7, syntax_structure_known: bool = False, - use_api_requester: bool = False, - return_answers: bool = False, *args, **kwargs) -> None: + use_wp_api_requester: bool = False, + use_el_api_requester: bool = False, + use_alt_templates: bool = True, + use_add_templates: bool = False, *args, **kwargs) -> None: """ Args: template_matcher: component deeppavlov.models.kbqa.template_matcher - 
linker_entities: component deeppavlov.models.kbqa.entity_linking for linking of entities - linker_types: component deeppavlov.models.kbqa.entity_linking for linking of types + entity_linker: component deeppavlov.models.entity_extraction.entity_linking for linking of entities rel_ranker: component deeppavlov.models.kbqa.rel_ranking_infer load_path: path to folder with wikidata files rank_rels_filename_1: file with list of rels for first rels in questions with ranking rank_rels_filename_2: file with list of rels for second rels in questions with ranking sparql_queries_filename: file with sparql query templates - wiki_file_format: format of wikidata file wiki_parser: component deeppavlov.models.kbqa.wiki_parser entities_to_leave: how many entities to leave after entity linking rels_to_leave: how many relations to leave after relation ranking syntax_structure_known: if syntax tree parser was used to define query template type - use_api_requester: whether deeppavlov.models.api_requester.api_requester component will be used for - Entity Linking and Wiki Parser - return_answers: whether to return answers or candidate answers + use_wp_api_requester: whether deeppavlov.models.api_requester.api_requester component will be used for + Wiki Parser + use_el_api_requester: whether deeppavlov.models.api_requester.api_requester component will be used for + Entity Linking + use_alt_templates: whether to use alternative templates if no answer was found for default query template """ super().__init__(save_path=None, load_path=load_path) self.template_matcher = template_matcher - self.linker_entities = linker_entities - self.linker_types = linker_types + self.entity_linker = entity_linker self.wiki_parser = wiki_parser - self.wiki_file_format = wiki_file_format self.rel_ranker = rel_ranker self.rank_rels_filename_1 = rank_rels_filename_1 self.rank_rels_filename_2 = rank_rels_filename_2 @@ -85,9 +83,11 @@ def __init__(self, template_matcher: TemplateMatcher, self.entities_to_leave = entities_to_leave self.rels_to_leave = rels_to_leave self.syntax_structure_known = syntax_structure_known - self.use_api_requester = use_api_requester + self.use_wp_api_requester = use_wp_api_requester + self.use_el_api_requester = use_el_api_requester + self.use_alt_templates = use_alt_templates + self.use_add_templates = use_add_templates self.sparql_queries_filename = sparql_queries_filename - self.return_answers = return_answers self.load() @@ -109,8 +109,10 @@ def find_candidate_answers(self, question: str, question_sanitized: str, template_types: Union[List[str], str], entities_from_ner: List[str], - types_from_ner: List[str]) -> Union[List[Tuple[str, Any]], List[str]]: - + entity_tags: List[str], + answer_types: Set[str]) -> Tuple[Union[Union[List[List[Union[str, float]]], + List[Any]], Any], + Union[str, Any], Union[List[Any], Any]]: candidate_outputs = [] self.template_nums = template_types @@ -120,88 +122,95 @@ def find_candidate_answers(self, question: str, question = question.replace(old, new) entities_from_template, types_from_template, rels_from_template, rel_dirs_from_template, query_type_template, \ - entity_types, template_answer, template_found = self.template_matcher(question_sanitized, entities_from_ner) + entity_types, template_answer, answer_types, template_found = self.template_matcher(question_sanitized, + entities_from_ner) self.template_nums = [query_type_template] + templates_nums = [] - log.debug(f"question: {question}\n") - log.debug(f"template_type {self.template_nums}") + log.debug( + f"question: 
{question} entities_from_template {entities_from_template} template_type {self.template_nums} " + f"types from template {types_from_template} rels_from_template {rels_from_template}") if entities_from_template or types_from_template: if rels_from_template[0][0] == "PHOW": how_to_content = self.find_answer_wikihow(entities_from_template[0]) candidate_outputs = [["PHOW", how_to_content, 1.0]] else: - entity_ids = self.get_entity_ids(entities_from_template, "entities", template_found, question, - entity_types) - type_ids = self.get_entity_ids(types_from_template, "types") + entity_ids = self.get_entity_ids(entities_from_template, entity_tags, question) log.debug(f"entities_from_template {entities_from_template}") log.debug(f"entity_types {entity_types}") log.debug(f"types_from_template {types_from_template}") log.debug(f"rels_from_template {rels_from_template}") log.debug(f"entity_ids {entity_ids}") - log.debug(f"type_ids {type_ids}") - - candidate_outputs = self.sparql_template_parser(question_sanitized, entity_ids, type_ids, - rels_from_template, - rel_dirs_from_template) + candidate_outputs, templates_nums = \ + self.sparql_template_parser(question_sanitized, entity_ids, [], answer_types, + rels_from_template, rel_dirs_from_template) if not candidate_outputs and entities_from_ner: log.debug(f"(__call__)entities_from_ner: {entities_from_ner}") - log.debug(f"(__call__)types_from_ner: {types_from_ner}") - entity_ids = self.get_entity_ids(entities_from_ner, "entities", question=question) - type_ids = self.get_entity_ids(types_from_ner, "types") + entity_ids = self.get_entity_ids(entities_from_ner, entity_tags, question) log.debug(f"(__call__)entity_ids: {entity_ids}") - log.debug(f"(__call__)type_ids: {type_ids}") self.template_nums = template_types log.debug(f"(__call__)self.template_nums: {self.template_nums}") if not self.syntax_structure_known: entity_ids = entity_ids[:3] - candidate_outputs = self.sparql_template_parser(question_sanitized, entity_ids, type_ids) - return candidate_outputs, template_answer + candidate_outputs, templates_nums = self.sparql_template_parser(question_sanitized, entity_ids, [], + answer_types) + return candidate_outputs, template_answer, templates_nums - def get_entity_ids(self, entities: List[str], - what_to_link: str, - template_found: str = None, - question: str = None, - entity_types: List[List[str]] = None) -> List[List[str]]: + def get_entity_ids(self, entities: List[str], tags: List[str], question: str) -> List[List[str]]: entity_ids = [] - if what_to_link == "entities": - if entity_types: - el_output = self.linker_entities([entities], [template_found], [question], [entity_types]) - else: - el_output = self.linker_entities([entities], [template_found], [question]) - if self.use_api_requester: + el_output = [] + try: + el_output = self.entity_linker([entities], [tags], [[question]], [None], [None]) + except json.decoder.JSONDecodeError: + log.info("not received output from entity linking") + if el_output: + if self.use_el_api_requester: el_output = el_output[0] - entity_ids, _ = el_output - if not self.use_api_requester and entity_ids: + if el_output: + if isinstance(el_output[0], dict): + entity_ids = [entity_info.get("entity_ids", []) for entity_info in el_output] + if isinstance(el_output[0], list): + entity_ids, *_ = el_output + if not self.use_el_api_requester and entity_ids: entity_ids = entity_ids[0] - if what_to_link == "types": - entity_ids, _ = self.linker_types([entities]) - entity_ids = entity_ids[0] return entity_ids def 
sparql_template_parser(self, question: str, entity_ids: List[List[str]], type_ids: List[List[str]], + answer_types: List[str], rels_from_template: Optional[List[Tuple[str]]] = None, - rel_dirs_from_template: Optional[List[str]] = None) -> List[Tuple[str]]: + rel_dirs_from_template: Optional[List[str]] = None) -> Tuple[Union[None, List[Any]], + List[Any]]: candidate_outputs = [] + log.debug(f"use alternative templates {self.use_alt_templates}") log.debug(f"(find_candidate_answers)self.template_nums: {self.template_nums}") templates = [] + templates_nums = [] for template_num in self.template_nums: for num, template in self.template_queries.items(): if (num == template_num and self.syntax_structure_known) or \ (template["template_num"] == template_num and not self.syntax_structure_known): templates.append(template) - templates = [template for template in templates if - (not self.syntax_structure_known and [len(entity_ids), len(type_ids)] == template[ - "entities_and_types_num"]) - or self.syntax_structure_known] + templates_nums.append(num) + new_templates = [] + new_templates_nums = [] + for template, template_num in zip(templates, templates_nums): + if (not self.syntax_structure_known and [len(entity_ids), len(type_ids)] == template[ + "entities_and_types_num"]) or self.syntax_structure_known: + new_templates.append(template) + new_templates_nums.append(template_num) + + templates = new_templates + templates_nums = new_templates_nums + templates_string = '\n'.join([template["query_template"] for template in templates]) log.debug(f"{templates_string}") if not templates: - return candidate_outputs + return candidate_outputs, [] if rels_from_template is not None: query_template = {} for template in templates: @@ -210,27 +219,38 @@ def sparql_template_parser(self, question: str, if query_template: entities_and_types_select = query_template["entities_and_types_select"] candidate_outputs = self.query_parser(question, query_template, entities_and_types_select, - entity_ids, type_ids, rels_from_template) + entity_ids, type_ids, answer_types, rels_from_template) else: for template in templates: entities_and_types_select = template["entities_and_types_select"] candidate_outputs = self.query_parser(question, template, entities_and_types_select, - entity_ids, type_ids, rels_from_template) + entity_ids, type_ids, answer_types, rels_from_template) + if self.use_add_templates: + additional_templates = template.get("additional_templates", []) + templates_nums += additional_templates + for add_template_num in additional_templates: + candidate_outputs += self.query_parser(question, self.template_queries[add_template_num], + entities_and_types_select, entity_ids, type_ids, + answer_types, rels_from_template) if candidate_outputs: - return candidate_outputs + templates_nums = list(set(templates_nums)) + return candidate_outputs, templates_nums - if not candidate_outputs: + if not candidate_outputs and self.use_alt_templates: alternative_templates = templates[0]["alternative_templates"] for template_num, entities_and_types_select in alternative_templates: candidate_outputs = self.query_parser(question, self.template_queries[template_num], entities_and_types_select, entity_ids, type_ids, - rels_from_template) + answer_types, rels_from_template) + templates_nums.append(template_num) if candidate_outputs: - return candidate_outputs + templates_nums = list(set(templates_nums)) + return candidate_outputs, templates_nums log.debug("candidate_rels_and_answers:\n" + '\n'.join([str(output) for output in 
candidate_outputs[:5]])) - return candidate_outputs + templates_nums = list(set(templates_nums)) + return candidate_outputs, templates_nums def find_top_rels(self, question: str, entity_ids: List[List[str]], triplet_info: Tuple) -> List[Tuple[str, Any]]: ex_rels = [] @@ -239,8 +259,11 @@ def find_top_rels(self, question: str, entity_ids: List[List[str]], triplet_info queries_list = list({(entity, direction, rel_type) for entity_id in entity_ids for entity in entity_id[:self.entities_to_leave]}) parser_info_list = ["find_rels" for i in range(len(queries_list))] - ex_rels = self.wiki_parser(parser_info_list, queries_list) - if self.use_api_requester and ex_rels: + try: + ex_rels = self.wiki_parser(parser_info_list, queries_list) + except json.decoder.JSONDecodeError: + log.info("find_top_rels, not received output from wiki parser") + if self.use_wp_api_requester and ex_rels: ex_rels = [rel[0] for rel in ex_rels] ex_rels = list(set(ex_rels)) ex_rels = [rel.split('/')[-1] for rel in ex_rels] @@ -248,17 +271,26 @@ def find_top_rels(self, question: str, entity_ids: List[List[str]], triplet_info ex_rels = self.rank_list_0 elif source == "rank_list_2": ex_rels = self.rank_list_1 - rels_with_scores = self.rel_ranker.rank_rels(question, ex_rels) + rels_with_scores = [] + ex_rels = [rel for rel in ex_rels if rel.startswith("P")] + if ex_rels: + rels_with_scores = self.rel_ranker.rank_rels(question, ex_rels) return rels_with_scores[:self.rels_to_leave] def find_answer_wikihow(self, howto_sentence: str) -> str: + tags = [] search_results = search(howto_sentence, 5) - article_id = search_results[0]["article_id"] - html = get_html(article_id) - page = BeautifulSoup(html, 'lxml') - tags = list(page.find_all(['p'])) + if search_results: + article_id = search_results[0]["article_id"] + html = get_html(article_id) + page = BeautifulSoup(html, 'lxml') + tags = list(page.find_all(['p'])) if tags: howto_content = f"{tags[0].text.strip()}@en" else: howto_content = "Not Found" return howto_content + + def query_parser(self, question, query_template, entities_and_types_select, entity_ids, type_ids, answer_types, + rels_from_template): + raise NotImplementedError diff --git a/deeppavlov/models/kbqa/query_generator_online.py b/deeppavlov/models/kbqa/query_generator_online.py deleted file mode 100644 index 1783bc6ffe..0000000000 --- a/deeppavlov/models/kbqa/query_generator_online.py +++ /dev/null @@ -1,190 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import itertools -import re -from logging import getLogger -from typing import Tuple, List, Optional, Union, Dict, Any - -import nltk - -from deeppavlov.core.common.registry import register -from deeppavlov.models.kbqa.wiki_parser_online import WikiParserOnline -from deeppavlov.models.kbqa.rel_ranking_infer import RelRankerInfer -from deeppavlov.models.kbqa.rel_ranking_bert_infer import RelRankerBertInfer -from deeppavlov.models.kbqa.utils import \ - extract_year, extract_number, make_combs, fill_online_query -from deeppavlov.models.kbqa.query_generator_base import QueryGeneratorBase - -log = getLogger(__name__) - - -@register('query_generator_online') -class QueryGeneratorOnline(QueryGeneratorBase): - """ - Class for query generation online using Wikidata query service - """ - - def __init__(self, wiki_parser: WikiParserOnline, - rel_ranker: Union[RelRankerInfer, RelRankerBertInfer], - entities_to_leave: int = 5, - rels_to_leave: int = 7, - return_answers: bool = False, *args, **kwargs) -> None: - """ - - Args: - wiki_parser: component deeppavlov.models.kbqa.wiki_parser - rel_ranker: component deeppavlov.models.kbqa.rel_ranking_infer - entities_to_leave: how many entities to leave after entity linking - rels_to_leave: how many relations to leave after relation ranking - return_answers: whether to return answers or candidate answers - """ - self.wiki_parser = wiki_parser - self.rel_ranker = rel_ranker - self.entities_to_leave = entities_to_leave - self.rels_to_leave = rels_to_leave - self.return_answers = return_answers - super().__init__(wiki_parser=self.wiki_parser, rel_ranker=self.rel_ranker, - entities_to_leave=self.entities_to_leave, rels_to_leave=self.rels_to_leave, - return_answers=self.return_answers, *args, **kwargs) - - self.load() - - def __call__(self, question_batch: List[str], - question_san_batch: List[str], - template_type_batch: List[str], - entities_from_ner_batch: List[List[str]], - types_from_ner_batch: List[List[str]]) -> List[Union[List[Tuple[str, Any]], List[str]]]: - - candidate_outputs_batch = [] - for question, question_sanitized, template_type, entities_from_ner, types_from_ner in \ - zip(question_batch, question_san_batch, template_type_batch, - entities_from_ner_batch, types_from_ner_batch): - candidate_outputs, _ = self.find_candidate_answers(question, question_sanitized, - template_type, entities_from_ner, types_from_ner) - candidate_outputs_batch.append(candidate_outputs) - if self.return_answers: - answers = self.rel_ranker(question_batch, candidate_outputs_batch) - log.debug(f"(__call__)answers: {answers}") - return answers - else: - log.debug(f"(__call__)candidate_outputs_batch: {[output[:5] for output in candidate_outputs_batch]}") - return candidate_outputs_batch - - def query_parser(self, question: str, query_info: Dict[str, str], - entities_and_types_select: List[str], - entity_ids: List[List[str]], type_ids: List[List[str]], - rels_from_template: Optional[List[Tuple[str]]] = None) -> List[Tuple[str]]: - question_tokens = nltk.word_tokenize(question) - query = query_info["query_template"].lower().replace("wdt:p31", "wdt:P31") - rels_for_search = query_info["rank_rels"] - rel_types = query_info["rel_types"] - rels_for_filter = query_info["filter_rels"] - property_types = query_info["property_types"] - query_seq_num = query_info["query_sequence"] - return_if_found = query_info["return_if_found"] - log.debug(f"(query_parser)query: {query}, {rels_for_search}, {query_seq_num}, {return_if_found}") - query_triplets = re.findall("{[ ]?(.*?)[ ]?}", 
query)[0].split(' . ') - log.debug(f"(query_parser)query_triplets: {query_triplets}") - query_triplets = [triplet.split(' ')[:3] for triplet in query_triplets] - triplet_info_list = [("forw" if triplet[2].startswith('?') else "backw", search_source, rel_type) - for search_source, triplet, rel_type in zip(rels_for_search, query_triplets, rel_types) if - search_source != "do_not_rank"] - log.debug(f"(query_parser)rel_directions: {triplet_info_list}") - rel_variables = re.findall(":(r[\d]{1,2})", query) - entity_ids = [entity[:self.entities_to_leave] for entity in entity_ids] - if rels_from_template is not None: - rels = [[(rel, 1.0) for rel in rel_list] for rel_list in rels_from_template] - else: - rels = [self.find_top_rels(question, entity_ids, triplet_info) - for triplet_info in triplet_info_list] - rels_list_for_filter = [] - rels_list_for_fill = [] - filter_rel_variables = [] - fill_rel_variables = [] - for rel_variable, rel_list, is_filter in zip(rel_variables, rels, rels_for_filter): - if is_filter: - rels_list_for_filter.append(rel_list) - filter_rel_variables.append(rel_variable) - else: - rels_list_for_fill.append(rel_list) - fill_rel_variables.append(rel_variable) - log.debug(f"(query_parser)rels: {rels}") - log.debug(f"rel_variables {rel_variables}, filter_rel_variables: {filter_rel_variables}") - log.debug(f"rels_list_for_filter: {rels_list_for_filter}") - log.debug(f"rels_list_for_fill: {rels_list_for_fill}") - rels_from_query = list(set([triplet[1] for triplet in query_triplets if triplet[1].startswith('?')])) - if "count" in query: - answer_ent = re.findall("as (\?[\S]+)", query) - else: - answer_ent = re.findall("select [\(]?([\S]+) ", query) - - filter_from_query = re.findall("contains\((\?\w), (.+?)\)", query) - log.debug(f"(query_parser)filter_from_query: {filter_from_query}") - - year = extract_year(question_tokens, question) - number = extract_number(question_tokens, question) - log.debug(f"year {year}, number {number}") - if year: - for elem in filter_from_query: - query = query.replace(f"{elem[0]}, n", f"YEAR({elem[0]}), {year}") - elif number: - for elem in filter_from_query: - query = query.replace(f"{elem[0]}, n", f"{elem[0]}, {number}") - query = query.replace(" where", f" {' '.join(rels_from_query)} where") - - log.debug(f"(query_parser)query_with_filtering: {query}") - rel_combs = make_combs(rels_list_for_fill, permut=False) - log.debug(f"(query_parser)rel_combs: {rel_combs[:3]}") - import datetime - start_time = datetime.datetime.now() - entity_positions, type_positions = [elem.split('_') for elem in entities_and_types_select.split(' ')] - log.debug(f"entity_positions {entity_positions}, type_positions {type_positions}") - selected_entity_ids = [entity_ids[int(pos) - 1] for pos in entity_positions if int(pos) > 0] - selected_type_ids = [type_ids[int(pos) - 1] for pos in type_positions if int(pos) > 0] - entity_combs = make_combs(selected_entity_ids, permut=True) - log.debug(f"(query_parser)entity_combs: {entity_combs[:3]}") - type_combs = make_combs(selected_type_ids, permut=False) - log.debug(f"(query_parser)type_combs: {type_combs[:3]}") - confidence = 0.0 - queries_list = [] - parser_info_list = [] - all_combs_list = list(itertools.product(entity_combs, type_combs, rel_combs)) - for comb_num, combs in enumerate(all_combs_list): - filled_query, filter_rels = fill_online_query(query, combs[0], combs[1], combs[2], fill_rel_variables, - filter_rel_variables, rels_list_for_filter) - if comb_num == 0: - log.debug(f"\n___________________________\nfilled query: 
{filled_query}\n___________________________\n") - queries_list.append((filled_query, return_if_found)) - parser_info_list.append("query_execute") - - candidate_outputs_list = self.wiki_parser(parser_info_list, queries_list) - outputs_len = len(candidate_outputs_list) - all_combs_list = all_combs_list[:outputs_len] - out_vars = filter_rels + rels_from_query + answer_ent - - candidate_outputs = [] - for combs, candidate_output in zip(all_combs_list, candidate_outputs_list): - candidate_output = [output for output in candidate_output - if (all([filter_value in output[filter_var[1:]]["value"] - for filter_var, filter_value in property_types.items()]) - and all([not output[ent[1:]]["value"].startswith("http://www.wikidata.org/value") - for ent in answer_ent]))] - candidate_outputs += [combs[2][:-1] + [output[var[1:]]["value"] for var in out_vars] + [confidence] - for output in candidate_output] - - log.debug(f"(query_parser)loop time: {datetime.datetime.now() - start_time}") - log.debug(f"(query_parser)final outputs: {candidate_outputs[:3]}") - - return candidate_outputs diff --git a/deeppavlov/models/kbqa/rel_ranking_bert_infer.py b/deeppavlov/models/kbqa/rel_ranking_bert_infer.py deleted file mode 100644 index 2d0d0608fc..0000000000 --- a/deeppavlov/models/kbqa/rel_ranking_bert_infer.py +++ /dev/null @@ -1,190 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from logging import getLogger -from typing import Tuple, List, Any, Optional - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component -from deeppavlov.core.models.serializable import Serializable -from deeppavlov.core.common.file import load_pickle -from deeppavlov.models.ranking.rel_ranker import RelRanker -from deeppavlov.models.kbqa.wiki_parser import WikiParser -from deeppavlov.models.preprocessors.bert_preprocessor import BertPreprocessor -from deeppavlov.models.kbqa.sentence_answer import sentence_answer - -log = getLogger(__name__) - - -@register('rel_ranking_bert_infer') -class RelRankerBertInfer(Component, Serializable): - """Class for ranking of paths in subgraph""" - - def __init__(self, load_path: str, - rel_q2name_filename: str, - ranker: RelRanker, - bert_preprocessor: Optional[BertPreprocessor] = None, - wiki_parser: Optional[WikiParser] = None, - batch_size: int = 32, - rels_to_leave: int = 40, - return_all_possible_answers: bool = False, - return_answer_ids: bool = False, - use_api_requester: bool = False, - use_mt_bert: bool = False, - return_sentence_answer: bool = False, - return_confidences: bool = False, **kwargs): - """ - - Args: - load_path: path to folder with wikidata files - rel_q2name_filename: name of file which maps relation id to name - ranker: component deeppavlov.models.ranking.rel_ranker - bert_perprocessor: component deeppavlov.models.preprocessors.bert_preprocessor - wiki_parser: component deeppavlov.models.wiki_parser - batch_size: infering batch size - rels_to_leave: how many relations to leave after relation ranking - return_all_possible_answers: whether to return all found answers - return_answer_ids: whether to return answer ids from Wikidata - use_api_requester: whether wiki parser will be used as external api - use_mt_bert: whether nultitask bert is used for ranking - return_sentence_answer: whether to return answer as a sentence - return_confidences: whether to return confidences of candidate answers - **kwargs: - """ - super().__init__(save_path=None, load_path=load_path) - self.rel_q2name_filename = rel_q2name_filename - self.ranker = ranker - self.bert_preprocessor = bert_preprocessor - self.wiki_parser = wiki_parser - self.batch_size = batch_size - self.rels_to_leave = rels_to_leave - self.return_all_possible_answers = return_all_possible_answers - self.return_answer_ids = return_answer_ids - self.use_api_requester = use_api_requester - self.use_mt_bert = use_mt_bert - self.return_sentence_answer = return_sentence_answer - self.return_confidences = return_confidences - self.load() - - def load(self) -> None: - self.rel_q2name = load_pickle(self.load_path / self.rel_q2name_filename) - - def save(self) -> None: - pass - - def __call__(self, questions_list: List[str], candidate_answers_list: List[List[Tuple[str]]], - entities_list: List[List[str]] = None, template_answers_list: List[str] = None) -> List[str]: - answers = [] - confidence = 0.0 - if entities_list is None: - entities_list = [[] for _ in questions_list] - if template_answers_list is None: - template_answers_list = ["" for _ in questions_list] - for question, candidate_answers, entities, template_answer in \ - zip(questions_list, candidate_answers_list, entities_list, template_answers_list): - answers_with_scores = [] - answer = "Not Found" - - n_batches = len(candidate_answers) // self.batch_size + int(len(candidate_answers) % self.batch_size > 0) - for i in range(n_batches): - questions_batch = [] - rels_labels_batch = [] - 
answers_batch = [] - confidences_batch = [] - for candidate_ans_and_rels in candidate_answers[i * self.batch_size: (i + 1) * self.batch_size]: - candidate_rels = candidate_ans_and_rels[:-2] - candidate_rels = [candidate_rel.split('/')[-1] for candidate_rel in candidate_rels] - candidate_answer = candidate_ans_and_rels[-2] - candidate_confidence = candidate_ans_and_rels[-1] - candidate_rels = " # ".join([self.rel_q2name[candidate_rel] \ - for candidate_rel in candidate_rels if - candidate_rel in self.rel_q2name]) - - if candidate_rels: - questions_batch.append(question) - rels_labels_batch.append(candidate_rels) - answers_batch.append(candidate_answer) - confidences_batch.append(candidate_confidence) - - if self.use_mt_bert: - features = self.bert_preprocessor(questions_batch, rels_labels_batch) - probas = self.ranker(features) - else: - probas = self.ranker(questions_batch, rels_labels_batch) - probas = [proba[1] for proba in probas] - for j, (answer, confidence, rels_labels) in \ - enumerate(zip(answers_batch, confidences_batch, rels_labels_batch)): - answers_with_scores.append((answer, rels_labels, max(probas[j], confidence))) - - answers_with_scores = sorted(answers_with_scores, key=lambda x: x[-1], reverse=True) - - if answers_with_scores: - log.debug(f"answers: {answers_with_scores[0]}") - answer_ids = answers_with_scores[0][0] - if self.return_all_possible_answers and isinstance(answer_ids, tuple): - answer_ids_input = [(answer_id, question) for answer_id in answer_ids] - else: - answer_ids_input = [(answer_ids, question)] - parser_info_list = ["find_label" for _ in answer_ids_input] - answer_labels = self.wiki_parser(parser_info_list, answer_ids_input) - if self.use_api_requester: - answer_labels = [label[0] for label in answer_labels] - if self.return_all_possible_answers: - answer_labels = list(set(answer_labels)) - answer_labels = [label for label in answer_labels if (label and label != "Not Found")][:5] - answer_labels = [str(label) for label in answer_labels] - if len(answer_labels) > 2: - answer = f"{', '.join(answer_labels[:-1])} and {answer_labels[-1]}" - else: - answer = ', '.join(answer_labels) - else: - answer = answer_labels[0] - if self.return_sentence_answer: - answer = sentence_answer(question, answer, entities, template_answer) - confidence = answers_with_scores[0][2] - - if self.return_confidences: - answers.append((answer, confidence)) - else: - if self.return_answer_ids: - answers.append((answer, answer_ids)) - else: - answers.append(answer) - - return answers - - def rank_rels(self, question: str, candidate_rels: List[str]) -> List[Tuple[str, Any]]: - rels_with_scores = [] - n_batches = len(candidate_rels) // self.batch_size + int(len(candidate_rels) % self.batch_size > 0) - for i in range(n_batches): - questions_batch = [] - rels_labels_batch = [] - rels_batch = [] - for candidate_rel in candidate_rels[i * self.batch_size: (i + 1) * self.batch_size]: - if candidate_rel in self.rel_q2name: - questions_batch.append(question) - rels_batch.append(candidate_rel) - rels_labels_batch.append(self.rel_q2name[candidate_rel]) - if questions_batch: - if self.use_mt_bert: - features = self.bert_preprocessor(questions_batch, rels_labels_batch) - probas = self.ranker(features) - else: - probas = self.ranker(questions_batch, rels_labels_batch) - probas = [proba[1] for proba in probas] - for j, rel in enumerate(rels_batch): - rels_with_scores.append((rel, probas[j])) - rels_with_scores = sorted(rels_with_scores, key=lambda x: x[1], reverse=True) - - return 
rels_with_scores[:self.rels_to_leave]
diff --git a/deeppavlov/models/kbqa/rel_ranking_infer.py b/deeppavlov/models/kbqa/rel_ranking_infer.py
index 6d851bad99..6aefdafc45 100644
--- a/deeppavlov/models/kbqa/rel_ranking_infer.py
+++ b/deeppavlov/models/kbqa/rel_ranking_infer.py
@@ -12,42 +12,70 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-from typing import Tuple, List, Any
+from logging import getLogger
+from typing import Tuple, List, Any, Optional
 
 from scipy.special import softmax
 
+from deeppavlov.core.common.chainer import Chainer
+from deeppavlov.core.common.file import load_pickle
 from deeppavlov.core.common.registry import register
 from deeppavlov.core.models.component import Component
 from deeppavlov.core.models.serializable import Serializable
-from deeppavlov.core.common.file import load_pickle
-from deeppavlov.models.ranking.rel_ranker import RelRanker
+from deeppavlov.models.kbqa.sentence_answer import sentence_answer
+from deeppavlov.models.kbqa.wiki_parser import WikiParser
+
+log = getLogger(__name__)
 
 
 @register('rel_ranking_infer')
 class RelRankerInfer(Component, Serializable):
-    """This class performs ranking of candidate relations"""
+    """Class for ranking of paths in subgraph"""
 
     def __init__(self, load_path: str,
                  rel_q2name_filename: str,
-                 ranker: RelRanker,
-                 rels_to_leave: int = 15,
-                 batch_size: int = 100, **kwargs):
-
+                 ranker: Chainer = None,
+                 wiki_parser: Optional[WikiParser] = None,
+                 batch_size: int = 32,
+                 rels_to_leave: int = 40,
+                 softmax: bool = False,
+                 return_all_possible_answers: bool = False,
+                 return_answer_ids: bool = False,
+                 use_api_requester: bool = False,
+                 return_sentence_answer: bool = False,
+                 rank: bool = True,
+                 return_confidences: bool = False, **kwargs):
         """
 
         Args:
             load_path: path to folder with wikidata files
             rel_q2name_filename: name of file which maps relation id to name
-            ranker: deeppavlov.models.ranking.rel_ranker
-            rels_to_leave: how many top scored relations leave
+            ranker: component deeppavlov.models.ranking.rel_ranker
+            wiki_parser: component deeppavlov.models.wiki_parser
            batch_size: infering batch size
+            rels_to_leave: how many relations to leave after relation ranking
+            softmax: whether to process relation scores with softmax function
+            return_all_possible_answers: whether to return all found answers
+            return_answer_ids: whether to return answer ids from Wikidata
+            use_api_requester: whether wiki parser will be used as external api
+            return_sentence_answer: whether to return answer as a sentence
+            rank: whether to rank relations or simply copy the input
+            return_confidences: whether to return confidences of candidate answers
            **kwargs:
        """
        super().__init__(save_path=None, load_path=load_path)
        self.rel_q2name_filename = rel_q2name_filename
        self.ranker = ranker
-        self.rels_to_leave = rels_to_leave
+        self.wiki_parser = wiki_parser
        self.batch_size = batch_size
+        self.rels_to_leave = rels_to_leave
+        self.softmax = softmax
+        self.return_all_possible_answers = return_all_possible_answers
+        self.return_answer_ids = return_answer_ids
+        self.use_api_requester = use_api_requester
+        self.return_sentence_answer = return_sentence_answer
+        self.rank = rank
+        self.return_confidences = return_confidences
        self.load()
 
    def load(self) -> None:
@@ -56,35 +84,133 @@ def load(self) -> None:
 
    def save(self) -> None:
        pass
 
-    def __call__(self, question_batch: List[str], candidate_rels_batch: List[List[str]]) -> \
-            List[List[Tuple[str, Any]]]:
-        rels_with_scores_batch = []
-        for question, candidate_rels 
in zip(question_batch, candidate_rels_batch): - rels_with_scores_batch.append(self.rank_rels(question, candidate_rels)) - return rels_with_scores_batch + def __call__(self, questions_list: List[str], + candidate_answers_list: List[List[Tuple[str]]], + entities_list: List[List[str]] = None, + template_answers_list: List[str] = None) -> List[str]: + answers = [] + confidence = 0.0 + if entities_list is None: + entities_list = [[] for _ in questions_list] + if template_answers_list is None: + template_answers_list = ["" for _ in questions_list] + for question, candidate_answers, entities, template_answer in \ + zip(questions_list, candidate_answers_list, entities_list, template_answers_list): + answers_with_scores = [] + answer = "Not Found" + if self.rank: + n_batches = len(candidate_answers) // self.batch_size + int( + len(candidate_answers) % self.batch_size > 0) + for i in range(n_batches): + questions_batch = [] + rels_batch = [] + rels_labels_batch = [] + answers_batch = [] + entities_batch = [] + confidences_batch = [] + for candidate_ans_and_rels in candidate_answers[i * self.batch_size: (i + 1) * self.batch_size]: + candidate_rels = [] + candidate_rels_str, candidate_answer = "", "" + candidate_entities, candidate_confidence = [], [] + if candidate_ans_and_rels: + candidate_rels = candidate_ans_and_rels["relations"] + candidate_rels = [candidate_rel.split('/')[-1] for candidate_rel in candidate_rels] + candidate_answer = candidate_ans_and_rels["answers"] + candidate_entities = candidate_ans_and_rels["entities"] + candidate_confidence = candidate_ans_and_rels["rel_conf"] + candidate_rels_str = " # ".join([self.rel_q2name[candidate_rel] \ + for candidate_rel in candidate_rels if + candidate_rel in self.rel_q2name]) + if candidate_rels_str: + questions_batch.append(question) + rels_batch.append(candidate_rels) + rels_labels_batch.append(candidate_rels_str) + answers_batch.append(candidate_answer) + entities_batch.append(candidate_entities) + confidences_batch.append(candidate_confidence) + + if questions_batch: + probas = self.ranker(questions_batch, rels_labels_batch) + probas = [proba[1] for proba in probas] + for j, (answer, entities, confidence, rels_ids, rels_labels) in \ + enumerate(zip(answers_batch, entities_batch, confidences_batch, rels_batch, + rels_labels_batch)): + answers_with_scores.append( + (answer, entities, rels_labels, rels_ids, max(probas[j], confidence))) + + answers_with_scores = sorted(answers_with_scores, key=lambda x: x[-1], reverse=True) + else: + answers_with_scores = [(answer, rels, conf) for *rels, answer, conf in candidate_answers] + + answer_ids = tuple() + if answers_with_scores: + log.debug(f"answers: {answers_with_scores[0]}") + answer_ids = answers_with_scores[0][0] + if self.return_all_possible_answers and isinstance(answer_ids, tuple): + answer_ids_input = [(answer_id, question) for answer_id in answer_ids] + answer_ids = [answer_id.split("/")[-1] for answer_id in answer_ids] + else: + answer_ids_input = [(answer_ids, question)] + answer_ids = answer_ids.split("/")[-1] + parser_info_list = ["find_label" for _ in answer_ids_input] + answer_labels = self.wiki_parser(parser_info_list, answer_ids_input) + log.debug(f"answer_labels {answer_labels}") + if self.return_all_possible_answers: + answer_labels = list(set(answer_labels)) + answer_labels = [label for label in answer_labels if (label and label != "Not Found")][:5] + answer_labels = [str(label) for label in answer_labels] + if len(answer_labels) > 2: + answer = f"{', '.join(answer_labels[:-1])} and 
{answer_labels[-1]}" + else: + answer = ', '.join(answer_labels) + else: + answer = answer_labels[0] + if self.return_sentence_answer: + try: + answer = sentence_answer(question, answer, entities, template_answer) + except: + log.info("Error in sentence answer") + confidence = answers_with_scores[0][2] + if self.return_confidences: + answers.append((answer, confidence)) + else: + if self.return_answer_ids: + if not answer_ids: + answer_ids = "Not found" + answers.append((answer, answer_ids)) + else: + answers.append(answer) + if not answers: + if self.return_confidences: + answers.append(("Not found", 0.0)) + else: + answers.append("Not found") + + return answers def rank_rels(self, question: str, candidate_rels: List[str]) -> List[Tuple[str, Any]]: rels_with_scores = [] - n_batches = len(candidate_rels) // self.batch_size + int(len(candidate_rels) % self.batch_size > 0) - for i in range(n_batches): - questions_batch = [] - rels_labels_batch = [] - rels_batch = [] - for candidate_rel in candidate_rels[i * self.batch_size: (i + 1) * self.batch_size]: - if candidate_rel in self.rel_q2name: - questions_batch.append(question) - rels_batch.append(candidate_rel) - rels_labels_batch.append(self.rel_q2name[candidate_rel]) - if questions_batch: - probas = self.ranker(questions_batch, rels_labels_batch) - probas = [proba[1] for proba in probas] - for j, rel in enumerate(rels_batch): - rels_with_scores.append((rel, probas[j])) - scores = [score for rel, score in rels_with_scores] - if scores: - softmax_scores = softmax(scores) - rels_with_scores = [(rel, softmax_score) for (rel, score), softmax_score in - zip(rels_with_scores, softmax_scores)] + if question is not None: + n_batches = len(candidate_rels) // self.batch_size + int(len(candidate_rels) % self.batch_size > 0) + for i in range(n_batches): + questions_batch = [] + rels_labels_batch = [] + rels_batch = [] + for candidate_rel in candidate_rels[i * self.batch_size: (i + 1) * self.batch_size]: + if candidate_rel in self.rel_q2name: + questions_batch.append(question) + rels_batch.append(candidate_rel) + rels_labels_batch.append(self.rel_q2name[candidate_rel]) + if questions_batch: + probas = self.ranker(questions_batch, rels_labels_batch) + probas = [proba[1] for proba in probas] + for j, rel in enumerate(rels_batch): + rels_with_scores.append((rel, probas[j])) + if self.softmax: + scores = [score for rel, score in rels_with_scores] + softmax_scores = softmax(scores) + rels_with_scores = [(rel, softmax_score) for (rel, score), softmax_score in + zip(rels_with_scores, softmax_scores)] rels_with_scores = sorted(rels_with_scores, key=lambda x: x[1], reverse=True) return rels_with_scores[:self.rels_to_leave] diff --git a/deeppavlov/models/kbqa/sentence_answer.py b/deeppavlov/models/kbqa/sentence_answer.py index ea7042c819..847335deeb 100644 --- a/deeppavlov/models/kbqa/sentence_answer.py +++ b/deeppavlov/models/kbqa/sentence_answer.py @@ -12,13 +12,24 @@ # See the License for the specific language governing permissions and # limitations under the License. +import importlib import re from logging import getLogger +import pkg_resources import spacy log = getLogger(__name__) +# en_core_web_sm is installed and used by test_inferring_pretrained_model in the same interpreter session during tests. +# Spacy checks en_core_web_sm package presence with pkg_resources, but pkg_resources is initialized with interpreter, +# sot it doesn't see en_core_web_sm installed after interpreter initialization, so we use importlib.reload below. 
+ +if 'en-core-web-sm' not in pkg_resources.working_set.by_key.keys(): + importlib.reload(pkg_resources) + +# TODO: move nlp to sentence_answer, sentence_answer to rel_ranking_infer and revise en_core_web_sm requirement, +# TODO: make proper downloading with spacy.cli.download nlp = spacy.load('en_core_web_sm') pronouns = ["who", "what", "when", "where", "how"] diff --git a/deeppavlov/models/kbqa/template_matcher.py b/deeppavlov/models/kbqa/template_matcher.py index fdb589adba..001be79573 100644 --- a/deeppavlov/models/kbqa/template_matcher.py +++ b/deeppavlov/models/kbqa/template_matcher.py @@ -69,8 +69,8 @@ def save(self) -> None: raise NotImplementedError def __call__(self, question: str, entities_from_ner: List[str]) -> \ - Tuple[Union[List[str], list], list, Union[list, Any], Union[list, Any], Union[str, Any], Any, Union[ - str, Any]]: + Tuple[Union[List[str], list], list, Union[list, Any], Union[list, Any], Union[str, Any], Union[list, Any], + Union[str, Any], Union[list, Any], Union[str, Any]]: question = question.lower() question = self.sanitize(question) question_length = len(question) @@ -79,6 +79,7 @@ def __call__(self, question: str, entities_from_ner: List[str]) -> \ template_found = "" entity_types = [] template_answer = "" + answer_types = [] results = self.pool.map(RegexpMatcher(question), self.templates) results = functools.reduce(lambda x, y: x + y, results) replace_tokens = [("the uk", "united kingdom"), ("the us", "united states")] @@ -114,9 +115,11 @@ def __call__(self, question: str, entities_from_ner: List[str]) -> \ query_type = template["template_type"] entity_types = template.get("entity_types", []) template_answer = template.get("template_answer", "") + answer_types = template.get("answer_types", []) min_length = cur_len - return entities, types, relations, relation_dirs, query_type, entity_types, template_answer, template_found + return entities, types, relations, relation_dirs, query_type, entity_types, template_answer, answer_types, \ + template_found def sanitize(self, question: str) -> str: question = re.sub(r"^(a |the )", '', question) diff --git a/deeppavlov/models/kbqa/tree_to_sparql.py b/deeppavlov/models/kbqa/tree_to_sparql.py index 1793164cc5..b5ff26c44b 100644 --- a/deeppavlov/models/kbqa/tree_to_sparql.py +++ b/deeppavlov/models/kbqa/tree_to_sparql.py @@ -12,22 +12,25 @@ # See the License for the specific language governing permissions and # limitations under the License. 
+import re +from collections import defaultdict from io import StringIO -from typing import Any, List, Tuple, Dict, Union from logging import getLogger -from collections import defaultdict +from typing import Any, List, Tuple, Dict, Union import numpy as np import pymorphy2 -import re +from navec import Navec from scipy.sparse import csr_matrix +from slovnet import Syntax from udapi.block.read.conllu import Conllu from udapi.core.node import Node -from deeppavlov.core.models.component import Component -from deeppavlov.core.common.file import read_json from deeppavlov.core.commands.utils import expand_path +from deeppavlov.core.common.file import read_json from deeppavlov.core.common.registry import register +from deeppavlov.core.models.component import Component +from deeppavlov.core.models.serializable import Serializable log = getLogger(__name__) @@ -110,6 +113,57 @@ def make_sparse_matrix(self, words: List[str]): return matrix +@register('slovnet_syntax_parser') +class SlovnetSyntaxParser(Component, Serializable): + """Class for syntax parsing using Slovnet library""" + + def __init__(self, load_path: str, navec_filename: str, syntax_parser_filename: str, **kwargs): + super().__init__(save_path=None, load_path=load_path) + self.navec_filename = expand_path(navec_filename) + self.syntax_parser_filename = expand_path(syntax_parser_filename) + self.re_tokenizer = re.compile(r"[\w']+|[^\w ]") + self.load() + + def load(self) -> None: + navec = Navec.load(self.navec_filename) + self.syntax = Syntax.load(self.syntax_parser_filename) + self.syntax.navec(navec) + + def save(self) -> None: + pass + + def __call__(self, sentences, entity_offsets_batch): + sentences_tok = [] + for sentence, entity_offsets in zip(sentences, entity_offsets_batch): + for start, end in entity_offsets: + entity_old = sentence[start:end] + entity_new = entity_old.capitalize() + sentence = sentence.replace(entity_old, entity_new) + sentence = sentence.capitalize() + sentences_tok.append(re.findall(self.re_tokenizer, sentence)) + markup = list(self.syntax.map(sentences_tok)) + + processed_markup_batch = [] + for markup_elem in markup: + processed_markup = [] + ids, words, head_ids, rels = [], [], [], [] + for elem in markup_elem.tokens: + ids.append(elem.id) + words.append(elem.text) + head_ids.append(elem.head_id) + rels.append(elem.rel) + if "root" not in {rel.lower() for rel in rels}: + for n, (elem_id, head_id) in enumerate(zip(ids, head_ids)): + if elem_id == head_id: + rels[n] = "root" + head_ids[n] = 0 + for elem_id, word, head_id, rel in zip(ids, words, head_ids, rels): + processed_markup.append(f"{elem_id}\t{word}\t_\t_\t_\t_\t{head_id}\t{rel}\t_\t_") + processed_markup_batch.append("\n".join(processed_markup)) + + return processed_markup_batch + + @register('tree_to_sparql') class TreeToSparql(Component): """ @@ -163,9 +217,12 @@ def __call__(self, syntax_tree_batch: List[str], count = False for syntax_tree, positions in zip(syntax_tree_batch, positions_batch): log.debug(f"\n{syntax_tree}") - tree = Conllu(filehandle=StringIO(syntax_tree)).read_tree() - root = self.find_root(tree) - tree_desc = tree.descendants + try: + tree = Conllu(filehandle=StringIO(syntax_tree)).read_tree() + root = self.find_root(tree) + tree_desc = tree.descendants + except ValueError: + root = "" unknown_node = "" if root: log.debug(f"syntax tree info, root: {root.form}") diff --git a/deeppavlov/models/kbqa/type_define.py b/deeppavlov/models/kbqa/type_define.py new file mode 100644 index 0000000000..0376934c05 --- /dev/null +++ 
b/deeppavlov/models/kbqa/type_define.py @@ -0,0 +1,154 @@ +# Copyright 2017 Neural Networks and Deep Learning lab, MIPT +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import pickle +from typing import List + +import pymorphy2 +import spacy +from nltk.corpus import stopwords + +from deeppavlov.core.commands.utils import expand_path +from deeppavlov.core.common.registry import register + + +@register('answer_types_extractor') +class AnswerTypesExtractor: + """Class which defines answer types for the question""" + + def __init__(self, lang: str, types_filename: str, types_sets_filename: str, + num_types_to_return: int = 15, **kwargs): + """ + + Args: + lang: Russian or English + types_filename: filename with dictionary where keys are type ids and values are type labels + types_sets_filename: filename with dictionary where keys are NER tags and values are Wikidata types + corresponding to tags + num_types_to_return: how many answer types to return for each question + **kwargs: + """ + self.lang = lang + self.types_filename = str(expand_path(types_filename)) + self.types_sets_filename = str(expand_path(types_sets_filename)) + self.num_types_to_return = num_types_to_return + self.morph = pymorphy2.MorphAnalyzer() + if self.lang == "@en": + self.stopwords = set(stopwords.words("english")) + self.nlp = spacy.load("en_core_web_sm") + self.pronouns = ["what"] + elif self.lang == "@ru": + self.stopwords = set(stopwords.words("russian")) + self.nlp = spacy.load("ru_core_news_sm") + self.pronouns = ["какой", "каком"] + with open(self.types_filename, 'rb') as fl: + self.types_dict = pickle.load(fl) + with open(self.types_sets_filename, 'rb') as fl: + self.types_sets = pickle.load(fl) + + def __call__(self, questions_batch: List[str], entity_substr_batch: List[List[str]], + tags_batch: List[List[str]], types_substr_batch: List[List[str]] = None): + types_sets_batch = [] + if types_substr_batch is None: + types_substr_batch = [] + for question, entity_substr_list in zip(questions_batch, entity_substr_batch): + types_substr = [] + type_noun = "" + doc = self.nlp(question) + token_pos_dict = {} + for n, token in enumerate(doc): + token_pos_dict[token.text] = n + for token in doc: + if token.text.lower() in self.pronouns and token.head.dep_ in ["attr", "nsubj"]: + type_noun = token.head.text + if not any([type_noun in entity_substr.lower() for entity_substr in entity_substr_list]): + types_substr.append(type_noun) + break + if type_noun: + for token in doc: + if token.head.text == type_noun and token.dep_ in ["amod", "compound"]: + type_adj = token.text + if not any([type_adj.lower() in entity_substr.lower() for entity_substr in + entity_substr_list]): + types_substr.append(type_adj) + break + elif token.head.text == type_noun and token.dep_ == "prep": + if len(list(token.children)) == 1 \ + and not any([list(token.children)[0] in entity_substr.lower() + for entity_substr in entity_substr_list]): + types_substr += [token.text, list(token.children)[0]] + elif any([word in question for word in self.pronouns]): 
+ for token in doc: + if token.dep_ == "nsubj" and not any([token.text in entity_substr.lower() + for entity_substr in entity_substr_list]): + types_substr.append(token.text) + types_substr = [(token, token_pos_dict[token]) for token in types_substr] + types_substr = sorted(types_substr, key=lambda x: x[1]) + types_substr = " ".join([elem[0] for elem in types_substr]) + types_substr_batch.append(types_substr) + for types_substr in types_substr_batch: + types_substr_tokens = types_substr.split() + types_substr_tokens = [tok for tok in types_substr_tokens if tok not in self.stopwords] + if self.lang == "@ru": + types_substr_tokens = [self.morph.parse(tok)[0].normal_form for tok in types_substr_tokens] + types_substr_tokens = set(types_substr_tokens) + types_scores = [] + for entity in self.types_dict: + labels, cnt = self.types_dict[entity] + cur_cnts = [] + for label in labels: + label_tokens = label.lower().split() + if len(types_substr_tokens) == 1 and len(label_tokens) == 2 and \ + list(types_substr_tokens)[0] == label_tokens[0]: + cur_cnts.append(0.3) + else: + inters = types_substr_tokens.intersection(set(label_tokens)) + cur_cnts.append(len(inters) / max(len(types_substr_tokens), len(label_tokens))) + + types_scores.append([entity, max(cur_cnts), cnt]) + types_scores = sorted(types_scores, key=lambda x: (x[1], x[2]), reverse=True) + cur_types = [elem[0] for elem in types_scores if elem[1] > 0][:self.num_types_to_return] + types_sets_batch.append(cur_types) + + for n, (question, types_sets) in enumerate(zip(questions_batch, types_sets_batch)): + question = question.lower() + if not types_sets: + if self.lang == "@ru": + if question.startswith("кто"): + types_sets_batch[n] = self.types_sets["PER"] + elif question.startswith("где"): + types_sets_batch[n] = self.types_sets["LOC"] + elif self.lang == "@en": + if question.startswith("who"): + types_sets_batch[n] = self.types_sets["PER"] + elif question.startswith("where"): + types_sets_batch[n] = self.types_sets["LOC"] + + new_entity_substr_batch, new_entity_offsets_batch, new_tags_batch = [], [], [] + for question, entity_substr_list, tags_list in zip(questions_batch, entity_substr_batch, tags_batch): + new_entity_substr, new_tags = [], [] + if not entity_substr_list: + doc = self.nlp(question) + for token in doc: + if token.dep_ == "nsubj": + new_entity_substr.append(token.text) + new_tags.append("MISC") + break + new_entity_substr_batch.append(new_entity_substr) + new_tags_batch.append(new_tags) + else: + new_entity_substr_batch.append(entity_substr_list) + new_tags_batch.append(tags_list) + + return types_sets_batch, new_entity_substr_batch, new_tags_batch diff --git a/deeppavlov/models/kbqa/utils.py b/deeppavlov/models/kbqa/utils.py index 10e76c4f56..1af3cb6912 100644 --- a/deeppavlov/models/kbqa/utils.py +++ b/deeppavlov/models/kbqa/utils.py @@ -14,7 +14,7 @@ import re import itertools -from typing import List, Tuple +from typing import List def extract_year(question_tokens: List[str], question: str) -> str: @@ -84,13 +84,13 @@ def fill_query(query: List[str], entity_comb: List[str], type_comb: List[str], r rel_comb: ["P17"] ''' query = " ".join(query) - map_query_str_to_wikidata = [("P0", "http://schema.org/description"), - ("wd:", "http://www.wikidata.org/entity/"), - ("wdt:", "http://www.wikidata.org/prop/direct/"), - (" p:", " http://www.wikidata.org/prop/"), - ("wdt:", "http://www.wikidata.org/prop/direct/"), - ("ps:", "http://www.wikidata.org/prop/statement/"), - ("pq:", "http://www.wikidata.org/prop/qualifier/")] + 
map_query_str_to_wikidata = [("P0", "http://wd"), + ("P00", "http://wl"), + ("wd:", "http://we/"), + ("wdt:", "http://wpd/"), + (" p:", " http://wp/"), + ("ps:", "http://wps/"), + ("pq:", "http://wpq/")] for query_str, wikidata_str in map_query_str_to_wikidata: query = query.replace(query_str, wikidata_str) @@ -100,45 +100,7 @@ def fill_query(query: List[str], entity_comb: List[str], type_comb: List[str], r query = query.replace(f"t{n + 1}", entity_type) for n, (rel, score) in enumerate(rel_comb[:-1]): query = query.replace(f"r{n + 1}", rel) - query = query.replace("http://www.wikidata.org/prop/direct/P0", "http://schema.org/description") + query = query.replace("http://wpd/P0", "http://wd") + query = query.replace("http://wpd/P00", "http://wl") query = query.split(' ') return query - - -def fill_online_query(query: List[str], entity_comb: List[str], type_comb: List[str], - rel_comb: List[str], rels_to_replace: List[str], - rels_for_filter: List[str], rel_list_for_filter: List[List[str]]) -> Tuple[str, List[str]]: - rel_list_for_filter = [[rel for rel, score in rel_list] for rel_list in rel_list_for_filter] - for n, entity in enumerate(entity_comb[:-1]): - query = query.replace(f"e{n + 1}", entity) - for n, entity_type in enumerate(type_comb[:-1]): # type_entity - query = query.replace(f"t{n + 1}", entity_type) - for n, (rel, score) in enumerate(rel_comb[:-1]): - query = query.replace(rels_to_replace[n], rel) - - candidate_rel_filters = [] - new_rels = [] - if rels_for_filter: - n = 0 - for rel, candidate_rels in zip(rels_for_filter, rel_list_for_filter): - rel_types = re.findall(f" ([\S]+:){rel}", query) - for rel_type in rel_types: - new_rel = f"?p{n + 1}" - query = query.replace(f'{rel_type}{rel}', new_rel) - new_rels.append(new_rel) - candidate_rels_filled = [f"{new_rel} = {rel_type}{rel_value}" for rel_value in candidate_rels] - candidate_rel_str = " || ".join(candidate_rels_filled) - candidate_rel_filters.append(f"({candidate_rel_str})") - n += 1 - - if "filter" in query: - query = query.replace("filter(", f"filter({'&&'.join(candidate_rel_filters)}&&") - else: - query = query.replace(" }", f" filter({'&&'.join(candidate_rel_filters)}) }}") - - query = query.replace(" where", f" {' '.join(new_rels)} where") - if rel_list_for_filter[0][0] == "P0" and len(entity_comb) == 2: - query = f"select ?ent ?p1 where {{ wd:{entity_comb[0]} ?p1" + \ - "?ent filter((?p1=schema:description)&&(lang(?ent)='en'))}}" - - return query, new_rels diff --git a/deeppavlov/models/kbqa/wiki_parser.py b/deeppavlov/models/kbqa/wiki_parser.py index 82323c3645..5368708200 100644 --- a/deeppavlov/models/kbqa/wiki_parser.py +++ b/deeppavlov/models/kbqa/wiki_parser.py @@ -29,9 +29,13 @@ @register('wiki_parser') class WikiParser: - """This class extract relations, objects or triplets from Wikidata HDT file""" + """This class extract relations, objects or triplets from Wikidata HDT file.""" - def __init__(self, wiki_filename: str, file_format: str = "hdt", lang: str = "@en", **kwargs) -> None: + def __init__(self, wiki_filename: str, + file_format: str = "hdt", + prefixes: Dict[str, Union[str, Dict[str, str]]] = None, + max_comb_num: int = 1e6, + lang: str = "@en", **kwargs) -> None: """ Args: @@ -40,7 +44,22 @@ def __init__(self, wiki_filename: str, file_format: str = "hdt", lang: str = "@e lang: Russian or English language **kwargs: """ - self.description_rel = "http://schema.org/description" + + if prefixes is None: + prefixes = { + "entity": "http://we", + "label": "http://wl", + "alias": "http://wal", + 
"description": "http://wd", + "rels": { + "direct": "http://wpd", + "no_type": "http://wp", + "statement": "http://wps", + "qualifier": "http://wpq" + }, + "statement": "http://ws" + } + self.prefixes = prefixes self.file_format = file_format self.wiki_filename = str(expand_path(wiki_filename)) if self.file_format == "hdt": @@ -50,39 +69,113 @@ def __init__(self, wiki_filename: str, file_format: str = "hdt", lang: str = "@e self.parsed_document = {} else: raise ValueError("Unsupported file format") + + self.max_comb_num = max_comb_num self.lang = lang def __call__(self, parser_info_list: List[str], queries_list: List[Any]) -> List[Any]: + wiki_parser_output = self.execute_queries_list(parser_info_list, queries_list) + return wiki_parser_output + + def execute_queries_list(self, parser_info_list: List[str], queries_list: List[Any]): wiki_parser_output = [] + query_answer_types = [] for parser_info, query in zip(parser_info_list, queries_list): if parser_info == "query_execute": - *query_to_execute, return_if_found = query - candidate_output = self.execute(*query_to_execute) + candidate_output = [] + try: + what_return, query_seq, filter_info, order_info, answer_types, rel_types, return_if_found = query + if answer_types: + query_answer_types = answer_types + candidate_output = self.execute(what_return, query_seq, filter_info, order_info, + query_answer_types, rel_types) + except: + log.info("Wrong arguments are passed to wiki_parser") wiki_parser_output.append(candidate_output) - if return_if_found and candidate_output: - return wiki_parser_output elif parser_info == "find_rels": - wiki_parser_output += self.find_rels(*query) + rels = [] + try: + rels = self.find_rels(*query) + except: + log.info("Wrong arguments are passed to wiki_parser") + wiki_parser_output += rels + elif parser_info == "find_object": + objects = [] + try: + objects = self.find_object(*query) + except: + log.info("Wrong arguments are passed to wiki_parser") + wiki_parser_output.append(objects) + elif parser_info == "check_triplet": + check_res = False + try: + check_res = self.check_triplet(*query) + except: + log.info("Wrong arguments are passed to wiki_parser") + wiki_parser_output.append(check_res) elif parser_info == "find_label": - wiki_parser_output.append(self.find_label(*query)) + label = "" + try: + label = self.find_label(*query) + except: + log.info("Wrong arguments are passed to wiki_parser") + wiki_parser_output.append(label) + elif parser_info == "find_types": + types = [] + try: + types = self.find_types(query) + except: + log.info("Wrong arguments are passed to wiki_parser") + wiki_parser_output.append(types) elif parser_info == "find_triplets": if self.file_format == "hdt": - tr, c = self.document.search_triples(*query) - wiki_parser_output.append(list(tr)) + triplets = [] + try: + triplets_forw, c = self.document.search_triples(f"{self.prefixes['entity']}/{query}", "", "") + triplets.extend([triplet for triplet in triplets_forw + if not triplet[2].startswith(self.prefixes["statement"])]) + triplets_backw, c = self.document.search_triples("", "", f"{self.prefixes['entity']}/{query}") + triplets.extend([triplet for triplet in triplets_backw + if not triplet[0].startswith(self.prefixes["statement"])]) + except: + log.info("Wrong arguments are passed to wiki_parser") + wiki_parser_output.append(list(triplets)) else: - wiki_parser_output.append(self.document.get(query, {})) + triplets = {} + try: + triplets = self.document.get(query, {}) + except: + log.info("Wrong arguments are passed to wiki_parser") + 
uncompressed_triplets = {} + if triplets: + if "forw" in triplets: + uncompressed_triplets["forw"] = self.uncompress(triplets["forw"]) + if "backw" in triplets: + uncompressed_triplets["backw"] = self.uncompress(triplets["backw"]) + wiki_parser_output.append(uncompressed_triplets) + elif parser_info == "find_triplets_for_rel": + found_triplets = [] + try: + found_triplets, c = \ + self.document.search_triples("", f"{self.prefixes['rels']['direct']}/{query}", "") + except: + log.info("Wrong arguments are passed to wiki_parser") + wiki_parser_output.append(list(found_triplets)) elif parser_info == "parse_triplets" and self.file_format == "pickle": for entity in query: self.parse_triplets(entity) wiki_parser_output.append("ok") else: raise ValueError("Unsupported query type") + return wiki_parser_output def execute(self, what_return: List[str], query_seq: List[List[str]], filter_info: List[Tuple[str]] = None, - order_info: namedtuple = None) -> List[List[str]]: + order_info: namedtuple = None, + answer_types: List[str] = None, + rel_types: List[str] = None) -> List[List[str]]: """ Let us consider an example of the question "What is the deepest lake in Russia?" with the corresponding SPARQL query @@ -98,26 +191,27 @@ def execute(self, what_return: List[str], """ extended_combs = [] combs = [] - if "qualifier" not in filter_info: - for n, query in enumerate(query_seq): - unknown_elem_positions = [(pos, elem) for pos, elem in enumerate(query) if elem.startswith('?')] - """ - n = 0, query = ["?ent", "http://www.wikidata.org/prop/direct/P17", - "http://www.wikidata.org/entity/Q159"] - unknown_elem_positions = ["?ent"] - n = 1, query = ["?ent", "http://www.wikidata.org/prop/direct/P31", - "http://www.wikidata.org/entity/Q23397"] - unknown_elem_positions = [(0, "?ent")] - n = 2, query = ["?ent", "http://www.wikidata.org/prop/direct/P4511", "?obj"] - unknown_elem_positions = [(0, "?ent"), (2, "?obj")] - """ - if n == 0: - combs = self.search(query, unknown_elem_positions) - # combs = [{"?ent": "http://www.wikidata.org/entity/Q5513"}, ...] - else: - if combs: - known_elements = [] - extended_combs = [] + + for n, (query, rel_type) in enumerate(zip(query_seq, rel_types)): + unknown_elem_positions = [(pos, elem) for pos, elem in enumerate(query) if elem.startswith('?')] + """ + n = 0, query = ["?ent", "http://www.wikidata.org/prop/direct/P17", + "http://www.wikidata.org/entity/Q159"] + unknown_elem_positions = ["?ent"] + n = 1, query = ["?ent", "http://www.wikidata.org/prop/direct/P31", + "http://www.wikidata.org/entity/Q23397"] + unknown_elem_positions = [(0, "?ent")] + n = 2, query = ["?ent", "http://www.wikidata.org/prop/direct/P4511", "?obj"] + unknown_elem_positions = [(0, "?ent"), (2, "?obj")] + """ + if n == 0: + combs = self.search(query, unknown_elem_positions, rel_type) + # combs = [{"?ent": "http://www.wikidata.org/entity/Q5513"}, ...] 
+            else:
+                if combs:
+                    known_elements = []
+                    extended_combs = []
+                    if query[0].startswith("?"):
                        for elem in query:
                            if elem in combs[0].keys():
                                known_elements.append(elem)
@@ -139,14 +233,21 @@ def execute(self, what_return: List[str],
                            known_values = [comb[known_elem] for known_elem in known_elements]
                            for known_elem, known_value in zip(known_elements, known_values):
                                filled_query = [elem.replace(known_elem, known_value) for elem in query]
-                                new_combs = self.search(filled_query, unknown_elem_positions)
+                                new_combs = self.search(filled_query, unknown_elem_positions, rel_type)
                                for new_comb in new_combs:
                                    extended_combs.append({**comb, **new_comb})
-            combs = extended_combs
+                    else:
+                        new_combs = self.search(query, unknown_elem_positions, rel_type)
+                        for comb in combs:
+                            for new_comb in new_combs:
+                                extended_combs.append({**comb, **new_comb})
+            combs = extended_combs
 
        if combs:
            if filter_info:
                for filter_elem, filter_value in filter_info:
+                    if filter_value == "qualifier":
+                        filter_value = "wpq/"
                    combs = [comb for comb in combs if filter_value in comb[filter_elem]]
 
            if order_info and not isinstance(order_info, list) and order_info.variable is not None:
@@ -154,9 +255,9 @@ def execute(self, what_return: List[str],
                sort_elem = order_info.variable
                for i in range(len(combs)):
                    value_str = combs[i][sort_elem].split('^^')[0].strip('"')
-                    if value_str.endswith("T00:00:00Z"):
-                        value_str = value_str.strip("T00:00:00Z")
-                        combs[i][sort_elem] = value_str
+                    fnd = re.findall(r"[\d]{3,4}-[\d]{1,2}-[\d]{1,2}", value_str)
+                    if fnd:
+                        combs[i][sort_elem] = fnd[0]
                    else:
                        combs[i][sort_elem] = float(value_str)
                combs = sorted(combs, key=lambda x: x[sort_elem], reverse=reverse)
@@ -167,17 +268,35 @@ def execute(self, what_return: List[str],
            else:
                combs = [[elem[key] for key in what_return] for elem in combs]
 
+            if answer_types:
+                if answer_types == ["date"]:
+                    combs = [[entity for entity in comb
+                              if re.findall(r"[\d]{3,4}-[\d]{1,2}-[\d]{1,2}", entity)] for comb in combs]
+                else:
+                    answer_types = set(answer_types)
+                    combs = [[entity for entity in comb
+                              if answer_types.intersection(self.find_types(entity))] for comb in combs]
+                combs = [comb for comb in combs if any([entity for entity in comb])]
+
        return combs
 
-    def search(self, query: List[str], unknown_elem_positions: List[Tuple[int, str]]) -> List[Dict[str, str]]:
+    def search(self, query: List[str], unknown_elem_positions: List[Tuple[int, str]], rel_type) -> List[Dict[str, str]]:
        query = list(map(lambda elem: "" if elem.startswith('?') else elem, query))
        subj, rel, obj = query
        if self.file_format == "hdt":
-            triplets, c = self.document.search_triples(subj, rel, obj)
-            if rel == self.description_rel:
-                triplets = [triplet for triplet in triplets if triplet[2].endswith(self.lang)]
-            combs = [{elem: triplet[pos] for pos, elem in unknown_elem_positions} for triplet in triplets]
+            combs = []
+            triplets, cnt = self.document.search_triples(subj, rel, obj)
+            if cnt < self.max_comb_num:
+                if rel == self.prefixes["description"] or rel == self.prefixes["label"]:
+                    triplets = [triplet for triplet in triplets if triplet[2].endswith(self.lang)]
+                    combs = [{elem: triplet[pos] for pos, elem in unknown_elem_positions} for triplet in triplets]
+                else:
+                    combs = [{elem: triplet[pos] for pos, elem in unknown_elem_positions} for triplet in triplets
+                             if triplet[1].startswith(self.prefixes["rels"][rel_type])]
+            else:
+                log.debug("max comb num exceeded")
        else:
+            triplets = []
            if subj:
                subj, triplets = self.find_triplets(subj, "forw")
                triplets = [[subj, triplet[0], obj] for triplet in triplets 
for obj in triplet[1:]] @@ -185,34 +304,39 @@ def search(self, query: List[str], unknown_elem_positions: List[Tuple[int, str]] obj, triplets = self.find_triplets(obj, "backw") triplets = [[subj, triplet[0], obj] for triplet in triplets for subj in triplet[1:]] if rel: - if rel == self.description_rel: + if rel == self.prefixes["description"]: triplets = [triplet for triplet in triplets if triplet[1] == "descr_en"] else: rel = rel.split('/')[-1] triplets = [triplet for triplet in triplets if triplet[1] == rel] combs = [{elem: triplet[pos] for pos, elem in unknown_elem_positions} for triplet in triplets] + return combs def find_label(self, entity: str, question: str) -> str: entity = str(entity).replace('"', '') if self.file_format == "hdt": - if entity.startswith("Q"): + if entity.startswith("Q") or entity.startswith("P"): # example: "Q5513" - entity = "http://www.wikidata.org/entity/" + entity + entity = f"{self.prefixes['entity']}/{entity}" # "http://www.wikidata.org/entity/Q5513" - if entity.startswith("http://www.wikidata.org/entity/"): - labels, c = self.document.search_triples(entity, "http://www.w3.org/2000/01/rdf-schema#label", "") + if entity.startswith(self.prefixes["entity"]): + labels, c = self.document.search_triples(entity, self.prefixes["label"], "") # labels = [["http://www.wikidata.org/entity/Q5513", "http://www.w3.org/2000/01/rdf-schema#label", # '"Lake Baikal"@en'], ...] for label in labels: if label[2].endswith(self.lang): - found_label = label[2].strip(self.lang).replace('"', '') + found_label = label[2].strip(self.lang).replace('"', '').replace('$', ' ').replace(' ', ' ') + return found_label + for label in labels: + if label[2].endswith("@en"): + found_label = label[2].strip("@en").replace('"', '').replace('$', ' ').replace(' ', ' ') return found_label elif entity.endswith(self.lang): # entity: '"Lake Baikal"@en' - entity = entity[:-3] + entity = entity[:-3].replace('$', ' ').replace(' ', ' ') return entity elif "^^" in entity: @@ -224,14 +348,17 @@ def find_label(self, entity: str, question: str) -> str: entity = entity.split("^^")[0] for token in ["T00:00:00Z", "+"]: entity = entity.replace(token, '') - entity = self.format_date(entity, question) + entity = self.format_date(entity, question).replace('$', '') + return entity elif entity.isdigit(): + entity = str(entity).replace('.', ',') return entity + if self.file_format == "pickle": if entity: - if entity.startswith("Q"): + if entity.startswith("Q") or entity.startswith("P"): triplets = self.document.get(entity, {}).get("forw", []) triplets = self.uncompress(triplets) for triplet in triplets: @@ -244,48 +371,142 @@ def find_label(self, entity: str, question: str) -> str: return "Not Found" def format_date(self, entity, question): + dates_dict = {"January": "января", "February": "февраля", "March": "марта", "April": "апреля", "May": "мая", + "June": "июня", "July": "июля", "August": "августа", "September": "сентября", + "October": "октября", + "November": "ноября", "December": "декабря"} date_info = re.findall("([\d]{3,4})-([\d]{1,2})-([\d]{1,2})", entity) if date_info: year, month, day = date_info[0] - if "how old" in question.lower(): + if "how old" in question.lower() or "сколько лет" in question.lower(): entity = datetime.datetime.now().year - int(year) + elif "в каком году" in question.lower(): + entity = year + elif "в каком месяце" in question.lower(): + entity = month elif day != "00": date = datetime.datetime.strptime(f"{year}-{month}-{day}", "%Y-%m-%d") entity = date.strftime("%d %B %Y") else: entity = 
year - return entity + if self.lang == "@ru": + for mnth, mnth_replace in dates_dict.items(): + entity = entity.replace(mnth, mnth_replace) + return str(entity) entity = entity.lstrip('+-') return entity def find_alias(self, entity: str) -> List[str]: aliases = [] - if entity.startswith("http://www.wikidata.org/entity/"): - labels, cardinality = self.document.search_triples(entity, - "http://www.w3.org/2004/02/skos/core#altLabel", "") + if entity.startswith(self.prefixes["entity"]): + labels, cardinality = self.document.search_triples(entity, self.prefixes["alias"], "") aliases = [label[2].strip(self.lang).strip('"') for label in labels if label[2].endswith(self.lang)] return aliases - def find_rels(self, entity: str, direction: str, rel_type: str = "no_type") -> List[str]: + def find_rels(self, entity: str, direction: str, rel_type: str = "no_type", save: bool = False) -> List[str]: rels = [] if self.file_format == "hdt": + if not rel_type: + rel_type = "direct" if direction == "forw": - query = [f"http://www.wikidata.org/entity/{entity}", "", ""] + query = [f"{self.prefixes['entity']}/{entity}", "", ""] else: - query = ["", "", f"http://www.wikidata.org/entity/{entity}"] + query = ["", "", f"{self.prefixes['entity']}/{entity}"] triplets, c = self.document.search_triples(*query) - if rel_type != "no_type": - start_str = f"http://www.wikidata.org/prop/{rel_type}" - else: - start_str = "http://www.wikidata.org/prop/P" - rels = [triplet[1] for triplet in triplets if triplet[1].startswith(start_str)] + start_str = f"{self.prefixes['rels'][rel_type]}/P" + rels = {triplet[1] for triplet in triplets if triplet[1].startswith(start_str)} + rels = list(rels) if self.file_format == "pickle": triplets = self.document.get(entity, {}).get(direction, []) triplets = self.uncompress(triplets) rels = [triplet[0] for triplet in triplets if triplet[0].startswith("P")] return rels + def find_object(self, entity: str, rel: str, direction: str) -> List[str]: + objects = [] + if not direction: + direction = "forw" + if self.file_format == "hdt": + entity = f"{self.prefixes['entity']}/{entity.split('/')[-1]}" + rel = f"{self.prefixes['rels']['direct']}/{rel}" + if direction == "forw": + triplets, cnt = self.document.search_triples(entity, rel, "") + if cnt < self.max_comb_num: + objects.extend([triplet[2].split('/')[-1] for triplet in triplets]) + else: + triplets, cnt = self.document.search_triples("", rel, entity) + objects.extend([triplet[0].split('/')[-1] for triplet in triplets]) + else: + entity = entity.split('/')[-1] + rel = rel.split('/')[-1] + triplets = self.document.get(entity, {}).get(direction, []) + triplets = self.uncompress(triplets) + for found_rel, *objs in triplets: + if rel == found_rel: + objects.extend(objs) + return objects + + def check_triplet(self, subj: str, rel: str, obj: str) -> bool: + if self.file_format == "hdt": + subj = f"{self.prefixes['entity']}/{subj}" + rel = f"{self.prefixes['rels']['direct']}/{rel}" + obj = f"{self.prefixes['entity']}/{obj}" + triplets, cnt = self.document.search_triples(subj, rel, obj) + if cnt > 0: + return True + else: + return False + else: + subj = subj.split('/')[-1] + rel = rel.split('/')[-1] + obj = obj.split('/')[-1] + triplets = self.document.get(subj, {}).get("forw", []) + triplets = self.uncompress(triplets) + for found_rel, *objects in triplets: + if found_rel == rel: + for found_obj in objects: + if found_obj == obj: + return True + return False + + def find_types(self, entity: str): + types = [] + if self.file_format == "hdt": + if not
entity.startswith("http"): + entity = f"{self.prefixes['entity']}/{entity}" + tr, c = self.document.search_triples(entity, f"{self.prefixes['rels']['direct']}/P31", "") + types = [triplet[2].split('/')[-1] for triplet in tr] + if "Q5" in types: + tr, c = self.document.search_triples(entity, f"{self.prefixes['rels']['direct']}/P106", "") + types += [triplet[2].split('/')[-1] for triplet in tr] + if self.file_format == "pickle": + entity = entity.split('/')[-1] + triplets = self.document.get(entity, {}).get("forw", []) + triplets = self.uncompress(triplets) + for triplet in triplets: + if triplet[0] == "P31": + types = triplet[1:] + types = set(types) + return types + + def find_subclasses(self, entity: str): + types = [] + if self.file_format == "hdt": + if not entity.startswith("http"): + entity = f"{self.prefixes['entity']}/{entity}" + tr, c = self.document.search_triples(entity, f"{self.prefixes['rels']['direct']}/P279", "") + types = [triplet[2].split('/')[-1] for triplet in tr] + if self.file_format == "pickle": + entity = entity.split('/')[-1] + triplets = self.document.get(entity, {}).get("forw", []) + triplets = self.uncompress(triplets) + for triplet in triplets: + if triplet[0] == "P279": + types = triplet[1:] + types = set(types) + return types + def uncompress(self, triplets: Union[str, List[List[str]]]) -> List[List[str]]: if isinstance(triplets, str): triplets = triplets.split('\t') diff --git a/deeppavlov/models/kbqa/wiki_parser_online.py b/deeppavlov/models/kbqa/wiki_parser_online.py deleted file mode 100644 index 4795ce982a..0000000000 --- a/deeppavlov/models/kbqa/wiki_parser_online.py +++ /dev/null @@ -1,107 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from logging import getLogger -from time import sleep -from typing import List, Dict, Any - -import requests -from requests.exceptions import ConnectionError, ConnectTimeout, ReadTimeout - -from deeppavlov import __version__ as dp_version -from deeppavlov.core.common.registry import register - -log = getLogger(__name__) - - -@register('wiki_parser_online') -class WikiParserOnline: - """This class extract relations or labels from Wikidata query service""" - - def __init__(self, url: str, timeout: float = 0.5, **kwargs) -> None: - self.url = url - self.timeout = timeout - self.agent_header = {'User-Agent': f'wiki_parser_online/{dp_version} (https://deeppavlov.ai;' - f' info@deeppavlov.ai) deeppavlov/{dp_version}'} - - def __call__(self, parser_info_list: List[str], queries_list: List[Any]) -> List[Any]: - wiki_parser_output = [] - for parser_info, query in zip(parser_info_list, queries_list): - if parser_info == "query_execute": - query_to_execute, return_if_found = query - candidate_output = self.get_answer(query_to_execute) - wiki_parser_output.append(candidate_output) - if return_if_found and candidate_output: - return wiki_parser_output - elif parser_info == "find_rels": - wiki_parser_output += self.find_rels(*query) - elif parser_info == "find_label": - wiki_parser_output.append(self.find_label(*query)) - else: - raise ValueError("Unsupported query type") - return wiki_parser_output - - def get_answer(self, query: str) -> List[Dict[str, Dict[str, str]]]: - data = [] - for i in range(5): - try: - resp = requests.get(self.url, - params={'query': query, 'format': 'json'}, - timeout=self.timeout, - headers=self.agent_header) - if resp.status_code != 200: - continue - data_0 = resp.json() - if "results" in data_0.keys(): - data = data_0['results']['bindings'] - elif "boolean" in data_0.keys(): - data = data_0['boolean'] - break - except (ConnectTimeout, ReadTimeout) as e: - log.warning(f'TimeoutError: {repr(e)}') - except ConnectionError as e: - log.warning(f'Connection error: {repr(e)}\nWaiting 1s...') - sleep(1) - return data - - def find_label(self, entity: str, question: str) -> str: - entity = str(entity).replace('"', '') - if entity.startswith("http://www.wikidata.org/entity/Q"): - entity = entity.split('/')[-1] - if entity.startswith("Q"): - query = f"SELECT DISTINCT ?label WHERE {{ wd:{entity} rdfs:label ?label . FILTER (lang(?label) = 'en') }}" - labels = self.get_answer(query) - if labels: - labels = [label["label"]["value"] for label in labels] - return labels[0] - elif entity.endswith("T00:00:00Z"): - return entity.split('T00:00:00Z')[0] - else: - return entity - - def find_rels(self, entity: str, direction: str, rel_type: str = "no_type") -> List[str]: - if direction == "forw": - query = f"SELECT DISTINCT ?rel WHERE {{ wd:{entity} ?rel ?obj . }}" - else: - query = f"SELECT DISTINCT ?rel WHERE {{ ?subj ?rel wd:{entity} . 
}}" - rels = self.get_answer(query) - if rels: - rels = [rel["rel"]["value"] for rel in rels] - - if rel_type != "no_type": - start_str = f"http://www.wikidata.org/prop/{rel_type}" - else: - start_str = "http://www.wikidata.org/prop/P" - rels = [rel for rel in rels if rel.startswith(start_str)] - return rels diff --git a/deeppavlov/models/morpho_tagger/__init__.py b/deeppavlov/models/morpho_tagger/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/morpho_tagger/__main__.py b/deeppavlov/models/morpho_tagger/__main__.py deleted file mode 100644 index 9649755157..0000000000 --- a/deeppavlov/models/morpho_tagger/__main__.py +++ /dev/null @@ -1,25 +0,0 @@ -import argparse - -from deeppavlov.core.common.file import find_config -from deeppavlov.download import deep_download -from deeppavlov.models.morpho_tagger.common import predict_with_model - -parser = argparse.ArgumentParser() -parser.add_argument("config_path", help="path to file with prediction configuration") -parser.add_argument("-d", "--download", action="store_true", help="download model components") -parser.add_argument("-b", "--batch-size", dest="batch_size", default=16, help="inference batch size", type=int) -parser.add_argument("-f", "--input-file", dest="file_path", default=None, help="path to the input file", type=str) -parser.add_argument("-i", "--input-format", dest="input_format", default="ud", - help="input format ('text' for untokenized text, 'ud' or 'vertical'", type=str) -parser.add_argument("-o", "--output-format", dest="output_format", default="basic", - help="input format ('basic', 'ud' or 'conllu' (the last two mean the same)", type=str) - -if __name__ == "__main__": - args = parser.parse_args() - config_path = find_config(args.config_path) - if args.download: - deep_download(config_path) - answer = predict_with_model(config_path, infile=args.file_path, input_format=args.input_format, - batch_size=args.batch_size, output_format=args.output_format) - for elem in answer: - print(elem) diff --git a/deeppavlov/models/morpho_tagger/cells.py b/deeppavlov/models/morpho_tagger/cells.py deleted file mode 100644 index f781ba53d6..0000000000 --- a/deeppavlov/models/morpho_tagger/cells.py +++ /dev/null @@ -1,179 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import numpy as np -import tensorflow as tf -import tensorflow.keras.backend as K -from tensorflow.keras.initializers import Constant -from tensorflow.keras.layers import InputSpec, Layer, Lambda, Dropout, Multiply - -INFTY = -100 - - -class Highway(Layer): - - def __init__(self, activation=None, bias_initializer=-1, **kwargs): - super().__init__(**kwargs) - self.activation = tf.keras.activations.get(activation) - self.bias_initializer = bias_initializer - if isinstance(self.bias_initializer, int): - self.bias_initializer = Constant(self.bias_initializer) - self.input_spec = [InputSpec(min_ndim=2)] - - def build(self, input_shape): - assert len(input_shape) >= 2 - input_dim = input_shape[-1] - - self.gate_kernel = self.add_weight( - shape=(input_dim, input_dim), initializer='uniform', name='gate_kernel') - self.gate_bias = self.add_weight( - shape=(input_dim,), initializer=self.bias_initializer, name='gate_bias') - self.dense_kernel = self.add_weight( - shape=(input_dim, input_dim), initializer='uniform', name='dense_kernel') - self.dense_bias = self.add_weight( - shape=(input_dim,), initializer=self.bias_initializer, name='dense_bias') - self.input_spec = InputSpec(min_ndim=2, axes={-1: input_dim}) - self.built = True - - def call(self, inputs, **kwargs): - gate = K.dot(inputs, self.gate_kernel) - gate = K.bias_add(gate, self.gate_bias, data_format="channels_last") - gate = self.activation(gate) - new_value = K.dot(inputs, self.dense_kernel) - new_value = K.bias_add(new_value, self.dense_bias, data_format="channels_last") - return gate * new_value + (1.0 - gate) * inputs - - def compute_output_shape(self, input_shape): - return input_shape - - -def weighted_sum(first, second, sigma, first_threshold=-np.inf, second_threshold=np.inf): - logit_probs = first * sigma + second * (1.0 - sigma) - infty_tensor = K.ones_like(logit_probs) * INFTY - logit_probs = K.switch(K.greater(first, first_threshold), logit_probs, infty_tensor) - logit_probs = K.switch(K.greater(second, second_threshold), logit_probs, infty_tensor) - return logit_probs - - -class WeightedCombinationLayer(Layer): - - """ - A class for weighted combination of probability distributions - """ - - def __init__(self, first_threshold=None, second_threshold=None, - use_dimension_bias=False, use_intermediate_layer=False, - intermediate_dim=64, intermediate_activation=None, - from_logits=False, return_logits=False, - bias_initializer=1.0, **kwargs): - # if 'input_shape' not in kwargs: - # kwargs['input_shape'] = [(None, input_dim,), (None, input_dim)] - super(WeightedCombinationLayer, self).__init__(**kwargs) - self.first_threshold = first_threshold if first_threshold is not None else INFTY - self.second_threshold = second_threshold if second_threshold is not None else INFTY - self.use_dimension_bias = use_dimension_bias - self.use_intermediate_layer = use_intermediate_layer - self.intermediate_dim = intermediate_dim - self.intermediate_activation = tf.keras.activations.get(intermediate_activation) - self.from_logits = from_logits - self.return_logits = return_logits - self.bias_initializer = bias_initializer - self.input_spec = [InputSpec(), InputSpec(), InputSpec()] - - def build(self, input_shape): - assert len(input_shape) == 3 - assert input_shape[0] == input_shape[1] - assert input_shape[0][:-1] == input_shape[2][:-1] - - input_dim, features_dim = input_shape[0][-1], input_shape[2][-1] - if self.use_intermediate_layer: - self.first_kernel = self.add_weight( - shape=(features_dim, self.intermediate_dim), - 
initializer="random_uniform", name='first_kernel') - self.first_bias = self.add_weight( - shape=(self.intermediate_dim,), - initializer="random_uniform", name='first_bias') - self.features_kernel = self.add_weight( - shape=(features_dim, 1), initializer="random_uniform", name='kernel') - self.features_bias = self.add_weight( - shape=(1,), initializer=Constant(self.bias_initializer), name='bias') - if self.use_dimension_bias: - self.dimensions_bias = self.add_weight( - shape=(input_dim,), initializer="random_uniform", name='dimension_bias') - super(WeightedCombinationLayer, self).build(input_shape) - - def call(self, inputs, **kwargs): - assert isinstance(inputs, list) and len(inputs) == 3 - first, second, features = inputs[0], inputs[1], inputs[2] - if not self.from_logits: - first = K.clip(first, 1e-10, 1.0) - second = K.clip(second, 1e-10, 1.0) - first_, second_ = K.log(first), K.log(second) - else: - first_, second_ = first, second - # embedded_features.shape = (M, T, 1) - if self.use_intermediate_layer: - features = K.dot(features, self.first_kernel) - features = K.bias_add(features, self.first_bias, data_format="channels_last") - features = self.intermediate_activation(features) - embedded_features = K.dot(features, self.features_kernel) - embedded_features = K.bias_add( - embedded_features, self.features_bias, data_format="channels_last") - if self.use_dimension_bias: - tiling_shape = [1] * (K.ndim(first) - 1) + [K.shape(first)[-1]] - embedded_features = K.tile(embedded_features, tiling_shape) - embedded_features = K.bias_add( - embedded_features, self.dimensions_bias, data_format="channels_last") - sigma = K.sigmoid(embedded_features) - - result = weighted_sum(first_, second_, sigma, - self.first_threshold, self.second_threshold) - probs = K.softmax(result) - if self.return_logits: - return [probs, result] - return probs - - def compute_output_shape(self, input_shape): - first_shape = input_shape[0] - if self.return_logits: - return [first_shape, first_shape] - return first_shape - - -def TemporalDropout(inputs, dropout=0.0): - """ - Drops with :dropout probability temporal steps of input 3D tensor - """ - # TO DO: adapt for >3D tensors - if dropout == 0.0: - return inputs - inputs_func = lambda x: K.ones_like(inputs[:, :, 0:1]) - inputs_mask = Lambda(inputs_func)(inputs) - inputs_mask = Dropout(dropout)(inputs_mask) - tiling_shape = [1, 1, K.shape(inputs)[2]] + [1] * (K.ndim(inputs) - 3) - inputs_mask = Lambda(K.tile, arguments={"n": tiling_shape}, - output_shape=inputs._keras_shape[1:])(inputs_mask) - answer = Multiply()([inputs, inputs_mask]) - return answer - - -def positions_func(inputs, pad=0): - """ - A layer filling i-th column of a 2D tensor with - 1+ln(1+i) when it contains a meaningful symbol - and with 0 when it contains PAD - """ - position_inputs = K.cumsum(K.ones_like(inputs, dtype="float32"), axis=1) - position_inputs *= K.cast(K.not_equal(inputs, pad), "float32") - return K.log(1.0 + position_inputs) \ No newline at end of file diff --git a/deeppavlov/models/morpho_tagger/common.py b/deeppavlov/models/morpho_tagger/common.py deleted file mode 100644 index a6288093bf..0000000000 --- a/deeppavlov/models/morpho_tagger/common.py +++ /dev/null @@ -1,293 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import sys -from pathlib import Path -from typing import List, Union, Optional - -from deeppavlov.core.commands.infer import build_model -from deeppavlov.core.commands.utils import expand_path, parse_config -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component -from deeppavlov.dataset_readers.morphotagging_dataset_reader import read_infile -from deeppavlov.models.morpho_tagger.common_tagger import make_pos_and_tag - - -def predict_with_model(config_path: [Path, str], infile: Optional[Union[Path, str]] = None, - input_format: str = "ud", batch_size: [int] = 16, - output_format: str = "basic") -> List[Optional[List[str]]]: - """Returns predictions of morphotagging model given in config :config_path:. - - Args: - config_path: a path to config - - Returns: - a list of morphological analyses for each sentence. Each analysis is either a list of tags - or a list of full CONLL-U descriptions. - - """ - config = parse_config(config_path) - if infile is None: - if sys.stdin.isatty(): - raise RuntimeError('To process data from terminal please use interact mode') - infile = sys.stdin - else: - infile = expand_path(infile) - if input_format in ["ud", "conllu", "vertical"]: - from_words = (input_format == "vertical") - data: List[tuple] = read_infile(infile, from_words=from_words) - # keeping only sentences - data = [elem[0] for elem in data] - else: - if infile is not sys.stdin: - with open(infile, "r", encoding="utf8") as fin: - data = fin.readlines() - else: - data = sys.stdin.readlines() - model = build_model(config, load_trained=True) - for elem in model.pipe: - if isinstance(elem[-1], TagOutputPrettifier): - elem[-1].set_format_mode(output_format) - answers = model.batched_call(data, batch_size=batch_size) - return answers - - -@register('tag_output_prettifier') -class TagOutputPrettifier(Component): - """Class which prettifies morphological tagger output to 4-column - or 10-column (Universal Dependencies) format. - - Args: - format_mode: output format, - in `basic` mode output data contains 4 columns (id, word, pos, features), - in `conllu` or `ud` mode it contains 10 columns: - id, word, lemma, pos, xpos, feats, head, deprel, deps, misc - (see http://universaldependencies.org/format.html for details) - Only id, word, tag and pos values are present in current version, - other columns are filled by `_` value. - return_string: whether to return a list of strings or a single string - begin: a string to append in the beginning - end: a string to append in the end - sep: separator between word analyses - """ - - def __init__(self, format_mode: str = "basic", return_string: bool = True, - begin: str = "", end: str = "", sep: str = "\n", **kwargs) -> None: - self.set_format_mode(format_mode) - self.return_string = return_string - self.begin = begin - self.end = end - self.sep = sep - - def set_format_mode(self, format_mode: str = "basic") -> None: - """A function that sets format for output and recalculates `self.format_string`. 
- - Args: - format_mode: output format, - in `basic` mode output data contains 4 columns (id, word, pos, features), - in `conllu` or `ud` mode it contains 10 columns: - id, word, lemma, pos, xpos, feats, head, deprel, deps, misc - (see http://universaldependencies.org/format.html for details) - Only id, word, tag and pos values are present in current version, - other columns are filled by `_` value. - - Returns: - """ - self.format_mode = format_mode - self._make_format_string() - - def _make_format_string(self) -> None: - if self.format_mode == "basic": - self.format_string = "{}\t{}\t{}\t{}" - elif self.format_mode.lower() in ["conllu", "ud"]: - self.format_string = "{}\t{}\t_\t{}\t_\t{}\t_\t_\t_\t_" - else: - raise ValueError("Wrong mode for TagOutputPrettifier: {}, " - "it must be 'basic', 'conllu' or 'ud'.".format(self.mode)) - - def __call__(self, X: List[List[str]], Y: List[List[str]]) -> List[Union[List[str], str]]: - """Calls the :meth:`~prettify` function for each input sentence. - - Args: - X: a list of input sentences - Y: a list of list of tags for sentence words - - Returns: - a list of prettified morphological analyses - """ - return [self.prettify(x, y) for x, y in zip(X, Y)] - - def prettify(self, tokens: List[str], tags: List[str]) -> Union[List[str], str]: - """Prettifies output of morphological tagger. - - Args: - tokens: tokenized source sentence - tags: list of tags, the output of a tagger - - Returns: - the prettified output of the tagger. - - Examples: - >>> sent = "John really likes pizza .".split() - >>> tags = ["PROPN,Number=Sing", "ADV", - >>> "VERB,Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", - >>> "NOUN,Number=Sing", "PUNCT"] - >>> prettifier = TagOutputPrettifier(mode='basic') - >>> self.prettify(sent, tags) - 1 John PROPN Number=Sing - 2 really ADV _ - 3 likes VERB Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin - 4 pizza NOUN Number=Sing - 5 . PUNCT _ - >>> prettifier = TagOutputPrettifier(mode='ud') - >>> self.prettify(sent, tags) - 1 John _ PROPN _ Number=Sing _ _ _ _ - 2 really _ ADV _ _ _ _ _ _ - 3 likes _ VERB _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin _ _ _ _ - 4 pizza _ NOUN _ Number=Sing _ _ _ _ - 5 . _ PUNCT _ _ _ _ _ _ - """ - answer = [] - for i, (word, tag) in enumerate(zip(tokens, tags)): - answer.append(self.format_string.format(i + 1, word, *make_pos_and_tag(tag))) - if self.return_string: - answer = self.begin + self.sep.join(answer) + self.end - return answer - - -@register('lemmatized_output_prettifier') -class LemmatizedOutputPrettifier(Component): - """Class which prettifies morphological tagger output to 4-column - or 10-column (Universal Dependencies) format. - - Args: - format_mode: output format, - in `basic` mode output data contains 4 columns (id, word, pos, features), - in `conllu` or `ud` mode it contains 10 columns: - id, word, lemma, pos, xpos, feats, head, deprel, deps, misc - (see http://universaldependencies.org/format.html for details) - Only id, word, lemma, tag and pos columns are predicted in current version, - other columns are filled by `_` value. 
- return_string: whether to return a list of strings or a single string - begin: a string to append in the beginning - end: a string to append in the end - sep: separator between word analyses - """ - - def __init__(self, return_string: bool = True, - begin: str = "", end: str = "", sep: str = "\n", **kwargs) -> None: - self.return_string = return_string - self.begin = begin - self.end = end - self.sep = sep - self.format_string = "{0}\t{1}\t{4}\t{2}\t_\t{3}\t_\t_\t_\t_" - - def __call__(self, X: List[List[str]], Y: List[List[str]], Z: List[List[str]]) -> List[Union[List[str], str]]: - """Calls the :meth:`~prettify` function for each input sentence. - - Args: - X: a list of input sentences - Y: a list of list of tags for sentence words - Z: a list of lemmatized sentences - - Returns: - a list of prettified morphological analyses - """ - return [self.prettify(*elem) for elem in zip(X, Y, Z)] - - def prettify(self, tokens: List[str], tags: List[str], lemmas: List[str]) -> Union[List[str], str]: - """Prettifies output of morphological tagger. - - Args: - tokens: tokenized source sentence - tags: list of tags, the output of a tagger - lemmas: list of lemmas, the output of a lemmatizer - - Returns: - the prettified output of the tagger. - - Examples: - >>> sent = "John really likes pizza .".split() - >>> tags = ["PROPN,Number=Sing", "ADV", - >>> "VERB,Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", - >>> "NOUN,Number=Sing", "PUNCT"] - >>> lemmas = "John really like pizza .".split() - >>> prettifier = LemmatizedOutputPrettifier() - >>> self.prettify(sent, tags, lemmas) - 1 John John PROPN _ Number=Sing _ _ _ _ - 2 really really ADV _ _ _ _ _ _ - 3 likes like VERB _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin _ _ _ _ - 4 pizza pizza NOUN _ Number=Sing _ _ _ _ - 5 . . PUNCT _ _ _ _ _ _ - """ - answer = [] - for i, (word, tag, lemma) in enumerate(zip(tokens, tags, lemmas)): - pos, tag = make_pos_and_tag(tag, sep=",") - answer.append(self.format_string.format(i + 1, word, pos, tag, lemma)) - if self.return_string: - answer = self.begin + self.sep.join(answer) + self.end - return answer - - -@register('dependency_output_prettifier') -class DependencyOutputPrettifier(Component): - """Class which prettifies dependency parser output - to 10-column (Universal Dependencies) format. - - Args: - return_string: whether to return a list of strings or a single string - begin: a string to append in the beginning - end: a string to append in the end - sep: separator between word analyses - """ - - def __init__(self, return_string: bool = True, begin: str = "", - end: str = "", sep: str = "\n", **kwargs) -> None: - self.return_string = return_string - self.begin = begin - self.end = end - self.sep = sep - self.format_string = "{}\t{}\t_\t_\t_\t_\t{}\t{}\t_\t_" - - def __call__(self, X: List[List[str]], Y: List[List[int]], Z: List[List[str]]) -> List[Union[List[str], str]]: - """Calls the :meth:`~prettify` function for each input sentence. - - Args: - X: a list of input sentences - Y: a list of lists of head positions for sentence words - Z: a list of lists of dependency labels for sentence words - - Returns: - a list of prettified UD outputs - """ - return [self.prettify(x, y, z) for x, y, z in zip(X, Y, Z)] - - def prettify(self, tokens: List[str], heads: List[int], deps: List[str]) -> Union[List[str], str]: - """Prettifies output of dependency parser. 
- - Args: - tokens: tokenized source sentence - heads: list of head positions, the output of the parser - deps: list of head positions, the output of the parser - - Returns: - the prettified output of the parser - - """ - answer = [] - for i, (word, head, dep) in enumerate(zip(tokens, heads, deps)): - answer.append(self.format_string.format(i + 1, word, head, dep)) - if self.return_string: - answer = self.begin + self.sep.join(answer) + self.end - return answer diff --git a/deeppavlov/models/morpho_tagger/common_tagger.py b/deeppavlov/models/morpho_tagger/common_tagger.py deleted file mode 100644 index dfc7e330aa..0000000000 --- a/deeppavlov/models/morpho_tagger/common_tagger.py +++ /dev/null @@ -1,128 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""File containing common operation with keras.backend objects""" - -from typing import Union, Optional, Tuple - -from tensorflow.keras import backend as K -import numpy as np - -EPS = 1e-15 - - -# AUXILIARY = ['PAD', 'BEGIN', 'END', 'UNKNOWN'] -# AUXILIARY_CODES = PAD, BEGIN, END, UNKNOWN = 0, 1, 2, 3 - - -def to_one_hot(x, k): - """ - Takes an array of integers and transforms it - to an array of one-hot encoded vectors - """ - unit = np.eye(k, dtype=int) - return unit[x] - - -def repeat_(x, k): - tile_factor = [1, k] + [1] * (K.ndim(x) - 1) - return K.tile(x[:, None, :], tile_factor) - - -def make_pos_and_tag(tag: str, sep: str = ",", - return_mode: Optional[str] = None) -> Tuple[str, Union[str, list, dict, tuple]]: - """ - Args: - tag: the part-of-speech tag - sep: the separator between part-of-speech tag and grammatical features - return_mode: the type of return value, can be None, list, dict or sorted_items - - Returns: - the part-of-speech label and grammatical features in required format - """ - if tag.endswith(" _"): - tag = tag[:-2] - if sep in tag: - pos, tag = tag.split(sep, maxsplit=1) - else: - pos, tag = tag, ("_" if return_mode is None else "") - if return_mode in ["dict", "list", "sorted_items"]: - tag = tag.split("|") if tag != "" else [] - if return_mode in ["dict", "sorted_items"]: - tag = dict(tuple(elem.split("=")) for elem in tag) - if return_mode == "sorted_items": - tag = tuple(sorted(tag.items())) - return pos, tag - - -def make_full_UD_tag(pos: str, tag: Union[str, list, dict, tuple], - sep: str = ",", mode: Optional[str] = None) -> str: - """ - Args: - pos: the part-of-speech label - tag: grammatical features in the format, specified by 'mode' - sep: the separator between part of speech and features in output tag - mode: the input format of tag, can be None, list, dict or sorted_items - - Returns: - the string representation of morphological tag - """ - if tag == "_" or len(tag) == 0: - return pos - if mode == "dict": - tag, mode = sorted(tag.items()), "sorted_items" - if mode == "sorted_items": - tag, mode = ["{}={}".format(*elem) for elem in tag], "list" - if mode == "list": - tag = "|".join(tag) - return pos + sep + tag - - -def 
_are_equal_pos(first, second): - NOUNS, VERBS, CONJ = ["NOUN", "PROPN"], ["AUX", "VERB"], ["CCONJ", "SCONJ"] - return (first == second or any((first in parts) and (second in parts) - for parts in [NOUNS, VERBS, CONJ])) - - -IDLE_FEATURES = {"Voice", "Animacy", "Degree", "Mood", "VerbForm"} - - -def get_tag_distance(first, second, first_sep=",", second_sep=" "): - """ - Measures the distance between two (Russian) morphological tags in UD Format. - The first tag is usually the one predicted by our model (therefore it uses comma - as separator), while the second is usually the result of automatical conversion, - where the separator is space. - - Args: - first: UD morphological tag - second: UD morphological tag (usually the output of 'russian_tagsets' converter) - first_sep: separator between two parts of the first tag - second_sep: separator between two parts of the second tag - - Returns: - the number of mismatched feature values - """ - first_pos, first_feats = make_pos_and_tag(first, sep=first_sep, return_mode="dict") - second_pos, second_feats = make_pos_and_tag(second, sep=second_sep, return_mode="dict") - dist = int(not _are_equal_pos(first_pos, second_pos)) - for key, value in first_feats.items(): - other = second_feats.get(key) - if other is None: - dist += int(key not in IDLE_FEATURES) - else: - dist += int(value != other) - for key in second_feats: - dist += int(key not in first_feats and key not in IDLE_FEATURES) - return dist diff --git a/deeppavlov/models/morpho_tagger/lemmatizer.py b/deeppavlov/models/morpho_tagger/lemmatizer.py deleted file mode 100644 index dd881c0f2d..0000000000 --- a/deeppavlov/models/morpho_tagger/lemmatizer.py +++ /dev/null @@ -1,137 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from abc import abstractmethod -from typing import List, Optional - -import numpy as np -from pymorphy2 import MorphAnalyzer -from pymorphy2.analyzer import Parse -from russian_tagsets import converters - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.serializable import Serializable -from deeppavlov.models.morpho_tagger.common_tagger import get_tag_distance - - -class BasicLemmatizer(Serializable): - """ - A basic class for lemmatizers. It must contain two methods: - * :meth: `_lemmatize` for single word lemmatization. It is an abstract method and should be reimplemented. - * :meth: `__call__` for lemmatizing a batch of sentences. - """ - - def __init__(self, save_path: Optional[str] = None, - load_path: Optional[str] = None, **kwargs) -> None: - super().__init__(save_path, load_path, **kwargs) - - @abstractmethod - def _lemmatize(self, word: str, tag: Optional[str] = None) -> str: - """ - Lemmatizes a separate word given its tag. - - Args: - word: the input word. - tag: optional morphological tag. 
- - Returns: - a lemmatized word - """ - raise NotImplementedError("Your lemmatizer must implement the abstract method _lemmatize.") - - def __call__(self, data: List[List[str]], tags: Optional[List[List[str]]] = None) -> List[List[str]]: - """ - Lemmatizes each word in a batch of sentences. - - Args: - data: the batch of sentences (lists of words). - tags: the batch of morphological tags (if available). - - Returns: - a batch of lemmatized sentences. - """ - if tags is None: - tags = [[None for _ in sent] for sent in data] - if len(tags) != len(data): - raise ValueError("There must be the same number of tag sentences as the number of word sentences.") - if any((len(elem[0]) != len(elem[1])) for elem in zip(data, tags)): - raise ValueError("Tag sentence must be of the same length as the word sentence.") - answer = [[self._lemmatize(word, tag) for word, tag in zip(*elem)] for elem in zip(data, tags)] - return answer - - -@register("UD_pymorphy_lemmatizer") -class UDPymorphyLemmatizer(BasicLemmatizer): - """ - A class that returns a normal form of a Russian word given its morphological tag in UD format. - Lemma is selected from one of PyMorphy parses, - the parse whose tag resembles the most a known UD tag is chosen. - """ - - RARE_FEATURES = ["Fixd", "Litr"] - SPECIAL_FEATURES = ["Patr", "Surn"] - - def __init__(self, save_path: Optional[str] = None, load_path: Optional[str] = None, - rare_grammeme_penalty: float = 1.0, long_lemma_penalty: float = 1.0, - **kwargs) -> None: - self.rare_grammeme_penalty = rare_grammeme_penalty - self.long_lemma_penalty = long_lemma_penalty - self._reset() - self.analyzer = MorphAnalyzer() - self.converter = converters.converter("opencorpora-int", "ud20") - super().__init__(save_path, load_path, **kwargs) - - def save(self, *args, **kwargs): - pass - - def load(self, *args, **kwargs): - pass - - def _reset(self): - self.memo = dict() - - def _extract_lemma(self, parse: Parse) -> str: - special_feats = [x for x in self.SPECIAL_FEATURES if x in parse.tag] - if len(special_feats) == 0: - return parse.normal_form - # here we process surnames and patronyms since PyMorphy lemmatizes them incorrectly - for other in parse.lexeme: - tag = other.tag - if any(x not in tag for x in special_feats): - continue - if tag.case == "nomn" and tag.gender == parse.tag.gender and tag.number == "sing": - return other.word - return parse.normal_form - - def _lemmatize(self, word: str, tag: Optional[str] = None) -> str: - lemma = self.memo.get((word, tag)) - if lemma is not None: - return lemma - parses = self.analyzer.parse(word) - best_lemma, best_distance = word, np.inf - for i, parse in enumerate(parses): - curr_tag = self.converter(str(parse.tag)) - distance = get_tag_distance(tag, curr_tag) - for feat in self.RARE_FEATURES: - if feat in parse.tag: - distance += self.rare_grammeme_penalty - break - if len(word) == 1 and len(parse.normal_form) > 1: - distance += self.long_lemma_penalty - if distance < best_distance: - best_lemma, best_distance = self._extract_lemma(parse), distance - if distance == 0: - break - self.memo[(word, tag)] = best_lemma - return best_lemma diff --git a/deeppavlov/models/morpho_tagger/morpho_tagger.py b/deeppavlov/models/morpho_tagger/morpho_tagger.py deleted file mode 100644 index 45a6bb2379..0000000000 --- a/deeppavlov/models/morpho_tagger/morpho_tagger.py +++ /dev/null @@ -1,352 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except 
in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger -from pathlib import Path -from typing import List, Optional, Union, Tuple - -import numpy as np -import tensorflow.keras.backend as K -from tensorflow.keras import Model -from tensorflow.keras.layers import (Input, Dense, Lambda, Concatenate, Conv2D, Dropout, LSTM, Bidirectional, - TimeDistributed) -from tensorflow.keras.optimizers import Nadam -from tensorflow.keras.regularizers import l2 - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.simple_vocab import SimpleVocabulary -from deeppavlov.core.models.keras_model import KerasModel -from .cells import Highway -from .common_tagger import to_one_hot - -log = getLogger(__name__) - -MAX_WORD_LENGTH = 30 - - -@register("morpho_tagger") -class MorphoTagger(KerasModel): - """A class for character-based neural morphological tagger - - Parameters: - symbols: character vocabulary - tags: morphological tags vocabulary - save_path: the path where model is saved - load_path: the path from where model is loaded - mode: usage mode - - word_rnn: the type of character-level network (only `cnn` implemented) - char_embeddings_size: the size of character embeddings - char_conv_layers: the number of convolutional layers on character level - char_window_size: the width of convolutional filter (filters). - It can be a list if several parallel filters are applied, for example, [2, 3, 4, 5]. - char_filters: the number of convolutional filters for each window width. - It can be a number, a list (when there are several windows of different width - on a single convolution layer), a list of lists, if there - are more than 1 convolution layers, or **None**. - If **None**, a layer with width **width** contains - min(**char_filter_multiple** * **width**, 200) filters. 
- - char_filter_multiple: the ratio between filters number and window width - char_highway_layers: the number of highway layers on character level - conv_dropout: the ratio of dropout between convolutional layers - highway_dropout: the ratio of dropout between highway layers, - intermediate_dropout: the ratio of dropout between convolutional - and highway layers on character level - lstm_dropout: dropout ratio in word-level LSTM - word_vectorizers: list of parameters for additional word-level vectorizers, - for each vectorizer it stores a pair of vectorizer dimension and - the dimension of the corresponding word embedding - word_lstm_layers: the number of word-level LSTM layers - word_lstm_units: hidden dimensions of word-level LSTMs - word_dropout: the ratio of dropout before word level (it is applied to word embeddings) - regularizer: l2 regularization parameter - verbose: the level of verbosity - - A subclass of :class:`~deeppavlov.core.models.keras_model.KerasModel` - """ - def __init__(self, - symbols: SimpleVocabulary, - tags: SimpleVocabulary, - save_path: Optional[Union[str, Path]] = None, - load_path: Optional[Union[str, Path]] = None, - mode: str = 'infer', - word_rnn: str = "cnn", - char_embeddings_size: int = 16, - char_conv_layers: int = 1, - char_window_size: Union[int, List[int]] = 5, - char_filters: Union[int, List[int]] = None, - char_filter_multiple: int = 25, - char_highway_layers: int = 1, - conv_dropout: float = 0.0, - highway_dropout: float = 0.0, - intermediate_dropout: float = 0.0, - lstm_dropout: float = 0.0, - word_vectorizers: List[Tuple[int, int]] = None, - word_lstm_layers: int = 1, - word_lstm_units: Union[int, List[int]] = 128, - word_dropout: float = 0.0, - regularizer: float = None, - verbose: int = 1, **kwargs): - # Calls parent constructor. Results in creation of save_folder if it doesn't exist - super().__init__(save_path=save_path, load_path=load_path, mode=mode, **kwargs) - self.symbols = symbols - self.tags = tags - self.word_rnn = word_rnn - self.char_embeddings_size = char_embeddings_size - self.char_conv_layers = char_conv_layers - self.char_window_size = char_window_size - self.char_filters = char_filters - self.char_filter_multiple = char_filter_multiple - self.char_highway_layers = char_highway_layers - self.conv_dropout = conv_dropout - self.highway_dropout = highway_dropout - self.intermediate_dropout = intermediate_dropout - self.lstm_dropout = lstm_dropout - self.word_dropout = word_dropout - self.word_vectorizers = word_vectorizers # a list of additional vectorizer dimensions - self.word_lstm_layers = word_lstm_layers - self.word_lstm_units = word_lstm_units - self.regularizer = regularizer - self.verbose = verbose - self._initialize() - self.model_ = None - self.build() - - # Tries to load the model from model `load_path`, if it is available - self.load() - - def load(self) -> None: - """ - Checks existence of the model file, loads the model if the file exists - Loads model weights from a file - """ - - # Checks presence of the model files - if self.load_path.exists(): - path = str(self.load_path.resolve()) - log.info('[loading model from {}]'.format(path)) - self.model_.load_weights(path) - - def save(self) -> None: - """ - Saves model weights to the save_path, provided in config. 
The directory is - already created by super().__init__, which is called in __init__ of this class""" - path = str(self.save_path.absolute()) - log.info('[saving model to {}]'.format(path)) - self.model_.save_weights(path) - - def _initialize(self): - if isinstance(self.char_window_size, int): - self.char_window_size = [self.char_window_size] - if self.char_filters is None or isinstance(self.char_filters, int): - self.char_filters = [self.char_filters] * len(self.char_window_size) - if len(self.char_window_size) != len(self.char_filters): - raise ValueError("There should be the same number of window sizes and filter sizes") - if isinstance(self.word_lstm_units, int): - self.word_lstm_units = [self.word_lstm_units] * self.word_lstm_layers - if len(self.word_lstm_units) != self.word_lstm_layers: - raise ValueError("There should be the same number of lstm layer units and lstm layers") - if self.word_vectorizers is None: - self.word_vectorizers = [] - if self.regularizer is not None: - self.regularizer = l2(self.regularizer) - if self.verbose > 0: - log.info("{} symbols, {} tags in CharacterTagger".format(len(self.symbols), len(self.tags))) - - def build(self): - """Builds the network using Keras. - """ - word_inputs = Input(shape=(None, MAX_WORD_LENGTH+2), dtype="int32") - inputs = [word_inputs] - word_outputs = self._build_word_cnn(word_inputs) - if len(self.word_vectorizers) > 0: - additional_word_inputs = [Input(shape=(None, input_dim), dtype="float32") - for input_dim, dense_dim in self.word_vectorizers] - inputs.extend(additional_word_inputs) - additional_word_embeddings = [Dense(dense_dim)(additional_word_inputs[i]) - for i, (_, dense_dim) in enumerate(self.word_vectorizers)] - word_outputs = Concatenate()([word_outputs] + additional_word_embeddings) - outputs, lstm_outputs = self._build_basic_network(word_outputs) - compile_args = {"optimizer": Nadam(lr=0.002, clipnorm=5.0), - "loss": "categorical_crossentropy", "metrics": ["accuracy"]} - self.model_ = Model(inputs, outputs) - self.model_.compile(**compile_args) - if self.verbose > 0: - self.model_.summary(print_fn=log.info) - return self - - def _build_word_cnn(self, inputs): - """Builds word-level network - """ - inputs = Lambda(K.one_hot, arguments={"num_classes": len(self.symbols)}, - output_shape=lambda x: tuple(x) + (len(self.symbols),))(inputs) - char_embeddings = Dense(self.char_embeddings_size, use_bias=False)(inputs) - conv_outputs = [] - self.char_output_dim_ = 0 - for window_size, filters_number in zip(self.char_window_size, self.char_filters): - curr_output = char_embeddings - curr_filters_number = (min(self.char_filter_multiple * window_size, 200) - if filters_number is None else filters_number) - for _ in range(self.char_conv_layers - 1): - curr_output = Conv2D(curr_filters_number, (1, window_size), - padding="same", activation="relu", - data_format="channels_last")(curr_output) - if self.conv_dropout > 0.0: - curr_output = Dropout(self.conv_dropout)(curr_output) - curr_output = Conv2D(curr_filters_number, (1, window_size), - padding="same", activation="relu", - data_format="channels_last")(curr_output) - conv_outputs.append(curr_output) - self.char_output_dim_ += curr_filters_number - if len(conv_outputs) > 1: - conv_output = Concatenate(axis=-1)(conv_outputs) - else: - conv_output = conv_outputs[0] - highway_input = Lambda(K.max, arguments={"axis": -2})(conv_output) - if self.intermediate_dropout > 0.0: - highway_input = Dropout(self.intermediate_dropout)(highway_input) - for i in range(self.char_highway_layers - 1): - 
highway_input = Highway(activation="relu")(highway_input) - if self.highway_dropout > 0.0: - highway_input = Dropout(self.highway_dropout)(highway_input) - highway_output = Highway(activation="relu")(highway_input) - return highway_output - - def _build_basic_network(self, word_outputs): - """ - Creates the basic network architecture, - transforming word embeddings to intermediate outputs - """ - if self.word_dropout > 0.0: - lstm_outputs = Dropout(self.word_dropout)(word_outputs) - else: - lstm_outputs = word_outputs - for j in range(self.word_lstm_layers-1): - lstm_outputs = Bidirectional( - LSTM(self.word_lstm_units[j], return_sequences=True, - dropout=self.lstm_dropout))(lstm_outputs) - lstm_outputs = Bidirectional( - LSTM(self.word_lstm_units[-1], return_sequences=True, - dropout=self.lstm_dropout))(lstm_outputs) - pre_outputs = TimeDistributed( - Dense(len(self.tags), activation="softmax", - activity_regularizer=self.regularizer), - name="p")(lstm_outputs) - return pre_outputs, lstm_outputs - - # noinspection PyPep8Naming - def _transform_batch(self, data, labels=None, transform_to_one_hot=True): - data, additional_data = data[0], data[1:] - L = max(len(x) for x in data) - X = np.array([self._make_sent_vector(x, L) for x in data]) - X = [X] + [np.array(x) for x in additional_data] - if labels is not None: - Y = np.array([self._make_tags_vector(y, L) for y in labels]) - if transform_to_one_hot: - Y = to_one_hot(Y, len(self.tags)) - return X, Y - else: - return X - - def train_on_batch(self, *args) -> None: - """Trains the model on a single batch. - - Args: - *args: the list of network inputs. Last element of `args` is the batch of targets, - all previous elements are training data batches - """ - # data: List[Iterable], labels: Iterable[list] - # Args: - # data: a batch of word sequences - # labels: a batch of correct tag sequences - *data, labels = args - # noinspection PyPep8Naming - X, Y = self._transform_batch(data, labels) - self.model_.train_on_batch(X, Y) - - # noinspection PyPep8Naming - def predict_on_batch(self, data: Union[List[np.ndarray], Tuple[np.ndarray]], - return_indexes: bool = False) -> List[List[str]]: - """ - Makes predictions on a single batch - - Args: - data: model inputs for a single batch, data[0] contains input character encodings - and is the only element of data for mist models. Subsequent elements of data - include the output of additional vectorizers, e.g., dictionary-based one. - return_indexes: whether to return tag indexes in vocabulary or the tags themselves - - Returns: - a batch of label sequences - """ - X = self._transform_batch(data) - objects_number, lengths = len(X[0]), [len(elem) for elem in data[0]] - Y = self.model_.predict_on_batch(X) - labels = np.argmax(Y, axis=-1) - answer: List[Optional[List[str]]] = [None] * objects_number - for i, (elem, length) in enumerate(zip(labels, lengths)): - elem = elem[:length] - answer[i] = elem if return_indexes else self.tags.idxs2toks(elem) - return answer - - def __call__(self, *x_batch: np.ndarray, **kwargs) -> Union[List, np.ndarray]: - """ - Predicts answers on batch elements. - - Args: - x_batch: a batch to predict answers on. It can be either a single array - for basic model or a sequence of arrays for a complex one ( - :config:`configuration file ` - or its lemmatized version). - """ - return self.predict_on_batch(x_batch, **kwargs) - - def _make_sent_vector(self, sent: List, bucket_length: int = None) -> np.ndarray: - """Transforms a sentence to Numpy array, which will be the network input. 
- - Args: - sent: input sentence - bucket_length: the width of the bucket - - Returns: - A 3d array, answer[i][j][k] contains the index of k-th letter - in j-th word of i-th input sentence. - """ - bucket_length = bucket_length or len(sent) - answer = np.zeros(shape=(bucket_length, MAX_WORD_LENGTH+2), dtype=np.int32) - for i, word in enumerate(sent): - answer[i, 0] = self.tags["BEGIN"] - m = min(len(word), MAX_WORD_LENGTH) - for j, x in enumerate(word[-m:]): - answer[i, j+1] = self.symbols[x] - answer[i, m+1] = self.tags["END"] - answer[i, m+2:] = self.tags["PAD"] - return answer - - def _make_tags_vector(self, tags, bucket_length=None) -> np.ndarray: - """Transforms a sentence of tags to Numpy array, which will be the network target. - - Args: - tags: input sentence of tags - bucket_length: the width of the bucket - - Returns: - A 2d array, answer[i][j] contains the index of j-th tag in i-th input sentence. - """ - bucket_length = bucket_length or len(tags) - answer = np.zeros(shape=(bucket_length,), dtype=np.int32) - for i, tag in enumerate(tags): - answer[i] = self.tags[tag] - return answer diff --git a/deeppavlov/models/multitask_bert/__init__.py b/deeppavlov/models/multitask_bert/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/multitask_bert/multitask_bert.py b/deeppavlov/models/multitask_bert/multitask_bert.py deleted file mode 100644 index 15d464c53d..0000000000 --- a/deeppavlov/models/multitask_bert/multitask_bert.py +++ /dev/null @@ -1,1152 +0,0 @@ -# Copyright 2019 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import copy -from abc import ABC, abstractmethod -from logging import getLogger -from typing import Any, Callable, Dict, List, Optional, Tuple, Union - -import numpy as np -import tensorflow as tf -from bert_dp.modeling import BertConfig, BertModel -from bert_dp.optimization import AdamWeightDecayOptimizer -from bert_dp.preprocessing import InputFeatures -from overrides import overrides - -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.core.common.errors import ConfigError -from deeppavlov.core.common.registry import register -from deeppavlov.core.layers.tf_layers import bi_rnn -from deeppavlov.core.models.tf_model import LRScheduledTFModel -from deeppavlov.models.bert.bert_sequence_tagger import token_from_subtoken - -log = getLogger(__name__) - - -class MTBertTask(ABC): - """Abstract class for multitask BERT tasks. Objects of its subclasses are linked with BERT body when - ``MultiTaskBert.build`` method is called. Training is performed with ``MultiTaskBert.train_on_batch`` method is - called. The objects of classes derived from ``MTBertTask`` don't have ``__call__`` method. Instead they have - ``get_sess_run_infer_args`` and ``post_process_preds`` methods, which are called from ``call`` method of - ``MultiTaskBert`` class. 
``get_sess_run_infer_args`` method returns fetches and feed_dict for inference and - ``post_process_preds`` method retrieves predictions from computed fetches. Classes derived from ``MTBertTask`` - must ``get_sess_run_train_args`` method that returns fetches and feed_dict for training. - - Args: - keep_prob: dropout keep_prob for non-BERT layers - return_probas: set this to ``True`` if you need the probabilities instead of raw answers - learning_rate: learning rate of BERT head - """ - - def __init__( - self, - keep_prob: float = 1., - return_probas: bool = None, - learning_rate: float = 1e-3, - ): - self.keep_prob = keep_prob - self.return_probas = return_probas - self.init_head_learning_rate = learning_rate - self.min_body_learning_rate = None - self.head_learning_rate_multiplier = None - - self.bert = None - self.optimizer_params = None - self.shared_ph = None - self.shared_feed_dict = None - self.sess = None - self.get_train_op_func = None - self.freeze_embeddings = None - self.bert_head_variable_scope = None - - def build( - self, - bert_body: BertModel, - optimizer_params: Dict[str, Union[str, float]], - shared_placeholders: Dict[str, tf.placeholder], - sess: tf.Session, - mode: str, - get_train_op_func: Callable, - freeze_embeddings: bool, - bert_head_variable_scope: str) -> None: - """Initiates building of the BERT head and initializes optimizer parameters, placeholders that are common for - all tasks. - - Args: - bert_body: instance of ``BertModel``. - optimizer_params: a dictionary with four fields: - ``'optimizer'`` (``str``) -- a name of optimizer class, - ``'body_learning_rate'`` (``float``) -- initial value of BERT body learning rate, - ``'min_body_learning_rate'`` (``float``) -- min BERT body learning rate for learning rate decay, - ``'weight_decay_rate'`` (``float``) -- L2 weight decay for ``AdamWeightDecayOptimizer`` - shared_placeholders: a dictionary with placeholders used in all tasks. The dictionary contains fields - ``'input_ids'``, ``'input_masks'``, ``'learning_rate'``, ``'keep_prob'``, ``'is_train'``, - ``'token_types'``. - sess: current ``tf.Session`` instance - mode: ``'train'`` or ``'inference'`` - get_train_op_func: a function returning ``tf.Operation`` and with signature similar to - ``LRScheduledTFModel.get_train_op`` without ``self`` argument. It is a function returning train - operation for specified loss and variable scopes. - freeze_embeddings: set ``False`` to train input embeddings. - bert_head_variable_scope: variable scope for BERT head. - """ - self.bert_head_variable_scope = bert_head_variable_scope - self.get_train_op_func = get_train_op_func - self.freeze_embeddings = freeze_embeddings - self.bert = bert_body - self.optimizer_params = optimizer_params - if mode == 'train': - self.head_learning_rate_multiplier = \ - self.init_head_learning_rate / self.optimizer_params['body_learning_rate'] - else: - self.head_learning_rate_multiplier = 0 - mblr = self.optimizer_params.get('min_body_learning_rate') - self.min_body_learning_rate = 0. if mblr is None else mblr - self.shared_ph = shared_placeholders - self.sess = sess - self._init_graph() - if mode == 'train': - self._init_optimizer() - - @abstractmethod - def _init_graph(self) -> None: - """Build BERT head, initialize task specific placeholders, create attributes containing output probabilities - and model loss. 
Optimizer initialized not in this method but in ``_init_optimizer``.""" - pass - - def get_train_op(self, loss: tf.Tensor, body_learning_rate: Union[tf.Tensor, float], **kwargs) -> tf.Operation: - """Return operation for the task training. Head learning rate is calculated as a product of - ``body_learning_rate`` and quotient of initial head learning rate and initial body learning rate. - - Args: - loss: the task loss - body_learning_rate: the learning rate for the BERT body - - Returns: - train operation for the task - """ - assert "learnable_scopes" not in kwargs, "learnable scopes unsupported" - # train_op for bert variables - kwargs['learnable_scopes'] = ('bert/encoder', 'bert/embeddings') - if self.freeze_embeddings: - kwargs['learnable_scopes'] = ('bert/encoder',) - learning_rate = body_learning_rate * self.head_learning_rate_multiplier - bert_train_op = self.get_train_op_func(loss, body_learning_rate, **kwargs) - # train_op for ner head variables - kwargs['learnable_scopes'] = (self.bert_head_variable_scope,) - head_train_op = self.get_train_op_func(loss, learning_rate, **kwargs) - return tf.group(bert_train_op, head_train_op) - - def _init_optimizer(self) -> None: - with tf.variable_scope(self.bert_head_variable_scope): - with tf.variable_scope('Optimizer'): - self.global_step = tf.get_variable('global_step', - shape=[], - dtype=tf.int32, - initializer=tf.constant_initializer(0), - trainable=False) - # default optimizer for Bert is Adam with fixed L2 regularization - - if self.optimizer_params.get('optimizer') is None: - self.train_op = \ - self.get_train_op( - self.loss, - body_learning_rate=self.shared_ph['learning_rate'], - optimizer=AdamWeightDecayOptimizer, - weight_decay_rate=self.optimizer_params.get('weight_decay_rate', 1e-6), - beta_1=0.9, - beta_2=0.999, - epsilon=1e-6, - optimizer_scope_name='Optimizer', - exclude_from_weight_decay=["LayerNorm", - "layer_norm", - "bias"]) - else: - self.train_op = self.get_train_op(self.loss, - body_learning_rate=self.shared_ph['learning_rate'], - optimizer_scope_name='Optimizer') - - if self.optimizer_params.get('optimizer') is None: - with tf.variable_scope('Optimizer'): - new_global_step = self.global_step + 1 - self.train_op = tf.group(self.train_op, [self.global_step.assign(new_global_step)]) - - def _build_feed_dict(self, input_ids, input_masks, token_types, y=None, body_learning_rate=None): - sph = self.shared_ph - train = y is not None - feed_dict = { - sph['input_ids']: input_ids, - sph['input_masks']: input_masks, - sph['token_types']: token_types, - sph['is_train']: train, - } - if train: - feed_dict.update({ - sph['learning_rate']: body_learning_rate, - self.y_ph: y, - sph['keep_prob']: self.keep_prob, - }) - return feed_dict - - def train_on_batch(self, *args, **kwargs) -> Dict[str, float]: - """Trains the task on one batch. This method will work correctly if you override ``get_sess_run_train_args`` - for your task. - - Args: - kwargs: the keys are ``body_learning_rate`` and ``"in"`` and ``"in_y"`` params for the task. - - Returns: - dictionary with calcutated task loss and body and head learning rates. 
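The coupling between head and body learning rates above (the multiplier fixed once in ``build`` and reused in ``get_train_op``) is worth a short worked example. The numbers below are illustrative assumptions only, not values from any shipped config:

import math

init_head_lr = 1e-3   # `learning_rate` given to the task head
init_body_lr = 2e-5   # optimizer_params['body_learning_rate']

multiplier = init_head_lr / init_body_lr        # fixed once in build(): ~50

def head_lr(current_body_lr):
    # same formula as get_train_op: the head rate follows the body rate
    return current_body_lr * multiplier

assert math.isclose(head_lr(init_body_lr), 1e-3)      # initial head rate is preserved
assert math.isclose(head_lr(init_body_lr / 2), 5e-4)  # decaying the body rate decays the head rate too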
- """ - fetches, feed_dict = self.get_sess_run_train_args(*args, **kwargs) - _, loss = self.sess.run(fetches, feed_dict=feed_dict) - return {f'{self.bert_head_variable_scope}_loss': loss, - f'{self.bert_head_variable_scope}_head_learning_rate': - float(kwargs['body_learning_rate']) * self.head_learning_rate_multiplier, - 'bert_body_learning_rate': kwargs['body_learning_rate']} - - @abstractmethod - def get_sess_run_infer_args(self, *args) -> Tuple[List[tf.Tensor], Dict[tf.placeholder, Any]]: - """Returns fetches and feed_dict for inference. Fetches are lists of tensors and feed_dict is dictionary - with placeholder values required for fetches computation. The method is used inside ``MultiTaskBert`` - ``__call__`` method. - - If ``self.return_probas`` is ``True`` fetches contains probabilities tensor and predictions tensor otherwise. - - Overriding methods take task inputs as positional arguments. - - ATTENTION! Let ``get_sess_run_infer_args`` method have ``n_x_args`` arguments. Then the order of first - ``n_x_args`` arguments of ``get_sess_run_train_args`` method arguments has to match the order of - ``get_sess_run_infer_args`` arguments. - - Args: - args: task inputs. - - Returns: - fetches and feed_dict - """ - pass - - @abstractmethod - def get_sess_run_train_args(self, *args) -> Tuple[List[tf.Tensor], Dict[tf.placeholder, Any]]: - """Returns fetches and feed_dict for task ``train_on_batch`` method. - - Overriding methods take task inputs as positional arguments. - - ATTENTION! Let ``get_sess_run_infer_args`` method have ``n_x_args`` arguments. Then the order of first - ``n_x_args`` arguments of ``get_sess_run_train_args`` method arguments has to match the order of - ``get_sess_run_infer_args`` arguments. - - Args: - args: task inputs followed by expect outputs. - - Returns: - fetches and feed_dict - """ - pass - - @abstractmethod - def post_process_preds(self, sess_run_res: list) -> list: - """Post process results of ``tf.Session.run`` called for task inference. Called from method - ``MultiTaskBert.__call__``. - - Args: - sess_run_res: computed fetches from ``get_sess_run_infer_args`` method - - Returns: - post processed results - """ - pass - - -@register("mt_bert_seq_tagging_task") -class MTBertSequenceTaggingTask(MTBertTask): - """BERT head for text tagging. It predicts a label for every token (not subtoken) in the text. - You can use it for sequence labelling tasks, such as morphological tagging or named entity recognition. - Objects of this class should be passed to the constructor of ``MultiTaskBert`` class in param ``tasks``. - - Args: - n_tags: number of distinct tags - use_crf: whether to use CRF on top or not - use_birnn: whether to use bidirection rnn after BERT layers. - For NER and morphological tagging we usually set it to ``False`` as otherwise the model overfits - birnn_cell_type: the type of Bidirectional RNN. 
Either ``"lstm"`` or ``"gru"`` - birnn_hidden_size: number of hidden units in the BiRNN layer in each direction - keep_prob: dropout keep_prob for non-Bert layers - encoder_dropout: dropout probability of encoder output layer - return_probas: set this to ``True`` if you need the probabilities instead of raw answers - encoder_layer_ids: list of averaged layers from Bert encoder (layer ids) - optimizer: name of ``tf.train.*`` optimizer or None for ``AdamWeightDecayOptimizer`` - weight_decay_rate: L2 weight decay for ``AdamWeightDecayOptimizer`` - learning_rate: learning rate of BERT head - """ - - def __init__( - self, - n_tags: int = None, - use_crf: bool = None, - use_birnn: bool = False, - birnn_cell_type: str = 'lstm', - birnn_hidden_size: int = 128, - keep_prob: float = 1., - encoder_dropout: float = 0., - return_probas: bool = None, - encoder_layer_ids: List[int] = None, - learning_rate: float = 1e-3, - ): - super().__init__(keep_prob, return_probas, learning_rate) - self.n_tags = n_tags - self.use_crf = use_crf - self.use_birnn = use_birnn - self.birnn_cell_type = birnn_cell_type - self.birnn_hidden_size = birnn_hidden_size - self.encoder_dropout = encoder_dropout - self.encoder_layer_ids = encoder_layer_ids - - def _init_placeholders(self) -> None: - self.y_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='y_ph') - self.y_masks_ph = tf.placeholder(shape=(None, None), - dtype=tf.int32, - name='y_mask_ph') - self.encoder_keep_prob = tf.placeholder_with_default(1.0, shape=[], name='encoder_keep_prob_ph') - - def _init_graph(self) -> None: - with tf.variable_scope(self.bert_head_variable_scope): - self._init_placeholders() - self.seq_lengths = tf.reduce_sum(self.y_masks_ph, axis=1) - - layer_weights = tf.get_variable('layer_weights_', - shape=len(self.encoder_layer_ids), - initializer=tf.ones_initializer(), - trainable=True) - layer_mask = tf.ones_like(layer_weights) - layer_mask = tf.nn.dropout(layer_mask, self.encoder_keep_prob) - layer_weights *= layer_mask - # to prevent zero division - mask_sum = tf.maximum(tf.reduce_sum(layer_mask), 1.0) - layer_weights = tf.unstack(layer_weights / mask_sum) - # TODO: may be stack and reduce_sum is faster - units = sum(w * l for w, l in zip(layer_weights, self.encoder_layers())) - units = tf.nn.dropout(units, keep_prob=self.shared_ph['keep_prob']) - if self.use_birnn: - units, _ = bi_rnn(units, - self.birnn_hidden_size, - cell_type=self.birnn_cell_type, - seq_lengths=self.seq_lengths, - name='birnn') - units = tf.concat(units, -1) - # TODO: maybe add one more layer? 
- logits = tf.layers.dense(units, units=self.n_tags, name="output_dense") - - self.logits = token_from_subtoken(logits, self.y_masks_ph) - - # CRF - if self.use_crf: - transition_params = tf.get_variable('Transition_Params', - shape=[self.n_tags, self.n_tags], - initializer=tf.zeros_initializer()) - log_likelihood, transition_params = \ - tf.contrib.crf.crf_log_likelihood(self.logits, - self.y_ph, - self.seq_lengths, - transition_params) - loss_tensor = -log_likelihood - self._transition_params = transition_params - - self.y_predictions = tf.argmax(self.logits, -1) - self.y_probas = tf.nn.softmax(self.logits, axis=2) - - with tf.variable_scope("loss"): - tag_mask = self._get_tag_mask() - y_mask = tf.cast(tag_mask, tf.float32) - if self.use_crf: - self.loss = tf.reduce_mean(loss_tensor) - else: - self.loss = tf.losses.sparse_softmax_cross_entropy(labels=self.y_ph, - logits=self.logits, - weights=y_mask) - - def _get_tag_mask(self) -> tf.Tensor: - """ - Returns: tag_mask, - a mask that selects positions corresponding to word tokens (not padding and ``CLS``) - """ - max_length = tf.reduce_max(self.seq_lengths) - one_hot_max_len = tf.one_hot(self.seq_lengths - 1, max_length) - tag_mask = tf.cumsum(one_hot_max_len[:, ::-1], axis=1)[:, ::-1] - return tag_mask - - def encoder_layers(self): - """ - Returns: the output of BERT layers specified in ``self.encoder_layers_ids`` - """ - return [self.bert.all_encoder_layers[i] for i in self.encoder_layer_ids] - - @overrides - def _build_feed_dict(self, input_ids, input_masks, y_masks, y=None, body_learning_rate=None): - token_types = np.zeros(np.array(input_ids).shape) - sph = self.shared_ph - train = y is not None - feed_dict = super()._build_feed_dict(input_ids, input_masks, token_types, y, body_learning_rate) - if train: - feed_dict[self.encoder_keep_prob] = 1.0 - self.encoder_dropout - feed_dict[self.y_masks_ph] = y_masks - return feed_dict - - def _decode_crf(self, feed_dict: Dict[tf.Tensor, np.ndarray]) -> List[np.ndarray]: - logits, trans_params, mask, seq_lengths = self.sess.run([self.logits, - self._transition_params, - self.y_masks_ph, - self.seq_lengths], - feed_dict=feed_dict) - # iterate over the sentences because no batching in viterbi_decode - y_pred = [] - for logit, sequence_length in zip(logits, seq_lengths): - logit = logit[:int(sequence_length)] # keep only the valid steps - viterbi_seq, viterbi_score = tf.contrib.crf.viterbi_decode(logit, trans_params) - y_pred += [viterbi_seq] - return y_pred - - def get_sess_run_infer_args( - self, - input_ids: Union[List[List[int]], np.ndarray], - input_masks: Union[List[List[int]], np.ndarray], - y_masks: Union[List[List[int]], np.ndarray], - ) -> Tuple[List[tf.Tensor], Dict[tf.placeholder, Any]]: - """Returns fetches and feed_dict for model inference. The method is called from ``MultiTaskBert.__call__``. 
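The reversed-cumsum trick in ``_get_tag_mask`` above is compact but easy to misread. The standalone NumPy restatement below (an illustration, not part of the removed module) shows that it simply marks the first ``seq_length`` positions of every row:

import numpy as np

def tag_mask(seq_lengths):
    """NumPy restatement of _get_tag_mask: ones on the first `length` positions of each row."""
    seq_lengths = np.asarray(seq_lengths)
    max_length = seq_lengths.max()
    one_hot_last = np.eye(max_length, dtype=np.int32)[seq_lengths - 1]  # 1 at the last real token
    # reversing, cumulative summing, and reversing back turns that single 1 into a prefix of 1s
    return np.cumsum(one_hot_last[:, ::-1], axis=1)[:, ::-1]

print(tag_mask([3, 1]))
# [[1 1 1]
#  [1 0 0]]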
- - Args: - input_ids: indices of the subwords in vocabulary - input_masks: mask that determines where to attend and where not to - y_masks: mask which determines the first subword units in the the word - - Returns: - list of fetches and feed_dict - """ - feed_dict = self._build_feed_dict(input_ids, input_masks, y_masks) - if self.return_probas: - fetches = self.y_probas - else: - if self.use_crf: - fetches = [self.logits, self._transition_params, self.y_masks_ph, self.seq_lengths] - else: - fetches = [self.y_predictions, self.seq_lengths] - return fetches, feed_dict - - def get_sess_run_train_args( - self, - input_ids: Union[List[List[int]], np.ndarray], - input_masks: Union[List[List[int]], np.ndarray], - y_masks: Union[List[List[int]], np.ndarray], - y: Union[List[List[int]], np.ndarray], - body_learning_rate: float) -> Tuple[List[tf.Tensor], Dict[tf.placeholder, Any]]: - """Returns fetches and feed_dict for model ``train_on_batch`` method. - - Args: - input_ids: indices of the subwords in vocabulary - input_masks: mask that determines where to attend and where not to - y_masks: mask which determines the first subword units in the the word - y: indices of ground truth tags - body_learning_rate: learning rate for BERT body - - Returns: - list of fetches and feed_dict - """ - feed_dict = self._build_feed_dict(input_ids, input_masks, y_masks, y=y, body_learning_rate=body_learning_rate) - fetches = [self.train_op, self.loss] - return fetches, feed_dict - - def post_process_preds(self, sess_run_res: List[np.ndarray]) -> Union[List[List[int]], List[np.ndarray]]: - """Decodes CRF if needed and returns predictions or probabilities. - - Args: - sess_run_res: list of computed fetches gathered by ``get_sess_run_infer_args`` - - Returns: - predictions or probabilities depending on ``return_probas`` attribute - """ - if self.return_probas: - pred = sess_run_res - else: - if self.use_crf: - logits, trans_params, mask, seq_lengths = sess_run_res - pred = [] - for logit, sequence_length in zip(logits, seq_lengths): - logit = logit[:int(sequence_length)] # keep only the valid steps - viterbi_seq, viterbi_score = tf.contrib.crf.viterbi_decode(logit, trans_params) - pred += [viterbi_seq] - else: - pred, seq_lengths = sess_run_res - pred = [p[:l] for l, p in zip(seq_lengths, pred)] - return pred - - -@register("mt_bert_classification_task") -class MTBertClassificationTask(MTBertTask): - """Task for text classification. - - It uses output from [CLS] token and predicts labels using linear transformation. 
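The "linear transformation" mentioned above amounts to a single dense layer over the pooled [CLS] vector. A minimal NumPy sketch of the single-label branch follows; the shapes (batch 2, hidden size 768, 3 classes) and random values are assumptions for illustration:

import numpy as np

pooled = np.random.rand(2, 768).astype(np.float32)     # BERT pooled [CLS] output
W = np.random.rand(3, 768).astype(np.float32) * 0.02   # "output_weights", one row per class
b = np.zeros(3, dtype=np.float32)                      # "output_bias"

logits = pooled @ W.T + b                               # matmul with transpose_b=True plus bias_add
probas = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)   # softmax over classes
predictions = logits.argmax(-1)
print(probas.shape, predictions.shape)                  # (2, 3) (2,)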
- - Args: - n_classes: number of classes - return_probas: set ``True`` if return class probabilities instead of most probable label needed - one_hot_labels: set ``True`` if one-hot encoding for labels is used - keep_prob: dropout keep_prob for non-BERT layers - multilabel: set ``True`` if it is multi-label classification - learning_rate: learning rate of BERT head - optimizer: name of ``tf.train.*`` optimizer or ``None`` for ``AdamWeightDecayOptimizer`` - """ - - def __init__( - self, - n_classes: int = None, - return_probas: bool = None, - one_hot_labels: bool = None, - keep_prob: float = 1., - multilabel: bool = False, - learning_rate: float = 2e-5, - optimizer: str = "Adam", - ): - super().__init__(keep_prob, return_probas, learning_rate) - self.n_classes = n_classes - self.one_hot_labels = one_hot_labels - self.multilabel = multilabel - - if self.multilabel and not self.one_hot_labels: - raise RuntimeError('Use one-hot encoded labels for multilabel classification!') - - if self.multilabel and not self.return_probas: - raise RuntimeError('Set return_probas to True for multilabel classification!') - - def _init_placeholders(self): - if not self.one_hot_labels: - self.y_ph = tf.placeholder(shape=(None,), dtype=tf.int32, name='y_ph') - else: - self.y_ph = tf.placeholder(shape=(None, self.n_classes), dtype=tf.float32, name='y_ph') - - def _init_graph(self): - with tf.variable_scope(self.bert_head_variable_scope): - self._init_placeholders() - - output_layer = self.bert.get_pooled_output() - hidden_size = output_layer.shape[-1].value - - output_weights = tf.get_variable( - "output_weights", [self.n_classes, hidden_size], - initializer=tf.truncated_normal_initializer(stddev=0.02)) - - output_bias = tf.get_variable( - "output_bias", [self.n_classes], initializer=tf.zeros_initializer()) - - with tf.variable_scope("loss"): - output_layer = tf.nn.dropout(output_layer, keep_prob=self.shared_ph['keep_prob']) - logits = tf.matmul(output_layer, output_weights, transpose_b=True) - logits = tf.nn.bias_add(logits, output_bias) - - if self.one_hot_labels: - one_hot_labels = self.y_ph - else: - one_hot_labels = tf.one_hot(self.y_ph, depth=self.n_classes, dtype=tf.float32) - - self.y_predictions = tf.argmax(logits, axis=-1) - if not self.multilabel: - log_probs = tf.nn.log_softmax(logits, axis=-1) - self.y_probas = tf.nn.softmax(logits, axis=-1) - per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1) - self.loss = tf.reduce_mean(per_example_loss) - else: - self.y_probas = tf.nn.sigmoid(logits) - self.loss = tf.reduce_mean( - tf.nn.sigmoid_cross_entropy_with_logits(labels=one_hot_labels, logits=logits)) - - def get_sess_run_train_args( - self, - features: List[InputFeatures], - y: Union[List[int], List[List[int]]], - body_learning_rate: float) -> Tuple[List[tf.Tensor], Dict[tf.placeholder, Any]]: - """Returns fetches and feed_dict for model ``train_on_batch`` method. - - Args: - features: text features created by BERT preprocessor. 
- y: batch of labels (class id or one-hot encoding) - body_learning_rate: learning rate for BERT body - - Returns: - list of fetches and feed_dict - """ - input_ids = [f.input_ids for f in features] - input_masks = [f.input_mask for f in features] - input_type_ids = [f.input_type_ids for f in features] - feed_dict = self._build_feed_dict(input_ids, input_masks, input_type_ids, y=y, - body_learning_rate=body_learning_rate) - fetches = [self.train_op, self.loss] - return fetches, feed_dict - - def get_sess_run_infer_args( - self, - features: List[InputFeatures]) -> Tuple[List[tf.Tensor], Dict[tf.placeholder, Any]]: - """Returns fetches and feed_dict for model inference. The method is called from ``MultiTaskBert.__call__``. - - Args: - features: text features created by BERT preprocessor. - - Returns: - list of fetches and feed_dict - """ - input_ids = [f.input_ids for f in features] - input_masks = [f.input_mask for f in features] - input_type_ids = [f.input_type_ids for f in features] - feed_dict = self._build_feed_dict(input_ids, input_masks, input_type_ids) - fetches = self.y_probas if self.return_probas else self.y_predictions - return fetches, feed_dict - - def post_process_preds(self, sess_run_res): - """Returns ``tf.Session.run`` results for inference without changes.""" - return sess_run_res - - -@register('mt_bert') -class MultiTaskBert(LRScheduledTFModel): - """The component for multi-task BERT. It builds the BERT body, launches building of BERT heads. - - The component aggregates components implementing BERT heads. The head components are called tasks. - ``__call__`` and ``train_on_batch`` methods of ``MultiTaskBert`` are used for inference and training of - BERT heads. BERT head components, which are derived from ``MTBertTask``, can be used only inside this class. - - One training iteration consists of one ``train_on_batch`` call for every task. - - If ``inference_task_names`` is not ``None``, then the component is created for training. Otherwise, the - component is created for inference. If component is created for inference, several tasks can be run - simultaneously. For explanation see parameter ``inference_task_names`` description. - - Args: - tasks: a dictionary. Task names are dictionary keys and objects of ``MTBertTask`` subclasses are dictionary - values. Task names are used as variable scopes in computational graph so it is important to use same - names in multi-task BERT train and inference configuration files. - bert_config_file: path to BERT configuration file - pretrained_bert: pre-trained BERT checkpoint - attention_probs_keep_prob: keep_prob for BERT self-attention layers - hidden_keep_prob: keep_prob for BERT hidden layers - body_learning_rate: learning rate of BERT body - min_body_learning_rate: min value of body learning rate if learning rate decay is used - learning_rate_drop_patience: how many validations with no improvements to wait - learning_rate_drop_div: the divider of the learning rate after ``learning_rate_drop_patience`` unsuccessful - validations - load_before_drop: whether to load best model before dropping learning rate or not - clip_norm: clip gradients by norm - freeze_embeddings: set to False to train input embeddings - inference_task_names: names of tasks on which inference is done. - If this parameter is provided, the component is created for inference, else the component is created for - training. 
- - If ``inference_task_names`` is a string, then it is a name of the task called separately from other tasks - (in individual ``tf.Session.run`` call). - - If ``inference_task_names`` is a ``list``, then elements of this list are either strings or lists of - strings. You can combine these options. For example, ``["task_name1", ["task_name2", "task_name3"], - ["task_name4", "task_name5"]]``. - - If an element of ``inference_task_names`` list is a string, the element is a name of the task that is - computed when ``__call__`` method is called. - - If an element of the ``inference_task_names`` parameter is a list of strings - ``["task_name1", "task_name2", ...]``, then tasks ``"task_name1"``, ``"task_name2"`` and so on are run - simultaneously in ``tf.Session.run`` call. This option is available if tasks ``"task_name1"``, - ``"task_name2"`` and so on have common inputs. Despite the fact that tasks share inputs, if positional - arguments are used in methods ``__call__`` and ``train_on_batch``, all arguments are passed individually. - For instance, if ``"task_name1"``, ``"task_name2"``, and ``"task_name3"`` all take an argument with name - ``x`` in the model pipe, then the ``__call__`` method takes arguments ``(x, x, x)``. - in_distribution: The distribution of variables listed in the ``"in"`` config parameter between tasks. - ``in_distribution`` can be ``None`` if only 1 task is called. In that case all variables - listed in ``"in"`` are arguments of 1 task. - - ``in_distribution`` can be a dictionary of ``int``. If that is the case, then keys of ``in_distribution`` - are task names and values are numbers of variables from ``"in"`` parameter of config which are inputs of - corresponding task. The variables in ``"in"`` parameter have to be in the same order the tasks are listed - in ``in_distribution``. - - ``in_distribution`` can be a dictionary of lists of ``str``. Strings are names of variables from ``"in"`` - configuration parameter. If ``"in"`` parameter is a list, then ``in_distribution`` works the same way as - when ``in_distribution`` is dictionary of ``int``. Values of ``in_distribution``, which are lists, are - replaced by their lengths. If ``"in"`` parameter in component config is a dictionary, then the order of - strings in ``in_distribution`` values has to match the order of arguments of ``train_on_batch`` and - ``get_sess_run_infer_args`` methods of task components. - in_y_distribution: The same as ``in_distribution`` for ``"in_y"`` config parameter. 
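A concrete illustration of the two accepted ``in_distribution`` forms may help. The task and variable names below are hypothetical and only show the shape of the parameter:

# dict of int: the first three "in" variables go to "ner", the next one to "classification"
in_distribution = {"ner": 3, "classification": 1}

# dict of list of str: the same split spelled out by variable name; the list lengths are what matters
in_distribution = {
    "ner": ["x_subword_tok_ids", "ner_attention_mask", "ner_startofword_markers"],
    "classification": ["classification_features"],
}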
- """ - def __init__(self, - tasks: Dict[str, MTBertTask], - bert_config_file: str, - pretrained_bert: str = None, - attention_probs_keep_prob: float = None, - hidden_keep_prob: float = None, - optimizer: str = None, - weight_decay_rate: float = 1e-6, - body_learning_rate: float = 1e-3, - min_body_learning_rate: float = 1e-7, - learning_rate_drop_patience: int = 20, - learning_rate_drop_div: float = 2.0, - load_before_drop: bool = True, - clip_norm: float = 1.0, - freeze_embeddings: bool = True, - inference_task_names: Optional[Union[str, List[Union[List[str], str]]]] = None, - in_distribution: Optional[Dict[str, Union[int, List[str]]]] = None, - in_y_distribution: Optional[Dict[str, Union[int, List[str]]]] = None, - **kwargs) -> None: - super().__init__(learning_rate=body_learning_rate, - learning_rate_drop_div=learning_rate_drop_div, - learning_rate_drop_patience=learning_rate_drop_patience, - load_before_drop=load_before_drop, - clip_norm=clip_norm, - **kwargs) - self.optimizer_params = { - "optimizer": optimizer, - "body_learning_rate": body_learning_rate, - "min_body_learning_rate": min_body_learning_rate, - "weight_decay_rate": weight_decay_rate - } - self.freeze_embeddings = freeze_embeddings - self.tasks = tasks - - if inference_task_names is not None and isinstance(inference_task_names, str): - inference_task_names = [inference_task_names] - self.inference_task_names = inference_task_names - - self.mode = 'train' if self.inference_task_names is None else 'inference' - - self.shared_ph = None - - self.bert_config = BertConfig.from_json_file(str(expand_path(bert_config_file))) - - if attention_probs_keep_prob is not None: - self.bert_config.attention_probs_dropout_prob = 1.0 -attention_probs_keep_prob - if hidden_keep_prob is not None: - self.bert_config.hidden_dropout_prob = 1.0 - hidden_keep_prob - - self.sess_config = tf.ConfigProto(allow_soft_placement=True) - self.sess_config.gpu_options.allow_growth = True - self.sess = tf.Session(config=self.sess_config) - - self._init_bert_body_graph() - self.build_tasks() - - self.sess.run(tf.global_variables_initializer()) - - if pretrained_bert is not None: - pretrained_bert = str(expand_path(pretrained_bert)) - if tf.train.checkpoint_exists(pretrained_bert) \ - and not (self.load_path and tf.train.checkpoint_exists(str(self.load_path.resolve()))) \ - and self.mode == 'train': - log.info('[initializing model with Bert from {}]'.format(pretrained_bert)) - var_list = self._get_saveable_variables( - exclude_scopes=('Optimizer', 'learning_rate', 'momentum') + tuple(self.tasks.keys())) - saver = tf.train.Saver(var_list) - saver.restore(self.sess, pretrained_bert) - - if self.load_path is not None: - self.load() - self.in_distribution = in_distribution - self.in_y_distribution = in_y_distribution - - def build_tasks(self): - def get_train_op(*args, **kwargs): - return self.get_train_op(*args, **kwargs) - for task_name, task_obj in self.tasks.items(): - task_obj.build( - bert_body=self.bert, - optimizer_params=self.optimizer_params, - shared_placeholders=self.shared_ph, - sess=self.sess, - mode=self.mode, - get_train_op_func=get_train_op, - freeze_embeddings=self.freeze_embeddings, - bert_head_variable_scope=task_name - ) - - def _init_shared_placeholders(self) -> None: - self.shared_ph = { - 'input_ids': tf.placeholder(shape=(None, None), - dtype=tf.int32, - name='token_indices_ph'), - 'input_masks': tf.placeholder(shape=(None, None), - dtype=tf.int32, - name='token_mask_ph'), - 'learning_rate': tf.placeholder_with_default(0.0, shape=[], 
name='learning_rate_ph'), - 'keep_prob': tf.placeholder_with_default(1.0, shape=[], name='keep_prob_ph'), - 'is_train': tf.placeholder_with_default(False, shape=[], name='is_train_ph')} - self.shared_ph['token_types'] = tf.placeholder_with_default( - tf.zeros_like(self.shared_ph['input_ids'], dtype=tf.int32), - shape=self.shared_ph['input_ids'].shape, - name='token_types_ph') - - def _init_bert_body_graph(self) -> None: - self._init_shared_placeholders() - sph = self.shared_ph - self.bert = BertModel(config=self.bert_config, - is_training=sph['is_train'], - input_ids=sph['input_ids'], - input_mask=sph['input_masks'], - token_type_ids=sph['token_types'], - use_one_hot_embeddings=False) - - def save(self, exclude_scopes=('Optimizer', 'learning_rate', 'momentum')) -> None: - return super().save(exclude_scopes=exclude_scopes) - - def load(self, - exclude_scopes=('Optimizer', - 'learning_rate', - 'momentum'), - **kwargs) -> None: - return super().load(exclude_scopes=exclude_scopes, **kwargs) - - def train_on_batch(self, *args, **kwargs) -> Dict[str, Dict[str, float]]: - """Calls ``train_on_batch`` methods for every task. This method takes ``args`` or ``kwargs`` but not both. - The order of ``args`` is the same as the order of tasks in the component parameters: - - .. highlight:: python - .. code-block:: python - - args = [ - task1_in_x[0], - task1_in_x[1], - task1_in_x[2], - ... - task1_in_y[0], - task1_in_y[1], - ... - task2_in_x[0], - ... - ] - - If ``kwargs`` are used and ``in_distribution`` and ``in_y_distribution`` attributes are dictionaries of lists - of strings, then keys of ``kwargs`` have to be same as strings in ``in_distribution`` and - ``in_y_distribution``. If ``in_distribution`` and ``in_y_distribution`` are dictionaries of ``int``, then - ``kwargs`` values are treated the same way as ``args``. - - Args: - args: task inputs and expected outputs - kwargs: task inputs and expected outputs - - Returns: - dictionary of dictionaries with task losses and learning rates. 
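Concretely, because the per-task results are merged with ``dict.update``, the caller sees a single dictionary whose keys follow the f-strings in ``MTBertTask.train_on_batch``. The task names and numbers below are invented for illustration and rounded to three significant digits, as the implementation does:

metrics = {
    "ner_loss": 0.271,
    "ner_head_learning_rate": 0.0005,
    "insults_loss": 0.113,
    "insults_head_learning_rate": 0.0005,
    "bert_body_learning_rate": 1e-05,
}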
- """ - # TODO: test passing arguments as args - if args and kwargs: - raise ValueError("You can use either args or kwargs not both") - n_in = sum([len(inp) if isinstance(inp, list) else inp for inp in self.in_distribution.values()]) - if args: - args_in, args_in_y = args[:n_in], args[n_in:] - in_by_tasks = self._distribute_arguments_by_tasks(args_in, {}, list(self.tasks.keys()), "in") - in_y_by_tasks = self._distribute_arguments_by_tasks(args_in_y, {}, list(self.tasks.keys()), "in_y") - else: - kwargs_in, kwargs_in_y = {}, {} - for i, (k, v) in enumerate(kwargs.items()): - if i < n_in: - kwargs_in[k] = v - else: - kwargs_in_y[k] = v - in_by_tasks = self._distribute_arguments_by_tasks({}, kwargs_in, list(self.tasks.keys()), "in") - in_y_by_tasks = self._distribute_arguments_by_tasks({}, kwargs_in_y, list(self.tasks.keys()), "in_y") - train_on_batch_results = {} - for task_name, task in self.tasks.items(): - train_on_batch_results.update( - task.train_on_batch( - *in_by_tasks[task_name], - *in_y_by_tasks[task_name], - body_learning_rate=max(self.get_learning_rate(), self.optimizer_params['min_body_learning_rate']) - ) - ) - for k, v in train_on_batch_results.items(): - train_on_batch_results[k] = float(f"{v:.3}") - return train_on_batch_results - - @staticmethod - def _unite_task_feed_dicts(d1, d2, task_name): - d = copy.copy(d1) - for k, v in d2.items(): - if k in d: - comp = v != d[k] - if isinstance(comp, np.ndarray): - comp = comp.any() - if comp: - raise ValueError( - f"Value of placeholder '{k}' for task '{task_name}' does not match value of this placeholder " - "in other tasks") - else: - d[k] = v - return d - - def _distribute_arguments_by_tasks(self, args, kwargs, task_names, what_to_distribute, in_distribution=None): - if args and kwargs: - raise ValueError("You may use args or kwargs but not both") - - if what_to_distribute == "in": - if in_distribution is not None: - distribution = in_distribution - else: - distribution = self.in_distribution - elif what_to_distribute == "in_y": - if in_distribution is not None: - raise ValueError( - f"If parameter `what_to_distribute` is 'in_y', parameter `in_distribution` has to be `None`. " - f"in_distribution = {in_distribution}") - distribution = self.in_y_distribution - else: - raise ValueError(f"`what_to_distribute` can be 'in' or 'in_y', {repr(what_to_distribute)} is given") - - if distribution is None: - if len(task_names) != 1: - raise ValueError(f"If no `{what_to_distribute}_distribution` is not provided there have to be only 1" - "task for inference") - return {task_names[0]: list(kwargs.values()) if kwargs else list(args)} - - if all([isinstance(task_distr, int) for task_distr in distribution.values()]): - ints = True - elif all([isinstance(task_distr, list) for task_distr in distribution.values()]): - ints = False - else: - raise ConfigError( - f"Values of `{what_to_distribute}_distribution` attribute of `MultiTaskBert` have to be " - f"either `int` or `list` not both. 
" - f"{what_to_distribute}_distribution = {distribution}") - - args_by_task = {} - - flattened = [] - for task_name in task_names: - if isinstance(task_name, str): - flattened.append(task_name) - else: - flattened.extend(task_name) - task_names = flattened - - if args and not ints: - ints = True - distribution = {task_name: len(in_distr) for task_name, in_distr in distribution.items()} - if ints: - if kwargs: - values = list(kwargs.values()) - else: - values = args - n_distributed = sum([n_args for n_args in distribution.values()]) - if len(values) != n_distributed: - raise ConfigError( - f"The number of '{what_to_distribute}' arguments of MultitaskBert does not match " - f"the number of distributed params according to '{what_to_distribute}_distribution' parameter. " - f"{len(values)} parameters are in '{what_to_distribute}' and {n_distributed} parameters are " - f"required '{what_to_distribute}_distribution'. " - f"{what_to_distribute}_distribution = {distribution}") - values_taken = 0 - for task_name in task_names: - args_by_task[task_name] = {} - n_args = distribution[task_name] - args_by_task[task_name] = [values[i] for i in range(values_taken, values_taken + n_args)] - values_taken += n_args - - else: - assert kwargs - arg_names_used = [] - for task_name in task_names: - in_distr = distribution[task_name] - args_by_task[task_name] = {} - args_by_task[task_name] = [kwargs[arg_name] for arg_name in in_distr] - arg_names_used += in_distr - set_used = set(arg_names_used) - set_all = set(kwargs.keys()) - if set_used != set_all: - raise ConfigError(f"There are unused '{what_to_distribute}' parameters {set_all - set_used}") - return args_by_task - - def __call__(self, *args, **kwargs): - """Calls one or several BERT heads depending on provided task names. ``args`` and ``kwargs`` contain - inputs of BERT tasks. ``args`` and ``kwargs`` cannot be used together. If ``args`` are used ``args`` content - has to be - - .. code-block:: python - - args = [ - task1_in_x[0], - task1_in_x[1], - ... - task2_in_x[0], - task2_in_x[1], - ... - ] - - If ``kwargs`` are used and ``in_distribution`` is a dictionary of ``int``, then ``kwargs``' order has to be - the same as ``args`` order described in the previous paragraph. If ``in_distribution`` is a dictionary of - lists of ``str``, then all task names from ``in_distribution`` have to be present in ``kwargs`` keys. - - Returns: - list of results of called tasks. - """ - if self.inference_task_names is None: - task_names = list(self.tasks.keys()) - else: - task_names = self.inference_task_names - if not task_names: - raise ValueError("No tasks to call") - if args and kwargs: - raise ValueError("You may use either args or kwargs not both") - return self.call(args, kwargs, task_names) - - def call( - self, - args: Tuple[Any], - kwargs: Dict[str, Any], - task_names: Optional[Union[List[str], str]], - in_distribution: Optional[Union[Dict[str, int], Dict[str, List[str]]]] = None, - ): - """Calls one or several BERT heads depending on provided task names in ``task_names`` parameter. ``args`` and - ``kwargs`` contain inputs of BERT tasks. ``args`` and ``kwargs cannot be used simultaneously. If ``args`` are - used ``args``, content has to be - - .. code-block:: python - - args = [ - task1_in_x[0], - task1_in_x[1], - ... - task2_in_x[0], - task2_in_x[1], - ... - ] - - If ``kwargs`` is used ``kwargs`` keys has to match content of ``in_names`` params of called tasks. 
- - Args: - args: generally, ``args`` parameter of ``__call__`` method of this component or ``MTBertReUser``. Inputs - of one or several tasks. Has to be empty if ``kwargs`` argument is used. - kwargs: generally, ``kwargs`` parameter of ``__call__`` method of this component or ``MTBertReUser``. - Inputs of one or several tasks. Has to be empty if ``args`` argument is used. - task_names: names of tasks that are called. If ``str``, then 1 task is called. If a task name is an - element of ``task_names`` list, then this task is run independently. If task an element of - ``task_names`` is an list of strings, then tasks in the inner list are run simultaneously. - in_distribution: a distribution of variables from ``"in"`` config parameters between tasks. For details - see method ``__init__`` docstring. - - Returns: - list results of called tasks. - """ - args_by_task = self._distribute_arguments_by_tasks(args, kwargs, task_names, "in", in_distribution) - results = [] - task_count = 0 - for elem in task_names: - if isinstance(elem, str): - task_count += 1 - task = self.tasks[elem] - fetches, feed_dict = task.get_sess_run_infer_args(*args_by_task[elem]) - sess_run_res = self.sess.run(fetches, feed_dict=feed_dict) - results.append(task.post_process_preds(sess_run_res)) - else: - fetches = [] - for task_name in elem: - task_count += 1 - feed_dict = {} - task_fetches, task_feed_dict = self.tasks[task_name].get_sess_run_infer_args( - *args_by_task[task_name]) - fetches.append(task_fetches) - feed_dict = self._unite_task_feed_dicts(feed_dict, task_feed_dict, task_name) - sess_run_res = self.sess.run(fetches, feed_dict=feed_dict) - for task_name, srs in zip(elem, sess_run_res): - task_results = self.tasks[task_name].post_process_preds(srs) - results.append(task_results) - if task_count == 1: - results = results[0] - return results - - -@register("mt_bert_reuser") -class MTBertReUser: - """Instances of this class are for multi-task BERT inference. In inference config ``MultiTaskBert`` class may - not perform inference of some tasks. For example, you may need to sequentially apply two models with BERT. - In that case, ``mt_bert_reuser`` is created to call remaining tasks. - - Args: - mt_bert: An instance of ``MultiTaskBert`` - task_names: Names of infered tasks. If ``task_names`` is ``str``, then ``task_names`` is the name of the only - infered task. If ``task_names`` is ``list``, then its elements can be either strings or lists of strings. - If an element of ``task_names`` is a string, then this element is a name of a task that is run - independently. If an element of ``task_names`` is a list of strings, then the element is a list of names - of tasks that have common inputs and run simultaneously. For detailed information look up - ``MultiTaskBert`` ``inference_task_names`` parameter. - """ - def __init__( - self, - mt_bert: MultiTaskBert, - task_names: Union[str, List[Union[str, List[str]]]], - in_distribution: Union[Dict[str, int], Dict[str, List[str]]] = None, - *args, - **kwargs): - self.mt_bert = mt_bert - if isinstance(task_names, str): - task_names = [task_names] - elif not task_names: - raise ValueError("At least 1 task has to specified") - self.task_names = task_names - flattened = [] - for elem in self.task_names: - if isinstance(elem, str): - flattened.append(elem) - else: - flattened.extend(elem) - - if in_distribution is None: - if len(flattened) > 1: - raise ValueError( - "If ``in_distribution`` parameter is not provided, there has to be only 1 task." 
- f"task_names = {self.task_names}") - - self.in_distribution = in_distribution - - def __call__(self, *args, **kwargs) -> List[Any]: - """Infer tasks listed in parameter ``task_names``. One of parameters ``args`` and ``kwargs`` has to be empty. - - Args: - args: inputs and labels of infered tasks. - kwargs: inputs and labels of infered tasks. - - Returns: - list of results of inference of tasks listed in ``task_names`` - """ - res = self.mt_bert.call(args, kwargs, task_names=self.task_names, in_distribution=self.in_distribution) - return res - - -@register("input_splitter") -class InputSplitter: - """The instance of these class in pipe splits a batch of sequences of identical length or dictionaries with - identical keys into tuple of batches. - - Args: - keys_to_extract: a sequence of ints or strings that have to match keys of split dictionaries. - """ - def __init__(self, keys_to_extract: Union[List[str], Tuple[str, ...]], **kwargs): - self.keys_to_extract = keys_to_extract - - def __call__(self, inp: Union[List[dict], List[List[int]], List[Tuple[int]]]) -> List[list]: - """Returns batches of values from ``inp``. Every batch contains values that have same key from - ``keys_to_extract`` attribute. The order of elements of ``keys_to_extract`` is preserved. - - Args: - inp: A sequence of dictionaries with identical keys - - Returns: - A list of lists of values of dictionaries from ``inp`` - """ - extracted = [[] for _ in self.keys_to_extract] - for item in inp: - for i, key in enumerate(self.keys_to_extract): - extracted[i].append(item[key]) - return extracted - diff --git a/deeppavlov/models/nemo/__init__.py b/deeppavlov/models/nemo/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/nemo/asr.py b/deeppavlov/models/nemo/asr.py deleted file mode 100644 index 70527adea3..0000000000 --- a/deeppavlov/models/nemo/asr.py +++ /dev/null @@ -1,193 +0,0 @@ -# Copyright 2020 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import logging -from io import BytesIO -from pathlib import Path -from typing import List, Optional, Tuple, Union, Dict - -import torch -from nemo.collections.asr import AudioToMelSpectrogramPreprocessor, JasperEncoder, JasperDecoderForCTC, GreedyCTCDecoder -from nemo.collections.asr.helpers import post_process_predictions -from nemo.collections.asr.parts.features import WaveformFeaturizer -from nemo.core.neural_types import AudioSignal, NeuralType, LengthsType -from nemo.utils.decorators import add_port_docs -from torch import Tensor -from torch.utils.data import Dataset, DataLoader - -from deeppavlov.core.common.registry import register -from deeppavlov.models.nemo.common import CustomDataLayerBase, NeMoBase - -log = logging.getLogger(__name__) - - -class AudioInferDataset(Dataset): - def __init__(self, audio_batch: List[Union[str, BytesIO]], sample_rate: int, int_values: bool, trim=False) -> None: - """Dataset reader for AudioInferDataLayer. - - Args: - audio_batch: Batch to be read. 
Elements could be either paths to audio files or Binary I/O objects. - sample_rate: Audio files sample rate. - int_values: If true, load samples as 32-bit integers. - trim: Trim leading and trailing silence from an audio signal if True. - - """ - self.audio_batch = audio_batch - self.featurizer = WaveformFeaturizer(sample_rate=sample_rate, int_values=int_values) - self.trim = trim - - def __getitem__(self, index: int) -> Tuple[Tensor, Tensor]: - """Processes audio batch item and extracts features. - - Args: - index: Audio batch item index. - - Returns: - features: Audio file's extracted features tensor. - features_length: Features length tensor. - - """ - sample = self.audio_batch[index] - features = self.featurizer.process(sample, trim=self.trim) - features_length = torch.tensor(features.shape[0]).long() - - return features, features_length - - def __len__(self) -> int: - return len(self.audio_batch) - - -class AudioInferDataLayer(CustomDataLayerBase): - """Data Layer for ASR pipeline inference.""" - - @property - @add_port_docs() - def output_ports(self) -> Dict[str, NeuralType]: - return { - "audio_signal": NeuralType(('B', 'T'), AudioSignal(freq=self._sample_rate)), - "a_sig_length": NeuralType(tuple('B'), LengthsType()) - } - - def __init__(self, *, - audio_batch: List[Union[str, BytesIO]], - batch_size: int = 32, - sample_rate: int = 16000, - int_values: bool = False, - trim_silence: bool = False, - **kwargs) -> None: - """Initializes Data Loader. - - Args: - audio_batch: Batch to be read. Elements could be either paths to audio files or Binary I/O objects. - batch_size: How many samples per batch to load. - sample_rate: Target sampling rate for data. Audio files will be resampled to sample_rate if - it is not already. - int_values: If true, load data as 32-bit integers. - trim_silence: Trim leading and trailing silence from an audio signal if True. - - """ - self._sample_rate = sample_rate - - dataset = AudioInferDataset(audio_batch=audio_batch, sample_rate=sample_rate, int_values=int_values, - trim=trim_silence) - - dataloader = DataLoader(dataset=dataset, batch_size=batch_size, collate_fn=self.seq_collate_fn) - super(AudioInferDataLayer, self).__init__(dataset, dataloader, **kwargs) - - @staticmethod - def seq_collate_fn(batch: Tuple[Tuple[Tensor], Tuple[Tensor]]) -> Tuple[Optional[Tensor], Optional[Tensor]]: - """Collates batch of audio signal and audio length, zero pads audio signal. - - Args: - batch: A tuple of tuples of audio signals and signal lengths. This collate function assumes the signals - are 1d torch tensors (i.e. mono audio). - - Returns: - audio_signal: Zero padded audio signal tensor. - audio_length: Audio signal length tensor. - - """ - _, audio_lengths = zip(*batch) - max_audio_len = 0 - has_audio = audio_lengths[0] is not None - if has_audio: - max_audio_len = max(audio_lengths).item() - - audio_signal = [] - for sig, sig_len in batch: - if has_audio: - sig_len = sig_len.item() - if sig_len < max_audio_len: - pad = (0, max_audio_len - sig_len) - sig = torch.nn.functional.pad(sig, pad) - audio_signal.append(sig) - - if has_audio: - audio_signal = torch.stack(audio_signal) - audio_lengths = torch.stack(audio_lengths) - else: - audio_signal, audio_lengths = None, None - - return audio_signal, audio_lengths - - -@register('nemo_asr') -class NeMoASR(NeMoBase): - """ASR model on NeMo modules.""" - - def __init__(self, load_path: Union[str, Path], nemo_params_path: Union[str, Path], **kwargs) -> None: - """Initializes NeuralModules for ASR. 
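The zero padding in ``seq_collate_fn`` above relies on ``torch.nn.functional.pad`` appending zeros to the end of the time dimension when given a ``(0, n)`` pad tuple. A small illustrative check, with made-up sample values:

import torch
import torch.nn.functional as F

sig = torch.tensor([0.1, -0.2, 0.3])
padded = F.pad(sig, (0, 2))          # tensor([ 0.1000, -0.2000, 0.3000, 0.0000, 0.0000])
batch = torch.stack([padded, torch.zeros(5)])
print(batch.shape)                   # torch.Size([2, 5])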
- - Args: - load_path: Path to a directory with pretrained checkpoints for JasperEncoder and JasperDecoderForCTC. - nemo_params_path: Path to a file containig labels and params for AudioToMelSpectrogramPreprocessor, - JasperEncoder, JasperDecoderForCTC and AudioInferDataLayer. - - """ - super(NeMoASR, self).__init__(load_path=load_path, nemo_params_path=nemo_params_path, **kwargs) - - self.labels = self.nemo_params['labels'] - - self.data_preprocessor = AudioToMelSpectrogramPreprocessor( - **self.nemo_params['AudioToMelSpectrogramPreprocessor'] - ) - self.jasper_encoder = JasperEncoder(**self.nemo_params['JasperEncoder']) - self.jasper_decoder = JasperDecoderForCTC(num_classes=len(self.labels), **self.nemo_params['JasperDecoder']) - self.greedy_decoder = GreedyCTCDecoder() - self.modules_to_restore = [self.jasper_encoder, self.jasper_decoder] - - self.load() - - def __call__(self, audio_batch: List[Union[str, BytesIO]]) -> List[str]: - """Transcripts audio batch to text. - - Args: - audio_batch: Batch to be transcribed. Elements could be either paths to audio files or Binary I/O objects. - - Returns: - text_batch: Batch of transcripts. - - """ - data_layer = AudioInferDataLayer(audio_batch=audio_batch, **self.nemo_params['AudioToTextDataLayer']) - audio_signal, audio_signal_len = data_layer() - processed_signal, processed_signal_len = self.data_preprocessor(input_signal=audio_signal, - length=audio_signal_len) - encoded, encoded_len = self.jasper_encoder(audio_signal=processed_signal, length=processed_signal_len) - log_probs = self.jasper_decoder(encoder_output=encoded) - predictions = self.greedy_decoder(log_probs=log_probs) - eval_tensors = [predictions] - tensors = self.neural_factory.infer(tensors=eval_tensors) - text_batch = post_process_predictions(tensors[0], self.labels) - - return text_batch diff --git a/deeppavlov/models/nemo/common.py b/deeppavlov/models/nemo/common.py deleted file mode 100644 index 883483c5d6..0000000000 --- a/deeppavlov/models/nemo/common.py +++ /dev/null @@ -1,117 +0,0 @@ -# Copyright 2020 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import base64 -from io import BytesIO -from logging import getLogger -from pathlib import Path -from typing import Union - -import nemo -import torch -from nemo.backends.pytorch import DataLayerNM -from torch.utils.data import Dataset, DataLoader - -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.core.common.file import read_yaml -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component -from deeppavlov.core.models.serializable import Serializable - -log = getLogger(__name__) - - -@register('base64_decode_bytesIO') -def ascii_to_bytes_io(batch: Union[str, list]) -> Union[BytesIO, list]: - """Recursively searches for strings in the input batch and converts them into the base64-encoded bytes wrapped in - Binary I/O objects. 
- - Args: - batch: A string or an iterable container with strings at some level of nesting. - - Returns: - The same structure where all strings are converted into the base64-encoded bytes wrapped in Binary I/O objects. - - """ - if isinstance(batch, str): - return BytesIO(base64.decodebytes(batch.encode())) - - return list(map(ascii_to_bytes_io, batch)) - - -@register('bytesIO_encode_base64') -def bytes_io_to_ascii(batch: Union[BytesIO, list]) -> Union[str, list]: - """Recursively searches for Binary I/O objects in the input batch and converts them into ASCII-strings. - - Args: - batch: A BinaryIO object or an iterable container with BinaryIO objects at some level of nesting. - - Returns: - The same structure where all BinaryIO objects are converted into strings. - - """ - if isinstance(batch, BytesIO): - return base64.encodebytes(batch.read()).decode('ascii') - - return list(map(bytes_io_to_ascii, batch)) - - -class NeMoBase(Component, Serializable): - """Base class for NeMo Chainer's pipeline components.""" - - def __init__(self, load_path: Union[str, Path], nemo_params_path: Union[str, Path], **kwargs) -> None: - """Initializes NeuralModuleFactory on CPU or GPU and reads nemo modules params from yaml. - - Args: - load_path: Path to a directory with pretrained checkpoints for NeMo modules. - nemo_params_path: Path to a file containig NeMo modules params. - - """ - super(NeMoBase, self).__init__(save_path=None, load_path=load_path, **kwargs) - placement = nemo.core.DeviceType.GPU if torch.cuda.is_available() else nemo.core.DeviceType.CPU - self.neural_factory = nemo.core.NeuralModuleFactory(placement=placement) - self.modules_to_restore = [] - self.nemo_params = read_yaml(expand_path(nemo_params_path)) - - def __call__(self, *args, **kwargs): - raise NotImplementedError - - def load(self) -> None: - """Loads pretrained checkpoints for modules from self.modules_to_restore list.""" - module_names = [str(module) for module in self.modules_to_restore] - checkpoints = nemo.utils.get_checkpoint_from_dir(module_names, self.load_path) - for module, checkpoint in zip(self.modules_to_restore, checkpoints): - log.info(f'Restoring {module} from {checkpoint}') - module.restore_from(checkpoint) - - def save(self, *args, **kwargs) -> None: - pass - - -class CustomDataLayerBase(DataLayerNM): - def __init__(self, dataset: Dataset, dataloader: DataLoader, **kwargs) -> None: - super(CustomDataLayerBase, self).__init__() - self._dataset = dataset - self._dataloader = dataloader - - def __len__(self) -> int: - return len(self._dataset) - - @property - def dataset(self) -> None: - return None - - @property - def data_iterator(self) -> torch.utils.data.DataLoader: - return self._dataloader diff --git a/deeppavlov/models/nemo/tts.py b/deeppavlov/models/nemo/tts.py deleted file mode 100644 index d31fa0bcfb..0000000000 --- a/deeppavlov/models/nemo/tts.py +++ /dev/null @@ -1,210 +0,0 @@ -# Copyright 2020 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
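The two base64 helpers removed from deeppavlov/models/nemo/common.py above are inverses of each other. The round trip below uses only the standard-library calls they wrap; the payload is an arbitrary example:

import base64
from io import BytesIO

raw = b"\x00\x01RIFF-audio-bytes"
ascii_form = base64.encodebytes(raw).decode("ascii")          # what bytesIO_encode_base64 produces
restored = BytesIO(base64.decodebytes(ascii_form.encode()))   # what base64_decode_bytesIO produces
assert restored.read() == raw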
- -from functools import partial -from io import BytesIO -from logging import getLogger -from pathlib import Path -from typing import List, Optional, Tuple, Union, Dict - -import torch -from nemo.collections.asr.parts import collections, parsers -from nemo.collections.asr.parts.dataset import TranscriptDataset -from nemo.collections.tts import TextEmbedding, Tacotron2Encoder, Tacotron2DecoderInfer, Tacotron2Postnet -from nemo.core.neural_types import NeuralType, LabelsType, LengthsType -from nemo.utils.decorators import add_port_docs -from nemo.utils.misc import pad_to -from scipy.io import wavfile -from torch import Tensor - -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.core.common.registry import register -from deeppavlov.models.nemo.common import CustomDataLayerBase, NeMoBase -from deeppavlov.models.nemo.vocoder import WaveGlow, GriffinLim - -log = getLogger(__name__) - - -class TextDataset(TranscriptDataset): - def __init__(self, - text_batch: List[str], - labels: List[str], - bos_id: Optional[int] = None, - eos_id: Optional[int] = None, - lowercase: bool = True) -> None: - """Text dataset reader for TextDataLayer. - - Args: - text_batch: Texts to be used for speech synthesis. - labels: List of string labels to use when to str2int translation. - bos_id: Label position of beginning of string symbol. - eos_id: Label position of end of string symbol. - lowercase: Whether to convert all uppercase characters in a text batch into lowercase characters. - - """ - parser = parsers.make_parser(labels, do_lowercase=lowercase) - self.texts = collections.Text(text_batch, parser) - self.bos_id = bos_id - self.eos_id = eos_id - - -class TextDataLayer(CustomDataLayerBase): - @property - @add_port_docs() - def output_ports(self) -> Dict[str, NeuralType]: - return { - 'texts': NeuralType(('B', 'T'), LabelsType()), - "texts_length": NeuralType(tuple('B'), LengthsType()) - } - - def __init__(self, *, - text_batch: List[str], - labels: List[str], - batch_size: int = 32, - bos_id: Optional[int] = None, - eos_id: Optional[int] = None, - pad_id: Optional[int] = None, - **kwargs) -> None: - """A simple Neural Module for loading text data. - - Args: - text_batch: Texts to be used for speech synthesis. - labels: List of string labels to use when to str2int translation. - batch_size: How many strings per batch to load. - bos_id: Label position of beginning of string symbol. If None is initialized as `len(labels)`. - eos_id: Label position of end of string symbol. If None is initialized as `len(labels) + 1`. - pad_id: Label position of pad symbol. If None is initialized as `len(labels) + 2`. - - """ - len_labels = len(labels) - if bos_id is None: - bos_id = len_labels - if eos_id is None: - eos_id = len_labels + 1 - if pad_id is None: - pad_id = len_labels + 2 - - dataset = TextDataset(text_batch=text_batch, labels=labels, bos_id=bos_id, eos_id=eos_id) - - dataloader = torch.utils.data.DataLoader(dataset=dataset, batch_size=batch_size, - collate_fn=partial(self._collate_fn, pad_id=pad_id)) - super(TextDataLayer, self).__init__(dataset, dataloader, **kwargs) - - @staticmethod - def _collate_fn(batch: Tuple[Tuple[Tensor], Tuple[Tensor]], pad_id: int) -> Tuple[Tensor, Tensor]: - """Collates batch of texts. - - Args: - batch: A tuple of tuples of audio signals and signal lengths. - pad_id: Label position of pad symbol. - - Returns: - texts: Padded texts tensor. - texts_len: Text lengths tensor. 
- - """ - texts_list, texts_len = zip(*batch) - max_len = max(texts_len) - max_len = pad_to(max_len, 8) - - texts = torch.empty(len(texts_list), max_len, dtype=torch.long) - texts.fill_(pad_id) - - for i, text in enumerate(texts_list): - texts[i].narrow(0, 0, text.size(0)).copy_(text) - - if len(texts.shape) != 2: - raise ValueError(f'Texts in collate function have shape {texts.shape}, should have 2 dimensions.') - - return texts, torch.stack(texts_len) - - -@register('nemo_tts') -class NeMoTTS(NeMoBase): - """TTS model on NeMo modules.""" - def __init__(self, - load_path: Union[str, Path], - nemo_params_path: Union[str, Path], - vocoder: str = 'waveglow', - **kwargs) -> None: - """Initializes NeuralModules for TTS. - - Args: - load_path: Path to a directory with pretrained checkpoints for TextEmbedding, Tacotron2Encoder, - Tacotron2DecoderInfer, Tacotron2Postnet and, if Waveglow vocoder is selected, WaveGlowInferNM. - nemo_params_path: Path to a file containig sample_rate, labels and params for TextEmbedding, - Tacotron2Encoder, Tacotron2Decoder, Tacotron2Postnet and TranscriptDataLayer. - vocoder: Vocoder used to convert from spectrograms to audio. Available options: `waveglow` (needs pretrained - checkpoint) and `griffin-lim`. - - """ - super(NeMoTTS, self).__init__(load_path=load_path, nemo_params_path=nemo_params_path, **kwargs) - - self.sample_rate = self.nemo_params['sample_rate'] - self.text_embedding = TextEmbedding( - len(self.nemo_params['labels']) + 3, # + 3 special chars - **self.nemo_params['TextEmbedding'] - ) - self.t2_enc = Tacotron2Encoder(**self.nemo_params['Tacotron2Encoder']) - self.t2_dec = Tacotron2DecoderInfer(**self.nemo_params['Tacotron2Decoder']) - self.t2_postnet = Tacotron2Postnet(**self.nemo_params['Tacotron2Postnet']) - self.modules_to_restore = [self.text_embedding, self.t2_enc, self.t2_dec, self.t2_postnet] - - if vocoder == 'waveglow': - self.vocoder = WaveGlow(**self.nemo_params['WaveGlowNM']) - self.modules_to_restore.append(self.vocoder) - elif vocoder == 'griffin-lim': - self.vocoder = GriffinLim(**self.nemo_params['GriffinLim']) - else: - raise ValueError(f'{vocoder} vocoder is not supported.') - - self.load() - - def __call__(self, - text_batch: List[str], - path_batch: Optional[List[str]] = None) -> Union[List[BytesIO], List[str]]: - """Creates wav files or file objects with speech. - - Args: - text_batch: Text from which human audible speech should be generated. - path_batch: i-th element of `path_batch` is the path to save i-th generated speech file. If argument isn't - specified, the synthesized speech will be stored to Binary I/O objects. - - Returns: - List of Binary I/O objects with generated speech if `path_batch` was not specified, list of paths to files - with synthesized speech otherwise. 
- - """ - if path_batch is None: - path_batch = [BytesIO() for _ in text_batch] - elif len(text_batch) != len(path_batch): - raise ValueError('Text batch length differs from path batch length.') - else: - path_batch = [expand_path(path) for path in path_batch] - - data_layer = TextDataLayer(text_batch=text_batch, **self.nemo_params['TranscriptDataLayer']) - transcript, transcript_len = data_layer() - transcript_embedded = self.text_embedding(char_phone=transcript) - transcript_encoded = self.t2_enc(char_phone_embeddings=transcript_embedded, embedding_length=transcript_len) - mel_decoder, gate, alignments, mel_len = self.t2_dec(char_phone_encoded=transcript_encoded, - encoded_length=transcript_len) - mel_postnet = self.t2_postnet(mel_input=mel_decoder) - infer_tensors = [self.vocoder(mel_postnet), mel_len] - evaluated_tensors = self.neural_factory.infer(tensors=infer_tensors) - synthesized_batch = self.vocoder.get_audio(*evaluated_tensors) - - for fout, synthesized_audio in zip(path_batch, synthesized_batch): - wavfile.write(fout, self.sample_rate, synthesized_audio) - - return path_batch diff --git a/deeppavlov/models/nemo/vocoder.py b/deeppavlov/models/nemo/vocoder.py deleted file mode 100644 index 3ec918d266..0000000000 --- a/deeppavlov/models/nemo/vocoder.py +++ /dev/null @@ -1,131 +0,0 @@ -# Copyright 2020 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger -from typing import List - -import librosa -import numpy as np -from nemo.core.neural_types import NmTensor -from nemo.collections.tts import WaveGlowInferNM -from numpy import ndarray - -log = getLogger(__name__) - - -class BaseVocoder: - """Class is used to maintain consistency in the construction of the TTS pipeline based on NeMo modules.""" - - def __call__(self, tensor: NmTensor) -> NmTensor: - """Should return the tensor after the evaluation of which speech could be synthesized with `get_audio` method""" - raise NotImplementedError - - def get_audio(self, evaluated_tensor: list, mel_len: list): - """Synthesizes audio from the evaluated tensor constructed by `__call__` method.""" - raise NotImplementedError - - -class WaveGlow(BaseVocoder): - def __init__(self, *, denoiser_strength: float = 0.0, n_window_stride: int = 160, **kwargs) -> None: - """Wraps WaveGlowInferNM module. - - Args: - denoiser_strength: Denoiser strength for waveglow. - n_window_stride: Stride of window for FFT in samples used in model training. - kwargs: Named arguments for WaveGlowInferNM constructor. 
- - """ - self.waveglow = WaveGlowInferNM(**kwargs) - self.denoiser_strength = denoiser_strength - self.n_window_stride = n_window_stride - - def __call__(self, mel_postnet: NmTensor) -> NmTensor: - return self.waveglow(mel_spectrogram=mel_postnet) - - def __str__(self): - return str(self.waveglow) - - def restore_from(self, path: str) -> None: - """Wraps WaveGlowInferNM restore_from method.""" - self.waveglow.restore_from(path) - if self.denoiser_strength > 0: - log.info('Setup denoiser for WaveGlow') - self.waveglow.setup_denoiser() - - def get_audio(self, evaluated_audio: list, mel_len: list) -> List[ndarray]: - """Unpacks audio data from evaluated tensor and denoises it if `denoiser_strength` > 0.""" - audios = [] - for i, batch in enumerate(evaluated_audio): - audio = batch.cpu().numpy() - for j, sample in enumerate(audio): - sample_len = mel_len[i][j] * self.n_window_stride - sample = sample[:sample_len] - if self.denoiser_strength > 0: - sample, _ = self.waveglow.denoise(sample, strength=self.denoiser_strength) - audios.append(sample) - return audios - - -class GriffinLim(BaseVocoder): - def __init__(self, *, - sample_rate: float = 16000.0, - n_fft: int = 1024, - mag_scale: float = 2048.0, - power: float = 1.2, - n_iters: int = 50, - **kwargs) -> None: - """Uses Griffin Lim algorithm to generate speech from spectrograms. - - Args: - sample_rate: Generated audio data sample rate. - n_fft: The number of points to use for the FFT. - mag_scale: Multiplied with the linear spectrogram to avoid audio sounding muted due to mel filter - normalization. - power: The linear spectrogram is raised to this power prior to running the Griffin Lim algorithm. A power - of greater than 1 has been shown to improve audio quality. - n_iters: Number of iterations of convertion magnitude spectrograms to audio signal. - - """ - self.mag_scale = mag_scale - self.power = power - self.n_iters = n_iters - self.n_fft = n_fft - self.filterbank = librosa.filters.mel(sr=sample_rate, n_fft=n_fft, **kwargs) - - def __call__(self, mel_postnet: NmTensor) -> NmTensor: - return mel_postnet - - def get_audio(self, mel_spec: list, mel_len: list) -> List[ndarray]: - audios = [] - for i, batch in enumerate(mel_spec): - log_mel = batch.cpu().numpy().transpose(0, 2, 1) - mel = np.exp(log_mel) - magnitudes = np.dot(mel, self.filterbank) * self.mag_scale - for j, sample in enumerate(magnitudes): - sample = sample[:mel_len[i][j], :] - audio = self.griffin_lim(sample.T ** self.power) - audios.append(audio) - return audios - - def griffin_lim(self, magnitudes): - """Griffin-Lim algorithm to convert magnitude spectrograms to audio signals.""" - phase = np.exp(2j * np.pi * np.random.rand(*magnitudes.shape)) - complex_spec = magnitudes * phase - signal = librosa.istft(complex_spec) - - for _ in range(self.n_iters): - _, phase = librosa.magphase(librosa.stft(signal, n_fft=self.n_fft)) - complex_spec = magnitudes * phase - signal = librosa.istft(complex_spec) - return signal diff --git a/deeppavlov/models/ner/NER_model.py b/deeppavlov/models/ner/NER_model.py deleted file mode 100644 index 565e474ac0..0000000000 --- a/deeppavlov/models/ner/NER_model.py +++ /dev/null @@ -1,317 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import string -from logging import getLogger - -import numpy as np -import tensorflow as tf -import tensorflow_hub as hub -from gensim.models import KeyedVectors -from gensim.models.wrappers import FastText -from tensorflow.contrib.layers import xavier_initializer, xavier_initializer_conv2d - -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.tf_model import LRScheduledTFModel - -log = getLogger(__name__) - - -@register('hybrid_ner_model') -class HybridNerModel(LRScheduledTFModel): - """ This class implements the hybrid NER model published in the paper: http://www.ijmlc.org/show-83-881-1.html - - Params: - n_tags: Number of pre-defined tags. - word_emb_path: The path to the pretrained word embedding model. - word_emb_name: The name of pretrained word embedding model. - One of the two values should be set including 'glove', 'baomoi' corresponding to two pre-trained word - embedding models: GloVe (https://www.aclweb.org/anthology/D14-1162/) - and baomoi (https://github.com/sonvx/word2vecVN). Otherwise, the word lookup table will be trained - from scratch. - word_vocab: The word vocabulary class. - word_dim: The dimension of the pretrained word vector. - char_vocab_size: The size of character vocabulary. - pos_vocab_size: The size of POS vocabulary. - chunk_vocab_size: The size of Chunk vocabulary. - char_dim: The dimension of character vector. - elmo_dim: The dimension of ELMo-based word vector - elmo_hub_path: The path to the ELmo tensorhub - pos_dim: The dimension of POS vector. - chunk_dim: The dimension of Chunk vector. - cap_dim: The dimension of capitalization vector. - cap_vocab_size: The size of capitalization vocabulary. - lstm_hidden_size: The number of units in contextualized Bi-LSTM network - drop_out_keep_prob: The probability of keeping hidden state - """ - - def __init__(self, - n_tags: int, - word_vocab, - word_dim: int, - word_emb_path: str, - word_emb_name: str = None, - char_vocab_size: int = None, - pos_vocab_size: int = None, - chunk_vocab_size: int = None, - char_dim: int = None, - elmo_dim: int = None, - elmo_hub_path: str = "https://tfhub.dev/google/elmo/2", - pos_dim: int = None, - chunk_dim: int = None, - cap_dim: int = None, - cap_vocab_size: int = 5, - lstm_hidden_size: int = 256, - dropout_keep_prob: float = 0.5, - **kwargs) -> None: - - assert n_tags != 0, 'Number of classes equal 0! It seems that vocabularies is not loaded.' \ - ' Check that all vocabulary files are downloaded!' 
- - if 'learning_rate_drop_div' not in kwargs: - kwargs['learning_rate_drop_div'] = 10.0 - if 'learning_rate_drop_patience' not in kwargs: - kwargs['learning_rate_drop_patience'] = 5.0 - if 'clip_norm' not in kwargs: - kwargs['clip_norm'] = 5.0 - super().__init__(**kwargs) - - word2id = word_vocab.t2i - word_emb_path = str(expand_path(word_emb_path)) - - self._dropout_ph = tf.placeholder_with_default(dropout_keep_prob, shape=[], name='dropout') - self.training_ph = tf.placeholder_with_default(False, shape=[], name='is_training') - self._y_ph = tf.placeholder(tf.int32, [None, None], name='y_ph') - - self._xs_ph_list = [] - self._input_features = [] - - # use for word contextualized bi-lstm, elmo - self.real_sent_lengths_ph = tf.placeholder(tf.int32, [None], name="real_sent_lengths") - self._xs_ph_list.append(self.real_sent_lengths_ph) - - # Word emb - with tf.variable_scope("word_emb"): - word_ids_ph = tf.placeholder(tf.int32, [None, None], name="word_ids") - self._xs_ph_list.append(word_ids_ph) - - word_embeddings = self.load_pretrained_word_emb(word_emb_path, word_emb_name, word_dim, word2id) - - word_lookup_table = tf.Variable(word_embeddings, dtype=tf.float32, trainable=True, name="word_embeddings") - word_emb = tf.nn.embedding_lookup(word_lookup_table, word_ids_ph, name="embedded_word") - self._input_features.append(word_emb) - - # POS feature - if pos_dim is not None: - with tf.variable_scope("pos_emb"): - pos_ph = tf.placeholder(tf.int32, [None, None], name="pos_ids") - self._xs_ph_list.append(pos_ph) - - tf_pos_embeddings = tf.get_variable(name="pos_embeddings", - dtype=tf.float32, - shape=[pos_vocab_size, pos_dim], - trainable=True, - initializer=xavier_initializer()) - - embedded_pos = tf.nn.embedding_lookup(tf_pos_embeddings, - pos_ph, - name="embedded_pos") - self._input_features.append(embedded_pos) - - # Chunk feature - if chunk_dim is not None: - with tf.variable_scope("chunk_emb"): - chunk_ph = tf.placeholder(tf.int32, [None, None], name="chunk_ids") - self._xs_ph_list.append(chunk_ph) - - tf_chunk_embeddings = tf.get_variable(name="chunk_embeddings", - dtype=tf.float32, - shape=[chunk_vocab_size, chunk_dim], - trainable=True, - initializer=xavier_initializer()) - - embedded_chunk = tf.nn.embedding_lookup(tf_chunk_embeddings, - chunk_ph, - name="embedded_chunk") - self._input_features.append(embedded_chunk) - - # Capitalization feature - if cap_dim is not None: - with tf.variable_scope("cap_emb"): - cap_ph = tf.placeholder(tf.int32, [None, None], name="cap_ids") - self._xs_ph_list.append(cap_ph) - - tf_cap_embeddings = tf.get_variable(name="cap_embeddings", - dtype=tf.float32, - shape=[cap_vocab_size, cap_dim], - trainable=True, - initializer=xavier_initializer()) - - embedded_cap = tf.nn.embedding_lookup(tf_cap_embeddings, - cap_ph, - name="embedded_cap") - self._input_features.append(embedded_cap) - - # Character feature - if char_dim is not None: - with tf.variable_scope("char_emb"): - char_ids_ph = tf.placeholder(tf.int32, [None, None, None], name="char_ids") - self._xs_ph_list.append(char_ids_ph) - - tf_char_embeddings = tf.get_variable(name="char_embeddings", - dtype=tf.float32, - shape=[char_vocab_size, char_dim], - trainable=True, - initializer=xavier_initializer()) - embedded_cnn_chars = tf.nn.embedding_lookup(tf_char_embeddings, - char_ids_ph, - name="embedded_cnn_chars") - conv1 = tf.layers.conv2d(inputs=embedded_cnn_chars, - filters=128, - kernel_size=(1, 3), - strides=(1, 1), - padding="same", - name="conv1", - kernel_initializer=xavier_initializer_conv2d()) - 
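The character branch above embeds character ids and runs a CNN with a (1, 3) kernel over the character axis, later max-pooled into one vector per token. A rough analogue of that shape logic in PyTorch rather than TF1, with hypothetical sizes and a single convolution for brevity:

import torch
import torch.nn as nn

batch, seq_len, word_len = 2, 5, 7
char_vocab_size, char_dim, n_filters = 100, 30, 128

char_ids = torch.randint(0, char_vocab_size, (batch, seq_len, word_len))
char_emb = nn.Embedding(char_vocab_size, char_dim)
conv = nn.Conv2d(char_dim, n_filters, kernel_size=(1, 3), padding=(0, 1))

x = char_emb(char_ids)                 # [B, T, C, D]
x = x.permute(0, 3, 1, 2)              # [B, D, T, C] -- channels first for Conv2d
x = conv(x)                            # [B, 128, T, C]
char_features = x.max(dim=3).values    # max over the character axis -> [B, 128, T]
char_features = char_features.transpose(1, 2)  # [B, T, 128], ready to concatenate with word features
print(char_features.shape)             # torch.Size([2, 5, 128])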
conv2 = tf.layers.conv2d(inputs=conv1, - filters=128, - kernel_size=(1, 3), - strides=(1, 1), - padding="same", - name="conv2", - kernel_initializer=xavier_initializer_conv2d()) - char_cnn = tf.reduce_max(conv2, axis=2) - - self._input_features.append(char_cnn) - - # ELMo - if elmo_dim is not None: - with tf.variable_scope("elmo_emb"): - padded_x_tokens_ph = tf.placeholder(tf.string, [None, None], name="padded_x_tokens") - self._xs_ph_list.append(padded_x_tokens_ph) - - elmo = hub.Module(elmo_hub_path, trainable=True) - emb = elmo(inputs={"tokens": padded_x_tokens_ph, "sequence_len": self.real_sent_lengths_ph}, - signature="tokens", as_dict=True)["elmo"] - elmo_emb = tf.layers.dense(emb, elmo_dim, activation=None) - self._input_features.append(elmo_emb) - - features = tf.nn.dropout(tf.concat(self._input_features, axis=2), self._dropout_ph) - - with tf.variable_scope("bi_lstm_words"): - cell_fw = tf.contrib.rnn.LSTMCell(lstm_hidden_size) - cell_bw = tf.contrib.rnn.LSTMCell(lstm_hidden_size) - (output_fw, output_bw), _ = tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, features, - sequence_length=self.real_sent_lengths_ph, - dtype=tf.float32) - self.output = tf.concat([output_fw, output_bw], axis=-1) - - ntime_steps = tf.shape(self.output)[1] - self.output = tf.reshape(self.output, [-1, 2 * lstm_hidden_size]) - layer1 = tf.nn.dropout(tf.layers.dense(inputs=self.output, units=lstm_hidden_size, activation=None, - kernel_initializer=xavier_initializer()), self._dropout_ph) - pred = tf.layers.dense(inputs=layer1, units=n_tags, activation=None, - kernel_initializer=xavier_initializer()) - self.logits = tf.reshape(pred, [-1, ntime_steps, n_tags]) - - log_likelihood, self.transition_params = tf.contrib.crf.crf_log_likelihood(self.logits, - self._y_ph, - self.real_sent_lengths_ph) - # loss and opt - with tf.variable_scope("loss_and_opt"): - self.loss = tf.reduce_mean(-log_likelihood) - self.train_op = self.get_train_op(self.loss) - - self.sess = tf.Session() - self.sess.run(tf.global_variables_initializer()) - self.load() - - def predict(self, xs): - feed_dict = self._fill_feed_dict(xs) - logits, trans_params, sent_lengths = self.sess.run([self.logits, - self.transition_params, - self.real_sent_lengths_ph], - feed_dict=feed_dict) - # iterate over the sentences because no batching in viterbi_decode - y_pred = [] - for logit, sequence_length in zip(logits, sent_lengths): - logit = logit[:int(sequence_length)] # keep only the valid steps - viterbi_seq, viterbi_score = tf.contrib.crf.viterbi_decode(logit, trans_params) - y_pred += [viterbi_seq] - return y_pred - - def _fill_feed_dict(self, xs, y=None, train=False): - assert len(xs) == len(self._xs_ph_list) - xs = list(xs) - for x in xs[1:]: - x = np.array(x) - feed_dict = {ph: x for ph, x in zip(self._xs_ph_list, xs)} - if y is not None: - feed_dict[self._y_ph] = y - feed_dict[self.training_ph] = train - if not train: - feed_dict[self._dropout_ph] = 1.0 - - return feed_dict - - def __call__(self, *args, **kwargs): - if len(args[0]) == 0 or (args[0] == [0]): - return [] - return self.predict(args) - - def train_on_batch(self, *args): - *xs, y = args - feed_dict = self._fill_feed_dict(xs, y, train=True) - _, loss_value = self.sess.run([self.train_op, self.loss], feed_dict) - return {'loss': loss_value, - 'learning_rate': self.get_learning_rate(), - 'momentum': self.get_momentum()} - - def load_pretrained_word_emb(self, model_path, model_name, word_dim, word2id=None, vocab_size=None): - loaded_words = 0 - if word2id is not None: - vocab_size = 
len(word2id) - word_embeddings = np.zeros(shape=(vocab_size, word_dim)) - - if model_name == "glove": - model = KeyedVectors.load_word2vec_format(model_path, binary=False) - for word in word2id: - if word in model: - word_embeddings[word2id[word]] = model[word] - loaded_words += 1 - elif model_name == "baomoi": - model = KeyedVectors.load_word2vec_format(model_path, binary=True, unicode_errors='ignore') - for word in word2id: - if len(word) == 1: - if word[0] in string.punctuation: - word_embeddings[word2id[word]] = model[""] - loaded_words += 1 - elif word.isdigit(): - word_embeddings[word2id[word]] = model[""] - loaded_words += 1 - elif word in model.vocab: - word_embeddings[word2id[word]] = model[word] - loaded_words += 1 - elif model_name == "fasttext": - ft_model = FastText.load_fasttext_format(model_path) - for word in word2id: - if word in ft_model.wv.vocab: - word_embeddings[word2id[word]] = ft_model.wv[word] - loaded_words += 1 - elif model_name is not None: - raise RuntimeError(f'got an unexpected value for model_name: `{model_name}`') - - log.info(f"{loaded_words}/{vocab_size} words were loaded from {model_path}.") - return word_embeddings diff --git a/deeppavlov/models/ner/__init__.py b/deeppavlov/models/ner/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/ner/bio.py b/deeppavlov/models/ner/bio.py deleted file mode 100644 index 7eb75015ed..0000000000 --- a/deeppavlov/models/ner/bio.py +++ /dev/null @@ -1,46 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger -from typing import List - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component - -log = getLogger(__name__) - - -@register('ner_bio_converter') -class BIOMarkupRestorer(Component): - """Restores BIO markup for tags batch""" - - def __init__(self, *args, **kwargs) -> None: - pass - - @staticmethod - def _convert_to_bio(tags: List[str]) -> List[str]: - tags_bio = [] - for n, tag in enumerate(tags): - if tag != 'O': - if n > 0 and tags[n - 1] == tag: - tag = 'I-' + tag - else: - tag = 'B-' + tag - tags_bio.append(tag) - - return tags_bio - - def __call__(self, tag_batch: List[List[str]], *args, **kwargs) -> List[List[str]]: - y = [self._convert_to_bio(sent) for sent in tag_batch] - return y diff --git a/deeppavlov/models/ner/network.py b/deeppavlov/models/ner/network.py deleted file mode 100644 index 56259ef07d..0000000000 --- a/deeppavlov/models/ner/network.py +++ /dev/null @@ -1,324 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
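The ner_bio_converter above rewrites plain per-token tags into BIO markup: a new span opens with "B-" and a run of identical tags continues with "I-". A worked example of the same rule:

def convert_to_bio(tags):
    tags_bio = []
    for n, tag in enumerate(tags):
        if tag != 'O':
            prefix = 'I-' if n > 0 and tags[n - 1] == tag else 'B-'
            tag = prefix + tag
        tags_bio.append(tag)
    return tags_bio


print(convert_to_bio(['PER', 'PER', 'O', 'LOC']))
# ['B-PER', 'I-PER', 'O', 'B-LOC']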
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger -from typing import Tuple - -import numpy as np -import tensorflow as tf - -from deeppavlov.core.common.registry import register -from deeppavlov.core.layers.tf_layers import cudnn_bi_lstm, cudnn_bi_gru, bi_rnn, stacked_cnn, INITIALIZER -from deeppavlov.core.layers.tf_layers import embedding_layer, character_embedding_network, variational_dropout -from deeppavlov.core.models.tf_model import LRScheduledTFModel - -log = getLogger(__name__) - - -@register('ner') -class NerNetwork(LRScheduledTFModel): - """ - The :class:`~deeppavlov.models.ner.network.NerNetwork` is for Neural Named Entity Recognition and Slot Filling. - - Parameters: - n_tags: Number of tags in the tag vocabulary. - token_emb_dim: Dimensionality of token embeddings, needed if embedding matrix is not provided. - char_emb_dim: Dimensionality of token embeddings. - capitalization_dim : Dimensionality of capitalization features, if they are provided. - pos_features_dim: Dimensionality of POS features, if they are provided. - additional_features: Some other features. - net_type: Type of the network, either ``'rnn'`` or ``'cnn'``. - cell_type: Type of the cell in RNN, either ``'lstm'`` or ``'gru'``. - use_cudnn_rnn: Whether to use CUDNN implementation of RNN. - two_dense_on_top: Additional dense layer before predictions. - n_hidden_list: A list of output feature dimensionality for each layer. A value (100, 200) means that there will - be two layers with 100 and 200 units, respectively. - cnn_filter_width: The width of the convolutional kernel for Convolutional Neural Networks. - use_crf: Whether to use Conditional Random Fields on top of the network (recommended). - token_emb_mat: Token embeddings matrix. - char_emb_mat: Character embeddings matrix. - use_batch_norm: Whether to use Batch Normalization or not. Affects only CNN networks. - dropout_keep_prob: Probability of keeping the hidden state, values from 0 to 1. 0.5 works well in most cases. - embeddings_dropout: Whether to use dropout on embeddings or not. - top_dropout: Whether to use dropout on output units of the network or not. - intra_layer_dropout: Whether to use dropout between layers or not. - l2_reg: L2 norm regularization for all kernels. - gpu: Number of gpu to use. - seed: Random seed. 
- """ - GRAPH_PARAMS = ["n_tags", # TODO: add check - "char_emb_dim", - "capitalization_dim", - "additional_features", - "use_char_embeddings", - "additional_features", - "net_type", - "cell_type", - "char_filter_width", - "cell_type"] - - def __init__(self, - n_tags: int, # Features dimensions - token_emb_dim: int = None, - char_emb_dim: int = None, - capitalization_dim: int = None, - pos_features_dim: int = None, - additional_features: int = None, - net_type: str = 'rnn', # Net architecture - cell_type: str = 'lstm', - use_cudnn_rnn: bool = False, - two_dense_on_top: bool = False, - n_hidden_list: Tuple[int] = (128,), - cnn_filter_width: int = 7, - use_crf: bool = False, - token_emb_mat: np.ndarray = None, - char_emb_mat: np.ndarray = None, - use_batch_norm: bool = False, - dropout_keep_prob: float = 0.5, # Regularization - embeddings_dropout: bool = False, - top_dropout: bool = False, - intra_layer_dropout: bool = False, - l2_reg: float = 0.0, - gpu: int = None, - seed: int = None, - **kwargs) -> None: - tf.set_random_seed(seed) - np.random.seed(seed) - - assert n_tags != 0, 'Number of classes equal 0! It seems that vocabularies is not loaded.' \ - ' Check that all vocabulary files are downloaded!' - - if 'learning_rate_drop_div' not in kwargs: - kwargs['learning_rate_drop_div'] = 10.0 - if 'learning_rate_drop_patience' not in kwargs: - kwargs['learning_rate_drop_patience'] = 5.0 - if 'clip_norm' not in kwargs: - kwargs['clip_norm'] = 5.0 - super().__init__(**kwargs) - self._add_training_placeholders(dropout_keep_prob) - self._xs_ph_list = [] - self._y_ph = tf.placeholder(tf.int32, [None, None], name='y_ph') - self._input_features = [] - - # ================ Building input features ================= - - # Token embeddings - self._add_word_embeddings(token_emb_mat, token_emb_dim) - - # Masks for different lengths utterances - self.mask_ph = self._add_mask() - - # Char embeddings using highway CNN with max pooling - if char_emb_mat is not None and char_emb_dim is not None: - self._add_char_embeddings(char_emb_mat) - - # Capitalization features - if capitalization_dim is not None: - self._add_capitalization(capitalization_dim) - - # Part of speech features - if pos_features_dim is not None: - self._add_pos(pos_features_dim) - - # Anything you want - if additional_features is not None: - self._add_additional_features(additional_features) - - features = tf.concat(self._input_features, axis=2) - if embeddings_dropout: - features = variational_dropout(features, self._dropout_ph) - - # ================== Building the network ================== - - if net_type == 'rnn': - if use_cudnn_rnn: - if l2_reg > 0: - log.warning('cuDNN RNN are not l2 regularizable') - units = self._build_cudnn_rnn(features, n_hidden_list, cell_type, intra_layer_dropout, self.mask_ph) - else: - units = self._build_rnn(features, n_hidden_list, cell_type, intra_layer_dropout, self.mask_ph) - elif net_type == 'cnn': - units = self._build_cnn(features, n_hidden_list, cnn_filter_width, use_batch_norm) - self._logits = self._build_top(units, n_tags, n_hidden_list[-1], top_dropout, two_dense_on_top) - - self.train_op, self.loss = self._build_train_predict(self._logits, self.mask_ph, n_tags, - use_crf, l2_reg) - self.predict = self.predict_crf if use_crf else self.predict_no_crf - - # ================= Initialize the session ================= - - sess_config = tf.ConfigProto(allow_soft_placement=True) - sess_config.gpu_options.allow_growth = True - if gpu is not None: - sess_config.gpu_options.visible_device_list = str(gpu) - 
self.sess = tf.Session(config=sess_config) - self.sess.run(tf.global_variables_initializer()) - self.load() - - def _add_training_placeholders(self, dropout_keep_prob): - self._dropout_ph = tf.placeholder_with_default(dropout_keep_prob, shape=[], name='dropout') - self.training_ph = tf.placeholder_with_default(False, shape=[], name='is_training') - - def _add_word_embeddings(self, token_emb_mat, token_emb_dim=None): - if token_emb_mat is None: - token_ph = tf.placeholder(tf.float32, [None, None, token_emb_dim], name='Token_Ind_ph') - emb = token_ph - else: - token_ph = tf.placeholder(tf.int32, [None, None], name='Token_Ind_ph') - emb = embedding_layer(token_ph, token_emb_mat) - self._xs_ph_list.append(token_ph) - self._input_features.append(emb) - - def _add_mask(self): - mask_ph = tf.placeholder(tf.float32, [None, None], name='Mask_ph') - self._xs_ph_list.append(mask_ph) - return mask_ph - - def _add_char_embeddings(self, char_emb_mat): - character_indices_ph = tf.placeholder(tf.int32, [None, None, None], name='Char_ph') - char_embs = character_embedding_network(character_indices_ph, emb_mat=char_emb_mat) - self._xs_ph_list.append(character_indices_ph) - self._input_features.append(char_embs) - - def _add_capitalization(self, capitalization_dim): - capitalization_ph = tf.placeholder(tf.float32, [None, None, capitalization_dim], name='Capitalization_ph') - self._xs_ph_list.append(capitalization_ph) - self._input_features.append(capitalization_ph) - - def _add_pos(self, pos_features_dim): - pos_ph = tf.placeholder(tf.float32, [None, None, pos_features_dim], name='POS_ph') - self._xs_ph_list.append(pos_ph) - self._input_features.append(pos_ph) - - def _add_additional_features(self, features_list): - for feature, dim in features_list: - feat_ph = tf.placeholder(tf.float32, [None, None, dim], name=feature + '_ph') - self._xs_ph_list.append(feat_ph) - self._input_features.append(feat_ph) - - def _build_cudnn_rnn(self, units, n_hidden_list, cell_type, intra_layer_dropout, mask): - sequence_lengths = tf.to_int32(tf.reduce_sum(mask, axis=1)) - for n, n_hidden in enumerate(n_hidden_list): - with tf.variable_scope(cell_type.upper() + '_' + str(n)): - if cell_type.lower() == 'lstm': - units, _ = cudnn_bi_lstm(units, n_hidden, sequence_lengths) - elif cell_type.lower() == 'gru': - units, _ = cudnn_bi_gru(units, n_hidden, sequence_lengths) - else: - raise RuntimeError('Wrong cell type "{}"! 
Only "gru" and "lstm"!'.format(cell_type)) - units = tf.concat(units, -1) - if intra_layer_dropout and n != len(n_hidden_list) - 1: - units = variational_dropout(units, self._dropout_ph) - return units - - def _build_rnn(self, units, n_hidden_list, cell_type, intra_layer_dropout, mask): - sequence_lengths = tf.to_int32(tf.reduce_sum(mask, axis=1)) - for n, n_hidden in enumerate(n_hidden_list): - units, _ = bi_rnn(units, n_hidden, cell_type=cell_type, - seq_lengths=sequence_lengths, name='Layer_' + str(n)) - units = tf.concat(units, -1) - if intra_layer_dropout and n != len(n_hidden_list) - 1: - units = variational_dropout(units, self._dropout_ph) - return units - - def _build_cnn(self, units, n_hidden_list, cnn_filter_width, use_batch_norm): - units = stacked_cnn(units, n_hidden_list, cnn_filter_width, use_batch_norm, training_ph=self.training_ph) - return units - - def _build_top(self, units, n_tags, n_hididden, top_dropout, two_dense_on_top): - if top_dropout: - units = variational_dropout(units, self._dropout_ph) - if two_dense_on_top: - units = tf.layers.dense(units, n_hididden, activation=tf.nn.relu, - kernel_initializer=INITIALIZER(), - kernel_regularizer=tf.nn.l2_loss) - logits = tf.layers.dense(units, n_tags, activation=None, - kernel_initializer=INITIALIZER(), - kernel_regularizer=tf.nn.l2_loss) - return logits - - def _build_train_predict(self, logits, mask, n_tags, use_crf, l2_reg): - if use_crf: - sequence_lengths = tf.reduce_sum(mask, axis=1) - log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(logits, self._y_ph, sequence_lengths) - loss_tensor = -log_likelihood - self._transition_params = transition_params - else: - ground_truth_labels = tf.one_hot(self._y_ph, n_tags) - loss_tensor = tf.nn.softmax_cross_entropy_with_logits(labels=ground_truth_labels, logits=logits) - loss_tensor = loss_tensor * mask - self._y_pred = tf.argmax(logits, axis=-1) - - loss = tf.reduce_mean(loss_tensor) - - # L2 regularization - if l2_reg > 0: - loss += l2_reg * tf.reduce_sum(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)) - - train_op = self.get_train_op(loss) - return train_op, loss - - def predict_no_crf(self, xs): - feed_dict = self._fill_feed_dict(xs) - pred_idxs, mask = self.sess.run([self._y_pred, self.mask_ph], feed_dict) - - # Filter by sequece length - sequence_lengths = np.sum(mask, axis=1).astype(np.int32) - pred = [] - for utt, l in zip(pred_idxs, sequence_lengths): - pred.append(utt[:l]) - return pred - - def predict_crf(self, xs): - feed_dict = self._fill_feed_dict(xs) - logits, trans_params, mask = self.sess.run([self._logits, - self._transition_params, - self.mask_ph], - feed_dict=feed_dict) - sequence_lengths = np.maximum(np.sum(mask, axis=1).astype(np.int32), 1) - # iterate over the sentences because no batching in viterbi_decode - y_pred = [] - for logit, sequence_length in zip(logits, sequence_lengths): - logit = logit[:int(sequence_length)] # keep only the valid steps - viterbi_seq, viterbi_score = tf.contrib.crf.viterbi_decode(logit, trans_params) - y_pred += [viterbi_seq] - return y_pred - - def _fill_feed_dict(self, xs, y=None, train=False): - assert len(xs) == len(self._xs_ph_list) - xs = list(xs) - xs[0] = np.array(xs[0]) - feed_dict = {ph: x for ph, x in zip(self._xs_ph_list, xs)} - if y is not None: - feed_dict[self._y_ph] = y - feed_dict[self.training_ph] = train - if not train: - feed_dict[self._dropout_ph] = 1.0 - return feed_dict - - def __call__(self, *args, **kwargs): - if len(args[0]) == 0 or (len(args[0]) == 1 and len(args[0][0]) == 
0): - return [] - return self.predict(args) - - def train_on_batch(self, *args): - *xs, y = args - feed_dict = self._fill_feed_dict(xs, y, train=True) - _, loss_value = self.sess.run([self.train_op, self.loss], feed_dict) - return {'loss': loss_value, - 'learning_rate': self.get_learning_rate(), - 'momentum': self.get_momentum()} - - def process_event(self, event_name, data): - super().process_event(event_name, data) diff --git a/deeppavlov/models/ner/svm.py b/deeppavlov/models/ner/svm.py deleted file mode 100644 index d8eda1538b..0000000000 --- a/deeppavlov/models/ner/svm.py +++ /dev/null @@ -1,83 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import pickle -from itertools import chain -from logging import getLogger -from typing import List, Union - -import numpy as np -from sklearn.svm import SVC - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.estimator import Estimator - -log = getLogger(__name__) - - -@register('ner_svm') -class SVMTagger(Estimator): - """ - ``SVM`` (Support Vector Machines) classifier for tagging sequences - - Parameters: - return_probabilities: whether to return probabilities or predictions - kernel: kernel of SVM (RBF works well in the most of the cases) - seed: seed for SVM initialization - """ - - def __init__(self, return_probabilities: bool = False, kernel: str = 'rbf', seed=42, *args, **kwargs) -> None: - super().__init__(*args, **kwargs) - self.classifier = None - self.return_probabilities = return_probabilities - self._kernel = kernel - self._seed = seed - - self.load() - - def fit(self, tokens: List[List[str]], tags: List[List[int]], *args, **kwargs) -> None: - tokens = list(chain(*tokens)) - tags = list(chain(*tags)) - self.classifier = SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, - decision_function_shape='ovr', degree=3, gamma='auto', - kernel=self._kernel, max_iter=-1, probability=self.return_probabilities, - random_state=self._seed, shrinking=True, tol=0.001, verbose=False) - self.classifier.fit(tokens, tags) - - def __call__(self, token_vectors_batch: List[List[str]], *args, **kwargs) -> \ - Union[List[List[int]], List[List[np.ndarray]]]: - lens = [len(utt) for utt in token_vectors_batch] - token_vectors_list = list(chain(*token_vectors_batch)) - predictions = self.classifier.predict(token_vectors_list) - y = [] - cl = 0 - for l in lens: - y.append(predictions[cl: cl + l]) - cl += l - return y - - def save(self) -> None: - with self.save_path.open('wb') as f: - pickle.dump(self.classifier, f, protocol=4) - - def serialize(self): - return pickle.dumps(self.classifier, protocol=4) - - def load(self) -> None: - if self.load_path.exists(): - with self.load_path.open('rb') as f: - self.classifier = pickle.load(f) - - def deserialize(self, data): - self.classifier = pickle.loads(data) diff --git a/deeppavlov/models/preprocessors/assemble_embeddings_matrix.py b/deeppavlov/models/preprocessors/assemble_embeddings_matrix.py deleted file mode 
100644 index 35b0266d30..0000000000 --- a/deeppavlov/models/preprocessors/assemble_embeddings_matrix.py +++ /dev/null @@ -1,93 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import numpy as np -from sklearn.decomposition import PCA - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.simple_vocab import SimpleVocabulary -from deeppavlov.models.embedders.abstract_embedder import Embedder - - -@register('emb_mat_assembler') -class EmbeddingsMatrixAssembler: - """For a given Vocabulary assembles matrix of embeddings obtained from some `Embedder`. This - class also can assemble embeddins of characters using - - Args: - embedder: an instance of the class that convertes tokens to vectors. - For example :class:`~deeppavlov.models.embedders.fasttext_embedder.FasttextEmbedder` or - :class:`~deeppavlov.models.embedders.glove_embedder.GloVeEmbedder` - vocab: instance of :class:`~deeppavlov.core.data.SimpleVocab`. The matrix of embeddings - will be assembled relying on every token in the vocabulary. the indexing will match - vocabulary indexing. - character_level: whether to perform assembling on character level. This procedure will - assemble matrix with embeddings for every character using averaged embeddings of - words, that contain this character. - emb_dim: dimensionality of the resulting embeddings. If not ``None`` it should be less - or equal to the dimensionality of the embeddings provided by `Embedder`. The - reduction of dimensionality is performed by taking main components of PCA. - estimate_by_n: how much samples to use to estimate covariance matrix for PCA. - 10000 seems to be enough. - - Attributes: - dim: dimensionality of the embeddings (can be less than dimensionality of - embeddings produced by `Embedder`. - """ - - def __init__(self, - embedder: Embedder, - vocab: SimpleVocabulary, - character_level: bool = False, - emb_dim: int = None, - estimate_by_n: int = 10000, - *args, - **kwargs) -> None: - if emb_dim is None: - emb_dim = embedder.dim - self.emb_mat = np.zeros([len(vocab), emb_dim], dtype=np.float32) - tokens_for_estimation = list(embedder)[:estimate_by_n] - estimation_matrix = np.array([embedder([[word]])[0][0] for word in tokens_for_estimation], dtype=np.float32) - emb_std = np.std(estimation_matrix) - - if emb_dim < embedder.dim: - pca = PCA(n_components=emb_dim) - pca.fit(estimation_matrix) - elif emb_dim > embedder.dim: - raise RuntimeError(f'Model dimension must be greater than requested embeddings ' - f'dimension! 
model_dim = {embedder.dim}, requested_dim = {emb_dim}') - else: - pca = None - for n, token in enumerate(vocab): - if character_level: - char_in_word_bool = np.array([token in word for word in tokens_for_estimation], dtype=bool) - all_words_with_character = estimation_matrix[char_in_word_bool] - if len(all_words_with_character) != 0: - if pca is not None: - all_words_with_character = pca.transform(all_words_with_character) - self.emb_mat[n] = sum(all_words_with_character) / len(all_words_with_character) - else: - self.emb_mat[n] = np.random.randn(emb_dim) * np.std(self.emb_mat[:n]) - else: - try: - if pca is not None: - self.emb_mat[n] = pca.transform(embedder([[token]])[0])[0] - else: - self.emb_mat[n] = embedder([[token]])[0][0] - except KeyError: - self.emb_mat[n] = np.random.randn(emb_dim) * emb_std - - @property - def dim(self): - return self.emb_mat.shape[1] diff --git a/deeppavlov/models/preprocessors/bert_preprocessor.py b/deeppavlov/models/preprocessors/bert_preprocessor.py deleted file mode 100644 index e60a068193..0000000000 --- a/deeppavlov/models/preprocessors/bert_preprocessor.py +++ /dev/null @@ -1,324 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -import re -import random -from logging import getLogger -from typing import Tuple, List, Optional, Union - -from bert_dp.preprocessing import convert_examples_to_features, InputExample, InputFeatures -from bert_dp.tokenization import FullTokenizer - -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.utils import zero_pad -from deeppavlov.core.models.component import Component -from deeppavlov.models.preprocessors.mask import Mask - -log = getLogger(__name__) - - -@register('bert_preprocessor') -class BertPreprocessor(Component): - """Tokenize text on subtokens, encode subtokens with their indices, create tokens and segment masks. - - Check details in :func:`bert_dp.preprocessing.convert_examples_to_features` function. - - Args: - vocab_file: path to vocabulary - do_lower_case: set True if lowercasing is needed - max_seq_length: max sequence length in subtokens, including [SEP] and [CLS] tokens - - Attributes: - max_seq_length: max sequence length in subtokens, including [SEP] and [CLS] tokens - tokenizer: instance of Bert FullTokenizer - """ - - def __init__(self, - vocab_file: str, - do_lower_case: bool = True, - max_seq_length: int = 512, - **kwargs) -> None: - self.max_seq_length = max_seq_length - vocab_file = str(expand_path(vocab_file)) - self.tokenizer = FullTokenizer(vocab_file=vocab_file, - do_lower_case=do_lower_case) - - def __call__(self, texts_a: List[str], texts_b: Optional[List[str]] = None) -> List[InputFeatures]: - """Call Bert :func:`bert_dp.preprocessing.convert_examples_to_features` function to tokenize and create masks. - - texts_a and texts_b are separated by [SEP] token - - Args: - texts_a: list of texts, - texts_b: list of texts, it could be None, e.g. 
single sentence classification task - - Returns: - batch of :class:`bert_dp.preprocessing.InputFeatures` with subtokens, subtoken ids, subtoken mask, segment mask. - - """ - - if texts_b is None: - texts_b = [None] * len(texts_a) - # unique_id is not used - examples = [InputExample(unique_id=0, text_a=text_a, text_b=text_b) - for text_a, text_b in zip(texts_a, texts_b)] - return convert_examples_to_features(examples, self.max_seq_length, self.tokenizer) - - -@register('bert_ner_preprocessor') -class BertNerPreprocessor(Component): - """Takes tokens and splits them into bert subtokens, encodes subtokens with their indices. - Creates a mask of subtokens (one for the first subtoken, zero for the others). - - If tags are provided, calculates tags for subtokens. - - Args: - vocab_file: path to vocabulary - do_lower_case: set True if lowercasing is needed - max_seq_length: max sequence length in subtokens, including [SEP] and [CLS] tokens - max_subword_length: replace token to if it's length is larger than this - (defaults to None, which is equal to +infinity) - token_masking_prob: probability of masking token while training - provide_subword_tags: output tags for subwords or for words - subword_mask_mode: subword to select inside word tokens, can be "first" or "last" - (default="first") - - Attributes: - max_seq_length: max sequence length in subtokens, including [SEP] and [CLS] tokens - max_subword_length: rmax lenght of a bert subtoken - tokenizer: instance of Bert FullTokenizer - """ - - def __init__(self, - vocab_file: str, - do_lower_case: bool = False, - max_seq_length: int = 512, - max_subword_length: int = None, - token_masking_prob: float = 0.0, - provide_subword_tags: bool = False, - subword_mask_mode: str = "first", - **kwargs): - self._re_tokenizer = re.compile(r"[\w']+|[^\w ]") - self.provide_subword_tags = provide_subword_tags - self.mode = kwargs.get('mode') - self.max_seq_length = max_seq_length - self.max_subword_length = max_subword_length - self.subword_mask_mode = subword_mask_mode - vocab_file = str(expand_path(vocab_file)) - self.tokenizer = FullTokenizer(vocab_file=vocab_file, - do_lower_case=do_lower_case) - self.token_masking_prob = token_masking_prob - - def __call__(self, - tokens: Union[List[List[str]], List[str]], - tags: List[List[str]] = None, - **kwargs): - if isinstance(tokens[0], str): - tokens = [re.findall(self._re_tokenizer, s) for s in tokens] - subword_tokens, subword_tok_ids, startofword_markers, subword_tags = [], [], [], [] - for i in range(len(tokens)): - toks = tokens[i] - ys = ['O'] * len(toks) if tags is None else tags[i] - assert len(toks) == len(ys), \ - f"toks({len(toks)}) should have the same length as ys({len(ys)})" - sw_toks, sw_marker, sw_ys = \ - self._ner_bert_tokenize(toks, - ys, - self.tokenizer, - self.max_subword_length, - mode=self.mode, - subword_mask_mode=self.subword_mask_mode, - token_masking_prob=self.token_masking_prob) - if self.max_seq_length is not None: - if len(sw_toks) > self.max_seq_length: - raise RuntimeError(f"input sequence after bert tokenization" - f" shouldn't exceed {self.max_seq_length} tokens.") - subword_tokens.append(sw_toks) - subword_tok_ids.append(self.tokenizer.convert_tokens_to_ids(sw_toks)) - startofword_markers.append(sw_marker) - subword_tags.append(sw_ys) - assert len(sw_marker) == len(sw_toks) == len(subword_tok_ids[-1]) == len(sw_ys), \ - f"length of sow_marker({len(sw_marker)}), tokens({len(sw_toks)})," \ - f" token ids({len(subword_tok_ids[-1])}) and ys({len(ys)})" \ - f" for tokens = `{toks}` 
should match" - subword_tok_ids = zero_pad(subword_tok_ids, dtype=int, padding=0) - startofword_markers = zero_pad(startofword_markers, dtype=int, padding=0) - attention_mask = Mask()(subword_tokens) - - if tags is not None: - if self.provide_subword_tags: - return tokens, subword_tokens, subword_tok_ids, \ - attention_mask, startofword_markers, subword_tags - else: - nonmasked_tags = [[t for t in ts if t != 'X'] for ts in tags] - for swts, swids, swms, ts in zip(subword_tokens, - subword_tok_ids, - startofword_markers, - nonmasked_tags): - if (len(swids) != len(swms)) or (len(ts) != sum(swms)): - log.warning('Not matching lengths of the tokenization!') - log.warning(f'Tokens len: {len(swts)}\n Tokens: {swts}') - log.warning(f'Markers len: {len(swms)}, sum: {sum(swms)}') - log.warning(f'Masks: {swms}') - log.warning(f'Tags len: {len(ts)}\n Tags: {ts}') - return tokens, subword_tokens, subword_tok_ids, \ - attention_mask, startofword_markers, nonmasked_tags - return tokens, subword_tokens, subword_tok_ids, startofword_markers, attention_mask - - @staticmethod - def _ner_bert_tokenize(tokens: List[str], - tags: List[str], - tokenizer: FullTokenizer, - max_subword_len: int = None, - mode: str = None, - subword_mask_mode: str = "first", - token_masking_prob: float = None) -> Tuple[List[str], List[int], List[str]]: - do_masking = (mode == 'train') and (token_masking_prob is not None) - do_cutting = (max_subword_len is not None) - tokens_subword = ['[CLS]'] - startofword_markers = [0] - tags_subword = ['X'] - for token, tag in zip(tokens, tags): - token_marker = int(tag != 'X') - subwords = tokenizer.tokenize(token) - if not subwords or (do_cutting and (len(subwords) > max_subword_len)): - tokens_subword.append('[UNK]') - startofword_markers.append(token_marker) - tags_subword.append(tag) - else: - if do_masking and (random.random() < token_masking_prob): - tokens_subword.extend(['[MASK]'] * len(subwords)) - else: - tokens_subword.extend(subwords) - if subword_mask_mode == "last": - startofword_markers.extend([0] * (len(subwords) - 1) + [token_marker]) - else: - startofword_markers.extend([token_marker] + [0] * (len(subwords) - 1)) - tags_subword.extend([tag] + ['X'] * (len(subwords) - 1)) - - tokens_subword.append('[SEP]') - startofword_markers.append(0) - tags_subword.append('X') - return tokens_subword, startofword_markers, tags_subword - - -@register('bert_ranker_preprocessor') -class BertRankerPreprocessor(BertPreprocessor): - """Tokenize text to sub-tokens, encode sub-tokens with their indices, create tokens and segment masks for ranking. - - Builds features for a pair of context with each of the response candidates. - """ - - def __call__(self, batch: List[List[str]]) -> List[List[InputFeatures]]: - """Call BERT :func:`bert_dp.preprocessing.convert_examples_to_features` function to tokenize and create masks. - - Args: - batch: list of elemenents where the first element represents the batch with contexts - and the rest of elements represent response candidates batches - - Returns: - list of feature batches with subtokens, subtoken ids, subtoken mask, segment mask. 
- """ - - if isinstance(batch[0], str): - batch = [batch] - - cont_resp_pairs = [] - if len(batch[0]) == 1: - contexts = batch[0] - responses_empt = [None] * len(batch) - cont_resp_pairs.append(zip(contexts, responses_empt)) - else: - contexts = [el[0] for el in batch] - for i in range(1, len(batch[0])): - responses = [] - for el in batch: - responses.append(el[i]) - cont_resp_pairs.append(zip(contexts, responses)) - examples = [] - for s in cont_resp_pairs: - ex = [InputExample(unique_id=0, text_a=context, text_b=response) for context, response in s] - examples.append(ex) - features = [convert_examples_to_features(el, self.max_seq_length, self.tokenizer) for el in examples] - - return features - - -@register('bert_sep_ranker_preprocessor') -class BertSepRankerPreprocessor(BertPreprocessor): - """Tokenize text to sub-tokens, encode sub-tokens with their indices, create tokens and segment masks for ranking. - - Builds features for a context and for each of the response candidates separately. - """ - - def __call__(self, batch: List[List[str]]) -> List[List[InputFeatures]]: - """Call BERT :func:`bert_dp.preprocessing.convert_examples_to_features` function to tokenize and create masks. - - Args: - batch: list of elemenents where the first element represents the batch with contexts - and the rest of elements represent response candidates batches - - Returns: - list of feature batches with subtokens, subtoken ids, subtoken mask, segment mask - for the context and each of response candidates separately. - """ - - if isinstance(batch[0], str): - batch = [batch] - - samples = [] - for i in range(len(batch[0])): - s = [] - for el in batch: - s.append(el[i]) - samples.append(s) - s_empt = [None] * len(samples[0]) - # TODO: add unique id - examples = [] - for s in samples: - ex = [InputExample(unique_id=0, text_a=text_a, text_b=text_b) for text_a, text_b in - zip(s, s_empt)] - examples.append(ex) - features = [convert_examples_to_features(el, self.max_seq_length, self.tokenizer) for el in examples] - - return features - - -@register('bert_sep_ranker_predictor_preprocessor') -class BertSepRankerPredictorPreprocessor(BertSepRankerPreprocessor): - """Tokenize text to sub-tokens, encode sub-tokens with their indices, create tokens and segment masks for ranking. - - Builds features for a context and for each of the response candidates separately. - In addition, builds features for a response (and corresponding context) text base. 
- - Args: - resps: list of strings containing the base of text responses - resp_vecs: BERT vector respresentations of ``resps``, if is ``None`` features for the response base will be build - conts: list of strings containing the base of text contexts - cont_vecs: BERT vector respresentations of ``conts``, if is ``None`` features for the response base will be build - """ - - def __init__(self, - resps=None, resp_vecs=None, conts=None, cont_vecs=None, **kwargs) -> None: - super().__init__(**kwargs) - self.resp_features = None - self.cont_features = None - if resps is not None and resp_vecs is None: - log.info("Building BERT features for the response base...") - resp_batch = [[el] for el in resps] - self.resp_features = self(resp_batch) - if conts is not None and cont_vecs is None: - log.info("Building BERT features for the context base...") - cont_batch = [[el] for el in conts] - self.cont_features = self(cont_batch) diff --git a/deeppavlov/models/preprocessors/capitalization.py b/deeppavlov/models/preprocessors/capitalization.py deleted file mode 100644 index 3760979471..0000000000 --- a/deeppavlov/models/preprocessors/capitalization.py +++ /dev/null @@ -1,138 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - - -from typing import Tuple, List, Optional - -import numpy as np - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.utils import zero_pad -from deeppavlov.core.models.component import Component - - -@register('capitalization_featurizer') -class CapitalizationPreprocessor(Component): - """ - Featurizer useful for NER task. It detects following patterns in the words: - - no capitals - - single capital single character - - single capital multiple characters - - all capitals multiple characters - - Args: - pad_zeros: whether to pad capitalization features batch with zeros up - to maximal length or not. 
- - Attributes: - dim: dimensionality of the feature vectors, produced by the featurizer - """ - - def __init__(self, pad_zeros: bool = True, *args, **kwargs) -> None: - self.pad_zeros = pad_zeros - self._num_of_features = 4 - - @property - def dim(self): - return self._num_of_features - - def __call__(self, tokens_batch, **kwargs): - cap_batch = [] - max_batch_len = 0 - for utterance in tokens_batch: - cap_list = [] - max_batch_len = max(max_batch_len, len(utterance)) - for token in utterance: - cap = np.zeros(4, np.float32) - # Check the case and produce corresponding one-hot - if len(token) > 0: - if token[0].islower(): - cap[0] = 1 - elif len(token) == 1 and token[0].isupper(): - cap[1] = 1 - elif len(token) > 1 and token[0].isupper() and any(ch.islower() for ch in token): - cap[2] = 1 - elif all(ch.isupper() for ch in token): - cap[3] = 1 - cap_list.append(cap) - cap_batch.append(cap_list) - if self.pad_zeros: - return zero_pad(cap_batch) - else: - return cap_batch - - -def process_word(word: str, to_lower: bool = False, - append_case: Optional[str] = None) -> Tuple[str]: - """The method implements the following operations: - 1. converts word to a tuple of symbols (character splitting), - 2. optionally converts it to lowercase and - 3. adds capitalization label. - - Args: - word: input word - to_lower: whether to lowercase - append_case: whether to add case mark - ('' for first capital and '' for all caps) - - Returns: - a preprocessed word. - - Example: - >>> process_word(word="Zaman", to_lower=True, append_case="first") - ('', 'z', 'a', 'm', 'a', 'n') - >>> process_word(word="MSU", to_lower=True, append_case="last") - ('m', 's', 'u', '') - """ - if all(x.isupper() for x in word) and len(word) > 1: - uppercase = "" - elif word[0].isupper(): - uppercase = "" - else: - uppercase = None - if to_lower: - word = word.lower() - if word.isdigit(): - answer = [""] - elif word.startswith("http://") or word.startswith("www."): - answer = [""] - else: - answer = list(word) - if to_lower and uppercase is not None: - if append_case == "first": - answer = [uppercase] + answer - elif append_case == "last": - answer = answer + [uppercase] - return tuple(answer) - - -@register('char_splitting_lowercase_preprocessor') -class CharSplittingLowercasePreprocessor(Component): - """A callable wrapper over :func:`process_word`. - Takes as input a batch of tokenized sentences - and returns a batch of preprocessed sentences. - """ - - def __init__(self, to_lower: bool = True, append_case: str = "first", *args, **kwargs): - self.to_lower = to_lower - self.append_case = append_case - - def __call__(self, tokens_batch: List[List[str]], **kwargs) -> List[List[Tuple[str]]]: - answer = [] - for elem in tokens_batch: - # if isinstance(elem, str): - # elem = NLTKMosesTokenizer()([elem])[0] - # # elem = [x for x in re.split("(\w+|[,.])", elem) if x.strip() != ""] - answer.append([process_word(x, self.to_lower, self.append_case) for x in elem]) - return answer diff --git a/deeppavlov/models/preprocessors/char_splitter.py b/deeppavlov/models/preprocessors/char_splitter.py deleted file mode 100644 index c242d3612e..0000000000 --- a/deeppavlov/models/preprocessors/char_splitter.py +++ /dev/null @@ -1,37 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
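The capitalization featurizer above emits a 4-dimensional one-hot vector per token: starts-lowercase, single uppercase character, capitalized word, all caps. A worked example applying the same rules (toy re-implementation):

import numpy as np


def cap_features(token):
    cap = np.zeros(4, np.float32)
    if len(token) > 0:
        if token[0].islower():
            cap[0] = 1
        elif len(token) == 1 and token[0].isupper():
            cap[1] = 1
        elif len(token) > 1 and token[0].isupper() and any(ch.islower() for ch in token):
            cap[2] = 1
        elif all(ch.isupper() for ch in token):
            cap[3] = 1
    return cap


for tok in ['apple', 'I', 'Moscow', 'NASA']:
    print(tok, cap_features(tok))
# apple [1. 0. 0. 0.]
# I [0. 1. 0. 0.]
# Moscow [0. 0. 1. 0.]
# NASA [0. 0. 0. 1.]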
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger - -from overrides import overrides - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component - -log = getLogger(__name__) - - -@register('char_splitter') -class CharSplitter(Component): - """This component transforms batch of sequences of tokens into batch of sequences of character sequences.""" - - def __init__(self, **kwargs): - pass - - @overrides - def __call__(self, batch, *args, **kwargs): - char_batch = [] - for tokens_sequence in batch: - char_batch.append([list(tok) for tok in tokens_sequence]) - return char_batch diff --git a/deeppavlov/models/preprocessors/ner_preprocessor.py b/deeppavlov/models/preprocessors/ner_preprocessor.py index cef6bb6fef..db9e604840 100644 --- a/deeppavlov/models/preprocessors/ner_preprocessor.py +++ b/deeppavlov/models/preprocessors/ner_preprocessor.py @@ -11,89 +11,6 @@ log = getLogger(__name__) -@register("ner_preprocessor") -class NerPreprocessor(): - """ Preprocess the batch of list of tokens - - Params: - get_x_padded_for_elmo: whether the padded batch used for ELMo is returned - get_x_cap_padded: whether the padded batch used for capitalization feature extraction is returned - """ - - def __init__(self, get_x_padded_for_elmo=False, get_x_cap_padded=False, **kwargs): - self.get_x_padded_for_elmo = get_x_padded_for_elmo - self.get_x_cap_padded = get_x_cap_padded - - self.cap_vocab_size = 5 - - def encode_cap(self, s): - if s.upper() == s: - return 1 - elif s.lower() == s: - return 2 - elif (s[0].upper() == s[0]) and (s[1:].lower() == s[1:]): - return 3 - else: - return 4 - - def __call__(self, batch: List[List[str]], **kwargs): - """ Process the input batch - - Args: - batch: list of list of tokens - - Returns: - x_lower: batch in lowercase - sent_lengths: lengths of sents - x_padded_for_elmo (optional): batch padded with "", used as input for ELMo - x_cap_padded: batch of capitalization features - """ - - x_lower = [[token.lower() for token in sent] for sent in batch] - sent_lengths = [len(sent) for sent in batch] - ret = (x_lower, sent_lengths,) - - max_len = max(sent_lengths) - - if self.get_x_padded_for_elmo: - x_tokens_elmo = [sent + [""] * (max_len - len(sent)) for sent in batch] - ret += (x_tokens_elmo,) - - if self.get_x_cap_padded: - cap_seq = [[self.encode_cap(token) for token in sent] for sent in batch] - x_cap_padded = np.zeros((len(batch), max_len)) - for i, caps in enumerate(cap_seq): - x_cap_padded[i, :len(caps)] = caps - ret += (x_cap_padded,) - - return ret - - -@register("convert_ids2tags") -class ConvertIds2Tags(): - """ Class used to convert the batch of indices to the batch of tags - - Params: - id2tag: the dictionary used to convert the indices to the corresponding tags - - """ - - def __init__(self, id2tag, *args, **kwargs): - self.id2tag = id2tag - - def __call__(self, y_predicted): - """ Convert the batch of indices to the corresponding batch of tags - - Params: - y_predicted: the batch of indices - - Returns: - the corresponding batch of tags - """ - - return [[self.id2tag[id] for id in seq] for seq in y_predicted] - - 
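Both the deleted CapitalizationPreprocessor and the deleted NerPreprocessor reduce every token to a small casing category (lowercase, single capital character, capitalized word, all caps). A minimal standalone sketch of that featurization, with illustrative names rather than the removed API::

    import numpy as np

    def capitalization_one_hot(token: str) -> np.ndarray:
        """Encode the casing pattern of a token as a 4-dim one-hot vector:
        [lowercase, single capital char, capitalized word, all caps]."""
        vec = np.zeros(4, dtype=np.float32)
        if not token:
            return vec
        if token[0].islower():
            vec[0] = 1
        elif len(token) == 1 and token.isupper():
            vec[1] = 1
        elif token[0].isupper() and any(ch.islower() for ch in token):
            vec[2] = 1
        elif token.isupper():
            vec[3] = 1
        return vec

    print(capitalization_one_hot("Moscow"))  # -> [0. 0. 1. 0.]
    print(capitalization_one_hot("USA"))     # -> [0. 0. 0. 1.]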
@register("ner_vocab") class NerVocab(Estimator): """ Implementation of the NER vocabulary diff --git a/deeppavlov/models/preprocessors/random_embeddings_matrix.py b/deeppavlov/models/preprocessors/random_embeddings_matrix.py deleted file mode 100644 index b72f75a0fa..0000000000 --- a/deeppavlov/models/preprocessors/random_embeddings_matrix.py +++ /dev/null @@ -1,37 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import numpy as np - -from deeppavlov.core.common.registry import register - - -@register('random_emb_mat') -class RandomEmbeddingsMatrix: - """Assembles matrix of random embeddings. - - Args: - vocab_len: length of the vocabulary (number of tokens in it) - emb_dim: dimensionality of the embeddings - - Attributes: - dim: dimensionality of the embeddings - """ - - def __init__(self, vocab_len: int, emb_dim: int, *args, **kwargs) -> None: - self.emb_mat = np.random.randn(vocab_len, emb_dim).astype(np.float32) / np.sqrt(emb_dim) - - @property - def dim(self): - return self.emb_mat.shape[1] diff --git a/deeppavlov/models/preprocessors/russian_lemmatizer.py b/deeppavlov/models/preprocessors/russian_lemmatizer.py deleted file mode 100644 index ae68f4fc97..0000000000 --- a/deeppavlov/models/preprocessors/russian_lemmatizer.py +++ /dev/null @@ -1,37 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import pymorphy2 - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component - - -@register('pymorphy_russian_lemmatizer') -class PymorphyRussianLemmatizer(Component): - """Class for lemmatization using PyMorphy.""" - - def __init__(self, *args, **kwargs): - self.lemmatizer = pymorphy2.MorphAnalyzer() - - def __call__(self, tokens_batch, **kwargs): - """Takes batch of tokens and returns the lemmatized tokens.""" - lemma_batch = [] - for utterance in tokens_batch: - lemma_utterance = [] - for token in utterance: - p = self.lemmatizer.parse(token)[0] - lemma_utterance.append(p.normal_form) - lemma_batch.append(lemma_utterance) - return lemma_batch diff --git a/deeppavlov/models/preprocessors/siamese_preprocessor.py b/deeppavlov/models/preprocessors/siamese_preprocessor.py deleted file mode 100644 index 9a7a92332e..0000000000 --- a/deeppavlov/models/preprocessors/siamese_preprocessor.py +++ /dev/null @@ -1,138 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger -from typing import List, Union, Iterable, Optional - -import numpy as np - -from deeppavlov.core.commands.utils import expand_path -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.utils import zero_pad_truncate -from deeppavlov.core.models.component import Component -from deeppavlov.core.models.estimator import Estimator - -log = getLogger(__name__) - - -@register('siamese_preprocessor') -class SiamesePreprocessor(Estimator): - """ Preprocessing of data samples containing text strings to feed them in a siamese network. - - First ``num_context_turns`` strings in each data sample corresponds to the dialogue ``context`` - and the rest string(s) in the sample is (are) ``response(s)``. - - Args: - save_path: The parameter is only needed to initialize the base class - :class:`~deeppavlov.core.models.serializable.Serializable`. - load_path: The parameter is only needed to initialize the base class - :class:`~deeppavlov.core.models.serializable.Serializable`. - max_sequence_length: A maximum length of text sequences in tokens. - Longer sequences will be truncated and shorter ones will be padded. - dynamic_batch: Whether to use dynamic batching. If ``True``, the maximum length of a sequence for a batch - will be equal to the maximum of all sequences lengths from this batch, - but not higher than ``max_sequence_length``. - padding: Padding. Possible values are ``pre`` and ``post``. - If set to ``pre`` a sequence will be padded at the beginning. - If set to ``post`` it will padded at the end. - truncating: Truncating. Possible values are ``pre`` and ``post``. - If set to ``pre`` a sequence will be truncated at the beginning. - If set to ``post`` it will truncated at the end. - use_matrix: Whether to use a trainable matrix with token (word) embeddings. - num_context_turns: A number of ``context`` turns in data samples. 
- num_ranking_samples: A number of condidates for ranking including positive one. - add_raw_text: whether add raw text sentences to output data list or not. - Use with conjunction of models using sentence encoders - tokenizer: An instance of one of the :class:`deeppavlov.models.tokenizers`. - vocab: An instance of :class:`deeppavlov.core.data.simple_vocab.SimpleVocabulary`. - embedder: an instance of one of the :class:`deeppavlov.models.embedders`. - sent_vocab: An instance of of :class:`deeppavlov.core.data.simple_vocab.SimpleVocabulary`. - It is used to store all ``responces`` and to find the best ``response`` - to the user ``context`` in the ``interact`` mode. - """ - - def __init__(self, - save_path: str = './tok.dict', - load_path: str = './tok.dict', - max_sequence_length: int = None, - dynamic_batch: bool = False, - padding: str = 'post', - truncating: str = 'post', - use_matrix: bool = True, - num_context_turns: int = 1, - num_ranking_samples: int = 1, - add_raw_text: bool = False, - tokenizer: Component = None, - vocab: Optional[Estimator] = None, - embedder: Optional[Component] = None, - sent_vocab: Optional[Estimator] = None, - **kwargs): - - self.max_sequence_length = max_sequence_length - self.padding = padding - self.truncating = truncating - self.dynamic_batch = dynamic_batch - self.use_matrix = use_matrix - self.num_ranking_samples = num_ranking_samples - self.num_context_turns = num_context_turns - self.add_raw_text = add_raw_text - self.tokenizer = tokenizer - self.embedder = embedder - self.vocab = vocab - self.sent_vocab = sent_vocab - self.save_path = expand_path(save_path).resolve() - self.load_path = expand_path(load_path).resolve() - - super().__init__(load_path=self.load_path, save_path=self.save_path, **kwargs) - - def fit(self, x: List[List[str]]) -> None: - if self.sent_vocab is not None: - self.sent_vocab.fit([el[self.num_context_turns:] for el in x]) - x_tok = [self.tokenizer(el) for el in x] - self.vocab.fit([el for x in x_tok for el in x]) - - def __call__(self, x: Union[List[List[str]], List[str]]) -> Iterable[List[List[np.ndarray]]]: - if len(x) == 0 or isinstance(x[0], str): - if len(x) == 1: # interact mode: len(batch) == 1 - x_preproc = [[sent.strip() for sent in x[0].split('&')]] # List[str] -> List[List[str]] - elif len(x) == 0: - x_preproc = [['']] - else: - x_preproc = [[el] for el in x] - else: - x_preproc = [el[:self.num_context_turns + self.num_ranking_samples] for el in x] - for el in x_preproc: - x_tok = self.tokenizer(el) - x_ctok = [y if len(y) != 0 else [''] for y in x_tok] - if self.use_matrix: - x_proc = self.vocab(x_ctok) - else: - x_proc = self.embedder(x_ctok) - if self.dynamic_batch: - msl = min((max([len(y) for el in x_tok for y in el]), self.max_sequence_length)) - else: - msl = self.max_sequence_length - x_proc = zero_pad_truncate(x_proc, msl, pad=self.padding, trunc=self.truncating) - x_proc = list(x_proc) - if self.add_raw_text: - x_proc += el # add (self.num_context_turns+self.num_ranking_samples) raw sentences - yield x_proc - - def load(self) -> None: - pass - - def save(self) -> None: - if self.sent_vocab is not None: - self.sent_vocab.save() - self.vocab.save() diff --git a/deeppavlov/models/preprocessors/squad_preprocessor.py b/deeppavlov/models/preprocessors/squad_preprocessor.py index c342902d4f..3346522f65 100644 --- a/deeppavlov/models/preprocessors/squad_preprocessor.py +++ b/deeppavlov/models/preprocessors/squad_preprocessor.py @@ -14,425 +14,64 @@ import bisect -import pickle -import unicodedata -from collections 
import Counter from logging import getLogger -from pathlib import Path -from typing import Tuple, List, Union, Dict +from typing import List, Dict -import numpy as np -from nltk import word_tokenize -from tqdm import tqdm - -from deeppavlov.core.commands.utils import expand_path from deeppavlov.core.common.registry import register from deeppavlov.core.models.component import Component -from deeppavlov.core.models.estimator import Estimator logger = getLogger(__name__) -@register('squad_preprocessor') -class SquadPreprocessor(Component): - """ SquadPreprocessor is used to preprocess context and question in SQuAD-like datasets. - - Preprocessing includes: sanitizing unicode symbols, quotes, word tokenizing and - building mapping from raw text to processed text. - - Params: - context_limit: max context length in tokens - question_limit: max question length in tokens - char_limit: max number of characters in token - """ - - def __init__(self, context_limit: int = 450, question_limit: int = 150, char_limit: int = 16, *args, **kwargs): - self.context_limit = context_limit - self.question_limit = question_limit - self.char_limit = char_limit - - def __call__(self, contexts_raw: Tuple[str, ...], questions_raw: Tuple[str, ...], - **kwargs) -> Tuple[ - List[str], List[List[str]], List[List[List[str]]], - List[List[int]], List[List[int]], - List[str], List[List[str]], List[List[List[str]]], - List[List[Tuple[int, int]]] - ]: - """ Performs preprocessing of context and question - Args: - contexts_raw: batch of contexts to preprocess - questions_raw: batch of questions to preprocess - - Returns: - context: batch of processed contexts - contexts_tokens: batch of tokenized contexts - contexts_chars: batch of tokenized and split on chars contexts - contexts_r2p: batch of mappings from raw context to processed context - contexts_p2r: batch of mappings from procesesd context to raw context - questions: batch of processed questions - questions_tokens: batch of tokenized questions - questions_chars: batch of tokenized and split on chars questions - spans: batch of mapping tokens to position in context - """ - contexts = [] - contexts_tokens = [] - contexts_chars = [] - contexts_r2p = [] - contexts_p2r = [] - questions = [] - questions_tokens = [] - questions_chars = [] - spans = [] - for c_raw, q_raw in zip(contexts_raw, questions_raw): - c, r2p, p2r = SquadPreprocessor.preprocess_str(c_raw, return_mapping=True) - c_tokens = [token.replace("''", '"').replace("``", '"') for token in word_tokenize(c)][:self.context_limit] - c_chars = [list(token)[:self.char_limit] for token in c_tokens] - q = SquadPreprocessor.preprocess_str(q_raw) - q_tokens = [token.replace("''", '"').replace("``", '"') for token in word_tokenize(q)][:self.question_limit] - q_chars = [list(token)[:self.char_limit] for token in q_tokens] - contexts.append(c) - contexts_tokens.append(c_tokens) - contexts_chars.append(c_chars) - contexts_r2p.append(r2p) - contexts_p2r.append(p2r) - questions.append(q) - questions_tokens.append(q_tokens) - questions_chars.append(q_chars) - spans.append(SquadPreprocessor.convert_idx(c, c_tokens)) - return contexts, contexts_tokens, contexts_chars, contexts_r2p, contexts_p2r, \ - questions, questions_tokens, questions_chars, spans - - @staticmethod - def preprocess_str(line: str, return_mapping: bool = False) -> Union[Tuple[str, List[int], List[int]], str]: - """ Removes unicode and other characters from str - - Args: - line: string to process - return_mapping: return mapping from line to preprocessed line or not - 
- Returns: - preprocessed line, raw2preprocessed mapping, preprocessed2raw mapping - - """ - if not return_mapping: - return ''.join(c for c in line if not unicodedata.combining(c)).replace("''", '" ').replace("``", '" ') - - r2p = [len(line)] * (len(line) + 1) - p2r = [len(line)] * (len(line) + 1) - s = '' - for i, c in enumerate(line): - if unicodedata.combining(c): - r2p[i] = -1 - else: - s += c - r2p[i] = len(s) - 1 - p2r[len(s) - 1] = i - return s.replace("''", '" ').replace("``", '" '), r2p, p2r - - @staticmethod - def convert_idx(text: str, tokens: List[str]) -> List[Tuple[int, int]]: - current = 0 - spans = [] - for token in tokens: - current = text.find(token, current) - if current < 0: - logger.error("Token {} cannot be found".format(token)) - raise Exception() - spans.append((current, current + len(token))) - current += len(token) - return spans - - -@register('squad_ans_preprocessor') -class SquadAnsPreprocessor(Component): - """ SquadAnsPreprocessor is responsible for answer preprocessing.""" - - def __init__(self, *args, **kwargs): - pass - - def __call__(self, answers_raw: Tuple[List[str], ...], answers_start: Tuple[List[int], ...], - r2ps: List[List[int]], spans: List[List[Tuple[int, int]]], - **kwargs) -> Tuple[List[List[str]], List[List[int]], List[List[int]]]: - """ Processes answers for SQuAD dataset - - Args: - answers_raw: list of str [batch_size x number_of_answers] - answers_start: start position of answer (in chars) [batch_size x number_of_answers] - r2ps: mapping from raw context to processed context - spans: mapping tokens to position in context - - Returns: - processed answer text, start position in tokens, end position in tokens - [batch_size x number_of_answers] - - """ - answers = [] - start = [] - end = [] - for ans_raw, ans_st, r2p, span in zip(answers_raw, answers_start, r2ps, spans): - start.append([]) - end.append([]) - answers.append([]) - for a_raw, a_st in zip(ans_raw, ans_st): - ans = SquadPreprocessor.preprocess_str(a_raw) - ans_st = r2p[a_st] - ans_end = ans_st + len(ans) - answer_span = [] - for idx, sp in enumerate(span): - if not (ans_end <= sp[0] or ans_st >= sp[1]): - answer_span.append(idx) - if len(answer_span) != 0: - y1, y2 = answer_span[0], answer_span[-1] - else: - # answer not found in context - y1, y2 = -1, -1 - start[-1].append(y1) - end[-1].append(y2) - answers[-1].append(ans) - return answers, start, end - - -@register('squad_vocab_embedder') -class SquadVocabEmbedder(Estimator): - """ SquadVocabEmbedder is used to build tokens/chars vocabulary and embedding matrix. - - It extracts tokens/chars form dataset and looks for pretrained embeddings. 
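The removed SQuAD preprocessors align character-level answer annotations with token indices in two steps: record each token's character span in the context, then take the first and last tokens that overlap the answer span. A condensed sketch with illustrative helper names::

    from typing import List, Tuple

    def token_spans(text: str, tokens: List[str]) -> List[Tuple[int, int]]:
        """Character (start, end) span of every token, in order."""
        spans, current = [], 0
        for token in tokens:
            current = text.find(token, current)
            if current < 0:
                raise ValueError(f"token {token!r} not found in text")
            spans.append((current, current + len(token)))
            current += len(token)
        return spans

    def answer_token_range(spans: List[Tuple[int, int]], ans_start: int, ans_len: int) -> Tuple[int, int]:
        """First and last token indices overlapping the answer; (-1, -1) if none."""
        ans_end = ans_start + ans_len
        overlapping = [i for i, (s, e) in enumerate(spans) if not (ans_end <= s or ans_start >= e)]
        return (overlapping[0], overlapping[-1]) if overlapping else (-1, -1)

    context = "Deep Pavlov answers questions"
    spans = token_spans(context, context.split())   # [(0, 4), (5, 11), (12, 19), (20, 29)]
    print(answer_token_range(spans, context.find("Pavlov"), len("Pavlov")))  # (1, 1)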
- - Params: - emb_folder: path to download pretrained embeddings - emb_url: link to pretrained embeddings - save_path: extracted embeddings save path - load_path: extracted embeddigns load path - context_limit: max context length in tokens - question_limit: max question length in tokens - char_limit: max number of characters in token - level: token or char - """ - - def __init__(self, emb_folder: str, emb_url: str, save_path: str, load_path: str, - context_limit: int = 450, question_limit: int = 150, char_limit: int = 16, - level: str = 'token', *args, **kwargs): - self.emb_folder = expand_path(emb_folder) - self.level = level - self.emb_url = emb_url - self.emb_file_name = Path(emb_url).name - self.save_path = expand_path(save_path) - self.load_path = expand_path(load_path) - self.context_limit = context_limit - self.question_limit = question_limit - self.char_limit = char_limit - self.loaded = False - - self.NULL = "" - self.OOV = "" - - self.emb_folder.mkdir(parents=True, exist_ok=True) - - self.emb_dim = self.emb_mat = self.token2idx_dict = None - - if self.load_path.exists(): - self.load() - - def __call__(self, contexts: List[List[str]], questions: List[List[str]]) -> Tuple[np.ndarray, np.ndarray]: - """ Transforms tokens/chars to indices. - - Args: - contexts: batch of list of tokens in context - questions: batch of list of tokens in question - - Returns: - transformed contexts and questions - """ - if self.level == 'token': - c_idxs = np.zeros([len(contexts), self.context_limit], dtype=np.int32) - q_idxs = np.zeros([len(questions), self.question_limit], dtype=np.int32) - for i, context in enumerate(contexts): - for j, token in enumerate(context): - c_idxs[i, j] = self._get_idx(token) - - for i, question in enumerate(questions): - for j, token in enumerate(question): - q_idxs[i, j] = self._get_idx(token) - - elif self.level == 'char': - c_idxs = np.zeros([len(contexts), self.context_limit, self.char_limit], dtype=np.int32) - q_idxs = np.zeros([len(questions), self.question_limit, self.char_limit], dtype=np.int32) - for i, context in enumerate(contexts): - for j, token in enumerate(context): - for k, char in enumerate(token): - c_idxs[i, j, k] = self._get_idx(char) - - for i, question in enumerate(questions): - for j, token in enumerate(question): - for k, char in enumerate(token): - q_idxs[i, j, k] = self._get_idx(char) - - return c_idxs, q_idxs - - def fit(self, contexts: Tuple[List[str], ...], questions: Tuple[List[str]], *args, **kwargs): - self.vocab = Counter() - self.embedding_dict = dict() - if not self.loaded: - logger.info('SquadVocabEmbedder: fitting with {}s'.format(self.level)) - if self.level == 'token': - for line in tqdm(contexts + questions): - for token in line: - self.vocab[token] += 1 - elif self.level == 'char': - for line in tqdm(contexts + questions): - for token in line: - for c in token: - self.vocab[c] += 1 - else: - raise RuntimeError("SquadVocabEmbedder::fit: Unknown level: {}".format(self.level)) - - with (self.emb_folder / self.emb_file_name).open('r', encoding='utf8') as femb: - emb_voc_size, self.emb_dim = map(int, femb.readline().split()) - for line in tqdm(femb, total=emb_voc_size): - line_split = line.strip().split(' ') - word = line_split[0] - vec = np.array(line_split[1:], dtype=float) - if len(vec) != self.emb_dim: - continue - if word in self.vocab: - self.embedding_dict[word] = vec - - self.token2idx_dict = {token: idx for idx, token in enumerate(self.embedding_dict.keys(), 2)} - self.token2idx_dict[self.NULL] = 0 - 
self.token2idx_dict[self.OOV] = 1 - self.embedding_dict[self.NULL] = [0.] * self.emb_dim - self.embedding_dict[self.OOV] = [0.] * self.emb_dim - idx2emb_dict = {idx: self.embedding_dict[token] - for token, idx in self.token2idx_dict.items()} - self.emb_mat = np.array([idx2emb_dict[idx] for idx in range(len(idx2emb_dict))]) - - def load(self) -> None: - logger.info('SquadVocabEmbedder: loading saved {}s vocab from {}'.format(self.level, self.load_path)) - with self.load_path.open('rb') as f: - self.emb_dim, self.emb_mat, self.token2idx_dict = pickle.load(f) - self.loaded = True - - def deserialize(self, data: bytes) -> None: - self.emb_dim, self.emb_mat, self.token2idx_dict = pickle.loads(data) - self.loaded = True - - def save(self) -> None: - logger.info('SquadVocabEmbedder: saving {}s vocab to {}'.format(self.level, self.save_path)) - self.save_path.parent.mkdir(parents=True, exist_ok=True) - with self.save_path.open('wb') as f: - pickle.dump((self.emb_dim, self.emb_mat, self.token2idx_dict), f, protocol=4) - - def serialize(self) -> bytes: - return pickle.dumps((self.emb_dim, self.emb_mat, self.token2idx_dict), protocol=4) - - def _get_idx(self, el: str) -> int: - """ Returns idx for el (token or char). - - Args: - el: token or character - - Returns: - idx in vocabulary - """ - for e in (el, el.lower(), el.capitalize(), el.upper()): - if e in self.token2idx_dict: - return self.token2idx_dict[e] - return 1 - - -@register('squad_ans_postprocessor') -class SquadAnsPostprocessor(Component): - """ SquadAnsPostprocessor class is responsible for processing SquadModel output. - - It extract answer from context using predicted by SquadModel answer positions. - """ - - def __init__(self, *args, **kwargs): - pass - - def __call__(self, ans_start: Tuple[int, ...], ans_end: Tuple[int, ...], contexts: Tuple[str, ...], - p2rs: List[List[int]], spans: List[List[Tuple[int, int]]], - **kwargs) -> Tuple[List[str], List[int], List[int]]: - """ Extracts answer from context using predicted answer positions. - - Args: - ans_start: predicted start position in processed context: list of ints with len(ans_start) == batch_size - ans_end: predicted end position in processed context - contexts: raw contexts - p2rs: mapping from processed context to raw - spans: tokens positions in context - - Returns: - postprocessed answer text, start position in raw context, end position in raw context - """ - answers = [] - start = [] - end = [] - for a_st, a_end, c, p2r, span in zip(ans_start, ans_end, contexts, p2rs, spans): - if a_st == -1 or a_end == -1: - start.append(-1) - end.append(-1) - answers.append('') - else: - start.append(p2r[span[a_st][0]]) - end.append(p2r[span[a_end][1]]) - answers.append(c[start[-1]:end[-1]]) - return answers, start, end - - @register('squad_bert_mapping') class SquadBertMappingPreprocessor(Component): """Create mapping from BERT subtokens to their characters positions and vice versa. 
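The deleted SquadVocabEmbedder resolves lookups by retrying several casings of a token before falling back to the OOV index; in isolation the lookup looks roughly like this (names and the example vocabulary are made up)::

    def lookup(token: str, token2idx: dict, oov_idx: int = 1) -> int:
        # Try the surface form first, then common re-casings, before giving up.
        for variant in (token, token.lower(), token.capitalize(), token.upper()):
            if variant in token2idx:
                return token2idx[variant]
        return oov_idx

    vocab = {"<NULL>": 0, "<OOV>": 1, "moscow": 5}
    print(lookup("Moscow", vocab))  # 5  (matched via token.lower())
    print(lookup("Kazan", vocab))   # 1  (OOV fallback)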
- Args: do_lower_case: set True if lowercasing is needed - """ def __init__(self, do_lower_case: bool = True, *args, **kwargs): self.do_lower_case = do_lower_case - def __call__(self, contexts, bert_features, *args, **kwargs): - subtok2chars: List[Dict[int, int]] = [] - char2subtoks: List[Dict[int, int]] = [] + def __call__(self, contexts_batch, bert_features_batch, subtokens_batch, **kwargs): + subtok2chars_batch: List[List[Dict[int, int]]] = [] + char2subtoks_batch: List[List[Dict[int, int]]] = [] - for batch_counter, (context, features) in enumerate(zip(contexts, bert_features)): - subtokens: List[str] - if self.do_lower_case: - context = context.lower() - if len(args) > 0: - subtokens = args[0][batch_counter] - else: - subtokens = features.tokens - context_start = subtokens.index('[SEP]') + 1 - idx = 0 - subtok2char: Dict[int, int] = {} - char2subtok: Dict[int, int] = {} - for i, subtok in list(enumerate(subtokens))[context_start:-1]: - subtok = subtok[2:] if subtok.startswith('##') else subtok - subtok_pos = context[idx:].find(subtok) - if subtok_pos == -1: - # it could be UNK - idx += 1 # len was at least one - else: - # print(k, '\t', t, p + idx) - idx += subtok_pos - subtok2char[i] = idx - for j in range(len(subtok)): - char2subtok[idx + j] = i - idx += len(subtok) - subtok2chars.append(subtok2char) - char2subtoks.append(char2subtok) - return subtok2chars, char2subtoks + for batch_counter, (context_list, features_list, subtokens_list) in \ + enumerate(zip(contexts_batch, bert_features_batch, subtokens_batch)): + subtok2chars_list, char2subtoks_list = [], [] + for context, features, subtokens in zip(context_list, features_list, subtokens_list): + if self.do_lower_case: + context = context.lower() + context_start = subtokens.index('[SEP]') + 1 + idx = 0 + subtok2char: Dict[int, int] = {} + char2subtok: Dict[int, int] = {} + for i, subtok in list(enumerate(subtokens))[context_start:-1]: + subtok = subtok[2:] if subtok.startswith('##') else subtok + subtok_pos = context[idx:].find(subtok) + if subtok_pos == -1: + # it could be UNK + idx += 1 # len was at least one + else: + # print(k, '\t', t, p + idx) + idx += subtok_pos + subtok2char[i] = idx + for j in range(len(subtok)): + char2subtok[idx + j] = i + idx += len(subtok) + subtok2chars_list.append(subtok2char) + char2subtoks_list.append(char2subtok) + subtok2chars_batch.append(subtok2chars_list) + char2subtoks_batch.append(char2subtoks_list) + return subtok2chars_batch, char2subtoks_batch @register('squad_bert_ans_preprocessor') class SquadBertAnsPreprocessor(Component): """Create answer start and end positions in subtokens. 
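The rewritten mapping above walks the context with a character cursor, stripping the WordPiece continuation prefix "##" before searching for each subtoken. A self-contained sketch of the same idea for a single context, not the component's exact interface::

    from typing import Dict, List, Tuple

    def subtoken_char_maps(context: str, subtokens: List[str]) -> Tuple[Dict[int, int], Dict[int, int]]:
        """Map subtoken index -> char offset and char offset -> subtoken index."""
        subtok2char, char2subtok = {}, {}
        idx = 0
        for i, subtok in enumerate(subtokens):
            subtok = subtok[2:] if subtok.startswith('##') else subtok
            pos = context[idx:].find(subtok)
            if pos == -1:      # e.g. an [UNK] subtoken: skip one character and move on
                idx += 1
                continue
            idx += pos
            subtok2char[i] = idx
            for j in range(len(subtok)):
                char2subtok[idx + j] = i
            idx += len(subtok)
        return subtok2char, char2subtok

    s2c, c2s = subtoken_char_maps("deeppavlov library", ['deep', '##pav', '##lov', 'library'])
    print(s2c)  # {0: 0, 1: 4, 2: 7, 3: 11}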
- Args: do_lower_case: set True if lowercasing is needed - """ def __init__(self, do_lower_case: bool = True, *args, **kwargs): @@ -448,7 +87,7 @@ def __call__(self, answers_raw, answers_start, char2subtoks, **kwargs): if self.do_lower_case: ans = ans.lower() try: - indices = {c2sub[i] for i in range(ans_st, ans_st + len(ans)) if i in c2sub} + indices = {c2sub[0][i] for i in range(ans_st, ans_st + len(ans)) if i in c2sub[0]} st = min(indices) end = max(indices) except ValueError: @@ -468,12 +107,17 @@ class SquadBertAnsPostprocessor(Component): def __init__(self, *args, **kwargs): pass - def __call__(self, answers_start, answers_end, contexts, bert_features, subtok2chars, *args, **kwargs): + def __call__(self, answers_start_batch, answers_end_batch, contexts_batch, + subtok2chars_batch, subtokens_batch, ind_batch, *args, **kwargs): answers = [] starts = [] ends = [] - for batch_counter, (answer_st, answer_end, context, features, sub2c) in \ - enumerate(zip(answers_start, answers_end, contexts, bert_features, subtok2chars)): + for answer_st, answer_end, context_list, sub2c_list, subtokens_list, ind in \ + zip(answers_start_batch, answers_end_batch, contexts_batch, subtok2chars_batch, subtokens_batch, + ind_batch): + sub2c = sub2c_list[ind] + subtok = subtokens_list[ind][answer_end] + context = context_list[ind] # CLS token is no_answer token if answer_st == 0 or answer_end == 0: answers += [''] @@ -482,10 +126,7 @@ def __call__(self, answers_start, answers_end, contexts, bert_features, subtok2c else: st = self.get_char_position(sub2c, answer_st) end = self.get_char_position(sub2c, answer_end) - if len(args) > 0: - subtok = args[0][batch_counter][answer_end] - else: - subtok = features.tokens[answer_end] + subtok = subtok[2:] if subtok.startswith('##') else subtok answer = context[st:end + len(subtok)] answers += [answer] diff --git a/deeppavlov/models/preprocessors/torch_transformers_preprocessor.py b/deeppavlov/models/preprocessors/torch_transformers_preprocessor.py index ed690a959a..8bc2daec34 100644 --- a/deeppavlov/models/preprocessors/torch_transformers_preprocessor.py +++ b/deeppavlov/models/preprocessors/torch_transformers_preprocessor.py @@ -12,16 +12,18 @@ # See the License for the specific language governing permissions and # limitations under the License. -import re +import math import random +import re from collections import defaultdict from dataclasses import dataclass from logging import getLogger from pathlib import Path -import torch -from typing import Tuple, List, Optional, Union, Dict, Set +from typing import Tuple, List, Optional, Union, Dict, Set, Any +import nltk import numpy as np +import torch from transformers import AutoTokenizer from transformers.data.processors.utils import InputFeatures @@ -38,17 +40,13 @@ class TorchTransformersMultiplechoicePreprocessor(Component): """Tokenize text on subtokens, encode subtokens with their indices, create tokens and segment masks. - Check details in :func:`bert_dp.preprocessing.convert_examples_to_features` function. 
- Args: vocab_file: path to vocabulary do_lower_case: set True if lowercasing is needed max_seq_length: max sequence length in subtokens, including [SEP] and [CLS] tokens - return_tokens: whether to return tuple of input features and tokens, or only input features Attributes: max_seq_length: max sequence length in subtokens, including [SEP] and [CLS] tokens - return_tokens: whether to return tuple of input features and tokens, or only input features tokenizer: instance of Bert FullTokenizer """ @@ -57,10 +55,8 @@ def __init__(self, vocab_file: str, do_lower_case: bool = True, max_seq_length: int = 512, - return_tokens: bool = False, **kwargs) -> None: self.max_seq_length = max_seq_length - self.return_tokens = return_tokens if Path(vocab_file).is_file(): vocab_file = str(expand_path(vocab_file)) self.tokenizer = AutoTokenizer(vocab_file=vocab_file, @@ -120,17 +116,14 @@ def __call__(self, texts_a: List[List[str]], texts_b: List[List[str]] = None) -> class TorchTransformersPreprocessor(Component): """Tokenize text on subtokens, encode subtokens with their indices, create tokens and segment masks. - Check details in :func:`bert_dp.preprocessing.convert_examples_to_features` function. - Args: - vocab_file: path to vocabulary + vocab_file: A string, the `model id` of a predefined tokenizer hosted inside a model repo on huggingface.co or + a path to a `directory` containing vocabulary files required by the tokenizer. do_lower_case: set True if lowercasing is needed max_seq_length: max sequence length in subtokens, including [SEP] and [CLS] tokens - return_tokens: whether to return tuple of input features and tokens, or only input features Attributes: max_seq_length: max sequence length in subtokens, including [SEP] and [CLS] tokens - return_tokens: whether to return tuple of input features and tokens, or only input features tokenizer: instance of Bert FullTokenizer """ @@ -139,28 +132,18 @@ def __init__(self, vocab_file: str, do_lower_case: bool = True, max_seq_length: int = 512, - return_tokens: bool = False, **kwargs) -> None: self.max_seq_length = max_seq_length - self.return_tokens = return_tokens - if Path(vocab_file).is_file(): - vocab_file = str(expand_path(vocab_file)) - self.tokenizer = AutoTokenizer(vocab_file=vocab_file, - do_lower_case=do_lower_case) - else: - self.tokenizer = AutoTokenizer.from_pretrained(vocab_file, do_lower_case=do_lower_case) + self.tokenizer = AutoTokenizer.from_pretrained(vocab_file, do_lower_case=do_lower_case) def __call__(self, texts_a: List[str], texts_b: Optional[List[str]] = None) -> Union[List[InputFeatures], Tuple[List[InputFeatures], List[List[str]]]]: """Tokenize and create masks. - texts_a and texts_b are separated by [SEP] token - Args: texts_a: list of texts, texts_b: list of texts, it could be None, e.g. 
single sentence classification task - Returns: batch of :class:`transformers.data.processors.utils.InputFeatures` with subtokens, subtoken ids, \ subtoken mask, segment mask, or tuple of batch of InputFeatures and Batch of subtokens @@ -181,21 +164,100 @@ def __call__(self, texts_a: List[str], texts_b: Optional[List[str]] = None) -> U return input_features +@register('torch_transformers_entity_ranker_preprocessor') +class TorchTransformersEntityRankerPreprocessor(Component): + """Class for tokenization of text into subtokens, encoding of subtokens with indices and obtaining positions of + special [ENT]-tokens + Args: + vocab_file: path to vocabulary + do_lower_case: set True if lowercasing is needed + max_seq_length: max sequence length in subtokens, including [SEP] and [CLS] tokens + special_tokens: list of special tokens + special_token_id: id of special token + return_special_tokens_pos: whether to return positions of found special tokens + """ + + def __init__(self, + vocab_file: str, + do_lower_case: bool = False, + max_seq_length: int = 512, + special_tokens: List[str] = None, + special_token_id: int = None, + return_special_tokens_pos: bool = False, + **kwargs) -> None: + self.max_seq_length = max_seq_length + self.do_lower_case = do_lower_case + if Path(vocab_file).is_file(): + vocab_file = str(expand_path(vocab_file)) + self.tokenizer = AutoTokenizer(vocab_file=vocab_file, + do_lower_case=do_lower_case) + else: + self.tokenizer = AutoTokenizer.from_pretrained(vocab_file, do_lower_case=do_lower_case) + if special_tokens is not None: + special_tokens_dict = {'additional_special_tokens': special_tokens} + self.tokenizer.add_special_tokens(special_tokens_dict) + self.special_token_id = special_token_id + self.return_special_tokens_pos = return_special_tokens_pos + + def __call__(self, texts_a: List[str]) -> Tuple[Any, List[int]]: + """Tokenize and find special tokens positions. + Args: + texts_a: list of texts, + Returns: + batch of :class:`transformers.data.processors.utils.InputFeatures` with subtokens, subtoken ids, \ + subtoken mask, segment mask, or tuple of batch of InputFeatures and Batch of subtokens + batch of indices of special token ids in input ids sequence + """ + # in case of iterator's strange behaviour + if isinstance(texts_a, tuple): + texts_a = list(texts_a) + if self.do_lower_case: + texts_a = [text.lower() for text in texts_a] + lengths = [] + input_ids_batch = [] + for text_a in texts_a: + encoding = self.tokenizer.encode_plus( + text_a, add_special_tokens=True, pad_to_max_length=True, return_attention_mask=True) + input_ids = encoding["input_ids"] + input_ids_batch.append(input_ids) + lengths.append(len(input_ids)) + + max_length = min(max(lengths), self.max_seq_length) + input_features = self.tokenizer(text=texts_a, + add_special_tokens=True, + max_length=max_length, + padding='max_length', + return_attention_mask=True, + truncation=True, + return_tensors='pt') + special_tokens_pos = [] + for input_ids_list in input_ids_batch: + found_n = -1 + for n, input_id in enumerate(input_ids_list): + if input_id == self.special_token_id: + found_n = n + break + if found_n == -1: + found_n = 0 + special_tokens_pos.append(found_n) + + if self.return_special_tokens_pos: + return input_features, special_tokens_pos + else: + return input_features + + @register('torch_squad_transformers_preprocessor') class TorchSquadTransformersPreprocessor(Component): """Tokenize text on subtokens, encode subtokens with their indices, create tokens and segment masks. 
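These preprocessors delegate tokenization to Hugging Face AutoTokenizer; the new entity-ranker variant additionally registers marker tokens such as [ENT] and records where the marker lands in the encoded sequence. A rough usage sketch (the model id and the marker token are placeholders)::

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder model id
    tokenizer.add_special_tokens({'additional_special_tokens': ['[ENT]']})
    ent_id = tokenizer.convert_tokens_to_ids('[ENT]')

    texts = ["[ENT] Moscow [ENT] is the capital of Russia"]
    features = tokenizer(texts, padding=True, truncation=True, max_length=64, return_tensors='pt')

    # Position of the first [ENT] marker in each sequence (0 if not found, as in the component above).
    positions = [ids.index(ent_id) if ent_id in ids else 0
                 for ids in features['input_ids'].tolist()]
    print(positions)
    # Note: a downstream transformer model would need model.resize_token_embeddings(len(tokenizer))
    # after new special tokens are added.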
- Check details in :func:`bert_dp.preprocessing.convert_examples_to_features` function. - Args: vocab_file: path to vocabulary do_lower_case: set True if lowercasing is needed max_seq_length: max sequence length in subtokens, including [SEP] and [CLS] tokens - return_tokens: whether to return tuple of input features and tokens, or only input features Attributes: max_seq_length: max sequence length in subtokens, including [SEP] and [CLS] tokens - return_tokens: whether to return tuple of input features and tokens, or only input features tokenizer: instance of Bert FullTokenizer """ @@ -204,11 +266,9 @@ def __init__(self, vocab_file: str, do_lower_case: bool = True, max_seq_length: int = 512, - return_tokens: bool = False, add_token_type_ids: bool = False, **kwargs) -> None: self.max_seq_length = max_seq_length - self.return_tokens = return_tokens self.add_token_type_ids = add_token_type_ids if Path(vocab_file).is_file(): vocab_file = str(expand_path(vocab_file)) @@ -217,61 +277,148 @@ def __init__(self, else: self.tokenizer = AutoTokenizer.from_pretrained(vocab_file, do_lower_case=do_lower_case) - def __call__(self, texts_a: List[str], texts_b: Optional[List[str]] = None) -> Union[List[InputFeatures], - Tuple[List[InputFeatures], - List[List[str]]]]: + def __call__(self, question_batch: List[str], context_batch: Optional[List[str]] = None) -> Union[ + List[InputFeatures], + Tuple[List[InputFeatures], + List[List[str]]]]: """Tokenize and create masks. - texts_a and texts_b are separated by [SEP] token + texts_a_batch and texts_b_batch are separated by [SEP] token Args: - texts_a: list of texts, - texts_b: list of texts, it could be None, e.g. single sentence classification task + texts_a_batch: list of texts, + texts_b_batch: list of texts, it could be None, e.g. 
single sentence classification task Returns: batch of :class:`transformers.data.processors.utils.InputFeatures` with subtokens, subtoken ids, \ - subtoken mask, segment mask, or tuple of batch of InputFeatures and Batch of subtokens + subtoken mask, segment mask, or tuple of batch of InputFeatures, batch of subtokens and batch of + split paragraphs """ - if texts_b is None: - texts_b = [None] * len(texts_a) + if context_batch is None: + context_batch = [None] * len(question_batch) + + input_features_batch, tokens_batch, split_context_batch = [], [], [] + for question, context in zip(question_batch, context_batch): + question_list, context_list = [], [] + context_subtokens = self.tokenizer.tokenize(context) + question_subtokens = self.tokenizer.tokenize(question) + max_chunk_len = self.max_seq_length - len(question_subtokens) - 3 + if 0 < max_chunk_len < len(context_subtokens): + number_of_chunks = math.ceil(len(context_subtokens) / max_chunk_len) + sentences = nltk.sent_tokenize(context) + for chunk in np.array_split(sentences, number_of_chunks): + context_list += [' '.join(chunk)] + question_list += [question] + else: + context_list += [context] + question_list += [question] - input_features = [] - tokens = [] - for text_a, text_b in zip(texts_a, texts_b): - encoded_dict = self.tokenizer.encode_plus( - text=text_a, text_pair=text_b, - add_special_tokens=True, - max_length=self.max_seq_length, - truncation=True, - padding='max_length', - return_attention_mask=True, - return_tensors='pt') - - if 'token_type_ids' not in encoded_dict: - if self.add_token_type_ids: - input_ids = encoded_dict['input_ids'] - seq_len = input_ids.size(1) - sep = torch.where(input_ids == self.tokenizer.sep_token_id)[1][0].item() - len_a = min(sep + 1, seq_len) - len_b = seq_len - len_a - encoded_dict['token_type_ids'] = torch.cat((torch.zeros(1, len_a, dtype=int), - torch.ones(1, len_b, dtype=int)), dim=1) - else: - encoded_dict['token_type_ids'] = torch.tensor([0]) - - curr_features = InputFeatures(input_ids=encoded_dict['input_ids'], - attention_mask=encoded_dict['attention_mask'], - token_type_ids=encoded_dict['token_type_ids'], - label=None) - input_features.append(curr_features) - if self.return_tokens: - tokens.append(self.tokenizer.convert_ids_to_tokens(encoded_dict['input_ids'][0])) - - if self.return_tokens: - return input_features, tokens - else: - return input_features + input_features_list, tokens_list = [], [] + for question_elem, context_elem in zip(question_list, context_list): + encoded_dict = self.tokenizer.encode_plus( + text=question_elem, text_pair=context_elem, + add_special_tokens=True, + max_length=self.max_seq_length, + truncation=True, + padding='max_length', + return_attention_mask=True, + return_tensors='pt') + if 'token_type_ids' not in encoded_dict: + if self.add_token_type_ids: + input_ids = encoded_dict['input_ids'] + seq_len = input_ids.size(1) + sep = torch.where(input_ids == self.tokenizer.sep_token_id)[1][0].item() + len_a = min(sep + 1, seq_len) + len_b = seq_len - len_a + encoded_dict['token_type_ids'] = torch.cat((torch.zeros(1, len_a, dtype=int), + torch.ones(1, len_b, dtype=int)), dim=1) + else: + encoded_dict['token_type_ids'] = torch.tensor([0]) + + curr_features = InputFeatures(input_ids=encoded_dict['input_ids'], + attention_mask=encoded_dict['attention_mask'], + token_type_ids=encoded_dict['token_type_ids'], + label=None) + input_features_list.append(curr_features) + tokens_list.append(self.tokenizer.convert_ids_to_tokens(encoded_dict['input_ids'][0])) + + 
input_features_batch.append(input_features_list) + tokens_batch.append(tokens_list) + split_context_batch.append(context_list) + + return input_features_batch, tokens_batch, split_context_batch + + +@register('rel_ranking_preprocessor') +class RelRankingPreprocessor(Component): + """Class for tokenization of text and relation labels + Args: + vocab_file: path to vocabulary + add_special_tokens: special_tokens_list + do_lower_case: set True if lowercasing is needed + max_seq_length: max sequence length in subtokens, including [SEP] and [CLS] tokens + """ + + def __init__(self, + vocab_file: str, + add_special_tokens: List[str], + do_lower_case: bool = True, + max_seq_length: int = 512, + **kwargs) -> None: + self.max_seq_length = max_seq_length + self.tokenizer = AutoTokenizer.from_pretrained(vocab_file, do_lower_case=do_lower_case) + self.add_special_tokens = add_special_tokens + special_tokens_dict = {'additional_special_tokens': add_special_tokens} + self.tokenizer.add_special_tokens(special_tokens_dict) + + def __call__(self, questions_batch: List[List[str]], rels_batch: List[List[str]] = None) -> Dict[str, torch.tensor]: + """Tokenize questions and relations + texts_a and texts_b are separated by [SEP] token + Args: + questions_batch: list of texts, + rels_batch: list of relations list + + Returns: + batch of :class:`transformers.data.processors.utils.InputFeatures` with subtokens, subtoken ids, \ + subtoken mask, segment mask, or tuple of batch of InputFeatures and Batch of subtokens + """ + lengths = [] + for question, rels_list in zip(questions_batch, rels_batch): + if isinstance(rels_list, list): + rels_str = self.add_special_tokens[2].join(rels_list) + else: + rels_str = rels_list + text_input = f"{self.add_special_tokens[0]} {question} {self.add_special_tokens[1]} {rels_str}" + encoding = self.tokenizer.encode_plus(text=text_input, + return_attention_mask=True, add_special_tokens=True, + truncation=True) + lengths.append(len(encoding["input_ids"])) + max_len = max(lengths) + input_ids_batch = [] + attention_mask_batch = [] + token_type_ids_batch = [] + for question, rels_list in zip(questions_batch, rels_batch): + if isinstance(rels_list, list): + rels_str = self.add_special_tokens[2].join(rels_list) + else: + rels_str = rels_list + text_input = f"{self.add_special_tokens[0]} {question} {self.add_special_tokens[1]} {rels_str}" + encoding = self.tokenizer.encode_plus(text=text_input, + truncation = True, max_length=max_len, + pad_to_max_length=True, return_attention_mask = True) + input_ids_batch.append(encoding["input_ids"]) + attention_mask_batch.append(encoding["attention_mask"]) + if "token_type_ids" in encoding: + token_type_ids_batch.append(encoding["token_type_ids"]) + else: + token_type_ids_batch.append([0]) + + input_features = {"input_ids": torch.LongTensor(input_ids_batch), + "attention_mask": torch.LongTensor(attention_mask_batch), + "token_type_ids": torch.LongTensor(token_type_ids_batch)} + + return input_features @register('torch_transformers_ner_preprocessor') @@ -326,8 +473,19 @@ def __call__(self, tokens: Union[List[List[str]], List[str]], tags: List[List[str]] = None, **kwargs): + tokens_offsets_batch = [[] for _ in tokens] if isinstance(tokens[0], str): - tokens = [re.findall(self._re_tokenizer, s) for s in tokens] + tokens_batch = [] + tokens_offsets_batch = [] + for s in tokens: + tokens_list = [] + tokens_offsets_list = [] + for elem in re.finditer(self._re_tokenizer, s): + tokens_list.append(elem[0]) + tokens_offsets_list.append((elem.start(), elem.end())) 
+ tokens_batch.append(tokens_list) + tokens_offsets_batch.append(tokens_offsets_list) + tokens = tokens_batch subword_tokens, subword_tok_ids, startofword_markers, subword_tags = [], [], [], [] for i in range(len(tokens)): toks = tokens[i] @@ -377,7 +535,7 @@ def __call__(self, log.warning(f'Tags len: {len(ts)}\n Tags: {ts}') return tokens, subword_tokens, subword_tok_ids, \ attention_mask, startofword_markers, nonmasked_tags - return tokens, subword_tokens, subword_tok_ids, startofword_markers, attention_mask + return tokens, subword_tokens, subword_tok_ids, startofword_markers, attention_mask, tokens_offsets_batch @staticmethod def _ner_bert_tokenize(tokens: List[str], diff --git a/deeppavlov/models/ranking/bilstm_gru_siamese_network.py b/deeppavlov/models/ranking/bilstm_gru_siamese_network.py deleted file mode 100644 index fe243598a9..0000000000 --- a/deeppavlov/models/ranking/bilstm_gru_siamese_network.py +++ /dev/null @@ -1,110 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger - -from tensorflow.keras import backend as K -from tensorflow.keras.layers import Input, GlobalMaxPooling1D, Lambda, Dense, GRU -from tensorflow.keras.models import Model - -from deeppavlov.core.common.registry import register -from deeppavlov.models.ranking.bilstm_siamese_network import BiLSTMSiameseNetwork - -log = getLogger(__name__) - - -@register('bilstm_gru_nn') -class BiLSTMGRUSiameseNetwork(BiLSTMSiameseNetwork): - """The class implementing a siamese neural network with BiLSTM, GRU and max pooling. - - GRU is used to take into account multi-turn dialogue ``context``. - - Args: - len_vocab: A size of the vocabulary to build embedding layer. - seed: Random seed. - shared_weights: Whether to use shared weights in the model to encode ``contexts`` and ``responses``. - embedding_dim: Dimensionality of token (word) embeddings. - reccurent: A type of the RNN cell. Possible values are ``lstm`` and ``bilstm``. - hidden_dim: Dimensionality of the hidden state of the RNN cell. If ``reccurent`` equals ``bilstm`` - ``hidden_dim`` should be doubled to get the actual dimensionality. - max_pooling: Whether to use max-pooling operation to get ``context`` (``response``) vector representation. - If ``False``, the last hidden state of the RNN will be used. - triplet_loss: Whether to use a model with triplet loss. - If ``False``, a model with crossentropy loss will be used. - margin: A margin parameter for triplet loss. Only required if ``triplet_loss`` is set to ``True``. - hard_triplets: Whether to use hard triplets sampling to train the model - i.e. to choose negative samples close to positive ones. - If set to ``False`` random sampling will be used. - Only required if ``triplet_loss`` is set to ``True``. 
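The NER preprocessor change above keeps a character offset for every regex-extracted token so that tag predictions can be projected back onto the raw string. In isolation, with an assumed pattern instead of the component's own ``_re_tokenizer``::

    import re

    TOKEN_RE = re.compile(r"[\w']+|[^\w\s]")  # assumed pattern for illustration only

    def tokenize_with_offsets(text: str):
        tokens, offsets = [], []
        for match in TOKEN_RE.finditer(text):
            tokens.append(match.group())
            offsets.append((match.start(), match.end()))
        return tokens, offsets

    print(tokenize_with_offsets("Bonn, Germany"))
    # (['Bonn', ',', 'Germany'], [(0, 4), (4, 5), (6, 13)])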
- """ - - def create_model(self) -> Model: - input = [] - if self.use_matrix: - for i in range(self.num_context_turns + 1): - input.append(Input(shape=(self.max_sequence_length,))) - context = input[:self.num_context_turns] - response = input[-1] - emb_layer = self.embedding_layer() - emb_c = [emb_layer(el) for el in context] - emb_r = emb_layer(response) - else: - for i in range(self.num_context_turns + 1): - input.append(Input(shape=(self.max_sequence_length, self.embedding_dim,))) - context = input[:self.num_context_turns] - response = input[-1] - emb_c = context - emb_r = response - lstm_layer = self.lstm_layer() - lstm_c = [lstm_layer(el) for el in emb_c] - lstm_r = lstm_layer(emb_r) - pooling_layer = GlobalMaxPooling1D(name="pooling") - lstm_c = [pooling_layer(el) for el in lstm_c] - lstm_r = pooling_layer(lstm_r) - lstm_c = [Lambda(lambda x: K.expand_dims(x, 1))(el) for el in lstm_c] - lstm_c = Lambda(lambda x: K.concatenate(x, 1))(lstm_c) - gru_layer = GRU(2 * self.hidden_dim, name="gru") - gru_c = gru_layer(lstm_c) - - if self.triplet_mode: - dist = Lambda(self._pairwise_distances)([gru_c, lstm_r]) - else: - dist = Lambda(self._diff_mult_dist)([gru_c, lstm_r]) - dist = Dense(1, activation='sigmoid', name="score_model")(dist) - model = Model(context + [response], dist) - return model - - def create_score_model(self) -> Model: - cr = self.model.inputs - if self.triplet_mode: - emb_c = self.model.get_layer("gru").output - emb_r = self.model.get_layer("pooling").get_output(-1) - dist_score = Lambda(lambda x: self.euclidian_dist(x), name="score_model") - score = dist_score([emb_c, emb_r]) - else: - score = self.model.get_layer("score_model").output - score = Lambda(lambda x: 1. - K.squeeze(x, -1))(score) - score = Lambda(lambda x: 1. - x)(score) - model = Model(cr, score) - return model - - def create_context_model(self) -> Model: - m = Model(self.model.inputs[:-1], - self.model.get_layer("gru").output) - return m - - def create_response_model(self) -> Model: - m = Model(self.model.inputs[-1], - self.model.get_layer("pooling").get_output_at(-1)) - return m diff --git a/deeppavlov/models/ranking/bilstm_siamese_network.py b/deeppavlov/models/ranking/bilstm_siamese_network.py deleted file mode 100644 index 547fe746e9..0000000000 --- a/deeppavlov/models/ranking/bilstm_siamese_network.py +++ /dev/null @@ -1,292 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from logging import getLogger -from typing import List - -import numpy as np -from tensorflow.keras import backend as K -from tensorflow.keras import losses -from tensorflow.keras.initializers import glorot_uniform, Orthogonal -from tensorflow.keras.layers import (Input, LSTM, Embedding, GlobalMaxPooling1D, Lambda, Dense, Layer, Multiply, - Bidirectional) -from tensorflow.keras.models import Model -from tensorflow.keras.optimizers import Adam -from tensorflow.python.framework.ops import Tensor - -from deeppavlov.core.common.registry import register -from deeppavlov.models.ranking.keras_siamese_model import KerasSiameseModel - -log = getLogger(__name__) - - -@register('bilstm_nn') -class BiLSTMSiameseNetwork(KerasSiameseModel): - """The class implementing a siamese neural network with BiLSTM and max pooling. - - There is a possibility to use a binary cross-entropy loss as well as - a triplet loss with random or hard negative sampling. - - Args: - len_vocab: A size of the vocabulary to build embedding layer. - seed: Random seed. - shared_weights: Whether to use shared weights in the model to encode ``contexts`` and ``responses``. - embedding_dim: Dimensionality of token (word) embeddings. - reccurent: A type of the RNN cell. Possible values are ``lstm`` and ``bilstm``. - hidden_dim: Dimensionality of the hidden state of the RNN cell. If ``reccurent`` equals ``bilstm`` - ``hidden_dim`` should be doubled to get the actual dimensionality. - max_pooling: Whether to use max-pooling operation to get ``context`` (``response``) vector representation. - If ``False``, the last hidden state of the RNN will be used. - triplet_loss: Whether to use a model with triplet loss. - If ``False``, a model with crossentropy loss will be used. - margin: A margin parameter for triplet loss. Only required if ``triplet_loss`` is set to ``True``. - hard_triplets: Whether to use hard triplets sampling to train the model - i.e. to choose negative samples close to positive ones. - If set to ``False`` random sampling will be used. - Only required if ``triplet_loss`` is set to ``True``. 
- """ - - def __init__(self, - len_vocab: int, - seed: int = None, - shared_weights: bool = True, - embedding_dim: int = 300, - reccurent: str = "bilstm", - hidden_dim: int = 300, - max_pooling: bool = True, - triplet_loss: bool = True, - margin: float = 0.1, - hard_triplets: bool = False, - *args, - **kwargs) -> None: - - self.toks_num = len_vocab - self.seed = seed - self.hidden_dim = hidden_dim - self.shared_weights = shared_weights - self.pooling = max_pooling - self.recurrent = reccurent - self.margin = margin - self.embedding_dim = embedding_dim - self.hard_triplets = hard_triplets - self.triplet_mode = triplet_loss - - super(BiLSTMSiameseNetwork, self).__init__(*args, **kwargs) - - def compile(self) -> None: - optimizer = Adam(lr=self.learning_rate) - if self.triplet_mode: - loss = self._triplet_loss - else: - loss = losses.binary_crossentropy - self.model.compile(loss=loss, optimizer=optimizer) - self.score_model = self.create_score_model() - - def load_initial_emb_matrix(self) -> None: - log.info("[initializing new `{}`]".format(self.__class__.__name__)) - if self.use_matrix: - if self.shared_weights: - self.model.get_layer(name="embedding").set_weights([self.emb_matrix]) - else: - self.model.get_layer(name="embedding_a").set_weights([self.emb_matrix]) - self.model.get_layer(name="embedding_b").set_weights([self.emb_matrix]) - - def embedding_layer(self) -> Layer: - out = Embedding(self.toks_num, - self.embedding_dim, - input_length=self.max_sequence_length, - trainable=True, name="embedding") - return out - - def lstm_layer(self) -> Layer: - if self.pooling: - ret_seq = True - else: - ret_seq = False - ker_in = glorot_uniform(seed=self.seed) - rec_in = Orthogonal(seed=self.seed) - if self.recurrent == "bilstm" or self.recurrent is None: - out = Bidirectional(LSTM(self.hidden_dim, - input_shape=(self.max_sequence_length, self.embedding_dim,), - kernel_initializer=ker_in, - recurrent_initializer=rec_in, - return_sequences=ret_seq), merge_mode='concat') - elif self.recurrent == "lstm": - out = LSTM(self.hidden_dim, - input_shape=(self.max_sequence_length, self.embedding_dim,), - kernel_initializer=ker_in, - recurrent_initializer=rec_in, - return_sequences=ret_seq) - return out - - def create_model(self) -> Model: - if self.use_matrix: - context = Input(shape=(self.max_sequence_length,)) - response = Input(shape=(self.max_sequence_length,)) - if self.shared_weights: - emb_layer_a = self.embedding_layer() - emb_layer_b = emb_layer_a - else: - emb_layer_a = self.embedding_layer() - emb_layer_b = self.embedding_layer() - emb_c = emb_layer_a(context) - emb_r = emb_layer_b(response) - else: - context = Input(shape=(self.max_sequence_length, self.embedding_dim,)) - response = Input(shape=(self.max_sequence_length, self.embedding_dim,)) - emb_c = context - emb_r = response - - if self.shared_weights: - lstm_layer_a = self.lstm_layer() - lstm_layer_b = lstm_layer_a - else: - lstm_layer_a = self.lstm_layer() - lstm_layer_b = self.lstm_layer() - lstm_c = lstm_layer_a(emb_c) - lstm_r = lstm_layer_b(emb_r) - if self.pooling: - pooling_layer = GlobalMaxPooling1D(name="sentence_embedding") - lstm_c = pooling_layer(lstm_c) - lstm_r = pooling_layer(lstm_r) - - if self.triplet_mode: - dist = Lambda(self._pairwise_distances)([lstm_c, lstm_r]) - else: - dist = Lambda(self._diff_mult_dist)([lstm_c, lstm_r]) - dist = Dense(1, activation='sigmoid', name="score_model")(dist) - model = Model([context, response], dist) - return model - - def create_score_model(self) -> Model: - cr = self.model.inputs - if 
self.triplet_mode: - emb_c = self.model.get_layer("sentence_embedding").get_output_at(0) - emb_r = self.model.get_layer("sentence_embedding").get_output_at(1) - dist_score = Lambda(lambda x: self._euclidian_dist(x), name="score_model") - score = dist_score([emb_c, emb_r]) - else: - score = self.model.get_layer("score_model").output - score = Lambda(lambda x: 1. - K.squeeze(x, -1))(score) - score = Lambda(lambda x: 1. - x)(score) - model = Model(cr, score) - return model - - def _diff_mult_dist(self, inputs: List[Tensor]) -> Tensor: - input1, input2 = inputs - a = K.abs(input1 - input2) - b = Multiply()(inputs) - return K.concatenate([input1, input2, a, b]) - - def _euclidian_dist(self, x_pair: List[Tensor]) -> Tensor: - x1_norm = K.l2_normalize(x_pair[0], axis=1) - x2_norm = K.l2_normalize(x_pair[1], axis=1) - diff = x1_norm - x2_norm - square = K.square(diff) - _sum = K.sum(square, axis=1) - _sum = K.clip(_sum, min_value=1e-12, max_value=None) - dist = K.sqrt(_sum) / 2. - return dist - - def _pairwise_distances(self, inputs: List[Tensor]) -> Tensor: - emb_c, emb_r = inputs - bs = K.shape(emb_c)[0] - embeddings = K.concatenate([emb_c, emb_r], 0) - dot_product = K.dot(embeddings, K.transpose(embeddings)) - square_norm = K.batch_dot(embeddings, embeddings, axes=1) - distances = K.transpose(square_norm) - 2.0 * dot_product + square_norm - distances = distances[0:bs, bs:bs+bs] - distances = K.clip(distances, 0.0, None) - mask = K.cast(K.equal(distances, 0.0), K.dtype(distances)) - distances = distances + mask * 1e-16 - distances = K.sqrt(distances) - distances = distances * (1.0 - mask) - return distances - - def _triplet_loss(self, labels: Tensor, pairwise_dist: Tensor) -> Tensor: - y_true = K.squeeze(labels, axis=1) - """Triplet loss function""" - if self.hard_triplets: - triplet_loss = self._batch_hard_triplet_loss(y_true, pairwise_dist) - else: - triplet_loss = self._batch_all_triplet_loss(y_true, pairwise_dist) - return triplet_loss - - def _batch_all_triplet_loss(self, y_true: Tensor, pairwise_dist: Tensor) -> Tensor: - anchor_positive_dist = K.expand_dims(pairwise_dist, 2) - anchor_negative_dist = K.expand_dims(pairwise_dist, 1) - triplet_loss = anchor_positive_dist - anchor_negative_dist + self.margin - mask = self._get_triplet_mask(y_true, pairwise_dist) - triplet_loss = mask * triplet_loss - triplet_loss = K.clip(triplet_loss, 0.0, None) - valid_triplets = K.cast(K.greater(triplet_loss, 1e-16), K.dtype(triplet_loss)) - num_positive_triplets = K.sum(valid_triplets) - triplet_loss = K.sum(triplet_loss) / (num_positive_triplets + 1e-16) - return triplet_loss - - def _batch_hard_triplet_loss(self, y_true: Tensor, pairwise_dist: Tensor) -> Tensor: - mask_anchor_positive = self._get_anchor_positive_triplet_mask(y_true, pairwise_dist) - anchor_positive_dist = mask_anchor_positive * pairwise_dist - hardest_positive_dist = K.max(anchor_positive_dist, axis=1, keepdims=True) - mask_anchor_negative = self._get_anchor_negative_triplet_mask(y_true, pairwise_dist) - anchor_negative_dist = mask_anchor_negative * pairwise_dist - mask_anchor_negative = self._get_semihard_anchor_negative_triplet_mask(anchor_negative_dist, - hardest_positive_dist, - mask_anchor_negative) - max_anchor_negative_dist = K.max(pairwise_dist, axis=1, keepdims=True) - anchor_negative_dist = pairwise_dist + max_anchor_negative_dist * (1.0 - mask_anchor_negative) - hardest_negative_dist = K.min(anchor_negative_dist, axis=1, keepdims=True) - triplet_loss = K.clip(hardest_positive_dist - hardest_negative_dist + self.margin, 0.0, 
None) - triplet_loss = K.mean(triplet_loss) - return triplet_loss - - def _get_triplet_mask(self, y_true: Tensor, pairwise_dist: Tensor) -> Tensor: - # mask label(a) != label(p) - mask1 = K.expand_dims(K.equal(K.expand_dims(y_true, 0), K.expand_dims(y_true, 1)), 2) - mask1 = K.cast(mask1, K.dtype(pairwise_dist)) - # mask a == p - mask2 = K.expand_dims(K.not_equal(pairwise_dist, 0.0), 2) - mask2 = K.cast(mask2, K.dtype(pairwise_dist)) - # mask label(n) == label(a) - mask3 = K.expand_dims(K.not_equal(K.expand_dims(y_true, 0), K.expand_dims(y_true, 1)), 1) - mask3 = K.cast(mask3, K.dtype(pairwise_dist)) - return mask1 * mask2 * mask3 - - def _get_anchor_positive_triplet_mask(self, y_true: Tensor, pairwise_dist: Tensor) -> Tensor: - # mask label(a) != label(p) - mask1 = K.equal(K.expand_dims(y_true, 0), K.expand_dims(y_true, 1)) - mask1 = K.cast(mask1, K.dtype(pairwise_dist)) - # mask a == p - mask2 = K.not_equal(pairwise_dist, 0.0) - mask2 = K.cast(mask2, K.dtype(pairwise_dist)) - return mask1 * mask2 - - def _get_anchor_negative_triplet_mask(self, y_true: Tensor, pairwise_dist: Tensor) -> Tensor: - # mask label(n) == label(a) - mask = K.not_equal(K.expand_dims(y_true, 0), K.expand_dims(y_true, 1)) - mask = K.cast(mask, K.dtype(pairwise_dist)) - return mask - - def _get_semihard_anchor_negative_triplet_mask(self, negative_dist: Tensor, - hardest_positive_dist: Tensor, - mask_negative: Tensor) -> Tensor: - # mask max(dist(a,p)) < dist(a,n) - mask = K.greater(negative_dist, hardest_positive_dist) - mask = K.cast(mask, K.dtype(negative_dist)) - mask_semihard = K.cast(K.expand_dims(K.greater(K.sum(mask, 1), 0.0), 1), K.dtype(negative_dist)) - mask = mask_negative * (1 - mask_semihard) + mask * mask_semihard - return mask - - def _predict_on_batch(self, batch: List[np.ndarray]) -> np.ndarray: - return self.score_model.predict_on_batch(x=batch) diff --git a/deeppavlov/models/ranking/deep_attention_matching_network_use_transformer.py b/deeppavlov/models/ranking/deep_attention_matching_network_use_transformer.py deleted file mode 100644 index a9dc45ccd0..0000000000 --- a/deeppavlov/models/ranking/deep_attention_matching_network_use_transformer.py +++ /dev/null @@ -1,403 +0,0 @@ -# Copyright 2018 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger -from typing import List, Dict, Tuple, Optional - -import numpy as np -import tensorflow as tf -import tensorflow_hub as hub - -from deeppavlov.core.common.registry import register -from deeppavlov.models.ranking.matching_models.dam_utils import layers -from deeppavlov.models.ranking.matching_models.dam_utils import operations as op -from deeppavlov.models.ranking.tf_base_matching_model import TensorflowBaseMatchingModel - -log = getLogger(__name__) - - -@register('dam_nn_use_transformer') -class DAMNetworkUSETransformer(TensorflowBaseMatchingModel): - """ - Tensorflow implementation of Deep Attention Matching Network (DAM) [1] improved with USE [2]. 
We called it DAM-USE-T - ``` - http://aclweb.org/anthology/P18-1103 - - Based on Tensorflow code: https://github.com/baidu/Dialogue/tree/master/DAM - We added USE-T [2] as a sentence encoder to the DAM network to achieve state-of-the-art performance on the datasets: - * Ubuntu Dialogue Corpus v1 (R@1: 0.7929, R@2: 0.8912, R@5: 0.9742) - * Ubuntu Dialogue Corpus v2 (R@1: 0.7414, R@2: 0.8656, R@5: 0.9731) - - References: - [1] - ``` - @inproceedings{ , - title={Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network}, - author={Xiangyang Zhou, Lu Li, Daxiang Dong, Yi Liu, Ying Chen, Wayne Xin Zhao, Dianhai Yu and Hua Wu}, - booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, - volume={1}, - pages={ -- }, - year={2018} - } - ``` - [2] Cer D, Yang Y, Kong S-y, Hua N, Limtiaco N, John RS, et al. 2018. Universal sentence encoder. - arXiv preprint arXiv:1803.11175 2018. - - Args: - num_context_turns (int): A number of ``context`` turns in data samples. - max_sequence_length(int): A maximum length of text sequences in tokens. - Longer sequences will be truncated and shorter ones will be padded. - learning_rate (float): Initial learning rate. - emb_matrix (np.ndarray): An embeddings matrix to initialize an embeddings layer of a model. - trainable_embeddings (bool): Whether train embeddings matrix or not. - embedding_dim (int): Dimensionality of token (word) embeddings. - is_positional (bool): Adds a bunch of sinusoids of different frequencies to an embeddings. - stack_num (int): Number of stack layers, default is 5. - seed (int): Random seed. - decay_steps (int): Number of steps after which is to decay the learning rate. - """ - - def __init__(self, - embedding_dim: int = 200, - max_sequence_length: int = 50, - learning_rate: float = 1e-3, - emb_matrix: Optional[np.ndarray] = None, - trainable_embeddings: bool = False, - is_positional: bool = True, - stack_num: int = 5, - seed: int = 65, - decay_steps: int = 600, - *args, - **kwargs): - - self.seed = seed - tf.set_random_seed(self.seed) - - self.max_sentence_len = max_sequence_length - self.word_embedding_size = embedding_dim - self.trainable = trainable_embeddings - self.is_positional = is_positional - self.stack_num = stack_num - self.learning_rate = learning_rate - self.emb_matrix = emb_matrix - self.decay_steps = decay_steps - - super(DAMNetworkUSETransformer, self).__init__(*args, **kwargs) - - ############################################################################## - self._init_graph() - self.sess_config = tf.ConfigProto(allow_soft_placement=True) - self.sess_config.gpu_options.allow_growth = True - self.sess = tf.Session(config=self.sess_config) - self.sess.run([tf.global_variables_initializer(), tf.tables_initializer()]) - ############################################################################## - - if self.load_path is not None: - self.load() - - def _init_placeholders(self): - """ Init model placeholders """ - with tf.variable_scope('inputs'): - # Utterances and their lengths - self.utterance_ph = tf.placeholder(tf.int32, shape=(None, self.num_context_turns, self.max_sentence_len)) - self.all_utterance_len_ph = tf.placeholder(tf.int32, shape=(None, self.num_context_turns)) - - # Responses and their lengths - self.response_ph = tf.placeholder(tf.int32, shape=(None, self.max_sentence_len)) - self.response_len_ph = tf.placeholder(tf.int32, shape=(None,)) - - # Labels - self.y_true = tf.placeholder(tf.int32, shape=(None,)) - - # Raw 
sentences for context and response - self.context_sent_ph = tf.placeholder(tf.string, - shape=(None, self.num_context_turns), - name="context_sentences") - self.response_sent_ph = tf.placeholder(tf.string, shape=(None,), name="response_sentences") - - def _init_sentence_encoder(self): - """ Init sentence encoder, for example USE-T """ - # sentence encoder - self.embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder-large/3", - trainable=False) - - # embed sentences of context - with tf.variable_scope('sentence_embeddings'): - x = [] - for i in range(self.num_context_turns): - x.append(self.embed(tf.reshape(self.context_sent_ph[:, i], shape=(tf.shape(self.context_sent_ph)[0],)))) - embed_context_turns = tf.stack(x, axis=1) - embed_response = self.embed(self.response_sent_ph) - - # for context sentences: shape=(None, self.num_context_turns, 1, 512) - self.sent_embedder_context = tf.expand_dims(embed_context_turns, axis=2) - # for resp sentences: shape=(None, 1, 512) - self.sent_embedder_response = tf.expand_dims(embed_response, axis=1) - - def _init_graph(self): - self._init_placeholders() - self._init_sentence_encoder() - - with tf.variable_scope('sentence_emb_dim_reduction'): - dense_emb = tf.layers.Dense(200, - kernel_initializer=tf.keras.initializers.glorot_uniform(seed=42), - kernel_regularizer=tf.keras.regularizers.l2(), - bias_regularizer=tf.keras.regularizers.l2(), - trainable=True) - - a = [] - for i in range(self.num_context_turns): - a.append(dense_emb(self.sent_embedder_context[:, i])) - sent_embedder_context = tf.stack(a, axis=1) - sent_embedder_response = dense_emb(self.sent_embedder_response) - - with tf.variable_scope('embedding_matrix_init'): - word_embeddings = tf.get_variable("word_embeddings_v", - initializer=tf.constant(self.emb_matrix, dtype=tf.float32), - trainable=self.trainable) - with tf.variable_scope('embedding_lookup'): - response_embeddings = tf.nn.embedding_lookup(word_embeddings, self.response_ph) - - Hr = response_embeddings - if self.is_positional and self.stack_num > 0: - with tf.variable_scope('positional'): - Hr = op.positional_encoding_vector(Hr, max_timescale=10) - - with tf.variable_scope('expand_resp_embeddings'): - Hr = tf.concat([sent_embedder_response, Hr], axis=1) - - Hr_stack = [Hr] - - for index in range(self.stack_num): - with tf.variable_scope('self_stack_' + str(index)): - Hr = layers.block( - Hr, Hr, Hr, - Q_lengths=self.response_len_ph, K_lengths=self.response_len_ph, attention_type='dot') - Hr_stack.append(Hr) - - # context part - # a list of length max_turn_num, every element is a tensor with shape [batch, max_turn_len] - list_turn_t = tf.unstack(self.utterance_ph, axis=1) - list_turn_length = tf.unstack(self.all_utterance_len_ph, axis=1) - list_turn_t_sent = tf.unstack(sent_embedder_context, axis=1) - - sim_turns = [] - # for every turn_t calculate matching vector - for turn_t, t_turn_length, turn_t_sent in zip(list_turn_t, list_turn_length, list_turn_t_sent): - Hu = tf.nn.embedding_lookup(word_embeddings, turn_t) # [batch, max_turn_len, emb_size] - - if self.is_positional and self.stack_num > 0: - with tf.variable_scope('positional', reuse=True): - Hu = op.positional_encoding_vector(Hu, max_timescale=10) - - with tf.variable_scope('expand_cont_embeddings'): - Hu = tf.concat([turn_t_sent, Hu], axis=1) - - Hu_stack = [Hu] - - for index in range(self.stack_num): - with tf.variable_scope('self_stack_' + str(index), reuse=True): - Hu = layers.block( - Hu, Hu, Hu, - Q_lengths=t_turn_length, K_lengths=t_turn_length, 
attention_type='dot') - - Hu_stack.append(Hu) - - r_a_t_stack = [] - t_a_r_stack = [] - for index in range(self.stack_num + 1): - - with tf.variable_scope('t_attend_r_' + str(index)): - try: - t_a_r = layers.block( - Hu_stack[index], Hr_stack[index], Hr_stack[index], - Q_lengths=t_turn_length, K_lengths=self.response_len_ph, attention_type='dot') - except ValueError: - tf.get_variable_scope().reuse_variables() - t_a_r = layers.block( - Hu_stack[index], Hr_stack[index], Hr_stack[index], - Q_lengths=t_turn_length, K_lengths=self.response_len_ph, attention_type='dot') - - with tf.variable_scope('r_attend_t_' + str(index)): - try: - r_a_t = layers.block( - Hr_stack[index], Hu_stack[index], Hu_stack[index], - Q_lengths=self.response_len_ph, K_lengths=t_turn_length, attention_type='dot') - except ValueError: - tf.get_variable_scope().reuse_variables() - r_a_t = layers.block( - Hr_stack[index], Hu_stack[index], Hu_stack[index], - Q_lengths=self.response_len_ph, K_lengths=t_turn_length, attention_type='dot') - - t_a_r_stack.append(t_a_r) - r_a_t_stack.append(r_a_t) - - t_a_r_stack.extend(Hu_stack) - r_a_t_stack.extend(Hr_stack) - - t_a_r = tf.stack(t_a_r_stack, axis=-1) - r_a_t = tf.stack(r_a_t_stack, axis=-1) - - # log.info(t_a_r, r_a_t) # debug - - # calculate similarity matrix - with tf.variable_scope('similarity'): - # sim shape [batch, max_turn_len, max_turn_len, 2*stack_num+1] - # divide sqrt(200) to prevent gradient explosion - sim = tf.einsum('biks,bjks->bijs', t_a_r, r_a_t) / tf.sqrt(float(self.word_embedding_size)) - - sim_turns.append(sim) - - # cnn and aggregation - sim = tf.stack(sim_turns, axis=1) - log.info('sim shape: %s' % sim.shape) - with tf.variable_scope('cnn_aggregation'): - final_info = layers.CNN_3d(sim, 32, 32) # We can improve performance if use 32 filters for each layer - # for douban - # final_info = layers.CNN_3d(sim, 16, 16) - - # loss and train - with tf.variable_scope('loss'): - self.loss, self.logits = layers.loss(final_info, self.y_true, clip_value=10.) - self.y_pred = tf.nn.softmax(self.logits, name="y_pred") - tf.summary.scalar('loss', self.loss) - - self.global_step = tf.Variable(0, trainable=False) - initial_learning_rate = self.learning_rate - self.learning_rate = tf.train.exponential_decay( - initial_learning_rate, - global_step=self.global_step, - decay_steps=self.decay_steps, - decay_rate=0.9, - staircase=True) - - Optimizer = tf.train.AdamOptimizer(self.learning_rate) - self.grads_and_vars = Optimizer.compute_gradients(self.loss) - - for grad, var in self.grads_and_vars: - if grad is None: - log.info(var) - - self.capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in self.grads_and_vars] - self.train_op = Optimizer.apply_gradients( - self.capped_gvs, - global_step=self.global_step) - - # Debug - self.print_number_of_parameters() - - def _append_sample_to_batch_buffer(self, sample: List[np.ndarray], buf: List[Tuple]): - """ - The function for adding samples to the batch buffer - - Args: - sample (List[nd.array]): samples generator - buf (List[Tuple[np.ndarray]]) : List of samples with model inputs each: - [( context, context_len, response, response_len ), ( ... ), ... ]. 
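The flat ``sample`` layout described in the Args above can be hard to follow: the first half of the list holds token-id arrays (context turns first, then candidate responses), the second half repeats the same items as raw strings, and the context is replicated once per candidate. A small sketch under those assumptions (sizes and values are illustrative):

```
# Sketch of the flat sample layout consumed by the batch buffer
# (token-id arrays first, raw strings second; sizes are illustrative).
num_context_turns = 2
context_ids = [[3, 5, 0], [7, 0, 0]]      # one array per context turn
response_ids = [[4, 9, 0], [8, 1, 2]]     # one array per candidate response
raw_context = ["hi there", "how are you"]
raw_responses = ["fine thanks", "no idea"]

sample = context_ids + response_ids + raw_context + raw_responses
half = len(sample) // 2

contexts = sample[:num_context_turns]
candidates = sample[num_context_turns:half]
raw_ctx = sample[half:half + num_context_turns]
raw_cand = sample[half + num_context_turns:]

# One buffer entry per candidate response, each paired with the full (replicated) context.
buf = [(contexts, cand, raw_ctx, raw) for cand, raw in zip(candidates, raw_cand)]
print(len(buf))  # -> number of candidate responses (2 here)
```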
- - Returns: - None - """ - sample_len = len(sample) - - batch_buffer_context = [] # [batch_size, 10, 50] - batch_buffer_context_len = [] # [batch_size, 10] - batch_buffer_response = [] # [batch_size, 50] - batch_buffer_response_len = [] # [batch_size] - - raw_batch_buffer_context = [] # [batch_size, 10] - raw_batch_buffer_response = [] # [batch_size] - - context_sentences = sample[:self.num_context_turns] - response_sentences = sample[self.num_context_turns:sample_len // 2] - - raw_context_sentences = sample[sample_len // 2:sample_len // 2 + self.num_context_turns] - raw_response_sentences = sample[sample_len // 2 + self.num_context_turns:] - - # Format model inputs: - # 4 model inputs - - # 1. Token indices for context - batch_buffer_context += [context_sentences for sent in response_sentences] # replicate context N times - # 2. Token indices for response - batch_buffer_response += [response_sentence for response_sentence in response_sentences] - # 3. Lengths of all context sentences - lens = [] - for context in [context_sentences for sent in response_sentences]: # replicate context N times - context_sentences_lens = [] - for sent in context: - sent_len = len(sent[sent != 0]) - sent_len = sent_len + 1 if sent_len > 0 else 0 # 1 additional token is the USE token - context_sentences_lens.append(sent_len) - lens.append(context_sentences_lens) - batch_buffer_context_len += lens - # 4. Length of response - lens = [] - for response in [response_sentence for response_sentence in response_sentences]: - sent_len = len(response[response != 0]) - sent_len = sent_len + 1 if sent_len > 0 else 0 # 1 additional token is the USE token - lens.append(sent_len) - batch_buffer_response_len += lens - # 5. Raw context sentences - raw_batch_buffer_context += [raw_context_sentences for sent in raw_response_sentences] - # 6. Raw response sentences - raw_batch_buffer_response += [raw_sent for raw_sent in raw_response_sentences] - - for i in range(len(batch_buffer_context)): - buf.append(tuple(( - batch_buffer_context[i], - batch_buffer_context_len[i], - batch_buffer_response[i], - batch_buffer_response_len[i], - raw_batch_buffer_context[i], - raw_batch_buffer_response[i] - ))) - return len(response_sentences) - - def _make_batch(self, batch: List[Tuple[np.ndarray]]) -> Dict: - """ - The function for formatting model inputs - - Args: - batch (List[Tuple[np.ndarray]]): List of samples with model inputs each: - [( context, context_len, response, response_len ), ( ... ), ... ]. 
- graph (str): which graph the inputs is preparing for - - Returns: - Dict: feed_dict to feed a model - """ - input_context = [] - input_context_len = [] - input_response = [] - input_response_len = [] - input_raw_context = [] - input_raw_response = [] - - # format model inputs for MAIN graph as numpy arrays - for sample in batch: - input_context.append(sample[0]) - input_context_len.append(sample[1]) - input_response.append(sample[2]) - input_response_len.append(sample[3]) - input_raw_context.append(sample[4]) # raw context is the 4th element of each Tuple in the batch - input_raw_response.append(sample[5]) # raw response is the 5th element of each Tuple in the batch - - return { - self.utterance_ph: np.array(input_context), - self.all_utterance_len_ph: np.array(input_context_len), - self.response_ph: np.array(input_response), - self.response_len_ph: np.array(input_response_len), - self.context_sent_ph: np.array(input_raw_context), - self.response_sent_ph: np.array(input_raw_response) - } diff --git a/deeppavlov/models/ranking/keras_siamese_model.py b/deeppavlov/models/ranking/keras_siamese_model.py deleted file mode 100644 index a69365960e..0000000000 --- a/deeppavlov/models/ranking/keras_siamese_model.py +++ /dev/null @@ -1,123 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from abc import abstractmethod -from logging import getLogger -from typing import List - -import numpy as np -from tensorflow.keras import losses -from tensorflow.keras.models import Model -from tensorflow.keras.optimizers import Adam - -from deeppavlov.core.models.keras_model import KerasModel -from deeppavlov.models.ranking.siamese_model import SiameseModel - -log = getLogger(__name__) - - -class KerasSiameseModel(SiameseModel, KerasModel): - """The class implementing base functionality for siamese neural networks in keras. - - Args: - learning_rate: Learning rate. - use_matrix: Whether to use a trainable matrix with token (word) embeddings. - emb_matrix: An embeddings matrix to initialize an embeddings layer of a model. - Only used if ``use_matrix`` is set to ``True``. - max_sequence_length: A maximum length of text sequences in tokens. - Longer sequences will be truncated and shorter ones will be padded. - dynamic_batch: Whether to use dynamic batching. If ``True``, the maximum length of a sequence for a batch - will be equal to the maximum of all sequences lengths from this batch, - but not higher than ``max_sequence_length``. - attention: Whether any attention mechanism is used in the siamese network. - *args: Other parameters. - **kwargs: Other parameters. 
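The dynamic-batching rule in the docstring above (pad each batch to its own longest sequence, never beyond ``max_sequence_length``) can be sketched in a few lines of NumPy; the pad id and sequence lengths are illustrative assumptions:

```
# Sketch: pad a batch to the length of its longest sequence, capped by max_sequence_length.
import numpy as np

def pad_batch(token_ids, max_sequence_length=None, pad_id=0):
    """token_ids: list of 1-D int lists of different lengths."""
    batch_len = max(len(seq) for seq in token_ids)
    if max_sequence_length is not None:
        batch_len = min(batch_len, max_sequence_length)
    out = np.full((len(token_ids), batch_len), pad_id, dtype=np.int64)
    for i, seq in enumerate(token_ids):
        trimmed = seq[:batch_len]
        out[i, :len(trimmed)] = trimmed
    return out

print(pad_batch([[1, 2, 3], [4]], max_sequence_length=2))
# [[1 2]
#  [4 0]]
```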
- """ - - def __init__(self, - learning_rate: float = 1e-3, - use_matrix: bool = True, - emb_matrix: np.ndarray = None, - max_sequence_length: int = None, - dynamic_batch: bool = False, - attention: bool = False, - *args, - **kwargs) -> None: - - super(KerasSiameseModel, self).__init__(*args, **kwargs) - - self.learning_rate = learning_rate - self.attention = attention - self.use_matrix = use_matrix - self.emb_matrix = emb_matrix - if dynamic_batch: - self.max_sequence_length = None - else: - self.max_sequence_length = max_sequence_length - self.model = self.create_model() - self.compile() - if self.load_path.exists(): - self.load() - else: - self.load_initial_emb_matrix() - - if not self.attention: - self.context_model = self.create_context_model() - self.response_model = self.create_response_model() - - def compile(self) -> None: - optimizer = Adam(lr=self.learning_rate) - loss = losses.binary_crossentropy - self.model.compile(loss=loss, optimizer=optimizer) - - def load(self) -> None: - log.info("[initializing `{}` from saved]".format(self.__class__.__name__)) - self.model.load_weights(str(self.load_path)) - - def save(self) -> None: - log.info("[saving `{}`]".format(self.__class__.__name__)) - self.model.save_weights(str(self.save_path)) - - def load_initial_emb_matrix(self) -> None: - log.info("[initializing new `{}`]".format(self.__class__.__name__)) - if self.use_matrix: - self.model.get_layer(name="embedding").set_weights([self.emb_matrix]) - - @abstractmethod - def create_model(self) -> Model: - pass - - def create_context_model(self) -> Model: - m = Model(self.model.inputs[:-1], - self.model.get_layer("sentence_embedding").get_output_at(0)) - return m - - def create_response_model(self) -> Model: - m = Model(self.model.inputs[-1], - self.model.get_layer("sentence_embedding").get_output_at(1)) - return m - - def _train_on_batch(self, batch: List[np.ndarray], y: List[int]) -> float: - loss = self.model.train_on_batch(batch, np.asarray(y)) - return loss - - def _predict_on_batch(self, batch: List[np.ndarray]) -> np.ndarray: - y_pred = self.model.predict_on_batch(batch) - return y_pred - - def _predict_context_on_batch(self, batch: List[np.ndarray]) -> np.ndarray: - return self.context_model.predict_on_batch(batch) - - def _predict_response_on_batch(self, batch: List[np.ndarray]) -> np.ndarray: - return self.response_model.predict_on_batch(batch) diff --git a/deeppavlov/models/ranking/matching_models/__init__.py b/deeppavlov/models/ranking/matching_models/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/ranking/matching_models/dam_utils/__init__.py b/deeppavlov/models/ranking/matching_models/dam_utils/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/ranking/matching_models/dam_utils/layers.py b/deeppavlov/models/ranking/matching_models/dam_utils/layers.py deleted file mode 100644 index 037453d77e..0000000000 --- a/deeppavlov/models/ranking/matching_models/dam_utils/layers.py +++ /dev/null @@ -1,555 +0,0 @@ -# Copyright 2018 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# @inproceedings{ , -# title={Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network}, -# author={Xiangyang Zhou, Lu Li, Daxiang Dong, Yi Liu, Ying Chen, Wayne Xin Zhao, Dianhai Yu and Hua Wu}, -# booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, -# volume={1}, -# pages={ -- }, -# year={2018} -# } -# ``` -# http://aclweb.org/anthology/P18-1103 -# -# Based on authors' Tensorflow code: https://github.com/baidu/Dialogue/tree/master/DAM - -from logging import getLogger - -import tensorflow as tf - -import deeppavlov.models.ranking.matching_models.dam_utils.operations as op - -log = getLogger(__name__) - - -def similarity(x, y, x_lengths, y_lengths): - '''calculate similarity with two 3d tensor. - - Args: - x: a tensor with shape [batch, time_x, dimension] - y: a tensor with shape [batch, time_y, dimension] - - Returns: - a tensor with shape [batch, time_x, time_y] - - Raises: - ValueError: if - the dimenisons of x and y are not equal. - ''' - with tf.variable_scope('x_attend_y'): - try: - x_a_y = block( - x, y, y, - Q_lengths=x_lengths, K_lengths=y_lengths) - except ValueError: - tf.get_variable_scope().reuse_variables() - x_a_y = block( - x, y, y, - Q_lengths=x_lengths, K_lengths=y_lengths) - - with tf.variable_scope('y_attend_x'): - try: - y_a_x = block( - y, x, x, - Q_lengths=y_lengths, K_lengths=x_lengths) - except ValueError: - tf.get_variable_scope().reuse_variables() - y_a_x = block( - y, x, x, - Q_lengths=y_lengths, K_lengths=x_lengths) - - return tf.matmul(x + x_a_y, y + y_a_x, transpose_b=True) - - -def dynamic_L(x): - '''Attention machanism to combine the infomation, - from https://arxiv.org/pdf/1612.01627.pdf. - - Args: - x: a tensor with shape [batch, time, dimension] - - Returns: - a tensor with shape [batch, dimension] - - Raises: - ''' - key_0 = tf.get_variable( - name='key', - shape=[x.shape[-1]], - dtype=tf.float32, - initializer=tf.random_uniform_initializer( - -tf.sqrt(6. / tf.cast(x.shape[-1], tf.float32)), - tf.sqrt(6. / tf.cast(x.shape[-1], tf.float32)))) - - key = op.dense(x, add_bias=False) # [batch, time, dimension] - weight = tf.reduce_sum(tf.multiply(key, key_0), axis=-1) # [batch, time] - weight = tf.expand_dims(tf.nn.softmax(weight), -1) # [batch, time, 1] - - L = tf.reduce_sum(tf.multiply(x, weight), axis=1) # [batch, dimension] - return L - - -def loss(x, y, num_classes=2, is_clip=True, clip_value=10): - '''From info x calculate logits as return loss. - - Args: - x: a tensor with shape [batch, dimension] - num_classes: a number - - Returns: - loss: a tensor with shape [1], which is the average loss of one batch - logits: a tensor with shape [batch, 1] - - Raises: - AssertionError: if - num_classes is not a int greater equal than 2. - TODO: - num_classes > 2 may be not adapted. 
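The attentive pooling in ``dynamic_L`` above scores every time step against a learned query vector and returns the softmax-weighted sum. A NumPy sketch with randomly initialized stand-ins for the learned projection and query (shapes are illustrative):

```
# Sketch (NumPy): the attentive pooling performed by dynamic_L.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
batch, time, dim = 2, 5, 8
x = rng.normal(size=(batch, time, dim))
W = rng.normal(size=(dim, dim))          # stand-in for the dense projection (no bias)
key_0 = rng.normal(size=(dim,))          # stand-in for the learned query vector

key = x @ W                                        # [batch, time, dim]
weight = softmax((key * key_0).sum(-1), axis=-1)   # [batch, time]
L = (x * weight[..., None]).sum(axis=1)            # [batch, dim] pooled representation
print(L.shape)
```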
- ''' - assert isinstance(num_classes, int) - assert num_classes >= 2 - - # W = tf.get_variable( - # name='weights', - # shape=[x.shape[-1], num_classes-1], - # initializer=tf.orthogonal_initializer()) - # bias = tf.get_variable( - # name='bias', - # shape=[num_classes-1], - # initializer=tf.zeros_initializer()) - # - # logits = tf.reshape(tf.matmul(x, W) + bias, [-1]) - # loss = tf.nn.sigmoid_cross_entropy_with_logits( - # labels=tf.cast(y, tf.float32), - # logits=logits) - # loss = tf.reduce_mean(tf.clip_by_value(loss, -clip_value, clip_value)) - - W = tf.get_variable( - name='weights', - shape=[x.shape[-1], num_classes], - initializer=tf.orthogonal_initializer()) - bias = tf.get_variable( - name='bias', - shape=[num_classes], - initializer=tf.zeros_initializer()) - - logits = tf.matmul(x, W) + bias - loss = tf.nn.sparse_softmax_cross_entropy_with_logits( - labels=y, - logits=logits) - loss = tf.reduce_mean(tf.clip_by_value(loss, -clip_value, clip_value)) - - return loss, logits - - -def attention( - Q, K, V, - Q_lengths, K_lengths, - attention_type='dot', - is_mask=True, mask_value=-2 ** 32 + 1, - drop_prob=None): - '''Add attention layer. - Args: - Q: a tensor with shape [batch, Q_time, Q_dimension] - K: a tensor with shape [batch, time, K_dimension] - V: a tensor with shape [batch, time, V_dimension] - - Q_length: a tensor with shape [batch] - K_length: a tensor with shape [batch] - - Returns: - a tensor with shape [batch, Q_time, V_dimension] - - Raises: - AssertionError: if - Q_dimension not equal to K_dimension when attention type is dot. - ''' - assert attention_type in ('dot', 'bilinear') - if attention_type == 'dot': - assert Q.shape[-1] == K.shape[-1] - - Q_time = Q.shape[1] - K_time = K.shape[1] - - if attention_type == 'dot': - logits = op.dot_sim(Q, K) # [batch, Q_time, time] - if attention_type == 'bilinear': - logits = op.bilinear_sim(Q, K) - - if is_mask: - mask = op.mask(Q_lengths, K_lengths, Q_time, K_time) # [batch, Q_time, K_time] - # mask = tf.Print(mask, [logits[0], mask[0]], tf.get_variable_scope().name + " logits, mask: ", summarize=10) - logits = mask * logits + (1 - mask) * mask_value - # logits = tf.Print(logits, [logits[0]], tf.get_variable_scope().name + " masked logits: ", summarize=10) - - attention = tf.nn.softmax(logits) - - if drop_prob is not None: - log.info('use attention drop') - attention = tf.nn.dropout(attention, drop_prob) - - return op.weighted_sum(attention, V) - - -def FFN(x, out_dimension_0=None, out_dimension_1=None): - '''Add two dense connected layer, max(0, x*W0+b0)*W1+b1. - - Args: - x: a tensor with shape [batch, time, dimension] - out_dimension: a number which is the output dimension - - Returns: - a tensor with shape [batch, time, out_dimension] - - Raises: - ''' - with tf.variable_scope('FFN_1'): - y = op.dense(x, out_dimension_0, initializer=tf.keras.initializers.he_normal(seed=42)) - y = tf.nn.relu(y) - with tf.variable_scope('FFN_2'): - # z = op.dense(y, out_dimension_1, initializer=tf.keras.initializers.glorot_uniform(seed=42)) # TODO: check - z = op.dense(y, out_dimension_1) # , add_bias=False) #!!!! - return z - - -def block( - Q, K, V, - Q_lengths, K_lengths, - attention_type='dot', - is_layer_norm=True, - is_mask=True, mask_value=-2 ** 32 + 1, - drop_prob=None): - '''Add a block unit from https://arxiv.org/pdf/1706.03762.pdf. 
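The ``attention`` helper above is length-masked scaled dot-product attention: dot-product logits scaled by the square root of the dimension, padded positions pushed to a large negative value, softmax, then a weighted sum of the values. A self-contained NumPy sketch of that computation (sizes are illustrative):

```
# Sketch (NumPy): length-masked scaled dot-product attention.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attend(Q, K, V, q_len, k_len, mask_value=-2.0 ** 32 + 1):
    """Q: [batch, tq, d], K/V: [batch, tk, d], q_len/k_len: [batch] true lengths."""
    logits = np.einsum('bik,bjk->bij', Q, K) / np.sqrt(Q.shape[-1])  # [batch, tq, tk]
    q_mask = np.arange(Q.shape[1])[None, :] < q_len[:, None]         # [batch, tq]
    k_mask = np.arange(K.shape[1])[None, :] < k_len[:, None]         # [batch, tk]
    mask = q_mask[:, :, None] & k_mask[:, None, :]                   # [batch, tq, tk]
    logits = np.where(mask, logits, mask_value)                      # padded positions get a huge negative value
    return softmax(logits, axis=-1) @ V                              # [batch, tq, d]

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4, 8)); K = rng.normal(size=(2, 6, 8)); V = rng.normal(size=(2, 6, 8))
print(attend(Q, K, V, q_len=np.array([4, 2]), k_len=np.array([6, 3])).shape)
```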
- Args: - Q: a tensor with shape [batch, Q_time, Q_dimension] - K: a tensor with shape [batch, time, K_dimension] - V: a tensor with shape [batch, time, V_dimension] - - Q_length: a tensor with shape [batch] - K_length: a tensor with shape [batch] - - Returns: - a tensor with shape [batch, time, dimension] - - Raises: - ''' - att = attention(Q, K, V, - Q_lengths, K_lengths, - attention_type=attention_type, - is_mask=is_mask, mask_value=mask_value, - drop_prob=drop_prob) - if is_layer_norm: - with tf.variable_scope('attention_layer_norm'): - y = op.layer_norm_debug(Q + att) - else: - y = Q + att - - z = FFN(y) - if is_layer_norm: - with tf.variable_scope('FFN_layer_norm'): - w = op.layer_norm_debug(y + z) - else: - w = y + z - return w - - -def CNN(x, out_channels, filter_size, pooling_size, add_relu=True): - '''Add a convlution layer with relu and max pooling layer. - - Args: - x: a tensor with shape [batch, in_height, in_width, in_channels] - out_channels: a number - filter_size: a number - pooling_size: a number - - Returns: - a flattened tensor with shape [batch, num_features] - - Raises: - ''' - # calculate the last dimension of return - num_features = ((tf.shape(x)[1] - filter_size + 1) / pooling_size * - (tf.shape(x)[2] - filter_size + 1) / pooling_size) * out_channels - - in_channels = x.shape[-1] - weights = tf.get_variable( - name='filter', - shape=[filter_size, filter_size, in_channels, out_channels], - dtype=tf.float32, - initializer=tf.random_uniform_initializer(-0.01, 0.01)) - bias = tf.get_variable( - name='bias', - shape=[out_channels], - dtype=tf.float32, - initializer=tf.zeros_initializer()) - - conv = tf.nn.conv2d(x, weights, strides=[1, 1, 1, 1], padding="VALID") - conv = conv + bias - - if add_relu: - conv = tf.nn.relu(conv) - - pooling = tf.nn.max_pool( - conv, - ksize=[1, pooling_size, pooling_size, 1], - strides=[1, pooling_size, pooling_size, 1], - padding="VALID") - - return tf.contrib.layers.flatten(pooling) - - -def CNN_3d(x, out_channels_0, out_channels_1, add_relu=True): - '''Add a 3d convlution layer with relu and max pooling layer. 
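The ``num_features`` expression in ``CNN`` above is just output-size bookkeeping for a VALID convolution followed by non-overlapping pooling. A worked example with illustrative sizes:

```
# Worked example of the flattened-feature count computed in CNN()
# ("VALID" convolution followed by max pooling); numbers are illustrative.
def flattened_features(height, width, filter_size, pooling_size, out_channels):
    conv_h = height - filter_size + 1        # VALID convolution shrinks each side
    conv_w = width - filter_size + 1
    pooled_h = conv_h // pooling_size        # non-overlapping VALID pooling
    pooled_w = conv_w // pooling_size
    return pooled_h * pooled_w * out_channels

# e.g. a 50x50 similarity map, 3x3 filters, 3x3 pooling, 32 output channels:
print(flattened_features(50, 50, 3, 3, 32))  # 16 * 16 * 32 = 8192
```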
- - Args: - x: a tensor with shape [batch, in_depth, in_height, in_width, in_channels] - out_channels: a number - filter_size: a number - pooling_size: a number - - Returns: - a flattened tensor with shape [batch, num_features] - - Raises: - ''' - in_channels = x.shape[-1] - weights_0 = tf.get_variable( - name='filter_0', - shape=[3, 3, 3, in_channels, out_channels_0], - dtype=tf.float32, - initializer=tf.random_uniform_initializer(-0.001, 0.001)) - bias_0 = tf.get_variable( - name='bias_0', - shape=[out_channels_0], - dtype=tf.float32, - initializer=tf.zeros_initializer()) - - conv_0 = tf.nn.conv3d(x, weights_0, strides=[1, 1, 1, 1, 1], padding="SAME") - log.info('conv_0 shape: %s' % conv_0.shape) - conv_0 = conv_0 + bias_0 - - if add_relu: - conv_0 = tf.nn.elu(conv_0) - - pooling_0 = tf.nn.max_pool3d( - conv_0, - ksize=[1, 3, 3, 3, 1], - strides=[1, 3, 3, 3, 1], - padding="SAME") - log.info('pooling_0 shape: %s' % pooling_0.shape) - - # layer_1 - weights_1 = tf.get_variable( - name='filter_1', - shape=[3, 3, 3, out_channels_0, out_channels_1], - dtype=tf.float32, - initializer=tf.random_uniform_initializer(-0.001, 0.001)) - bias_1 = tf.get_variable( - name='bias_1', - shape=[out_channels_1], - dtype=tf.float32, - initializer=tf.zeros_initializer()) - - conv_1 = tf.nn.conv3d(pooling_0, weights_1, strides=[1, 1, 1, 1, 1], padding="SAME") - log.info('conv_1 shape: %s' % conv_1.shape) - conv_1 = conv_1 + bias_1 - - if add_relu: - conv_1 = tf.nn.elu(conv_1) - - pooling_1 = tf.nn.max_pool3d( - conv_1, - ksize=[1, 3, 3, 3, 1], - strides=[1, 3, 3, 3, 1], - padding="SAME") - log.info('pooling_1 shape: %s' % pooling_1.shape) - - return tf.contrib.layers.flatten(pooling_1) - - -def CNN_3d_2d(x, out_channels_0, out_channels_1, add_relu=True): - '''Add a 3d convlution layer with relu and max pooling layer. 
- - Args: - x: a tensor with shape [batch, in_depth, in_height, in_width, in_channels] - out_channels: a number - filter_size: a number - pooling_size: a number - - Returns: - a flattened tensor with shape [batch, num_features] - - Raises: - ''' - in_channels = x.shape[-1] - weights_0 = tf.get_variable( - name='filter_0', - shape=[1, 3, 3, in_channels, out_channels_0], - dtype=tf.float32, - initializer=tf.random_uniform_initializer(-0.01, 0.01)) - bias_0 = tf.get_variable( - name='bias_0', - shape=[out_channels_0], - dtype=tf.float32, - initializer=tf.zeros_initializer()) - - conv_0 = tf.nn.conv3d(x, weights_0, strides=[1, 1, 1, 1, 1], padding="SAME") - log.info('conv_0 shape: %s' % conv_0.shape) - conv_0 = conv_0 + bias_0 - - if add_relu: - conv_0 = tf.nn.elu(conv_0) - - pooling_0 = tf.nn.max_pool3d( - conv_0, - ksize=[1, 1, 3, 3, 1], - strides=[1, 1, 3, 3, 1], - padding="SAME") - log.info('pooling_0 shape: %s' % pooling_0.shape) - - # layer_1 - weights_1 = tf.get_variable( - name='filter_1', - shape=[1, 3, 3, out_channels_0, out_channels_1], - dtype=tf.float32, - initializer=tf.random_uniform_initializer(-0.01, 0.01)) - bias_1 = tf.get_variable( - name='bias_1', - shape=[out_channels_1], - dtype=tf.float32, - initializer=tf.zeros_initializer()) - - conv_1 = tf.nn.conv3d(pooling_0, weights_1, strides=[1, 1, 1, 1, 1], padding="SAME") - log.info('conv_1 shape: %s' % conv_1.shape) - conv_1 = conv_1 + bias_1 - - if add_relu: - conv_1 = tf.nn.elu(conv_1) - - pooling_1 = tf.nn.max_pool3d( - conv_1, - ksize=[1, 1, 3, 3, 1], - strides=[1, 1, 3, 3, 1], - padding="SAME") - log.info('pooling_1 shape: %s' % pooling_1.shape) - - return tf.contrib.layers.flatten(pooling_1) - - -def CNN_3d_change(x, out_channels_0, out_channels_1, add_relu=True): - '''Add a 3d convlution layer with relu and max pooling layer. 
- - Args: - x: a tensor with shape [batch, in_depth, in_height, in_width, in_channels] - out_channels: a number - filter_size: a number - pooling_size: a number - - Returns: - a flattened tensor with shape [batch, num_features] - - Raises: - ''' - in_channels = x.shape[-1] - weights_0 = tf.get_variable( - name='filter_0', - shape=[3, 3, 3, in_channels, out_channels_0], - dtype=tf.float32, - # initializer=tf.random_normal_initializer(0, 0.05)) - initializer=tf.random_uniform_initializer(-0.01, 0.01)) - bias_0 = tf.get_variable( - name='bias_0', - shape=[out_channels_0], - dtype=tf.float32, - initializer=tf.zeros_initializer()) - # Todo - g_0 = tf.get_variable(name='scale_0', - shape=[out_channels_0], - dtype=tf.float32, - initializer=tf.ones_initializer()) - weights_0 = tf.reshape(g_0, [1, 1, 1, out_channels_0]) * tf.nn.l2_normalize(weights_0, [0, 1, 2]) - - conv_0 = tf.nn.conv3d(x, weights_0, strides=[1, 1, 1, 1, 1], padding="VALID") - log.info('conv_0 shape: %s' % conv_0.shape) - conv_0 = conv_0 + bias_0 - ####### - ''' - with tf.variable_scope('layer_0'): - conv_0 = op.layer_norm(conv_0, axis=[1, 2, 3, 4]) - log.info('layer_norm in cnn') - ''' - if add_relu: - conv_0 = tf.nn.elu(conv_0) - - pooling_0 = tf.nn.max_pool3d( - conv_0, - ksize=[1, 2, 3, 3, 1], - strides=[1, 2, 3, 3, 1], - padding="VALID") - log.info('pooling_0 shape: %s' % pooling_0.shape) - - # layer_1 - weights_1 = tf.get_variable( - name='filter_1', - shape=[2, 2, 2, out_channels_0, out_channels_1], - dtype=tf.float32, - initializer=tf.random_uniform_initializer(-0.01, 0.01)) - - bias_1 = tf.get_variable( - name='bias_1', - shape=[out_channels_1], - dtype=tf.float32, - initializer=tf.zeros_initializer()) - - g_1 = tf.get_variable(name='scale_1', - shape=[out_channels_1], - dtype=tf.float32, - initializer=tf.ones_initializer()) - weights_1 = tf.reshape(g_1, [1, 1, 1, out_channels_1]) * tf.nn.l2_normalize(weights_1, [0, 1, 2]) - - conv_1 = tf.nn.conv3d(pooling_0, weights_1, strides=[1, 1, 1, 1, 1], padding="VALID") - log.info('conv_1 shape: %s' % conv_1.shape) - conv_1 = conv_1 + bias_1 - # with tf.variable_scope('layer_1'): - # conv_1 = op.layer_norm(conv_1, axis=[1, 2, 3, 4]) - - if add_relu: - conv_1 = tf.nn.elu(conv_1) - - pooling_1 = tf.nn.max_pool3d( - conv_1, - ksize=[1, 3, 3, 3, 1], - strides=[1, 3, 3, 3, 1], - padding="VALID") - log.info('pooling_1 shape: %s' % pooling_1.shape) - - return tf.contrib.layers.flatten(pooling_1) - - -def RNN_last_state(x, lengths, hidden_size): - '''encode x with a gru cell and return the last state. - - Args: - x: a tensor with shape [batch, time, dimension] - length: a tensor with shape [batch] - - Return: - a tensor with shape [batch, hidden_size] - - Raises: - ''' - cell = tf.nn.rnn_cell.GRUCell(hidden_size) - outputs, last_states = tf.nn.dynamic_rnn(cell, x, lengths, dtype=tf.float32) - return outputs, last_states diff --git a/deeppavlov/models/ranking/matching_models/dam_utils/operations.py b/deeppavlov/models/ranking/matching_models/dam_utils/operations.py deleted file mode 100644 index a6bd6a5fee..0000000000 --- a/deeppavlov/models/ranking/matching_models/dam_utils/operations.py +++ /dev/null @@ -1,400 +0,0 @@ -# Copyright 2018 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# @inproceedings{ , -# title={Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network}, -# author={Xiangyang Zhou, Lu Li, Daxiang Dong, Yi Liu, Ying Chen, Wayne Xin Zhao, Dianhai Yu and Hua Wu}, -# booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, -# volume={1}, -# pages={ -- }, -# year={2018} -# } -# ``` -# http://aclweb.org/anthology/P18-1103 -# -# Based on authors' Tensorflow code: https://github.com/baidu/Dialogue/tree/master/DAM - -import math -from logging import getLogger - -import numpy as np -import tensorflow as tf -from scipy.stats import multivariate_normal - -log = getLogger(__name__) - - -def learning_rate(step_num, d_model=512, warmup_steps=4000): - a = step_num ** (-0.5) - b = step_num * warmup_steps ** (-1.5) - return a, b, d_model ** (-0.5) * min(step_num ** (-0.5), step_num * (warmup_steps ** (-1.5))) - - -def selu(x): - alpha = 1.6732632423543772848170429916717 - scale = 1.0507009873554804934193349852946 - log.info('use selu') - return scale * tf.where(x >= 0.0, x, alpha * tf.nn.elu(x)) - - -def bilinear_sim_4d(x, y, is_nor=True): - '''calulate bilinear similarity with two 4d tensor. - - Args: - x: a tensor with shape [batch, time_x, dimension_x, num_stacks] - y: a tensor with shape [batch, time_y, dimension_y, num_stacks] - - Returns: - a tensor with shape [batch, time_x, time_y, num_stacks] - - Raises: - ValueError: if - the shapes of x and y are not match; - bilinear matrix reuse error. - ''' - M = tf.get_variable( - name="bilinear_matrix", - shape=[x.shape[2], y.shape[2], x.shape[3]], - dtype=tf.float32, - initializer=tf.orthogonal_initializer()) - sim = tf.einsum('biks,kls,bjls->bijs', x, M, y) - - if is_nor: - scale = tf.sqrt(tf.cast(x.shape[2] * y.shape[2], tf.float32)) - scale = tf.maximum(1.0, scale) - return sim / scale - else: - return sim - - -def bilinear_sim(x, y, is_nor=True): - '''calculate bilinear similarity with two tensor. - Args: - x: a tensor with shape [batch, time_x, dimension_x] - y: a tensor with shape [batch, time_y, dimension_y] - - Returns: - a tensor with shape [batch, time_x, time_y] - Raises: - ValueError: if - the shapes of x and y are not match; - bilinear matrix reuse error. - ''' - M = tf.get_variable( - name="bilinear_matrix", - shape=[x.shape[-1], y.shape[-1]], - dtype=tf.float32, - # initializer=tf.orthogonal_initializer()) - initializer=tf.keras.initializers.glorot_uniform(seed=42)) - sim = tf.einsum('bik,kl,bjl->bij', x, M, y) - - if is_nor: - scale = tf.sqrt(tf.cast(x.shape[-1] * y.shape[-1], tf.float32)) - scale = tf.maximum(1.0, scale) - return sim / scale - else: - return sim - - -def dot_sim(x, y, is_nor=True): - '''calculate dot similarity with two tensor. - - Args: - x: a tensor with shape [batch, time_x, dimension] - y: a tensor with shape [batch, time_y, dimension] - - Returns: - a tensor with shape [batch, time_x, time_y] - Raises: - AssertionError: if - the shapes of x and y are not match. 
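The ``learning_rate`` helper above implements the Transformer warm-up schedule: the rate grows linearly for ``warmup_steps`` steps and then decays as the inverse square root of the step. A short sketch of the same formula (the default ``d_model`` and ``warmup_steps`` mirror the removed helper):

```
# Sketch of the warm-up schedule computed by learning_rate():
# linear warm-up, then inverse-sqrt decay.
def noam_lr(step_num, d_model=512, warmup_steps=4000):
    return d_model ** -0.5 * min(step_num ** -0.5, step_num * warmup_steps ** -1.5)

for step in (1, 1000, 4000, 16000):
    print(step, round(noam_lr(step), 6))
# the rate rises until step == warmup_steps and decays as 1/sqrt(step) afterwards
```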
- ''' - assert x.shape[-1] == y.shape[-1] - - sim = tf.einsum('bik,bjk->bij', x, y) - - if is_nor: - scale = tf.sqrt(tf.cast(x.shape[-1], tf.float32)) - scale = tf.maximum(1.0, scale) - return sim / scale - else: - return sim - - -def layer_norm(x, axis=None, epsilon=1e-6): - '''Add layer normalization. - - Args: - x: a tensor - axis: the dimensions to normalize - - Returns: - a tensor the same shape as x. - - Raises: - ''' - log.info('wrong version of layer_norm') - scale = tf.get_variable( - name='scale', - shape=[1], - dtype=tf.float32, - initializer=tf.ones_initializer()) - bias = tf.get_variable( - name='bias', - shape=[1], - dtype=tf.float32, - initializer=tf.zeros_initializer()) - - if axis is None: - axis = [-1] - - mean = tf.reduce_mean(x, axis=axis, keepdims=True) - variance = tf.reduce_mean(tf.square(x - mean), axis=axis, keepdims=True) - norm = (x - mean) * tf.rsqrt(variance + epsilon) - return scale * norm + bias - - -def layer_norm_debug(x, axis=None, epsilon=1e-6): - '''Add layer normalization. - - Args: - x: a tensor - axis: the dimensions to normalize - - Returns: - a tensor the same shape as x. - - Raises: - ''' - if axis is None: - axis = [-1] - shape = [x.shape[i] for i in axis] - - scale = tf.get_variable( - name='scale', - shape=shape, - dtype=tf.float32, - initializer=tf.ones_initializer()) - bias = tf.get_variable( - name='bias', - shape=shape, - dtype=tf.float32, - initializer=tf.zeros_initializer()) - - mean = tf.reduce_mean(x, axis=axis, keepdims=True) - variance = tf.reduce_mean(tf.square(x - mean), axis=axis, keepdims=True) - norm = (x - mean) * tf.rsqrt(variance + epsilon) - return scale * norm + bias - - -def dense(x, out_dimension=None, add_bias=True, initializer=tf.orthogonal_initializer()): - '''Add dense connected layer, Wx + b. - - Args: - x: a tensor with shape [batch, time, dimension] - out_dimension: a number which is the output dimension - - Return: - a tensor with shape [batch, time, out_dimension] - - Raises: - ''' - if out_dimension is None: - out_dimension = x.shape[-1] - - W = tf.get_variable( - name='weights', - shape=[x.shape[-1], out_dimension], - dtype=tf.float32, - initializer=initializer) - if add_bias: - bias = tf.get_variable( - name='bias', - shape=[1], - dtype=tf.float32, - initializer=tf.zeros_initializer()) - return tf.einsum('bik,kj->bij', x, W) + bias - else: - return tf.einsum('bik,kj->bij', x, W) - - -def matmul_2d(x, out_dimension, drop_prob=None): - '''Multiplies 2-d tensor by weights. 
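``layer_norm_debug`` above normalizes over the chosen axis and then applies a learned per-feature scale and bias. A NumPy sketch of the last-axis case, with the scale and bias left at their initial ones and zeros:

```
# Sketch (NumPy) of the normalization done by layer_norm_debug.
import numpy as np

def layer_norm(x, epsilon=1e-6):
    scale = np.ones(x.shape[-1])                             # learned in the removed code
    bias = np.zeros(x.shape[-1])
    mean = x.mean(axis=-1, keepdims=True)
    variance = ((x - mean) ** 2).mean(axis=-1, keepdims=True)
    norm = (x - mean) / np.sqrt(variance + epsilon)
    return scale * norm + bias

x = np.random.default_rng(0).normal(size=(2, 4, 8))
y = layer_norm(x)
print(y.mean(-1).round(6), y.std(-1).round(3))               # ~0 mean, ~1 std per position
```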
- - Args: - x: a tensor with shape [batch, dimension] - out_dimension: a number - - Returns: - a tensor with shape [batch, out_dimension] - - Raises: - ''' - W = tf.get_variable( - name='weights', - shape=[x.shape[1], out_dimension], - dtype=tf.float32, - initializer=tf.orthogonal_initializer()) - if drop_prob is not None: - W = tf.nn.dropout(W, drop_prob) - log.info('W is dropout') - - return tf.matmul(x, W) - - -def gauss_positional_encoding_vector(x, role=0, value=0): - position = int(x.shape[1]) - dimension = int(x.shape[2]) - log.info('position: %s' % position) - log.info('dimension: %s' % dimension) - - _lambda = tf.get_variable( - name='lambda', - shape=[position], - dtype=tf.float32, - initializer=tf.constant_initializer(value)) - _lambda = tf.expand_dims(_lambda, axis=-1) - - mean = [position / 2.0, dimension / 2.0] - - # cov = [[position/3.0, 0], [0, dimension/3.0]] - sigma_x = position / math.sqrt(4.0 * dimension) - sigma_y = math.sqrt(dimension / 4.0) - cov = [[sigma_x * sigma_x, role * sigma_x * sigma_y], - [role * sigma_x * sigma_y, sigma_y * sigma_y]] - - pos = np.dstack(np.mgrid[0:position, 0:dimension]) - - rv = multivariate_normal(mean, cov) - signal = rv.pdf(pos) - signal = signal - np.max(signal) / 2.0 - - signal = tf.multiply(_lambda, signal) - signal = tf.expand_dims(signal, axis=0) - - log.info('gauss positional encoding') - - return x + _lambda * signal - - -def positional_encoding(x, min_timescale=1.0, max_timescale=1.0e4, value=0): - '''Adds a bunch of sinusoids of different frequencies to a tensor. - - Args: - x: a tensor with shape [batch, length, channels] - min_timescale: a float - max_timescale: a float - - Returns: - a tensor the same shape as x. - - Raises: - ''' - length = x.shape[1] - channels = x.shape[2] - _lambda = tf.get_variable( - name='lambda', - shape=[1], - dtype=tf.float32, - initializer=tf.constant_initializer(value)) - - position = tf.to_float(tf.range(length)) - num_timescales = channels // 2 - log_timescale_increment = ( - math.log(float(max_timescale) / float(min_timescale)) / - (tf.to_float(num_timescales) - 1)) - inv_timescales = min_timescale * tf.exp( - tf.to_float(tf.range(num_timescales)) * -log_timescale_increment) - scaled_time = tf.expand_dims(position, 1) * tf.expand_dims(inv_timescales, 0) - signal = tf.concat([tf.sin(scaled_time), tf.cos(scaled_time)], axis=1) - signal = tf.pad(signal, [[0, 0], [0, tf.mod(channels, 2)]]) - # signal = tf.reshape(signal, [1, length, channels]) - signal = tf.expand_dims(signal, axis=0) - - return x + _lambda * signal - - -def positional_encoding_vector(x, min_timescale=1.0, max_timescale=1.0e4, value=0): - '''Adds a bunch of sinusoids of different frequencies to a tensor. - - Args: - x: a tensor with shape [batch, length, channels] - min_timescale: a float - max_timescale: a float - - Returns: - a tensor the same shape as x. 
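The sinusoidal signal added by ``positional_encoding`` above can be reproduced in a few lines of NumPy; the trainable ``lambda`` gate is treated as 1 here, and the length and channel sizes are illustrative:

```
# Sketch (NumPy) of the sinusoidal positional signal.
import math
import numpy as np

def sinusoid_signal(length, channels, min_timescale=1.0, max_timescale=1.0e4):
    position = np.arange(length, dtype=np.float64)
    num_timescales = channels // 2
    log_increment = math.log(max_timescale / min_timescale) / (num_timescales - 1)
    inv_timescales = min_timescale * np.exp(np.arange(num_timescales) * -log_increment)
    scaled_time = position[:, None] * inv_timescales[None, :]       # [length, channels//2]
    signal = np.concatenate([np.sin(scaled_time), np.cos(scaled_time)], axis=1)
    if channels % 2:                                                # pad odd channel counts
        signal = np.pad(signal, ((0, 0), (0, 1)))
    return signal                                                   # [length, channels]

x = np.zeros((1, 50, 16))                                           # [batch, length, channels]
x_pos = x + sinusoid_signal(50, 16)[None]                           # broadcast over the batch
print(x_pos.shape)
```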
- - Raises: - ''' - length = x.shape[1] - channels = x.shape[2] - _lambda = tf.get_variable( - name='lambda', - shape=[length], - dtype=tf.float32, - initializer=tf.constant_initializer(value)) - _lambda = tf.expand_dims(_lambda, axis=-1) - - position = tf.to_float(tf.range(length)) - num_timescales = channels // 2 - log_timescale_increment = ( - math.log(float(max_timescale) / float(min_timescale)) / - (tf.to_float(num_timescales) - 1)) - inv_timescales = min_timescale * tf.exp( - tf.to_float(tf.range(num_timescales)) * -log_timescale_increment) - scaled_time = tf.expand_dims(position, 1) * tf.expand_dims(inv_timescales, 0) - signal = tf.concat([tf.sin(scaled_time), tf.cos(scaled_time)], axis=1) - signal = tf.pad(signal, [[0, 0], [0, tf.mod(channels, 2)]]) - - signal = tf.multiply(_lambda, signal) - signal = tf.expand_dims(signal, axis=0) - - return x + signal - - -def mask(row_lengths, col_lengths, max_row_length, max_col_length): - '''Return a mask tensor representing the first N positions of each row and each column. - - Args: - row_lengths: a tensor with shape [batch] - col_lengths: a tensor with shape [batch] - - Returns: - a mask tensor with shape [batch, max_row_length, max_col_length] - - Raises: - ''' - row_mask = tf.sequence_mask(row_lengths, max_row_length) # bool, [batch, max_row_len] - col_mask = tf.sequence_mask(col_lengths, max_col_length) # bool, [batch, max_col_len] - - row_mask = tf.cast(tf.expand_dims(row_mask, -1), tf.float32) - col_mask = tf.cast(tf.expand_dims(col_mask, -1), tf.float32) - - return tf.einsum('bik,bjk->bij', row_mask, col_mask) - - -def weighted_sum(weight, values): - '''Calcualte the weighted sum. - - Args: - weight: a tensor with shape [batch, time, dimension] - values: a tensor with shape [batch, dimension, values_dimension] - - Return: - a tensor with shape [batch, time, values_dimension] - - Raises: - ''' - return tf.einsum('bij,bjk->bik', weight, values) diff --git a/deeppavlov/models/ranking/mpm_siamese_network.py b/deeppavlov/models/ranking/mpm_siamese_network.py deleted file mode 100644 index cccc26f508..0000000000 --- a/deeppavlov/models/ranking/mpm_siamese_network.py +++ /dev/null @@ -1,180 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
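The ``mask`` helper removed above builds an attention mask as the outer product of two boolean length masks, one per axis. A NumPy sketch with illustrative lengths:

```
# Sketch (NumPy) of the attention mask built by mask().
import numpy as np

def length_mask(row_lengths, col_lengths, max_rows, max_cols):
    rows = np.arange(max_rows)[None, :] < row_lengths[:, None]      # [batch, max_rows]
    cols = np.arange(max_cols)[None, :] < col_lengths[:, None]      # [batch, max_cols]
    # outer product per batch item -> [batch, max_rows, max_cols]
    return rows[:, :, None].astype(float) * cols[:, None, :].astype(float)

print(length_mask(np.array([2, 3]), np.array([1, 4]), max_rows=3, max_cols=4))
```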
- - -from logging import getLogger - -from tensorflow.keras import backend as K -from tensorflow.keras.initializers import glorot_uniform, Orthogonal -from tensorflow.keras.layers import Input, LSTM, Lambda, Dense, Dropout, Bidirectional -from tensorflow.keras.models import Model - -from deeppavlov.core.common.registry import register -from deeppavlov.core.layers.keras_layers import AttentiveMatchingLayer, MaxattentiveMatchingLayer -from deeppavlov.core.layers.keras_layers import FullMatchingLayer, MaxpoolingMatchingLayer -from deeppavlov.models.ranking.bilstm_siamese_network import BiLSTMSiameseNetwork - -log = getLogger(__name__) - - -@register('mpm_nn') -class MPMSiameseNetwork(BiLSTMSiameseNetwork): - """The class implementing a siamese neural network with bilateral multi-Perspective matching. - - The network architecture is based on https://arxiv.org/abs/1702.03814. - - Args: - dense_dim: Dimensionality of the dense layer. - perspective_num: Number of perspectives in multi-perspective matching layers. - aggregation dim: Dimensionality of the hidden state in the second BiLSTM layer. - inpdrop_val: Float between 0 and 1. A dropout value for the linear transformation of the inputs. - recdrop_val: Float between 0 and 1. A dropout value for the linear transformation of the recurrent state. - ldrop_val: A dropout value of the dropout layer before the second BiLSTM layer. - dropout_val: A dropout value of the dropout layer after the second BiLSTM layer. - """ - - def __init__(self, - dense_dim: int = 50, - perspective_num: int = 20, - aggregation_dim: int = 200, - recdrop_val: float = 0.0, - inpdrop_val: float = 0.0, - ldrop_val: float = 0.0, - dropout_val: float = 0.0, - *args, - **kwargs) -> None: - - self.dense_dim = dense_dim - self.perspective_num = perspective_num - self.aggregation_dim = aggregation_dim - self.ldrop_val = ldrop_val - self.recdrop_val = recdrop_val - self.inpdrop_val = inpdrop_val - self.dropout_val = dropout_val - self.seed = kwargs.get("triplet_loss") - self.triplet_mode = kwargs.get("triplet_loss") - - super(MPMSiameseNetwork, self).__init__(*args, **kwargs) - - def create_lstm_layer_1(self): - ker_in = glorot_uniform(seed=self.seed) - rec_in = Orthogonal(seed=self.seed) - bioutp = Bidirectional(LSTM(self.hidden_dim, - input_shape=(self.max_sequence_length, self.embedding_dim,), - kernel_regularizer=None, - recurrent_regularizer=None, - bias_regularizer=None, - activity_regularizer=None, - recurrent_dropout=self.recdrop_val, - dropout=self.inpdrop_val, - kernel_initializer=ker_in, - recurrent_initializer=rec_in, - return_sequences=True), merge_mode=None) - return bioutp - - def create_lstm_layer_2(self): - ker_in = glorot_uniform(seed=self.seed) - rec_in = Orthogonal(seed=self.seed) - bioutp = Bidirectional(LSTM(self.aggregation_dim, - input_shape=(self.max_sequence_length, 8 * self.perspective_num,), - kernel_regularizer=None, - recurrent_regularizer=None, - bias_regularizer=None, - activity_regularizer=None, - recurrent_dropout=self.recdrop_val, - dropout=self.inpdrop_val, - kernel_initializer=ker_in, - recurrent_initializer=rec_in, - return_sequences=False), - merge_mode='concat', - name="sentence_embedding") - return bioutp - - def create_model(self) -> Model: - if self.use_matrix: - context = Input(shape=(self.max_sequence_length,)) - response = Input(shape=(self.max_sequence_length,)) - emb_layer = self.embedding_layer() - emb_c = emb_layer(context) - emb_r = emb_layer(response) - else: - context = Input(shape=(self.max_sequence_length, self.embedding_dim,)) - 
response = Input(shape=(self.max_sequence_length, self.embedding_dim,)) - emb_c = context - emb_r = response - lstm_layer = self.create_lstm_layer_1() - lstm_a = lstm_layer(emb_c) - lstm_b = lstm_layer(emb_r) - - f_layer_f = FullMatchingLayer(self.perspective_num) - f_layer_b = FullMatchingLayer(self.perspective_num) - f_a_forw = f_layer_f([lstm_a[0], lstm_b[0]])[0] - f_a_back = f_layer_b([Lambda(lambda x: K.reverse(x, 1))(lstm_a[1]), - Lambda(lambda x: K.reverse(x, 1))(lstm_b[1])])[0] - f_a_back = Lambda(lambda x: K.reverse(x, 1))(f_a_back) - f_b_forw = f_layer_f([lstm_b[0], lstm_a[0]])[0] - f_b_back = f_layer_b([Lambda(lambda x: K.reverse(x, 1))(lstm_b[1]), - Lambda(lambda x: K.reverse(x, 1))(lstm_a[1])])[0] - f_b_back = Lambda(lambda x: K.reverse(x, 1))(f_b_back) - - mp_layer_f = MaxpoolingMatchingLayer(self.perspective_num) - mp_layer_b = MaxpoolingMatchingLayer(self.perspective_num) - mp_a_forw = mp_layer_f([lstm_a[0], lstm_b[0]])[0] - mp_a_back = mp_layer_b([lstm_a[1], lstm_b[1]])[0] - mp_b_forw = mp_layer_f([lstm_b[0], lstm_a[0]])[0] - mp_b_back = mp_layer_b([lstm_b[1], lstm_a[1]])[0] - - at_layer_f = AttentiveMatchingLayer(self.perspective_num) - at_layer_b = AttentiveMatchingLayer(self.perspective_num) - at_a_forw = at_layer_f([lstm_a[0], lstm_b[0]])[0] - at_a_back = at_layer_b([lstm_a[1], lstm_b[1]])[0] - at_b_forw = at_layer_f([lstm_b[0], lstm_a[0]])[0] - at_b_back = at_layer_b([lstm_b[1], lstm_a[1]])[0] - - ma_layer_f = MaxattentiveMatchingLayer(self.perspective_num) - ma_layer_b = MaxattentiveMatchingLayer(self.perspective_num) - ma_a_forw = ma_layer_f([lstm_a[0], lstm_b[0]])[0] - ma_a_back = ma_layer_b([lstm_a[1], lstm_b[1]])[0] - ma_b_forw = ma_layer_f([lstm_b[0], lstm_a[0]])[0] - ma_b_back = ma_layer_b([lstm_b[1], lstm_a[1]])[0] - - concat_a = Lambda(lambda x: K.concatenate(x, axis=-1))([f_a_forw, f_a_back, - mp_a_forw, mp_a_back, - at_a_forw, at_a_back, - ma_a_forw, ma_a_back]) - concat_b = Lambda(lambda x: K.concatenate(x, axis=-1))([f_b_forw, f_b_back, - mp_b_forw, mp_b_back, - at_b_forw, at_b_back, - ma_b_forw, ma_b_back]) - - concat_a = Dropout(self.ldrop_val)(concat_a) - concat_b = Dropout(self.ldrop_val)(concat_b) - - lstm_layer_agg = self.create_lstm_layer_2() - agg_a = lstm_layer_agg(concat_a) - agg_b = lstm_layer_agg(concat_b) - - agg_a = Dropout(self.dropout_val)(agg_a) - agg_b = Dropout(self.dropout_val)(agg_b) - - reduced = Lambda(lambda x: K.concatenate(x, axis=-1))([agg_a, agg_b]) - - if self.triplet_mode: - dist = Lambda(self._pairwise_distances)([agg_a, agg_b]) - else: - ker_in = glorot_uniform(seed=self.seed) - dense = Dense(self.dense_dim, kernel_initializer=ker_in)(reduced) - dist = Dense(1, activation='sigmoid', name="score_model")(dense) - model = Model([context, response], dist) - return model diff --git a/deeppavlov/models/ranking/rel_ranker.py b/deeppavlov/models/ranking/rel_ranker.py deleted file mode 100644 index 6d69ad27fc..0000000000 --- a/deeppavlov/models/ranking/rel_ranker.py +++ /dev/null @@ -1,146 +0,0 @@ -from typing import List, Tuple, Union, Dict, Optional - -import numpy as np -import tensorflow as tf - -from deeppavlov.core.common.check_gpu import check_gpu_existence -from deeppavlov.core.common.registry import register -from deeppavlov.core.layers.tf_layers import variational_dropout -from deeppavlov.core.models.component import Component -from deeppavlov.core.models.tf_model import LRScheduledTFModel -from deeppavlov.models.embedders.abstract_embedder import Embedder -from deeppavlov.models.squad.utils import CudnnGRU, 
CudnnCompatibleGRU, softmax_mask - - -@register('two_sentences_emb') -class TwoSentencesEmbedder(Component): - """This class is used for embedding of two sentences.""" - - def __init__(self, embedder: Embedder, **kwargs): - """ - - Args: - embedder: what embedder to use: Glove, Fasttext or other - **kwargs: - """ - self.embedder = embedder - - def __call__(self, sentence_tokens_1: List[List[str]], sentence_tokens_2: List[List[str]]) -> \ - Tuple[List[Union[list, np.ndarray]], List[Union[list, np.ndarray]]]: - sentence_token_embs_1 = self.embedder(sentence_tokens_1) - sentence_token_embs_2 = self.embedder(sentence_tokens_2) - return sentence_token_embs_1, sentence_token_embs_2 - - -@register('rel_ranker') -class RelRanker(LRScheduledTFModel): - """ - This class determines whether the relation appropriate for the question or not. - """ - - def __init__(self, n_classes: int = 2, - dropout_keep_prob: float = 0.5, - return_probas: bool = False, **kwargs): - """ - - Args: - n_classes: number of classes for classification - dropout_keep_prob: Probability of keeping the hidden state, values from 0 to 1. 0.5 works well - in most cases. - return_probas: whether to return confidences of the relation to be appropriate or not - **kwargs: - """ - kwargs.setdefault('learning_rate_drop_div', 10.0) - kwargs.setdefault('learning_rate_drop_patience', 5.0) - kwargs.setdefault('clip_norm', 5.0) - - super().__init__(**kwargs) - - self.n_classes = n_classes - self.dropout_keep_prob = dropout_keep_prob - self.return_probas = return_probas - config = tf.ConfigProto() - config.gpu_options.allow_growth = True - - if check_gpu_existence(): - self.GRU = CudnnGRU - else: - self.GRU = CudnnCompatibleGRU - - self.question_ph = tf.placeholder(tf.float32, [None, None, 300]) - self.rel_emb_ph = tf.placeholder(tf.float32, [None, None, 300]) - - r_mask_2 = tf.cast(self.rel_emb_ph, tf.bool) - r_len_2 = tf.reduce_sum(tf.cast(r_mask_2, tf.int32), axis=2) - r_mask = tf.cast(r_len_2, tf.bool) - r_len = tf.reduce_sum(tf.cast(r_mask, tf.int32), axis=1) - rel_emb = tf.math.divide_no_nan(tf.reduce_sum(self.rel_emb_ph, axis=1), - tf.cast(tf.expand_dims(r_len, axis=1), tf.float32)) - - self.y_ph = tf.placeholder(tf.int32, shape=(None,)) - self.one_hot_labels = tf.one_hot(self.y_ph, depth=self.n_classes, dtype=tf.float32) - self.keep_prob_ph = tf.placeholder_with_default(1.0, shape=[], name='keep_prob_ph') - - q_mask_2 = tf.cast(self.question_ph, tf.bool) - q_len_2 = tf.reduce_sum(tf.cast(q_mask_2, tf.int32), axis=2) - q_mask = tf.cast(q_len_2, tf.bool) - q_len = tf.reduce_sum(tf.cast(q_mask, tf.int32), axis=1) - - question_dr = variational_dropout(self.question_ph, keep_prob=self.keep_prob_ph) - b_size = tf.shape(self.question_ph)[0] - - with tf.variable_scope("question_encode"): - rnn = self.GRU(num_layers=2, num_units=75, batch_size=b_size, input_size=300, keep_prob=self.keep_prob_ph) - q = rnn(question_dr, seq_len=q_len) - - with tf.variable_scope("attention"): - rel_emb_exp = tf.expand_dims(rel_emb, axis=1) - dot_products = tf.reduce_sum(tf.multiply(q, rel_emb_exp), axis=2, keep_dims=False) - s_mask = softmax_mask(dot_products, q_mask) - att_weights = tf.expand_dims(tf.nn.softmax(s_mask), axis=2) - self.s_r = tf.reduce_sum(tf.multiply(att_weights, q), axis=1) - - self.logits = tf.layers.dense(tf.multiply(self.s_r, rel_emb), 2, activation=None, use_bias=False) - self.y_pred = tf.argmax(self.logits, axis=-1) - - loss_tensor = tf.nn.sigmoid_cross_entropy_with_logits(labels=self.one_hot_labels, logits=self.logits) - - self.loss = 
tf.reduce_mean(loss_tensor) - self.train_op = self.get_train_op(self.loss) - - self.sess = tf.Session(config=config) - self.sess.run(tf.global_variables_initializer()) - self.load() - - def fill_feed_dict(self, questions_embs: List[np.ndarray], rels_embs: List[np.ndarray], y=None, train=False) -> \ - Dict[tf.placeholder, List[np.ndarray]]: - questions_embs = np.array(questions_embs) - rels_embs = np.array(rels_embs) - feed_dict = {self.question_ph: questions_embs, self.rel_emb_ph: rels_embs} - if y is not None: - feed_dict[self.y_ph] = y - if train: - feed_dict[self.keep_prob_ph] = self.dropout_keep_prob - else: - feed_dict[self.keep_prob_ph] = 1.0 - - return feed_dict - - def __call__(self, questions_embs: List[np.ndarray], rels_embs: List[np.ndarray]) -> \ - List[np.ndarray]: - feed_dict = self.fill_feed_dict(questions_embs, rels_embs) - if self.return_probas: - pred = self.sess.run(self.logits, feed_dict) - else: - pred = self.sess.run(self.y_pred, feed_dict) - return pred - - def train_on_batch(self, questions_embs: List[np.ndarray], - rels_embs: List[np.ndarray], - y: List[int]) -> Dict[str, float]: - feed_dict = self.fill_feed_dict(questions_embs, rels_embs, y, train=True) - _, loss_value = self.sess.run([self.train_op, self.loss], feed_dict) - - return {'loss': loss_value, - 'learning_rate': self.get_learning_rate(), - 'momentum': self.get_momentum()} diff --git a/deeppavlov/models/ranking/sequential_matching_network.py b/deeppavlov/models/ranking/sequential_matching_network.py deleted file mode 100644 index a9222897af..0000000000 --- a/deeppavlov/models/ranking/sequential_matching_network.py +++ /dev/null @@ -1,150 +0,0 @@ -# Copyright 2018 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger -from typing import Optional - -import numpy as np -import tensorflow as tf - -from deeppavlov.core.common.registry import register -from deeppavlov.models.ranking.tf_base_matching_model import TensorflowBaseMatchingModel - -log = getLogger(__name__) - - -@register('smn_nn') -class SMNNetwork(TensorflowBaseMatchingModel): - """ - Tensorflow implementation of Sequential Matching Network - - Wu, Yu, et al. "Sequential Matching Network: A New Architecture for Multi-turn Response Selection in - Retrieval-based Chatbots." ACL. 2017. - https://arxiv.org/abs/1612.01627 - - Based on authors' Tensorflow code: https://github.com/MarkWuNLP/MultiTurnResponseSelection - - Args: - num_context_turns (int): A number of ``context`` turns in data samples. - max_sequence_length (int): A maximum length of text sequences in tokens. - Longer sequences will be truncated and shorter ones will be padded. - learning_rate (float): Initial learning rate. - emb_matrix (np.ndarray): An embeddings matrix to initialize an embeddings layer of a model. - trainable_embeddings (bool): Whether train embeddings matrix or not. - embedding_dim (int): Dimensionality of token (word) embeddings. 
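# A toy NumPy sketch (illustrative only, not from this module) of the two matching matrices SMN
# builds for every (utterance, response) pair before the CNN: a word-level dot-product map and a
# sequence-level bilinear map. Shapes and names (U, R, Uh, Rh, A) are made-up stand-ins.
import numpy as np

rng = np.random.default_rng(0)
Tu, Tr, d = 5, 7, 200                                        # utterance length, response length, embedding size
U, R = rng.normal(size=(Tu, d)), rng.normal(size=(Tr, d))    # word embeddings
Uh, Rh = rng.normal(size=(Tu, d)), rng.normal(size=(Tr, d))  # GRU hidden states (stand-ins)
A = rng.normal(size=(d, d))                                  # learned bilinear matrix

M1 = U @ R.T        # word-word similarity, shape (Tu, Tr)
M2 = Uh @ A @ Rh.T  # sequence-level similarity, shape (Tu, Tr)
# M1 and M2 are stacked as two channels and passed through the convolution and max-pooling below.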
- """ - - def __init__(self, - embedding_dim: int = 200, - max_sequence_length: int = 50, - learning_rate: float = 1e-3, - emb_matrix: Optional[np.ndarray] = None, - trainable_embeddings: bool = False, - *args, - **kwargs): - - self.max_sentence_len = max_sequence_length - self.word_embedding_size = embedding_dim - self.trainable = trainable_embeddings - self.learning_rate = learning_rate - self.emb_matrix = emb_matrix - - super(SMNNetwork, self).__init__(*args, **kwargs) - - self.sess_config = tf.ConfigProto(allow_soft_placement=True) - self.sess_config.gpu_options.allow_growth = True - self.sess = tf.Session(config=self.sess_config) - self._init_graph() - self.sess.run(tf.global_variables_initializer()) - - if self.load_path is not None: - self.load() - - def _init_placeholders(self): - with tf.variable_scope('inputs'): - # Utterances and their lengths - self.utterance_ph = tf.placeholder(tf.int32, shape=(None, self.num_context_turns, self.max_sentence_len)) - self.all_utterance_len_ph = tf.placeholder(tf.int32, shape=(None, self.num_context_turns)) - - # Responses and their lengths - self.response_ph = tf.placeholder(tf.int32, shape=(None, self.max_sentence_len)) - self.response_len_ph = tf.placeholder(tf.int32, shape=(None,)) - - # Labels - self.y_true = tf.placeholder(tf.int32, shape=(None,)) - - def _init_graph(self): - self._init_placeholders() - - word_embeddings = tf.get_variable("word_embeddings_v", - initializer=tf.constant(self.emb_matrix, dtype=tf.float32), - trainable=self.trainable) - - all_utterance_embeddings = tf.nn.embedding_lookup(word_embeddings, self.utterance_ph) - response_embeddings = tf.nn.embedding_lookup(word_embeddings, self.response_ph) - sentence_GRU = tf.nn.rnn_cell.GRUCell(self.word_embedding_size, kernel_initializer=tf.orthogonal_initializer()) - all_utterance_embeddings = tf.unstack(all_utterance_embeddings, num=self.num_context_turns, - axis=1) # list of self.num_context_turns tensors with shape (?, 200) - all_utterance_len = tf.unstack(self.all_utterance_len_ph, num=self.num_context_turns, axis=1) - A_matrix = tf.get_variable('A_matrix_v', shape=(self.word_embedding_size, self.word_embedding_size), - initializer=tf.contrib.layers.xavier_initializer(), dtype=tf.float32) - final_GRU = tf.nn.rnn_cell.GRUCell(self.word_embedding_size, kernel_initializer=tf.orthogonal_initializer()) - reuse = None - - response_GRU_embeddings, _ = tf.nn.dynamic_rnn(sentence_GRU, - response_embeddings, - sequence_length=self.response_len_ph, - dtype=tf.float32, - scope='sentence_GRU') - response_embeddings = tf.transpose(response_embeddings, perm=[0, 2, 1]) - response_GRU_embeddings = tf.transpose(response_GRU_embeddings, perm=[0, 2, 1]) - matching_vectors = [] - for utterance_embeddings, utterance_len in zip(all_utterance_embeddings, all_utterance_len): - matrix1 = tf.matmul(utterance_embeddings, response_embeddings) - utterance_GRU_embeddings, _ = tf.nn.dynamic_rnn(sentence_GRU, - utterance_embeddings, - sequence_length=utterance_len, - dtype=tf.float32, - scope='sentence_GRU') - matrix2 = tf.einsum('aij,jk->aik', utterance_GRU_embeddings, A_matrix) # TODO:check this - matrix2 = tf.matmul(matrix2, response_GRU_embeddings) - matrix = tf.stack([matrix1, matrix2], axis=3, name='matrix_stack') - conv_layer = tf.layers.conv2d(matrix, filters=8, kernel_size=(3, 3), padding='VALID', - kernel_initializer=tf.contrib.keras.initializers.he_normal(), - activation=tf.nn.relu, reuse=reuse, name='conv') # TODO: check other params - pooling_layer = tf.layers.max_pooling2d(conv_layer, (3, 
3), strides=(3, 3), - padding='VALID', name='max_pooling') # TODO: check other params - matching_vector = tf.layers.dense(tf.contrib.layers.flatten(pooling_layer), 50, - kernel_initializer=tf.contrib.layers.xavier_initializer(), - activation=tf.tanh, reuse=reuse, - name='matching_v') # TODO: check wthether this is correct - if not reuse: - reuse = True - matching_vectors.append(matching_vector) - _, last_hidden = tf.nn.dynamic_rnn(final_GRU, - tf.stack(matching_vectors, axis=0, name='matching_stack'), - # resulting shape: (10, ?, 50) - dtype=tf.float32, - time_major=True, - scope='final_GRU') # TODO: check time_major - logits = tf.layers.dense(last_hidden, 2, kernel_initializer=tf.contrib.layers.xavier_initializer(), - name='final_v') - self.y_pred = tf.nn.softmax(logits) - self.logits = logits - self.loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=self.y_true, logits=logits)) - optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate) - self.train_op = optimizer.minimize(self.loss) - - # Debug - self.print_number_of_parameters() diff --git a/deeppavlov/models/ranking/siamese_model.py b/deeppavlov/models/ranking/siamese_model.py deleted file mode 100644 index 64eb8b2d7f..0000000000 --- a/deeppavlov/models/ranking/siamese_model.py +++ /dev/null @@ -1,135 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from typing import List, Iterable, Union, Tuple, Dict - -import numpy as np - -from deeppavlov.core.models.nn_model import NNModel - - -class SiameseModel(NNModel): - """The class implementing base functionality for siamese neural networks. - - Args: - batch_size: A size of a batch. - num_context_turns: A number of ``context`` turns in data samples. - *args: Other parameters. - **kwargs: Other parameters. - """ - - def __init__(self, - batch_size: int, - num_context_turns: int = 1, - *args, **kwargs) -> None: - super().__init__(*args, **kwargs) - - self.batch_size = batch_size - self.num_context_turns = num_context_turns - - def load(self, *args, **kwargs) -> None: - pass - - def save(self, *args, **kwargs) -> None: - pass - - def train_on_batch(self, samples_generator: Iterable[List[np.ndarray]], y: List[int]) -> float: - """ - This method is called by trainer to make one training step on one batch. - The number of samples returned by `samples_generator` is always equal to `batch_size`, so we need to: - 1) accumulate data for all of the inputs of the model; - 2) format inputs of a model in a proper way using `self._make_batch` function; - 3) run a model with provided inputs and ground truth labels (`y`) using `self._train_on_batch` function; - 4) return mean loss value on the batch - - Args: - samples_generator (Iterable[List[np.ndarray]]): generator that returns list of numpy arrays - of words of all sentences represented as integers. 
- Its shape: (number_of_context_turns + 1, max_number_of_words_in_a_sentence) - y (List[int]): tuple of labels, with shape: (batch_size, ) - - Returns: - float: value of mean loss on the batch - """ - buf = [] - for sample in samples_generator: - self._append_sample_to_batch_buffer(sample, buf) - b = self._make_batch(buf) - loss = self._train_on_batch(b, y) - return loss - - def __call__(self, samples_generator: Iterable[List[np.ndarray]]) -> Union[np.ndarray, List[str]]: - """ - This method is called by trainer to make one evaluation step on one batch. - - Args: - samples_generator (Iterable[List[np.ndarray]]): generator that returns list of numpy arrays - of words of all sentences represented as integers. - Has shape: (number_of_context_turns + 1, max_number_of_words_in_a_sentence) - - Returns: - np.ndarray: predictions for the batch of samples - """ - y_pred = [] - buf = [] - for j, sample in enumerate(samples_generator, start=1): - n_responses = self._append_sample_to_batch_buffer(sample, buf) - if len(buf) >= self.batch_size: - for i in range(len(buf) // self.batch_size): - b = self._make_batch(buf[i * self.batch_size:(i + 1) * self.batch_size]) - yp = self._predict_on_batch(b) - y_pred += list(yp) - lenb = len(buf) % self.batch_size - if lenb != 0: - buf = buf[-lenb:] - else: - buf = [] - if len(buf) != 0: - b = self._make_batch(buf) - yp = self._predict_on_batch(b) - y_pred += list(yp) - y_pred = np.asarray(y_pred) - # reshape to [batch_size, n_responses] if needed (n_responses > 1) - y_pred = np.reshape(y_pred, (j, n_responses)) if n_responses > 1 else y_pred - return y_pred - - def reset(self) -> None: - pass - - def _append_sample_to_batch_buffer(self, sample: List, - buf: Union[List[List[np.ndarray]], List[Tuple[np.ndarray]]]) -> int: - context = sample[:self.num_context_turns] - responses = sample[self.num_context_turns:] - buf += [context + [el] for el in responses] - - return len(responses) - - def _train_on_batch(self, batch: Union[List[np.ndarray], Dict], y: List[int]) -> float: - pass - - def _predict_on_batch(self, batch: Union[List[np.ndarray], Dict]) -> np.ndarray: - pass - - def _predict_context_on_batch(self, batch: List[np.ndarray]) -> np.ndarray: - pass - - def _predict_response_on_batch(self, batch: List[np.ndarray]) -> np.ndarray: - pass - - def _make_batch(self, x: List[List[np.ndarray]]) -> List[np.ndarray]: - b = [] - for i in range(len(x[0])): - z = [el[i] for el in x] - b.append(np.asarray(z)) - return b diff --git a/deeppavlov/models/ranking/siamese_predictor.py b/deeppavlov/models/ranking/siamese_predictor.py deleted file mode 100644 index a42dccc22b..0000000000 --- a/deeppavlov/models/ranking/siamese_predictor.py +++ /dev/null @@ -1,146 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
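# A small NumPy illustration (made-up token ids, not from the original code) of how SiameseModel
# above expands one data sample into batch rows and then regroups them into per-input arrays,
# mirroring _append_sample_to_batch_buffer and _make_batch.
import numpy as np

num_context_turns = 2
sample = [np.array([1, 2]), np.array([3, 4]),                       # two context turns
          np.array([5, 6]), np.array([7, 8]), np.array([9, 10])]    # three candidate responses
context, responses = sample[:num_context_turns], sample[num_context_turns:]
buf = [context + [r] for r in responses]                            # one row per candidate: (c1, c2, r_k)
batch = [np.asarray([row[i] for row in buf]) for i in range(len(buf[0]))]
# batch[0] and batch[1] hold the context turns repeated for each candidate, each of shape (3, 2);
# batch[2] holds the stacked candidate responses, also of shape (3, 2).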
- -from logging import getLogger -from typing import List, Iterable, Callable, Union - -import numpy as np - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.simple_vocab import SimpleVocabulary -from deeppavlov.core.models.component import Component -from deeppavlov.models.ranking.keras_siamese_model import SiameseModel - -log = getLogger(__name__) - - -@register('siamese_predictor') -class SiamesePredictor(Component): - """The class for ranking or paraphrase identification using the trained siamese network in the ``interact`` mode. - - Args: - batch_size: A size of a batch. - num_context_turns: A number of ``context`` turns in data samples. - ranking: Whether to perform ranking. - If it is set to ``False`` paraphrase identification will be performed. - attention: Whether any attention mechanism is used in the siamese network. - If ``False`` then calculated in advance vectors of ``responses`` - will be used to obtain similarity score for the input ``context``; - Otherwise the whole siamese architecture will be used - to obtain similarity score for the input ``context`` and each particular ``response``. - The parameter will be used if the ``ranking`` is set to ``True``. - responses: A instance of :class:`~deeppavlov.core.data.simple_vocab.SimpleVocabulary` - with all possible ``responses`` to perform ranking. - Will be used if the ``ranking`` is set to ``True``. - preproc_func: A ``__call__`` function of the - :class:`~deeppavlov.models.preprocessors.siamese_preprocessor.SiamesePreprocessor`. - interact_pred_num: The number of the most relevant ``responses`` which will be returned. - Will be used if the ``ranking`` is set to ``True``. - **kwargs: Other parameters. - """ - - def __init__(self, - model: SiameseModel, - batch_size: int, - num_context_turns: int = 1, - ranking: bool = True, - attention: bool = False, - responses: SimpleVocabulary = None, - preproc_func: Callable = None, - interact_pred_num: int = 3, - *args, **kwargs) -> None: - - super().__init__() - - self.batch_size = batch_size - self.num_context_turns = num_context_turns - self.ranking = ranking - self.attention = attention - self.preproc_responses = [] - self.response_embeddings = None - self.preproc_func = preproc_func - self.interact_pred_num = interact_pred_num - self.model = model - if self.ranking: - self.responses = {el[1]: el[0] for el in responses.items()} - self._build_preproc_responses() - if not self.attention: - self._build_response_embeddings() - - def __call__(self, batch: Iterable[List[np.ndarray]]) -> List[Union[List[str], str]]: - context = next(batch) - try: - next(batch) - log.error("It is not intended to use the `%s` with the batch size greater then 1." 
% self.__class__) - except StopIteration: - pass - - if self.ranking: - if len(context) == self.num_context_turns: - scores = [] - if self.attention: - for i in range(len(self.preproc_responses) // self.batch_size + 1): - responses = self.preproc_responses[i * self.batch_size: (i + 1) * self.batch_size] - b = [context + el for el in responses] - b = self.model._make_batch(b) - sc = self.model._predict_on_batch(b) - scores += list(sc) - else: - b = self.model._make_batch([context]) - context_emb = self.model._predict_context_on_batch(b) - context_emb = np.squeeze(context_emb, axis=0) - scores = context_emb @ self.response_embeddings.T - ids = np.flip(np.argsort(scores), -1) - return [[self.responses[el] for el in ids[:self.interact_pred_num]]] - else: - return ["Please, provide contexts separated by '&' in the number equal to that used while training."] - - else: - if len(context) == 2: - b = self.model._make_batch([context]) - sc = self.model._predict_on_batch(b)[0] - if sc > 0.5: - return ["This is a paraphrase."] - else: - return ["This is not a paraphrase."] - else: - return ["Please, provide two sentences separated by '&'."] - - def reset(self) -> None: - pass - - def process_event(self) -> None: - pass - - def _build_response_embeddings(self) -> None: - resp_vecs = [] - for i in range(len(self.preproc_responses) // self.batch_size + 1): - resp_preproc = self.preproc_responses[i * self.batch_size: (i + 1) * self.batch_size] - resp_preproc = self.model._make_batch(resp_preproc) - resp_preproc = resp_preproc - resp_vecs.append(self.model._predict_response_on_batch(resp_preproc)) - self.response_embeddings = np.vstack(resp_vecs) - - def _build_preproc_responses(self) -> None: - responses = list(self.responses.values()) - for i in range(len(responses) // self.batch_size + 1): - el = self.preproc_func(responses[i * self.batch_size: (i + 1) * self.batch_size]) - self.preproc_responses += list(el) - - def rebuild_responses(self, candidates) -> None: - self.attention = True - self.interact_pred_num = 1 - self.preproc_responses = list() - self.responses = {idx: sentence for idx, sentence in enumerate(candidates)} - self._build_preproc_responses() diff --git a/deeppavlov/models/ranking/tf_base_matching_model.py b/deeppavlov/models/ranking/tf_base_matching_model.py deleted file mode 100644 index d7aa420856..0000000000 --- a/deeppavlov/models/ranking/tf_base_matching_model.py +++ /dev/null @@ -1,162 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger -from typing import List, Dict, Tuple - -import numpy as np - -from deeppavlov.core.models.tf_model import TFModel -from deeppavlov.models.ranking.siamese_model import SiameseModel - -log = getLogger(__name__) - - -class TensorflowBaseMatchingModel(TFModel, SiameseModel): - """ - Base class for ranking models that uses context-response matching schemes. 
- - Note: - Tensorflow session variable already presents as self.sess attribute - (derived from TFModel and initialized by Chainer) - - Args: - batch_size (int): a number of samples in a batch. - num_context_turns (int): a number of ``context`` turns in data samples. - mean_oov (bool): whether to set mean embedding of all tokens. By default: True. - use_logits (bool): whether to use raw logits as outputs instead of softmax predictions - - """ - - def __init__(self, - batch_size: int, - num_context_turns: int = 10, - mean_oov: bool = True, - use_logits: bool = False, - *args, - **kwargs): - super(TensorflowBaseMatchingModel, self).__init__(batch_size=batch_size, num_context_turns=num_context_turns, - *args, **kwargs) - self.use_logits = use_logits - if mean_oov: - self.emb_matrix[1] = np.mean(self.emb_matrix[2:], - axis=0) # set mean embedding for OOV token at the 2nd index - - def _append_sample_to_batch_buffer(self, sample: List[np.ndarray], buf: List[Tuple]) -> int: - """ - - Args: - sample (List[nd.array]): samples generator - buf (List[Tuple]) : List of samples with model inputs each: - [( context, context_len, response, response_len ), ( ... ), ... ]. - Returns: - a number of candidate responses - """ - # - batch_buffer_context = [] # [batch_size, 10, 50] - batch_buffer_context_len = [] # [batch_size, 10] - batch_buffer_response = [] # [batch_size, 50] - batch_buffer_response_len = [] # [batch_size] - - context_sentences = sample[:self.num_context_turns] - response_sentences = sample[self.num_context_turns:] - - # Format model inputs: - # 4 model inputs - - # 1. Token indices for context - batch_buffer_context += [context_sentences] * len(response_sentences) - # 2. Token indices for response - batch_buffer_response += list(response_sentences) - # 3. Lens of context sentences - lens = [] - for context in [context_sentences] * len(response_sentences): - context_sentences_lens = [] - for sent in context: - context_sentences_lens.append(len(sent[sent != 0])) - lens.append(context_sentences_lens) - batch_buffer_context_len += lens - # 4. Lens of response sentences - lens = [] - for response_sent in response_sentences: - lens.append(len(response_sent[response_sent != 0])) - batch_buffer_response_len += lens - - for i in range(len(batch_buffer_context)): - buf.append(tuple(( - batch_buffer_context[i], - batch_buffer_context_len[i], - batch_buffer_response[i], - batch_buffer_response_len[i] - ))) - - return len(response_sentences) - - def _make_batch(self, batch: List[Tuple[List[np.ndarray], List, np.ndarray, int]]) -> Dict: - """ - The function for formatting model inputs - - Args: - batch (List[Tuple[np.ndarray]]): List of samples with model inputs each: - [( context, context_len, response, response_len ), ( ... ), ... ]. - Returns: - Dict: feed_dict to feed a model - """ - input_context = [] - input_context_len = [] - input_response = [] - input_response_len = [] - - # format model inputs as numpy arrays - for sample in batch: - input_context.append(sample[0]) - input_context_len.append(sample[1]) - input_response.append(sample[2]) - input_response_len.append(sample[3]) - - return { - self.utterance_ph: np.array(input_context), - self.all_utterance_len_ph: np.array(input_context_len), - self.response_ph: np.array(input_response), - self.response_len_ph: np.array(input_response_len) - } - - def _predict_on_batch(self, batch: Dict) -> np.ndarray: - """ - Run a model with the batch of inputs. 
- The function returns a list of predictions for the batch in numpy format - - Args: - batch (Dict): feed_dict that contains a batch with inputs for a model - - Returns: - nd.array: predictions for the batch (raw logits or softmax outputs) - """ - if self.use_logits: - return self.sess.run(self.logits, feed_dict=batch)[:, 1] - else: - return self.sess.run(self.y_pred, feed_dict=batch)[:, 1] - - def _train_on_batch(self, batch: Dict, y: List[int]) -> float: - """ - The function is for formatting of feed_dict used as an input for a model - Args: - batch (Dict): feed_dict that contains a batch with inputs for a model (except ground truth labels) - y (List(int)): list of ground truth labels - - Returns: - float: value of mean loss on the batch - """ - batch.update({self.y_true: np.array(y)}) - return self.sess.run([self.loss, self.train_op], feed_dict=batch)[0] # return the first item aka loss diff --git a/deeppavlov/models/slotfill/__init__.py b/deeppavlov/models/slotfill/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/slotfill/slotfill.py b/deeppavlov/models/slotfill/slotfill.py deleted file mode 100644 index 2a8b4c047d..0000000000 --- a/deeppavlov/models/slotfill/slotfill.py +++ /dev/null @@ -1,130 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import json -from logging import getLogger - -from rapidfuzz import process -from overrides import overrides - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.utils import download -from deeppavlov.core.models.component import Component -from deeppavlov.core.models.serializable import Serializable - -log = getLogger(__name__) - - -@register('dstc_slotfilling') -class DstcSlotFillingNetwork(Component, Serializable): - """Slot filling for DSTC2 task with neural network""" - - def __init__(self, threshold: float = 0.8, **kwargs): - super().__init__(**kwargs) - self.threshold = threshold - self._slot_vals = None - # Check existance of file with slots, slot values, and corrupted (misspelled) slot values - self.load() - - @overrides - def __call__(self, tokens_batch, tags_batch, *args, **kwargs): - slots = [{}] * len(tokens_batch) - m = [i for i, v in enumerate(tokens_batch) if v] - if m: - tags_batch = [tags_batch[i] for i in m] - tokens_batch = [tokens_batch[i] for i in m] - for i, tokens, tags in zip(m, tokens_batch, tags_batch): - slots[i] = self.predict_slots(tokens, tags) - return slots - - def predict_slots(self, tokens, tags): - # For utterance extract named entities and perform normalization for slot filling - - entities, slots = self._chunk_finder(tokens, tags) - slot_values = {} - for entity, slot in zip(entities, slots): - match, score = self.ner2slot(entity, slot) - if score >= self.threshold * 100: - slot_values[slot] = match - return slot_values - - def ner2slot(self, input_entity, slot): - # Given named entity return normalized slot value - if isinstance(input_entity, list): - input_entity = ' '.join(input_entity) - entities = [] - normalized_slot_vals = [] - for entity_name in self._slot_vals[slot]: - # todo log missing keys - for entity in self._slot_vals[slot][entity_name]: - # todo log missing keys - entities.append(entity) - normalized_slot_vals.append(entity_name) - best_match, score = process.extract(input_entity, entities, limit=2 ** 20)[0] - return normalized_slot_vals[entities.index(best_match)], score - - @staticmethod - def _chunk_finder(tokens, tags): - # For BIO labeled sequence of tags extract all named entities form tokens - prev_tag = '' - chunk_tokens = [] - entities = [] - slots = [] - for token, tag in zip(tokens, tags): - curent_tag = tag.split('-')[-1].strip() - current_prefix = tag.split('-')[0] - if tag.startswith('B-'): - if len(chunk_tokens) > 0: - entities.append(' '.join(chunk_tokens)) - slots.append(prev_tag) - chunk_tokens = [] - chunk_tokens.append(token) - if current_prefix == 'I': - if curent_tag != prev_tag: - if len(chunk_tokens) > 0: - entities.append(' '.join(chunk_tokens)) - slots.append(prev_tag) - chunk_tokens = [] - else: - chunk_tokens.append(token) - if current_prefix == 'O': - if len(chunk_tokens) > 0: - entities.append(' '.join(chunk_tokens)) - slots.append(prev_tag) - chunk_tokens = [] - prev_tag = curent_tag - if len(chunk_tokens) > 0: - entities.append(' '.join(chunk_tokens)) - slots.append(prev_tag) - return entities, slots - - def _download_slot_vals(self): - url = 'http://files.deeppavlov.ai/datasets/dstc_slot_vals.json' - download(self.save_path, url) - - def save(self, *args, **kwargs): - with open(self.save_path, 'w', encoding='utf8') as f: - json.dump(self._slot_vals, f) - - def serialize(self): - return json.dumps(self._slot_vals) - - def load(self, *args, **kwargs): - if not self.load_path.exists(): - self._download_slot_vals() - with open(self.load_path, encoding='utf8') as f: - 
self._slot_vals = json.load(f) - - def deserialize(self, data): - self._slot_vals = json.loads(data) diff --git a/deeppavlov/models/slotfill/slotfill_raw.py b/deeppavlov/models/slotfill/slotfill_raw.py deleted file mode 100644 index 39c7dff097..0000000000 --- a/deeppavlov/models/slotfill/slotfill_raw.py +++ /dev/null @@ -1,181 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import json -import tempfile -from collections import defaultdict -from logging import getLogger -from math import exp - -from pathlib import Path -from overrides import overrides - -from deeppavlov.core.common.file import read_yaml -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component -from deeppavlov.core.models.serializable import Serializable -from deeppavlov.dataset_readers.md_yaml_dialogs_reader import MD_YAML_DialogsDatasetReader, DomainKnowledge - -log = getLogger(__name__) - - -@register('slotfill_raw') -class SlotFillingComponent(Component, Serializable): - """Slot filling using Fuzzy search""" - - def __init__(self, threshold: float = 0.7, return_all: bool = False, **kwargs): - super().__init__(**kwargs) - self.threshold = threshold - self.return_all = return_all - # self._slot_vals is the dictionary of slot values - self._slot_vals = None - self.load() - - @overrides - def __call__(self, batch, *args, **kwargs): - slots = [{}] * len(batch) - - m = [i for i, v in enumerate(batch) if v] - if m: - batch = [batch[i] for i in m] - # tags_batch = self._ner_network.predict_for_token_batch(batch) - # batch example: [['is', 'there', 'anything', 'else']] - for i, tokens in zip(m, batch): - # tokens are['is', 'there', 'anything', 'else'] - slots_values_lists = self._predict_slots(tokens) - if self.return_all: - slots[i] = dict(slots_values_lists) - else: - slots[i] = {slot: val_list[0] for slot, val_list in slots_values_lists.items()} - # slots[i] example {'food': 'steakhouse'} - # slots we want, example : [{'pricerange': 'moderate', 'area': 'south'}] - return slots - - def _predict_slots(self, tokens): - # For utterance extract named entities and perform normalization for slot filling - entities, slots = self._fuzzy_finder(self._slot_vals, tokens) - slot_values = defaultdict(list) - for entity, slot in zip(entities, slots): - slot_values[slot].append(entity) - return slot_values - - def load(self, *args, **kwargs): - with open(self.load_path, encoding='utf8') as f: - self._slot_vals = json.load(f) - - def deserialize(self, data): - self._slot_vals = json.loads(data) - - def save(self): - with open(self.save_path, 'w', encoding='utf8') as f: - json.dump(self._slot_vals, f) - - def serialize(self): - return json.dumps(self._slot_vals) - - def _fuzzy_finder(self, slot_dict, tokens): - global input_entity - if isinstance(tokens, list): - input_entity = ' '.join(tokens) - entities = [] - slots = [] - for slot, tag_dict in slot_dict.items(): - candidates = self.get_candidate(input_entity, tag_dict, 
self.get_ratio) - for candidate in candidates: - if candidate not in entities: - entities.append(candidate) - slots.append(slot) - return entities, slots - - def get_candidate(self, input_text, tag_dict, score_function): - candidates = [] - positions = [] - for entity_name, entity_list in tag_dict.items(): - for entity in entity_list: - ratio, j = score_function(entity.lower(), input_text.lower()) - if ratio >= self.threshold: - candidates.append(entity_name) - positions.append(j) - if candidates: - _, candidates = list(zip(*sorted(zip(positions, candidates)))) - return candidates - - def get_ratio(self, needle, haystack): - d, j = self.fuzzy_substring_distance(needle, haystack) - m = len(needle) - d - return exp(-d / 5) * (m / len(needle)), j - - @staticmethod - def fuzzy_substring_distance(needle, haystack): - """Calculates the fuzzy match of needle in haystack, - using a modified version of the Levenshtein distance - algorithm. - The function is modified from the Levenshtein function - in the bktree module by Adam Hupp - :type needle: string - :type haystack: string""" - m, n = len(needle), len(haystack) - - # base cases - if m == 1: - not_found = needle not in haystack - not_found = float(not_found) # float required by the method usage - occurrence_ix = 0 if not_found else haystack.index(needle) - return not_found, occurrence_ix - if not n: - return m - - row1 = [0] * (n + 1) - for j in range(0, n + 1): - if j == 0 or not haystack[j - 1].isalnum(): - row1[j] = 0 - else: - row1[j] = row1[j - 1] + 1 - - for i in range(0, m): - row2 = [i + 1] - for j in range(0, n): - cost = (needle[i] != haystack[j]) - row2.append(min(row1[j + 1] + 1, row2[j] + 1, row1[j] + cost)) - row1 = row2 - - d = n + m - j_min = 0 - for j in range(0, n + 1): - if j == 0 or j == n or not haystack[j].isalnum(): - if d > row1[j]: - d = row1[j] - j_min = j - # d = min(d, row1[j]) - return d, j_min - - -@register('slotfill_raw_rasa') -class RASA_SlotFillingComponent(SlotFillingComponent): - """wraps SlotFillingComponent so that it takes the slotfilling info from RASA configs""" - - def __init__(self, **kwargs): - super().__init__(**kwargs) - - def save(self): - pass - - def load(self, *args, **kwargs): - """reads the slotfilling info from RASA-styled dataset""" - domain_path = Path(self.load_path, MD_YAML_DialogsDatasetReader.DOMAIN_FNAME) - nlu_path = Path(self.load_path, MD_YAML_DialogsDatasetReader.NLU_FNAME) - domain_knowledge = DomainKnowledge(read_yaml(domain_path)) - # todo: rewrite MD_YAML_DialogsDatasetReader so that public methods are enough - _, slot_name2text2value = MD_YAML_DialogsDatasetReader._read_intent2text_mapping(nlu_path, domain_knowledge) - self._slot_vals = slot_name2text2value diff --git a/deeppavlov/models/squad/__init__.py b/deeppavlov/models/squad/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/squad/squad.py b/deeppavlov/models/squad/squad.py deleted file mode 100644 index 55c548cbf1..0000000000 --- a/deeppavlov/models/squad/squad.py +++ /dev/null @@ -1,326 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
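# A worked example (illustrative numbers) of the fuzzy score used by SlotFillingComponent.get_ratio
# above: the score decays exponentially with the edit distance d and linearly with the fraction of
# the needle that had to be edited; candidates below the threshold (0.7 by default) are discarded.
from math import exp

def fuzzy_score(d, needle_len):
    m = needle_len - d
    return exp(-d / 5) * (m / needle_len)

print(fuzzy_score(0, 5))            # exact match of a 5-character value -> 1.0
print(round(fuzzy_score(1, 5), 3))  # one edit -> ~0.655, below the default 0.7 threshold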
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger -from typing import List, Tuple - -import numpy as np -import tensorflow as tf - -from deeppavlov.core.common.check_gpu import check_gpu_existence -from deeppavlov.core.common.registry import register -from deeppavlov.core.layers.tf_layers import cudnn_bi_gru, variational_dropout -from deeppavlov.core.models.tf_model import LRScheduledTFModel -from deeppavlov.models.squad.utils import dot_attention, simple_attention, PtrNet, CudnnGRU, CudnnCompatibleGRU - -logger = getLogger(__name__) - - -@register('squad_model') -class SquadModel(LRScheduledTFModel): - """ - SquadModel predicts answer start and end position in given context by given question. - - High level architecture: - Word embeddings -> Contextual embeddings -> Question-Context Attention -> Self-attention -> Pointer Network - - If noans_token flag is True, then special noans_token is added to output of self-attention layer. - Pointer Network can select noans_token if there is no answer in given context. - - Parameters: - word_emb: pretrained word embeddings - char_emb: pretrained char embeddings - context_limit: max context length in tokens - question_limit: max question length in tokens - char_limit: max number of characters in token - char_hidden_size: hidden size of charRNN - encoder_hidden_size: hidden size of encoder RNN - attention_hidden_size: size of projection layer in attention - keep_prob: dropout keep probability - min_learning_rate: minimal learning rate, is used in learning rate decay - noans_token: boolean, flags whether to use special no_ans token to make model able not to answer on question - """ - - def __init__(self, word_emb: np.ndarray, char_emb: np.ndarray, context_limit: int = 450, question_limit: int = 150, - char_limit: int = 16, train_char_emb: bool = True, char_hidden_size: int = 100, - encoder_hidden_size: int = 75, attention_hidden_size: int = 75, keep_prob: float = 0.7, - min_learning_rate: float = 0.001, noans_token: bool = False, **kwargs) -> None: - super().__init__(**kwargs) - - self.init_word_emb = word_emb - self.init_char_emb = char_emb - self.context_limit = context_limit - self.question_limit = question_limit - self.char_limit = char_limit - self.train_char_emb = train_char_emb - self.char_hidden_size = char_hidden_size - self.hidden_size = encoder_hidden_size - self.attention_hidden_size = attention_hidden_size - self.keep_prob = keep_prob - self.min_learning_rate = min_learning_rate - self.noans_token = noans_token - - self.word_emb_dim = self.init_word_emb.shape[1] - self.char_emb_dim = self.init_char_emb.shape[1] - - self.last_impatience = 0 - self.lr_impatience = 0 - - if check_gpu_existence(): - self.GRU = CudnnGRU - else: - self.GRU = CudnnCompatibleGRU - - self.sess_config = tf.ConfigProto(allow_soft_placement=True) - self.sess_config.gpu_options.allow_growth = True - self.sess = tf.Session(config=self.sess_config) - - self._init_graph() - - self._init_optimizer() - - self.sess.run(tf.global_variables_initializer()) - - # Try to load the model (if there are some model files the model will be loaded from them) - if self.load_path is not None: - 
self.load() - - def _init_graph(self): - self._init_placeholders() - - self.word_emb = tf.get_variable("word_emb", initializer=tf.constant(self.init_word_emb, dtype=tf.float32), - trainable=False) - self.char_emb = tf.get_variable("char_emb", initializer=tf.constant(self.init_char_emb, dtype=tf.float32), - trainable=self.train_char_emb) - - self.c_mask = tf.cast(self.c_ph, tf.bool) - self.q_mask = tf.cast(self.q_ph, tf.bool) - self.c_len = tf.reduce_sum(tf.cast(self.c_mask, tf.int32), axis=1) - self.q_len = tf.reduce_sum(tf.cast(self.q_mask, tf.int32), axis=1) - - bs = tf.shape(self.c_ph)[0] - self.c_maxlen = tf.reduce_max(self.c_len) - self.q_maxlen = tf.reduce_max(self.q_len) - self.c = tf.slice(self.c_ph, [0, 0], [bs, self.c_maxlen]) - self.q = tf.slice(self.q_ph, [0, 0], [bs, self.q_maxlen]) - self.c_mask = tf.slice(self.c_mask, [0, 0], [bs, self.c_maxlen]) - self.q_mask = tf.slice(self.q_mask, [0, 0], [bs, self.q_maxlen]) - self.cc = tf.slice(self.cc_ph, [0, 0, 0], [bs, self.c_maxlen, self.char_limit]) - self.qc = tf.slice(self.qc_ph, [0, 0, 0], [bs, self.q_maxlen, self.char_limit]) - self.cc_len = tf.reshape(tf.reduce_sum(tf.cast(tf.cast(self.cc, tf.bool), tf.int32), axis=2), [-1]) - self.qc_len = tf.reshape(tf.reduce_sum(tf.cast(tf.cast(self.qc, tf.bool), tf.int32), axis=2), [-1]) - # to remove char sequences with len equal zero (padded tokens) - self.cc_len = tf.maximum(tf.ones_like(self.cc_len), self.cc_len) - self.qc_len = tf.maximum(tf.ones_like(self.qc_len), self.qc_len) - self.y1 = tf.one_hot(self.y1_ph, depth=self.context_limit) - self.y2 = tf.one_hot(self.y2_ph, depth=self.context_limit) - self.y1 = tf.slice(self.y1, [0, 0], [bs, self.c_maxlen]) - self.y2 = tf.slice(self.y2, [0, 0], [bs, self.c_maxlen]) - - if self.noans_token: - # we use additional 'no answer' token to allow model not to answer on question - # later we will add 'no answer' token as first token in context question-aware representation - self.y1 = tf.one_hot(self.y1_ph, depth=self.context_limit + 1) - self.y2 = tf.one_hot(self.y2_ph, depth=self.context_limit + 1) - self.y1 = tf.slice(self.y1, [0, 0], [bs, self.c_maxlen + 1]) - self.y2 = tf.slice(self.y2, [0, 0], [bs, self.c_maxlen + 1]) - - with tf.variable_scope("emb"): - with tf.variable_scope("char"): - cc_emb = tf.reshape(tf.nn.embedding_lookup(self.char_emb, self.cc), - [bs * self.c_maxlen, self.char_limit, self.char_emb_dim]) - qc_emb = tf.reshape(tf.nn.embedding_lookup(self.char_emb, self.qc), - [bs * self.q_maxlen, self.char_limit, self.char_emb_dim]) - - cc_emb = variational_dropout(cc_emb, keep_prob=self.keep_prob_ph) - qc_emb = variational_dropout(qc_emb, keep_prob=self.keep_prob_ph) - - _, (state_fw, state_bw) = cudnn_bi_gru(cc_emb, self.char_hidden_size, seq_lengths=self.cc_len, - trainable_initial_states=True) - cc_emb = tf.concat([state_fw, state_bw], axis=1) - - _, (state_fw, state_bw) = cudnn_bi_gru(qc_emb, self.char_hidden_size, seq_lengths=self.qc_len, - trainable_initial_states=True, - reuse=True) - qc_emb = tf.concat([state_fw, state_bw], axis=1) - - cc_emb = tf.reshape(cc_emb, [bs, self.c_maxlen, 2 * self.char_hidden_size]) - qc_emb = tf.reshape(qc_emb, [bs, self.q_maxlen, 2 * self.char_hidden_size]) - - with tf.name_scope("word"): - c_emb = tf.nn.embedding_lookup(self.word_emb, self.c) - q_emb = tf.nn.embedding_lookup(self.word_emb, self.q) - - c_emb = tf.concat([c_emb, cc_emb], axis=2) - q_emb = tf.concat([q_emb, qc_emb], axis=2) - - with tf.variable_scope("encoding"): - rnn = self.GRU(num_layers=3, num_units=self.hidden_size, 
batch_size=bs, - input_size=c_emb.get_shape().as_list()[-1], - keep_prob=self.keep_prob_ph) - c = rnn(c_emb, seq_len=self.c_len) - q = rnn(q_emb, seq_len=self.q_len) - - with tf.variable_scope("attention"): - qc_att = dot_attention(c, q, mask=self.q_mask, att_size=self.attention_hidden_size, - keep_prob=self.keep_prob_ph) - rnn = self.GRU(num_layers=1, num_units=self.hidden_size, batch_size=bs, - input_size=qc_att.get_shape().as_list()[-1], keep_prob=self.keep_prob_ph) - att = rnn(qc_att, seq_len=self.c_len) - - with tf.variable_scope("match"): - self_att = dot_attention(att, att, mask=self.c_mask, att_size=self.attention_hidden_size, - keep_prob=self.keep_prob_ph) - rnn = self.GRU(num_layers=1, num_units=self.hidden_size, batch_size=bs, - input_size=self_att.get_shape().as_list()[-1], keep_prob=self.keep_prob_ph) - match = rnn(self_att, seq_len=self.c_len) - - with tf.variable_scope("pointer"): - init = simple_attention(q, self.hidden_size, mask=self.q_mask, keep_prob=self.keep_prob_ph) - pointer = PtrNet(cell_size=init.get_shape().as_list()[-1], keep_prob=self.keep_prob_ph) - if self.noans_token: - noans_token = tf.Variable(tf.random_uniform((match.get_shape().as_list()[-1],), -0.1, 0.1), tf.float32) - noans_token = tf.nn.dropout(noans_token, keep_prob=self.keep_prob_ph) - noans_token = tf.expand_dims(tf.tile(tf.expand_dims(noans_token, axis=0), [bs, 1]), axis=1) - match = tf.concat([noans_token, match], axis=1) - self.c_mask = tf.concat([tf.ones(shape=(bs, 1), dtype=tf.bool), self.c_mask], axis=1) - logits1, logits2 = pointer(init, match, self.hidden_size, self.c_mask) - - with tf.variable_scope("predict"): - max_ans_length = tf.cast(tf.minimum(15, self.c_maxlen), tf.int64) - outer_logits = tf.exp(tf.expand_dims(logits1, axis=2) + tf.expand_dims(logits2, axis=1)) - outer_logits = tf.matrix_band_part(outer_logits, 0, max_ans_length) - outer = tf.matmul(tf.expand_dims(tf.nn.softmax(logits1), axis=2), - tf.expand_dims(tf.nn.softmax(logits2), axis=1)) - outer = tf.matrix_band_part(outer, 0, max_ans_length) - self.yp1 = tf.argmax(tf.reduce_max(outer, axis=2), axis=1) - self.yp2 = tf.argmax(tf.reduce_max(outer, axis=1), axis=1) - self.yp_logits = tf.reduce_max(tf.reduce_max(outer_logits, axis=2), axis=1) - if self.noans_token: - self.yp_score = 1 - tf.nn.softmax(logits1)[:, 0] * tf.nn.softmax(logits2)[:, 0] - loss_1 = tf.nn.softmax_cross_entropy_with_logits(logits=logits1, labels=self.y1) - loss_2 = tf.nn.softmax_cross_entropy_with_logits(logits=logits2, labels=self.y2) - self.loss = tf.reduce_mean(loss_1 + loss_2) - - def _init_placeholders(self): - self.c_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='c_ph') - self.cc_ph = tf.placeholder(shape=(None, None, self.char_limit), dtype=tf.int32, name='cc_ph') - self.q_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='q_ph') - self.qc_ph = tf.placeholder(shape=(None, None, self.char_limit), dtype=tf.int32, name='qc_ph') - self.y1_ph = tf.placeholder(shape=(None,), dtype=tf.int32, name='y1_ph') - self.y2_ph = tf.placeholder(shape=(None,), dtype=tf.int32, name='y2_ph') - - self.lear_rate_ph = tf.placeholder_with_default(0.0, shape=[], name='learning_rate') - self.keep_prob_ph = tf.placeholder_with_default(1.0, shape=[], name='keep_prob_ph') - self.is_train_ph = tf.placeholder_with_default(False, shape=[], name='is_train_ph') - - def _init_optimizer(self): - with tf.variable_scope('Optimizer'): - self.global_step = tf.get_variable('global_step', shape=[], dtype=tf.int32, - initializer=tf.constant_initializer(0), 
trainable=False) - self.train_op = self.get_train_op(self.loss, learning_rate=self.lear_rate_ph) - - def _build_feed_dict(self, c_tokens, c_chars, q_tokens, q_chars, y1=None, y2=None): - feed_dict = { - self.c_ph: c_tokens, - self.cc_ph: c_chars, - self.q_ph: q_tokens, - self.qc_ph: q_chars, - } - if y1 is not None and y2 is not None: - feed_dict.update({ - self.y1_ph: y1, - self.y2_ph: y2, - self.lear_rate_ph: max(self.get_learning_rate(), self.min_learning_rate), - self.keep_prob_ph: self.keep_prob, - self.is_train_ph: True, - }) - - return feed_dict - - def train_on_batch(self, c_tokens: np.ndarray, c_chars: np.ndarray, q_tokens: np.ndarray, q_chars: np.ndarray, - y1s: Tuple[List[int], ...], y2s: Tuple[List[int], ...]) -> float: - """ - This method is called by trainer to make one training step on one batch. - - Args: - c_tokens: batch of tokenized contexts - c_chars: batch of tokenized contexts, each token split on chars - q_tokens: batch of tokenized questions - q_chars: batch of tokenized questions, each token split on chars - y1s: batch of ground truth answer start positions - y2s: batch of ground truth answer end positions - - Returns: - value of loss function on batch - """ - # TODO: filter examples in batches with answer position greater self.context_limit - # select one answer from list of correct answers - y1s = np.array([x[0] for x in y1s]) - y2s = np.array([x[0] for x in y2s]) - if self.noans_token: - noans_mask = ((y1s != -1) * (y2s != -1)) - y1s = (y1s + 1) * noans_mask - y2s = (y2s + 1) * noans_mask - - feed_dict = self._build_feed_dict(c_tokens, c_chars, q_tokens, q_chars, y1s, y2s) - loss, _, lear_rate = self.sess.run([self.loss, self.train_op, self.lear_rate_ph], - feed_dict=feed_dict) - report = {'loss': loss, 'learning_rate': float(lear_rate), 'momentum': self.get_momentum()} - return report - - def __call__(self, c_tokens: np.ndarray, c_chars: np.ndarray, q_tokens: np.ndarray, q_chars: np.ndarray, - *args, **kwargs) -> Tuple[np.ndarray, np.ndarray, List[float]]: - """ - Predicts answer start and end positions by given context and question. - - Args: - c_tokens: batch of tokenized contexts - c_chars: batch of tokenized contexts, each token split on chars - q_tokens: batch of tokenized questions - q_chars: batch of tokenized questions, each token split on chars - - Returns: - answer_start, answer_end positions, answer logits which represent models confidence - """ - if any(np.sum(c_tokens, axis=-1) == 0) or any(np.sum(q_tokens, axis=-1) == 0): - logger.info('SQuAD model: Warning! Empty question or context was found.') - noanswers = -np.ones(shape=(c_tokens.shape[0]), dtype=np.int32) - zero_probs = np.zeros(shape=(c_tokens.shape[0]), dtype=np.float32) - if self.noans_token: - return noanswers, noanswers, zero_probs, zero_probs - return noanswers, noanswers, zero_probs - - feed_dict = self._build_feed_dict(c_tokens, c_chars, q_tokens, q_chars) - - if self.noans_token: - yp1, yp2, logits, score = self.sess.run([self.yp1, self.yp2, self.yp_logits, self.yp_score], - feed_dict=feed_dict) - noans_mask = (yp1 * yp2).astype(bool) - yp1 = yp1 * noans_mask - 1 - yp2 = yp2 * noans_mask - 1 - return yp1, yp2, logits.tolist(), score.tolist() - - yp1, yp2, logits = self.sess.run([self.yp1, self.yp2, self.yp_logits], feed_dict=feed_dict) - return yp1, yp2, logits.tolist() - - def process_event(self, event_name: str, data) -> None: - """ - Processes events sent by trainer. Implements learning rate decay. 
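# A minimal NumPy sketch (not from the original module) of the answer-span selection performed in
# the "predict" scope above: the joint probability of a span is the outer product of the start and
# end distributions, restricted to spans where the end does not precede the start and the length
# does not exceed max_ans_length. p_start and p_end stand for softmax outputs over one context.
import numpy as np

def select_span(p_start, p_end, max_ans_length=15):
    outer = np.outer(p_start, p_end)           # outer[i, j] = p_start[i] * p_end[j]
    outer = np.triu(outer)                     # the end position must not precede the start
    outer = np.tril(outer, k=max_ans_length)   # limit the answer length
    start, end = np.unravel_index(outer.argmax(), outer.shape)
    return start, end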
- - Args: - event_name: event_name sent by trainer - data: number of examples, epochs, metrics sent by trainer - """ - super().process_event(event_name, data) diff --git a/deeppavlov/models/squad/utils.py b/deeppavlov/models/squad/utils.py deleted file mode 100644 index d9ac2e4d92..0000000000 --- a/deeppavlov/models/squad/utils.py +++ /dev/null @@ -1,214 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - - -import tensorflow as tf - - -class CudnnGRU: - def __init__(self, num_layers, num_units, batch_size, input_size, keep_prob=1.0): - self.num_layers = num_layers - self.grus = [] - self.inits = [] - self.dropout_mask = [] - for layer in range(num_layers): - input_size_ = input_size if layer == 0 else 2 * num_units - gru_fw = tf.contrib.cudnn_rnn.CudnnGRU(num_layers=1, num_units=num_units) - gru_bw = tf.contrib.cudnn_rnn.CudnnGRU(num_layers=1, num_units=num_units) - - init_fw = tf.Variable(tf.zeros([num_units])) - init_fw = tf.expand_dims(tf.tile(tf.expand_dims(init_fw, axis=0), [batch_size, 1]), axis=0) - init_bw = tf.Variable(tf.zeros([num_units])) - init_bw = tf.expand_dims(tf.tile(tf.expand_dims(init_bw, axis=0), [batch_size, 1]), axis=0) - - mask_fw = tf.nn.dropout(tf.ones([1, batch_size, input_size_], dtype=tf.float32), - keep_prob=keep_prob) - mask_bw = tf.nn.dropout(tf.ones([1, batch_size, input_size_], dtype=tf.float32), - keep_prob=keep_prob) - - self.grus.append((gru_fw, gru_bw,)) - self.inits.append((init_fw, init_bw,)) - self.dropout_mask.append((mask_fw, mask_bw,)) - - def __call__(self, inputs, seq_len, keep_prob=1.0, is_train=None, concat_layers=True): - outputs = [tf.transpose(inputs, [1, 0, 2])] - for layer in range(self.num_layers): - gru_fw, gru_bw = self.grus[layer] - init_fw, init_bw = self.inits[layer] - mask_fw, mask_bw = self.dropout_mask[layer] - with tf.variable_scope('fw_{}'.format(layer), reuse=tf.AUTO_REUSE): - out_fw, _ = gru_fw(outputs[-1] * mask_fw, (init_fw,)) - with tf.variable_scope('bw_{}'.format(layer), reuse=tf.AUTO_REUSE): - inputs_bw = tf.reverse_sequence( - outputs[-1] * mask_bw, seq_lengths=seq_len, seq_dim=0, batch_dim=1) - out_bw, _ = gru_bw(inputs_bw, (init_bw,)) - out_bw = tf.reverse_sequence( - out_bw, seq_lengths=seq_len, seq_dim=0, batch_dim=1) - outputs.append(tf.concat([out_fw, out_bw], axis=2)) - if concat_layers: - res = tf.concat(outputs[1:], axis=2) - else: - res = outputs[-1] - res = tf.transpose(res, [1, 0, 2]) - return res - - -class CudnnCompatibleGRU: - def __init__(self, num_layers, num_units, batch_size, input_size, keep_prob=1.0): - self.num_layers = num_layers - self.grus = [] - self.inits = [] - self.dropout_mask = [] - for layer in range(num_layers): - input_size_ = input_size if layer == 0 else 2 * num_units - gru_fw = tf.nn.rnn_cell.MultiRNNCell([ - tf.contrib.cudnn_rnn.CudnnCompatibleGRUCell(num_units=num_units)]) - - gru_bw = tf.nn.rnn_cell.MultiRNNCell([ - tf.contrib.cudnn_rnn.CudnnCompatibleGRUCell(num_units=num_units)]) - - init_fw = 
tf.Variable(tf.zeros([num_units])) - init_fw = tf.expand_dims(tf.tile(tf.expand_dims(init_fw, axis=0), [batch_size, 1]), axis=0) - init_bw = tf.Variable(tf.zeros([num_units])) - init_bw = tf.expand_dims(tf.tile(tf.expand_dims(init_bw, axis=0), [batch_size, 1]), axis=0) - - mask_fw = tf.nn.dropout(tf.ones([1, batch_size, input_size_], dtype=tf.float32), - keep_prob=keep_prob) - mask_bw = tf.nn.dropout(tf.ones([1, batch_size, input_size_], dtype=tf.float32), - keep_prob=keep_prob) - - self.grus.append((gru_fw, gru_bw,)) - self.inits.append((init_fw, init_bw,)) - self.dropout_mask.append((mask_fw, mask_bw,)) - - def __call__(self, inputs, seq_len, keep_prob=1.0, is_train=None, concat_layers=True): - outputs = [tf.transpose(inputs, [1, 0, 2])] - for layer in range(self.num_layers): - gru_fw, gru_bw = self.grus[layer] - init_fw, init_bw = self.inits[layer] - mask_fw, mask_bw = self.dropout_mask[layer] - with tf.variable_scope('fw_{}'.format(layer), reuse=tf.AUTO_REUSE): - with tf.variable_scope('cudnn_gru', reuse=tf.AUTO_REUSE): - out_fw, _ = tf.nn.dynamic_rnn(cell=gru_fw, inputs=outputs[-1] * mask_fw, time_major=True, - initial_state=tuple(tf.unstack(init_fw, axis=0))) - - with tf.variable_scope('bw_{}'.format(layer), reuse=tf.AUTO_REUSE): - with tf.variable_scope('cudnn_gru', reuse=tf.AUTO_REUSE): - inputs_bw = tf.reverse_sequence( - outputs[-1] * mask_bw, seq_lengths=seq_len, seq_dim=0, batch_dim=1) - out_bw, _ = tf.nn.dynamic_rnn(cell=gru_bw, inputs=inputs_bw, time_major=True, - initial_state=tuple(tf.unstack(init_bw, axis=0))) - out_bw = tf.reverse_sequence( - out_bw, seq_lengths=seq_len, seq_dim=0, batch_dim=1) - - outputs.append(tf.concat([out_fw, out_bw], axis=2)) - if concat_layers: - res = tf.concat(outputs[1:], axis=2) - else: - res = outputs[-1] - res = tf.transpose(res, [1, 0, 2]) - return res - - -class PtrNet: - def __init__(self, cell_size, keep_prob=1.0, scope="ptr_net"): - self.gru = tf.nn.rnn_cell.GRUCell(cell_size) - self.scope = scope - self.keep_prob = keep_prob - - def __call__(self, init, match, hidden_size, mask): - with tf.variable_scope(self.scope): - BS, ML, MH = tf.unstack(tf.shape(match)) - BS, IH = tf.unstack(tf.shape(init)) - match_do = tf.nn.dropout(match, keep_prob=self.keep_prob, noise_shape=[BS, 1, MH]) - dropout_mask = tf.nn.dropout(tf.ones([BS, IH], dtype=tf.float32), keep_prob=self.keep_prob) - inp, logits1 = attention(match_do, init * dropout_mask, hidden_size, mask) - inp_do = tf.nn.dropout(inp, keep_prob=self.keep_prob) - _, state = self.gru(inp_do, init) - tf.get_variable_scope().reuse_variables() - _, logits2 = attention(match_do, state * dropout_mask, hidden_size, mask) - return logits1, logits2 - - -def dot_attention(inputs, memory, mask, att_size, keep_prob=1.0, scope="dot_attention"): - """Computes attention vector for each item in inputs: - attention vector is a weighted sum of memory items. - Dot product between input and memory vector is used as similarity measure. - - Gate mechanism is applied to attention vectors to produce output. 
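The removed `dot_attention` helper computes scaled dot-product attention over the memory and then gates the concatenated input and attended memory with a sigmoid. A rough PyTorch sketch of the same idea; the module and argument names below are invented for illustration and are not part of this patch:

.. code-block:: python

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GatedDotAttention(nn.Module):
        """Scaled dot-product attention over `memory` plus a sigmoid gate (illustrative sketch)."""

        def __init__(self, input_size: int, memory_size: int, att_size: int):
            super().__init__()
            self.input_proj = nn.Linear(input_size, att_size, bias=False)
            self.memory_proj = nn.Linear(memory_size, att_size, bias=False)
            self.gate = nn.Linear(input_size + memory_size, input_size + memory_size, bias=False)

        def forward(self, inputs: torch.Tensor, memory: torch.Tensor, memory_mask: torch.Tensor) -> torch.Tensor:
            # similarity between projected inputs and projected memory, scaled by sqrt(att_size)
            logits = torch.bmm(F.relu(self.input_proj(inputs)),
                               F.relu(self.memory_proj(memory)).transpose(1, 2))
            logits = logits / self.input_proj.out_features ** 0.5
            # mask out padded memory positions before the softmax
            logits = logits.masked_fill(~memory_mask.unsqueeze(1).bool(), -1e30)
            weights = torch.softmax(logits, dim=-1)
            attended = torch.bmm(weights, memory)                 # weighted sum of memory items
            combined = torch.cat([inputs, attended], dim=-1)      # concat inputs with attended memory
            return combined * torch.sigmoid(self.gate(combined))  # gate, as in the removed dot_attention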
- - Args: - inputs: Tensor [batch_size x input_len x feature_size] - memory: Tensor [batch_size x memory_len x feature_size] - mask: inputs mask - att_size: hidden size of attention - keep_prob: dropout keep_prob - scope: - - Returns: - attention vectors [batch_size x input_len x (feature_size + feature_size)] - - """ - with tf.variable_scope(scope): - BS, IL, IH = tf.unstack(tf.shape(inputs)) - BS, ML, MH = tf.unstack(tf.shape(memory)) - - d_inputs = tf.nn.dropout(inputs, keep_prob=keep_prob, noise_shape=[BS, 1, IH]) - d_memory = tf.nn.dropout(memory, keep_prob=keep_prob, noise_shape=[BS, 1, MH]) - - with tf.variable_scope("attention"): - inputs_att = tf.layers.dense(d_inputs, att_size, use_bias=False, activation=tf.nn.relu) - memory_att = tf.layers.dense(d_memory, att_size, use_bias=False, activation=tf.nn.relu) - logits = tf.matmul(inputs_att, tf.transpose(memory_att, [0, 2, 1])) / (att_size ** 0.5) - mask = tf.tile(tf.expand_dims(mask, axis=1), [1, IL, 1]) - att_weights = tf.nn.softmax(softmax_mask(logits, mask)) - outputs = tf.matmul(att_weights, memory) - res = tf.concat([inputs, outputs], axis=2) - - with tf.variable_scope("gate"): - dim = res.get_shape().as_list()[-1] - d_res = tf.nn.dropout(res, keep_prob=keep_prob, noise_shape=[BS, 1, IH + MH]) - gate = tf.layers.dense(d_res, dim, use_bias=False, activation=tf.nn.sigmoid) - return res * gate - - -def simple_attention(memory, att_size, mask, keep_prob=1.0, scope="simple_attention"): - """Simple attention without any conditions. - - Computes weighted sum of memory elements. - """ - with tf.variable_scope(scope): - BS, ML, MH = tf.unstack(tf.shape(memory)) - memory_do = tf.nn.dropout(memory, keep_prob=keep_prob, noise_shape=[BS, 1, MH]) - logits = tf.layers.dense(tf.layers.dense(memory_do, att_size, activation=tf.nn.tanh), 1, use_bias=False) - logits = softmax_mask(tf.squeeze(logits, [2]), mask) - att_weights = tf.expand_dims(tf.nn.softmax(logits), axis=2) - res = tf.reduce_sum(att_weights * memory, axis=1) - return res - - -def attention(inputs, state, att_size, mask, scope="attention"): - """Computes weighted sum of inputs conditioned on state""" - with tf.variable_scope(scope): - u = tf.concat([tf.tile(tf.expand_dims(state, axis=1), [1, tf.shape(inputs)[1], 1]), inputs], axis=2) - logits = tf.layers.dense(tf.layers.dense(u, att_size, activation=tf.nn.tanh), 1, use_bias=False) - logits = softmax_mask(tf.squeeze(logits, [2]), mask) - att_weights = tf.expand_dims(tf.nn.softmax(logits), axis=2) - res = tf.reduce_sum(att_weights * inputs, axis=1) - return res, logits - - -def softmax_mask(val, mask): - INF = 1e30 - return -INF * (1 - tf.cast(mask, tf.float32)) + val diff --git a/deeppavlov/models/syntax_parser/__init__.py b/deeppavlov/models/syntax_parser/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/models/syntax_parser/joint.py b/deeppavlov/models/syntax_parser/joint.py deleted file mode 100644 index 78cf1244e9..0000000000 --- a/deeppavlov/models/syntax_parser/joint.py +++ /dev/null @@ -1,142 +0,0 @@ -from typing import Union, List - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component -from deeppavlov.core.common.chainer import Chainer - -from deeppavlov.models.morpho_tagger.common import TagOutputPrettifier,\ - LemmatizedOutputPrettifier, DependencyOutputPrettifier - - -UD_COLUMN_FEAT_MAPPING = {"id": 0, "word": 1, "lemma": 2, "upos": 3, "feats": 5, "head": 6, "deprel": 7} - - -@register("joint_tagger_parser") -class 
JointTaggerParser(Component): - """ - A class to perform joint morphological and syntactic parsing. - It is just a wrapper that calls the models for tagging and parsing - and comprises their results in a single output. - - Args: - tagger: the morphological tagger model (a :class:`~deeppavlov.core.common.chainer.Chainer` instance) - parser_path: the syntactic parser model (a :class:`~deeppavlov.core.common.chainer.Chainer` instance) - output_format: the output format, it may be either `ud` (alias: `conllu`) or `json`. - to_output_string: whether to convert the output to a list of strings - - Attributes: - tagger: a morphological tagger model (a :class:`~deeppavlov.core.common.chainer.Chainer` instance) - parser: a syntactic parser model (a :class:`~deeppavlov.core.common.chainer.Chainer` instance) - - """ - - def __init__(self, tagger: Chainer, parser: Chainer, - output_format: str = "ud", to_output_string: bool = False, - *args, **kwargs): - if output_format not in ["ud", "conllu", "json", "dict"]: - UserWarning("JointTaggerParser output_format can be only `ud`, `conllu` or `json`. "\ - "Unknown format: {}, setting the output_format to `ud`.".format(output_format)) - output_format = "ud" - self.output_format = output_format - self.to_output_string = to_output_string - self.tagger = tagger - self.parser = parser - self._check_models() - - def _check_models(self): - tagger_prettifier = self.tagger[-1] - if not isinstance(tagger_prettifier, (TagOutputPrettifier, LemmatizedOutputPrettifier)): - raise ValueError("The tagger should output prettified data: last component of the config " - "should be either a TagOutputPrettifier or a LemmatizedOutputPrettifier " - "instance.") - if isinstance(tagger_prettifier, TagOutputPrettifier): - tagger_prettifier.set_format_mode("ud") - tagger_prettifier.return_string = False - parser_prettifier = self.parser[-1] - if not isinstance(parser_prettifier, DependencyOutputPrettifier): - raise ValueError("The tagger should output prettified data: last component of the config " - "should be either a DependencyOutputPrettifier instance.") - parser_prettifier.return_string = False - - def __call__(self, data: Union[List[str], List[List[str]]])\ - -> Union[List[List[dict]], List[str], List[List[str]]]: - r"""Parses a batch of sentences. - - Args: - data: either a batch of tokenized sentences, or a batch of raw sentences - - Returns: - `answer`, a batch of parsed sentences. A sentence parse is a list of single word parses. - Each word parse is either a CoNLL-U-formatted string or a dictionary. - A sentence parse is returned either as is if ``self.to_output_string`` is ``False``, - or as a single string, where each word parse begins with a new string. - - .. code-block:: python - - >>> from deeppavlov.core.commands.infer import build_model - >>> model = build_model("ru_syntagrus_joint_parsing") - >>> batch = ["Девушка пела в церковном хоре.", "У этой задачи есть сложное решение."] - >>> print(*model(batch), sep="\\n\\n") - 1 Девушка девушка NOUN _ Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing 2 nsubj _ _ - 2 пела петь VERB _ Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act 0 root _ _ - 3 в в ADP _ _ 5 case _ _ - 4 церковном церковный ADJ _ Case=Loc|Degree=Pos|Gender=Masc|Number=Sing 5 amod _ _ - 5 хоре хор NOUN _ Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing 2 obl _ _ - 6 . . 
PUNCT _ _ 2 punct _ _ - - 1 У у ADP _ _ 3 case _ _ - 2 этой этот DET _ Case=Gen|Gender=Fem|Number=Sing 3 det _ _ - 3 задачи задача NOUN _ Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing 4 obl _ _ - 4 есть быть VERB _ Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act 0 root _ _ - 5 сложное сложный ADJ _ Case=Nom|Degree=Pos|Gender=Neut|Number=Sing 6 amod _ _ - 6 решение решение NOUN _ Animacy=Inan|Case=Nom|Gender=Neut|Number=Sing 4 nsubj _ _ - 7 . . PUNCT _ _ 4 punct _ _ - - >>> # Dirty hacks to change model parameters in the code, you should do it in the configuration file. - >>> model["main"].to_output_string = False - >>> model["main"].output_format = "json" - >>> for sent_parse in model(batch): - >>> for word_parse in sent_parse: - >>> print(word_parse) - >>> print("") - {'id': '1', 'word': 'Девушка', 'lemma': 'девушка', 'upos': 'NOUN', 'feats': 'Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing', 'head': '2', 'deprel': 'nsubj'} - {'id': '2', 'word': 'пела', 'lemma': 'петь', 'upos': 'VERB', 'feats': 'Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act', 'head': '0', 'deprel': 'root'} - {'id': '3', 'word': 'в', 'lemma': 'в', 'upos': 'ADP', 'feats': '_', 'head': '5', 'deprel': 'case'} - {'id': '4', 'word': 'церковном', 'lemma': 'церковный', 'upos': 'ADJ', 'feats': 'Case=Loc|Degree=Pos|Gender=Masc|Number=Sing', 'head': '5', 'deprel': 'amod'} - {'id': '5', 'word': 'хоре', 'lemma': 'хор', 'upos': 'NOUN', 'feats': 'Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing', 'head': '2', 'deprel': 'obl'} - {'id': '6', 'word': '.', 'lemma': '.', 'upos': 'PUNCT', 'feats': '_', 'head': '2', 'deprel': 'punct'} - - {'id': '1', 'word': 'У', 'lemma': 'у', 'upos': 'ADP', 'feats': '_', 'head': '3', 'deprel': 'case'} - {'id': '2', 'word': 'этой', 'lemma': 'этот', 'upos': 'DET', 'feats': 'Case=Gen|Gender=Fem|Number=Sing', 'head': '3', 'deprel': 'det'} - {'id': '3', 'word': 'задачи', 'lemma': 'задача', 'upos': 'NOUN', 'feats': 'Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing', 'head': '4', 'deprel': 'obl'} - {'id': '4', 'word': 'есть', 'lemma': 'быть', 'upos': 'VERB', 'feats': 'Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act', 'head': '0', 'deprel': 'root'} - {'id': '5', 'word': 'сложное', 'lemma': 'сложный', 'upos': 'ADJ', 'feats': 'Case=Nom|Degree=Pos|Gender=Neut|Number=Sing', 'head': '6', 'deprel': 'amod'} - {'id': '6', 'word': 'решение', 'lemma': 'решение', 'upos': 'NOUN', 'feats': 'Animacy=Inan|Case=Nom|Gender=Neut|Number=Sing', 'head': '4', 'deprel': 'nsubj'} - {'id': '7', 'word': '.', 'lemma': '.', 'upos': 'PUNCT', 'feats': '_', 'head': '4', 'deprel': 'punct'} - - """ - tagger_output = self.tagger(data) - parser_output = self.parser(data) - answer = [] - for i, (tagger_sent, parser_sent) in enumerate(zip(tagger_output, parser_output)): - curr_sent_answer = [] - for j, curr_word_tagger_output in enumerate(tagger_sent): - curr_word_tagger_output = curr_word_tagger_output.split("\t") - curr_word_parser_output = parser_sent[j].split("\t") - curr_word_answer = curr_word_tagger_output[:] - # setting parser output - curr_word_answer[6:8] = curr_word_parser_output[6:8] - if self.output_format in ["json", "dict"]: - curr_word_answer = {key: curr_word_answer[index] - for key, index in UD_COLUMN_FEAT_MAPPING.items()} - if self.to_output_string: - curr_word_answer = str(curr_word_answer) - elif self.to_output_string: - curr_word_answer = "\t".join(curr_word_answer) - curr_sent_answer.append(curr_word_answer) - if self.to_output_string: - curr_sent_answer = 
"\n".join(str(x) for x in curr_sent_answer) - answer.append(curr_sent_answer) - return answer - - diff --git a/deeppavlov/models/syntax_parser/network.py b/deeppavlov/models/syntax_parser/network.py deleted file mode 100644 index 094f00f11e..0000000000 --- a/deeppavlov/models/syntax_parser/network.py +++ /dev/null @@ -1,345 +0,0 @@ -# Copyright 2019 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -from logging import getLogger -from typing import List, Union, Tuple - -import numpy as np -import tensorflow as tf -import tensorflow.keras.backend as kb -from tensorflow.contrib.layers import xavier_initializer - -from deeppavlov.core.common.registry import register -from deeppavlov.core.data.utils import zero_pad -from deeppavlov.core.layers.tf_layers import bi_rnn -from deeppavlov.models.bert.bert_sequence_tagger import BertSequenceNetwork, token_from_subtoken - -log = getLogger(__name__) - - -def gather_indexes(A: tf.Tensor, B: tf.Tensor) -> tf.Tensor: - """ - Args: - A: a tensor with data - B: an integer tensor with indexes - - Returns: - `answer` a tensor such that ``answer[i, j] = A[i, B[i, j]]``. - In case `B` is one-dimensional, the output is ``answer[i] = A[i, B[i]]`` - - """ - are_indexes_one_dim = (kb.ndim(B) == 1) - if are_indexes_one_dim: - B = tf.expand_dims(B, -1) - first_dim_indexes = tf.expand_dims(tf.range(tf.shape(B)[0]), -1) - first_dim_indexes = tf.tile(first_dim_indexes, [1, tf.shape(B)[1]]) - indexes = tf.stack([first_dim_indexes, B], axis=-1) - answer = tf.gather_nd(A, indexes) - if are_indexes_one_dim: - answer = answer[:,0] - return answer - - -def biaffine_layer(deps: tf.Tensor, heads: tf.Tensor, deps_dim: int, - heads_dim: int, output_dim: int, name: str = "biaffine_layer") -> tf.Tensor: - """Implements a biaffine layer from [Dozat, Manning, 2016]. 
- - Args: - deps: the 3D-tensor of dependency states, - heads: the 3D-tensor of head states, - deps_dim: the dimension of dependency states, - heads_dim: the dimension of head_states, - output_dim: the output dimension - name: the name of a layer - - Returns: - `answer` the output 3D-tensor - - """ - input_shape = [kb.shape(deps)[i] for i in range(tf.keras.backend.ndim(deps))] - first_input = tf.reshape(deps, [-1, deps_dim]) # first_input.shape = (B*L, D1) - second_input = tf.reshape(heads, [-1, heads_dim]) # second_input.shape = (B*L, D2) - with tf.variable_scope(name): - kernel_shape = (deps_dim, heads_dim * output_dim) - kernel = tf.get_variable('kernel', shape=kernel_shape, initializer=xavier_initializer()) - first = tf.matmul(first_input, kernel) # (B*L, D2*H) - first = tf.reshape(first, [-1, heads_dim, output_dim]) # (B*L, D2, H) - answer = kb.batch_dot(first, second_input, axes=[1, 1]) # (B*L, H) - first_bias = tf.get_variable('first_bias', shape=(deps_dim, output_dim), - initializer=xavier_initializer()) - answer += tf.matmul(first_input, first_bias) - second_bias = tf.get_variable('second_bias', shape=(heads_dim, output_dim), - initializer=xavier_initializer()) - answer += tf.matmul(second_input, second_bias) - label_bias = tf.get_variable('label_bias', shape=(output_dim,), - initializer=xavier_initializer()) - answer = kb.bias_add(answer, label_bias) - answer = tf.reshape(answer, input_shape[:-1] + [output_dim]) # (B, L, H) - return answer - - -def biaffine_attention(deps: tf.Tensor, heads: tf.Tensor, name="biaffine_attention") -> tf.Tensor: - """Implements a trainable matching layer between two families of embeddings. - - Args: - deps: the 3D-tensor of dependency states, - heads: the 3D-tensor of head states, - name: the name of a layer - - Returns: - `answer` a 3D-tensor of pairwise scores between deps and heads - - """ - deps_dim_int = deps.get_shape().as_list()[-1] - heads_dim_int = heads.get_shape().as_list()[-1] - assert deps_dim_int == heads_dim_int - with tf.variable_scope(name): - kernel_shape = (deps_dim_int, heads_dim_int) - kernel = tf.get_variable('kernel', shape=kernel_shape, initializer=tf.initializers.identity()) - first_bias = tf.get_variable('first_bias', shape=(kernel_shape[0], 1), - initializer=xavier_initializer()) - second_bias = tf.get_variable('second_bias', shape=(kernel_shape[1], 1), - initializer=xavier_initializer()) - # deps.shape = (B, L, D) - # first.shape = (B, L, D), first_rie = sum_d deps_{rid} kernel_{de} - first = tf.tensordot(deps, kernel, axes=[-1, -2]) - answer = tf.matmul(first, heads, transpose_b=True) # answer.shape = (B, L, L) - # add bias over x axis - first_bias_term = tf.tensordot(deps, first_bias, axes=[-1, -2]) - answer += first_bias_term - # add bias over y axis - second_bias_term = tf.tensordot(heads, second_bias, axes=[-1, -2]) # (B, L, 1) - second_bias_term = tf.transpose(second_bias_term, [0, 2, 1]) # (B, 1, L) - answer += second_bias_term - return answer - - -@register('bert_syntax_parser') -class BertSyntaxParser(BertSequenceNetwork): - """BERT-based model for syntax parsing. - For each word the model predicts the index of its syntactic head - and the label of the dependency between this head and the current word. - See :class:`deeppavlov.models.bert.bert_sequence_tagger.BertSequenceNetwork` - for the description of inherited parameters. 
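Both biaffine blocks above follow Dozat and Manning (2016): a bilinear term between dependent and head states plus separate linear terms for each of them. A condensed PyTorch sketch of that scoring function; the class name, shapes and initialization are illustrative assumptions rather than code from this patch:

.. code-block:: python

    import torch
    import torch.nn as nn

    class Biaffine(nn.Module):
        """score(d, h) = d^T U h + W_d d + W_h h + b, computed per position (illustrative sketch)."""

        def __init__(self, deps_dim: int, heads_dim: int, output_dim: int):
            super().__init__()
            self.U = nn.Parameter(torch.zeros(output_dim, deps_dim, heads_dim))
            self.W_d = nn.Linear(deps_dim, output_dim, bias=False)
            self.W_h = nn.Linear(heads_dim, output_dim, bias=True)
            nn.init.xavier_uniform_(self.U)

        def forward(self, deps: torch.Tensor, heads: torch.Tensor) -> torch.Tensor:
            # deps: [batch, seq_len, deps_dim], heads: [batch, seq_len, heads_dim]
            # bilinear term: contract deps with U and heads for every output class
            bilinear = torch.einsum('bld,odh,blh->blo', deps, self.U, heads)
            return bilinear + self.W_d(deps) + self.W_h(heads)  # [batch, seq_len, output_dim]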
- - Args: - n_deps: number of distinct syntactic dependencies - embeddings_dropout: dropout for embeddings in biaffine layer - state_size: the size of hidden state in biaffine layer - dep_state_size: the size of hidden state in biaffine layer - use_birnn: whether to use bidirection rnn after BERT layers. - Set it to `True` as it leads to much higher performance at least on large datasets - birnn_cell_type: the type of Bidirectional RNN. Either `lstm` or `gru` - birnn_hidden_size: number of hidden units in the BiRNN layer in each direction - return_probas: set this to `True` if you need the probabilities instead of raw answers - predict tags: whether to predict morphological tags together with syntactic information - n_tags: the number of morphological tags - tag_weight: the weight of tag model loss in multitask training - """ - - def __init__(self, - n_deps: int, - keep_prob: float, - bert_config_file: str, - pretrained_bert: str = None, - attention_probs_keep_prob: float = None, - hidden_keep_prob: float = None, - embeddings_dropout: float = 0.0, - encoder_layer_ids: List[int] = (-1,), - encoder_dropout: float = 0.0, - optimizer: str = None, - weight_decay_rate: float = 1e-6, - state_size: int = 256, - use_birnn: bool = True, - birnn_cell_type: str = 'lstm', - birnn_hidden_size: int = 256, - ema_decay: float = None, - ema_variables_on_cpu: bool = True, - predict_tags = False, - n_tags = None, - tag_weight = 1.0, - return_probas: bool = False, - freeze_embeddings: bool = False, - learning_rate: float = 1e-3, - bert_learning_rate: float = 2e-5, - min_learning_rate: float = 1e-07, - learning_rate_drop_patience: int = 20, - learning_rate_drop_div: float = 2.0, - load_before_drop: bool = True, - clip_norm: float = 1.0, - **kwargs) -> None: - self.n_deps = n_deps - self.embeddings_dropout = embeddings_dropout - self.state_size = state_size - self.use_birnn = use_birnn - self.birnn_cell_type = birnn_cell_type - self.birnn_hidden_size = birnn_hidden_size - self.return_probas = return_probas - self.predict_tags = predict_tags - self.n_tags = n_tags - self.tag_weight = tag_weight - if self.predict_tags and self.n_tags is None: - raise ValueError("n_tags should be given if `predict_tags`=True.") - super().__init__(keep_prob=keep_prob, - bert_config_file=bert_config_file, - pretrained_bert=pretrained_bert, - attention_probs_keep_prob=attention_probs_keep_prob, - hidden_keep_prob=hidden_keep_prob, - encoder_layer_ids=encoder_layer_ids, - encoder_dropout=encoder_dropout, - optimizer=optimizer, - weight_decay_rate=weight_decay_rate, - ema_decay=ema_decay, - ema_variables_on_cpu=ema_variables_on_cpu, - freeze_embeddings=freeze_embeddings, - learning_rate=learning_rate, - bert_learning_rate=bert_learning_rate, - min_learning_rate=min_learning_rate, - learning_rate_drop_div=learning_rate_drop_div, - learning_rate_drop_patience=learning_rate_drop_patience, - load_before_drop=load_before_drop, - clip_norm=clip_norm, - **kwargs) - - def _init_graph(self) -> None: - self._init_placeholders() - - units = super()._init_graph() - - with tf.variable_scope('ner'): - units = token_from_subtoken(units, self.y_masks_ph) - if self.use_birnn: - units, _ = bi_rnn(units, - self.birnn_hidden_size, - cell_type=self.birnn_cell_type, - seq_lengths=self.seq_lengths, - name='birnn') - units = tf.concat(units, -1) - # for heads - head_embeddings = tf.layers.dense(units, units=self.state_size, activation="relu") - head_embeddings = tf.nn.dropout(head_embeddings, self.embeddings_keep_prob_ph) - dep_embeddings = 
tf.layers.dense(units, units=self.state_size, activation="relu") - dep_embeddings = tf.nn.dropout(dep_embeddings, self.embeddings_keep_prob_ph) - self.dep_head_similarities = biaffine_attention(dep_embeddings, head_embeddings) - self.dep_heads = tf.argmax(self.dep_head_similarities, -1) - self.dep_head_probs = tf.nn.softmax(self.dep_head_similarities) - # for dependency types - head_embeddings = tf.layers.dense(units, units=self.state_size, activation="relu") - head_embeddings = tf.nn.dropout(head_embeddings, self.embeddings_keep_prob_ph) - dep_embeddings = tf.layers.dense(units, units=self.state_size, activation="relu") - dep_embeddings = tf.nn.dropout(dep_embeddings, self.embeddings_keep_prob_ph) - # matching each word with its head - head_embeddings = gather_indexes(head_embeddings, self.y_head_ph) - self.dep_logits = biaffine_layer(dep_embeddings, head_embeddings, - deps_dim=self.state_size, heads_dim=self.state_size, - output_dim=self.n_deps) - self.deps = tf.argmax(self.dep_logits, -1) - self.dep_probs = tf.nn.softmax(self.dep_logits) - if self.predict_tags: - tag_embeddings = tf.layers.dense(units, units=self.state_size, activation="relu") - tag_embeddings = tf.nn.dropout(tag_embeddings, self.embeddings_keep_prob_ph) - self.tag_logits = tf.layers.dense(tag_embeddings, units=self.n_tags) - self.tags = tf.argmax(self.tag_logits, -1) - self.tag_probs = tf.nn.softmax(self.tag_logits) - with tf.variable_scope("loss"): - tag_mask = self._get_tag_mask() - y_mask = tf.cast(tag_mask, tf.float32) - self.loss = tf.losses.sparse_softmax_cross_entropy(labels=self.y_head_ph, - logits=self.dep_head_similarities, - weights=y_mask) - self.loss += tf.losses.sparse_softmax_cross_entropy(labels=self.y_dep_ph, - logits=self.dep_logits, - weights=y_mask) - if self.predict_tags: - tag_loss = tf.losses.sparse_softmax_cross_entropy(labels=self.y_tag_ph, - logits=self.tag_logits, - weights=y_mask) - self.loss += self.tag_weight_ph * tag_loss - - def _init_placeholders(self) -> None: - super()._init_placeholders() - self.y_head_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='y_head_ph') - self.y_dep_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='y_dep_ph') - if self.predict_tags: - self.y_tag_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='y_tag_ph') - self.y_masks_ph = tf.placeholder(shape=(None, None), dtype=tf.int32, name='y_mask_ph') - self.embeddings_keep_prob_ph = tf.placeholder_with_default( - 1.0, shape=[], name="embeddings_keep_prob_ph") - if self.predict_tags: - self.tag_weight_ph = tf.placeholder_with_default(1.0, shape=[], name="tag_weight_ph") - - def _build_feed_dict(self, input_ids, input_masks, y_masks, - y_head=None, y_dep=None, y_tag=None) -> dict: - y_masks = np.concatenate([np.ones_like(y_masks[:,:1]), y_masks[:, 1:]], axis=1) - feed_dict = self._build_basic_feed_dict(input_ids, input_masks, train=(y_head is not None)) - feed_dict[self.y_masks_ph] = y_masks - if y_head is not None: - y_head = zero_pad(y_head) - y_head = np.concatenate([np.zeros_like(y_head[:,:1]), y_head], axis=1) - y_dep = zero_pad(y_dep) - y_dep = np.concatenate([np.zeros_like(y_dep[:,:1]), y_dep], axis=1) - feed_dict.update({self.embeddings_keep_prob_ph: 1.0 - self.embeddings_dropout, - self.y_head_ph: y_head, - self.y_dep_ph: y_dep}) - if self.predict_tags: - y_tag = np.concatenate([np.zeros_like(y_tag[:,:1]), y_tag], axis=1) - feed_dict.update({self.y_tag_ph: y_tag, self.tag_weight_ph: self.tag_weight}) - return feed_dict - - def __call__(self, - input_ids: 
Union[List[List[int]], np.ndarray], - input_masks: Union[List[List[int]], np.ndarray], - y_masks: Union[List[List[int]], np.ndarray]) \ - -> Union[Tuple[List[Union[List[int], np.ndarray]], List[List[int]]], - Tuple[List[Union[List[int], np.ndarray]], List[List[int]], List[List[int]]]]: - - """ Predicts the outputs for a batch of inputs. - By default (``return_probas`` = `False` and ``predict_tags`` = `False`) it returns two output batches. - The first is the batch of head indexes: `i` stands for `i`-th word in the sequence, - where numeration starts with 1. `0` is predicted for the syntactic root of the sentence. - The second is the batch of indexes for syntactic dependencies. - In case ``return_probas`` = `True` we return the probability distribution over possible heads - instead of the position of the most probable head. For a sentence of length `k` the output - is an array of shape `k * (k+1)`. - In case ``predict_tags`` = `True` the model additionally returns the index of the most probable - morphological tag for each word. The batch of such indexes becomes the third output of the function. - - Returns: - `pred_heads_to_return`, either a batch of most probable head positions for each token - (in case ``return_probas`` = `False`) - or a batch of probability distribution over token head positions - - `pred_deps`, the indexes of token dependency relations - - `pred_tags`: the indexes of token morphological tags (only if ``predict_tags`` = `True`) - - """ - feed_dict = self._build_feed_dict(input_ids, input_masks, y_masks) - if self.ema: - self.sess.run(self.ema.switch_to_test_op) - if self.return_probas: - pred_head_probs, pred_heads, seq_lengths =\ - self.sess.run([self.dep_head_probs, self.dep_heads, self.seq_lengths], feed_dict=feed_dict) - pred_heads_to_return = [np.array(p[1:l,:l]) for l, p in zip(seq_lengths, pred_head_probs)] - else: - pred_heads, seq_lengths = self.sess.run([self.dep_heads, self.seq_lengths], feed_dict=feed_dict) - pred_heads_to_return = [p[1:l] for l, p in zip(seq_lengths, pred_heads)] - feed_dict[self.y_head_ph] = pred_heads - pred_deps = self.sess.run(self.deps, feed_dict=feed_dict) - pred_deps = [p[1:l] for l, p in zip(seq_lengths, pred_deps)] - answer = [pred_heads_to_return, pred_deps] - if self.predict_tags: - pred_tags = self.sess.run(self.tags, feed_dict=feed_dict) - pred_tags = [p[1:l] for l, p in zip(seq_lengths, pred_tags)] - answer.append(pred_tags) - return tuple(answer) diff --git a/deeppavlov/models/syntax_parser/parser.py b/deeppavlov/models/syntax_parser/parser.py deleted file mode 100644 index eb18f97d88..0000000000 --- a/deeppavlov/models/syntax_parser/parser.py +++ /dev/null @@ -1,47 +0,0 @@ -# Copyright 2019 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from typing import List -import numpy as np - -from dependency_decoding import chu_liu_edmonds - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component - - -@register('chu_liu_edmonds_transformer') -class ChuLiuEdmonds(Component): - """ - A wrapper for Chu-Liu-Edmonds algorithm for maximum spanning tree - """ - def __init__(self, min_edge_prob=1e-6, **kwargs): - self.min_edge_prob = min_edge_prob - - def __call__(self, probs: List[np.ndarray]) -> List[List[int]]: - """Applies Chu-Liu-Edmonds algorithm to the matrix of head probabilities. - - probs: a 3D-array of probabilities of shape B*L*(L+1) - """ - answer = [] - for elem in probs: - m, n = elem.shape - assert n == m+1 - elem = np.log10(np.maximum(self.min_edge_prob, elem)) - np.log10(self.min_edge_prob) - elem = np.concatenate([np.zeros_like(elem[:1,:]), elem], axis=0) - # it makes impossible to create multiple edges 0->i - elem[1:, 0] += np.log10(self.min_edge_prob) * len(elem) - chl_data = chu_liu_edmonds(elem.astype("float64")) - answer.append(chl_data[0][1:]) - return answer diff --git a/deeppavlov/models/tokenizers/jieba_tokenizer.py b/deeppavlov/models/tokenizers/jieba_tokenizer.py deleted file mode 100644 index b56b19ec00..0000000000 --- a/deeppavlov/models/tokenizers/jieba_tokenizer.py +++ /dev/null @@ -1,68 +0,0 @@ -# Copyright 2020 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from typing import List, Union - -import jieba - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component - - -@register("jieba_tokenizer") -class JiebaTokenizer(Component): - """ - Tokenizes chinese text into tokens - - Doesn't have any parameters. 
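The removed `chu_liu_edmonds_transformer` component above turns the parser's head-probability matrices into a valid dependency tree. A toy usage sketch relying on the same `dependency_decoding` package the removed code used; the probability values below are made up for illustration:

.. code-block:: python

    import numpy as np
    from dependency_decoding import chu_liu_edmonds  # same dependency the removed component used

    # Toy head probabilities for a 3-word sentence: probs[i][j] is the probability that
    # the head of word i+1 is word j, where column 0 stands for the artificial root.
    probs = np.array([[0.05, 0.05, 0.85, 0.05],
                      [0.80, 0.10, 0.05, 0.05],
                      [0.05, 0.10, 0.80, 0.05]])

    min_edge_prob = 1e-6
    scores = np.log10(np.maximum(min_edge_prob, probs)) - np.log10(min_edge_prob)
    # prepend a dummy row for the root and penalize extra edges leaving it,
    # mirroring what the removed wrapper does before decoding
    scores = np.concatenate([np.zeros_like(scores[:1, :]), scores], axis=0)
    scores[1:, 0] += np.log10(min_edge_prob) * len(scores)
    heads = chu_liu_edmonds(scores.astype("float64"))[0]
    print(heads[1:])  # e.g. [2, 0, 2]: words 1 and 3 attach to word 2, which is the root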
- """ - - def __init__(self, **kwargs) -> None: - jieba.initialize() - pass - - @staticmethod - def tokenize_str(text: str) -> str: - """ - Tokenize a single string - - Args: - text: a string to tokenize - - Returns: - tokenized string - """ - return ' '.join(jieba.cut(text)) - - def __call__(self, batch: Union[List[str], List[List[str]]]) -> Union[List[str], List[List[str]]]: - """ - Tokenize either list of strings or list of list of strings - - Args: - batch a list of either strings or list of strings - - Returns: - tokenized strings in the given format - """ - - if isinstance(batch[0], str): - batch_tokenized = [JiebaTokenizer.tokenize_str(s) for s in batch] - elif isinstance(batch[0], list): - for lst in batch: - batch_tokenized = [self(lst) for lst in batch] - else: - raise NotImplementedError('Not implemented for types other than' - ' str or list') - - return batch_tokenized diff --git a/deeppavlov/models/tokenizers/lazy_tokenizer.py b/deeppavlov/models/tokenizers/lazy_tokenizer.py deleted file mode 100644 index f437bcbfb8..0000000000 --- a/deeppavlov/models/tokenizers/lazy_tokenizer.py +++ /dev/null @@ -1,37 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from logging import getLogger - -from nltk import word_tokenize -from overrides import overrides - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component - -log = getLogger(__name__) - - -@register('lazy_tokenizer') -class LazyTokenizer(Component): - """Tokenizes if there is something to tokenize.""" - - def __init__(self, **kwargs): - pass - - @overrides - def __call__(self, batch, *args, **kwargs): - if len(batch) > 0 and isinstance(batch[0], str): - batch = [word_tokenize(utt) for utt in batch] - return batch diff --git a/deeppavlov/models/tokenizers/ru_sent_tokenizer.py b/deeppavlov/models/tokenizers/ru_sent_tokenizer.py deleted file mode 100644 index 15055d5c37..0000000000 --- a/deeppavlov/models/tokenizers/ru_sent_tokenizer.py +++ /dev/null @@ -1,47 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from typing import Set, Tuple - -from rusenttokenize import ru_sent_tokenize, SHORTENINGS, JOINING_SHORTENINGS, PAIRED_SHORTENINGS - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component - - -@register("ru_sent_tokenizer") -class RuSentTokenizer(Component): - """ - Rule-based sentence tokenizer for the Russian language. - https://github.com/deepmipt/ru_sentence_tokenizer - - Args: - shortenings: list of known shortenings. Use default value if working on news or fiction texts - joining_shortenings: list of shortenings after which sentence split is not possible (i.e. "ул"). - Use default value if working on news or fiction texts - paired_shortenings: list of known paired shortenings (i.e. "т. е."). - Use default value if working on news or fiction texts - - """ - - def __init__(self, shortenings: Set[str] = SHORTENINGS, - joining_shortenings: Set[str] = JOINING_SHORTENINGS, - paired_shortenings: Set[Tuple[str, str]] = PAIRED_SHORTENINGS, - **kwargs): - self.shortenings = shortenings - self.joining_shortenings = joining_shortenings - self.paired_shortenings = paired_shortenings - - def __call__(self, batch: [str]) -> [[str]]: - return [ru_sent_tokenize(x, self.shortenings, self.joining_shortenings, self.paired_shortenings) for x in batch] diff --git a/deeppavlov/models/tokenizers/spacy_tokenizer.py b/deeppavlov/models/tokenizers/spacy_tokenizer.py index 6c5ec5ddea..f0d65c81a7 100644 --- a/deeppavlov/models/tokenizers/spacy_tokenizer.py +++ b/deeppavlov/models/tokenizers/spacy_tokenizer.py @@ -25,6 +25,7 @@ logger = getLogger(__name__) +# TODO: handle this properly via spacy.cli.download at the `python -m deeppavlov download` stage def _try_load_spacy_model(model_name: str, disable: Iterable[str] = ()): disable = set(disable) try: diff --git a/deeppavlov/models/torch_bert/crf.py b/deeppavlov/models/torch_bert/crf.py new file mode 100644 index 0000000000..8e89b27531 --- /dev/null +++ b/deeppavlov/models/torch_bert/crf.py @@ -0,0 +1,28 @@ +import numpy as np +import torch +from torch import nn +from torchcrf import CRF as CRFbase + + +class CRF(CRFbase): + """Conditional Random Field from the PyTorch-CRF library + with a modified training function + """ + + def __init__(self, num_tags: int, batch_first: bool = False) -> None: + super().__init__(num_tags=num_tags, batch_first=batch_first) + nn.init.zeros_(self.transitions) + nn.init.zeros_(self.start_transitions) + nn.init.zeros_(self.end_transitions) + self.stats = torch.zeros((num_tags, num_tags), dtype=torch.float) + self.zeros = torch.zeros((num_tags, num_tags), dtype=torch.float) + self.neg = torch.full((num_tags, num_tags), -1000.0) + + def forward(self, tags_batch: torch.LongTensor, y_masks: np.ndarray): + seq_lengths = np.sum(y_masks, axis=1) + for seq_len, tags_list in zip(seq_lengths, tags_batch): + if seq_len > 1: + for i in range(seq_len - 1): + self.stats[int(tags_list[i])][int(tags_list[i + 1])] += 1.0 + with torch.no_grad(): + self.transitions.copy_(torch.where(self.stats > 0, self.zeros, self.neg)) diff --git a/deeppavlov/models/torch_bert/torch_bert_ranker.py b/deeppavlov/models/torch_bert/torch_bert_ranker.py index 8990e8ef0e..d3bbf79dc0 100644 --- a/deeppavlov/models/torch_bert/torch_bert_ranker.py +++ b/deeppavlov/models/torch_bert/torch_bert_ranker.py @@ -47,7 +47,7 @@ class TorchBertRankerModel(TorchModel): e.g. 
{'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.9} """ - def __init__(self, pretrained_bert: str, + def __init__(self, pretrained_bert: str = None, bert_config_file: Optional[str] = None, n_classes: int = 2, return_probas: bool = True, @@ -97,7 +97,7 @@ def train_on_batch(self, features_li: List[List[InputFeatures]], y: Union[List[i self.optimizer.zero_grad() loss, logits = self.model(b_input_ids, token_type_ids=None, attention_mask=b_input_masks, - labels=b_labels) + labels=b_labels, return_dict=False) loss.backward() # Clip the norm of the gradients to 1.0. # This is to help prevent the "exploding gradients" problem. @@ -162,13 +162,18 @@ def load(self, fname=None): if self.pretrained_bert: log.info(f"From pretrained {self.pretrained_bert}.") + if Path(expand_path(self.pretrained_bert)).exists(): + self.pretrained_bert = str(expand_path(self.pretrained_bert)) config = AutoConfig.from_pretrained(self.pretrained_bert, # num_labels=self.n_classes, output_attentions=False, output_hidden_states=False) + self.model = AutoModelForSequenceClassification.from_pretrained(self.pretrained_bert, config=config) + # TODO: make better exception handling here and at + # deeppavlov.models.torch_bert.torch_transformers_classifier.TorchTransformersClassifierModel.load try: hidden_size = self.model.classifier.out_proj.in_features @@ -178,7 +183,7 @@ def load(self, fname=None): self.model.classifier.out_proj.out_features = self.n_classes self.model.num_labels = self.n_classes - except torch.nn.modules.module.ModuleAttributeError: + except AttributeError: hidden_size = self.model.classifier.in_features if self.n_classes != self.model.num_labels: @@ -188,13 +193,10 @@ def load(self, fname=None): self.model.num_labels = self.n_classes - elif self.bert_config_file and Path(self.bert_config_file).is_file(): - self.bert_config = AutoConfig.from_json_file(str(expand_path(self.bert_config_file))) - if self.attention_probs_keep_prob is not None: - self.bert_config.attention_probs_dropout_prob = 1.0 - self.attention_probs_keep_prob - if self.hidden_keep_prob is not None: - self.bert_config.hidden_dropout_prob = 1.0 - self.hidden_keep_prob + elif self.bert_config_file and expand_path(self.bert_config_file).is_file(): + self.bert_config = AutoConfig.from_pretrained(str(expand_path(self.bert_config_file))) self.model = AutoModelForSequenceClassification.from_config(config=self.bert_config) + else: raise ConfigError("No pre-trained BERT model is given.") @@ -205,28 +207,4 @@ def load(self, fname=None): if self.lr_scheduler_name is not None: self.lr_scheduler = getattr(torch.optim.lr_scheduler, self.lr_scheduler_name)( self.optimizer, **self.lr_scheduler_parameters) - - if self.load_path: - log.info(f"Load path {self.load_path} is given.") - if isinstance(self.load_path, Path) and not self.load_path.parent.is_dir(): - raise ConfigError("Provided load path is incorrect!") - - weights_path = Path(self.load_path.resolve()) - weights_path = weights_path.with_suffix(f".pth.tar") - if weights_path.exists(): - log.info(f"Load path {weights_path} exists.") - log.info(f"Initializing `{self.__class__.__name__}` from saved.") - - # now load the weights, optimizer from saved - log.info(f"Loading weights from {weights_path}.") - checkpoint = torch.load(weights_path, map_location=self.device) - # set strict flag to False if position_ids are missing - # this is needed to load models trained on older versions - # of transformers library - strict_load_flag = bool([key for key in checkpoint["model_state_dict"].keys() - if 
key.endswith("embeddings.position_ids")]) - self.model.load_state_dict(checkpoint["model_state_dict"], strict=strict_load_flag) - self.optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) - self.epochs_done = checkpoint.get("epochs_done", 0) - else: - log.info(f"Init from scratch. Load path {weights_path} does not exist.") + super().load() diff --git a/deeppavlov/models/torch_bert/torch_transformers_classifier.py b/deeppavlov/models/torch_bert/torch_transformers_classifier.py index 3bf8077518..064908d7f5 100644 --- a/deeppavlov/models/torch_bert/torch_transformers_classifier.py +++ b/deeppavlov/models/torch_bert/torch_transformers_classifier.py @@ -21,7 +21,7 @@ import torch from overrides import overrides from torch.nn import BCEWithLogitsLoss -from transformers import AutoModelForSequenceClassification, AutoConfig, AutoModel +from transformers import AutoModelForSequenceClassification, AutoConfig, AutoModel, AutoTokenizer from transformers.modeling_outputs import SequenceClassifierOutput from deeppavlov.core.common.errors import ConfigError @@ -51,6 +51,8 @@ class TorchTransformersClassifierModel(TorchModel): e.g. {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.9} clip_norm: clip gradients by norm coefficient bert_config_file: path to Bert configuration file (not used if pretrained_bert is key title) + is_binary: whether classification task is binary or multi-class + num_special_tokens: number of special tokens used by classification model """ def __init__(self, n_classes, @@ -65,6 +67,7 @@ def __init__(self, n_classes, clip_norm: Optional[float] = None, bert_config_file: Optional[str] = None, is_binary: Optional[bool] = False, + num_special_tokens: int = None, **kwargs) -> None: if not optimizer_parameters: @@ -84,6 +87,7 @@ def __init__(self, n_classes, self.clip_norm = clip_norm self.is_binary = is_binary self.bert_config = None + self.num_special_tokens = num_special_tokens if self.multilabel and not self.one_hot_labels: raise RuntimeError('Use one-hot encoded labels for multilabel classification!') @@ -210,7 +214,8 @@ def load(self, fname=None): else: self.model = AutoModelForSequenceClassification.from_pretrained(self.pretrained_bert, config=config) - # TODO need a better solution here + # TODO need a better solution here and at + # deeppavlov.models.torch_bert.torch_bert_ranker.TorchBertRankerModel.load try: hidden_size = self.model.classifier.out_proj.in_features @@ -221,7 +226,7 @@ def load(self, fname=None): self.model.classifier.out_proj.out_features = self.n_classes self.model.num_labels = self.n_classes - except torch.nn.modules.module.ModuleAttributeError: + except AttributeError: hidden_size = self.model.classifier.in_features if self.n_classes != self.model.num_labels: @@ -240,6 +245,10 @@ def load(self, fname=None): else: raise ConfigError("No pre-trained BERT model is given.") + tokenizer = AutoTokenizer.from_pretrained(self.pretrained_bert) + if self.num_special_tokens: + self.model.resize_token_embeddings(len(tokenizer) + self.num_special_tokens) + # TODO that should probably be parametrized in config if self.device.type == "cuda" and torch.cuda.device_count() > 1: self.model = torch.nn.DataParallel(self.model) @@ -251,41 +260,7 @@ def load(self, fname=None): if self.lr_scheduler_name is not None: self.lr_scheduler = getattr(torch.optim.lr_scheduler, self.lr_scheduler_name)( self.optimizer, **self.lr_scheduler_parameters) - - if self.load_path: - log.info(f"Load path {self.load_path} is given.") - if isinstance(self.load_path, Path) and not 
self.load_path.parent.is_dir(): - raise ConfigError("Provided load path is incorrect!") - - weights_path = Path(self.load_path.resolve()) - weights_path = weights_path.with_suffix(f".pth.tar") - if weights_path.exists(): - log.info(f"Load path {weights_path} exists.") - log.info(f"Initializing `{self.__class__.__name__}` from saved.") - - # now load the weights, optimizer from saved - log.info(f"Loading weights from {weights_path}.") - checkpoint = torch.load(weights_path, map_location=self.device) - model_state = checkpoint["model_state_dict"] - optimizer_state = checkpoint["optimizer_state_dict"] - - # load a multi-gpu model on a single device - if not self.is_data_parallel and "module." in list(model_state.keys())[0]: - tmp_model_state = {} - for key, value in model_state.items(): - tmp_model_state[re.sub("module.", "", key)] = value - model_state = tmp_model_state - - # set strict flag to False if position_ids are missing - # this is needed to load models trained on older versions - # of transformers library - strict_load_flag = bool([key for key in checkpoint["model_state_dict"].keys() - if key.endswith("embeddings.position_ids")]) - self.model.load_state_dict(model_state, strict=strict_load_flag) - self.optimizer.load_state_dict(optimizer_state) - self.epochs_done = checkpoint.get("epochs_done", 0) - else: - log.info(f"Init from scratch. Load path {weights_path} does not exist.") + super().load() class AutoModelForBinaryClassification(torch.nn.Module): diff --git a/deeppavlov/models/torch_bert/torch_transformers_el_ranker.py b/deeppavlov/models/torch_bert/torch_transformers_el_ranker.py new file mode 100644 index 0000000000..fa269182c4 --- /dev/null +++ b/deeppavlov/models/torch_bert/torch_transformers_el_ranker.py @@ -0,0 +1,445 @@ +# Copyright 2017 Neural Networks and Deep Learning lab, MIPT +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
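The `num_special_tokens` option added above reserves extra rows in the classifier's embedding matrix for tokens that are not in the original vocabulary. The usual companion step on the tokenizer side looks roughly like this; the checkpoint name is only an example, and `[ENT]` mirrors the special token used by the entity-ranking preprocessor later in this patch:

.. code-block:: python

    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    # register one extra special token and grow the embedding matrix accordingly
    tokenizer.add_special_tokens({"additional_special_tokens": ["[ENT]"]})
    model.resize_token_embeddings(len(tokenizer))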
+ +from logging import getLogger +from pathlib import Path +from typing import List, Optional, Dict, Tuple, Union, Any + +import numpy as np +import torch +import torch.nn as nn +import torch.nn.functional as F +from torch import Tensor +from transformers import AutoConfig, AutoTokenizer, AutoModel + +from deeppavlov.core.commands.utils import expand_path +from deeppavlov.core.common.errors import ConfigError +from deeppavlov.core.common.registry import register +from deeppavlov.core.models.torch_model import TorchModel +from deeppavlov.models.preprocessors.torch_transformers_preprocessor import TorchTransformersEntityRankerPreprocessor + +log = getLogger(__name__) + + +@register('torch_transformers_el_ranker') +class TorchTransformersElRanker(TorchModel): + """Class for ranking of entities by context and description + Args: + model_name: name of the function which initialises and returns the model class + encoder_save_path: path to save the encoder checkpoint + bilinear_save_path: path to save bilinear layer checkpoint + block_size: size of block in bilinear layer + emb_size: entity embedding size + pretrained_bert: pretrained Bert checkpoint path or key title (e.g. "bert-base-uncased") + bert_config_file: path to Bert configuration file, or None, if `pretrained_bert` is a string name + criterion: name of loss function + optimizer: optimizer name from `torch.optim` + optimizer_parameters: dictionary with optimizer's parameters, + e.g. {'lr': 0.1, 'weight_decay': 0.001, 'momentum': 0.9} + return_probas: set this to `True` if you need the probabilities instead of raw answers + attention_probs_keep_prob: keep_prob for Bert self-attention layers + hidden_keep_prob: keep_prob for Bert hidden layers + clip_norm: clip gradients by norm + """ + + def __init__( + self, + model_name: str, + encoder_save_path: str, + bilinear_save_path: str, + block_size: int, + emb_size: int, + pretrained_bert: str = None, + bert_config_file: Optional[str] = None, + criterion: str = "CrossEntropyLoss", + optimizer: str = "AdamW", + optimizer_parameters: Dict = None, + return_probas: bool = False, + attention_probs_keep_prob: Optional[float] = None, + hidden_keep_prob: Optional[float] = None, + clip_norm: Optional[float] = None, + **kwargs + ): + self.encoder_save_path = encoder_save_path + self.bilinear_save_path = bilinear_save_path + self.pretrained_bert = pretrained_bert + self.bert_config_file = bert_config_file + self.return_probas = return_probas + self.attention_probs_keep_prob = attention_probs_keep_prob + self.hidden_keep_prob = hidden_keep_prob + self.clip_norm = clip_norm + self.block_size = block_size + self.emb_size = emb_size + + super().__init__( + model_name=model_name, + optimizer=optimizer, + criterion=criterion, + optimizer_parameters=optimizer_parameters, + return_probas=return_probas, + **kwargs) + + def train_on_batch(self, q_features: List[Dict], + c_features: List[Dict], + entity_tokens_pos: List[int], + labels: List[int]) -> float: + """ + + Args: + q_features: batch of indices of text subwords + c_features: batch of indices of entity description subwords + entity_tokens_pos: list of indices of special tokens + labels: 1 if entity is appropriate to context, 0 - otherwise + + Returns: + the value of loss + """ + _input = {'labels': labels} + _input['entity_tokens_pos'] = entity_tokens_pos + for elem in ['input_ids', 'attention_mask']: + inp_elem = [getattr(f, elem) for f in q_features] + _input[f"q_{elem}"] = torch.LongTensor(inp_elem).to(self.device) + for elem in ['input_ids', 
'attention_mask']: + inp_elem = [getattr(f, elem) for f in c_features] + _input[f"c_{elem}"] = torch.LongTensor(inp_elem).to(self.device) + + self.model.train() + self.model.zero_grad() + self.optimizer.zero_grad() # zero the parameter gradients + + loss, softmax_scores = self.model(**_input) + loss.backward() + self.optimizer.step() + + # Clip the norm of the gradients to prevent the "exploding gradients" problem + if self.clip_norm: + torch.nn.utils.clip_grad_norm_(self.model.parameters(), self.clip_norm) + + if self.lr_scheduler is not None: + self.lr_scheduler.step() + + return loss.item() + + def __call__(self, q_features: List[Dict], + c_features: List[Dict], + entity_tokens_pos: List[int]) -> Union[List[int], List[np.ndarray]]: + """ Predicts entity labels (1 if the entity description is appropriate to the context, 0 otherwise) + + Args: + q_features: batch of indices of text subwords + c_features: batch of indices of entity description subwords + entity_tokens_pos: list of indices of special tokens + + Returns: + Label indices or class probabilities for each (context, entity description) pair + + """ + self.model.eval() + + _input = {'entity_tokens_pos': entity_tokens_pos} + for elem in ['input_ids', 'attention_mask']: + inp_elem = [getattr(f, elem) for f in q_features] + _input[f"q_{elem}"] = torch.LongTensor(inp_elem).to(self.device) + for elem in ['input_ids', 'attention_mask']: + inp_elem = [getattr(f, elem) for f in c_features] + _input[f"c_{elem}"] = torch.LongTensor(inp_elem).to(self.device) + + with torch.no_grad(): + softmax_scores = self.model(**_input) + if self.return_probas: + pred = softmax_scores + else: + pred = torch.argmax(softmax_scores, dim=1).cpu().numpy() + + return pred + + def siamese_ranking_el_model(self, **kwargs) -> nn.Module: + return SiameseBertElModel( + pretrained_bert=self.pretrained_bert, + encoder_save_path=self.encoder_save_path, + bilinear_save_path=self.bilinear_save_path, + bert_config_file=self.pretrained_bert, + device=self.device, + block_size=self.block_size, + emb_size=self.emb_size + ) + + def save(self, fname: Optional[str] = None, *args, **kwargs) -> None: + if fname is None: + fname = self.save_path + if not fname.parent.is_dir(): + raise ConfigError("Provided save path is incorrect!") + weights_path = Path(fname).with_suffix(f".pth.tar") + log.info(f"Saving model to {weights_path}.") + torch.save({ + "model_state_dict": self.model.cpu().state_dict(), + "optimizer_state_dict": self.optimizer.state_dict(), + "epochs_done": self.epochs_done + }, weights_path) + self.model.to(self.device) + self.model.save() + + +class TextEncoder(nn.Module): + """Class for obtaining the BERT output for CLS-token and special entity token + Args: + pretrained_bert: pretrained Bert checkpoint path or key title (e.g.
"bert-base-uncased") + bert_config_file: path to Bert configuration file, or None, if `pretrained_bert` is a string name + device: device to use + """ + + def __init__(self, pretrained_bert: str = None, + bert_config_file: str = None, + device: torch.device = torch.device('cpu')): + super().__init__() + self.pretrained_bert = pretrained_bert + self.bert_config_file = bert_config_file + self.encoder, self.config, self.bert_config = None, None, None + self.device = device + self.load() + self.tokenizer = AutoTokenizer.from_pretrained(self.pretrained_bert) + self.encoder.resize_token_embeddings(len(self.tokenizer) + 1) + self.encoder.to(self.device) + + def forward(self, + input_ids: Tensor, + attention_mask: Tensor, + entity_tokens_pos: List[int] = None + ) -> Union[Tuple[Any, Tensor], Tuple[Tensor]]: + if entity_tokens_pos is not None: + q_outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask) + q_hidden_states = q_outputs.last_hidden_state + + entity_emb = [] + for i in range(len(entity_tokens_pos)): + pos = entity_tokens_pos[i] + entity_emb.append(q_hidden_states[i, pos]) + + entity_emb = torch.stack(entity_emb, dim=0).to(self.device) + return entity_emb + else: + c_outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask) + c_cls_emb = c_outputs.last_hidden_state[:, :1, :].squeeze(1) + return c_cls_emb + + def load(self) -> None: + if self.pretrained_bert: + log.info(f"From pretrained {self.pretrained_bert}.") + self.config = AutoConfig.from_pretrained( + self.pretrained_bert, output_hidden_states=True + ) + self.encoder = AutoModel.from_pretrained(self.pretrained_bert, config=self.config) + + elif self.bert_config_file and Path(self.bert_config_file).is_file(): + self.config = AutoConfig.from_json_file(str(expand_path(self.bert_config_file))) + self.encoder = AutoModel.from_config(config=self.config) + else: + raise ConfigError("No pre-trained BERT model is given.") + self.encoder.to(self.device) + + +class BilinearRanking(nn.Module): + """Class for calculation of bilinear form of two vectors + Args: + n_classes: number of classes for classification + emb_size: entity embedding size + block_size: size of block in bilinear layer + """ + + def __init__(self, n_classes: int = 2, emb_size: int = 768, block_size: int = 8): + super().__init__() + self.n_classes = n_classes + self.emb_size = emb_size + self.block_size = block_size + self.bilinear = nn.Linear(self.emb_size * self.block_size, self.n_classes) + self.softmax = nn.Softmax(dim=1) + + def forward(self, text1: Tensor, text2: Tensor): + b1 = text1.view(-1, self.emb_size // self.block_size, self.block_size) + b2 = text2.view(-1, self.emb_size // self.block_size, self.block_size) + bl = (b1.unsqueeze(3) * b2.unsqueeze(2)).view(-1, self.emb_size * self.block_size) + logits = self.bilinear(bl) + softmax_logits = self.softmax(logits) + log_softmax = F.log_softmax(logits, dim=-1) + return softmax_logits, log_softmax + + +class SiameseBertElModel(nn.Module): + """Class with model for ranking of entities by context and description + Args: + emb_size: entity embedding size + block_size: size of block in bilinear layer + encoder_save_path: path to save the encoder checkpoint + bilinear_save_path: path to save bilinear layer checkpoint + pretrained_bert: pretrained Bert checkpoint path or key title (e.g.
"bert-base-uncased") + bert_config_file: path to Bert configuration file, or None, if `pretrained_bert` is a string name + device: device to use + """ + + def __init__( + self, + emb_size: int, + block_size: int, + encoder_save_path: str, + bilinear_save_path: str, + pretrained_bert: str = None, + bert_config_file: str = None, + device: torch.device = torch.device('cpu') + ): + super().__init__() + self.pretrained_bert = pretrained_bert + self.encoder_save_path = encoder_save_path + self.bilinear_save_path = bilinear_save_path + self.bert_config_file = bert_config_file + self.device = device + + # build the text encoder and the bilinear ranking head + self.encoder = TextEncoder(pretrained_bert=self.pretrained_bert, device=self.device) + self.bilinear_ranker = BilinearRanking(emb_size, block_size) + + def forward( + self, + q_input_ids: Tensor, + q_attention_mask: Tensor, + c_input_ids: Tensor, + c_attention_mask: Tensor, + entity_tokens_pos: List, + labels: List[int] = None + ) -> Union[Tuple[Any, Tensor], Tuple[Tensor]]: + + entity_emb = self.encoder(input_ids=q_input_ids, attention_mask=q_attention_mask, + entity_tokens_pos=entity_tokens_pos) + c_cls_emb = self.encoder(input_ids=c_input_ids, attention_mask=c_attention_mask) + softmax_scores, log_softmax = self.bilinear_ranker(entity_emb, c_cls_emb) + + if labels is not None: + labels_one_hot = [[0.0, 0.0] for _ in labels] + for i in range(len(labels)): + labels_one_hot[i][labels[i]] = 1.0 + labels_one_hot = torch.Tensor(labels_one_hot).to(self.device) + + bs, dim = labels_one_hot.shape + per_sample_loss = -torch.bmm(labels_one_hot.view(bs, 1, dim), log_softmax.view(bs, dim, 1)).squeeze( + 2).squeeze(1) + loss = torch.mean(per_sample_loss) + return loss, softmax_scores + else: + return softmax_scores + + def save(self) -> None: + encoder_weights_path = expand_path(self.encoder_save_path).with_suffix(f".pth.tar") + log.info(f"Saving encoder to {encoder_weights_path}.") + torch.save({"model_state_dict": self.encoder.cpu().state_dict()}, encoder_weights_path) + bilinear_weights_path = expand_path(self.bilinear_save_path).with_suffix(f".pth.tar") + log.info(f"Saving bilinear weights to {bilinear_weights_path}.") + torch.save({"model_state_dict": self.bilinear_ranker.cpu().state_dict()}, bilinear_weights_path) + self.encoder.to(self.device) + self.bilinear_ranker.to(self.device) + + +@register('torch_transformers_entity_ranker_infer') +class TorchTransformersEntityRankerInfer: + """Class for inference of the model that ranks entities from a knowledge base by context and description + Args: + pretrained_bert: pretrained Bert checkpoint path or key title (e.g.
"bert-base-uncased") + encoder_weights_path: path to the encoder checkpoint + bilinear_weights_path: path to the bilinear layer checkpoint + special_token_id: id of the special entity token + do_lower_case: whether to lowercase the text + batch_size: batch size used during inference + emb_size: entity embedding size + block_size: size of block in bilinear layer + device: `cpu` or `gpu` device to use + """ + + def __init__(self, pretrained_bert, + encoder_weights_path, + bilinear_weights_path, + special_token_id: int, + do_lower_case: bool = False, + batch_size: int = 5, + emb_size: int = 300, + block_size: int = 8, + device: str = "gpu", **kwargs): + self.device = torch.device("cuda" if torch.cuda.is_available() and device == "gpu" else "cpu") + self.pretrained_bert = pretrained_bert + self.preprocessor = TorchTransformersEntityRankerPreprocessor(vocab_file=self.pretrained_bert, + do_lower_case=do_lower_case, + special_tokens=["[ENT]"]) + self.encoder, self.config = None, None + self.config = AutoConfig.from_pretrained(self.pretrained_bert, output_hidden_states=True) + self.emb_size = emb_size + self.block_size = block_size + self.encoder = TextEncoder(pretrained_bert=self.pretrained_bert, device=self.device) + self.encoder_weights_path = str(expand_path(encoder_weights_path)) + self.bilinear_weights_path = str(expand_path(bilinear_weights_path)) + encoder_checkpoint = torch.load(self.encoder_weights_path, map_location=self.device) + self.encoder.load_state_dict(encoder_checkpoint["model_state_dict"]) + self.encoder.to(self.device) + self.bilinear_ranking = BilinearRanking(emb_size=self.emb_size, block_size=self.block_size) + bilinear_checkpoint = torch.load(self.bilinear_weights_path, map_location=self.device) + self.bilinear_ranking.load_state_dict(bilinear_checkpoint["model_state_dict"]) + self.bilinear_ranking.to(self.device) + self.special_token_id = special_token_id + self.batch_size = batch_size + + def __call__(self, contexts_batch: List[str], + candidate_entities_batch: List[List[str]], + candidate_entities_descr_batch: List[List[str]]): + entity_emb_batch = [] + + num_batches = len(contexts_batch) // self.batch_size + int(len(contexts_batch) % self.batch_size > 0) + for ii in range(num_batches): + contexts_list = contexts_batch[ii * self.batch_size:(ii + 1) * self.batch_size] + context_features = self.preprocessor(contexts_list) + context_input_ids = context_features["input_ids"].to(self.device) + context_attention_mask = context_features["attention_mask"].to(self.device) + special_tokens_pos = [] + for input_ids_list in context_input_ids: + found_n = -1 + for n, input_id in enumerate(input_ids_list): + if input_id == self.special_token_id: + found_n = n + break + if found_n == -1: + found_n = 0 + special_tokens_pos.append(found_n) + + cur_entity_emb_batch = self.encoder(input_ids=context_input_ids, + attention_mask=context_attention_mask, + entity_tokens_pos=special_tokens_pos) + + entity_emb_batch += cur_entity_emb_batch.detach().cpu().numpy().tolist() + + scores_batch = [] + for entity_emb, candidate_entities_list, candidate_entities_descr_list in \ + zip(entity_emb_batch, candidate_entities_batch, candidate_entities_descr_batch): + if candidate_entities_list: + entity_emb = [entity_emb for _ in candidate_entities_list] + entity_emb = torch.Tensor(entity_emb).to(self.device) + descr_features = self.preprocessor(candidate_entities_descr_list) + descr_input_ids = descr_features["input_ids"].to(self.device) + descr_attention_mask = descr_features["attention_mask"].to(self.device) +
candidate_entities_emb = self.encoder(input_ids=descr_input_ids, + attention_mask=descr_attention_mask) + scores_list, _ = self.bilinear_ranking(entity_emb, candidate_entities_emb) + scores_list = scores_list.detach().cpu().numpy() + scores_list = [score[1] for score in scores_list] + entities_with_scores = [(entity, score) for entity, score in zip(candidate_entities_list, scores_list)] + entities_with_scores = sorted(entities_with_scores, key=lambda x: x[1], reverse=True) + scores_batch.append(entities_with_scores) + else: + scores_batch.append([]) + + return scores_batch diff --git a/deeppavlov/models/torch_bert/torch_transformers_sequence_tagger.py b/deeppavlov/models/torch_bert/torch_transformers_sequence_tagger.py index 1ca16eb637..2ed835b916 100644 --- a/deeppavlov/models/torch_bert/torch_transformers_sequence_tagger.py +++ b/deeppavlov/models/torch_bert/torch_transformers_sequence_tagger.py @@ -14,7 +14,7 @@ from logging import getLogger from pathlib import Path -from typing import List, Union, Dict, Optional +from typing import List, Union, Dict, Optional, Tuple import numpy as np import torch @@ -25,6 +25,7 @@ from deeppavlov.core.common.errors import ConfigError from deeppavlov.core.common.registry import register from deeppavlov.core.models.torch_model import TorchModel +from deeppavlov.models.torch_bert.crf import CRF log = getLogger(__name__) @@ -58,97 +59,38 @@ def token_from_subtoken(units: torch.Tensor, mask: torch.Tensor) -> torch.Tensor nf = shape[2] nf_int = units.size()[-1] - # number of TOKENS in each sentence token_seq_lengths = torch.sum(mask, 1).to(torch.int64) - # for a matrix m = - # [[1, 1, 1], - # [0, 1, 1], - # [1, 0, 0]] - # it will be - # [3, 2, 1] n_words = torch.sum(token_seq_lengths) - # n_words -> 6 max_token_seq_len = torch.max(token_seq_lengths) - # max_token_seq_len -> 3 idxs = torch.stack(torch.nonzero(mask, as_tuple=True), dim=1) - # for the matrix mentioned above - # tf.where(mask) -> - # [[0, 0], - # [0, 1] - # [0, 2], - # [1, 1], - # [1, 2] - # [2, 0]] sample_ids_in_batch = torch.nn.functional.pad(input=idxs[:, 0], pad=[1, 0]) - # for indices - # [[0, 0], - # [0, 1] - # [0, 2], - # [1, 1], - # [1, 2], - # [2, 0]] - # it is - # [0, 0, 0, 0, 1, 1, 2] - # padding is for computing change from one sample to another in the batch a = torch.logical_not(torch.eq(sample_ids_in_batch[1:], sample_ids_in_batch[:-1]).to(torch.int64)) - # for the example above the result of this statement equals - # [0, 0, 0, 1, 0, 1] - # so data samples begin in 3rd and 5th positions (the indexes of ones) - # transforming sample start masks to the sample starts themselves q = a * torch.arange(n_words).to(torch.int64) - # [0, 0, 0, 3, 0, 5] count_to_substract = torch.nn.functional.pad(torch.masked_select(q, q.to(torch.bool)), [1, 0]) - # [0, 3, 5] new_word_indices = torch.arange(n_words).to(torch.int64) - torch.gather( count_to_substract, dim=0, index=torch.cumsum(a, 0)) - # tf.range(n_words) -> [0, 1, 2, 3, 4, 5] - # tf.cumsum(a) -> [0, 0, 0, 1, 1, 2] - # tf.gather(count_to_substract, tf.cumsum(a)) -> [0, 0, 0, 3, 3, 5] - # new_word_indices -> [0, 1, 2, 3, 4, 5] - [0, 0, 0, 3, 3, 5] = [0, 1, 2, 0, 1, 0] - # new_word_indices is the concatenation of range(word_len(sentence)) - # for all sentences in units n_total_word_elements = (batch_size * max_token_seq_len).to(torch.int32) word_indices_flat = (idxs[:, 0] * max_token_seq_len + new_word_indices).to(torch.int64) x_mask = torch.sum(torch.nn.functional.one_hot(word_indices_flat, n_total_word_elements), 0) x_mask = 
x_mask.to(torch.bool) - # to get absolute indices we add max_token_seq_len: - # idxs[:, 0] * max_token_seq_len -> [0, 0, 0, 1, 1, 2] * 2 = [0, 0, 0, 3, 3, 6] - # word_indices_flat -> [0, 0, 0, 3, 3, 6] + [0, 1, 2, 0, 1, 0] = [0, 1, 2, 3, 4, 6] - # total number of words in the batch (including paddings) - # batch_size * max_token_seq_len -> 3 * 3 = 9 - # tf.one_hot(...) -> - # [[1. 0. 0. 0. 0. 0. 0. 0. 0.] - # [0. 1. 0. 0. 0. 0. 0. 0. 0.] - # [0. 0. 1. 0. 0. 0. 0. 0. 0.] - # [0. 0. 0. 1. 0. 0. 0. 0. 0.] - # [0. 0. 0. 0. 1. 0. 0. 0. 0.] - # [0. 0. 0. 0. 0. 0. 1. 0. 0.]] - # x_mask -> [1, 1, 1, 1, 1, 0, 1, 0, 0] full_range = torch.arange(batch_size * max_token_seq_len).to(torch.int64) - # full_range -> [0, 1, 2, 3, 4, 5, 6, 7, 8] nonword_indices_flat = torch.masked_select(full_range, torch.logical_not(x_mask)) - # # y_idxs -> [5, 7, 8] - - # get a sequence of units corresponding to the start subtokens of the words - # size: [n_words, n_features] def gather_nd(params, indices): assert type(indices) == torch.Tensor return params[indices.transpose(0, 1).long().numpy().tolist()] elements = gather_nd(units, idxs) - # prepare zeros for paddings - # size: [batch_size * TOKEN_seq_length - n_words, n_features] sh = tuple(torch.stack([torch.sum(max_token_seq_len - token_seq_lengths), torch.tensor(nf)], 0).numpy()) paddings = torch.zeros(sh, dtype=torch.float64) @@ -167,12 +109,8 @@ def dynamic_stitch(indices, data): return res tensor_flat = torch.stack(dynamic_stitch([word_indices_flat, nonword_indices_flat], [elements, paddings])) - # tensor_flat -> [x, x, x, x, x, 0, x, 0, 0] tensor = torch.reshape(tensor_flat, (batch_size, max_token_seq_len.item(), nf_int)) - # tensor -> [[x, x, x], - # [x, x, 0], - # [x, 0, 0]] return tensor @@ -201,7 +139,6 @@ class TorchTransformersSequenceTagger(TorchModel): Args: n_tags: number of distinct tags pretrained_bert: pretrained Bert checkpoint path or key title (e.g.
"bert-base-uncased") - return_probas: set this to `True` if you need the probabilities instead of raw answers bert_config_file: path to Bert configuration file, or None, if `pretrained_bert` is a string name attention_probs_keep_prob: keep_prob for Bert self-attention layers hidden_keep_prob: keep_prob for Bert hidden layers @@ -214,13 +151,13 @@ class TorchTransformersSequenceTagger(TorchModel): load_before_drop: whether to load best model before dropping learning rate or not clip_norm: clip gradients by norm min_learning_rate: min value of learning rate if learning rate decay is used + use_crf: whether to use Conditional Random Field to decode tags """ def __init__(self, n_tags: int, pretrained_bert: str, bert_config_file: Optional[str] = None, - return_probas: bool = False, attention_probs_keep_prob: Optional[float] = None, hidden_keep_prob: Optional[float] = None, optimizer: str = "AdamW", @@ -230,16 +167,17 @@ def __init__(self, load_before_drop: bool = True, clip_norm: Optional[float] = None, min_learning_rate: float = 1e-07, + use_crf: bool = False, **kwargs) -> None: self.n_classes = n_tags - self.return_probas = return_probas self.attention_probs_keep_prob = attention_probs_keep_prob self.hidden_keep_prob = hidden_keep_prob self.clip_norm = clip_norm self.pretrained_bert = pretrained_bert self.bert_config_file = bert_config_file + self.use_crf = use_crf super().__init__(optimizer=optimizer, optimizer_parameters=optimizer_parameters, @@ -281,6 +219,8 @@ def train_on_batch(self, attention_mask=b_input_masks, labels=b_labels).loss loss.backward() + if self.use_crf: + self.crf(y, y_masks) # Clip the norm of the gradients to 1.0. # This is to help prevent the "exploding gradients" problem. if self.clip_norm: @@ -295,7 +235,7 @@ def __call__(self, input_ids: Union[List[List[int]], np.ndarray], input_masks: Union[List[List[int]], np.ndarray], - y_masks: Union[List[List[int]], np.ndarray]) -> Union[List[List[int]], List[np.ndarray]]: + y_masks: Union[List[List[int]], np.ndarray]) -> Tuple[List[List[int]], List[np.ndarray]]: """ Predicts tag indices for a given subword tokens batch Args: @@ -317,16 +257,18 @@ def __call__(self, # Move logits and labels to CPU and to numpy arrays logits = token_from_subtoken(logits[0].detach().cpu(), torch.from_numpy(y_masks)) - if self.return_probas: - pred = torch.nn.functional.softmax(logits, dim=-1) - pred = pred.detach().cpu().numpy() + probas = torch.nn.functional.softmax(logits, dim=-1) + probas = probas.detach().cpu().numpy() + if self.use_crf: + logits = logits.transpose(1, 0).to(self.device) + pred = self.crf.decode(logits) else: logits = logits.detach().cpu().numpy() pred = np.argmax(logits, axis=-1) - seq_lengths = np.sum(y_masks, axis=1) - pred = [p[:l] for l, p in zip(seq_lengths, pred)] + seq_lengths = np.sum(y_masks, axis=1) + pred = [p[:l] for l, p in zip(seq_lengths, pred)] - return pred + return pred, probas @overrides def load(self, fname=None): @@ -349,6 +291,8 @@ def load(self, fname=None): raise ConfigError("No pre-trained BERT model is given.") self.model.to(self.device) + if self.use_crf: + self.crf = CRF(self.n_classes).to(self.device) self.optimizer = getattr(torch.optim, self.optimizer_name)( self.model.parameters(), **self.optimizer_parameters) @@ -357,21 +301,21 @@ def load(self, fname=None): self.optimizer, **self.lr_scheduler_parameters) if self.load_path: - log.info(f"Load path {self.load_path} is given.") - if isinstance(self.load_path, Path) and not self.load_path.parent.is_dir(): - raise
ConfigError("Provided load path is incorrect!") - - weights_path = Path(self.load_path.resolve()) - weights_path = weights_path.with_suffix(f".pth.tar") - if weights_path.exists(): - log.info(f"Load path {weights_path} exists.") - log.info(f"Initializing `{self.__class__.__name__}` from saved.") - - # now load the weights, optimizer from saved - log.info(f"Loading weights from {weights_path}.") - checkpoint = torch.load(weights_path, map_location=self.device) - self.model.load_state_dict(checkpoint["model_state_dict"]) - self.optimizer.load_state_dict(checkpoint["optimizer_state_dict"]) - self.epochs_done = checkpoint.get("epochs_done", 0) - else: - log.info(f"Init from scratch. Load path {weights_path} does not exist.") + super().load() + if self.use_crf: + weights_path_crf = Path(f"{self.load_path}_crf").resolve() + weights_path_crf = weights_path_crf.with_suffix(".pth.tar") + if weights_path_crf.exists(): + checkpoint = torch.load(weights_path_crf, map_location=self.device) + self.crf.load_state_dict(checkpoint["model_state_dict"], strict=False) + else: + log.info(f"Init from scratch. Load path {weights_path_crf} does not exist.") + + @overrides + def save(self, fname: Optional[str] = None, *args, **kwargs) -> None: + super().save() + if self.use_crf: + weights_path_crf = Path(f"{fname}_crf").resolve() + weights_path_crf = weights_path_crf.with_suffix(".pth.tar") + torch.save({"model_state_dict": self.crf.cpu().state_dict()}, weights_path_crf) + self.crf.to(self.device) diff --git a/deeppavlov/models/torch_bert/torch_transformers_squad.py b/deeppavlov/models/torch_bert/torch_transformers_squad.py index 9506ce924e..3f5efc05f2 100644 --- a/deeppavlov/models/torch_bert/torch_transformers_squad.py +++ b/deeppavlov/models/torch_bert/torch_transformers_squad.py @@ -13,8 +13,6 @@ # limitations under the License. 
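A note on the `use_crf` flag introduced in the sequence tagger above: the tagger builds `CRF(self.n_classes)` and decodes with `self.crf.decode(logits.transpose(1, 0))`, i.e. emissions are passed in (seq_len, batch, n_tags) layout, which matches the default `batch_first=False` of the pytorch-crf package that this patch pins in deeppavlov/requirements/torchcrf.txt. A minimal sketch of that train/decode contract, assuming the project's `deeppavlov.models.torch_bert.crf.CRF` wraps pytorch-crf; the shapes and values below are illustrative only:

    import torch
    from torchcrf import CRF  # pytorch-crf==0.7.*

    n_tags, batch_size, seq_len = 5, 2, 7
    crf = CRF(n_tags)  # batch_first=False: emissions have shape (seq_len, batch, n_tags)

    emissions = torch.randn(seq_len, batch_size, n_tags)  # per-token tag scores from the encoder
    tags = torch.randint(n_tags, (seq_len, batch_size))    # gold tag ids for training

    log_likelihood = crf(emissions, tags)  # scalar to maximize (negate it to use as a loss)
    best_paths = crf.decode(emissions)     # List[List[int]]: one best tag sequence per sample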
import re -import json -import math from logging import getLogger from pathlib import Path from typing import List, Tuple, Optional, Dict @@ -22,14 +20,12 @@ import numpy as np import torch from overrides import overrides -from transformers import AutoModelForQuestionAnswering, AutoConfig, AutoTokenizer +from transformers import AutoModelForQuestionAnswering, AutoConfig from transformers.data.processors.utils import InputFeatures -from deeppavlov import build_model -from deeppavlov.core.common.errors import ConfigError from deeppavlov.core.commands.utils import expand_path +from deeppavlov.core.common.errors import ConfigError from deeppavlov.core.common.registry import register -from deeppavlov.core.models.estimator import Component from deeppavlov.core.models.torch_model import TorchModel logger = getLogger(__name__) @@ -65,6 +61,7 @@ class TorchTransformersSquad(TorchModel): load_before_drop: whether to load best model before dropping learning rate or not clip_norm: clip gradients by norm min_learning_rate: min value of learning rate if learning rate decay is used + batch_size: batch size for inference of squad model """ def __init__(self, @@ -79,6 +76,7 @@ def __init__(self, load_before_drop: bool = True, clip_norm: Optional[float] = None, min_learning_rate: float = 1e-06, + batch_size: int = 10, **kwargs) -> None: if not optimizer_parameters: @@ -93,6 +91,7 @@ def __init__(self, self.pretrained_bert = pretrained_bert self.bert_config_file = bert_config_file + self.batch_size = batch_size super().__init__(optimizer=optimizer, optimizer_parameters=optimizer_parameters, @@ -102,7 +101,8 @@ def __init__(self, min_learning_rate=min_learning_rate, **kwargs) - def train_on_batch(self, features: List[InputFeatures], y_st: List[List[int]], y_end: List[List[int]]) -> Dict: + def train_on_batch(self, features: List[List[InputFeatures]], + y_st: List[List[int]], y_end: List[List[int]]) -> Dict: """Train model on given batch. 
This method calls train_op using features and labels from y_st and y_end @@ -115,10 +115,9 @@ def train_on_batch(self, features: List[InputFeatures], y_st: List[List[int]], y dict with loss and learning_rate values """ - - input_ids = [f.input_ids for f in features] - input_masks = [f.attention_mask for f in features] - input_type_ids = [f.token_type_ids for f in features] + input_ids = [f[0].input_ids for f in features] + input_masks = [f[0].attention_mask for f in features] + input_type_ids = [f[0].token_type_ids for f in features] b_input_ids = torch.cat(input_ids, dim=0).to(self.device) b_input_masks = torch.cat(input_masks, dim=0).to(self.device) @@ -128,7 +127,7 @@ def train_on_batch(self, features: List[InputFeatures], y_st: List[List[int]], y y_end = [x[0] for x in y_end] b_y_st = torch.from_numpy(np.array(y_st)).to(self.device) b_y_end = torch.from_numpy(np.array(y_end)).to(self.device) - + input_ = { 'input_ids': b_input_ids, 'attention_mask': b_input_masks, @@ -163,79 +162,107 @@ def accepted_keys(self) -> Tuple[str]: accepted_keys = self.model.forward.__code__.co_varnames return accepted_keys - @property - def is_data_parallel(self) -> bool: - return isinstance(self.model, torch.nn.DataParallel) - - def __call__(self, features: List[InputFeatures]) -> Tuple[List[int], List[int], List[float], List[float]]: + def __call__(self, features_batch: List[List[InputFeatures]]) -> Tuple[ + List[List[int]], List[List[int]], List[List[float]], List[List[float]], List[int]]: """get predictions using features as input Args: - features: batch of InputFeatures instances + features_batch: batch of InputFeatures instances Returns: - predictions: start, end positions, start, end logits positions + start_pred_batch: answer start positions + end_pred_batch: answer end positions + logits_batch: answer logits + scores_batch: answer confidences + ind_batch: indices of paragraph pieces where the answer was found """ - input_ids = [f.input_ids for f in features] - input_masks = [f.attention_mask for f in features] - input_type_ids = [f.token_type_ids for f in features] - - b_input_ids = torch.cat(input_ids, dim=0).to(self.device) - b_input_masks = torch.cat(input_masks, dim=0).to(self.device) - b_input_type_ids = torch.cat(input_type_ids, dim=0).to(self.device) - - input_ = { - 'input_ids': b_input_ids, - 'attention_mask': b_input_masks, - 'token_type_ids': b_input_type_ids, - 'return_dict': True - } - - with torch.no_grad(): - input_ = {arg_name: arg_value for arg_name, arg_value in input_.items() if arg_name in self.accepted_keys} - # Forward pass, calculate logit predictions - outputs = self.model(**input_) - - logits_st = outputs.start_logits - logits_end = outputs.end_logits - - bs = b_input_ids.size()[0] - seq_len = b_input_ids.size()[-1] - mask = torch.cat([torch.ones(bs, 1, dtype=torch.int32), - torch.zeros(bs, seq_len - 1, dtype=torch.int32)], dim=-1).to(self.device) - logit_mask = b_input_type_ids + mask - logits_st = softmax_mask(logits_st, logit_mask) - logits_end = softmax_mask(logits_end, logit_mask) - - start_probs = torch.nn.functional.softmax(logits_st, dim=-1) - end_probs = torch.nn.functional.softmax(logits_end, dim=-1) - scores = torch.tensor(1) - start_probs[:, 0] * end_probs[:, 0] # ok - - outer = torch.matmul(start_probs.view(*start_probs.size(), 1), - end_probs.view(end_probs.size()[0], 1, end_probs.size()[1])) - outer_logits = torch.exp(logits_st.view(*logits_st.size(), 1) + logits_end.view( - logits_end.size()[0], 1, logits_end.size()[1])) - - context_max_len = 
torch.max(torch.sum(b_input_type_ids, dim=1)).to(torch.int64) - - max_ans_length = torch.min(torch.tensor(20).to(self.device), context_max_len).to(torch.int64).item() - - outer = torch.triu(outer, diagonal=0) - torch.triu(outer, diagonal=outer.size()[1] - max_ans_length) - outer_logits = torch.triu(outer_logits, diagonal=0) - torch.triu( - outer_logits, diagonal=outer_logits.size()[1] - max_ans_length) - - start_pred = torch.argmax(torch.max(outer, dim=2)[0], dim=1) - end_pred = torch.argmax(torch.max(outer, dim=1)[0], dim=1) - logits = torch.max(torch.max(outer_logits, dim=2)[0], dim=1)[0] + predictions = {} + # TODO: refactor batchification + indices, input_ids, input_masks, input_type_ids = [], [], [], [] + for n, features_list in enumerate(features_batch): + for f in features_list: + input_ids.append(f.input_ids) + input_masks.append(f.attention_mask) + input_type_ids.append(f.token_type_ids) + indices.append(n) + + num_batches = len(indices) // self.batch_size + int(len(indices) % self.batch_size > 0) + for i in range(num_batches): + b_input_ids = torch.cat(input_ids[i * self.batch_size:(i + 1) * self.batch_size], dim=0).to(self.device) + b_input_masks = torch.cat(input_masks[i * self.batch_size:(i + 1) * self.batch_size], dim=0).to(self.device) + b_input_type_ids = torch.cat(input_type_ids[i * self.batch_size:(i + 1) * self.batch_size], + dim=0).to(self.device) + input_ = { + 'input_ids': b_input_ids, + 'attention_mask': b_input_masks, + 'token_type_ids': b_input_type_ids, + 'return_dict': True + } + + with torch.no_grad(): + input_ = {arg_name: arg_value for arg_name, arg_value in input_.items() + if arg_name in self.accepted_keys} + # Forward pass, calculate logit predictions + outputs = self.model(**input_) + + logits_st = outputs.start_logits + logits_end = outputs.end_logits + + bs = b_input_ids.size()[0] + seq_len = b_input_ids.size()[-1] + mask = torch.cat([torch.ones(bs, 1, dtype=torch.int32), + torch.zeros(bs, seq_len - 1, dtype=torch.int32)], dim=-1).to(self.device) + logit_mask = b_input_type_ids + mask + logits_st = softmax_mask(logits_st, logit_mask) + logits_end = softmax_mask(logits_end, logit_mask) + + start_probs = torch.nn.functional.softmax(logits_st, dim=-1) + end_probs = torch.nn.functional.softmax(logits_end, dim=-1) + scores = torch.tensor(1) - start_probs[:, 0] * end_probs[:, 0] # ok + + outer = torch.matmul(start_probs.view(*start_probs.size(), 1), + end_probs.view(end_probs.size()[0], 1, end_probs.size()[1])) + outer_logits = torch.exp(logits_st.view(*logits_st.size(), 1) + logits_end.view( + logits_end.size()[0], 1, logits_end.size()[1])) + + context_max_len = torch.max(torch.sum(b_input_type_ids, dim=1)).to(torch.int64) + + max_ans_length = torch.min(torch.tensor(20).to(self.device), context_max_len).to(torch.int64).item() + + outer = torch.triu(outer, diagonal=0) - torch.triu(outer, diagonal=outer.size()[1] - max_ans_length) + outer_logits = torch.triu(outer_logits, diagonal=0) - torch.triu( + outer_logits, diagonal=outer_logits.size()[1] - max_ans_length) + + start_pred = torch.argmax(torch.max(outer, dim=2)[0], dim=1) + end_pred = torch.argmax(torch.max(outer, dim=1)[0], dim=1) + logits = torch.max(torch.max(outer_logits, dim=2)[0], dim=1)[0] + + # Move logits and labels to CPU and to numpy arrays + start_pred = start_pred.detach().cpu().numpy() + end_pred = end_pred.detach().cpu().numpy() + logits = logits.detach().cpu().numpy().tolist() + scores = scores.detach().cpu().numpy().tolist() + + for j, (start_pred_elem, end_pred_elem, logits_elem, 
scores_elem) in \ + enumerate(zip(start_pred, end_pred, logits, scores)): + ind = indices[i * self.batch_size + j] + if ind in predictions: + predictions[ind] += [(start_pred_elem, end_pred_elem, logits_elem, scores_elem)] + else: + predictions[ind] = [(start_pred_elem, end_pred_elem, logits_elem, scores_elem)] - # Move logits and labels to CPU and to numpy arrays - start_pred = start_pred.detach().cpu().numpy() - end_pred = end_pred.detach().cpu().numpy() - logits = logits.detach().cpu().numpy().tolist() - scores = scores.detach().cpu().numpy().tolist() + start_pred_batch, end_pred_batch, logits_batch, scores_batch, ind_batch = [], [], [], [], [] + for ind in sorted(predictions.keys()): + prediction = predictions[ind] + max_ind = np.argmax([pred[2] for pred in prediction]) + start_pred_batch.append(prediction[max_ind][0]) + end_pred_batch.append(prediction[max_ind][1]) + logits_batch.append(prediction[max_ind][2]) + scores_batch.append(prediction[max_ind][3]) + ind_batch.append(max_ind) - return start_pred, end_pred, logits, scores + return start_pred_batch, end_pred_batch, logits_batch, scores_batch, ind_batch @overrides def load(self, fname=None): @@ -270,144 +297,4 @@ def load(self, fname=None): if self.lr_scheduler_name is not None: self.lr_scheduler = getattr(torch.optim.lr_scheduler, self.lr_scheduler_name)( self.optimizer, **self.lr_scheduler_parameters) - - if self.load_path: - logger.info(f"Load path {self.load_path} is given.") - if isinstance(self.load_path, Path) and not self.load_path.parent.is_dir(): - raise ConfigError("Provided load path is incorrect!") - - weights_path = Path(self.load_path.resolve()) - weights_path = weights_path.with_suffix(f".pth.tar") - if weights_path.exists(): - logger.info(f"Load path {weights_path} exists.") - logger.info(f"Initializing `{self.__class__.__name__}` from saved.") - - # now load the weights, optimizer from saved - logger.info(f"Loading weights from {weights_path}.") - checkpoint = torch.load(weights_path, map_location=self.device) - model_state = checkpoint["model_state_dict"] - optimizer_state = checkpoint["optimizer_state_dict"] - - # load a multi-gpu model on a single device - if not self.is_data_parallel and "module." in list(model_state.keys())[0]: - tmp_model_state = {} - for key, value in model_state.items(): - tmp_model_state[re.sub("module.", "", key)] = value - model_state = tmp_model_state - - strict_load_flag = bool([key for key in checkpoint["model_state_dict"].keys() - if key.endswith("embeddings.position_ids")]) - self.model.load_state_dict(model_state, strict=strict_load_flag) - self.optimizer.load_state_dict(optimizer_state) - self.epochs_done = checkpoint.get("epochs_done", 0) - else: - logger.info(f"Init from scratch. Load path {weights_path} does not exist.") - - -@register('torch_transformers_squad_infer') -class TorchTransformersSquadInfer(Component): - """This model wraps BertSQuADModel to make predictions on longer than 512 tokens sequences. - - It splits context on chunks with `max_seq_length - 3 - len(question)` length, preserving sentences boundaries. - - It reassembles batches with chunks instead of full contexts to optimize performance, e.g.,: - batch_size = 5 - number_of_contexts == 2 - number of first context chunks == 8 - number of second context chunks == 2 - - we will create two batches with 5 chunks - - For each context the best answer is selected via logits or scores from BertSQuADModel. 
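The batched `__call__` above subsumes this wrapper's chunk handling: per-chunk predictions are grouped by the index of the context they came from, and for every context the chunk with the highest answer logit is kept. A stripped-down sketch of that selection step (the prediction tuples are made-up values, shown only to illustrate the data layout):

    import numpy as np

    # context index -> [(start, end, logit, score), ...] accumulated over its chunks
    predictions = {
        0: [(12, 15, 0.4, 0.61), (3, 5, 2.7, 0.93)],  # two chunks for context 0
        1: [(0, 0, 0.1, 0.05)],                       # a single chunk for context 1
    }

    starts, ends, logits, scores, chunk_ids = [], [], [], [], []
    for ind in sorted(predictions):
        chunk_preds = predictions[ind]
        best = int(np.argmax([p[2] for p in chunk_preds]))  # chunk with the maximal logit
        starts.append(chunk_preds[best][0])
        ends.append(chunk_preds[best][1])
        logits.append(chunk_preds[best][2])
        scores.append(chunk_preds[best][3])
        chunk_ids.append(best)
    # starts == [3, 0], chunk_ids == [1, 0]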
- - - Args: - squad_model_config: path to DeepPavlov BertSQuADModel config file - vocab_file: path to Bert vocab file - do_lower_case: set True if lowercasing is needed - max_seq_length: max sequence length in subtokens, including [SEP] and [CLS] tokens - batch_size: size of batch to use during inference - lang: either `en` or `ru`, it is used to select sentence tokenizer - - """ - - def __init__(self, squad_model_config: str, - vocab_file: str, - do_lower_case: bool, - max_seq_length: int = 512, - batch_size: int = 10, - lang: str = 'en', **kwargs) -> None: - config = json.load(open(squad_model_config)) - config['chainer']['pipe'][0]['max_seq_length'] = max_seq_length - self.model = build_model(config) - self.max_seq_length = max_seq_length - - if Path(vocab_file).is_file(): - vocab_file = str(expand_path(vocab_file)) - self.tokenizer = AutoTokenizer(vocab_file=vocab_file, - do_lower_case=do_lower_case) - else: - self.tokenizer = AutoTokenizer.from_pretrained(vocab_file, do_lower_case=do_lower_case) - - self.batch_size = batch_size - - if lang == 'en': - from nltk import sent_tokenize - self.sent_tokenizer = sent_tokenize - elif lang == 'ru': - from ru_sent_tokenize import ru_sent_tokenize - self.sent_tokenizer = ru_sent_tokenize - else: - raise RuntimeError('en and ru languages are supported only') - - def __call__(self, contexts: List[str], questions: List[str], **kwargs) -> Tuple[List[str], List[int], List[float]]: - """get predictions for given contexts and questions - - Args: - contexts: batch of contexts - questions: batch of questions - - Returns: - predictions: answer, answer start position, logits or scores - - """ - batch_indices = [] - contexts_to_predict = [] - questions_to_predict = [] - predictions = {} - for i, (context, question) in enumerate(zip(contexts, questions)): - context_subtokens = self.tokenizer.tokenize(context) - question_subtokens = self.tokenizer.tokenize(question) - max_chunk_len = self.max_seq_length - len(question_subtokens) - 3 - if 0 < max_chunk_len < len(context_subtokens): - number_of_chunks = math.ceil(len(context_subtokens) / max_chunk_len) - sentences = self.sent_tokenizer(context) - for chunk in np.array_split(sentences, number_of_chunks): - contexts_to_predict += [' '.join(chunk)] - questions_to_predict += [question] - batch_indices += [i] - else: - contexts_to_predict += [context] - questions_to_predict += [question] - batch_indices += [i] - - for j in range(0, len(contexts_to_predict), self.batch_size): - c_batch = contexts_to_predict[j: j + self.batch_size] - q_batch = questions_to_predict[j: j + self.batch_size] - ind_batch = batch_indices[j: j + self.batch_size] - a_batch, a_st_batch, logits_batch = self.model(c_batch, q_batch) - for a, a_st, logits, ind in zip(a_batch, a_st_batch, logits_batch, ind_batch): - if ind in predictions: - predictions[ind] += [(a, a_st, logits)] - else: - predictions[ind] = [(a, a_st, logits)] - - answers, answer_starts, logits = [], [], [] - for ind in sorted(predictions.keys()): - prediction = predictions[ind] - best_answer_ind = np.argmax([p[2] for p in prediction]) - answers += [prediction[best_answer_ind][0]] - answer_starts += [prediction[best_answer_ind][1]] - logits += [prediction[best_answer_ind][2]] - - return answers, answer_starts, logits + super().load() diff --git a/deeppavlov/models/vectorizers/word_vectorizer.py b/deeppavlov/models/vectorizers/word_vectorizer.py deleted file mode 100644 index 7f93c94556..0000000000 --- a/deeppavlov/models/vectorizers/word_vectorizer.py +++ /dev/null @@ -1,289 +0,0 
@@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import pathlib -from abc import abstractmethod -from collections import defaultdict -from typing import List, Dict, AnyStr, Union - -import numpy as np -from pymorphy2 import MorphAnalyzer -from russian_tagsets import converters - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component -from deeppavlov.core.models.serializable import Serializable -from deeppavlov.models.morpho_tagger.common_tagger import make_pos_and_tag - - -class WordIndexVectorizer(Serializable, Component): - """ - A basic class for custom word-level vectorizers - """ - - def __init__(self, save_path: str, load_path: Union[str, List[str]], **kwargs) -> None: - Serializable.__init__(self, save_path, load_path, **kwargs) - - @property - @abstractmethod - def dim(self): - raise NotImplementedError("You should implement dim property in your WordIndexVectorizer subclass.") - - def _get_word_indexes(self, word: AnyStr) -> List: - """ - Transforms a word to corresponding vector of indexes - """ - raise NotImplementedError("You should implement get_word_indexes function " - "in your WordIndexVectorizer subclass.") - - def __call__(self, data: List) -> np.ndarray: - """ - Transforms words to one-hot encoding according to the dictionary. - - Args: - data: the batch of words - - Returns: - a 3D array. answer[i][j][k] = 1 iff data[i][j] is the k-th word in the dictionary. - """ - # if isinstance(data[0], str): - # data = [[x for x in re.split("(\w+|[,.])", elem) if x.strip() != ""] for elem in data] - max_length = max(len(x) for x in data) - answer = np.zeros(shape=(len(data), max_length, self.dim), dtype=int) - for i, sent in enumerate(data): - for j, word in enumerate(sent): - answer[i, j][self._get_word_indexes(word)] = 1 - return answer - - -@register("dictionary_vectorizer") -class DictionaryVectorizer(WordIndexVectorizer): - """ - Transforms words into 0-1 vector of its possible tags, read from a vocabulary file. 
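For clarity on the contract of the word-level vectorizers removed here: `__call__` turns a batch of tokenized sentences into a (batch_size, max_sentence_length, dim) multi-hot array, where the ones in answer[i][j] mark the tag indexes that `_get_word_indexes` returns for word data[i][j]. A tiny sketch with a hypothetical two-tag dictionary:

    import numpy as np

    word_tag_indexes = {"dog": [0], "barks": [1], "walks": [0, 1]}  # 0 = NOUN, 1 = VERB (hypothetical)
    data = [["dog", "barks"], ["walks"]]

    dim, max_length = 2, max(len(sent) for sent in data)
    answer = np.zeros((len(data), max_length, dim), dtype=int)
    for i, sent in enumerate(data):
        for j, word in enumerate(sent):
            answer[i, j][word_tag_indexes[word]] = 1
    # answer[0, 0] == [1, 0] (dog -> NOUN), answer[1, 0] == [1, 1] (walks -> NOUN or VERB)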
- The format of the vocabulary must be wordtag_1...tag_k - - Args: - save_path: path to save the vocabulary, - load_path: path to the vocabulary(-ies), - min_freq: minimal frequency of tag to memorize this tag, - unk_token: unknown token to be yielded for unknown words - """ - - def __init__(self, save_path: str, load_path: Union[str, List[str]], - min_freq: int = 1, unk_token: str = None, **kwargs) -> None: - super().__init__(save_path, load_path, **kwargs) - self.min_freq = min_freq - self.unk_token = unk_token - self.load() - - @property - def dim(self): - return len(self._t2i) - - def save(self) -> None: - """Saves the dictionary to self.save_path""" - with self.save_path.open("w", encoding="utf8") as fout: - for word, curr_labels in sorted(self.word_tag_mapping.items()): - curr_labels = [self._i2t[index] for index in curr_labels] - curr_labels = [x for x in curr_labels if x != self.unk_token] - fout.write("{}\t{}".format(word, " ".join(curr_labels))) - - def load(self) -> None: - """Loads the dictionary from self.load_path""" - if not isinstance(self.load_path, list): - self.load_path = [self.load_path] - for i, path in enumerate(self.load_path): - if isinstance(path, str): - self.load_path[i] = pathlib.Path(path) - labels_by_words = defaultdict(set) - for infile in self.load_path: - with infile.open("r", encoding="utf8") as fin: - for line in fin: - line = line.strip() - if line.count("\t") != 1: - continue - word, labels = line.split("\t") - labels_by_words[word].update(labels.split()) - self._initialize(labels_by_words) - - def _initialize(self, labels_by_words: Dict): - self._i2t = [self.unk_token] if self.unk_token is not None else [] - self._t2i = defaultdict(lambda: self.unk_token) - freq = defaultdict(int) - for word, labels in labels_by_words.items(): - for label in labels: - freq[label] += 1 - self._i2t += [label for label, count in freq.items() if count >= self.min_freq] - for i, label in enumerate(self._i2t): - self._t2i[label] = i - if self.unk_token is not None: - self.word_tag_mapping = defaultdict(lambda: [self.unk_token]) - else: - self.word_tag_mapping = defaultdict(list) - for word, labels in labels_by_words.items(): - labels = {self._t2i[label] for label in labels} - self.word_tag_mapping[word] = [x for x in labels if x is not None] - return self - - def _get_word_indexes(self, word: AnyStr): - return self.word_tag_mapping[word] - - -@register("pymorphy_vectorizer") -class PymorphyVectorizer(WordIndexVectorizer): - """ - Transforms russian words into 0-1 vector of its possible Universal Dependencies tags. - Tags are obtained using Pymorphy analyzer (pymorphy2.readthedocs.io) - and transformed to UD2.0 format using russian-tagsets library (https://github.com/kmike/russian-tagsets). - All UD2.0 tags that are compatible with produced tags are memorized. - The list of possible Universal Dependencies tags is read from a file, - which contains all the labels that occur in UD2.0 SynTagRus dataset. - - Args: - save_path: path to save the tags list, - load_path: path to load the list of tags, - max_pymorphy_variants: maximal number of pymorphy parses to be used. If -1, all parses are used. 
- """ - - USELESS_KEYS = ["Abbr"] - VALUE_MAP = {"Ptan": "Plur", "Brev": "Short"} - - def __init__(self, save_path: str, load_path: str, max_pymorphy_variants: int = -1, **kwargs) -> None: - super().__init__(save_path, load_path, **kwargs) - self.max_pymorphy_variants = max_pymorphy_variants - self.load() - self.memorized_word_indexes = dict() - self.memorized_tag_indexes = dict() - self.analyzer = MorphAnalyzer() - self.converter = converters.converter('opencorpora-int', 'ud20') - - @property - def dim(self): - return len(self._t2i) - - def save(self) -> None: - """Saves the dictionary to self.save_path""" - with self.save_path.open("w", encoding="utf8") as fout: - fout.write("\n".join(self._i2t)) - - def load(self) -> None: - """Loads the dictionary from self.load_path""" - self._i2t = [] - with self.load_path.open("r", encoding="utf8") as fin: - for line in fin: - line = line.strip() - if line == "": - continue - self._i2t.append(line) - self._t2i = {tag: i for i, tag in enumerate(self._i2t)} - self._make_tag_trie() - - def _make_tag_trie(self): - self._nodes = [defaultdict(dict)] - self._start_nodes_for_pos = dict() - self._data = [None] - for tag, code in self._t2i.items(): - pos, tag = make_pos_and_tag(tag, sep=",", return_mode="sorted_items") - start = self._start_nodes_for_pos.get(pos) - if start is None: - start = self._start_nodes_for_pos[pos] = len(self._nodes) - self._nodes.append(defaultdict(dict)) - self._data.append(None) - for key, value in tag: - values_dict = self._nodes[start][key] - child = values_dict.get(value) - if child is None: - child = values_dict[value] = len(self._nodes) - self._nodes.append(defaultdict(dict)) - self._data.append(None) - start = child - self._data[start] = code - return self - - def find_compatible(self, tag: str) -> List[int]: - """ - Transforms a Pymorphy tag to a list of indexes of compatible UD tags. 
- - Args: - tag: input Pymorphy tag - - Returns: - indexes of compatible UD tags - """ - if " " in tag and "_" not in tag: - pos, tag = tag.split(" ", maxsplit=1) - tag = sorted([tuple(elem.split("=")) for elem in tag.split("|")]) - else: - pos, tag = tag.split()[0], [] - if pos not in self._start_nodes_for_pos: - return [] - tag = [(key, self.VALUE_MAP.get(value, value)) for key, value in tag - if key not in self.USELESS_KEYS] - if len(tag) > 0: - curr_nodes = [(0, self._start_nodes_for_pos[pos])] - final_nodes = [] - else: - final_nodes = [self._start_nodes_for_pos[pos]] - curr_nodes = [] - while len(curr_nodes) > 0: - i, node_index = curr_nodes.pop() - # key, value = tag[i] - node = self._nodes[node_index] - if len(node) == 0: - final_nodes.append(node_index) - for curr_key, curr_values_dict in node.items(): - curr_i, curr_node_index = i, node_index - while curr_i < len(tag) and tag[curr_i][0] < curr_key: - curr_i += 1 - if curr_i == len(tag): - final_nodes.extend(curr_values_dict.values()) - continue - key, value = tag[curr_i] - if curr_key < key: - for child in curr_values_dict.values(): - curr_nodes.append((curr_i, child)) - else: - child = curr_values_dict.get(value) - if child is not None: - if curr_i < len(tag) - 1: - curr_nodes.append((curr_i + 1, child)) - else: - final_nodes.append(child) - answer = [] - while len(final_nodes) > 0: - index = final_nodes.pop() - if self._data[index] is not None: - answer.append(self._data[index]) - for elem in self._nodes[index].values(): - final_nodes.extend(elem.values()) - return answer - - def _get_word_indexes(self, word): - answer = self.memorized_word_indexes.get(word) - if answer is None: - parse = self.analyzer.parse(word) - if self.max_pymorphy_variants > 0: - parse = parse[:self.max_pymorphy_variants] - tag_indexes = set() - for elem in parse: - tag_indexes.update(set(self._get_tag_indexes(elem.tag))) - answer = self.memorized_word_indexes[word] = list(tag_indexes) - return answer - - def _get_tag_indexes(self, pymorphy_tag): - answer = self.memorized_tag_indexes.get(pymorphy_tag) - if answer is None: - tag = self.converter(str(pymorphy_tag)) - answer = self.memorized_tag_indexes[pymorphy_tag] = self.find_compatible(tag) - return answer diff --git a/deeppavlov/requirements/aiml_skill.txt b/deeppavlov/requirements/aiml_skill.txt deleted file mode 100644 index 6a6602091e..0000000000 --- a/deeppavlov/requirements/aiml_skill.txt +++ /dev/null @@ -1 +0,0 @@ -python-aiml==0.9.3 \ No newline at end of file diff --git a/deeppavlov/requirements/bert_dp.txt b/deeppavlov/requirements/bert_dp.txt deleted file mode 100644 index 9be3f8d71f..0000000000 --- a/deeppavlov/requirements/bert_dp.txt +++ /dev/null @@ -1 +0,0 @@ -git+https://github.com/deepmipt/bert.git@feat/multi_gpu \ No newline at end of file diff --git a/deeppavlov/requirements/datasets.txt b/deeppavlov/requirements/datasets.txt index 675da51276..bedb3b1054 100644 --- a/deeppavlov/requirements/datasets.txt +++ b/deeppavlov/requirements/datasets.txt @@ -1 +1 @@ -datasets==1.11.0 \ No newline at end of file +datasets>=1.16.0,<2.4.0 diff --git a/deeppavlov/requirements/en_core_web_sm.txt b/deeppavlov/requirements/en_core_web_sm.txt index 3fb142ab5d..6e4830cd98 100644 --- a/deeppavlov/requirements/en_core_web_sm.txt +++ b/deeppavlov/requirements/en_core_web_sm.txt @@ -1 +1,2 @@ -https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz#egg=en_core_web_sm==2.2.5 \ No newline at end of file 
+https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.3.0/en_core_web_sm-3.3.0-py3-none-any.whl +spacy diff --git a/deeppavlov/requirements/faiss.txt b/deeppavlov/requirements/faiss.txt deleted file mode 100644 index 65f36232ce..0000000000 --- a/deeppavlov/requirements/faiss.txt +++ /dev/null @@ -1 +0,0 @@ -faiss-gpu==1.6.3 diff --git a/deeppavlov/requirements/fasttext.txt b/deeppavlov/requirements/fasttext.txt index 9b0c672164..4a287d8c51 100644 --- a/deeppavlov/requirements/fasttext.txt +++ b/deeppavlov/requirements/fasttext.txt @@ -1 +1 @@ -fasttext==0.9.1 \ No newline at end of file +fasttext==0.9.* diff --git a/deeppavlov/requirements/gensim.txt b/deeppavlov/requirements/gensim.txt deleted file mode 100644 index 96d519b5bd..0000000000 --- a/deeppavlov/requirements/gensim.txt +++ /dev/null @@ -1 +0,0 @@ -gensim==3.8.1 \ No newline at end of file diff --git a/deeppavlov/requirements/jieba.txt b/deeppavlov/requirements/jieba.txt deleted file mode 100644 index dc9a2d185a..0000000000 --- a/deeppavlov/requirements/jieba.txt +++ /dev/null @@ -1 +0,0 @@ -jieba==0.42.1 diff --git a/deeppavlov/requirements/kenlm.txt b/deeppavlov/requirements/kenlm.txt index 2210ba6aa5..9d57c24888 100644 --- a/deeppavlov/requirements/kenlm.txt +++ b/deeppavlov/requirements/kenlm.txt @@ -1 +1 @@ -git+https://github.com/kpu/kenlm.git@96d303cfb1a0c21b8f060dbad640d7ab301c019a#egg=kenlm \ No newline at end of file +pypi-kenlm==0.1.20210121 diff --git a/deeppavlov/requirements/lxml.txt b/deeppavlov/requirements/lxml.txt index eaf4d54c65..7d859b12f7 100644 --- a/deeppavlov/requirements/lxml.txt +++ b/deeppavlov/requirements/lxml.txt @@ -1 +1 @@ -lxml==4.4.2 +lxml==4.9.* diff --git a/deeppavlov/requirements/morpho_tagger.txt b/deeppavlov/requirements/morpho_tagger.txt deleted file mode 100644 index a89ce584c2..0000000000 --- a/deeppavlov/requirements/morpho_tagger.txt +++ /dev/null @@ -1 +0,0 @@ -russian-tagsets==0.6 \ No newline at end of file diff --git a/deeppavlov/requirements/nemo-asr.txt b/deeppavlov/requirements/nemo-asr.txt deleted file mode 100644 index 1a072b36b7..0000000000 --- a/deeppavlov/requirements/nemo-asr.txt +++ /dev/null @@ -1,7 +0,0 @@ -frozendict==1.2 -kaldi-io==0.9.4 -inflect==4.1.0 -unidecode==1.1.1 -librosa==0.7.2 -torch-stft==0.1.4 -numba==0.48 \ No newline at end of file diff --git a/deeppavlov/requirements/nemo-tts.txt b/deeppavlov/requirements/nemo-tts.txt deleted file mode 100644 index a0f3139b34..0000000000 --- a/deeppavlov/requirements/nemo-tts.txt +++ /dev/null @@ -1,3 +0,0 @@ -matplotlib==3.2.1 -sentencepiece==0.1.85 -youtokentome==1.0.6 \ No newline at end of file diff --git a/deeppavlov/requirements/nemo.txt b/deeppavlov/requirements/nemo.txt deleted file mode 100644 index e6f8ff402a..0000000000 --- a/deeppavlov/requirements/nemo.txt +++ /dev/null @@ -1 +0,0 @@ -nemo-toolkit==0.10.0 \ No newline at end of file diff --git a/deeppavlov/requirements/opt_einsum.txt b/deeppavlov/requirements/opt_einsum.txt index 04e76d27a1..b2ea8c2870 100644 --- a/deeppavlov/requirements/opt_einsum.txt +++ b/deeppavlov/requirements/opt_einsum.txt @@ -1 +1 @@ -opt-einsum==3.3.0 \ No newline at end of file +opt-einsum==3.3.* diff --git a/deeppavlov/requirements/pytorch.txt b/deeppavlov/requirements/pytorch.txt new file mode 100644 index 0000000000..93197394a4 --- /dev/null +++ b/deeppavlov/requirements/pytorch.txt @@ -0,0 +1 @@ +torch>=1.6.0,<1.13.0 diff --git a/deeppavlov/requirements/pytorch14.txt b/deeppavlov/requirements/pytorch14.txt deleted file mode 100644 index 
f940e921a8..0000000000 --- a/deeppavlov/requirements/pytorch14.txt +++ /dev/null @@ -1,2 +0,0 @@ -torch==1.4.0 -torchvision==0.5.0 \ No newline at end of file diff --git a/deeppavlov/requirements/pytorch16.txt b/deeppavlov/requirements/pytorch16.txt deleted file mode 100644 index 0d41debc01..0000000000 --- a/deeppavlov/requirements/pytorch16.txt +++ /dev/null @@ -1,2 +0,0 @@ -torch==1.6.0 -torchvision==0.7.0 \ No newline at end of file diff --git a/deeppavlov/requirements/rapidfuzz.txt b/deeppavlov/requirements/rapidfuzz.txt index e6b4ffa3a3..5d3ee5c2fe 100644 --- a/deeppavlov/requirements/rapidfuzz.txt +++ b/deeppavlov/requirements/rapidfuzz.txt @@ -1 +1 @@ -rapidfuzz==0.7.6 +rapidfuzz==2.1.* diff --git a/deeppavlov/requirements/rasa_skill.txt b/deeppavlov/requirements/rasa_skill.txt deleted file mode 100644 index bfb2598b2d..0000000000 --- a/deeppavlov/requirements/rasa_skill.txt +++ /dev/null @@ -1 +0,0 @@ -git+https://github.com/deepmipt/rasa.git@b0a80916e54ed9f4496c709a28f1093f7a5f2492#egg=rasa==1.2.7 diff --git a/deeppavlov/requirements/ru_core_news_sm.txt b/deeppavlov/requirements/ru_core_news_sm.txt new file mode 100644 index 0000000000..d7e3dd11c9 --- /dev/null +++ b/deeppavlov/requirements/ru_core_news_sm.txt @@ -0,0 +1,2 @@ +https://github.com/explosion/spacy-models/releases/download/ru_core_news_sm-3.3.0/ru_core_news_sm-3.3.0-py3-none-any.whl +spacy diff --git a/deeppavlov/requirements/sacremoses.txt b/deeppavlov/requirements/sacremoses.txt new file mode 100644 index 0000000000..5d069c7669 --- /dev/null +++ b/deeppavlov/requirements/sacremoses.txt @@ -0,0 +1 @@ +sacremoses==0.0.53 diff --git a/deeppavlov/requirements/slovnet.txt b/deeppavlov/requirements/slovnet.txt new file mode 100644 index 0000000000..6e063f1115 --- /dev/null +++ b/deeppavlov/requirements/slovnet.txt @@ -0,0 +1,2 @@ +slovnet==0.5.* +navec diff --git a/deeppavlov/requirements/sortedcontainers.txt b/deeppavlov/requirements/sortedcontainers.txt index ecb69929c1..1a3a6bff0d 100644 --- a/deeppavlov/requirements/sortedcontainers.txt +++ b/deeppavlov/requirements/sortedcontainers.txt @@ -1 +1 @@ -sortedcontainers==2.1.0 \ No newline at end of file +sortedcontainers==2.4.* diff --git a/deeppavlov/requirements/spacy.txt b/deeppavlov/requirements/spacy.txt deleted file mode 100644 index 9693ba97a9..0000000000 --- a/deeppavlov/requirements/spacy.txt +++ /dev/null @@ -1 +0,0 @@ -spacy==2.2.3 \ No newline at end of file diff --git a/deeppavlov/requirements/syntax_parser.txt b/deeppavlov/requirements/syntax_parser.txt deleted file mode 100644 index 053781647f..0000000000 --- a/deeppavlov/requirements/syntax_parser.txt +++ /dev/null @@ -1 +0,0 @@ -git+https://github.com/andersjo/dependency_decoding.git@79510908223b93bd4c1fb0409a2a66dd75577c2c \ No newline at end of file diff --git a/deeppavlov/requirements/tf-gpu.txt b/deeppavlov/requirements/tf-gpu.txt deleted file mode 100644 index c6114f09c4..0000000000 --- a/deeppavlov/requirements/tf-gpu.txt +++ /dev/null @@ -1 +0,0 @@ -tensorflow-gpu==1.15.5 \ No newline at end of file diff --git a/deeppavlov/requirements/tf-hub.txt b/deeppavlov/requirements/tf-hub.txt deleted file mode 100644 index 6c9b9fb164..0000000000 --- a/deeppavlov/requirements/tf-hub.txt +++ /dev/null @@ -1 +0,0 @@ -tensorflow-hub==0.7.0 \ No newline at end of file diff --git a/deeppavlov/requirements/tf.txt b/deeppavlov/requirements/tf.txt deleted file mode 100644 index d5a56dee1e..0000000000 --- a/deeppavlov/requirements/tf.txt +++ /dev/null @@ -1 +0,0 @@ -tensorflow==1.15.5 \ No newline at end of file diff 
--git a/deeppavlov/requirements/torchcrf.txt b/deeppavlov/requirements/torchcrf.txt new file mode 100644 index 0000000000..2e2f260106 --- /dev/null +++ b/deeppavlov/requirements/torchcrf.txt @@ -0,0 +1 @@ +pytorch-crf==0.7.* diff --git a/deeppavlov/requirements/torchtext.txt b/deeppavlov/requirements/torchtext.txt deleted file mode 100644 index 766718b628..0000000000 --- a/deeppavlov/requirements/torchtext.txt +++ /dev/null @@ -1 +0,0 @@ -torchtext==0.6.0 \ No newline at end of file diff --git a/deeppavlov/requirements/transformers.txt b/deeppavlov/requirements/transformers.txt index ac8b9921ca..65c2816393 100644 --- a/deeppavlov/requirements/transformers.txt +++ b/deeppavlov/requirements/transformers.txt @@ -1 +1 @@ -transformers==4.6.0 \ No newline at end of file +transformers>=4.13.0,<4.21.0 diff --git a/deeppavlov/requirements/transformers28.txt b/deeppavlov/requirements/transformers28.txt deleted file mode 100644 index ec122c087f..0000000000 --- a/deeppavlov/requirements/transformers28.txt +++ /dev/null @@ -1 +0,0 @@ -transformers==2.8.0 \ No newline at end of file diff --git a/deeppavlov/requirements/udapi.txt b/deeppavlov/requirements/udapi.txt index d923dfbf55..c3bbe488c9 100644 --- a/deeppavlov/requirements/udapi.txt +++ b/deeppavlov/requirements/udapi.txt @@ -1 +1 @@ -git+https://github.com/udapi/udapi-python.git@1e4004f577f3c6e471528ce4b87dd570ce8f2706 \ No newline at end of file +udapi==0.3.* diff --git a/deeppavlov/requirements/whapi.txt b/deeppavlov/requirements/whapi.txt index 8637c13b43..a6389584e0 100644 --- a/deeppavlov/requirements/whapi.txt +++ b/deeppavlov/requirements/whapi.txt @@ -1 +1,2 @@ -whapi==0.6.2 \ No newline at end of file +bs4 +whapi==0.6.* diff --git a/deeppavlov/requirements/xeger.txt b/deeppavlov/requirements/xeger.txt deleted file mode 100644 index d415ec4b7e..0000000000 --- a/deeppavlov/requirements/xeger.txt +++ /dev/null @@ -1 +0,0 @@ -xeger==0.3.5 diff --git a/deeppavlov/skills/__init__.py b/deeppavlov/skills/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/skills/aiml_skill/README.md b/deeppavlov/skills/aiml_skill/README.md deleted file mode 100644 index cad5e100ed..0000000000 --- a/deeppavlov/skills/aiml_skill/README.md +++ /dev/null @@ -1,6 +0,0 @@ -This skill wraps python-aiml library and allows developer to integrate AIML scripts into DeepPavlov dialog system. - -If you'd like to find more free AIML scripts here is link: -https://github.com/pandorabots/Free-AIML - -You can set path to folder with your AIML scripts as config param (see attr `path_to_aiml_scripts`). \ No newline at end of file diff --git a/deeppavlov/skills/aiml_skill/__init__.py b/deeppavlov/skills/aiml_skill/__init__.py deleted file mode 100644 index e5b4b02f6b..0000000000 --- a/deeppavlov/skills/aiml_skill/__init__.py +++ /dev/null @@ -1 +0,0 @@ -from .aiml_skill import AIMLSkill diff --git a/deeppavlov/skills/aiml_skill/aiml_skill.py b/deeppavlov/skills/aiml_skill/aiml_skill.py deleted file mode 100644 index 51bf6f2360..0000000000 --- a/deeppavlov/skills/aiml_skill/aiml_skill.py +++ /dev/null @@ -1,158 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import uuid -from logging import getLogger -from pathlib import Path -from typing import Tuple, Optional, List - -import aiml - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component - -log = getLogger(__name__) - - -@register("aiml_skill") -class AIMLSkill(Component): - """Skill wraps python-aiml library into DeepPavlov interfrace. - AIML uses directory with AIML scripts which are loaded at initialization and used as patterns - for answering at each step. - """ - - def __init__(self, - path_to_aiml_scripts: str, - positive_confidence: float = 0.66, - null_response: str = "I don't know what to answer you", - null_confidence: float = 0.33, - **kwargs - ) -> None: - """ - Construct skill: - read AIML scripts, - load AIML kernel - - Args: - path_to_aiml_scripts: string path to folder with AIML scripts - null_response: Response string to answer if no AIML Patterns matched - positive_confidence: The confidence of response if response was found in AIML scripts - null_confidence: The confidence when AIML scripts has no rule for responding and system returns null_response - """ - # we need absolute path (expanded for user home and resolved if it relative path): - self.path_to_aiml_scripts = Path(path_to_aiml_scripts).expanduser().resolve() - log.info(f"path_to_aiml_scripts is: `{self.path_to_aiml_scripts}`") - - self.positive_confidence = positive_confidence - self.null_confidence = null_confidence - self.null_response = null_response - self.kernel = aiml.Kernel() - # to block AIML output: - self.kernel._verboseMode = False - self._load_scripts() - - def _load_scripts(self) -> None: - """ - Scripts are loaded recursively from files with extensions .xml and .aiml - Returns: None - - """ - # learn kernel to all aimls in directory tree: - all_files = sorted(self.path_to_aiml_scripts.rglob('*.*')) - learned_files = [] - for each_file_path in all_files: - if each_file_path.suffix in ['.aiml', '.xml']: - # learn the script file - self.kernel.learn(str(each_file_path)) - learned_files.append(each_file_path) - if not learned_files: - log.warning(f"No .aiml or .xml files found for AIML Kernel in directory {self.path_to_aiml_scripts}") - - def process_step(self, utterance_str: str, user_id: any) -> Tuple[str, float]: - response = self.kernel.respond(utterance_str, sessionID=user_id) - # here put your estimation of confidence: - if response: - # print(f"AIML responds: {response}") - confidence = self.positive_confidence - else: - # print("AIML responses silently...") - response = self.null_response - confidence = self.null_confidence - return response, confidence - - def _generate_user_id(self) -> str: - """Here you put user id generative logic if you want to implement it in the skill. - - Returns: - user_id: Random generated user ID. - - """ - return uuid.uuid1().hex - - def __call__(self, - utterances_batch: List[str], - states_batch: Optional[List] = None) -> Tuple[List[str], List[float], list]: - """Returns skill inference result. 
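For reference, the skill being deleted here is a thin wrapper around the python-aiml kernel: the scripts are learned once at construction time and `respond` is then called per utterance with a per-user session id. Roughly (the script path and utterance are placeholders):

    import aiml

    kernel = aiml.Kernel()
    kernel._verboseMode = False                        # silence kernel output, as the skill did
    kernel.learn("path/to/scripts/greetings.aiml")     # placeholder path to an AIML script

    reply = kernel.respond("hello there", sessionID="user-42")
    print(reply or "I don't know what to answer you")  # an empty reply falls back to null_response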
- - Returns batches of skill inference results, estimated confidence - levels and up to date states corresponding to incoming utterance - batch. - - Args: - utterances_batch: A batch of utterances of str type. - states_batch: A batch of arbitrary typed states for - each utterance. - - - Returns: - response: A batch of arbitrary typed skill inference results. - confidence: A batch of float typed confidence levels for each of - skill inference result. - output_states_batch: A batch of arbitrary typed states for - each utterance. - - """ - # grasp user_ids from states batch. - # We expect that skill receives None or dict of state for each utterance. - # if state has user_id then skill uses it, otherwise it generates user_id and calls the - # user with this name in further. - - # In this implementation we use current datetime for generating uniqe ids - output_states_batch = [] - user_ids = [] - if states_batch is None: - # generate states batch matching batch of utterances: - states_batch = [None] * len(utterances_batch) - - for state in states_batch: - if not state: - user_id = self._generate_user_id() - new_state = {'user_id': user_id} - - elif 'user_id' not in state: - new_state = state - user_id = self._generate_user_id() - new_state['user_id'] = self._generate_user_id() - - else: - new_state = state - user_id = new_state['user_id'] - - user_ids.append(user_id) - output_states_batch.append(new_state) - - confident_responses = map(self.process_step, utterances_batch, user_ids) - responses_batch, confidences_batch = zip(*confident_responses) - - return responses_batch, confidences_batch, output_states_batch diff --git a/deeppavlov/skills/dsl_skill/__init__.py b/deeppavlov/skills/dsl_skill/__init__.py deleted file mode 100644 index d2b332d4b6..0000000000 --- a/deeppavlov/skills/dsl_skill/__init__.py +++ /dev/null @@ -1,3 +0,0 @@ -from .context import UserContext -from .dsl_skill import DSLMeta -from .utils import SkillResponse, UserId diff --git a/deeppavlov/skills/dsl_skill/context.py b/deeppavlov/skills/dsl_skill/context.py deleted file mode 100644 index acbfc6c5b9..0000000000 --- a/deeppavlov/skills/dsl_skill/context.py +++ /dev/null @@ -1,53 +0,0 @@ -# Copyright 2019 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import json -from typing import Optional, Union, Dict - -from deeppavlov.skills.dsl_skill.utils import UserId - - -class UserContext: - """ - UserContext object stores information that the current skill currently knows about the user. 
- - Args: - user_id: id of user - message: current message - current_state: current user state - payload: custom payload dictionary, or a JSON-serialized string of such dictionary - - Attributes: - handler_payload: stores information generated by the selected handler - - """ - - def __init__( - self, - user_id: Optional[UserId] = None, - message: Optional[str] = None, - current_state: Optional[str] = None, - payload: Optional[Union[Dict, str]] = None, - ): - self.user_id = user_id - self.message = message - self.current_state = current_state - self.handler_payload = {} - - # some custom data added by skill creator - self.payload = payload - if payload == '' or payload is None: - self.payload = {} - elif isinstance(payload, str): - self.payload = json.loads(payload) diff --git a/deeppavlov/skills/dsl_skill/dsl_skill.py b/deeppavlov/skills/dsl_skill/dsl_skill.py deleted file mode 100644 index 93e9f8544d..0000000000 --- a/deeppavlov/skills/dsl_skill/dsl_skill.py +++ /dev/null @@ -1,225 +0,0 @@ -# Copyright 2019 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from abc import ABCMeta -from collections import defaultdict -from functools import partial -from itertools import zip_longest, starmap -from typing import List, Optional, Dict, Callable, Tuple - -from deeppavlov.core.common.registry import register -from deeppavlov.skills.dsl_skill.context import UserContext -from deeppavlov.skills.dsl_skill.handlers.handler import Handler -from deeppavlov.skills.dsl_skill.handlers.regex_handler import RegexHandler -from deeppavlov.skills.dsl_skill.utils import SkillResponse, UserId - - -class DSLMeta(ABCMeta): - """ - This metaclass is used for creating a skill. Skill is register by its class name in registry. - - Example: - - .. code:: python - - class ExampleSkill(metaclass=DSLMeta): - @DSLMeta.handler(commands=["hello", "hey"]) - def __greeting(context: UserContext): - response = "Hello, my friend!" 
- confidence = 1.0 - return response, confidence - - Attributes: - name: class name - state_to_handler: dict with states as keys and lists of Handler objects as values - user_to_context: dict with user ids as keys and UserContext objects as values - universal_handlers: list of handlers that can be activated from any state - - """ - skill_collection: Dict[str, 'DSLMeta'] = {} - - def __init__(cls, name: str, - bases, - namespace, - **kwargs): - super().__init__(name, bases, namespace, **kwargs) - cls.name = name - cls.state_to_handler = defaultdict(list) - cls.user_to_context = defaultdict(UserContext) - cls.universal_handlers = [] - - handlers = [attribute for attribute in namespace.values() if isinstance(attribute, Handler)] - - for handler in handlers: - if handler.state is None: - cls.universal_handlers.append(handler) - else: - cls.state_to_handler[handler.state].append(handler) - - cls.handle = partial(DSLMeta.__handle, cls) - cls.__call__ = partial(DSLMeta.__handle_batch, cls) - cls.__init__ = partial(DSLMeta.__init__class, cls) - register()(cls) - DSLMeta.__add_to_collection(cls) - - def __init__class(cls, - on_invalid_command: str = "Простите, я вас не понял", - null_confidence: float = 0, - *args, **kwargs) -> None: - """ - Initialize Skill class - - Args: - on_invalid_command: message to be sent on message with no associated handler - null_confidence: the confidence when DSL has no handler that fits request - """ - # message to be sent on message with no associated handler - cls.on_invalid_command = on_invalid_command - cls.null_confidence = null_confidence - - def __handle_batch(cls: 'DSLMeta', - utterances_batch: List[str], - user_ids_batch: List[UserId]) -> Tuple[List, ...]: - """Returns skill inference result. - Returns batches of skill inference results, estimated confidence - levels and up to date states corresponding to incoming utterance - batch. - - Args: - utterances_batch: A batch of utterances of str type. - user_ids_batch: A batch of user ids. - - Returns: - response_batch: A batch of arbitrary typed skill inference results. - confidence_batch: A batch of float typed confidence levels for each of - skill inference result. - - """ - return (*map(list, zip(*starmap(cls.handle, zip_longest(utterances_batch, user_ids_batch)))),) - - @staticmethod - def __add_to_collection(cls: 'DSLMeta') -> None: - """ - Adds Skill class to Skill classes collection - - Args: - cls: Skill class - - """ - DSLMeta.skill_collection[cls.name] = cls - - @staticmethod - def __handle(cls: 'DSLMeta', - utterance: str, - user_id: UserId) -> SkillResponse: - """ - Handles what is going to be after a message from user arrived. - Simple usage: - skill([], []) - - Args: - cls: instance of callee's class - utterance: a message to be handled - user_id: id of a user - - Returns: - result: handler function's result if succeeded - - """ - context = cls.user_to_context[user_id] - - context.user_id = user_id - context.message = utterance - - current_handler = cls.__select_handler(context) - return cls.__run_handler(current_handler, context) - - def __select_handler(cls, - context: UserContext) -> Optional[Callable]: - """ - Selects handler with the highest priority that could be triggered from the passed context. 
- - Returns: - handler function that is selected and None if no handler fits request - - """ - available_handlers = cls.state_to_handler[context.current_state] - available_handlers.extend(cls.universal_handlers) - available_handlers.sort(key=lambda h: h.priority, reverse=True) - for handler in available_handlers: - if handler.check(context): - handler.expand_context(context) - return handler.func - - def __run_handler(cls, handler: Optional[Callable], - context: UserContext) -> SkillResponse: - """ - Runs specified handler for current context - - Args: - handler: handler to be run. If None, on_invalid_command is returned - context: user context - - Returns: - SkillResponse - - """ - if handler is None: - return SkillResponse(cls.on_invalid_command, cls.null_confidence) - try: - return SkillResponse(*handler(context=context)) - except Exception as exc: - return SkillResponse(str(exc), 1.0) - - @staticmethod - def handler(commands: Optional[List[str]] = None, - state: Optional[str] = None, - context_condition: Optional[Callable] = None, - priority: int = 0) -> Callable: - """ - Decorator to be used in skills' classes. - Sample usage: - - .. code:: python - - class ExampleSkill(metaclass=DSLMeta): - @DSLMeta.handler(commands=["hello", "hi", "sup", "greetings"]) - def __greeting(context: UserContext): - response = "Hello, my friend!" - confidence = 1.0 - return response, confidence - - Args: - priority: integer value to indicate priority. If multiple handlers satisfy - all the requirements, the handler with the greatest priority value will be used - context_condition: function that takes context and - returns True if this handler should be enabled - and False otherwise. If None, no condition is checked - commands: phrases/regexs on what the function wrapped - by this decorator will trigger - state: state name - - Returns: - function decorated into Handler class - - """ - if commands is None: - commands = [".*"] - - def decorator(func: Callable) -> Handler: - return RegexHandler(func, commands, - context_condition=context_condition, - priority=priority, state=state) - - return decorator diff --git a/deeppavlov/skills/dsl_skill/handlers/__init__.py b/deeppavlov/skills/dsl_skill/handlers/__init__.py deleted file mode 100644 index e69de29bb2..0000000000 diff --git a/deeppavlov/skills/dsl_skill/handlers/handler.py b/deeppavlov/skills/dsl_skill/handlers/handler.py deleted file mode 100644 index c041404e82..0000000000 --- a/deeppavlov/skills/dsl_skill/handlers/handler.py +++ /dev/null @@ -1,68 +0,0 @@ -# Copyright 2019 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from typing import Callable, Optional - -from deeppavlov.skills.dsl_skill.context import UserContext -from deeppavlov.skills.dsl_skill.utils import SkillResponse - - -class Handler: - """ - Handler instance helps DSLMeta class distinguish functions wrapped - by @DSLMeta.handler to add them to handlers storage. 
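A minimal, illustrative skill built with DSLMeta and the handler decorator described above; the class and handler names are made up for the example, and each utterance is passed as a list of tokens because RegexHandler joins the message with spaces before matching (see regex_handler.py below).

.. code:: python

    from deeppavlov.skills.dsl_skill import DSLMeta, UserContext

    class GreeterSkill(metaclass=DSLMeta):
        @DSLMeta.handler(commands=["hello", "hi"])
        def greeting(context: UserContext):
            return "Hello, my friend!", 1.0

    skill = GreeterSkill()
    # Utterances are token lists; the second argument is a batch of user ids.
    responses, confidences = skill([["hello"]], [1])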
- It also checks if the handler function should be triggered based on the given context. - - Attributes: - func: handler function - state: state in which handler can be activated - priority: priority of the function. If 2 or more handlers can be activated, handler - with the highest priority is selected - context_condition: predicate that accepts user context and checks if the handler should be activated. Example: - `lambda context: context.user_id != 1` checks if user_id is not equal to 1. - That means a user with id 1 will be always ignored by the handler. - - """ - - def __init__(self, - func: Callable, - state: Optional[str] = None, - context_condition: Optional[Callable] = None, - priority: int = 0): - self.func = func - self.state = state - self.context_condition = context_condition - self.priority = priority - - def __call__(self, context: UserContext) -> SkillResponse: - return self.func(context) - - def check(self, context: UserContext) -> bool: - """ - Checks: - - if the handler function should be triggered based on the given context via context condition. - - Args: - context: user context - - Returns: - True, if handler should be activated, False otherwise - """ - if self.context_condition is not None: - return self.context_condition(context) - return True - - def expand_context(self, context: UserContext) -> UserContext: - context.handler_payload = {} - return context diff --git a/deeppavlov/skills/dsl_skill/handlers/regex_handler.py b/deeppavlov/skills/dsl_skill/handlers/regex_handler.py deleted file mode 100644 index 04cf171774..0000000000 --- a/deeppavlov/skills/dsl_skill/handlers/regex_handler.py +++ /dev/null @@ -1,80 +0,0 @@ -# Copyright 2019 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import re -from typing import List, Callable, Optional - -from deeppavlov.skills.dsl_skill.context import UserContext -from deeppavlov.skills.dsl_skill.handlers.handler import Handler - - -class RegexHandler(Handler): - """ - This handler checks whether the message that is passed to it is matched by a regex. - - Adds the following key to ```context.handler_payload```: - - 'regex_groups' - groups parsed from regular expression in command, by name - - Attributes: - func: handler function - state: state in which handler can be activated - priority: priority of the function. If 2 or more handlers can be activated, function - with the highest priority is selected - context_condition: predicate that accepts user context and checks if the handler should be activated. - Example: `lambda context: context.user_id != 1` checks if user_id is not equal to 1. - That means a user with id 1 will be always ignored by the handler. 
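Extending the sketch above, a handler can also be restricted with the priority and context_condition attributes documented here; the handler bodies, confidence values and the user-id check are illustrative only.

.. code:: python

    from deeppavlov.skills.dsl_skill import DSLMeta, UserContext

    class GuardedSkill(metaclass=DSLMeta):
        # Low-priority fallback that matches any message.
        @DSLMeta.handler(commands=[".*"], priority=0)
        def fallback(context: UserContext):
            return "Sorry, I did not get that.", 0.3

        # Preferred handler: higher priority, skipped for user id 1.
        @DSLMeta.handler(commands=["help"], priority=1,
                         context_condition=lambda context: context.user_id != 1)
        def help_handler(context: UserContext):
            return "Type 'hello' to start the conversation.", 0.9

    skill = GuardedSkill()
    # User 2 gets the help reply, user 1 falls back to the low-priority handler.
    responses, confidences = skill([["help"], ["help"]], [2, 1])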
- commands: handler is activated if regular expression from this list is matched with a user message - - """ - - def __init__(self, - func: Callable, - commands: Optional[List[str]] = None, - state: Optional[str] = None, - context_condition: Optional[Callable] = None, - priority: int = 0): - super().__init__(func, state, context_condition, priority) - self.commands = [re.compile(command) for command in commands] - - def check(self, context: UserContext) -> bool: - """ - Checks: - - if the handler function should be triggered based on the given context via context condition. - - if at least one of the commands is matched to the `context.message`. - - Args: - context: user context - - Returns: - True, if handler should be activated, False otherwise - """ - is_previous_matches = super().check(context) - if not is_previous_matches: - return False - - message = context.message - return any(re.search(regexp, ' '.join(message)) for regexp in self.commands) - - def expand_context(self, context: UserContext) -> UserContext: - context.handler_payload = {'regex_groups': {}} - message = context.message - for regexp in self.commands: - match = re.search(regexp, ' '.join(message)) - if match is not None: - for group_ind, span in enumerate(match.regs): - context.handler_payload['regex_groups'][group_ind] = message[span[0]: span[1]] - for group_name, group_ind in regexp.groupindex.items(): - context.handler_payload['regex_groups'][group_name] = \ - context.handler_payload['regex_groups'][group_ind] - return context diff --git a/deeppavlov/skills/dsl_skill/utils.py b/deeppavlov/skills/dsl_skill/utils.py deleted file mode 100644 index 717d52f637..0000000000 --- a/deeppavlov/skills/dsl_skill/utils.py +++ /dev/null @@ -1,22 +0,0 @@ -# Copyright 2019 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from typing import Union, NamedTuple - -UserId = Union[str, int] - - -class SkillResponse(NamedTuple): - response: str - confidence: float diff --git a/deeppavlov/skills/rasa_skill/__init__.py b/deeppavlov/skills/rasa_skill/__init__.py deleted file mode 100644 index d694bafa04..0000000000 --- a/deeppavlov/skills/rasa_skill/__init__.py +++ /dev/null @@ -1 +0,0 @@ -from .rasa_skill import RASASkill diff --git a/deeppavlov/skills/rasa_skill/rasa_skill.py b/deeppavlov/skills/rasa_skill/rasa_skill.py deleted file mode 100644 index e334f4b0ab..0000000000 --- a/deeppavlov/skills/rasa_skill/rasa_skill.py +++ /dev/null @@ -1,269 +0,0 @@ -# Copyright 2019 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -import asyncio -import logging -import uuid -from functools import reduce -from pathlib import Path -from typing import Tuple, Optional, List - -from rasa.cli.utils import get_validated_path -from rasa.constants import DEFAULT_MODELS_PATH -from rasa.core.agent import Agent -from rasa.core.channels import CollectingOutputChannel -from rasa.core.channels import UserMessage -from rasa.model import get_model - -from deeppavlov.core.common.registry import register -from deeppavlov.core.models.component import Component - -logger = logging.getLogger(__name__) - - -@register("rasa_skill") -class RASASkill(Component): - """RASASkill lets you to wrap RASA Agent as a Skill within DeepPavlov environment. - - The component requires path to your RASA models (folder with timestamped tar.gz archieves) - as you use in command `rasa run -m models --enable-api --log-file out.log` - - """ - - def __init__(self, path_to_models: str, **kwargs) -> None: - """ - Constructs RASA Agent as a DeepPavlov skill: - read model folder, - initialize rasa.core.agent.Agent and wrap it's interfaces - - Args: - path_to_models: string path to folder with RASA models - - """ - # we need absolute path (expanded for user home and resolved if it relative path): - self.path_to_models = Path(path_to_models).expanduser().resolve() - - model = get_validated_path(self.path_to_models, "model", DEFAULT_MODELS_PATH) - - model_path = get_model(model) - if not model_path: - # can not laod model path - raise Exception("can not load model path: %s" % model) - - self._agent = Agent.load(model_path) - self.ioloop = asyncio.new_event_loop() - logger.info(f"path to RASA models is: `{self.path_to_models}`") - - def __call__(self, - utterances_batch: List[str], - states_batch: Optional[List] = None) -> Tuple[List[str], List[float], list]: - """Returns skill inference result. - - Returns batches of skill inference results, estimated confidence - levels and up to date states corresponding to incoming utterance - batch. - - Args: - utterances_batch: A batch of utterances of str type. - states_batch: A batch of arbitrary typed states for - each utterance. - - - Returns: - response: A batch of arbitrary typed skill inference results. - confidence: A batch of float typed confidence levels for each of - skill inference result. - output_states_batch: A batch of arbitrary typed states for - each utterance. 
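A minimal usage sketch for the RASA wrapper being removed, mirroring the constructor and __call__ signatures above; the models folder is a placeholder, and a trained RASA model plus a compatible rasa installation are required for this to actually run.

.. code:: python

    from deeppavlov.skills.rasa_skill import RASASkill

    # Placeholder folder with timestamped tar.gz archives produced by `rasa train`.
    skill = RASASkill(path_to_models="path/to/rasa_models")

    responses, confidences, states = skill(["Hi, what can you do?"])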
- - """ - user_ids, output_states_batch = self._handle_user_identification(utterances_batch, states_batch) - ################################################################################# - # RASA use asyncio for handling messages and handle_text is async function, - # so we need to instantiate event loop - # futures = [rasa_confident_response_decorator(self._agent, utt, sender_id=uid) for utt, uid in - futures = [self.rasa_confident_response_decorator(self._agent, utt, sender_id=uid) for utt, uid in - zip(utterances_batch, user_ids)] - - asyncio.set_event_loop(self.ioloop) - results = self.ioloop.run_until_complete(asyncio.gather(*futures)) - - responses_batch, confidences_batch = zip(*results) - return responses_batch, confidences_batch, output_states_batch - - async def rasa_confident_response_decorator(self, rasa_agent, text_message, sender_id): - """ - Args: - rasa_agent: rasa.core.agent.Agent instance - text_message: str with utterance from user - sender_id: id of the user - - Returns: None or tuple with str and float, where first element is a message and second is - confidence - """ - - resp = await self.rasa_handle_text_verbosely(rasa_agent, text_message, sender_id) - if resp: - responses, confidences, actions = resp - else: - logger.warning("Null response from RASA Skill") - return None - - # for adaptation to deep pavlov arch we need to merge multi-messages into single string: - texts = [each_resp['text'] for each_resp in responses if 'text' in each_resp] - merged_message = "\n".join(texts) - - merged_confidence = reduce(lambda a, b: a * b, confidences) - # TODO possibly it better to choose another function for calculation of final confidence - # current realisation of confidence propagation may cause confidence decay for long actions - # chains. If long chains is your case, try max(confidence) or confidence[0] - return merged_message, merged_confidence - - async def rasa_handle_text_verbosely(self, rasa_agent, text_message, sender_id): - """ - This function reimplements RASA's rasa.core.agent.Agent.handle_text method to allow to retrieve - message responses with confidence estimation altogether. - - It reconstructs with merge RASA's methods: - https://github.com/RasaHQ/rasa_core/blob/master/rasa/core/agent.py#L401 - https://github.com/RasaHQ/rasa_core/blob/master/rasa/core/agent.py#L308 - https://github.com/RasaHQ/rasa/blob/master/rasa/core/processor.py#L327 - - This required to allow RASA to output confidences with actions altogether - (Out of the box RASA does not support such use case). - - Args: - rasa_agent: rasa.core.agent.Agent instance - text_message: str with utterance from user - sender_id: id of the user - - Returns: None or - tuple where first element is a list of messages dicts, the second element is a list - of confidence scores for all actions (it is longer than messages list, because some actions - does not produce messages) - - """ - message = UserMessage(text_message, - output_channel=None, - sender_id=sender_id) - - processor = rasa_agent.create_processor() - tracker = processor._get_tracker(message.sender_id) - - confidences = [] - actions = [] - await processor._handle_message_with_tracker(message, tracker) - # save tracker state to continue conversation from this state - processor._save_tracker(tracker) - - # here we restore some of logic in RASA management. 
- # ###### Loop of IntraStep decisions ########################################################## - # await processor._predict_and_execute_next_action(msg, tracker): - # https://github.com/RasaHQ/rasa/blob/master/rasa/core/processor.py#L327-L362 - # keep taking actions decided by the policy until it chooses to 'listen' - should_predict_another_action = True - num_predicted_actions = 0 - - def is_action_limit_reached(): - return (num_predicted_actions == processor.max_number_of_predictions and - should_predict_another_action) - - # action loop. predicts actions until we hit action listen - while (should_predict_another_action and - processor._should_handle_message(tracker) and - num_predicted_actions < processor.max_number_of_predictions): - # this actually just calls the policy's method by the same name - action, policy, confidence = processor.predict_next_action(tracker) - - confidences.append(confidence) - actions.append(action) - - should_predict_another_action = await processor._run_action( - action, - tracker, - message.output_channel, - processor.nlg, - policy, confidence - ) - num_predicted_actions += 1 - - if is_action_limit_reached(): - # circuit breaker was tripped - logger.warning( - "Circuit breaker tripped. Stopped predicting " - "more actions for sender '{}'".format(tracker.sender_id)) - if processor.on_circuit_break: - # call a registered callback - processor.on_circuit_break(tracker, message.output_channel, processor.nlg) - - if isinstance(message.output_channel, CollectingOutputChannel): - - return message.output_channel.messages, confidences, actions - else: - return None - - def _generate_user_id(self) -> str: - """ - Here you put user id generative logic if you want to implement it in the skill. - - Although it is better to delegate user_id generation to Agent Layer - Returns: str - - """ - return uuid.uuid1().hex - - def _handle_user_identification(self, utterances_batch, states_batch): - """Method preprocesses states batch to guarantee that all users are identified (or - identifiers are generated for all users). - - Args: - utterances_batch: batch of utterances - states_batch: batch of states - - Returns: - - """ - # grasp user_ids from states batch. - # We expect that skill receives None or dict of state for each utterance. - # if state has user_id then skill uses it, otherwise it generates user_id and calls the - # user with this name in further. 
- - # In this implementation we use current datetime for generating uniqe ids - output_states_batch = [] - user_ids = [] - if states_batch is None: - # generate states batch matching batch of utterances: - states_batch = [None] * len(utterances_batch) - - for state in states_batch: - if not state: - user_id = self._generate_user_id() - new_state = {'user_id': user_id} - - elif 'user_id' not in state: - new_state = state - user_id = self._generate_user_id() - new_state['user_id'] = self._generate_user_id() - - else: - new_state = state - user_id = new_state['user_id'] - - user_ids.append(user_id) - output_states_batch.append(new_state) - return user_ids, output_states_batch - - def destroy(self): - self.ioloop.close() - super().destroy() diff --git a/deeppavlov/utils/alexa/__init__.py b/deeppavlov/utils/alexa/__init__.py deleted file mode 100644 index 7c5931bb83..0000000000 --- a/deeppavlov/utils/alexa/__init__.py +++ /dev/null @@ -1 +0,0 @@ -from .server import start_alexa_server diff --git a/deeppavlov/utils/alexa/request_parameters.py b/deeppavlov/utils/alexa/request_parameters.py deleted file mode 100644 index c360d8e3c7..0000000000 --- a/deeppavlov/utils/alexa/request_parameters.py +++ /dev/null @@ -1,94 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Request parameters for the DeepPavlov model launched as a skill for Amazon Alexa. - -Request parameters from this module are used to declare additional information -and validation for request parameters to the DeepPavlov model launched as -a skill for Amazon Alexa. 
- -See details at https://fastapi.tiangolo.com/tutorial/header-params/, - https://fastapi.tiangolo.com/tutorial/body-multiple-params/ - -""" - -from fastapi import Header, Body - -_signature_example = 'Z5H5wqd06ExFVPNfJiqhKvAFjkf+cTVodOUirucHGcEVAMO1LfvgqWUkZ/X1ITDZbI0w+SMwVkEQZlkeThbVS/54M22StNDUtfz4Ua20xNDpIPwcWIACAmZ38XxbbTEFJI5WwqrbilNcfzqiGrIPfdO5rl+/xUjHFUdcJdUY/QzBxXsceytVYfEiR9MzOCN2m4C0XnpThUavAu159KrLj8AkuzN0JF87iXv+zOEeZRgEuwmsAnJrRUwkJ4yWokEPnSVdjF0D6f6CscfyvRe9nsWShq7/zRTa41meweh+n006zvf58MbzRdXPB22RI4AN0ksWW7hSC8/QLAKQE+lvaw==' -_signature_cert_chain_url_example = 'https://s3.amazonaws.com/echo.api/echo-api-cert-6-ats.pem' -_body_example = { - "version": "1.0", - "session": { - "new": True, - "sessionId": "amzn1.echo-api.session.ee48c20e-5ad5-461f-a735-ce058491e914", - "application": { - "applicationId": "amzn1.ask.skill.52b86ebd-dd7d-45c3-a763-de584f62b8d6" - }, - "user": { - "userId": "amzn1.ask.account.AHUAJ5RRTJDATP63AIRLNOVBC2QCJ7U5WSVSD432EA45PDVWAX5CQ6Z2OLD2H2A77VSBQGIMIWAVBMWLHK2EVZAE5VVJ2FHWS4AQM3GMIDH62GZBZ4DOUWXA3DXRBBXXXTKAITDUCZTLG5GP3XN7YORE5FQO2MERGKK7WAJUTHPMLYN4W2IUBVYDIW7544M57N4KV5HMS4DESMY" - } - }, - "context": { - "System": { - "application": { - "applicationId": "amzn1.ask.skill.52b86ebd-dd7d-45c3-a763-de584f62b8d6" - }, - "user": { - "userId": "amzn1.ask.account.AHUAJ5RRTJDATP63AIRLNOVBC2QCJ7U5WSVSD432EA45PDVWAX5CQ6Z2OLD2H2A77VSBQGIMIWAVBMWLHK2EVZAE5VVJ2FHWS4AQM3GMIDH62GZBZ4DOUWXA3DXRBBXXXTKAITDUCZTLG5GP3XN7YORE5FQO2MERGKK7WAJUTHPMLYN4W2IUBVYDIW7544M57N4KV5HMS4DESMY" - }, - "device": { - "deviceId": "amzn1.ask.device.AH777YKPTWMNQGVKUKDWPQOWWEDBDJNMIGP5GHDXOIMI3N5RYZWQ2HBQEOUXMUJEHRBKDX6HCFEA7RRWNAGKHJLSD5KWLTKR35D42TW6BVL64THCYUITTH3G6ZMWZ6GNAELTXWB4YAZJWUK4J2BIFVLUP2KHZNTQRJRBEFGNWY4V2RCEEQOZC", - "supportedInterfaces": {} - }, - "apiEndpoint": "https://api.amazonalexa.com", - "apiAccessToken": 
"eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6IjEifQ.eyJhdWQiOiJodHRwczovL2FwaS5hbWF6b25hbGV4YS5jb20iLCJpc3MiOiJBbGV4YVNraWxsS2l0Iiwic3ViIjoiYW16bjEuYXNrLnNraWxsLjUyYjg2ZWJkLWRkN2QtNDVjMy1hNzYzLWRlNTg0ZjYyYjhkNiIsImV4cCI6MTU2OTgzNTY5MiwiaWF0IjoxNTY5ODM1MzkyLCJuYmYiOjE1Njk4MzUzOTIsInByaXZhdGVDbGFpbXMiOnsiY29udGV4dCI6IkFBQUFBQUFBQUFCTm5aUTd4b09EcGNYL0tuMDFpZ1F6S2dFQUFBQUFBQUJSazluemRVNTlQZWVFY0t5SERSZEwzRiszdnZrVGpQWWQ3MnhFYzFQcUNSeStTTWZmaFFscUh4azJuTHNTV01JKzFnZEtYc0t1RGVSQkJqNERTck5TUWVCZjNkbmtxNERWMXRqVjhmUnB1UWRXdlY2bERZN3YycXMyZVRlZEN6V0RLY21oRXFjRHdBNWlmdUxEdzB5bmZVVVh6Rk0yLzBBeDdGUmYxaS9FWXJRaWV0T2Q1dWllYU9RUFUrUUNMUUNRMFI0Ni9Ld1d1SWdxcE5sSGw0bU0xSHNhYXJOS3VzM0hDRzNyNm9LekxkT25EVUFKTDRtajkzSGwwZUhUQ1M0WDFySEtTTHNMNUlxa2hnUTk3a0R0WVovK1dNbkVDNklGUEZ6OHdYYU9jaDJYS05EUTNERVlGWTE0WHRkTXY0MlBYeTJlQ3VjQy9udnU2ZGMxaGRjUGdkZUp2Rmw3WlBBK0RSa2RqYXovL1NNTjVQMlNBY0NqK2JBZXIrTGZOTDByYUxhbGh5OEhleGl5IiwiY29uc2VudFRva2VuIjpudWxsLCJkZXZpY2VJZCI6ImFtem4xLmFzay5kZXZpY2UuQUg3NzdZS1BUV01OUUdWS1VLRFdQUU9XV0VEQkRKTk1JR1A1R0hEWE9JTUkzTjVSWVpXUTJIQlFFT1VYTVVKRUhSQktEWDZIQ0ZFQTdSUldOQUdLSEpMU0Q1S1dMVEtSMzVENDJUVzZCVkw2NFRIQ1lVSVRUSDNHNlpNV1o2R05BRUxUWFdCNFlBWkpXVUs0SjJCSUZWTFVQMktIWk5UUVJKUkJFRkdOV1k0VjJSQ0VFUU9aQyIsInVzZXJJZCI6ImFtem4xLmFzay5hY2NvdW50LkFIVUFKNVJSVEpEQVRQNjNBSVJMTk9WQkMyUUNKN1U1V1NWU0Q0MzJFQTQ1UERWV0FYNUNRNloyT0xEMkgyQTc3VlNCUUdJTUlXQVZCTVdMSEsyRVZaQUU1VlZKMkZIV1M0QVFNM0dNSURINjJHWkJaNERPVVdYQTNEWFJCQlhYWFRLQUlURFVDWlRMRzVHUDNYTjdZT1JFNUZRTzJNRVJHS0s3V0FKVVRIUE1MWU40VzJJVUJWWURJVzc1NDRNNTdONEtWNUhNUzRERVNNWSJ9fQ.brF2UpwjKMbYhR50WdoALbz0CM9hFtfAUw4Hh9-tOMJY8imui3oadv5S6QbQlfYD4_V_mJG2WOfkLmvirdRwdY6gI289WB48a6pK29VVcJWhYv1wIEpNQUMvMQqMZpjUuCI6DR9PqSeHulqPt14ytiA1ghOVSsAsHFXGbhNNeM9SdS1Ss0JQolSvXo09qC3JFRpDBI1bzBxRthhWEwgIEkC-JuFAbCbXz-710FkI4vzlMElgvC2GIsPf-5RaTJXps4UuG1rLieerirrrZfbpmhO0x2vDbLvBCCbqUtoHPyKofexfBXebvMjjJ7PRZvKYxAg3SBVZLvpGVl0prgJ8PA" - }, - "Viewport": { - "experiences": [ - { - "arcMinuteWidth": 246, - "arcMinuteHeight": 144, - "canRotate": False, - "canResize": False - } - ], - "shape": "RECTANGLE", - "pixelWidth": 1024, - "pixelHeight": 600, - "dpi": 160, - "currentPixelWidth": 1024, - "currentPixelHeight": 600, - "touch": [ - "SINGLE" - ], - "video": { - "codecs": [ - "H_264_42", - "H_264_41" - ] - } - } - }, - "request": { - "type": "LaunchRequest", - "requestId": "amzn1.echo-api.request.9b112eb9-eb11-433d-b6b3-8dba7eab9637", - "timestamp": "2019-09-30T09:23:12Z", - "locale": "en-US", - "shouldLinkResultBeReturned": False - } -} - -signature_header = Header(..., example=_signature_example, alias='Signature') -cert_chain_url_header = Header(..., example=_signature_cert_chain_url_example, alias='Signaturecertchainurl') -data_body = Body(..., example=_body_example) diff --git a/deeppavlov/utils/alexa/server.py b/deeppavlov/utils/alexa/server.py deleted file mode 100644 index eff296f733..0000000000 --- a/deeppavlov/utils/alexa/server.py +++ /dev/null @@ -1,88 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import asyncio -import json -from logging import getLogger -from pathlib import Path -from queue import Queue -from typing import Union, Optional - -import uvicorn -from fastapi import FastAPI -from starlette.responses import JSONResponse - -from deeppavlov.core.common.log import log_config -from deeppavlov.utils.alexa.request_parameters import data_body, cert_chain_url_header, signature_header -from deeppavlov.utils.connector import AlexaBot -from deeppavlov.utils.server import get_ssl_params, redirect_root_to_docs, get_server_params - -log = getLogger(__name__) -app = FastAPI() - - -def start_alexa_server(model_config: Union[str, Path, dict], - port: Optional[int] = None, - https: Optional[bool] = None, - ssl_key: Optional[str] = None, - ssl_cert: Optional[str] = None) -> None: - """Initiates FastAPI web service with Alexa skill. - - Allows raise Alexa web service with DeepPavlov config in backend. - - Args: - model_config: DeepPavlov config path. - port: FastAPI web service port. - https: Flag for running Alexa skill service in https mode. - ssl_key: SSL key file path. - ssl_cert: SSL certificate file path. - - """ - server_params = get_server_params(model_config) - - host = server_params['host'] - port = port or server_params['port'] - - ssl_config = get_ssl_params(server_params, https, ssl_key=ssl_key, ssl_cert=ssl_cert) - - input_q = Queue() - output_q = Queue() - - bot = AlexaBot(model_config, input_q, output_q) - bot.start() - - endpoint = '/interact' - redirect_root_to_docs(app, 'interact', endpoint, 'post') - - @app.post(endpoint, summary='Amazon Alexa custom service endpoint', response_description='A model response') - async def interact(data: dict = data_body, - signature: str = signature_header, - signature_chain_url: str = cert_chain_url_header) -> JSONResponse: - # It is necessary for correct data validation to serialize data to a JSON formatted string with separators. - request_dict = { - 'request_body': json.dumps(data, separators=(',', ':')).encode('utf-8'), - 'signature_chain_url': signature_chain_url, - 'signature': signature, - 'alexa_request': data - } - - bot.input_queue.put(request_dict) - loop = asyncio.get_event_loop() - response: dict = await loop.run_in_executor(None, bot.output_queue.get) - response_code = 400 if 'error' in response.keys() else 200 - return JSONResponse(response, status_code=response_code) - - uvicorn.run(app, host=host, port=port, log_config=log_config, ssl_version=ssl_config.version, - ssl_keyfile=ssl_config.keyfile, ssl_certfile=ssl_config.certfile) - bot.join() diff --git a/deeppavlov/utils/alice/__init__.py b/deeppavlov/utils/alice/__init__.py deleted file mode 100644 index 02434309b8..0000000000 --- a/deeppavlov/utils/alice/__init__.py +++ /dev/null @@ -1 +0,0 @@ -from .server import start_alice_server diff --git a/deeppavlov/utils/alice/request_parameters.py b/deeppavlov/utils/alice/request_parameters.py deleted file mode 100644 index 6974c581aa..0000000000 --- a/deeppavlov/utils/alice/request_parameters.py +++ /dev/null @@ -1,57 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
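A usage sketch for the start_alexa_server helper deleted above; the config and certificate paths are placeholders, and Alexa skill endpoints generally have to be reachable over HTTPS.

.. code:: python

    from deeppavlov.utils.alexa import start_alexa_server

    # Blocks and serves the /interact endpoint for the Alexa skill.
    start_alexa_server("path/to/model_config.json",
                       port=5000,
                       https=True,
                       ssl_key="path/to/ssl.key",
                       ssl_cert="path/to/ssl.crt")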
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -"""Request parameters for the DeepPavlov model launched as a skill for Yandex.Alice. - -Request parameters from this module are used to declare additional information -and validation for request parameters to the DeepPavlov model launched as -a skill for Yandex.Alice. - -See details at https://fastapi.tiangolo.com/tutorial/body-multiple-params/ - -""" - -from fastapi import Body - -_body_example = { - 'name': 'data', - 'in': 'body', - 'required': 'true', - 'example': { - 'meta': { - 'locale': 'ru-RU', - 'timezone': 'Europe/Moscow', - "client_id": 'ru.yandex.searchplugin/5.80 (Samsung Galaxy; Android 4.4)' - }, - 'request': { - 'command': 'где ближайшее отделение', - 'original_utterance': 'Алиса спроси у Сбербанка где ближайшее отделение', - 'type': 'SimpleUtterance', - 'markup': { - 'dangerous_context': True - }, - 'payload': {} - }, - 'session': { - 'new': True, - 'message_id': 4, - 'session_id': '2eac4854-fce721f3-b845abba-20d60', - 'skill_id': '3ad36498-f5rd-4079-a14b-788652932056', - 'user_id': 'AC9WC3DF6FCE052E45A4566A48E6B7193774B84814CE49A922E163B8B29881DC' - }, - 'version': '1.0' - } -} - -data_body = Body(..., example=_body_example) diff --git a/deeppavlov/utils/alice/server.py b/deeppavlov/utils/alice/server.py deleted file mode 100644 index 33efd6e46a..0000000000 --- a/deeppavlov/utils/alice/server.py +++ /dev/null @@ -1,65 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import asyncio -from logging import getLogger -from pathlib import Path -from queue import Queue -from typing import Optional, Union - -import uvicorn -from fastapi import FastAPI - -from deeppavlov.core.common.log import log_config -from deeppavlov.utils.alice.request_parameters import data_body -from deeppavlov.utils.connector import AliceBot -from deeppavlov.utils.server import get_server_params, get_ssl_params, redirect_root_to_docs - -log = getLogger(__name__) -app = FastAPI() - - -def start_alice_server(model_config: Union[str, Path], - host: Optional[str] = None, - port: Optional[int] = None, - endpoint: Optional[str] = None, - https: Optional[bool] = None, - ssl_key: Optional[str] = None, - ssl_cert: Optional[str] = None) -> None: - server_params = get_server_params(model_config) - - host = host or server_params['host'] - port = port or server_params['port'] - endpoint = endpoint or server_params['model_endpoint'] - - ssl_config = get_ssl_params(server_params, https, ssl_key=ssl_key, ssl_cert=ssl_cert) - - input_q = Queue() - output_q = Queue() - - bot = AliceBot(model_config, input_q, output_q) - bot.start() - - redirect_root_to_docs(app, 'answer', endpoint, 'post') - - @app.post(endpoint, summary='A model endpoint', response_description='A model response') - async def answer(data: dict = data_body) -> dict: - loop = asyncio.get_event_loop() - bot.input_queue.put(data) - response: dict = await loop.run_in_executor(None, bot.output_queue.get) - return response - - uvicorn.run(app, host=host, port=port, log_config=log_config, ssl_version=ssl_config.version, - ssl_keyfile=ssl_config.keyfile, ssl_certfile=ssl_config.certfile) - bot.join() diff --git a/deeppavlov/utils/connector/__init__.py b/deeppavlov/utils/connector/__init__.py index 6adbf146d9..711fcbba8b 100644 --- a/deeppavlov/utils/connector/__init__.py +++ b/deeppavlov/utils/connector/__init__.py @@ -1,2 +1 @@ -from .bot import AlexaBot, AliceBot, MSBot, TelegramBot from .dialog_logger import DialogLogger diff --git a/deeppavlov/utils/connector/bot.py b/deeppavlov/utils/connector/bot.py deleted file mode 100644 index 9eb1e4eece..0000000000 --- a/deeppavlov/utils/connector/bot.py +++ /dev/null @@ -1,544 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
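Similarly, a sketch for the start_alice_server helper deleted above; all paths and the endpoint name are placeholders.

.. code:: python

    from deeppavlov.utils.alice import start_alice_server

    # Blocks and serves the Yandex.Alice webhook on the given endpoint.
    start_alice_server("path/to/model_config.json",
                       host="0.0.0.0",
                       port=5000,
                       endpoint="/alice",
                       https=True,
                       ssl_key="path/to/ssl.key",
                       ssl_cert="path/to/ssl.crt")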
- -import threading -from collections import namedtuple -from datetime import datetime, timedelta -from logging import getLogger -from pathlib import Path -from queue import Empty, Queue -from threading import Thread, Timer -from typing import Dict, Optional, Union - -import requests -import telebot -from OpenSSL.crypto import X509 -from requests.exceptions import HTTPError - -from deeppavlov.core.commands.infer import build_model -from deeppavlov.core.common.chainer import Chainer -from deeppavlov.core.common.file import read_json -from deeppavlov.core.common.paths import get_settings_path -from deeppavlov.utils.connector.conversation import AlexaConversation, AliceConversation, BaseConversation -from deeppavlov.utils.connector.conversation import MSConversation, TelegramConversation -from deeppavlov.utils.connector.ssl_tools import verify_cert, verify_signature - -CONNECTOR_CONFIG_FILENAME = 'server_config.json' -INPUT_QUEUE_TIMEOUT = 1 - -log = getLogger(__name__) - -ValidatedCert = namedtuple('ValidatedCert', ['cert', 'expiration_timestamp']) - - -class BaseBot(Thread): - """Routes requests to conversations, sends responses to channel. - - Attributes: - input_queue: Queue for incoming requests from the channel. - - """ - input_queue: Queue - _run_flag: bool - _model: Chainer - _conversations: Dict[str, BaseConversation] - - def __init__(self, - model_config: Union[str, Path, dict], - input_queue: Queue) -> None: - """Builds DeepPavlov model, initiates class attributes. - - Args: - model_config: Path to DeepPavlov model config file. - input_queue: Queue for incoming requests from channel. - - """ - super(BaseBot, self).__init__() - self.input_queue = input_queue - self._run_flag = True - self._model = build_model(model_config) - self._conversations = dict() - log.info('Bot initiated') - - def run(self) -> None: - """Thread run method implementation. Routes requests from ``input_queue`` to request handler.""" - while self._run_flag: - try: - request: dict = self.input_queue.get(timeout=INPUT_QUEUE_TIMEOUT) - except Empty: - pass - else: - response = self._handle_request(request) - self._send_response(response) - - def join(self, timeout: Optional[float] = None) -> None: - """Thread join method implementation. Stops reading requests from ``input_queue``, cancels all timers. - - Args: - timeout: Timeout for join operation in seconds. If the timeout argument is not present or None, - the operation will block until the thread terminates. - - """ - self._run_flag = False - for timer in threading.enumerate(): - if isinstance(timer, Timer): - timer.cancel() - Thread.join(self, timeout) - - def _del_conversation(self, conversation_key: Union[int, str]) -> None: - """Deletes Conversation instance. - - Args: - conversation_key: Conversation key. - - """ - if conversation_key in self._conversations.keys(): - del self._conversations[conversation_key] - log.info(f'Deleted conversation, key: {conversation_key}') - - def _handle_request(self, request: dict) -> Optional[dict]: - """Routes the request to the appropriate conversation. - - Args: - request: Request from the channel. - - Returns: - response: Corresponding response to the channel request if replies are sent via bot, None otherwise. - - """ - raise NotImplementedError - - def _send_response(self, response: Optional[dict]) -> None: - """Sends response to the request back to the channel. - - Args: - response: Corresponding response to the channel request if replies are sent via bot, None otherwise. 
- - """ - raise NotImplementedError - - def _get_connector_params(self) -> dict: - """Reads bot and conversation default params from connector config file. - - Returns: - connector_defaults: Dictionary containing bot defaults and conversation defaults dicts. - - """ - connector_config_path = get_settings_path() / CONNECTOR_CONFIG_FILENAME - connector_config: dict = read_json(connector_config_path) - - bot_name = type(self).__name__ - conversation_defaults = connector_config['telegram'] - bot_defaults = connector_config['deprecated'].get(bot_name, conversation_defaults) - - connector_defaults = {'bot_defaults': bot_defaults, - 'conversation_defaults': conversation_defaults} - - return connector_defaults - - -class AlexaBot(BaseBot): - """Validates Alexa requests and routes them to conversations, sends responses to Alexa. - - Attributes: - input_queue: Queue for incoming requests from Alexa. - output_queue: Queue for outgoing responses to Alexa. - - """ - output_queue: Queue - _conversation_config: dict - _amazon_cert_lifetime: timedelta - _request_timestamp_tolerance_secs: int - _refresh_valid_certs_period_secs: int - _valid_certificates: Dict[str, ValidatedCert] - _timer: Timer - - def __init__(self, - model_config: Union[str, Path, dict], - input_queue: Queue, - output_queue: Queue) -> None: - """Initiates class attributes. - - Args: - model_config: Path to DeepPavlov model config file. - input_queue: Queue for incoming requests from Alexa. - output_queue: Queue for outgoing responses to Alexa. - - """ - super(AlexaBot, self).__init__(model_config, input_queue) - self.output_queue = output_queue - - connector_config: dict = self._get_connector_params() - self._conversation_config: dict = connector_config['conversation_defaults'] - bot_config: dict = connector_config['bot_defaults'] - - self._conversation_config['intent_name'] = bot_config['intent_name'] - self._conversation_config['slot_name'] = bot_config['slot_name'] - - self._amazon_cert_lifetime = timedelta(seconds=bot_config['amazon_cert_lifetime_secs']) - self._request_timestamp_tolerance_secs = bot_config['request_timestamp_tolerance_secs'] - self._refresh_valid_certs_period_secs = bot_config['refresh_valid_certs_period_secs'] - self._valid_certificates = {} - self._refresh_valid_certs() - - def _refresh_valid_certs(self) -> None: - """Provides cleanup of periodical certificates with expired validation.""" - self._timer = Timer(self._refresh_valid_certs_period_secs, self._refresh_valid_certs) - self._timer.start() - - expired_certificates = [] - - for valid_cert_url, valid_cert in self._valid_certificates.items(): - valid_cert: ValidatedCert = valid_cert - cert_expiration_time: datetime = valid_cert.expiration_timestamp - if datetime.utcnow() > cert_expiration_time: - expired_certificates.append(valid_cert_url) - - for expired_cert_url in expired_certificates: - del self._valid_certificates[expired_cert_url] - log.info(f'Validation period of {expired_cert_url} certificate expired') - - def _verify_request(self, signature_chain_url: str, signature: str, request_body: bytes) -> bool: - """Provides series of Alexa request verifications against Amazon Alexa requirements. - - Args: - signature_chain_url: Signature certificate URL from SignatureCertChainUrl HTTP header. - signature: Base64 decoded Alexa request signature from Signature HTTP header. - request_body: full HTTPS request body - - Returns: - result: True if verification was successful, False otherwise. 
- - """ - if signature_chain_url not in self._valid_certificates.keys(): - amazon_cert: X509 = verify_cert(signature_chain_url) - if amazon_cert: - expiration_timestamp = datetime.utcnow() + self._amazon_cert_lifetime - validated_cert = ValidatedCert(cert=amazon_cert, expiration_timestamp=expiration_timestamp) - self._valid_certificates[signature_chain_url] = validated_cert - log.info(f'Certificate {signature_chain_url} validated') - else: - log.error(f'Certificate {signature_chain_url} validation failed') - return False - else: - validated_cert: ValidatedCert = self._valid_certificates[signature_chain_url] - amazon_cert: X509 = validated_cert.cert - - if verify_signature(amazon_cert, signature, request_body): - result = True - else: - log.error(f'Failed signature verification for request: {request_body.decode("utf-8", "replace")}') - result = False - - return result - - def _handle_request(self, request: dict) -> dict: - """Processes Alexa request and returns response. - - Args: - request: Dict with Alexa request payload and metadata. - - Returns: - result: Alexa formatted or error response. - - """ - request_body: bytes = request['request_body'] - signature_chain_url: str = request['signature_chain_url'] - signature: str = request['signature'] - alexa_request: dict = request['alexa_request'] - - if not self._verify_request(signature_chain_url, signature, request_body): - return {'error': 'failed certificate/signature check'} - - timestamp_str = alexa_request['request']['timestamp'] - timestamp_datetime = datetime.strptime(timestamp_str, '%Y-%m-%dT%H:%M:%SZ') - now = datetime.utcnow() - - delta = now - timestamp_datetime if now >= timestamp_datetime else timestamp_datetime - now - - if abs(delta.seconds) > self._request_timestamp_tolerance_secs: - log.error(f'Failed timestamp check for request: {request_body.decode("utf-8", "replace")}') - return {'error': 'failed request timestamp check'} - - conversation_key = alexa_request['session']['sessionId'] - - if conversation_key not in self._conversations: - self._conversations[conversation_key] = \ - AlexaConversation(config=self._conversation_config, - model=self._model, - self_destruct_callback=self._del_conversation, - conversation_id=conversation_key) - - log.info(f'Created new conversation, key: {conversation_key}') - - conversation = self._conversations[conversation_key] - response = conversation.handle_request(alexa_request) - - return response - - def _send_response(self, response: dict) -> None: - """Sends response to Alexa. - - Args: - response: Alexa formatted or error response. - - """ - self.output_queue.put(response) - - -class AliceBot(BaseBot): - """Processes Alice requests and routes them to conversations, returns responses to Alice. - - Attributes: - input_queue: Queue for incoming requests from Alice. - output_queue: Queue for outgoing responses to Alice. - - """ - output_queue: Queue - _conversation_config: dict - - def __init__(self, - model_config: Union[str, Path, dict], - input_queue: Queue, - output_queue: Queue) -> None: - """Initiates class attributes. - - Args: - model_config: Path to DeepPavlov model config file. - input_queue: Queue for incoming requests from Alice. - output_queue: Queue for outgoing responses to Alice. 
- - """ - super(AliceBot, self).__init__(model_config, input_queue) - self.output_queue = output_queue - connector_config: dict = self._get_connector_params() - self._conversation_config = connector_config['conversation_defaults'] - - def _handle_request(self, request: dict) -> dict: - """Processes Alice request and returns response. - - Args: - request: Dict with Alice request payload and metadata. - - Returns: - result: Alice formatted response. - - """ - conversation_key = request['session']['session_id'] - - if conversation_key not in self._conversations: - self._conversations[conversation_key] = \ - AliceConversation(config=self._conversation_config, - model=self._model, - self_destruct_callback=self._del_conversation, - conversation_id=conversation_key) - log.info(f'Created new conversation, key: {conversation_key}') - conversation = self._conversations[conversation_key] - response = conversation.handle_request(request) - - return response - - def _send_response(self, response: dict) -> None: - """Sends response to Alice. - - Args: - response: Alice formatted response. - - """ - self.output_queue.put(response) - - -class MSBot(BaseBot): - """Routes Microsoft Bot Framework requests to conversations, sends responses to Bot Framework. - - Attributes: - input_queue: Queue for incoming requests from Microsoft Bot Framework. - - """ - _conversation_config: dict - _auth_polling_interval: int - _auth_url: str - _auth_headers: dict - _auth_payload: dict - _http_session: requests.Session - - def __init__(self, - model_config: Union[str, Path, dict], - input_queue: Queue, - client_id: Optional[str], - client_secret: Optional[str]) -> None: - """Initiates class attributes. - - Args: - model_config: Path to DeepPavlov model config file. - input_queue: Queue for incoming requests from Microsoft Bot Framework. - client_id: Microsoft App ID. - client_secret: Microsoft App Secret. - - Raises: - ValueError: If ``client_id`` or ``client_secret`` were not set neither in the configuration file nor - in method arguments. - - """ - super(MSBot, self).__init__(model_config, input_queue) - connector_config: dict = self._get_connector_params() - bot_config: dict = connector_config['bot_defaults'] - bot_config['auth_payload']['client_id'] = client_id or bot_config['auth_payload']['client_id'] - bot_config['auth_payload']['client_secret'] = client_secret or bot_config['auth_payload']['client_secret'] - - if not bot_config['auth_payload']['client_id']: - e = ValueError('Microsoft Bot Framework app id required: initiate -i param ' - 'or auth_payload.client_id param in server configuration file') - log.error(e) - raise e - - if not bot_config['auth_payload']['client_secret']: - e = ValueError('Microsoft Bot Framework app secret required: initiate -s param ' - 'or auth_payload.client_secret param in server configuration file') - log.error(e) - raise e - - self._conversation_config = connector_config['conversation_defaults'] - self._auth_polling_interval = bot_config['auth_polling_interval'] - self._auth_url = bot_config['auth_url'] - self._auth_headers = bot_config['auth_headers'] - self._auth_payload = bot_config['auth_payload'] - self._http_session = requests.Session() - self._update_access_info() - - def _update_access_info(self) -> None: - """Updates headers for http_session used to send responses to Bot Framework. - - Raises: - HTTPError: If authentication token request returned other than 200 status code. 
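A construction sketch for the MSBot class removed here, based on its __init__ signature; the config path and Microsoft App credentials are placeholders, and the constructor immediately requests an OAuth token, so it only succeeds with valid credentials.

.. code:: python

    from queue import Queue
    from deeppavlov.utils.connector.bot import MSBot

    bot = MSBot("path/to/model_config.json",
                input_queue=Queue(),
                client_id="<microsoft-app-id>",
                client_secret="<microsoft-app-secret>")
    bot.start()  # worker thread reads Bot Framework activities from bot.input_queue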
- - """ - self._timer = threading.Timer(self._auth_polling_interval, self._update_access_info) - self._timer.start() - - result = requests.post(url=self._auth_url, - headers=self._auth_headers, - data=self._auth_payload) - - status_code = result.status_code - if status_code != 200: - raise HTTPError(f'Authentication token request returned wrong HTTP status code: {status_code}') - - access_info = result.json() - headers = { - 'Authorization': f"{access_info['token_type']} {access_info['access_token']}", - 'Content-Type': 'application/json' - } - - self._http_session.headers.update(headers) - - log.info(f'Obtained authentication information from Microsoft Bot Framework: {str(access_info)}') - - def _handle_request(self, request: dict) -> None: - """Routes MS Bot Framework request to conversation. - - Args: - request: Dict with MS Bot Framework request payload and metadata. - - """ - conversation_key = request['conversation']['id'] - - if conversation_key not in self._conversations: - self._conversations[conversation_key] = \ - MSConversation(config=self._conversation_config, - model=self._model, - self_destruct_callback=self._del_conversation, - conversation_id=conversation_key, - http_session=self._http_session) - - log.info(f'Created new conversation, key: {conversation_key}') - - conversation = self._conversations[conversation_key] - conversation.handle_request(request) - - def _send_response(self, response: dict) -> None: - """Dummy method to match ``run`` method body.""" - pass - - -class TelegramBot(BaseBot): - """Routes messages from Telegram to conversations, sends responses back.""" - _conversation_config: dict - _token: str - - def __init__(self, model_config: Union[str, Path, dict], token: Optional[str]) -> None: - """Initiates and validates class attributes. - - Args: - model_config: Path to DeepPavlov model config file. - token: Telegram bot token. - - Raises: - ValueError: If telegram token was not set neither in config file nor in method arguments. 
- - """ - super(TelegramBot, self).__init__(model_config, Queue()) - connector_config: dict = self._get_connector_params() - bot_config: dict = connector_config['bot_defaults'] - self._conversation_config = connector_config['conversation_defaults'] - self._token = token or bot_config['token'] - - if not self._token: - e = ValueError('Telegram token required: initiate -t param or telegram_defaults/token ' - 'in server configuration file') - log.error(e) - raise e - - def start(self) -> None: - """Starts polling messages from Telegram, routes messages to handlers.""" - bot = telebot.TeleBot(self._token) - bot.remove_webhook() - - @bot.message_handler(commands=['start']) - def send_start_message(message: telebot.types.Message) -> None: - chat_id = message.chat.id - out_message = self._conversation_config['start_message'] - bot.send_message(chat_id, out_message) - - @bot.message_handler(commands=['help']) - def send_help_message(message: telebot.types.Message) -> None: - chat_id = message.chat.id - out_message = self._conversation_config['help_message'] - bot.send_message(chat_id, out_message) - - @bot.message_handler() - def handle_inference(message: telebot.types.Message) -> None: - chat_id = message.chat.id - context = message.text - - if chat_id not in self._conversations: - self._conversations[chat_id] = \ - TelegramConversation(config=self._conversation_config, - model=self._model, - self_destruct_callback=self._del_conversation, - conversation_id=chat_id) - - conversation = self._conversations[chat_id] - response = conversation.handle_request(context) - bot.send_message(chat_id, response) - - bot.polling() - - def _handle_request(self, request: dict) -> None: - """Dummy method to match ``run`` method body.""" - pass - - def _send_response(self, response: Optional[dict]) -> None: - """Dummy method to match ``run`` method body.""" - pass diff --git a/deeppavlov/utils/connector/conversation.py b/deeppavlov/utils/connector/conversation.py deleted file mode 100644 index ecc6b179dd..0000000000 --- a/deeppavlov/utils/connector/conversation.py +++ /dev/null @@ -1,465 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from logging import getLogger -from threading import Timer -from typing import Dict, Optional, Union -from urllib.parse import urljoin - -from requests import Session - -from deeppavlov.core.common.chainer import Chainer -from deeppavlov.utils.connector.dialog_logger import DialogLogger - -log = getLogger(__name__) - -DIALOG_LOGGER_NAME_MAPPING = { - 'AlexaConversation': 'alexa', - 'AliceConversation': 'alice', - 'MSConversation': 'ms_bot_framework', - 'TelegramConversation': 'telegram', - '_unsupported': 'new_conversation' -} - - -class BaseConversation: - """Receives requests, generates responses.""" - _model: Chainer - _self_destruct_callback: callable - _conversation_id: Union[int, str] - _timer: Timer - _infer_utterances: list - _conversation_lifetime: int - _next_arg_msg: str - _start_message: str - - def __init__(self, - config: dict, - model: Chainer, - self_destruct_callback: callable, - conversation_id: Union[int, str]) -> None: - """Initiates instance properties and starts self-destruct timer. - - Args: - config: Dictionary containing base conversation parameters. - model: Model used for inference on user messages. - self_destruct_callback: Function that removes this Conversation instance. - conversation_id: Conversation ID. - - """ - self._model = model - self._self_destruct_callback = self_destruct_callback - self._conversation_id = conversation_id - self._infer_utterances = list() - self._conversation_lifetime = config['conversation_lifetime'] - self._next_arg_msg = config['next_argument_message'] - self._start_message = config['start_message'] - self._unsupported_message = config['unsupported_message'] - logger_name: str = DIALOG_LOGGER_NAME_MAPPING.get(type(self).__name__, - DIALOG_LOGGER_NAME_MAPPING['_unsupported']) - self._dialog_logger = DialogLogger(logger_name=logger_name) - self._start_timer() - - def handle_request(self, request: dict) -> Optional[dict]: - """Rearms self-destruct timer and sends the request for processing. - - Args: - request: Request from the channel. - - Returns: - response: Channel-specific response to the request if replies are sent via bot, - None otherwise. - - """ - self._rearm_self_destruct() - return self._handle_request(request) - - def _start_timer(self) -> None: - """Initiates self-destruct timer.""" - self._timer = Timer(self._conversation_lifetime, self._self_destruct_callback, [self._conversation_id]) - self._timer.start() - - def _rearm_self_destruct(self) -> None: - """Rearms self-destruct timer.""" - self._timer.cancel() - self._start_timer() - - def _handle_request(self, request: dict) -> Optional[dict]: - """Routes the request to the appropriate handler. - - Args: - request: Request from the channel. - - Returns: - response: Corresponding response to the channel request if replies are sent via bot, None otherwise. - - """ - raise NotImplementedError - - def _handle_launch(self, request: dict) -> Optional[dict]: - """Handles launch request. - - Args: - request: Start request from channel. - - Returns: - response: Greeting message wrapped in the channel-appropriate structure if replies are sent via bot, - None otherwise. - - """ - response = self._generate_response(self._start_message, request) - - return response - - def _handle_unsupported(self, request: dict) -> Optional[dict]: - """Handles all unsupported request types. - - Args: - request: Request from channel for which a separate handler was not defined.
- - Returns: - response: Message stating that the request type is not supported, wrapped in the channel-appropriate data - structure if replies are sent via bot, None otherwise. - - """ - response = self._generate_response(self._unsupported_message, request) - log.warning(f'Unsupported request: {request}') - - return response - - def _generate_response(self, message: str, request: dict) -> Optional[dict]: - """Wraps message in the data structure appropriate to the channel. - - Args: - message: Raw message to be sent to the channel. - request: Request from the channel to which the ``message`` replies. - - Returns: - response: Data structure to be sent to the channel if replies are sent via bot, None otherwise. - - """ - raise NotImplementedError - - def _act(self, utterance: str) -> str: - """Infers DeepPavlov model with utterance. - - If the DeepPavlov model requires more than one argument, utterances are accumulated until the required - number of arguments is reached. - - Args: - utterance: Text to be processed by DeepPavlov model. - - Returns: - response: Model response if enough model arguments have been accumulated, message prompting for the next - model argument otherwise. - - """ - self._infer_utterances.append([utterance]) - - if len(self._infer_utterances) == len(self._model.in_x): - self._dialog_logger.log_in(self._infer_utterances, self._conversation_id) - prediction = self._model(*self._infer_utterances) - self._infer_utterances = list() - if len(self._model.out_params) == 1: - prediction = [prediction] - prediction = '; '.join([str(output[0]) for output in prediction]) - response = prediction - self._dialog_logger.log_out(response, self._conversation_id) - else: - response = self._next_arg_msg.format(self._model.in_x[len(self._infer_utterances)]) - - return response - - -class AlexaConversation(BaseConversation): - """Receives requests from Amazon Alexa and generates responses.""" - _intent_name: str - _slot_name: str - _handled_requests: Dict[str, callable] - - def __init__(self, config: dict, model, self_destruct_callback: callable, conversation_id: str) -> None: - super(AlexaConversation, self).__init__(config, model, self_destruct_callback, conversation_id) - self._intent_name = config['intent_name'] - self._slot_name = config['slot_name'] - - self._handled_requests = { - 'LaunchRequest': self._handle_launch, - 'IntentRequest': self._handle_intent, - 'SessionEndedRequest': self._handle_end, - '_unsupported': self._handle_unsupported - } - - def _handle_request(self, request: dict) -> dict: - """Routes Alexa requests to the appropriate handler. - - Args: - request: Alexa request. - - Returns: - response: Response conforming to the Alexa response specification. - - """ - request_type = request['request']['type'] - request_id = request['request']['requestId'] - log.debug(f'Received request. Type: {request_type}, id: {request_id}') - - if request_type in self._handled_requests: - response = self._handled_requests[request_type](request) - else: - response = self._handled_requests['_unsupported'](request) - - return response - - def _generate_response(self, message: str, request: dict) -> dict: - """Wraps message in a data structure conforming to the Alexa specification. - - Args: - message: Raw message to be sent to Alexa. - request: Request from the channel to which the ``message`` replies. - - Returns: - response: Data structure conforming to the Alexa response specification.
- - """ - response = { - 'version': '1.0', - 'sessionAttributes': { - 'sessionId': request['session']['sessionId'] - }, - 'response': { - 'shouldEndSession': False, - 'outputSpeech': { - 'type': 'PlainText', - 'text': message - }, - 'card': { - 'type': 'Simple', - 'content': message - } - } - } - - return response - - def _handle_intent(self, request: dict) -> dict: - """Handles IntentRequest Alexa request. - - Args: - request: Alexa request. - - Returns: - response: Data structure conforming to the Alexa response specification. - - """ - request_id = request['request']['requestId'] - request_intent: dict = request['request']['intent'] - - if self._intent_name != request_intent['name']: - log.error(f"Wrong intent name received: {request_intent['name']} in request {request_id}") - return {'error': 'wrong intent name'} - - if self._slot_name not in request_intent['slots'].keys(): - log.error(f'No slot named {self._slot_name} found in request {request_id}') - return {'error': 'no slot found'} - - utterance = request_intent['slots'][self._slot_name]['value'] - model_response = self._act(utterance) - - if not model_response: - log.error(f'Error during response generation for request {request_id}') - return {'error': 'error during response generation'} - - response = self._generate_response(model_response, request) - - return response - - def _handle_end(self, request: dict) -> dict: - """Handles SessionEndedRequest Alexa request and deletes Conversation instance. - - Args: - request: Alexa request. - - Returns: - response: Dummy empty response dict. - - """ - response = {} - self._self_destruct_callback(self._conversation_id) - return response - - -class AliceConversation(BaseConversation): - """Receives requests from Yandex.Alice and generates responses.""" - def _handle_request(self, request: dict) -> dict: - """Routes Alice requests to the appropriate handler. - - Args: - request: Alice request. - - Returns: - response: Response conforming to the Alice response specification. - - """ - message_id = request['session']['message_id'] - session_id = request['session']['session_id'] - log.debug(f'Received message. Session: {session_id}, message_id: {message_id}') - - if request['session']['new']: - response = self._handle_launch(request) - elif request['request']['command'].strip(): - text = request['request']['command'].strip() - model_response = self._act(text) - response = self._generate_response(model_response, request) - else: - response = self._handle_unsupported(request) - - return response - - def _generate_response(self, message: str, request: dict) -> dict: - """Wraps message in a data structure conforming to the Alice specification. - - Args: - message: Raw message to be sent to Alice. - request: Request from the channel to which the ``message`` replies. - - Returns: - response: Data structure conforming to the Alice response specification. - - """ - response = { - 'response': { - 'end_session': False, - 'text': message - }, - 'session': { - 'session_id': request['session']['session_id'], - 'message_id': request['session']['message_id'], - 'user_id': request['session']['user_id'] - }, - 'version': '1.0' - } - - return response - - -class MSConversation(BaseConversation): - """Receives requests from Microsoft Bot Framework and generates responses.""" - def __init__(self, - config: dict, - model: Chainer, - self_destruct_callback: callable, - conversation_id: str, - http_session: Session) -> None: - """Initiates instance properties and starts self-destruct timer.
- - Args: - config: Dictionary containing base conversation parameters. - model: Model used for inference on user messages. - self_destruct_callback: Function that removes this Conversation instance. - conversation_id: Conversation ID. - http_session: Session used to send responses to Bot Framework. - - """ - super(MSConversation, self).__init__(config, model, self_destruct_callback, conversation_id) - self._http_session = http_session - - self._handled_activities = { - 'message': self._handle_message, - 'conversationUpdate': self._handle_launch, - '_unsupported': self._handle_unsupported - } - - def _handle_request(self, request: dict) -> None: - """Routes MS Bot requests to the appropriate handler. Returns None since handlers send responses themselves. - - Args: - request: MS Bot request. - - """ - activity_type = request['type'] - activity_id = request['id'] - log.debug(f'Received activity. Type: {activity_type}, id: {activity_id}') - - if activity_type in self._handled_activities.keys(): - self._handled_activities[activity_type](request) - else: - self._handled_activities['_unsupported'](request) - - self._rearm_self_destruct() - - def _handle_message(self, request: dict) -> None: - """Handles MS Bot message request. - - The request is redirected to the ``_unsupported`` handler if the MS Bot message does not contain raw text. - - Args: - request: MS Bot request. - - """ - if 'text' in request: - in_text = request['text'] - model_response = self._act(in_text) - if model_response: - self._generate_response(model_response, request) - else: - self._handled_activities['_unsupported'](request) - - def _generate_response(self, message: str, request: dict) -> None: - """Wraps message in a data structure conforming to the MS Bot Framework specification and sends it via the HTTP session. - - Args: - message: Raw message to be sent to MS Bot. - request: Request from the channel to which the ``message`` replies. - - """ - response = { - "type": "message", - "from": request['recipient'], - "recipient": request['from'], - 'conversation': request['conversation'], - 'text': message - } - - url = urljoin(request['serviceUrl'], f"v3/conversations/{request['conversation']['id']}/activities") - - response = self._http_session.post(url=url, json=response) - - try: - response_json_str = str(response.json()) - except ValueError as e: - response_json_str = repr(e) - - log.debug(f'Sent activity to the MSBotFramework server. ' - f'Response code: {response.status_code}, response contents: {response_json_str}') - - -class TelegramConversation(BaseConversation): - """Receives requests from Telegram bot and generates responses.""" - def _handle_request(self, message: str) -> str: - """Handles raw text message from Telegram bot. - - Args: - message: Message from Telegram bot. - - Returns: - response: Response to a ``message``. - - """ - response = self._act(message) - - return response - - def _generate_response(self, message: str, request: dict) -> None: - """Does nothing.""" - pass diff --git a/deeppavlov/utils/connector/ssl_tools.py b/deeppavlov/utils/connector/ssl_tools.py deleted file mode 100644 index 572ff56d92..0000000000 --- a/deeppavlov/utils/connector/ssl_tools.py +++ /dev/null @@ -1,216 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -import base64 -import re -import ssl -from logging import getLogger -from pathlib import Path -from typing import List, Optional -from urllib.parse import urlsplit - -import requests -from OpenSSL import crypto - -log = getLogger(__name__) - - -def verify_sc_url(url: str) -> bool: - """Verifies the signature certificate URL against Amazon Alexa requirements. - - Args: - url: Signature certificate URL from SignatureCertChainUrl HTTP header. - - Returns: - result: True if verification was successful, False if not. - """ - parsed = urlsplit(url) - - scheme: str = parsed.scheme - netloc: str = parsed.netloc - path: str = parsed.path - - try: - port = parsed.port - except ValueError: - port = None - - result = (scheme.lower() == 'https' and - netloc.lower().split(':')[0] == 's3.amazonaws.com' and - path.startswith('/echo.api/') and - (port == 443 or port is None)) - - return result - - -def extract_certs(certs_txt: str) -> List[crypto.X509]: - """Extracts pycrypto X509 objects from SSL certificates chain string. - - Args: - certs_txt: SSL certificates chain string. - - Returns: - result: List of pycrypto X509 objects. - """ - pattern = r'-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----' - certs_txt = re.findall(pattern, certs_txt, flags=re.DOTALL) - certs = [crypto.load_certificate(crypto.FILETYPE_PEM, cert_txt) for cert_txt in certs_txt] - return certs - - -def verify_sans(amazon_cert: crypto.X509) -> bool: - """Verifies Subject Alternative Names (SANs) for Amazon certificate. - - Args: - amazon_cert: Pycrypto X509 Amazon certificate. - - Returns: - result: True if verification was successful, False if not. - """ - cert_extensions = [amazon_cert.get_extension(i) for i in range(amazon_cert.get_extension_count())] - subject_alt_names = '' - - for extension in cert_extensions: - if 'subjectAltName' in str(extension.get_short_name()): - subject_alt_names = str(extension) - break - - result = 'echo-api.amazon.com' in subject_alt_names - - return result - - -def verify_certs_chain(certs_chain: List[crypto.X509], amazon_cert: crypto.X509) -> bool: - """Verifies that the Amazon and additional certificates create a chain of trust to a root CA. - - Args: - certs_chain: List of pycrypto X509 intermediate certificates from signature chain URL. - amazon_cert: Pycrypto X509 Amazon certificate. - - Returns: - result: True if verification was successful, False if not.
- """ - store = crypto.X509Store() - - # add certificates from Amazon provided certs chain - for cert in certs_chain: - store.add_cert(cert) - - # add CA certificates - default_verify_paths = ssl.get_default_verify_paths() - - default_verify_file = default_verify_paths.cafile - default_verify_file = Path(default_verify_file).resolve() if default_verify_file else None - - default_verify_path = default_verify_paths.capath - default_verify_path = Path(default_verify_path).resolve() if default_verify_path else None - - ca_files = [ca_file for ca_file in default_verify_path.iterdir()] if default_verify_path else [] - if default_verify_file: - ca_files.append(default_verify_file) - - for ca_file in ca_files: - ca_file: Path - if ca_file.is_file(): - with ca_file.open('r', encoding='ascii') as crt_f: - ca_certs_txt = crt_f.read() - ca_certs = extract_certs(ca_certs_txt) - for cert in ca_certs: - store.add_cert(cert) - - # add CA certificates (Windows) - ssl_context = ssl.create_default_context() - der_certs = ssl_context.get_ca_certs(binary_form=True) - pem_certs = '\n'.join([ssl.DER_cert_to_PEM_cert(der_cert) for der_cert in der_certs]) - ca_certs = extract_certs(pem_certs) - for ca_cert in ca_certs: - store.add_cert(ca_cert) - - store_context = crypto.X509StoreContext(store, amazon_cert) - - try: - store_context.verify_certificate() - result = True - except crypto.X509StoreContextError: - result = False - - return result - - -def verify_signature(amazon_cert: crypto.X509, signature: str, request_body: bytes) -> bool: - """Verifies Alexa request signature. - - Args: - amazon_cert: Pycrypto X509 Amazon certificate. - signature: Base64 decoded Alexa request signature from Signature HTTP header. - request_body: full HTTPS request body - Returns: - result: True if verification was successful, False if not. - """ - signature = base64.b64decode(signature) - - try: - crypto.verify(amazon_cert, signature, request_body, 'sha1') - result = True - except crypto.Error: - result = False - - return result - - -def verify_cert(signature_chain_url: str) -> Optional[crypto.X509]: - """Conducts series of Alexa SSL certificate verifications against Amazon Alexa requirements. - - Args: - signature_chain_url: Signature certificate URL from SignatureCertChainUrl HTTP header. - Returns: - result: Amazon certificate if verification was successful, None if not. 
- """ - try: - certs_chain_get = requests.get(signature_chain_url) - except requests.exceptions.ConnectionError as e: - log.error(f'Amazon signature chain get error: {e}') - return None - - certs_chain_txt = certs_chain_get.text - certs_chain = extract_certs(certs_chain_txt) - - amazon_cert: crypto.X509 = certs_chain.pop(0) - - # verify signature chain url - sc_url_verification = verify_sc_url(signature_chain_url) - if not sc_url_verification: - log.error(f'Amazon signature url {signature_chain_url} was not verified') - - # verify not expired - expired_verification = not amazon_cert.has_expired() - if not expired_verification: - log.error(f'Amazon certificate ({signature_chain_url}) expired') - - # verify subject alternative names - sans_verification = verify_sans(amazon_cert) - if not sans_verification: - log.error(f'Subject alternative names verification for ({signature_chain_url}) certificate failed') - - # verify certs chain - chain_verification = verify_certs_chain(certs_chain, amazon_cert) - if not chain_verification: - log.error(f'Certificates chain verification for ({signature_chain_url}) certificate failed') - - result = (sc_url_verification and expired_verification and sans_verification and chain_verification) - - return amazon_cert if result else None diff --git a/deeppavlov/utils/ms_bot_framework/__init__.py b/deeppavlov/utils/ms_bot_framework/__init__.py deleted file mode 100644 index fdd5c51faf..0000000000 --- a/deeppavlov/utils/ms_bot_framework/__init__.py +++ /dev/null @@ -1 +0,0 @@ -from .server import start_ms_bf_server diff --git a/deeppavlov/utils/ms_bot_framework/server.py b/deeppavlov/utils/ms_bot_framework/server.py deleted file mode 100644 index 325a8756a2..0000000000 --- a/deeppavlov/utils/ms_bot_framework/server.py +++ /dev/null @@ -1,60 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -from logging import getLogger -from pathlib import Path -from queue import Queue -from typing import Optional - -import uvicorn -from fastapi import FastAPI - -from deeppavlov.core.common.log import log_config -from deeppavlov.utils.connector import MSBot -from deeppavlov.utils.server import get_server_params, get_ssl_params, redirect_root_to_docs - -log = getLogger(__name__) -app = FastAPI() - - -def start_ms_bf_server(model_config: Path, - app_id: Optional[str], - app_secret: Optional[str], - port: Optional[int] = None, - https: Optional[bool] = None, - ssl_key: Optional[str] = None, - ssl_cert: Optional[str] = None) -> None: - - server_params = get_server_params(model_config) - - host = server_params['host'] - port = port or server_params['port'] - - ssl_config = get_ssl_params(server_params, https, ssl_key=ssl_key, ssl_cert=ssl_cert) - - input_q = Queue() - bot = MSBot(model_config, input_q, app_id, app_secret) - bot.start() - - endpoint = '/v3/conversations' - redirect_root_to_docs(app, 'answer', endpoint, 'post') - - @app.post(endpoint) - async def answer(activity: dict) -> dict: - bot.input_queue.put(activity) - return {} - - uvicorn.run(app, host=host, port=port, log_config=log_config, ssl_version=ssl_config.version, - ssl_keyfile=ssl_config.keyfile, ssl_certfile=ssl_config.certfile) - bot.join() diff --git a/deeppavlov/utils/settings/log_config.json b/deeppavlov/utils/settings/log_config.json index d04d78125e..515384a42f 100644 --- a/deeppavlov/utils/settings/log_config.json +++ b/deeppavlov/utils/settings/log_config.json @@ -23,6 +23,13 @@ ], "propagate": true }, + "train_report": { + "level": "INFO", + "handlers": [ + "train_handler" + ], + "propagate": true + }, "filelock": { "level": "WARNING", "handlers": [ @@ -39,6 +46,9 @@ "uvicorn_fmt": { "format": "%(asctime)s %(message)s", "datefmt": "%Y-%m-%d %H:%M:%S" + }, + "message": { + "format": "%(message)s" } }, "handlers": { @@ -66,6 +76,12 @@ "formatter": "uvicorn_fmt", "stream": "ext://sys.stdout", "filters": ["probeFilter"] + }, + "train_handler": { + "class": "logging.StreamHandler", + "level": "INFO", + "formatter": "message", + "stream": "ext://sys.stdout" } }, "filters": { diff --git a/deeppavlov/utils/settings/server_config.json b/deeppavlov/utils/settings/server_config.json index 9fa2ebb2f3..1bae81cf4b 100644 --- a/deeppavlov/utils/settings/server_config.json +++ b/deeppavlov/utils/settings/server_config.json @@ -10,14 +10,6 @@ "unix_socket_file": "/tmp/deeppavlov_socket.s", "socket_launch_message": "launching socket server at" }, - "telegram": { - "token": "", - "conversation_lifetime": 3600, - "start_message": "Welcome to DeepPavlov inference bot!", - "help_message": "Welcome to DeepPavlov inference bot!", - "next_argument_message": "Please enter an argument '{}'", - "unsupported_message": "Unsupported message received." 
- }, "agent-rabbit": { "service_name": "", "agent_namespace": "deeppavlov_agent", @@ -28,28 +20,5 @@ "rabbit_login": "guest", "rabbit_password": "guest", "rabbit_virtualhost": "/" - }, - "deprecated": { - "AlexaBot": { - "amazon_cert_lifetime_secs": 3600, - "request_timestamp_tolerance_secs": 150, - "refresh_valid_certs_period_secs": 120, - "intent_name": "AskDeepPavlov", - "slot_name": "raw_input" - }, - "MSBot": { - "auth_polling_interval": 3500, - "auth_url": "https://login.microsoftonline.com/botframework.com/oauth2/v2.0/token", - "auth_headers": { - "Host": "login.microsoftonline.com", - "Content-Type": "application/x-www-form-urlencoded" - }, - "auth_payload": { - "grant_type": "client_credentials", - "scope": "https://api.botframework.com/.default", - "client_id": "", - "client_secret": "" - } - } } } diff --git a/deeppavlov/utils/telegram/__init__.py b/deeppavlov/utils/telegram/__init__.py deleted file mode 100644 index c04a651276..0000000000 --- a/deeppavlov/utils/telegram/__init__.py +++ /dev/null @@ -1 +0,0 @@ -from .telegram_ui import interact_model_by_telegram diff --git a/deeppavlov/utils/telegram/telegram_ui.py b/deeppavlov/utils/telegram/telegram_ui.py deleted file mode 100644 index e7e3195837..0000000000 --- a/deeppavlov/utils/telegram/telegram_ui.py +++ /dev/null @@ -1,23 +0,0 @@ -# Copyright 2017 Neural Networks and Deep Learning lab, MIPT -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -from pathlib import Path -from typing import Optional, Union - -from deeppavlov.utils.connector import TelegramBot - - -def interact_model_by_telegram(model_config: Union[str, Path, dict], token: Optional[str] = None) -> None: - bot = TelegramBot(model_config, token) - bot.start() diff --git a/docs/_static/ms_bot_framework/01_web_app_bot.png b/docs/_static/ms_bot_framework/01_web_app_bot.png deleted file mode 100644 index 9e09adb36f9057fe83e61be958d7e2273bba6519..0000000000000000000000000000000000000000 Binary files a/docs/_static/ms_bot_framework/01_web_app_bot.png and /dev/null differ

Jt z+O~Fr4E2KX#+2rdc8sT`w-ffKl>0b-(+&1{PaN9tbZtkpG;RL(^a>ijLWJIf#f-Df zRhJxXhkW+T#}afKnKbJ~-ZifYCbjq~h{c$t?;bLx9w`Cs^;eFi6>AZ@ALjcdActe? zJ*HFr0u`nfa6Y?>Zh1`cn8>Q?VzY|QVxgd9(aMhI%Vq~w>vk9T7q&8)hd1u9ChB3q zeS;ovoV0OtW@1A6J}D?^6CYHrJ+?h3AY<(}%*#dbL!0FJAJr!a+SVC`z0}bcIqUgK z7M1fcNqNK1FX+tC)-)T>619(_O}$ri-(I_Bp6uR)X#yDTSdf^loDeT;-y^89O@X+GPF`B%CYGP_gUmInx6)DYbYX+~2QS5w(+ z4|=qOUpjF0HOP=J`O1?=y)D+?lD>Vj4LGdXi@AtRdK8-vQ1NGAW)u+0~uJcC?YjODmHv zh!>tahoFq;T9YV3Q?<^i7d6poF~@S^jHdtFNTB`uRdd;8F^jrjK6%|;UfmI`_p}Xy z8Lnv5PRQ=%t6RP{vX+CZx(MfzVrH!)iX&NGd4psd<-Y#b94?L(fn5aqApz$bO7`bM zvyy1-^^{APi!bm7t!^W0sKl~gChyUpiv=VwRmn62g_hN%Di_t}atwY-NaM+6zoh$c z43hf|J`|m95E*o|D|8|1sajl1kO|O5opW-WQXkcId+VO5hMt<|<|NzEcNGBs}^Pk4}dw;Xw0~OxIO>s z4LWeYao*haDBqP5417I7sN-o`AWVG9nYU4f3#XrJGdM!)PyM8dL*46^2dJyrnpJh~ zQ}Ncin_BhJqtm{gOeXph?|I5e8InR_sK}B+wTvI$F8;gnD_fQBI5)l-2}Vud%f~tP zL}Vu0e>znres?JE{~mrybo5Ye4{|64mfg-1x!!o@5GFwIIrG}*l_T`1h_;^9ouZ2u zdWz;>?qT%akB=Gh^ZF!l-BD$*84o~^S$5tzw2AyTCy_EoH?`4K6~HybzS#$i`3Fkx=u_-jE9GJT^TK} zgNJv88T?Bnyb6vadWTlv;W6VW%ggBbm~Cd_d+1F}bbDun;t~Be;8cjB=Fy7M{r;m| zE03>&sG>VsoYI?*G+)8Exq&JjkA<9YciH6UUiL{+g0hXQ;_aGkV_s(WxzxE2E^bHS zKfl|ii+le@S0&&P{Xc#%M1+&tm=~DaP0SF)fBtB}Hi;=YrO60st`NOQlpZg8u<-Mz zZ4bQ4@xM`LD|hAi$}t z!H;WDZ}`Eh0Oe}Ne+>z9m;4Q0^;I4>Qsq~8_~;~ICIU?;O8TE43ubWgmEdYIqBXhs z85f>~y_L^CCwus&rlwlWK13N3zQ}P^F6n)i67$SEwGj(H#?A>_{plX~KWXp$pl(5P zBWJF=>~SirffO-U7V!75d;ix6{6?RU6C(ST2?{d93%UKF7zfL8Z$`-V@ArQ7AkN~ocafF7oYOB*d$|Ly2s_}j z>}=;OYk=tG8Tf|9zSA@Vx6LFn9?&E{YOkI%ze+efIj;B2J!;!M-AN&# z`oKxPIq3T2)Y8DO*d3{A_%?Oy}_h6#pcid z9#z%UbQbFtj`}Qd|N53MxBBJf)7`~xpO^2TLgWSw(Z`!pBxOGmw9tjZ67u<8Ne2rOGxg| zaqsxDh7@9W$Q9T_%?##6p*YReRWoEFBIw*=PK9$D98ZxU5lZVvO6#OKh0~aXw-0vTobq#S?-Y%N-`F>>9eHA|7jF8~+rqE; z2YtUDO!fTqKrray*_c^4 z)8Bj2CC(JUjm6{AEx0ybX&!Igmn6fAgA+iPWVP}Gh+ zj8j@NmZ_2*!WSHhB~wK(Bh(hOcl{eyZe0;4P-_0@PZt2DL zqA)PTzFtpxzhdx4%=8_l zytN&9$6QO-j1c_+f_ZUQdBP4Vd(K$w+cGK2Judk9`?7LVORkVvwrP2mym?`Ath50P%#DyH6502N8XiiDt!52Q>UKFP90wX;R6c>lT_bj4i>KN&FT?1=;6f#+s zC^e-|ujcMrKNY`w&iiAWd1pZZc5!qM>cXH<;&w9_E6^93i|Obes61$VsU)1kCUs6S zc6LP$CK7auANuPI`Hr+f1Q7E{K!4J_@jGq#RK(Di)NdQ@$>$+^A8S` z8m#-{3rib2Sw3*wu6-HtiNv_A#r5W7p&Hrc$Fz>wfh#X(__Q=j4!FXfy!lrDB6uR` zmZG&C^SALEIl^Cc7XM(dczx4W$#gE08ysSvGHNG;CCoF1hV2=_JhBF>0eL+%Eb0YWpoDgR9EbqceT0$J8}J z4;K^!r|3GgVc6~M*fRFMN%=YVk3tM4ZguXuT3g3S2VGdb-{V6Y>YTv3o z2^7B;{?Pcof#%_T9*_2DKDh(IdKvoFvitRmecujU(TSwR{oV$%bQ7leyAIpdy$ohI zUj%Dqn_!)rTg4U6HBnA&GeP5A?HScu5)!XQjHXZy!?gwS7@rj9=ugP!$c}{d*A)&A zv2RlTE{-Lh_yl==v`qN$$4w$@4rx4tl(vqQhc!PhlbV|=od3~%6RgjuYw8?JeNj*P zUneh?+NjL`{Y_Uo9@WZ7KEQCTI$)_M_UXa8zG;)!^%nml1NyzeOmVBj&8g$V85e!d z@xEk^;UaC;m7M0FOG(MgGxujFE9sNXzBLGd{i=#|zoZ zi}T%$%8#0j)FPwx$=r zZ|GERDzD$)i+f(OtJA%A7$c6R=F?eJX+|yX!r`)HShIR|oC2T!jPV|^xblzV#c_RGI;ajSF{_gl z-*Gj#pXYdoanRl!iM)e8|IXsOP6m}%ata(hm|s~5bN@N`iz@1Iw#41r-bOJ>?sUcM z@4U?8ZC_ty-CTLt0z>z#j_mko(|eXz2o@HL;&}}|@%b}>2$lSJ#n&yfakb6v1o2Yr zR`;I)xrBm4txRZ_=n~Dez^gk4N((>0l4%Xn5cdrDJ8?8C7@Dhy32pJis@+U$^WH`^ zSo4$&K>7}so`jzKJ*#vS4<%>~A3AOA9S^v|IOrI)KTyCuq+yN5+ssJ5vcTfw{V2jr z?yhx)y~A~FMlv|rLhuD)^HZbPquq}2-f4Fs#=#0Q@dpvrF-Iqj*<;fI@A5MDcF=v3 zUXss_)TE2H1GbEojCGM9Vy(1qbk^tx*b)_=mT;Lpxp}eBYBWiFJojqot6v;KQF|tC z9a8PB9=ShI$hG}Ocb%b1^LH_6GogMh?5sn7 ze%Enax+^kNKB~~->l+y#kRKy0n#nWcRd*+CcDP?S|q%w+wx+*)=_zQ?a`lU0vG986#lK5}^;o0Hn&jOsJm_n9jy zD(q(J$`O1>X;k6L+Hh`3Npf;B!f2#GJr{3yW6+un@oz*aEZe39x8#DN9xf62{26Mr zgUV-ho&Khy*^)0_R#%>jNL5-?JO`4*Qoz z7$-W|6!S;1J0?WTP1alq`Aa&^glSH<%`bOZ#?4OmE^YaqUvlhR`*r!FJ9GE;-IQDd z%<*i5`|26`VarG2{E(Jh71^PSKV}jMS4b>>=Arer^_+=kZZib&nH9U-iOjMhXPw~&6#=b} zX_2!Rm$-$%1nae(mY|$yUYSp;Oxn{!&&1QbA7yW|U4PFYNl&2YC|-i6y-{yUumT|= 
zBMK!-v;!nGT>i@x%5frug( z48>0=L>A!v8qOs!s5HA-zEuR3ub^>>3N??UQCOwc$Lw!TQ3%<5d4m^uG-!>F5a0uN zeq_(aMOao^zSlkb=d_hiG!=iGE{9Z{I1NJ@Qu=%JUHNwm-Ir1@RpD3ZN31&laS)L! zTX^*u)AHAm>iXinE57dQ#-|ZY-}p4C2Sql2BFs-koG&cQ_%gBhTy`Z4E=WEh*vw6U ziozSdYeRHI6!cft*)^G_D43?Hr-F`M6ABiz~?Piea7GrM-|lNJA<+>mWP*apL*qg(&}T-OIc%o+wb~oH#fOS z6D4@47Zx!vrkXFUL6zdw2>pFE!`AaTjmbdNEB%@~sWI~H?quksNtgO+oT}dJKytUU z{s`jgO4dhW0MWuQLEfd=zvwElq3Qk{tZRc@B6{Lu0xqWl!c8(iw zGV*T~g@%W!_XX@;tV2A_kIz5I$I6WRH#W}p>(bV4*+IL86*93aN1tR$BxGrQ1OZ?@TS!2ho}SNi#%tdH-DV`6x)yp+|Rq@pn6w)2P}qjQDYXDo^^Jj>NNL zEg}e4xk-J`)=d5G(OeLZ%kzkch&?wfI-#o2r2a{9wsMQ8`>H&>W2Ttr{KiIU$NM`9 z6%_H*%8Si2jzRQovk=m^n3ei+aX3aH`Fy%H+3U42KGf01MNB!o+ot5z~mE>iy}vBRG1o8 zxr>wHt~_A~yWCyr7t_{6d|QD8=L&rrWeBx?K*DmH7k2TpzSMf9Au2}+Tk<{H&up5U zTYct+@j&>clq1+V?_{9z6EB3`XA*p}<{H&?n49C^s$R{*Q&5&2Zv=Vpeae65Z4qqb`+HI|ELhSF3{j#!OzosPbiZSC5zTx09 zkiq7}J^7RR>`+r*DoH(Hl{9+CLeZIxGJlP(vg{#TANhviyF_Ze3Mtvwl;iP~meL3@ zHECJmOr5;Wzn2&7rE)^G5kkl2laTsmDhpee`1B?Yc}Ei=1NhAaNT9TJ-rl%<+ya$g zdWi>*=f}|F+0N1yUGf`y7jlxy5ekGsmsf`jJT*uC(IYRGANGcuc4SgY5fEIX7rr)A z=d5cf9fc)<<0VD2h|(rG-8x}LBJ$#vgHW1TreQh?_ccs3zR?L-U>r>1`o}2YHxaiF zHm8JALvJnUBo*C4+HTLb^hdKeXN$UDJ^Ay~@R9jUz02iV(USqeW(0zLB11C3*#BB3 zswbVtP=kmbu8PoL&w{@U@o})0P?N3SHE+ylRlq3&ghH<5b!u%X(5^I5_ur@laws>9 zM}_m@sPO*%#d|F0#((Z2p0gDi+RrK=JS)PRt-4;_c=`lou?6T@*^iEl>~0Eh6(5SH z7-2OG6tx?x2}wFun?e$y?m{Vsi%PZtDTZCXb=>`^xh{V5|+4=(aE5|1VB>UO)h`xD7_fE;kT|ZhMRS@Oxho5fhwV zL1M7Vpp1cq;^7lcKmZ-bjYAAKjyHIMH4xko{td&BkZ5-BiQ(0Sd8c3?$?zGybVorD zZwNviF)*~W90nTd7!3OYrlsr4Rf{8VZ|tnnWY|8f8dgM=9r%7~XGYRNY5?jcpQnl{ z0inlFy5TC=ookQHe>f`T2RH8TY48 z+shu*2zdWCHEs40oR#E~UR3BJMp;n6)NV(bxYp4yG(21}A0LbTwc*|r zE(&>ASVY9eSdgXRbd9a%9UFkAqCSn6n^Xr~28J{x5B%tgq}T=9E~Ih=zfP_#11b*- z_YrQrICabrc`EBGcL#~87+QQsijUnYdpMW^TLjCm<3I*#!*1aKL5@89;KtV6-0TYW zs$GqF>oZY6F-hR`hLwRk1}cdH(F3pVwSaLcQ2f2fV2jrqdPFHndFpVw|-po1f&CfL*#g%J4YXQ9WU zM3Ywm2AHW5vbArAePNOA8hK=E8vz!HU004i@R#z=3LruVf`swxz2wk%!9g%v~zbByGge4 zk-XHmOtA!Hj3w$9BZZn{wgB;eAWn4n(Rde!AYoX^gMBLIy@w;!R!V2bOYshIuaF(; zvhPrLRCNKAh{(F{&;1A)lmGP!BKzRv1G#{n7Ok(e_dJPF-ve4 z(t;F%w3>VOI2yrxadF|m9Zkb;@LcwS-z)-h<*HqGSg1K*H)`?V-srgzn9?lvbP?q=+pO$txyTLX-jHL{ZV#0=@a$xdizSB( z$qUwhn3(7K;>GzI%i~zOyRtnoH}o5@tp^^lN)nWxg@V>Pbmg^_ODZQWH;|HQ{BcV|k z-h(DDJ|Ms532XK6nwialXuzH)MU6o7LRW#3c-7evEAP8)#bP^47cEI5c5WWbAo)QufMS3RfrRWPAF5XH*nH0~xx7YPpA0-+b z9Gt9mOrO5iJJMI#dP)rBXr=M8srik+I^7wcX-@fd3*V}&^m7-8b8^y-l^W{iP&Q6S zijn*K`(I;{gi`SYUa_^M*_^2-v-%WwvRvJt@wn{Qh;19#o9|iplFE4KGr_K38hAay z?d1&uNr@B@Nn+u*R}DUq<>>=IQ@C_T#pyv5DHF~^1hOt5y?=k#4*DEE+&Mzw@3pOM zm2Qp(R9UIkL7&RoSA}n)npUID6e@4oKU2Sr3BE$WDai7Heo-a`X_a&r_4D#`vOkgI zw~YN14oyb3+Z7L+4u`ED^QpfOkZB=fu0VN+PU|+OymBp{EUL91xe5aAgrD6VPwOQq zr%8d8gM_zzTM2(fZ4!5Thg*dJrio0Mchb*2~FgaFZfNw>2M7@GMOH1d}8o$Zrjz0K2Bjy%O!I)@P6 z1(wDr!UTZ%ADtaPUEd6xqV4NZsURWriN_~dEv-3Iz@iDmBwA9X5Ungh)~qPD0s{H< zbq9`hBl+dtCK}YRUKcjB5`A_)7j$_(oN#cn)G6Hh3VZebyQ##&(Y<=q`V&x40h6Y` zd%8F3*_Xsdd&iC(EFOwVyu=-HxZ^u`YT>}1e6!E)BKLR2q;Q3uNX{E~>;#s5^!O{` z==58ee$XE!`aG%rotCr^cOvo|`DMxvmXb^aeReETdB83meY|ulJCa8BPb+bCeVI9Y z|0&gnq2-4($R}3R>TBsfMjom9r3S?eB=B?XXrH^MXUD8p_qQtpc{^kod3!rCFBIer zX6m+5juhwo8hdh8F={Ua;6Op`3?rc#_3mm7@V9EIXVcSTq($GtQWBGQ{-!V~cgAhy zKvUBt@o>XGqzNyS_It+I-KCv8+1zJQ*m0#!7O6J_fE|zS~Vx&v7zSjM;AUQ${fUx!x0b@4!HrG>h<*CsN;0mY z=$k|1^o7koX=b#cUAFDEnaQXM=EoE;l$(*{&c9UBDXaYt7<0AmN{n?`&dT4A0g@vKJ$N^2eM^2AMJP@ z7j#u3+>lXHuS8GTwn%@pnY#ykuV1FOpBy&Tov(fwaSS~;@5nGd7! 
z;E6d?ya2J7+^>ztVznl)hO)E#%&)!SLVMPR%|x2nl5*w7bH%v?Rus%nYbiVh5JLR) z*NF%cmqdVk7SZc9y%4b-h^q z_7(X2JI1ai=I5*q#bzyHB@;zCN#%D7#gqrbv7`@Xbe8A#4ws{8E^*b!_gUAb%A(t1 zdCpYal)J>=ZcvKvs`Ll&Fp8@`iSSun4hws=6u_fb^d@QZ@e=>}FB4632=ZKY(l9PV zG?y*YQI@egw}cD9&%cx)^-60PWWUAZ-H)RrvqorX8i+eF`N>UIGUBcgegf4l1Nl(2 zNLuKE%G%>+VT=jKguCvDs_Rf?GYmHRcxQoi{T!wpw4ii1n~ed#OEk{#A5)CeJ>=y5DXUkf_i6dfJ0_PtW)#66TPRVkVNl}>czz5^+i{iR{x zt!=M_Skuw33@0B#-TPMc-QSluf4|l9GG7!QJGmDx6~~oxbFAmp%l-vYgz?X%r9Z>V zpxDy4o+OUUN|IpSCncVJ+az zipHP2S;^CiYoujCAP|H}(NPDb?O z8zK4B_l5>k=V0bY0=S)v|d7Kda-#Kq!XY@@P@}CoI>}0Udj>? z!8QC3&*H9LUoRb)2#=*7F7$R`Lx9q5q(U@xNfkelOma00Kfp)kwJ8UEO9%m354Uva z*5uPAz1L=}ik$3wdv_jjFR)5V4hcF>>(x&2nADZ|8LP`!9`iYAKrD~#__m`Ol0N6g z8O-^m-II&M!pb55vpkqg5@HF~WU-)v%H#E#Jm}WvrMLE7=X|{tNx@N4SS^T;m1s$k z*HXTg&hbu1$?OSx^`zUY*ih!a&tuD@*@5S$_V`$*)t^Jzg%!nym6Ecu&pqoX#yB6m zv(}~$oGOwEJcH%ru&h&Z2f(lu)haI{oByI5>zvkUwM0|V$zORG;k%92{V6xVz(={H zUwKnu@enQ%q0W89_lL4hihR3PA&H{s`$Sa5I{EMX7I}6z5H@TJ!ktPn(l5U~CTCy> zUHYWSNv8TGbV2h$Mx7fUesAFeWAjHjE-jk$*FjAefsMv7ZUP+R29RCWJxN*z6y5gh z(?$%9)=aiX6gx`+EeAXvN01j;r6C9=mOu@tB4FTq_Lu`I>eKPR?}v9B0E;FjuK>n~ z;A23nyOglm$(TgI@M&)dyyk!)s%W9t+TYRYx-z3I3;+eU=^h$Xp6@jIba!PqS4mzL z=Zi2<1&rN6{c~g@Mh3@-uhHG)Nyc2q@l6P;dlWb$3XGKoSFHd4st#a^@v+Sdfn8X* z&z~NqL(;b)cMPI#MKt#P ziC=la*N!WLfr)MPKS;#nfT_5~)^mzV=!Jb@lU`QHaJxO)tDP(0xT+@Mvm^cU3%lA2 zEC4VIKpmPELy9TJGsc2nmjGBNf3&o*@!CRcZ;)P7b4b&HwLSBXESR@sa5sWPHeRw& zYMeZ{sNMxJ76(BtA?AkATkq{zTELQKKL*9q-F0|l-KVqu_%+}`I$4PTR=)u?TAzb{ zGN53#1G<{3{QdiP0k>t&&FNa*mNgwSJuU?^vnilS0uo9Jl1n!rF>rG7H=$9ZpOZQG zK@f|*Aylvg23TZR$(WIm;kvu1@ZN410YGMOvjC36m&6i~{k8(v-~?o~#IO%Q65on2j;QjN|+mDIai0fSD;(#jh)}Va4 ze}u*IZO|oTvc~q_Ez<_fqy)1pGr*#D-}1N^Zr-eZx^(q>qbCVqsRpj{&7&pyBwk)# z;N*A&GXNMJ=0WFS0BA3r{GkzXwU7e9q62tm<`);=QZWN)p!Q6>f4)Y%9wnm&VRv#cLj`*4mf>3LoQ9LY09*1u_UIH;Gx1$6U=kpQ$ zwu2ct63oE=)A{>Hlk2Zrv%VB)y{W!F|7ZRR@mu<3Uj4iaT051e z!y1BnAoBTpSWbMIlpshAHhP`{kQMExqI20GRzPYLq<&DR3iU$>q~j%$^K#Wc`JtDE zO-LZWWA_yg;7u7LBig~N`?<|bI>2-g)u9k&@ngES!EGh#^yTtXkB!3?;-#c2PB;ld zhb?j$VVTOE8bO?xRJVd3Er^fany%HZUaY4T{bmvw3Cc0<$TS}vFxpr^m6l$+v}zxO zJb&(b^gD<3s(3(nR1KlYR@NMP-jX zrvz9M3PPNKvKJ^Mc;J&H0?-I>!~%g6K-<4)pxYLSzFx81WnaWA&jeSiHF@!e z6@vJR3vI^0z4qoGIg=Ly-omF1mIFrwh`?o%Eo?YY7P~;J{73v9*8mY?n(xaAxTUXG zI_`pRAWy|bbl`M*i6Kk?VW(7Euhd}ROYS3_5{j~rfOb#E9~bQ&E%a=}^MQLYVh5M~ z{GLSVu^gBwr{2j7ZlXyMaE#s@FGG^(EFN4g9SSZ7M)LA{+i#E#)D*$DS{F^hiHz=m z6QhO62y97kA8zl@VEym#j35X1YD$2zu)#%qVZ0`g2n256ZX(A<31u-1KEXRJ!H~g+ z3-S0o?r~s%`xw{jV9xDulh2occ?tHyiz&R#ETA@9gPRt-;ov0?rB(;u|8}6WeF_S$ zW>R(gC;DG=oqz$=*<#|<@V~ZUyQgy$gbn@*s!bph?M^Q4&Z2$9Uv53z+-r?^buiL17Ue+SO5Dh-NnSmh+ z{xj1uP!uyPW)Y%nUJTFBDUg{@Zu0RwebWc)1)>Fru^5oomNutqrUUt8FgX7L*zH8m z0s=nlp_%ExsBaO1&~Jj%nt9SzC6t+;Ko#N4gZ8&&FJSju;;ZTdr{Q^k&6xRzhK04y z%}LGr3Gz_#8B>C|(h2x@0;`e+$Pd$RVc2$xMSNkPdUx5V4MZkk25^Oe6E|mkkv4g28xStT3Wn38?RY>#<`1__?B-fq8qK-} z!1)O*a#u5?f*8%r%zXZ=#yZ^7%xVTU9_fp-lcz_&A8yXJ(1XH<;Y2%_i2>rFA_X&25!fROB^b0ngA5D`0XF67c{T&Dij2$YfK%7IEpPIZ{L#7jSY36DS8FI=SRXVRK^9n7N2+H_cZ7`@_z9O z2#~LQHk8)VdNa#nq!2~f4wCP}`g)XdGW!Ckz0(SIkX)x1mPVtoUW^DvHnSLma^vD> z6uYi=QK81lj8Uj|ev>-2%zieu5&~TIwda+Yvdc||(_D0{L=U?#p~hoH+E*&gKaL;PTK6HGCaYw8 z>GlU8!=|PsdR(AlLI6^T(&se3h2`bYmqNCI)+Lj1`>Cv$jnU!|z^u7gYFrmO;$gi2 zXjB7pTxunvws)wT1lgfv7%3(!_3=`3=iVPN%cE>*40M@depd(N4iqjmD!+^lVT0tt z1)}f-r|qol8yv<;xc5ucUkeXQrV5m~w4Nx2Dg9!VPGWOcxHvN}Lj3_Mo#jy09M~@b zpjsl`rIvo{p#$&^^;y7D=loMyYGM4m8%rb zth66hD7UAHidY7cZ4W5SBmjL5lyurZec~9eR&FIx{l*~;N!oYry#a#kni2*oFQrkY z03Fw81Fn>~h3mTkHXeaM88>`=l&_kuxS{HHcew>u>^D z($^M$tWkhEoHSGzM-yz+ha!`u@8C98v8q9sr*Vs0*brXsvkuokxw-V z0__1Upk{8ZH9_w0=}$Is7`jLL9Fb~-k+2L5-mM-~Z+2TuOh$?tF1GNPHn=u3kTOY@ 
zbr6F32BXVEcf$mUcg{0)#kqC)l-zn~UnGz^EPm%y2O+AsK`JzOtf3JOL#VZJg1~iG z)byLRY}K}|JnX6+fo%+GA2yUhlLX?2KrLRPeMIKR(Db>xy0CA*ACP@oY{U;EOaLLO znC3}8gvc#w=6H8$DW7AzF>&V;*x)@T6B$&fXYL?LI1nZ+v2{;O7@qTeuuiJ%1{5nn z1Y}%91gi;4aGnPEa#XS0!m)B&3a(&fcGrodDTV9N_V4f=+o43iKg5~3F4Jv}=? zwJcBL`JX@j*MS^ME9Sv@`*DA$VddHA<;woOlOR(@^4J%4<7LH&Yf=}-tiTly_4f8= zy9MGvAq))+hhy2i58iP?bV~H>+4SV!0|$|OOTAJ@JL>xeaLKOCwulG*b-sJs<_n%E zi{^Bi-<}d_0rtV^;Z`x#*yTK3J&nP+Mgasf8e7IR}Hd7o}+Ub#&Nhe4y*L`V?= zxTbBy}xd-%#d^^I1eQad+#HGf}aES+u(OlvL|K-H2CkJkOb;8v4)u5CQrB3FU7ZR z-uJs>T>qrJqX88BD&6=|z7IR99VRNun;DR|-kPymC1~+*EuI_IKe-7CxheM^11n!| zhNyez!v?7d=D?Zj zUFLc~F=)70mjib7>Ra=VfhzN~z**WRA|_U{{B$?YPPWm~1eUbHf>hB3Q5k3(Ycmlc zs+M3{l3uinm?HkOH6%EXS=ySFKM0WXltNkeYunw?G}VtfuckI^ftpvG z+ki!+MZs)vj}T5HQe(E57~9$nhT-JnfeBmzYm$KPt^yuHpzSQYC4Q?ddrn zk0o=Ul$HzdjUv+(@&AEJ0fJ}w7lLoQbVJuY`9Zw9)|=Rh@EWUnbVaYsX!tB-i4ffu zg}97g0Q<^~Uj0D@Hvk0U8umZIKc}=>_^X?V7zG@@wfp1hS_cwt7j@Ye(WHNKG-k+u zF)tgOL&WAvl|N2j!I9Jd7jFQ`LAl#-25K;v2-1u+G-R+=c!#*NR2M@#7-N`#xr;;m ziluHE%260m7?w+zPPw0uffq-Od<7mwZR6tzlHLD>AOZTPfFoNfU?K|g23pa~)ZmpV zHFz+qxRE^}sRFrx4;-s|>@qkG#X+t&+9D`Kl^{4KY~!kt{+|XG$Unr+;t6grDq<^M3;q;X0E^FuU{y&r*yKUvL8$Ce&wt8FJ7mjXA z!ad@oyCT$xD%?IBcd;7$+CgOW77uPs#wZ~TU^7K(mLLq9J?D7GPKyZ$2+wnKSsA4Q zV*pj@Xm6LPtd{aWyavp-(m)>Y7}qSq8!mKix!R|KNTNH7_b1)?J23~?b`^T6fup7g0fEF z$qG%GS@~c5SRyLs)A!ET2Az^&9&jV?9HNT9PKv)6q!ya^3v@BA#Q+4oz^%InRW-VZ z;~*%QE)ZR`Tc7y?BNyGPAn&bds)>q+A@T8Kt>=d(<)3g477_nLURk-$or8(Tg3pd! zagG-N$rTUQ$HwHrE+)0LwFOz9)0l7^^g2}Ak0@eAT;1JuCqTauMJkVB&+i}p31YQ{ zjjf<|cYN|QTl<}LpPlN@kCzv6K%(2-|E24hNf?$R7h$>Fo5-p41`k}OU|I~gBc{FA zFdXqrR>0pwc>TIQiw&q>w7-!FnP2(?sFC$2oX=q;-8}HUME=7Yx%(UeADWu1KpRxQ z93`-EfQ}#OE=3<75kPaUgT^KazkRLBI$!<-Ofg|bv((MM6-Nrx@osCT>ztcg3|!J` z_}S-Rg5<%v>cGdDYZ547XwWbv=zr)``!apN|Ig~c!pchi9Jn@WmOQu^44nJ5jO6f}06fO1%lFU+1wAvYVecYg_2q*=ASLDjIKThJ z*&od^(+|X1S&T-l&+x&ov}wE#o`d+w1xH#@lSO0sh+2}eH%97SrZY9w%R>EK_PbjQ zKMfa;vWSQ`lmgTefKa`tfZa>?I9}KoarGXgbzTJ_y(6rRD-nZ_i#HZau6A`Y%v^*C z-||&U2i~)XD$Axx+`m8k!QJ+IvoD2G978)WySQ#n3a*45ua{P8l^fs185=+reJ z5JS1)7LoP2cEgrH>#QFCTI(>@lOg)CV`xY-OriWi%~ki+FWR&4heY$cMC3$6h@nUy zj;>^67Gn1+HbOp15-GEavdp&~%36Qb)+Vi2qy+)`pZ1~ycp&8BD!T^<3Z3ECsYn8* z>tgi>AG}Ix8~=Rs>F%zZ_3km7@7wp0#NbHNK28}x?GGtYz zr(CI%Pt2`fLaC0q#wbC{rkX0dq!{~_C;{aE;;-y*RzN~!fr~KZlZ*seb?Pw;+e43> zh$2k1J^+QAVu9Om%2O;1V0hg^MZk4N$(K{AK)*GZ z`SLmb2f(gqd5iO~)rQ|SL?6ECDOnUTWu0Uz9z6RvzmainnJI{mrMd;vh=lRJz=cNo z2)P}c^yb--TlTt`%c-B;L$xEiLOj9Rut_GO zNnrpDXX>PET_CWe=Xh`#7UdNr0(Dm8AJ3=~gM&uEwU_Lt6dr2*qtR1!#StCAm9)mk z2vw#`NBp7WC3=_?`!PB?dQSjtl(xENuI!bL^4c{>;nrVI#_#>}b10QSF79V4Ljq9F zWMn_1yz_x{vvYK0$jG3a68}IRgWXAbMqRwUww5QK9-+jYs(Sm$cM%*_A7BOj7%E@E z3`jbgv>}$s%Th25ADZ?(s9<5uit%oRwI|Hg>gmuf6&gsP0+d<>Byrj6osXtSA*z(j z+P{?U!#{hKb-YZ%c#;zm`^S2-NTN5=_y>u!F-8$BqJWuk<3LtazKBLCs8qepw)Y0I z(_GLTMY$XBZfgU-(K+4M+%&G!uU52WvQTzFcrD$QHxpn^pb;0V{yICk9*0lTEb!xk zVNS2XG3S}=8V-mq*ynCxMznGNg!}F*nJpe3-CT;};J;}{50DZbz$jNizO9o555<6Yz&8m4Y`%#SkI+l-;N^-$ zSMON)m(Lf9^lq!guJ+n1Wi&2gyo0?63Ey}AG4yIU%T6|{+uKnKk-X9ZXV_!O*w{M} znrv|nIraYFQktU@+xSAt9hoKN4+U1HeR@gK()2IwA%DH~GlwLH;;*V3TTO?;UERDC zQ(qnIC|vL_tgT@$e}C_q&3p~t`OOza$sZ;bYl1`xkS8W49{=rPI{woeXm^?|I!yoL zFR8Y3qX>r_+ODePXd^{%ZQKzriA^&Oi*;=d_(SJj7^{@#B&waOA(!_#foS2;BO~FU ztIy0OCmaXMtbc$NRW{R_ag!*a?Gh}51qUFvlS38%V3vC+;~oTDN#Ien&dA%Atdq;O z?|XV#9LNZd#iJEJvy~538ZzA&a}famD&}70xvRyI@R&>Isxuhy&BN{4k_+=fZ4wv( zu(6qV9h;pyO{94el8`9WZqqSat@0X)_(vg_6+Uu@P?6ZO1ca-tU5t=_$+&~VGAelL zbfQ|dON&X@LKHQ6_^l!q1KJ44hK53vtY!-4T#pXbx4!$6(FWH9Ww%9IMI?L~Q;$rY zeZQ|RyzfWpcwjx4NISN5zxD5F$WKPNHsa4bE*Q-`j|ad0i6eej-z{mJi+Y8Fd^*jBxn156&{4N5p?B 
z>HdgBi4flZ8{0PewDWyO`*0u86-Bi+7u7(8^*g_6p(xwm&33aDJT_`O&r41AU@IHq z3IFl}5GCAdIbB+DXWE=%HRMRt^`8FTmyDK z6c)1zi4^m$ytO@?=Vo_@VS`NU(RZDDA#i&bTI;L7=T`9jnWq=~%A-#v8Cv8N~F*cdeQytW=0paZSlPPlg$Fdq`u2!_Cf>=66~% z0ZdlhQsg72rhAU9ay_eaV$8bln|HlEYUs-1<|{R#H?JpdSdr?DpftPDX6UjVEBX;0?si7NYn+hc(Pe(K~1%FyNCTs!Mysti(oc=jW<((&2h zmi;eS2#L>_ZSpc9W)xM=fiY@Ov2DP$ZqY%@AXEvy{jaw@e@OxQg zX9u!a7=-@fTbxc?i+IgWF;jH2OB;Ky2S*(Jo}XMepRQ{>V8qm1X0%Oxj4#bTxG{e{ z=yJlS)uNy-F4S7T@&~!^-l}2WO1;ZN!@hN-%3JoF&-k@;cR~tn={y(HB$d}*=~gWp z!)42-gW_Wz)PQ~*25@fY?{a!_#9MKiE@sZMj66^iO3EAf)D9vtEsmGs8I`F(4bL7(SOCa_*J%hNOqAdQq{?;V}J{YF2ZwwWfzI`x#Z|`v%pdh6< z-aWRuvYZ6PWzV^3KDgfr_vVje9trsNYC;!5V_WxGL z3q1@djCBA9gA0Y`3jqk=m2&OR@ef+VaW@o$vxpqWOkE*D;U2GYy9z_DK7~L8p6^%{ z-=9**pHZ{c&M?J08e&xuC5QSEw7zY;pxHoF-E)E8xUFbzZVqf_*A6x&K7pnZ*Y#1Z zJhhhyd}2yzpbxyC=o6`Y*c1-lWksQ2AR5?xeG^PqQ!oIm2}=u|1P!1A94LjcvSGvz z4o;Vr(Lk;SN|5sFfJDp+PNFfLakT~=wxdwg?fLt&aTzg`X6F$9zyGEA4&#Sky^Oml z{&czb5s))!KqX^z)x3*C`nK(0nefHphlIu?G9AFxa0Ou?gy?U<@bOuIKZ_P9l(>cy zy>er!#jS+wxrV25k*{!^GCo?7*$#&Od$^U#$z|3o($U!|XxT*wnuFoME~B7Fh92Dj zigxE`M_4%N%_ty~FO2a>9b7)@3}*opH$iws2!0bsZWp_w1)na;(?5vJU?-x*Ad21KAhm<2};q#;jcs57_1=m}T7fHr`0G@*Tn<)Dm z6PjTjks`$pY6=i+i|4eG&t4$twLb%qSnqeFk8U1l5=vTbeRksJY==`N5|IkTU^jv_qqH4tZm0Y9z=?{$ z^;5xahNFf2A&@j88p7*Yden5S2) z!+j(xL4~lawHcsWf!_dCNnpJSefRit$ps3w0sKe(j&D^l=!O(Rx?h@Z(msmi_s3prmx02uV%;w_w`|8dS)TDLj7lVeF;How1Zh+ZQUnY_N+jOv z`TlmA!jc=HD=V~$k`t{2gf`Jy%pjPN-O6t-6%L3@f+z6m7`wIx~OtV!{tE;O% z{{FgfOYCUtDH0Z`rAOF@#SRt0ke++-91|U&=s8#x1H8EuYOwN@&d@W?#pm}_F-gkLh>X43G2{wV~^4Qrv~I71K%fTEFuq1auHn^;qk~cG6{k_ z1sU*vyc%%-Xib`V;9uii0rZUah@``$65R$+Dbh-X3HV@C5MI5NhpyBDtpZbQOXSV& z@>?W9_^#)g|L21A|F69e1cdG0_4l9B-}^_{at#BHHm@?b$1R(Mv8(oTJQ8xYYHkhO)2kgX@w5v-p3J5y3JZ2bM*c(%2m- z&pGcUNlvoXEuEuPH|rZV^dsHliyNxi{|UAxVPs-RH0GqiP;Rl#O{ea;JCZ*uaqN}; zNI!c~Bq8y?H>@s7s3pCitEw9B@78Ru&1q>m<_zY&?|hl0u=(w;_J{#|sM=JO`!r0W zisFt>&CtYrRyfH}$*R4ek8;H++@bsMeq+Cz|J9?DKX(4w+4iLD9r|4ocli@fZk^t| zP&%-QZz?WG7y&$MWIb)X+z1LeZph|p0_51|`>^ai$CrBnm5cepDgdZHQi33Qs*GC{ zJ8Tm0y)@D_bZpky`2j461LRpE<11)pfZ#M2R9)~D01{OX)YS?XKfT91%j;!^ zSecktfrr~b?7%*PdH7fXL2R3vnk+%`4w?zQLS_FM1z5BDo3P@a$3NAD3Lw-SGLLgg zS#t(1g+zk%F0@5!_HB}=#s9_;*emyMNl;3h23a3e&}2Y}L7%KtWR$WyEbO78 zTe}M&DvagUfi7^!b?_ba0?iKIQJj=@k!6vvW&j1m1mzOj+S=M>C!R<@I@hz^cMc|~ zHc)(uIAsFpP8L=Hm=_D4QL>8*esgovb_ZFc0Fvtka&HrufQp>@?G2Oo^DuRQQH35X zr&K7e2m}HeZa#S0f7)q%2-#ZwD6OiYAu1yy2qy+5L5DkcEoe1CnLraN3^@5USif%n ziz`24H3OH%0OdlEqSK&-WC2O(V2^%+U<_nZ-Ccb>DKcsT68R{A)jUrFux*3n`2`VTa2t=ap;?rl(+6K#QhIZvW|KV6;%%yapmlVqI4ul}^aP?Be z<5wnf8a{TE$Z1@j<6PpAq}WQ?%w?|uzU=Tahzn5ln3O#L^2iFAr(wFx8{E)S^9&F7 z+_@)WG{b(K3(TR&U2gWS-zo{KdFI}S204)d=s10Uj@YsADv-ReQ|a3aI8n=j5b!sv zy0%1*s|GEXZ4}Eo|AIN3eC;~D@9b#!kn*FNq|!7szbRcGAGW#S`*8pu-i?0`Z_%LO znSnm=#tz;)ZsUb%Ie75O5Bv3vjSA;a%~6S@*Ri7wFxnmAx4f5gsQLuBC7k{pn8*kSgaVW9i<$l>%>2q1 z%9?b(YLC|M&|Ms@E1n)Gw>ei0JS2#}B;Yl@TNke4HFD2}GIJ-2dI=;)rZ;js*G>#B zLQTQ%%fZiI@#Fp7tKOXZPuP~4y?)yi2s8^@mgyZOqZ=>b|2eFJ(4qXxNW&zcXJHR&>SCN8flK-wY5Dohei+;sp;?cd$L9 zH(W?JK6UQSyF-2OFQXNR(2iym0<((TU}OLG-BX~m_MF7ibXC@tQjg}+1qB4SA5Muc zxF7?9-W%W~FWVDyA;WD~-(=LjF=bZ_QMVYRNA{20Vm7{RtuLNt;aB42J5Y2i3l-=p zX%Y5S>Kr_eAr8y(XQ6+uJh4iSPucWKD^2Ua`0V0nA^Zxdou^`^5o2}W@3Z%uYEOL4 zX-^p|G}wW;>$-a+SJ6lew`VV{>%lqL(r@x`%QFA0fEn5LY! 
zxqH-$MMxcU_19+UKhLzw40F>5`j{FgwC*cuc)qsRf3?PEse4I-84G{qw%i!J(p*dS5HxO|4Fd|uLLPz(BW`5j5 z@bdm+X&@$LVD6ou7{0%*dF^)m4bBgnZ)^r(F4Iv7U3&8ZCfd#cyukhs3huG3y9ff| zQiy351dfV6V{~GiZXByyYoq7CTij9N10~yQ_?$1q5^LUCJ>K(&A>4hPH4@apeo*28$O5r0m6OdwB2jjRbb{*EA9 zpmiPO8;YeRLiWhsKsGD6Q<`^{etp(hNj<35Ko{tA#x+sheV>-rTW{7U`&LQuD zS5mNcT3xzL7!_O6!iV^GPu_i4XMM`bFZL*SB02BM*R2zV*<{Q<9Eb+Q4Ymffz&ok&`Vmpv1F$5E3Z(zJ8x`xTCpZf7G)~?>EGuP*a?1 z*?LlkHH9dZa;cJ-G{tBSI;rlY3u9+~&;Qh72cxgm;`=3xZ7!iR(ojWU=_)LKLn+PB zGgUla%22~Y*nvT9&}F;N@68+gss0`M1|!L4W`hKQ1A!xdraCw_m)^)r|Io=*oqn5R zvbE&&#PhaFwF$>9JNOWnI3d)N$)eGCY3-8%!j|bCbXU7?q*DHD{!`<2=-+CKmO}G6 z(7Mlaz%1&|Y-yg=0|zyD1jIXZf98KVcI6yoI6r*PitPG3P4<0N zXfP=Jk717jiP2jZ%HXwcDai#S_j69Sv)Mmv#yzv5+LFnYu{`n*n!1Y&b7Qg6imGg@+3~uWSr@(9d)ST7O;rILBf9LJ1LTZBYtHQQ+dqvyZ z{?+Z4Po6)5V+R&aZ-a%;f5f&WrgfK$SIk0^f(*{W_rpu@B^qnSy4+^a=10+3y}!R4 zh}RH}$0mVLge|I*59Z_<#qu@rqgBJpl8l#Luqq&ma%3mHWC_ z*401cw9=|a9=EbtKG+&&T;D{eFT|Az1Rpwd=m}ro@=R3_JAWP>*bibthdDp^z!|Kc zXVgh{?&MF5jfjX~KeB{*40Q;EcODBhmz zY7s=bH)C$j1W3RE_TaUj&4%9IU#mEJu2b1#ebRdQ7nO|NQ;BxLfq~_{%@7`1UOLTw zk(kTmsD}5e)s#wq3Vt%286{hN-f9UMuks!Z<4O}B z)qA9vuF3P~OVWH1_|RK2chSxZ6DUPmi(pH3&J$$g`wn`vd(Y$O2F}=+?Tw~VV>1Lb z11dWYCP&asJbHGI87F;U8k`UNz8v+B%eo-Fq+y|3u=$`*;)nEH10W)2#X%TPq!XQ{ zgh)H;r1XA`DcnT!z{T5k$nR)}2Ho|MTy$!G$xPHfsVV#@GkH&{_#vI#-cB2f7@M>O z@kJ7VB1DEF-$H$9XWd+-j@IdyK$U(;u`1MWmaWO^#3jdPP|b{lX1(86j?YnrjD~N! znJ;D-K96kO#SY<7gonfqPjl8=8%xCXhxG|Il#n^}X-=8Uhg2WqWFKjxy7c*d9hYzyn+`p&sr2vGX*wjfrjwGtFP$zEGG1XWE;Vnw z~eQcZwnj z47~1S)E+)n`e>1A`*h&lY#4dFcP*9gI%rF2kHdbIx=+0M#%2Is)CrJVdGbCghuA4l z50>?OEzb^6a62u(I1#$B84YX|H1&EpoCJ#llN@*`39P-&b!8m!Isg~*Af3>3$Ku69L1K$7bhh^5@L1g6j*>gOvs{cyCf9XE%LOQw}RKkW;5@g zr)+pK=pDUd&@I;&h%p7xh80eQk?MZK{Bc+|BY$pr#kk7|oM!&g?4or52seSa$+Xt;iPKgp-5NV*y zXYyiVBAOVcfA7=>id{}w@=M%Y%BQ%@li2(oUN{$<<6Uu%JO z`c{V02X00p&_0If{6%(5L``^0$1|FOrVj9Uv1rWi=_whl1 zAwaW*{Rt8*#I8~8F)%V?8T(}4qSe0n=J+qNpmAfbRDh|FE|FkLtF6JA3jGaoZCTC~*LD z_)l>g!~bB#q0_=U?1G!{kFDmvcoIlxeF?}!>cc(W#cs$B#u_8`vlICHh{s3yWcnpA zB1Ho+PyG&(%Ks1N`)`ur-@!@^Xt-i~3QiRR_wc`&9fjHA4&fNw|0C0CAHYVDhC*^7pfCqDL{M?axxdbe-c@}C9^&xDQ*RS7hW}z2@Yl+a0V4e0 z1_-21{f|7wzkNz1^uM1`mnnDwi7w-Q?>}fX-8T_9$nj5%)PK_Aq)*5}$jCwd2c9A` z6Y~=BzW(P)E{nQ=1o;+Adv>$j2sT$cTDRoK;?9r7T0<)4`=Nkg0?26r^PM>PQj9F` ze4{ql?}q%uXAM;@IjWMzyOvncm*@KU_*mwtsi_r!d`!}cMpe%LCJ6YfDUx^9((nOZ zGfBj@A*1%A_v)jQ)lxkI9MQs}4wG9S2CvO8PK6cSn3$+E#4j?!sBdR!dh6WuX8ulj zN73iU!U?aLzh`IsBQS3q7}y03Oe5&acM@wKQywe@jI485L?>YwuN}rn9?gILl1JWB zF!r|VYh_My<(S+JvaghRTyO!&_q;DF2926{Fq%33^4THXXov0`ttgx4fuo;4jQ%iN z3p{t$8RS~GUPnJYNhYIBP1z5(_fq$a@EdSEHOV8B@))DRY(F3EMJ5GfbSdtMPi_)1 zx5V<9Q4^!98X2mp4TEw56JORF1J@#z7};L9H&9R+;64IX+YNBbo&u{Nq_CWVQ0S|p z1)w`y2P;G+G^7mJIdU0zH>ppFggM%Olx}&s1HyYl)-HUi;Aj&E= z`8Jf11u)Z=9lg&F5JNk}S!}7FsCI7*AFVnq0euVi#N?zERD>|M?1uYfmEhs+od|No zrar~sTY*2fwyti5Zfjof@&a|A6hJm8FlIJD4vH@*#N6Jb#eX{pTBA=ObO#&NxsM() z{81NxU*SrVcYf=7^UlME@@tBa6ZY#Hw2q)v*Mq3$&8@A)pU*~V&~z{D0dax_2@~BX zk3!)w85AtraV(kChN+e%u{Sm04g~J~ArZgZ)a>wfXs2q1n~7k7t&3GW1)K^f;L4)X z(|MTpT3A?c<1Svj2ob5uAO&p`@CA805DBb68tA*Z5%1l<4{alNON%bjqJ+$^{?r(o z-$6m~vU!hA%4)&{8mxFT!s6ogmIly1c{a^}*xcMfomG8EJ-Ya6$^2 z4q@I2!hXXy7a71HO}Bsl{vRq=4M_i?1^Doi=iA#{J0>=EZ5ZN!7pbwau|*rDXg;@K z=h)sxZF;{*Z!z>lyFsIvp>b6hOu7r(Tfqpo1zM$Vip7km)8cq^OGd5yB_8w+uH72# z2292#wHjELm=IHoGt8Vm5$W&z!^n5nU&>%a@>!94)vs}m9na03hA1+?*`{{q2rd!^ z1*PuNawe^hV+ZbpE99pHY_8+#{!9x1D3Uv!cHK@PQ9Pz>JhWgN2z3yi!MiiCDh`-dm() zlOpXERBgcVctTRZH^aS*M7=_H))?7GV#+8#DC86JB}V`gv=2zer-);2g0GY9^lDv~ zBi7xXGE;%w0+%ja0puyz9I;U6GQ&j61cr*cV9``hl4-Z8V(_qxxlRROkEPC57%E2B 
z@8{xVn`*!UK3u{m00i*ag$xyFw+Sj^zB@`!xZ>#jSa6A7I+l7%Z|7){%L))&FHcA*WkoTI^^vH)+xWO*QEks?7ZN)pT;s3dwYF)&Ip%odX!>0 z$H^!&mu_7tDHgYpr|OG_s6G21meo<#b$f-hZyvVDv4$(RYmtJAQs$=0h4&4!9&XQ+ zL|nRHTH}4ZcIFou&&-s>$RR#4J0Ow$xAt6*)8O4>JXKXyngiAPB=-RleUaH6rACu| z`)&#=#|{|Gl*<+ee}alP?^SN*H^ zhDzBz{7NK9z2wj;F6LyFrWKdA0N685JKE_5o(3%lBWqm)R0;dwLEz%AoPCEwONI+{ z7~a-CtFFcr$M+7{k2Lh@C*_V!zx~-HrBMALrDPPtVnksx+DORJxSCMU3}Egzg1>Zs zN%v)tFj|(H3jv}KWmD~*;tZL}AzLf&9GD;ZPU>5_QO$d$A!RSz8&WhC6tE`ga^u(i zjpqp{8<<+%=nWz1-g~Be--(@NA&8dq{iH02(Ewt#(SIirD(oI6 zq)3=os^-tvd19#3#uKLiHrV`3$byM95t6lHc{m+|fo|M(90RKl0K$^=*iZ?07vElw z|7A1SF|%gO`JGCcMphZk(|kv;yy$B(R=dbq*@t&FIbiXaylycv1z;I}Do?HxifnON z>T^VHll5M;m`w29f=~A2rtNT}h=!(?gzrcP;1uiBY@o~^Q}SKyNA6jUbbzB~y+gA2 zN|H~(_W=TF_1Bir!LtF2t@nY$&QQA~&YT1QJI%e17O*^nThsB71oT4-NQ}id^85cR z?_<{hE>0Ifq#C+(YrUEJ{^cx69W&cjNxItiU!(Pj;Pvx~&3JLbkF4()YI2y|&B*>1SZ(?mDJ_0!WPrz$gm=12vupEoiJBRtsew_fHb?zAdBpRzR^O4ODGE z%GYmR2QS|wfRdZOka^u#=6m`kpBDeOoo#^>Ev-MQRvwW@;B8*r)V1P^J==RVlcthN z*0VC-2&=doLubWfG%K;6oeg!1t5*6FJ9<_kABUp-B=~neIa(wx#b$X2S%q+GN>?G7xeY|B%eeuoyp10UN4u@lbUhMfkpMw+)*&Juov4~aejqa@KuLV z9G!^bku0L88}rwS{g{}h-LaLHVOqrjR)vG&m$DzC$g(jvGl!2D<-wGwOIW#yF` zsrR}POvP_2=?Vry4BIWuutQFkq1f78lrQb3t`p!N&3+4@8xur^#EA;Qjv_0(oVWsh zHuZ@V4Et7AzqVvfYztOTt^I7!F%n#Yb&X`jk@-{kr?t!1Zxe)UCy%_YTXv#n-|u$L z)KrG2G5pUN#s_Kz=#PNe6q;puKbO!52rf!RjQT|r6Y!%s^4=D z8*_=pGL?}Z15(QVXjZSOfh7XDqQ)|wVj0ePB!5yxNGNAvI$oZcNqRX~B~+dNkY`JB z93R`QjaS=lS{^5M)TwF zBXVRO31*azc)mQE>O?RskF2Zn28oP-T>ll{j$OyzX@+2`C(NcIyG}TJ@vLsl@l~08 zhVLlLDqEJou%sZv>lBUc?hVRC>`Vj%f=*$Z$jxIDx3WgRn0arP=*#zm0b@@byFuHJ z>|2s%wHS!?4>b1PFoW|9`)hEI13f~(qjgUfoe>mCLZcj+*(wTY5o|M|>gCEof7OIu zDS&xpvG@)a8?1aS^&(3o45tX++q}{kB^c(>`isWj$YZ}T{!%3OU8YfdU}>589f?L# z4DzO6F{@N^l;FX)j5B14W#4Z=!vDyQRe1Fb zS;`#dr+k#Bjq%_Tcfa;hsPZx&l=n!Pc>#%MH1K0@IthBriYBrDtzLDN{Gmf8uVeky zFJ?Qh{gz-C3Ex`bp%oS$kw=@A!=fMnWyEy)nf{mOp*CVL*045!RaK)iT|Nrn zsJn$4_g9kz|1R?`sI4yNic}0hR_3EeNPw)UiptUN<(FVv6X3e(t+T(BhHZXYnH?v- zo0EX+-O*vzmJzV>y;qc*=xDR84Et{Qr zS|RIfus(pPVpJ@63oMZf02{gh0NObjTw7b4qH8+yHHT>4{}Fx_0pk4v0EK63?Y_N} zXw%lx4^61mW=!hoev`DdUQ^T2lOh2lBY>3S4+QxTR5Ms;~{@l+lLgINf61>UlUunNI4jmT0*fUp{{#B~LS55_G(PPg(STYTYC}s1*ZlQu zVZ3>^QS#kfG(2;mYw4$MP;zlFJD%H%yL2;=B8O>*S^_ww1|{YshvAZilebk+WPugn zR$Rd})uEoEG6qz78`K!jRgj#D_PlyJAl{qE`LG_Zue@_khO(7=A28}ud%S?tuYqEw zLR)R^_wVM!!`spDUOlp0|M|@8`K50v%p61dli*RRhdkL`-uJ@S#~FZ^iy3IQest}Z z$M8KqL^v_Mvh>wt6ZU11i4I!*aSb#YrNNEjFqm@Oj#83*2r5WU0H_z4tDt^L`F6nj zi{{ybep7}}$?gMNoUz*pF!tDIAOyykT+{;Pt|KlWHy0_4n~N07%*hwF6RCGDp2%w5bMa` zKuNmF!@(IP+1QqhtgPSlyLRnDq!(e2%x3kg)k?~khO~QYm+RPv;`e(QdrX>MzrnB7 zcoh=7;r&m+vc)D?&T%WphV|B$KTN%lZ^#&f59+sA?JucyE4Pl0Nr0iXmJ&}FdJ!kl zwtZ=>Y< zk%|NFB2=WOpz(QCUMgv8{1YB$J{HT^IK}^G4$vFHVj4CEHVTQp1GTa1Wa}Qm?)AaM zJbe=9tbJ^iBzuYd$y9y=*GT@DTzD>Va61V7Te2lYbec6zw7TXvR<>Se%Ck7g)=DxkgaWu~Tu^FIPy)jT#_SQgzAZhltqa$Vu zFpk5Q;PnZ#ijQha^x(pue6}44Ok!cOLa$I*up=|!(wNPyU1PIq zF?nt&?gbK`ZB0nCTFXj#Elgfc9yX=|4v*4lSZFdQ2);pg>VQ=@tMC$I@%E2U5?DOynIYBjF!N-=s%DEsSAUQJeX1i+;h5*G)HXomDTszn>tO;cUPrSEND z$xP^kzB&HBG*67H*`@_V$oYmWT%X=m*lE1J+FulosnJP6%Ai{hmSr(4QShLe7jN6& z(PNxBL18;;%t*WbT{F-@RaxJcHV9$CYL0F1*7iVbL-Jf!&*QRSC2iSLdD<O3s%)^>Yh6h9dHlLxJzG0J9K~(V;Z7Nr49Zjb_UZ1EEE#DTau_P5 z1Pd6anT_NAzg#GqZ_5D+n*tU&NRyeX+fi|-A8GON`bSqYyUeg|%IIYmvOql#eAkoP zqQ7xyK2bP@)1Ri~MDkVOLxYjl>g;GTJ_0!!TlOb$43!iKm)gOmQe%WION7U;Qk9wu z9a1|loG0I;_*F)fNRlF&p9>FeFd+BLE@VEDO=4r!)gx|vYHi((!^Iqa^F9?0;9?J7 z%9~c-azZ4D3%T4_r`u|bs4X#Qt0C}&3efmsmM5b4Pkekx@rGSyQ2Wu^J?s+`Y`z}* zXxnOg@Yli}APURg-hK@9N-$>+I=i}NvRNL4Gx6Ll{TVrvi*OPB@unBq;37CMgL0&q z_zsW{yaxT^BH#e9jyJp-J63srfbVQV4M!&b_xZSmOB%Spo6Z>-8Tk-YDG6fvQIqf@ 
zz-sY;)$+cQxx|V7FYU_lc*+R9Aw+ojOn32_*B+?qL4DM(b;+#Pt@uQ9{$lVxX2lTu zhwkLblTcf*LWqBt=e;x9Wc+dTh9rqhCilL$d>+)|H$_wCJeOadf&v}c1L(zzsj#r^ zKj#G1{jsBKKjS{XymAUm{DL>^j%6J@w-;dzU?IJG*JRcJJHrLn8}xOrqCVCw5beo> zN46bBpxN1D3Xdsf#b+xY46=xyo$hG5nHZ-x_(U}Nc?ex}Vnk*p-w9BrB3je#1z-bW z!2Ynb5*8NrzBc#}XzkLb|9&Nl`;$A6m{G`|aYjNWBuNmoPfz zn)(xA+QXV;$DP@uEN}k;7tL8 zE21+iZ&(^5lFzrvm%F<#~wEFS>EfK@{U}Ome1=7SDGlQ()D=0Dj#F#t&=d z%>E-%L%#>~FQ0?etT|0qsy<(_5KWW~J%9c2vJ}ReQ2Fz0@Tc&X0(PGx3y;U#AduY3 zp+(N;K=X6p(`vS(CPcXj+T#3m5>F>TJO6dDDndUR`pHBj;x~U>(LMZ}y3{>VX!Xsh z^6|5_&c%quUyGkv7}-YfK#kuJfycpC5K zL5C4^z|&>!07uq5gayCw+(V~;tvRmJtF0|9KC#Ea@DDm>)mNe484f)|XFxJ)-NxKq z;;+K64t@T_RvO1P z%uce8Z+B_+aYulmh)!ye%SoZ5Uj2-{rrxv8T7{ zHL{%UDz1|&eMm))=b3Q{^wojG$OMp(Ga@h*2)SJ(Ks zV)CIY{Ra#h*f550<0?U@;q<0DaDK3y<013L#pHEF$r^rGGV~T92_ZHia-b5*RHp$@ z_Pqw@!vc7vis=fT=sXbs(q?kxArMQy*OKeJ2iMY_)20p`tPlCa4n?_PmE6s}FCmln z-T^e3rejA*@**C%em$b^o`=l?tHKxD8+HB9Jz}hr=+q!5MCIbAmeD8Rni~7@p}rFQ zBwe{$S$fzHrZ{f)lc_;^d8NhkUH64+PGX5-K2wc`nrnv>we8HC`nwY31NZ_^=jDbB zRPH+P+D5OI8b9O7LVysv2U@+L8J8i=xXG)F{D4@?Pd&g&EAe+2ttoA4yT`a zJa#I#7UPDsXJ0E6-c?tAo{-Z#X4Xh^H-O2ZuWsv^F{L}}Sp#$jz$7+!SiuYb8vgMo zIKvvX{tbgIK4&#WrEzc?6&QB3J=v_fC?iX_Hm&RWwQXPcRPaoW9(2mB&}STX*n(wq zlN2!mhq7X%>fCm#-r=AGWN{2+uKr?{_jf~>5o8%~;+mauH7 z(!Z6ZOKCTTRiBiFsg@%-{n+4p&BR3-c&jxeO@c_tsOQh^VR3XFEK7bK^5?*sg+LKh zUgKYtb9T)7_ON5&93~`Q$UAR<{Pxf0a@q;dn<|8E_?!1wRs%U*h5ZBPHb5l79zu=f z(u3|dp?Va@ao2AyF~bI@@lc^JUapSu7<{g%pU8hZnz=Z_Qnwgn1D!py9zMT5$r%Fx z)(kDt-R^BLQ!d~-2PHkLVYea3UFLwh`pxVsM1f8@%6e|5@n5Gaz>b1V@Al)8(k(IL z5Ln~tr&3G*-3M-d7AYXjp5!ztoZ02o51aQy!}~A6jt8k%nL{yCo!2|_5Auvj0O)Fxh}B9N9LtrkH-)1xatY88cP)Tr&Q~b3{88*sUwiBtEQzZV#uj) zNxeZk$XS(t*3&cgA9f+d)8wr%J2q9N#IVu2$h|Lgt}>6BR$qTO>xhSaOA3`QM?u$N z=aDrNrD|&$c7(nDjUXY{Q$$5{NN6u%2U2jveAKAls+uWv;M~lv|yx5Ln>3{IR zf!OcHl3q6nGMsir*ltjD*k}bps3pwQYF}e7e5^br3fxj#pV>yv@5QU}3_Cabm8D*% zZ0Dra9jw1z!~no9DTFMe;_Mw$I-j`28aI70{^C~~ZAWXEkO=>pkWQm9!&fBxy`?WK zO<5@Ar4sPl^P{JKt{;iUhUAqKmLZ~OEb;Ja0ZbFwbGqdgsqUNZHtOuh8bJyl;{S&j)s_BbAkwSZ~ypx5EAy^ub2GYZ20Wq3ZS zfQv*;ySTZx^-bG1F3|cGcCXwKYdZjs;rjL1PLJAKGlB{}__$=ylq_x#70s^a)?MN( zDEwY-MNB4On2zksO4%>h7K%PAuZyN?7oU+z*e(u|VISnYpFe-hSoS`eI+rPV{YZK8 zvc|PK15T}?8Hv!m^0zSrRa?m&kHY!GjPk}NMc=!-W!HnJXEJi?*ATpv?%MQ`C-y}5 znc!P#b&scLX-2d3D0RQ9KW|4y-`v~MaOYU+3ciC+0aE9^O^eJV;tdKr_cvf}@URZpPZTwSmIU|xn&N5mnIRI)yfZA9`GZ-GSra#5B_Ce$9T zFN@A3P_&uYhWN^Jqi@$_c77(405d1KUBHtIM+xViPL%JJMeBadO$Gqnm&5WUT|vs7 zvR>Fwub^8xuh#oagyzae(PKZ;7XE}PzoRJq>J>WHliSoW#EcWdwU#qiGnW1}lC(T^ z{=~;p;J38k?XKBgwk8L~8e6#h2lW&SRlDoR@~r!(GF+V4$Q0-xvTSIQ1aj%w&=osA z5DNG>+v?s`Er@N|6_W)%p5IwCBT8BB^0?l*H?gDic4t1xBM7^R#_&lKN|t##b4qPq zLPH<9zOJhMHQL&gv;3^b%58^qpk0!x@5jP)Ik_;d6C-yKa-Ko1l>2mVzt3%zR>KHB zzZkY&b!cx+7|88nLKTW5!$wJpD`HMe^6Fo}oQ)gO%xhdc&%_D7O^9Y;Y>7(W(fV0h z#^BXi?tV}~p|>n&siq@1$h@`Cfb|a*fBBHQ{<;mS_?Kc)eG7D-?Xl1olHS2jGb6hW z)UheZAtV?RHB~!xm~h6pmX>m+NKA|*fwkgKC2*ac?m7;BwqgDlZNM_NnFg<-XrY7y zH0WSPK*yu|vE_`Wh|4F8k~Hb2!2ON07Ntx>4;-U3ia4SqZ`^bWD~l0F>vGuI%-aCR zocSgi1_TlL+)*m>4Mg}UtNM(LND>QN@FN*y-dBd7UaV4X%42RKY}4e@v#z782s}I^ zvRlyzi$w`>%zUZMYL%579T4r{9rxwmcXT&$Nj(VKaxf}dt<`|^5v=#vH#cv<_whhC zy!@KOHHxiQ7WwM0@`cPOFu&;GF;q$wU=pq}xoLq+*9M^-Pv#?O^1x>c@@O8FcMfj3`p>kBpFxC*G zd%o-pUL^mYXZ0kwMfJaQ<75&6|7bVduOw{rJ@^U<*5plfs*M8$la7zuc|?gdiWG4ev4hm-BB!KK>qc;1xWT699+&Pe-P}eNMiN|KI1Y zg`7TXHeMXL6fWe3T|d4XhYqrUvk00eC@S0qMC{g(mdtMscuI)a;2u33nusQxCt+t$ zaLqwu7{w#-Wbeb%pmB8_h-T{zMzf>(f_Y-D9nXOm=H9PUnR#$(?5JdAgpTYkMpP{_!XADWZROhq|FL!KOlBc`7(tJLIK%(%hxQNi*>v=NR7|b;s-EP5RrI2 zNfZ-e0*vTC*k2g9@sN-C?z&{J2k-81M=m>upW|WEd}zYylqA;Yf!q)G-ZQvhND0ux z@(RJJv7rTnC!Jq;4<{f0rdcgC&DB}d?l^#^VvhLxv%o$Z`BtNl5&=o`-rp3#rO~*v 
zwJ^SG=NWJx>%daNQuKogZG=F;P@JW6}@3y6Y$BV+0o~6_guOM=n+>9eh3b z`%C3BC7n|Cn#CK1%7XX1-Y30Z$80LNvdHDB%GBY|`g$ZDbW8ous|vn8lyXOvmfFRL z-@o?Ea^oe;vm2mgOiD_kJAeNC^4t(JNGCt_0CIxF>bJDDi9%D!tL8^E&(n?v1Z80b zl1Mf`05o2Ov_)NgeK^=LyaB+VKl5O$Y=eC`DnS>;%>-(%GRp!2?8S)X;NU=9U|iBi zsXI}u3#+U8uq_jyT6f*P`NF;|0YO31;ZNQB2*Q^Dh3M=7zcTG+6Ymf@wSx&I%MPzB z_o+9QWKtrdyDla5!3hpo1JhRAJloenq5Iv6PtpY9aO`;Tot+d6UE(fL-*1kpn>x*v z7XtR_)`s72Yj05!$#+7JaN`+Ijbk%48CFyDca@av+tEHFX!_^_QKM>#fQLOo8vs68 z7_d0p2ngiP_Fs9)CWY|r5Kyuv>g{60z;=bysHjWoL(qFzm060IKRf}Ou_9R&N*6z6 z*wAc6@fB$528f^N5yT)PgSRV3g9j)TU5Kfx*?|ORzbT@+IzMC?PUe05@Ifq-AqHrV zUy_IAk3S+p*FQcvS-AP*jl79#AMM86%yb?w+IGlKliPJZQt$pqvMd+J@ zYN$uhg+OEp%=5H(VJ&{}k4>~2j5pn|0;QU+4*l-k<3NeNnw3blu5uNF%>bnIP}YU| zdjf*Yn*jJr-9luxAelwrA$yADU@>5yo0npJdh3m*LPmg(e@)QqLkQ-}5|E~D8Ftpg zBL<;Wc@iDu;bPR`1vg@lsj$Sy!Ye|JjRl`|&61mOaxjy-$_0Y&ah2e7P)ZWgF;CT^Mb!+Egb4a9r5Cf~L= z36Kg%gNru*Owf~f%!yVGcM z%fk%TaLv_kvrjjh_|A$`oOM~aokQc3X6$d?P1~pF%b0O_QH)I6iKFzcpx*T~1pf`s zN~k_in7xdEQPB<{aJ&Dk`1=d!tFEco#PP^ZNK}Bd`zy!g8IKC}J5QeM0X~=m>uap}A20^lYh*L(OGP=*K^3-Nu#^$46GXG=1RaWZWr4ocI8?)!M= zBv3q5@y-n0mC$fdrauoXB{Y+=@6K8>4ZjNfy^hjBQai**Wy6zzN`e;FsPRC;d{&qB zGh}g0tG_9SS-yb)-;3kSKFJoV_kIcH-f}_xUM>T9y)%yZy0`5HcI}o%u$|IM5E>Rfdg;poLlfP3* z6&RKypsIORKHIk30aFkSu)gVAwhP11T1 zNWb5>mz-)vb$}c_Mxv)>4Ytb6-9T9qgNX)-`tU>N#Qc1GT`Aax9kR`3wQxxmalI*V z6wqa7o;L%aUnE}*UQMO8qe0y0NL|uTbkXvZnft~W?Pa+wUs0NSsWayE%HcYr_sh~b z4DrlozxC+2PAF_n*(A7bYLuURp=u5rRzefp2!Rq+|B}b1+h~@B!j>J9)P)On%l}|NczARW7Gkq}0QZg3JoEg71^2D>Tt4mN=nB&i~h{uKrX_P*C`<6-G z8P4M$sy##vTxS02j`qP4>MJoC_E#1FlJvU0Up=98=!YuM@I`XR9x~p99r~9)o^wZ+ zC7tQInKi`FSJreeujF1CqN}{B1l^rwN|%1ee0#OH9)EayG5?CUdEwhJ0QmUI z?#^k@K}R^L>&IiTT=%WRRQ2S5xa$af>+L(ttc`n|#f%Aj!l`Ciec6E0jciSVVN*Sn^T+*>f>G*;`?yz;4&((cx z28Fk@mN}3AOB1Lysu{ALl(@yc&3CaiK$Id82zMP-}`WeuCBSiIfLi-je4A+E!alEhnZnp|#Wdur{M7N~p9 zK2k}d!z=2P?)v$uh8cu6<7@1epBaVrR|M4X=_vzD5(T{O0t7@Yj}3&Efh<1N#&{+a zlREkZ_J#>VpQPzB7SoB0+g95vnl3FEx-rx$$XW;=3kWp()?-wi^t>D@nHN{*->UQ` z7tK|e7(Eu9YGMHm&@GV*eko8LeVT)D!(4e+JKh5$zYkD>)Fi=cQIhN`5`JsyV z_I7Lng;R2@~4Fuo6z7awFvMRuf=j`bHO-gpTsdw(YcoHXKu zhNj0AWAa8Sg!dcQtODwHRz$Zi_sY+@ujzHN_6m7%)5Wuh4s_$YK$R(!=qrdo)G4<= z3wG2-p7aN%G*Y4OZ?y{F4~ID89@bvZxxp7{+ZETZ_E{1wW*_1Fj%;@d6Lpj z{2}}xgR<~#pD2Ca`Rc_ITTm8PSZsPsS|l@6{KliKpE{_4e%|;4Y_@w>l`BGJi?~

M_R=w%&Wq$Wxf|FxyZ z3Mm_x+kpnb)NOaD`HFZ&TS}Ox1lMwZjIOr!LshbIgdu4A(OCW=?JI0l=_8A zDL~$85jMx^2hAhbMZVL)(^U9x7vBN#{Mf;X5cICzoI`SWqp)&fUStW?W1g ztYqEZ?o`?js7z^~*Jxe*z>~7!mZ72Bg~I0M=7Pry{#crsfHh}qdO8wx0l%vnz}Rra z-Cc4w3oiwR$th6XN50UGZ@P%X>yMw!EP}C22gtwysL3gpj{s}56aM|7HvUY-P;2DS z{>sgb^9fhVo|fFmUT`e`6r&wyLFW{y?^{Im6cHXZ{;Q#h?9~o%Dv!VVZO_6N*dTdq zZF#17scuS1iSUWW5Q2*v}^d;hK}6gVDLOdR_8=ZvGqQstI}9>&bih0KGAzS zSLNGeRnOqm!BhC}4XX9b1@S+Ww}VeW!Zz*=9k~IIbvLE-{*Zn2jr8TL)|hSGl?sKX zkldzoYks<~wgbfLUu1#imgDyJpb@k@x&i`wE+4p6hae>~s33)c_)%t9x4#?AI%0X5 zkY0`+b2mG1+@T!xBkp!YvCYMdj7(8+B{s#zZR~SDrhN}`Cirn?$~gQT!v3v_2pIU6 zEmO7O>)H_rCbK|d4QUUZ&?j*CS!Lq~Jm^+kKvUeV+>qtHTAQUPl2!wMU*JQJY}5j@ za6A@XpZ^>70YxJBH5gX^4dQ|J(X04G^kaZ!6a@eM4b8_nT+7*diE!{cMf*dUuPGri zk==%Xg0Z3b#o&7i1Bx%f(yJ{3Kc`WVUi>r{e)&-^R2qd4D5SoM$e5^bA;)*Y$+)M$ zPhS0LRaDY}fBJ+w-_7I~WkAsYds-D&kGy-w0}V_wbcp>wO@c<>zFeE{dv*L^-64R1 zomv86QGNgZ{X!_RdD(N{$$qW6Pk1QxJ2Cwm!6J`Y#OU`w#(WJENm7Q|zk@$4^*FBr zDwG>9Cnq=4TOg|L@7_Z_NFcGZv%9#uUWe(FAT7-VEt1JII%Hc}$Va>_BEP@2r?(px zklxd1B+M;c%!KeVLMQgR( za7iG*Jq=|6=tD=x=H~9`QY_t|&C-KycXUHSG&fe}dvJT%`GLqLkJg3yf-_C&{4fhJ zYxFVK;QfcKtF#1B1@5ijiCF;kLbcaNZm5mTCp2RzLi7=)BEoqa zB|T8v#pl)r6?7eQZhZ5G*?Y8356qc6H-AZ-Qi3=k*!ymA-yH)(Lo5`xKr1pW)X&y| z@?#+zG_S~J-~7X9j`NRYP$x=Zo@<7kfm66dh}uD`GuL9d_Jbc$)NZfGR&CPp40cq9 z%#j*rtlJ(aNdFgSZygoo+pY~W;i9);(%M3d@{-U=B6e@$?a@{TBj^}5ymiZ*io<}Zaj;^7{EmzKJ|fs*SjYa z0yn#9QjG*OAFb2P+Ec)m?s&*AT5MBG4iuGxN8!zbw!xnc4GoDsJF%v(khIoIB@@v3 zDp4S@jQNDPbhpU^OJ-a}%;@`DGNPiQUN9Q{F;dL{8*L-Fpe3sXqkuXc(W>+A=Fhd| zlKPE(D5(Q+lHdgeWs_tK);Ky5*x2jrhgJ-z4JC~YDP z12`m_FjzM#v9Wfh>HvEw@TO6kcp{Ps>;k=eJM=(sCxAbh_vXCegUrllLU`y-6Jc7q z*{%D&$7BRXjRL?(iXnobSmf25LT$L#h_=(Y9&SZb3`2XF*R;mOBPB&eJJ90p`_K!? zUH)R%+Yi3|V9M4wHufl(nU%GpHyQR{mxli9akM>@A?FA>w_A^y3xI`6U9t_lyE>1N zmyPYeWvqShf&pU|#e>QlYUy{$lK&0kKi5?os~Uf)7!qOYKO)70IgFwwPhYC&{z^3R zK1C1(^%8ipYa}Pbv+`Oe=gb=cTZOAH3>wcrD*Xj1@j&`H0w|6SKmKg?0Rz-^&ZwEi z#SY%M5s<_xasGfS{)qPsVFyH?1M4S~rz7y+F^s&3;uMl(v3=~jMq8N!lN1@h!k*YU zr{|1GU|t`E<;A=?QJUh&KXS{p^DU0ek-xB5tQ~N+_h6)I2Uq@4D1rAIcxlWAF963d zh9tM!wl3g#7vXS=v1p5&R&O=0CY6rw4DR&lN6Q9n)$dNw;hauAy-dZbm=t`5jym?H z(I3{Q*ON(jaw`BZQgx5hih)5QBmr4!)I^0h0e!RLX!U#aC=JT0q6T7EEe3SZ!kIg~EwxX-q81 zgWx%49$w6jg7jF@L7Sn`QNY=e6 zLi9^R7Eis2d(~LfUj$)24CH5a@rKvkciHa05c=g~a1s7YA1l+qvQof5+h*J|3MS}b zW9_|I{B{C;*dqDg48c@P#jiW+=SyXm2uqe&R%-a=$XSsx**JtYNZ=Idr$-+}kYALN zz5x^3oCs*L=oEP4+cW(0WiKM^?+}V$mm}aU>sk2G4pz&0E+W8;g2Nh!{NIrL`)H#9 zBJ4^Q$gcFV%2^VDQN=OQP2`DXdpjll^A#7_1+!62c-9`4rlA3KH@-V}4`b7@|Fh8p zT@tGhlWd*=>_wf6F2X(tAYP9YqSp?iZ)o5xp(=5UtBB{roJTPAbVTm)ztHHM3n+&x zoQZj)Gef)-{r+Ydf@_n9fi59ASef-ST1h>)gz4ICNj7331U z$&>Oi!f@Dc@EncSZz%VB=Jqp96R2bH>F_e=->5aPLI7T<2N$kql+x;?6EBi>gxcG? 
znaVLoCq(JmiE_w#y;1o*UdFr)vrM=GEpFBl3@qp~h}x>R+DcFqMkSse zq*y7tl~0E%lZ=UJpAZ>X8@ohHW;~7GYVU{!U9AUlbPwB|>ogkt;q;|!COf+#0M-9= z24D2LaeA~Y`uz>`&yB|o=W6p6f5!He?1vimdOqXhB_9mCF&v4VP&tO<;WfmvvLQ!2 z@0C$ZE9roo{LrrS4W1kgyd%1W|KX7zqj%N6WR*wPcvFr(JC^S>jln503ns6}oG&Tr zlnZrUQpQfmaYR$JhHNku&>OlSX*1$gLeVYTP`~&c0E#}unjUtL#iB5b46sOBYj_3BCk-V?HLw$ ztp4rDu$4jlLyC<_Oqd1RWBxcAsL%;Hu5oTSc`j_srHvfrB>hA&5PtFnyX-ns4Vhkz z$l)vxS!<95a}h9KJ`ST!flu2`TTOs6AXbG4%E0!M_tUoZ1$U-@aRJQD%^{FIl~wLC zKN7YB%bM(@2n+kaUPn`e$cuE&Nvm*0ys`uiz-k%(Lrp;r&PEA~^^P?#(UVtX9+ch(WJywSe5*K8G)l)NjRfK{QW z#bPGNBl$q^U{>Kq2cA%kgUYbSbqxWPuvWRKs7MMp1Ycg-a#L~0;ypY>dHM2tp6EIv zfl#F6RgIljlqZ37EyC44taQxg{aPDHqY+xx!7x}2d4oMFodB9BlJL&UnFXRtN%E&YW zjTEjWUsCD+#-qFevMYN1jLMS>$jhId9ZF#ZU_%PPu&FtDo}^Z6p*7JS-7ALX=VnKOcjvG+A`cU9@F|?b{h63_Ka*vQdhOUwJ@F-V5kNSsyyr-G z&5iFkLBXTY6}AIMt2wW+r3C}Q=Z!ElUvtNVhZ8l35CcwZZGu3j_3vLr5G!sQ8<`%c zdmtOb$M*B*&x`#9G!#*{HWy|;{~U#QGV9kq)~}t&fbL0z?PnTjS>bQq$RTloj5sjy zR0j>dnduxvY61A94iY1O5B4~%v*7GVM^7ILu!IA6r@0Nk7KpnKFJR+%;PLIH#YL>G z?WF8xI>lGuk4GbFNe!44{u_vZDu09&%XV!v075vd>E2(tn3A{Q^D=abXy&2p5K(>S z$HBvw5pq996Aj<)l*k$tTii)Hdl)iJ67CTQNd>{WY5$GcEY zQFpPl4VX&X?FjSF zhdlo61~|c#_LnBA?44&KXLtWrwOmtsTN=$68;;fiYqzSo_9s(H~>#~O`6=g z07oK?Ng<_}FL-b?olC9b&A+MkI1DAiq;KBLFcZbds^c}$n4DOe(Y9Lq`MKC6mCcYL zA)`NE2L;`tIl8a+PO)Y$8AfPd&9?;>8J-~Cs~HVwa4cl7*#Zkjh!~8 z(gBDdj|TE1fvbHLuzVdS4@iJA*Va$~>TCDTY@%9~Jf8|WTnocg15*z@cTJcSfmAmt zabMtn;HTY9VBVE2=G~@aUxr#UM^@%?dU`rUZ5?NMU4LggK@HS;9D{{BJMWu-jv?J* z_Zu@LMkxXvSURyx5(@kJ)RtQgh%b97?)I&b&Go1wQ_pwBi`d+94(SKbaEl> zmCYr7Edp+)X@as_P5{O-$MkITYRFEWELawLrTYe}qJ-K2zP^!cc;9{bYZG0BQPXos z=v?u7PTm$&aNbfim-$2GxB2M{kIn(%uMpazY#Tzv5@WLNOOjk=y4R`SHQ?+pqU112%DL*mavLI3Y14TY$`)5ppf0He=hkKI_rQ(Yedu$ zP<)CTWUGob?|*Ci_wr4Pj?{#=a(YKklnI^5pJR;5yR7)s_QrFjm=Jx=FRI!10Mm9! z9V#s=n=UwjMHxE9FRrB6tW3f~zt~VjGiGm}3q)dNZN{-Na))mpnF3yzE)PuRXhyXZ z*xmR!l13mSuq=+_`a!`sx~J5w^a_m_q!-%Ot3nETcKBCl%Z7uXOl6cXM*qHf6Y{jB z5jb<{!aoi_gy$C#&(FuFi>q#J9*yG(4h{~|1#j18T5j(4!8`u`r8>#aIFQY<+4>9< zY?gaS;gww-F9dZtaRAd%e~tOUiK z^Rt=)zwsU(Mj+1Wy~&QWk&wza5jPcc$a=nRdMzR5NR4RV-Gv~jp{;z8x-QoYkI5*{ z2*+O^NPci;(7Xulj_1jK*xefp_7(77Y{?Qb%}@MSJ88fnqJc(#MXQXfm};gG_8_S6{SJfI|?3GD`FqfBCXV$aE6su@<`)&DV7-G57v zee>N@u0Z1g5?vh<->Eu&3>;?Wr{9f37nO|a!8xW!yX}cMzAzPnAJ?9T16ARyN%r;R z|8zcqgN>XLD3j7>=y^lJQw6+*ImfIcXf%>zm>_R77h+SSt1VYzULF~UUrkZ+U8DsC zz%3K)?Ou0p-{ZazOR@_R-T$O3>^MJ!dytDTHDrbK?pA|s7B{CMuTJk`;WFsqH{GxL zZpr<+Gu=c0kvG(H^8uNhwb!CN_`J2AM?=Xun_qb0CukW1^5kY$RuW*%XdW41S9q+Q z+n;jDU$S2ETF)#4F|Rp$rX?ILbZ_*y+-Y08Ti00bNw`I*t)8*lDE5m^&?T1FvJGtc zv__UU`g7CARa>jY#x!YWI}j#M@6;H`|EOh;oGdx&*i3Ti+m7V{b;v@3Uv#W)1IsPZ zvb;o!+ppGIpDuX@e9Bjla;sls^86(JK+@#@4bGPMhY09UkWq3%u z&+TC6>KOYQVV3ykX}Nu` zA2%VqN$p-6kdS#Rj||qfX0aGYN9#_hE(f_cuNsq6l54!Hyjs4sN~@Xq2urKEkv;_G z4@l=VKcv^KfiuMTf^E!&2%8*TiJwpMAKn+UY+KU1cae#KA)gWF>$C}4F>@!t2g1H< z>J$c?!~+5o2SFt`W#vo)6(dINbj$pn_K*TEw?*1p(y#ZwY;3j7*oX@)xRCDE zC6-<>Qw=bgp504lBxZDV#^(oXIt>%Kd09`XtU_bi=;7C_B$Ddi{PgWXA88;% z@S7F)*43$Sd8VtzH}VwPf>Mu6UaV|>*q~^dJx5jYxjF@#6gb!{xtvI&cwiDVWRYRF z!4>?YhPDMg*Kk+T;~j1M-6Pi3hbwA(ul>JYZ05>7gV$mGD(dnH{iJA5<_G)Z#4D~T z3^@M%fo!rC?);4qUNyR2vJMw#G_Y@N5U9qm-_KaQ52+n8T}An@C{5`4TWH_&%e%NE zdQ+xzR_i&e4FrEiW|l1xc=u4op{fIptpbG(l-7ZO4ZC(o=3-qyNY0 z3sPq9y#)+YRu{t$d(tb53kg*{r*1Qffn*4n=T6g4F28*dXc|xlKFeAUds?;8-lHez zU+>-*>Kav+kv~ei0myQZh6(5yW&iyYf$n{K^-HS^A{G569~hKSrhGXHU;SZ{L1wu2e?zKKt)Nm0^xV7N(06 zBUiq0$O=1xK4Z&vllekjY2)@U)w<24=eCp<-M{Qn8_gxxybgje1r5d&aq-#*Y})dsmCarXBCVUugg#_AFIhTUh?=rp5z)-wsU)c!uRF3Xl`~VZJYMPRW(sc zV06c{xTYUCJCLL@pd3if(lbO5nE|KjMLV%+V4s}9UqWjcAIZDSL6!vNrPp^R7(l%S 
zHy)^F$Zvp0x@^`t(r8}JIPpA&*9ezp$t;mu?F>|x_qNY_9snNOe;o1XmaaG01e-%h zvp0EnL1JxlbCyy1Yef-F?z+&a6}Do3Cat@=6IE*Uv_+z-D??1^xq(ih<9jD4ErSi- z+ZS9pCidJQ>;C(uH*BQ4Dep@1 zdYwNk`#!a+ruL5~$`R5uDJ`~{oypDeFL6fkM+|@5eIMO4yf$Rnw zR(Ybk_v3eJ4WCCCuZoQ1{yNP)mbE>pdy#+4;zFrDFaCDX8lnI7Vc-9&mEVv-a#rAPpmGJBX?f*kM`J+SQ8`ELqQ#b@%8z63vgRw)O5n9L+X zugNPeP7TA4SvUR5&;)z|Nu4B^DQI1JH9Iqy!;mv}-O_ITaF7Lz0bbbVe0kRY*i&fb zGKyYoHFVa9w}f;svTNtBid9yZQl8|Y5lLLy{I?&HMN7eQVzH&{pOiw zzKF-R75$Zw1PR~y(#pdJ_f2XGJKlZ1$D%03qqd>+us{g^=x3LaU#vpHfhlo;ftHKr zjz~Yhu-ON)q~uSJj67OQioP$zMn3ge+cBj4Zm{0=e$@&VB-^v3(_g@&fqP@UMWRG) ze|@w%QP_;*bEI9XRPgcaB<|k>&#Gn_KFKpg-K4P1z4P;j+occGoe88f2g^dD%iltC zqW$l&?NQeM_AxXj?i_97DX3)=?NUw9UUIhK^eJ6b0SA-3N%@Pi!Y8#(s3WAo{k3KqcY2&{z&a@;M1Lmzv2`{Pbh?|y?@S5|9qIB%bj*^> z$f&5kMaNvGQzoN6sZsEUUitT1XtR$e>lzY`CK+RL^P`>@6MCs9D78CDRjcG`aUU6X zRu?aQ<=uJ}9A)>Lo4e{_C@+89othj69+6`Gj>B0;(WTU5oG*sF8S}sFEcl;(OIzoq z`)7g-9{<_?=Vly-{pS`3dmD9f&PCChOhweAFJB7#nrx;Um#ADW}-S^K%p_3{iZekxAcjCK0d(FKjW z4&U$CJF-U_ZVFVi32hME?=Y67KI4WqXEBPqg`6kn3ex@jM6pdSlCg%7jK5J_5f4ziaRU7}_*ZMa4|(IC z!z->^cb5RxNS3g*K^&SU4P-9gfgbqHp7VwV9+`599!30$+IaCoR3{Y`AD9%7GYS#8 zRPG4s*7uc(P&)X-m-kvuPEY+ZGJE)N?UeES%}<}&cl8)RGYsIDO2r)LMi_L{o1 z3^pvvkrsD!hFl8#0Z>%P_};#uua;Jl7EX$?#n{1R0Pt!4p`?zW-OO@&Dng13w!Ci;Im#tr#90X6r5rSADcZB=l6#sBc0P_Wr z3XB-(X=~uIPDwDMpbSj~J_2nAGFxYB5>y~V{X)!#RMYzq;?aM?DAGvx(pL4sS_6L- zzORt#sV_gA^-<(%;S{S40eSs5XUT{gzVExEunOW6@NcyG|MF0PUGU$U{tGWm2?_6? zb=Y>HrOFPX&qX^pPQH&mlanX|v$PPKPT~!(?j9o;{%=N zYqc{8>=7=84;^5DI`rlWOYj7YJi+-&);#!?Z9Qsz-5WtQ5BH%#ge`;(sCY4uE+w?8 z`@9~Eukvha|G16UFgZ9nO4&7#+>@|Hi1?0q>+sMgfK@!GpUKwO*8@I*87ZCyFWYPL z-XEBHZwu^7VN2v{0957k0Ki& ziRsdLxMPfDKnf`udCgILc!zkWuHTNFA-q)!C(dKz9-Z-=?RAD_^Ou%zwZpy}yj*H0 z!|4Lw#l8zC8q@W-qFzJy2%}`nC7KZ<$n#(T>33L63>oO=LjjP?0dhf10FZe^{e*QQLj6a*alWbeqa`&A(YFEb1f!*opAi>43^@H-aKK z#B*F#RcNTfT?V?;7P@0!Z!WIck5vb909ZMl*xcNl48Bz0Z%uD(Y^;Y$u?QwdU0CfDqv0&u&i!UpnxIfp#?bwEg~h*gkO!uYDoVI^e_>2@mGU zHdp5PrCt9+iv_ z_k0FFhIYO8>s+nN7z;dC{jf<4XB8k095Xm-NcW<%*O`6;v6DY0YAst>$6LWPDGsqa z-M5HE4QS_Tz9e5QP@Fv!rd}fNltS3opm5e$pK1NZ{+WqF%<`SzA^)%b){o=k)}p2r zzCEFwU<*ygFos+Rdw9BL6^bsd?*=Ts>1CLD^ho&z`OAwX3FpI&z>aQ!3cSOjztTTN zYh}v{+%Ni5`c#3SQ!9lgd8Y=3L3 zsGn8!j)HhHX~&7|0Q#$eR`LB;JBNqOnY)f&D(&cwC?i?_{kw6h(Kg38zinqK>?|+7 zYDd~k(=BZ}v>GN(Kn;o%jfaI_By<}5IFRHg(ITME$H9sadt(92?H0%1DIg9Q5-9rn zfAZ*YAjVN0C^F}}dj3aGCdjr$kSFizNCh2g19X5(r^W}RUAm${2N28THncW9Be)IW zG`jKRtr+WG|3g?o~Qkx!Zv6qSe8QAvO0(aI2ni01Jd5 zI)zdwus#H^YB@m0=Qkx_{g*{&5Q!vD^zWa20BeGY&7cE2x`NsUfJ7=f2U>1`b5a*h zRpz1jh%gL1e7-j3!rhleLaGE4+Bevq$r4(<%54%BDU)W!}TteL(ZS zrns1`8rTCGc(3~+7R+$X1ksIEUiKr9qqb|%%8z$=k$A%^`3~eV!T8ML=63!`Mx4)l z*O_!t^CbDJ78M@DS@<1Q@{L%v9+RlBaX1_Ggh#vNCFbdjcR!E^v!CF5;}XUSwNu!)hGOHukL2 zb377mvNGuzF^*XT>q-SAT+*@QWld%d&3c1hkh={l;8DEo|3cR zD$MqS=Ay9$9TCMAz(Hz(fvNySE|1$GTwn)6?n^cTRjUr5lf6~mj&$^l5@1L`8=`SZ z!a~Jtpk$JyrqP(#lhU%gI<_08FyhQP`gXMeCcP~mwU$4@=ne_M^O$%)e4sky!d0do zzIKf&^>oYE4g1As^D9emo+6rAPvck`C?#zsbPuyU*vuxyMC>&MtDk^0fm!SpS8AL}&|zuM z$B&e@x&rvBn5LlK@w*haS1xmi;^Osl@wu&eM!6bm>+2D3w;5ngYT_ViWi1o;H1r~U zEO~4RjF{rS<^%J9og;C-HDYs zn0pPu7z1(q<+)B`_vMAAGRyAu@zuuR*$DX>Bq*Dj&@T2K7aRmhhPW-^}mji$(7G{i& zs}m{?q`7Y*i=XdLO!+H{w`G*7w%wH^BdIoSG5%sywDvA$@(N=Vw0h~)&xGs^$Pdri zHytRQa*geCa5y-Z8Caonzq(L0wd}49sg6hZRv>JjDJjGcq#do(-z2?8Z@kA+5`Hy( zF1#mkOUyfOtz0cg_ys=`^cC}-7yRy~P!1P&Quzy6ecXOsy zAK1Crt!3BXsxWW;I8UbJ6Pohu-0$Os}`p-bSB)H`KTEj ze=UWE<-JYbrEfoCm~oghn}-b{*$~c>{@UbY!+8tA#Z5zn`WN?(<Iy%`fo}7!BG%+()!WOYTQ5a?CTQ><_Ik-QCtPS$2o@0 zk+L<3G_EmyWrYF!ahYyquh97JNk7{05XB4O^4_JR!`($P%-f7u8TIt3twpNxb^ORJ zL{6*J_Efktd^l*;=+&g4BE@zaLB!3 
ztg(}qdUr&XckUh|&3+y}`U<;E4ZYWiq+%P<3cH@|3~{rr*UcU>6;)nk=a%(BTbc)> z6t=h(3v$|;(?)nUa_f#*X0b?^)(*-yuV?@GMvSaG;5U zCZzz+0#Wi5#gX!*Gjj_ICxrRkt#h1-A*o$k(!V+-TE9j&ey1Fm%eCST+$wuC-=*0` zi4JE*g+0Z+*u+1`OO48uddT#OBc3#uXwLzx;rn=f8_hU!mcLy%$5NtXRceL{h(tm` z(0t~O7%A3gesfLY_kC1W{4q8#%`%gHz!rs$^yVsh2gGq+{KkR=F)8U%74YN0ONxgy=P-rEW#+5Unw$bu}M26;m%gJIy|}&mltR5DTG%Hvbfz>&`ol# zdv7MRRki0^>;QADc#TO<{lj4ow~n+Dsp&ojx4rZaCMClEA0k_UA%9a$ks z?+FbuA(Rh&sSt|(yEk(}{U`a|9Z=7nICi=BTgQ0d`zm*LWAq0CP6?xP{qBX)3vj=$ zldXB#nB2cU-d@|=(iv^OVK`3u$`-zc0%s?UqJO}nz?HTX5QuuB z7d2LV7N1nV&-Hta_xQ>q)9Kvq{Be3XiT`?sZffI6qsI-asm;i=kg{K^S_HoH-z%yF z#*{eJX|SxAhVO!ZE;8g{QZk6-NluSRuG+xgFQI*urn+SjUGOE`VPiBUNA2CbQjZndl49?=kaDv)JR6?kixEyBMmk$JgJ`qEXp|kv@ zBfOst50LB1_0M&M?d7;oh?8Irnm^+N{^2F~sRta;1FARjPcWjH@QZPexqd!UK^9?} z@*+o5v>p*TU>~Tgh!*%QHX}4Ma!p7LLT{=g7Y+GE=B|-w`ci*5xUwDLKI#!=IJ!#J zW@Xj?INkq$pZ?crw;{){`G0lMLgwgk66r=b#YOn2q!OIY|6E^m z%gJZDr`$V{C-<@J9!cY&4bd@jD^%`FH_h{p>1WcdmmOP9L!|^J+pOJV zi4T2g^USwBG(K)PJ3c{rNl1G-0xf$pw9H9@hU^?!RCdt#0zi}+FQK0-REv1$BMDm*6BCH&d?IM?C+lR~*brFy8P{o$ z`m5av1tv!R`KKR2cT#Ot2@D@3KAwk%N66%vD72uF5VwwGegSwC8r9ATUBNN#pdB7+ zZq<0G4rYLa$O<8(*r&h;jwwXQE8t__^XAGWutyO6xPGwS&h6#rM-!arAR;PC2?HC# z&J~0v2P>$@PRy5FVN^fz8Y@>?l+%|ABA3&>vZ6#@7cM9Nk5>KD*=tPF3Ebq0%5?{a zEp%R9dMpBy1>KMBTR?;a)^@D|?tn9b#!;_|WO`Y~^>gUUD=w#l_E{MO4H}@;BgkrP z+c;5>lj;n=zLu{U^&3l0-UJnA0u1nYfh3BU>Oo+xINAwhR2!)P%psxBa6it#pt}sz zBtM4gH~|4rZ;NO9CtD3r1_ai@#v%l1V6Y*iV5npS2El~c!tJNjJwP*OAVVFtE6<%P zo}L0y*CwnU1~A_+(rfsFyttXA10|TEFyaT$b8it3DVB*jwWjC~|D150<6pR5>wwE_ zL8ex2Ztf0vzDs!wNivF>85JFe_<~ljC~bPlsTOn^9Q}AGwihQvAipQv(}=kC80j^a z8L9TOuKN4?Pl5ZX|JoD^P?s4RS>@_bNKcXxuxzzOd2VeU zu-Ki`{{;6o8T>UAjAY@MF-f^zR&{lCJ1g&2n{4&7l;uD_&SV>ZyB1b85_r2EVTFN2 zO&wx*?$t@zp~|g#wYgYJGNF({GPBntQdQhwgTV`Uy)+TTW;vI?z5t{<5vUD&zwIG- zZ92@cRGHz<3}RhN`^&%wiV&Wh~&!#?XkM7**BPF%$zJ zIr?_nLJU~*Y~tZUXF@|m2}GkVQL8G+vx&A~4BSx0{t?ElPREZY=JwofuBKzC}Cpw^2$N~x|(m-`1DN$Ln3X-m+)iH}&dKg5tKeWCC|Ugvf%%)1@!P!_!tjIKweJvk~n~7?t%D>rX)9 z)$w|sYNBj@9F%Dn13tVFLft`tuJB?#fw8fkC*HGIll8BcY;%E1@4x)&E#k_;WAe-( zM>039h4z~IvX1X6;qfK%OxWtpUpuuzSUWhqqFr5Gz1tm>#{gGoySH{KLVeNqRoVvGUyjdlS1XkdMZ^=rlE`ERgTkNAd z8uz6iqcq5Q7?n{}I$P;pP3O(uiVok{KjC{MrTaLP|e4p7?%b{akdL%k_QQ zikeiD0;^k%)7*)*nwxac53~VXNeFLw>vbX>^_tSoaD}ioR5oh_^CJ+981^Oi1%@vO zNs1)NF^X3#>NT5`Ko*u3YabzU;S?wjY_y`)iEDB-6BVB0?AZa*ByyKc-$wFS7;s-8 znsu4FX~RvGltQ-9qeWjuP$E2b=TRy7_O5eOmhE#_R#a$ zBqbIb&C=vIaZ4L*3dT^0`@0l&ApTpgQ4^MMG3bN<%w7;Z?CkCeIDd6{*@?sZ4}^M~ zzxS6~8C!3Fb*c%vGs~Q4Su8#ItDVo;!$)00sLabr%J7dy1fU`jMam(H$H#wW(o6E8 zQpuB!rrvLs(-xP+{LOr;)+LM%(q#sI zE1U=f2=SB8}8jc>=%TvqpLBtiszydKIS!Qes=q121s|s_`|)NlJX zZ7Qwo*4`4wM(A|Yi+dT)MkkKSv3S!?rtlKTkzjkb{%3cqj=lo;oB0sdyBH&;67c>E z6K#Pps^9uLFed}HdfUMXkHd4aPHgFvc08-_us_H_PYef}U4qq9z_8GJ7+lStoy$Hr zvln>`;ntbT}9pn~PfOU@;^v<4&xEWYWPW$e05a)aPQW0duy18ie zu&$Xptu&NNhF#pEm~s!kdU{6_IO3a+<=xb&jUB5cii4i)y!qA>gSoJ;v(9UUNLF zQN42~Gt7!rt%z$D7EvcJ7DdA>`9lT;yqo8GtVU)H;R1LM-00G4P(^x|?puS#A|V}^ z6f2@+w<^mJ;`J6v{yyyTuLB=pGAfGZZiPTvR{BW!2$|fAXHQu9vxi2Zcoeq>r984# zPpZqbbm_-)N{OGYC<1*7C|gh3eH_a!?@N@|US7 zT;q3n{Kdn{-$LdbaRMOtd$YNuA>s;02O-#yMH(97>o%ycnp>}ZVfZ8oX(-2`;u$0D z#GM@C9x!>Tm$oMF&o1j|FLZu!Mh>Eo*0vY(Zh*_$t?X@>np$YJ0OB<_mR3cc1isy| zOp(L2lv3Q0th57-L-%2XJPVune_e@rW}M8&l5*`XWfxd!9C##Wd|6Uu&Z8Wzw~WA> z^v{;;-@)<~UlR(ZH&x2su9Jg<3j0CiM=S)9hyfTfKK~|? 
zH7Q_sL%&px!WP1ho`~)^Qj{J5$*DC0F{5(GinbDx1~Vmzg%hmg<9R|OL$;dsMaRlR zynC}%?%a$)#l(T)xklFCjD1vr(E{&Z?|pj~0q{w3J9c4P;`5a`Nr7z@5L1{t<5#+f z=>Q#n3)NZuF)V)+iB!7=pJ)EyOJwQ5pvN9awr_`$)9OFC`ZJ7PCkQD@76n^KD|uLJ zbbh)Nj&I7`^U}RL^Z7X~D|@W%>fY=tO83LyE&HLjHC>IBtgC+@7Oc-PM4O;+_# z#T9V&M_dNc_f*WsR6)adFREA66-nSHX4yB__a-_IcM{davDCCR)u`!homV>L-r?IB zE2^`Y&_Yl?^3L^B=3S_I^}4VnuJ4Z$NN8jnDY4dR(2F6`<}!=$2o#jqT8w4qK;uP# zVSejnx=X00BVXLcr~Tc=?I^MBI=O9=!g)f^{5com9GZqW;~nMBHtQ$19~tgQag;N= z7~j9v$F6l>j=TnmFaG^=C+-7Ag+$&f72F9QXmN(pmtIH8ivaYSJ-vpHNO^I3{de1& zmzNp|!X~fNg-w`)@9fS*Cms>s!-VdZe00%IymyX!YNt}J%FC(Vj>6}9WOL^ZX>K>T z*nt)(F`8?(iS$&f7O2zsX&;)ujDJN3G|uD(_gJbLWvQf*&Sg|+C_0}BM_I&u%JcY( z_m-rF`wPGFRLGdPWU&=pgxIl>7e{mQ_w$8nu z>mUJ({MN>NL`h-7Acg2?A%&9m7wTn$n^okFp)i0{%d|0i2M@4SFHh2 z0B}K-6aiFH@hKFv;@{r^;e(eD*R&U-0ARcDyJgC;vuVN%R_0zT{w6t zHJ_`-WLh}87<4%2d!Ger|Bb-&=`G}-0J8l+?zc9ju6s8S#xKwUY0}%YJoUeI%N{n( zl!yd@Tr~f66>oRT5@U2a9;Y}bkbYo(q_td5^Dl2nceR5t%7LsrKhOyMTN6&^Z`YfI zFB)WKFG!?Fi(AN(Kv?MrB$||Dv1d&l*Qlwx-m;WQ*2WDIXQV07kg8#L-Io1&lL zO4mJ3S=}zBTAk+(7ZjLBKJWC(1ll$;IQkf-WE%je^c(0&u_h9Og~pnnH>LLLVa8SE zzn=Hxo(ubxFedxHC%V_U&U&qh0DXoD*eL`6Hv~AiXnLLOYdjvGXx5azYYkoQBoTIW zo!f5Dp3K+6zz`2&|5C7yuc~oL-^-}XblnrRu3>!q`0;hBS1d4+PuElcFhqEDQ&LZy)>yVSzfwyn)s3w+8JUrw5UR7l|cW3%bo##=7}XrG%nC5@;ABs%V9-lkbN; z&T;*Aq+{Xy&%1jUh>My}p#WAwUc(btuj-R&sW-4=bfAnx09F*L6E}BxL79#61cK>FDb~p4g^p{A(Vs-l+`CM6vtOy9nqr2#W4q-5I17v z18M{T*s!hA>J$VIbc-F8eFP&9<;8+gX6rT~X`tH_7Z)Qq$5zO2eWv_Xatne`4rB?$ zz+N0hkpu`z%+PsM9=Ef;AZ?_Phw>=heJ&3|qbSuh_m3kbt|xj*cSKD^B>6uUza%M} zWBXbsj2h(w`0w|;skW0drbwOGLwaSt$lPrNdT~97VkASHU|@b0352gWj(geJ*`gB= z?x0VHL-0RPPqkc)QJ+mnNT9=XzzAdQZ!FWaxEahCS+<{nM_u$&XCSZJ^(a}_&3D}j z#JgLOc68pCprfS~lH&KJ%a?NoI~#|)Y>*8Sw7;(XJaHfsu$h=nrctXmm*s$R84Y5N zm{Y725sZ_TrFN*ts(~l#wTs(7aw+{R=A^|^-HCWpFxh^$U9HnC&gDvo_%|E1ZDS_L zt6DQ!q)j7|BG{jLYOZzJIrIPyFC@fjdNvua@HFV}UL5N!aNL3Cjo;P1M_?Jc2>O!7 zWpmbjLw!Ue{%yO%ymWC!OA0|$d+62QifVpO@;#RNFD`)X(RVV@S;Etzsb|1%?PJHLXb+G(>nT<_62FtR$%+tTH#~|a3zn`iThMKTu zc_dP2oF372HubpsA+K5lzCpxxhg1cBv0z5Z<~cKM;63_%|Cf{RX9WlFz()V z80dt|Ay|L4D}MA6`Zoc-GVs&egaI@>(`601z2d^CQ^*7HSishU%A_2MN#{Icj#%TF-@b#b0l~MQfAs^ae z>pOtw?F90?EtqK+Vnk!0@z&3z`Wa_UuS1P;xSw;Q2R8Z^9lMq#alsR&#d}o>BmQg- z8!m5d76AsR*-P{OtP!@o2jiiz!4sk{zM z6w3@y&K~ai-s39SxIJx)-L;>F;+k1m3m4)891`F+$6$ zBk1K`xUSAaLBk5K<#!m?D|0Y#*2L}R@UgxS?_WT9}XaBc$JoO*8{o=au{@Qg8n__53q(J zU?0XoCi1nhQAiImA|li*rR#GI;QkN>may^1mFQvS-vY$vr>kS86j&_cd<}dl@-KzP zhzk=QzUYZW1-|0SAXnC@U?Oy-nj?-X;^N{D0Gi)AmzYUT zo60wE^d9j7E!;V2*u{=}caU9+s#r(UGBY1Nn4|QMcaTt#^qG(MA1`4ML0u7m8)@Ns zcJHblnvok=qz&Mlx(2e`S;zxcPv#{wJ$wXg!2M4-LlBc}6pxXo2D$} z=FwbvlUhQS_l@<83%S>r)079rC&nso${T0UnVvNT81Xs5Q_CHyE& z0015bky>;Z9e~rFMS%gt6h}^Iqfy%@LU<5@{lRNo_$4K66I%_lj16UU?#R3fQr@^I!mlmwN#pU*Q7qW9m%)xQ_{4)c#mn72~S#h`1p+iR|G1s}{ zN=iyv?7+i3wl#v8*~Su(wR@r$$Q#ugVM>!?#*p{!ep6_uJ#6RGd;kyCVo3({>G~kQ z%6zQ=2&SSE0FjX&C655GTi3e6%&Ge^C*qV6(x!#$s#8aBqt*% zL69gpDo9d9C4-=X0s@j0dak|CK6~Hpr*Gf$um60X4-{*yde zo|wO4II0F;{d!;O@8J;4+uYMfV;BA_z=(A>0o|AL(9gNKSj6sYsBon%ytb}r3gI1M zCBTi9H*UvDmE#b+5UywQ@ax(DrFo{6z?5b!zGU8s_$yclO-;$gRT;%=(ltBS9 zVE&3q>JLqvDd_mX;Dw2tuDOOXI@X~WCVJeXefBSIV zVA?WRiU43K4n47%B%cP@IGMJ8a8)Z@1#lY!l|k-`T}{a=D4D5i&yIB}eaz*==fmzMRE0N6rZ@hr_>;~0bB{t8h-r77B+U6UVR1 zA7C>o2lLVwg1uKh_)0DufsQUo6KcB{9Z9*GEcZVP9A(d{i_jgTm(O*lnxOmWv?|J- z-^@X_vBBKI*_FQV4cpxLud|&Q_!!~vfHPmmJ2zG0QPD@m zUl~!4okT|SN?lI!fZ{PD)qtqo9qZ&NVfvTWA=s%FoYQrEP^TZHuN`S(!G-z)HQ78O z%J=?>;FE8&RgT@hO^1Q1W^G1s(GkSAb(?A#=0J3&=NV=`E)$PA<9GR+3;%s+#pCj8 zU56@#kC;E^^E%Hg*!Q%DLUtnZ=hRjQY&NE)78R@bTfm|gyZEw{#6an>z6iD!uAiUV 
zP`&S7Jj&7@7%0x)P8N3M5Pi3uI3}_9wIHK2Nz|Mp&Lx$s8N;Xex@rDub%mVa7;|AW zxBZeRFm8-4=3rj^E~)mCaZQcDRwVPmc{+bH^W!z1-z_58n-t%7^Re}*uKu3-Tpo?g zQ{xoCXx*y?hhX`r5zmyPQ3#w@M!bR%!}FNUEVoWjLDj_%8YeOp1qXje{Hd?5Q44K; ztZnJCSBYNs)!Ae=-tn#sBp{dt{)M`mE!^`eBsq)vta2)2w%b>)ewqnjNv0RqyHHMi zN8n87645H?vq-LPc1;^N(3&|X#JSRssAe1(tG>DPZ8akxSu#3;)2gkr258! z+%QGHLBJO2%D3oXydPTVddr92F(t&yuYWObmtxE4iQ{LXWN;MV;I#*z!t(NQ?J z#b8WuctN?sP3hN5uRDwApGAIj3%&=D9sb;wTCitZXHpjJN1puxYX?4ny-J3JirxS5 z&8X}ncn?rBHxL+U6mM~96c*;nuMbZkAfaSN&47a>)Jsrv`{hb1l!5cmtynsD4aA}k zwMztVl0E`2uO>s({HB+H#FVrT2XRMW?H&UMgP{PB(dW5^q$h+pFMcUfTl}II?SFx% zK6RqQCF67tY{~w1Spu?k`2ir?5lur}wCnSuA%~0YiF?{{BA-F8I0!_HEH#Gt9pBLo zKxtF@9`L&ab(Y$LF6@O+2*d?QYYDz?9TWi+^U7*M!0oH3jb2UUw~5&rB{2ghVEY;Q zAp4*!IE{V*DMvg{Lnil?dA$I?LA~zmL^E-lLvMa=FOZFY%qag9!`upDBexo+!)1=) zbocTKDn9<+J-?Z%W5xg*RNQ6RokQ%3Vi#tUbe-?;X3sz}<*JTAS`~y&f8iU5ARdGS zYRVGHt;KMz5ORf}NFwp5>=ahyhHh4MkS}4eI@H~Gbn+6j;~fFgCG}uW)#!y!X9|-_ z@OU=~!eL7=)J}~OMvya8n-rMwL5EV;VS4bp-SXf2#rdP7sMr9RL5t%(($;|e7|~QQ zebZ<5(?qF*STYniI6Adck9FJG(Qdg70P-4aNxVBcmlU}I% z6La}xbE zEsk^@VSEHn-dq5{gF0mE)}dxQ6@mJ}jrds1!r~Qd^6D;ycqh=8V8+Hu0Y-gLH~>RG zc3tmBw3!~~w(JYK41-8`EMbU&7kqgYQGJE9TP2l&$nES)9H{r{9Cl{6_;$H02II_y z54zT`9O4X5aU)~$Rmtx-&7A;N##7G!Dp zu~1ST$GCMj6i40!RI53gFBN(c2)M(4I*VQtQFR^lmyS+%llWs|>0U-XI%O8|GTWq$Kd)EtgERS8BNHcW5dIAkXL?O=yGV&jB!T^XjsZ z%P;?Flid6?-64w8o2)>u4Xdt1Y7M_5T}-EJ{5Nj>pXMsGKZ}2xeU5WIwqY ze3tAg6an?77LXINSG;a>0s02Kpc6%>>}?bcHK^A6VEv`Q4rnvdASfB-TTtQu0vzgZ zfS4*%gLEm%Z~Z-~69-*yRCe75xY+#!X?l^7A{D~A4dzg1>+3!uTrqF0ldbm@Vg zG|Bi&^|>EVJ(Y{=7b91n_q;ul$?5NE?wD1J7%M8y?-oEa!0(g?Q7a>9kOKyKTPf7q zXmY4%>cZ~JKw-?N$#JgN9;VppPVEh3MDTcy*Z5()+U2Tl(!o1{Dc1v3DcBfnS99gy zH~$b~r#-wlxLvtVfXU`lp+!Slrdm9fj{1@3`yo|8wUsw~wGH&LBXFn_KrKTLWJ^AF ze*g|`fo9+90}wbp+MC0cUQyOOr42H&Yitbz>i=7YL~C? zM{j^{A}gsn!_hX$AD}3vi<$Mp36u!1evuxMJiBN3v%$<)9xS-yUZ-||9iMe%y5{0 z7qJ3JoZc5_=$_7|%-s$PAH_h7i>Az(8F|=!Zui*5h5Oon7pHxA{6(!HikTU;PW>X9 z-Goig__8`~ZztsVcjm*Xmtr0h<(JEQ{E30YwCIemhpS{WMRCwI>+4Bgwphvh>V3zi z@paker@ldpJzkx$s7v*-4N?TOz9bT2yX@Q_dq}D%`>ylX*Ge=+MS_%`1htElEX)#V zU7^(j(rC6wBp6PVec$c!tr!+n^^E)a2%|h2`T@IsJssG(7GYq|s;03FFn*ZEgu>}3 zWu?G7OEv^KzZjntpcPquXW1&A8qmL|85sY5yaIccaD|;JmO`B8qK{b`gXH(_5_Kcd`luz$NfPSh;apN_O}WOiJy+v zTnxze|Y+RCmWmb<2SjT zI^srmTc&F-1N6%nAA&-40H?ie&`mtL3g^C=R7?2SR}WoM7gBYdV~_j7h3uDW!28o_ znsOmCfAOqc8mayXp4-KkQBGKUt>+aCwQ&Dz{>sZjk0FM5f(wu>UPjaE^L|nh&i%6O z^hvz}WLo|#eh$4de9%MqW|l3%4cLwU4Z)x?THpV#r{ z!du1^59xaRWH4e z<{keJU#l!|4@{)0)UHSfi-NWTfv7$(&rby9%!ug-8f)K;fbGwoQb(1p=ZDYY3QA7Xgbe=`I1g-?Ec;nCSLd zKqU0cwBdD8E?TC5i}=OiawI4OG3HcK3e=t+Tr1vK4<$(r?E?IRGKp4KU+3z*Ka2;I z&opxYb&fjIvQW?{=18o&BPzuMNV{ub00N4Cxr{^!`TbYT5a7Q@N__{zNg|kCIDC7! 
zge2m1fJ;>YSoSMN*K4WjaBpeX5FSm=u_HbS5TG+pEUBh%rrd6`kt~Fk9GgQEkXBx5 zJ%D`E&5TUundTjJHX#M{V4omJO=9_5e55$ZGy|hO;n}k{PM~FeY4C1Kw`S|T)b!-c zQn179*4sqL`B$Uw_kwwqYd+U6W=6_WUCc)dFOw$unF8+5N$Je+vF*LPAFd9=>ML!OQI+-C6LjTyJuAqOEk-%4UHSs=!0_X`>?k(1 zM)B&HOx07|L*wcUd++38_}h+)2WyI)t=0R`iipb^hd-})dq+sROHx9r|0Tjz_O7_MkS+g7dIOUK$hmyx*s!?o%}EEw zdta$}1gL{n*D?&=@DraV`tkxXJ*@rt$3@#*>dD{dd z9jFgvF;|L{Jzs1*HcYNRd_w_a(nxmB520UBsPl;VR;-T!+D@)pD@J@e)i?`(} z@nMm4{*{5-)?*aTHtzkRkoyaQ0uA;BazYgbs8%MEV?c;4yw%p#s#9(O1<6W8 z)z^L_8oa#cx{hZ?-e!sp^tz<(NG8<`-%2t9NZM~``EDfw?(b{@Y&Bmr{XcmZpdI+hiLTs*ISsv2}2*d;l2i;hh1d*r!4M|WaBsF46t zB0TbTptAjX7CU0^^@@1#j>|z+W#&eHuq?04um+2XZ^`LJk4#-=uG< zgwuQTmpQmM^3}#|eCga~T(piMACL-|HM<<=+(1^&8+DUU%Uk7qzxQdp^gtD8z$OwB z=#C5oM96LfwS(|m#<}rM>|vXT5!x!%-X&06A>RRfR4HclM<4rHJyRe}=H&-m^2$G_ z8Z#jT01OXpe1&Qj$7!+<(7jmGsU|OaH_Mc=MkxOyPelKu7y|c*z&FcHc+_VoiCFOD zUB2S$A@!7-2pOyO5qPrLy7ujkNSzH=fxfhIeH%rkcM*RX2-Q8FZ}CX~*(e*%qZYR^>B6;x{y|MIX%1Y`LObc5$ePsT!+ftmKev16ez#$dt=qe+o?@PxE1*L znh!0-2&5B7o}SoiSsp_l+6tVASi|*IA(lpN_qK!cfTN z$ve*ay@iidgZF}QU*e5ceuer9y1nKW4*X3F^K=;ucYk;?ymUDF%Ruyyx&QJ^w)W;F zL%+&rind=(x(cc!{DO+V#9%mm_45LBY7{t=PQRaYiMv1#cO&K81N#@}1zys=k=APU z?N4M5d${JWXmI{U$Do#tc}~{S@0<0>V*KLA)#{PBZ%)m)oGWrMBJ`k=t2kMyFL(r_ zdeF|>A?6Ew2K5#;ZJPz;O}i7D!|URMOtWvy9%f^p`CWr@jwDHV6)xT1{wx>d5RPw% z<#f)x;kXkOQ?^cImJs)yR~ULzz3RxiV5Ht1y7QReZ^;9}G{78`UCuaAS&Sd47SHCi zJP4)l1rl*Uo!Ltx-;jlGqJIGCX%gbat#C4F5e}5iAc!nf%tQ=Bu;#kdNX2++*XZX_2OJ+bZKRV1%OZDNHOrPS7Tt_ zYYW_w1pRN!zJ|)a7&c2e@sLcyw9ceHB}@Sy+_fIP1CSl7pT;E=@)z2^3H<~P-gfkq zgkZq_{OLChDd4({7cK(JGS`XkWYJx#;T|*wx`FEdyvVJh%3`JN*6nUfiZ&bXJTZHECIX<)(GDpPl3S8H-RLfM&2I zd||@Y8tuP^^;AAYxmB+`bgeLk!4xEx<*F!-f7L3d(A57}f>8t-KVevp?@v{WRIt$m zk`+jD1q8^s(yf8Zhm7JZM9*$VL4T)cCM~S!0L9_S;o(nS={OkdnqJKq$5=QJ zAm+^?T!hi|0kaoR2WvdaX$ZmcTj>$zpcz@VvR zu>wG@2&V+!)1+5lRtn%Z0h?@vJ6@3Z|92V(3%s*bYDGnU^x?rfI{(w*N^g^+_xjVI z6`4AV%b%)4d0Ywo_GaFH+bC2XtPaGEwr$Wj#~u?>ZY)gDX$jBgj&2$V$W#bVA@C>JGwSr{lqxS_r*E&Emz2s z&Uf3kYYsvG)vouOCo-+=@;gp@Br@Bf3Ql{(*Ldw>xvDUhn|0l@+5*J=QN|R5j{eoN zHAj5w$hB*IPOT!TU&w;qOBg>Y7W%JQzYc<`JK>|yYSC8V2ZK3gq?kAH4*H6u)6M&F zB18EDNCi47_Mg<1S|^T=VY3`=Zbhf#UW{tDacb}$1{>q1r+OmW^(rdu!v5F!W}Dcd z50riHkJ7-T!b>h#<@|+`X+$)ueqG1*7*k0ELlmifxVfm}G=R)J2L5iz5N1^!N9{Ie zH6DHsbS1}#%;U2&!u7X<36yfzsvj5hC|M9KI=q1Eqa(Y+h@Rq z&E&WF+^^a4Ub+N=Q`PeQ)S^>Kn4kjaqD5tQn04RJwz~F-ycN)Ioe=7o==EUHxN{D( zZq$2$?!`6f|j zbvOTZr+H7n6-U>J<7Z&WdlW}7p2Vwjl=^q`Ta9|Y==<4yBosf|z=>ad1h^imJQqTP zPu^d7>_8THMR8!wp#2(xqy8$4Q6j+?9LRsJnx8MR=wZ|uN61ubm9dH6Hk33q0x_VF z$MY^OPZA&%AM;{72HtE|H8N>C7nvXTyVaws3t-vh;_IFcX1x70)Zox6LEj#h1kIzH zM@6k`Hby1*79VYjaX_p0oh0)QZF87!+rxtT^uP&yo%vzZ_Y)58H~vQpIBmbbN!=sp z^E4?bH*w~m881I{-AG)FA#r^87{3~LVOktS^S4)QRO}~?8fNKO;sML5yVY+>??gVuc)qDzGwD8;!VgS*Apt(>@qye zDratK9u}HWtPyzfGCgg0Pp)@W3vI0x(!!(C11*C_#Uc%GsEr_IO|mOHeb(M25Ro0+ z`D}UoDa~nva#T!=EM7e@O8JabQ|okmcbkRgk6WCu z`@5u@jTsDC&)LX3T|cE)o5FA~y41A$OxFA^$Gduph0LVvQLW1&+B3X(n_WLRPTtrp ze-*sHM939%OUj@_x2=m1k&#C9kqhb-s~_52YJK(e;3UaoeL=}PodtuPocu8Z9}`ui z%aV+V3$Z~q}j1 zcK)p1mk8co>61h>dkp-Rkao3j)6~Tf`0#u|owCyG7X{jucvKoPN$<|0&vNmmLldb3V5|f7x$Ll_`9rwX5eR)5;&{)hL_X{LeIud+qL|!1iuW%04t!kWiD|q7~rJ?2CS6cZOzWZarh9 zTffw*mbLFz5%a=pO)=zrV`AmPWT*8`{>5h6yZNddD|(mt&$>$Qh9T+M+bNA4y<#Oo zit36v!lBPp^h*Li7bK6(V;dn#-9$|H}IEJ~tG z@NQ4&b>*=BTGNB#I>YK^r|ULtH8y=Z>l*#-2G z%NSZZJd*Kc+!lup8Pq`YU$fWb-WOCR)dy{UIa2n2nawijwi)?Nr-4F@K$FwFJngW3 zpgP8-jY;=-`%=C@T?a>!pd#iP<0q?)-IPVH-FNwrj1=zF7mq9U7=Q0QHS#+Tpgq4# zaHJ5w8l*|%a7-xX+AwI7AY?8l)O8CO%tDJ>_E!Hf{HGVKJ<^wsix#h-z9Tkurg(C6 z+!>2fm+4E|ZpuTqavK?|E`LSrhj$3RJMzzNIB}${R73u3H60o~o|Io|ky)9vUOTx= 
zEQ#keru;E^PfxcX*OO=vh*QgS@bf=dfbLvBWFrhqLN}hTJL3zwS)U*KQd^vVjbzz( zPFShqTaKI+7>Ff3xVfHDOQ>ZV42O}|4-~r0+nyD?i5K~*v`*7uyvl2HWa-yAjUSjd z=FNO;Q+s3B+iBUd)Vl7A6?k`S#s8k_uB&I*GSc;Ai+1Ye1NQ(O2M6IO@eieKJ>~Bm z*J=)FXRIIYrMus)OEkAtD_%|Jlvj8EHENDbP38Bz?V#1}UHm|H;<1h1 zLYJ=?7dtq`PTXiYit@IvM;`GzWb?Tsq7rzm&mWY37DI5*pY)iuzxX=$jk1}n6T{No zeXpG?igVLtMbm>~kK@PBFComP+|Y=tOcIK@cFN1c$A6oxoiuwGSHC~IOJvf|q(;|h z-5jj6vm8s=pZO9YCSGk*5s&`nltZ@lVV64mu>JI5cP=h{G5OyH_d!zlL#|NM5L zJDZT!tEQ|y>vDAOGF8!p{P7#62ozFYu2+X>8m+q~ifx4)_=S*hx>i4rY@%@32K7ea zl2I*m1sWcRL~Sv(y^H2el9ga@j>_(Lri`BUwpk&mn;Lunn)L83%=q+-Ix>9q(->#N zRfQN2|E5^R22y$-K`CV1yyxA5Sa>@XYK98^e{3FyJQ0Ps7C!k8>?I+e0Ta0=UScwX z>sz$J!1!keTHRV|h4_l_V3$~R5Y`!4&3b<}{_^(S_KTGjf{Dj3yYg?;vgmA0c(>XA z?h>nkA4wS-RgI00{uZ&0alBbSKNYd}+rDcW&Ey?cNVAg`X-=*^!o7fIYm~m)+wp)? zykIM6z{c6O>i8Gssp_5C$uev{|Bnf%8ygSw9=X|Hsnwv)M;VV0{%gdyB^lIff@&a| zhU^w1>5P7vn<)ZgfZvJ<+FH44`vs3XzY5X8d_Bt79|QPu`!z=s5?z9x{DNhnJEmh- zg>JkIGQ!eavRv5vw$qB`oA{xiK+F~S z1JSsgqblYe1qH9|nac11?kyV!B&P$I9{Dsh`>}LjxBm(=A?|gZ@7vlUp z%3P~S!D?(aOZ^`gNg;J`)^3uzBjbB>Aw2Oir2#&T5z;ggXkO9bjd&W%6Qo$37%LV> zGP2*km4~sg_xf*^Hw*fnu*j=kGnrI&XYn8&n96AmHS;9go{Pw<<{=Fm9R50RQqAi| zoZ&L-zZaY5wYpU$At$%I*^;W%yu_OHxqI2F%Bd`2g#n-6!$;C?-tP0eGyfjD$HULb}J@}sxTR%@0W_tx%B_YLezE?Sg17X_Lj zCVwi&zZCB)ENXsUVP|s>`1&PJQB_e9?;vn*A%?mu4}n;~>urbCcaip1i@logt)&Z2 zRih*N&o0JT1$<8%1FjLj8#bc*i#!`VlXEb)v~jS__+VO>(QCrpHzN0cZZ8hp-T-3n zHVGB{_F{$fRS_eUdo;pf-`s6@_D9DXZTspQ8a}F`2O5JDYIhQ`0IMMAiS7sUrL$utbZQMBu-;dhezADkF|pD2EaRoBjz_i@@5!itd&}zHaks8d)`twPN_^g` zM3~dOb*`}IqdulG;)hu4m-5kC>qqkm@tLA7j|Z^!62>}F{tkvOzy94aig}(g)&5rd zUs00gpH9+PAY#tE)(~Xuqmu*AWVl?`l3{OQjxj@msRKgMYL`h7t(50V(2AXR{U*!#)W)$dOriD*Y`8QuYibG(&C zzTXWxP~>8W6bx)xsKi@w#w{3ioG#z0bccS5jb$_YWH+UiEAECnwcC}DPVLFNHmX-Y ztp=^rhpt(`%|!q6=b18ydnLipCd^3iY@q$w?2HI9%!P8$F1u}k%7+VI$nlXh#EK)J zGk`clnyN~pjOaj~N6J;fNYEggc4!#~HE=QHMFHlJhoW3Wy(u4Y;N52=^pDro(X2bq z{mAot{jl9tV$%Y|cV`is$)8CLQr%6Cb_Cv$ z3)Gu^v&nm9@hifF#kik+Vs|b050fOcZvP76`A=Wh8!2$F5-UTxYnF(F2iJn$xx(m4 zL2AfNEDO{NSpD~TC;%MO|JUJ2XH!hq8 zzwRI=ol6ZYay>!39p_5fi$wk?52CEYf;SQRf6qhk#g;U>Bw$DkHUSiCEsL# z)y!r9D?8i((L#|8>Rk*9`B9eg?=<9J|6e0TcycK!Gajr&6a6S$q#hZphzu=Q2$tjx z;b!<=`;ZLn`0wrmUz~xTYZsSGbr{Y77kT<-R?rUI5eNL<5ufEtBhD#;Ya=)R$o|i@ z#UBD!B*Kk_KLmf0J`|6z)e#wz9KZ*-V?9Tu3$_8TY+%Tno*;xPFbTvXC5;{$Vj*Pt zTN-dZp-v9n+UjAC`p2q*=b{S!mC-G4I&>=>a0h$PAyaT|P2lt~f$^?jp67rCxVZw% zT}s%+T{IZ`Z7LrD60lbqo9+`}z_&f{L1+X=f!SyCN_1s)wbA;=E8x_|IykuU=zkmV zbriThEZ!aCPvsyPb6}&$#e^Fb!}Bqz*1c?GKCnH-+ZoUzE>QWxkCXoD>O6|36O*1Gc!q4a|Q80=c*qtaiEif42>hLF;m)|9zj#KiA&_Je@iWPAU~dA&>%zp%#8r0W19y9`mdW z1X+bL|-k=Fob6croWxAm(8kDNSK-2bNM$B&l) zhh6*191sg+oDs1J#n`h^TGDaMz|!b8G1-^&Im)S2255Yai(@ zQ1beXbJ*Lv1=kb#XH1URdM0hcnXiv4oCWvE)iq-5KZoNdvutw$z)h$sex~bxm zcMp2!mI#T;c$EPeX3p^akm1i<@a!R13Mh0h94qg8krWt8Oh9gnLHr55Xlt%D_rHp^ItJ7}2W=&M^$Cty?Wdl@S zm})`l6q5Vt1KzN^-LEgt|v!NCsqOUFh~ zRHiJ)0_@R+-hkjcMtma_;;$u@$lo9K5xNfZpW$rp_OBIJzW7bdT)}YebLB=jO%V{~ zV`2OBeDHGRWjsb-_?HzAfGr$+IAXmF^j@o>Tq(vQJy8&9 zp4T)nMCW>C26S|EN5jkog;v8U%B?_i7%0S0RZpq{n!CXZqn*-6|0s&#Gqb zPm}0a_H2uhI~e*a>VWj_LS%hs?AtrBN!Pc7OZ_KH;RWC^H0e`q#7Yc$7X`)hb}c=P zagh4O*4H^Un{)M67H`&&jQ#}`4=hmH7r^y(th<`uro`F&x6Gme_dx}M;9OsFXOd(( zi0?L}lzeV>VghNkg$*{In*K2dxp*O_My(hO!2M<*w}zdmS^eBvcZ zcrn0Y){k%birBjxA`u|=1jZ#Q-$s{yb$YwH`Q&oHg4%N6GmK^38QTcx_mYsC>kUfJ zdhz0G2X-YQdNNgihMkxyWgUarqrWv%*OJMVG4B5kpfx`5po~$*po)L(!0mM~fv1TX z4$g`q+3ej(Hh7SZ{|>0=?~?%P)vH`MT}vNuD}pE8#-r1d&`2J-J#HtRaCA5?6L;ps z+*w!y>DS$WgLy82wt2xnV)9RTOcuU1?jVB4WZEaWNFXLV!%rWq5sFY`K9EP;yXg-C zit}Vdl~%|)@cY~PU4{WvonpcEw*ImToQqkdxYunyw!d5=3rU@!*KG->e97Aq%dQ$X 
zkw|BFOZR6mp1Od&`TCP@-d^JWPMsz-aJcN^t=yi0dar#cG7|$lv?~dCmGgulsRdk@ zSweSO0f4()^YHVBMn|KIU$1FDbGbhleLM0==|2e^SeCN{_y>fiVT`zd-}*$5s!V39pPQN{I|=eZeU`KfKG2;~2$doD8TiUa>YrIdlH?e{kSEl% z@FB%9>+R6O!Q5{g5n6;=f}h!ths}b(BWNUb@8BRRxq!2dmha-1^Gsl12GN_v8Z>|A#w$WysDvjHW3V@|06RE@?02Y+? zQ#J4ADNrHFuhIw)6O~jP@@Jf*q@YOi=GLtyt*+J^ z@*+Y3q%cPoV^356nZk$%QxF`yF?KY+e=vDh7b)8_JfeDcT64j|ggtYpPw%E`?U!W+{nz% z?wOw#k^7gyIN)aK`~)bAMkoeM!;9xHA!*_eo7hLQ{??`M=AF3psi&>9;zWm=$H;h^^yDyNKwlO} zkC(nJm`v8*P9U}-=ySq=O)VY=Y)$)sO_CO}w_5}ZOeY%v3rDVRVSxf2oHqK*S<0&eSmM4rvj+-!5;8S*Zlu)mY*2V`w4Z4A^_8f(ew5e>^FA&4_cQRKPqXrXYlp6UWhg`k?@Kls60>;^xO6PRD`QYF{CBHJ*5Ummhl*Pc zU;-Cw<@fL5yni4f5(h5+Z~I}_D8A~GOAhl@B|{pJopTi+0IeF(u_avb1sC2grr?%xCsn1zDXG!x1!|`1$6=qDw?fkt3-^NWaLxgzuG;56YS!y{D`|> z^ySUlZ8AniJrqQfklnICHp_*y3He_flte^Pl5`1cpNtU)i~wa^xUvLzw8bF0p|G}OCan~J=UIk39N#Ft7Vl#tFO0r&%}f_a|j)lV@<6tmY3bHU%y)YuK8k!IAbDl zsH z?imUYIDdqP5zzgo{|FAopE8JTObdxim{aHlTsh^+N?8@*?U>+^3%|F60hR@D027x4 zgzDEeHi-bX_ZOJLN!r@7fQZ3ke`Y&buaCpTR1PfF(?%wre)V1(!s|F5AZO9<5*@oc zO%!J4;K0<<8klcqDK07b+2P~Pc+qu#>izYLcioFl8XPe3NlCaI zubR-nJ0SqHF>Ct055@g?tj}!0IGOl8n{F@AKl%m0kGfhieEudTCTV_g(a};M6`~=% z{Mc6Nh|M~vWa5x~NJlLZ>lL#ag8L}6Lt>mjl9kCgQBz1N#_ zXdKsf2%j-eh9E(ev$Edjoe}M^D%Jt`Wa1|ijKQ`BODL2(vt>E!BcV`1KUPEx320|# zgv?OYf3x>B8iY=wgOBE`pws|Dg-^mYtXk4)frL(3HK!QRm>=r9_Sz;?n1ES<=--+)QbIAWW@#K;Yzc@0$v z5)#tX_}<>0j=v-6C7=*$%w7F<0sPhDIh%qiOQ0$k1=q4U>^M~$0VtL;%{P3R(EQMX z7~_NNUa^c|cR^6Ei_#aqg^CvjnENeO8wiUF94M4rrKRp{a_2yofOpfm*By|4F(X06 zTfPhl{Z(V9{kEoK(#zc)q%}$rn}BJny=bNgvP&`GvzxO0vI@M^Bq@G@@Twb^>wz6} zdEY1KZ-X}pInr~#nuc;0^HQRNfUmABT0MkJq>ksHUv+<-=*3&{NhQJLli||?A+dT2 zBOVdZHwR(hUwS3v=1i)M-4%CTK0hgn;ty!c)Swhn}M$FXjd08UkUV*uaQ%{-1G}&I(d#8z?kre)Y zOLGj#p!davn-LIoRphuh(`Tv7-7Saj9-D)4WMD{wg0*hz=2?e_F$xYSkpAD=1yj%_ zS@w3u#d3asC_r%&V?b2Xv)-Y$lS(YcBQFS|!&bt{PPABo+y%&HKkjnG@=Bfp))%^O zCwC{SO6;p+Yt$4Kd%%ep+p_eIJ$~r(=g$Kme}EnalJi@u1{Ogw$aVpn_{~V}=J%`; z*>=Bp7_i~>zB`}M=G$N-cv;$QRSoCXfzn#fQZa|em9EaXE0=CTJJ!!fGTqT@xL5uE zqwO!Fs@}W)QCJX_MnF=eQxFsoP!LeMyGu$%5Red%R#H&9r8@*c>0Y#?r04>qdr{JL z=Gxc2_qFfm|BN%vi*v@%F$`Gz;#>1m(^g&V*Z1$ihzByFglYvq9pT5hv@j_xesy~ZZVMmo zrFk%^oYlQr*RQIYGn9UTGATV0Wb{9diZ{jym1+qVeEI72ook=vQfpG~|5X#lPmPdi zh3M1Y=BK@=Gtsu&L@C+MYNkw~v%3N39q*It!Cu`LOC96c>hM@!ta0C}&g!AKm0UWn zD})N{FyE6YcFlcPesqd7ktju}Qku<@s%2)eEO@THi?7P!<$s$fY(6_>`s04<#nLVD zMzV)hbmS#iCq2sCdFxRX!%DqsUO;u^s_>|7{TEMBv5d1W^S73(tJNr99w^rV@K$| zTJQ3oU3Q`I#I!fL{)xv!ES4!LI<<+fmPF18QKF$T7C;kJ-V9Q9j*=f<6C(9}DK_yo zA)mM)=5uTDP}t^}t{qw7WcQoRjoxQ%MnZSlk5A&_j`rmi-u+YQd{w6J@H_72Hq`m+ zz{XO;;%F$}3P}+Zr^ovm^G3|V^%8+ZS_IBr&*EOMsW0BEt~i2o+~(4Eravnq3Nh9Z z_&XGqRSE?A0vq+YEA??Vwd5bnL2D;XOGqaZ!x$A9EoiSDwvXez?GPF&=rG4F;(9fM z7a2*H7RRNxtvJn{U{{u3@R2S1<=g?Z&HVklS>hUs$KBUz$phDg;L=+S*k3!3stdjb&;_;l1JZ%9m^`JG0JwQR1A%lr^(I!(an zlC4sBu)kNn9@83Iv?KE!N64N-9%nz|6Yk9HC6~{Z_7TychjV>WRR*38mkQmu;?=5@Jf2;oeRlZn z{mM%Dmf+S49YM`)-i~Y^-LDDjVLN<-aR~WjIohk#VMntk!wNpow8>tWS$X%1wd-K2 zMrEc>=H~M|3h{{-CdjX3yne~~$9By}KlAimm7cHF@Ru#-1uT zdw?hnHyZTS5Ny18(qA;4^4gdojPxocb)W<7!e%|2<9180okCt5wR8J{=?1&XHJ3XY zJb2WK)&9bc3cou?*tw?Y-hF!e;M#d-yrizdQutA-3!T^p#lJ!L$KqIjq^lrBL)p?- zu6+#D;LwQM1v+vC;AYFd;+UP8o=*4T+1BSBma^!+`1JfNm5G6Y7T#0IH}a^!848?z zfN=8~CXj|&x3oBnm=+aC)kL2SJm!BstZL~8tt53VzlIFQ-GkQN3YMQI8a%I*mfk7o zQF4GIufj-Y`?E+nIzVWBn#2zdj}Tq?02t7E%5org+2ajyal)G%aiDJCITXfzAYk0> zG*@;LIp0<{tP}TsaGv(}d%?*js8LlLJlwz)a~YbZC`N5@lYWz}>V6Q0jY3V-&c^5% ztIaRILLu`}_tfoPC+b&)D8|{H!+BMq%dxsL{M1~%5=xkFD{!e-!7`|DPjaO%jzfc7 zKtKTUI4CLO~{i*gOVxp~H2N-wKqdCl)6%Quqt+0{nm2%ExAX8N~xgzLVR@E@z0pd<64 z;@WP+1(0cJ!y7X}HdQc4sUq&mO`d-i!|y1TZcwf2E^SN>X#yF~%UN6eQ91&d;WBky 
zZ)O_zGq=$B`jno^IiFRb+kFO>OO3lDN=`+KK`4S7zftj9DdcY346|wft=TS!^Y{rf z$zdL%%Q>$Pw2R7^%xQQS5@e{ztMp|C@?Sbdc?S^)#IWEjtMbP@-eK#}0{?yeO69jZ zZU`g4g9s+?LV^i!)1@7`)-d-B-9<{RL9)$7`0X1g6XEnmZ1*s!(x&-4lME1ZZg=R- z&Qrbn%)s=`vGWOf!oun&IbGa+g^}n(w!Jsz>A*a@2mkqpZ-kY+Dya;oT+=8E^EZ__ zlZfnbE52`1RVt_VC{2lRD94Jg>gyKGONP6tLp#lcbt-G;d9^pLieEAA|)8IZHgBt<6b#y5iRR zW9!24nP;ePFK$|N(!B!0M&C3k(eU>+VMZZMGb*V)W2;{vYc4;JT4JGkTU=;ioluR% zGEr0UW_OG9-xFILQ=cJx9Wjyd=+s8f;LqX3zu_+6PhR^8iNV=;jq!0F;`Dd(+X9Oo zAWhU&kB*KC%DA8)?0|~D)L|Z%3p%vI7GwAGp0Kd_;r4vN-fM?Xdua3mBxS1>Dv5BE zpnP95qU-?v|>vACL6cn?d{5BXf$fG&s>WRRh#)!iCnAm07ONypePqLm@u_?UHujd}0%bes4`ry1ae30vrshTPI zlNwbf-<5N|u{vX4CeYsdTZlj2p0x3U>`ewH7E_373^oaLM{sH-AMR9|4Q%}q783rH zXZGD}^o8bjS*u?~A3NW{eE|Zl4Eb@ScXZm^fzFLc`EO3~bUroNMc*rf-xd&^qwu|5 z?}^oBDXzGcr#fDFVLe``zui$RmA-J06lA208;5DO+4t2zNTf9HUAIaHZTjhna1|fh zvXsO_Vbb`73sIE;^o;k3qkai5{>}|B`Y%T4gTg8^s#+Dj1Yl?Y6oC_;XZuH@d>Dv9 z{B?dNd$r~~`U{I?mY)$cr}>`UEn|h4`0`YxZcqQB)7p=R>pWkaR|TL37Wb=(PKhAl zcL*JJoB~YcTTcGAyOsJ(TYtZboV}F_p3C$tmJMU~^}EhRUq?R~6rO23&1F+rS>6FA zD(!iLf^@CFzyHt9dN}{DY>pI;<;7v`?d=tK=MzJ9{CYRTN2wNItD$!374NW#YoSUq zh#2)e8`e-X!p%Dwf-{a}8M>4!dLX82niZSMs_yrHiMqbJ_~Q2?qZg(fCarh@`-AlqwD zIx|1%Pvv?m6TZztsLIsmj*@fc7mse5>%~)8@pxE|9&TWCq zhHJgi1EQ`RxJ;uH{yGo&gnlK`FPYNlYk$!|7(GXL*DzllU1+!edMBVTlj;#-HYEYQ zO*5VcmQ)9-qZZhgpM1C{{OIcE&sha@jgL1ZVwzFUUc9L+_yM9qRT`Sb_fdiwTWtmO zyJm>yb}R+=s{A-oG#vatztUI$?9+dDdC@E!4fgS_8S>*$GNYGMecZhcI`2>j^bp?* z>MACupa{J0yz(1Z&)bnR!#0U504I+R*qik;&?lVGdKNREolfpRs=nBc_J#OUaVo>yJ5-U+OCm$OSOtj-}Ldj{Z-=PhmmFF<23_R2%3mKjsM zPdX-^BSG`#Q$+05kBwCR1gW*8#Q#Mm=kVZ84jC0cW{IM|X>rNn1>9pkdW%1IW5MsG zNbQ*Q>uD=6kaFZ&Rwrmb|Gjj8qvk{ahWYrkhgkM^H5xGXg@2>O1r$mm zn13u^3;mMb<1b46{o3IDrK!Nz?Q$d8{MZR;9Ir!;)iIChlLKe!4S z#1+4c{&cHwaay!`WA=i*D_Ti`rpm1I?p?(Zi9#z3@c*uX|0;$%-Qa`n^~W=z4@|06 zJD6~2zm4Wp0!XZiHPAqJIW<*Z$ba9vtavKi|EdOmb*zURphYGD@1xLKK1{epF7a6o z=J5CMO<#yvYyS;f();4WckK)Util>@fH;LHU$;7Sw8SCJ5);b&?yY6Dszo6)LUGJ~ z7<+wqe*gN3fR_TFn1^_Q7y1=@-F<)1I;OMMNY1WYxX&G{-$F0o_LI?k zR%j?&xOfHnVzoc6h5r2!O&4&2Y#N=GBlX=9crDo{2sWuZdm=fId=kMO|CVRSifh0J z-X$YcZodGZcKz`{wZ-3~!`Fr>oa>j)*+J&YPG(;+$HbTS#a_;EIx^Xsd%g|q7w$gr zU60}5o2OrXllt!S=tlzgxiJd+knQ~xnBOZNnR@7sUtD07=4hO1^cMR7r30L<(sWDB z4sSDlWFVBA70kRDwVx*`E2scq-? 
z#h?ESDC2-8NapyhmKC&;(Je#AaOch)sfMY&eVJ;9dE*@SltMH&y~~wmsC>qhc6Yh~ zBDf@_q%t5qY;$u{RlNVo1B75;$?V{S3>7B#x>-o`u6&wJI&-xb&?7ce4L7#7ww54u z_6!gIDU?{@>+oq`Zg2>LlNCMP{6en$ce56!#&qCeDXZ7W?ORzCJF8NcbGhQ~9?AGO zbqUiSm`S>O4$QCrEMeH$*-0%VwFrJ*HWbg{C@-1&;8h+{Fze~*S?aoF2A#rKh8B0k z;pR`#i+*G2{I~5ZFqcn=+4>^?9#G=$SnAz-fF0)HKwfz8ppkp}U~4E09pcN%2oMe` zRY9&PA!IbfBD&oIZQxAOf=y zW@xQxYeL~LS{u6F9>Dj{+!+w?1@5by`2J+1A^Pwjf& zLayqUOZ^+-Pk3gMcx-n+Y}5;dPB;;He+&_ql@u{tSKL}>vwfyLu74`Z%1tbFY$Ty+ z!+6qg3qx38Hu_M`WN)-x=u}JXS{a|3n3xOaJW50z5Bz3YbWXj5dT) znEjxZ>+62=_cQ#9-Tq$-nLas6BWx0J=uVtBZyQKBZ}|I}>AO@Otlxp8l~>8ht{C8H zI2A4g?|-9^e&$5;0f{3cX(Pg=82UZK!Z(f_rGshy{c#-~?vKRBvW`!q9r!vvg$udf zOUkGS%buxOuontAq0BmSH>`cM^nHNgy}h6S=46r0Ti97Os&>A}V!ELPVqq7XrU}q& zeX|KB);|Xa_(M1zmV;!Kfi5O$Qraqq&Yq9hzux%d%ddvupx{3L7)NZ90c_tBdgP^; zgi_)}a+gP(sc(W*WSfe+UAAJw@Ol(OUmm=oe8Xg}5%eU^-tM^f#>YBZe+?5d|ES%l zp55VeYubZ=WoBju-Mqwx1rlxRvV?ARNQhY5g5xYJ_9H1;6ZG#;h{ z0Uye+{1wawYc}$aIR9NL!hOz2I*F%I(=YrU4n=)?z+8EWlle*F3;;GH(05)<< zVc2BGkP1gc>Mz}~m9TDnnxmV}@TudpB zMm$|2Si7+h+}bM1pMVPc67?=4Qb#?Xk{IK5P?as2rQ5@|)*8L&vhl{v4pv#zevC4PVLRxOxZcA&l&X@Emaa!WaufL1UBEGc z0p^{EUw^usIQE?2bn$}CQ(r5BcV;{DS6kUwv+yil;q1@h?^4~seXwIj)qtWMrNwP| zU(GVS5whDD5IeGddq0!;VJa6=%yWV^635QmDpYPAPpxVvQwGyblD-jVzc$Q#z;uE` zvg5(w`N#Tfj@Zh=H;+b<7a7C2nFxYk6 zM9EL|Lq~9zl&NWK1+$i+i0a9XH8yrw;KaE%1#ME$8}0VZWSvJh;DTx!O-lq!KJCpD zIC-`Z^SOxq&dEPnix8*1+G&S44UDS%yw>{3DNDM}n50);>^Mo#%_l}8QalId(oyqh^Kv-!#%OKlkCOjDBexi@?>4y zJavyzs|i<%ZQ{2c#Rt5%73i-oP7R|f&|3!0alL_4{i?{wBF_GNw+=l(ebC)P0)js` zK?%{^Hs4i=FC9kv9>}D}$9wBm?J?$sIwc-7 zmRT0rIB~)R9OxbSqWr(k#^EM7b3yj~AxavBn)z?wJTF1JP+uQnHq(j&HNtKI|7SOA zwW@vSh%z}jd6UFz`n#^{{R@IE85)dH5g;eTEU5j}ymNjX7eEU$NF|LIaVztcis@#- zAaN!4Jz+&!5*Tn}ne{cV2B~1dIyO+ANO50ht5e+v8j-x&XgDSWkG#%4r?$40U{t?) zF14_npDQrZs)gF0`1zvnotfOMk3Pmm*oTFNwg%F91%c@eF+9wlY#(QVcNmaaSjY*9 zwC{9GJnvdwgDi^JSTu4SZKz_4*1ZDvq6)$0EJz6nx|!Lq+Q%Rmfr4d}a%3%u(BFNp z)I#Nds=NQJ*5Sk$^~5NM2uDD#nN1jA&`;$SFin?1#j3H|XA@8>VlVeAp zlGpXNzP8u#}UchaXcM0&?oX9kg5-- z5vVGc4Q`-rv-@6vP4L?`ub^NDPvglOyN1Kd+qXGQ%4vBmuni3jU+%7uIV}(R>om5b z&l3HqVsxymtj~Z$efLooYqNGrqNvuDBH|@ZKazi>dy?cX?k`5YTx5f%fhD!RzCK5A z^IzZ$7<_Qx>KwL(E>EEu4~z+1+sQu?nwtFWp;VtFD=geMer(~HnVYLs zJ6Oit;@PK)ocEmnM&`U$_0n9ceGdjd@L`0`Hqe%Spx%9-e>Ez#+w&dvGb)4Da%3Ql z@GHd$12_=oA144+mZ3M4vtmd*3+Zw9F)=aG@rRw4;|5Ex(SGwn_w?r@1^I{2uYxGm z>xkpk`cHp2dKTQ7V<|%NwY9dg(*o6}$}7q=-QYOJed2y5(r_??clT_F3I(#Dwzd!o z)fy+NkM|t!D+$!naGpXx67gQ1>MqY;oF#P`#$KJLsr(#DKhh;{%1$~i-M@L72i>@Hqb?|uNcN}|5 z%p5JWasBr8H_)dsXpJf_lBu;>9~Elf+T{(=_@zVkR0i`WGtlnbJlDi@-EmzI8Jd@mSBQ*# zgCmIBPrdXW+GtD?yqCYGh*7hbDlGK-ddAs_Q3!6rWzX|7PnNp#gIOZ25JME++=i{y zwoB!SXCq|mfI{DIbJHHQi|NqYhnI~jf@w+1fYepoOCz^fx|h|y45sy7k1=THHhS|JA6WVaB%U~SwOluGTE%tk^vHyi*uwcGsK(Z7X@D_@m z9hkRi(-s)J&1>P)i7KAzQA~5|EikTS2M@e*;{ zmc@cZY*>Y&`4-O==0c+drYUK~QR<$K$5S?LG8ArWW+07Hph~rM0>`|gp^8AApgjp5HifDM82s)tL%~n($p2s z$-y~>Jur-{T=MlOH>ACm&0QqI4td>Ehg&uLdH1Zfs;+BZtzRoP!S~V-0rP|zvS`qsrN}`)neNc z_ib-$`Z=z!CM<4$5^-_akEXMmEW0EVMkB|DoX-}oef44}cng@6g1fRa%|i&I1^O=h zK@_t3q8wMa5-JrrwAR|->1_sk5lqu+3X*)7c3RH*@Fr%zRnok8@nU3yxe$&!qEb!W zrHshTO!Hwpf-W06LEDc;a}Bh6a8PW ztXD}x%12?DvWuRGfItZ+W0)&un>&)BIK)1#sCcgDCC5A?QTw)itXdR7m1I>2GWI8G z`zqVGWtUP}J#^VE=Ix(b9oW9RI#a^Z@4hY{@?ji+x| z&JD|uscb{pD*0`zkwWc!yIDCmN{+jW3kx~lnq+x>kd?C4)Dg#^eNx{=W-Gq8 zXj7m{1i?g(-rFEI4s$lDm2FIuc5hrEVaGx~Bv8$2X(&yK4R3S%2 z8o~?TRub=8iev4bT_ zDm-o_|DP{-%*2V3aVW0*QwHyZ%9K|SRqwaykYozus)HU}m%|0{Y5a$5I*LA;FR35B za6|Z&YWY~D?ZjmT1w5@IDf#|RQLeIwwRJ*qM=OjSoSJ!Qt<^}!PR{^jQc!rTJ88rb zV{Gh@w_ zQ>IY?kxJcdIlu6F7mJ}WD8hKxva!-F%JGt51!k7)tfZK4kT<#RU)vxZyZXO?XA)lz z9^^cO2J=^&wV0YZy!D=W8>qi 
zVj2kuCV{&!k>9q6cgXwr1rN-?99nsnSx-pVc>bkRCJYf16jvP_AIDl|$lEOWru%IQ z_C6fi#rRr`KE#z6E7WUTzizO5_oRwB5)cTA;2#)a`EsT8K-LN5NSFQ zZC=kY>}A%=uV)SaHXeohq+cB9m+j(yd#@s*i21#Yh> z+5@_IuL7ojndBLF6nv=VMb-S$Pcf{Z#VlJ6fh_(jSO zDD?swgGeC@L)(w0u)cy$v@zIWk2&o+AeIxZThIQ6CH}*v0I9QIq&j;K&4M^6RCTj9 z{zb=y6Z^uEeqM+Mcb`V42jgEf&_7p=Hkkk(?$%D@ihZ9hL+@(Fg0xv zBm+v9Zh~Pb_TO9mx34Ef0&NGo0`}i@K(Pg*WgV0C(TMIyr@isNE5;)*xf-Qrbd}F3 zz>_)TwfTs?O|)0&DF(F)gD+h0bjthN0CoKL+5nmWCuMTn<^Nx^*)>#kK;p%SCPta8KeE9G?S$UL#1V-@6Hiy+QT=~xoP`PydcZV614&ibA3;gP?yv4-h^d`i z{T^`Dj2HfhgU@AGdXz%dy&ApGy=cU|XkPsM`Ve%;+wt6J|J=ae+VDG)@Nfbsor@Fx zwL{WZqQPB(mmg#>Rfqv%4+{!v8vXRk9)%QRH>jYQ^xQWRx_6I}xG9Bm|MTxWI+WN4 zIX(Z7PnTYg{E?1|yfo7P=Iz^3>oHq6xMHA-4-nv*nwz&&{e|s2XjG*Ny?oOOltT*R zU;f1|w12Vp|MV~Z-?L0WiyVWr>M9voF!+vIT3RGNop}r}c{m>2=brG4W_UlyT|sGB zg7uADK616vZmJzNOkz5^Fi>tR+?IL;oLo2^wrb9NAp6S?M~XmC9nlyS|BLyR73YLG zQ%-`@*YX%$fB~97^5FF!pUTS0cU^*kc8`aueH1%#Jy~4-6dGyX*jQZ+fszyoJ&pv? z)GL51F7d$JIemXiCsqtbomlV?9QJl(280wyZr{cP15eNJyF;qJYpk@AnAP$=sZ)k+ zz0Le+YuPNu%%4B0==+N^@d^h2PIGs+LUSWAb+8@LR8y%zIuE|Ivx>~05)FK*8=rogKBdf|H#J$j~kqJ|W+bSN+YPN`LdU1Mp(s;HO@5_p>VmTcet~eBpj`bc!_=sK| znCHvyEWDYuejSc%03@$Z{jFK6P7_TkeMzKyC7`q zh&{<}?SI)a_12JybQN7awf`+P#Ma0a|1y5T6iLi+^m0)fTqZ^`@E#_ef32+!ei*AS zk|iy~Z)pAp(&&;56GjVjTK7}+ICFw1R-I2o@G^H9-Wf_Xu!0jtba(02$3_ZYeblQZ z)xJv~$a4-3A6@VgdWmPM_VRQ2ZK=xVDXt|Rrhd{L)1ArolCAA6h*rv5^5k&wix%{b z@>PN#o$%U>dN(~?D!G)VmNL6_K~w;FKl?p9aKT{{PIRk8fzyq8ZT~M^a5$Y`?F9io zfk#6#%FpICu5RJr{BSvvE7!Jr^J&7wKr=iLcBdP z)pCOF#qM&2M%q4!KelEEB~I$=?5}T2A6i zq#08s@v~SiobO$oTE}Tl4i}1c;_DbD!b^X7$$FU1oAlDU;%7_M!GKbUQDh5R$GZ{w zXHehbI5jTS;v|q~@8Lj;&>tgmKhiNq_z~|o=xWmDIRy?jl(}B>5VTTFu~!Mjo2`Ps zR`^63_Y1W#xzK%%d|R)T*dB|cWB<)r6QY-F%;desrEN1R6OXN)_^rB_-*Fu=I`QNn zd6FXyqCJD2wWT-9jed_2qsJUZFK+zyB`;xq@vK|=?~FG`LlRjq3!cr%|85Z#BV$f` zfLY_g?R-z$opU}^R{R`$X**$fUiXx#jg0CaiGm)QvtYaE^!fI~#qHv}b=yEb7rNP% zx3bKlQi3qZN{Kg**?E5GQ>S;$8Q53wHGfE#`n;klv=UeG2*bdClA{3;x3wN_#M|Ug$=kh-#Gd4>xQ!@*PYq8t3{%lSI-rB z0!;cu+COW#ynOmlPS=1keK>7(X)k~@K+r|rg`lmKWu`a(jD5i@;L?Pl=@T=*=O#xP zO1Iqct&K0Ne(*DIDic5OG$2Xk!fBcqa9yL(#&T(yF@0)@Z|%sW3wmA!v6m9LHLq?9 z-jx@WF0;4O{fRahw8W(8em!reoBjDs;S$^3l#oo01@Eq=a(0Id+=sgt0w8`b@NhYZD3v&WH64Gk*8N#{ z`yDyIL_LmI)tM5GpA2i-QCtIuHG!@!i2%8@tdaTxWE*N83{v!wGgd~ey#hLPFKic) zO2K%?boPPAa1%!Tb>WPn!0Mw@P`S$!Ruz}+Wq&$FUW^gx9m&c6b>;Q%i2^yQ6+6#) zBFIKld~R}eUix5WfSqTUBj#(@8ih5=bU#5xNU%Ph1jWUXIUd&VP&pJaLEq9RWgP@0 zPAb-0qGRfs+xN)OUScEBxEdOTCOFaDQxT8r&*JV_IO%~i#vDE@(jTY48OzZR!HSle z_F7vmp`v{4PQFp3JzeP;?I<0t4rcA(!1bl6q6^e zSAS=1^Y%Z__dgaBG*@+tF8P*MqQxuclWQBuEo0g|dF9~HRp=GrPOG(L(IC!#Zoiv0 z!M=2qUGqfOHHjqQ%)!pDv%&ULj(erAyI{;LhcOWKpjBJdScxtF(etGGeBU*l>hC84 zPa(&RYd7aG%}LD-t39YdHls2BS~*!HSa8yn=~r|z2v)8C0Qr< zi>|leXND4iFq$$2EqT*7B=&Cm5eLRCgi)gG;_O(4{mg6T;(iVc3il{}gy*5}?LYd| zK{53cYBKWQ*07sZ0$c3S4v{jxW{}orLgL7^GQqa0SN;Pv`t9ZYNyztFBb58KQ&Jgx*IziG z;~X-g>r$*4`J&_eOT{qk(OWseEr~G1Gm|00n1Bf%7Y06$<5JEywC%HRt1G|n63B$w z%>qUk*+6V>UsqXRxAEArM+LwV4m4fc*554Wa~G*$bg@Qct)rs@O}{|vLnnD6qT%|QbzJp5rL! 
z##2qJSv-1YFXqj8nM~#=qnEP+>=Z`tlWVda5Ai=n zsnpA*1sXC;6KM9L@0?ZgMX9C1p9tD&HnCg8xYN^fPC_IC{)yBXS0N!nY`jY zjAQ&mN=j<_JaMn>IHx!1`^kjMh^d3J_LqgZOFrNRtT{6`H)k;K{+MFRtK;bC=$z7X z>*mds1jqhoN(r@$Zr32MB&~Z94rJgKdkb6&6E`;@m^>Az`1ug0;OL3J=g6vEV%%#V z;WJ4rW*Etc~N1e)NFvF;O6@ED4J(l1Y`^)eETs-fY0`)ch6nrB&&dX*~bslE^q zhp2a$|0d+VS2SrvYN)!0mR*U)t_5S2Ns5Oc>LEM$W~O|ii7P*O3Sm?4XEu|}KG~`y zN{o?-_1{)W%*FG*j{jjv9b$h(Q+Zt67%SBo|B0?^4^z=xcrZ@z4}lFHWc6!`?_uV{ zVmLU#OGjFGRWa=G=NA!Vlwf1tXJZ(fH$8{$!Sb)-cuYil??!R8N-46P?}J4)Fofd# zt$*A34`HIEymysE=Y!W72(Ymj{&24o$zNHn-~T|b9?dO!t!-CY7de*w7>5UPEMWUKo_aP!KXn^(GBKiW)O*n3AT96 zVneFIh0SS+l_`?yb2Ji?zb#s9()dBA&R$g6E7OP0I?px#hYuOaPi0h9bj6@<=`Nm6NsTAXHKOBj<4~-PG)g%0| z219&4W;&pXkCt;SvTIK!1`405?EcX{d4`&6&s}}0{&Ph`XRy!{{#&=s^-ils=dNG2 zB{J&Q5$--v)p`9h!21#t6Q?(1fc#^;GujHu(9UdvArt+Zks$AUb*CTx3bJ(V7 zNpHMW;@g}4*7d98lJ*acAGLbTyn}>H4J#ou8&&aK)`X#0k)WkLOME9vaU2*nV28|Qw0Rz zwgkL~c$Y8FcXIA+Maf`VF{Je`s*lxdSE8Bv=($9oW2sYQoj9PKMt&6Z?L0ZyV%ICX zR^zgfGw4+QVn}-aMT(er14$fk%W8sydutORw~_>Bfuzyt=#AqMgHVl()|$>5v@ilP z7Bn+HL3GH9vlUzF19B2s6-;#g3Bc%)vM%p!c%VPqQBjtr)2X7o*wJt@1Wk#Eh}^dS zEe@v=pj>+9Rc zi(@AHc^f)7%76Cv15b=2+_v{0%0EqTt5=z}HC}D|{^Sv64l9!7ri4QlkWrU*J1juv zI!y5~IxXl~QT<$)Fi*>!LTwD2+P!OpquO~9Ji;GY0&&t%XQ=Tn6tKP_YByq~fZ7`N zw|tK4@BTcmlmI858^e(p#rR9G|7Tdz(+x&O5qXUF!NJmd8KactOXi`=w@Fi!4{y(O zpmOk%n3#Mm;Aqttkg?j{Z9hs8rPf2$zHQU_mOwZ^tIQOn^?Pj5vC$TbyUb(xmMHAn}K7b(g$;}L-j76!M z?v0WCLQOf61oriLyB^F-xOsaH%{=F^Y?PLv01HV!eoT*PQYZfE=~EV5lFIQ4&)Uch zMtn(;AkFBl1}o(ar(1TDuQ>iH)Zrnsyy+ke&4n6# z;@C$a1miI%wF-C+4dtk}y{~<=r@Xv8j91;#i zPtZ;~hQl!;Vtn2(3 z_L%%!ZKp`EN8CAh5?SRaB4={kzC8K42cjK5YC&84u3u*zbTVqihI(YG3~rw+MHn|` zD57^Gy-(C-yEbV>xNzZ2URO z$vJ)N!`LqOD*kYMUq+M1?pj^%xt0|~6%u>3Sg`XUUhXEfc8d7n-Y>gy!GRgR$lN!e zpDK||393A}{wT+v`tXkXq-3; z;UgND#&i%o1B;$}!F!#%Qb>Q~3GIC+_M<<`nOQC;jKoXZe(`l|wotOfLb#7#joTjU zv?%nl)%4nU@YVX4_OBRh(5J=aZ=Av4dkjq%R~~ew^E4{V?Z3TtCOWWRba(woniyqk zYs+rZP=LLfXcO|B=esn@v22ec4DYaVf1%qbJqoTrM8)hzhUX zhQvy(1IH9N87<2+4g0%$jB{&};JNJTI|aIpM8u@*ZY?g~IoO6z`PKT#yow`Sg z!Snq!L#?Vuebo&crC7jnu1uv1mo- z1o=hv2O_b7z>9gUcNE#*I2RPxzYGjL@ztw7p!s`*_0dCs`IDcADD!UUTx)E98LNLT z_5-Jq=%EeM<~MPq$0(Xvp=1n z*9o*=M_bpjF9J(e96P=Y!f)R-FN_~Ez!;DGXg^k`lZO;ux>sLqjK{fL!~9NKK*kJa zKIq2J5^IVnj!B$vs?C>;=jC8`n9GFahio5+bssx0a1+}&C^1v)GgjG6RbO-w6lD3$ zseO`KvrO*YGSa&E#_)=qg{P|O?T;U6v@~l(UgjxnQC#zblMiTa6US!u8P-{}{J%z! 
zn4~DkzxO_?zCc9eA|xz4i>i0OdBu4~!29el_Z{u9$S3y3MvEb>YK+aCd%@w+9vdV# zc<|@)FZ5c6{_G7>sXM%(G1i~m!HzS9F?nPCJx)|BkdQR-|b0)L5DfwbE78kMJ z3E#N>w-(@V#+1Rg2pvqyS;V_2i^)unrKnXEVl#F7)J3EUmHIe+k`-y-segrxy8d({ zskJcWqkUXR;$3U)!V8z&9;u%Fh;8&|X4K-QXLj}r2?>e3L=wz&&xFOn8IIIYZA>$T zot1_AaHH1|jlp>{6N;R!5#n^V@2Es2MXC4@o4C$bp6|XAkLSv9W?FHQJo1v!EYej= zMSXR)^dKQiB`=Lbezd4#ygj#{aKhFxP#0U^Tc#Gvgg!Q=)&yMUASnj55#T{#dplHO z`jZ14=HASY39c3n}h>eea|2WB__P1-4y`4bX<(q;kiG251{JL=V(Hg{O_kyFT_@3aSm4I+)Dv%kF z`g(Y9khe4n$({@@D5QK{i4lFS;Ldw-S9k=H9MiPo7`ok}%0oE>W?!-O#Rcg#bT<5$#U6;j`_7oX%H= z`*1v|9%O#J4ev|ID_98a#JNa(8`7#R_q>p2`=(`lipo`jYX$@(Kckh%BIf*RZ3-%$ z_8TMX+%7^{rA_kUU~~G}U{>%?aXH2x*|N+hV&@-q5JubL@;`CM<3GXf@RR+gCc(S+ zW!Hdr+aX6mrJ|!YOER7!wDLe9x$~n<4og~?-%xHbgKw9Tog6VkU-S3x5r~O!`|&tE z(t$<3d=;ZF0f^?OR+!SHsET`254WaG7+HBS!}jc2qr|*?l_K8R7fp}_y>BbbyiWXb zm-!~YRoGFoJ5$mZzSyv1EEQWr`55~#8+FD)IIN1#^(5eFA!sbdk&tp?x zk2Zz$!w#n332Y^?ndyGVxv5^bY`He^cdc_)YCR6ODaE`*L>@eNi9+6k5Mwo?w0BF< zO2kmi1q=-6fU`F6wIUy&29(0KG&i;Ld?4%iFX5t2fyhq?QZ)K%HFVhlU#Z%D`pr*z zS?-UPPq$$x|Aw$7Z3;vg!<(-WGR-@4TZkyOvlTUa4oIC)z5ZGDgN$6u7yIc)F&npE zZtZdVz|<^Q@*MA+UK@9OZcbL6PH)bwp2dzt8md82%@3}L-r2a0G0P8S5s_I-#KvXN zfpbuCbL(L*_1(vomY-hPhs{2n#O5Ix{lN(Bm z6SH{x`pIX3H4kp)25Xy5b0=Ms=dWEeV46DLu6%2g$cpg%@p+nEwds0G06_}~{Rb-T zOv*-;%l*FosH*$Y5^%`FmkQqk~tf|-9I}OlP)oSFul53XnQrw`NVw>)s_`ZI?~o?@8syHQwOop{_5HC z=8#nRe25JW@glGPuy=b04dT+*`_*RMe2k8!^A#k{)69?jJ`Ej)$7J`D;lbO_8ZaTJ z?6$ZQJq#c8dcfK8un$t2`7t5ytk*$~jNu&Q*^7&TfDz?ipV}6D2h~l|l!?cdKgrJa zDW}^MCJae`-~X^(VsqX!x!UEsY>fJSm2!XcYX7?J^A~ zPAgkD3UP{x&1ToWEk^BvvQ(W$&BsuD|E~7=^XFDb>Dz=n^1?HzXN!HwSAb{GIFH5C zBc1_=NK+(>0yF{kFh$O5_$xf*l=eAPF3A8s`jQ?1JDsa$FAU0`WRM=lp{_- z=h?xQ>S6YI%Vl}+66ilvPrEcQ;7%ppQ;&QH%z*5nnA4If%N1RG33$B6y%YgdJnral zqK{a(wUpk*Ib2A~y|@ok2$W^VS9-}RooBj(^kok|IWgs6_*XyV)l@0x0w7q3iR$Ci3Du_rUEz+G*(yh`+hX~Rl-5?;H z(y5?;sDy+lAt0Snf*>V`q@;9roO|v4zV7{=an6tL$G68Gdptw%*=x~lX;nb&msZVe8_gtAtp}+=bLYu3Jx^kXhVlX1IiMRkC)S~GB5)=9D zn>vwHtBSuT@!H!4IZ8Ur2bW;=b$)lUKD|70JZ`+S&@X`X=6Ap~^3oGHV(U=6{gU(u zY_GM|UMM~mQ>1~CHXk8H=)2g-|BLG@jap5_lj`)uSCII()Fr&CZKkVV)9A-8W=OVv zaSpz6?SNk;?d@YvPm}go<8QfYj?b!PHLFHOMojuLWgWBWt*vA#;Xr3F_vjTUaYjGi z(PC^i6tuAaUPxX50v|ZezZ+ZDTv?@Zn(wx*-JkFtrCi$#EuVEx8RRb|&_AIs>QV`8 zIE$>w^Vdok+erML15Eo-_C!*7mqA;YO)vUnHtqrf@^xl?q=dm?vi^PS-4FK>UmQy!*FsA1$0R+gHK+{+y?p$ohW)*qhDYT` z6mpt_yQ@F(fxjsfsugO^gP8U|to5Y)x1cNF#}%GRCs^%7@Tf$yXK$FG?0?yv8Y;V7 zNcEEB>1W7Y<|lsQVb>g?B6T5LhhPX$GF8Ep!?ZjlKUacZrKb5$!K|HKWC9dq0fCkdTbqF>R6XmuF zmCp%VxvX^=+m|38J^O7!-Y$~P3F8`1%uCn_;vP$;5{r-g5zHyhpgM0F-d7E>{xY8j zh}&Io7}c(_d&?a6ti5-d+PG4g__2Pw|Hc&Eo%F>lt}P~}SeeQ$T+D>^E#jL_PCSh7 zRbnYchAQ>?k@yAslaW+#q340h1{bmN4qyX@o6y|Q^vE9<3HZQ=jS`H#5B{C`F*oNl ztP(!bf4sgz8GjtS*KzNo$l7`BtvM&hZ*Rm@X(Ylga@5PnSYd#h=~K7lJ^D6_U=2fM z*nWQBlgTJDtRaO-uLCzq68Y{O#mNaZTc96^!tT5j#0ALzuFz!}o8A{U401dZ!uKIm zd}iFEi(Xh*VYl36PH9!nhQ3R7Uds1~xZ(T^0=x+$xK6`&;s9-MjZa^5A(GDQ=Qga) z6BJA0jSO)duP%?g4?!LTsp5hel?sDLU}&H-v8R=Lrt3pV^-|l-3wnKnas{av9@c5! 
zH(9qZ~Q5p_M?k0kif8ib&Qk!5Q z^`I$kN-To)?oEU^~mAyv}(^hlu4;8V|(rL&6=4vycIN6(Su2{nxWyzdTJ z?;3gI1Dv7BOOe9u{f}oN!f=FQ7gt~LN#^p~)I=p_R@U%$i8z6tAE4PH&btRlBK0vgYUEIJ21Xp7nk;D*N}m+)G3R2si&q?r1)7=FULeJ zguX8i-8;o*iJe-K%2ofUC^p+;%ySyJ6%O~CkHj^w+u6yWK7=OScwur>FhFU4sD?aR zsB#fbLRzHEVk=NDy@s3$8y}zuYqL3X-LEuxx~W-7acS8=Oh90odT)v3Y;+&n!LaJW zdS;G#j)Ut1S9e*?fQB{krg`jo&olO0Vk{mkVs5Oo)5AFhiA| z%04@~J~cJ_^vuLd1}>lu4kpW?QO^_`Ibp2Q=1QHv{quOFQ6It-81E*HZu#nic1NZ1 zyxxMOx#gBdxgYjy$IH7Y4V(N|ZF|R?S1p?y?*{}?|M)iJAF9zIq545;*E^tLZ(Zh& z8^8YUwe&;M*c5+}#+Uw3dq|RXnfq4B*@^whJ}Zyx z!<_BPOoPLq4HxHdqP2bB65CZFQEZQf1&u}FVKQvV?kFYM6)o`$NfJLv@(>MM>gl#b zvm<`xlC`V=R-cL`0f7u97bRR2s66FHbZU#&`n2fGy;pN1a$P3E$$CF35J{+!92 z#4eQd#`fdBox~}#wt}~)QF29T;Zf5}JP%q(Sk?nPvbs?32gJo^dbIkwU+{+QlF<#F z%rW;nDX3?5V$^+ZOi@(|-}|-UPsciM@x;u_-ktjVgVs$9yttbS*SH+*l1QiPJv}Z= za4!YyN41V)z-fOL-jwZH+Q4ok`fIz0*Ah zlLp9Rux4)wirG?DZ(>NDd*C-sYdDpa%PsRAsux*NhselG>zZ6KGs`kXQP%EQ{qV_2 zXg=SBb6RpS?Q(H>MPqZlQ9iX_TTQZoT%z(?+ucw$Fzp@bw$k)0cQlP3KwS ze}o)^m-~(LOy@Ah*DQgVt-mt1ME%#&d#-og&3nC6{h;@QYucdh*eAzVHlq*+GWe+J ze-elom zH|ybgQ_9EY*oVm{dGLEP8DsXf$Jvy=u#){+q+s9&%4>MST08~ zw)gMMEz?A7f4Zf?_loM_{!X))NZ}?sCn@$LiKf8qCJf9djrOhIixDRx_OzvjT3_bM zTfUb#R~-Uq%p2IvJXX~VZ-XKxPxDL5OaDWC1eT;jYBz6CHYU zdzVcXI>|-^-q7M1;WUlCKNFVxh(Wn>{hL$&^|E(sVR019Upu+joWgweFC>$a>V0Fu zU$MJBsDi5U!TnH#lMfyHcwW}wu5I_kTAU?!RtdP;VPM3DCDk!e+Fjq6zPrB}`vkXs zaV7sq;e&!Pfkw&mpY7N`R;f%wC)PF1N}+=1R1ErU_QXf4>%E&T^_B;=4g_Om&QHpMul;o>dcL&M zC3zxi%+=o?wKTS^E!DN0hrOQsgzsdgZ#wH#qN%ZVhb^hqzi*Mp7t4c+wge0Fxd@aK zH;kq6Uj2*N-pGldY^{9G@h@OuJzgGC;mWx)b8gYTLF(M>dls8J;=9E3J+>ome!wB? zu7rp7_4zzOeOyoSP?{(6x7~2mdl#6|1-?eS_eak(m9^Y?zmP0l>0kYeu9@CUJG+)R z#>u$99b1L{VK9al1qb%5OL^1v1}9F6D=9|IRf&mC!!k9tqZ#$&VYu?I%qI_N3$Lc* zuameVjOcY`!CR+?Mbeb%x)?+4W755ksYh9TVw}#sS8<9D;Mzz?d&+zxu~6i~Xq=XafsD03pHgUU0!C5ndBxZe^Eds*=XUGcN*a zswlPks<=^$(l;8HEO(BMB+pXyukHzwF7i*3RoyRq=9EJ3HGIB{t8-43u6dRE$9qDI zeR~$8X3IL7`Q5Esp+S2GYtk3?(jP~>B*J39dA<4ZNvgb4uR{#oSC`D2UM#^F`$WUH zG>)jm+_hdG4jLTfY2fV?EMs+EqM;^vC#dpu+4N%9gVK71+1jM{On#Kf_z4;%i$`;Y zwRZ;zv}zukjoC=0SD#^RAA2^nc4A4|p3-{x>EcJtXgG|xyS<+ao1m#|9{4)SD0P`q zyA9Ek90*KdC^U}1jmD^q>>QCzXONm^eBXO3AHwi20n!Xom`j>d2ck_VA$w3l4T|vM zu}r#0P@<+1ISYjW@qR~a``^3yqS(VQRm-L{L6y2WvcR8l479)Ry?yi*SIib0=k$#3 ztcoO(@6w8?xqaYWQ(paPztx*)sXf)fZR-zP?w-zQ)_qoEV>nOV;ZKW`Xy1;vBJQ<) zQnD((`Qg*c0*+fJ!I(k>6xft7?B5-o2ajYb)qc#z8LQT-vso3SxHOZ!6rWzPil|!% z;&l(E$utXK7{F?o;5t;%ih@-&D?mZq^IF7yWP3?GvL+sai){HK~G=LDr-SYph}4UXe-6C@{ECMufLHemb2dBaFRCf<09@sik*M?@gZfBjHiK zd{OR;8Nb$Byr;cS39+8BM%myRH1#ym>2#SZHB0CanJ^IYFd&EWGCfLm%xr|1eaneH zxHnzoBIq7>+u$Umo}=^(?HTyuZ?62fG28Rfpia_#N&Dd7Alv0qf5SbS(5i>oY~@I3 z8bk<$18M4(9vroCK@=IphOrNcL?Fckq?{M?V25}Oe7LacymPq@KF3=|kBJ{aJjEs` z^z$9&!fpcHg%ZqAHTi;HVaQe$i5*03eyLaUmp8{v1>mf%`7Ei+k-IvbbhF@#D%6UJssazJgpHKUUT!5Ez965uuMU*Qvro#)va=#8v z_2$qI00vKXPnSNx9Egd7=y301Bpc@T)3y}NW4MK;@$rkGr3?rSy#yhAIBIHYNmte! 
z(+nPC?ZXlE7XaHznB^Li##Js%f(~POv2e6|bH*){m=h%&M$k77L6fgs}J!aRCMGWwBKsz5UK(mnb&~9Ti40(dxJP&PScBuUIVkXRxoVI^TdH%JZv&9ZH?cE zCChZn6JCoi!~)&m@KHd)*#G3Mf)csu{y_;1lE>kHx67HaRKl;C+-n^Y0klpm?zp(V z2DzWj%Za^=Xq^X#(0Ltn28X;5AY2ttV3j%?ZhJHEY=~q?`bN^-4zseTcGykQSw#$Ry$XD@2rAK=z=mt&Kf76~x5Gj9%3Y zIj+I$e_Z8iVc!?)#Ut!6fk{JNdqG?rl!OQefs2jSs$BHd zKv8!|=`1;14e+>mc3fxoqg3e&w?PFlfWV-+aR{=(g1ybq zn(%c1Fs2b$d3z!7s??dQ+wymafc#X)V*d5zPp$XQI`$EO{d{e*p~xYX6QaC&dU_7# zQeJwVornM!@&e8J1Y_T^;Z(PCAIQT6yUFz-lfd{ce$YwCG&4s@5C|l`2TZeoCxrlu z$$Vp~DRLO>fWQ`lJ#SmrLBVJ5ho#c%t^-IgNu)IH>4FZHjWHP;_d;FwO)aHlPOyV% zxbxd>>iAE2D+~UrB2Ywe9s@@54SXYy&j+w1;udW;1xK(Wf@JfUIN#A+ zUU|W+{K_@bT_p}y#VuRHD;t!+Qojw-_daOrqCJBKwCtb~o+nvKrIiIi-gW&l^}crc z7yxPC8J)v=QsQ&yVy{m?P)@HP%i{sbIG7_K#$%#*ng0DFE0bx4=6Q8lN|gd}IRLNu z`XsBP70~mfACqNOCU|r5)3Jd~gfn5Nnai>#eMv%uyFG^Lo#5MN;xk*?Nc`DTJ1)KA zE67k}R{8p_7vy67B1>380)@z`hzXAA8(|@SOR@&)hv2wgG^PRp*;pnaHDF<#w(ZNb z;Zq1w#>V1}Qt&o4(JfA%GcbLE3h-U#8c{Dd@4!LxDQKUUA|ik>S)gKZdip9)glV41 zSAnXEjc%9AxW2HCoS|=pPw+*`*tfiUA4!4Qds#?m_(6(Rk0N=|+m2MkY9&{YUv{=` zOxz6V(){%=4~S1ZTAUtmGsfG%^vRcNM@}B-paP=4d}F@P?eo!jLrDDMUT#8h2fZ72 z-ha6OlN6+_gmOC9u1M6-2f88T(uadTD;(Q9iCB1GM*Q_8+M~&Xfxpd?&`2+1JD3*; z;C6n!jGXd=kF$Nr^jn5Y;1;Sa>p&yybb&*+P(5FkG4e;eESc$9#mw&kqj|<#hV%za zqXsf1c6T^r=?md;T{2~6Gj5Bcc_m$`c2&wq*0W0`3X_QWHSlYc|3{jbof6JD`GRSJ%${;C(wM%S8%+?9WkGLv>CZ} zm$5l*f@qpcNWSp>dmTSFkqYJ~4U%99LYMEt|2v5BGP0*1ses9h{#g&&{oWApr!?%m*Qev{4Rw5A` z49z>#sE+@83;=z%B?{&P9vChhqYKm-Kv@E%z~w7`m-_DhlD$$N9weI~SiG@>9#S80 zi8GK=Mbo`)@Pna&B87_IA_0SZ!PwU@PKYp+3B{Tl3XA=_u-+fmesHlH1jg^k7?~pN zLfHBxO1`OZj>!f@Dy$7G+y)@r2vg(7n3munF(!Ljc%%<9>UdrWCG>Z4VlYfQc^x_u z6M58r>#>y^moKrLobD^`9;;+#Gx=A`WHzWMJP8fWPa{T}e%r8YyjG&i>XfQ?UErdW zIC>&zT+?BbFJ$%YHDk+=Y^9LxcVcM#G{G*ctzDO$p74|%Oj>mF?@jC(sjJ72+Uab) z?236x$fQ&tzIue=@q>16Fuy^knogv5w(aw`H@64Tt6VY^VAL@$D5^mz!XRKAp_f8| z=HjPB156CfX2Ycp^!JGwa@n=D%zXZh0f9^ND!&pv!FfKUW;QA%4WIMRf*pn=NMDUz z2}RRH6uU!=<8RaT*ve_nV{KdY6ZEtZ&N^tt-!XVl4W_u&?n zhNRfMxP>e%aDd*c`@|Y5h@-W?TFrkiVw8by1Z&R>Jf3(TE_ptf!Be)Skg3Cx`|RqE zHhF@trnVyc`G!=>*Kmo1fENS`+|z!bhq%GeG7XaMY?eS+fyqxH>~B@89Y#E`FYa0D zMH%DwmqQFyM);XYNl_94lr8jh6LIG_wFU&>{YzAR1N|DSk>WQ?TJ+{jL7FbY;GxC7 zGEs+37|KC$49z*kQ_jA4!Z|i;+*v8bC|ZgIMSWC7k$47qybgUOEObRFS~f05i&n^1 zqRr>gD;0!yxA4=MHOaiS{j1&ZTygV=sg$RD;HfsCD_G%_a#2y{1l~oR>Qv~S-2~|! zG-Z9IO8)p9Tulzp2(7uDf!Z-?%_?cVBRSDE6I~+jFyp1Cok%V&-C4LfRBB(gPH+_n)VP*AH@Wa)krGAjh`V1#=A>8ZyznR*T2h6OT#UHr$?aZeqhwsYp!^;DiZgbaXq5HNTSV###}PCscg}s>Hk@d64 zkYT%?l=;r$iyxtF?ogrJ9{A5yU%-4$fRcr=A0g!stc<5tSbv}wm8p#uvTuYQQ0hav6_BN6_s*Ux+{^uw2)l`9l-?^`*z;J7;7-$;? 
za~}R_PAOw+K#bHCnLa%|od#DV!C3=1_A9Y5u;Kh?YSo>kOZ*F=Tlw~5qIm31 z@aoIQ;58@d^cVFxzyQ`V4Ftqi`k&W7et`5O&_%LWt|2dmdsBC!0Nku)jQcz%U?dlS zL|>v-FB`qiujl2DJ!@xIY%yS`l-oHy@m7Vx2jofx=O9HW801U!Z~y!`?_@P`a=fpP z05Yv@ZtEm0I1;Tjdq9Y1p3b}E$fvJFT;CGzK?T2LnXt!sFAYlK??(PTu>bwZBc}Dm zxO%^`VPJ`Y-+CCO3xw{F?+t zkZyOj{ZW3Z#yMdEZEt~AJ(wO$ywxUHF?*dU<_gbHsO8qc>) zOkUr_vv6Dw0IR!cXey)sj(hoB5~|AqV}@8{i{L@)zF8@6T!^JS99Z1(bATu5AvdGR zEqISrtaycq{{gIM1Mes|PBqsZ6mw4-R$|l_8nB}9PrlwjBUp_UOuni&fgO5(b#tJ# z-KnI!xl{h8CjoWLhNWMnxMjT2H{j#ML84Q^qKM-<>T^-|NTXHv_a_U+{%`K~W-M(w zkP`Bs!A0Fxe?SAP%DgnfcDMUxN_&9CxrM`XIh~RLa}oA6L&pbpe;QqXeRxa+uMGDM zFM(?*-BFSj;zT&}8{;W@&eIx+D0^H<6u4NGdBef*5S+s8LUq*B_;FjWr4RNR>E%)msxd+ zBzNFU;ZuvNH~aQZ)IH~SrU#ExUb_*96{|7_QmJBFyp+Emr$NP?PVdg7BOsyJE<<9L z?wtE8;l9f-AfI83`ao8`%3I-helOxN`8vQd1%li)o|~DYv?|$^(Z+3W3D&r{xJJN5 z7W5(o^!wxE<3k5f-43aQB0N1eLBb!eBx<6rITjh!sTXKlgNrCS)WkO8anww8eQ$2u z*X0+5LCIp6M}bLsCnTT~+kX1c3E?2a{WSE{JO z1;}VvOkc62U4QtU{Q`wDc8FYS%@iu>(Ue@@g$rE?bH8vUf>Smm3El)Oz?Ip4^|EOk;Fg?#Eb z27V2q1Ai{&fd^a_?Fr8xJ1dZPTJTclJ%DQD2>Gym@yACNbh2}x&-H=+zcuMD9N6a!H{k1bH@s9BZ^-i+b`sQz(?5do>K$u#^VRSz};PDy&u`n zQ&)EvOgk3rqU<+4y?JUd9SI)lB=6g;%SakGAO z_F1bNf~K#HIz5016^3lZdKoJ*Xgl!f12^yY|E0K`V!~Q4M#>s^WIq2cZ?}Qwl6l{Y z%N-BRL)Fd~Lit1}2`I^f2mG3xK7MSgSXOg_;xJLe*g%lFXzjCXP_DKNzw**mf_V_| zk1o+yGnL-oDvNUBhJnV_-mlH)mo}u|d)c)Gq z(a|peAsb-_)G2$cr`IpSlzh}*MCm_P>8w}!029gX9c?9B18Z%~d^6ZXUzl9os-BH9 zF0<;Rc(Sm#7!JtRud~z3?5MB)&u_fzN_F=s3Q=nPlk$|)&E>BPHrk1XZ~aj2y0Iq^ z;eAhD5=dCrI@<{~c0$WEpFyYjyC%`vb!RKx%*#$p&}QF^!q@@NXGEi{(QK$$xU)Qr z59%#>Saj&vEUpRKz0Pcq4Wj{_*_;ts!gbX9^kWUc>twtgU%PBjVTagJ4~4x|PT|r? zOI4!5Cgp4d;WK-IIN-C0pw;@JOg0P47?VJBcC{DjmDeg1M8=ig-eiq!F%;TIJJ|1fqYhzUQ<(uYv{%AdLYiz+7cP5I=0C@13*==82Z^ z1VA$d5BU5$HvKYVAR1B}-uiPoKt2R)VqTT4OnI+K^iG$vW!RFYz-sRx$N8qv4(m&f zCR^T;#=0pd5=`!@ZpRZ>xrG-XrYv&|r1e5U#%FmR2@CXJiBf)iT#$BYv+SBjGd4DUmiTb^gJD6c zor`Ew`(>SpqM(fp9!jY*WMvJ?%gcKZT7%)1fbY^GojH^9nQU##F=3DXu&#FN7ZA{u z>Z4X1;k>0pHM6vYF4Z9aT*_ZkHC;Fm_=zSR@o&N-B6PR6H#hs3(PF2(8COpL{Tx7g$V3?s8CtkZ} zkt3p}g-p?(UqExH`zg{ff2hv;6aRh_-an;T09toaKwn7BRI&4Bz}=(_r+4bx@7-># zO*WstH)hx|y^rU9n+ysSA`5?j=21ojl(<6W{K;_(aD{H z$@YI%Np_Sh2w~4&b@6Pf(=-$&!eBfPL&GHdE5U^tkC;8inExk*zf$-NQFtT%O9Xa_q939xmyDwkCq=l$YWz;f4eIn*YbBYON)zkF#ZzVGv7{W zIz!3HP7XB;Kz;|-#1;6J#zD;Ik#}ymt5VWnUX8lIFy_vXri-+Tt7G`gZ6rY54-%Td z@^S&dpfSC@kE6*k;{G`Eza~QQ^H>h)oO%Xy&)BC_i2;7_Ja=bWN0AB<>2~@I$1?O% z!XLUDv_p$^umW*0u6Fw!lqnl!1_V|m8+bVhpm=M7EaG^MyIcY&z>Ku}S%C|Q0L>8B zIcS?&v9%3}v0$}-efJ3Lhc^L+%L428kuqzdqy6uy)qra*lJo4Y(q?aA7A%p?tV4ImtHh%upz z!?R!Ri4=enOE;Vocy8V_xELWBFe0v(>RQzlF7{QnERRJEi87{m>gp^dU z1Um05YCId29ngL7>94wz_0X{HF|MA0LDSEl1{`BZ>j1h{`O~CqnrKkSeZ%aA)L%OP zPhZX@#KoWDU|?VbdSr3>u7Gj(#fy;S1~DAv;3LnQAR zciGtglmrC)pP^+ji0gjn1^eg5{{CGv!!z_e^*5-J$168>sU)C7y89UFv_X&Bu>+`I zYE>5x#@`b$XK80T31R6zQUWqK^der8*djHE5K@B}NySc_AcIj6mQOMtMu=edEzlc{ zfuiZx&hHzT^ke?Vt3uINc~PZKHZFpeHyB}Z!fLNSZ3Mj{@I6NdO(Kimxi1aSOW%-8 zrDE(uJMbO0HyiYvK!)%QxBGm!AH(DFN#Eh{u6(CpicMpI`06CF|+ic3oRqNp2ccAG8Y zzkr8mQ-`Vo?Jhh&t=`z{$U}`n9_o8C`Lsir@yiu&$*+$AdLDupAwW8m6^Ql`H&zuk z290$>S+2M@T{8=7Yip2zB^wmA1Mgdb_6@N|E9wO9gne90RZsa6Q{O@xv^wyO({L0K z$A6Fla5#ZTDbwo#0Vl>#K1#= zTHNzBwNOfwM?^?c12{xa?#x6=G#MQu79 zBmCcWRC=B%KHX`5ds|Cm1?ePh8Ai%gDl3tDJ-)7%AzoZs+Fw6WWGqEOvc9xbuvN3H z1+NqS^_IaLCm|g9Jy0AV(3Xvx9dIU19>VBCja50udS8k?Iow`&*CA{@oUfIO)-2(N=h69z zE3Rr;YgXLF#Z6meEC;%WGjR8;>aQ;t)q1{hsaohrAX-~m{ zg6gXibz)F+q)5Zfynu?%`ks|cE%TF_T8$hfk~+Q38Uyb8==pc19j-*LgrI36Bl-`e6ZCK{cX7Nxu`% z_fGp;TbdfK$jOat6JA#Bp(JHCBqIUo{s{3n<};-=E6&z}K4kFa2a1xnhm9-$g)i=; z2fPXKKCy;-;{Ez@-oqo6puYU#e<W~;FxB06yZ!ZIf^;eJWn#SWpPA44 
z{~7$hN)F;R@yiO0jPSS9W=oya;YV^H(f~MSB@t~0K7F9z1Y=&|64HMGRe!Src@*KU zn5#G`A!aUs0G*yyEyA&anrv7x+u$~zD`Q+xv-F0T7y zG7>%2_!GH$9MEb*eUedf;^p@5TPwT?&)T-|hNX2g+=Ys#voJsWI2c;ALaeHf!m>TZ?a08^P;Z&(6G$?)rlMy!emn`W29&pV6p{~NU-Tc3IZPL|P z9xTUKiNDgk4>L0-zbMwFx4Yf&$$o==FfvL8^&Ud~NxuI`H5ILt5$>L|Y9+l46E0rp zP3x5U{|wIcKsS=LuQoC!^RLw}4g9*tc!etJry6IUOSSylOWhF((n|1*wCF$Y!Cs68H6jNZNqU4I^-ZKdcP?Z! zQHQ6D-rCs%f8`D&L)Kt&<2^}cf|t=U!eXBB-`~aeJOEe;?=Nz$P)=Jf$4#c7wX5uV z5p&X;7oR?rU_RBnP%>Q<5MJ5gK}y(R1DE)5G}5)6G2R_sQp@r3g>DrjXV1FuqWr%_ z1MyXuSMOH-6ruO7>BBg-lp1}!+P>i1taAC4oqMpE8ANikTsKtBO= zey?==hkRGk9~|Pjlck z)Ptf0WLSs&lSaP$ozzX}!i;4wfHVB`Y) zf*XiicA;?tiquIlA+S7dL)6TW@I8gbzhZ+Glm20?AxoC@78(xsm~4dy<&)y=%TMi6 z_(}~}9}7fX06OODtE&o(amSq$f{Xjl-aYC)A3?%C7sJlT!_`Iq(=Dcn%{#=-)TA*i`{p{- z`($r)N2~Xqr8d8{VnTaAEFs&K76OgZpS%uZ!oUfT-JOI22msf7s{`mhvnq1CGZu7r zbzvhEGy(_BQp?`hUb#5A@&X-GOUpPImdu{ZG_Q@utHUEBZ|Ug?1E>-JU&r9@=XmFp zq)K^`f{;jTP0js|jx=ov z%a8IUKn3$ET^;Y8#Xx%5NP2N`N|4%WXzU@pl7x*^^OsPF1Hz`I+4SWL7Ua^kLF5=6 zy?;;~zRmYM6b=qsK;s&LEf?cov5CGxUo!MyPS{7SBL-oGctROzcmO6dMa&{G2lW zQ9AwmIlwb`=d)t5{A+Ihp0ri}AWxDc*3BCI1oraoM#psLmCVy+onqJUu46wxDSEE+ z{OzS|-Ac#NGR5PoUeH6>7+4z0Gkll)cnkqwJ_A?lDdVT&V)X<^Ld+E5X2Z{) zaMBVyf+L=8Tp0*6jXB2VGHan>pp*&a-Ue~{V9A^w{q8ZGe1KNUL<;1bI1mtX z+%x_g4VT;fy|few0RHdpG!k6oEL|-Ag96H!--NJ^a6N!#qd+p(={2};U7=Q4pC~6X za&9&=I4OsLlNwuiK4clzGd24{Pk5xtmhAD~;v3B`3JKCkxZp3uNo_p@>yCk_d8BG+Bf+V}M{?G1)e4zD0MJHQxxtjYn*kmr_GWVx$rBckL z5=UcNvCQJfMO5<+U3-tU1^1jBr1Tv@pA`-z?3XAAm>ZgeB3q}-Ly2Imlsy0bicREw zJO2YyV`1Tuck&%+{MLyfqd!Axc37$m=THr`7}t*gkAEgihUo{aQTsEG2p0(26K#uD zJiw?adxq^7QBkC|a4{q_G}ELh=m6aK)IoYbRIacMgh(^s!j78zZnhGG|0)3l7HhYQ z`ke?1+Ks&SHFjyjgTpr*Mz!MJceOUm##wYCr5AbED|vu~6{Jw#(vDeF(AUV?b;Y>BS{arnWnkeEc7ev@+Af#fG<P-s{Rpwpr*6N?4iBpjhO`}HPuRGw<_^bIJqq|&UmFck>%GrjEmY+ds^4J{nmsT zd3(-f2uut4Q)2ExBQB5X_W9ZA^zRoGuuEz**WCwW5doL%y;FowI{qMjtk|Ux-nU!* z^9l*=vvE{nZ@ErDH=k+b>gJ{$H!unNC-C?bNxTJ>#9+Y7Zalz;Gt&G`+N{vyj5=># z*ESrcM%h|?a#8I$y%dPzD!3!a^{f;z*hIUp5352*4e=p$O4`^~9cTUmDr$!LL~YfQ zmv=G|Vwrowe;8$hKx%IJsT? z>y>$NZ0X@4+in1tar#DyHYbA-Q z*Q$z@lap_Po4TDYu?XXUza)t+a?yO-PjTe`-1$;vktX28}rQ-FD?*=SjQ{(NtkH58{Af_`=?0AEO=@#$Bl z;j_g)MT@=-0EW%@tx0{G1_!G zpYVyWj=LM!{~I`NA>t`;ij%sp2sv7ESJAmT9X9UODpq#MWzr8*QBrDae|j}(E+Z3? z%Q`lf;F>T*IA;XGkRc&x=A&iS>}F#>Yc3>n8>4e@a6mr-i%PU?*ddjjXnC~!UE&7e zOW5NC?xYD))z^Crrn{4@0gYWZ?j;Cn+7XG^AepGR-`Csh+O0moVQ ziq^c_pVTJ?FfV>_9sP6~%S9fmm9mN>cEmYG5$ZjE&*%ReLij*& zUWTz<2BMMbh(BU{5hB38H92SGqROr9-p4f`oG%W_vLidL+h!iz9cq|AAYMm zqF!&`+Gvo~p@@Q&%J$Ta6Qi!)H)P~h&O`)AYXSVFGb(*0Xlh7|Z+MLfLk4?ALehCDOu zh{mv${q9PWX9)ZZPs?45g-@yS)nnr5p%iL!=elM`Dl1PnyC64SaB1SFc9~*+oV@3= z{P5i-J|Q_Sr_`134@0Fb3GkoVuhBWiJizE65CM~(3 zu=X9d-dFD!K%ew|`jfSJJaa*yaOu@$30vXZ$I%FA38f^{xCH9%B>`ybK_j0Eai8YoOq}$VHAYN)a`#-*#T=<2+RM(#6RbhP%FD}_hI@L> z5Toq8yjZT-^WSw!>sTZ7$s-1XGMrbFMzR#`N^I?{I7?DS1ok2?DX&D}#%P%H?@eF_ z(PAj9*ZC~qO@0W7;!+{4bj7>;L+FO%yb%7_Y2ERf$opo-f3SJ``Ov8j`^& zV{P_5|L7!+r}nvU{*jAHYX8TPnC;f8&I&6nEo|bU@xoNnla_v`N$yKS_Jbw%c}(e? zgT+flR)vxz5#OHU43_D(#}erB2x$kND$sPVm_|BN;kJsT-PxD7_r@*D`;f$u(>>Z? zUcBbU7(_oX{oqaPr8OP4ea@GU z=pSYf5`OdB&EI;7?3%v;nf-mw#E`b0R40#jA=A`!mCGqOH!kW>zAyFxd$4~1(VB-CzEUBP^KLRI;Y*! 
zRv*w4R}=d?R4)$C|L7_wy&r+egTma1_T_RQ$fjl*&bfm2^s-%!_v}x02GpQKATZzi zireUCNedBR;cd;K*xN7NZiXI1fcPeOhiB*JnnQirikB2}3`7}|hY!PuSZb%HOD+MC z=xoYsX)r$+h~B=%Mqc@i&2_bfjJ;XigeKEQ4^v_<_n}+*U1-q^Ki?|ce1))Qr~IsWwpM@$@FF$rtG z?(WQH%MtAgr98o8 z$R!pY_xHxtv0z%Ew!l;oU~*29PPBb__VQ_JL|xW8Z49-;=^yzB$D8=?_LVEJ0_cV3Wn^~9Y9Y3&SZdU#Eu%1h|(p{mZ#uE|}LP8sC zY-~XO$W~TX7E&k)6~F~vFq3jy9VJ=)`AHr;lIH=?$jf6TSOp79Vx(8dIQFr$bP7_h zrlw35mzJ<@I$G|psq98a<5X5xn*rT&aUd5TjN(ikU-`DA94RU&TruwK?1a4Ti?T1Z zq}IifJz5!?U-plIOA8)w1E2Gnhvk9e-@35pM!?mgiV8uXErbE(CR{|)dk+WBFSqyC zB*0i{-Nl~{p7*|TH<7j*GHrcBJ$;aAONLBa*~DX0)KCUw+B(3rWij5k?dxi|lgBL9 zEJ1gj2wLRDYpPe|20z|1yg{{?$JTX+?KZ*Mr^(Oe)0TKHD*DMz>!W%!4SU@atZscJ ztcAkVf$=e_o}}qvr*~&IIE62zWH7%e$PqnDHM>~k9kEy@e#KE&ta;mv^N~?_Q=Kec zl}n|6jVMMR<_lqgA96d&1iz|->q;yunMA+$1)j3fc;b8v368j?-@W-{-7&V9?`hqp zZOUBHn}W=ci{_h&cac( zUO$VNMDGbhg5D|#o8Cz5PSeICx#vgaCBlzRR8d8XJY%MGVvBjS2{hKyHRV1i$8+%h zHA9&Kf!k{e<+#;o8JnobS~PT!O+JScJQs4LPESvliG1=AHCoVR8b=~{q4h@sZ|l82A)^WLhM)8?;Bo}QkyEzuJ#NNZT8f?K~#f$|j`Z4XpW z5lE@#(JG;eHbyoa>4Tm9|z^w33oW9p}*cjNlzmK_lF7b4bM>w|Y&1 zVPs^auAzYy@0*=+8JurIJ;uj22-|ajloF3|fU0y8OeeENmh)ex8L$)@ z)}Vu~p~I%mY(rR0y#K&ia5 z)0$nR!t*wmj+IP2C+Y{^XH>kmHm~_by|86_mTg#P)~zcPRC;eMixhn;2on_ie|UTA zxGcA|T~raJ1f)wqkZuG)q~oPKEgGbx1VkE??w0OWK@cRRBqXFuK#&fRmX3YDI_Fw* zef!(zoPEyzoj)he_lYsaGsYcPoS29EBum)M*wa3XD0@*_jOqUb=i*NnXPXCnxThDV zNLMWbODRJQtW!QVT_=G95~j1@4k~mgeL%!N8`v4Nepm~3$hxd7h9utf;v(@fc*^TF zc(T(@TI>wf`I%IemU7GHX=HjbX<;KiF0lb6c6q2UrrYMv_evG_k zTqR#NTO#7@*ycW$uCt|rnN63bGhC0;v9mRQsq)2Xn`oW-!AlC7t%IM7R^3kwBZ~Z! zsEatlmntn-5Q%PRxw8%=)=x-HE?VEFSB*V#+MpY99`f~1iXLOrq@AR`lGD6EBIagp*@5uP`?4aaT2yREue2tMVm<27Z}OEaX`nhz=d9pO>OfIdudiw4U*i zsa4)+KFz1hq}35xaJPLZBQyUb-+c|OxZSw>!!?@O(@~RYsm~8(s|N2}Y99PflH{+Q zZt|XMKwDZ-ong}1Z+1Qv)#UNxAY)cfu7HK}!S9OW6#ilP+5Kbp6>`4KG$ZHF0EH;x z!h%`8`(iBTMACW;}TOpzwyx+^Pd89k$tE<#kO z&r~UBT@J7h0^ri#An67gHwKxg)d<_lkK=vn#;BWKMo-f7@nl^`=Mmx!5OIf2=!hS35nc#00v<=GFYsNgDdy@G3Fb0HgKwSNuf!u-fQH)yYTU4 zu|-y#LJ{Z3EpnBc_il?hiue3_Ldxsy z6TQ5G!*guyg&%%azYknkF|aez4J#=raT_#xJ6}1dHx3q{Xr(5CV@>}im7xG5gtRG~ zTq+R|Lo>s+>CVmNGeD2rG|zIGCoOy{r|As7hKcdM>uddfH@;7+tLM{EJ$`z8&@bng zcKb>w418tcy6;aM(!B~e?-t88>!xizrTo|Ad_RTP^_pb$^*e4~_+n;^vH{f~*7$%nNEKuU0AHR(_tLVfjpes!*3QN0^5?22GNlsW%Tav_Q2Sb#9Tvzof@(?qod`?J|fPCt@ z10g2{)h2Yv5)H6=2DR=W|Jg?Fx!6B`1Lt!j{J&4D3WRdY5D=hzeZWb@j(kMq6Kt+k z(GxJ?!0$p{((X6?8LGd3J*DF9C#X-Om=TLiU=}WaAB5rD?1$T(n-?5PDT*;5*Ms4| z-x~QtpJ?QD=gD@+5*pp$cS~5l@W)^YxHICuMOuCqWJbjWte@tL{`qE1DzFY-4`U4mE;tTjE_d(|2YH;@6r_WRi+_{Ugl zo>@`4TfQ!bviG3f zp#9S|lgAe|lE-EU2-r<79t;ZN>kE=e-d~w3UWVQs5^!k@wd}*RDmbM<`(Xp?_$qF2qN#owc8a~v*wpfY!+%1n+C83u~ zd$*5twD?;SlxL$NBUPojx8Eg-%^+u9NYJJO9_}lxDzWeJ={qU58sZy`A*2&MOa`U zNS+`{(f6Sdbc0Rkx(%fu@pCFDm$w3$0c$dxg6Rs38PIBlF#kE%KHEweExxty-}8Lf z81ql2F}@b_oP@l{QdLVmN}LC)lV#D|V8zGhG@yB6qSl>Y=_|L75RAnRJ*;Xm1WiUk z^Ebx3WM0QIG|cP-C3EU{DUmGkIqze*>)+Z0=jthE6fmH6mb)$os>Jt&&e?W@#?24 zG;r8ijUO2yx= z|H?Ej(288I;(&$ruN5PPfzQM}Q*W_Pb8lSeQ_S~jReyZ^f#}GnIOf#g;4;i|t=pMz za_IC*3sXB^Geqd`=&mPWFi{~#vXViF7DiIu-yZUYY| z0}G2F&`KeL0tj*Y>B$M#(|OsHJN76LuXFCLe+z+*bcivWem?p6j0M7{)MbyqkjwhI zL6Xc3y5>VgS_}I^2-(QRhYv`EDLgS+(idS(@ud z7FYk5p~Qosv>7uM8NsndQ+V0_ob&|kB)cS#*|$Z6aGHJBDrjTRUrDX242zaPq-mZK zT;p)|@sn7byHE*ZkjdU75-K8?oK_VvJip`e^OLPxcLk<9K9oH1B_6qw;P6%d^D~wm zQ3vmr_5PFI&o-uBQt%feO>^Axu{)ljp`k_TNmI=z4d&4|3Tje z#)uKXo4I&+K7lTh*wYH*033<2*M)^aLmsj947R?eO$x5ALUo;-bH)(kY3;VYsDj7-H?GteasSezEQrRRtpQBKbh2 zo@=7|UPJ-!d$VUKlYHOs_h#lcE+?nov>w{G86=0Gnlu}yzdo>QK;%o z;J#F0S)F0lOI3YDJEPpc1K5-V9<%T}()}wQN4t0u4f+k91K3(Sn9jkTfXxJ1eibBG z9B-u_enT@U*Qb&5F$cD=QvHaC0d5GVxCW<-ve0+3KD-Kjx)l3qv;I)MUi=b>u3z*Z 
zaf+!G(J}e=X9wpG845keq$zf7UQk~5S3>Qo!GJChEy;p{F_ZzC0QA$6Js}`^J)Hcu zyp*Y0=9gyZU3T5?Qoh};&Gy!!HKsk zBsJ94iNdf(hH{BV_>iIVPod!N{Y>6hERke=H;@Nm3SuUg0Z>8IuM23OYt)v7ycPqf zNccMxVO!BW@VDtc%Y>2!!PDS?Vb}>2w!xOs==$!(BM(16x1dK$nBSPOoc;8usIDWw zNf68M6I^(#|6J=C&i=lVuDX)-$?%c@1R07Ya8@=YXhV%n^yQB1#qwB+jg zsxh9y3t1So)T4IWRa-`ey$zT?<`s&GeVU zE5k{5JNea$TmKoNsnVi$*4gE^zXP?UFxWsFF3HNtVeB@=n4CZf#(^?l_PvV8Au51R zPMMlhZ7@}Hk}50r-AJPUS3{@N`_~c4%Z(jb>SC#fde@R=dEPv9waqqrSkyOYpgi(k zexQ?UlCrKYB`60N-5dhT$G}Cmy_0Yt5?%tKE1&9W0q6wXbQsi(UV$^fFE7tumVn20 zN~q8-KRq9crtjWysUDGyi^aXbW)cI)hHjlcV3FdUpGz76=kjz6>hO#zu0c%B!RGml z7k5JJ7%f!h$hiz|!2flaX|ZE}8CgHAzsy|3#cAnAz7S@-Z}()FjN3rc;Zo@IZS(+= zbH7?IvyA{muG&(K9mUrxUB^$PpP1A=f7?E!Y-gdNTOYJoKt!4t{+Q>abcxZbynx}@ zVjji$==@6Bd28l$o`H#rHv1EjTN9V1a^?b@nyQX(kp+CS=Cal&t>GQ{)S@ii>YBE? z&e{JuQRK1nYt1*Oi$B?fFOilDptjqz87k(>C-k>gN%gj;{yPDom&s`HbHftU4Ye`# z>+-a}cdpMx{Op++0h-6j4lQ|Yalq`vQQn=Wq9}3}iGH0k$`OwT9~;10!jD=Pz{2?T z<>7}6(-7j>sMQodKEiC7H_Yk^_wV0dkG%yX4&mjHo>~AK_8E^w!yi14-+qoN%FmAg z7G4N1KY$hQK+QZ*BqSuFJ*GWk3j}g#8*1<&q!?sSCBGg#20o4N0>gVdJMsIE0es{g z7RF9X+bmwK5BzSryb(aSN4?g94w<`Z9^Y)3a|9LX9lMMJJFk-i`%*meU&D1S4*jnZ z69x3@Hi?awwznW`JU{3@3`K^XI2RE_>$#GbBp$^O0CDO zT>k4rsIQs{$enIbVjeu`ay9>2i^oA-=hx*m*+O#qAj2ma3#VK3;`DJz4ngw5Z5-n8 zylo|3nYZOv4L*G}-?hk#kgw^yY;#_Jg`R#=#>VYsh)Rr|;|pw-73R6IpQuA0hQ-() zI#stfEIqwG8^gmrzq5AQW)hc$zJEX9W`CYP@q^)mP3&Kg}cRYtv|iXg;bor!q)TgsHV4 zod=}q)m#;ml3ahItKe|uxmpDfeVDWdBLORR6Z424CUxl+IEOXS*7?Eq;jvX zLmohYpH3&ZV2q7}b0s%AGn&6yxS@t_5@=$~e5~0YVh6$4_3e#|DrG5D{Q10hb`i$; zoHZB4_(=jU{z{IGMFlCwgTuq1qg|RbPeXiseAwZxT)$%{781T%S8Q-yQy96 zklCHi@htH`YN2`;nP=cQi?9+zLK?*}U7@OJs5i97+u6OVc(UD!FXh)Fi*`CUpvP5u zyDdxa2e-wv9jXX+(|V$pY=n;0ny-D@rG?~qnKxSM%^!aRkE-fuHBY6#!4yvJUAmkR zTuuJr>W9yj5n1TYx%IuJ#)0%l9}evFPN$U-#$QX(QQs`YiuviO+WejxRaF$Q3Y?W2 zgb0e*#aF=3?ZJa2S#Ar%sPl8$$OKdFMfQRkjS@I?(gpic9YP@SzgFrO^&Ys}d zq|5Bv;R_zk!W$=;cm1l7iuyBF(DlJD$`_&-C!ynR}1tFWod)F=3dd+ZoE*WqUT~Zf{e@K0Y*r zBDeT>xcobHy#KK$bAe5ExV0oZw1xe(yb#8}3T1uUOfvE+?mi2B_o6xzb4}*R;4!X7 z=nr!ySKC|Ps4ITn?#rrCE(YE3^YEsc!6@`##ef*UyTrSEh=~W|<^sn=io2Sd?a!_H zJ)6_GQtAjIWE5B$c=E$TQ0!OLo=v&3L{X&lIo34r{=y1tcpJjUAtWbZY^GFItJQ25 zac%OKf`6h9OSs~@rogu}2TLx|urksTb-0uC+Q+0ehqr97vBG6<%!{3h2iKQ|f#%Z7339`p zDD*X|#~A0Ej*7yh3~#;6%ZbKomm58Fn8K$*Uls-9U-2-c8|YR&O8237VZUUGIi%^6 zS=^TJ^|s)tqUWhbxqB_WPxYOi6{8m`DIfAw8#pHek>w=SYbb1zN?3}Dd z+vx*89(s3d!ZH96mC88-y?QBE(#~4>h60cSv~&1@G;Jca_#x0$|0a~|C9!tfZJJ^2|0a~oIACaelrSPl1?6wHg5F}OK8DwNNa<-fE{ z^^d&c)^7C`&O{IE@gAk4@NIi6rlLSrLoRP#dE;8#m!Wr_8P&^@+L`t017+{5Vu`Mn z$Mzd9SlxM8Ft?0q6*{(eC4Yn7U`du$b3m)<`$?dt)RbLY+-^nBT` z{6Idy*iNFp%OJ-bqf<`}F65|N(Ghf`0_}?K1cIxKHbl zGPILp`TX~ijv&ow(bI9H!ZgTxLM_l99T|CJATY)%HV1} zU8I7xzD8XO*VvXHkuADfR-Lin{J=fM%5A_pD-fSr8AyKAnSKU{6PjT;1A?IxKMq0} z85x=ui_8VQ^>P7LUDVhbe@sNdbiK#JN9N^nWT!@|Qo+J zRfYd5ouB&b9YLMeV4^Oh6<$B*>8%G39?XIPp8AaPua#NGMCGWpSb}&@7v|1Wk6WoW zcl!h;Dy#wkLvdI_ReYZfjtdej>2+lB_LdkFZ^`b_Q5jW3kyW>TfgKrmAJ=7{IDEB8 z8F!>2j#KKlHFPeto4&Q@EQKnQJt4(kh0{6ZUi}X7pMZrYWY^33F?wGNzB{+Po_0C^ z`@kTbW`$Hfb3S4i!u>)IZr-9Bzi zSc|>o;&)6^QC|AYyZX+DNmg~E1P zg!_k3;Ap;)Ik-V1pz#@0xxAjKXoOmt1p*$&Zdeeb_oasYkv%QPnNh&;Mdpp4=|>sK z^w~lZo6}E@Q5I@A*PL2^{s}fyFH{Rg*UoPPyPJ*n8}k<)qUZ&}AvZ9qi^2!bAK9E1 z5fHDli`@`pP^i)n+k|@LJL`#Oq!bi19309y0Z8>>&_AYs5eD|Ez96^xy>{d1hArxI z?`>jIrDX2=$qGW9iaH1_Zr1M!PRXyAp)%OZY`Q>O9*z*k}e`qFu{-lhIiVAOR z6hkVTA!|^PXR+C-PrpV`@Z&^9McvS|uvnSp&;qkNahElV7%k3{7+Ol)g{-A*dt@?H zRd@IVnG7KdvbpVPz1^sR$nIY8%ip@um4Lhh%)ER*Y2`a3R&OitvQ8N+KzbC4t4;6$?$dd{py3KSoLHGs4>#y< zvnRGMQ)V7)SkQW|SfUtH61mrNT_DLDpRKXoc02Z>O7YJdJl(XS=DF(8%TS7_4zEHs zs9?C}8N`mc)pu`Wm~IEo<66AekjGQ%i9qH3Il;tCmnp{kJY)1BCWkjBGbhq^Wba;H{9 
zOd$p{Ifu^s+nCPd+0jBQM8P)s=@$BmomKW5Q^NM4DI6d3JD4XsEbwK5J@X~-j-d}8 zhw1z@3HX5o?VxeTkabWvow9RtkC)Lk$=g@xveGL@W}+FrK`D*JI?S7sU%IxQ! zx)Q#~8pq)Bcttp#;I00)Z#Kp^-_>y`uth-kc{KWd%FDYhtFgRhHS@pLYQtmqs=Z0w84=nCUSyMZrfV?X|?gEZrgFdLvA| zKdWAxHc+`O^<|h&XEnX5SV$gov+TfMkB!>-Ux8$3Ddql;VPvv7{I(VSi=Eh|yp>Ewa+~4q4iRGLnLyo9F3bDz%(*lIUegjXeCPx%Br5<*UrDO!{p% zbjd%h&$3lgVz=k+^AmMg41GkR7L=Z%(ulf^Z0Ns2g*W`48v1|gclWau73A7|8FnJ0 z6Ig_wm={G{CSeeli>HsbEgE=mXq_alwm2nTcrb~epty2h?4c3{VXo&lL5yFNw=*;I z==iYW(Dwr_7TO)Q;(ikuZfz?4YQ%3QPj-|zF!)4(#4oVwYSoy9!OH%k`8|Z>P%hJ9 zXSRn}+4%xX`#$E(3-tZ;XTHIYw&)TkM*8u}lD12!tEeKlCnS6W_w{2u2xx3+W|Zfy zqN}{rHz$AK5&flE_8}R!T%wWjV2m?9`1H_+zddNW9)GU#lE=c9lrJq@+SrUe>Y3Tr z7So&EU_$)IuKXIIo9;0sc3uR@K6q5NzI{5{SCTY`@$t^Rz~?35Cr#g|dK-V(F87AH z@3nRI*H#M?4ZX>{-3K@KW4h-iC=9UvmS`e~ zNY=Q0J&NecKxCoaO znp08vO$RapU0DROJDIV2-pzcP87a-awo53k+`msZ7xJ)5qy9OjUY;X&Y?1$>vVHzIRlJ5;KcQm88wkk#;KP!FSjk-;40~aZXmZ8I1TslQM$g}x+nn0Go$ z>-V2K#gntuiLV|hH>dW<8E0ya0N=ZpesZPlLPhRXs#x}HXir0>r6EDa9?O~hl68*! zeMJ3X9@f#`Db46TaKhjM*l}qgo2fwNqmA913#J>NU1Gf3;(4BmES;F|jDhiaP<7wn z6-Si;7=yLlYF>xH!<$t0F+t^r+TWoBKt_e@m2lTY*)d=OPH0a!V zva;ULg{5)!RG*iLqC{MZac~l7c*Sga=Jj%TR*2-6)__D^&M~JA)g(1?1Ioz4yHUu2 zx8$;PR&PucrZII0d9)G(h$FUkJISjX)i!$ z-xSsj5`TMmpm3($q4V$=yN=Fi7VpvrSQLMKElBthbP53IMmqBLA**_ms(b>q*LB~X z)P~p+d*%_N8NJ}m!y!d)0JSXrK*CpoCB7R&9N-&>MNmsup7%EvzyPHh=9v^=KVW_*lELuh0^?!Y_$jUK15EC&>WN-zQ zP!gsryvOB7j%l|D3Nldd5u!2U%`R=9dQZm)sox zdZFL=6F~9Q8ofn}$U{M5VlI-T=69pAFu~=C9A{j{F(u)jo7pboLhAN!2uKF>$7b}x zNnYh^^TE4vQv7JihE`|BGBVFgrP+tYg@ov+E!#hYh4woG`E%17)NtW#``HJI87;^k za+gnX!y92rBLOVz=(zdK(9F`CiBDnd{^!qBF_HB~6h{$yg3oeD7!fX9PLmMX?(CG` z5h99%c|+)uQIY%^oxi5bg%%1tLT?#*a}|-V#_%c(?F*ncXplEX<+W-4#~1MLS3}O4 zJ97RMUSh=Ff~U=`yWs8$Zy`bs|3Qs?J3SENGvKjK+2es5xUKaUPV(qzDHK|$Wu zz-02@zxjPvLL}Y`<>uTK#f<03$I(lh1PB{SFd)urXN$2T-Wb7tpvOIuXa-1~zg8UF zRJ8^<3mje|f&?($_2}?V1x%=fx>aV6bLX_bKduX|4=s=qEnq@4$x5OXu^e#?ouDec zX7dd$dyKfxk!B`<#IZf$SWV91oy3-F{MPrpPEHZ=+=zUaS5r++gXlJkMRs9@$o1XZ zos09&;XC@*t1<;3Z~Y0ez#PX8Bh@wHx@C?;bO~KV5U-aqUmSC;kMYud_Bi(4o}eJfLCYR(hq}vrHOC>l=>#EQv}W zPYkirn^8vNc^}(1WRKPD_#?jK_IGieC%<h1Jyuv5T7_N=+|VF9xb~ zc}L~35LXn1j+I~SuAYw=i@n-k`5~z+(5`ZL&>O6SuQv9pVmWnBO);?z+^9l+EdLr7 zu}tJjCgiheK@CRY5oe7%gaL<{F@_4gf3wG?jdKqaq!=6Gn+G*-i`{-7{%+(LwO
zzDCdv7z}O9(0fg6+8OSa2~L)BZA}|P@h!B-qYrAdd7n{9(?rff8!Blc9>PI*ZY+9K zZYn=yj8xb06vwT}B;iYMGH=SR$KB`FtNsLrh1O+1Tvxf>&rk2ZWw=q3$v)-6A0>%m zXN*Ufz&@lHcy+=hrH9ty-WQKHHh)VY+i>0C8eK-jUDjgesR|=Wep%Nh-U00Y7E3r- zdlPP$UcPb#kCT%VWskdH5r5rAgi1p#N7Q-a0t%c^U0~{B4Xhg}HmI*A*&;!_GpGf( z-Srg}wP))2YOF=NRrqYmDJpFX7ako>0ieO#ID!T~fZULR6tdK0dlLKbc!0uFf#~Mm zp3aXcgNmozd+R^)j^gYTz5xJEvc_YCpCWNC1nqKg8W>3TqD*%doZ>N*a*VnecmoR- z1t~?AHNgQZB+83rT3JY8@70LR@FDj*SIk{BmLPG(KBu1MPp^#oMI zBvP0K>XyEBNKRL--iq9e^kiQTn(T*XhUF$$;M`W2Y0ZKa-jLaCB8sY8=&h=eBdG&( zujNQF@~}(Q-!(6M`-2t)puJ%p6LM#{(SNaK}3sx^ClwU#_~jlFUeu_k^z(Ed=gj3 zeN+rnHHL*Sq?+WZ$c92yvCdy~OPGci{Ut&4DC9}J=xQ`_p>coa{`Mm8Y`D-7br0F@ zb9%`3tMqA$7*P|qO>-dye#4$A{b9?goFt3d7y2~+z;~=GAHPw^)%LDtFm-vjyI;k@ z?k-Fs>7Plzd;j3jv9}=sPH!&V9rF3nZlIdDBiEZ*6r1+^{QMpun!rbrb%&o&AlhQj zexWD7Au4*V9DZ=#?iW(cQ>L4FFN5l+sK**SFvu}jOR?95vftF1`ds#;WU9u)B^WvN(}hDQW!^GOw~qZ*}ed{ z5{2uskK| z`QBvRN8v2HESZmD&mtoJspVaagt!!qha)fB2vxnaY!udo8ro~pAvT03KL>q2pNrLp zfw8By)i-%n{M?jVJkEVEw5#lMdWx_AGVnZJ6Y5J6AV75l14-6_3>kAx>2+daVn{^% z@>>uwbevU{A#@53TZihqpv_bepjtoAPUWPe{93<$CEeV_v$eD9VDLK0$edlc%<|WL z+lX5BC&GdEH^BgEcl1!Z`(irC8RHEdbptd$e_C`3Y@I#`%4%DY9F5p@eoo#W?)ZLX z5>g4h0eZ6{`o;*I1X|?exEoZ+C+Zy7SQf%ULqkb^&CJZy(I;`#jk69OA8vcKe*1>+ z={Y@ItT)aTn2Y>YMRRs`R#sW5atb>;8WM?g2EaA8C?2|Qd!Eg8goiKe&55ExeNpi5 zYUAHE$Db%{_$15)i=+!0kM6v<4D~tFzZkc^r?pu^8!vV4v|Y>NK%8Yq8^J$k2O@Z} zm@us9^ghe~4ivRG@4&1uZN?BjS3_NKMe!3Fd>VThb@sl6QjEqkMRl5H8AtinWwcjV_{FDTY4PRc8!Ox8`uWOMd{VU8D zM^_hN;s4YkQO^Q{Q6;lwx>13{a5)D16K1Di!quIChKBxo=p9Oc)7OrtRj{F%!h5uN zP}pC$!URK*Ewl$mNdtYRh~R&wrQD|rxUmVgyjlX}P3vep%BlEzQco`LabLJsl1zO; zM%f1?8r;%2`+Fi9a)f5oSjS!MDg%H`F}*Ps%g^-Z#r!E`h@rDGO-WZfQKJb@h|E=C zCzJatnY%IFUKHPRj|v!Cefm*0fSJjM1l6#4FWuz|EP($$eku`opv=u^!XEz`rVj}t zG(AsOJmt7ei1pk0-K>@uoxXKpK3I<%wl7wQ5tE%!MsV%Pw5_o*s61klawaCZ{@+eB wz%&Hl+2#bKQi9eos-N-S?UoSfN%)-OSZt=OR*&rkU4TFG(#ldL_a49gUmq=DVgLXD diff --git a/docs/_static/ms_bot_framework/04_bot_settings.png b/docs/_static/ms_bot_framework/04_bot_settings.png deleted file mode 100644 index 68bdba9de486a5bcfe7bf2dfa479039e619fba3a..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001 literal 164448 zcmaHTbzGF+_BCCKD2PadG)N0bx0KXSA|+CiLwBc2cS=ZicY}0FcSuSM-Sr;s?{`1% zd;NOf|3+tco@e&iXYaMwT4%l}$Vp&g5MaQ;!C^{Cib3Gukmumw5EIc*!2d|qh<=5G zBZHF?6IOQA-kC?XRaUDhrmp$vN*vGhKb5vDp9Xd3>E_UTuR2Yw=@&*&5|GHY;Y zEjs2ih3H@kng~zGEF9btd5`4VeEZ*0vqd@-cWe!6X)Jugx8 zM0lDChfFI3kD>6-mtnN`+n)^6w=Z3!8kM_+u|RW2G2q=A7u2!VX7yFwvT4=_ggdkW^;aFdW7y+S5dEi zY14g6&PVV_Vt5=;QYQl?`Q={lW>|~Z+3uwk;*?QOu&1aE_IzwD&!>?L^dg<+qT9H1 z4%3N;c+A>Ac4w<`=D!67J`lLtVfXd*U25}tuzz~m37fCSnJ>0w`7})SHbsD#l{Kak z%XM#|vDES4BN8fVVu#naKNnr7_W_Hg`~~+CTEW{??Z(UQ^)~q zD>>d!KFecz?^GGyV-l$Jl7-&J=OA337yFxSNL?Rx_#^|0ii*%MF~yaYmA!8Wtv$IB z&p@?ogGNM5Z*%3EHaTZxH@ZSv1WwKQ5Tw>ciIW0DBCe(HW) z(Y4wWiF&d3zsoa?Y6}j>EGo-Tj7Ker9-J*qGIHm(>YuMh2@!sV zqnw4uAyR2pNxelyPf8rSA!M^vTJH#R$K>yv^Qxc3dvM^ot00DY@tasAC>Yc1xFE=$ zP7^1DJFwkj$hL7K{%H-bk+(zF;=ZGM7ZNwbnMB`n{o&m9L}9+Su{tZ%Y#Qy_1tzQLOfAU{z{N;mz7DBLvPe|WI5yF_27CUksxPi+05)MRTcm zEWzu?D*37^f*v=Q9Gn(&pUtMr(!h>BUkMc)+s>4U^$!R@wcnk=-ENYOdHKcpWGmM+ zNzg-Ei_3JPV5;7s7~_OWawwiXo1RrEM}d5GC`G5xTc^?_`u!8_?_%iZmzTfxm)o5Y zUC`&&jI4=kCy9COmpX$9O&(U7jEjKn5YD8TsymeAi!lK#adhzX`lZ$So zt3F@`p?~5{t z`%^`D9x6?@{UlnCii#IqhTe}4zrfh% zW|G)f;sH;qfQ+1eF8kx5iHC{|!;L&cetR*bORAZP^xQbsN3V>%cHu~B#hf%;y!q^ETG%tmu)q=N8M2r3Sa(8 zX`^#6=3qh}ajRqUx?4t`Y0(<>2nuGjc0Zv>Hh=E*d87J6O5tNZjUct;>6sgB=n_n| z6o-Uyvq1IAC~tz8RaY3T&K++0y_~X5fvWZF47}oHpY@M#FGV+~ES5szJ~hnU;;fQ~ zsC7;6&+;NZkuPpv*hW0jK~4GIt=Wi|T)6WfbbaMu{aOdD(FOyb-WHPSEn-13fdV@i 
zh*nCrF0na!)8aK9Mz9-CZ|l2vx$L|MUC?@#d*)YQ8GBYnZ8`MQ^qceE8x%hOtL*dz zr1i2EsLB^B8YcEYSKpE4D+u`>5f<6v5TV2dkZ9JMqzw`(VP2ps6 zZ38<*Ek{9k{KH~}G0rMa*+-2w)89~#mh=iCvy_$|*XN~M#f}HRN)a5;W7fQc*8Ags z7HRQb-(KyRtkv*64YMLBYrFaJ1t&~!M55YaUO7yw;2rCUr|x{69aJ!Own~LzhU%Eh zz_0$8Jk1?!AZ7LEx#G-rD+UER?iZODJwD#vW9`UJLi6e^MK~5JTlvA38#j(Qs&iL6L&i4JBsHbqnyD|69!z>KP8`r@xW29?evBV-OqJNm zk*$AxPn&l0ca}@2^uBJf*9MZ^2{sqIz9CLkD ziVQVWtIC8QP%g+tf5&ty=dyoaXmE9L%YL{&&kFLioLnz{VqfqTURt&MHME~SQP^A8 zINtPBxMq)7?0nt>x^(NPYa|xXFAR zQ=`Q3hF0=XT4V3ANnDZxIC~b&R_; zpPRmPt$Qi5KcMGm=ZyQ^=NcyoYXIJVP-d=$|GZP!caATsuIF85e%*Sa4dFN!n&zRJ zx7$R+R^#C(3zye;c_;GtbI7DoYf+YKN9SbwqGTc5s^)-h(CmJ?CM3+MdTuus7&v3 zUM%YdizO!bTBOMYPTNg|xAAP^=Bw8SU8MOfnyT(HLhft5xUM%Rlkcj`@XL&bLR>En zb5(vN+*1CKOA%8VobZflJ4C^ft9tn4JK6+?yDBOQ}jD8lh#o! zWqw~F*~mm?=6t%73vP0|*4owNc2$vofqA>O?)7J_4$}vdNMuk9akdJ)p6J;ac8!tw z1rEZUHZFr@Y|Fvg1p16hD<|tuX#4s@5;VR`PPpEm4(ud@W?u^!X?~qd+vPQUWgQq+ zP#ev*_q*~d%%l8Vpp8{mf{EW-p-S%2lIGAZ;4*Y=lIddWh^B2p9FTNVz1&)y()nmE zWl_X;kzYS=O1&P!V>2&|=HjSG__?q56Kk01G>^eBLHE>xgWJFq|Lj}gr2~UJd1LHe z{{_J_xjTCjr)6zucfHqD9dvoY^7ZxUfY@qay@{RsDN|qtmBCK=xg;GZdRHm1&IeO_SK0C7&;#sQcRV5bJoXM%mWFW?Yn$41jBguDd zeu*m8$gwT!sTp#YP+Z44=P>%SL)QU?%~QNZO(}FTdG+1qGdz9sLqrnh-&wsQoA+6# zo+#fknmJ#Y{lhKg`O_Ms0o1!2=(d%N6e&Q4FZNfzI&LuYu+8PP%*Q!}vHoDBOGme~ zLp%g?Pd>}};5S}cwW6fU;)1A{5f;dyxO6=cul+;$6oK*64H&FQy7Cj}Rn)EX84Tcr zi-(amR75c>I^Nc9yKY|f1fy!HFI8byCtKrNx}HH2 zT$ZtH++}QwW4N%oGwL%+EacUNwOP1BOwXEu;g&|(M9!HbFj5qYxUh!LoXNy8sm%Aj zDYSLA4G?C@=*x)*H(uWG!>lL*6hz!P4=!zW@d4;H=8k0 zrB;K!J6d=vJ!Qqp_=H0-g1LtztZBt6(-He!YGF*Gsw#`b_95LrO@p0ZljUEu!&2I} z%eQq`;z&R-_%*%d_4&(JPx)+aN0w@0vk!&}u(KN?LL9C|Mf_LwwGH*5OGx)?Qqu5HeJiXG=U}hqdc)>NcFKgVW_f!ZrK=V z-qv?itp(%_k(8E3RZ?@%!*kh5dTu2A$u+inpc{J=v(f9U%dI9<%>}{>X;BC_^F{6du6s8=6?=L4`Ab`VB*T(s4t?3!<_)JM>e7N^2K^b1t(~Aq5$R{# znl3Ii?3sG^F}Oiw`dlcFr1|Z60=ZXx)7clBR(b`jZ?^>b)uy@2ObjzAFSpYcjt*HU zSDV?xP#+x(q~z5vuE_HxjlXP_7tq@f<_Xe(ype8{q{GhjVM?nlxC8^`YihZqoFeA@ zp3JHmBv_fWhKOD+DN~~N*6G>SID?9wCW65{_a+eOpIHL&?j>zek_*IhN!ZFcY@- zKF|0O_-->qYcLjyBKiBdaUH|l38;}gI?@B(uHB7&qG+`==yRD5gZl%Kj&82bx`qV1 z?sz}!V;&0q$?Zs1z#*QA;glg_P|A^2BEiBi`Atuh5>2J|iEMAkw23voTgWM$UAi&M z8Q@mB+(4J3Rw#am`LB%E4mjGJn$NGECqz|*S@y;~a~M@d*V6hr;lG#A+Wqd9U7vPb za~vMa#aXG)Fg9#D^MiqygfIF2sx*?$G$vtefA@@|FFhB<8G)s3S{2e)GotOd^4BkJ zNLwf98Zo>$60=NHYx*2B4c)T75|3RJGw-%;M|x~Fx)->VJ3#VFGn1ri?kH3Q1y|Rf zRQ@g~DovumdRR6_{BYLR{xgP?Uu=C=4kqU=+lHq^1$TSQv2nW(xKF@<=|Jp zaP~#d`P`M2RI$~|@ls;Qp)R&%zEV^2vNAdh&FNFY^1q=uVyaE+Z+>@JLzKqip28 zbeyCmDL#I#h$|gDRcpqM(rQvxbd z<Xp=(8W z@h+W8$x9O3%LYDlESMjDMc(Y|o*d;yV{5MDo*dy-AElWK@JfL$DUYpWw62tyOm7no zZ&e1*VpkVxD5;=o{>d64Dg;HsWT*U3C@i$~^utmz&xT*Rv?yp9ql5aq`gdr_Dej?h zo3tvwI-q+G%L^>-)?TXDRbsu?t552q$&v{Z6t{@M%lzBZ=1P)HBc4r+7Uj(9@aBA?|G1<;8)l6l zg{o%mc9mXoJ9GJZCd&>{{)OesFqvTH|(A)#?^^bt{YSljbv!z`<~N z{5w#y*1D~=72e+7{vF_#CW>{SpYXP7W-3kf?#b>!I<+DqCY`3B)YQAiYNbOr<1x&G z4}V#smq0nF7ndVMHJ3$@dQvm1f9z>`5@OkXlqRgqx=fD7eSs%3w1@cdJ0=fJnvbii z>-}sV78d6Fh~sK#fcP~bkC6J$g&!+B{uzTI6OQ&K9|$}VWtJ{BDbqkN5E`V zS-rlu>*& zo$yUcVuBot!;;x%BUOi;EPDL_|0D1epVddRWW%|j+LA2EQ-+_GlBc%!UH=ju+I<0d zupK?_w`2d4nl3m?TIFX&Dzj}997^B!Xm?JV(F7VWSc`QM<|YA1QTY;TvoV+}K>0Wt zn?c9*0J#@H;ogkbyu5ln5mYnSFRhcJp8sBGM77=+%p3&hnArX1L`}2CQuxO0Xno+* zWYIh?!v^5A0}+rhKAi4M?~SpwRd?ugB|JfejegUS57ouQmfHq8_g?x9-s_P+q`Ie71S*h6FT77&P?)8$6vRu7)N6aq=3SYxT%|BY9zk(%1& zbVs=GePtN0g9V@KIa(x*yh!QCj^T6(tf`rqJe!P%p;5}=S24KLWgoRwd%!*CQ!Qr{ zO!1W(oll0#3}gxeT^V!%g8~3rRsw}m*N1MxZ1_=DFae z$AJ5SIcLsp5RUO>Sk@Wqx5E91D>$xeJel>wtlfvhO9q@>_EuP&-*(09K z0OGX)ZNBlpAHgTL#&UhSLP_&yW=G2Od%h!VgMlnROv~FCJ%vhW+}ye!FutBN^2rT2 
zHFtn_nrd|6qWnBw(HqT>k10p>Io6%(12MmI#pTZMD|DUA)h}s;C7oIZjKZ#SE%Wp9 zhyY4JG(^NncwTDNDf`_dykF7Q)}CRb%_?AuwR-;XW-V5yt&4`2S~lS`pmObIA;oSZ z8B$wEN+J%%SwXE~TtShMpJ&q?FPNCg%LBzbzfS9Hn(Crhc-RPVKc-$GKhg!=d zV9dK{bUJ!Q{q}iiON-EW3evvEof`(IP)Z)LsIS=kS@k-XdZVka!fNX4^;WyXyHf-O z6J8mA*WR8k&!$hWa6U2Fz2bfOk|-^$M@O~iu?LChKf9CX1#-a)%h(7c?01YBRSy$6 z%^_(jX(E1O7-H&AOp2%(Y!(}wd_b9w&zCtB_4@l+JR-Y^SHgB;WV3ioS!&Xd-S0{j&!W|?0m)(9?^1VinX$LZgf-$xuP!r zrvY+>@0yOsP34vdmusPVIkm}H!IwRW5ZCN0K;99RSrxe2chmpAr)h8x6DCS@MI{#l z@t8R`Cos|H%Jr?f0Igne(_`{U{pa0o_2OouKb>xM5CyrZQyBSKruux5%K=zv1-~v( z0ach2*-(61;3OMQpHN7=pl6eITM;6u+T?U11sj`Tq8AdSPd<@8S=}xt?Xn0%^1Pnho&`c4M@;d-VD*rK{j= z3;jLQ3ABkbIdnLiq46BWbSoeN;_8d{RRh(2fFS7EqTaZGL?nNbjO9X?cAGZ0mx(HS0A8MPE1!Ei7xA3MT@ zEhfWm`w(!&VqdVSKF)g=h+#c?B)9KIxq(T-_b%Edo(r1n{P@Lgar3PR*r@wk3F5ss`9sg{=TTOhjJL{3RqtXO7gHO!Sv1S21T%}a-_Bu83$rY&Xx+m#!Nr;qFrK2ot+Ka3pKdN^X)R-j(Dsw+VFU*j#q=f$k0M6KAW`-E3(JI6r>D*YI4)u4YN=1>)ASNpyC$^d)SksI) zhGvUZTESXdH~@(t*=>!f!XO*!gL>@REn_Y76Vtfy7_nY=wW4?RIl+uXfhqwuepw}U zwye%9PvrMroG(d>@!u_CXvMxHp?z0GFt``g$i|;G2uACJXv6SB7HntfLTh!c`9>Fq z2KOTrF6dkf5SSGj9IYyQ)!o%H$BVQy+yceR3EVoq;^wUeB{9o#oyVwjL8{)EICYt0 zZw;jg>NGB9Vv^nxpTg$TRYzpv^Mrj+^}3f@(dY9{HvlOMfo$wG--V24M@5H2xT$1f zetZ`Xv^?L}-5tWjtheKJUChcQFpYn3(LWj`_{AmiP&PS)m^TlK5f66d&tF7S#X5pD zH8mf9{YV*;{~@E)DSoEF8j1Q8JFe@S&43lS!{Nd(qij02+N1)>aknU`8)H%BSVS!9 z>&YUmuIr1VF%tNH5L)+c4`TRoR9InlV<9wgs>)tL(Wfp9^N#Yd1wbART{4bURx8xk z=PO8+3CdgaGC33*$jHcgusLq})JPWjUirC37qy#kIE7ouO-e=>h66Wele;Y{MOssV zXH=?88zAQB@@+Uc^6f%7YzwgAqz3e+a}>SB=={j*Gz^Y-Oq!oT zgHjzIkfXJX9U(Zh6j3$rUH2{HsRTXHBwaXH_9fP9O5eNFS>3#+KXMco^JSR~4EBvu z+@9s%Rtn>E`@ChMTiJ<3B|T2@52HETw>&~Xns?JU(&UgZs);gv*!7VEh|^5MYdMqd zCgLvV!J{P?cj;>lCMj-y^udX(dw$IBdpGLdH=87>V7@J5P2gFp>*{S!K*1!)Nq}KT zNYi1-%35*DEd*dUm?Ah#BiXI>MsJ-p0g#)ckml{YnQmN1o^!p>o0Wa(zL-H2%~uie z9yTIyzJ!Q77y6u4q>H$4VzG@&a5294;aQV-aj%?b%4s z=QyxzU+IMpNjk}Mnv89JTkg4nkhu2xXiv{Y2mf<8^r|5LaNt$%%T@+a5-B@R@SQq5 z0=i|MsD8&;o!xe+(U8T71A4N_WNx}v4?X$U0-l)Ysznb$44{@mx;ilDaJFb*j`}P7 z(_tkZcei=Regzewvk!^4hsi}Pch0S@S)40o9zsS}$-QG^t7qU8MkyU5$6)}*u+drq z6IDZ1l8^5xwtdf=s03;OY%%EzB-t=ZPLQnQF%FW1%>wJSGD>PSAj|`_2*pXQD}{ki zP9a7`YzBTZYpRwn6{Pn*q_>Mcadzev8QKj|$>#UC*!pxhE~Ia5qqM>Aa@hS6e7kO| zzO3%6oR1yS&w1^GF8!qj5-M4o4l~}ZM578Hrb^?~^kGpA3p(zTy|4K6C7#_QSmjEC z%QgCPMkP;K#qQS%=Y@a3PG9O(JuO2P)lzJ*CdVQHt8Eqz(1VK73EBEYIh-WZE@Hlt z>xEnWFw#mB5f1{64)aU8{HwJic8jo=i?m@wm~>Al2BKJ>vkjhygky^ymMAqTQdZ7z znY|<319}tbCBhj{2$wffJVH(uK5IZa!&(3!>X~YA!U-W2Br)xVpcv)3Uj4o^t?6nc zz2J)(l9Q#&_z}0S`Im?)G@+wi=$P-ktUXLqEQW^N`MfC2>elqS@;HO( zwq9S%3JiZs#Y)>1tQJ><14Q1WHrE`p+=ulQZh6}Bn&2409`n3~qe zNTuG7cPV`?YU^eb1?naq@SJ~1lo5Z=4pbOK#`W$NG47S|#{w2PhfB&IF(d2=vO5_e z^+=Z33fomc=gy1XurBaDT3A@1e;&$g8CslD10|+Y*VWU@7hyEk>k3w?%w2gfhl!sL z6-4Do>t2&)JpNjy-$@=@i}KEvJ%eN%E>Uqzoq)cC2#Xa1GrXF4Bx;q$L)h#p=SM z&pBlM$xLdD3R~LwN}zR_rWwqWw*#3oENamrF~P*=d0K0;2Jxyh%c4tL1pu!yHY!5C ztnR&ZZ+EY$)6my&qI|-*KJ>%#jzGMB5fT1GprMz^VTVYQ(J$8lyB;7Vyw%GMJ25_G z;?QlsNI``B>YRd*-N&5!s2xi%=Z2E<PvIA_6iG8Q}(~YpsKJjL9oUGY#ts%wk8>AORk2zUW&v z(jkkrHrbbkz70C7?8D>ZD$!$7IsWy>I=~fxThrIf{qbbc<7|wJA!FS=U$t=Td`;IY zD*5M_`C^GKqW+QkUsEIGJ~h}>%Dgo=4~e)elIYdQ(}D}Iqdny~`GSEF%n1p7l+rBb zag&f2!spMQe}J)I?5l=V3?+vsDuN1;Z>Ksf9B%%OsKCcaQeA## zzG~b+Fq8_smfc1BaYXH5# z3&>+=81^$rgpjI5FPK4t6sd?aD7^qz5%B;*=(fMq3fwYn+uSzpY8$VNUoeQY|Js#s z_y__~gkXW5Mau6S^!c-;X1)DDY72i;X)nFJI&uPqksQ+bRsIMb_Nq)I?p|5m!t5+6 z5ZF~1K86x;e|M+$YWCfR_`i{M1X$wjyo9S)JrRuozR7rAZf}@mp(!091%s@F`yD@h zRIz7Ci2_huH49T?QJ(7~xNiI2WINyD`IQg3V`vLPn%W~uFS)mBNUiO1KNMwEQ0 z&-0NH=Ov5JPeMzz;d3dNu>?6dcGu+AH25KE+Py!YYdCj1XF3}FWr^-R!k)k*9H!}2 
zKeyi8u92Sr!=14ezp+mtblrG3hne-uOO=+aa;X_GtJ8e^<6LeTr&r(V9S*K`re_8$ z3iqp=Wd3s9I5NOwBfUx%fw#fPh;ErrGTaILC&Ky{k&vqh2w#&V+z>k1d%|96bRhAs zT^iz-&yfTm@>OuIH0b;#$l$L{ z@5$^!_P(pK=kTTv=UmS2{iYTlK+>3ypgbZ%6BHD@uQ)f20KvU>I;52Q#&a^rhfAvf zw>wkWb$+mdoz$Mp@AASWgy>xN_!A&csT^HJP=iQr5O;`o#yjIeNd$hB8T36SA|l!| zCLtjyDJ}IXE{2L|9c}z2ZotuOECk{~5U0QJaqAfihh?>urMFrkXZQO`RLi8PVKqMSvTjnAMH?sHFK8TYPnQC>_Q?MHcl1Hr&!3IcRtdq)Ryd-Ijtx2v1( z@RIWKc6l6{U)iJ{%H*zd_f#pJ67@r(*J$qU%%#vp0pJ&$UOMP-NJWA%sJ`s_#_KR9 z>J7T2?%nOp!RaiRQZ*Z$aRDbLe_S~Y|5x_ja}D^ftFO6NiQm6bk;zY4fTZRS&wS1YMwX&Rse#XF(DS^%-0xFv2LyOQu!c)etxD5U;#J?{2U}4-3 zP?q-gsti*e*H+$_i@+rw5Fnun%w+EiY0nN1MXpX$mfQS}XcfOx0*=`mjEiJ1U!FcU znJmHtq$41%-<~uBr#OeNfZOE~Xkzr6X9qnY=Cf6KOMFd}c3p+&bEE>UzQR5zz!p|% zlG!vT>Vty0yt?X3BTu3SrWSzjUIM})6wQjqooKci%GLWB^}nbsDL_NnPE=Y5_d}=T z1THv>&bG|(#^14){Sl1X@S=!krg(Q9r_+)9Thn@x99=>9f#8ACAequ8xHSci9_RaM z7=@$R@}VI0z8mz#gdAfF9>d;2P_K-1&k7ge5et~W@tbl}*gRmpq0 zv%6~r26`|JijP}6*xL(BO0s~BVF|2KxJAGNs9NY5OP%JbuTRMdT__+{O6GGK&W7qB zHKId`d!#QIH8@u`f2PyF6SUundq2+9dJ8rntJ##L>xBVVV{w4L|EH_*0?TH%D^4)1 zz&kiN;2u|-O-u8Vs84IBi6HkUaYuT&d|#y6`MYWsd%une@(aPp7;zkycXwCSeR0SF zm^>6RxhADzzaGp{*$PxR5z7^5qmgyK{7NMqCDiU8{FsEqA7C_fzs1Ezgjts)T=p;9 z+S?DNjZ(;2SRUUqt;Ky?4hNePyD|VOqUh?TRM6eK&Vbc;c@`R=yET?_aD8FJ zVW=*}eB-b?qry*lZzfjnpF;op1~s_fpgR{Un}tGTbO;WZfi^|n-Q&MZU%PCmGJ}y- z@8ie)y&kN!s2|(E7k^xs2f00^g$CyTOjYmB_+`e2NE*t09(k;W4r13HhzvQ z?GDg!%G|MoU_9HNED|nNF105cC4he;l$fZ zrj@^M)lQRDds_Vd0ig$e=tgz%`{)X!Z5p{Gg-l~x4cB6g@qr`5zc(0&GPv;9!=qnM zJ?YcF8yJTRhNa>WGIjJk{?zmtXil&aV8gG)C5Ouk4;bwS+U) z0{3|Dfx3VD_e&OtgB@AfPr29r@m@mP8M9jYdla62{?|DwG>nWA&+ ztj>I~2!@Zrv!ICC3&S(qr@$Ic|Li|>^ZX~^9S)@)!ct{$#TXtNA}-_zd9eD2>POQ5 zSW~Ybz|9~ZL^725e-FR4^xi+!1=7*!p9|Jvf)qh%IdrV(lrg095FRgtViRV-s0!de zj_=XQNyouT7Y?7?B0T7QBI$2}6~O%SP7pShiv#5IV04^L(KC-k+b1~kmrh&CJM-nJ zybiEerKrh7WwFK~TE%aPam7?kB;q4ZHU z*8cozo(R00Kw>`1tr=z_}+XN6k4z=_pz+pjwEEe7RSW z=4wiF=`Fb~{T)RTo=C!4+%67rjXPtwq`g)jZvs|5LWdfdZ-4|Sd>c1{@r>RV)w8TLJm*f+AtpERd`kFYXK^wB>~fkm!L{BSh~2l zI6CUPG2iB^d2P)ajDV9?!`0BmF>ZD05c>UbhMk$(2f+j!zWMq23)-u<)#kGy&L`XC zN}WN}+F!E&+yR~(Xp~w@u%(B-0hED?b+*4P!Q@`?=_dlSc0&t5!=M|u0_8+ZYcULn z6BEw>r6yKyziYWMNW9t?D}Qr;Du%vgZ#~(V<&*CaCION3nWl`e>{7omkSWt4@Tt$b5|_pYP`IWAN&FeHNEd1I9Udm zTmRJ54?sOf1tMt&5KvA+7}YJE!YRT2OkVxb0EZm$wuJ!k3`AJPsi(P+nAc94MJUF@ zL9IwLmm|ibs~}Tqs>ywx2VhYYd^UX{pmr{x&r_M{0Q&}TkOhMwFJEu9`5#h37$-8= ziVo((I3dU~6z4zQ_$DtDMa$;NDjA$OoWbS~I`DLIOndIiHS&~=0UR#8tOp!;K`*cU z^=tR%`~>w5&02m;vTaaiW@zPd-YfOWCZIeh7rqxi(P(mY0=Jd8=?9P>6^ld+EY+Av zGcJSbhr`vL*C5&jV46My36hJuQ)dqBQ}3!R8ts9T{qjVyUXJ)7F`L-l)X2KV#3d}j ztNvtwZjW{s?VZgM4B%lBe&^?};t%%{TnS|oIFMl=Z0(UWp_M3Bq7L&VS~g*!^7a)b z<0E8TB>yP06EtCzUM~Js6uK#lTx}M~;=>`_?<+Gf=4jr=r84$>uYCvb1OSO&hld~g zAUJT}AM1F|Y4$wn+c#F@5z3_3RtWcuI`Djn0!8jrAY(4D0CKleKBU~}sNPy+4B>gx zbjh4p#iZTv9DHYp;E2}p?WB&!GMKwt$MedL6GVU-vIAVnt543rhFZwDVkisD zwnADWXR3wog>Qo=3d|94# zc)5DfT@;ASKbzcKRNC!3gYfr0Z=dF0=mP7B1y_gza~t{vE*@*!;Cuv?IwIvs;j;d>4hSVvfk)@{lHU~=-?RG@JbC3X7%T)7<$NJ6K!Ghc_+CRswK(6qU7zo&{j@_o1HKAy z=nz{UIA$J`dR&*j?_oMw$waws@j_hQ*a!fg9kLfMumK4$%FCIX*k^cwkjrg}_27XL zR^n=79<*G@Nm4maneHirYX2dPw}TKo4pITPC0@u-^ZAQb0+^zrBE4lOVEusm;{B%S zWP#${pp=^pw*yjo+LoFe$phrvE^JUKR9V`d;-u^>}SM06}`2Y`eiPX!eXc#aypnf=E__r6p zFHytrUTFg@P$e<0b8>RZY9)+a;*ss|5piDX`4%nhow?d_da0GI1QJjYJJodDGMfaL zS`Et$2gani)G=Qh_CN2IK0WV@S#4<$ig=K*-f{yRRcIKG-n64&9`kfdznz{tn}sgI zVn3Kg9!dedS_t$&9u;@_NNa-)Fz7!aA}S!UrW&J6ia<3w^pwScTwGq}p?)Z1QxOrJ z*5pnr!@_=bYjavKGus+Ro$pB%7x|bSy3o*>)T}R6_L0nJb(R+il? 
z7|5+Qhjnq=nj5=jOmMGTeFBEl+Ksk2F6zxbGi9n$<}5aQ2#Sb^jPDs%ny4)A?tWWk zI`D&GSAjwH{q9PHLLiX9V6r6IRVL#E`0R!XcV_`nhvBmC&UWSlONLoa;zxmMZa|YF zED=Ii`-39lunT*|MyyXKQLQ48V-B~RVgRJ0RS2wS5X#lDg-d7C^Eg2 zudJ@Q|6|(-!BhMT?Ln$IliUtpoG;j&-?_ZwP0Gkyw)k;#sDP2d<26Q9b9b>p|CCw# zt)JIHrDFOUUEn5oQ{OA_0QC`xZ*^?!rRtX`!?4SJ0Re$GX9k<-*aA|X$?QghNbXnj?JM)G{A7zWXs3NY#^*&?(Pf9ba=7on)8X0S9sNLB2_x(yQJT)?FQP zdwnEdDHxQUo$X9UQ`~11CLnsz?{a_0)_hcEZ#-DTnv{8bPP&B+zdJ^Rc=3jp(98oqjRCikI-&Q+juml>%gB5j+NiD)(X zk%OZp+43o>x{&Y};>XGK`D_MM`L0)D)jjm@)OYEZHRle|685EVR*gf(!D+>b>|nxt z+!X7#bJ0}^+miFn>gddw)|5-2PI7^R#c5nQu`SZx&A`miMUZXy5<6-8!37m<6q%w< zBt+>jTf`58-ZGY@P`@Xl6Z_~~|1lsKxv0%zXG{EBv|yqn0zG@T0B4Q6(;NmryWo$G zFfYTntu1E${vuu3=^3}2_d1~H@`UP4qUq_CW5$7BcUkWqN3BGq6x0~A4fZ7!AZ1Vvh4!@S@>Qt07Ow=Sk6E+-}vP85dFKTcSVg< zsj*4IH|~X+akTq#1Aajw&lM6%!hh_XBIrTlad*R2=dfpS>?&DHK-Zee+#fq$05N%! z;JXERBYMjKpX57)9VhYTO&XAOK{~A*t_=hNFH@T9g*AP*pv`&^M&DYRC(?qp_QGr1 z&D8h8%$Wqqf@k6+VgevQL(Z?2FOj+sUwX)?|*<{N#WFa|KLD1 z&2g@jh}&7&?;6|LI3em+t@FuO4D1z7r??X6o%>jxy3#ia1A|p~&w?x?88-x=p>_K2 zT2ImeF36o0-K}1Pgrx9YspgHgbRg8P9=6kC)OB-(wf{RixYX&$u=~z!-+nOyEs4)b zI{H(1n9I#2CU9eNSDDWiNoRNO!2qxrZ=&xRUrhjxRHey@R~K)?P?6c*$g)$~PU6+{ ziGBu_1(UwrPpUOC&DOFPL`_adG0te@21WYN=hMw^symEmz3TZghWSb&`3?1PZ%|2H zQ{~RiuoN=gBUT6zdN_0%9{pz}@``#U8PUR@3H>^m?Ma0WRz(Y;0!a92U<;d6R$71` z+CX=|+T}Xl8kdRIVy%)Iw%H#R&Tz03uRUC=BxIMKV#@E!L<;o%J?pR|Y?6iH_|U9KQtN*X-6??yt&Z==ibULl|=FZ(9a>$07oZT zNVz1Nolo-K(=jkeXBGo$Egv(=4$)?yQ93vNTfKdnQ@P<575a!>*9(tW$SW%@(I_@{ zNHR4n&+)e?g1h=!9&5CvtZKcfY7W76|4O84OH_;7eBH^rrfaO1T2*f`i)^yxHpgM7^rFJvxkEF5^GdkmeFylcb7f>zcwRCwiJ8evKwmtw|F zKL9VvOHAfqAg5=EA=o8#I}XP<$sBtNq&ekJFPmo%73k9f5VD#ctg;({Nzd8GuGW5z zOdxHTADmM)us+=rOyztNtDvcRyggw#>AGM(Rr4fMCK`f8t0vOx zy`T?@fpY}r-cPetjJ;8GdS$7;GG!ww2vy*>2yzu%02^vTAt3 zt5Ve_8B%OEfS#xHjgsSR@AnZKfu5*?4A$J*nW{9qpT9qu77wqAztmPq-(G9;-EANc zd-CAaZ1#`co8&%GZo0FlI>vcF1{Rvu-4}K z9Q~x0jQVg%@p9bgIrzvB`GsMKhY;KxFdkzG4W={$%BU3jG+F*XsVX4}PcPoZM^7bn zy~YS?`qrJxOb)k;n^lD?k#z8ww13>Vob3xQx8AP^5wKePOwAcmcPDW}JDeN$fAt~CNhGW-S+L0{M z(rS|N4~!vA%Ojb6dj$%U$N1ImE_0k4Wn&Fj+YU8$?g z#-*=rO}OY0$+cL*{WZ&?m*~vMPLi zRqD@n6ql=!es2uk0UaWPS$YKVR|@F)^qyq);B%ngm!km!KoT znh|1jak9-t9~Y}~JVMDAiCdQNLF))5ss{2qH=eih!i_LO{BtR8oE=yq7~`r7LA`lr&o)tCk58G*?E`N#kX@J+x!7y5w=lhl4$#_ za>8gCfycZ_-0)%qBKtrb+)vAY`;ApG_vO1|-@e_j>0OEC={2^n&q!CdqE_u?ATH(< zg7hUgJ3EAj_9kRe_=GhY?x-PeW}n{RBScj&xK8+DcO}jWzm2z0h1A=Hzuq5Ucrh4`}SIhjZsgA zI;$yclArZCkH(_z=+w`|IJN8FxahY1@`pf|7;)x!L=K<&JIkF|6Y-K^>hnsgj=gUR z5$lP$ZZW-iUX$lx*PSKQqh~)gX6-E0eVesZ`TQg2=4l=Im0`shgRc{XHkr2&Z0JXJ z%d_7xuOkE+P)oJ>Ltcp~HJbEII>rmiKJG)#{ybeTi9fmC*{q(PZ>aSII6-wY_&uaB@6NDmoHU3G8 zxBpLETKNfP;PJ@v4U!vdIDVkr&Vji_= z{0v!95BenUlW=7Fa-r0_LRO6bz}|64==k9e?u#TV;H<(eNaS;Sr1^@I0COtd%Cq?7 z_q)jfZNp!me0~i?#GM~b4h1KaDFNg^`=K@D_kzVE1ffW2p%}gTD_=J_f)l!gwuZmf!IKX;uUx zAQbmG3NfETz=b4>1%Ds}-r=r81=cCx!_VA!bp)xS6TAQI2uQo36V;}kq(=6w;#6C`gm z{9YWzwk1mWgL~oq%bfaKj6)FM~TmRNQWLK=0>W0-}SI;|13pC;d2p=cxR!v?_D zq9_UF&_4}<&cY+8-5ixVA(RFzny7IRFNtoZg4I#%?%FgSs5G>LE_ZOeND;R_2?%$< zyLZYb&T3pDm@Rt!KWYHW9Qx_5Tt*IbME^uvN(AIK_dq(ARU0d4nDt%!ZiUljAX`g< zp)!>L%60gm^uix11dV|nIipv^F(`3NO3$j~aC69~&pS&yI|wvt%-@I5%cJcD7x1b% z%~)hyeLB#jSZ-{(5S2#Mc zed52oED6U;1r@>tc6Rn2@MGjsPYwXG?iFOJGjgCqAt(M;yY-V4m;{pc_I9iFUAQ4I zE+Yg&h$4Te>O!}`-5Wr2cy<6TM%JbcKb;A1$v zu*m8Q+J@Ix!Wcj~Q`vXlxj9Z4L`aQ*BuClg^$Pz!{dG$fiU=3xX99R~wLKE~?u4af z_|hXgH)Yh_n-~=W4XBAz(|oHRlI|1|&`RZNjk^kEg!L^g{9R97Mi#%N`uDME&(x}l zzIhW0mRr&DP~;SpaEEK2?-;8b6O_y}0g_PyDOfzKxYdgdI>n~c{5Q?JxSK=XyW~(N z*|f7hThjd;RST7MlAc>a*nxrEwSmU|*T0dKKpAsb57&4$TZ?yvr-^cyHv==RgIFSF zrAJ99`f*Ug$$Yoi((2MB`~`Ve-$2rn@3iMMx(5aa4RVjgt_Xl^>WOn3ET~(4Ymn)M 
zkzhG{2Fo2Sa%#ZzDnR~lPc!#gELS&^+@3VNKMjw>)KQ^xs9192w=Y8d{rxv?uPj~A z*~`~0#0AZa^7C@@jNgw|qh8$4C(*mEm+npG+rQA+_nuzfx_tg?%;?lvBQ98f=8sGz z5CtGCzfrsRovHTIP+mn_v~owKN_kiYDA@OQ13WQRFMNBU5)*(Ab4aRj_JQ3qS*MYx z>c0CdPwLw@2RcSHVB?H{&(jGTf0SO#0kgaYlApO-xlMO|TmOKi5HF z$7w9RP5=8V`-@*XXJ6@5s|#B+UXta>6K;j&y%QWEn5w#F~XCs*lPLiZViEGC9Bn~0^zH9Q( zY@d?4kDlphuOl9zyf=f^(`-ytN=ZdMaK7y1bi#m$&}id5>oZvkmLJwwCcR>}(>x+n zA(MNLVg-{Ko!VqRRUiSz=HAM}8KYxdbn zTsLA7V-PcHO5N#)N7SfFX?T_X?2tx*(FN`9yYT?G{x!tv2w7cb*MQW*KlT1oJ?+bm zd-WuGawGghzW<|036ExvCWp;uQQQ|A#XicO8scCs&2v7zP4`=eOWpnFfc2n?C>Y=dS4fL_HlZ8zw_krbzixqCt4C; z<8D%!@fb~48IHVcVU#5zYYM$8|IWRXYD;}*gs~G*U)Sv46Y*v_WPap7m^aKSX~TCe z_fB|ance-SZkMS|Nsk%HaMnR0ZVOtP*5JS(IuRwA#{X(9j0lnW2+2|q_+VuD zGR*q<`E5ah{v!yoDW;p^1HfxP1@<|6R43mc3>K=%e4n^ghn6L1nS%541vxM=6XWCS z;g9hbs2=+OYqvF#V6Sxu=8~H09pmm$8K5ya1*HSb!G%x|6>i^Ckm|0>mOX$FUyhRp z4RE#$?k-`-I$8vN-;Obl2-XM}sE#mnfVXY}1_BG(TT!^1g1cmO zA(L$dUaJi7rwtypjeU${O?aAtzBNNMaJCv-NrF1b3K>kZ4lT9Sr6mP`zl4o2zPf(p zKdfTe4q(%nNifzJH2e1I!kOFe-)(@wP`>ffFuUv8R4Pdp92_AoIh-iOFtEaFo721{ z@d+G(uWzJYq!TeciWKk58UPb0FJ{WrNVxEQ5j@#`Oyn2oWkp3%uS~Y(vxq5dt%C0XNce*0$RypTMf;IObpEM_%$ zQqU`9WUSqPYGa*^X|Rk`BHvJleRp_dw7~eYzGNl()jf)Gw2?d1G%ddl#rPiOcH_oS2Myx;MkjrvC7LS(^Bvo(ZvCr;?%Q@T>t zd>>oI|6r^&1h2AmWbH})l_6>Mg6ptPc1h(|@sh-8LI~#K7(NhLBj#6`zEH_NxkUG; zKF8{Waty|6VpwA9jnmwUAkp;<#lz|8uygHY9|l@mS^Ter->P$U(h{3vW=8EcYUkW3 z*IhTPy~-O99P~lli*ReZ8dK=+F+B2;yqeH6`0QxVX|XX|SHXQU=5^|r=4cTS1FOK@ zrwz4fCf)fCvpY0QvgoE^jI7;N`j)Vru}^`=4A(x!ko)}Hi4M6kB(Gi|hoef2-5jF$ z_?Gs9>(+U)Hvdz@(^e5o^i0cz!Ms*uUrfEI>!n{S@E{H&Pr3XR|IB);l-NGwxhU=4 zXENMkc=inQ&h5I@nNg3R4_znU{ou#@aLOhup3Tc|SE)fY)pLACS8~_ZtA~tU6x5cQ zO%V$B?;m|I_QjtoSI$2~)tzr^-U12=GDqbW+ts6^dMg}CZig9E-qY01_iA74FzH0a zqG~)+QsbmePQ6YuxyExgir2-U#39?}$Bx-zout>#qpgWDRzIp&jb=|M&-?8j7P_#0 zyQytU>np+ef=E=PjgGmB(9(sh+-+pu8}~w9_N^xj_g=i*V*HL< zeb4&jEwiJnZtYrfuM9>{F0HpbDv{?a;X$cSjt7E%MSERz%f2I(@qt_t5yfKg=-87q z-Ab#o7Zpbr##sbU%+GztZDm3e1<7KTs$cn7+J_r^dCj z;?56}fxA1znRU`{;aM{hPm-(r{fPfO?QuoSf}cmjWIbwlNDIx%A|9}+ynneeg$>w=<6w73@wS6>^3Zo4;fS7-XlySA_{MtgnXoah)D zVg(h-2f^AsTndV6it>+dI7cs3cnl|pX6VEU{5r(Q|F`2Z+QbaBR5?je6_ooH6E67Z zuQkCcQcJpZf{A^)lHUUXTAhT%VPkSmUNFe>l61}Q}C;7IpfUs&r|>Ny6+O4 zK=()AB8tUC5hRMESWAZQ*0}wx1z44dmy4xoD0sRb9y|5{#M;lqt#$AT2oRE4(rZWT zB%iKPpDd@72T-uDSyH~O&NxsNl)&4$6Efra=SzA?v&HgwHbHL1!D*%NS0L}{fjTZ3 zS=rMp)GRFFARGG`r7l+oTI_G7ws(Hy-R<{03=ynR0N0Ce#TM!yAfvVJKET)c3~)XP zNS?(H4a*;xRqb!|>jI}?;JJKRQwr)~jaIinmtq8NEE5!){h+$u-hqH;3B2wSyu3U; zuT9ljb@E_qyci+#72O4y77e2Ii{{bB9CE_)t?^tI7`GkN<5xi9VA7U6umx^ACUfVFFh{^S znpMhi;$ym6!V(yb`q9jS7eE~(Fo^F;N6aX(YQ;dcS9cQ=wCqN$ZxFEri#`JwK%g+rEeqx#j<1X}l(cZdb)SJKb}zYFiRD ze57V?1%6XpQ4Pff_lIg^;dafkI`t^qZg|2Irtg3O>;9(~XQSL~KwgshbP+_YJX-I3 zAzSIdpTlmIfVA)x#Q)|g5R{F8LX^^P7py*v9_?l%yT2di%^q_c`*hF!Ywbp#b)3A9 z9MDRSsW?^Q93hSQz+|!fvtenYw8pSn`%E9aAeg7!9T)%DU4|LpLGPL*i>XMQM0H`j zW>R8r3RnP5B*d(K90eZ^1#n(clcv+jUFtf)df|WDBrR zY_nEC>So*Nj7&@&WMUGmyZ-am{@yeczQfb1Ggu^D#&tj@ruR&%ET5eF0H5!d7?GD9 zZ?B9r8{Eu0ZQm$HNt(IBSbr!PEDn5y7;mxSc{{&+bB$$LhGRJ zWof`==;f$ToJGktfGA+zrBZzO1pok49WGd6P=Gs*q{GhuOZobPg=ZectIKm=2VY&l z(-dwisD=W|o4A`$sgw$Ti@?X9lvL3Vcy2EFGdG><3-%W$Gm6*AKkks77Bb z=EEWG(S>s(?1Ho8A5*|L2}Z4-SSL8zw*)zgR~c~YS&Y8dseYQC_Ty3CWpmv5kse0o zQ8q+U6EP%+ty_O3RHZp_*40hs_8YAPd=8^KI0Or&>p$B69eE)CXoKp;u$`bD^DSH* z5=oC?yK=J9(Cx`9Y^}S&!?%s-iagwgr^zgDiX=#6jeovB-f1O4tjxv^>x}w!deeV{ zn7{A&xdF`M?+LS^RF6*$^FTDLhD(J?gcGN!ceeSD6UT20=5|Cx%to{?GxCTy+QBb; z6JR5DDJJe5|KsZYB=AXpW@ljeQ@G--muJH?al@&h7E011N$-!{;5aKh^m6<68ma<+ z_`39~nMf*>DE!XuTxtc=-#7gCh@g-mvXd^U5V}PPSL6|W>!5;hfahek$QM=pV^jTk z`V~f2?snfT+=s_u0Ex7iH6X%x1px+^R_y6NVh>*U#5O~Rc7g}lq0G|7t`kdMhu1wi 
z=WPD_okS{l!dNsZsqU&^j=|UGeX|r*FfH)A%HDvP2Y=u2-*F_y7Up5){EwLr`j_E~ z+08E+&*C&8W0+Xo3e)PhQG6T$b93vyr6F<81-4AQO(1D|M|hl+i8&1Ypd6ro|0kFT zCh2njefj@BdCw)dyc(VJabgaA;dJB^Cu3*5JQ&*n=Lwb5(Y|T$)Y2*jlYed(AJt)^ z?Nz-m!?zKDuTQ)={|X_u5}K z=+=1B1Ac(u_+$)XM`4d`0L`kf$A%4H(fDxkG{A-}kA@Ql>-V16k{+#&=#cP#?WyWlnPg!Q?7{%UdR>%;;W2)q4(6tx z2SgF=IeMZHBXB+?&+Y|bw7+@F;Nz|Hire`f4`(l)?FN&EbvgXP^>T5GsoWIy>%)FTryT3d*qez0a70*5&u8O?V z%<1n&oWsX)ln545Q&T})M9nu@rVl<8`!`dKFITm{w*(E+%#X8(va$fw-e8{B;As!m zFmXT?q^qYa9G6bg`r|GlU*OtNq6NXB4d4|~yMu!RN%yrgCj|{cK{d@;7XuUlIiOVj zhCX3#p@5y9wjcDQlXcg66p`|c2+B$r0!d_pDNhG+uN`+!j6&|Sh`1faF&uy!PJ%5q z`x=ClWGF?r+<-V4TkFG$gX7mzZq04p7EjZzVGgn(Ite=_5YGn!&v4hTl3FfUjN|v+ zK$!ujj2I~o@P#+Z1`ygi-NI9Hdj%A5o2Kr~T&9}C$;ZWIx!94oktje$A)#F9#+lW$ zt{Yu8yR8KGuEuJ7ie}%~xY@;|xC<*ln1G1}4M7bv5H;uadqF-D_9Ii%Wa?Y$8NH%6 zmh`Jod2|Y*rca4d#d>1mAR{(!*V0DM>iw}%eb&a<;%-90RKjH}-z)Zpc%4RfQO59&E(#q7QhZ}1I$EfC?C^BrX3&`bW&TQys%N~>uOfWzUg5*CiIz%8Rmx$hkeB}eA19b#bKy-y>X&YXE z6sg4o=n+enrct?F8+PbRgYY-+w6jlwt}hU{p;X<%Nr#aSZXgc&#-zz__H*6o!Aw$; zI4WhMiN?v$q0g$)fYUd0T7w5wcKe#N3dS84UDq8;_21_~r0-w&dqma!8LU4oil6O- zSM~|9?&O}!nxiDxrttj}KuU?RdiacnNg!>XUPgKWTZ8u8E+X!v{~+4DFjUL>{jDky zD{IQ*Md73K?P)|%gcS&44?kI}@+n6DWtPP%e|E4S8*L1mNLvSi^CkGnVfp9fwb^b~ z@NsB@YJfU|NhDk=_?b-F_V#L2<6Kk3h^VJwg;OT+sLSG@ZFoNjZ=I%=xRA1J=nK?A z;Q~ub7mZ4tW|a5X;DfzG+pqGpq4+tuGThCN{-xI4$3WsosRE9nvvufFMHu7eiepvm zCx`esIPhU74PQAMzr~`ss|wYR5YnV&H2t}wKfjL2#oGiAoSzT@et_gI>`eWFjN8gE zB%efzt&qhWC!5i|dH^;m2||NKdOTQqCr4|j9p=$?r)2fPcH@td4S_!LWZ@y&Vboz5 ze61IJ&S<7x%S_4;{#hTr` z-VE^V+gj^TzX_GRk5{W!Tctn6izh&!=@Hhme=YAgdNc1isiBy_@%o>&3G8Pi&1<*a zjd`)No!J)2hjSGBZ^Gu&@`liU?#{FEciqx>fdW>B1UZpH)@32lt;OL^(U`}JvziRR zM0r<9#WcK^1VW66=H*ahVJ=>wyY! z5*qjGewE~g_~CLdAcN>|F}P4R9&!5%S#AXs z`#XJ)2L#GvI$p8_TH&N_I_#L=c=1)mhQcMN6RCiiPCNLnD)=&ahAbN@P<&y*50`Ws zgix7!xVswzc7sNZoB|Kg!;hFG*)I9*v8czwh^HjieT%P8A^%SU=zx-DHK)RCs!e4b zMg?pfUQ94yuY@u^p>B#$!e4mi?-}Hm^U#8(#thy+Ex0~FRJU|_cuKMS;;_LdZvH;4 zW)}aMxm=I9;>8M>C%Bt~)$*9^{FQv-K4>EHR}$=B%1lTrq;mg{MDW6N#_Hh_RDE;aNxc5y}<1#9&B&@eptGiTJ7d! zvPUaEtdTD9&gJ^x2;Yp@jOpc8mIWKUdN0=+AmTb{ueshjip?ESxj)?6rD{|5u}h^&xS^Y{+ibY;i_75A1X+(` zA{HA29)-osnrkF3s>7|_^0dn4OReHmz%&Q*Xd8gluz1>-=q`zJr8;P52}15`2VuI8%b-kP8~nIRWR7TEP= zLf)(E!zQbO}6MK zb97gB_b%*SOd#4n7hOnpB;^EJ{?OCGaV}BHF|MZF=5BqHqsfhfkzlVIX^q+GJI+nv z^4Bg)D|&^GK;fK3X9ZGhqO1GP($U27EOoUuIjx^5r7uIEgSJ)7z$ZLd>Ud!CkJV)iv0wqTk}lI(NfMn*4!-w+b}SYx@> zv4IA?>g9{$0y2ynr8asU&n-a~IN?m^*52{SPrBa8`;<9=gj_O{vtiaemj+J}E!c>c zm^3{EqdWm)+KnVZjW=his;a#*{Tv4=%n{x!?EZ>WQi*f(5dV z;38ppQVJSDK9;(&79-T|ffnt_zmam z?EQ;-fzthw4oN|<6UP@M*&4ov0bKn7Ws%eyj-wd{J$=)lqe!=2z!|aeVF5OI#vjU?`!HcwXDY!1QaUA3~{|Kg! 
zU?%)3wK&=QwuljMb|5c%s;Ie!->j!A({r0->f1=m8ASbA&zrAx8B9Q$<4NTY&Jv;xZ0rEk8dX z@@^_(sMI!Ldb{=M22@hSH-tgZZdSbeAuyy_?Gd2a@uS5`dm!bpw>xn^>i-f8Vv{0T zXkBxH%JkYH0wRFM4GT~Clmdi z?X%%@18%awu=?g^MIi$B*w0Sz@KF5byGZoBJ9eylraB0^fTqq1o>k7TR ztIXEL`gGLwx-iN@DEVj0sH^0nRKkAu1O1kue{BlG2C&x-Sj#z!3m_2`0k@iBi~<#@ zq{;9C*bG3V%I@W^o-K}xyXP?4NFERz+~gXH>yk}hl9q)3v*Jm;5muwu?yF|8iUG|+ z>50z5UtynNCb)Kz35(cMQ$B#9=+$ck?jrD=$#6248iUNf6N~AmVcSmaH`kLGT%gk+ zl#qhrP#e;r_=hSgerVjyPto(G4C-SlZg-e9o1XnQeTSnVQ4xm2vDyF1vrq)!9R1w= zRu!XyV*%>FH_A9V<##$Xf<`={QRnd!Orfw_17-AJu;k8n5*YzhS+_D>mG!>?QLHLl z&!Z(HuaN)&s)pZKexgz~My@xih{04tLIWU?f4Wzc{;&hT8J{dRMeJR(rG}BlWa6js zsPm0^8F@h?^Ln?MyXoI&2k%CSt$Y2Ny`a+p+v`r02X53VkM8q5(I7&ndknvrFe$>slnUifPAtjv6qgzAk{?pn-Z9s9ilT?}=e6<;$di*DZr$dd__$W( zFj~=Kr@&#(SL&HlB8#Puv6o`7InXV2VmTWrs+lDjT<@FgbLtk_(PAA{HV*00uVafk zhOA&=3Wm!f$P>uYYtDY>94^9br9&5MB>D*)&Mq#pwwnjpT0mnAqXXo?0G1ll{+E}D z!_Jz|eW)A$=s290!ph~fN)ElI< zeBRkgvyR2hgqKAwe25XcXb@7cFik!xuB)qSY>UUp$jC5xdmns?6ZWSUZ90Um$HTx{ zK6!A^6!vnm!lS#5$$!-i2q8z!hW`HkneJ;gCf-;nH>1`Ig!cB?CkIR1*`D+)JNQZa zFMsNsuGbm-u*7Po>s6W>R&1=>(R?$!4hbOQZ%F@|USiCU8QZL7`D%d&wurkHI^CQ& z{lv&J1{W`+*>sM#yvqWU-C&iafF#fq=?@Z8=OH~qL1joV(E`NBP>a--3Y!UK)C2JR z00%{;+&g6-kZh7?Q}eA#n8)U$7PR!kxyQ(1y;vBi0!cR|71g7ZuDph~A+&0!R%??$ zPne|5Cf-{kMAYxWw1HABUhD$Q@@uGS@F|9T7t}-G1nK8{#0Q~eg_GHkjoNv#qJLh-(c#HYO8} zYy#ZLtL>@)qh7BG{kly^mcx4HPGKW& z+edbxL#A00&PZJx3koaN+>&@upllX%-|(cBlY4n8-FW)D8FXo2 z<#s)E+JwN zdq+{KSE%)}SV`L7G-%E98ZHcrrOgj@$R33hKs{ z0Ox9UY-~?}#cWH7&JM0jv$!)E`|H8550kaOO-CulR~t5tA0gADdF6SqM+YNizI3LS0~oyjBfGL?)2`9@K!!hlhDRq{K8)1bwn_hCBF z3TL$59lxF^=?9?(0z8bbi zpgnly;vrwp*Qc{Avl;F}INO$*hiO3e=`DP#Eh=SPoLgW;%U*PA*iK5K zzg%bv3K+azj1(AG!Vl@ur2a01qY?c*rGZUGi|$_=M@8;$tgMOM^BfBsDfdxOxYV}N z#GD{;TJn0(U`&|AQS>XzvHWEZtRRON^r1C<{5>dWcO_k6Y`Ni_x%I@ycrRl{QC5pK z$J?TL0dv$TEDLwS3H^O?2C`_>BC;-jKW!E~N=dQoE=r4Qbzq;DD;u-%?}#_?4iZ7P zMO=v`gFTbm|7sVf3fnW>))Q#Qo7aM>)%)nginUd2I_C;yy9WI{Y$i<`9*)UPs8?`# z4OJ850S}mMwlPRaezO^9Z zwT(Z!!eNAm#Hl%bt+9ggo}MBNeLk|Ks$^tjAHd)J+E;4pa{v(P2fRr2fM(n|^shG@ z9C55Y(EZOZz^cy7H2HN^W2ErssL;h;)0*2Fm&!kUon$$6YmnP#{^!f!qa_P2j+4iK zoZ*O6h6Vow|3^J%BpE&vGrG7_1tN+_cEu}ilnHu+;8-WChfYC6iD4^zf1r9^dN_FwxPonV%X$jz{SjXGz)mcPOGz)ikpv(WaV z(+Oe@d1P2lSg4~27w4SqS|&*N&2q|*%}06S=gg1T@lryl8vQ2xU*!EHs3R8iS5wPBHmOoq{5WA=N5(@k_H3zS0pYTq_bP< zR0Zk59%EmebgH-~7H38d-q5#i{+ieh$0NsuM(}&(E#kj?`92T7)IB7`w?qyw9?5lrm z$O^c}PNK3g`(O@#3Ggnko(VgX1AaeBAm{@8=**=9AEN%*H7X(8JUX;G)`21Wpq$Et zaLDVd{OuT56;W#wKPWdb0yWWbslfZfd z*md;+0rg)`MI_|MUcm_XRi>R(8$dgphE6+B2Xb_7s8-(iIZW1;)lPoFk2H9j%odBU z4ueh$PmETkzr7)uaO=@TCe%Q?js(&QIZb`TmmTb7eQ)}C^NeP?7`QDnEvCu|{8f#_ zB6GCqj(v3?5^F-0)u-frpA->jUn_)I%;bPK^B1>rdq};Oz#g`NP91C?o)NF_92%#6ybVB}0 zI8!QSAi0f+x!)Wa!PTF)n9CpKm!@3VQpB9rCUN3Pp`BNqydL=#ay)49VVWrC)mIhN zkr7dWdGP4ij46dN8$?965NE3|pLA0X-dsKdR2$Uqv{yJ=^BLAK0VxqqiC!h-kYA5F zt#r&2suiF~wf@I)dUS?r>@m;?ii}j4_1AbwBb9N8SPrw_np6qs5}r<;zU9g09}{%2 zFvA_X^?sdn{{dE(X;29maQMFfGSL%=HAs$rISIHyGyuI27FfF##99;% z>-sbv;WjYDVt$=~^b4DQIu3nyy5Hm6YIkha&x|1u!fNfDQsCYg59KE(?4M1Eh6O)utBNVzpRe?E}y?9jG#gA$yM1@2gGp8>CGAd>cMsf!PEH^|r92^^j}O?m zkcznU=C}hubGD5Zy;}x40E#=V`Y%4g_T(Xs$jb&lM4lL`*m+gFA;S(ZXwY*1&ahB{ zs^bnmf_!{{SflRTQQ+W(MzD~{6@Z6UJ#7j`76gV|_@u}GgE{Yc+Yk=k)0YELspQ9q%)H{pK(&nX(q2bg0ws$MR zkraW4k1J$15TQOVCTAKM_GIyCyjAwQSIaE%%3dagT0HqPgJ0#=qAE<1nC;!MD-F3< zYM~t?aTUV2x<6Yj`7kLeA(yhv5ceO3Ca|_v9)JknUnDc-?H{{Y{86Q9_d-tS7>jb` z>ExdOo}q5RgUL`!FyAn{!NPDdsKR+BuDc*OIM{99-R-h%;3_DD!9<;OVlRhfvRv=x zAoPROq%nA8&?v69{9F=1#gHNeVGF5<{vIWOnTCj5d5<~fT-2D z(YQBOfJt=71tNl%jNwew*%OwQ#8_HB-Ny(>;E{H&2|!_&(8^XGzmFtKL0m)9V4RcE zqd2|%D|Qp<8*&tV8G=+hoo7(b+g>25z;maR2~Z`A1C|sZ!zRe)%7wqRz7&nsvx#QR 
z)}+6tovG1_e{-_U5cBa#ijvm%SUx=}IE+L_Qq4Rd`&XN4J!tB}U$C#)<^tO)E0$JA zL?My>NOq-h81QeJE#TRkV`l-6iZ8KBT~uPd*Tm{$)blNMYdltIsAFOe(WVSlxUlB8 z4N&83&ChZU&rbWD3!#Y1f4iw{c5%xlnMKlF_V|F#0YT6jw$$06p>Ba`Nx;jonsnVFX8u|VV4Dqgix;FKA(D75gIdESh5Mvs zKwU-1g5+HDLR*-`w4^|IeUYI3_vHx=`}1paz0|VqYeEvD_i3e2le?cGb@^87*@&s9p=Ia|2#687dbt?KmA zkl}c~4qtpr&2q_l))H8nL{x+XL`E<;|yP;efnBSyA$sx0(gv<$Bc zy^|jNm5fD$phS3}S1dnHl^I#iE9i!rU};eoR!as=uRmegj2Ov-jOYAXZh&`f(&t6J zdE;AiIeyagyE)R2WG5G2?RCg)(C zqzomq;vI;=!d4Vm;_t>EubV;oqLMRiiwMVTAi)mBnsLgt3_WvhIKRWUTFew z(KAAO9ZIT0j}Dg$;%!Do-f&d(k4DUnMN;OQtI%(XRd2;>}(P#BBky45u!$7RhcWArcTXDZ^i=8fj$o+;bx1-6>xtWcK*7 zO2|1B!sjQQ>+0yALbvtK-mt@p8y8F_*dso0S4qL&TDr%k0n8`D|-npkEz&z`+&O(}&xZ=1$1cA9;OL7?H5 zHm#f=lNqerj;CsGmFi1jE8M7oGrbJa^yxu~-i?TPoZj%LoWm=A(jFYe(PaH>mj>x>RIhyDb=@3rp@vCJD@f=ySN(Y@oV5(n+mvmy zP`^%~F0QHHFRe8aG}S4}zg_Z0>v7S~Bv~xfo;W%x`ubR?G7uV(4bcxeG}~+=ujWX& z_H0hoxL}11W1Y%FotK5UTmBSsLL)Lw^d}z<1^9sKR#JDj1PE#=O&%SgBV$k?x+HZ$YR@RsiXY^`J@5ZNL6ez-!Wh z)5&x*{F!kB2>GgZ%<8gui-U0JQ!H1=1u+C^vOF2wZIJB`@NAqy zf_&xDByZ3nc1!Mydh=5Cb1+Wx*p~ZRs^dT~OqMFBFKI6~UeB23Svy(5AVbt$JM1Cc z*0OnL zre{|zSqPD4r8UHBi$fGmP>IrVv8zhz^B3Kw;zK<7&G$DkozAZc)u@y0*U_Xpz1i#AwgqR)sKj zFgc@HDxqH_wKf*NOWm6rYB`Q+$m2!PTFh8N2tM?c+6 zyOC|Kwi*H|LA#G&iF@rx)6)z%iC5J-VtE96KsVqAs%b*>1OEasWKJKlD ztjWNVRiYG%+vT7+S<;dB)2IIPD3|)_Yu{CW?vl{nLHu?zyULmxzi1H}=+o&Wi~| z(2(bYr`e#MQUuu^K`j#VPF)L`dhffi5jL#3v ziLyw=BPH3Ime(l|WwWR|0zW@FRa0B2o$|I?O`n!lR?7m_wy8uo)xP`Ak&n;Y=nk!c z%UG#ojxM0}HlZY-y7NeQ^z@V8?khjM7Vamn4QIgvE>hs`dBu<@9(>6bX+FV_lRr|_d(HGK{AB@=8(+|r<{UJ{Y%EQ?Ng|ET8G>nlIL z3;oLEN6rCc%e|Z7dxVJ}(5H6MSF5U+ZTOw^A(Lgm2CdO$Ia-(STpHE28?x+aBWg!W z+v9~zt|W%kagCv$u@*n*fU@idE z$c&%gaXuOY_6{uO@ z2=WvUnS%G(_;U#B#*%eD9seJR$(4VN1A;%^g1lw7H8-#MGTde5r%RS`RH#<~nG^k- z{qgIV{WW}ku1Gl=QOv_=Us?EizoK^23pf-YEzaNm(fKQV^B3{8ijJSel((T{xgyZ<%+YMDgFj9B~2XJwd*;E{raw#srKV`cmyr?qi!t1{b z_y7D@@B}WOAT+~9rHn@Yl9)A<1xgttRp8mu#+(g)4dMT~NrgSQ`=&Nk4^_+r46g0< zEIbv=JTd?SFJ>P7`a+TafKfx3gXxbmMf(5M*eW4U{i5oBsfT);f)tRsS=tl#=)(=P zTy<6;MnmnUOId=(-N=7U7q~h)6F!kh(!Coqi@*y><=>S#htD9eEgowQ+v5V7P%45gpP@S2qJGZAr_A-l=Ps$EilzDR+yusAt0)chZE z4N4ij_;Ogdv*oQBB??JvZJUZ$Ris^>k zLigWi2+VFu+=t}Y>gsArP!z0Je0Rg4U_HWKce4ii7`o#VDecFfR&e@1l)Ys@RbAIE zEUAQ`bV-PSba#siN{EDjG^j|INT-yNii9)>2pb8dw$dP=B4NuY?W41|QbA0$)f9 z&Ki#)1!<30)YsQ{!~4#i4!4e%mAK2$dxpl$p`Y*ZUP}VR0}LhrOvW;N^Y1$li-)4z z)!5h=Xk;{FV`IBO{a`S3eDPXI|09i*u}yf$*x0$@Y8obAdyLgiuS$>xt*)-t0Zxx7 zm1Z=4J-%Y1hyhe|=8&bVP~45=5N_rUeIKpdFH}%!JbawZ*hEfoQcFGG=-|yX0H4#9UUOaO+wO6>@(ewQZ%) zaS~P+0fzA>IdhN)dSCdxVP&^?pb$P@uO9SUT!WXCZ=M8%u@&C93oYH_ea|lS>9UsE z>9;Y>6lh-K8cYI;AEKsYvceln6(fl|A$3Uns0blKih-=b2X?=}DBKK&E$8DlIC+Wq zcEqp|*X&KxN_M0Y@%m?7RQg_bWNLEZFeE}Tn27;{T~lWNWb7!qLD2g>C9VFb4@ied z13~%0bXlnAUI7a48*q4R?*AC9bR~w4o!|(>R~nOE-&jBnq+ryc667^moGQ9TKz7w+ z`;CMP%2H1={~kB#y`8z=XT5l*-c9Zbo|Ba(b)RrlOOpvglq`&1gJ{QGuzUe)EBhB% z@MsvEL3IbpwGQk#!r6T7{75K?N598%vi>=NX4emK*Tr58iPhd1R#iyVc(@ABp=EV^ zYeQ2y4w41TRyQ^fzt77x%_b+UytPV+Y;oFz z}+wR~SrJ z!0v4<#dTO7;wM!{+v&d|uzi#{Uxy!@;RfXy*|BzZJ(?)cFq=noKR&>Npd`|Zi9H4S zErVR1^Lg)JMqUDol#%2Y_2u#L@7#|PcjAlro-A~?VQ;_0+JSUCGq6)(O@5SaTxj5q z6r`Kih|ix_2%~6gQ!Upqa>vQO9s9zGS}GE_th<8vqCAt|T^<3aE(#sl+ERQwReN_( zY#W9?6IjI4D@X?c`col*?I*424%_QycN7mfNxwYXfN~qYP-;_^S0~~DmZi^k?*q}l z7|bVk)t+ttwJow;EY=@Ql~Cr98Y&iSFGvVHAl7WCe8lE9$wM~vv5_ZLuCBRHiFz@_ z>e6si{drD@onL#{ILq%d2Wow5_pXXIK0QuhknUl{u%JXtNcPYXyDOd9x_ht`S-bk+ z@d`h$?AVx?W-6~su>~6C2=aQH0gE?>CN}mxqC3K%!_9g0Vr;wIkc@x?MHZSKRzC<$J($LwnxQt>kz3X3Gt6a z(FF^hVc<)z#mnIKKgch-rwE#6Ra(mO0eGp z#|gK7f2rjMSS{J;l_mc=FgxS-ay#6KS+XQM9wm@EN6<7CF*32gl0UX)qvv+d7E8)2 zG~ITDd9&B0A>u)a&6YcOGg#HgWxtos#tKz%D9iLU4B&g@StzL12i|{X6)VQ?X 
z3#Sur&RJ)vI0!kZN07a_2zeJk-{zD#Uli)>Bkf%=0#nQnV7&ADbL2)4f08%`g(c9 z-3QQ*7qEM0p7>#0)z2Z+jQ~hZVuL_Df6mBzOB)=9zic{{*&NPFQL33MeeFzR$goha z>x^Q^n=$X*zWjgN_hq3`Z@?^t&c1q!Vt384eCz{PDup)Nh*{xc`D_NI=MRB%)Ix52D6K*p4oV<`KcGnZ z9Vdo7a(C^Pr)l+fV#=G&4MCk>RLT{iGoYhHb`1c`FcN`;@%u!CJ-Z&@`WA{e=(%%z z2D;6H(utqYX3#!Tb%Eszh;5f$mj`oLnv{8~A9#G;a7UI(p3y}>=Q5MSqG4^Ju&-LX z`6V+uZZxsz)^X(z?_LfRyrnsp=xRHBpY5%jKe26(<&4)8isin~6$R!%g>y&CD=Wkd z3=GcU(8<1my}WqP?m~}V6*xH`UIWit)TWoQ9eLmz(l12uK>2TdY&+dgV!#!5!r&cN z^VrJ9UsAtEm%`j-Yo?%?_um@>qd5K*y-~aRBl}~qNly;S#HS~W!4=$?uEDlI+!HoM zQ155J2secj3K)V_?nzSKo8X-TXD$BP2hNVd_gbOPJcK<>xQaC317wab|G*bM#_UK> zJ`mW&ghS27gO+kP|Chb{T{KpdZ6_3GRywv-UWo#(PELF0`AD+^wQrCQ!*}Lt$~Ip zsi_NcM*N4 zjou(7emNo@H7H#KeoD@bA3#m0lS>*HtylQL`(csRy05n(8+`{NrWDE+~ygGN6jt88(aF!4y!&mEOc`V5|+niu-RWPwo2epO{y{7?^?fk1Hv+_ zYcZ2eHbEdCGuIhM+PgZW#=Pw{6_DFg{ZV1F_uvgd&iZdQpz8>vRhhK1m23TC#c%~= zf&SpfdW@k>3y`VYk&c|Nl8ph>;=SsM>T?hPCkd3OzTC4*5f>b;-4QsG^7xe zb`=>j0G2Aiu=8le#STueE%2(Se=utbrC)xDWxefaVNJucPvt(*udsS6~=juQ90kD*ZZV1jkNJ<9o zw^>8r8qekv*Y=Fhw@1~gpVc0RalW&cDE*kSWWjppxVx&buRPCrrlMZvNB@=4cFbiu zbmd0XT>rm72^bW{J-l8i2m8)+49JqVX|Q0#5zxu?e>Y#IXw^Y@Q`BSBhgXbB-^bvE zp|S;*+hY#Y-pl!3k2fvC(u?-(a>)d7PS(G3BJf(EF1Du7gkufsu+M=lX`NtnM?5bB z5uKPx!vB9y@ogvF6wAFzB=pF6!Vc(T-EESh*jQ%}1+h`H|? zvlH(WQtlPnzelcnROt0U5sJhY2Z{Z9OL64-@vGgzADA9wpE( zmnBv-*4QSk4utbwprH!|&!2?OH&GUR?Z-LYcEA-Ww=i|9aTvRX9P3Nt2E z)Q3F_#>Wl+=|MR=`(7VR?SuU_F!nD~e9shT`cY6?Fp;y?V9geuag}e7tdUGeUboA()Wg zdjLA-=3Y|L)haIT0AsdYL&f2{%76jA#SO`pud7f;ZsquTLofhvN+Lmo-}3D27~+_Y zXs&)rwDf=a<--q{-|gm+kr8pNWE(PJt3Q0U zhLAp~4&IdEmk!vCe5+p?qFTw(Y)6xv4tG49Q|@9GvPQr5KL8NKFhX(SZ^J6F2tPwC z*LR<8Q!;?e%v?SDJLfb9RG1I%k0Z{5Pd{8zkK@t^bY>>rC1Pvk?#_4&tDb?9QICoNs)EDp9>6Jf%7k;zi@L5 zMffGxO~Tp3Z;1WdM_>GNa+r4j2|vDmdFd~(fy4nHu2PNvJk52d|0)>2z0vRC0nGO` zukrA7OSq35j0Iaa-sx3F4pRjB(RQRa!BY#5644HhF?TTo`lyH}rVoh%D^Z}38jCmf z@dw&nn=X>g4KgwlIP!}~vJwkHmk$&~L@B_h1RKT{MF#IcFs&I_q(c-?Kw0kuT36B~ z&9i?2%{>UJpSY28xAG+dXD6x{zQA6*G79H3-VFe=z9xwlsTTFp`yW8QTqk;MVHc2& zXpkHlfSL~cL9attJfV~mZa*TWLF+r)TalnPqM|2ko9}&mU_wS4;c`@>_lery;fxZdh8Xv$4O!v4J^dg8~6;cU+ z*XIqbz$hwyOo%~H)a~`I+WZwpCG_|LsFD1@v_2D%hUtAIPa^A93~o~xWz;W?NB=Q1 zqY4I3lf)Lt%U~#~+2`mkGz%agl!im_!uB-Ok;WKT(<}-l3R}rMk{7+`p4bTq5D0BTS?4N3|Bqb&ZO>OF^rh1KR>kax{u zv<46xu9>Gi$)A_N3=-5 ziD1%0-u3R^MPc!v@q-q6?P>!7t5kXE15*MD0HB{N4^p>8F|+{2;yC?-`VdZ+Ip3?G zszn^BKo1%Pg{lecF^@9-eC2+q+h+9n2{pn@4ZI7y-L<8P7(}=Fd-@>+C+}_#*sE@N z8W6OpQDpXe&-l0@R@Ps|}43oj=xAM125bi4jWOeL$)8nEH?pL7qGzQ!h{IVFw z19aOAvy|>atn{B!2lpZ!UNUwADg@!G7aRCVNH5i^RU`NsF?vSn6w$LdfI3i?L+gF0 zXrf322^!c{O{G?*yT?PWf*@KX*1X&PeZ9Aw4QDyd%2g&VLy}S_h2K2bGOY2;P@;OW zzbEjb*Wjj;n05~i-h#LxXFf%g0>1eS)(_C*Km?GWDARlK@~sY%zkSNu8KIj({Yc&% z6qopeF@F6WgGcr1?w-lu~?_(NVD1E0wmA5bKSb=21#K?xMDYxrdZf4q<0^&eT`F0V(_jWVQO?sbZIr4vkO$rqokhBh< z=uiprfc02~ct^&2-><-_*Yg#J_;C+F=TgV3PHmd-s5rGrKQ!ti3^=4g(PKN!7?Ok7YnZ*AZy{jQEpm{k8 z&;Hj;;-qERTTW5i+wjcD9NrbCKq8H-9h$5lwb^QRZFvmxyNQ^;k{FdPX0tO^lfEX> zI1inVYv;M~n4(j?77S|M_+sL{34Eh6V9!m#=RQ?4l2SkAW6_@dpRgUqTM)nC<*Z)! z6*8S7q%f+aaDK@7`Bze_kc23m&bA~wQ&*Q_kA+XA2fy)>%`YB;gU)5SjN0K|p}TxK zW=Os63KA+bS?ZwvpS-%Ho!kC@@Uk9`rhdM-e)X?lhch_W8C5ZNUT{k`me>eOJy{Jb zsK&}P4!SHb%YV&2x#f4`z=b_7zY=+jd~!&fE7Rm+g`>9(g*MM-JmHK9!Jz z>$jK0?lQ+jBSN`>QBG*n)f)YC_)$I!!OzqV#*TA>LVYei&>}QS#%x9s1PNV?Vlby! 
z>gtB7ro-O;cZmG{n~%3Z&Zj$?nqk5|xg?^ZT`JkzZ}McT%AvHRDFN%)!MU`tWI^R~F{Bxt)?DtU>&S+WR!gvFs^?2T&#@tblO3uJssqcUb0yaEr}gIOr)Ew#fe)IwG` zCIj-H6?iN?)Z%)2AlnkB-*JKD;@ctKKPUKwZ8VwZCkveKyY>9*A;RBEp^ZA7*o1+f zRpvL0wX?aWY}Wbu&Fc5DV1}KcdpnvkNu2xcQvn6%8OfJ2Uhv_ z;rAspGLTL&3xoRnEy$a4LB{kVs%z^EB_$a4iNbLcFs9+7E@ai_QMOjKj$>cZMtw~N#f^0Cg$9IQm7#3M~%Ur)5rt;h0g zJN3R@MW2V{tEzz&_8hP`Gq7<4ukY(GqI#zMzSSNEfMLp^gtx?HX6Cy&D%8nf-Ub#K zpXpK;W!l?$u7a5c!)(qAQAQ_9zQtLcdO_boega1r@`a9}!pR^MWO<5~KlicbT ze$P+d;k$i8c@dAJ|F?MWc>YK6mm`hTVZ2h;a1|x#+;=fMn9VM%ah-{0^-jsG0SgU7Ny-4+65$gEj6LsuEM*$Xnz3<2x2-&+@!75!Zyp>BpY5R*&^(mXw#t{sa1 z{YDF6Z_=aV>B_LrhfQ6pc-2QTI5sTFw7ap`Sh2ic@~UqvhWDAr(G4xa{pHVjs6QPM zv#&2_KtSKvbQ^7NnKh-CchB2<1&N`Nk@se1e`OQkQ*Z@_hMof-whMN4cDtZ-`rv)= zbZY|yOQyT4V_*sqyWa;z1tND03=FIi4hFr$=yaPtNZ8gwBUF2dTI%b;bAV85=~33c zH}uVr#_KkX?9nDEG*7addx7NoCtv4|=kZ16o8#@5e6{;zEPq{rC~kwJ_h7+}XdJV; zRJ63nzzNJhN!0!tJs%=mhqg!~=m!6|K@vL=8yg!TtqfgPb+xI5MHC#5O`+4z{&BM1 zk`vx!+iG`V`~)Nj&pBB*@~@Zr{av{gf6qMr)}Lht^g``v{Qfgd7N%6bv+erL1^5=% z1{&U!6%h}tL@i+Wi-~?LY2Sc95&|wbAQbOL#qW<>loZWbtdQy+Zz; zNQ#2v#Le(alJ~Tip%Ou@^$iTJlo2GIreCInfi|#jJqa~Ffc)}5>Qf7LetS0 zVlO+}NgXdqV*p!5{d0E={Df*4s*<#8!>8ijFz|JsBQInomXE^mA%mG&8?pAt)3)-h zdwg+5ZkrU=L^_Hpxi5){%0Y2At+VnevA_^Dm^6Eh0sx`_pb^u{vT^-q0UPBVbZk{^IPY6lel-^3J4nvh=_=or(FR?3_x3X6IoVAV638B?NRpmCcH6@bH7-&O6+To?%^!K2~iQM zy`R;du84pL(2_3PWr5{uK?LSsR>$y~*8q`5kY*}bpJWfiYaVzT`8qo{p>fhhRRaia z^;T?;`(?B_3R{KAq1OAL{Jw*jd2@K^^XGR*YQV<0)FWyb?TEF$6b@|(Cdfv7{3rM2 zKG!#eOlUae9!(8UuKTfpoE7xN&$^Lj1*$qm%HY*b!K0YkR#6o|5yPRf$cqQ6gzFZc zvP=vV@>)ffk@0C=g@*doti(RuRJ*lujjAP(s0lcIS`{&w?}J>dOMz3*T%KO{ zn>T=-IMA%r8aeke#CYAo2JppsVev=MZjH9j|N0bAV7!SQ-H?2 zHu(V}UIpn@Zu&CR-u>oRejp85s4dKgFCTi>+iyWgX&0 z#KBeLF8RS)#aN?ax^{jX$~NMW^OGmnq3G6Lz2GOr2d zepi~c#xqg-*{Q!n2Y!V>mSKP8FxR3(O-BGuC~2ey^Y=TtYL&^sH;z5 zZGL;<_UfbvUShUOIQFzg6RPjZY}|LJyv*3TYL&i^RAMH6+cSuMK1)KoK?tugo5KjD z_mC3`G5r23k5%kwCGF^zcMm3@>5ovKw97o)b_aGgALa7S#4adrEx=0~42*@miSe_` z#=b{oR^(8h1!6Zw_hdKRL%GKEXOXx!1mdAxS87moGmk!F=1`!Gt39;KSYLHCM_xJs zAEWHvD)$(V1m*$YhOHrDR6rO|*Q7on9%9w=1I!wRJj6khCDa!#_yHjmoXE0;W3VSk z23tktg|w$h9@!6+YnR=8HSdYHHv$2gN0Lt#Lv8gdP-W-AjVGtD2Ml3dh=1YEw{hpy z86IDH8Dg9#QY50|1hy6qa_8;w8sP
rtc} z&5eQ9LU$yIt`smSDL;o2aU4d-O8&QA$)mg*s#A0t7zeXI&9$|8Y(=tMf=b`KWR(6; zXL#R6DL=`=-@m_{e+_G=%6-EWwtdu5IQA=zJx=W<5~{+w=L8>q&q6(AITIU# z^9%DjCX!c00xhq4aLjbcn3TCm!k1lV7+~w}(3PV~tNoB?We%81D0}nAB6_59GOqBC zdrR?8@CLIl8x7~f5`Pl}rP`bXqeK@vTi3AOnTsI;VC`jTu za_HV#vNK!reLY6}y0%wFiYPb$vZ?Lroiy!!$*j0K-^EFu*m}9esST{eEft>Rx$SIk z-;=LhWS)3+_WrN#>CHL8-VzH@rhnJ^XspPqh-Bfqjrx2VQDj( z?&(PGq&5r?nBzv`g|J(c^zr7(xR?^LU2eRf2ujN5tg7NbiEwMMa1z9QXEjA@mF~=U zDdV_nkaLbD22R`C$(~5z{SqAR5|%zZI?`6Av2%t=)S|w4aSS4i(&v}x zE?r7_3{j<@1Fx>v$)P45hB2U9M_5YZqh&4H4cvBD{h?r4jNhjHCDGi-%#4XiZwA;g z(z*aye+aySL?1rz&mUhA>e_tWLmexbljp~dv;0frCjGT<3LKXHlZ%T1$ni&JHc@rh zLSu}FIOa-mH;V&O!$Kk953pYM^XW&z zuItO^y3CNP`%`%gZwe#MYj|=@{}7tqaRTe=jCuUuZ-8IKoZ|P(yZ|6yeitixbU^9H>5`#Tb)HxYde|vCfJf%%_Nd zpkH3&eWkngY4~hCsRkn+h9<$_l)Pq3QO z4-Xojw^#bxfI0k0XdwuJe)CWx@UjLLn(?`2!AuS8FwVV#|K5Ogt!CD9Qee0h; z8}+~-z=AWQXv_fB_EyGMXZ8Q7iT_)>67rf9*BJ&cM>CN+iOjzUpF-7cs2L4cKKERB zba}CUfqs5}mQnvF9}+cw#5kRq^zQN8zaE;OeRfR6nh$ic;g- zJSuTdIe#?*XE!f=rKc!uzJrH+HWA--u9b(|Vt;!n|6%Nn`}$>81r*8=3~1$1-i1p_ zqdc~@aFEYv=Omx|crgWkJkWY>K3;K%G*cg-{jS~-jzZw1`` zik34Te0vowqVo|gDEi-d@IPs;a!xQ;Vo1k3*iB5Jti{Eho?|z`z-&G@NGocE4W&I2 zGDpd47;q^4$rk#cR+j=7`%01?j@FpeW4^y%b~{2EI@BQkh2!@Ku%e@&)|V2;JWf7+ zH}8?kBDNO8?Vy4}MhBPB$!yqNp>RuQ}(2{X;jE>O~j z06~)xQZwKfcg}8PJEWu}27i!K8}-^xn`3ZevONMkxyNrjzKTED5e2QSd~Y~iM*8_F zTK$}co_ad)Fy>vvr=J;ox=`yZv9zj&H4wTtx{C*k2mi#8GO{U^fodW(7`-!gwdYfEISf}rJ8 ze7qm#O)|K*>t05Qn3n;ATfxFtXB5XU$6kgM@X+w^fVXe40A0_3O?Z`?34N#9!@=3v zb7AG1Hrir+^9$0TgEAZVGPtFWf6}R`skt5Q@_&5P_M*2}*JcoXvCOLHCa$ZiOM!v7 zGs;AUl@-_bWP>Yx;G+vC&7UG-9UUF=@s<8ET1XvH>ufY^u5=xd`BR{&SnUIS zIN(>;Z1Kq*LSL(hp69U$-nPQEYYr>J(o5EG3~mAA!dFQ={~ymz7oHvY9U1!}A!JrJ zOF58{@jXQDD=|)hB=}%yBBw4~6@w(PLXlnifF0N0YA=9n@dkErmuHU?AWy}DnyaCUiAt^#j z)+GI{SQvJH;g0@Ua3>qp@)*nr9<`3Wx=zN2g<+1n+`!y5NfH+l9E^$+u*Y3aO-;S3 zo-X&%-xFTmJU_#7aDmDMz8T846#x^!j4Z#pU#a5-wdnn8*+Q!DY4Lg_<7Tv7v5#7lgk6ZKy^I) zf>+jQ3Gwblc9Mb$S>rZe-NYonuRggNA0Y6~}7j{axO`wyZ~=K;g1 z-(j$!rmBTpxf)@}4GdBerG>hoKs#yGkdNSH3`!1NX_c$MsGOhU@vpXlw@M!ByWdZw zs;EwHED33_m5XP4m_iOe6ZkN=ALjW#@v?JtNQe^whV-$>kD06z;hM$hkUZrC{QRpEPOZDK3&e?3`kMpeM;+do`= z&302yytldGn~2)c-XQl`DdG}A9fimw%oXT|998eS@!`Bj9+0+*p#7`UAHyYwd<^UG z%Y;>z1hW!%3Y;%r#=1%Mg&5MlVP177fU}KX9$4`oZzw#`%4!!Sr#yT8h9WbZ#ID}0 zLbta#D#Cu|jvRcMFJCUU?y(bk+_Tx5tVCmzlYy{3&lQWh{*|+>Ori7}0?1r>?-pqYU7#jN z8ht#q-<4+9>s#OVD#G2K%D2*5 z%=b(UN}PWsdH2+m^ZRy=O2)zUiGPHq!~Lbw7Yyjzr)>G2M8vSE znLro0d%Jx6=zfA9JM=}?PwL<4)V_=ZUbO0%n4Cd5hpO+>H}~P+vlx)C>KqvtqF*Uqp-Ojf)n_2=q1S?yEqoR z%7NeOOPOj;Bx4nt1`U0CTcVTO%R~R15y34SFGsIo`)x}k)8>tkW*UMvpsU4xXXTx~sq>Y0(^LPJ7$W05!*xCt2kA{S5N)|gAoO>}7g>@~{l$1UxJvQ9A&%=RB4~xm#7q`JU$BiCnB+tgz zJZ>U>y=5)$Lc!QFeG&};r+^v|~#j$~f-bU!Ow=iRRt|0rh$ z9JS%Q;`L3OU0y%`2IQjh`@bOAMGXRnmC90Iis(5R8lpN`9EjaE&kc8>@R@Kiu!h@LE{lW%l;qKCZTqw1H-OGh(8#eo(_g6FPd?u1U}Qt(2PN$_7DTRXzSIiW^%r+y6U}G z<|~2UCdL9nn8wa|kq6D56xqdN-JWu2c|XnKa z(y+BWL<`TS8E^t5nRx3>D3r-fv(N)#A|4tX8qy9b&}uBn4sZWme@RuV?hn=uaC9@= z!>0S=xJcxbbq&$03n*8-o~hatD1p<>Zty<%{#x(IwXNkH{gn%~PnQz!UlsOIegjY% z%4!{&43-vCBPo)1KWVU#2d7E8(t|B-AXg~I$g5jc^}i;eju(kPi%D|0GEhmpyle+% z@UT)%O{9NaGFY0nH)toT*93Ilj;bi~EQMF`*>Vu9u&QSrQn zMV#x@;~`VJCU@dr8Mw3*!g#pJLtfr0lCDW5V@rZ}y#Dza?#!*2I>T$Cb3d(SZNG%RL_b|Z%`@1}wtf4C2eP^bKa3-O$Qwc^ zQl0UWdl8H#JpP_jQq*gk53dx;dg8H3>Z?3fC)6Y!4789qyI8rV!Wo}fXE?l&h(l!o ze&}EvmBZnX#%CIUKR&r}S>ChAs9LM?9PaW>H}j2G9^`x>b=9z<^?S;=%#(pRV!BU? 
zm^(>R7`djejPo`ZKE>7ae)(@QB)Q7z+{8swI)?z4)+${{6}2~7a%XWU()jezs_U|E z;k_~rB08zi`QhRIZ2|iRHcGTL0LEf?Ls(i5j~;3l@Saf2(V(sA`M$afb*;m5t_QlI z_Lk3Koi|`apsaqWHq*U8`<(6Vztcsjyyd4o@Ksixg1Vod!j0q?!p;Y^miI&UD?fB{bEK7&OclGdpGfwgOOUG^u* zuT~iiZMGA0X{fKZTwMFIB(@J2eB`<~w6%=qph=m2Z>c|5{~b~0+qcle@!#L_Mgs2v zVyA>gux1N1p;Qc8ehZQ=53Yq^`)SOvL5$)N?%YGrh}|ORlJYs!)n2__rf-%Mb;ny9 zqo=~&Sev~ET>P}ZvFU?2LA}_7X-k$SY0v1tT!1Zj@3pkIf&TsuMfJx(x#0z0jW`6-#Y^P}!lx=)mWOFC(c ztUN?p1P}e66`a49>h2eO<74J@$eZHrLj}4|q_5}kT>(iZZi8;TAE=tg=YF%X7dqI3 z2>Ez-f*5X5(G+3;DFE^?9c?F0>mx|;80F!9$f3P|ZhhsA*YE3?Q_z;l72$(sWiB^y zW7b3EAp9xqQAqa4JwD))?f{15Ths(Bh5)EE0{%`~`dZxexa#Q$($zZa^5MgWTsSMd za(-_;%A-BU&;OGH9WNpVmXhk^oAUDVsGiRN9o#ZfIF@8MJQN2bBGj@%O$Ky#e~H{Y zuH@Cv`<30*F-a?C#>DOVM7*M}*4-WJ)B)kW3iP`x=Ua?E6J6_8!EP9~J1_XLMY%NZ zV)p%3FX5i^2fS5At17l?JVggJX~WOIDxY;peenl(8CJo2Oyz-#(Bc0w&Z4Yda`U`4 zbtQ20!A$>N(4>fn2nQY7d4^~hS|+0)u~If+9NX0~_6`EfA%FPKM9WDQs&~?NcgjGc ztp}R1}}({#AE2FX>p#%!+@^SN^PY7bM;P0wt6cu;8yk8H8~^QoeM#FG(O%$0CcR4?rH5U0JHonxzahp(jnfA zivI)u)%#hY_#;#P>p$PNj4m!_(QnK6(qrlwXdfqW473|oMsr%I*>cA1`d1zr^p_2N z^3=iUid0hr{b4xDE?zaj+p_1f^ssQ+-j}Zs1vlk&miUs!yW)~E&9LI{wF+0oLX7!?1};gaYCeuC2| zkHE6p_G3+fv#80fFTI93W6L&C9PR>U-+e`+IiC@*3Ufq0JNDOgTIUI9d^?igV8|is z3Dgom6qpPpD7C+FjPys_J8&7G&w)0}$6}NxEd;TvtwN_)qZz&8XmoUh zqr&YF<*WGfCqHx*_G6v@TAJ~7EB)j#UF&e?XuU8he=smU*urMyFc9y#O8I=`ptfPD zmH)Af%7<%e7rR(W6@66MSM>rPawwl8`65o0dLNJW!v6j z`Lg2Zr)2B>Y0nk^)?k z24MfPd@C1We{*&@Ndg0QQDN8EpG_D3kiZ4^7ZzT&ll|`9UdgYE>ZI2=JkEYz_i{tj-0If-<7ftph1~= zemr}>Yj{{jo_OaAoPc;3L~N+;m)q=hyZtk=H4x=v%it#N2Iq7XOR&~l&J~CFP6LWZ zL)D%YPYHnrbR7UQu(+TiN5`}4Q(&DX>^wH`&}mh`o!jB*SqC6%N%%ob)8IdCNh9y| z$>$#2KoUW<9Ktrm@4lX;)whwpL&*mnBjGnS9%*Z~9i@u^mINWoOo4xedD zB8Cz&I3?v0Xt?}gA+w@Xjip`a{qPolEZf6?!jo3iPU`T;KHsxj*x>{$lGVX0esjKS zfjb|T4AdHY0OkMoZpW90BvH1_P|KF~yz1uPn(tEBUKDqL;*tFLEG}*)IF^7SVC4cW z?b~ez880;c)F*1iIxVc7v%HrtTp(GaxJ?qY)PIC;af?_~QF6j$9Yt2G4iFiI#D^;G z(}(I}F@GU~!Lu$N&&6xn-RPSc)fiSXzj&ZqWbMtQ2e%vzQx#5-!hYdI&HX0eH)RNx z#o01(dMoh5wB(F_c9e*`x_zu(rnb-)%^hfjWgA2SOVt>G+YXq$z1+TC=op<4Tk?MmHr%|-4CJ*3Q+)O=V zlnTGRM-N1rSEMGRsmT|7)&9Q2^YFVl{KRxYb zySi)0!Z!P5z5ItFeB#~@a>d&i=72+s#z_?^pS}_@r8m$NeeLdOk3VY&zdn4+{TjY* zG6Lr&X$)UD4dUt|@HatXBr>ql6I#-?MYn{7(B9jhX;~lQT$L2Lu@46Q>1r=-6?>5n8(LdKJ&o)u5fy`n`9tAd1B_SKg|rE=r#R2a|1MbPWfXq~$=f_+Cl3r4N&{IhY_hd~`99-uO zEJUR=8H7TXn3<=x*@>@uP)!jP{2#D8XdO$9ec%I4|C_c_@KH> zWq$w61G%dL5(ABm-{{}bhm6N$>Jc_IZD`bS;6Te_1r#ici$<~zVUdwHL5bE0BC@T@ zjB;rG=~;hs-SoMJp=H*Jn>6W ze<8)3$clgy^Fv?A7Q^}cAm;<;rV#36^5ue=F{KrgkWHfU`3}NC)dT64pKsFq_%U}N ze3aSzCh?aqUrfLnM)&g*c3{i~05rC6i4QorP`yZ`0M{lTq=qa2)MWfV&Kt4>a8efz zof1-ice=~oO*{Ve=<1P2=jP;YJuZOUFI0=pVV{m=vi z%omG@h{#f+@c+StJq>jls=;a|12D)H7kYY|8q8+GL1C)}FK@?_VQH^}r3&aJe*wm^ zBM8jkB%T4LO}U!yh*G8817wePvEab0RRke_M9@LzC>{Y9N;7lwm%`Tfz)zR~0Ptt= z@syAw@*HwMM)$X9!9Gn6K${QH?z(}PyL696?EP?PPe`GGc)vP27uoz+0-;weCU%MaiH7Rm5#yRu*GrW@25W zrOdEn9aXy$u;M=c^`&NSrEAeXgQXo=u?U9d`9Vq^gXBrA2jAkz`$`_FuFsHvkMFUh zfnI1Bi2rldxw*LLQd6nB9N|zO6rzZm(w@<02@iov#>*|U1*B-uYm-NMej}0D*&aMfHr3ea2P5mB+wHH8bwmo zv{jRS)g6HX(rtZ{okr9uYs1@F@i#eNP#xH)Do3MMz`(*BiCRpRriJPDuE90WE8ZAj zK!50w%s28Bfiu2BWQ>Cm=2KC~<4aW@pxW~UiD~C2XEq#Of49TBqwle;^MY))j{uzv zfc9y=b(P*2{fQX0j+R!|{-9FRA#kWKQtc8VqD+Ysh`IaYGGzg7DT4Fe1^L%bL`B^h z8BatkMcjMIq0IoY4NGH?jfv&lNqh9!o+dK<^ez*A8K^_m>KWH`(X+gC#JfZt#wo!u z)mV$9hJO4+SU3^@HWAnNCM&;Y^g8OSxW z_IG#B-d3vctz|&sc(y)jjoxN?FC|AO>r>@{_!FU@%w1+xOHP;+hUzz)M7LoJvP8`5 z{4h9PzwVR!#A=i<)ra(6SLO*uwWjJAK7Z~{L&#^&540xh_dYs>= zpkSP7(a}Rb+ZxZS2w4|MOrrLX3}g~^1?RwV5AOoYZd}McV1;WJ)dGR+PMX@5a=fYx zGe|i#{7)*OCXhP;N0u&84z1x2Nyat)AS#LyV%lxV|-)@5`4a 
zD~v^(rRery>T(g0`7JZ?;IaKO;-4#Fk*pzKFz(hx!xH0c8ip%^A}zIqj~^UzLH_%F!V1E5sS00C}4drT*20%$eX3idX@u()75-WX=H zG@wX{CNHvwt^?iD5yDlwf~P@Pes_rtN}jhh5zI+}zD|ag*)RX-vN*)w-h!z*4iXEF z!5F{eHmifK{2mkFJL{qyOf&6;fxP+34LIi} zv2ip2?BDGHLbo6|DxTk?nEv!z4p>&M?!NtfeQ%-8o|kV=mwyIbq4tVpqyF!Q&@P~p z_R!MKEi(9o2^J4>kTpQE$ho3zi5J6bEVH$ryo~}* zPyZjc>P-AA74z2;_O!hxzdBr;VJHmoU1gdgx!`1UlsZ(MwJ%+yNdEfAPket~Ws}uR zZ*vfVtUYIG-ELmex9qy8{8|{z;VwCjo4Vir2s7<93y0hfiZa$63B4@kHG; zMXUC<3B1hqU3ND(jIw#3z@>F1Jh1u5qCIx}Xw<02%N>-@#GntpOTOk5Q45YY1@4ES znr0eHS!C-fbPC7ba#{jU1{8%LIAmz_+_8F3f=Xk<@pr$ZD_A`o>%+ zf>LdFvE&Y^x9qJ^9wL%LC`~f%z6}O#PLfqUv#BQnxZdyrj~?LV*XFN%XgI2tO=NY! z1m+1eQ+q$#(rzyeAxfoOdy}sR%Kc4W8#31AR5t3AuUV?Pe9~%Y_0*7gx84^U@@ReZ zHmY3^U6~)Eu!280MSJBJ>uulwhi;DF2=(Y>@t$CDLkh+w;$Q!Vz4r=>vdgwc1wjc) z7Es9`NhF9QiIQ_h6ai6G3?1?&_1w#Sl>;t+BqI@b_8G0WKx1I2GSLH_oFT5yF9Y>*euooHA}9cmq~&|P5V6iF z#M6ts(@?(<#=-(hkfQulY1d-2gi^3FK%-rN;zP##m~5KDIJ8ss2k_s~<1+cBx=G?X z#49Ur6~t#ugs$u*4ZM6J2KpyxA4=oejsptB2rnek-p1+BpL%uN1jG;qx2kO4F_wBA`}u>p$4tee)MxvF>1QnZ zI3<&Q8g0ILyn?02+b>ENww{E}@8E-2(Zu>M$#IQXOFRhk-S&4Y%-BpKW4c}ncw7ck zfdEiO`5Ye?dd*LHJo{$#uIIT?-Izx%Gd(I1Jm>s&lEU6_(__9=lC3O~9IK637-u#jl^}q0hYiDRkCnK4;vBUiOS&dXCobLGvQ|p*i8trVpTxFIYbhpZ4oQQn6nZ4Dg()He?n& zxK~}FVYLB{7U=yM8bM6}JqmNqg`1zh@nIg#v1eiZs_UHq8=pz_yH@vXlL+IQmeK6G z_W}a>Gp7%xu8qut2$~=}B?#7K5542Hb0L(TFB8331swxszn}mt^pS_OSmMJ-5*{N0 zxhVKZmOkv@`i;(x=z|WFZ~8y+{sDWqf8^jG z`l$M9G}q9^IhyL+et|?QG|@hLF7{wvps@!tJ%M)2Zk6O}@*IF#ep@bkiV9OFb^iLh zyDICp*IylY>B|DUywrSU-?Lc1G5doFka0?m;BeSX8WS#wn+{a zaQS8?O`|k(@RGxqZh%N@*L=I|OC2AG6ihDIa=E<>G#VZr7J-zpFAQ0#dT7V`f({Hf zCjX*X0MLtJU&Zpr^mMEDwnK!Ep}oSSy2sRv-zCT6+V@7TQy9n(kAa1rukm-y@{)k9 zo@OPu57-MgK& zKxtT8S_o zN$F{;3&RDo>{`QO0QJ0I-OUh}kZ}Ec+rDQ?^}8Ji2u#ayZq#Q&wHSoZsUI|c7l0E3 zaAtkS)XGFX zTG(yu=}eI0ud;<>mSAgWM&N4PJ`s9>!C&};gXM`~ud~)?CshV=Oqc%Lbw(4qR>YhU zdzd%d57r)e`_nzDSH!P>)~DBcNp-pOH?)R9D_6N$tWJFWFv2#B(Y*UY7-@K&|H_B= z3DGPIJ-sH1UZg`gMgGPg9(r7?yN*x&Rtr?pxGiLzrY#oOHcwn~ zMXOn`?k9n-j5*a&sS<(q1mKIygEQx$8;xeA4!R>DM{0ob{97aGjJ!|-lElZz`#fFy z->(lSNJR%@%OR+wpWkY!zQ%=A4@N&7=lrmpW!lG-;QEWbSdOAGSk)aRGIpxcC-!#@ zgie_UxnFa;KB6gC#v1(LLZ>is7bF*IBlHh>{7(P&T>oRFG919}e8mc>vN8Rfo= zqj!viMkoX*#f@0k#j929) z)<@hTlNTZr-c7Sy^|^ZD=s`sw-vzmET4TkDZ3wIlcCP$H@#x>+n9AH&;8{S=zbrB1 zF0hxHlM_xfU1^Nb3aof<6{A8WmSh~F^;Re>KY~e$4Dzw7^6{U{b0TyggncVshHRq! z<#$uU4Jh-npg#Qm=<~R(Vg3P3KF?bh?uoyS?q-!cp#Z5iINhm0tlKs86Y|>u6El;X z16U3dADiog2I3FWoM4!V$+951ep@5ygQ!*Lyq@@)p|q>acm)>1l~aG|1zoxV4HN@g z{r@`*AQ2%U$hYC>hWl@Fwz^#Md(3XWM+X#_jKaL`oAuCYMani;`x7L5omaZ!mE;Ij~@X40-=1)gcV(*q@>`g6DR%IUZGyK48>LjhHYn z3R3?97yn}e7%s}*H)bd+_-d4$ojuQ@O1$~t&&!7?9AO2QrKB?8`uH_e#sDXl1U&5@ z!yP>OS`*c=xZE;YbVI^8PoH4adm{yH8E?5jGx$XeT(P5oy;fL{33Vr}fR@CZleoLo zy-;}5f&>th>Q7$mKuan3%Obqx!)j=#N2l#T*6S?>X__0HMIxGm_=d(9q^AwJ80t@x{xypprTM6 zfjl{t>|9DC9|LF$P*LQSF%(_)k-{A05EUdLpMO4BmB&No#(3`SF+0?4%YP_oAzw*nKp^m2 zKQ(xxJ&^yWk-Q-bG{^q`oWlP{IR&5q7PDp8i3PxPr0SCnpMCJ5yc6l62-Fp@|KbHK zMN*f!LpGgo9PAG3*44^?K#Ud>Si3}TWd`F1e`B0RU5lub?x|z)_llb}B&A3mE zb9#M`q{!~||2JO&o4cc|%kvO|=0pSkMz`!!sadvf#lMuh2?IXN3pYK<(z^Jw%znJH z?`Xlq`CvyHDEtSUKYp$l*D?6Nq{Tq=Cp)hqNx%P=K9GMZ$Uls>b6zsY{$N}C5jC`? zxzK*ZZVpMSuZnU zY9UUMx&waW``2h*nY|L-W`Zg9l%Nc2g2p=+mM@To#;_|L^8y>#%dCyee9>`8pf^k! 
zIK)qpWm^I(H2d3)i}()*!bIdV$2%<7yEG{)XEy$tLZRO2q3Y_qYf0{UX^wr~roAtn zEg~n7v{NGk8sl)JNf%gtU?40AHy}C3xAx}V$K}OTUbV6UsmMHjYHHnbV5)&mI2L|o zrPh_|L%%Y?9+=e(l<;(^Lcjy=NG@ZTu&KG}cHSbxY$|=SX8=Qt$Yy;9^?3sjw#P_{VYYgXcFCQ)lN( z0Ktbr1$i4Xr`ZHZks1dE7?9x^&=B?&k_xcHAOfLWfVAr_qOGLne6BPFGf-|JqI7Iy zW9hw)GcaLOS7^2CrM?0ZNh^6BSpONC(|wnbg4u^clJGnOD3tf+$|b(FL-q6Aj`gnh zd6bXS_ez1P&>WsLbo|5{<39JRVAJ>M&?McK($UWG3OB?_yN;hc697^#Y_BNi zKDg?EbdwPPsZamuH}!*d)TNz8dF?bgi?bLl$_*HcIKDCJqXyMZsZAFPIJto-UweJh zsReKjfGRS}VS@YDV5vXZB;-edkQC-e-}I=~8e13}I|8?Q%3_SP;g~mk`??ags6$7d zEIY!tfzUPnkHX-W_Z6g;frL{6u)C`|eEmbqownN1w?S;ak+qS^xn#nX%sj6@r=$l^ z(*X<%fjP>|p!lE=5fOnr?}ORWdvMU?m$?aZ3*g0bo8S2yJQc90G$x)hFff3r#pFm6 z3lwh!nj4US@sQvqj?aN$Rte}H#N?6d=mzw2zyy&GKttI-`xqOL0x01XaGXW}=8cE$ z)}4zck;86qKqG9r&L7qt7;;#p*UM5o_K>_aoEF1(ntUV2*MuzL(b$M*qkFP9aYy*`RH#_QON-kt2|$g1Zbg*jyyB$;78=lx%<)Q$cYX@Pl!fvSe&B4-8-v zY%6fl4h46CvuE-(Z-RpcRMkW=qo^pOpz=KNHB;w*yFN z-@32f`<)Clzzvf`Ef-|2y@~vjd^aCa6b!B9hY@*&kHt7IJQz8Fn@nJl^~j66*yB{A zmnJtW16*Abpx`1f4Rp;_M=P;QZs&yI7~^!5O+in#tB+=_*D^&L0`!7Os)m3 zwiBqu^|MrU0MaOiEdg~f$m1`d+brm1WMS%2rr!=^$ii-4*=&a^Pk}$K@jwi{xZ_D6 z_6otvry(;RFfZeK6TO)-E_jV1dk|iy|+WNH7)snLiv|QA3q_+Yefa|<=CxBjl*gQ z4+ik56U=JA@%7mLz{J>9jDXy(Gl}i*1JjeQ8(JwWhlaHQ8@f&HZ$4+iHP{sG`NC8V2^eJQ!5VcgE!wZ@mW?U@ z(4^BfKnNQxgKcts36OwD6`y;!0{UfuWZmK2;wfiCeGa7Odn+CUng2fOddERyFqBe#t z|8ADh$c-4{4V%0Y@^I`jU{+|A2k}#{Jy%2>g#N49&SJuU9@U zS8{$^e&r=$Y6!*F%8*h0WaIszcjxZlu|V|G4V!^ynh1<7Q|FD5u$Kd)Fm9EWIVYPL z4f$JGW$4GD%E-vrmT`HVqD71G*`CQIEtYa!Fb8}G(e?KpTYx^kXEK+%p<&vJJ!duk zp2WRp3>W+Ahl%nfxyY^|4XXty9wUn{fya7QvgiTYV54%Wn8$#bEG>P` z+n*mFFZB2Xvqh+2{jkyON~o=syKAEPW1Z0L#jTeb4xu~UszLE{LibSs75$^4qR`_k z!d7fvUbE~MFVet;r2hW4H~3(o(a20F?G-sd+TIjT&m@w%E^HQPy3JS%fp0}$tWtmB z5NdB*O#RmdaxvZ4mq-EsZ9sbU!Gr8J3?7TNs)j(c^yd#^LHrFH^X_hE@@fB|6IZqz zCyf~eTUy;dDHB?5(JN#q22#_nYYxvr$x0J02ChLY+$UJ&m;f-kv%J)RqX9{$h{#Pbc$&(FVwyZF7`uvz(K7>!#>Rt1wwX_7V9Z@_R|vQ)SOqToue z6Gbfo+99|1{ybRAP}9=-gSe!zvGFLU>IpTZjlbgZllfrK^7CLi;mQ;N)7+26B|{o! 
z=IRAu;CQXcd>#*G{PG##LWkzqU&1$Tf?5&*k)I-wxbDCSeSueUp}xxBQ6K zYHMqO66UWx*!S$~|J2-k3>plchg&0@?746mB1xCo*x4_FbpVazvPi0AfM6vcv(H0XlWlebSLybQI*KuE};j?N>* z;ev=247va$0}=iWA3jI(l~uMzf4-mWxk9FAsghqOJv5Ylb>T$btKrj8jakd1zjG5t z3tZ3&*&NksQ7q_F5w;LN)j--T?1sYDcF-J~ZcikK2(G_#jsXMY9x14^Bfgs|+Rl&* zy;^|Iaifw5MiCl>3c)|w)pz%6mz`u zEKO@naOWGCz%g^O7a136xm3xVKsh=ZUa2O+uV|r%fr&4c^?`{g#=it#&D8C!pIsQK z2?DXj{W{{MDMm2NxJxhy?KtVs`t`luV*S;ukC2cM7(Nq06#4|tZ-!#={ccpS-I85K zM2Y_+4E$$054yRU;|n0^iB{N3^Vz?0r}kZ*S&N#deW`=P;M4OY)FKY|E!d7e{HY!k z)@8jUDRaywiX~vNi9O#DO!cZ~_gH+$$;tgcy1E1e1?9(+pW@GcHU(k-enAEBjX4av z;1C6#H&j>9SZIUHs0lZ2hsIG>NcS~Rzk#{F4FDnNvhHSXaGo0Pxv1LPruyVXoFpZy zTs<(}->}*6F3-cl$e-?S*?@Vm`VT`@+RQI*KV?8>)NeTf@{wrEb(%J{(z&(00EhEEzzLCFpr+=U+=#wN6Wg{3!-UVlrepopDEw z7)w+A@EMzw4ekF_vHy#LLE;^w7Lsm&Uy<=CH&B}4U-O-EBnu1mRnTun`_uTq8oqrbm@d5aeKNlXJn69hv&$?ANz zCK>vQRmzawiyH2tt9Rg=rhPZ5=>lKr3gRoEnJ+ZMf|ZD(NgBxh z^AHvLJg&lMhx5TI2PNN~>eo6RsLf7AM!BBx5IMdl!i%9#P z?2z?xB+mk+(;mcSZpM4J!N@{8rp?RY<+}K56+^MGH*KU`s5EzZ&Z0ZolL)Hco5tvy(R1ug#`A$EXX@;h;Eqj9~e}cp<;PdBTuA{EkNuCTY;q%P% zFLH2TK@fBWNfo-~e$l9IEbN^;BW`=-!88ZH>W?`hUg#6PTiOQ~Zl3TitRCq2tI6{3 z3j7C?RX0i3QWuiN!Qy3QX6{GTtola2a~oYxp`v-H{!7I}NIsN}@R_=OPYi!gPDtR0 zr_d9+=n-j6(^9r{=PNlHF(Vp;MCzPg(=4?eK48%?#8s8#*@=&yuU)Jb9jHzku9-@- zssH6N6~p7{bFia|eo1|92eUk3P;gJ@rP*`~tE<)QH$IY+Cr`FSVN(n?1pNH`?oC|% z;D4T2pv5deItowJ1RizagQFm*h_LI*Hi!uD>|wN7Gb9>URKUu@$e?19VBdfK;bMXr z9DeIVN1i@Sfxz4C3zjZ!1tyC;ND8)n^_m z_+7039%XG5*hx(gbT0ZUHf1KjSXqI{Rt@QQ|4<+h`kD(j!CWfHM#ypKk2t#(16H<> zhlS6tUQK)^fuEW{jUApbY=Fs2fxmvBrhv6@_5aIPkUU1PL$eUspj|{qkVF%W`pBni z!cC~Fp@1Fc3o=QeA)u!E-+xew@Itzhd9yF9siRS@h0BJIRM=Olx%z5!J_4(N;2C{5%cQiZdzhrbdnd<$3X6_a?TF!<}%5%Ff{2}7>H zGeq*{ULSsS4k{;jmWa(8QRVtKu^?mw8ylZ38ldTZ}AnK3UeFRG7GvbC(Wvy@4nzP@5NBQkPzq%plU(?p@ zE#f?rzvQ-0gwjiJmo0w2nhD_JC=6EUCuPNE~REv|1jS9*pMbJ-X_31Hl5jw!}VljPo`R1XYUxA3`Ic#t& zs^3JJO_Ok)ANTQqwp@E%*Mw#yUsG);SMmC2Sbs3cNXfvj01GPD(=*F-3m_721iLxJ z!ybx_EWjcVO+BKnhE}oqj*gQsa9FYRMb1}9#W_Fob<6KADG?D7r2_&8b%-d?qanzn zK(e6%^`--iX@f+{ zy@y`3K^eq!2DthJC%$2jK}RBPf79Tm7IRLEzWue>d2NhQlhbNH%u`Eh*{lvF;i7}D z5x`M2clde?R=^==BqFVo-cOQNy^VSyh=N%WsOz`j9uFg&#!tslOX%FaoCvPf2#Bic z1+z(s@3zG@>&frkF>C<&wuE*x!}+W2-#`^2O-lf~|JF;xsd~R_H_5L4TMKaX)&L(^ z=MO%2nH#X)D7J_R_0bBDo&$=S0GT!n5Pae>jVCS7X^Os;n*7`@(){BG#(37G1l`hz?(aPpP5g|IIh4MCD@MsE@+9 zznE$8JmZ#a2Z2r&-2#aJ8tVOy;-{yNhoYeHUG=f$#7X&r(EK9b9o+ECB+?^x(QQo2 zanRB87O#rNCA^xsyXgd4?ukq5m#-6LYQb7mW;-YU<%;1#CX>0U`mWrx)w+M5cL7)7 z5A}Y_2eK3%LJAeOd~(;IGXi38jfB7^OF+Q$`h$I#eLNN=RJ$qj+@c%LF_wT_6;8Ns52@AFk=?CvW8iQyLFG3{3pr5Q zr3YkBD(?74eJMN3CRdWEeXhZ0h|n(UYoitnYt@|h%n?zxJHDnuHqDN-)NF3{wH+@_5-Y&N@+se-QkA3C2<Qug79{6BU!|0?_P-S^cy#W6#(*A>h z*^)8~HSF+^s_`B*!4J4rKFS*P@gB-&*a(-euw4o8%wbCXJef!sWRFvdFrSI}@#QFt zRSmc~W_?Hd71K`fNK`orN=s5`wzX@jbV-!6KDIC8#Rw-#3GBm`!1#!bi;K(fISj8N zM%rnR0hzktzYQ>DNPH4jh?j$ksBaxIvZb;VKF8YTSyk3KZD*;s!-Ooiy7N0xNaG9KC%)|Oou(? z9ZSKoMHZlJn6=NZ0!L#7=D%+`p;pQQLHI3$#eR-DtzI|#X!p1WpRF?ob}o@fYt7Q?s%Y0F zjIIMDMHP9YHNZramOu(=$A5x`1-g~UiX(Zdh={k`5NaX9yn8x6{X zeMOOloa#0iJ-m>~Aqvk(T-(;Rw=ULqd?Ka}-028vkbJsxi?Mgm_Vssg{d_46$i%{% zRA{0DL-|Nxy-vlD*^oWa{8L}Az}dm6ijmjD7&#k7F9MSe)Il>W{4#(!XK(Gv;A@}! zHWMYSw;r@4+k*$(0)8*gIR8b=EKp@)zwd@71EJfhzK+c0fCH4yYjf!U+%1IdaFY79 zZ<%B9Fr7mr->G(c_B^y^bbV`~lS10syu&5&^{c#nTTDiaOgq!8Fafw;+qfDUHtzKx zCS(b+p7ikJS#N@7b^uBQX;%R2Cq)km1|fH;dMt^ir$*g_P}nI`Ih%AiNQ+XdQ{i>3 zQdwH{CG0)2)u$eE;GEn&#@;V-;yf6cB5iB~q*eZav%&2qR|SP5>+?g9y_^|?#3qK! 
zmF&K%{|!y;fU3!4NE0}~ijtc!d1V`7Mb|0BY+npI$L{IoT1|?b9@fQo(Mj4G0UX=- zy@|xBe-uDECodrNbef(vCF8L!CuG!Po2uZhe%tN^h~D`+lx5RS4TW6?eG z_U*prE6YZ(JMPf*1Cccz4!3X@&^`=ekNU8_`+KF+8>1o~-+0SjD17>W?1wK#N27(&Mvp!n@QNGazlKj*UUr z97$5;gJkRn)p({|eZ37w_p?k6=M~k6RhyE?0R@)h^Juh1zB)Cnqo1Tam zK80D_w+w5p7JUbLHY48$!rA9oK3dQh>QtFqSS$l2o)ql5o|JEE_(nSflD!0-`zE$Z za0!VsnS_x@ib5c>VX;`#IUL9mNt(h?@^iBxK^es)WG622>D@MzQDR}Uj%J0;Nb|B+ z`!&5PX0*zOwIzzV5u$oyXB024C{UC#loywn8YuM@8Hi_u%mBT!k|`P_ZB*C}z#mxd z6u3u@r#`r5dkzQPf5DKZX2HH!^{U&NO5(Pl2y}x~eQYL?0^GyMTAHt7y&)`cx9z!K zOp*FEz@u$6YRWg?dg9SR4hrUe$MtgQblmuBwqy%+mC+aj_FJB|yoF z(=OOo8=QnRQ)FwTHH>f<#3ia2NRW>Me{|^?$d>o+_4a~CH0`TS9ixde_)ezZVB8=| zdqPf54q+vq`75C&norVQFl*9Qh;9M%#;@5L6K`W>Mb7Xt@1F1rpkJ_Ad`JcR)mDSe z=kkqyZfM86LlmN7NZ2mT{OL_tC50fky)Ru@52HTS6xHlN;$Yj zR6ST_J?P-A#jq)XFVePf;2P-%g~pjJT~a%8d?($-@L~ko#eXh64dW7L=0M4^WZ0R! zw7NztIXm&A5W_FaS(>6z(rg7d9Ly=S+kW75e?J>{P44gGx>qo`+3geQu#{{ zV@SA8AXNdrupn5$3(}b=p?7H0L%Ujmh$%{;nrtI7hRf-M_ejNxGuu;lv$2gSpMZH^TEhdfCIpGTyTG1`2{xvAuY1M^s+ipGUTXL zq7iSlU(i7+#q-I)H0^KMsw3etKaSXfNO|lNM4+4P>T1EhvPhCM-EHvWa0F6jyM;VW zz}`ZKC=~%Mk`#%Ul6*zUTw7SE%m4NjX`&!k9xokurBV+oI}pJ}7^zb~1c+7Cb2Fj8 zoQ@DAKX=%gn*~&;*b^2CV4o^~%JkQN`^(~h@z^OPpf!nZD=APflxDnfQm%grrPq(H zYkBAX^1DS3!dudM5;^a2K?*F}I!Z`5HU|HQ+3QH(-%8H6Fu2rg(S>Fz1Nl)bP)A$Wizct0P&@l$+wNR@1ahyZ#&E9o$w^8^} zT$47#byp@;=1S$C(b2)GEAG2-^1aEr27cBNyu2)4;0V2b`zx@!jaE4r;T`^{W zLkBteaNVwNPMMs_>Jtlt0t3;_(W;<~8fmLt2J*3)$w1GiPZ^i5FobXYJWD&X-wU&n zp0rBbAJ{u;Tzpm9%-?bU4`2Ag0Q1cYqtA>V7zRo6EEyb`^14WHkRla0macH=es||* zaHDweeAPymUC$4j*SmSa>tOgbs?MhEHS-UN!Y24Vtq)RDHtn_ue_pso_=2KWd&dtrh%f2CWLZJynI>u}+9`CRzxldst;JRP0RI5pb%@G;Cv z*dJ}%pu8;JIH)(x;q+nN%`!mxTva*h;7&VRyIJ|knd+K`M_ahckcGYH_#{1U_MZ;H z8aj=iw;hgpvKvhms$w?Wt4K#!#aW2M;)HL@jo(pl-p;XpL4j;z(f--|!$~Yjb+B*1 zP8i9IJ9V%Pg41m*?4BmSr>pdZV%Q+-3hm}dZ6G`RkZpx`RV~4he>_Xq{#JWS8U7VJ z=hIEQ3s(g<&n!N@bmA2HY7}*O(m?U;wag*Or+=tawRgydEZYK$J05YG%wB>a3!*N= zRdYcbLE%cRjwQ|N7<;oCq6xXAN3&Kxx8I0qi9Bo;aJIirX)Tv%%`tfWDGo!yb0Bp) zK>F@MT}xD^`;M!@emNTD;1FWqp3qwa5B_55bve>t4ZsoLr2H$D2okW+mT1#x4) z8fq9t9&=Y-DtZvGspY;RYhAQap`Q7uI3qJBQ;AF#Q$w0F+*7H&d9b_W@ABc<%8cXMUogB9><@Z zVP6+AQw4?l3v*?+5aaSKhM%1`>Pa?|OiS3D)|t5bw5zsXy>(u-!{jan z>&824)4*MqN_5=9g8naS&kF&1uAjy(|i8q=xki0lPibV1|a_QE;j1DKExV7h|w5pf`T_KdJQ+eJ;ARrbis zI?GTv;Tfg}G1r8hE*s`(gt61g>)d%EQnkC)n;U)3Q#V^Xr7io@>F-rXpX+WjZILlV zTvxe#xfRkUy_Ju?NB$DP6&R6EJn`S3C=Y7qou@m*qs8XFl@T|_BPvfWLzrHkcvJ3H z)+g=uJHm{+tCI88&5!oj6?f4X8FxoaCAZS*y2Y|;iUe~$`NM(~Bwc;!r!RW=9#yWV zK(6fuwmedeBwtRFjci%P*%}g=G`PeTsQ!|iQxTB+5uI8{Tz#IYS;ce2{OzMea=ZQM zYt*HckCoZv;9)JtP5SrQJzUNjK*s{BFku~YoU^~YJgtiC)ni9dW> z&{7Ax;!}%Iyt8Y^da_p(!wj{e*-Z%i7dsn8ueoHODjwbgDQwLNGpZ%zOdrA0LlT1G4kUj#3|-$WIulN>m@*XJ&umH%iv z)UyP;_bYqPr+;p!`V5k$0#vST$pOn)-0ne7^Cdw!a=j&@vr%hJ!@dP>OtV4JY*qvF z)tJ4TnnBNUn?rpsAp7rG<}8(86X#eswg%)iFvGL5HGe--QL64v>uqG}F#N|VdmGTR z7_h;2ngv%+e*p66Wj+2hfjbon^u#Y;+y$p1m{LIU0fxvD?Ck_q0Uz@S5%oEZKm~?P z-GO0hI+v3GW1>vFc=uvOFW(H1RGi2xAyB0%0_3<1MmZS@ez~-}0^!;|ldnv_g75OG zCn5<&jj_!SE+a-gk2toRCffpmgZXAY0F4Q|KiZ!nb48)3x&mNFV|l4oyx<$TelfdU zw{In;T{5hUjKP5A6M{ZV)O&4Z<>E^c*qCaebb9{eBiDTL)3g`|&g@>{f;z4B;t~(P@v2VGSWR<6Ht%JPW;u6(u#+3dR|aZ4 z9U#m|Cfi*fyzjegF`xtTu-AvhyIx9?TTN0_6m9J~&o!E#QXz zqMuy-aCXAONhO5c%ekGA*Gs#knRIfWKmr?mSId;_Wa!gK9zDgh5!>tQ#T>Z?X8(iXJv&FsCC~KXkXKZCD{N?C>i+FR8bEI`0TC4hg{BWe}lUhl3>_ObJ{)xWn(PK z*>oCj#~04#18lMIz4k#yAfF1On4~VY?HfFEqPYc=o&dKFmlAvzd$AuojI{%V7G^l0 zt_{yiOZb!v-@VBKrlioO8_3+azaqnu#c`;h9>XAX6yJ}z^(R*BnvJd9jPq0=v`PUPl$bR%wgxdSxl*t z?qsNFEx{7y)9~a}7RF6Yk_9=B*H^jqwg);?(W~oMTT*VgzP)Q(d*3KR69Yc_)S!B9 z_;R<6?b4-hUPCr_%+=L{ulLA#ZIT!lZC}O>%Pv77WxczZd(*2zIXT81%q)&_ZeA@m 
ze^7fz?Uf<7>w*lp0U|S6mY48Kh|kR%IeNjCTy5`}oMbQ+{}4k70$EY_M@2?$`gYF0 zz7jm>^%!ponP3#QIuE ztyX?&+&eAr^uv)2bi!;2qDvc(B<@%mS2j3-ZswcKN9}UQ7E0mll&L5X@zys zHKp<6QW#TtU~20(1~~SmKCY&0CZ}lAJpBfBJhb1a4($SCoN$RZMg_a-@W%SAtcUY8 z3fDRFdEiErK>~4`3wK&W&Z~*!(&HX3^Q45Nl*OUg>DZs`@twHE>e`(55Bu7(pOtpv z?NVG*T28XaFe0)9>BVa6k@JcEwa^YALbIS<+Sqj>9QRRh7uOJb(I{6(v|^++&V9bM z#ZcT17XTiI%{QR?78it8Is_~Lz-23h%MHXz*D$=^vzP11<@Y@re=qUs9Zd^5c?&_m zl=#%=197b{j-7N|Y+2h;>sT~dJ=xN6yyY#*M|Kmp5>bY7b#M^rcHeylcNYyO8XO0$ zcj4xl`Xaj8CJqxAYztMlI;~-thJCFqhzw4Q%+!e7-b8U}#kDJEFNZl8V{2?mEE%D^ z8rm#`3-QuC8;?pDk&O-dd>#?ZcNP4PV#TLV_tFhD*n{7#yrLuSlRoO1-W2BacW zGE+@C3DYZE&@{woN1mhi!r|Xq00zNJTC45?JMB7yf|k|yxtbyCX=m_JodSvd=bu0C z%3ZgkXnCZsj;W|fyt`kx0U7$|c%ihdoB#DI>tRx7G$BL&z!4pdO?iCpr`T!%!l2=% zvikRKX~u%$^7PO!knwMG5sqPG=lKcd(0e*b-?cHT-Y~~!=O!j^2?)K!4K*#zL32RNL_E#Tr`gQXKpmeiX7tjP+8TK5kd9I)MIO5PoLW# z%-r+%y%2ZCCG2Tfd2U+5Xc7Za9J4@3bc@{7hHKmJ`ZoC+69!|C=U=~M_}D4av3e$j z=6ZniCkf)pwKv)3XY^(|gT=HolEjNz4R#uivyK6|DE@346#d+a3vlI539AZ$(Sf)o zzEfxo*VF-=Mq(&HOfVMMWQsi$^|WDXO)Z=;WWciEPdG{<95<|^%)HvR;ucCkwsEjm z!gY*-$!fMOhBJtoE%sOJdia}Ms<(nMYbo7(bQ|F#xhR`#*7&Zn$DU6Qn6uO(E+&1- z=K`_WMu1K6`B2|1YW4g0eP9|ngCUiDbXC@%a8)*xw$7GZLPIBQ+m!IF^WOelMCjxUPIN#4J(~8aW7b&cdxat zb+)bn*P3K4x!n1AZCV9m)aG+$zQJdD^Ljd(Zdo;kiDRqo#NGOBSbH z86digygX%Y(M)=xc5%gtG%TPhtA?=8@}D_>upFYW;0#P`so@!m!5)obJ)YDrb7Gws zaKm^q`3X`*5qtZSM9+6epQDQfQ%NW)gDggyv_F(Bd=8KflrpLYM;xATRYmmt&$~eupCj5PizeBCTf-U~kb5)py1V?=55Ur~MgNlMU$w|WaxaETp=EgvmoUme~%iE}A zeYeG$4QzxY4_!j>b8M60HJ%^%#YOw=j{B}!w?@oEiS6!5+-;-V1+&ENI)CRo$%usE zbOt{aK`sv>R8=^A0hy=H*_POwT+onE94^(nC8w6J)6j0sJIpsB71BY9-`_jV$lgWc zuxcbnpA>sBt}=l$rJw=B$JhH_I0U-ga~xFgLs2_{1i zNBk=uMygCy>1uUMSp*zAiJ3=*aYA5K!VpdfWk@2D_^cf_D$V%`H{q?n@38O@zMt`Cs@%Hu`xz<>sV+*C)Etdsp%;3$K;FUG?_zGRV=q`*Jb)W>ezP z;n4%R{$h(qT$cl7(z1MQX5*mCpF>jebGB0kD%W@VoueS%v?EG$6_h&4d^q6OZWX79 zJ)-B_1~o-EwB#qYU~H4A=h|()FwMaE^;E?eB^yPS>9qU6EDHnShx;nnlr_<5=`7sP z^W&7eI!~|Z@(Z7&$XY>+WkA!FSiLxZqX}q3GGXR=Eh#5#*8l_P7sXyu@+0VmEdJV@ zCdXJFU`GT~sJf@W&pd=qOrR>Q7e~AkW&G=bP!x}9HKH#SYm zsX5#03naT5`TZaoc#Hcb(u8jO3p!RmMMR^FxuLc3KEI6fj|K^N^pAhM?ht1CQR<`# zFJXuRyasjp+zXB|D|iiN+xl-b#9@W#$HmpqWPkZ8Y?dTkd8nv(mOMjh6bhleYE-GmVWrm?6Rc;7HB%q?aYVgb@1Of7Oo0&2jvm|B z%J9E@UQ-Ob78Aj&Gj;HJSP=|O_}&)eO31Xiuh6YFXi9@`hrtN9;Og0&Ehx#fmXG3EGh}#)?%6mdcxm@tI;4dI>O_-IEMMeSCFMxmr zfnJs8b8t%#ijsn&840G4XELtg!9@Er$aTUyyBjoukYzvpXwCTWUM@wJeSI^3O;@3w zJ;{UN5mgV7i|t*PaG~7;y|?8k%9AGp z3kn|DCS|Q6dSsdfe0L#ekKvtF8=u>M@33g>y8yU`BHY5(!*4+3{4B}XY1}~N@YfJQ zzQzyR+Bf+nxbrl(S-GGDbJzvd3keySA{1_=w;GQj!+XzZ#WjJoul8Af)8cUG_XWSp0#l)0`Tes{ z#_^k=?`*etw0`OBr3LJO9~t)uqK!|%RTNKT2uUyCK1uz67Dpzkp5>z51L*@Y)q zfrA#sgKvqrXc}~{ut>4%MWqpFE4Fz(+ z&L)A9Qrvz-0MC*o-6=w;djb!#Vu~>HkNP{%M$j&s0HS5(Qw!^op&zIgx_>hqun-Xw zXWjksG1D_NlJ*eiqRs~UnUu&Zp4u;uWUvgCD+4Y5)znR3AD|Yg$PitJn|WKTkD2~s z_v}3%S>qz!0DR)J6sBCQWd%*-yKh7|HoaZj3QCU z-jwV;8)g}yLN+1$a0uB#wnVaLXqw^J9Lgry*%TF72ZyZZbKSq+^Ll>W_xIM2A0d868~L@oK9maqf-q<8OL z&wUT&Qjs%LcQOA_y)!h2=_!|#PO&RTU53W2z8obqjv;p){zqdHJnfWAjK@_AV*F1r zyT-$M1!*@=?vu2pHjt0}5g-g~unc$e*3zWo@*p(YCToQ4h;X-J9tK~Otakfh!8Q-U z;8Cz7PsoDW|19FLM^sk^uhzb{Q{hq;6b%s64@?pwYb_+DFxaPxua>WUu`eGJf9M@_JnhdeO6=|VAa(#mg~&yazJIsa>?o+IYz~;#R?h8yNDy`~ zH8Tsam5KG~^IvY_)3S1P=$QR*qu6=MgsMt|PBU3CyX)%AzP9geWgVht-~wftJA)6p{Tgno-Y za}5K>CM7uGC5q_t)unxZzaMqZ2K03VTA(PGslR}QROpR_k&#i?g)@`AQ85Lo3qL6$ z%t;+P?svFpz7}FUsaJ4Q9EO|Y0BwlHLRfUCVlVFef3#T18#r7|Byb`*aNNa_;=zlO zeBaq*;Sq&=BOrpO&#cqJm@h(gn|gv?H=iN2bdo7 z{S&^cdwfun(cE@97Fdo_0QYe6O%}t#rB~Xy#}Pf>om#MQm6$Z!z5DaSz5E%}A{lNU z@5+_y5^V8})5UeabX9iNd7bC|6UTDbDw{mHk<2CUdN=LpwG)IyzVoa5-Dz`pUQ%L`Md 
zCNpopu65!NV(+MWwF(ShoSgTA0g$f3y(JXP(+`;mHebaTFBn#&RrSE6vIa2NW9=6& z7wn$2v5nKQI_=0@ZSS6{at-B8#zeH}(55pP1n+{HFBl%Np^guiRt}G>)Q@$yc_>fe zoXN>Tw&7vv%ykB&!>yOJ7t_q9zXmE{U^eJbr!8JxVZgu1(qD+6DI|i#1Uvm2+P)r= za<_I-hApQkj`v>(YgKiozBc2PU{M;*S{56LW!0PFuE0t3VtVx zYS$ywJWdB0N;0no{xBD&p?IOCOKvN-{&-U|&zt8v)@LP-77PVeQ zjL3zIi}xhy%#7da)~>E0intfTZrhv9bz0PmxoHz;z?n;5Km=_r|Vxt zl}F^=#OTh7M8~)X*Kj(L#~Ri>j5eTbbd!oed2|hz+`aTCG4Vtcrho2empcN#k^26q`HZ#8S2KJc2Ix?#We9^v4L@S@Xz3=&Jhn}1E_`uQ`gdQYDccTYSj_F}gGbpJ+c2YbLf>?%T zInI#>`I9A@yy)2$m*$Z8Aft5G!vi_0Z#@rgelgVPfYC(38hyXw8RbqN85iLbbx^_*x=w8p|oSKujEmP2}F7WJOGTa^=()#aLnX6zUXrEgOHV?--vhKsf(a?Vce z3@7+-(`O@7xrBWc!Ppq3{|4$@YNJ!?RGl7tGWl}#pbMwIrYd9rLjw#Vt!+2XE^r~n za3)(FK78D*0nQq_uX}gW-qhwlHy^jNI7+90=#e0u>gF=)_UQTQ`tgf zByIv}kYPai^r9YrZGU#N$d7wD#^xu<6Hu{7`4}(!`ITfL&{PIlNBi+x6E3oF-C`uH zVU*zz4Sc~fPL)zp$o=$brNXgr9t0 z&HIm6G`)(Dod~HYE)l8W1+yIV!%VNN#6q&p$9&Rsb^>xlx-t{0e(GEx5jRjBzR*NO zGO1wvPX`q^;6nnsL%d^|m1Pu{q$F*n^OShbgbtGBNIRTMg&>o#;?BN&)y?*# z7~VEjBDggE@eWxCN;u8XGY2M_c)A>>Ut2dTAReDP&0(P(9tP157^QC#F(hBUN`?m? zle92@xiQgF%w^~^n61+|v%D6gPxCK+x}&M5|7$KRU1$r9q7$w!4ve~Q)AL}__1AB) zw8Iqw*xuH$xJxI#c03~^JbtqAVo_}nBCvtA*UXOoEG3EvSx*@Q!@mGpaykQnSBU@l z=~JRk60ETyU|uK)J3d4-LfksUYc$;_!^0T^&%Sp_&=k_Xn*jwDf?pt3u|Z-O5bNC0 zVE12lmAYE(E*~7K&C=35>{Zai${mcnwx&d#gPW~K`03Bbf6iqH%u_}MK1P!ls;poN z->q7Diz_!|_Jk}9VC?$fza%(DY6j{L&&@$S_PWWoxLhcyY=G|Rkg`~-+AS@HIf^2< zFH3HBvrY3Mc#jpwAjLo5lMOc_Eb#io^JQhS=JBW2*vwAxYF;bE14b*4x=p)!)sizvYk-vE9f$-=(aM+hy(!^H5^1h zdsq&HUwZCs3&|hup^*x-`H}G|oEg}1c|aBcTg>C;$KbXAKBUciGph-- z7PddDU|1B{2Pd1DwA=T9_qKA7TICYY)Y{wI%kUfm^((0y zJb~~!*qkSQFy6xyV4wI1AHTXWM3y%qS!2MB;Jg8_nfdiA8XVNwU;$2&c7H9E4~sW+ zddbM|Z|EZBZgDPMK+Zr?V~iVr?Uao(#9wg|6Jd_IpIT-v-(K2x+=?To4KM;9sG%V z23KLoOp7WIWE$t8|3&PPu?sdiOCXv?pdupfBglwI-w0dW+XDCo4B^)7*kpxz5V!F_ z0{#fiosLJR=<7c~0-w6Pr}8Iok53MCzubqBE_D%vOl@sH0CY?GzJt(E2YX&4RX7Ru z@f}W2@UuX|`UwcMW!_jErpUEY^AGR1Oq3r!oG zbQbF1aGPHpA;pJnDlUB4@>F*St`Wll>X@yb{Fy6@HJePqezMw^esgh<%)@rkTTQKn(z;9z%!>wJvq)g8|7=^m3!K#gI`MJzu? 
zwIz!bg#QM;N?H3^EvZS!q%WSsXxqv29poN>isPICA9Sw+g z4(MXiPD96=u>65%sDW748E-hlN^A;5z+iqFB|6g+z63~U=qat1sX1vUKR*wMOfnSN zS^D|Y?)9AArTVGz`+c&7gDMaGj6!xt%WS$yD?kj!R8L{^f3*M#mA+~EAXV_1r}gtQ zM;-0|Uea<0qw<#9#LVD!2g}h;+R^p9;2InnUa*nUG=U)hro`@uJI0WJN-)+-!nSyZ z884`(t(h$RE#wtb?jQ<6C;nKdPG3R}a$V0%rFGHg&0^E$)0c=I*Gdvde-aR2n&79j zoN=nsN9@e)+?h#X68X*$5}^L56tWL$*E=rGeEAqAq+ELliWp^#$V3$9hm)L5keaG) z3_3d{5E1pgGt{G!QW@;D53pqowdmO)7sT7V4^RAZky;x;$MR&t$d*Ny|` zczn4P+ctCV(j*oXuh{*`S}&3Km{8u&l2zSq_he(;*W}{k*5DKne>Dwa)MND?!kej^ z^PpqEMPKMjU_P32+Xh-Eg7f z&siJNia_46x4IpCruEp3=%BRtRoUkkJGZ6Gq4Yo%IR7-rNic+Jp)(F5dc|wB^;7F| z2|rdNU=Ye!@KU!N&bCxBs&hRCqqNY)uI+kt-mXDA$PHD5J05grw2+d~ZIo&|ukV~9 zY76kuJ=_il3rnN=<(q{#kq`ahcx{Z&&S#O$y%qmMvYLHh?jA$tNzy~QHbE#ndZC$q zKA|G9R|Gy$=7(ELz?1b)&we1|y3GjWC4?24|86;0tFEOOQSeIy)PfQU!ySIN^xkox zXiaH3UC@1(Lj^D_a{NtE?& zqJXKBJjHsMI*c`4<%nwY^YPM5h}};TwS8*v4g*iT;#NA^rU1elCF78?yeaJaO+oWD z61T!|#vrrZ7M6`rp(5wxfRg&haH9S5;0Fz4be#CjyjF6E*$8hSO2iu59peN;WeslYQz|Wn)LqekC z$CP86EhW%KZfK0k9o~jL99b{S0oZU0{;_!)d%u#(uc`}{Q7Gq6Sx23w>K%i<0h6De zo3ng9f5O{u=Ie8_bILui#_i5Weh0`U3~;&-u|QkP?PCepbklmCUIidA7PeA<*e2;_ zlxAp~IYR|jm3)ZIqS^TrU<*Rtu)l8Q9qcdY4FFqsD5I83qniZGykgoeQ89Bgs6?p5 zYU8u7?3rOQhHv_1Sg5*aB$c%cXqB9^+!K&^on(LWG{Hd4jF-V?zIdVK0KR*f7ssnq$YU*;{UEup?0jJ67201tw|&5i+pRhPWOItz zIlsiLy>B+WlQax-5IB@-7S`7DPPd?ETNg8Ms2e)~3fMookDB4C3)r++N$2!Pu1jZ#L#Po*dWn7jAqMeo3m{D8Gj31e80K;q{^6C0GW)JH9b+<`olYgqc=v zh@!J_oz(0Dci-O+gxMK1iXKTCx^$c$G*o!!FG-D$^85Yx_VGp3fTf!px)DeA(%qa$ zy1@w5FXGpfch7TsHQSa#!pgYZE{}4rtG!2n!2u0tA-ivyDVJW6!6W?chS0BXJ5xR) zt$YU7)T|8&$BAUIAorjC@xzvjDyc|L8Y{=9qCB>CK7n#?UE_*{+8mXs#GA6Ai!;k_ z+28nu2mB%)$wm3Yx)EKQ_bE?y&YNS3CWn3eUGKLw>XNiJ>swM&6v-yht_i6zDtzi5 z%B(&2^>27BzN?J-G{=EboM9R)=UF}2$D6V%(;4rNIXBQ5boNq%vQn7xYnQLaOe8;Yx=?e&JZPd!%{GG@-JdNI*cQFGfrK*221{ z^1CvH#00T~im3CkKjkB?V|1UOyD;XmWGgn4-qdq%VaC=d<>C2Dt=U2QtZDjm< z-uuW$yq-u42v^4#PBlARyM?#Dg%^FxplzGtTG39IXrEB;38_G*>u1S(K zw?9Exu8rir(O zn}%XIQr4f-C#*k`I<(1l%Me?%FXJ_U`K0OnOJFmvMV}wkNEpwMYk;DySIR?Wiqy+{ z>sVq|zrgwD(te|}>$ZM@h}Jut$#Qaw@Wd8<39P|(y3Aw^vNzMby}agrZ`$V@@3Gk$ zZ+{3fUElupOnY-rrD2mg^D7ghfgPp0qHcx>IGh z5=)-*Bgr#bcz#yd-?+h#@XD3IzB9{|QsngVC(G@IL`_qq)&$*uQa9`Vu@0EUVkhn{ zyU5aJYrN`M&ms1(S8Jc!F}eFfRa@j}H~y@XbUj3(QC;d<)TfEKQ!jw4dm1wQg1X@J zJ>VX5tM5R^S9I4e+`%r74yHKV!g`{Xi5)e{9O-7=L1jPFQ)gkqYs%D}Dn@31K14>r zSul^xKSKsjAOA2UZXs&JHx9B~o|}hIC#UIWF?@S1dwmmRfD*b3E%Ic!A0YqLc@igO zUJTV)xY3{xY22HFRB3bYRC zE2Q}K7~sf~=F2TEce@uXBQklN?6P&9{F$q_ILxHiLttU@j~M|yWfn1RSXMA zmPhtAEp9J;$xsL^iCsr-Vi8hre4(*5=jD4Lafe^qH-d}?&3NPTTuN%mGlmz!uyV6< zU5iQ&oL#SSz1|&t`Z_$*(f~v7C{vAh!hmi0%H<(-<7Sd0=WetdK6|5=dyjI|mu}Ru zR&Ra2+n&qM)6~U5@2$SM{5ABM5ba-l>YZ+|y^9<~6&FPHJ{Oo0xz^Zy&r6#NOt#I?^aYVulHxXKwab+$e9+qp*_8$qo2 z96&fXwI7m{s?|gc8W{9CzLintNl$uhY#lE__wc%@IsVntkGE^*+^H6AIO*uC8_ZiQiw%Zy)Vm)Y&lPx}78a{FPRcNr=0StL~EUW0_Hv z2h1hRL%`{cM-|(kh*kvTTYH_U>+j9Zp5M1~!#?FT{3PNC!4#;kcIFm@msNT)9VA*Hp;F14sFGBj02q_5>4Fy zOpr9$A}wxQeYB~aw8KSyVLoxZ&1t^8z{k3SabCe@7jr%U@^Bp4hRU3$Wlq)kY zVXni!T(()<&7iuvyR|H4Y0lesc3C}~ijGYAnM$ny-3O-8x{mft;MP-kJnHLd-L+-zp3W~-2~k^bt! 
z5tED3zxssO%*Y4?svldT6+|*+K8WVF{;Il}kG6PBMf>MVoK{EBT3gXUn#UYwz1V%H zMkPMEaaxK;*n-E92t*>k5l(htJ~Oh*_PWeh$(Ma>a8h!NXDWw@2xA~>+x*&xXc^ef zEpqbNF9LhyqxZ&QQHG9R0c--CE44UYP{XNc3`lL_;!Ux47~@L z{s4%>E`iu@Y*UcP0U`>8g6a9TePi#EuN#|^`p>#H>#MOJ(PdJ^U85Djv4|Z_=^mfNLvuOx$ zCs7FS(OiDnQhY$}RImDN+@PLFw>Mos1vFEi1ys^Dvv}~rpD&CMUYKNK|10F0!V5dB zEj=xx8B4load2|*Q`QL0s%qlaA(8j}BNfcfp`Sj{o5xzI8`7M`X@-fXz$FOHd%U`; z8VKSqMf+>-{|o1VFKtz17%CEy+Pl3$Y7^g?w0lWV#o35VqT4dVY@R@~4g1pMhST@9 z!p`x^_V}7)pT-M4YK_T|s1ftz;Xea{SqQX%5n0syuc_U)yKhKxW%qIR+;Yl!Ig>FO zt}%cXY{dmENlsQp{%W;r8vM3)$o(TEl+!;#{5B~bL0nG_zyuTln5YHFJOpGjQ%|_& zN;$2bE4f*Ws4E;^wyv|8JNn8?u($zSHnT}Jd&C>a`3=3{uoPsACI92P{f9dN76aHZ zAx*nHDk_L4kwkVsfGIvrf2?fN=HUg{+f}A3-L2JFj!MUKTJ;|BLopmm+ z9`!K^MWT8Wc;nT;CAbcYeF_e#pa9n?HFUfHuKS^8q9)g`?jfzbk0c@jgs(ylAjeQ8 z;DcENyqN?^S_P?VhHWc=CB;^q)`EBIKv%)Fo~{5OEgkHwA1WNj)cr<*6v%;y?)g0s zfDwV*jYm6&2r-?}7W?2QJoF*g4l3GD#>|0qE*#39=tox;7wud20q4I7;#AZxmIfdP z*x)+nHy}1HgtN%IepX-Sxc(RHsZC_p0+m6MG~lYvkefKDxUA4USkr-GZveYi@&dS& zf}cD&ZB+B%6i7ag>-L~_4&XEtvVb@>Cvb+U`@(>(0Tf1Lo&2nsa1iH|;jV z567$`e}##2l6p{3P;Bq+vIA6YO(>?^t9!N5ZjcmkCl$c%T)>V9?I7pL&hZI0CUVGkaajJx9Yxtq_N(4 z59Q=x5XwLVJb{Ey88IFL`Ya%LS-;qf^>YFKIQpGP-VHclqFQX25eb^2G-thgHfm+6 zp~04{zaNFez(*pb(Gb;eFHN7P)iU} z^+60E!Wbb znKwkG$uAU@yyTo~on7EHI_x|WaQmWOj^Fu)x|!J!2RV-hKQPT_AbvYK0KXdN zTJ6iVxE>>1p)szum&lYp(q_29Tb|yDZboPI8~b>&PN?>{oc9)y#GPHRUBU8sCyGh# zi*#qN(u1S#?KU+r*z^;E5ui644JuK+=qw4d@g=$}g92t$MOB(&H94 zkbtzCixcbtN~HYTkZt!8WZ-YZJ$(fd>fY|p?XsPu(nqNf;H==ATpA~&rpE|@S*`$ zj;S`bB_9=gLUKUq-%bvFI625=#D3~quui|k`c7saHZ3e`ikvc0GgWDIO`V=iXmL=i z29L3Q^;seXrBC74FE?804)fmjm9#k&``mAn+MIquh&0t!pWmc9*jUD+o@Jc3%JBVt zrfgKD8xoOuA24I&$UqvD^h;(zp>aFxU{pEIFe7Wv`o#|74;qjOkyENt!XTl3-#VVF z^jA(*RJcQh5Y$t{Rtg7VpX$_WMk$WhtRJQhd7kJCudlLhJQY( zIJidXdpxM1rCbi12s!Ypo~(K5K&5?cvK*n-Q8&F zCs7-PI(ZT8$xgh~%L$#-9kQY8$=Eue`STg`EzR$qRrW1_+1(+mL-zX?HvN~F)A-1X z#2-WNQf(*iU~nq{ogahie(m#%W$Qpqk1;=f4Ib;g=C2%C{wH}nd}bzb!4 zp$1I8G3ar-3-aPv%f+#n-suK z=8;pi;7yLBsEhRW>qWnRd}sZ|yATOntdg+aiex^Wf+n*b$i-ZfCPOKgWgaX+yewg{ z-QWc@cR++C685^ZqrLJ^eTX7hJca{^C5ex!&@8u5Q!@nA(fBug@UB7O+ofbzYsdv+ zho6Xh`BAG56MOYbxSdW}%cXbW98V5<{KJiBL>Jsu;subDaC0vpvCW%zCZkQ@$f@#5$Sh?)dVT>Y6Jz~NU5XtlVXFCDM1Ngdz?~{JooW{oE3pykw3NAI{e|s?`9>#=hr}1!i z%=wWI5K^Mtkoisi+Rx1YU55Yrvn+QOv4=4U+v!&1_dN;R9#V##Kz?lEcoq3C_XToS zTqc?Y$&2@RZ{R}+H-s9Gi9Q?E{XZoKf1QGX3q;g2g%|(2i@*@16`(0VuoeLSR5ppj z{og{f!IyCgqU5Q<9B1mm{*7UN5B~#6*k#Ctu=EjyI*%BA?tc;V1m0)OBF5iuda0~9 zJxJ100jP7vPgNjfFc9Cxr1#bRAGTchBc4*oq1<|-T#4li%-t$7I$}HoBr%tbi1Dor znj!IJ7iA~?1)A0d0{-RJWyvRQi9vbbkL+az%ZI*bKibeLBqm>e@qccUKuPY?5Y5oG zz-DC98+Bq|g}}*am0|B#5Wi{2jS4>PS=6_Vvq)V6SZ<WydzlRe9SC@G4?9pCAW=|IkoB4+b6y#sy!|(l0gi>!&Lk}_Q546Y%yv_( z4lEE8Xf6?-)p@$-+<3{*T=woRecJ3LWp^VYlXOU9dsEgQ$M!^@7#`pa6Io|7n52#` zP`96&getG5clXU-7|6@q8YiQTZ(YIopO4hx6zA) zM#)JscXtmjM7tQlTnRl9j?6t8|5pp}Elzd^w|Mbwsa5AW=y#A zs5ac*G3qSGLu#m=4)2~09PYs$HddqPFf^}7hlk0g*M6EFx!JE$zX!IRk!tsgK*>p_b1lj#V)1F-b=GGE1(eG*nmZP%k#3%qSFhVo$i!x5e<>!z zY_fb}4VIDlD8tBagfV;Jtq%nayxXAL?e0%3K?errLGsGHO!}W}E_fX!F9gz3E;niN ztM*M>5}C-;Wjj;`2z{0_q?9$;5jnlb_qmq#*8q&|>_d*7K&;L*9_%NH*(dH~H(0ik z-4uR~D!bj?bAGkCn&eJ4L%_!#^im|i(#qXJe_@s%?rBLkIh?Dw>X2T~wafD?;_ zfRYl<3BQG3;;8kFU^FQOk#Wt@m5_ey5?hG(q^#FzNaTX54ui#aP)ZCZNHa(TyA}j# z>UN~;p`9VyXRn2XNP$cl-s4SJ^4^-19XBYu*91iW&8*8uYq!4GBVk%jldWpJ?7eb!!Qe#`qRLhAEqn06WM;>f( z)Ika>)Y~8k>}FFS0~4b9vVH}gbt`J`kaG7L&BT>YWoCQzGljP?9+?ja#}2%M8d!K23Ep*`La(P?IP9BJUk$ z&tU)KHrKcWXwn+!!kqdG=e8cPpeDYi;y=$t%c^O3M;SQ-P$rf$^#Fyq|DcfK9fdC_ z@~oTYpbe%>UL1m1P$9mGMcn^PcoI@$`5jVYf98e50OQdG3PUUJpM21f)yEh>Wi6p70Pg-%MbqQcr^$=pep 
zcq!&3qF)LU9Q11`h*eqT>R4R8O9AmipoD7yZX@S&IdN=sc)g0J&TT<>hPkuCy{V|k zA|B$sI>+rPdGDkS`OjmLG75+IPx7dB)pK?dyyNr(MHKn5wj)_bpK_>K$b0if;y_gf zi*f_YI1=~<+iW((#w%y&>fw$gYn!`_oT|1DZ#va#dY zo167MQdCDZ)_%>;pL5!O9*t@Mz6i;3sy8iW=1cTTR)Xpdh#_-ij|Pi43eI_|W6YE2 ziX-go<52q?Gf{)jf}MhO1broFp>pFfw$92+Yp-^hU*35-T3ShVwMBt8Z#T2`i=V+w zr$#TRzIgDN;w+9!3|pr^h1UNxCr0ggDV%b-Jc6UOc;0rJ{&ZArX8MPwKB*^MAtG00 zxZGW7Zqa^yv`JXXV)Ia94~?a{-B2BNacV&rTi5Z4Y`f%|GIe(f(V?+#g!P=jOxp;9 zHN#%a$79bvvEda!k%`6!=27$Mf8zUs>Ppnfx_kvBn!#Zv;t&5Z%g+X1eouy{?%rkl z-l4YbWP0!Tt-@bo_b&OJ*rllr?NpJyS+k(Tv9{V=dS&NH)ihOtFP@E`Y3pOpD>{w+ z7kNLMF4+jlrYT(bd^zckHAI>r1|X?W1y2wTnMFUWeAXUi4Kd^_Nv`v=c{g~BUnlG| z;aB_E85ACLjpx@wEFB^-L=05F>OdP}ty>EETQ(5CAvL@0Z?2R46;T97Wxk32XKBdx z0th6^_W4s5AW6+y&gCBJ^N@hL$Et-ek8sSW?x7fMo@tr#DJr>zg@srsgE1d?sXY`X zu3YO295>g6Cn0eMa8zL-HATI!9~MtA&ZI%@AXD3YV9)ZIR*tu&%TVYPYzZrP@}6|q zoh;BYg$0(L*>@w3ZUq#0Pq?*AwtveDwdx#ssmKiD<9EwfiHa;Rv}(*8l%9lJRl+V9 z8XxI-LF7wWb~;1C*#x(9HU%|R=PglNR|hrEZMC!1P}`dbL43Wrg>)p2sjoxhCNAo| zyIPj75lz(KzL>ZpJHK$Dm(gVZIbi0aPuW$-As~t-E^Wd%dF_>W_oxI2v1PdpAPJmIkxQ^D4YW3GF^;}<-lh{eZ18sLW4Xavk6Nm_jto4;;r%@+3Di zW;3GauAClX^ox4MEO&VD@o7z~fdk#sZpA^GWq0ypWwI16&8l1!PPf*#2XYi@LIr!# zj$aeVda!LCG2J@bzDsbu5kKOlP-&$PKsk%IH~;P@V96&@tU+m+REkW`Zpn(B4ARc| zq4D)KifXLD>O-_{`>M4wvq*lqWogqz-6O9(WNgkZcTHy0YN|*!k;D#b#B7Yfuwlos zfh4rY?EGHTmh{@(n&E=C}n#E*Oy&Nt(zvw_ZgU z*LY-`OjMgCc*6c}J1KmxEdY+z&?I3}ysK4b)7Y-p1uFjgXzh{GCaw?}_8+PB`xRap z*z}yt%se9pilT!~=pk|i;hoe|t{ZmU>?^x1G@t;QRQ2?p=~+1vg>jy&HNfmoU$k%H;bo>47g7~Q}yU%s{!Mi>Ee%sT-Z)nG5 zA%w|AC6;b@%|On1fPdWxb2z|Cc>SI~f4hAYCamvWKfM>z7a{pdf|A`Xs=eO_hcwFu zH5pBXl%tfVX`?P&^PR?BE4*p?bg!q$?TQgki2zKi;#0kRA>JFSF(iJEOQ|Q%mcZy5 z{M!|_#6P3DXQcw~(=4MNgv2d3^{(FEsr?-39mFTzx`JC`S}Wp$TMIY-+E zcY;Sk$^KJT;0|k#g~0d@9!xgxbn(L23D=un-rFJ#TU4(mODt8;7ri{O>6G({wdD?$ zkcRf3N{!W~df(c2^%Y{Ah*vZgcY%4)F@Nc)vou zbufVH`mJq7t5FY_Wj&uzE<@_!HPBJN>n&S}x`zmQC%sqX>`gsY8b3rnhXy%}AbPqL zH+(tG!xXx83NAAuUX|0#!7?jDvo`H{ozFEA9zwc7Gaw_YpmNNN`zjlgIRUyhF2aP)9Xf&7lI? z+dY0Fvl-a^FPn=mjo+Mul8o-DB^n&&F+S~=7nW+-=rRklU*$`v>QJk(G)#K>m_GLI zZ<~wbG|OJW^<~akPHWWL*~SmYeb>7#jQbmTP;eJ+&$VtraSom?VoUz^0G3*-c@D?%PRMG`p+8e@Y4vtaJ`Z(&|`kN|GhPpUVhQ2|0IPK zS}rzY!Q(k}zon$@ZG}{Q0jk8>S5kDQFhQ97Ffot(4~XD$4%g{@?8LO&)%AzWY}>Dx znTj&c^Zr=Yj!(^yo?8pGjeG5^V2}9m)JCz)#ZlXk6p>yYQHNS^Y*W8t$t%E)7iW;nd`sBRu0}R-G}F?4gg% zTEZS=^oyIy(wu`;<_4899_C6*5W|Q6J_eoNu$z#)w$mj#b1rb3ZG14#Ohn7{#{P1hI zfKJ}J=)TQsDpGuLMp@m3YgEhktm~Deq*yx_i&8gaw@TA{MQ-d2=8Y*6klSZ?QsxyIiT<_o1KPkjOjz)jN^ zTdDb>T;q>V(pl(iy!uC~3;UTc)|sAr!oBQ6YXi($;=SY^OH8+?j?#|Sdu?58e<0rT zUN{Qnzq-mA!zgFRR)6V(2?i&=)^I>-h=KP*hRCCL`asul@c?5(?xFnbaHC&q{;@pS z2ZWI^HNc2i#8CP~MOX%B`j*6^Y<|+szb<&Pz-0m9L86tbTFJu4pao?bOPE((>2cm? 
ze^YO09&;p4%wC-R8}Imz^BA$G@T}y=D7wf(V+A4l{?T)jzcps$A8y=V9AR7wii!1o zii(Xb%a@P5ymSV2Z-%LcgJI5L&#y7eXZewZDb|}`+PgQwWzS*WKjtRwhl9J%`!V9Y z5(mx`ECWt1?05N9QANqc9ZREV$hr_EI&mU3gnCH9V>#_r&%T|s$MO>KwL8pzJ46*( zZV?D^jeCi+PuRb2Dw7$W_E;^tDKl+SE1ssCAG(t0rJceS@g2Z!5M(n$sLA_$Me|%r z5)WlJ-CNG1r5_YqH&+yn`1#dC;$>RgNWxNAo#sqWn_Ip~HiwPd*;vaSoL)3)!38Fz zR?q~3l~dWT#8`PlyAl7rk8pWeO~`3nVkm*XtDq;Xka|&_8%fMWFCKvBx!ZAs8iSkf zRl!4hr2<{|0f-J~|JpU2n$IkEJ?cF*l0?{`k&5qnhchBJqu=Z6JOwC&{^{A}03MAL za-`$xze@+;wrp2Rc5B(;X;?f{&kxQRi7Q7-jA_*xcB3-g;|I+nSEk#JDRk{_e5hSY ztY7m!)nLE2e0j2)#>8Bj(Qm=*NF(&kH;D>6H%xFSKCB;MD(m@&B{7cmbS`${%TW)C z#zVF5`)Vl+O@6VaQl%3W+RA3Q6Gi|HTur;k2Ehmyp%e;R@9P~sZGHZ1$0n>W$&TGs9#5soF6#MA=du?X6t`<~Z^x%7*;+QNj~BK2__hc=-7j*O~OnDBVP@6w$t@ zH!gFPU009b-r{Nxcc%`!BzgC|H?~iL0pfj3s+Y0%=Fboyq!0z)_HxZiSFBOuDtPea zB^<0wegkNdy<3o(!cbo8cu>?ZWZ9L}$7Nc0eFMqjl25c#BRX>d7W~!}1850x(fDd0 zcoW^C1Wi}OG&bDS9kVecj-ONAg#Xk)0QB`Fi3AWit*&6P2b5ueG1l-?0st5W9IJ|= zYH#Gfbdp#C0_+jxbyMIB0^2Ie?wqeCR)qduZ;mUkQvL-h{U3DqFPswDE&uwM|GD|V zJQ_0lLUN+dlkBiRCHWAnYX92mk(^fH{LD27aw|jp+LZ(gOn+t0D^< zK?(0T?6&8$JvM6pL!>0jaDRwOCe z6P?4vDU86}4p=cGO3!2PW6=?f$NxCuh~bYTNIBtm={({QaDNVOA~+*bNE(K;Is6li z{jXer;-?ZD0M;2{+@{xP0#)H(#&Yo7{x^{L|KZwj#R|dH{Qvz^W637Q1D|Eyn^A{y zb-h|{cbolu#Wnc`>+ZXtDxiaaZt_uu8wImB6mqF%AtkHWeN24dm9fmY^`oE+_*WC2 z8{0x1iM)hbTD{MIDw~;|Mxa1IH0uUjUi1BF9eN^((EKr8Lp+R0@csz3Vyd|a$ge`E z({9-U_-kMk7N;lcZSPkRg@nOs$?MB<1ajdDmoeo--t^u3J zK6WeYA&r2EXe6ZujP2K3$C#<(PzyT;GG1pi0}Nz&pgMAVVxq*h-&*#RkDcRk`y4k2 zC?pXm)w}_O6HyHZcVi(Ll@0PnvWF;~O?omFVnu9vl6EwHeG36UWb@$QG4T0*Y&c{R z8oqm`k^}7Uw_cJlZDD;uN`ZmV!f3)s!!&~CqIKa1c!&ke3FsB3dHKMoYBojqDCh~R z??Yc?3$^mG6br?5Bm|zl5DGyzsYMwS^vT;Wg{m z1AW2Q8{>j^V-*g;Kw@BNEzCzaL52g&w$ia zddX1G_W*}%!J2Q|lcpeK)xoTs_~OQMa=5ZWt$-m;GPITgsNgVFPYA31WiTH`4|p!5 zdwi<%_5d}-7veZ1M}45~aQf&4z1P7Y!kHO1^%t|SpKc7yYD^Y14+@tA0$(07XsG9_ z9F8$sbwSRE%ksDshuTvjP*{l_g25$_PT&Tpuu)*#)dklo1vCoH!;P6@4sC1@u7!ny zWR3k{6Yo>N7Su`kK`tD|dm(1W0}|Q$Ae*Ou64HE-NEOIFZ`<>oZafM}7UXcVhm@6( zUAW~*2r==5U{;K|^62;s;Cb^5zP~rXBGFCtv-9#?L_k0;}v`tU!E)CLvM45M(#jCS=sxkU%q1Y|j33=$(DA z!~Jhoiz*cg6>PwHHpEFdGl?+8?6!m-n((@2u9!m*zj)+Xjg;E6qlLkS0vck)8&Xcpl=C2ecHxdswhU0Xeaj(aXN8Az9i_at-cl~r zChBib_Mq3QC*ONfT~Fje5~YfZK#>HQ-jZ!WIN0DXVK5kF+8L%Vf8BIp?=ooq%v=al zfCG50*XUINNkRZSiATpbz})H0HR8Ij&eXg8I0M}L36QOPKy^ zC2IY|gbl6=u3VC5SxnJ#t+CxX zce>AuT~i!Xsqnk6r>*as%^`<9CdAGCLM{F{sJxWGRT>33YHrY$`UJvxz**X)>QV2< z>!g(?8S`eiFj?0{h|RR#$K0gWD9nr=i0BqWXEE_jtlc9Il5vbz|A4uuxKwHIu+}bB znWKqr!U(TUeoOYF6VY824Y+{Fr#=ny-f}neqqfK4xb2>0&uzjjbkgLb7HqEE$9+z5sIcH@i4VrtS^NEiwfu7%s3>p~hQ z+l=O`!yg55#V-BIM8}TP(Ul_6-$9<_9ef8o??HJ9h1qsbj&+~nzt*R z8wr7hu#@PPysU3Hwek4Isa|F=yIT@xo|D-`VFk2vUc_OS%JvB|@|Y1@CjlX$HZYRF zAOhUbZ1-s8T@+J+=)zTQskMEvTE9?nYyODk$+}j>2i$72p`Z~B39~5Jo2;{;up4-3 zKF`gkoTjz)qs9Lah*Q#er^he=?=pUo)sZYy=!(r0H4tWyc}NW|$t!@lYKZy)LfnNJ zb>>$`kpr@9b50)~jsy}CdQ@>v7a;_xM^u!r$f_sGX9t&Bf|?2FTZOjo>z-Rvy~DjP zdQ12O!6!=q+QPC)r6wizbx7JSFxa*tW%Uzx@$=$W-omAj*lRj2G;Nhr4v8@&*N~pU z%j7EmS;`RuvYQE^1*N*{ibM<$Xp@sNn+1p=moaH>J>5~{W!zSk(y*2FTs33oO!{rr z5=O~Y?E*Lgt91-E=}cyyKjb4_fu5DOukO=UsRTmJ3l}=3sQ$cdF!cvY-9r7({XU?t zLKK;I%blWM-+oQIc9qu#!qj9D7b>OK)`zECAGeGu+<6R=DniZx562H^Snh3%p}69U zJ<#oYbek8QnE|u{A4uw=N09 zKHk6Al6QUfFNq8m*BJPSN542Ae4~Q;hMRbZ8>FSZ#eTknfx6({uK@)-R^|yZ>`lCH z>E3nASUQj`;=~%?b0-e4ChV!RZwhbnuHb)-IXkATu)=C}vRuH2IqP2KuTJ>c73O9Z!?Cc_8Q7^1^hq*oE*y?OLc;Y}q9rt*-3HY5_xr!^ zJ?Eb9``$4aXYAt{NA_NOt>2vUnNQSSE+X%FcKP|z1H;WW?*$S0bxqnnVcv^V0bq`S zsrT+-I2m_H(=fT)hoY8At6k|GtKck={{5jA7>%zDN-na6`?f>WG#2B%*$4_i)(Y-V zJNg#|r?N-^=g_g$oj2LH;KN*XYh8KzwtBZ^APvsAC!Rb;tzwfsT~X&g0%NwrMB|6x 
zW;=Remsa`g-9Da`(CzV`XkFV8mubJMFR%Ury0aW3W66ozEAm!1#Cev-m^+xKh$o+$ zD^=`?MB=$Ed6s{>%QI+S+?EXlE^47P)Lp-UCE)>Qw)1&H$R{}};sYR^gX z$9ORK#7vH&ZBhNnf>c=bs+Lag`J&&$#YvIv0KLQ}s9#<3gcSgNu!`Q`F~=!!r({HIq(r zC36kU6xaHjctZ9X6}=Ay|D-+YZq&jF|3*z@!F$PNa6kvj|2=m84lr2crL=S6NuIr@ z_FOd1WT-#Hv#tEN;)RXNans=O5tL8*a*;Xhz?a$R{o^MdTp2PTQk~%T1E`*Ly=l^gTt03mp(RwY^c3$_#n#}OV*rz(4REl;$s>CZhRQ+Nl zu1xj&L`yH=oSk%oo**2AzJ}DdypA?ZVfiV>Gkj$!(2K#Xf73_We_lP+t-G#O!PhCo zqmQfKQ!2*zFWly7IiyI*h0|S==K4v1_8**XOBHb|8pBNvrFaKX?3_dC5*?sjb<)j?S6$0b>$M^~rly5icUbFo*(wz27f|a2*f3oYRwr}M&DGV2i9CzU zUbr&@`5EkFEjKVi%US57!#y|7?!@WW=1l>dS38gyVW8xZ$B8~7?WuAsvb!N>DaA`< zs4QbL^l(%bKx!U&Z=2|QWpOyE=e*QQ&|3_qewHC3ZIy4{&{zH*ja8==O5dSsf1H=UC_&u8L``+PeC-aSekRPP5aXiPXH?v#C>FSHmHoldh0L@2Epk!OJ0nYR}5d z)GDymrUda}Y9}Vwl76b6IB9j_Cl14;0Z%SZ=*hlA1NlVq^4&i|!czfA_|{}#Gl9Rd zwHs4vV`yN-nsjn=^l@B7g8yrV34)!^?{}`6RE0^#Wl}#4aDLK*cS`L~`MvH`hvp6O zlO~%1XUd~~Jrk-0(~zGB;^$Dyd}g)QuoXy!xL^^-D;=(7&j;5k(OCkQ0+$lun^UeI#;Wnou4Dz7-7`ZK36?C$GW|HMx7Rf-;SJ}AX%lmt(Ios-!bg^ z7h)M%71BhfH1+f1D62sc>xs7@dS1=lTXrkxQ|5FEP{}Flj~?l|MIA!yu`^4j=}Iyy z6)jblV-7{8cQ{X?Q9I<146%)72lH5_@_7NBqlhGbnTH{5@_S-l%cC!P-T7cFG+yv5 zHzGeiL(0?n)xvAzkFPzVe>Wby@A}vyEQ;9YcP5}(Dru%}5w@XYNYa)==ky_XX1o-f zPt)3PiIj*uFUCf(Wz}~U1G07h3Nm~dodS0QFB(zD(R1`V?QI|P+}epxt59OqgRb4p z5FXncA-wq%X2cgCAAP#i%dt?a!~P zU$MNQq;ML9;GWnYkN^$0W1`%L%U!qo!@iy6Slg^$!Ol1ALC6AFIhWj}HUS z;zQQuq7w<2^BVx4 z0`(tC-om9Ig#)NnXoT*+BTD5D5qf3Qf@Xu}DYzdgo7wFN*vLM?ODmev^uNu02z^`mJl0RAOaXp*60o>u~Y)a z)N#=+OXV-e?Y~EaJY|+|A->(f!JW-#b-9_|yR}=K!*`q|j7vI@6ERNU{(%6>GrTiD zoUi`m5qLOdS20$1RV73J0dZtk#VKQhaIL6Ev?L?!H{wOpGERHyn6PL|9`O=6E0$V|C)miMkne`P^432oskB)D z7|$wiSnRm|b9&jn*6>OLzP+_6PBR|;dlB^ZO;U2I^SF4sGrhBSy&fKWQHAAQS7Hf3 zK+2rWBC{xYN93i(xa2>C6B5)*2GLMrx65UeD5#VB<8e5K3g!79H;gb__RQgX{baWG zIk~dNbL&KuU2W(H$a5+Hy=wuH$$*O@Mg>$x6fn@YAdXAWuNc_xWw_3i0uWvb7>|ME zkK-4h=*hQec``p0PRcyJyFP>Xc_SLpcXZzuqElkIek#M=xi)KW%SNUFS0JCTApO6c zAjyJgILT1x@0ZP8sg5mPh?IY4o7gc64A~*NFYn^Q&&gQ(bR$j2a^Y+EAm?{5o`{3c zBKnpF5c$w3Q@d_L9-%0J6XAgHv5eTlBIa1oa-%%_6rXWp zfUg+^`g;_N0Nf^(S7cA>xnMGg0Vacla{toOQY5&nl zGq5UXpLihF0vUiPN9WHkTEsn;3nC@sBEfqJEP{TGRdkH|0qBC*xVzaj1Ll!;1m?^| zK)SpPB(FW&2U4uUG!tkUR{k$EiY#SF9)hFL`}Kc#@C|SVc`-et8AP4~!>Yd%aG5Vi zoSA%lb9cd*@mKBsXZ~f=D@k1wu*K53Tt|KBrFrx$R)XI-(9G1I;naz5mCNBuKVRF2 zz?0MBzT4-a-aLjYnGYol64Zp9VDw`!Gb`?EO5(Wr6rYso1zcWq} z6MwKQ8drQ=1<9MxbzWN@HsjWMaSFVah{eRjwtkJB1JRHI#3wYK9Qy*~Bsm;V1k#6s zgkpHjJvb(&=5+J!@D6o@Z$6s-LGyPR9_5&QIwTbl5F7O-$OF)ZI8s!{^Q$JY^kVO= zFx9TF^xvd>B(C(GdO})sD~pI%eCzk*q~befahFrbCJOEq^F{ z&K$7YlDP>hp|n6CF_0nhuC|-cKfkdVa2e1C(H!`};eqd6JA7g4IRQt1#0u2r5W`d> z)Dq9s!S+gcB8S>fV9FSNw0}0$RR+HFlhGND09b^=bn)bm@6It=G8>%HreO3yL#mbe zvZQNkH&ElnFE1y7UBth(PNaX7|KSWpvyq)a`_o}$8sNyIuceZKR4J^9$NEm&0fEn1 zKSSxHt|@17tVg+T?(_-aPJ-{lnfK-_IM~VdvJ%;dt+zGR)fNEAfemkQW2!^lMt%OE)lIkNs!b9149)Utt3SJ68-1!e&3 zMv92^5`XF*kliGc<0)vZnvMIl!e=V{$l#~jBx%mGfbW+uU%abx&G)Vt`Yem~ z%dG_u_y;EG%>QCWtu6L8B`}%8le-RveyN8GSEwdnv{e9fTj9g{R5&x6+x*B|Hc7TA zv-hSF?yKWpKU}-fTfY|VUNx)xl0nXOx;mj_J5(^6e5v<&pH&S@vmZ8jhpL;BQ(YeX zw#>wNrKf;^=7|~y+J{L}b|LskFJfwhfGZvX*MqiZK%u3H*nDjQ%|~ZJGxqKZoy_3uPN5tV%o2b|!$N>;?fRVET}(CU|(5 zUDapnKr{t)DlJCehz%Cvcyh2%(S1I&3m9Yjr|<#YnNohsZ0~WNV5c63l7=v_YHZ+F zfj|Tl=dDcJp$aWg6YEB&wzUvnKsK61xtD=h0fe(h=^rkGtX5EBI&HpP2WYrB%9%of zla(Q$^?3wJeeF>ozMn&94H&gk{-~je`2q4n2Dt#~R*BV6sg6l-V|NEHPvf|RO>jsT zUb3%dS??2@d;xOBZ(s(uD(SJNRAOdW>s~C8;)%;1!Zwvnc3;~H!i+hffvfQQTKf;3 zjv_TeY5$dJ(~ssJzeiwd`d$4}+LdhV?kh!3X<{N#^!M*=gp{PFZR-=7y@uQqnkS;T zn*HI)AIXW6Ez^pvXlP;MCF9`n)I~X4XnGuSWoNlxwtla1d-;)M^YNxYk>K6EUljxD zcW#pGz|O-czkN?U|0}H#c42|qxr!zzi&MiGbfDeFacpVl&OoLiqJa^xXnst6kQsANIewP7yMXOoAD&Wv6Ackr)XtV_ 
zv<(0YXoX6VT6ZU{j66V0U^yQi0@a$|?*1=4X;f)iAhJAU?e8GiLrmIcMa(N5(>(HZ zJUuVajc)kCXGw%t7y7d%6vuV9DxBs` z2U4!mZ~vY=L4Rx(cz7-7X#Q-Odcv2&9|G<%s>j06E8q=bv5 zBL6rr6)gk%G4MQN>3)8hG%!#O91$6j%n3`34tD`uwHS!c3GvDt&|TXUSN>|#mDr|O z0}Bb7IA%FPi9VjpwD$Yp!p3<=W2XJV-Cv^ur7`?3Q(t;bv7JF-uZC(bt5;0VJ)?u) zCW12|nm(x$mskf)t+EfeH{-(+58WCmHv_w}Z^o@=f9 zIy3NGBivRu$FJr8^jvcV(NJT5IhJ4Xj<4I~XLt2fdH1V0M(=3lPzG07yHbUV;N#pb zF6@S5^V*&op$-_EIucfLLwsk&i&F^L@57)J>AejclfM%mf_4>IZim#pbGIpwy7^*~ zGIP&=@v=%~2Uq8^!!UukghbkwSiRe0?L+m9r1+;Hz6$9YF-o|WVMx_;(SP#~&lTBs zTuY5=`lVwZ=lbJql~Ye#?bg|M%_J$m&uIG+#>+!PfkX)Vog8^Wzvujf9^U1X3>{)s z+&qwX`F=p@2mjOGio&(jukPq17=HD)xer`5P8q%Y`{(-Cq>IQ6iOIf(5GaNsid?95 zR;}(ssUFM293__TC{h<<`C-4nr1Yzo|X@e_u?(47kqdZ1WgiCNdYj zDW-|V<&ly&Fm}GX_OnG>_TxL3!2K;A&@sh87trWl2O|EKK9ed$!{zoXOUiRf>$ zd^*?jID|&kVERe}v0i9^H>Xy!_MbgQy65k8o*gi)!~;{o!2UXZ$rJ;PWH8i1&Z6P8I=KN|p2 zujuCL8v5Me(wAZA%$5X)oW_^ksl2y&tE-JijvW)`FGOvIAm z;2qYmgZC2#&F^FKedt1k5>AjI2pFyEfojiroyIGUd|{pTvj+O1Ct!S%3$nGvp~9%8 zGbabrQGAfxfak{WWz|L3R8v#a&F1t3#5R1S?E%K3E6_xC_x7wh<6B_vc8ur6Ba)GmFNof*(RJ&TiwwD~z7{n5= zpq>)z&`l9yhP7so2Kz32@Xxg#4yZ7YnJ{AAa)Hb_gqX&p3f_Bt;|>x4PFdFDP6anp z*qb+(X*?hmRiOUB*sSF;G)syLUb z3)tknrjui-A6_%Wl)3751IZ%RM_}zV4eX)E;o;#%7KlOqlv)`SFC0V%djI{{$*Ui( zWJ^)kE_`AqGeG;JLkQg`6V;77ZCpF;0VN!3FgJw zv2ll8or4VOoXF;)SERpdK8Hx5TEPU=GUi3q2XDyKff@cRu*5`l%Ai|Ej#D%hwpUDZ zy$NFZ9GYPN6#8f71P70!5Xq3U`(B;ASB!t2#-iQY9FlRuK#*)T-I~+#h_+?G3ih2g zqPF+E!H>W5Pkx-N56{%CL69j1d;#9ui^#L{yPYLqym8<4ap-i8ojo_uj+NKkjwNEx z(+Yy6vA)1OL2IbX!pE)=l-Ts#$IX6NCiF#sp`InONI&6j0{jl2$Q%}A^v$)>Hbw>a!n8;q# zh__0K{Mu}G8wf>=%Xboh9G9U4lG20zEeFU~D{4Z5>Y!;`^B%}{Z5(s!y(9Av%@>O( zf~s;lW!=P8Ks-y1j)U>V1AfsEQC+ds>E_S#gY$W+pEbXXrxi{R=Hi;72rEVD{8!&G zm|g)Sgc9aKe#me`b%R{^Zr+Ehg|K^twfApV=-*v(K%Fb((_p%{!P-i9Pjv2KM{HK_ z{etAhXhR!4Z$ok{A|D>FbczClbCUz5v}*57&S>y$O8-`fK!V=gZ3_vTN}pz@!L{h3 z7>!fEuq4$qAX7fM0g4isRo!lv8H1ZV`mdA8_rOv{5jyMMY9nyv2nMDv-BiBJ_i& zXU|dlRvRj|^y{ervqrHgz#kZLR7vkUWzkul^)6+vU|IHUbB%<8_t>Qj8fIV4dtXT9 z4ncu?6IBx)KISzFrWth3gLxc~6>-O^tXab2tLcc3TZcDP(LV;YD+H<%t?Eldj99th5Xat5;(71DMXj?&80YhB%(*dXTshvqYEGeFS@VX) zhpJ&HBF;_qJFEaUI`7e}P}V=v4>kO)T^KjUJByz&nd48q(hDIG9mA*x^;W^>)cwAX zg0#k~k;F^-i=Kc^10aB@GQO2Jj;M^)Xb(h3HIJ*H^iYfX+*6cAt-OY3=wG3uvS_$EThu%St)~PYkk`NVW?eOnPnY#S3#G@lkC~Zv3K_0G0q)s&^ay>9vO_5ZE_9mArE1mo zk@I5LZ~qejzlb!=j8tF42o!^$X$F@p!DqO-G5mpUrc`PkQQ}o{fnEb`O6D`05^0aZ zw7WCiE_ASyO`x;R=_1?3NdBs;_2)v0697lIVyc{VscN_x>Uo2Ir%tJw&-4QUN+p`V z<|oYZcd@sZz)Ne-6K?X=4gc_S<^aMADAeEmq-;A;86#l>``aDX7m5pacygh8YQcMl zV4(0TME+E(zWA%p@4ufoZ3Gw5e=6CXvSqiKGRpuu16s-c0>>%AxX(=RD`F7qZx+ zsj9B3%LqfR2V&0vaCB@kWJY*{zg6Fz3n}PD?`#yro6+`b#HJaC@f3+d$3AxR$sAP3 zdH+bBgiE*I+A>wdB{IwSe5mmau}k!_QTGbn$dUL-TU_#g^J&DCO#ve4BJa1=o)<8O z{PmFhLeqT3s8}}S>oRu?wv$Oc$<}UE&%M6awI$oT@P0ZK*HX9o_xQ(in9S$C1TOh^ zCeTMj)pzZnR;>e#Kv}KYEe^Nn!CfNm?gu|KS~|U)IT=!rn#b+!T;BabCHb&2G%riy z=hy&n83C2eA?+$P2w;d)))S%JSA+{RN>Q__{t-`pkg%zkcXMl;plr@8vYSO6UWT)nkFe?HB2HD!e61GW8b zPs+Qs?ORT@(EHozhVRXITipUOM(;5`Ak|2Mv+LdlhnJ5}I`Ddrxmlo3LDF^7`);qgbxEl)|5Nr5c_~bT?;R8nQ=mP&;r+smr zx{Qdh;?Pr^l^ML%aF!P*0a<$6>An7^kL#+&qKv}G$mr%; zy^QUQmqi_(Ib8pF>kf^>CjX6?wBBTC5AlWbZLsXi0XWAf+w<`O+RF%(cUINOz_n91FY>hk zd30)dKCJNO#ZMg_Ti4NSGuYhq~3XHdGE@yfk|% zFN|Esvk~?RxD?`m?&Hd~&}IMqxr&f~!b-LZ_77#2AV>IC?SH^I?rBRp`Pf9<)lGu; z`()^d?z8L1TQROjQ%&N5HbUjT2A_g@zPKC>lDl(PFn!@12nOn6Z~iX zm}W%$-&ykif|NQ~koR~!Jhn4Kj2SFWB3O1-s=YifKlCyPuc)Bc;pn5Y95!HPXy%hr zOYiv6E621aBV_?nckxGC+_3qgw^N|PLeqE+0<|I2E0V0kWAXE8=DwK?T&{Aa&G>AO12Qc3ckig;+=x7&&ydU;;#OL zOn9b3c`}-{`?;~Z&2w@tBP005N%yWtj!0S_h7fvfU!l=GPGvye90E)Lr~@LCHL{L( zW)0&6>##IeGOC{Nw1u&A7qp!)t{1EB=wDER=NhcfBAR= 
z(gK0!ym?Q%Uh8B)Zh;K~Xl;Q~oILT={!GGZ58zrYQfqX=1V?q+dIt%qadE#G6jKuB zFO?VQY5sMVZ&qL#4?XBiFrNaD8>qA>k~?WZFR#TVaCmT7t7?yC3R$@ORO3!6#p^Cp z-=9jeVLGJSB}wGcgza5prFDbjKKtsyx8V@$ke^&ub;F$wjB zR(lX0L|7J4f~D&xzNW(`0KR2djglyxT0sDKguv(F@9&=g*=qoTokisKcJ{DX3J#BB zn%XV1S{yE>#9%PZhkuY$6v*bKFk42`2;hK@sFJA%?=jS$w0(*{3E9)QceKf|gg!nZ z;^Yw*ali?Vr*+S}v#u@A*fG@C!h!`m+4MkG#k_LuRwZQOmu1N9I@xtE^asM!(7WWZ{W(*o~t@9VMcR1@k%KFj%xt>=9-t?l&p0p8NioJQjZ)v zrrV-4CXyFms8?)z`pgCDD+&X5a~xnKxi$j>WxhMkh;IkrILfdlF7^Cv<`!KA3T^&1 zFylfjmKGCG0I#SKO!aTCqA&+rOHt=(hw=kYjtK||kin@rbDQcy@*ns+I}FVI?*$l2 z07PBC3Cyeitv+;R#Q@rFFRRIC-sre}&DGO0?p4N=>;nk&9U_yveg-Ey50qircAULd z`D_@}MD7|pwkMYFHM#*@C*}_#{ii&s0t_PGP#BRUyxVy_6LDke7?dy;fXoy*q_lCJ zwJUWoXFF^2>hDr6IfO4Ex4KT7Sj_>?yHnr&SDHA^ti^Ar1-&oEQDqTQHinkuFyHUX zBro27+e+X`v;&P5AZ3ls;F}Q)$b$t`6|c=cvB8qb=eMW&nfX9jgN9A_+f^Z8zJCXr zgmuUP^uRIT*pj+1>9d0P2$Dm46RTB?a1{QPGO~IFD$!`8e_dq6!n>KnYbLN9j@(uq zaGZX{J(Vp!ZP&BHn*8D4n3`Z@XNor; zkzG-cT=x7oND)MwK-AHDB^rX=7v^u3BYU?2bpZD^kyU3e+bQOmg-g_BPN9nckoMVJ zl?$g;GB3C0fjp|KG2%@87N^uwE`>@}J%E~s4_LYq0%@gQG@0tJJcDycPDK?0zay<~ zq&gJ~xEG-78NZn!yH8|NLKc#c$o$tgH_wDF!#NM#Cy{ku`q!<&#K_J6^_gc>t2ynJz*POD~6g46^oi>zapgu?-a0 zeKp?rKOiMjbL@f+s|9s~12Fw@A3S9wS%FMDD7d(c?LLNh#i z>L&o4KWMzZUfY=_L=9dQcib4LA6$8h#7e?ij|TWH=fgkmJ|y|IAo2b^{=|v{P>01` z53BdycR9q5(CsB~H9gqq?*^E|^4I8t-d#gjorl|heVsOv2o_=T;QUo7+p2W6vTLPN zM_evNN+NILIM?cLi*n27$oVhn@k0^F_*#{Yb_Hu^on?I*Bkec`C{&mC4?9#0L*;JS z^k#+zk}|i45wD;$(BPR6>>Vu97_nd0O}2c)G$JEzZ_MRzPn-|yW$QD}O! zDrpN<6iK&6VGx=OC@F3F2NrgNGv!L1bi3CiCI~!_QwwPz5=A{Cph7O%Yay--(4e!# z^zHt!W2(XWZH8dU$#gl@gMvI_1lQ7}8tIVyeRf21BGo-w>e{$%jkt~X#0&GPJnc~) zHH#W$kfFK$FUV=LB4avHAyaT0`hMlzhbPaVNlbMY0AybQ7V(G~mD}@FCsF%+ng(frgoECZAzJ z;fz=S7k-bo^?^1spj$Ew=nQ* z76y&4td4&a8nl2X*!LbBV`xn3Gi~+Wt3BwB-5CH)X+`dLoiHs06C=BAl?0JHMayW6vuyA{R9ytcpT{Ee~uSaDHFR=Rf~1-1tkawy_D@;#DUg zRWJjYN)RWJtx2zV2s+7=P#(->y;l}kJuB^QYAFpvYYpNDGig2}RoTNT8a^=8Y^gS(+% z6j=(@HP8~n3MK5qO+|E;lMn}U52!?XcbLxdmx3IK0?sUvbAr+(9QTN*S-FqLxHgEQ zN*Y|^-mC=qOM7D9?wR$RBI12py0y4Y3nC#SrF|H;2WvL*@3YRpegM~;vi`>dOm!r5 zoDKXBGV!1K+?-m8oMfiuw&bBYNRbK4+d7*U{P%<785|s*+l@|jk2+IC=$}I1Z~b2O zNvX%9hs2&1va-~rK7DkVa-Jk0pz-A)02a7h$exuqoV4@!qEYW$UScxE0%dDGSMCRd z*z}|@0*mNPWNW$Nypkx|AK=?v;EOdAm(bi0b6@25V&7?g0bUhxui#c>0XL;w<&LJ| zQ~DCuV6UM0`OBM@Ncj(BFjYI>aJbgex#)0Fd0Egtvpg03{r2*PxAS|omR=<+KzSO~ zy8IU}=@9@Cpsajac`msqJ3j!R12M-dy zMPCX8GkuNOdZ#jKtoofVFRuCP-uqF%r(A8UGpO}KDD>PO|C0&OktSG}zwH>FZ$S6g5H16L#?r)pfd{HvbNAbNEHisuYD>Do;;t<5w>VtLtu-(hQ#X zm>pM4a<-1K*L4X^eqBNN_NNKC4UB1QXM6q-2=UeWbC#+v9-I2PgPlNUTG)BG)7XuwukbJF{R*^qT!p+#NgssZq01i51PIEFYADC`*7mn*+T z{TRUaRl#8)OG7hi55@<^zg79?99|8;hSB3_e9^)E$J<3D5ZI5Dhk zF|WK1nUo|mJReNj7f6mjCOq{PsPhsb2T+jp)!FpVJ+1i=dddmBBOB*i&;rM1H^4vV0l2reR+GoU&^TgKw znCJL_{%*_nUdbeWnj|piF3zbyd=9l_RW99Lj@q)*hcF!pGxox@oL$SkewPAkIEBA$ zD>f^9%&8EIcyJxiOMU1utcq+*YQxq0E^Lib6+FyD0mATT1&f?6*Yhk zs3$h4NaY9M3kR4RUUCeUx;R`^9C5$By;wCjsObVrP(l?0<=U<<&R$6p`;FJ=BOuAr zxe}j_uHTqPGp-r|Gp_rVDKs8Ny6&}wp;BHfKJ<2a@hGN@`{`9}V|yR37_L>!Yk9i6 z54(F}vCiF>sym%GkBMpcwE!+F@DIW`eSQ6mWyI4f=hp zDA;`uf5lb(W66tgWj>5{7vH@zx46g-9;ie~E*;BwRuv{&r>&y(WK~gA#w3O{WcmZa z-t^>|h!F*?z%>3UG7X6=GoSQ7yIeKjt#WNYqtU`m9#`q_b za4wq-T)_{s&n=myQg(MEGXs{{!m&*$H~BEFnu?(ZyANrvrkKq`FIT(A@K8GAM5bz~a=^j#96BTiMTy(LS?QSV6iX={bNEm~ z=)}wS<*;?oh+qWb9^(9_q$_l{4R@$h*nU{cK%+GN6ybmVjB*AMP-812L*H=*b)qcZ zf%jnt?{xC#Ta~7PJPSE>YS>xAxdR>Z*Xn_rEh^I!Z7F4e-;;bFoK!8*zxs1E-OQ~v z{NpghMXRM;DW>0}<+y@Af<=K2IJ%^X`tNfU^P|U8IwhHtVhjmGK9D3B#b@6IqD7HO z`bqSbS-fnJ(*|{#(`>^bw zphN|p6njwp(e5V^(zL}mr9&O*YAsLjMAC=2PX|^ORhD06CNxZPSxEiQ%lf}x<7zxc zXKCUSDheQP&YZcLLtl{nS-XTzD#2>art)nwpZohYdi$+ItmgK63kgwP2YX%LR`~Fo 
zy|4Xoyqh;(GU=%U)&-#7dhGmg6hfaHWa0en=uiP(A?K+3dw^`p{q`Q}j-PdOTwGp$ zed6uqwa6JAvDhEa8vHGe_+nWQOI|IIM0m|%6`Fk6px^u;!GM~43UC-Q(l}B`)t zCtG_L6>Fevtb`a#&-hk`#+Cx<+aD?@1pII z@1omr!5Ot`(nF&?rH4HYoY`+2CgA_J9t_kxj5!!XN)S!eb{e|i0_eyunf zMKShu*u>9*hH*%B2cq&qZd6}aes9)bjGB0DtsTzS%gEoVK~wqub5VAiB$=;%FG@Dro|4JDj2jI z*SIg`&D(_`6!cfc#ALsKyqyo>wi@tUurCapycA?uphYw8VW1g60VF=;TU}Gr1!Lt$ zCqR-6x?-vDfP|F@+LN^1M*p2#x{ELP7MT^rMPE8YiRkw3eE60@-eiz3`6nCZ%&muy zXRzOWTf4fnVbNNUM1kx*Z}D1)oaSU>*#`*k-Fa3Fy{O|W288@(ulb6ZAe)B!G?Aw) z&0h}3e_v$_dR%UdlU3$ux2<}!aj0*}kyxQw<*bfWRDR^~>z;voJF6qd%K18B1w$X@ z>HJqmKY5m8m(T5$*?-fD8wKQ&ZPb+P$;E_aU`-5Z=s->f{W8H0G%#vM`^0*RC?Wu_ zNjOgvPi3>;pjaBKcv(p}xY(KR2P-id8vp<<^~DV3DV=-P8Q-Stz4kN6eR=o>FqK{; zGSIVvBu8J#tJZyKOIX#Wz7h6(GB%%IT+@N2s!Z6i{E*cr#Sg?3^xCUopmMnM407M~ zP}6-qV5{L#eu{jSJAMha(n5dmN`h!P3CRlMYPI>;1|7b>=735rrXNnQ4-dhcWKuXa zUI&5MlY>lzX-6yGe#E}OqX-L?7DaU6NtW(Z>YNX4oOnDGj$DifBdM^r zg=rj#{&Fn;`~1MY4u?nF{~R7dx8x|8&E;rKr_ZfZCRAn1POq+T>1s;HtNfNZ(!QM1 ze$A}f_eNOjI=kkEqRw#-dv4xSRV*Y*y@?JZQ1%XwST+8rhY8~>Ja6p<{zmYncO~27 zX>S5@WpJ8w3bb1BP22!9CWDB*3yhjS1i0l!&qN63h&_03-M1V#bgCSDgs#GIBN)RF zxP4dyHqYvStNpoQ-iI1-CBvx)vkE$W$VD7(&B0E^_OwN=%JA@nK`Ss&JR;)q3*9-!A1@Hjn#pGttX+cJx!##z?;BvGPI& z+fspEZEG%_fTB7|gm5E{ehyGKx>aJ@SZ??0yXifio830Oskz=`3F&RzK#-4zv$7iw zXT9niNe{KnS}UP%uvLw$sHpfs?Lk7rFAvcR1rn32uh#b=i;Lq{rg+hQymH&%pf(p? z{Ki>ITw7F#gw7Z+PnFD?0#N<^C3?g7RM=Mp+Pm#;sTb(ZZ1g-(2;uY2;J-E1~BH!mKbdI29qVoumD6SLjUeC zEO(pIB6Ftgz}!AhONK$Zi#e$CA{IXN6v6+ldjI(%1TMX=_=|N4ic)r@29o`<$*EPn zgzro1GaFmnGnPfs8BbxZic>dwY3jMD*CkEU`c{L())wB_Exv5oVJ|SuK>^(GoP=@w zpsom5uR4u>cze@PTP;MM5!MASu60v7j@u=J5!K=f*;lYaz0De@p~yPU`VEL;TwpG) zKNTC{q}YLa7t_$t!1KGtJNP^~=@Ri6F%Ri_1JD{ZN-Q5R8-$i8Jmut{YIGQJ2Bi7} zgRGz7oWxVuZd5osKD4*7dATZYOi$oGTcw%GpG)McI4mVDo|+!sb*j@(oydZAcxKB!JctQbm)iS0Nw5W7 znOcEI{K0RTxp@cA8w(aC8E@RU? 
zEPZ)nz(;Ra<|LAY{$AN0^^W5&eLT|BN7TRpZ-pJLSh%gwoE|Dg*0A>z<<>8qxYCd1 zv|L}cyxGpaK8rOW-TOJiwZ!3@#z~- zPl$$QN)5B&;hak=SN_kr;s1FZL4ZSrCBOE}BT=4)z_89m#UNgCev6WNZ)IJoW~cwU zS*>?uus?&V8kxbRfHd{j*6Nqa(4QANrJtc?ENxL){fB#Uc69V`>JAmYf&6VTOXqfD z4f^HVZ*E^j+gW_kn|FzG>8(HgU#G9UFo!&d%kfjbmows2vS6ITIZf1X29r*qF!>AB z-R^9MHhp6Baaj#J86K{qQT2ndTK)cP2^5^PQ8*710&hB&M?Vt* zC#GkIYoACHCQtp)CbXEDhx4EZpr0!W>_svi7lYc>{u8-SWeJ1kCNB5PfU^lHRNxib zbCn^uZ(-6P+G#r8zX@~Tm4JYNL=lJEh|wLpTJj=gNG|!aSBSsa1^^Pve_&0K&=QQl z(geL9J9JSL=H}*I&|m?-@%FFPRgInX_4S0a#B{2_WuFBO?zZO%%qE`M%pe0KynYWDc(pxkjx@Jqw}fou*~Yk$q57qMpt z3O%*W!ieDgUR&?|tzxi(gR}2l!JWEFpMC%htvlmcA-0!;Re-Tk;j$RNb?bhl53$70uG(sOvCz0OeNm(>74E1fc z;jr*MBijM4dkL^3UA{jEYmpI~u6o?JCD31P93T09;rS`_aG2X}xHOkVX(W-O$=>HZ zESrb=HV^2dO4nv<{&t%r+-&x5^ zMW;w=OiIjTF7(6Mqen0XPk+ay$YR>k*-+>g-*#%LP6b_Zaa4Y4ZPTR091RRcIm6)-v$a7 zH#q$5&3*>CD&EeKcMHyie4wgZ!a279W%L%L?!8%UYk*q^1#MUofHC`0e6Iu$XJJsg z>?Fd{3V?%#>GqG$p}F0y`vq-=Peiv4J>~iuuK+oU(^CL;92Y{bk>I&qCX|oaj7nu8m0?7Gt8o*Zp1-TaHHbB#vHss zSZ+`QS#g5P18=9-B+vXVIW|b<^63eiV}=}QZyhe}W+B*%553mEyaznx0+@Myj8y+j zjh%T3SoD$*Bq!ndlL*YGdOh(%GB0nXv#2$xCa_7Ku7DdjUiOk#=3md)W+G(FZeb}U z6v%)_x|b<%I~!Nwdh>Y50oLqY2@1se39xl3)>3SB0plvY=^UAZ5>RR1fo7|4M z{4QMUgISv8P(S5~zMEhR^D!=B_xI1N53jhh5M4=*hhz#vQe-T9=0Jmk|+FS zM2MAWugK)Mc^kT@k7lF+5DNQwLH5;t??Fgp@g@B?YS7hwkb(K}+u|R%ZVq+OB3?!2 zA}emg+#nlx-1HuxvK9X_5^CRqKkg2HOdPR<6tsQH=Hn#!PTaQ|)u?jD!3o&U47Fds z#YTsTI1JSa7Z2eNsaI+hVV{FlFo*XB)%0Sjx~S8SD;>wXpV(JfCxK&I{O4=2u1bDA zYY9FH!>c^Ji{yiP&<+RF{d1i`!m$j`R@k(na+7pI_2f0}};~rjthVKtxNSu0X5ZPu37W z5OIqahIhy?Q-fylDwu8Wva5ZsE)lrJmp@2=VR|W&JELIoqNS5 zRk4k+s)xYou(OuC%}T|srG+Pq>$^VP=5YZhuY8c%lz!CubjSg0Slk6q53Z$lHt(iB z|Ga&U#~3UtUcBL@;8cHgnUiH=o%uF9AOP2Mm{lLCGq4abCsACJxt!`puq@HO90mMq zHK2ukEH)&5yd?-a7SeLre9hGA#gUSnTjwnVaq$L(f86wH7w`^R`0iJJ<~%p%`f=tg z?OzUBi>vUxzPrah8cTy5xa}k|Qnzq&IW3&fjENL>_gHwfrKP0{;7)l`kA=;@-cl;- z^|LaSK-h&RQSMm-SzSc5ilWp|n7l2UvHxbu)pFkv-k8L&X6%=jacZ8b;{dz#q>H@K z7e!S`6@Yih@oP5bL<$_bysu=~&T!(>k`CY0FQh{!T`b|!0I4)=QcmOUpeeei+wb*( z_D{xwvkI;p?gs3;t}G3`d_GjD*0oDWDSRTzw#S-A+~v%RA%7RM9m61@^P|ag@Nk(K z^|@(xHXzjnX!UhKi@BxyGxeP3%(c^U1%W}C?TW$LkFc0RZ)};-<>j#RDwEMvGU6q` zGEPs#Up<84zcmE5Tgbs6{7fB&VBu-zV#D~eXRU%F9Fjq6S{$$Pnj!iOQQi;`pDqJ8 zB^j`Un=pn8zawfD!@{`5*l-NhxSQlebYmk!NLi9qU6rMriIOMPS^!I)5^z`PR6!tA ziVyzrM_$S-7$}7AlX#U${0s~XG%}Ll5Mar9_A}e+twJ+l8R{hHZd>cV?E`ytp?;ej%od(4~#DJc3OuFORi>{Az5cb3TnX`pm@W z#hug4hq4MS1*e9lX%bFUJ)f)#avH8ZQ0KI8`wE)BF6iStHd@7a3#ustIlnwgCr%}Blr$E)BC=X^$66RiOv zPQNEUv1R)>e`p^DetrFQV{Ph;#uTNV|4$O-)Kvb5hxv@E9&RvpXR%ysDXLJt7ZS;u zuT>qsg`IW%OTS*q3!VBm5hh2C^X$+$pTW<%!x_X+0liJE9t$zuuz^u*r)}pTUsLak zB^|Ok(*jlPn0e)NDn&8+1(rCmyp)7V`tl12i?7aTH-NP7Io!QGw4F1cJ+@mWRvbW$ zGf>bbj70qEPvnS_djf^=^kBM5HA=F$Fh-TL77cwz3$*i#K-kC{P*&?o*lILg)ezr#wy z4wbuU6l<+e1AznCU;TDY--H`mgiG=^6U%GHBiw~%BHRG@|J@If`5Y*kpOb}Evx1*Q z63ucA%Bg95IFH?=*aw;R5Y7KbZ_$)WVD*ABfjYu`-zSIIyEX z>n?U=`KyE8H`GfxqIM$xFqaOZr}~!-?9Kcl9=tU?HSjoNE?w?&i%du0;d=PDCd;-C zhVB2w+;@j#+5dl+$Y|Ip^Rjm$qp~iWkjTyog^&x$-bAvu6j^1DBxH-M$_!<%?7jE% zKJVY}JMO!V<2j!Hp8NP79lrN1&g(os=jZc&zt&qm(U-UOSIMu&O0mk&X?H06UK9+m z5zWiv_{C7mm%O+7VfUlPGr7iXk!_}(FS7U_#AH|Z9uimJS4k{)3PEFK7=;4!;1{DRmu0E0Se&Q}Em~hE z4H*cJbw1&&tT%b{gJvfY zLMXv|dO2|42?n*PUf1|CY_WYc=u>Xmwd^bN7Ll>tx3nG&qeWr5*{Aw#+RFY zT|;fpX_~!$gtrZr%tTEyYz;btCIVb5rr>7RbssjsjKzKpG_?|KeA4#!?qeHdJYb9! 
z`a4hd9J;$wwp*IASoc_oe$uh|RntgdW*PaYZh$mXc^m58n{EHSAl9IeCvFL;z} zBJiRsz`hXD2^q-z_T$i45y~hvKi9eX`;$4epOoEQx9j8#T%Gpk=l2t*r+%e&20sp4 zVD;L5ZU-BOsu;G_34Co)Xo_;^7zD%rUXlOx(WB9R_}gU8LGtg;UT^4YQsj%vV?LkN z{UTi4ZCk?aK#a!Qb^cfDe6id$ekP(c_{2?_V-L&}PBPEOaL6^QM>lgljMSZdzg04BR|eRdswnoo)cH*x*wD_ z&p7M);B~t4ro?#TzEf)eAww8lQ{)fkU(LR9JX2edMPo~2-luRIzJJ&F97@9pMppa} zNga6GkSg5<(9INd#$%62d6>Mz;Wh_I2R6Z-0M<21@DJAzj^4r7)+_5vD%KgN6x5fl zbU&QzbXBpw7k0-tJ-wUzSzL2&omkTN(PR>}tG-^W#$ATX+XZch4)%@fQxjJ7{WWJocQqlq7ZDcZk3NDPy#6gI*f{Q{}#z zzInK7>;t9C;Uk_SJmIN!w^w%oM!xVyY;`go${o=M!B<@su#$liIJBQx)&1Mq`#)Prpa|9lils}f z-y`d-t9dtTy9ZQ~%3N;RmCEr=KBiMho;W7lbqb;%SZ%F(&T!`;ZFuQ zfffOPgWVbP+s#E&t706<4-swY#D&VCw`hjqlM7n{< zBjB6JUlJnGO@y-@DOH0Njw)_$ZjwW+Nf+&rH~3}I0~G~SCGrL@j*TO9*&8STlK05{K-(G-YL`$Tr#5HPPx;vSVf1%@L%y>K(K9~ZztQG#t? z`LY+x=)GUVM_c~^4-M}btY3LZdHKV3m+OltS{7gE>nUu>TzE#x zyC|P%B$Jbq0Z+N%7*bJD5zT$~mWasU zeUPW{VMfvAE}JH;lw|AFOGK05m%>+&IccCIkZb1+!^q9W1Ky$tnq*Vx>kuvz^5;GR zY^nk%yh(OnON+vJnj1{b>_8W}tp&#ZKYW%3{B6`;pTLx*#mm`e{j3L0K5rA7e+gTd zt}4F85Eqq}_Vw@vvOy0Kk`)S6wMKWb{qvu9|mRQ8?!qh9cT& z3h0~mfiX;-PB`P3HV$x84knrJeo&?RzNTy#EjMUruMANTF9n7Nybr*%htsDi#a|KA zbDxmmMd*kx7tOXJvBd&T`0r15OqBe-WiSyrV>9TNnXRowNEGQch>bPjIjRT=C;Xa5 zIZk}W*D3&6Zg!SwA$_*9gl+6Ma+#6KonG09O?s4cbfW^5Ql(c~dWOOas^}JQEC8Lv ze3UOo5DzI|@{nP`FSH*{FO>n7-0_74F360C0~qe1y0D*cYrpNpWE)TQ?dvvV>v-F? z6EL4WTcad@#8Vsv%bgf7V~GY~CV0DX8XBjnf}Gyt7$?<>>Pb;fTu#M=*-CvoABB#` zhhyQ=C7!>RIDc(f-fZ+lv-P6KxID9yOUwnv?h7_{;t|aF_Zl8#laP)f?xQe`hxL1t7 zvnjIhU1^<0?>IjwAmW6k5JZE6^iB!yoq7JSzvGF+Z(k{%h+#4!hLY(*UGq(Olr9Ko z!g@coefxcLlxBqQ>NLOdgM0vf>Z8Po&zezaQ-noO`dLZw+b1}I@0Sek#?Ft`n4M@GG%5W%7bSqH$QfN`fqH+Ulmy@F>*Wj;}?Rgk?QrUtx1joiVyjI7VF{Xf7sLN zRp7}UEI2XaYx_YRDs$x>2}xitavL^R-r4`Yt~&$ag7_(wo-_)=PaXJGR`i{TC;>aw z7lRgl_5*3V9Bg+vz*-T12?>?v9SIXphFPf zjwQCMDkzYKc)NMLCc z%k8Gp;br(KDjyI1M+=G;r<1PaL)O=N^#enCDgm{d^d~V3O042KC_0vs)Wz|R(1mrf;MNe^Y@u}NDX8^fW z3VKjo!0z9eTzxdCy$dG(3^+YqZrZYB{YwjQ3_Ikk_E{=HrGi&i%|a1q5N6ph-nw%N%#X(45M<{jA%g@ftXYUtE^*TuFF!&DMBzb2A1} zo0_@$VB>GN>z85f!KV(c8IXR85eInI<1-Apkys9y&IJH}PdMNo0yHmds{~lf++nl> zwA^^GZbmBaKmU>-dKVG0VJ!e-fUF`Sf8n3!SH&!Qxk5CDN-X0Dra(EBi3cu=iXed8 z;e`^sAzlKY z_|66f4C3LwGSWx|VT4RK;SJ0HxbX!`J<$?~STz#tw-$6lOkw_eb?X9fdb30o<>j+g zC{M@^Ga*?D^n&pu$<#O8>E{Xxb^(SXFvZ?xk&r1%JhKghXKsQe6~`UQUSHXWG;G#E zVpUq|f$uqe*aH+jggUe@5rcKun;OjH$I%c*l;9`KLRQ=3X2_Gm+e*3kTh5xCBUEueaVRRH_2Dr|+U`OfPmXU7?j^;Z#B|N;HNlM(m6I^TB4>j!L z@3@+VMmPip`_O4RkenzOOK5mVVDEegbfJ5gU)`|g_+WX9FH7*&P zTHvf8o?%KVe=R$dltia)JaPCe6H_V9oAvnUz!|Av^4DrP;*rcB!@GrDIHb-HPeP81 zi~9;Kop|NsQy1HzVlx#aO5(<9Kn&kb@Y((vgky%{D5DRyyvrSezEt^iyvBaFFyqHX zE_lqN*jN`Kfld6zvx_hA9F>|4s_bNdqfQ^Jqi~np=S>N0UqzY|NOHXlHO_a}Z}GfR zKyPNYtJM0qPS`iF#eb2g8Z{%!jpx(6BHim6yiNbllUNZjj8Za%UMZ|GCZlupG|KSM z{J$h0M*)|s(Oru^?=M@%=w1g&-~zgU9LBJWCDou$&i0n%pev<%&(YjaXAX0g;CDdbVh#q`OjeN^OpPZlS9;wvTh+7MP6h*z%^ z1@==voWOCG4DD3XeV29DUj+xIXRYvR$95* zv{r_7=TaI_e&Zr4!mmul?#JwTASBj=yOYl>3EESfopV=>ZLmSAE~JI>iEvm&~qB z@B5f{N2%6~RO~92tXJY0#~w(Vrct)QrWdjP$xmMIBxsBHu_!Nv;j`p^tF>*$tdra> zgY!kP!2iA6TppOsX9&pL-rOT6)Jz#Y1Z3aM6Y&tmaLlb<_u1y$LA@9tT8g`@uf7`m zUN7D$2N?cnHr0B=3DFO>@{XI6tz?-2CXJ-B56qk_~@;|^XtXy zfI|t=s}a&DlFsL>YPu}`t+Hc4d??9-_^$=d_>oC|#7Y-K^f?&00o~F^+=#GKHNZm( ziM{Jm$7#6Eu+wSK%yD2fsUZ7UJYQ-c1kdy3DHeHBUrR~Su}6bvyIh7oZt_7wH2HyC z*b%Ym%}aQ#qy0V>8236+us=rpI=vf9arr<7-sU%UMR5M^CGVD{wzGXcUe~v|gcZR( z$lD88c3%vq6YXje+V3hh;Jlgw=xj=yDU$P2oraQN3It3X z#`J4&J6He}$5ShtST_Z~2}g8@4?dUl%)~l&7q#^lh=rnky70MuCdwI~ZIFhVWcKEH zG9?d*4iNdtO|1jPx^ZzU$17YY?szBIFH=x+6J<;ng$J_CXttDJTyk~8=bU)b4A${+ z3Y%Qn;UH84l|al-Z}RjR$t8QYCt^-3XjVBy(e&z~l&5BaL0%5_l`NfVNgU-6f0CCN 
zITy->Qg8B`PBm#pn<|hUr^GM>$~g$jd9y7MbWsHN;>2{jp*%=Nu&g<^o_Xa85hO8A zPBxj9tFBxRb*I<OqFnC0P zsldT0fQ(MPjEepK7u*JsjNsK9T{viSMjUL0>k|z{cM^iBr~2{8$^$~Em&_HRaOF=H zVbE1&fT{H}r&3ixM3Ca#+Udx+M;>)L-W2T_QJo45!$)tX$%~p;Mtmv2RUrd1huk_U zk7wqgMpKSB_O^cU@Q4zd)sNW^1y(m0@BSwDn&**Zcs)(Thq@w}y};2utZIm^n+o zMs#>4k;Lvz)Bf~>zS%-X*y8EJV!l`KeWI_0P=W9UGiDDNDPsH#tpHSUL!^)TNjkk- zd!v8x(okPi@R>FnDbxA+v`0b$^9(k+)(1af>{Cq~kbMs)Ya+T>MxC|Uea+g~`h`!m zpG%@kGcrC=f11ksx|!?sX#r=_~Agcy8sqR*O|yUuwzix;UxrT-o|@di063r zV%IISHsc|lpTRQ{=KC-Wff9~SRjINGy}%q+l_c}@8#m4U_CJRt^2_*uV-A%ZHfmQU z^-BVqo~990^xVyf2W>oO1SDzR;YWT6l=VN0OQCcg_3%AtgWf3LgI!=nvOzqZ){ovm zuG%u)--}F&0B^mo!(2MWSm5NE3S-9Ubip+~$fbFRu~M~x-Cz~2YBKRDw#)!g!ra1v z?i&>t15>v+H^*{`&n@d9+|H-b4ncNJ{Le74b>&^eS(X!7CwdRQ$?o^O{nQuL$T`Ie zv9LI$G1eI5=m;N|RTt6xf_n9}qSQt}jmElmx)H!7uth0x3_33U%=77$-!;8lPVXDh zg~zEBu6EkGei}=yL>sZ9?NmztyomQe>9%Au6$8zpN)$`pn?i4+&(m~U@puYliQ9yI z^pw=3Zlj^?c#}dxJb|H*&UDrN^w>qs!%=%G;^{H+r5U-8k1ZUmz)e=28n$Eg>CKzK z8M#F8L}z{dQSBb!@frFxq5f#K=I=1^@BSSMxMn4Glk)cS|J?bBcA*<&OuoJ>k9CAp z_0!=%7-$Q5Hy;)ukF3320}dL9CY9xeUIehu!MZ z4klfvIW5A+anvZC+am0#$0?%J6(X55#*P+#T>Y96s=%Ipi9ZFzbG$n! z0w3ibNqn4pv}+sI2={ z&f&js|FUyEp6M{iks)}E7`phSwJ(vW2dML)#Fon$ zGc}V(N#aSr!4PJh3ZHv=jnL^4rb#ke=p)_8d*|Yyb=l+OQqhl_6cIasJ@tIO7osjL zv1tohZbV3~U*(18toJ-dimg8ii}UpFubC#Yxz05g4Rt;RDB%62n@{|(8G_+Uzm$-| zprQzS$a8_P@pN}dezC4BBWCntE_r?b7#!Hb@HMtU^`GJDfMO=rBtV5IK^lTX$%!-n z1U~xt0PDC*EBZlH7jUAm3!j>Ag`@otnB;aPvD$S(uBW$U-wQDe6v;`0h?H^CKvkbz zuuLY1eqqrY|MWloZY2MaB>Y>YfI6AOVx(L@lr2$Y`#Z?N^k0N5*jKJ1R2&HaE8%#n z*Zz5}$Qs0j%bmZ7`+?y!0{ztaw&zEo+2A$>kdUeTjxwRV*;o)@%F$WqndLI;G$B*a zc>ekplMxN9?>rOgegcxh2MBX7a>I)155Mjp44sa@&9m+-|ICFpk_CoJ7>bkLy-ZUY z*QB*5b6{)_LXgkM%_@>=4~;@+;=@WXEVt9h9B3r{C&KaS~X*m87?o?#qnd94Fcuhi*|bk_4*4b(-{Gii??uAT5h~5V#f6 z{3mM+e*nlp&c&cq7WZ(8L59OMpt-NRPtl;FCS1Enn%3}1v^d4vbE}j|Xsk{Ib$?8$ zyycJ0fUK$p%>BGSJQgDhELDy{_XD@dnEk7Fk^vhMt;}t|fG7O{BksQUp4;dil|tIZI)(T)W=pZdc^728mxYL%kUp(2Hb9H=Td#u>s6^7}W z{;v}S6oa9FM96~H%Yqn9uYtH(T40hW43HK~zau4*bXot#Z1?nC_PZZ)nQkqh5J^7t zat}SVQ+bdQzcY@>RTirgAqO{T?rsJDhQOg*hy&HfhSOJ`ly4dbR65jfPrXx)O%9DC zax9QbW*!8A@?TUGI9S@i3dek}-p|MU6Oy6SDi0B%nk1~ZxImlHw-05S3Ix0`f*S8^ z_aT4|TwqFw0(}iMHFq0@i-H;ctbW#!WN$-D&=ZyrM3S8ySK&Vhp7Lo*V>`ONuXHI@exC4RW=>RzZ8pL7& zbun-jP3qqv0VV_B21*AH$dOLCW}6C%JSp(daRe6rFKb~CN`0LH;EK5I%a`q@*6v^T zI4FZcMh=F>m}Z}eIt5wTQ3soShHv^hLIE4O2?Kf>y1TD-1aM_~Ot!`-LS`5Tuvndf z4BTHKk9!I-tHHZU6?43`wY7Fs_58yGD6(V0$D;rQ(dH)PXU*+`ki08jU$_OtX4h`8 zL;>!j2zflIL#J#)Xi_nypH)FrQ2P;qSo)j_1^)2v5RrMylP)O}EQj};ml3mQia$I* zK0TdX5a8I|18hkfeE2D#bvTkCDc$H}J7idzLYO|sdq}%LL3m~aj8K0_84rDOGmNaq zX7~ieQ-(c6T}hE45l}JkqL^3jz*0-}6MpoNb}R|>$cN<+1+_Nz2{&epQ!_UlFxk1P zU^UqDVbgvN2%1PUt3aj)YVivIX9(HO@q>7^#;uVc>vnbJEe7mhOaY`Vxa}{Nx`=CO z3F(w!yk9>Gbi@N$n5%ri`Nb(fQ~{g%lQ8SYaQ<`p5BtgR2u9f*Dl5F% zaE3wPV|bv}@0>?vh2U*_)zgUcXs@W6GwLfr;U&;?0MFxas2K7qDIG#xPI1<4G8$Am zAqw|bN>>P$zL(omlshgjvDVW}!^@!HaqM1`be`U@L|5R_H6yProGj<+-Cv0p*H0Hd z4OJ_<`-!da9Fx%yavZq2_22mIWjIN41y4F2!}FtACg}T$V8kPBOisF@+5zbrh_DF1 z3AczVcDwb?pkBE3=IQ{rOTG(vl>FlmEHR|cs#c4lQG(={{*_!lrBt1%M~ud|Nbolt77MEfzhUx_B`45;SAY!eMNV zlXisKSX)!YxIW~5L9nT3URh-#3GBqqRV@Q|sQflzT~IavUo$({ZeIrm?t0R}Y!5lD zm-obDY7JaBW+Ki~U%Nft6vk6L{Ust@AyVF`;GRu1xGQEK=EAWtCwj^?#Cp9wa_~Uh zH`Zg=cq4#R_#d*M6?|_z(xtLH#dO1gCO3GT7mp;FG*>o6hZaoxc5=rTiCL9C`MsPX z>XN7THMdFEyd?1OJ-ERYL0-!raAU1P%7yWepgR2+aDU2{DiBb8HS>T0?GU4p_eh%( zPR^KNm@?MMwgnUc7)?G&au4?9>ZH|=LFe>BiUvkpN2rxy=n<)fY&otbywv#-Q_bvQ z(O1mnfFx-A`c)a(?@h%eNtXHpOAv7aoS9#*kAR0i3kIMOxG7P3CC9~;zDZ8O5O$}T z6-BWRazBkEk3hd?M&!Wz;gA<9KXu*1)0i-#)<75zb|{D?nsD~P*~=$`<$5mh86Y8s z?Nq{#Uh-QEY71Q?+|dO<$6?)$>|huoy{fR?&h2LuS|8~|CLWu0M7C2UN&Hq6USTCl 
zff;TX;A9n~Ui&mh3H;z>iXk8V9 zs4C#SXaS3b!-QHiykWBs(qL1IV^yKeaq~RbDFVTtTG}|EmRW`~^n;1>d&vy}rPb*Hl=SXuaWl?;$VZTO<{P^$Xup#Fe8dEtg4)qZ!%xd1 z>0DBN`{37pd>mbC@M$>}B>y*oT|sE*NET(v!$Yzafk8sj?s)TnU^(6QwFV{w2gOW6 z$L+5(l1t4G$3Fd&&h$$_`vFwI;z7d0J~Hu)NqL^cd_wOPX)E2Cu4iqE`&Wd9H7_I6 z-N@LqSEA18T4x+unf zMEFCp-p_e9o<}Uf??Yu9LnwtFae;O}8U#Yka4@E(;#a;sM^uP2wlu-YH@>4Ev#?qG zhDiLNohNpo@=$GRf9G9--?yRDw1R@250L(8p`&Q*A0HY1FK;k9onZUo=hz`;@7u)i zC}``ssgr(OcQ^U5jZ@ZHzQu-9I=aGg+76K`FneMWqVUGDyJIiAfPu_vq*@<cd})N+X2WV9i1d=Ez2H~T*=s{M9I3T5{HE@?4aMDGT5T-)TY7OZ4)#wi zZ_^I~8_W&PuQ;fq9@6m2lMX$pk znp}?LkbH>WtKSp&{imV07`Vv#fsc~3ka_H=`TLy*FWuPM^{O79`ksW--TI4)XX<0| zz8U^DoHOz@9*2naED{)4I2*-}4;bO~C5J#mg;=fxBwYz~Weh-X#Uij*_eooEj{_$- zVPB$eL1?EToFc!LTXvxuxeq1A_|z0LfQNC=J}ZE)9?5(_NcB6SNbM#4jAV}YB`Ctl zAu&8bJy*Mv>=|su_Epo+6ik5){wE|yg3e`arjrqIA{zJ#LM@}SPXW7qox{#9WI*m6 zLmeCgz5_)d4NZ}v8L~<5BtkpG0fcVvhbrfFS>PcVyo=<)s)~rhWJ9U5tn1p#w1D4F z2g8cQR%c5KZ}T43yFDm2q8)BqhkUkC&6%O}uh2Mqit0}Z5cJS=5E2uspu7W_jTGyL zP^+U=Z`~sL4crc;^Q-ORpi5k1QX#s0JG>Xj*YHmPVa*VkufdThS0?Uxq*r~kXX!5b zUl|RL7y4EAwwDWLwIK)y1*f=1kqEe{$om=rKP8@ zuVD(rDh%FW07l{t?|Fq0B(2~lC6cUyDz z{;?twdb9f1@*4I${n~g$<0@=KH{Sl#d6}+F8F1y3fS)T2Zs&KYRLf{pD>BBp15lF} zyZUcF!08PQf{UQ`6Vhq-1vDMG_3UHpi?OZ#5-XQZw?Z34EmSOPE zdngq6?zb}0+;|)gk_B`3I($SsN5FJ7wNbU>JZ!;8O&?_6d zAW(q6sK3Orl=|9A12cy32@8GaCu{T@2MpiLVR%Exd33rkx^tZhvYQv6q#`3 zoVt*1-SUbx%<5;EY6gE-4a%Vn>C=Gg-k)@ncMMzs9j1lLMd@!7M+1%v;ZVrU>>^fP z=#C!VOn`ErQ1<03Lj`Cb1dfp+2&Z#3P)u$Z`lK*LcspR2=lP>i;NzbK zG~r4NuL0aJvm5AR)BPqSXdBF-ZGbxn(0qY)pi#zn|I7Q?ZUKRFU4uwf#5i}=wE4Al zYobs=Fr~oF?$0nGOXmf@HAe`gV9&t{B))acV{&09dJ%e>c=x2g@X|B+UBC|A*9uV? z!Q~jjAe_`KKo~f{KX5^MiZNOE=MAfV4Ed78W@qD+`)jKRIq@r+btM9mtgp=6&cmW; zPieCJy~mJ8ts!VBt>y4DY%FFY%X-nwd+(z$a)%Ij8vz=I1hsT@J)?hcmAl?E3Z5hY z4NdTKlD#yKaG;w-?2;`#~7O>@YCw z<%%g8VSnZ*`>?i1Q-nPKSRPB(f3oXPBK$YOPSg#G?IM|-`!F=^c7i9o4d`X^D{7+k zRAis2P!Fpi`4Z;j81KW6v-Uju<99^O``>D43&hBhXOBVt&@+CR132MabMeUUtMh{_ zlo7qM!HXeJ{`wp8?Ef!+BUisnPIz>I2)`d*dPgd1e!+y_b_A3x0%=8j`gZFsDSVS9 z<+%_3*twuiuW$%GbN5f?h?B6_t|uVqEC~J@Pt^`Yd7VRi1>O?If5J`C(lFQQVmO)% z!FU*h_WK70x!*T4FYHtZR&ebJJ9!t4)eDx_p`UVri+@6h!?0oW z1gkf8YCL>D8Q}>V6)ZtCeE;dw&y<<|@I1k|eu{}`6Ygc2$Ju@?lENnlfGj3yMfZn7 z%8jWO&ruBjzftbOlkok5SLCf2{_y>9TP~rzGeP)CyJ=E;A>6A3i3ZyuHs1dy^9nG_ zD5JOhGYGV3ROGv`7R#^W#-u=L*pkSSQTEZP?Gvt;-BxCSo(OH?gaj6EEZX$UHc;tJ z+Pw?c@TIZ6jT?hxq)U$G(jfw(;SUiIfVKNU*RkEeUX>vkKp4qToUCJV$P{P%rwELdzFu}9_xV5joHt}-M1Vd2tmHw(SNKtx|J znvxdQL$knwAB7#g=tWV6(VAVFViXTOK(#ONH7aemeCbadg|)K2`fsfS6=7nrb%=gK= zzJlu=Nnqe}-LiT{8gTLg{p$GN9f7Em159vzRS(3q6rzSiLV}wISQ#kJH^emcEi^1T zP_wIcYu)S%GE)%4CAh^~)x}U>b;C5Q)<;0E513Oesj~&f4pH>tkj*=n5@OPNuc+ES~g1Gw*gA0?+zpT$%*Z7PlfVI8_lGA7*UO53K5$+Wan5nNo zS=_d>O4DnkBVpHOs;*fV;r%U=1aGb-61>t`)*f)@& z3=*dF=I|#5x%BawNpzBTdw`kV(m-msB2_VL0 zp>VH+z#2(_@!5eCDB*B}lqvygtF!pp%#gUgL==ZW%IzA`Oj5=k2!)w^ryW85ySo)( zaO>9f@7T_Z(|Fq*f^64dWbaX)VT}aD%W=c78a}vd9fHiw>s;ji;uc*@S)jO^{F=C3 zJV2(6bMmiw1{r|709wq7WF{DT=r95F91SD+u$_+)pKEWveoj+}0!qo~HKL0=t~UCo zpv$`h@r3Yv|EX^myKCsTH<43hdR!^}W{XKa91VD(z*;lgsp-ZDGWH{iS&#M4xH=+T zo_=l^npPN#&IP%uu{?&CcT(r(<~YEZOhiaXSPn&!xz3I-7*~wdz*-A1m+9+1sWKsp zE@?z+f@m*m?1{)rK!+Jq$E#5m(7==HYmB#zh=3vGAp-<_T@~AXY9FjF6rTxI~51H%h18aq{=*R1-V=X+8awPOO5$HzHO+& zEw?wu5NG{E$CGY+{d&5I?9BwPwL{RS>9QV)$wPeE#`1_ELK{GWn}+QRNK12H1gAld z!imbK(Ojggx5(^y4?yU5c=yCRvKm6f@WQRj4;_$k3k-}y$pl>>pVlfsG{m4z#)l^~ z%-S9RnMMC2`7FO|3YEb%ediED#6~I&IO%7%9Q)mVht6Yw13d5+xLF#>r3y2$DO=O8N9i0rL zsuzXp+{+l;xx|O3_Nfw^dGM1taks6^KEw29f}uo~6wtjfY6fXwa0m zz-l2!nO-+Y7PS5dvyrm{KQC{A^EDIwKgzC%jqpjL@pv+SB3YMsP9*~4cX z5FKPk6R5WRow%-yg??a=Q3TDchEkexd?4CO3L;fX+MBz>j=SEf@Bo+bR6YRXdAJt> 
diff --git a/docs/_static/ms_bot_framework/05_bot_channels.png b/docs/_static/ms_bot_framework/05_bot_channels.png
deleted file mode 100644
index a44ba0de6655fe75e34c91f64d9452f0ce5a8781..0000000000000000000000000000000000000000
Binary files a/docs/_static/ms_bot_framework/05_bot_channels.png and /dev/null differ

zG3|BY8x_N^t~9cOAwB)RX*~WtmQz3N&xduImG@`ku0{@*9^{f3(Vd!$48O;-r^5ZKpIA;|B`W=!vfL#19!{OC(iD_) z3AP-)|5nEG&nf*IxhcaY@`%S|Ef_hGQmwi~C^{##r>%pj_rt?@qf-blh$%HKx*+6{26CpM|=e3R(_BN=#BG=^C@SH&#m(%7wT7@#u#L;=-m8O z@3|Akr*i}lR~d9vlxmbmY8=@hH40u^vx$uh3C7ER1)$HylD9#P@Dy6 zom-WV5C)x6pJ$>s+I4)pr_3 zNeSe3H?YnSDx?@Cg=_RGbZZ90PLVa9%DnZW(1z0|`G0nQ-8~w3 z$A>JIhhGyUl`g;WC0DFiekOpUTzCGf@b~b&!ZR18vscd@TYOl^bG$(pGDx7_BJ}JhwX z+}6?H*>-1@*bk&Rk@oHen?0ivF81ewPLgT!zV~`$?*Da^?>Sx6uYMlbIwPwVMX~6r zEHl@BZ+LUj@$lyy#qb)RE)Qvvi0Qrf6dHckar?Tw?7G8#J=W^lZ@DZyJZeXW;qu<4 z(NXb_{>23__om^|CYUvoV~%Mu7^<_9us`d+HTEWRF64#GyCNGFKC-eDyKGanhF2ST zMhDGrc*WcnmRF7gW3ug{LM9!>=6g-91gv{f=M>hKrH6zjo`r%(JN@qx=$HI+Nrs&XIk}8VfLP5|4gJ zfOCyxNcgLR%EhnDx^hi*DOZ2SZ+@w+-Q3)^N$PZ;?Xjb%$-TY&o{!1?Q|qFCqNANU z`K8_G#ouRB(ANn3*Q%%U2Ti$Dy}QbPMN#+ZEU-t)muS(AQvVSb!lCg=H@P{%W_nDB_PA%L^g5Gb<;u#; z(DG|zN4gZ-y!03D$V4p-V9X^a-@ov$lY~tvA_)aqPRENP#b#?ftEUbyHk`6hn;SDm zOJcJb&vk1F%nG%H>Tbp>+`2W3v74cz@^^V}-a+bWYNf0DMk$<)wnZgX%1HiaMU>24 zHcUT(VXMl)#ED+SZlH{p>5X$KGp~FT9=0Pof3ko1)2zPfo}mj zW&y)8NioN>E|$OUdgvR~INv_?1Xn0xoiVWZc6O>}eG>~=c-31+MUq6v_J%Ss zvd?Y_UrXdKx2NjVt;fxg);DqyjL^`&BYP(Q#OLM*;x>~*AC_@t;{^R@nR-e>c68rl zUikY79}(2so9_G1YgaNrHfxtnms1fTX4DOvA4;%{3xJ zgjTx_k+Eel;!MVtxJiq_s_)K5%Fh^|cWqSpCkvd!d~3x$<<$*370 zYV5u4dQN21ipTK1;$JwLl2Hbi=b*KQUcCp=q}S@*SdPiZLcrP?bJ!iw8F4l#r0_lI|1*0g*-}MClSlNcOx-s(*@p9%XrZEafG0+W&=vG!CFpO7D2ZeXaZ zAzuIG*g}tKg|vLFbZhTr;YqPV^QNNhsDLD?iym)cV)0fz&KmO^_q(<%nE$i088w(- z_35;W_}SE69(nhw#3u6U<8<9MRRc9Uf82d>TRxiW zdK{k?occwmma9b!q{E4^`K(wT4N4ZJgnw>wX5Uoy8mXDYqyBgS&7pGWYii2``-8BI zzwsGOI0rh_%zUmhLBu4kdFBh});Al6yl+#VVYY86G#kS2Y>-Vn1-4QdS)Y5a+Rd2g zDm`o5YEECfxPs6!C0Ata_kR4$=PObD#3WXms!GrJJ@xkpwYs#K!j4B6 z#|MwCu0&4Cr!+iBk_u3s<^L6qswovXj3OEA=_0h@UK71r`G(3*mD_ZQIn?UuB<7Md ztYW{vuVVhhfT>{b22u*l6N>PtqEg9w_pY?{ElaM@ZW!llGvGLDGHT#J<3~%3q!5uh z8@Z)lru(qfGxO&HjM8jSCnuOw^H;2BOJ;VQ`NL&)EQRbZc%9EkRT|3&U>c1z2M{Dk zYLxGwoWHh}xVwb8DcB;-&1O{0#fm#67wxZZ=uIOMlm9lW&O?Jj~eqjn61yO2) zz&h2U3fdV^R_h{@ljQ5GZ(SbtpsNiB#1X70R4vC3Trzd}_El)Ks^xU-lCb^cQjk^4 zuq1wKB42JW>KdZn`S4{^aCuwN+^$Z|fn^q1j&R6a+IMqomMy)u5P%h3oW-f+u%;K- z@4j-8yylY{F|kZ=(Uu}|=8D>v^OAc@TsI~SlES#~gB;c4uw|4RiMZh=&T-s-SRoT6 zQ7zqG&#=G!rD>Y5Ah9cBj%&!mO`c6urY0B}y}}$r-m#Zaf{4 z%HhSVSu09qAyBueV-whH`}eXp>F(PNxF3`$$1;SUtW{2D4AvAMMFL7z&|okxYo^<%OC@FcWV1U2+^F zzmo3UWlw5@LTimWo_LRCU-Nr$Nn%Sw@kjlvKMjdmRsd-9I`&Y{QPI%U&(NJU>fB#Q zx6i(DXBczBP}3u)5inbb&Q}Vhlv)hj zHydi&=S*%}gg(2p67z0@%RYKYouJy)fsasO~ih$+ZLibpZj%I-XqPm zdcf1^wWf%!_zoOFS)SfqWC;OP&{27(FLj>PbB^aNX7#!LydI^3PJYGp>Zz&qz6Kup z*-)-zpJ&BQmn-`YL)Z5;ZoPY*n^aTV7%DI$Ip@#q^6Qhw8ENL_`xKM7us>gyl%ITtfgrmdo_YgmK^y>+`2@e!G)2vUIVyoYNaapRx%d)2lm43z^WfFyJ9r}5EqJU|F z6?{b~hn07c;%(Y>niLG1OGr|A?Fo1AG4SsT%aSEpt$lM`54#e(wF0Xp%{~cXKBv>8 zpR2#k$lvvr&umtJnRhK)-qShWr(!wr7$Y`tuD%F#CPhWwn%Z!<9ZK z&t|iXsfuhm(787Lsy4G2cRv6H{Uze@*K#e<(2HufJPta2U#$$ctsX~ZMO!l%w4tdJYW_p#;=Kh6Mg1FLty2o`%| zGCxAHHEtqM&O`x=$@Zn={`{|U&ObMXb_@ulVAr#MvJNMB6=ply6ehHahp1+dPWsBe zp!|8%iRKwO)+g7JII^)?M88oy{iE>DbHQY6Z%?0F;rHLMT_Rej=}2K2)iDsxJdkhy zZghp~3Q$0fmM=s1h$Tv!n%%nzuW}ECzK~>SVLC6f zz_F=vVa3CPx$Ig6{oMwSBD$9|C34NG_>6IN$4yx(YG!BGuk`*zd%M7DbV0W7)~OFI zh|*jPlKz~mHUy}pc>4*k)DEo1zgp8Gq^YJgbZ@1l7RDbvtr-b0envh&lA#cb`{|=S z!M$|6zAnCqIt}C8qqE{h1&M^%(t_M-)gsXuS~|M8Fj_y#MYr{fhgiE1Cv)BR$mJ#t z0p&$Z(Y;~DU(3!{gMxy#K$P*a z(tks41Wt84&s03XTSwbp6FhDrv1mO|@Tspch>78Ye-d`3{X}FUt0i)9f81y1OL%m2 zOQoH^MN2SFM^{%+?rZCjVzQqCBH$)0dm}T5oewd=g_`dh>#n+$v66~=2x(BwL(+HyQ zu&-O)Y$D*f+6ILi$+Dl9?WLloYgkV!DGqM*=N*w#Ofd8WD9^ddV3EoUMXO(A3?XyPiF6Iyh9WroKpZ^vgORP){)Q z+lwMBam4uJpqM<;en`5`vduaU)%QjoC4v$n%>!dI3=89iFVw_l-;}ar=?pb-fVQt`XTku=wj&e 
z9n{&DG>f$l?gJ_-p;aFU!$#*8ACIgY_v|7ZE$8@eKB-gmonl_NPSZOTX<&QAFej7W z-eTO(=d(BR-~i0xu7hVc6AWh^TAslB^MGI(B2v;|$OfE4x>Ic%uZW-i>ep}Z;-~eT zzy0x3+uc5kxDQr*#V2cuii!)k`*rQHZF5Q2%mI)BNi_Oz4fsCf7=EjAFlB0Aumub2 zmDT)odp4$Vu+5Bw5Y=0XdtmpdtGGR6>Tis*j|4~Dz_>Cn{17sHvB8EjbymRkJIBMx zdV|3fe&%b}X2FdsK8y$e(HbxPuLbIKNYAk9mEu=lAu8o_rkG(&xz#rW1_s777^&!; zu8&k;fz{NC`o9*{{#4gZ5}f#eDx`QT>fW*+l3gF{{)=6wC}Y%V&oa2s$Mq2fNr*Z` zqWW6wvPBsuxOc0e^zSVVr*BBjG`w8P!KXAL1U1_m20oYYfx)~YpmNAFc^&>mt+OQ> z_tjmrzZ7Y1*GhV$!npnM)PuVYp_dxcF)XAu3Y~W5x?P{pJ?v;rw7WkL#W$P6ewT>$ zUO>m5W!>ke$Ei3%x)dc=#F6PDp$v%Ov1faysgK9$TEuw12DsVH@mQH%;(JJmlS9#K z;JNr-xX1K~YWsk1b3vHP)9&g^I&XA<%MJcOsxi`#zl2j_~ zRC#XEN1sZat`ypxS}{#q-h8~q{}Ic3QA}lK{moSH&#*rS7#s7n>&#@?n@gvktg-PT zn*#(&&A&Yv8{`N=Yk4G_E~{TwF-?~)ZHRJSRf0~_b8+{yu%!F?elN9IWxn~s=lq-H z?C-Wm&JA;3W4jYiA-+bLDviR-2Ghjx?yBB^`@MUN;Ia{Xe!AhW4~|VF1K^L@4EZ;) z@80q0sK2BrPSClND?TExB5+?3%zW_d$2+=XHOUPjv9jeR!w=8f!m+Y`kSnx7#jWZXqo*=M+J^pvgW zXQ0!(kG$8v*=bJUwn%Xz#v+M+_c7naSEc4Y3|CyoN$U|mO-46b+>3r@95Y-kMw<1}bP|o>3Xt8NxB?FRQt+Qm0JaeV0^}lYCy$ep^Y!aJ z%9TVa9**#W4bz)KFk<4=PCf=dK&5YFS111}yk$$t@*g7y8Xmzk7Taa++jH`9w5CdQ1j(9-zAy|#fYlauc6RntVOO>@a5yZ^Kz!c|2hdjjVEF03 zMj0E&?wjJeKif60ha#z0^ z*A`p2Ss&bxZ>MNC0vf)SnS~IF0|z zKFw>9ds@p%UV;QIyK4+Y)WzoGL(e*RYQ73GQM$WD$mo~jvEhyNj6Mys6pXf8uje){d5nx0Fm8EYB4!6;8d=TS#=SQ3wlsqQVk}psR zQt^eJ7~2e|*ZQCzF3z35kNFUNbL&-CS@+8=+~?*% zEXs}|lS-5}qfpe<)#X6~m@>f4Iu3d^DKcL!p}_^ZGem2is%4AukAq>0xT0VDZBZvD zV2RlV5+J23dx;IBmo|U?!3N+Ea zyW8{02Pq)v6Tevgp}DH=;mu@Ad6lQci-Nc<;m$3-UrkTWZk$o*zgDZRXADJ( zdtu19n>NzL|9)wxWxKILRmO=+M3uitr*y^j82swGz}TDIc}WREn$_Ok1;ccIdW2;4 z<_9gtp(+n&ngg~Z8+WjcylAP~je~9*r%DYOKdo8uq3NA?asNjdD*J=ynGaJzfKi~D zo-b2Fs+fYP8KU4aDl|EU9il_9kDML>L811P#L<#P$LG%)hoIFM1Y1%xk_4{9Nn}17 zgezepIpwZLXE0`0JpN-rwpy8BAEZNt^{a^tzFExO<_2Xi_=Xbb} zs#)|79lsNH;CtHkN-xkCgPu%`8nTi)2_pu~Ne9=GBsvDM(q zj-tI0(0XH-H;WG0FFj&w^wdh7a4g z+7yGfEV_0AgC$=O+Z(%rzY6WiUBWfLSq z+2&5XZ+UNQ!9FhvBjh+kavu`kX(BLe*nhv*=iI>$JS)DGEe~6MFKf?jolmY`Z}zJI zrM{vVI!)C|hqG0g^7i6g*}hK-wUjlY<>oA3Sw08M6B(a!4A3uI1bm-K8*;8pH41A& zykj4;Q*3u;91Bz}5_m0kw|#HUZGKGc#}EPT*LW63zG;t4Jo4ktAw*D&)$R6hF2xT^ z$&}%tw?DC&-Z30G>GGb)I(e>kBFr|Id>FTz`wdquqD-!vu!_RZBzB-WCQQHaV2;)8ah`!HV}e*GKi@scsfzCNe& zanjX5Dn(@FGd8O}!}Th@`;yk1=2)NUr}#wb%#AmOuGWkOiq^&YFq=Ihld0-5E+$45 zZi%6iijm9(_5mf)@7i^2D>8xsJWMIyr{4APjS?Z{A|hPGYs>eIWvez6s^+p>LI9JQ+Zb~;CMsM%%Lm)A&Z<@ ziNy_nIZ`6ihY+q7}5QgQ*pNlZ!b5WE%G8g3~%=DwWY<~^sIV;q} zy~!E#!V1mq@X@!uQaISVb1>W|Rbj+JpmVY0>eL0M^;`@as(F&OVaY2vx2O`6@9iUp zqe_Zes=chVeYLk=30KT=cJv;fh^{8Q`TN{n7sIsPOphHvvMj1#pivM63>zw<4!~wrVVW6s=YtO{q1l3MaV_DDmG&wzaL79abojjxmGvSn|80u+Z_YEAGmSonPGAEq^`;TJRI9$+h@*}9`Fro|#=c`|4>Q3) z*CQkzE<*Twx{b#TM&;`Mc~tH%M-`1PziDhC=WCFLpr=%gg^BsnVXARZCB8MO`Ppw3 zoHul3|3A#T?b?C8dVK&uHOPC^B>f{8PHq8~Gz;VGPIbL78w~TlM9>ypnb$(+VAM!N ziWGVoNh+tgR5Q1PXF*HOdPD&HROYEkrV!{*DTRu7P5Y4GmEFUhZ+79D!=_yP;drN z(B5lS7Ysb6t);z{E$~idIYO9tf|%;5wq+XPnS3;5AOQI6w|r~g)^?$vk2R0M^^)LB*%h4R-@7>MF(_Lrk?-#%&t5dy;bB&3uZ9ZR@YwWpF zRG>{-Ez=^g^Z7>np6)BM*08u{&>k0z_NMI;H=Z4_Gi@M8uOZHu9p?;ik?b;c>l+Yg zlCmdzX87?O;FyHO#GoM)?>E|aQX_wCBfTGz)WvV_^mcW%AXS_-&Wp1E2Zq+~!QGb) zQ8UAJ?%WRrsF7L;4x?%!PlEjV~Mr7vd3J`d+T24U5VcQfg6h80a&wK-dCn7u zbKv9QwN^ixOHNMR$~A%rn5Saim0b*`M%7uMp0g1HX9DDyz<~5)up>8bizIsr+ZROb z=z~+~4vi7u`w-dEpu}9_rVFE>6OG6T+$Tu2H0h_za)iYB=FS-lQa12sRQb0{kPS*; z8ppUSEi&c~XV&+4FZ)*asnvJa2gLlu>z(C~i7)rz8sw|;PzHS!orz`B5_6iz22a^+ zj5?VNWzsqY`-s^_7~j_RC%M9PdlLHNRwRxR9*zw#SmBV&UFBcLpsO!B3WsnSiPWV7 z5av?>-a7{aKH7*z4_2X15UCIh_8Q}mk3sgGU7W(45$yy#$~EnjaNRx4=n2Vk>!CAK z`gOi2aejMB`CXmNTV+N+@GW`i3-jN_v8^+b;4wQ;%dDCf3~S#U%ph0zBf6TvEEV(- zKC?Ea=zk`+;mFX@Ro!Ng=*9N5rlAGMScywS0b; 
zR{kNQ#}I8^+~FAU>j^1OTU71?;;Wnq8*SZ4MKWZbdVoWT`k`kh&3dAm9HJd&%JwvO zyV8{(ec*~w!rgDNa`#j{N>w;&3ByJ}QY-YXTq&Azechx_AQPkz3&$z@C-8&%E=50) zV$S*pO3(U*?B2h^hr@C&>i5a=e;e8d>gq2iu)@w3W5ipgVxyOx{>N*AGJ-~NOLG2F zkBJxa>SM~L@8#Y<#i%$fL}(3S?;{1FMyp6c=$h=iD48eP+=y0LVdburuU|EgB5xS7 z_dptYe3d3FQrKeE%9rftW6F_L@-KJ1dbR-_8g*w8(w95s2NN?h7N~)d0^U4EUpF;^ zme!Vmn~6P^$oA?aGT7VGd8+_vkgL80b2UX__#_kDotn zKVj0^{hueT>EUAbBtY7R4cwc7Okfg7FVY2A2S0{dFch7+tan>N_H*F{2CkRvh{x$< z#24{x0_Ll2$FJ3b6^zRBo450V6FN4&zcHKx96*o6D7elj*=y|JNu0;d*kVz2#ts? z(T8;xVdB(!{BNqdMH^A3X;fwxrIjYMoU^=xs_xZNs2^>=vr|n|-g>Vv&@f+r4&gy! z#)z>Y$KPizJ;+CxQdujs)g;9qF{WyEr;={JdTy{qHLu5S!epD2PEZ-`+FutW4;whQtqE46K1Sn)m8NI&2f?OIV$S z+EORIU@gpGNq@ML(qi!g9Uv>8P^ zOHZ_<{W2SJI(h}6vP=0}RThAJ<4_8Ok`!{n;A$ZgbM3sL!`Bz6FLQE0kxH0#`weTN zA-TTR55&pBtoC7UpiC+hMHDdXp?5fL(DU&3&!3WliQA>l?rv!_Ga@wf!%Q}{aeMb> z`>LW}Mv0$Z0nEJ3ejc5e^QGw>SSuNW8@215q7Z`73C!9CRRQR?Wea3?-I ze3JE`q2libO-N5Ss6jN~i}Jfzz#lKXg)Bd-E+e8Bl)0Fmu-F~4k*;s=RMdoPbCYJ0 zjBuH5M^GJlKAKVUv&|O2UvRx>? z^(^i6y~fL{lc@!Nj}ZRO(%BtFc>eyKy;bo1Gycc(??N;yYk6`ZHysqUJD83hB!w4{ z->^=xjbQ|Dq1-|KwQ)mg^3RZOYi%Lkiiak3x=Ujt@GyP;>CgRD*}+~Y4NPxgO_0P# zcGp?>LStGT+Hy6yR@??zCv4Zi&?bg2s8+-f+ra34kw&%+P!sOA3q)+>HK^H{;Mm9Ko9wV7O_cH91)p6Y}tQGIbsr?TF2g0 zZaNZXH5!I-adB~MZ0vy4R0@-N58@nwYGRzH)bOO8)35KPV%*Jw=l}D@;{Wg%`Rfv= zzlIUbHSYauDVuyC(w=zaY->*b{j>VSdZ?+E1U?)wnmvKX{KaW>*qS6<2bM=U5IcXG z@;`kS5GS{3!^fPkIN+nT z8^&ze?sX;ONQYj$)tSg-Zej6wXR%jWbF3^rK!^zbMSB93VK6fZ{0cwAQ|6EE@IQjB z08b*G2*=(m3A(orddV5khDxWJM=BOClCQhDxeZr2(7`QRyxk}wA|m|~pAgWxy{T+z ztoPWfPdKBW#(fJuabR#dA@{oNpgdzQT>0=B21f{-RC%8n+yBOv{`LONmN8Ti<2&=6 zA&^)k->D{OT4&z|tiarCTQ?@&K8=u*86;D)7}vT4A3M9#JqFu$GeDJ}h^Jn=LMN0) zz#USzJp9&@q>L2a#1q~m8mzy!Bfmcfu{-`8z7G3qGf@a!3&A3!oo)$1*;|ogM6$^8lSb})~xJcY(c?;>PJ;JgSar$C7;O1RqeSGsuZ^ai&ux+d+#T5 ziZVgO-Cm*b7y>hkeP#EK(JtVc0jb-O#!oFOCUF;D@-Sq(+1LWHPbuzm6D*tZ{MP&O z@QjR&kphMJ&Lk{I7SeL~3Hoa}mYJblBVJcyb!hppa65Qd{8(JpQDLcm zt(NR`axrv&brw^;y1SB$UH2m>3mk!mXNrOOvSK^lhxhi%%q199(*~4+mCp{ueUDx^ z&a^BIm*@{9Q<&2zIz2o(ViRhBC}=XVELN@i;5^4m zBXL$mvRrS z*N_dS6LwKpV}8x@L}3R)%d8nj?d$@{4?bLsE^ zvMtWrb3)DwK@h~9T^aFn-+Z5yT5;Vq!j0lvsd}|CpHH4r%DdylFckO)K0#g3@+cV# zO+8Dz^|ir`K|^$o>j2x+I=hW0Vvyj%g%Zr$Ps9A*o3aMEDcxNf2qK}odF>XH ztS8#bCEb;KIR(vh;V7hFh(nZBBy1LXt$Y=wO6B1C{CD;fck=X!XT-hUiBJjJ=e}Ix zm(8vHX{gz66`7Jk1_mEdkejGoOc9}(+M724>t$xSpIQICjZWd2+}<$^SF#H=dhOcS zovsYpzJSm&HrZJ587>E`j>lN4#9WD)az|u)A3{VWDX-tffKwr#{Izk#{g@hmirLra zr0b7DGexeURse`%FQC;H`-iLj_mRu!h+GemMPJV4#k{>eTP2#Ocm{4&7H4J)jWgJ2 ze+-_#)db<8EVnsW@AhO#!~rT~>zv!Y!PwFOyWL}dJ%ePO-}d{*FhV-*{zTsUU?9kr zT1Oi3d;?O;A`9W>k$^F_qP3vh7{z3ka*}oEp0&4qK#+7h*08T%LCrH^H<2yhq`}MP z9JoV#Md=%sUGo}tGJD<$D{$X5!x{2!mr1Zbx9+lp-}5_0x|$mb$xX?q=$Mj-6IbVI zYPNBH5zv%8CRqpbxj|muhc6A3r2#l>M+1!q5~n!A-g5qjD=nN3dH%aJ z&MYpxB=8cXATj0^nzl^Poj?#TL2Q_=(7ddl@*N5~I=VqRgJjpaPxwlk%4&Nq%Y8-S zu)-Kyrn-aPqzgF6O^N$z`&%A^NgzJlNLs8;jJ%V*y%aD>9~>1G<@J>Qa1@_--CU@) zrl!bOLG2KUQ*=M!9OV<=n)@n{Wuj4?-fL7|z(}MAN5)*F=)UqlR7c+!%rs|cjYqhQ z<4PP;FNiSJ)|!0?QabBG%R?kDG-=8RbDPvFa0ZONxFxM`fC@s_mi;L`9pVM}svqI7 zPkxms9l+Pt*&L2XDjp>3ySu67jwWg5$l){f zBr!R+0q&&J1lA;EHlctp4AjU(Dr=uEbf+c=I+(1Xqa>Up_KNu)UGzBI4oTdA@1U_Q z@p3AaO$YZe7NK0N%ktLIE<2g|IdDpazDiV-39E=7Q;oh+U}MQZ-#kS>L;GZ|<2J+X zhCBT4HHt+&o(k9%&vud^UDLij_{6C|pAEhe>Q*;;w*Gs!JY`2s!gfDqEG2rlODgVX z+W%Ij0OKe(@Vr1}Dp~WI@bQiN#u2ZZy!Q#9Gkx&+{>YOM7zeA`#QCwQ(2NZaI|d~| z^&jdUNe-gl6b)8GNPt>0ZD7gdF*OF*r3(PEW5UY{JEK3oeIu zcqN}~9#r3`U@{|U5xY5$*CTO!MO{2IIINdpSl_dG@G9{`JEQo|*Co2{rK3^X%2#Bd zyFJI41wM2CKwSFg-&o%Nd?6nuGnbjn_+~YzGRWApwWa+wryG?nJ42>MaBzSHKU&}c z*SN%?MB69jH(THKi`?F^W;_SoX~N#(3nIg6OXGldTujlD!N{K&4ATWY769J=!MXnW 
z{26?>>pVYS^0xkfF3tsrs|ww_*t}d^)-V3bfU1rvwfO4=f-Hph_BhZ{HH_v^TCW3cFz}!%wc7`Yec3Y$3d55r!OsKcTgw|(eOL> zYMeKL;*UnckA^`mS`s2O2Q~;MtzrYX2?x%Vx%NXtdM}GwP|akr>9C~~%OW}^CME{= z9`ro_013I%i;H2?(`GKKlUH1AhCUVE&Os6oxlqk8DJ3@p5-zn?3Tcb_wJxmN3*8hY zbP&$_5zcS=Zz~~q6uJo;+vg-YQ;?nXbE=81r>6(0dIa_&yE046Tj1x~+U*yTsPHOp zU5|eY02^{}{?@Hq@t!9D_iz0ixxsWSCMud-O$YUJBsnKlF4FYO z7+xF7k+ZY2>?2+nhO^6 z5{ZQ`2@h}J0o(Wkn(jy6a$uq;EepKv@=s$!76SASvzG1|M$QnZX#U0<`MZ#Vi}z;(am zQyMeD7*bar@mA~~mLCt&5wPFU@)e^Q4|4el8rw&-(-r_2E#w!bmKWFZsmp{@#tS-R zyIQhf48#%`p-;l&CfkaDhT?Z78XYX&`@dS3oFY_~&|DhEh0Yy;$GFsQ7S>9ylf zhkFsj?W075eBWGnFf!QQtsg?IUqDwyx{gWr=8;}Pi`e?GbMK!s?>`|HL|2f-Np@Q; zqZ!)oi^XFPD|Y-?R%F(w$R^vb@j_YU+rhK-igP@ShUbxtnTGG1DteBvUK=(35Esma z_9fz4Dk%#97=c+?rHMm{t45j~aOY70!J;Ekpq@nwMY#-^0i2nhoI5;7j5Bs>Ql@q9 z%;rj%No7}oXwrXWZ6BNz>i9*0)Qu?awl?MNKp&jDV zI#o9JMc^1Q0V;{_7!fj3dKMmBmph$+g_U$oRr4XOgx_n#Bvg>MX6!oxsoGtkboD^1 z_Wq}Z19Wg*;0yg23>8`WM%?}BczO;IV`93~OzW^rVMSU}g_3ZKMOqv)$j$;Ae}^|vYtwTJB2dU%EZ}gv$H|qu+B2WA-mnC99+rql+AMz zBG6+1XesKIkyOfchhtl($3>NRhRewy`0Yc6ZG&L%^1E%?B8joGeZ>j`?vbRd13`s} zSYodoQ}>X58=BovD9iJgvX3t;!bUWMlyXNG7Y)33U9)cx!%Y759KYD`QY!6+xWMW% z|B51WlBlPXsOPo}j=$>Ue%ZG@bANHb=>@9lqi>BS)bH1_S|Ui+x9O!{?zYF#mD0+D zm!-Cu*bWp%0Ix4&+oLbEhpYg^o!9>d0r^*9c2^3((rW|N_&Agal0IbXydMc0_siC$ zwCuR{J3=!CxULK5cA*{9P@}9e%H>)-Qw<^8f5fk(_`V;q8+LhunEasFjhotOo&2LM z`_GXV0LH%}aTiF!Ezq_Ms`1j|8VSn}w+RoH_W9+#Wtbjx*%Ur|ApHDcQu}q)onc=v zm{v?;ShizqIfb*W6)xp6hbi&hYbeSmFqlc(v>4X#56CC7dO&9+Qpf)2KEw4iND9qy zeVIh`)PssvoN%jigFjK|fn=AAXS|?}@4S*Rcji~VIQx$&j;W-giE8o~-eAHvIQtIH zM?pd>ocdJKb(|BF>CL~3Q~p&qeGUcS`pf;|;oHi{>U%=jL4JcdoauGN z=8JG4rGH>7kNSuIJ+hDwz?jX1uQJ|?>Me=3l>r}pvaasM{X==*NMaT&5{H3&(NMLE zqK=E6^#y<`DbSa|V`?Y!Tu=RnAN~@_HF(wQr(;CJ z>z-Ir7@^0}2V%oLDqWKx{J;JW%~CiYV;$WcF;9FEa!XIKH+L#E>S@SvmH@;m$pQlr zNh$c0`85tD_HVPge--Y{M%YiatHV$fj4IgC=uD-X-<%`v-OVbv_>-3+VCch?xh2;C zEJPjz{PPSg_p}(zvSC4Ry4)4w_^;K88pU7+t^^@6@9XBeUtw)Fd9Wxs5?o;auvb^V zIoX5}VCB`?oyerT3Dw!%-Ic*?f#Taam`D#%sPXSYYrQmHb0bs2-`=D3dZ8zP)B~># zEri`nG^r*`4qwJS&eBS!+x2ul}-Ss{ZJ2hj;@uf!5 zrX0OWTS>O%%o_UoeJ%{CAp9R=KtS@@jcP}@PVid%P%)}^L8x6(Fzo?(GDB4xMPGFiKfaj5N`6qsHddHzyYvRTl|+ zQ%%pO#2pNV7dbAJ`r-w0qhb;(rSY-G(n-X^qoV~ZOmFPw&6|aV`@*g((*U}AjU`|o z_?;fjfL6e3RZiyvY2WCC@5S}WhR1MQ`fE-7vU?cWxH`rxaSp>L8S?*kLKh($hs?V?R~6ZvflCKeTE|Wcphy>}>8N*0*pryjPGR;iKKX`vmY8(JY+C}r z_&$5vEnlbwwO5|OFS0n;oT+wQePUAz22_r&u5EC?B?#Ev`3Itco7XynS)CTd399c6 z^HnbALLCAn!3P)>8w9q;jgrrbr&zBvG#~(F@CPIXT)llXrT3;7v2%KI0<&avD$jUj z^WG$AHu`uXSrGvHwI!5u{XwC3-S6ZkDT@Y*bQpo`_tzuS%ZJ1*Rh$CD1yfVH_iIqE;Ac|n4uA^JSc2g@ z{hw1@;vYa|B!Hu?{fgSmVxOP?qLjwFPQ!cL$K=d<_gm8I)Cz@umEU!CymOyXnIeGA zm)0hv?w*6XcFv{OW*0W3yD#1!&Q-j`lno_XN#!$9Puvotn%6P7gzV$8Wl=Kw`}=x* zq7Q?;wOho1-&DvH6#@^N7@)1dLr+!7Jb3_!_4#t z^lG;?D$rwxLN7B5{|q#vcu>@Znzx3|jMq3TrMa(uzXET&6)<=*bbJO+qHln}zq1`8 z01j2Z+A-Vfu^Duv3K_!u^$iV!66A9LsR`05SIVG0K~7O=K6o<7MK#|G5w`DB`*je7wIQ&pOW3QV-zTs4uOM69$q$04U?p z*@-8aWr<=0Ln5FX=}JGs$ks`|`K=7F{v3Ue2YrI>+!NV5kO+)oT>+0msRM=^2w$58b%nv}3^4>Q=5G_;OHxyuCxis{tX4d=f z72xjJ`TGCrxS>iVBN<$ejhB@0pffR#cX2;6As2fex}T7OIY=0_ft+e&H5B(hdyQxW z#*$9ynleUxxHWM{;vHGduU1Mf!9W{^AVY~fyu4;78;;q0RfkH9wT~|QvRcZ7+D2kK zD;^>UFE9of9lc={5D;KPAPf1KU~g!=2N7k6P4E3>&`IeQZuMk}$>${|bo>G)Vh)%= z82o%+`DJC0$*G~p3r{lG`^RSk9`lw1jcCS~#(PES+$OXffM20OE!+x6MLDR54j#?l z(cp&L2U225NqCT!L6-t8EBfeeYybf zE`8~-HM`>Q$cJUS%Ctlk9HVo{XJZ{?zxN~vu8p@x-%9!vGiSIzO!vPpZyq+(BpmnU zyu@{#F~mVD`+@P%jh0vu(jtOY#*)p`2mONhDl(w%^d=&jjw9#1W2N;{A?GN#PR6DC z%L+u<8n%_ceU2*`y3vCBzI_NXs$c|xM-(D(OF<`XZPk~t;`za-%tYp*|PL42UODdt`!Z| zJNJLB^~>*Sy&ORvJik_l2n8J`qqVEM4N>4pYYAMHeX+4dlAulez$BwT*wXIObxFCAkn(rxLBJNMvf?# 
z`|{f1vO&5R?@HcMx<|a8_DA%*aiCCSXV92TDg#5R=GGt7nQ{X4TBq zaLvmU^+*+*>nl)?L(cYU`)KBxZtEJZezz-gTEN|{GzAju`om$K-J1OK?M*oO@T!5G z^{TK(x)r1gmKI+c)NjFcU(U)cOlrQcYzRQkpx1SdKj+TBfEF`kLMEp_(rc>t{ z*IjS8mX`)!Q6ND=vhRucC5Bb&aEIRLdcyf?@@7(7{{p9o!Q)T9zpR*=Tys|V8Bzt; z=VJuCy}bwJ()fzDHfIlSkS_MSVO#pS_d z>n+Q`rzs698-jXADr5A$g+pQu^M9Q!$S}%FEtdU^X4vpanx&iohY_j5D;GS6r?L7( zq4+(t1$UuqrsTbc4HZ?+y>}-j4C72vbuQL|+OPF*y(xOhn)pPuuUC=QBgT>390vAS zbY%M`BBHZ+%n>VtuU`2vJDtGVuxUsxlLf@>Z8u^WC=dMI@@p-=@E6%^jJJE0SA+~w zE#ZP1t%bb>aeFJ-AE06$lt}o27?Rt{B)*q)%Qp9j-IsfNjO$*~ zC=ah$-X4WV^UAH_Ugt}4-&8?#$e;aR=N}1dZ^IT2r;GPtvS#9SL0krNpgt1+XMZ}V z<`QxYr{8=6I|AJ8IO!L+n2~;Tlnm33(7_UY<}nub=AYEa&$#wqcX2u zs?X1bTCG})M`LX`5;)V?2vKq!5>0-9E^3CB@#ExcXZ8QupQ5n#K3Q}=QJ|NDooW#r zL!BTQ^!uFZf5Sd*SxQpl2U4Gub0b7~-jZB(cW=|lHlEwh;q1OQr0lvXvW2C_b-JCu z$T`#GS;nD@_&=1rby$^K*F7ul95d;N^O{WMV z5`su~N_YF4`@GM2<2lbC-*;W-4=xXJ?|ZE^*PLUHF=ii_MgAHY$%fb-f3|$MF{Lni z+eQt;x-Wlp=EjNty}}_+g(r3LDi~ykroH$247#cKLruqkk6*y!YYzE$=w3RV_e-u1>3&S5$1)hM!cEOIiO9Ku~}a zT#=NiECvjL)+l^Qkk4Kc9UG>cc?Z35#((*L4+@-{W>xt9H!JOBV<|u;nQ&fAWS~d8 zk1AtG9#s7&x|!u-)ab;q4^dMEz@VTNMr@)QfC4GkClW3Itep6(FxEoN zE_lK6r>#ALI3C)2`UVEMI>kWKt>%H=RWO{yT=?Ol(xtZ=g_@0bK{FdN<#&3mw6v65 z?AdK#5Zv-Uuv;6eCdkOhP=wMY?-aOsFRJ2}zNfbhzPk@skIb*JF*8WtSDT=^(wR#} z5i`acy!D2*cXxwJN=itvsxD`-i8IyR)?s55&1#*`g_*Fx0&$SNqK^Dv?)vs}e{w`}EbOPxh)hzOZ_lKJD z`_yQ{H{8=6=GYxG!NtI;w zP%yrsprN5rZVU?vnFczj^71D}RX7jWE|Vg*|9*|rq3yykviEc8pfMm1JVJ~^$2J&l z3Jxz$!V%~a=-pect~E{oE^Lk4S)g? zL|jP0-$Px=$>@>2>deJ%Z(EtyE;F>%G5yc?Qw3WaAdp6G16tHZJp}EZRwxtRhIprN zKHyv{Tyo3-X`Wc^|1mB6@5^l#8Vlazj!VW$k}7nKP`eb8zMCj95K88j%V}gnH`GIJ zj#e6U*(7wpIIVloHf;hvEgHZC{oV{PqyQRLYlzy}8Zd;ta>u-=M%Bg{W&W}}x| z_7nm5U^E46D%a&cW3gp@@VL8v2MZg=_h z&sb^+Xy(LPK{R#MZS-v^B%nj=6r7JAIv1*xI(MZv!rYA#JTj3C;A-Ge@L@o={C=xJ z&a@o*(${1au-_tF2fl~W=L9*e^bF>vbO0OfHbLdSBvoJ zQk#OlC`k;D$hbvxtM-e=r8cN83N?-M2b40#b=-8cb~|ORDJd{SKOlnP9fu(zn^gZt zXYtqN6mX1+#!Vt-aa_@oKG!$KsvN$I!jMr!mKG!JGeCH>^D5`5hUXoJ<#_lEUbKnj zutAi%29)1C=hp6^Tto({1clbO{Hr#joewIn$(5QCA=zU3!^CHtgQ8w!8wqPOjDr)^ zJJ~8uB)jw--;+HgR2-3HJf%Wmr+}4dN8a?Brh8g#-A@IoGQkK(GI_oOXynnsR#Qf1 z-u2Q_I)u5h*^fU*viCN~Hbgo4?=92sEcJfS`i@5{PL(O)-vQ+c*V^)uBOdAkVP*UI zAKF@hNEGX+`mDx-NOR!X)^tl`NS3?%amom>p1;if z`s85U0HBTs`N@P*Jc<5ra0xexQK;Fi3s51USQW}iQGP zrHDQXLy#sF%BQ;ij~@sK1!c%g2BQo2yzK}seTx*b&g+)AM59@@KD2Cj)8JRkyw`t$ zLBx(a5O;Q|{+IXf-<3=aVYmSN3Kg0gYqv< zhJ?@FPD{ho_{`~Esfd%AiBhvQ1h$yxiPsVWJ_EC_`q#UK=Kl`$|NRALbbu1dyHZ*r zCN!p`q*YhAlE0{60pWH{kUyssw(g*=Wi}Rv97M}m7nGUAX52-Xx^2!OIY+%Z1%z0syF?wCATA- z_M)#+e6#7N$~@lo^`QMgkTrq%R%C%PUQL78;R*kR*9CUwS@4a9X`Vpj;ps1j#0?Z> zX`$uBHH{fBH;VltxrEvKAo(mK%rtykRI{d zPu533#xw_{nbD$-$+!3J*tf($8Mig>Y>na>-q?^wKW?%_)Jn#dL@kChihizhT1qWR zkW__-zD{gA{v3nz&81%46tN~MYV7t9`0`eE3UBqT2w>;bVd1z^m&Di&cOJNWXxK7q zyd8hpb$cPzQ6&}Qt+AirYgg1{5Ww)0rvYvDN$a(SkkH@g>+R|3nFVm!mHoHSWZBx; zX@&NH_IVH{?kr68_WX0vAT{>8n6VWC<_aXV=l~G}Ll}89-7KZw3}!*+<$cgY5z5$~9VsVcOtvQl6^ z^9GK`+;<+x=3ici1^J7#pf8A)LB{FrCvt}<92>rTEY0=E-pqBTlw*va(Mj^Al>zeC zn6!DFp5*msH}pzbBf1}FUXqNg85Td*Yh`J_zxcOFTw*2iU|m4KV6qqHvgW6 z2~CcuT?&$}dhmXLrV|M$R9gW?-aqL2uU=hl&UdwYvPcb3*#i1MBxr(ke1yVl8Ys3l zCPZ^L1uecN{rY4@Fg`vWJz2$cn|!Qwg6O3a9%FnTgepphjs?VnT^{Jw?C@|$rl(WS zXLh>xG$EI-9x9XD<8gT!E>Hj2rnXn# zAWe5No$tKoxiu{g8FUcjnz@k99RYM{YX%vTV#+t2^QIRs7~s+jK~d{q_5arSi2wdC zosSNNC+uR-E&Q*xnwy*1UF$wm?GMJ+AhUWYja`+17l1D)FMv@j9<`3{Im z0m#t8@TeopExtB`5(Y#Q@(b)bTfRd@MN@u$7eM-`O`NN9U+(}A4^U=R?P6WEvF8=1 z^Ic|um!R)7KLR+0-S!+douh|G7)s9sbfxk9f#=sKKU((OWa@zQ{KEn0n8F7CnAEyr z3)l`e|FG)i_3X=HJ=B5DCOE+nqu^IqPwh;A9z7VOuiXMj!E z*B|#8vUX2Vnn+0O_epRoJ1SzOQBgm7@;7Dv@<1g?lkW%` 
zYEkIrhe0L4z{-l%_WV5ByeqL~!uK{D*!)r39|hN-w@*|0LE>!da=Cd|B;5Ul6H$OJ5fhVTzlv0;-d2P|RAD3xXW)%ks0jD_QV`npNX z!#veF`+TY#DH&O^iHygj?YWTswf6)4R_s9&re#1n- z^Et4h!=U1l-pN!UEHiqq?{o&b5zVKro0HTlKdky9t$kY?0E|M-(X*9dESnEaRrF93 zcT1S7wx7`65w;_w;2~T|gi0j&DtcnC(R>5e0^p&!Kn%@gV^hH#|*{Lq=^u51mi9)R00Qsd7T0=IW;nnybq@ znv(Oi_JDBVlCT3QnmG%w_)%LROX&?X0Fv==hWkuF8r4ROZ|tVTL;R5&=nf`UVFhg@ zZxq#K?3XF17ky;AQ+0ZL{P|eMphg^^5N-c4w(%IDFLn_aP~$Cy{zW$a=!1TjNB@G# zC4}+15qv3|s^41b;5x%C+{XR-H5X(MWU5gSf5fEf&TNt#O+4lV{M@@kFv*OkOT#|A z4t?3_K4L%H9$`ILc(3r)-u}Ma@eaxK(qZM(h$>S0aX>FBKNj8aZ8dUk(Rj&gJVasx ziaaNui!%X%*c%ZqQ}3=Lg56XAXLAH5x1+*Om_Qi`nDo0nPQ)7lK9lOl(1r(jWW1p4 zU>x-+F0|tF2TG%9)=Qn+RVp$H6x-8e(^SkG?IXacY&0FL7N!BKnF*=$DWK4RH)*{G z9w)QFmd`$*&_#A10lE+e@zwgb()<~U#yl|8c{u{e&RI|$j`mu-SrxD~?VN6bfnaY( zbtkbth5%Ow`%~vF=+>Yht+)@U@FGuZG9|3Sl1mYjm#j3urV4Rq#X^PgIs?4Eb}ULE z1X>hx^KOuw2G3Om^joORCi)}OuNV+Og%+q(p|cLUaN60eYqP(K=Eh&XcbghLSUSnk z<6~KCP85&RZbMuUPx<-errFFcYsLE^A$~*(lUuZ`y#LwJJp&_Iq-=&td*!@BGSvdY zNhAGEehGtD^MXg>SJ|&%&Z%vrhJOh6=9Aoxg+rne?t}#*5;C0xthF0#T8{w|9<@az zF0@EOf^-i=$k8_i*vDni-%$Z8uUCKodwwQR@ySXQfS1DqfKe3_q(%}E)L9pG+7T`! z4jVgp(wk8-F_u-IgLzpf;NLpRqbeOOz4$kYH5Tk&(RP4 zqv!g?5_gFvGC0k#>`KzNn9Zqv8v2*WU%xROciP<{XHeB$mH5R#tLqG$6|=1sAHD+J zUlHpL>=!mPar<$e?A1=`Vj?u2cP8e9_Y?Y|*A}aZ?0%lOGr#{^=%v7ug2-5o`vuX; z{KUG=Di~Gsv=X4zEj(8W@j*4SjCY}^w;^M9b99shH5F8wm-jZOOt^CHxGY9H?ik#B zk;IMa?w=CSPJ^n^RssrS_5owrrV1{RVZisIamlztM5WE6cO<1wxNzH*Bhvsp1CU*O z^eVwU?8CwXCZHX#)Mtu$KSw2u&;^1((E5u1?T}CB7-k}~JPnniiFvlJ5&?q?9fA7% z63~G0WOIJQb(1qNiDjn3_P5^pC8}QA{kHegsqPg_%e;`ELi$u{|}lcch{VL ze7R{7B=h5A<&~W~SLYOlFIA}C%nFLZW2+Ao3?cd*AFzT={>)f-1wmos%Qf zf(Q78FM3Ess1#a%c2EpA$j?s8fZl%dd=Y=@t%up|hh3rET8E%{z^ZJJ)rq>Bs%RZi zo8rn*yQ0IcR~ik`f#Y~>emQ_k*a*R8IMW{Sf*kaZ zM02kZ($Cu@)H?!#y(2@G zJle{xDk%)9(~h(?X*D2CR7+yLWj~I@?|Q>y0g_8=&?iCO9|^rgnSqUHDiI<~T+*P_ zVJm?3|EIvVl1`@D;mJ8}`SkKX%WjO|<*)rw`0Z5ziXcQ!sE$4U~Rh$aZU zMO#ur^063663G%}Si4()-K7y|n$b$l@w-8{4#{*3Ao`K6yxjaU-)dC5m~@N9*_HCD z_tmS~-Lpk`tC?})v)w{p4_*?yF?h`o2$xv4=1l8or=JUzjI3{gV|CK2d11w7S5!s% zC57zkU%jVhEiBok`j5N8*G?pypF1pDB<|%nmZL42Z_BmK|Di1;?w79*{27%@|bGGUYgnNP6pve=&!9f%=e&0%%)a|q-aA|#YRwsl;|&&#GKQR zN4C$g4qa*efzTvA+70EofL5FJ*2*qii`>VSW{lU(&`-YOzJC7a*gA&Rfh8#OLUJGL z6tTkabN)a+)I325btSw_piB(7hkC;=cjz!a(aLg;bG)r+jejpW@rnLrYA?=`1R6)C zoME-HGKU>&E0#yDIvphg8u4}b6sAEF^1m2AkUfra{dP$F1m5@Zo)Ny!F!E}^^}Az@ zU+R~}`C6`nZ5}rGzC{BX_ZSKaQX%`lDYAd@W<3@6jaWZ4u7$?v`6f($zUh8t z(J=PCN4e%%ltzp7gFfu1xGV*ZLX)>4H3RyK?-J(7$dY8%K)$T!3{0+59W+weB#)kx zt50A&;}@=^cgTvlQ)w5C@mLjhMQh$E{mWC{dRd;=H>?wX;;au`K6gNu`GhXi5U9fw zV7A#h{2{3o=S`#*@{4CVk{-LAD6sH5uF%v*i^?VkYkH$7(a2lgLIoZPleT=JC--je z6DYpP3y>PKX5W5B{}4Hw=lmY$P?&<=Lm7P=KpXG7nRM=P7_%q>^qg8yBc(Or24(3H zUCk$f*GnGqJ!$G!c-#wD@;M7Uw@P^gO%7YGW|~&8&!aAIHggq5Q&-fomAAc$TjyP9 z%)NJT?&2tU!van)q%S_Rf&F2VV|6H#xf z`lUEocV7Ma-;wT{l3AAcSJV7R1K1x>f73f>Of_SF5x(BE%mi0=H zAWQjzv6oI3Boi>eLNt@DG2g!)QIf6IBl|L23`5cGvgbL^3sv1E(=_g-YrnD3AND9A zHD2*y!W|?Nb5kH%JoxS~IW-$)G&5>1*l_dBY*;@HJ+TS6CGuW#I@6Ss!#w6{V{>~Q z;oBFFL%eUZgbgXt#bU|28Nc*}AfU5?m#!%|r*UQ9@kh*e&x;itJ|>RX0xwd^3N<@y zkIBojPwpVzL(5_vZv-9NhZrS90?RpUP-#N~g8_|K)a=f(M{QRc%+{eKhDktK)ifMX z&XJ>7;EuW{wySl4q%(f24=#}os^Jq%nc6kZ1Yhh9 zQRys_<0Wo|yz?M;c#^>J^!ucFD~e)-g>&*}h!S*_Q2_O>w3%)#4IBXetXW}w4@w`< zo2kZ<^h5+S?zW2Ya3o!b-jYT7q4A(HYXco_N6wYSX5sIz_(0;_%HCqi~DJNaiUCtA2IL< zH|(d}q+f0NNjoa?ZulSJvZc8oYVKPO4PMRsay0EhFk)ETJo@z}NIP%Y+`8ReCp+YI zag2WbaSR@`f6^|Q8Sy|@^D3$agyOm03n;0TILM^39h&K}#>+Q0raIybUNj4#l*7K= zDJ9T~<%MFH00onygq!B;?=oJ#Pp~9##(i6bC&g*aKcMF)@Ki`HEuWHXr(PoI@>& z+L`I|d&u(*-bCR%`P3W}v9ANTZd)SLT(*|9p+4_!UNkvs=;HP4aJGEb`|Y^aZwI{K 
zT;J>-N4LncM=#4t$z+kK9D4Y0NP?@Yg`mOyt}*oy?j~B!p9X;G#rf@}(B+vxJ6w0~ zs*R+Xa}JMD?QQViFq2#_EC!8S&B(Jq7h*t_qg;0f24ht9=47_|`l#*ysHhWV>q^1} zx3-SLp?x|7jr~>TTMx?3q0CT7dg|~$Vr`v8sqSVwi zSJ_r0Fb~@_u=g};iuC-$4&$4Lzb}Ma9!p2L_Xw8Jj|(KQ)h690`<%Es1Km*e#1bG~ zqu6FBcJ^(Xfh-8d4@G#E5&^NIr=zfSzar~jxUd-+|)Ez z*$zhqh38*>F1z2k7b0eCNV=Rs#(rXi7|!RcyQZy9kmrtJyK(u#o6PUWl(KiO&f-== zX-9ovO1QuTBB6vRdZ^9ey_gl4z9#IolXr)Vl*icc>D=aK9HSi19ecx+gS{H|My{s>OEyigZCE+ox#F&aUah(Jc>pcQ1%Wj zq1aeZ#@vue$R-NqTzc9gQUe?UcC8vfL5eatJ6N(d>*>6sVKltekQRD3&O`d${_A#J zmZ2)f%}1hyP0VJtGhliRrZT!y*?{&aU~!b2aTIN?pjaJ;cJ1MWMLi!Sx<#ZLU(nFd zX4NRzjFji!sa7PMF|M$_WmxM<2>MJdhk>p`x1IT}&J-?DZbM;OWu(G^FaO2bCpn_< zmjU#x4cgBSw)Xr>&Fhj}m3FdqE|Rh2c^5O-OZ=8nviZN7uVoX@7~dE%;@}wJ{@lV% zc!zWppQY}DzXk()N4@cN9;bz`7JDmWWI)jx+{mfD+IQ&j>PxR4G#LPari%rh`$=$Z>$yRJ2Vg#{%*8-EnHo^d8^Z_U$HW!@z9wC~HLr#amEYF0bIqzu z-iaqVuzKN|6w`qc`Om2ZHc~D_ZBammn5>@MyC)&ob1LAbYR)7WQWvdsf1WE9ibGoq z4Ol51^QE)n-Q>Xm#R|BLasZb+x;Wh&Dga7a<}-eIgu#44J@74*XQ8Ut{AH%Hor>D? z7jJ)BAA69`(4ebwUfbisQf&^ypg^oQ+O#N~Cw zB83O_vgAnlg0}fYgSSVS%@5{%%X}r44^ghc zt5Pko>7v_<`%ljHerSxkl)ryR|FqjcZ71&Y(O)-2Ty1NEiQp4$MbaQ{J(;arvjPb^ zEKf@UoN7uEexfb!HSO^gkZvG~7V5HS6ODUMBCxQhj(`9V&+eV>0FZkVJ3F8txj9qX3kdPlNy7pjv%wsj$(ZBh$A^5uK2j zR~9)oCd@#${M%RBdFh*Viq$R1(D9hYyM2Z+A$J(Y@$eS4!ocvG=xAXkyt(iycuy&E zI>qXGl{l_!O;S~1T8E(a+i0fo&)|P_fR{MR1`NAVUs-no)6OXi8i{{t!~%s;t`l{4 zE9*hrREzcF?!s-7qJPP^*+bPJqyR6(rILei3CtlRVbct%#mS<VA@-csp&=4nQqJb@>2)8?yB_b0MzqFLsy<-Uv}at9 zSIxPN8mvmB<#RtXEjLs+434B zvXD~BM!}xhg43D77l8Jo<6|?hnPb%{N_tn2v3?UZ;7?NXiuMnW%TL1{g-nG;+2P9S zPW=!742L4$uOh1LsqzOvno){)`*f^&uI2nx`9LvY*Kl>vM*DWbjR?~^x8GHFhL!W` z!aa`_W@yq!ER9^4Aa-sw^JB=lBQM?Y^Dza!oU`dT+jPR=a6+omH7aX#;W6}*12h9# zS?Tth$iZdea;9rdwnb(3SzzDPN$HB{T)49eLDh42k&EUo^21I=Bxg#s>HIa(2Oku^ zlf6bD^aobDs&0Dm3^Toc<4m{#t_FkhNh+>Kpc*W}cx6Z%QJ?brdQI#LupKleVd5TU z*ooIbAT2U~{xqR|sJ!tDZ^}2UK<%}4?<2$?S?8$s;Xzy#9ft9RJF2CjjOa$@RfaN= zKB?{ktzt#DB4ksyl`9(ZCgPvmj`#+IkA%b(aO0MG#b@7{%#pgS<%-gqb?`$k& z{{-n`>15%HQ|gt>MS3aiDWMBDh&dBj-CG9xQhZfd^H4GoxzV;*^iZw?>ed2ID?hV!~drQXEdd=+CqnR*4EK2ANk2Bd; z$b`if8dS$74o(yxlji>B0sz@-qIQ1}vslrAaLjp*#AyM@+`vw}Ia_W4Xw;q{t(FDQi{ zaS6rxpKXUlb`otf3pKRrv~QsL=kssARqp*Um=fmQ(>HDCM4|he%YAUvVqse)@i|W> zBBp$mD6+lG6liMm6%(|?rt3lI5XqVBQkM3_BHPZZkY$M*?pA7VJGKiVRE3t2I=UQU zkYQrs>NosB>wIl&%dF@%T4&M?eJ6&U?M0;%Osj_%4wyr{L(xzd9MscIEkM%#4ofRy zQru~l(Z{DAaCj`onCKtLqpxq@AXGg;frp4{ZS z^oi$au95a@e$|)0`t02z%H3yD7Y9-P@sg5itD~J8Xn4nUcHFx*tA1L@H?MieSkS~! 
zMop)kUqmYC~&VQtsr=Yc@(a!s~!8lSL;~_9?G4CYvup|==3FJRtSapten?J z{UmoA5Pm1hoU|`Ll{53BtJ5#q>vGAPHls;BV;ao6v?9~*-iSXg)UM$x(k{hQa$}G}Aeh+3 z%{C1mcX~bA366J+%Fgf*emuCdkdvOr-`S|z@%FsRds4WDAs%6xg!PKEPdigwkgP}xc8PB2PKq;63x|{cq z*N`P_#^r@2R_o_khj^pN!){}OX`dy&bKNcF8^OVxAa_w*hmsE-cNlTmNK46r$ zLXUPbj^E5$L~D4z#OUx;i7xAm8VW7S9S@OgT9((__faDLecFDU@X&5_UropJ#vnE~c^X>?x{8=aQ}mU!j@R-`z7bg89+1+BT8%W+{}C z5toYf%W`g3--w_~=)46yosceYR49YImxP42AKl{lH=f_*KqIW%)&~etP+^7}9SS1> zjjmlvY9QQF%Y}5QmFKBm-)36EMMZrNJ2pRJRMek5#R$ooJ&{GuZ9k1lTynnFI#tpz zV@F$N$bWFH;(2{1_i;wljg-JH*n*_JUkAKLisE~-^( zh+K_$|Du{hpuf6yiMPu}++or%xs)mA2?Oc`Bn}c2d6jk%0~EHf+u6amVkxT}Ii7MA z)4Ad817aD(pZOEb(lMsZ+D_wdITk~DSHSAt;sP&vX5>l&W;qrFS zylyV4NhqcHJS%`w)-ekqjOO&wi32Hg-<$zUSaPA)nb#br^SPG1%?gc8;-6h(B`hPQ z*CwO4&nI{HY~GA-`gvrYF7!+sPI^0hSz%gz{lpf(rBZ;F=SI!XIZO%`#R{~3bn~ma z>ao~By3(D)I1P0i5&OODT~jZ?^7G)n-{nO;zC`4$-)U79(N@CO+rSx`2tRUpt4)Djn`6qMP#4 zv*FX+t<|bu+bLBOeHAIT6`%S1;w5>Z$!t11!4!3kPsBynG4<(d=*izNldvdf3p0rr zzNlIsyXEnnO5j#XaA-S++Lbb*3whnNg+87Kh4D#;{#F==4o&G{FyQ6EQ?6v*JLSq; zZiFpHSK9c>USM87ZcYAt{ZWf{Ee0#{$(N~qjE5M93#WYj^~xKFtU8-R3WjGw*II+Q z+IC3tKTD2GoNPY>st@aoc1Z4CG+wx-uKtNADbnR0d17?v5?iEIUK6IuNq){mK7W@r z+0mt88VXu7ITFDA1TpS59{yoXSY`cjECi|8cVH+~dW zpls(Xa$SIxRnrSc4Xnng`xwUDJ~JrQ?=4h^e{cT@PA}LN-8*@*rYt_L`vMG~L2Jy~#XLlTp29`>AaGdq(XTPCyXZgY~vG$wVxE(m@ioo5;wWuxC zrJHAQn3a5>@|80RBT}O|!RRB;@!LO~o~~ZxL@_m2eeb+sKYFw6wW;m;M7u&?qi&kB zzdT*TJ?3j{URIVVNv$Pl&{x-JV{7W)nxt3ac7MObaQHC(9$vrE`c34L4ZaW1Kd%oYLxt$Cz=H%19{xig(}H^Dp2!OiqO9psqb6a|z@Hnjp{ebG&Ptk# zVv5bT(Dc?@pFBC=o?N8+f@$)tzd)H(d^tH{QHMHeuL4)yjOzyl(_n1K!F85!NNnCA zM={g)?-x)$=-}9lG!jk%UuOR1Eup-@uj{`iTLZe zRKe`U?twvdbBz|U<#NrizgAL`b?T;VxF@y6F+Chsul>w7B-+w#F_KPKylWzIbRWO% z!ADbiP^uX}_2%MSalJF{(%Y~+@+pkjSEAFO@v^eS#|E@6696OQ6cOFj@y;f%oW2@NQAf7Ch(B!-#UQj&B`dOy;as|bdXr^ZCS2?u@L>zfL2UDgV5$L5FA0MWkX+bfj}w_{u6o`` zhTe3ZbcJ;Sgbk@%{uD&S#f_p7tajAVZ;V&|Cc{svH0w%KY}Ug< z+LAG!5d~BaU!AReP4V&<7o{4z!N))E@txRef*w{nMm^&~{Mx^)D-w8+{5Eo(^*4rL zC2nZt;Q<$E{iJ>vDdA@p={ijx_xePP3dXEtqrM#V3fza9dv+8z#>Ki*^IrtWp!SgH zdtrX)EGChoPD%FS|9)>X#S#I9bo?h(gCt&^woXTm=tfkw_0KUXSf-1NSl?e^lgDj6 zkJ_BzUKrrEo@(Muz=)KQmS7!Xt^Bl}GfeIWyu zXq^V_e?Dj*_mQ$jIW=PzHQFfVx_&t$(H04M1#aWO3#K3VI2%lDuRQ1jrLi>~m)c1;$rgw9(L=fght4FM{d> zsUF~9-+--g_&9ei6hJU6O~>KVVv8(J37BCtO0p`i1nQ(!PR!#+U?Iaw4ZknHE!~3& z&4F36Zn~$Ttu{e&<$9J8EA@uOe(8EZsU8G?TnEQ4Z5Z;!S%8M zS=zQEz`@a)t*cjWnRqe(KoI+eT7Y|?1{Tajlxn^C1=oigo@fo*vA%h!g#?|3WSjwk z*xrf{yO-~*V{TzZFT#d>C0N4QIl5EuCLu9(41{jG@d~szg7&h~(m@<nkI*zhpE zIg}$kz*LV zanZf6!<96|<6z?v8y1GAlVM1blhaqT+rzG2ml(5{0wGDaeP9@9anEs>w5)-pZr?$}yLzme?t_A8!&v3I zNs2uBXL`vx&A-l{&l~vD4u?Dj3V;9O!tTvDtc~Pi?Uws(VGImn#OtF~^3b_+Tt2&* zrZ^gHsdHl1RVo#Z{WQGxp3roW?V;IW?B%21n_R2YA7Ak|N!uE1EcKdgB#7r&2;g7OAH7p=wjP_TL{H*71lLcEHq{UR5N;MRvFr|b+=zfj~I9tHeKG!ULIX`3>F&2GRxvihP< zqHEb9WRgm7V#kDiDQo)s7nE8WOno+Dv9^67gZlUXw(!wk<8GjQZ^RDA!+`oRlifl< z&mc;3`n{yND^KDcfBEItx*IY4Y;}&eXFLXDHbU0R_??*DgsxN@1iv!jl7vZ5i@lj) zpu-SVvj%&Q)k_rsPt4abv$NxXIix6RG$0uNw5{;rer3%#aj4(E0=knqU6J~iu;U;U z?SjDkU!mwo{2vmfOW2-Q(RwfsTY_oj=Q`B-%I5ZYF*8~s+OHLo`SPEgoe@%KpO))> zVRHCl^~RuhxK_ZIOpW1+S^wjFL_ zZ9oH9tD5G%T5s&@rUG+AXjf@q#5Ecqyw8C`f5%AAk}QZ?bS;9ka^Qdu>?poA7g-7Pr2ljyBK3 zhC@?9gRP!n6A?8?q47R6Fg5n}s~2 zAm6P}iwcv9#*LUd&i*MeWz@Wbahq%$oO6RhLUug?Of8!mv?EuuDb&ekC<+nK&x;x9&nuXQw3C zt!yv8amWF4T6R@6RaHDKt^3U>q(lTUVOKCntwi zPkCMScHis}n(}dZHO(#5`FjQ&Rjff!LjhteId^x#=LEF)5jH%Jp31(@1j8njVLX@~ zO1=H@4!Nt!DA|$@1Ipq97+_(k31?6L!@^EeqZU>n+UE+|1)Aq_cHkzA>_skPzg3ylhpKh>Bz_6V%V@Q~n`g48Ty1MID!t zYWKDtArJY}gQtX~*V0sa#M&z!%5q>(AfoIZz!hfM`zp#v)Z;OX?oq%05oZHv{VmYN z-9Fpx<(M(J+bMB8PaI8sTCV2kf4rav6PNCN%kWj|+Z^53g_(e4G~R2sn2^v!MF*#m 
zPqj_;!!ToYp~Rk6OMddxp=0V6pTxxx!)WCbj|UC;;OqknSPgzMDxr@6P+G-}3J`v*(` z&7Zum>=+K0@c3?na}ab2gHYergY{i4_Rb-6p16s}<+S5sV_Sd{bz2xPu{AhJA#e2PE|cK%?3CjN&>m=>Go8*Jvo00E-C|%6Z(|EI86%AYdNPMFa&z1t;_62#$qbQ8q-S@^Q;WJ&75kqND#TH>M3QtHp_}p` z!$L!6;MjkL`te=;0Mm#!G74E^IwO5o%m*1cV*FO!$Y=ZW5^Ai?#rdI0D#ju3MQ32p zYxBwe@8qkeN87~LiEakznRv;4cJ!Z(U;#ImIrN4(BU?KD)7$W%;u8&aRM^(i5K@`R zA3z@R-&Vwbf8m)Hpn%~KVI+RGLWqTX?Q^o8aUgi4-5807&Gu+$l~xDu;6vjXIvX1s zx4<~mzqoQo!28Pl2;DoMpt|}#5JHl;7d)(71%l3P21Sv z7l#lQChrwLrLwQd=l^EOFK9m=Qa$;M7}CV!)_&xmq~(DPx`mWuFW-ZYqT8WPho=Hh#YmS2Ta@2w!|4islT%1SM#y16YajE7p%)?(5cJ=#9 zN=zT)Y)y!Xg=oLDw5xtLHjcvQW1VTxYa^G%T=mV-_yviVAZ&+*5(WK#SkXarWPKU% z1;eAp2#gqSIZn$6`-{`uBVnJGevFWokr{eafT;R??Mg`K^O?1pPx?qxw1FQ#erYgj zZeV$Sn)YL!2!HoRW7_cfz5VX>eU2=R+{l+rn^g7=*JFz4FIM-pb1AM8^J@z6h39zJ zcn6J4C(=o_G%&L6()@?*I4zDAhqIw2maZOla_weeuxhbRab#g34i*kxYt501uzWS#6|PF^OCbU<;T0DeNV(e zno1a^X3%LK<*GHQu_ch5jyCExx4Kv|n=dx`Mu(1z&YLe26mIDP1XekGTd5WNp>ELn zpOp;{Mv$PvCC~QH^oTC7T&j64$a|!@`C+2u3L%u&=W+Mnh2!5cT?zDDy9;(~kEMAJ zMjtAxHpN>=$F~p=3sSe0<-PBoj1TsG$?a-s=Km`K4GJU}Go`-Y#1| z#~kLWHYSW3eBaHdewH7>&Kpj(W6b~s-z@C)vdWPQf9y>+fW38im^V?u)cYhz;Gfgh z=mK+~2wU2f`WGZ9^yf~*?;?eqXZn}a3oNtUl6gapPiMybdmjyDW9&Vp3LaZkFqbfY)19GD9>3u~5cK8~m~Ky>`B$wI>=9jxTJ@0kp~h6KAJ)9&pMu=bl_D3DI%|DacdOw*`AwY^_|A(d46oiN4p*(MYc*2Y3Kr)_b=z>)} z7~7s6!!f%2LEIV5-2Z8eBc81>rhj_@;>+Wf^2^)TpC11(psXDD?ZYO+>g0P0N zfQ>i8*laxI2I@k=D81Vhn2sVC7<9Z^rGDA}`q z{$G~e^{oF5i4hpq&=T7hM%5Aq3Z67&Dt3^G&T{|0MmX+JLYZ&;=5SNwy?9!{1F_e_ z6J`k4D8(~Kj~=6Pf*-!9GT@mNrUxMdOlUS(7`f=6bSQiR!xQuUqNW?&e(SxOwUh1DW#2-4DXz+#Shpyf;`@AGG9%A&x;!G>$dz!2TA61@m~Ym zg=PHJHU8!TEcTq@LvqlOr}QFedJ;II2Gp{qC_V;CUBCu6>S#^AW2|xwq;9D%xbha4J@E!`AtKUwyqk&c5~S>$BH zrlxd-XEr1koy?L^(=>#Hfl&Gw4k^`t{7|=X2c-yFcm)=tL_H-daGZ1z<>8-yy3V#^ zqYN-T#+|~5x{9^mp^{ZLdLT8dvcC!czHyHx@q5dxw#WP{Q z06tPo2%Qo*rSvXTgt z&d*g8Y2WD6=Rsvq2+0;lPK;+HnX6g&9;9QzKQE4c0IZ}yOtMXudWV(IYKOFpAAYM| zTVu#2!%IOy5};1+1VxeB)p@td)k9r56O^v;NFsXn+aeetEIf zSg=HOcQX8jURP0BIaZ@{+;zM4v`r4w4`nKvFkSWE|Iqot7I{1>r9OlPHTwq+W?Who z_R+kz;^;S5Z>a0v%zk^Zt~4%R{xSr4+~ls8$~sKGDPmcrxNvQDdqv?bZq$1ed5WX% z;w|8%)iu1i=Dtg}<@DcDw&5{9E1TqhV@OHRUdKf0& zJH<88(b1Xv3}%8C^B*)D+{(kQnBl7$?%Ofn2$fFF^~JH$!*m4U@Q`B(Jmj26iey>3 zFW|mYvK_%gAVUmUf~CvMh@tpC-~bBD5rYFIY*uKR$Lr)Rc2K)r3H^5=94#~htEH?@ajdr5VYeRN-+OLLRTTrRWpgQN+9UI= z6RSj3X{cdf24tmq(fNf#zC^N-aI$ce#oUB_eGNEuMuMi8(MYC$lHSR*Nz@8hjYS4i zIm4dKi%TV9y@DKzVeI|spO)0qW`-{OJ&#ey&79p4yFeoRQ$z8=r|3rVn zk6k$>O7L3h<0%1`wI)Hn!%P7U;zh-TK@rn`Ek#0C)Zl^Mo(t&gQzJBJ+2f1hSI8w- zbK*ULuv>Qop!@v5!4AomPU&ISQleyhL^(Ghb~ZBqdUr9NuGaHAH(+mATcsGLu98v=co+cPs~X zhOL;UAIWIz{iZ4dYR!0P(9lP&BvWP(Ovde!5SavyWY0jy=X79DvtUnX z0azl$e)@#4zWjSkj`wSg_Rv^Pn`fNR76kP7_3j(^P~0d%U(;?EcQO<~Ch5_h$ks6n znrq&&KAuOY^XXsbs*5cQ!!@Z)bgax^X!TdskwyU!N?Z*^+H0lji*#*&2s zRL9}GfokM%Ea1-S%|(`D#1EQttV=3OqwPEX_P>?-wp?QKQ1J>vhXK%xRi!mfs^vLY z0~+c(!~4%4GGt7FS~cTtNZk|GG5yCj;s#;^_h+3W&|{0=NMvIltGtkIW9y9y*_+H5 z`|;!en7~9smnEPrw7~kM1%M{-8Bc&cQ3H7IOSxLu@s{CdauNc+5P0<@vqB1T$Oaxo z)s7N?BS2wIkdku9ko$VuJRGrASQsD^u>Yo?C@ZcSXp(n3y_fs-B~?l$<~wA zf`HaiCAh3K>P{~zB_HGw=J<>zb6M#YEzpu5iZvDl@AY*U4kaGgMeXO#!!-z?#KZk{ ztyEU?=K})+knlNu%{}UPIVGiN$lS|XtEp}P3+hsXhBZCgZ(6t|TJY+|djK!iuMm8W zF132oiHbeT(%tZ+Cs0`K>yMxm!K1hVeF3@rMg5mlTJb(&XHK><*g zh5lHnOil3XVIM!^$M`AYWa=Gm?VEuy@{b=D&+Ch|HqN^U5biU9$?GYrSVls%+z10d zrnncmlrT^*KIxa{1w&o*1_1VF2EH3JTM+qo%Dq?u;4=}$WK zn0wxFl*uf$AQIy2vk>&d52;yvgj~joDi%P-~}@VBEkl}|B&dD z11P)FJsQaAL@kdpE&VVw5OuINNn3hldU1#Za?+U`u!+rM;5yL&;O%9uZJBjT;Y%5#+J2ds-yUM zn{u>%Fp1jZkP8(Lxqnz;HX3(X7vz$X+x zeBTHixCuwPWKnqlL7|ep>t7&S4NCmaWaA8&A* 
zFeUr+MMyoi{}4|~PvA$Xg?D=ZA8pj+MX^`<3sfB#Q5tad5O7&-W9>kyy*%g`|CC#$3u7P7HXv!q5q5r8n8(}0_A-^rXd^wV&=l`&03gB@@VWjejoM$eSd8^`7<$-FM0Ph} z7xQDGA7yqc`ohVO8~`<^)51b|79|9hh0a3iQW!=l4;$mz2LnRxlUisJ#GKi7;%^$X zB!gM`4k~M{4eja)I3IVObOhjN;aMw87`<;mbeO@LR~8O_@9Y$UV$b;R-;2YgRokb! zH;7d6K-@!tt6u$mKyoJQT(Nwfuh3Al-R%989@~pNiuz?u39f)Ur+37-@PW>;dQu&wE=V%twLcbFv~9p5ujnSTj)K0$@^NuGc8Lu&M^OP1z4cWjh*3N2btLtame z{3!H5p7Z!(j8BEB$c|w=wCxh4gKeVZ#G(+DBV^ZRvsA-PxxFd(WpM3wB6Z@X?Km=YN zh5_L_BNc*|GT@~P@!Gr0ThVWbQMVRdx8Hqg*(jQ4o4AQDIcEZ<8PP<}cz>cJ@=6Nb zF})Y9K*YSW5MD4eG{Yf=4_DgdQJO3<0LyZ%=wGZE&(TodvaNaYIa!mE23+j>tk>yA z9V-#mMAKY|T^)LL2RrD<)*QsVq9NBcT5edjVRh@+!ze%b8oR8dTPk69=!40Jve#Vw zxnV&BPPI+SSX}A-VwOX>_7K+zQ|x}rLSxD@HkAe7f8`0UNMNCX-B=q&hnpGFkF zw?Bk^SD1KZaiO0hc=d|$ecd5% zqJTzR5-0!0vI5bqF_FN6?3u&zpll$Z28?A6dBzIOfK6Esdhhs3$>+^b0JlQ&kpbbI zS(a%8p5Y}nntI2V7}sBJIn5X>Q{yZCGIif>Fj4UMT}C2SCY+z^RUV2)z5d(%8UFa% z%h?Ub(Q!aZN=|rfljetZ-AhO2E~7!769&h~feO9Vr9KI~v85nm{8RD_%%*-@AQfG06 z0ayglzVkxR%zrLeU;xV~NXQ&O4W;E(YEZmQ*x_yK2)YD3V|ccllZVpO()ay`Bb3s6 zz%UC>Gni~E)mB4jW=Hz;T|T<%8nDyY74#19vHi<7!ZR`Zn_)aC1;xYx7~svIKl9>k z6=AzI`#1gWIV|!sjd%nHx3-leb@>B|*-S%W`Pm)P7(|W6gzIXSRjAZG#?TfkhkaJ8L zJi52c5{=j6IcGXL6}kdBw~_#BbN~k|%O@k9v0T>EexTHbc`&=3j!Nn^)SFfq)J6vZ z;lWDl6H*xBC-eJzL%0UEy(xZCf;_jDJuBzq(vp-=j-x;0rgb7>)Ta%$e#HWKCXkBi zNL35|4ut0;KP4Dm*_^RcJl@3!r2BTZXp6so0=#HV&)h+SDof*xl0g(+8v@=@J)LW{ zaBW6>k)2*BxI|6VUgwN#9G#7faxRF>_f3aIio?VAORp)6d}2QmwY!tX{r6BMCCW>! zspI#?T1_HgCiUo+Jtweh%1+%`Dq(69Oewin*ynikJ&xAqxky(dJc>Y^;KtKAt7{Ku z?D9jMGvB@uqn;~^;A_6q4L0P>U;PZbB8xI!hpkU)w+jk^zMgiyiU0tVU7+U|g|F`h zEb}l{t?`ciOf4+d0KVN0dbKED13Bz}x^*$)p`Gt9z)S#ep+o0qBl`9izt*lT^d&nb zG=5vzMfSpH&wOj^eHIU?DQ+*n`t@_UgIy|f0M>RhuMfke_E-Lk{&E_L)VCAI_`GZ> z0`tXXmcamZYvZsW+PxfCwk=j!T_^Pl<0A5W-BLt00Uo1ufn@OEeEof#Wvh+npj%@J zDUurVYpe!5oK4`4{RH&i2tiBoNuz!g262fnkWQ--%jnoE%FB0xR^v#XBIzruMWPsN z>{O@qWHu8+_7qhuBO{}C3-wPuPHP*dA3MC<0SacJLiJ)SNG#!XV7?dZ!H-+;zI^Pf zD;4ZzJ9VHLnC)<56Ma2@O4b%gxt#YaOFI*80z+PGVZaw_E9vQ1^5pyNIv_!vnl*(o!g?D7SRD3utmn` zNfBJ}J91eoUp)2phQQZ5A<3ER*6!12GDqrdG-d{q!aF)f^H@a|N7g^`6|>RKyC2rS z;}~Q1sC(kHA2h<9=k`&rM+YS(ShZMN?GB8A&l{m!Z>eVtt&(2pU%eGG!m6iJai zQ*i76a4jt@jf{%grsukli|_#O-WAZ8QeL|RmI(w%?YW1#Fi*dxtKWpcCGqJn|4BrF ztcM40(}NEW2GMj>LVlq$bTZ|V1?hxi(GE``6q1Qhed`r9smnl zm}+!+8$fissqqBxy6))yW1v;7e{#-uU^vLq3^PJtbouH=j`A9IhQcOqQiKOUv>w!= zWmHJ(`_Re&^+I+u%qP>vTI{R5aJbiyx~H zOkz{NJVaCcmYeDkxyPIJv}DX*tx?ukPX}p*4D#btlooI&`EBs&+IS{D53D~%Ae#9o z>0-LK|liqA@2 z`s?@E1`_9~1|Nc-!rdd^{6^D^;F0n)ea_gW8*KVutB1|y-f%q^A04u;^MUx?DW?^M z@l1f1&C91yrDsaTHeD$k7OcSjD*~{cP_|YLsm=?56Z;Q6C|7J;TmpXmtqtycXF6z= zqXBc!Dz&qImS$D7$hU9ba<$$SmNvD4MO#fPCMG7@vzK}5@X(&R3l`f0LMb%RQ1+<+ zt7d3TP||JUMbgBUbt*OikGZN;cxOoQaJKdaBLdQI^WHvqg|A~Pcjv$CDY!_2o70tc z&wCO#e*S#5AzuXo2q^j;BWcADiz8 z>2)}vi2s7R+5xROjlqnvVtxkWt6Kmp%h>zw*ZoP*2#Y{`xZYuDI91-C5q5t7uIWr5 z$RP)hHpe<&Wf3JHg8ZXD`ZRUv=sOSquA%Gjk-m}+iNYLM_rC}Y4YiC}0l5e2Q_q|X z_R=jA*O_7L^s!&kA^ihipY1T3-~MzaZ)*$0)*~nS{EgnoZA~_yo@Wg`!WTn{+=A*# ztr78>ewgB>hPW(EQCR39azvxHmY7CyLw1b0r?T0w)r%85VL03C`}&E6Ja8dv1p?8? zpdbWQU>FHP#v67|V2Kf&1SQ&4FOV?2%BG=m^Luaf72(s#mV21dLC{uOc(^_GJ?}tM z2_#?-Lxwwb;I%V3P706t_-72^%!l0a7Fj-K!~Uz=;Nmdu+komTSM540{35S=(5q$C ztyN;`>36No0Yb~=U{W8YrD4c@F#rL!Zwkqdgp!$@0^kqiu{|-}@IEfJbbTSl@~i7ln>h@Np)~aq zV(C@GI{?D0_Az_`d5)D_t$IX>rt>GjP7wdKON_gUlhaR|DSL-Ecz}0Oqo@P#TbPRR zgC%P^etz0K>h2DnSBzA|@Q0qW{7g>>->0+CJ=V&dDJcP;N8NuEC0%Y`Ce1nl0uuu! 
zJvR0Bcz$>&PB`BFSL}Jcf=3*2LTrGM$3pX3Ujlvu#tz=&odf&8{*)nz=T8Lk~DDXL^A$X?~H~*)4&bD2idm%@kUdA5(Ai;IfU2;Oh zn4^!ZiR^7_pU^fL-;>`%cj_nz>_^e6&tO>`qDVIY2Ul5SAKP!+FI-R=jGdqpB9WM92Tlgx(XNA9R{Ujv>xRzFbA)KkC4d zDKnw)kA+?Oj}bPCT1jzo$H4+wwSG2otFtK944%Tafz6%Cw+AU0q{8R3OUoLKyztN^ z)ajWpG;U{qY9?YIa`KSdrBts$C)Wl;Ce{Oyw4ZOgDl2-ZR}2BO?i<2!H;Jbl&uYkX z7KNeX#)GhB=Cwa}z$%)LJj1)3V6}rH{0fn}4rXEE(EgPBM-8=VuGhKQJ$WSmaeYui zj4AHC_`9UK-~k=W35d>5^sOxdWK@i{@@z?)04O6p2*^^I!A&^`!HIOnR_=@Y*a^%Y zUh=IejOp4RoE~?F6s6o8O7*d(+OW;#l{Dg1eQ@RZL!-hK?Wk(%PBe?u;IY)C8{Lhu zyxd*aH97OjWw-ye*JboEnRbQgOJ3)CX#w=fQ@~dd@-Yqf_e)?wJ&KGCwg7BI`s8M+ zlpivTq;>l0A|2s7pu+qV^6gj&xMx2nAlTYAWJ(c|z8?kj0p~(>`tQ+RD4}`5^E$_` ztf!@(YH#%SEShx8xUgj$z&OeOXxBnP`xGjBICcgO9A#V$>|XG#qy@4nj~WrlLRsr9P`jG?{L%iXnWt>mMM3-#ykk&gu zELC@Z1{%Hp8X`X#t(17NpQrF405J00ZNai34hC`#x)+|eEJC?cz`EZV_L+zLS zZ=?*$v9^LgVDiU3Gc0->NUeOghmRM4`elisY*t%7GrR(AvOPX`f#MsnfD8K3pP`p< zazR|!9i~|?sj0QxeP?KIKC)^3guuzs6v!2pb1GBk(SBEG|20azh!u2Y-H58>A8A_COimGxd6%Ok}urdToinhWKG&SQ_W zowcDR?+F5Cb_^^`9Ujk*0IZ3HHA|dsh*>iP^3+!;eQr>$%UIPKPaWc-&qPZe9pDp1 zjHDRA^U&l7-Xb%m)Is&&Zx;Qd%d^LqzN6ts&sPP{zRW9qON_n5(l;en+=Yg?=LJgH zlKLg48mF#r;^A1>*r8zI$sf*$U@_Gv@5OD^J5^1Qb=sfT(zDW=#*cC*`0_{IuJJ5! zOsV+fg}cFJzpd_!H|?5LelX*#7NTpp9>w~TC&8J(xHkFKdk8Q}2`HYc)}yJ_v5DS2 zOwg+WR9EHme#@nfl}*4?P=-3pON0Hg%C7cl@xE`LdfqM(+Ui&Zb{t9gTb)nYpEw)7 zf1PFqMvh2}n@vd4Mw~HU)BGCaKJbmCP{mQFyjyht^R-U}=3;KSA*xPIB~9Cl8XU4) zfheu6WJ*|-mcxLxKMa90}Ci&HI)8TA=r zj{!+WK^%h=Vs`-OzJ=1Y5gB5KF7e>U4h{}(VAqJx^Zv?PH$*^D+wbBqle=lXD|r*m zuKX$FIws~VASDaKN_4g7&+{v(2pDC*6p@>o+B4&z5m5HIDIje}R=3Fbq*yl`brBW} ze0$z=LQTTP?v}(#w2I3zM1r_5h5e?D76uOQ2>+(-DUmIh*Kl>k1D@< zallOk0Y^-n$xr!$;-}?4`r1F=<>U2>MdP!u(?AY6K`TN8NG(x=6-JSC@hv}zS=;Hb zAPIkxT?-xjQ*o#b_vCw@b2{&s?GTw zQ`R<>mgvMD$Q-VC^p!g&VjABHvLGhX1q+R0>IwgDe3xGJDYN`{Qn_d#($i5uNOrD! zjS>#5FV>)#WjKYCaFq_@FeV7r!TtVmoVlA;nK8qFvDp;eFmSI$)6yYXVuQ=~vtz2`}( z#TW!0dl#O7eua1Ilmy5T%{-Fo-1NpW4^Qc7Mz%Ol9L(kKMd47besmDVPlN=Mcu1Ln zd_zB8DXY`X`H2v~$d$ACT210ZUsw&VgrR{us;e%U^GjSTgU^+jsw>*f6O=o3w4NCq z(1RwElQcWvhpf#OQ=B+SBfe`v;ok;hM^^s3+-(Vzv@)sglM$nm zOq%Ywq}st79^XT!Zkn9X_CrQjRG>;65WU~CU(GwrwMO_=gAG#kQdL<5?7DD*8Iz|v zFuziqbi28_W-^lOQEZ1zClG3Q)rAq;;(jR(G@SWNi}9FUp`wdC$v>Zng6g2xaF#c@ z3U$9iT(^rAbmREA>=uJSAMtIc6xgS5Xd_L4Da#L~tN=(7wSn+UKvLQ5)Dh?3DkeAU zkQU)@-S7dFQT7nqU)9Ag_=mr6E9$YB@DOPBYbrEs=lWC-t95x$)XH9MZN0a|7C{=N zs1L8HFAVT}i*9=K4VV*JEXta#N4PHWF&v-vG&ty_cG zYXKq8xE)46!^*nvlTZ2dV0hlIlx_=mc8l%&p!(d_7RB52UP*e=N~Vvw$3lWAs6e$i z47%Td18h0ePpAu*nX~dtFG7guCVx1XbbtmA;3evxyU#S;kpg1S*|@zKX|hyH67uT` zGzLY&qlu&Wkveyk+(*PD`3OYvoW_-GkA^wJMb-}I%Vaq)^NjXK$Kgu}mH8eC6#T{< zUN6xrTVyOPqd{ANZ12%%Nga%%01r5&l|`xn-Nge5t${>`Pp|1@++JR##~o;CHQ@b` zBq;2g7HV2Ei_z|)`v|5(V(&$TTYkSSCrVU}N%tF|RNH#Io^9h`iqHOsNvMnOpm6fW zwE@HEpt<;7VS5Ju_piF(d>M$M3(qUgwLw?w(k-hAv5qla;#X(A*sS+guC|MBhbhY* zzFviRfP3=UbeGY@G=;WjHSsmRFX)^1wR?4h#8e-6f%z7S=({;0jO*U=hRCVNLR@;29{YE$0{c?TQM1^yI#6Q+IRN$WoLzIVh-8?G3y4=aM%ib_j9&1n zh$gF+>EfGIidB9x!{!M}YWg*E!x_BYv^HZbsMjTIrB%6l9qZjf$Im3DfM)dcPx_(G2dclHCHb=*{TthLq}1s z;3p{Z`Y|GToSYWQ)+pu0D?`s#Jzlk@XC0JcG(SH49(Wta$D`E$BHExNrsaCfy<$%o z!F8s9Y4KQ&N(rWl^y2z=)SK@Govbu|1m4f3_pMdfEdvsDt1}#uOqcuCw!6uZZ+$k2 z69^;Sg&PRwScbkxPMo&)Y$pYsiDe^d*m9f5F}>l}llN@B117$f$_&tpjuau|qWFcZ+)evygO1|rWoL@DYb0uP?3h@8${ID&lk89G1wz>KoWF2wE*5_| z9*Ucye%1x3ze;T$a&qFhj!IqjfO4b$q$@0$EWq!zq^|RX8Z+9yI2#rV|NCt8b2D-g zJS0?T>4_rK2ai-g7$*E=t(-&3cZDJOoVdI6JC&EEphe+gwoih6xIxgh0X`A(PuAAb zR%QgQBvy96tPQ>q$r^?Vo_&Q(J%D@Gc=yIqZ&%V~IG_x^2mL-2ncuG<%F^lXp$BQM zk|AFAW=7J#^;Qi{pUSRm-5Xy~DM4UWWsWt+r@cP0?=LUx*BiEGS<$L<;3UE^9QC)$ zB(OKND4%~MPJWRywektP<>buXaG7SX+;@1iM!BQ7>J-j7*79Z;IhnhqpZ$b+BD 
zSUb;<(=L@$B!??dKv(dXb#p|9#3Ur>nZ){jdX)S=(@eVV!FY94hTqy`4hY94C+0wM zd}Me+Vhe)|;PC)hZYiMYMa* zj%yl&p%JZS{z}#njMu$HFH3`=9dS#IR!J|KLk2Cu0T*(f`!&|onoX<Hh9W&8=EHi*j8KZaEcir8y5REUrm5Xo(vxCR8ZmB}Q4Rq4qVxj6M(A=bx3mN)+Q z#&qH+EGE9J1%0L$saqpPIsDgDjMCEE#hF=w*tc%`Gsx~LT9s%e=f!R_{V87{Vah`R z^!ZeLzNSwyUtfmMHu2?mAYRl<;OLL!Qjk$iwF~eq_|-5Pz23HaBktTJU4crjlVhe6 z;2m4W?l@cIrW5if!8##vT7bPbQFDJafmtU?+&)R|1@><RvVfHIWH@ zb&q^5t}|OsUBh%I7hwZG%5@9<2rmxt(N#tO^yE%6{e!YS%W&gxb)4S^@BsO`dsB3W z(BmgAJK&|z&qWDZA|!a7B6`l;qRZHusdZGXT~Kh|@;%ywpo~B5J7{epp9#3X8MthS zA@xujS!`cOIme5aY*3>vfxf1Z+5~5vg6WF4_AJb5`WlHSudVQNK5rc`ig!hj+6;R4 zV?||asEMdTJ&^4*NZ000B|>~N_q^98vsh_Kl^%9>-k^(qoySW4NNtcIl(g|D$eG0e z_T;o^mI2|yfuhk6dqK;%O61h-B#aCJk0R6!s)yw-$i)rX=bB$>$~)>DxqlmpDRBcx?OLS4+%a+Y4P_9 z(D*>isZ_l$bE~u?Ogwp8Q6;^)FQw!YItWu-{@8nIGHUS~A>iT=2JW6GCps<`M za-XSw=h^DLnj^hAn;T9KS|^jsgWO9>b7UQBdScgwC6@l<Xkq~m z>x2hTMgNcZcc>WH9-k0f5%Lxru86*MxE2z29;h9O%ZvH5q~M^DjL6 zFZYlHSU1^Ya>l>nED{o|w_5WM|4#T!i1@l|(_a3$5`)qcO+U?SMEL-xAK)lN0qyK2 zsxe&Q;h>oqHIhoWT`?Uen==-cvM&oWU_Rb-P z3?O@%HDm4|m#pbc0BgRJ0Zu7cNk&~e!4-PvPWZ#;qfgM|a?=SzJ$;DzluXS^e#3A4AKbsJ5mxr| z;%|C~a-@~OZy*77ztU1d2kgvSFBjmGe~pDVd}U^Q=6|sem??}Yg0FdM`}$>wOY6iU z!__S{b8UF=4N4#h{%Kud?OPfS%hZ8sy@6Zk};DmWVe6ig-3%2L8?H~?U*@9dnJ zb}=L^f#zcVD6bMOt}_4?X*r3Cc#>Mjo~u8=J`lsE-QP&fX%kDoc6!La!o|A}5H$7R z_W{f6K+#BEHUKyX{Ule?{fnma2?3{?_KQ7ZJ3IlmRM_cMmO>duF~#ZT^I}O;zZ1SW zx}x{L@)$oN%u|D3FqDxI85*`Sx?NLD#v}=#J_@joR|_uKCgfCIF~7CjGe+ZFlM{H4 zpO&Rs?+(T+%fj#DeX3<|Fu6SqGRh>VmTkaGt0~G(Vmd30Kv`;s%w3cM#yA9B7^Rh$ zLthxaNx{#hb;>h&j}mSQWrVqQsEFS{xV4f0LU3*$7xq5l++`4|<%z%EM}UaVsymiC z#;+7uog5~loAOBs@c}04Y`Lj4)iMKMw|eE$%6=z)HZS{$1F*_I!bJVSf&X%9;CX05 zkKdrBq-?(m=TGK%{aTYaxldF@069fG&hIgJBQG&9{*M9AdYJ2>!FZp>4rU{lE6LM5gs0z=ja}LsIFO5*=#_0g#_rGa4#xj{el4x*bSL6 zoZ6pV&d((a-Ib(J0iFc#jF-vF1hp9vT}k&exe9k-6cv~2FivX!F&M3j?|9PRQC%HR zSw+Q&^*I(6R_4OfG&<>^``ul}ppPHPSy`h=_7zx7@F0g&fC8>KW;R2m#0BnK{Gg9X zv2;ruxCQ*uQx|>Tm-jur`|uKAr09Ol;PQGZB~;ige6nC%{v(lr zf>`Vbn@6L*dv$PdaQo7d0m?q_*Agn*VNal$cqRn=Orj8hpNT%~{?LH?0~pxowJ>WL zzEj|_@eanoM#9oUK)HPV7kjEqE^|Xt3=o^;Lh_($_DNF$DPpQX1h6_o^6XhohpYS7 z04Qy5C9r`31Z?;W;5XnO5X2dYf2j*TZW?FV&3`zM0bDCoB-~xPfP#vRGfctDM<7a46y`GEcG$kT)SLlS zVLWz={JQQN3EK}qZv71e2>{V}kY)3@MuQr_?jwLm2_L|?s-+<9Px!8y=8Es-OX+ez z42fNRW4Gvf`TI+$1JJMjpMns_e1r{f>lw%|Mokf%8gB&~WMx`|y4d!n;I>SU6RKvE4+KD6p7kTY_c52qIgor81*W1yu;W;LSV zb!{R5P+T{7v>@EO8zwp?M%^E~#~|hfF#P@IWHBL#WL5vMY|GM?!Rsu8au1B}#XxL? 
z!{wpTc&ULw=%P5vzIgc^5axwhhgT_9R8}TLGfVGVu+#!?b9v(3OF+l?A&w0i1$YF& zLOx&LYWzpcwGN&<0GQX*4USg(!Af*7fb%w%!c&+jb7Vio$1I;}8E7#O|FTs$ln0Zx zX{OW~7qDT$ohA0zSkwML?h7n5%TEKKZUq20qzftY07S33cRT=f3Y+k~JHsads3@q# z+i7ZDL6zk$4yai2?iC;%S0{7^gubh90Hf?ePw(>?3xKYL$lK#%USd;9tWGWYh?|N9 z{R+4_Txm}T_5j%@sC#?g5L}PMfEc(rVe^gSItMZ)W+lpqawa2v$f2OsBx4@T_oP&r zljy6Pn#zS5p*sVj;^07X4q zu33tSivuH8R5yU#SOWx+4q4@<6PxA!N5WPCDBXW8boa&m)BE6GBCvRv8A$(Ii^}{( zY1pzr&H*Aeyl#&q-vP7-^)cn*#t`y8(^ot60Ol#FKTg5!3#Pfucr^{sForYKmI5w~d8L-SML)bhBXUUe ze72+_Fw7y-AqI4`Ps4(Dsvo@_XmTO|G8)YRu@&!QlmiwTe5=^ZX=39`4epilkCMK^ z9oKUfg{%1`$!x5z8-n;2AjM{g-P@h11mlJy;0B}#s=gz_KBpdt+%=7^55aE^O1J_Y z95qrI!jEPk2w_6xahwZm1AuLx|8+N_{b*t1h)OR}9$o?K5U*9Qfc{&)8{ffvol0nd zqQF^vm?-nN`Wyf-ihR_;7S?Sm578thA}Tr#dCf@C$oa_~CS3_PaNocs67YBiC_(s> zrAvOBo2Uhhp}4*a&Qro7b-PQxLhiM`fFdEYjz8>*JyM)c*LoMHDEK!01cD^F+A2$Q z3>3B!9|e2Apq|(#)h~c0%nbmTYP$~aYeLq6th=BX8GEQR&|uqhkaLQTTo1uQdUb(i zN_wG~+?gthHHMYV@iPbLGkpntC*XSz-*A7eN_|+WbI~j!+7}cYjNhxCj5mQ-mq!Wn zZ=oLe+~M@FTRA)mX1&w{42at9ZZgOnj4f*ZdWp;$jsxx&A6%#w*z*4Vrv(i(4lV)| z3~~Fo;q0*HffsZh#%BEG9>pn8K=btx>II#Fd_V?r-5~2wE0_p(rM#O&B>>8q6XA`i zjc9>AS1i5yB{Y-CNSZ1rOy5q)xT1qDDNqQXQS%+zOG@WDu#~CsyB}32txIFNS`mV^ zCy?iZdEx7gF_%#J`$-+UeDTLxrqSnpMwwsvT-Aen!bX_*;7r*-*tkDT+za?2h5aJi zQaYGKuMJObSvE&hZ>EMoz|Z=+DL|~1i1FqCYZ35@uz-e}Ue!KH^FX|O0k8vOY#hL8 zykYDDLL(LAGE;R3uyQhXc4d@Zh~w{%`1(M$xc~6a1sw>g{#zB74(is0I459+_F5Rs ztqWj*5R$L*oetde#S9WQ5n>E_{Y}UVQy{yi4+;s{g0$;2e9^<6M-1$s#c$Paiwb!J zM(mS@)3N6p-b~~wMZl|bF-hM81>j_TDhpbz-lN|-Yc~>gl(Ixrv&O^2R+!1PsZ3=+ z2Fc-ioFd3%AUy=MvX_`=mJ%e?sat;zCz6(Qe=t+TD!2 z4Jnb(SCo+n&@a^>H>J{VxU*8;uUYabbiqjj3D7V-v?_p{uCr#OvtzOQc+Q@?$s;>bSx^$%1Q#+@Uu6e;=?(_uzCnqngS!Pzx@HecR)BO zon#j0hl9`{Z14IlbwC83N+}GCxg-j&#nk^Aa()Q9K&`rfs5u}5IA!1!j;a<7$|cO4 zrfDL>`ssg(A^i?O^}7!R0bdE&u*FVFvOM+w#Ro9Xh_e5a3-JH@hy8!^0ZOX@H@Mv2 z+uyVVe6qA*^5503EsM=yk!%Z zo?Hma<4uoRR{zYX1H;HeV{ls=Yd>gmLJlFYmpPQ0PyZJ+ti4c+Nr2~x!dC6KY-}G7 zG>UIb$jIZknz)rS0dd_GG4aMCZKGtn`nZD%l>Uz5egU3!+qPVZ2w||7rh3e!5KTHqE!$kK2K4{V`eoOw`1 ztpZ!PDTL^3k^QC{+gG;p2%vU*E5O_0zmQ7erF zKo4R~`=wf3VH?&0(9ivSY4$_#3Wr^YYn67S=pS{mmusxH?=E=-XGcf~x-@GLTz!3g zL3xh~{FA9cP)1~QG*(E>isG9)RSE$$V6O)DiM1C0$J;*tW>{D6^Z=@Kqu&E5AV(Mv zgjRe4OkHrv&w&{VNbzcxHJh=(!vhs^7pS1MciG*tEMS+%XppD?-vm1!v7A!O{cQyR z|5PGD2BVCG2rJNk@4n4~kdSOJIkbuCw^4uZaQVT2aSg%+s2`2@=j&kNZH?2UkOxYM zh!zkm_($0r*s2A?jwT-;GpT^GjQkB?p^jH*CgzfZC;g($^exqLHW=VH@473r{=WJM z!3#=eb|73>#@LJ$Uhy-|Ib$kU{2I+2HQnWdq7z0ntR#ZVB8mV^WZ>Q6(DzHg4jddH zT~VY_;A1J7nK4Qmj>85C8$k>htPo0l^&TD=bi%w=z^{md)tk-h(q88+*jhROvA?ik z2xd3{lJXFv1kRrWf)LzMS*acrgGMnu2xk7m#>OTkE#KMfd12iCFb6$({dK1?YK_!{ zaYnMo-eOQ8Q8kf(N*2qXU*|YRL)u(5*4mzu^k=ACpWuI zud1{v&DlNAt*^iiCx3YhP7c~#S|Dr-u%KrVW(U7;Cjo<-9+M~m^S&0ZtBoXfo7q>R zWZ7;%8hzyP-U1aP>N_44BP%d^cyIUkX- zn})cvy-z4mGc&fg4;!2mnw>?t0DRTH*1#w7d{6V^6l(J@=Fk3!hub2~YBT>%u}Ej* z{QYM*Bg1$A9MqCk{><8GYe?+$B(Y>PoK5Qd?hne5?S8>(n3*Q%ySX9t|Aj$_J-kXwYS*)?MZ~Y^!sB54u9yUag7pPw!tFAeVf^zF_Sj4J)3W{*Fn#K zHP;fHR^p81yvw)d>rSXxS;+>gc2F))C$ULLYz1ZSa|4?Ax#;Mmz;}1jS#AQn{?X%C zC@3iAj3C?f3RE!zp{{zeSI4+iRHmu&oVCE#yDfway=B6|LdWymKp3ihRec>1spqiGoR*y(tys#9k z!K=0UdJ4`7T>~cTJu&aoji*@Pxa{urfz#-xmwU0-V?u(B&Qufj!`90s&VSD0p_7h^ z-nshWgM`9QQXpjZ!_RQI$(JX@f4e$%%Sr!VeX?R()2|&%a871NAysgB<})!D-5f$$kpI z9D`I)*W+~4$a2UP#0P;5)&kz^ZcOOT*kn!aiU9tCQNY6mn$ogh8}Ah4*Nuu*>$y8w z!TKMw?GYR%j{Uo3{mY@(?*%W5=2u)fM60LiVv57*r8iSzHm zF|cUsf!)yLUZ)g5WIaGGmTTN81Hp3<1)dfrC#%YhFSu>3^|`da%W2iwuQZ@64QISz z0WAL<9$eq6V>6s}tL2gl2{ExA(Bh)X#@T<7w&UhOuODdJ2$o!0z%=DzQIywTq&YMG z^6EW>$8i(L9-+CN;yvOpABYAzn>;9nl|M8+-4~n>a~(JO%mr^Q4!AjY0WHu0r^RXs zoW)^az zJZx>DYG!)Mm8267EUUg;r@dLV{bHSqYW$xby8&97i3rH^l^vaf$Zolh{Gl4QeBl!R 
zvv$E=*^9yjKiRijUHd}DscjIq&GZ1tFvT^Av|)MKT*AP7#HPhSxyC%Cjn@U@XwwR& zl5#EIKW9czMyr)wCixOgM|yu$b9?O?r29Dxr}?%?`JqH)tWZMw=&W3sBg{tf5!@GW z1vabMY=2*B6maZimeYTxQry2)J_uDhI+BimymZ-{Yl4_oJ(BkP2WV@uJFe5Eym@fG z0;eTMW7LQ(B=~)-+HTq~1iLq8BIl6_vQppf4UCz!TvsblmPFy{kyD_A{MK@#Heh{Q zaM?+I{huG|mxS_~)Bb{8Cox>;JCKaqKdfs0xOIG~3lBQbxy=U81olecJf|$FH)>?D zm1|VVx0GgBRCYFK5U_gcANhMu4!-mM&n3lANC6qE_D>M62}QIVXu)6j+VN3)Th3E+ z_y|&Z%nDp_{uI^%mlqa<5N~0CEz}|m<}*e>uFl^^HNq~V(6X(CQoSfyA?!O|+-!#o zc>RcNI9)iJOT0Mf+Ay6Q_U*fgQXkwN8@z*UpWw{c3ur<%?|p$aC~Tv*UTome@vc|N zZoa>koh;E~00cwFtM|}fx_TrW;W$+G!xYbJDF1UvkwuGw3*tc-E<>gQoww!mS5rny z#7PV>4vB;P2RoE@Kbs_xe7wa3y^6G}xzry2U+sN)Je2*re}s5Sg`Py&muW&m5}^e# z#u6&~426=VDP>>dNsB@lYa_Cx?1{0?wA+KAs7rGql4X-xGQyUIHZcH@ zYomfmk%wVR@fC{3cts>oitsS{F|GPrhgR?7*ve#>aGq29?(5st&S~WrXFy*z=6ayd z0_~_pvW;j&E!uNq!au^0fABZa)Ieg@OVC&NrrFI2S>u zp%FAI^c9wY=G+6qUp%XhHP=*zSvs%)yUN0~Nl;dB-9>!= z_HCZ~5v{#@O%=?D7bXR&&@!BUwFf%+ezyS|IJXMJM|W6dqhwrJ__IxP!xN8P-e!^7 zr0I1sv0I8~l&jaOa&kAV;jJi-Tg_w7?%U6g7H*#PMR3z$5Tt7Udx^j&xfK^bQY~LQ znumKdB|#f|N`v}RuTv0+jCNLf+1plnrXt^GBVMz*z-;RSUjX^{Wd>gP^tVjLoA>Y0 zzz@#>9JmuIECsgD^CpxZlUJd$eg}W*LIn|{wGFxzMGoU0@wKZ{OCS2reM3^3&<|~>FUT$^F2mDOvM`&A@ND1nC{y}*cNNlHOkA(o@ z$;HLxwu^XqU|BD%5){!*y6%v5T4JjPM7I)Hx&v5o{m+%BWBfWr$Jsgvq#p1*n zhQbLI?D`kN@zeXm2RxD?8u#f;_1o?6o zatodvWtvo8?st3jy*nB2RbcTvuMqR1IjYlc%>Z513Pe_jkn8*Dln;)PKOn4lpBViy zVNPqUPnndMZEz=r2@BkZ4XtiZOmwFb%ccfuDV?zdZ`iz>0B3b^jjyjQlG@z;>U)dq zmEqb4j~;#Oth$VyTSqvG0w6G3kG%4D8ZtLezLg?wItafWz{-A0BD{q@gx0=(eGC)4kw;Sy2d^u@Bb+zarL*9_O_Y##NCA5Xp_3 z`J0>5neFJp-bBV1*I2S+tzGReG=mnoXd2?nPgOVRjkEJewSIaURg9^Z&&smSfgm7n zZ3!fQqyn`t=9wM?KS3%M4brGNz`sKN0PQNIg~iA0>n)2@pU=}NRzE;6_Rz~9n2SCF zefS4Pogn=3=JZn%QSY&5;%^^Zy$HE+w^e=!;f^N#cwyV zA`fsb&R2fKMAZH|dhOdOO&l#eSO3nR;|YCo zM~8|%K9sJ3Ui?_@vzp=@h3m9pl6l8Dz$)_UF{0$VBjp)5^UL>^NA>JhCxmyo+qxdK zUaJ#RI7Xw}U;`n-`OTM;F|aSc9^^=Hl_P1>5kZGO?-^9o(&~$10F~F+r?4B*Bc_|B zxb;!~7*s#cnGt=eNpRelGg1m{=KRN2NMmL`T{t`O3T|0kt;zbKJgB7r$<07O@4eqY zPMSAdM6W#GN*RWL%h3nT2S2cN*WrsEteGp^&GwW7aM6I*`;?ORe8fN+{R&NymqTL* z7{=G8=bQhT<3qfFLBdj#`rES|QE)Djd?TwRX=#5yl5*=Gzu$L5L^u3{wh-?~Da1=# zFK>7)L*s$J@h(M~2(VqsAs~etS2-pb4Ts9df8#E34h>c~5NUeu4%-7jh&8uGGHwoT zq)FP;x`D%*Wx^N4&V{mHNEM_-9Wm$R)P!(~@On4-2QM)Kc6Ya@pp`z`i-LgU!iI


diff --git a/docs/apiref/core/common.rst b/docs/apiref/core/common.rst
index 12b3892b5f..b4346351cc 100644
--- a/docs/apiref/core/common.rst
+++ b/docs/apiref/core/common.rst
@@ -7,6 +7,14 @@ Registration and classes initialization functionality, class method decorators.
 
     .. automethod:: __call__
 
+.. autoclass:: deeppavlov.core.common.base.Element
+
+    .. automethod:: __init__
+
+.. autoclass:: deeppavlov.core.common.base.Model
+
+    .. automethod:: __init__
+
 .. automodule:: deeppavlov.core.common.metrics_registry
     :members:
 
diff --git a/docs/apiref/core/data.rst b/docs/apiref/core/data.rst
index 81567f9400..7f6e705fc1 100644
--- a/docs/apiref/core/data.rst
+++ b/docs/apiref/core/data.rst
@@ -8,6 +8,4 @@ DatasetReader, Vocab, DataLearningIterator and DataFittingIterator classes.
 
 .. autoclass:: deeppavlov.core.data.data_learning_iterator.DataLearningIterator
 
-.. autoclass:: deeppavlov.core.data.sqlite_database.Sqlite3Database
-
 .. autoclass:: deeppavlov.core.data.simple_vocab.SimpleVocabulary
diff --git a/docs/apiref/core/models.rst b/docs/apiref/core/models.rst
index ee9d59a537..8a0628dc1c 100644
--- a/docs/apiref/core/models.rst
+++ b/docs/apiref/core/models.rst
@@ -10,10 +10,4 @@ Abstract model classes and interfaces.
 
 .. autoclass:: deeppavlov.core.models.nn_model.NNModel
 
-.. autoclass:: deeppavlov.core.models.tf_backend.TfModelMeta
-
-.. autoclass:: deeppavlov.core.models.tf_model.TFModel
-
-.. autoclass:: deeppavlov.core.models.keras_model.KerasModel
-
 .. autoclass:: deeppavlov.core.models.lr_scheduled_model.LRScheduledModel
diff --git a/docs/apiref/dataset_iterators.rst b/docs/apiref/dataset_iterators.rst
index c5c5c408c6..e2b47ee805 100644
--- a/docs/apiref/dataset_iterators.rst
+++ b/docs/apiref/dataset_iterators.rst
@@ -5,25 +5,6 @@ Concrete DatasetIterator classes.
 .. autoclass:: deeppavlov.dataset_iterators.basic_classification_iterator.BasicClassificationDatasetIterator
     :members:
 
-.. autoclass:: deeppavlov.dataset_iterators.dialog_iterator.DialogDatasetIterator
-
-.. autoclass:: deeppavlov.dataset_iterators.dialog_iterator.DialogDatasetIndexingIterator
-
-.. autoclass:: deeppavlov.dataset_iterators.dialog_iterator.DialogDBResultDatasetIterator
-
-.. autoclass:: deeppavlov.dataset_iterators.dstc2_intents_iterator.Dstc2IntentsDatasetIterator
-
-.. autoclass:: deeppavlov.dataset_iterators.dstc2_ner_iterator.Dstc2NerDatasetIterator
-
-.. autoclass:: deeppavlov.dataset_iterators.elmo_file_paths_iterator.ELMoFilePathsIterator
-
-.. autoclass:: deeppavlov.dataset_iterators.file_paths_iterator.FilePathsIterator
-
-.. autoclass:: deeppavlov.dataset_iterators.kvret_dialog_iterator.KvretDialogDatasetIterator
-
-.. autofunction:: deeppavlov.dataset_iterators.morphotagger_iterator.preprocess_data
-.. autoclass:: deeppavlov.dataset_iterators.morphotagger_iterator.MorphoTaggerDatasetIterator
-
 .. autoclass:: deeppavlov.dataset_iterators.siamese_iterator.SiameseIterator
 
 .. autoclass:: deeppavlov.dataset_iterators.sqlite_iterator.SQLiteDataIterator
diff --git a/docs/apiref/dataset_readers.rst b/docs/apiref/dataset_readers.rst
index 1dd26030f5..a7ad0f6abc 100644
--- a/docs/apiref/dataset_readers.rst
+++ b/docs/apiref/dataset_readers.rst
@@ -7,31 +7,14 @@ Concrete DatasetReader classes.
 
 .. autoclass:: deeppavlov.dataset_readers.conll2003_reader.Conll2003DatasetReader
 
-.. automodule:: deeppavlov.dataset_readers.dstc2_reader
-    :members:
-
-.. automodule:: deeppavlov.dataset_readers.md_yaml_dialogs_reader
-    :members:
-
 .. autoclass:: deeppavlov.dataset_readers.faq_reader.FaqDatasetReader
     :members:
 
-.. autoclass:: deeppavlov.dataset_readers.file_paths_reader.FilePathsReader
-    :members:
-
-.. automodule:: deeppavlov.dataset_readers.kvret_reader
-    :members:
-
 .. autoclass:: deeppavlov.dataset_readers.line_reader.LineReader
     :members:
 
-.. automodule:: deeppavlov.dataset_readers.morphotagging_dataset_reader
-    :members:
-
 .. autoclass:: deeppavlov.dataset_readers.paraphraser_reader.ParaphraserReader
 
-.. autoclass:: deeppavlov.dataset_readers.siamese_reader.SiameseReader
-
 .. autoclass:: deeppavlov.dataset_readers.squad_dataset_reader.SquadDatasetReader
     :members:
 
@@ -40,8 +23,3 @@ Concrete DatasetReader classes.
 
 .. automodule:: deeppavlov.dataset_readers.ubuntu_v2_reader
     :members:
-
-.. automodule:: deeppavlov.dataset_readers.ubuntu_v2_mt_reader
-    :members:
-
-.. autoclass:: deeppavlov.dataset_readers.intent_catcher_reader.IntentCatcherReader
diff --git a/docs/apiref/models/bert.rst b/docs/apiref/models/bert.rst
deleted file mode 100644
index 5ebcc6552d..0000000000
--- a/docs/apiref/models/bert.rst
+++ /dev/null
@@ -1,63 +0,0 @@
-deeppavlov.models.bert
-======================
-
-.. automodule:: deeppavlov.models.bert
-    :members:
-
-.. autoclass:: deeppavlov.models.preprocessors.bert_preprocessor.BertPreprocessor
-
-    .. automethod:: __call__
-
-.. autoclass:: deeppavlov.models.preprocessors.bert_preprocessor.BertNerPreprocessor
-
-    .. automethod:: __call__
-
-.. autoclass:: deeppavlov.models.preprocessors.bert_preprocessor.BertRankerPreprocessor
-
-    .. automethod:: __call__
-
-.. autoclass:: deeppavlov.models.preprocessors.bert_preprocessor.BertSepRankerPreprocessor
-
-    .. automethod:: __call__
-
-.. autoclass:: deeppavlov.models.preprocessors.bert_preprocessor.BertSepRankerPredictorPreprocessor
-
-    .. automethod:: __call__
-
-.. autoclass:: deeppavlov.models.bert.bert_classifier.BertClassifierModel
-
-    .. automethod:: __call__
-    .. automethod:: train_on_batch
-
-.. autofunction:: deeppavlov.models.bert.bert_sequence_tagger.token_from_subtoken
-
-.. autoclass:: deeppavlov.models.bert.bert_sequence_tagger.BertSequenceNetwork
-
-    .. automethod:: train_on_batch
-
-.. autoclass:: deeppavlov.models.bert.bert_sequence_tagger.BertSequenceTagger
-
-    .. automethod:: __call__
-
-.. autoclass:: deeppavlov.models.bert.bert_squad.BertSQuADModel
-
-    .. automethod:: __call__
-    .. automethod:: train_on_batch
-
-.. autoclass:: deeppavlov.models.bert.bert_squad.BertSQuADInferModel
-
-    .. automethod:: __call__
-
-.. autoclass:: deeppavlov.models.bert.bert_ranker.BertRankerModel
-
-    .. automethod:: __call__
-    .. automethod:: train_on_batch
-
-.. autoclass:: deeppavlov.models.bert.bert_ranker.BertSepRankerModel
-
-    .. automethod:: __call__
-    .. automethod:: train_on_batch
-
-.. autoclass:: deeppavlov.models.bert.bert_ranker.BertSepRankerPredictor
-
-    .. automethod:: __call__
\ No newline at end of file
diff --git a/docs/apiref/models/classifiers.rst b/docs/apiref/models/classifiers.rst
index fda32d79ff..618f28d837 100644
--- a/docs/apiref/models/classifiers.rst
+++ b/docs/apiref/models/classifiers.rst
@@ -9,11 +9,6 @@ deeppavlov.models.classifiers
 
     .. automethod:: __call__
 
-.. autoclass:: deeppavlov.models.classifiers.keras_classification_model.KerasClassificationModel
-    :members:
-
-    .. automethod:: __call__
-
 .. autoclass:: deeppavlov.models.classifiers.cos_sim_classifier.CosineSimilarityClassifier
     :members:
 
diff --git a/docs/apiref/models/elmo.rst b/docs/apiref/models/elmo.rst
deleted file mode 100644
index f3e2666488..0000000000
--- a/docs/apiref/models/elmo.rst
+++ /dev/null
@@ -1,6 +0,0 @@
-deeppavlov.models.elmo
-======================
-
-.. automodule:: deeppavlov.models.elmo
-
-.. autoclass:: deeppavlov.models.elmo.elmo.ELMo
diff --git a/docs/apiref/models/embedders.rst b/docs/apiref/models/embedders.rst
index b004dfa006..eb0b2b7507 100644
--- a/docs/apiref/models/embedders.rst
+++ b/docs/apiref/models/embedders.rst
@@ -1,27 +1,15 @@
 deeppavlov.models.embedders
 ============================
 
-.. autoclass:: deeppavlov.models.embedders.bow_embedder.BoWEmbedder
-
 .. autoclass:: deeppavlov.models.embedders.fasttext_embedder.FasttextEmbedder
 
     .. automethod:: __call__
    .. automethod:: __iter__
 
-.. autoclass:: deeppavlov.models.embedders.elmo_embedder.ELMoEmbedder
-
-    .. automethod:: __call__
-    .. automethod:: __iter__
-
-.. autoclass:: deeppavlov.models.embedders.glove_embedder.GloVeEmbedder
-
-    .. automethod:: __call__
-    .. automethod:: __iter__
-
 .. autoclass:: deeppavlov.models.embedders.tfidf_weighted_embedder.TfidfWeightedEmbedder
 
     .. automethod:: __call__
 
 .. autoclass:: deeppavlov.models.embedders.transformers_embedder.TransformersBertEmbedder
 
-    .. automethod:: __call__
\ No newline at end of file
+    .. automethod:: __call__
diff --git a/docs/apiref/models/entity_extraction.rst b/docs/apiref/models/entity_extraction.rst
new file mode 100644
index 0000000000..865b51e686
--- /dev/null
+++ b/docs/apiref/models/entity_extraction.rst
@@ -0,0 +1,19 @@
+deeppavlov.models.entity_extraction
+===================================
+
+.. autoclass:: deeppavlov.models.entity_extraction.ner_chunker.NerChunker
+
+    .. automethod:: __init__
+    .. automethod:: __call__
+
+.. autoclass:: deeppavlov.models.entity_extraction.entity_linking.EntityLinker
+
+    .. automethod:: __init__
+    .. automethod:: __call__
+
+.. autoclass:: deeppavlov.models.entity_extraction.entity_detection_parser.EntityDetectionParser
+
+    .. automethod:: __init__
+    .. automethod:: __call__
+
+.. autofunction:: deeppavlov.models.entity_extraction.entity_detection_parser.question_sign_checker
diff --git a/docs/apiref/models/entity_linking.rst b/docs/apiref/models/entity_linking.rst
deleted file mode 100644
index 96c8f15d6e..0000000000
--- a/docs/apiref/models/entity_linking.rst
+++ /dev/null
@@ -1,22 +0,0 @@
-deeppavlov.models.entity_linking
-================================
-
-.. autoclass:: deeppavlov.models.kbqa.entity_linking.NerChunker
-
-    .. automethod:: __init__
-    .. automethod:: __call__
-
-.. autoclass:: deeppavlov.models.kbqa.entity_linking.EntityLinker
-
-    .. automethod:: __init__
-    .. automethod:: __call__
-
-.. autoclass:: deeppavlov.models.kbqa.entity_detection_parser.EntityDetectionParser
-
-    .. automethod:: __init__
-    .. automethod:: __call__
-
-.. autoclass:: deeppavlov.models.kbqa.entity_detection_parser.QuestionSignChecker
-
-    .. automethod:: __init__
-    .. automethod:: __call__
diff --git a/docs/apiref/models/go_bot.rst b/docs/apiref/models/go_bot.rst
deleted file mode 100644
index 216b1f910f..0000000000
--- a/docs/apiref/models/go_bot.rst
+++ /dev/null
@@ -1,17 +0,0 @@
-deeppavlov.models.go_bot
-========================
-
-.. automodule:: deeppavlov.models.go_bot
-    :members:
-
-.. autoclass:: deeppavlov.models.go_bot.go_bot.GoalOrientedBot
-    :members:
-
-.. autoclass:: deeppavlov.models.go_bot.policy.policy_network.PolicyNetwork
-    :members:
-
-.. autoclass:: deeppavlov.models.go_bot.nlg.nlg_manager_interface.NLGManagerInterface
-    :members:
-
-.. autoclass:: deeppavlov.models.go_bot.nlu.nlu_manager_interface.NLUManagerInterface
-    :members:
diff --git a/docs/apiref/models/intent_catcher.rst b/docs/apiref/models/intent_catcher.rst
deleted file mode 100644
index d8605d420d..0000000000
--- a/docs/apiref/models/intent_catcher.rst
+++ /dev/null
@@ -1,8 +0,0 @@
-deeppavlov.models.intent_catcher
-================================
-
-.. autoclass:: deeppavlov.models.intent_catcher.intent_catcher.IntentCatcher
-
-    .. automethod:: __init__
-    .. automethod:: __call__
-
diff --git a/docs/apiref/models/kbqa.rst b/docs/apiref/models/kbqa.rst
index f873053bd5..8a327251cb 100644
--- a/docs/apiref/models/kbqa.rst
+++ b/docs/apiref/models/kbqa.rst
@@ -3,22 +3,17 @@ deeppavlov.models.kbqa
 
 .. automodule:: deeppavlov.models.kbqa
 
-.. autoclass:: deeppavlov.models.kbqa.query_generator.QueryGenerator
-
-    .. automethod:: __init__
-    .. automethod:: __call__
-
-.. autoclass:: deeppavlov.models.kbqa.query_generator_base.QueryGeneratorBase
+.. autoclass:: deeppavlov.models.kbqa.type_define.AnswerTypesExtractor
 
     .. automethod:: __init__
    .. automethod:: __call__
 
-.. autoclass:: deeppavlov.models.kbqa.query_generator_online.QueryGeneratorOnline
+.. autoclass:: deeppavlov.models.kbqa.query_generator.QueryGenerator
 
     .. automethod:: __init__
    .. automethod:: __call__
 
-.. autoclass:: deeppavlov.models.kbqa.rel_ranking_bert_infer.RelRankerBertInfer
+.. autoclass:: deeppavlov.models.kbqa.query_generator_base.QueryGeneratorBase
 
     .. automethod:: __init__
    .. automethod:: __call__
@@ -47,7 +42,3 @@ deeppavlov.models.kbqa
 
     .. automethod:: __init__
    .. automethod:: __call__
-
-.. autoclass:: deeppavlov.models.kbqa.wiki_parser_online.WikiParserOnline
-
-    .. automethod:: __init__
diff --git a/docs/apiref/models/morpho_tagger.rst b/docs/apiref/models/morpho_tagger.rst
deleted file mode 100644
index 8e73a7a9ce..0000000000
--- a/docs/apiref/models/morpho_tagger.rst
+++ /dev/null
@@ -1,27 +0,0 @@
-deeppavlov.models.morpho_tagger
-===============================
-
-.. autoclass:: deeppavlov.models.morpho_tagger.morpho_tagger.MorphoTagger
-    :members:
-
-    .. automethod:: __call__
-
-.. autofunction:: deeppavlov.models.morpho_tagger.common.predict_with_model
-
-.. autoclass:: deeppavlov.models.morpho_tagger.lemmatizer.UDPymorphyLemmatizer
-    :members:
-
-    .. automethod:: __call__
-
-.. autoclass:: deeppavlov.models.morpho_tagger.common.TagOutputPrettifier
-    :members:
-
-    .. automethod:: __call__
-
-.. autoclass:: deeppavlov.models.morpho_tagger.common.LemmatizedOutputPrettifier
-    :members:
-
-    .. automethod:: __call__
-
-
-
diff --git a/docs/apiref/models/multitask_bert.rst b/docs/apiref/models/multitask_bert.rst
deleted file mode 100644
index 1d52206270..0000000000
--- a/docs/apiref/models/multitask_bert.rst
+++ /dev/null
@@ -1,58 +0,0 @@
-deeppavlov.models.multitask_bert
-================================
-
-.. autoclass:: deeppavlov.dataset_readers.multitask_reader.MultiTaskReader
-
-.. autoclass:: deeppavlov.dataset_iterators.multitask_iterator.MultiTaskIterator
-
-    .. automethod:: gen_batches
-
-    .. automethod:: get_instances
-
-.. autoclass:: deeppavlov.models.multitask_bert.multitask_bert.MultiTaskBert
-
-    .. automethod:: train_on_batch
-
-    .. automethod:: __call__
-
-    .. automethod:: call
-
-.. autoclass:: deeppavlov.models.multitask_bert.multitask_bert.MTBertTask
-
-    .. automethod:: build
-
-    ..
automethod:: _init_graph - - .. automethod:: get_train_op - - .. automethod:: train_on_batch - - .. automethod:: get_sess_run_infer_args - - .. automethod:: get_sess_run_train_args - - .. automethod:: post_process_preds - -.. autoclass:: deeppavlov.models.multitask_bert.multitask_bert.MTBertSequenceTaggingTask - - .. automethod:: get_sess_run_infer_args - - .. automethod:: get_sess_run_train_args - - .. automethod:: post_process_preds - -.. autoclass:: deeppavlov.models.multitask_bert.multitask_bert.MTBertClassificationTask - - .. automethod:: get_sess_run_infer_args - - .. automethod:: get_sess_run_train_args - - .. automethod:: post_process_preds - -.. autoclass:: deeppavlov.models.multitask_bert.multitask_bert.MTBertReUser - - .. automethod:: __call__ - -.. autoclass:: deeppavlov.models.multitask_bert.multitask_bert.InputSplitter - - .. automethod:: __call__ diff --git a/docs/apiref/models/nemo.rst b/docs/apiref/models/nemo.rst deleted file mode 100644 index 27c2054336..0000000000 --- a/docs/apiref/models/nemo.rst +++ /dev/null @@ -1,32 +0,0 @@ -deeppavlov.models.nemo -====================== - -.. autoclass:: deeppavlov.models.nemo.asr.NeMoASR - - .. automethod:: __init__ - .. automethod:: __call__ - -.. autoclass:: deeppavlov.models.nemo.tts.NeMoTTS - - .. automethod:: __init__ - .. automethod:: __call__ - -.. autofunction:: deeppavlov.models.nemo.common.ascii_to_bytes_io - -.. autofunction:: deeppavlov.models.nemo.common.bytes_io_to_ascii - -.. autoclass:: deeppavlov.models.nemo.asr.AudioInferDataLayer - - .. automethod:: __init__ - -.. autoclass:: deeppavlov.models.nemo.tts.TextDataLayer - - .. automethod:: __init__ - -.. autoclass:: deeppavlov.models.nemo.vocoder.WaveGlow - - .. automethod:: __init__ - -.. autoclass:: deeppavlov.models.nemo.vocoder.GriffinLim - - .. automethod:: __init__ diff --git a/docs/apiref/models/ner.rst b/docs/apiref/models/ner.rst deleted file mode 100644 index 7a726aec5e..0000000000 --- a/docs/apiref/models/ner.rst +++ /dev/null @@ -1,4 +0,0 @@ -deeppavlov.models.ner -===================== - -.. autoclass:: deeppavlov.models.ner.network.NerNetwork diff --git a/docs/apiref/models/preprocessors.rst b/docs/apiref/models/preprocessors.rst index 5561f511db..d87aa1f250 100644 --- a/docs/apiref/models/preprocessors.rst +++ b/docs/apiref/models/preprocessors.rst @@ -1,16 +1,6 @@ deeppavlov.models.preprocessors =============================== -.. autoclass:: deeppavlov.models.preprocessors.assemble_embeddings_matrix.EmbeddingsMatrixAssembler - -.. autoclass:: deeppavlov.models.preprocessors.capitalization.CapitalizationPreprocessor - -.. autofunction:: deeppavlov.models.preprocessors.capitalization.process_word - -.. autoclass:: deeppavlov.models.preprocessors.capitalization.CharSplittingLowercasePreprocessor - -.. autoclass:: deeppavlov.models.preprocessors.char_splitter.CharSplitter - .. autoclass:: deeppavlov.models.preprocessors.dirty_comments_preprocessor.DirtyCommentsPreprocessor .. automethod:: __call__ @@ -19,14 +9,8 @@ deeppavlov.models.preprocessors .. autoclass:: deeppavlov.models.preprocessors.one_hotter.OneHotter -.. autoclass:: deeppavlov.models.preprocessors.random_embeddings_matrix.RandomEmbeddingsMatrix - -.. autoclass:: deeppavlov.models.preprocessors.russian_lemmatizer.PymorphyRussianLemmatizer - .. autoclass:: deeppavlov.models.preprocessors.sanitizer.Sanitizer -.. autoclass:: deeppavlov.models.preprocessors.siamese_preprocessor.SiamesePreprocessor - .. autofunction:: deeppavlov.models.preprocessors.str_lower.str_lower .. 
autoclass:: deeppavlov.models.preprocessors.str_token_reverser.StrTokenReverser diff --git a/docs/apiref/models/ranking.rst b/docs/apiref/models/ranking.rst deleted file mode 100644 index e9331289af..0000000000 --- a/docs/apiref/models/ranking.rst +++ /dev/null @@ -1,26 +0,0 @@ -deeppavlov.models.ranking -========================= - -Ranking classes. - -.. automodule:: deeppavlov.models.ranking.siamese_model - -.. autoclass:: deeppavlov.models.ranking.bilstm_siamese_network.BiLSTMSiameseNetwork - -.. autoclass:: deeppavlov.models.ranking.bilstm_gru_siamese_network.BiLSTMGRUSiameseNetwork - -.. autoclass:: deeppavlov.models.ranking.keras_siamese_model.KerasSiameseModel - -.. autoclass:: deeppavlov.models.ranking.mpm_siamese_network.MPMSiameseNetwork - -.. autoclass:: deeppavlov.models.ranking.siamese_model.SiameseModel - - .. automethod:: load - .. automethod:: save - .. automethod:: train_on_batch - .. automethod:: __call__ - .. automethod:: reset - -.. autoclass:: deeppavlov.models.ranking.siamese_predictor.SiamesePredictor - - diff --git a/docs/apiref/models/slotfill.rst b/docs/apiref/models/slotfill.rst deleted file mode 100644 index 3c66244996..0000000000 --- a/docs/apiref/models/slotfill.rst +++ /dev/null @@ -1,8 +0,0 @@ -deeppavlov.models.slotfill -========================== - -.. autoclass:: deeppavlov.models.slotfill.slotfill.DstcSlotFillingNetwork - -.. autoclass:: deeppavlov.models.slotfill.slotfill_raw.SlotFillingComponent - -.. autoclass:: deeppavlov.models.slotfill.slotfill_raw.RASA_SlotFillingComponent diff --git a/docs/apiref/models/squad.rst b/docs/apiref/models/squad.rst deleted file mode 100644 index 4de3ff5f07..0000000000 --- a/docs/apiref/models/squad.rst +++ /dev/null @@ -1,9 +0,0 @@ -deeppavlov.models.squad -===================================== -.. automodule:: deeppavlov.models.squad.squad - -.. autoclass:: deeppavlov.models.squad.squad.SquadModel - - .. automethod:: __call__ - .. automethod:: train_on_batch - .. automethod:: process_event diff --git a/docs/apiref/models/syntax_parser.rst b/docs/apiref/models/syntax_parser.rst deleted file mode 100644 index 3884daf8d2..0000000000 --- a/docs/apiref/models/syntax_parser.rst +++ /dev/null @@ -1,16 +0,0 @@ -deeppavlov.models.syntax_parser -=============================== - -.. autoclass:: deeppavlov.models.syntax_parser.network.BertSyntaxParser - - .. automethod:: __call__ - -.. autofunction:: deeppavlov.models.syntax_parser.network.gather_indexes - -.. autofunction:: deeppavlov.models.syntax_parser.network.biaffine_layer - -.. autofunction:: deeppavlov.models.syntax_parser.network.biaffine_attention - -.. autoclass:: deeppavlov.models.syntax_parser.joint.JointTaggerParser - - .. automethod:: __call__ \ No newline at end of file diff --git a/docs/apiref/models/tokenizers.rst b/docs/apiref/models/tokenizers.rst index beb530d43a..99e735acda 100644 --- a/docs/apiref/models/tokenizers.rst +++ b/docs/apiref/models/tokenizers.rst @@ -1,8 +1,6 @@ deeppavlov.models.tokenizers ============================ -.. autoclass:: deeppavlov.models.tokenizers.lazy_tokenizer.LazyTokenizer - .. autoclass:: deeppavlov.models.tokenizers.nltk_moses_tokenizer.NLTKMosesTokenizer .. automethod:: __call__ @@ -11,8 +9,6 @@ deeppavlov.models.tokenizers .. automethod:: __call__ -.. autoclass:: deeppavlov.models.tokenizers.ru_sent_tokenizer.RuSentTokenizer - .. autoclass:: deeppavlov.models.tokenizers.split_tokenizer.SplitTokenizer .. 
autoclass:: deeppavlov.models.tokenizers.spacy_tokenizer.StreamSpacyTokenizer diff --git a/docs/apiref/models/torch_bert.rst b/docs/apiref/models/torch_bert.rst index a13ec3e52f..32a550124c 100644 --- a/docs/apiref/models/torch_bert.rst +++ b/docs/apiref/models/torch_bert.rst @@ -31,10 +31,6 @@ deeppavlov.models.torch_bert .. automethod:: __call__ .. automethod:: train_on_batch -.. autoclass:: deeppavlov.models.torch_bert.torch_transformers_squad.TorchTransformersSquadInfer - - .. automethod:: __call__ - .. autoclass:: deeppavlov.models.torch_bert.torch_bert_ranker.TorchBertRankerModel .. automethod:: __call__ diff --git a/docs/apiref/models/vectorizers.rst b/docs/apiref/models/vectorizers.rst index 979dd6bb75..bf15afa101 100644 --- a/docs/apiref/models/vectorizers.rst +++ b/docs/apiref/models/vectorizers.rst @@ -6,14 +6,3 @@ deeppavlov.models.vectorizers :members: .. automethod:: __call__ - -.. autoclass:: deeppavlov.models.vectorizers.word_vectorizer.DictionaryVectorizer - :members: - - .. automethod:: __call__ - -.. autoclass:: deeppavlov.models.vectorizers.word_vectorizer.PymorphyVectorizer - :members: - - .. automethod:: __call__ - diff --git a/docs/apiref/skills.rst b/docs/apiref/skills.rst deleted file mode 100644 index bfb47a5e59..0000000000 --- a/docs/apiref/skills.rst +++ /dev/null @@ -1,12 +0,0 @@ -skills -====== -Skill classes. Skills are dialog models. - -.. automodule:: deeppavlov.skills - :members: - -.. toctree:: - :glob: - :caption: Skills - - skills/* \ No newline at end of file diff --git a/docs/apiref/skills/aiml_skill.rst b/docs/apiref/skills/aiml_skill.rst deleted file mode 100644 index 97e7e0ffce..0000000000 --- a/docs/apiref/skills/aiml_skill.rst +++ /dev/null @@ -1,5 +0,0 @@ -deeppavlov.skills.aiml_skill -============================ - -.. automodule:: deeppavlov.skills.aiml_skill.aiml_skill - :members: diff --git a/docs/apiref/skills/dsl_skill.rst b/docs/apiref/skills/dsl_skill.rst deleted file mode 100644 index 7bc6cf2fed..0000000000 --- a/docs/apiref/skills/dsl_skill.rst +++ /dev/null @@ -1,5 +0,0 @@ -deeppavlov.skills.dsl_skill -============================================ - -.. automodule:: deeppavlov.skills.dsl_skill.dsl_skill - :members: diff --git a/docs/apiref/skills/rasa_skill.rst b/docs/apiref/skills/rasa_skill.rst deleted file mode 100644 index 4dcd93f7ac..0000000000 --- a/docs/apiref/skills/rasa_skill.rst +++ /dev/null @@ -1,5 +0,0 @@ -deeppavlov.skills.rasa_skill -============================ - -.. 
automodule:: deeppavlov.skills.rasa_skill.rasa_skill - :members: diff --git a/docs/conf.py b/docs/conf.py index e9d5e42c9c..f885ffeb17 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -190,13 +190,13 @@ # -- Extension configuration ------------------------------------------------- -autodoc_mock_imports = ['aiml', 'bert_dp', 'bs4', 'faiss', 'fastText', 'fasttext', 'gensim', 'hdt', 'kenlm', 'librosa', - 'lxml', 'nemo', 'nemo_asr', 'nemo_tts', 'nltk', 'opt_einsum', 'rapidfuzz', 'rasa', - 'russian_tagsets', 'sacremoses', 'sortedcontainers', 'spacy', 'tensorflow', 'tensorflow_hub', - 'torch', 'transformers', 'udapi', 'ufal_udpipe', 'whapi', 'xeger'] +autodoc_mock_imports = ['bs4', 'fasttext', 'hdt', 'kenlm', 'lxml', 'navec', 'nltk', 'opt_einsum', 'rapidfuzz', + 'sacremoses', 'slovnet', 'sortedcontainers', 'spacy', 'torch', 'torchcrf', 'transformers', + 'udapi', 'whapi'] extlinks = { - 'config': (f'https://github.com/deepmipt/DeepPavlov/blob/{release}/deeppavlov/configs/%s', None) + 'config': (f'https://github.com/deepmipt/DeepPavlov/blob/{release}/deeppavlov/configs/%s', None), + 'dp_file': (f'https://github.com/deepmipt/DeepPavlov/blob/{release}/%s', None) } # -- Options for intersphinx extension --------------------------------------- diff --git a/docs/devguides/contribution_guide.rst b/docs/devguides/contribution_guide.rst index 9ff47ddf5d..83e6a1be43 100644 --- a/docs/devguides/contribution_guide.rst +++ b/docs/devguides/contribution_guide.rst @@ -51,6 +51,22 @@ How to contribute: git checkout -b what_my_code_does_branch +#. **Install DeepPavlov** in editable mode: + + .. code:: bash + + pip install -e . + + or + + .. code:: bash + + pip install -e .[docs,tests] + + In editable mode changes of the files in the repository directory will automatically reflect in your + python environment. The last command with ``[docs,tests]`` will install additional requirements to build + documentation and run tests. + #. **Write readable code** and keep it `PEP8 `_-ed, **add docstrings** and keep them consistent with the @@ -72,7 +88,15 @@ How to contribute: directory. #. Please, **update the documentation**, if you committed significant changes - to our code. + to our code. Make sure that documentation could be built after your changes + and check how it looks using: + + .. code:: bash + + cd docs + make html + + The built documentation will be added to ``docs/_build`` directory. Open it with your browser. #. 
**Commit your changes and push** your feature branch to your GitHub fork: diff --git a/docs/features/models/bert.rst b/docs/features/models/bert.rst index 9e68437742..a69605a5cb 100644 --- a/docs/features/models/bert.rst +++ b/docs/features/models/bert.rst @@ -22,19 +22,19 @@ There are several pre-trained BERT models released by Google Research, more deta We have trained BERT-base model for other languages and domains: - RuBERT, Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters: `[deeppavlov] `__, - `[deeppavlov_pytorch] `__ + `[deeppavlov_pytorch] `__ - SlavicBERT, Slavic (bg, cs, pl, ru), cased, 12-layer, 768-hidden, 12-heads, 180M parameters: `[deeppavlov] `__, - `[deeppavlov_pytorch] `__ + `[deeppavlov_pytorch] `__ - Conversational BERT, English, cased, 12-layer, 768-hidden, 12-heads, 110M parameters: `[deeppavlov] `__, - `[deeppavlov_pytorch] `__ + `[deeppavlov_pytorch] `__ - Conversational RuBERT, Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters: `[deeppavlov] `__, - `[deeppavlov_pytorch] `__ + `[deeppavlov_pytorch] `__ - Conversational DistilRuBERT, Russian, cased, 6-layer, 768-hidden, 12-heads, 135.4M parameters: `[deeppavlov_pytorch] `__ - Conversational DistilRuBERT-tiny, Russian, cased, 2-layer, 768-hidden, 12-heads, 107M parameters: `[deeppavlov_pytorch] `__ - Sentence Multilingual BERT, 101 languages, cased, 12-layer, 768-hidden, 12-heads, 180M parameters: `[deeppavlov] `__, - `[deeppavlov_pytorch] `__ + `[deeppavlov_pytorch] `__ - Sentence RuBERT, Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters: `[deeppavlov] `__, - `[deeppavlov_pytorch] `__ + `[deeppavlov_pytorch] `__ The ``deeppavlov_pytorch`` models are designed to be run with the `HuggingFace's Transformers `__ library. @@ -52,7 +52,7 @@ English cased version of BERT-base as initialization for English Conversational Conversational RuBERT was trained on OpenSubtitles [5]_, Dirty, Pikabu, and Social Media segment of Taiga corpus [8]_. We assembled new vocabulary for Conversational RuBERT model on this data and initialized model with RuBERT. -Conversational DistilRuBERT (6 transformer layers) and DistilRuBERT-tiny (2 transformer layers) were trained on the same data as Conversational RuBERT and highly inspired by DistilBERT [13]_. Namely, Distil* models (students) used pretrained Conversational RuBERT as teacher and linear combination of the following losses: +Conversational DistilRuBERT (6 transformer layers) and DistilRuBERT-tiny (2 transformer layers) were trained on the same data as Conversational RuBERT and highly inspired by DistilBERT [3]_. Namely, Distil* models (students) used pretrained Conversational RuBERT as teacher and linear combination of the following losses: 1. Masked language modeling loss (between student output logits for tokens and its true labels) 2. Kullback-Leibler divergence (between student and teacher output logits) @@ -92,31 +92,24 @@ you can use or modify a :config:`BERT embedder configuration ` -and :config:`NER Ontonotes ` configuration files. - BERT for Classification ----------------------- -:class:`~deeppavlov.models.bert.bert_classifier.BertClassifierModel` and :class:`~deeppavlov.models.torch_bert.torch_transformers_classifier.TorchTransformersClassifierModel` -provide easy to use solution for classification problem -using pre-trained BERT on TensorFlow and PyTorch correspondingly. +provides solution for classification problem using pre-trained BERT on PyTorch. 
One can use several pre-trained English, multi-lingual and Russian BERT models that are listed above. :class:`~deeppavlov.models.torch_bert.torch_transformers_classifier.TorchTransformersClassifierModel` -supports any Transformer-based model of `Transformers `. +also supports any Transformer-based model of `Transformers `. Two main components of BERT classifier pipeline in DeepPavlov are -:class:`~deeppavlov.models.preprocessors.bert_preprocessor.BertPreprocessor` on TensorFlow -(:class:`~deeppavlov.models.preprocessors.torch_transformers_preprocessor.TorchTransformersPreprocessor` on PyTorch) and -:class:`~deeppavlov.models.bert.bert_classifier.BertClassifierModel` on TensorFlow -(:class:`~deeppavlov.models.torch_bert.torch_transformers_classifier.TorchTransformersClassifierModel` on PyTorch). -Non-processed texts should be given to ``bert_preprocessor`` (or ``torch_transformers_preprocessor``) for tokenization on subtokens, +:class:`~deeppavlov.models.preprocessors.torch_transformers_preprocessor.TorchTransformersPreprocessor` and +:class:`~deeppavlov.models.torch_bert.torch_transformers_classifier.TorchTransformersClassifierModel`. +Non-processed texts should be given to ``torch_transformers_preprocessor`` for tokenization on subtokens, encoding subtokens with their indices and creating tokens and segment masks. In case of using one-hot encoded classes in the pipeline, set ``one_hot_labels`` to ``true``. -``bert_classifier`` and ``torch_bert_classifier`` have a dense layer of number of classes size upon pooled outputs of Transformer encoder, +``torch_transformers_classifier`` has a dense layer of number of classes size upon pooled outputs of Transformer encoder, it is followed by ``softmax`` activation (``sigmoid`` if ``multilabel`` parameter is set to ``true`` in config). @@ -124,62 +117,25 @@ BERT for Named Entity Recognition (Sequence Tagging) ---------------------------------------------------- Pre-trained BERT model can be used for sequence tagging. Examples of BERT application to sequence tagging -can be found :doc:`here `. The modules used for tagging -are :class:`~deeppavlov.models.bert.bert_sequence_tagger.BertSequenceTagger` on TensorFlow and -:class:`~deeppavlov.models.torch_bert.torch_transformers_sequence_tagger:TorchTransformersSequenceTagger` on PyTorch. +can be found :doc:`here `. The module used for tagging +is :class:`~deeppavlov.models.torch_bert.torch_transformers_sequence_tagger:TorchTransformersSequenceTagger`. The tags are obtained by applying a dense layer to the representation of -the first subtoken of each word. There is also an optional CRF layer on the top for TensorFlow implementation. -In the PyTorch implementation you can choose among different Transformers architectures by modifying the TRANSFORMER variable in the corresponding configuration files. +the first subtoken of each word. There is also an optional CRF layer on the top. +You can choose among different Transformers architectures by modifying the TRANSFORMER variable in the corresponding configuration files. The possible choices are DistilBert, Albert, Camembert, XLMRoberta, Bart, Roberta, Bert, XLNet, Flaubert, XLM. Multilingual BERT model allows to perform zero-shot transfer across languages. To use our 19 tags NER for over a hundred languages see :ref:`ner_multi_bert`. -BERT for Morphological Tagging ------------------------------- - -Since morphological tagging is also a sequence labeling task, it can be solved in a similar fashion. 
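To make the ``TRANSFORMER`` swap described in the sequence-tagging section above concrete, here is a minimal, hypothetical sketch: the ``ner_ontonotes_bert`` config name and the exact location of the ``TRANSFORMER`` variable inside ``metadata.variables`` are assumptions, so check the configuration file you actually use.

.. code:: python

    from deeppavlov import configs, train_model
    from deeppavlov.core.common.file import read_json

    # Read the raw config and override the Transformer backbone before it is parsed.
    # The variable name and its location are assumptions based on the text above.
    config = read_json(configs.ner.ner_ontonotes_bert)
    config['metadata']['variables']['TRANSFORMER'] = 'distilbert-base-uncased'

    # A different backbone comes with different weights, so the tagger is re-trained;
    # download=True fetches the data listed in the config's metadata.download section.
    ner_model = train_model(config, download=True)

After training, the resulting model is called on raw sentences in the same way as the pre-trained one.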
-The only difference is that we may use the last subtoken of each word in case word morphology -is mostly defined by its suffixes, not prefixes (that is the case for most Indo-European languages, -such as Russian, Spanish, German etc.). See :doc:`also `. - -BERT for Syntactic Parsing --------------------------- - -You can use BERT for syntactic parsing also. As most modern parsers, we use the biaffine model -over the embedding layer, which is the output of BERT. The model outputs the index of syntactic -head and the dependency type for each word. See :doc:`the parser documentation ` -for more information about model performance and algorithm. - BERT for Context Question Answering (SQuAD) ------------------------------------------- Context Question Answering on `SQuAD `__ dataset is a task of looking for an answer on a question in a given context. This task could be formalized as predicting answer start -and end position in a given context. :class:`~deeppavlov.models.bert.bert_squad.BertSQuADModel` on TensorFlow and -:class:`~deeppavlov.models.torch_bert.torch_transformers_squad:TorchTransformersSquad` on PyTorch use two linear +and end position in a given context. :class:`~deeppavlov.models.torch_bert.torch_transformers_squad:TorchTransformersSquad` on PyTorch uses two linear transformations to predict probability that current subtoken is start/end position of an answer. For details check :doc:`Context Question Answering documentation page `. -BERT for Ranking ----------------- -There are two main approaches in text ranking. The first one is interaction-based which is relatively accurate but -works slow and the second one is representation-based which is less accurate but faster [3]_. -The interaction-based ranking based on BERT is represented in the DeepPavlov with two main components -:class:`~deeppavlov.models.preprocessors.bert_preprocessor.BertRankerPreprocessor` on TensorFlow -(:class:`~deeppavlov.models.preprocessors.torch_transformers_preprocessor.TorchBertRankerPreprocessor` on PyTorch) -and :class:`~deeppavlov.models.bert.bert_ranker.BertRankerModel` on TensorFlow -(:class:`~deeppavlov.models.torch_bert.torch_bert_ranker.TorchBertRankerModel` on PyTorch) -and the representation-based ranking with components -:class:`~deeppavlov.models.preprocessors.bert_preprocessor.BertSepRankerPreprocessor` -and :class:`~deeppavlov.models.bert.bert_ranker.BertSepRankerModel` on TensorFlow. -Additional components -:class:`~deeppavlov.models.preprocessors.bert_preprocessor.BertSepRankerPredictorPreprocessor` -and :class:`~deeppavlov.models.bert.bert_ranker.BertSepRankerPredictor` (on TensorFlow) are for usage in the ``interact`` mode -where the task for ranking is to retrieve the best possible response from some provided response base with the help of -the trained model. Working examples with the trained models are given :doc:`here `. -Statistics are available :doc:`here `. - Using custom BERT in DeepPavlov ------------------------------- @@ -190,12 +146,12 @@ the :doc:`config ` file must be changed to match new BERT * download URL in the ``metadata.download.url`` part of the config * ``bert_config_file``, ``pretrained_bert`` in the BERT based Component. In case of PyTorch BERT, ``pretrained_bert`` can be assigned to string name of any Transformer-based model (e.g. ``"bert-base-uncased"``, ``"distilbert-base-uncased"``) and then ``bert_config_file`` is set to ``None``. -* ``vocab_file`` in the ``bert_preprocessor`` (``torch_transformers_preprocessor``). 
In case of PyTorch BERT, ``vocab_file`` can be assigned to +* ``vocab_file`` in the ``torch_transformers_preprocessor``. ``vocab_file`` can be assigned to string name of used pre-trained BERT (e.g. ``"bert-base-uncased"``). .. [1] Kuratov, Y., Arkhipov, M. (2019). Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language. arXiv preprint arXiv:1905.07213. .. [2] Arkhipov M., Trofimova M., Kuratov Y., Sorokin A. (2019). `Tuning Multilingual Transformers for Language-Specific Named Entity Recognition `__ . ACL anthology W19-3712. -.. [3] McDonald, R., Brokos, G. I., & Androutsopoulos, I. (2018). Deep relevance ranking using enhanced document-query interactions. arXiv preprint arXiv:1809.01682. +.. [3] Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108. .. [4] Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, and Shuzi Niu. DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset. IJCNLP 2017. .. [5] P. Lison and J. Tiedemann, 2016, OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016) .. [6] Justine Zhang, Ravi Kumar, Sujith Ravi, Cristian Danescu-Niculescu-Mizil. Proceedings of NAACL, 2016. @@ -205,4 +161,3 @@ the :doc:`config ` file must be changed to match new BERT .. [10] Williams A., Bowman S. (2018) XNLI: Evaluating Cross-lingual Sentence Representations. arXiv preprint arXiv:1809.05053 .. [11] S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning. (2015) A large annotated corpus for learning natural language inference. arXiv preprint arXiv:1508.05326 .. [12] N. Reimers, I. Gurevych (2019) Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv preprint arXiv:1908.10084 -.. [13] Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108. diff --git a/docs/features/models/classifiers.rst b/docs/features/models/classifiers.rst index 64905b4803..f1798c7f2c 100644 --- a/docs/features/models/classifiers.rst +++ b/docs/features/models/classifiers.rst @@ -6,9 +6,7 @@ which are implemented as a number of different **neural networks** or **sklearn Models can be used for binary, multi-class or multi-label classification. List of available classifiers (more info see below): -* **BERT classifier** (see :doc:`here `) builds BERT [8]_ architecture for classification problem on **TensorFlow** or on **PyTorch**. - -* **Keras classifier** (see :doc:`here `) builds neural network on Keras with tensorflow backend. +* **BERT classifier** (see :doc:`here `) builds BERT [4]_ architecture for classification problem on **PyTorch**. * **PyTorch classifier** (see :doc:`here `) builds neural network on PyTorch. @@ -27,24 +25,18 @@ Command line python -m deeppavlov install where ```` is a path to one of the :config:`provided config files ` -or its name without an extension, for example :config:`"intents_snips" `. +or its name without an extension, for example :config:`"insults_kaggle_bert" `. 
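Picking up the *Using custom BERT in DeepPavlov* notes above, the classifier config mentioned in the previous paragraph can also be pointed at a different pre-trained Transformer from Python. This is a minimal, hypothetical sketch: only the parameter names (``vocab_file``, ``pretrained_bert``, ``bert_config_file``) and the component names come from the description above, while the exact layout of ``insults_kaggle_bert`` is an assumption.

.. code:: python

    from deeppavlov import configs, train_model
    from deeppavlov.core.common.file import read_json

    config = read_json(configs.classifiers.insults_kaggle_bert)
    for component in config['chainer']['pipe']:
        # Point the preprocessor and the classifier at another Transformer by its string name.
        if component.get('class_name') == 'torch_transformers_preprocessor':
            component['vocab_file'] = 'distilbert-base-uncased'
        if component.get('class_name') == 'torch_transformers_classifier':
            component['pretrained_bert'] = 'distilbert-base-uncased'
            component['bert_config_file'] = None

    # A new backbone means the classification head has to be re-trained.
    model = train_model(config, download=True)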
To download pre-trained models, vocabs, embeddings on the dataset of interest one should run the following command providing corresponding name of the config file (see above) -or provide flag ``-d`` for commands like ``interact``, ``telegram``, ``train``, ``evaluate``.: +or provide flag ``-d`` for commands like ``interact``, ``train``, ``evaluate``: .. code:: bash python -m deeppavlov download where ```` is a path to one of the :config:`provided config files ` -or its name without an extension, for example :config:`"intents_snips" `. - -When using KerasClassificationModel for **Windows** platforms one have to set `KERAS_BACKEND` to `tensorflow`: - -.. code:: bash - - set "KERAS_BACKEND=tensorflow" +or its name without an extension, for example :config:`"insults_kaggle_bert" `. **INTERACT** One can run the following command to interact in command line interface with provided config: @@ -53,7 +45,7 @@ When using KerasClassificationModel for **Windows** platforms one have to set `K python -m deeppavlov interact [-d] where ```` is a path to one of the :config:`provided config files ` -or its name without an extension, for example :config:`"intents_snips" `. +or its name without an extension, for example :config:`"insults_kaggle_bert" `. With the optional ``-d`` parameter all the data required to run selected pipeline will be **downloaded**. **TRAIN** After preparing the config file (including change of dataset, pipeline elements or parameters) @@ -73,103 +65,106 @@ Then training can be run in the following way: python -m deeppavlov train where ```` is a path to one of the :config:`provided config files ` -or its name without an extension, for example :config:`"intents_snips" `. +or its name without an extension, for example :config:`"insults_kaggle_bert" `. With the optional ``-d`` parameter all the data required to run selected pipeline will be **downloaded**. Python code ~~~~~~~~~~~ One can also use these configs in python code. -When using ``KerasClassificationModel`` for **Windows** platform -one needs to set ``KERAS_BACKEND`` to ``tensorflow`` in the following way: - -.. code:: python - - import os - - os.environ["KERAS_BACKEND"] = "tensorflow" **INTERACT** To download required data one have to set ``download`` parameter to ``True``. Then one can build and interact a model from configuration file: .. code:: python - from deeppavlov import build_model, configs - - CONFIG_PATH = configs.classifiers.intents_snips # could also be configuration dictionary or string path or `pathlib.Path` instance + from deeppavlov import build_model - model = build_model(CONFIG_PATH, download=True) # in case of necessity to download some data + model = build_model('insults_kaggle_bert', download=True) # in case of necessity to download some data - model = build_model(CONFIG_PATH, download=False) # otherwise + model = build_model('insults_kaggle_bert', download=False) # otherwise - print(model(["What is the weather in Boston today?"])) + print(model(["You are dumb", "He lay flat on the brown, pine-needled floor of the forest"])) - >>> [['GetWeather']] + >>> ['Insult', 'Not Insult'] **TRAIN** Also training can be run in the following way: .. 
code:: python - from deeppavlov import train_model, configs + from deeppavlov import train_model - CONFIG_PATH = configs.classifiers.intents_snips # could also be configuration dictionary or string path or `pathlib.Path` instance + model = train_model('insults_kaggle_bert', download=True) # in case of necessity to download some data - model = train_model(CONFIG_PATH, download=True) # in case of necessity to download some data - - model = train_model(CONFIG_PATH, download=False) # otherwise + model = train_model('insults_kaggle_bert', download=False) # otherwise BERT models ----------- -BERT (Bidirectional Encoder Representations from Transformers) [8]_ is a Transformer pre-trained on masked language model +BERT (Bidirectional Encoder Representations from Transformers) [4]_ is a Transformer pre-trained on masked language model and next sentence prediction tasks. This approach showed state-of-the-art results on a wide range of NLP tasks in English. -**deeppavlov.models.bert.BertClassifierModel** (see :doc:`here `) provides easy to use +**deeppavlov.models.torch_bert.torch_transformers_classifier.TorchTransformersClassifierModel** (see :doc:`here `) provides easy to use solution for classification problem using pre-trained BERT. Several **pre-trained English, multi-lingual and Russian BERT** models are provided in :doc:`our BERT documentation `. Two main components of BERT classifier pipeline in DeepPavlov are -``deeppavlov.models.preprocessors.bert_preprocessor.BertPreprocessor`` on TensorFlow (or ``deeppavlov.models.preprocessors.torch_transformers_preprocessor.TorchTransformersPreprocessor`` on PyTorch) (see :doc:`here `) -and ``deeppavlov.models.bert.bert_classifier.BertClassifierModel`` on TensorFlow (or ``deeppavlov.models.torch_bert.torch_transformers_classifier.TorchTransformersClassifierModel`` on PyTorch) (see :doc:`here `). +``deeppavlov.models.preprocessors.torch_transformers_preprocessor.TorchTransformersPreprocessor`` +and ``deeppavlov.models.torch_bert.torch_transformers_classifier.TorchTransformersClassifierModel`` (see :doc:`here `). The ``deeppavlov.models.torch_bert.torch_transformers_classifier.TorchTransformersClassifierModel`` class supports any Transformer-based model. -Non-processed texts should be given to ``bert_preprocessor`` (``torch_transformers_preprocessor``) for tokenization on subtokens, +Non-processed texts should be given to ``torch_transformers_preprocessor`` for tokenization on subtokens, encoding subtokens with their indices and creating tokens and segment masks. If one processed classes to one-hot labels in pipeline, ``one_hot_labels`` should be set to ``true``. -``bert_classifier`` (``torch_bert_classifier``) has a dense layer of number of classes size upon pooled outputs of Transformer encoder, +``torch_transformers_classifier`` has a dense layer of number of classes size upon pooled outputs of Transformer encoder, it is followed by ``softmax`` activation (``sigmoid`` if ``multilabel`` parameter is set to ``true`` in config). -Neural Networks on Keras ------------------------- +Neural Networks on PyTorch +-------------------------- -**deeppavlov.models.classifiers.KerasClassificationModel** (see :doc:`here `) -contains a number of different neural network configurations for classification task. -Please, pay attention that each model has its own parameters that should be specified in config. -Information about parameters could be found :doc:`here `. -One of the available network configurations can be chosen in ``model_name`` parameter in config. 
-Below the list of available models is presented: +**deeppavlov.models.classifiers.TorchClassificationModel** (see :doc:`here `) +could be used for implementation of different neural network configurations for classification task. -* ``cnn_model`` -- Shallow-and-wide CNN [1]_ with max pooling after convolution, -* ``dcnn_model`` -- Deep CNN with number of layers determined by the given number of kernel sizes and filters, -* ``cnn_model_max_and_aver_pool`` -- Shallow-and-wide CNN [1]_ with max and average pooling concatenation after convolution, -* ``bilstm_model`` -- Bidirectional LSTM, -* ``bilstm_bilstm_model`` -- 2-layers bidirectional LSTM, -* ``bilstm_cnn_model`` -- Bidirectional LSTM followed by shallow-and-wide CNN, -* ``cnn_bilstm_model`` -- Shallow-and-wide CNN followed by bidirectional LSTM, -* ``bilstm_self_add_attention_model`` -- Bidirectional LSTM followed by self additive attention layer, -* ``bilstm_self_mult_attention_model`` -- Bidirectional LSTM followed by self multiplicative attention layer, -* ``bigru_model`` -- Bidirectional GRU model. +If you want to build your own architecture for **text classification** tasks, do the following: + .. code:: python -Neural Networks on PyTorch --------------------------- + from deeppavlov.models.classifiers.torch_classification_model import TorchTextClassificationModel -**deeppavlov.models.classifiers.TorchClassificationModel** (see :doc:`here `) -does not contain a zoo of models while it has an example of shallow-and-wide CNN (``swcnn_model``). -An instruction of how to build your own architecture on PyTorch one may find :doc:`here `. + class MyModel(TorchTextClassificationModel): + + def my_network_architecture(self, **kwargs): + model = + return model + +In the config file, assign ``"class_name": "module.path.to.my.model.file:MyModel"`` +and ``"model_name": "my_network_architecture"`` in the dictionary with the main model. + +If you want to build your own **PyTorch**-based model for **some other NLP** task, do the following: + + .. code:: python + + from deeppavlov.core.models.torch_model import TorchModel + + class MyModel(TorchModel): + + def train_on_batch(x, y, *args, **kwargs): + + return loss + + def __call__(data, *args, **kwargs): + + return predictions + + def my_network_architecture(self, **kwargs): + model = + return model + +In the config file, assign ``"class_name": "module.path.to.my.model.file:MyModel"`` +and ``"model_name": "my_network_architecture"`` in the dictionary with the main model. Sklearn models -------------- @@ -188,62 +183,9 @@ Therefore, for sklearn component classifier one should set ``ensure_list_output` Pre-trained models ------------------ -We also provide with **pre-trained models** for classification on DSTC 2 dataset, SNIPS dataset, "AG News" dataset, +We also provide with **pre-trained models** for classification on "AG News" dataset, "Detecting Insults in Social Commentary", Twitter sentiment in Russian dataset. -`DSTC 2 dataset `__ does not initially contain information about **intents**, -therefore, ``Dstc2IntentsDatasetIterator`` (``deeppavlov/dataset_iterators/dstc2_intents_interator.py``) instance -extracts artificial intents for each user reply using information from acts and slots. - -Below we give several examples of intent construction: - - System: "Hello, welcome to the Cambridge restaurant system. You can - ask for restaurants by area, price range or food type. How may I - help you?" - - User: "cheap restaurant" - -In the original dataset this user reply has characteristics - -.. 
code:: bash - - "goals": {"pricerange": "cheap"}, - "db_result": null, - "dialog-acts": [{"slots": [["pricerange", "cheap"]], "act": "inform"}]} - -This message contains only one intent: ``inform_pricerange``. - - User: "thank you good bye", - -In the original dataset this user reply has characteristics - -.. code:: bash - - "goals": {"food": "dontcare", "pricerange": "cheap", "area": "south"}, - "db_result": null, - "dialog-acts": [{"slots": [], "act": "thankyou"}, {"slots": [], "act": "bye"}]} - -This message contains two intents ``(thankyou, bye)``. Train, valid and -test division is the same as on web-site. - -`SNIPS dataset `__ -contains **intent classification** task for 7 intents (approximately 2.4 -samples per intent): - -- GetWeather -- BookRestaurant -- PlayMusic -- AddToPlaylist -- RateBook -- SearchScreeningEvent -- SearchCreativeWork - -Initially, classification model on SNIPS dataset [7]_ was trained only as an -example of usage that is why we provide pre-trained model for SNIPS with -embeddings trained on DSTC-2 dataset that is not the best choice for -this task. Train set is divided to train and validation sets to -illustrate ``basic_classification_iterator`` work. - `Detecting Insults in Social Commentary dataset `__ contains binary classification task for **detecting insults** for participants of conversation. Train, valid and test division is the same @@ -257,7 +199,7 @@ and the train set is the rest. `Twitter mokoron dataset `__ contains **sentiment classification** of Russian tweets for positive and negative -replies [2]_. It was automatically labeled. +replies [1]_. It was automatically labeled. Train, valid and test division is made by hands (Stratified division: 1/5 from all dataset for test set with 42 seed, then 1/5 from the rest for validation set with 42 seed). Two provided pre-trained @@ -291,69 +233,22 @@ of sentences. Each sentence were initially labelled with floating point value fr the floating point labels are converted to integer labels according to the intervals `[0, 0.2], (0.2, 0.4], (0.4, 0.6], (0.6, 0.8], (0.8, 1.0]` corresponding to `very negative`, `negative`, `neutral`, `positive`, `very positive` classes. -`Yelp Reviews `__ contains 5-classes **sentiment classification** of product reviews. -The labels are `1`, `2`, `3`, `4`, `5` corresponding to `very negative`, `negative`, `neutral`, `positive`, `very positive` classes. -The reviews are long enough (cut up to 200 subtokens). 
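The validation and test scores in the table below can be reproduced for any of the listed configs by evaluating the downloaded pre-trained model on the corresponding data; a minimal sketch, using ``insults_kaggle_bert`` as the example config:

.. code:: python

    from deeppavlov import evaluate_model

    # download=True fetches the pre-trained model and the dataset before evaluation.
    metrics = evaluate_model('insults_kaggle_bert', download=True)
    print(metrics)  # metrics on the validation and test splits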
- +------------------+--------------------+------+-------------------------------------------------------------------------------------------------+-------------+--------+--------+-----------+ | Task | Dataset | Lang | Model | Metric | Valid | Test | Downloads | +==================+====================+======+=================================================================================================+=============+========+========+===========+ -| 28 intents | `DSTC 2`_ | En | :config:`DSTC 2 emb ` | Accuracy | 0.7613 | 0.7733 | 800 Mb | -+ + + +-------------------------------------------------------------------------------------------------+ +--------+--------+-----------+ -| | | | :config:`Wiki emb ` | | 0.9629 | 0.9617 | 8.5 Gb | -+ + + +-------------------------------------------------------------------------------------------------+ +--------+--------+-----------+ -| | | | :config:`BERT ` | | 0.9673 | 0.9636 | 800 Mb | -+------------------+--------------------+ +-------------------------------------------------------------------------------------------------+-------------+--------+--------+-----------+ -| 7 intents | `SNIPS-2017`_ [7]_ | | :config:`DSTC 2 emb ` | F1-macro | 0.8591 | -- | 800 Mb | -+ + + +-------------------------------------------------------------------------------------------------+ +--------+--------+-----------+ -| | | | :config:`Wiki emb ` | | 0.9820 | -- | 8.5 Gb | -+ + + +-------------------------------------------------------------------------------------------------+ +--------+--------+-----------+ -| | | | :config:`Tfidf + SelectKBest + PCA + Wiki emb ` | | 0.9673 | -- | 8.6 Gb | -+ + + +-------------------------------------------------------------------------------------------------+ +--------+--------+-----------+ -| | | | :config:`Wiki emb weighted by Tfidf ` | | 0.9786 | -- | 8.5 Gb | -+------------------+--------------------+ +-------------------------------------------------------------------------------------------------+-------------+--------+--------+-----------+ -| Insult detection | `Insults`_ | | :config:`Reddit emb ` | ROC-AUC | 0.9263 | 0.8556 | 6.2 Gb | -+ + + +-------------------------------------------------------------------------------------------------+ +--------+--------+-----------+ -| | | | :config:`English BERT ` | | 0.9255 | 0.8612 | 1200 Mb | -+ + + +-------------------------------------------------------------------------------------------------+ +--------+--------+-----------+ -| | | | :config:`English Conversational BERT ` | | 0.9389 | 0.8941 | 1200 Mb | -+ + + +-------------------------------------------------------------------------------------------------+ +--------+--------+-----------+ -| | | | :config:`English BERT on PyTorch ` | | 0.9329 | 0.877 | 1.1 Gb | -+------------------+--------------------+ +-------------------------------------------------------------------------------------------------+-------------+--------+--------+-----------+ -| 5 topics | `AG News`_ | | :config:`Wiki emb ` | Accuracy | 0.8922 | 0.9059 | 8.5 Gb | -+------------------+--------------------+ +-------------------------------------------------------------------------------------------------+-------------+--------+--------+-----------+ -| Intent |`Yahoo-L31`_ | | :config:`Yahoo-L31 on conversational BERT ` | ROC-AUC | 0.9436 | -- | 1200 Mb | +| Insult detection | `Insults`_ | En | :config:`English BERT ` | ROC-AUC | 0.9327 | 0.8602 | 1.1 Gb | +------------------+--------------------+ 
+-------------------------------------------------------------------------------------------------+-------------+--------+--------+-----------+ -| Sentiment |`SST`_ | | :config:`5-classes SST on conversational BERT ` | Accuracy | 0.6456 | 0.6715 | 400 Mb | -+ + + +-------------------------------------------------------------------------------------------------+ +--------+--------+-----------+ -| | | | :config:`5-classes SST on multilingual BERT ` | | 0.5738 | 0.6024 | 660 Mb | -+ + + +-------------------------------------------------------------------------------------------------+ +--------+--------+-----------+ -| | | | :config:`3-classes SST SWCNN on PyTorch ` | | 0.7379 | 0.6312 | 4.3 Mb | -+ +--------------------+ +-------------------------------------------------------------------------------------------------+ +--------+--------+-----------+ -| |`Yelp`_ | | :config:`5-classes Yelp on conversational BERT ` | | 0.6925 | 0.6842 | 400 Mb | -+ + + +-------------------------------------------------------------------------------------------------+ +--------+--------+-----------+ -| | | | :config:`5-classes Yelp on multilingual BERT ` | | 0.5896 | 0.5874 | 660 Mb | +| Sentiment |`SST`_ | | :config:`5-classes SST on conversational BERT ` | Accuracy | 0.6293 | 0.6626 | 1.1 Gb | +------------------+--------------------+------+-------------------------------------------------------------------------------------------------+-------------+--------+--------+-----------+ -| Sentiment |`Twitter mokoron`_ | Ru | :config:`RuWiki+Lenta emb w/o preprocessing ` | | 0.9965 | 0.9961 | 6.2 Gb | -+ + + +-------------------------------------------------------------------------------------------------+ +--------+--------+-----------+ -| | | | :config:`RuWiki+Lenta emb with preprocessing ` | | 0.7823 | 0.7759 | 6.2 Gb | +| Sentiment |`Twitter mokoron`_ | Ru | :config:`RuWiki+Lenta emb w/o preprocessing ` | F1-macro | 0.9965 | 0.9961 | 6.2 Gb | + +--------------------+ +-------------------------------------------------------------------------------------------------+-------------+--------+--------+-----------+ -| |`RuSentiment`_ | | :config:`RuWiki+Lenta emb ` | F1-weighted | 0.6541 | 0.7016 | 6.2 Gb | -+ + + +-------------------------------------------------------------------------------------------------+ +--------+--------+-----------+ -| | | | :config:`Twitter emb super-convergence ` [6]_ | | 0.7301 | 0.7576 | 3.4 Gb | -+ + + +-------------------------------------------------------------------------------------------------+ +--------+--------+-----------+ -| | | | :config:`ELMo ` | | 0.7519 | 0.7875 | 700 Mb | +| |`RuSentiment`_ | | :config:`Multi-language BERT ` | F1-weighted | 0.6787 | 0.7005 | 1.3 Gb | + + + +-------------------------------------------------------------------------------------------------+ +--------+--------+-----------+ -| | | | :config:`Multi-language BERT ` | | 0.6809 | 0.7193 | 1900 Mb | -+ + + +-------------------------------------------------------------------------------------------------+ +--------+--------+-----------+ -| | | | :config:`Conversational RuBERT ` | | 0.7548 | 0.7742 | 657 Mb | -+------------------+--------------------+ +-------------------------------------------------------------------------------------------------+-------------+--------+--------+-----------+ -| Intent |Ru like`Yahoo-L31`_ | | :config:`Conversational vs Informational on ELMo ` | ROC-AUC | 0.9412 | -- | 700 Mb | +| | | | :config:`Conversational RuBERT ` | | 0.739 | 0.7724 | 1.5 Gb | 
+------------------+--------------------+------+-------------------------------------------------------------------------------------------------+-------------+--------+--------+-----------+ .. _`DSTC 2`: http://camdial.org/~mh521/dstc/ -.. _`SNIPS-2017`: https://github.com/snipsco/nlu-benchmark/tree/master/2017-06-custom-intent-engines .. _`Insults`: https://www.kaggle.com/c/detecting-insults-in-social-commentary .. _`AG News`: https://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html .. _`Twitter mokoron`: http://study.mokoron.com/ @@ -362,7 +257,6 @@ The reviews are long enough (cut up to 200 subtokens). .. _`Yahoo-L31`: https://webscope.sandbox.yahoo.com/catalog.php?datatype=l .. _`Yahoo-L6`: https://webscope.sandbox.yahoo.com/catalog.php?datatype=l .. _`SST`: https://nlp.stanford.edu/sentiment/index.html -.. _`Yelp`: https://www.yelp.com/dataset GLUE Benchmark -------------- @@ -422,60 +316,22 @@ Then training process can be run in the same way: python -m deeppavlov train -Comparison ----------- - -The comparison of the presented model is given on **SNIPS** dataset [7]_. The -evaluation of model scores was conducted in the same way as in [3]_ to -compare with the results from the report of the authors of the dataset. -The results were achieved with tuning of parameters and embeddings -trained on Reddit dataset. - -+------------------------+-----------------+------------------+---------------+--------------+--------------+----------------------+------------------------+ -| Model | AddToPlaylist | BookRestaurant | GetWheather | PlayMusic | RateBook | SearchCreativeWork | SearchScreeningEvent | -+========================+=================+==================+===============+==============+==============+======================+========================+ -| api.ai | 0.9931 | 0.9949 | 0.9935 | 0.9811 | 0.9992 | 0.9659 | 0.9801 | -+------------------------+-----------------+------------------+---------------+--------------+--------------+----------------------+------------------------+ -| ibm.watson | 0.9931 | 0.9950 | 0.9950 | 0.9822 | 0.9996 | 0.9643 | 0.9750 | -+------------------------+-----------------+------------------+---------------+--------------+--------------+----------------------+------------------------+ -| microsoft.luis | 0.9943 | 0.9935 | 0.9925 | 0.9815 | 0.9988 | 0.9620 | 0.9749 | -+------------------------+-----------------+------------------+---------------+--------------+--------------+----------------------+------------------------+ -| wit.ai | 0.9877 | 0.9913 | 0.9921 | 0.9766 | 0.9977 | 0.9458 | 0.9673 | -+------------------------+-----------------+------------------+---------------+--------------+--------------+----------------------+------------------------+ -| snips.ai | 0.9873 | 0.9921 | 0.9939 | 0.9729 | 0.9985 | 0.9455 | 0.9613 | -+------------------------+-----------------+------------------+---------------+--------------+--------------+----------------------+------------------------+ -| recast.ai | 0.9894 | 0.9943 | 0.9910 | 0.9660 | 0.9981 | 0.9424 | 0.9539 | -+------------------------+-----------------+------------------+---------------+--------------+--------------+----------------------+------------------------+ -| amazon.lex | 0.9930 | 0.9862 | 0.9825 | 0.9709 | 0.9981 | 0.9427 | 0.9581 | -+------------------------+-----------------+------------------+---------------+--------------+--------------+----------------------+------------------------+ 
-+------------------------+-----------------+------------------+---------------+--------------+--------------+----------------------+------------------------+ -| Shallow-and-wide CNN | **0.9956** | **0.9973** | **0.9968** | **0.9871** | **0.9998** | **0.9752** | **0.9854** | -+------------------------+-----------------+------------------+---------------+--------------+--------------+----------------------+------------------------+ - How to improve the performance ------------------------------ -- One can use FastText [4]_ to train embeddings that are better suited +- One can use FastText [2]_ to train embeddings that are better suited for considered datasets. - One can use some custom preprocessing to clean texts. -- One can use ELMo [5]_ or BERT [8]_. +- One can use ELMo [3]_ or BERT [4]_. - All the parameters should be tuned on the validation set. References ---------- -.. [1] Kim Y. Convolutional neural networks for sentence classification //arXiv preprint arXiv:1408.5882. – 2014. - -.. [2] Ю. В. Рубцова. Построение корпуса текстов для настройки тонового классификатора // Программные продукты и системы, 2015, №1(109), –С.72-78 - -.. [3] https://www.slideshare.net/KonstantinSavenkov/nlu-intent-detection-benchmark-by-intento-august-2017 - -.. [4] P. Bojanowski\ *, E. Grave*, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information. - -.. [5] Peters, Matthew E., et al. "Deep contextualized word representations." arXiv preprint arXiv:1802.05365 (2018). +.. [1] Ю. В. Рубцова. Построение корпуса текстов для настройки тонового классификатора // Программные продукты и системы, 2015, №1(109), –С.72-78 -.. [6] Smith L. N., Topin N. Super-convergence: Very fast training of residual networks using large learning rates. – 2018. +.. [2] P. Bojanowski\ *, E. Grave*, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information. -.. [7] Coucke A. et al. Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces //arXiv preprint arXiv:1805.10190. – 2018. +.. [3] Peters, Matthew E., et al. "Deep contextualized word representations." arXiv preprint arXiv:1802.05365 (2018). -.. [8] Devlin J. et al. Bert: Pre-training of deep bidirectional transformers for language understanding //arXiv preprint arXiv:1810.04805. – 2018. +.. [4] Devlin J. et al. Bert: Pre-training of deep bidirectional transformers for language understanding //arXiv preprint arXiv:1810.04805. – 2018. diff --git a/docs/features/models/entity_extraction.rst b/docs/features/models/entity_extraction.rst new file mode 100644 index 0000000000..8b5fa7ba9c --- /dev/null +++ b/docs/features/models/entity_extraction.rst @@ -0,0 +1,107 @@ +Entity Extraction +======================================== +Entity Detection is the task of identifying entity mentions in text with corresponding entity types. +Entity Detection configs are available in :config:`English ` and :config:`Russian ` languages. These configs support entity detection in texts longer than 512 tokens. + +Use the model +------------- + +Pre-trained model can be used for inference from both Command Line Interface (CLI) and Python. Before using the model make sure that all required packages are installed using the command: + +For English version: + +.. code:: bash + + python -m deeppavlov install entity_detection_en + +To use a pre-trained model from CLI use the following command: + +.. 
code:: bash + + python -m deeppavlov interact entity_detection_en -d + >>> Forrest Gump is a comedy-drama film directed by Robert Zemeckis and written by Eric Roth. + >>> ([['forrest gump', 'robert zemeckis', 'eric roth']], [[(0, 12), (48, 63), (79, 88)]], [[[0, 1], [10, 11], [15, 16]]], [['WORK_OF_ART', 'PERSON', 'PERSON']], [[(0, 89)]], [['Forrest Gump is a comedy-drama film directed by Robert Zemeckis and written by Eric Roth.']], [[0.8997, 0.9979, 0.9979]]) + +The output elements: + +* entity substrings +* entity offsets (indices of start and end symbols of entities in text) +* entity positions (indices of entity tokens in text) +* entity tags +* sentences offsets +* list of sentences in text +* confidences of the detected entities + +For Russian version: + +.. code:: bash + + python -m deeppavlov install entity_linking_ru + +To use a pre-trained model from CLI use the following command: + +.. code:: bash + + python -m deeppavlov interact entity_linking_ru -d + >>> Москва — столица России, центр Центрального федерального округа и центр Московской области. + >>> ([['москва', 'россии', 'центрального федерального округа', 'московской области']], [[(0, 6), (17, 23), (31, 63), (72, 90)]], [[[0], [3], [6, 7, 8], [11, 12]]], [['CITY', 'COUNTRY', 'LOC', 'LOC']], [[(0, 91)]], [['Москва — столица России, центр Центрального федерального округа и центр Московской области.']], [[0.8359, 0.938, 0.9917, 0.9803]]) + +Entity Detection model can be used from Python using the following code: + +.. code:: python + + from deeppavlov import configs, build_model + + ed = build_model(configs.entity_extraction.entity_detection_en, download=True) + ed(['Forrest Gump is a comedy-drama film directed by Robert Zemeckis and written by Eric Roth.']) + +Entity Linking is the task of finding knowledge base entity ids for entity mentions in text. Entity Linking in DeepPavlov supports Wikidata and Wikipedia (for :config:`English ` and :config:`Russian `). The Entity Linking component performs the following steps: + +* extraction of candidate entities from SQLite database; +* candidate entities sorting by entity tags (if entity tags are provided); +* ranking of candidate entities by connections in Wikidata knowledge graph of candidate entities for different mentions; +* candidate entities ranking by context and descriptions using Transformer model `bert-small `__ in English config and `distilrubert-tiny `__ in Russian config. + +Entity Linking models in DeepPavlov are lightweight: the English version requires 2.4 Gb RAM and 1.2 Gb GPU, the Russian version 2.2 Gb RAM and 1.1 Gb GPU. + +Entity Extraction configs perform Entity Detection followed by Entity Linking of the extracted entity mentions. +Entity Extraction configs are available for :config:`English ` and :config:`Russian `. + +Use the model +------------- + +For English version: + +.. code:: bash + + python -m deeppavlov install entity_extraction_en + +To use a pre-trained model from CLI use the following command: + +.. code:: bash + + python -m deeppavlov interact entity_extraction_en -d + >>> Forrest Gump is a comedy-drama film directed by Robert Zemeckis and written by Eric Roth. + >>> (['forrest gump', 'robert zemeckis', 'eric roth'], ['WORK_OF_ART', 'PERSON', 'PERSON'], [(0, 12), (48, 63), (79, 88)], ['Q134773', 'Q187364', 'Q942932'], [(1.0, 110, 1.0), (1.0, 73, 1.0), (1.0, 37, 0.95)], ['Forrest Gump', 'Robert Zemeckis', 'Eric Roth']) + +For Russian version: + +.. code:: bash + + python -m deeppavlov install entity_extraction_ru + +To use a pre-trained model from CLI use the following command: + +.. 
code:: bash + + python -m deeppavlov interact entity_extraction_ru -d + >>> Москва — столица России, центр Центрального федерального округа и центр Московской области. + >>> (['москва', 'россии', 'центрального федерального округа', 'московской области'], ['CITY', 'COUNTRY', 'LOC', 'LOC'], [(0, 6), (17, 23), (31, 63), (72, 90)], ['Q649', 'Q159', 'Q190778', 'Q1697'], [(1.0, 134, 1.0), (1.0, 203, 1.0), (0.97, 24, 0.28), (0.9, 30, 1.0)], ['Москва', 'Россия', 'Центральный федеральный округ', 'Московская область']) + +Entity Linking model can be used from Python using the following code: + +.. code:: python + + from deeppavlov import configs, build_model + + entity_extraction = build_model(configs.kbqa.entity_extraction_en, download=True) + entity_extraction(['Forrest Gump is a comedy-drama film directed by Robert Zemeckis and written by Eric Roth.']) diff --git a/docs/features/models/entity_linking.rst b/docs/features/models/entity_linking.rst deleted file mode 100644 index 90b2753777..0000000000 --- a/docs/features/models/entity_linking.rst +++ /dev/null @@ -1,55 +0,0 @@ -Entity Linking -======================================== - -Entity linking is the task of mapping words from text (e.g. names of persons, locations and organizations) to entities from the target knowledge base (Wikidata in our case). - -Entity Linking systems are available for English and Russian languages. - -Entity Linking component performs the following steps: - -* the substring, detected with :config:`NER (English) ` or :config:`NER (Russian) `, is fed to TfidfVectorizer and the resulting sparse vector is converted to dense one -* `Faiss `__ library is used to find k nearest neighbours for tf-idf vector in the matrix where rows correspond to tf-idf vectors of words in entity titles -* entities are ranked by number of relations in Wikidata (number of outgoing edges of nodes in the knowledge graph) -* :config:`BERT (English) ` or :config:`BERT (Russian) ` is used for entities ranking by entity description and by sentence that mentions the entity - -Use the model -------------- - -Pre-trained model can be used for inference from both Command Line Interface (CLI) and Python. Before using the model make sure that all required packages are installed using the command: - -For English version: - -.. code:: bash - - python -m deeppavlov install entity_linking_eng - -To use a pre-trained model from CLI use the following command: - -.. code:: bash - - python -m deeppavlov interact entity_linking_eng -d - >>> The city stands on the River Thames in the south-east of England, at the head of its 50-mile (80 km) estuary leading to the North Sea. - >>> (['the river thames', 'the north sea', 'england'], [[4, 5, 6], [30, 31, 32], [13]], ['Q19686', 'Q1693', 'Q21']) - -For Russian version: - -.. code:: bash - - python -m deeppavlov install entity_linking_rus - -To use a pre-trained model from CLI use the following command: - -.. code:: bash - - python -m deeppavlov interact entity_linking_rus -d - >>> Москва — столица России, город федерального значения, административный центр Центрального федерального округа и центр Московской области. - >>> (['москва', 'россии', 'центрального федерального округа', 'московской области'], [[0], [3], [11, 12, 13], [16, 17]], ['Q649', 'Q159', 'Q190778', 'Q1749']) - -Entity Linking model can be used from Python using the following code: - -.. 
code:: python - - from deeppavlov import configs, build_model - - el_model = build_model(configs.kbqa.entity_linking_rus, download=True) - el_model(['Москва — столица России, город федерального значения, административный центр Центрального федерального округа и центр Московской области.']) diff --git a/docs/features/models/intent_catcher.rst b/docs/features/models/intent_catcher.rst deleted file mode 100644 index f14d1c0213..0000000000 --- a/docs/features/models/intent_catcher.rst +++ /dev/null @@ -1,83 +0,0 @@ -Intent Catcher -############## - -Overview -******** -Intent Catcher is an NLP component used for intent detection in the Conversational AI systems. - -It consists of an embedder, which is a Transformer model, and a number of dense layers, that are fitted upon provided embeddings. The current provided embeddings are: Universal Sentence Encoder [1]_, and it's larger version. - -Intent Catcher has been originally designed for the high-level intent detection as part of the `DREAM Socialbot `_ that was built by DeepPavlov team for Alexa Prize 3. - -Goals -===== -Typical approach for building ML-based intent classification is based on providing a relatively large number of examples for each of the intents. This might make sense when a number of intents is relatively small and there is enough data (e.g., a small internal organizational chatbot) but is questionable when the number of intents is large and amount of available data is relatively small. - -For Alexa Prize 3, typical approach didn't work. Alexa Prize socialbots are expected to react a wide number of user intents in the open domain. The team needed to have a simple and fast way to add more intents, and add a relatively small number of examples for each new intent. Using regular expressions alone wouldn't be useful. But they could be used for up-sampling. - -Intent Catcher was designed around idea that by adding an additional cost of requiring basic knowledge of Regular Expressions, users would be able to provide a smaller number of examples in RegEx format to enable up-sampling. In addition to that, it turned out that using RegEx directly, in addition to the up-sampled dataset was useful, too. Finally, there was need to check punctuation as a useful way to distinguish statements from questions and the like. - -Features -******** -* Up-sampling using RegEx-based format -* Direct RegEx-based pattern matching -* Additional checks for punctuation - -How Do I: Train My Intent Classifier -************************************ - -Dataset construction -==================== - -Dataset can be constructed in 2 ways: listing number of intents and regular expressions in .json, or just a usual .csv format. -The json format is down below: - -.. code:: json - - { - "intent_1": ["regexp1", "regexp2"] - } - -To use data in this format, don't forget to add ``intent_catcher_reader`` as a dataset_reader in the config of model. - -Train and evaluate model -======================== - -All the embeddings come pre-trained, and there is no need to install them. Though, for both Command Line Interface (CLI) and Python it is necessary to install dependences first. -To do so, run: - -.. code:: bash - - python -m deeppavlov install intent_catcher - -To use a pre-trained model from CLI use the following command: - -.. code:: bash - - python -m deeppavlov interact intent_catcher -d - -where ``intent_catcher`` is the name of the config. 
- -The provided config example is :config:`intent_catcher ` - - -How Do I: Integrate Intent Catcher into DeepPavlov Deepy -******************************************************** - -To integrate your Intent Catcher-based intent classifier into your Multiskill AI Assistant built using DeepPavlov Conversational AI Stack, follow the following instructions: - -1. Clone `Deepy repository `_ -2. Replace ``docker-compose.yml`` in the root of the repository and ``pipeline_conf.json`` in the ``/agent/`` subdirectory with the corresponding files from the `deepy_adv `_ **Deepy Distribution** -3. Clone the `Tutorial Notebook `_ -4. Change its ``intents`` based on your project needs with your custom **intents** -5. Train the Intent Catcher model in your copy of the Tutorial Notebook -6. Download and put saved data from your copy of the Tutorial Notebook into the `Intent Catcher `_ -7. [Optional] Unless you need a Chit-Chat skill remove `it `_ from at both the ``/agent/pipeline_conf.json`` and from ``docker-compose.yml`` -8. Use ``docker-compose up --build`` command to build and run your DeepPavlov-based Multiskill AI Assistant - -.. note:: - In the future versions of the DeepPavlov Library we will provide a more comprehensive update to the documentation to further simplify the process of adding DeepPavlov NLP components as annotators to the Multiskill AI Assistants built using DeepPavlov Conversational AI Stack. Stay tuned! - -References -************ -.. [1] Cer, Daniel, et al. "Universal sentence encoder." arXiv preprint arXiv:1803.11175 (2018). diff --git a/docs/features/models/kbqa.rst b/docs/features/models/kbqa.rst index 9d87bc316e..34cddc1fd6 100755 --- a/docs/features/models/kbqa.rst +++ b/docs/features/models/kbqa.rst @@ -20,18 +20,21 @@ The question answerer: Built-In Models ------------------ -Currently, we provide three built-in models for KBQA in DeepPavlov library: +Currently, we provide two built-in models for KBQA in DeepPavlov library: -* :config:`kbqa_cq ` - for answering complex questions over Wikidata in English, +* :config:`kbqa_cq_en ` - for answering complex questions over Wikidata in English, -* :config:`kbqa_rus ` - for answering complex questions over Wikidata in Russian, +* :config:`kbqa_cq_ru ` - for answering complex questions over Wikidata in Russian, -* :config:`kbqa_cq_online ` - for answering complex questions in English over Wikidata using Wikidata Query Service. +These configs use local Wikidata dump in hdt format (3.7 Gb on disk). -The first two models are very similar to each other, and they allow you to deploy them together with local copy of Wikidata on-premises or in the cloud. The third model is lightweight as it allows you to skip downloading entire Wikidata and use the existing Wikidata APIs instead. - -.. note:: - We recommend you to use the lightweight model for quick experiments as well as academic research, and full models in production to avoid dependencies on the public Wikidata APIs. ++--------------------------------------------------+-----------+-----------+ +| Model | RAM, Gb | GPU, Gb | ++==================================================+===========+===========+ +| :config:`kbqa_cq_en ` | 3.5 | 4.3 | ++--------------------------------------------------+-----------+-----------+ +| :config:`kbqa_cq_ru ` | 6.9 | 6.5 | ++--------------------------------------------------+-----------+-----------+ The Knowledge Base Question Answering model uses Wikidata to answer complex questions. 
Here are some of the most popular types of questions supported by the model: @@ -61,9 +64,7 @@ The following models are used to find the answer: title. The result of the matching procedure is a set of candidate entities. The next step is the search for the entity in this set with one of the top-k relations predicted by the classification model, -* BiGRU model for ranking of candidate relations, - -* BERT model for ranking of candidate relation paths, +* BERT model for ranking of candidate relations and candidate relation paths, * Query generator model is used to fill query template with candidate entities and relations (to find valid combinations of entities and relations for query template). Query Generation model uses Wikidata HDT file. Query Generation Online model uses Wikidata Query Service. @@ -74,42 +75,32 @@ Any pre-trained model in DeepPavlov Library can be used for inference from both .. code:: bash - python -m deeppavlov install kbqa_cq - python -m deeppavlov install kbqa_cq_online - python -m deeppavlov install kbqa_cq_rus + python -m deeppavlov install kbqa_cq_en + python -m deeppavlov install kbqa_cq_ru To use a pre-trained model from CLI use the following command: .. code:: bash - python deeppavlov/deep.py interact kbqa_cq [-d] - python deeppavlov/deep.py interact kbqa_cq_online [-d] - python deeppavlov/deep.py interact kbqa_cq_rus [-d] + python deeppavlov/deep.py interact kbqa_cq_en [-d] + python deeppavlov/deep.py interact kbqa_cq_ru [-d] -where ``kbqa_cq`` and others are the names of configs and ``-d`` is an optional download key. The key ``-d`` is used +where ``kbqa_cq_en`` and others are the names of configs and ``-d`` is an optional download key. The key ``-d`` is used to download the pre-trained model along with embeddings and all other files needed to run the model. You can also use command ``download``. KBQA model for complex question answering can be used from Python using the following code: .. code:: python - from deeppavlov import configs, build_model + from deeppavlov import build_model - kbqa_model = build_model(configs.kbqa.kbqa_cq, download=True) - kbqa_model(['What is in the village of Negev that has diplomatic relations with the Czech Republic?']) - >>> ["Israel"] + kbqa_model = build_model('kbqa_cq_en', download=True) + kbqa_model(['What is the currency of Sweden?']) + >>> ["Swedish krona"] kbqa_model(['Magnus Carlsen is a part of what sport?']) >>> ["chess"] kbqa_model(['How many sponsors are for Juventus F.C.?']) >>> [4] - -In the models mentioned above lite version of Wikidata is used. Full version of Wikidata can be downloaded from http://www.rdfhdt.org/datasets/. Examples of questions which the model can answer with the following version of Wikidata: - -.. code:: python - - from deeppavlov import configs, build_model - - kbqa_model = build_model(configs.kbqa.kbqa_cq, download=True) kbqa_model(['When did Jean-Paul Sartre move to Le Havre?']) >>> ["1931-01-01"] kbqa_model(['What position did Angela Merkel hold on November 10, 1994?']) @@ -119,9 +110,9 @@ KBQA model for complex question answering in Russian can be used from Python usi .. 
code:: python + - from deeppavlov import configs, build_model + from deeppavlov import build_model - kbqa_model = build_model(configs.kbqa.kbqa_cq_rus, download=True) + kbqa_model = build_model('kbqa_cq_ru', download=True) kbqa_model(['Когда родился Пушкин?']) >>> ["1799-05-26"] @@ -131,16 +122,14 @@ Here are the models we've trained for complex question answering: * :config:`query_pr ` - classification model for prediction of query template type, -* :config:`entity_detection ` - sequence tagging model for detection of entity and entity types substrings in the question, - -* :config:`rel_ranking ` - model for ranking of candidate relations for the question, +* :config:`entity_detection ` - sequence tagging model for detection of entity and entity types substrings in the question, -* :config:`rel_ranking_bert ` - model for ranking of candidate relation paths for the question. +* :config:`rel_ranking ` - model for ranking of candidate relations and candidate relation paths for the question. How Do I: Train Query Prediction Model -------------------------------------- -The dataset consists of three csv files: train.csv, valid.csv and test.csv. Each line in this file contains question and corresponding query template type, for example:: +The dataset (in pickle format) is a dict with three keys: "train", "valid" and "test". The value for each key is a list of samples; an example of a sample:: "What is the longest river in the UK?", 6 @@ -150,32 +139,18 @@ How Do I: Train Entity Detection Model The dataset is a pickle file. The dataset must be split into three parts: train, test, and validation. Each part is a list of tuples of question tokens and tags for each token. An example of a training sample:: (['What', 'is', 'the', 'complete', 'list', 'of', 'records', 'released', 'by', 'Jerry', 'Lee', 'Lewis', '?'], - ['O-TAG', 'O-TAG', 'O-TAG', 'O-TAG', 'T-TAG', 'T-TAG', 'T-TAG', 'O-TAG', 'O-TAG', 'E-TAG', 'E-TAG', 'E-TAG', 'O-TAG']) + ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-PER', 'I-PER', 'I-PER', 'O']) -``T-TAG`` corresponds to tokens of entity types, ``E-TAG`` - for entities, ``O-TAG`` - for other tokens. +The token tags follow the BIO markup scheme. How Do I: Train Relation and Path Ranking Models ------------------------------------------------ -The dataset for relation ranking consists of two xml files (train and test sets). Each sample contains a question, a relation title and a label (1 if the relation corresponds to the question and 0 otherwise). An example of training sample: +The dataset (in pickle format) is a dict with three keys: "train", "valid" and "test". The value for each key is a list of samples; an example of a sample:: -.. code:: xml - - - Is it true that the total shots in career of Rick Adduono is equal to 1? - total shots in career - 1 - - -The dataset for path ranking is similar to the dataset for relation ranking. If the path from the grounded entity in the question and the answer consists of two relations, relation titles are separated with "#": - -.. code:: xml - - - When did Thomas Cromwell end his position as Lord Privy Seal? - position held # end time - 1 - + (['What is the Main St. Exile label, which Nik Powell co-founded?', ['record label', 'founded by']], '1') + +Each sample contains the question, the relations in the question and a label (1 if the relations correspond to the question, 0 otherwise). 
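+
+As an illustration, below is a minimal sketch of how such a pickle file could be assembled with the Python standard library. The file name and the tiny in-line sample are placeholders, and pointing the config's dataset reader to the resulting file is not covered here.
+
+.. code:: python
+
+    import pickle
+
+    # A dict with "train", "valid" and "test" keys, each holding a list of samples
+    # in the format described above: ([question, [relations]], label).
+    dataset = {
+        "train": [
+            (["What is the Main St. Exile label, which Nik Powell co-founded?",
+              ["record label", "founded by"]], "1"),
+        ],
+        "valid": [],
+        "test": [],
+    }
+
+    # Save the dataset as a pickle file.
+    with open("rel_ranking_dataset.pickle", "wb") as f:
+        pickle.dump(dataset, f)
+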
How Do I: Adding Templates For New SPARQL Queries ------------------------------------------------- @@ -200,7 +175,7 @@ An example of a template:: * ``query_template`` is the template of the SPARQL query, * ``property_types`` defines the types of unknown relations in the template, -* ``rank_rels`` is a list which defines whether to rank relations, in this example **p:R1** relations we extract from Wikidata for **wd:E1** entities and rank with RelRanker, **ps:R1** and **?p** relations we do not extract and rank, +* ``rank_rels`` is a list which defines whether to rank relations; in this example, **p:R1** relations are extracted from Wikidata for **wd:E1** entities and ranked with rel_ranker, while **ps:R1** and **?p** relations are neither extracted nor ranked, * ``rel_types`` - direct, statement or qualifier relations, * ``filter_rels`` (only for online version of KBQA) - whether candidate rels will be enumerated in the **filter** expression in the query, for example, **SELECT ?ent WHERE { ?ent wdt:P31 wd:Q4022 . ?ent ?p1 wd:Q90 } filter(?p1 = wdt:P131 || ?p1 = wdt:P17)**, @@ -213,33 +188,11 @@ An example of a template:: * ``template_num`` - the number of template, * alternative_templates - numbers of alternative templates to use if the answer was not found with the current template. -Advanced: Using Entity Linking and Wiki Parser As Standalone Services For KBQA +Advanced: Using Wiki Parser As Standalone Service For KBQA ------------------------------------------------------------------------------ Default configuration for KBQA was designed to use all of the supporting models together as a part of the KBQA pipeline. However, there might be a case when you want to work with some of these models in addition to KBQA. -For example, you might want to use Entity Linking as an annotator in your `Deepy-based `_ multiskill AI Assistant. Or, you might want to use Wiki Parser component to directly run SPARQL queries against your copy of Wikidata. To support these usecase, starting with this release you can also deploy supporting models as standalone components. - -Config :config:`kbqa_entity_linking ` can be used as service with the following command: - -.. code:: bash - - python -m deeppavlov riseapi kbqa_entity_linking [-d] [-p ] - -Arguments: - -* ``entity_substr`` - batch of lists of entity substrings for which we want to find ids in Wikidata, -* ``template`` - template of the sentence (if the sentence with the entity matches of one of templates), -* ``context`` - text with the entity. - -.. code:: python - - import requests - - payload = {"entity_substr": [["Forrest Gump"]], "template": [""], "context": ["Who directed Forrest Gump?"]} - response = requests.post(entity_linking_url, json=payload).json() - print(response) - - +For example, you might want to use the Wiki Parser component to directly run SPARQL queries against your copy of Wikidata. To support this use case, starting with this release you can also deploy supporting models as standalone components. Config :config:`wiki_parser ` can be used as service with the following command: @@ -291,21 +244,17 @@ To find labels for entities ids, the ``query`` argument should be the list of en In the example in the list ["Q159", ""] the second element which is an empty string can be the string with the sentence. 
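+
+For illustration, here is a minimal sketch of how a deployed ``wiki_parser`` service might be queried over HTTP. The URL and port, as well as the assumption that the service accepts ``parser_info`` and ``query`` fields in the JSON payload, should be checked against your own ``riseapi`` deployment; they are not fixed by this document.
+
+.. code:: python
+
+    import requests
+
+    # Placeholder URL of the wiki_parser service started with `riseapi`;
+    # replace it with the actual host and port of your deployment.
+    wiki_parser_url = "http://localhost:5000/model"
+
+    # Ask for the label of entity Q159; the second element of the query item
+    # may carry the sentence that mentions the entity, as described above.
+    payload = {"parser_info": ["find_label"], "query": [["Q159", ""]]}
+    response = requests.post(wiki_parser_url, json=payload).json()
+    print(response)
+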
-To use Entity Linking service in KBQA, in the :config:`kbqa_cq_sep ` you should use add to ``pipe`` API Requester component:: +To use Entity Linking service in KBQA, in the :config:`kbqa_cq_en ` you should replace :config:`entity linking component ` with API Requester component in the following way:: { "class_name": "api_requester", - "id": "linker_entities", + "id": "entity_linker", "url": "entity_linking_url", "out": ["entity_ids"], "param_names": ["entity_substr", "template_found"] } - -and replace ``linker_entities`` parameter value of the :config:`query_generator ` component with ``#linker_entities``:: - - "linker_entities": "#linker_entities", -To use Wiki Parser service in KBQA, in the :config:`kbqa_cq_sep ` you should add to ``pipe`` API Requester component:: +To use Wiki Parser service in KBQA, in the :config:`kbqa_cq_en ` you should replace :config:`wiki parser component ` with API Requester component in the following way:: { "class_name": "api_requester", @@ -315,9 +264,5 @@ To use Wiki Parser service in KBQA, in the :config:`kbqa_cq_sep ` and :config:`rel_ranking_bert_infer ` components with ``#wiki_p``:: - - "wiki_parser": "#wiki_p", - .. warning:: - Don't forget to replace the ``url`` parameter values in the above examples with correct URLs \ No newline at end of file + Don't forget to replace the ``url`` parameter values in the above examples with correct URLs diff --git a/docs/features/models/morphotagger.rst b/docs/features/models/morphotagger.rst deleted file mode 100644 index e8e7769cd5..0000000000 --- a/docs/features/models/morphotagger.rst +++ /dev/null @@ -1,684 +0,0 @@ -Neural Morphological Tagging -============================ - -It is an implementation of neural morphological tagger. -As for now (November, 2019) we have two types of models: -the BERT-based ones (available only for Russian) and -the character-based bidirectional LSTM. The BERT model -includes only a dense layer on the top of BERT embedder. -See the `BERT paper `__ -for a more complete description, as well as the -`BERT section `__ of the documentation. -We plan to release more BERT-based models in near future. - -Most of our models follow -`Heigold et al., 2017. An extensive empirical evaluation of -character-based morphological tagging for 14 -languages `__. -They also achieve the state-of-the-art performance among open source -systems. - -The BERT-based model is trained on `Universal -Dependencies corpora `__ -(version 2.3), while all the other models were trained -on Universal Dependencies 2.0 corpora. 
- -+----------------+--------------+-----------------+-------------------------------+------------------+----------------+ -| Language | Code | UDPipe accuracy | UDPipe Future accuracy [#f1]_ | Our top accuracy | Model size (MB)| -+================+==============+=================+===============================+==================+================+ -| Arabic | ar | 88.31 | | 90.85 | 23.7 | -+----------------+--------------+-----------------+-------------------------------+------------------+----------------+ -| Czech | cs | 91.86 | | 94.35 | 41.8 | -+----------------+--------------+-----------------+-------------------------------+------------------+----------------+ -| English | en | 92.53 | | 93.00 | 16.9 | -+----------------+--------------+-----------------+-------------------------------+------------------+----------------+ -| French | fr | 95.25 | | 95.45 | 19.0 | -+----------------+--------------+-----------------+-------------------------------+------------------+----------------+ -| German | de | 76.65 | | 83.83 | 18.6 | -+----------------+--------------+-----------------+-------------------------------+------------------+----------------+ -| Hindi | hi | 87.74 | | 90.01 | 21.9 | -+----------------+--------------+-----------------+-------------------------------+------------------+----------------+ -| Hungarian | hu | 69.52 | | 75.34 | 15.4 | -+----------------+--------------+-----------------+-------------------------------+------------------+----------------+ -| Italian | it | 96.33 | | 96.47 | 32.0 | -+----------------+--------------+-----------------+-------------------------------+------------------+----------------+ -| Russian | ru_syntagrus | 93.57 | | 96.23 | 48.7 | -+----------------+--------------+-----------------+-------------------------------+------------------+----------------+ -| Russian (UD2.3)| ru_syntagrus | 93.5 | 96.90 | 97.83 | 661 | -+----------------+--------------+-----------------+-------------------------------+------------------+----------------+ -| Spanish | es_ancora | 96.88 | | 97.00 | 20.8 | -+----------------+--------------+-----------------+-------------------------------+------------------+----------------+ -| Turkish | tr | 86.98 | | 88.03 | 16.1 | -+----------------+--------------+-----------------+-------------------------------+------------------+----------------+ - -.. rubric:: Footnotes - -.. [#f1] No models available, only the source code. The scores are taken from - `Straka. UDPipe 2.0 Prototype at CoNLL 2018 UD Shared Task. `__. - - -=========================== -Usage examples. -=========================== - -Before using the model make sure that all required packages are installed using the command: - -.. code:: bash - - python -m deeppavlov install morpho_ru_syntagrus_pymorphy - -For Windows platform one has to set `KERAS_BACKEND` to `tensorflow` (it could be done only once): - -.. code:: bash - - set "KERAS_BACKEND=tensorflow" - -Python: ---------------------------- - -For Windows platform if one did not set `KERAS_BACKEND` to `tensorflow` from command line it could be done in python code in the following way: - -.. code:: python - - import os - - os.environ["KERAS_BACKEND"] = "tensorflow" - - -.. 
code:: python - - from deeppavlov import build_model, configs - model = build_model(configs.morpho_tagger.UD2_0.morpho_ru_syntagrus_pymorphy, download=True) - sentences = ["Я шёл домой по незнакомой улице.", "Девушка пела в церковном хоре о всех уставших в чужом краю."] - for parse in model(sentences): - print(parse) - -If you want to use the obtained tags further in Python, just split the output using tabs and newlines. - -You may also pass the tokenized sentences instead of raw ones: - -.. code:: python - - sentences = [["Я", "шёл", "домой", "по", "незнакомой", "улице", "."]] - for parse in model(sentences): - print(parse) - -If your data is large, you can call -:meth:`~deeppavlov.core.common.chainer.Chainer.batched_call` method of the model, which will additionally -separate you list of sentences into small batches. - -.. code:: python - - from deeppavlov import build_model, configs - model = build_model(configs.morpho_tagger.UD2_0.morpho_ru_syntagrus_pymorphy, download=True) - sentences = ["Я шёл домой по незнакомой улице.", "Девушка пела в церковном хоре о всех уставших в чужом краю."] - for parse in model.batched_call(sentences, batch_size=16): - print(parse) - -:: - - 1 Я PRON,Case=Nom|Number=Sing|Person=1 _ - 2 шёл VERB,Aspect=Imp|Gender=Masc|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act _ - 3 домой ADV,Degree=Pos _ - 4 по ADP _ - 5 незнакомой ADJ,Case=Dat|Degree=Pos|Gender=Fem|Number=Sing _ - 6 улице NOUN,Animacy=Inan|Case=Dat|Gender=Fem|Number=Sing _ - 7 . PUNCT _ - - 1 Девушка NOUN,Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing _ - 2 пела VERB,Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act _ - 3 в ADP _ - 4 церковном ADJ,Case=Loc|Degree=Pos|Gender=Masc|Number=Sing _ - 5 хоре NOUN,Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing _ - 6 о ADP _ - 7 всех PRON,Animacy=Anim|Case=Loc|Number=Plur _ - 8 уставших VERB,Aspect=Perf|Case=Loc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act _ - 9 в ADP _ - 10 чужом ADJ,Case=Loc|Degree=Pos|Gender=Masc|Number=Sing _ - 11 краю NOUN,Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing _ - 12 . PUNCT _ - -If you want the output in UD format, try setting ``"data_format": ud`` in the ``tag_output_prettifier`` section -of :config:`configuration file ` -you import. - -Advanced models (BERT and lemmatized models). ---------------------------------------------- - -#. For Russian you can use the BERT-based model. It has much higher performance (97.8% instead of 96.2), - however, you need a more powerful GPU (ideally, 16 GB) to train it. However, the speed - of inference and training on such GPU is comparable with character-based model. - -#. Exclusively for Russian language you can obtain lemmatized UD output by using either the - :config:`BERT model ` - :config:`augmented version ` - of Pymorphy model. Both models select the Pymorphy lemma whose tag correspond to the tag - predicted by the tagger. - - .. 
code:: python - - from deeppavlov import build_model, configs - model = build_model(configs.morpho_tagger.BERT.morpho_ru_syntagrus_bert, download=True) - # model = build_model(configs.morpho_tagger.UD2_0.morpho_ru_syntagrus_pymorphy_lemmatize, download=True) - sentences = ["Я шёл домой по незнакомой улице.", "Девушка пела в церковном хоре о всех уставших в чужом краю."] - for parse in model(sentences): - print(parse) - - :: - - 1 Я я PRON _ Case=Nom|Number=Sing|Person=1 _ _ _ _ - 2 шёл идти VERB _ Aspect=Imp|Gender=Masc|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act _ _ _ _ - 3 домой домой ADV _ Degree=Pos _ _ _ _ - 4 по по ADP _ _ _ _ _ _ - 5 незнакомой незнакомый ADJ _ Case=Dat|Degree=Pos|Gender=Fem|Number=Sing _ _ _ _ - 6 улице улица NOUN _ Animacy=Inan|Case=Dat|Gender=Fem|Number=Sing _ _ _ _ - 7 . . PUNCT _ _ _ _ _ _ - - 1 Девушка девушка NOUN _ Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing _ _ _ _ - 2 пела петь VERB _ Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act _ _ _ _ - 3 в в ADP _ _ _ _ _ _ - 4 церковном церковный ADJ _ Case=Loc|Degree=Pos|Gender=Masc|Number=Sing _ _ _ _ - 5 хоре хор NOUN _ Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing _ _ _ _ - 6 о о ADP _ _ _ _ _ _ - 7 всех весь PRON _ Animacy=Anim|Case=Loc|Number=Plur _ _ _ _ - 8 уставших устать VERB _ Aspect=Perf|Case=Loc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act _ _ _ _ - 9 в в ADP _ _ _ _ _ _ - 10 чужом чужой ADJ _ Case=Loc|Degree=Pos|Gender=Masc|Number=Sing _ _ _ _ - 11 краю край NOUN _ Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing _ _ _ _ - 12 . . PUNCT _ _ _ _ _ _ - -Command line: ----------------- - -If you want to use our models from scratch, do the following -(all the examples are for ru\_syntagrus\_pymorphy model, -change the filenames accordingly to invoke models for other languages): - -#. Download data - - .. code:: bash - - python -m deeppavlov download morpho_ru_syntagrus_pymorphy - - To perform all downloads in runtime you can also run all subsequent - commands with ``-d`` key, - -#. To apply a pre-trained ru\_syntagrus\_pymorphy model to ru\_syntagrus test - data provided it was downloaded using the previous command, run - - .. code:: bash - - python -m deeppavlov.models.morpho_tagger morpho_ru_syntagrus_pymorphy \ - > -f ~/.deeppavlov/downloads/UD2.0_source/ru_syntagrus/ru_syntagrus-ud-test.conllu - - ``-f`` argument points to the path to the test data. If you do not pass it the model expects data from ``stdin``. - This command writes the output to stdout, you can redirect it using standard ``>`` notation. - - - By default the ``deeppavlov.models.morpho_tagger`` script expects the data to be in CoNLL-U format, - however, you can specify input format by using the `-i` key. For example, your input can be in one word per line - format, in this case you set this key to ``"vertical"``. Note also that you can pass the data from - - .. code:: bash - - echo -e "Мама\nмыла\nраму\n.\n\nВаркалось\n,\nхливкие\nшорьки\nпырялись\nпо\nнаве\n." \ - > | python -m deeppavlov.models.morpho_tagger morpho_ru_syntagrus_pymorphy -i "vertical" - - :: - - 1 Мама NOUN Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing - 2 мыла VERB Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act - 3 раму NOUN Animacy=Inan|Case=Acc|Gender=Fem|Number=Sing - 4 . 
PUNCT _ - - 1 Варкалось NOUN Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing - 2 , PUNCT _ - 3 хливкие ADJ Case=Nom|Degree=Pos|Number=Plur - 4 шорьки NOUN Animacy=Inan|Case=Nom|Gender=Masc|Number=Plur - 5 пырялись VERB Aspect=Imp|Mood=Ind|Number=Plur|Tense=Past|VerbForm=Fin|Voice=Mid - 6 по ADP _ - 7 наве NOUN Animacy=Inan|Case=Dat|Gender=Masc|Number=Sing - 8 . PUNCT _ - - - - Untokenized sentences (one sentence per line) can be tagged as well, in this case input format should be ``"text"`` - - .. code:: bash - - echo -e "Мама мыла раму.\nВаркалось, хливкие шорьки пырялись по наве." \ - > | python -m deeppavlov.models.morpho_tagger morpho_ru_syntagrus_pymorphy -i "text" - - :: - - 1 Мама NOUN Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing - 2 мыла VERB Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act - 3 раму NOUN Animacy=Inan|Case=Acc|Gender=Fem|Number=Sing - 4 . PUNCT _ - - 1 Варкалось NOUN Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing - 2 , PUNCT _ - 3 хливкие ADJ Case=Nom|Degree=Pos|Number=Plur - 4 шорьки NOUN Animacy=Inan|Case=Nom|Gender=Masc|Number=Plur - 5 пырялись VERB Aspect=Imp|Mood=Ind|Number=Plur|Tense=Past|VerbForm=Fin|Voice=Mid - 6 по ADP _ - 7 наве NOUN Animacy=Inan|Case=Dat|Gender=Masc|Number=Sing - 8 . PUNCT _ - - - You can also obtain the output in CoNLL-U format by passing the ``-o ud`` argument: - - .. code:: bash - - echo -e "Мама мыла раму.\nВаркалось, хливкие шорьки пырялись по наве." \ - > | python -m deeppavlov.models.morpho_tagger morpho_ru_syntagrus_pymorphy -i "text" -o "ud" - - :: - - 1 Мама _ NOUN _ Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing _ _ _ _ - 2 мыла _ VERB _ Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act _ _ _ _ - 3 раму _ NOUN _ Animacy=Inan|Case=Acc|Gender=Fem|Number=Sing _ _ _ _ - 4 . _ PUNCT _ _ _ _ _ _ - - 1 Варкалось _ NOUN _ Animacy=Anim|Case=Nom|Gender=Masc|Number=Sing _ _ _ _ - 2 , _ PUNCT _ _ _ _ _ _ - 3 хливкие _ ADJ _ Case=Nom|Degree=Pos|Number=Plur _ _ _ _ - 4 шорьки _ NOUN _ Animacy=Inan|Case=Nom|Gender=Masc|Number=Plur _ _ _ _ - 5 пырялись _ VERB _ Aspect=Imp|Mood=Ind|Number=Plur|Tense=Past|VerbForm=Fin|Voice=Mid _ _ _ _ - 6 по _ ADP _ _ _ _ _ _ - 7 наве _ NOUN _ Animacy=Inan|Case=Dat|Gender=Masc|Number=Sing _ _ _ _ - 8 . _ PUNCT _ _ _ _ _ _ - - -#. To evaluate ru\_syntagrus model on ru\_syntagrus test subset, run - - .. code:: bash - - python -m deeppavlov evaluate morpho_ru_syntagrus_pymorphy - -#. To retrain model on ru\_syntagrus dataset, run one of the following - (the first is for Pymorphy-enriched model) - - .. code:: bash - - python -m deeppavlov train morpho_ru_syntagrus_pymorphy - python -m deeppavlov train morpho_ru_syntagrus - - Be careful, one epoch takes 2-60 minutes depending on your GPU. - -#. To tag Russian sentences from stdin, run - - .. code:: bash - - python -m deeppavlov interact morpho_ru_syntagrus_pymorphy - -Read the detailed readme below. - -Task description ----------------- - -Morphological tagging consists in assigning labels, describing word -morphology, to a pre-tokenized sequence of words. -In the most simple case these labels are just part-of-speech (POS) -tags, hence in earlier times of NLP the task was -often referred as POS-tagging. The refined version of the problem -which we solve here performs more fine-grained -classification, also detecting the values of other morphological -features, such as case, gender and number for nouns, -mood, tense, etc. for verbs and so on. 
Morphological tagging is a -stage of common NLP pipeline, it generates useful -features for further tasks such as syntactic parsing, named entity -recognition or machine translation. - -Common output for morphological tagging looks as below. The examples -are for Russian and English language and use the -inventory of tags and features from `Universal Dependencies -project `__. - -:: - - 1 Это PRON Animacy=Inan|Case=Acc|Gender=Neut|Number=Sing - 2 чутко ADV Degree=Pos - 3 фиксируют VERB Aspect=Imp|Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act - 4 энциклопедические ADJ Case=Nom|Degree=Pos|Number=Plur - 5 издания NOUN Animacy=Inan|Case=Nom|Gender=Neut|Number=Plur - 6 . PUNCT _ - - 1 Four NUM NumType=Card - 2 months NOUN Number=Plur - 3 later ADV _ - 4 , PUNCT _ - 5 we PRON Case=Nom|Number=Plur|Person=1|PronType=Prs - 6 were AUX Mood=Ind|Tense=Past|VerbForm=Fin - 7 married VERB Tense=Past|VerbForm=Part|Voice=Pass - 8 . PUNCT _ - -The full UD format (see below) includes more columns including lemma and -syntactic information. - -Training data -~~~~~~~~~~~~~ - -Our tagger accepts the data in `CONLL-U -format `__: - -:: - - 1 Four four NUM CD NumType=Card 2 nummod _ _ - 2 months month NOUN NNS Number=Plur 3 obl:npmod _ _ - 3 later later ADV RB _ 7 advmod _ SpaceAfter=No - 4 , , PUNCT , _ 7 punct _ _ - 5 we we PRON PRP Case=Nom|Number=Plur|Person=1|PronType=Prs 7 nsubj:pass _ _ - 6 were be AUX VBD Mood=Ind|Tense=Past|VerbForm=Fin 7 aux:pass _ _ - 7 married marry VERB VBN Tense=Past|VerbForm=Part|Voice=Pass 0 root _ SpaceAfter=No - 8 . . PUNCT . _ 7 punct _ _ - -It does not take into account the contents except the columns number -2, 4, 6 -(the word itself, POS label and morphological tag), however, in the -default setting the reader -expects the word to be in column 2, the POS label in column 4 and the -detailed tag description -in column 6. - -Test data -~~~~~~~~~ - -When annotating unlabeled text, our model expects the data in -10-column UD format as well. However, it does not pay attention to any column except the first one, -which should be a number, and the second, which must contain a word. -You can also pass only the words with exactly one word on each line -by adding ``"from_words": True`` to ``dataset_reader`` section. -Sentences are separated with blank lines. - -You can also pass the unlemmatized text as input. In this case it is preliminarly lemmatized using the -NLTK ``word_tokenize`` function. - -Algorithm description ---------------------- - -We adopt a neural model for morphological tagging from -`Heigold et al., 2017. An extensive empirical evaluation of -character-based morphological tagging for 14 -languages `__. -We refer the reader to the paper for complete description of the -algorithm. The tagger consists -of two parts: a character-level network which creates embeddings for -separate words and word-level -recurrent network which transforms these embeddings to morphological -tags. - -The character-level part implements the model from -`Kim et al., 2015. Character-aware language -models `__. -First it embeds the characters into dense vectors, then passes these -vectors through multiple -parallel convolutional layers and concatenates the output of these -convolutions. The convolution -output is propagated through a highway layer to obtain the final word -representation. - -You can optionally use a morphological dictionary during tagging. In -this case our model collects -a 0/1 vector with ones corresponding to the dictionary tags of a -current word. 
This vector is -passed through a one-layer perceptron to obtain an embedding of -dictionary information. -This embedding is concatenated with the output of character-level -network. - -As a word-level network we utilize a Bidirectional LSTM, its outputs -are projected through a dense -layer with a softmax activation. In principle, several BiLSTM layers -may be stacked as well -as several convolutional or highway layers on character level; -however, we did not observed -any sufficient gain in performance and use shallow architecture -therefore. - -Model configuration. --------------------- - -Training configuration -~~~~~~~~~~~~~~~~~~~~~~ - -We distribute pre-trained models for 11 languages trained on Universal Dependencies data. -Configuration files for reproducible training are also available in -:config:`deeppavlov/configs/morpho_tagger/UD2.0 `, for -example -:config:`deeppavlov/configs/morpho_tagger/UD2.0/morpho_en.json `. -The configuration file consists of several parts: - -Dataset Reader -^^^^^^^^^^^^^^ - -The dataset reader describes the instance of -:class:`~deeppavlov.dataset_readers.morphotagging_dataset_reader.MorphotaggerDatasetReader` class. - -:: - - "dataset_reader": { - "class_name": "morphotagger_dataset_reader", - "data_path": "{DOWNLOADS_PATH}/UD2.0_source", - "language": "en", "data_types": ["train", "dev", "test"] - } - -``class_name`` field refers to the class MorphotaggerDatasetReader, -``data_path`` contains the path to data directory, the ``language`` -field is used to derive the name of training and development file. -Alternatively, you can specify these files separately by full (or absolute) paths -like - -:: - - "dataset_reader": { - "class_name": "morphotagger_dataset_reader", - "data_path": ["{DOWNLOADS_PATH}/UD2.0_source/en-ud-train.conllu", - "{DOWNLOADS_PATH}/UD2.0_source/en-ud-dev.conllu", - "{DOWNLOADS_PATH}/UD2.0_source/en-ud-test.conllu"] - "data_types": ["train", "dev", "test"] - } - -By default you need only the train file, the dev file is used to -validate -your model during training and the test file is for model evaluation -after training. Since you need some validation data anyway, without -the dev part -you need to resplit your data as described in `Dataset -Iterator <#dataset-iterator>`__ section. - -Your data should be in CONLL-U format. It refers to ``predict`` mode also, but in this case only word -column is taken into account. If your data is in single word per line format and you do not want to -reformat it, add ``"from_words": True`` to ``dataset_reader`` section. You can also specify -which columns contain words, tags and detailed tags, for documentation see -:func:`Documentation `. - -Dataset iterator -^^^^^^^^^^^^^^^^ - -:class:`Dataset iterator ` class -performs simple batching and shuffling. - -:: - - "dataset_iterator": { - "class_name": "morphotagger_dataset" - } - -By default it has no parameters, but if your training and validation -data -are in the same file, you may specify validation split here: - -:: - - "dataset_iterator": { - "class_name": "morphotagger_dataset", - "validation_split": 0.2 - } - -Chainer -^^^^^^^ - -The ``chainer`` part of the configuration file contains the -specification of the neural network model and supplementary things such as vocabularies. -Chainer refers to an instance of :class:`~deeppavlov.core.common.chainer.Chainer`, see -:doc:`configuration ` for a complete description. - -The major part of ``chainer`` is ``pipe``. 
The ``pipe`` contains -vocabularies and the network itself as well -as some pre- and post- processors. The first part lowercases the input -and normalizes it (see :class:`~deeppavlov.models.preprocessors.capitalization.CapitalizationPreprocessor`). - -:: - - "pipe": [ - { - "id": "lowercase_preprocessor", - "class_name": "lowercase_preprocessor", - "in": ["x"], - "out": ["x_processed"] - }, - -The second part is the tag vocabulary which transforms tag labels the -model should predict to tag indexes. - -:: - - { - "id": "tag_vocab", - "class_name": "simple_vocab", - "fit_on": ["y"], - "special_tokens": ["PAD", "BEGIN", "END"], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/tag_en.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/tag_en.dict" - }, - -The third part is the character vocabulary used to represent words as sequences of indexes. Only the -symbols which occur at least ``min_freq`` times in the training set are kept. - -:: - - { - "id": "char_vocab", - "class_name": "simple_vocab", - "min_freq": 3, - "fit_on": ["x_processed"], - "special_tokens": ["PAD", "BEGIN", "END"], - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/char_en.dict", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/char_en.dict" - }, - - -If you want to utilize external morphological knowledge, you can do it in two ways. -The first is to use :class:`~deeppavlov.models.vectorizers.word_vectorizer.DictionaryVectorizer`. -:class:`~deeppavlov.models.vectorizers.word_vectorizer.DictionaryVectorizer` is instantiated from a dictionary file. -Each line of a dictionary file contains two columns: -a word and a space-separated list of its possible tags. Tags can be in any possible format. The config part for -:class:`~deeppavlov.models.vectorizers.word_vectorizer.DictionaryVectorizer` looks as - -:: - - { - "id": "dictionary_vectorizer", - "class_name": "dictionary_vectorizer", - "load_path": PATH_TO_YOUR_DICTIONARY_FILE, - "save_path": PATH_TO_YOUR_DICTIONARY_FILE, - "in": ["x"], - "out": ["x_possible_tags"] - } - - -The second variant for external morphological dictionary, available only for Russian, -is `Pymorphy2 `_. In this case the vectorizer list all Pymorphy2 tags -for a given word and transforms them to UD2.0 format using -`russian-tagsets `_ library. Possible UD2.0 tags -are listed in a separate distributed with the library. This part of the config look as -(see :config:`config `)) - -:: - - { - "id": "pymorphy_vectorizer", - "class_name": "pymorphy_vectorizer", - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/tags_russian.txt", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ru_syntagrus/tags_russian.txt", - "max_pymorphy_variants": 5, - "in": ["x"], - "out": ["x_possible_tags"] - } - -The next part performs the tagging itself. Together with general parameters it describes -the input parameters of :class:`~deeppavlov.models.morpho_tagger.morpho_tagger.MorphoTagger`) class. 
- -:: - - { - "in": ["x_processed"], - "in_y": ["y"], - "out": ["y_predicted"], - "class_name": "morpho_tagger", - "main": true, - "save_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ud_en.hdf5", - "load_path": "{MODELS_PATH}/morpho_tagger/UD2.0/ud_en.hdf5", - "tags": "#tag_vocab", - "symbols": "#char_vocab", - "verbose": 1, - "char_embeddings_size": 32, "char_window_size": [1, 2, 3, 4, 5, 6, 7], - "word_lstm_units": 128, "conv_dropout": 0.0, "char_conv_layers": 1, - "char_highway_layers": 1, "highway_dropout": 0.0, "word_lstm_layers": 1, - "char_filter_multiple": 50, "intermediate_dropout": 0.0, "word_dropout": 0.2, - "lstm_dropout": 0.3, "regularizer": 0.01, "lm_dropout": 0.3 - } - - -When an additional vectorizer is used, the first line is changed to -``"in": ["x_processed", "x_possible_tags"]`` and an additional parameter -``"word_vectorizers": [["#pymorphy_vectorizer.dim", 128]]`` is appended. - -Config includes general parameters of :class:`~deeppavlov.core.models.component.Component` class, -described in the :doc:`configuration ` and specific -:class:`~deeppavlov.models.morpho_tagger.morpho_tagger.MorphoTagger` -parameters. The latter include - -- ``tags`` - tag vocabulary. ``#tag_vocab`` refers to an already defined model with ``"id" = "tag_vocab"``. -- ``symbols`` - character vocabulary. ``#char_vocab`` refers to an already defined model with ``"id" = "char_vocab"``. - -and other specific parameters of the network, available in :class:`~deeppavlov.models.morpho_tagger.morpho_tagger.MorphoTagger` documentation. - -The ``"train"`` section of ``"chainer"`` contains training parameters, such as number of epochs, -batch_size and logging frequency, see general readme for more details. - -**chainer** also includes the ``"prettifier"`` subsection, which describes the parameters -of :class:`~deeppavlov.core.models.morpho_tagger.common.TagOutputPrettifier` -which transforms the predictions of the tagger to a readable form. - -:: - - { - "in": ["x", "y_predicted"], - "out": ["y_prettified"], - "class_name": "tag_output_prettifier", - "end": "\\n" - } - - -It takes two inputs — source sequence of words and predicted sequence of tags -and produces the output of the format - -:: - - 1 Это PRON Animacy=Inan|Case=Acc|Gender=Neut|Number=Sing - 2 чутко ADV Degree=Pos - 3 фиксируют VERB - Aspect=Imp|Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act - 4 энциклопедические ADJ Case=Nom|Degree=Pos|Number=Plur - 5 издания NOUN Animacy=Inan|Case=Nom|Gender=Neut|Number=Plur - 6 . PUNCT _ - - 1 Four NUM NumType=Card - 2 months NOUN Number=Plur - 3 later ADV * - 4 , PUNCT * - 5 we PRON Case=Nom|Number=Plur|Person=1|PronType=Prs - 6 were AUX Mood=Ind|Tense=Past|VerbForm=Fin - 7 married VERB Tense=Past|VerbForm=Part|Voice=Pass - 8 . PUNCT _ - -To generate output in 10 column CONLL-U format add ``"format_mode": "ud"`` to the described section. diff --git a/docs/features/models/multitask_bert.rst b/docs/features/models/multitask_bert.rst deleted file mode 100644 index 3f0f33021c..0000000000 --- a/docs/features/models/multitask_bert.rst +++ /dev/null @@ -1,348 +0,0 @@ -Multi-task BERT in DeepPavlov -============================= - -Multi-task BERT in DeepPavlov is an implementation of BERT training algorithm published in the paper "Multi-Task Deep -Neural Networks for Natural Language Understanding". - -| Multi-task BERT paper: https://arxiv.org/abs/1901.11504 - -The idea is to share BERT body between several tasks. 
This is necessary if a model pipe has several -components using BERT and the amount of GPU memory is limited. Each task has its own 'head' part attached to the -output of the BERT encoder. If multi-task BERT has :math:`T` heads, one training iteration consists of - -- composing :math:`T` mini-batches, one for each task, - -- :math:`T` gradient steps, one gradient step for each task. - -When one of BERT heads is being trained, other heads' parameters do not change. On each training step both BERT head -and body parameters are modified. You may specify different learning rates for a head and a body. - -Currently there are heads for classification (``mt_bert_classification_task``) and sequence tagging -(``mt_bert_seq_tagging_task``). - -At this page, multi-task BERT usage is explained on a toy configuration file of a model that detects -insults, analyzes sentiment, and recognises named entities. Multi-task BERT configuration files for training -:config:`mt_bert_train_tutorial.json ` and for inference -:config:`mt_bert_inference_tutorial.json ` are based on configs -:config:`insults_kaggle_bert.json `, -:config:`sentiment_sst_multi_bert.json `, -:config:`ner_conll2003_bert.json `. - -We start with the ``metadata`` field of the configuration file. Multi-task BERT model is saved in -``{"MT_BERT_PATH": "{MODELS_PATH}/mt_bert"}``. Classes and tag vocabularies are saved in -``{"INSULTS_PATH": "{MT_BERT_PATH}/insults"}``, ``{"SENTIMENT_PATH": "{MT_BERT_PATH}/sentiment"}``. ``downloads`` -field of Multitask BERT configuration file is a union of ``downloads`` fields of original configs without pre-trained -models. The ``metadata`` field of our config is given below. - -.. code:: json - - { - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "DOWNLOADS_PATH": "{ROOT_PATH}/downloads", - "MODELS_PATH": "{ROOT_PATH}/models", - "BERT_PATH": "{DOWNLOADS_PATH}/bert_models/cased_L-12_H-768_A-12", - "MT_BERT_PATH": "{MODELS_PATH}/mt_bert_tutorial", - "INSULTS_PATH": "{MT_BERT_PATH}/insults", - "SENTIMENT_PATH": "{MT_BERT_PATH}/sentiment", - "NER_PATH": "{MT_BERT_PATH}/ner" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/datasets/insults_data.tar.gz", - "subdir": "{DOWNLOADS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/datasets/yelp_review_full_csv.tar.gz", - "subdir": "{DOWNLOADS_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/bert/cased_L-12_H-768_A-12.zip", - "subdir": "{DOWNLOADS_PATH}/bert_models" - } - ] - } - } - -Train config ------------- - -When using ``multitask_bert`` component, you need **separate train and inference configuration files**. - -Data reading and iteration is performed by ``multitask_reader`` and ``multitask_iterator``. These classes are composed -of task readers and iterators and generate batches that contain data from heterogeneous datasets. - -A ``multitask_reader`` configuration has parameters ``class_name``, ``data_path``, and ``tasks``. -``data_path`` field may be any string because data paths are passed for tasks individually in ``tasks`` -parameter. However, you can not drop a ``data_path`` parameter because it is obligatory for dataset reader -configuration. ``tasks`` parameter is a dictionary of task dataset readers configurations. In configurations of -task readers, ``reader_class_name`` parameter is used instead of ``class_name``. The dataset reader configuration is -provided: - -.. 
code:: json - - { - "dataset_reader": { - "class_name": "multitask_reader", - "data_path": "null", - "tasks": { - "insults": { - "reader_class_name": "basic_classification_reader", - "x": "Comment", - "y": "Class", - "data_path": "{DOWNLOADS_PATH}/insults_data" - }, - "sentiment": { - "reader_class_name": "basic_classification_reader", - "x": "text", - "y": "label", - "data_path": "{DOWNLOADS_PATH}/yelp_review_full_csv", - "train": "train.csv", - "test": "test.csv", - "header": null, - "names": [ - "label", - "text" - ] - }, - "ner": { - "reader_class_name": "conll2003_reader", - "data_path": "{DOWNLOADS_PATH}/conll2003/", - "dataset_name": "conll2003", - "provide_pos": false - } - } - } - } - -A ``multitask_iterator`` configuration has parameters ``class_name`` and ``tasks``. ``tasks`` is a dictionary of -configurations of task iterators. In configurations of task iterators, ``iterator_class_name`` is used instead of -``class_name``. The dataset iterator configuration is as follows: - -.. code:: json - - { - "dataset_iterator": { - "class_name": "multitask_iterator", - "tasks": { - "insults": { - "iterator_class_name": "basic_classification_iterator", - "seed": 42 - }, - "sentiment": { - "iterator_class_name": "basic_classification_iterator", - "seed": 42, - "split_seed": 23, - "field_to_split": "train", - "split_fields": [ - "train", - "valid" - ], - "split_proportions": [ - 0.9, - 0.1 - ] - }, - "ner": {"iterator_class_name": "data_learning_iterator"} - } - } - } - -Batches generated by ``multitask_iterator`` are tuples of two elements: inputs of the model and labels. Both inputs -and labels are lists of tuples. The inputs have following format: ``[(first_task_inputs[0], second_task_inputs[0], -...), (first_task_inputs[1], second_task_inputs[1], ...), ...]`` where ``first_task_inputs``, ``second_task_inputs``, -and so on are x values of batches from task dataset iterators. The labels in the have the similar format. - -If task datasets have different sizes, then smaller datasets are repeated until -their sizes are equal to the size of the largest dataset. For example, if the first task dataset inputs are -``[0, 1, 2, 3, 4, 5, 6]``, the second task dataset inputs are ``[7, 8, 9]``, and the batch size is ``2``, then -multi-task input mini-batches will be ``[(0, 7), (1, 8)]``, ``[(2, 9), (3, 7)]``, ``[(4, 8), (5, 9)]``, ``[(6, 7)]``. - -In this tutorial, there are 3 datasets. Considering the batch structure, ``chainer`` inputs are: - -.. code:: json - - { - "in": ["x_insults", "x_sentiment", "x_ner"], - "in_y": ["y_insults", "y_sentiment", "y_ner"] - } - -Sometimes a task dataset iterator returns inputs or labels consisting of more than one element. For example, in model -:config:`mt_bert_train_tutorial.json ` ``siamese_iterator`` input -element consists of 2 strings. If there is a necessity to split such a variable, ``InputSplitter`` component can -be used. - -Data preparation steps in the pipe of tutorial config are similar to data preparation steps in the original -configs except for names of the variables. - -A ``multitask_bert`` component has task-specific parameters and parameters that are common for all tasks. The first -are provided inside the ``tasks`` parameter. The ``tasks`` is a dictionary that keys are task names and values are -task-specific parameters. **The names of tasks have to be the same in train and inference configs.** - -If ``inference_task_names`` parameter of a ``multitask_bert`` component is provided, the component is created for -inference. 
Otherwise, it is created for training. - -Task classes inherit ``MTBertTask`` class. Inputs and labels of a ``multitask_bert`` component are distributed between -the tasks according to the ``in_distribution`` and ``in_y_distribution`` parameters. You can drop these parameters if -only one task is called. In that case, all ``multitask_bert`` inputs are passed to the task. Another option is -to make a distribution parameter a dictionary whose keys are task names and values are numbers of arguments the tasks -take. If this option is used, the order of the ``multitask_bert`` component inputs in ``in`` and ``in_y`` parameters -must meet three conditions. First, ``in`` and ``in_y`` elements have to be grouped by tasks, e.g. arguments for the -first task, then arguments for the second task and so on. Secondly, the order of tasks in ``in`` and ``in_y`` has to -be the same as the order of tasks in the ``in_distribution`` and ``in_y_distribution`` parameters. Thirdly, in ``in`` -and ``in_y`` parameters the arguments of a task have to be put in the same order as the order in which they are passed -to ``get_sess_run_infer_args`` and ``get_sess_run_train_args`` methods of the task. If ``in`` and ``in_y`` parameters -are dictionaries, you may make ``in_distribution`` and ``in_y_distribution`` parameter dictionaries which keys are -task names and values are lists of elements of ``in`` or ``in_y``. - -.. code:: json - - { - "id": "mt_bert", - "class_name": "mt_bert", - "save_path": "{MT_BERT_PATH}/model", - "load_path": "{MT_BERT_PATH}/model", - "bert_config_file": "{BERT_PATH}/bert_config.json", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", - "attention_probs_keep_prob": 0.5, - "body_learning_rate": 3e-5, - "min_body_learning_rate": 2e-7, - "learning_rate_drop_patience": 10, - "learning_rate_drop_div": 1.5, - "load_before_drop": true, - "optimizer": "tf.train:AdamOptimizer", - "clip_norm": 1.0, - "tasks": { - "insults": { - "class_name": "mt_bert_classification_task", - "n_classes": "#classes_vocab_insults.len", - "keep_prob": 0.5, - "return_probas": true, - "learning_rate": 1e-3, - "one_hot_labels": true - }, - "sentiment": { - "class_name": "mt_bert_classification_task", - "n_classes": "#classes_vocab_sentiment.len", - "return_probas": true, - "one_hot_labels": true, - "keep_prob": 0.5, - "learning_rate": 1e-3 - }, - "ner": { - "class_name": "mt_bert_seq_tagging_task", - "n_tags": "#tag_vocab.len", - "return_probas": false, - "keep_prob": 0.5, - "learning_rate": 1e-3, - "use_crf": true, - "encoder_layer_ids": [-1] - } - }, - "in_distribution": {"insults": 1, "sentiment": 1, "ner": 3}, - "in": [ - "bert_features_insults", - "bert_features_sentiment", - "x_ner_subword_tok_ids", - "ner_attention_mask", - "ner_startofword_markers"], - "in_y_distribution": {"insults": 1, "sentiment": 1, "ner": 1}, - "in_y": ["y_insults_onehot", "y_sentiment_onehot", "y_ner_ind"], - "out": ["y_insults_pred_probas", "y_sentiment_pred_probas", "y_ner_pred_ind"] - } - -You may need to design your own metric for early stopping. In this example, the target metric is an average of AUC ROC -for insults and sentiment tasks and F1 for NER task. In order to add a metric to config, you have to register the -metric. To register metric, add the decorator ``register_metric`` and run the command -``python -m utils.prepare.registry`` in DeepPavlov root directory. The code below should be placed in the file -``deeppavlov/metrics/fmeasure.py`` and registry is updated with command ``python -m utils.prepare.registry``. - -.. 
code:: python - - @register_metric("average__roc_auc__roc_auc__ner_f1") - def roc_auc__roc_auc__ner_f1(true_onehot1, pred_probas1, true_onehot2, pred_probas2, ner_true3, ner_pred3): - from .roc_auc_score import roc_auc_score - roc_auc1 = roc_auc_score(true_onehot1, pred_probas1) - roc_auc2 = roc_auc_score(true_onehot2, pred_probas2) - ner_f1_3 = ner_f1(ner_true3, ner_pred3) / 100 - return (roc_auc1 + roc_auc2 + ner_f1_3) / 3 - -Inference config ----------------- - -There is no need in dataset reader and dataset iterator in and inference config. A ``train`` field and components -preparing ``in_y`` are removed. In ``multitask_bert`` component configuration all training parameters (learning rate, -optimizer, etc.) are omitted. - -For demonstration of DeepPavlov multi-task BERT functionality, in this example, the inference is made in 2 separate -components: ``multitask_bert`` and ``mtbert_reuser``. The first component performs named entity recognition and the -second performs insult detection and sentiment analysis. - -To run NER using the ``multitask_bert`` component, ``inference_task_names`` parameter is added to -``multitask_bert`` component configuration. An ``inference_task_names`` parameter can be a string or a list containing -strings and lists of strings. If an ``inference_task_names`` parameter is a string, it is the name of the task called -separately (in individual ``tf.Session.run`` call). - -If an ``inference_task_names`` parameter is a list, then this list contains names of called tasks. You may group -several tasks to speed up inference if these tasks have common inputs. If an element of the ``inference_task_names`` -is a list of task names, the tasks from the list are run simultaneously in one ``tf.Session.run`` call. Despite the -fact that tasks share inputs, you have to provide full sets of inputs for all tasks in ``in`` parameter of -``multitask_bert``. - -In the tutorial, NER task do not have common inputs with other tasks and have to be run -separately. - -.. code:: json - - { - "id": "mt_bert", - "class_name": "mt_bert", - "inference_task_names": "ner", - "bert_config_file": "{BERT_PATH}/bert_config.json", - "save_path": "{MT_BERT_PATH}/model", - "load_path": "{MT_BERT_PATH}/model", - "pretrained_bert": "{BERT_PATH}/bert_model.ckpt", - "tasks": { - "insults": { - "class_name": "mt_bert_classification_task", - "n_classes": "#classes_vocab_insults.len", - "return_probas": true, - "one_hot_labels": true - }, - "sentiment": { - "class_name": "mt_bert_classification_task", - "n_classes": "#classes_vocab_sentiment.len", - "return_probas": true, - "one_hot_labels": true - }, - "ner": { - "class_name": "mt_bert_seq_tagging_task", - "n_tags": "#tag_vocab.len", - "return_probas": false, - "use_crf": true, - "encoder_layer_ids": [-1] - } - }, - "in": ["x_ner_subword_tok_ids", "ner_attention_mask", "ner_startofword_markers"], - "out": ["y_ner_pred_ind"] - } - -``mtbert_reuser`` component is an interface to ``call`` method of ``MultiTaskBert`` class. ``mtbert_reuser`` -component is provided with ``multitask_bert`` component, a list of task names for inference ``task_names`` (the format -is same as in ``inference_task_names`` parameter of ``multitask_bert``), and ``in_distribution`` parameter. Notice -that tasks "insults" and "sentiment" are grouped into a list of 2 elements. This syntax invokes inference of these -tasks in one call of ``tf.Session.run``. 
If ``task_names`` were equal to ``["insults", "sentiment"]``, the inference -of the tasks would be sequential and take approximately 2 times more time. - -.. code:: json - - { - "class_name": "mt_bert_reuser", - "mt_bert": "#mt_bert", - "task_names": [["insults", "sentiment"]], - "in_distribution": {"insults": 1, "sentiment": 1}, - "in": ["bert_features", "bert_features"], - "out": ["y_insults_pred_probas", "y_sentiment_pred_probas"] - } - diff --git a/docs/features/models/nemo.rst b/docs/features/models/nemo.rst deleted file mode 100644 index bfa3bd4421..0000000000 --- a/docs/features/models/nemo.rst +++ /dev/null @@ -1,164 +0,0 @@ -Speech recognition and synthesis (ASR and TTS) -============================================== - -DeepPavlov contains models for automatic speech recognition (ASR) and text synthesis (TTS) based on pre-build modules -from `NeMo `__ (v0.10.0) - NVIDIA toolkit for defining and building -Conversational AI applications. Named arguments for modules initialization are taken from the NeMo config file (please -do not confuse with the DeepPavlov config file that defines model pipeline). - -Speech recognition ------------------- - -The ASR pipeline is based on Jasper: an CTC-based end-to-end model. The model transcripts speech samples without -any additional alignment information. :class:`~deeppavlov.models.nemo.asr.NeMoASR` contains following modules: - -- `AudioToMelSpectrogramPreprocessor `_ - uses arguments from ``AudioToMelSpectrogramPreprocessor`` section of the NeMo config file. -- `JasperEncoder `__ - uses arguments from ``JasperEncoder`` section of the NeMo config file. Needs pretrained checkpoint. -- `JasperDecoderForCTC `__ - uses arguments from ``JasperDecoder`` section of the NeMo config file. Needs pretrained checkpoint. -- `GreedyCTCDecoder `__ - doesn't use any arguments. -- :class:`~deeppavlov.models.nemo.asr.AudioInferDataLayer` - uses arguments from ``AudioToTextDataLayer`` section of the NeMo config file. - -NeMo config file for ASR should contain ``labels`` argument besides named arguments for the modules above. ``labels`` is -a list of characters that can be output by the ASR model used in model training. - -Speech synthesis ----------------- - -The TTS pipeline that creates human audible speech from text is based on Tacotron 2 and Waveglow models. -:class:`~deeppavlov.models.nemo.tts.NeMoTTS` contains following modules: - -- `TextEmbedding `__ - uses arguments from ``TextEmbedding`` section of the NeMo config file. Needs pretrained checkpoint. -- `Tacotron2Encoder `__ - uses arguments from ``Tacotron2Encoder`` section of the NeMo config file. Needs pretrained checkpoint. -- `Tacotron2DecoderInfer `__ - uses arguments from ``Tacotron2Decoder`` section of the NeMo config file. Needs pretrained checkpoint. -- `Tacotron2Postnet `__ - uses arguments from ``Tacotron2Postnet`` section of the NeMo config file. Needs pretrained checkpoint. -- :class:`~deeppavlov.models.nemo.vocoder.WaveGlow` - uses arguments from ``WaveGlowNM`` section of the NeMo config file. Needs pretrained checkpoint. -- :class:`~deeppavlov.models.nemo.vocoder.GriffinLim` - uses arguments from ``GriffinLim`` section of the NeMo config file. -- :class:`~deeppavlov.models.nemo.tts.TextDataLayer` - uses arguments from ``TranscriptDataLayer`` section of the NeMo config file. - -NeMo config file for TTS should contain ``labels`` and ``sample_rate`` args besides named arguments for the modules -above. ``labels`` is a list of characters used in TTS model training. 
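For intuition about the ``GreedyCTCDecoder`` used in the ASR pipeline above, the sketch below shows what greedy CTC decoding does in plain Python. It is an illustration only, not the NeMo implementation: it assumes the per-frame argmax class indices are already computed and that the CTC blank symbol is the extra, last class index.

.. code:: python

    def greedy_ctc_decode(frame_label_ids, labels):
        """Collapse repeated labels and drop blanks from per-frame argmax predictions."""
        blank_id = len(labels)  # assumption: the blank is the extra, last class
        chars = []
        prev_id = None
        for label_id in frame_label_ids:
            # emit a character only when the label changes and is not the blank
            if label_id != prev_id and label_id != blank_id:
                chars.append(labels[label_id])
            prev_id = label_id
        return "".join(chars)

    labels = [" ", "a", "c", "t"]  # toy stand-in for the ``labels`` list from the NeMo config
    print(greedy_ctc_decode([2, 2, 4, 1, 1, 4, 3, 3], labels))  # -> "cat"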
- -Audio encoding end decoding. ----------------------------- - -:func:`~deeppavlov.models.nemo.common.ascii_to_bytes_io` and :func:`~deeppavlov.models.nemo.common.bytes_io_to_ascii` -was added to the library to achieve uniformity at work with both text and audio data. Components can be used to encode -binary data to ascii string and decode back. - -Quck Start ----------- - -Preparation -~~~~~~~~~~~ - -Install requirements and download model files. - -.. code:: bash - - python -m deeppavlov install asr_tts - python -m deeppavlov download asr_tts - -Examples below use `sounddevice `_ library. Install -it with ``pip install sounddevice==0.3.15``. You may need to install ``libportaudio2`` package with -``sudo apt-get install libportaudio2`` to make ``sounddevice`` work. - -.. note:: - ASR reads and TTS generates single channel WAV files. Files transferred to ASR are resampled to the frequency - specified in the NeMo config file (16 kHz for models from DeepPavlov configs). - -Speech recognition -~~~~~~~~~~~~~~~~~~ - -DeepPavlov :config:`asr ` config contains minimal pipeline for english speech recognition using -`QuartzNet15x5En `_ pretrained model. -To record speech on your computer and print transcription run following script: - -.. code:: python - - from io import BytesIO - - import sounddevice as sd - from scipy.io.wavfile import write - - from deeppavlov import build_model, configs - - sr = 16000 - duration = 3 - - print('Recording...') - myrecording = sd.rec(duration*sr, samplerate=sr, channels=1) - sd.wait() - print('done') - - out = BytesIO() - write(out, sr, myrecording) - - model = build_model(configs.nemo.asr) - text_batch = model([out]) - - print(text_batch[0]) - -Speech synthesis -~~~~~~~~~~~~~~~~ - -DeepPavlov :config:`tts ` config contains minimal pipeline for speech synthesis using -`Tacotron2 `_ and -`WaveGlow `_ pretrained models. -To generate audiofile and save it to hard drive run following script: - -.. code:: python - - from deeppavlov import build_model, configs - - model = build_model(configs.nemo.tts) - filepath_batch = model(['Hello world'], ['~/hello_world.wav']) - - print(f'Generated speech has successfully saved at {filepath_batch[0]}') - -Speech to speech -~~~~~~~~~~~~~~~~ - -Previous examples assume files with speech to recognize and files to be generated are on the same system where the -DeepPavlov is running. DeepPavlov :config:`asr_tts ` config allows sending files with speech to -recognize and receiving files with generated speech from another system. This config is recognizes received speech and -re-sounds it. - -Run ``asr_tts`` in REST Api mode: - -.. code:: bash - - python -m deeppavlov riseapi asr_tts - -This python script supposes that you already have file with speech to recognize. You can use code from speech -recognition example to record speech on your system. ``127.0.0.1`` should be replased by address of system where -DeepPavlov has started. - -.. code:: python - - from base64 import encodebytes, decodebytes - - from requests import post - - with open('/path/to/wav/file/with/speech', 'rb') as fin: - input_speech = fin.read() - - input_ascii = encodebytes(input_speech).decode('ascii') - - resp = post('http://127.0.0.1:5000/model', json={"speech_in_encoded": [input_ascii]}) - text, generated_speech_ascii = resp.json()[0] - generated_speech = decodebytes(generated_speech_ascii.encode()) - - with open('/path/where/to/save/generated/wav/file', 'wb') as fout: - fout.write(generated_speech) - - print(f'Speech transcriptions is: {text}') - -.. 
warning:: - NeMo library v0.10.0 doesn't allow to infer batches longer than one without compatible NVIDIA GPU. - -Models training ---------------- - -To get your own pre-trained checkpoints for NeMo modules see `Speech recognition `_ -and `Speech Synthesis `_ tutorials. Pre-trained models list could be found -`here `_. \ No newline at end of file diff --git a/docs/features/models/ner.rst b/docs/features/models/ner.rst index 3663bb84ef..0679cd4c52 100644 --- a/docs/features/models/ner.rst +++ b/docs/features/models/ner.rst @@ -4,23 +4,22 @@ Named Entity Recognition (NER) Train and use the model ----------------------- -There are three main types of models available: Standard RNN-based model, BERT-based model (on TensorFlow and PyTorch), and the hybrid model. -To see details about BERT based models see :doc:`here `. The last one, the hybrid model, reproduces the architecture proposed -in the paper `A Deep Neural Network Model for the Task of Named Entity Recognition `__. +Named entity recognition in DeepPavlov is based on a PyTorch BERT model. +To see details about BERT-based models see :doc:`here `. Any pre-trained model can be used for inference from both Command Line Interface (CLI) and Python. Before using the model make sure that all required packages are installed using the command: .. code:: bash - python -m deeppavlov install ner_ontonotes_bert_torch + python -m deeppavlov install ner_ontonotes_bert To use a pre-trained model from CLI use the following command: .. code:: bash - python deeppavlov/deep.py interact ner_ontonotes_bert_torch [-d] + python deeppavlov/deep.py interact ner_ontonotes_bert [-d] -where ``ner_ontonotes_bert_torch`` is the name of the config and ``-d`` is an optional download key. The key ``-d`` is used +where ``ner_ontonotes_bert`` is the name of the config and ``-d`` is an optional download key. The key ``-d`` is used to download the pre-trained model along with embeddings and all other files needed to run the model. Other possible commands are ``train``, ``evaluate``, and ``download``, @@ -31,43 +30,28 @@ Here is the list of all available configs: ..
table:: :widths: auto - +------------------------------------------------------------------------+--------------------+----------+-----------------+------------+------------+ - | Model | Dataset | Language | Embeddings Size | Model Size | F1 score | - +========================================================================+====================+==========+=================+============+============+ - | :config:`ner_rus_bert_torch ` | Collection3 [1]_ | Ru | 700 MB | 2.0 GB | **97.7** | - +------------------------------------------------------------------------+ + +-----------------+------------+------------+ - | :config:`ner_collection3_m1 ` | | | 1.1 GB | 1 GB | 97.8 | - +------------------------------------------------------------------------+ + +-----------------+------------+------------+ - | :config:`ner_rus ` | | | 1.0 GB | 5.6 MB | 95.1 | - +------------------------------------------------------------------------+--------------------+----------+-----------------+------------+------------+ - | :config:`` | Ontonotes | Multi | 700 MB | 2.0 GB | **87.2** | - +------------------------------------------------------------------------+ +----------+-----------------+------------+------------+ - | :config:`ner_ontonotes_bert_torch ` | | En | 400 MB | 1.3 GB | 87.9 | - +------------------------------------------------------------------------+ + +-----------------+------------+------------+ - | :config:`ner_ontonotes_m1 ` | | | 347 MB | 379.4 MB | 87.7 | - +------------------------------------------------------------------------+ + +-----------------+------------+------------+ - | :config:`ner_ontonotes ` | | | 331 MB | 7.8 MB | 86.7 | - +------------------------------------------------------------------------+--------------------+ +-----------------+------------+------------+ - | :config:`ner_conll2003_bert ` | CoNLL-2003 | | 400 MB | 850 MB | 91.7 | - +------------------------------------------------------------------------+ + +-----------------+------------+------------+ - | :config:`ner_conll2003_torch_bert ` | | | --- | 1.3 GB | 90.7 | - +------------------------------------------------------------------------+ + +-----------------+------------+------------+ - | :config:`ner_conll2003 ` | | | 331 MB | 3.1 MB | 89.9 | - +------------------------------------------------------------------------+ + +-----------------+------------+------------+ - | :config:`conll2003_m1 ` | | | 339 MB | 359.7 MB | **91.9** | - +------------------------------------------------------------------------+--------------------+ +-----------------+------------+------------+ - | :config:`ner_dstc2 ` | DSTC2 | | --- | 626 KB | 97.1 | - +------------------------------------------------------------------------+--------------------+----------+-----------------+------------+------------+ - | :config:`vlsp2016_full ` | VLSP-2016 | Vi | 520 MB | 37.2 MB | 93.4 | - +------------------------------------------------------------------------+--------------------+----------+-----------------+------------+------------+ + +--------------------------------------------------------------------------------------+--------------------+----------+-----------------+------------+------------+ + | Model | Dataset | Language | Embeddings Size | Model Size | F1 score | + +======================================================================================+====================+==========+=================+============+============+ + | :config:`ner_rus_bert ` | Collection3 [1]_ | Ru | 700 MB | 2.0 GB | **97.9** | + 
+--------------------------------------------------------------------------------------+--------------------+----------+-----------------+------------+------------+ + | :config:`ner_ontonotes_bert_mult ` | Ontonotes | Multi | 700 MB | 2.0 GB | **88.9** | + +--------------------------------------------------------------------------------------+--------------------+----------+-----------------+------------+------------+ + | :config:`ner_ontonotes_bert ` | | En | 400 MB | 1.3 GB | 89.2 | + +--------------------------------------------------------------------------------------+--------------------+----------+-----------------+------------+------------+ + | :config:`ner_conll2003_bert ` | CoNLL-2003 | | 400 MB | 1.3 GB | 91.7 | + +--------------------------------------------------------------------------------------+--------------------+----------+-----------------+------------+------------+ + | :config:`ner_case_agnostic_mdistilbert ` | CoNLL-2003 | En+Ru | 700 MB | 1.6 GB | 89.4 | + | | Collection3 | | | | 96.4 | + +--------------------------------------------------------------------------------------+--------------------+----------+-----------------+------------+------------+ Models can be used from Python using the following code: .. code:: python - from deeppavlov import configs, build_model + from deeppavlov import build_model - ner_model = build_model(configs.ner.ner_ontonotes_bert_torch, download=True) + ner_model = build_model('ner_ontonotes_bert', download=True) ner_model(['Bob Ross lived in Florida']) >>> [[['Bob', 'Ross', 'lived', 'in', 'Florida']], [['B-PERSON', 'I-PERSON', 'O', 'O', 'B-GPE']]] @@ -76,21 +60,21 @@ The model also can be trained from the Python: .. code:: python - from deeppavlov import configs, train_model + from deeppavlov import train_model - ner_model = train_model(configs.ner.ner_ontonotes_bert_torch) + ner_model = train_model('ner_ontonotes_bert') The data for training should be placed in the folder provided in the config: .. code:: python - from deeppavlov import configs, train_model + from deeppavlov import train_model from deeppavlov.core.commands.utils import parse_config - - config_dict = parse_config(configs.ner.ner_ontonotes_bert_torch) + + config_dict = parse_config('ner_ontonotes_bert') print(config_dict['dataset_reader']['data_path']) - >>> '~/.deeppavlov/downloads/ontonotes' + >>> '~/.deeppavlov/downloads/ontonotes/' There must be three txt files: train.txt, valid.txt, and test.txt. Furthermore the `data_path` can be changed from code. The format of the data is described in the `Training data`_ section. @@ -102,7 +86,7 @@ Multilingual BERT Zero-Shot Transfer ------------------------------------ Multilingual BERT models allow to perform zero-shot transfer from one language to another. The model -:config:`ner_ontonotes_bert_mult_torch ` was trained on OntoNotes corpus which has 19 types +:config:`ner_ontonotes_bert_mult ` was trained on OntoNotes corpus which has 19 types in the markup schema. The model performance was evaluated on Russian corpus Collection 3 [1]_. Results of the transfer are presented in the table below. @@ -121,9 +105,9 @@ The following Python code can be used to infer the model: .. 
code:: python - from deeppavlov import configs, build_model + from deeppavlov import build_model - ner_model = build_model(configs.ner.ner_ontonotes_bert_mult_torch, download=True) + ner_model = build_model('ner_ontonotes_bert_mult', download=True) ner_model(['Curling World Championship will be held in Antananarivo']) >>> (['Curling', 'World', 'Championship', 'will', 'be', 'held', 'in', 'Antananarivo']], @@ -265,81 +249,6 @@ quality. Typical partition of a dataset into train, validation, and test are 80%, 10%, 10%, respectively. - -Few-shot Language-Model based ------------------------------ - -It is possible to get a cold-start baseline from just a few samples of labeled data in a couple of seconds. The solution -is based on a Language Model trained on open domain corpus. On top of the LM a SVM classification layer is placed. It is -possible to start from as few as 10 sentences containing entities of interest. - -The data for training this model should be collected in the following way. Given a collection of `N` sentences without -markup, sequentially markup sentences until the total number of sentences with entity of interest become equal -`K`. During the training both sentences with and without markup are used. - - -Mean chunk-wise F1 scores for Russian language on 10 sentences with entities : - -+---------+-------+ -|PER | 84.85 | -+---------+-------+ -|LOC | 68.41 | -+---------+-------+ -|ORG | 32.63 | -+---------+-------+ - -(the total number of training sentences is bigger and defined by the distribution of sentences with / without entities). - -The model can be trained using CLI: - -.. code:: bash - - python -m deeppavlov train ner_few_shot_ru - -you have to provide the `train.txt`, `valid.txt`, and `test.txt` files in the format described in the `Training data`_ -section. The files must be in the `ner_few_shot_data` folder as described in the `dataset_reader` part of the config -:config:`ner/ner_few_shot_ru_train.json ` . - -To train and use the model from python code the following snippet can be used: - -.. code:: python - - from deeppavlov import configs, train_model - - ner_model = train_model(configs.ner.ner_few_shot_ru, download=True) - - ner_model(['Example sentence']) - -Warning! This model can take a lot of time and memory if the number of sentences is greater than 1000! - -If a lot of data is available the few-shot setting can be simulated with special `dataset_iterator`. For this purpose -the config -:config:`ner/ner_few_shot_ru_train.json ` . The following code can be used for this -simulation: - -.. code:: python - - from deeppavlov import configs, train_model - - ner_model = train_model(configs.ner.ner_few_shot_ru_simulate, download=True) - -In this config the `Collection dataset `__ is used. However, if -there are files `train.txt`, `valid.txt`, and `test.txt` in the `ner_few_shot_data` folder they will be used instead. - - -To use existing few-shot model use the following python interface can be used: - -.. 
code:: python - - from deeppavlov import configs, build_model - - ner_model = build_model(configs.ner.ner_few_shot_ru) - - ner_model([['Example', 'sentence']]) - ner_model(['Example sentence']) - - - NER-based Model for Sentence Boundary Detection Task ---------------------------------------------------- @@ -365,14 +274,14 @@ dataset generated from the DailyDialog dataset [2]_: +----------------------+---------+ Here is the achieved result of training the hybrid model on the above dataset using -the config file :config:`sentseg_dailydialog `: +the config file :config:`sentseg_dailydialog_bert `: +-----------+-----------+--------+-------+ | Tag | Precision | Recall | F1 | +-----------+-----------+--------+-------+ -| Question | 96.48 | 93.49 | 94.96 | +| Question | 96.56 | 96.78 | 96.67 | +-----------+-----------+--------+-------+ -| Statement | 96.24 | 96.69 | 96.47 | +| Statement | 96.83 | 97.37 | 97.10 | +-----------+-----------+--------+-------+ | Overall | 96.30 | 95.89 | 96.10 | +-----------+-----------+--------+-------+ @@ -381,16 +290,29 @@ The command below is used to download and use the pre-trained model in the CLI: .. code:: bash - python -m deeppavlov interact sentseg_dailydialog -d + python -m deeppavlov interact sentseg_dailydialog_bert -d The model also can be trained from scratch by using the command: .. code:: bash - python -m deeppavlov train sentseg_dailydialog + python -m deeppavlov train sentseg_dailydialog_bert + + + +Multilingual Case-insensitive Named Entity Recognition +------------------------------------------------------ +Although capitalisation is an important feature for the Named Entity Recognition (NER) task, +the NER input data is not always cased, for example, virtual assistants data coming from ASR. +Moreover, while developing virtual assistants there is often a need to support interaction in several languages. +It has been shown that multilingual BERT can be successfully used for cross-lingual transfer, +performing on datasets in various languages with scores comparable to those obtained with language-specific models. +The model :config:`ner_case_agnostic_mdistilbert ` was trained on +on a concatenation of original and lowered datasets to solve the task. Our model achieves +89.5 F1 on CoNLL-2003 and 96.4 F1 on Collection 3 datasets while being robust to missing casing. Literature diff --git a/docs/features/models/neural_ranking.rst b/docs/features/models/neural_ranking.rst index a02f089f4d..dc609464f9 100644 --- a/docs/features/models/neural_ranking.rst +++ b/docs/features/models/neural_ranking.rst @@ -12,104 +12,13 @@ Training and inference models on predifined datasets BERT Ranking ~~~~~~~~~~~~ -Before using models make sure that all required packages are installed running the command for TensorFlow: - -.. code:: bash - - python -m deeppavlov install ranking_ubuntu_v2_bert_uncased - python -m deeppavlov install ranking_ubuntu_v2_bert_sep - python -m deeppavlov install ranking_ubuntu_v2_bert_sep_interact - -or on PyTorch: +Before using models make sure that all required packages are installed running the command: .. 
code:: bash python -m deeppavlov install ranking_ubuntu_v2_torch_bert_uncased -To train the interaction-based (accurate, slow) model on the `Ubuntu V2`_ from command line: - -:: - - python -m deeppavlov train ranking_ubuntu_v2_bert_uncased [-d] - -To train the representation-based (accurate, fast) model on the `Ubuntu V2`_ from command line: - -:: - - python -m deeppavlov train ranking_ubuntu_v2_bert_sep [-d] - -Further the trained representation-based model can be run for inference over the provided response base -(~500K in our case) from command line: - -:: - - python -m deeppavlov interact ranking_ubuntu_v2_bert_sep_interact [-d] - -Statistics on the models quality are available :doc:`here `. - -Building your own response base for bert ranking -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -For the BERT-based models we have the following mechanism of building your own response base. -If you run ``python -m deeppavlov download ranking_ubuntu_v2_bert_sep_interact`` in console -the model with the existing base will be downloaded. -If you look in the folder where the model with the base is located you will find four files: -``contexts.csv``, ``responses.csv``, ``cont_vecs.npy``, ``resp_vecs.npy``. -These are possible responses with their corresponding contexts (``.csv`` files) and their vector representations (``.npy`` files) -indexed using the model. Contexts for responses are used as additional features in some modes of the model operation -(see the attribute ``interact_mode`` in the class :class:`~deeppavlov.models.preprocessors.bert_preprocessor.BertSepRankerPredictorPreprocessor`). -If you would like to use your own response base you should remove all four files indicated above -and place your own ``responses.csv`` file in the folder, -and probably ``contexts.csv`` file depending on the value of the ``interact_mode`` you are planning to use. -The format of these files is very simple, namely each line should represent single response (or context). -You can use existing files as an example. Numbers of lines in ``responses.csv`` and ``contexts.csv`` must match exactly. -Once you have provided these files, you can run the above command in console. -As the system will not find vector representations, it will build them first. -You will see the message ``Building BERT features for the response base...`` -(and probably ``Building BERT features for the context base...``) and then -``Building BERT vector representations for the response base...`` -(and probably ``Building BERT vector representations for the context base...``). -After this is done, you will be able to interact with the system. -Next time you will use the model, built vector representations will be loaded. 
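A minimal sketch of preparing such a base is shown below. The directory and the texts are purely illustrative (use the folder where the model with the existing base was downloaded); the only requirements taken from the description above are one entry per line and equal line counts in the two files.

.. code:: python

    from pathlib import Path

    # hypothetical location of the downloaded model with the response base
    base_dir = Path("~/.deeppavlov/models/bert_sep_ranker").expanduser()
    base_dir.mkdir(parents=True, exist_ok=True)

    responses = ["Try rebooting the router.", "Run the package update first."]
    contexts = ["my internet keeps dropping", "the install command fails"]
    assert len(responses) == len(contexts)  # line counts must match exactly

    (base_dir / "responses.csv").write_text("\n".join(responses) + "\n", encoding="utf-8")
    (base_dir / "contexts.csv").write_text("\n".join(contexts) + "\n", encoding="utf-8")

    # remove stale vector files so that representations are rebuilt on the next run
    for name in ("resp_vecs.npy", "cont_vecs.npy"):
        (base_dir / name).unlink(missing_ok=True)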
- -Ranking -~~~~~~~ - -To use Sequential Matching Network (SMN) or Deep Attention Matching Network (DAM) or -Deep Attention Matching Network with Universal Sentence Encoder (DAM-USE-T) -on the `Ubuntu V2`_ for inference, please run one of the following commands: - -:: - - python -m deeppavlov interact -d ranking_ubuntu_v2_mt_word2vec_smn - python -m deeppavlov interact -d ranking_ubuntu_v2_mt_word2vec_dam_transformer - -Now a user can enter a dialog consists of 10 context sentences and several (>=1) candidate response sentences separated by '&' -and then get the probability that the response is proper continuation of the dialog: - -:: - - :: & & & & & & & & bonhoeffer whar drives do you want to mount what & i have an ext3 usb drive & look with fdisk -l & hello there & fdisk is all you need - >> [0.9776373 0.05753616 0.9642599 ] - -To train the models on the `Ubuntu V2`_ dataset please run one of the following commands: - -:: - - python -m deeppavlov train -d ranking_ubuntu_v2_mt_word2vec_smn - python -m deeppavlov train -d ranking_ubuntu_v2_mt_word2vec_dam_transformer - -As an example of configuration file see -:config:`ranking_ubuntu_v2_mt_word2vec_smn.json `. - -If the model with multi-turn context is used -(such as :class:`~deeppavlov.models.ranking.bilstm_gru_siamese_network.BiLSTMGRUSiameseNetwork` -with the parameter ``num_context_turns`` set to the value higher than 1 in the configuration JSON file) -then the ``context`` to evaluate should consist of ``num_context_turns`` strings connected by the ampersand. -Some of these strings can be empty, i.e. equal to ``''``. - - Paraphrase identification ~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -120,80 +29,16 @@ Before using the model make sure that all required packages are installed runnin .. code:: bash - python -m deeppavlov install paraphrase_ident_paraphraser - python -m deeppavlov install elmo_paraphraser_fine_tuning - -To train the model on the `paraphraser.ru`_ dataset with fasttext embeddings one can use the following code in python: - -.. code:: python - - from deeppavlov import configs, train_model - - para_model = train_model(configs.ranking.paraphrase_ident_paraphraser, download=True) - - -To train the model on the `paraphraser.ru`_ dataset with fine-tuned ELMO embeddings one should first fine-tune ELMO embeddings: - -.. code:: python - - from deeppavlov import configs, train_model - - para_model = train_model(configs.elmo.elmo_paraphraser_fine_tuning, download=True) - -Training and inference on your own data ---------------------------------------- - -Ranking -~~~~~~~ - -To train the model for ranking on your own data you should write your own :class:`~deeppavlov.core.data.dataset_reader.DatasetReader` component -or you can use default :class:`~deeppavlov.dataset_readers.siamese_reader.SiameseReader`. In the latter case, you should provide -three separate files in the default data format described below: - -**train.csv**: each line in the file contains ``context``, ``response`` and ``label`` separated by the tab key. ``label`` can be -binary, i.e. 1 or 0 corresponding to the correct or incorrect ``response`` for the given ``context``, or it can be multi-class label. -In the latter case, each unique ``context`` has the unique class ``label`` and the only correct ``response`` is indicated for each ``context``. -Currently, all ranking and paraphrase identification models support `cross-entropy loss` training with binary labels. 
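As a toy illustration of this layout (the texts below are made up; only the tab-separated ``context``, ``response``, ``label`` structure comes from the description above), a ``train.csv`` file can be written as follows:

.. code:: python

    # two training examples for the same context: one correct response (label 1)
    # and one incorrect response (label 0), tab-separated
    rows = [
        ("how do i reset my password", "use the reset link on the login page", 1),
        ("how do i reset my password", "the office opens at 9 am", 0),
    ]
    with open("train.csv", "w", encoding="utf-8") as f:
        for context, response, label in rows:
            f.write(f"{context}\t{response}\t{label}\n")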
-Some models, such as :class:`~deeppavlov.models.ranking.bilstm_siamese_network.BiLSTMSiameseNetwork`, -:class:`~deeppavlov.models.ranking.bilstm_gru_siamese_network.BiLSTMGRUSiameseNetwork` -and :class:`~deeppavlov.models.ranking.mpm_siamese_network.MPMSiameseNetwork` support also training with `triplet loss` -(the parameter ``triplet_loss`` should be set to ``true`` for the model in the configuration JSON file in this case) -which can give potentially few percent of performance over the `cross-entropy loss` training. - -If the model with multi-turn context is used -(such as :class:`~deeppavlov.models.ranking.bilstm_gru_siamese_network.BiLSTMGRUSiameseNetwork` -with the parameter ``num_context_turns`` set to the value higher than 1 in the configuration JSON file) -then the ``context`` should be specified with ``num_context_turns`` strings separated by the tab key instead of a single string. -Some of these strings can be empty, i.e. equal to ``''``. + python -m deeppavlov install paraphraser_rubert -Classification metrics on the train dataset part (the parameter ``train_metrics`` in the JSON configuration file) -such as ``f1``, ``acc`` and ``log_loss`` can be calculated only in the ``cross-entropy loss`` training mode. -Both, `cross-entropy loss` and `triplet loss` training can output loss function value returned by -:meth:`~deeppavlov.models.ranking.siamese_model.SiameseModel.train_on_batch` if the ``log_every_n_batches`` parameter is set to the non-negative value. - - -**valid.csv**, **test.csv**: each line in these files contains ``context``, ``response_1``, ``response_2``, ..., ``response_n`` -separated by the tab key, where ``response_1`` is the correct response for the given ``context`` and the rest ``response_2``, ..., ``response_n`` -are incorrect response candidates. The number of responses `n` in these files should correspond to the -parameter ``num_ranking_samples`` in the JSON configuration file. As an example see - -Such ranking metrics on the valid and test parts of the dataset (the parameter ``metrics`` in the JSON configuration file) as -``r@1``, ``r@2``, ..., ``r@n`` and ``rank_response`` can be evaluated. - -As an example of data usage in the default format, please, see :config:`ranking_default.json `. -To train the model with this configuration file in python: +To train the model on the `paraphraser.ru`_ dataset one can use the following code in Python: .. code:: python from deeppavlov import configs, train_model - rank_model = train_model(configs.ranking.ranking_default, download=True) - -To train from command line: - -:: + para_model = train_model('paraphraser_rubert', download=True) - python -m deeppavlov train deeppavlov/configs/ranking/ranking_default.json [-d] Paraphrase identification ~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/features/models/slot_filling.rst b/docs/features/models/slot_filling.rst deleted file mode 100644 index 39b6e0c230..0000000000 --- a/docs/features/models/slot_filling.rst +++ /dev/null @@ -1,264 +0,0 @@ -Neural Named Entity Recognition and Slot Filling -================================================ - -This model solves Slot-Filling task using Levenshtein search and different neural network architectures for NER. -To read about NER without slot filling please address :doc:`NER documentation `. -This model serves for solving DSTC 2 Slot-Filling task. 
In most of the cases, NER task can be formulated as: - -*Given a sequence of tokens (words, and maybe punctuation symbols) -provide a tag from a predefined set of tags for each token in the -sequence.* - -For NER task there are some common types of entities used as tags: - -- persons -- locations -- organizations -- expressions of time -- quantities -- monetary values - -Furthermore, to distinguish adjacent entities with the same tag many -applications use BIO tagging scheme. Here "B" denotes beginning of an -entity, "I" stands for "inside" and is used for all words comprising the -entity except the first one, and "O" means the absence of entity. -Example with dropped punctuation: - -:: - - Restaraunt O - in O - the O - west B-LOC - of O - the O - city O - serving O - modern B-FOOD - european I-FOOD - cuisine O - -In the example above, ``FOOD`` means food tag, ``LOC`` means location -tag, and "B-" and "I-" are prefixes identifying beginnings and -continuations of the entities. - -Slot Filling is a typical step after the NER. It can be formulated as: - -*Given an entity of a certain type and a set of all possible values of -this entity type provide a normalized form of the entity.* - -In this model, the Slot Filling task is solved by Levenshtein -Distance search across all known entities of a given type. - -For example, there is an entity of "food" type: - -*chainese* - -It is definitely misspelled. The set of all known food entities is -{'chinese', 'russian', 'european'}. The nearest known entity from the -given set is *chinese*. So the output of the Slot Filling system will be -*chinese*. - -Configuration of the model --------------------------- - -Configuration of the model can be performed in code or in JSON configuration file. -To train the model you need to specify four groups of parameters: - -- ``dataset_reader`` -- ``dataset_iterator`` -- ``chainer`` -- ``train`` - -In the subsequent text we show the parameter specification in config -file. However, the same notation can be used to specify parameters in -code by replacing the JSON with python dictionary. - -Dataset Reader -~~~~~~~~~~~~~~ - -The dataset reader is a class which reads and parses the data. It -returns a dictionary with three fields: "train", "test", and "valid". -The basic dataset reader is "ner\_dataset\_reader." The dataset reader -config part with "ner\_dataset\_reader" should look like: - -:: - - "dataset_reader": { - "class_name": "dstc2_datasetreader", - "data_path": "dstc2" - } - -where ``class_name`` refers to the basic ner dataset reader class and ``data_path`` -is the path to the folder with DSTC 2 dataset. - -Dataset Iterator -~~~~~~~~~~~~~~~~ - -For simple batching and shuffling you can use "dstc2\_ner\_iterator". -The part of the configuration file for the dataset iterator looks like: -``"dataset_iterator": { "class_name": "dstc2_ner_iterator" }`` - -There are no additional parameters in this part. - -Chainer -~~~~~~~ - -The chainer part of the configuration file contains the specification of -the neural network model and supplementary things such as vocabularies. -The chainer part must have the following form: - -:: - - "chainer": { - "in": ["x"], - "in_y": ["y"], - "pipe": [ - ... - ], - "out": ["y_predicted"] - } - -The inputs and outputs must be specified in the pipe. "in" means regular -input that is used for inference and train mode. "in\_y" is used for -training and usually contains ground truth answers. "out" field stands -for model prediction. 
The model inside the pipe must have output -variable with name "y\_predicted" so that "out" knows where to get -predictions. - -The major part of "chainer" is "pipe". The "pipe" contains the -pre-processing modules, vocabularies and model. However, we can use -existing pipelines: - -:: - - "pipe": [ - { - "in": ["x"], - "class_name": "lazy_tokenizer", - "out": ["x"] - }, - { - "in": ["x"], - "config_path": "../deeppavlov/configs/ner/ner_dstc2.json", - "out": ["tags"] - }, - ... - ] - -This part will initialize already existing pre-trained NER module. The -only thing need to be specified is path to existing config. The -preceding lazy tokenizer serves to extract tokens for raw string of -text. - -The following component in the pipeline is the ``slotfiller``: - -:: - - "pipe": [ - { - "in": ["x_lower", "tags"], - "class_name": "dstc_slotfilling", - "save_path": "slotfill_dstc2/dstc_slot_vals.json", - "load_path": "slotfill_dstc2/dstc_slot_vals.json", - "out": ["slots"] - } - -The ``slotfiller`` takes the tags and tokens to perform normalization of -extracted entities. The normalization is performed via fuzzy Levenshtein -search in dstc\_slot\_vals dictionary. The output of this component is -dictionary of slot values found in the input utterances. - -The main part of the ``dstc_slotfilling`` componet is the slot values -dictionary. The dicttionary has the following structure: - -:: - - { - "entity_type_0": { - "entity_value_0": [ - "entity_value_0_variation_0", - "entity_value_0_variation_1", - "entity_value_0_variation_2" - ], - "entity_value_1": [ - "entity_value_1_variation_0" - ], - ... - } - "entity_type_1": { - ... - -Slotfiller will perform fuzzy search through the all variations of all -entity values of given entity type. The entity type is determined by the -NER component. - -The last part of the config is metadata: - -:: - - "metadata": { - "variables": { - "ROOT_PATH": "~/.deeppavlov", - "NER_CONFIG_PATH": "{DEEPPAVLOV_PATH}/configs/ner/ner_dstc2.json", - "DATA_PATH": "{ROOT_PATH}/downloads/dstc2", - "SLOT_VALS_PATH": "{DATA_PATH}/dstc_slot_vals.json", - "MODELS_PATH": "{ROOT_PATH}/models", - "MODEL_PATH": "{MODELS_PATH}/slotfill_dstc2" - }, - "download": [ - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/dstc_slot_vals.tar.gz", - "subdir": "{DATA_PATH}" - }, - { - "url": "http://files.deeppavlov.ai/deeppavlov_data/slotfill_dstc2.tar.gz", - "subdir": "{MODELS_PATH}" - } - ] - } - -It contains information for deployment of the model and urls for -download pre-trained models. - -You can see all parts together in ``deeeppavlov/configs/ner/slotfill_dstc2.json`` - -Usage of the model ------------------- - -Please see an example of training a Slot Filling model and using it for -prediction: - -.. code:: python - - from deeppavlov import build_model, configs - - PIPELINE_CONFIG_PATH = configs.ner.slotfill_dstc2 - slotfill_model = build_model(PIPELINE_CONFIG_PATH, download=True) - slotfill_model(['I would like some chinese food', 'The west part of the city would be nice']) - -This example assumes that the working directory is the root of the -project. - -Slotfilling without NER ------------------------ - -An alternative approach to Slot Filling problem could be fuzzy search -for each instance of each slot value inside the text. This approach is -realized in ``slotfill_raw`` component. The component uses needle in -haystack - -The main advantage of this approach is elimination of a separate Named -Entity Recognition module. 
However, absence of NER module make this -model less robust to noise (words with similar spelling) especially for -long utterances. - -Usage example: - -.. code:: python - - from deeppavlov import build_model, configs - - PIPELINE_CONFIG_PATH = configs.ner.slotfill_dstc2_raw - slotfill_model = build_model(PIPELINE_CONFIG_PATH, download=True) - slotfill_model(['I would like some chinese food', 'The west part of the city would be nice']) diff --git a/docs/features/models/spelling_correction.rst b/docs/features/models/spelling_correction.rst index e5d16ba3db..43827a506b 100644 --- a/docs/features/models/spelling_correction.rst +++ b/docs/features/models/spelling_correction.rst @@ -49,7 +49,7 @@ lines to stdout: from deeppavlov import build_model, configs - CONFIG_PATH = configs.spelling_correction.brillmoore_kartaslov_ru + CONFIG_PATH = configs.spelling_correction.levenshtein_corrector_ru model = build_model(CONFIG_PATH, download=True) for line in sys.stdin: @@ -185,14 +185,9 @@ on Automatic Spelling Correction for Russian: +-----------------------------------------------------------------------------------------+-----------+--------+-----------+---------------------+ | :config:`Damerau Levenshtein 1 + lm` | 59.38 | 53.44 | 56.25 | 39.3 | +-----------------------------------------------------------------------------------------+-----------+--------+-----------+---------------------+ -| :config:`Brill Moore top 4 + lm` | 51.92 | 53.94 | 52.91 | 0.6 | -+-----------------------------------------------------------------------------------------+-----------+--------+-----------+---------------------+ | Hunspell + lm | 41.03 | 48.89 | 44.61 | 2.1 | +-----------------------------------------------------------------------------------------+-----------+--------+-----------+---------------------+ | JamSpell | 44.57 | 35.69 | 39.64 | 136.2 | +-----------------------------------------------------------------------------------------+-----------+--------+-----------+---------------------+ -| :config:`Brill Moore top 1 ` | 41.29 | 37.26 | 39.17 | 2.4 | -+-----------------------------------------------------------------------------------------+-----------+--------+-----------+---------------------+ | Hunspell | 30.30 | 34.02 | 32.06 | 20.3 | +-----------------------------------------------------------------------------------------+-----------+--------+-----------+---------------------+ - diff --git a/docs/features/models/squad.rst b/docs/features/models/squad.rst index 128ea07627..ab30ddab39 100644 --- a/docs/features/models/squad.rst +++ b/docs/features/models/squad.rst @@ -37,7 +37,7 @@ Datasets, which follow this task format: Models ------ -There are two models for this task in DeepPavlov: BERT-based and R-Net. Both models predict answer start and end +SQuAD model in DeepPavlov is based on BERT. The model predicts answer start and end position in a given context. Their performance is compared in :ref:`pretrained models ` section of this documentation. @@ -47,19 +47,7 @@ Pretrained BERT can be used for Question Answering on SQuAD dataset just by appl BERT outputs for each subtoken. First/second linear transformation is used for prediction of probability that current subtoken is start/end position of an answer. -BERT for SQuAD model documentation on TensorFlow :class:`~deeppavlov.models.bert.bert_squad.BertSQuADModel` -and on PyTorch :class:`~deeppavlov.models.torch_bert.torch_transformers_squad:TorchTransformersSquad`. 
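A rough sketch of this span-prediction head is given below. It is illustrative only, not the DeepPavlov implementation: the hidden size, tensor shapes, and names are assumptions, and a real model would mask padding and question subtokens before taking the softmax.

.. code:: python

    import torch
    import torch.nn as nn

    class SpanHead(nn.Module):
        """Two linear transformations over BERT subtoken outputs -> start/end logits."""

        def __init__(self, hidden_size=768):
            super().__init__()
            self.start = nn.Linear(hidden_size, 1)
            self.end = nn.Linear(hidden_size, 1)

        def forward(self, subtoken_states):  # [batch, seq_len, hidden_size]
            start_logits = self.start(subtoken_states).squeeze(-1)  # [batch, seq_len]
            end_logits = self.end(subtoken_states).squeeze(-1)
            return start_logits, end_logits

    head = SpanHead()
    states = torch.randn(2, 128, 768)  # stand-in for BERT encoder outputs
    start_logits, end_logits = head(states)
    # softmax over the sequence axis gives per-subtoken probabilities of being the answer start/end
    start_probs = start_logits.softmax(dim=-1)
    end_probs = end_logits.softmax(dim=-1)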
- -R-Net -~~~~~ - -Question Answering Model is based on R-Net, proposed by Microsoft -Research Asia (`"R-NET: Machine Reading Comprehension with Self-matching -Networks" `__) -and its `implementation `__ by -Wenxuan Zhou. - -R-Net for SQuAD model documentation: :class:`~deeppavlov.models.squad.squad.SquadModel` +BERT for SQuAD model documentation on PyTorch :class:`~deeppavlov.models.torch_bert.torch_transformers_squad:TorchTransformersSquad`. Configuration ------------- @@ -69,31 +57,24 @@ Default configs could be found in :config:`deeppavlov/configs/squad/ ` f Prerequisites ------------- -Before using the model make sure that all required packages are installed running the command for TensorFlow: +Before using the model make sure that all required packages are installed running the command: .. code:: bash python -m deeppavlov install squad_bert -and for PyTorch - -.. code:: bash - - python -m deeppavlov install squad_torch_bert - By running this command we will install requirements for -:config:`deeppavlov/configs/squad/squad_bert.json ` or for -:config:`deeppavlov/configs/squad/squad_torch_bert.json ` +:config:`deeppavlov/configs/squad/squad_bert.json `. Model usage from Python ----------------------- .. code:: python - from deeppavlov import build_model, configs + from deeppavlov import build_model - model = build_model(configs.squad.squad, download=True) + model = build_model('squad_bert', download=True) model(['DeepPavlov is library for NLP and dialog systems.'], ['What is DeepPavlov?']) @@ -110,7 +91,7 @@ following command to train the model: .. code:: bash - python -m deeppavlov train deeppavlov/configs/squad/squad_bert.json + python -m deeppavlov train squad_bert Interact mode ~~~~~~~~~~~~~ @@ -121,7 +102,7 @@ To run model in interact mode run the following command: .. code:: bash - python -m deeppavlov interact deeppavlov/configs/squad/squad_bert.json + python -m deeppavlov interact squad_bert Model will ask you to type in context and question. @@ -137,7 +118,7 @@ We have all pretrained model available to download: .. code:: bash - python -m deeppavlov download deeppavlov/configs/squad/squad_bert.json + python -m deeppavlov download squad_bert It achieves ~88 F-1 score and ~80 EM on `SQuAD-v1.1`_ dev set. @@ -147,11 +128,7 @@ Leadearboad `__. +---------------------------------------------------------+----------------+-----------------+ | Model (single model) | EM (dev) | F-1 (dev) | +=========================================================+================+=================+ -| :config:`DeepPavlov BERT ` | 80.88 | 88.49 | -+---------------------------------------------------------+----------------+-----------------+ -| :config:`BERT on PyTorch ` | 78.8 | 86.7 | -+---------------------------------------------------------+----------------+-----------------+ -| :config:`DeepPavlov R-Net ` | 71.49 | 80.34 | +| :config:`DeepPavlov BERT ` | 81.49 | 88.86 | +---------------------------------------------------------+----------------+-----------------+ | `BiDAF + Self Attention + ELMo`_ | -- | 85.6 | +---------------------------------------------------------+----------------+-----------------+ @@ -174,11 +151,9 @@ Leadearboad `__. SQuAD with contexts without correct answers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -In the case when answer is not necessary present in given context we have :config:`squad_noans ` -config with pretrained model. This model outputs empty string in case if there is no answer in context. -This model was trained not on SQuAD dataset. 
For each question-context pair from SQuAD we extracted contexts from the same -Wikipedia article and ranked them according to tf-idf score between question and context. In this manner we built dataset -with contexts without an answer. +In the case when answer is not necessary present in given context we have :config:`squad_noans ` +with pretrained model. This model outputs empty string in case if there is no answer in +context. :config:`squad_noans ` was trained on SQuAD2.0 dataset. Special trainable `no_answer` token is added to output of self-attention layer and it makes model able to select `no_answer` token in cases, when answer is not present in given context. @@ -188,7 +163,7 @@ We got 57.88 EM and 65.91 F-1 on ground truth Wikipedia article (we used the sam +---------------+-----------------------------------------------+----------------+-----------------+ | Model config | EM (dev) | F-1 (dev) | +===============================================================+================+=================+ -| :config:`DeepPavlov ` | 57.88 | 65.91 | +| :config:`DeepPavlov ` | 75.54 | 83.56 | +---------------------------------------------------------------+----------------+-----------------+ | `Simple and Effective Multi-Paragraph Reading Comprehension`_ | 59.14 | 67.34 | +---------------------------------------------------------------+----------------+-----------------+ @@ -199,7 +174,7 @@ Pretrained model is available and can be downloaded (~2.5Gb): .. code:: bash - python -m deeppavlov download deeppavlov/configs/squad/multi_squad_noans.json + python -m deeppavlov download qa_squad2_bert .. _`DrQA`: https://arxiv.org/abs/1704.00051 @@ -208,46 +183,17 @@ Pretrained model is available and can be downloaded (~2.5Gb): SDSJ Task B ~~~~~~~~~~~ -Pretrained models are available and can be downloaded: +Pretrained model is available and can be downloaded: .. code:: bash - python -m deeppavlov download deeppavlov/configs/squad/squad_ru.json - - python -m deeppavlov download deeppavlov/configs/squad/squad_ru_rubert_infer.json - - python -m deeppavlov download deeppavlov/configs/squad/squad_ru_bert_infer.json + python -m deeppavlov download squad_ru_bert Link to SDSJ Task B dataset: http://files.deeppavlov.ai/datasets/sber_squad-v1.1.tar.gz +------------------------------------------------------------------------+----------------+-----------------+ | Model config | EM (dev) | F-1 (dev) | +========================================================================+================+=================+ -| :config:`DeepPavlov RuBERT ` | 66.30+-0.24 | 84.60+-0.11 | -+------------------------------------------------------------------------+----------------+-----------------+ -| :config:`DeepPavlov multilingual BERT `| 64.35+-0.39 | 83.39+-0.08 | +| :config:`DeepPavlov RuBERT ` | 66.21 | 84.71 | +------------------------------------------------------------------------+----------------+-----------------+ -| :config:`DeepPavlov R-Net ` | 60.62 | 80.04 | -+------------------------------------------------------------------------+----------------+-----------------+ - - -DRCD -~~~~~~~~~~~ - -Pretrained models are available and can be downloaded: - -.. 
code:: bash - python -m deeppavlov download deeppavlov/configs/squad/squad_zh_bert.json - python -m deeppavlov download deeppavlov/configs/squad/squad_zh_zh_bert.json - -Link to DRCD dataset: http://files.deeppavlov.ai/datasets/DRCD.tar.gz -Link to DRCD paper: https://arxiv.org/abs/1806.00920 - -+------------------------------------------------------------------------+----------------+-----------------+ -| Model config | EM (dev) | F-1 (dev) | -+========================================================================+================+=================+ -| :config:`DeepPavlov ChineseBERT ` | 84.19 | 89.23 | -+------------------------------------------------------------------------+----------------+-----------------+ -| :config:`DeepPavlov multilingual BERT ` | 84.86 | 89.03 | -+------------------------------------------------------------------------+----------------+-----------------+ diff --git a/docs/features/models/syntaxparser.rst b/docs/features/models/syntaxparser.rst deleted file mode 100644 index b08ce2ffb7..0000000000 --- a/docs/features/models/syntaxparser.rst +++ /dev/null @@ -1,170 +0,0 @@ -Syntactic parsing -============================ - -Syntactic parsing is the task of prediction of the syntactic tree given the tokenized (or raw) sentence. -The typical output of the parser looks looks like - -.. image:: /_static/tree.png - -To define a tree, for each word one should know its syntactic head and the dependency label for the edge between them. -For example, the tree above can be restored from the data - -:: - - 1 John 2 nsubj - 2 bought 0 root - 3 a 6 det - 4 very 5 advmod - 5 tasty 6 amod - 6 cake 2 obj - 7 . . 2 punct - -Here the third column contains the positions of syntactic heads and the last one -- the dependency labels. -The words are enumerated from 1 since 0 is the index of the artificial root of the tree, whose only -dependent is the actual syntactic head of the sentence (usually a verb). - -Syntactic trees can be used in many information extraction tasks. For example, to detect who is the winner -and who is the loser in the sentence *Manchester defeated Liverpool* one relies on the word order. However, -many languages, such as Russian, Spanish and German, have relatively free word order, which means we need -other cues. Note also that syntactic relations (`nsubj`, `obj` and so one) have clear semantic counterparts, -which makes syntactic parsing an appealing preprocessing step for the semantic-oriented tasks. - -Model usage ------------ - -Before using the model make sure that all required packages are installed using the command: - -.. code:: bash - - python -m deeppavlov install syntax_ru_syntagrus_bert - -Our model produces the output in `CONLL-U format `__ -and is trained on Universal Dependency corpora, available on http://universaldependencies.org/format.html . -The example usage for inference is - -.. code:: python - - from deeppavlov import build_model, configs - model = build_model(configs.syntax.syntax_ru_syntagrus_bert, download=True) - sentences = ["Я шёл домой по незнакомой улице.", "Девушка пела в церковном хоре."] - for parse in model(sentences): - print(parse, end="\n\n") - - -:: - - 1 Я _ _ _ _ 2 nsubj _ _ - 2 шёл _ _ _ _ 0 root _ _ - 3 домой _ _ _ _ 2 advmod _ _ - 4 по _ _ _ _ 6 case _ _ - 5 незнакомой _ _ _ _ 6 amod _ _ - 6 улице _ _ _ _ 2 obl _ _ - 7 . _ _ _ _ 2 punct _ _ - - 1 Девушка _ _ _ _ 2 nsubj _ _ - 2 пела _ _ _ _ 0 root _ _ - 3 в _ _ _ _ 5 case _ _ - 4 церковном _ _ _ _ 5 amod _ _ - 5 хоре _ _ _ _ 2 obl _ _ - 6 . 
_ _ _ _ 2 punct _ _ - -As prescribed by UD standards, our model writes the head information to the 7th column and the dependency -information -- to the 8th. Our parser does not return morphological tags and even does not use them in -training. - -Model training is done via configuration files, see the -:config:`configuration file ` for reference. Note that as any BERT -model, it requires 16GB of GPU and the training speed is 1-5 sentences per second. However, you can -try less powerful GPU at your own risk (the author himself was able to run the model on 11GB). -The inference speed is several hundreds sentences per second, depending on their length, on GPU -and one magnitude lower on CPU. - -For other usage options see the :doc:`morphological tagger documentation `, -the training and prediction procedure is analogous, only the model name is changed. - -Joint model usage ------------------ - -Our model in principle supports joint prediction of morphological tags and syntactic information, -however, the quality of the joint model is slightly inferior to the separate ones. Therefore we -release a special component that can combine the outputs of tagger and parser: -:class:`~deeppavlov.models.syntax_parser.joint.JointTaggerParser`. Its sample output for the -Russian language with default settings -(see the :config:`configuration file ` for exact options) -looks like - -.. code:: python - - from deeppavlov import build_model, configs - model = build_model("ru_syntagrus_joint_parsing", download=True) - sentences = ["Я шёл домой по незнакомой улице.", "Девушка пела в церковном хоре."] - for parse in model(sentences): - print(parse, end="\n\n") - -:: - - 1 Я я PRON _ Case=Nom|Number=Sing|Person=1 2 nsubj _ _ - 2 шёл идти VERB _ Aspect=Imp|Gender=Masc|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act 0 root _ _ - 3 домой домой ADV _ Degree=Pos 2 advmod _ _ - 4 по по ADP _ _ 6 case _ _ - 5 незнакомой незнакомый ADJ _ Case=Dat|Degree=Pos|Gender=Fem|Number=Sing 6 amod _ _ - 6 улице улица NOUN _ Animacy=Inan|Case=Dat|Gender=Fem|Number=Sing 2 obl _ _ - 7 . . PUNCT _ _ 2 punct _ _ - - 1 Девушка девушка NOUN _ Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing 2 nsubj _ _ - 2 пела петь VERB _ Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act 0 root _ _ - 3 в в ADP _ _ 5 case _ _ - 4 церковном церковный ADJ _ Case=Loc|Degree=Pos|Gender=Masc|Number=Sing 5 amod _ _ - 5 хоре хор NOUN _ Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing 2 obl _ _ - 6 . . PUNCT _ _ 2 punct _ _ - -In the basic case the model outputs a human-readable string with parse data for each information. If you need -to use the output in Python, consult the -:class:`class documentation ` and source code. - -Model architecture ------------------- - -We use BERT as the lowest layer of our model (the embedder). To extract syntactic information we apply -the biaffine network of `[Dozat, Manning, 2017] `__. -For each sentence of length `K` this network produces two outputs: the first is an array of shape ``K*(K+1)``, -where `i`-th row is the probability distribution of the head of `i`-th word over the sentence elements. -The 0-th element of this distribution is the probability of the word to be a root of the sentence. -The second output of the network is of shape `K*D`, where `D` is the number of possible dependency labels. - -The easiest way to obtain a tree is simply to return the head with the highest probability -for each word in the sentence. 
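For illustration, a minimal sketch of this naive argmax decoding, assuming ``probs`` is the ``K*(K+1)`` head-probability array described above (NumPy only; the variable names are illustrative and not part of the parser's API):

.. code:: python

    import numpy as np

    # Head probabilities for a 2-word sentence, shape (K, K + 1):
    # probs[i, j] is the probability that word i + 1 is headed by word j,
    # with column 0 standing for the artificial root.
    probs = np.array([
        [0.05, 0.05, 0.90],   # word 1: most probable head is word 2
        [0.80, 0.10, 0.10],   # word 2: most probable head is the root
    ])

    heads = probs.argmax(axis=1)  # pick the most probable head per word
    print(heads)                  # -> [2 0]
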
However, the graph obtained in such a way may fail to be a valid tree: -it may either contain a cycle or have multiple nodes with head at position 0. -Therefore we apply the well-known Chu-Liu-Edmonds algorithm for minimal spanning tree -to return the optimal tree, using the open-source modification from -`dependency_decoding package `. - -Model quality -------------- - -Syntactic parsers are evaluated using two metrics: UAS (unlabeled attachment score), which is -the percentage of correctly predicted head positions. The second metric is LAS (labeled attachment -score) which treats as positive only the words with correctly predicted dependency label -and dependency head. - -.. table:: - :widths: auto - - +-------------------------+-------------------------------------------------------------------------------------------+---------+----------+ - | Dataset | Model | UAS | LAS | - +=========================+===========================================================================================+=========+==========+ - | `UD2.3`_ (Russian) | `UD Pipe 2.3`_ (Straka et al., 2017) | 90.3 | 89.0 | - | +-------------------------------------------------------------------------------------------+---------+----------+ - | | `UD Pipe Future`_ (Straka, 2018) | 93.0 | 91.5 | - | +-------------------------------------------------------------------------------------------+---------+----------+ - | | `UDify (multilingual BERT)`_ (Kondratyuk, 2018) | 94.8 | 93.1 | - | +-------------------------------------------------------------------------------------------+---------+----------+ - | |:config:`our BERT model ` | 95.2 | 93.7 | - +-------------------------+-------------------------------------------------------------------------------------------+---------+----------+ - -.. _`UD2.3`: http://hdl.handle.net/11234/1-2895 -.. _`UD Pipe 2.3`: http://ufal.mff.cuni.cz/udpipe -.. _`UD Pipe Future`: https://github.com/CoNLL-UD-2018/UDPipe-Future -.. _`UDify (multilingual BERT)`: https://github.com/hyperparticle/udify - -So our model is the state-of-the-art system for Russian syntactic parsing by a valuable margin. diff --git a/docs/features/models/tfidf_ranking.rst b/docs/features/models/tfidf_ranking.rst index d594b16b12..699c0d1b22 100644 --- a/docs/features/models/tfidf_ranking.rst +++ b/docs/features/models/tfidf_ranking.rst @@ -161,9 +161,7 @@ Scores for **TF-IDF Ranker** model: | Model | Dataset | Recall@5 | +------------------------------------------------------------------------------+----------------+-----------------+ | :config:`enwiki20180211 ` | | 75.6 | -+------------------------------------------------------------------------------+ +-----------------+ -| :config:`enwiki20161221 ` | SQuAD (dev) | 76.2 | -+------------------------------------------------------------------------------+ +-----------------+ ++------------------------------------------------------------------------------+ SQuAD (dev) +-----------------+ | `DrQA`_ enwiki20161221 | | 77.8 | +------------------------------------------------------------------------------+----------------+-----------------+ diff --git a/docs/features/overview.rst b/docs/features/overview.rst index 7a515a40cb..10a54863a3 100644 --- a/docs/features/overview.rst +++ b/docs/features/overview.rst @@ -9,56 +9,28 @@ Models NER model :doc:`[docs] ` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -There are two models for Named Entity Recognition task in DeepPavlov: -BERT-based and Bi-LSTM+CRF. The models predict tags (in BIO format) for tokens -in input. 
+Named Entity Recognition task in DeepPavlov is solved with BERT-based model. +The models predict tags (in BIO format) for tokens in input. BERT-based model is described in `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding `__. -The second model reproduces architecture from the paper `Application -of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition `__ -which is inspired by Bi-LSTM+CRF architecture from https://arxiv.org/pdf/1603.01360.pdf. - +---------------------------------------------------------+-------+--------------------------------------------------------------------------------------------+-------------+ | Dataset | Lang | Model | Test F1 | +=========================================================+=======+============================================================================================+=============+ -| Persons-1000 dataset with additional LOC and ORG markup | Ru | :config:`ner_rus_bert.json ` | 98.1 | -+ + +--------------------------------------------------------------------------------------------+-------------+ -| (Collection 3) | | :config:`ner_rus.json ` | 95.1 | +| Persons-1000 dataset with additional LOC and ORG markup | Ru | :config:`ner_rus_bert.json ` | 97.9 | + + +--------------------------------------------------------------------------------------------+-------------+ -| | | :config:`ner_rus_convers_distilrubert_2L.json ` | 88.4 ± 0.5 | +| (Collection 3) | | :config:`ner_rus_convers_distilrubert_2L.json ` | 88.4 ± 0.5 | + + +--------------------------------------------------------------------------------------------+-------------+ | | | :config:`ner_rus_convers_distilrubert_6L.json ` | 93.3 ± 0.3 | +---------------------------------------------------------+-------+--------------------------------------------------------------------------------------------+-------------+ -| Ontonotes | Multi | :config:`ner_ontonotes_bert_mult.json ` | 88.8 | +| Ontonotes | Multi | :config:`ner_ontonotes_bert_mult.json ` | 88.9 | + +-------+--------------------------------------------------------------------------------------------+-------------+ -| | En | :config:`ner_ontonotes_bert.json ` | 88.6 | -+ + +--------------------------------------------------------------------------------------------+-------------+ -| | | :config:`ner_ontonotes.json ` | 87.1 | +| | En | :config:`ner_ontonotes_bert.json ` | 89.2 | +---------------------------------------------------------+ +--------------------------------------------------------------------------------------------+-------------+ | ConLL-2003 | | :config:`ner_conll2003_bert.json ` | 91.7 | -+ + +--------------------------------------------------------------------------------------------+-------------+ -| | | :config:`ner_conll2003_torch_bert.json ` | 88.6 | -+ + +--------------------------------------------------------------------------------------------+-------------+ -| | | :config:`ner_conll2003.json ` | 89.9 | -+---------------------------------------------------------+ +--------------------------------------------------------------------------------------------+-------------+ -| DSTC2 | | :config:`ner_dstc2.json ` | 97.1 | +---------------------------------------------------------+-------+--------------------------------------------------------------------------------------------+-------------+ -Slot filling models :doc:`[docs] ` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Based on fuzzy Levenshtein search to extract normalized slot values 
from text. The models either rely on NER results -or perform needle in haystack search. - -+---------------------------------------------------------------------------------------------------------------------------+------------------+ -| Dataset | Slots Accuracy | -+===========================================================================================================================+==================+ -| :config:`DSTC 2 ` | 98.85 | -+---------------------------------------------------------------------------------------------------------------------------+------------------+ - - Classification model :doc:`[docs] ` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -70,66 +42,21 @@ Several pre-trained models are available and presented in Table below. +------------------+---------------------+------+----------------------------------------------------------------------------------------------------+-------------+------------------+-----------------+-----------+ | Task | Dataset | Lang | Model | Metric | Valid | Test | Downloads | +==================+=====================+======+====================================================================================================+=============+==================+=================+===========+ -| 28 intents | `DSTC 2`_ | En | :config:`DSTC 2 emb ` | Accuracy | 0.7613 | 0.7733 | 800 Mb | -+ + + +----------------------------------------------------------------------------------------------------+ +------------------+-----------------+-----------+ -| | | | :config:`Wiki emb ` | | 0.9629 | 0.9617 | 8.5 Gb | -+ + + +----------------------------------------------------------------------------------------------------+ +------------------+-----------------+-----------+ -| | | | :config:`BERT ` | | 0.9673 | 0.9636 | 800 Mb | -+------------------+---------------------+ +----------------------------------------------------------------------------------------------------+-------------+------------------+-----------------+-----------+ -| 7 intents | `SNIPS-2017`_ [1]_ | | :config:`DSTC 2 emb ` | F1-macro | 0.8591 | -- | 800 Mb | -+ + + +----------------------------------------------------------------------------------------------------+ +------------------+-----------------+-----------+ -| | | | :config:`Wiki emb ` | | 0.9820 | -- | 8.5 Gb | -+ + + +----------------------------------------------------------------------------------------------------+ +------------------+-----------------+-----------+ -| | | | :config:`Tfidf + SelectKBest + PCA + Wiki emb ` | | 0.9673 | -- | 8.6 Gb | -+ + + +----------------------------------------------------------------------------------------------------+ +------------------+-----------------+-----------+ -| | | | :config:`Wiki emb weighted by Tfidf ` | | 0.9786 | -- | 8.5 Gb | +| Insult detection | `Insults`_ | En | :config:`English BERT` | ROC-AUC | 0.9327 | 0.8602 | 1.1 Gb | +------------------+---------------------+ +----------------------------------------------------------------------------------------------------+-------------+------------------+-----------------+-----------+ -| Insult detection | `Insults`_ | | :config:`Reddit emb ` | ROC-AUC | 0.9263 | 0.8556 | 6.2 Gb | -+ + + +----------------------------------------------------------------------------------------------------+ +------------------+-----------------+-----------+ -| | | | :config:`English BERT ` | | 0.9255 | 0.8612 | 1200 Mb | -+ + + 
+----------------------------------------------------------------------------------------------------+ +------------------+-----------------+-----------+ -| | | | :config:`English Conversational BERT ` | | 0.9389 | 0.8941 | 1200 Mb | -+ + + +----------------------------------------------------------------------------------------------------+ +------------------+-----------------+-----------+ -| | | | :config:`English BERT on PyTorch ` | | 0.9329 | 0.877 | 1.1 Gb | -+------------------+---------------------+ +----------------------------------------------------------------------------------------------------+-------------+------------------+-----------------+-----------+ -| 5 topics | `AG News`_ | | :config:`Wiki emb ` | Accuracy | 0.8922 | 0.9059 | 8.5 Gb | -+------------------+---------------------+ +----------------------------------------------------------------------------------------------------+-------------+------------------+-----------------+-----------+ -| Intent | `Yahoo-L31`_ | | :config:`Yahoo-L31 on conversational BERT ` | ROC-AUC | 0.9436 | -- | 1200 Mb | -+------------------+---------------------+ +----------------------------------------------------------------------------------------------------+-------------+------------------+-----------------+-----------+ -| Sentiment | `SST`_ | | :config:`5-classes SST on conversational BERT ` | Accuracy | 0.6456 | 0.6715 | 400 Mb | -+ + + +----------------------------------------------------------------------------------------------------+ +------------------+-----------------+-----------+ -| | | | :config:`5-classes SST on multilingual BERT ` | | 0.5738 | 0.6024 | 660 Mb | -+ + + +----------------------------------------------------------------------------------------------------+ +------------------+-----------------+-----------+ -| | | | :config:`3-classes SST SWCNN on PyTorch ` | | 0.7379 | 0.6312 | 4.3 Mb | -+ +---------------------+ +----------------------------------------------------------------------------------------------------+ +------------------+-----------------+-----------+ -| | `Yelp`_ | | :config:`5-classes Yelp on conversational BERT ` | | 0.6925 | 0.6842 | 400 Mb | -+ + + +----------------------------------------------------------------------------------------------------+ +------------------+-----------------+-----------+ -| | | | :config:`5-classes Yelp on multilingual BERT ` | | 0.5896 | 0.5874 | 660 Mb | +| Sentiment | `SST`_ | | :config:`5-classes SST on conversational BERT ` | Accuracy | 0.6293 | 0.6626 | 1.1 Gb | +------------------+---------------------+------+----------------------------------------------------------------------------------------------------+-------------+------------------+-----------------+-----------+ -| Sentiment | `Twitter mokoron`_ | Ru | :config:`RuWiki+Lenta emb w/o preprocessing ` | | 0.9965 | 0.9961 | 6.2 Gb | -+ + + +----------------------------------------------------------------------------------------------------+ +------------------+-----------------+-----------+ -| | | | :config:`RuWiki+Lenta emb with preprocessing ` | | 0.7823 | 0.7759 | 6.2 Gb | +| Sentiment | `Twitter mokoron`_ | Ru | :config:`RuWiki+Lenta emb w/o preprocessing ` | Accuracy | 0.9918 | 0.9923 | 5.8 Gb | + +---------------------+ +----------------------------------------------------------------------------------------------------+-------------+------------------+-----------------+-----------+ -| | `RuSentiment`_ | | :config:`RuWiki+Lenta emb ` | F1-weighted | 0.6541 | 0.7016 | 6.2 Gb | -+ + + 
+----------------------------------------------------------------------------------------------------+ +------------------+-----------------+-----------+ -| | | | :config:`Twitter emb super-convergence ` [2]_ | | 0.7301 | 0.7576 | 3.4 Gb | -+ + + +----------------------------------------------------------------------------------------------------+ +------------------+-----------------+-----------+ -| | | | :config:`ELMo ` | | 0.7519 | 0.7875 | 700 Mb | -+ + + +----------------------------------------------------------------------------------------------------+ +------------------+-----------------+-----------+ -| | | | :config:`Multi-language BERT ` | | 0.6809 | 0.7193 | 1900 Mb | +| | `RuSentiment`_ | | :config:`Multi-language BERT ` | F1-weighted | 0.6787 | 0.7005 | 1.3 Gb | + + + +----------------------------------------------------------------------------------------------------+ +------------------+-----------------+-----------+ -| | | | :config:`Conversational RuBERT ` | | 0.7548 | 0.7742 | 657 Mb | +| | | | :config:`Conversational RuBERT ` | | 0.739 | 0.7724 | 1.5 Gb | + + + +----------------------------------------------------------------------------------------------------+ +------------------+-----------------+-----------+ | | | | :config:`Conversational DistilRuBERT-tiny ` | | 0.703 ± 0.0031 | 0.7348 ± 0.0028 | 690 Mb | + + + +----------------------------------------------------------------------------------------------------+ +------------------+-----------------+-----------+ | | | | :config:`Conversational DistilRuBERT-base ` | | 0.7376 ± 0.0045 | 0.7645 ± 0.035 | 1.0 Gb | -+------------------+---------------------+ +----------------------------------------------------------------------------------------------------+-------------+------------------+-----------------+-----------+ -| Intent | Ru like`Yahoo-L31`_ | | :config:`Conversational vs Informational on ELMo ` | ROC-AUC | 0.9412 | -- | 700 Mb | +------------------+---------------------+------+----------------------------------------------------------------------------------------------------+-------------+------------------+-----------------+-----------+ -.. [1] Coucke A. et al. Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces //arXiv preprint arXiv:1805.10190. – 2018. -.. [2] Smith L. N., Topin N. Super-convergence: Very fast training of residual networks using large learning rates. – 2018. - .. _`DSTC 2`: http://camdial.org/~mh521/dstc/ .. _`SNIPS-2017`: https://github.com/snipsco/nlu-benchmark/tree/master/2017-06-custom-intent-engines .. _`Insults`: https://www.kaggle.com/c/detecting-insults-in-social-commentary @@ -139,7 +66,6 @@ Several pre-trained models are available and presented in Table below. .. _`Yahoo-L31`: https://webscope.sandbox.yahoo.com/catalog.php?datatype=l .. _`Yahoo-L6`: https://webscope.sandbox.yahoo.com/catalog.php?datatype=l .. _`SST`: https://nlp.stanford.edu/sentiment/index.html -.. _`Yelp`: https://www.yelp.com/dataset As no one had published intent recognition for DSTC-2 data, the comparison of the presented model is given on **SNIPS** dataset. 
The @@ -192,14 +118,10 @@ on Automatic Spelling Correction for Russian: +-----------------------------------------------------------------------------------------+-----------+--------+-----------+---------------------+ | :config:`Damerau Levenshtein 1 + lm` | 53.26 | 53.74 | 53.50 | 29.3 | +-----------------------------------------------------------------------------------------+-----------+--------+-----------+---------------------+ -| :config:`Brill Moore top 4 + lm` | 51.92 | 53.94 | 52.91 | 0.6 | -+-----------------------------------------------------------------------------------------+-----------+--------+-----------+---------------------+ | Hunspell + lm | 41.03 | 48.89 | 44.61 | 2.1 | +-----------------------------------------------------------------------------------------+-----------+--------+-----------+---------------------+ | JamSpell | 44.57 | 35.69 | 39.64 | 136.2 | +-----------------------------------------------------------------------------------------+-----------+--------+-----------+---------------------+ -| :config:`Brill Moore top 1 ` | 41.29 | 37.26 | 39.17 | 2.4 | -+-----------------------------------------------------------------------------------------+-----------+--------+-----------+---------------------+ | Hunspell | 30.30 | 34.02 | 32.06 | 20.3 | +-----------------------------------------------------------------------------------------+-----------+--------+-----------+---------------------+ @@ -208,48 +130,6 @@ on Automatic Spelling Correction for Russian: Ranking model :doc:`[docs] ` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -The main neural ranking model based on `LSTM-based deep learning models for non-factoid answer selection -`__. The model performs ranking of responses or contexts from some database by their -relevance for the given context. - -There are 3 alternative neural architectures available as well: - -Sequential Matching Network (SMN) - Based on the work `Wu, Yu, et al. "Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-based Chatbots". ACL. 2017. `__ - -Deep Attention Matching Network (DAM) - Based on the work `Xiangyang Zhou, et al. "Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network". Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018 `__ - -Deep Attention Matching Network + Universal Sentence Encoder v3 (DAM-USE-T) - Our new proposed architecture based on the works: `Xiangyang Zhou, et al. "Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network". Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2018 `__ - and `Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Brian Strope, Ray Kurzweil. 2018a. Universal Sentence Encoder for English. `__ - - -Available pre-trained models for ranking: - -.. 
table:: - :widths: auto - - +-------------------+----------------------------------------------------------------------------------------------------------------------+-----------+-----------------------------------+ - | Dataset | Model config | Val | Test | - | | +-----------+-------+-------+-------+-----------+ - | | | R10@1 | R10@1 | R10@2 | R10@5 | Downloads | - +===================+======================================================================================================================+===========+=======+=======+=======+===========+ - | `Ubuntu V2`_ | :config:`ranking_ubuntu_v2_mt_word2vec_dam_transformer ` | 74.32 | 74.46 | 86.77 | 97.38 | 2457 MB | - +-------------------+----------------------------------------------------------------------------------------------------------------------+-----------+-------+-------+-------+-----------+ - | `Ubuntu V2`_ | :config:`ranking_ubuntu_v2_mt_word2vec_smn ` | 68.56 | 67.91 | 81.49 | 95.63 | 1609 MB | - +-------------------+----------------------------------------------------------------------------------------------------------------------+-----------+-------+-------+-------+-----------+ - | `Ubuntu V2`_ | :config:`ranking_ubuntu_v2_bert_uncased ` | 66.5 | 66.6 | -- | -- | 396 MB | - +-------------------+----------------------------------------------------------------------------------------------------------------------+-----------+-------+-------+-------+-----------+ - | `Ubuntu V2`_ | :config:`ranking_ubuntu_v2_bert_uncased on PyTorch ` | 65.73 | 65.74 | -- | -- | 1.1 Gb | - +-------------------+----------------------------------------------------------------------------------------------------------------------+-----------+-------+-------+-------+-----------+ - | `Ubuntu V2`_ | :config:`ranking_ubuntu_v2_bert_sep ` | 66.5 | 66.5 | -- | -- | 396 MB | - +-------------------+----------------------------------------------------------------------------------------------------------------------+-----------+-------+-------+-------+-----------+ - | `Ubuntu V2`_ | :config:`ranking_ubuntu_v2_mt_interact ` | 59.2 | 58.7 | -- | -- | 8906 MB | - +-------------------+----------------------------------------------------------------------------------------------------------------------+-----------+-------+-------+-------+-----------+ - -.. _`Ubuntu V2`: https://github.com/rkadlec/ubuntu-ranking-dataset-creator - Available pre-trained models for paraphrase identification: .. 
table:: @@ -258,11 +138,7 @@ Available pre-trained models for paraphrase identification: +------------------------+------------------------------------------------------------------------------------------------------+----------------+-----------------+------------+------------+----------------+-----------------+-----------+ | Dataset | Model config | Val (accuracy) | Test (accuracy) | Val (F1) | Test (F1) | Val (log_loss) | Test (log_loss) | Downloads | +========================+======================================================================================================+================+=================+============+============+================+=================+===========+ - | `paraphraser.ru`_ | :config:`paraphrase_ident_paraphraser_ft ` | 83.8 | 75.4 | 87.9 | 80.9 | 0.468 | 0.616 | 5938M | - +------------------------+------------------------------------------------------------------------------------------------------+----------------+-----------------+------------+------------+----------------+-----------------+-----------+ - | `paraphraser.ru`_ | :config:`paraphrase_bert_multilingual ` | 87.4 | 79.3 | 90.2 | 83.4 | -- | -- | 1330M | - +------------------------+------------------------------------------------------------------------------------------------------+----------------+-----------------+------------+------------+----------------+-----------------+-----------+ - | `paraphraser.ru`_ | :config:`paraphrase_rubert ` | 90.2 | 84.9 | 92.3 | 87.9 | -- | -- | 1325M | + | `paraphraser.ru`_ | :config:`paraphrase_rubert ` | 89.8 | 84.2 | 92.2 | 87.4 | -- | -- | 1325M | +------------------------+------------------------------------------------------------------------------------------------------+----------------+-----------------+------------+------------+----------------+-----------------+-----------+ | `paraphraser.ru`_ | :config:`paraphraser_convers_distilrubert_2L ` | 76.1 ± 0.2 | 64.5 ± 0.5 | 81.8 ± 0.2 | 73.9 ± 0.8 | -- | -- | 618M | +------------------------+------------------------------------------------------------------------------------------------------+----------------+-----------------+------------+------------+----------------+-----------------+-----------+ @@ -272,27 +148,6 @@ Available pre-trained models for paraphrase identification: .. 
_`paraphraser.ru`: https://paraphraser.ru/ -Comparison with other models on the `Ubuntu Dialogue Corpus v2 `__ (test): - -+---------------------------------------------------------------------------------------------------------------------------------------------+-----------+-----------+-----------+ -| Model | R@1 | R@2 | R@5 | -+=============================================================================================================================================+===========+===========+===========+ -| SMN last [`Wu et al., 2017 `_] | -- | -- | -- | -+---------------------------------------------------------------------------------------------------------------------------------------------+-----------+-----------+-----------+ -| SMN last [DeepPavlov :config:`ranking_ubuntu_v2_mt_word2vec_smn `] | 0.6791 | 0.8149 | 0.9563 | -+---------------------------------------------------------------------------------------------------------------------------------------------+-----------+-----------+-----------+ -| DAM [`Zhou et al., 2018 `_] | -- | -- | -- | -+---------------------------------------------------------------------------------------------------------------------------------------------+-----------+-----------+-----------+ -| MRFN-FLS [`Tao et al., 2019 `_] | -- | -- | -- | -+---------------------------------------------------------------------------------------------------------------------------------------------+-----------+-----------+-----------+ -| IMN [`Gu et al., 2019 `_] | 0.771 | 0.886 | 0.979 | -+---------------------------------------------------------------------------------------------------------------------------------------------+-----------+-----------+-----------+ -| IMN Ensemble [`Gu et al., 2019 `_] | **0.791** | **0.899** | **0.982** | -+---------------------------------------------------------------------------------------------------------------------------------------------+-----------+-----------+-----------+ -| DAM-USE-T [DeepPavlov :config:`ranking_ubuntu_v2_mt_word2vec_dam_transformer `] | 0.7446 | 0.8677 | 0.9738 | -+---------------------------------------------------------------------------------------------------------------------------------------------+-----------+-----------+-----------+ - - References: * Yu Wu, Wei Wu, Ming Zhou, and Zhoujun Li. 2017. Sequential match network: A new architecture for multi-turn response selection in retrieval-based chatbots. In ACL, pages 372–381. https://www.aclweb.org/anthology/P17-1046 @@ -328,117 +183,29 @@ position in a given context. BERT-based model is described in `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding `__. -R-Net model is based on `R-NET: Machine Reading Comprehension with Self-matching Networks -`__. 
- -+----------------+---------------------------------------------------------------------------------------------+-------+----------------+-----------------+-----------------+ -| Dataset | Model config | lang | EM (dev) | F-1 (dev) | Downloads | -+================+=============================================================================================+=======+================+=================+=================+ -| `SQuAD-v1.1`_ | :config:`DeepPavlov BERT ` | en | 80.88 | 88.49 | 806Mb | -+----------------+---------------------------------------------------------------------------------------------+-------+----------------+-----------------+-----------------+ -| `SQuAD-v1.1`_ | :config:`DeepPavlov BERT on PyTorch ` | en | 80.79 | 88.30 | 1.1 Gb | -+----------------+---------------------------------------------------------------------------------------------+-------+----------------+-----------------+-----------------+ -| `SQuAD-v1.1`_ | :config:`DeepPavlov R-Net ` | en | 71.49 | 80.34 | ~2.5Gb | -+----------------+---------------------------------------------------------------------------------------------+-------+----------------+-----------------+-----------------+ -| `SDSJ Task B`_ | :config:`DeepPavlov RuBERT ` | ru | 66.30 ± 0.24 | 84.60 ± 0.11 | 1325Mb | -+----------------+---------------------------------------------------------------------------------------------+-------+----------------+-----------------+-----------------+ -| `SDSJ Task B`_ | :config:`DeepPavlov multilingual BERT ` | ru | 64.35 ± 0.39 | 83.39 ± 0.08 | 1323Mb | -+----------------+---------------------------------------------------------------------------------------------+-------+----------------+-----------------+-----------------+ -| `SDSJ Task B`_ | :config:`DeepPavlov R-Net ` | ru | 60.62 | 80.04 | ~5Gb | -+----------------+---------------------------------------------------------------------------------------------+-------+----------------+-----------------+-----------------+ -| `SDSJ Task B`_ | :config:`DeepPavlov DistilRuBERT-tiny ` | ru | 44.2 ± 0.46 | 65.1 ± 0.36 | 867Mb | -+----------------+---------------------------------------------------------------------------------------------+-------+----------------+-----------------+-----------------+ -| `SDSJ Task B`_ | :config:`DeepPavlov DistilRuBERT-base ` | ru | 61.23 ± 0.42 | 80.36 ± 0.28 | 1.18Gb | -+----------------+---------------------------------------------------------------------------------------------+-------+----------------+-----------------+-----------------+ -| `DRCD`_ | :config:`DeepPavlov multilingual BERT ` | ch | 84.86 | 89.03 | 630Mb | -+----------------+---------------------------------------------------------------------------------------------+-------+----------------+-----------------+-----------------+ -| `DRCD`_ | :config:`DeepPavlov Chinese BERT ` | ch | 84.19 | 89.23 | 362Mb | -+----------------+---------------------------------------------------------------------------------------------+-------+----------------+-----------------+-----------------+ - -In the case when answer is not necessary present in given context we have :config:`squad_noans ` +RuBERT-based model is described in `Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language +`__. 
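A minimal inference sketch, assuming the standard DeepPavlov Python API and the ``squad_ru_bert`` config name used elsewhere in these docs (the exact outputs — answer strings, start positions, scores — may vary between configs):

.. code:: python

    from deeppavlov import build_model

    # Downloads the pretrained Russian SQuAD model on the first run.
    model = build_model('squad_ru_bert', download=True)

    contexts = ['DeepPavlov — библиотека для создания диалоговых систем.']
    questions = ['Что такое DeepPavlov?']

    # SQuAD-style configs typically return answer texts, their start
    # positions in the context and confidence scores.
    answers, positions, scores = model(contexts, questions)
    print(answers[0])
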
+ ++----------------+---------------------------------------------------------------------------------------------------------------+-------+----------------+-----------------+-----------------+ +| Dataset | Model config | lang | EM (dev) | F-1 (dev) | Downloads | ++================+===============================================================================================================+=======+================+=================+=================+ +| `SQuAD-v1.1`_ | :config:`DeepPavlov BERT ` | en | 81.49 | 88.86 | 1.2 Gb | ++----------------+---------------------------------------------------------------------------------------------------------------+-------+----------------+-----------------+-----------------+ +| `SQuAD-v2.0`_ | :config:`DeepPavlov BERT ` | en | 75.71 | 80.72 | 1.2 Gb | ++----------------+---------------------------------------------------------------------------------------------------------------+-------+----------------+-----------------+-----------------+ +| `SDSJ Task B`_ | :config:`DeepPavlov RuBERT ` | ru | 66.21 | 84.71 | 1.7 Mb | ++----------------+---------------------------------------------------------------------------------------------------------------+-------+----------------+-----------------+-----------------+ +| `SDSJ Task B`_ | :config:`DeepPavlov RuBERT, trained with tfidf-retrieved negative samples ` | ru | 66.24 | 84.71 | 1.6 Gb | ++----------------+---------------------------------------------------------------------------------------------------------------+-------+----------------+-----------------+-----------------+ +| `SDSJ Task B`_ | :config:`DeepPavlov DistilRuBERT-tiny ` | ru | 44.2 ± 0.46 | 65.1 ± 0.36 | 867Mb | ++----------------+---------------------------------------------------------------------------------------------------------------+-------+----------------+-----------------+-----------------+ +| `SDSJ Task B`_ | :config:`DeepPavlov DistilRuBERT-base ` | ru | 61.23 ± 0.42 | 80.36 ± 0.28 | 1.18Gb | ++----------------+---------------------------------------------------------------------------------------------------------------+-------+----------------+-----------------+-----------------+ + +In the case when answer is not necessary present in given context we have :config:`qa_squad2_bert ` model. This model outputs empty string in case if there is no answer in context. -Morphological tagging model :doc:`[docs] ` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -We have a BERT-based model for Russian and character-based models for 11 languages. -The character model is based on `Heigold et al., 2017. An extensive empirical evaluation of -character-based morphological tagging for 14 languages `__. -It is a state-of-the-art model for Russian and near state of the art for several other languages. -Model takes as input tokenized sentences and outputs the corresponding -sequence of morphological labels in `UD format `__. -The table below contains word and sentence accuracy on UD2.0 datasets. -For more scores see :doc:`full table `. - -.. table:: - :widths: auto - - +----------------------+--------------------------------------------------------------------------------------------------------------+---------------+----------------+--------------------+ - | Dataset | Model | Word accuracy | Sent. 
accuracy | Download size (MB) | - +======================+==============================================================================================================+===============+================+====================+ - | `UD2.3`_ (Russian) | `UD Pipe 2.3`_ (Straka et al., 2017) | 93.5 | | | - | +--------------------------------------------------------------------------------------------------------------+---------------+----------------+--------------------+ - | | `UD Pipe Future`_ (Straka et al., 2018) | 96.90 | | | - | +--------------------------------------------------------------------------------------------------------------+---------------+----------------+--------------------+ - | | :config:`BERT-based model ` | 97.83 | 72.02 | 661 | - +----------------------+--------------------------------------------------------------------------------------------------------------+---------------+----------------+--------------------+ - | | `Pymorphy`_ + `russian_tagsets`_ (first tag) | 60.93 | 0.00 | | - + +--------------------------------------------------------------------------------------------------------------+---------------+----------------+--------------------+ - | `UD2.0`_ (Russian) | `UD Pipe 1.2`_ (Straka et al., 2017) | 93.57 | 43.04 | | - + +--------------------------------------------------------------------------------------------------------------+---------------+----------------+--------------------+ - | | :config:`Basic model ` | 95.17 | 50.58 | 48.7 | - + +--------------------------------------------------------------------------------------------------------------+---------------+----------------+--------------------+ - | | :config:`Pymorphy-enhanced model ` | **96.23** | 58.00 | 48.7 | - +----------------------+--------------------------------------------------------------------------------------------------------------+---------------+----------------+--------------------+ - | `UD2.0`_ (Czech) | `UD Pipe 1.2`_ (Straka et al., 2017) | 91.86 | 42.28 | | - | +--------------------------------------------------------------------------------------------------------------+---------------+----------------+--------------------+ - | | :config:`Basic model ` | **94.35** | 51.56 | 41.8 | - +----------------------+--------------------------------------------------------------------------------------------------------------+---------------+----------------+--------------------+ - | `UD2.0`_ (English) | `UD Pipe 1.2`_ (Straka et al., 2017) | 92.89 | 55.75 | | - | +--------------------------------------------------------------------------------------------------------------+---------------+----------------+--------------------+ - | | :config:`Basic model ` | **93.00** | 55.18 | 16.9 | - +----------------------+--------------------------------------------------------------------------------------------------------------+---------------+----------------+--------------------+ - | `UD2.0`_ (German) | `UD Pipe 1.2`_ (Straka et al., 2017) | 76.65 | 10.24 | | - | +--------------------------------------------------------------------------------------------------------------+---------------+----------------+--------------------+ - | | :config:`Basic model ` | **83.83** | 15.25 | 18.6 | - +----------------------+--------------------------------------------------------------------------------------------------------------+---------------+----------------+--------------------+ - -.. _`Pymorphy`: https://pymorphy2.readthedocs.io/en/latest/ -.. 
_`russian_tagsets`: https://github.com/kmike/russian-tagsets -.. _`UD2.0`: https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1983 -.. _`UD2.3`: http://hdl.handle.net/11234/1-2895 -.. _`UD Pipe 1.2`: http://ufal.mff.cuni.cz/udpipe -.. _`UD Pipe 2.3`: http://ufal.mff.cuni.cz/udpipe -.. _`UD Pipe Future`: https://github.com/CoNLL-UD-2018/UDPipe-Future - -Syntactic parsing model :doc:`[docs] ` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -We have a biaffine model for syntactic parsing based on RuBERT. -It achieves the highest known labeled attachments score of 93.7% -on ``ru_syntagrus`` Russian corpus (version UD 2.3). - -.. table:: - :widths: auto - - +-------------------------+-------------------------------------------------------------------------------------------+---------+----------+ - | Dataset | Model | UAS | LAS | - +=========================+===========================================================================================+=========+==========+ - | `UD2.3`_ (Russian) | `UD Pipe 2.3`_ (Straka et al., 2017) | 90.3 | 89.0 | - | +-------------------------------------------------------------------------------------------+---------+----------+ - | | `UD Pipe Future`_ (Straka, 2018) | 93.0 | 91.5 | - | +-------------------------------------------------------------------------------------------+---------+----------+ - | | `UDify (multilingual BERT)`_ (Kondratyuk, 2018) | 94.8 | 93.1 | - | +-------------------------------------------------------------------------------------------+---------+----------+ - | | :config:`our BERT model ` | 95.2 | 93.7 | - +-------------------------+-------------------------------------------------------------------------------------------+---------+----------+ - -.. _`UD2.3`: http://hdl.handle.net/11234/1-2895 -.. _`UD Pipe 2.3`: http://ufal.mff.cuni.cz/udpipe -.. _`UD Pipe Future`: https://github.com/CoNLL-UD-2018/UDPipe-Future -.. _`UDify (multilingual BERT)`: https://github.com/hyperparticle/udify - Frequently Asked Questions (FAQ) model :doc:`[docs] ` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -449,36 +216,6 @@ You can build different pipelines based on: tf-idf, weighted fasttext, cosine si Skills ------ -Goal-oriented bot :doc:`[docs] ` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Based on Hybrid Code Networks (HCNs) architecture from `Jason D. Williams, Kavosh Asadi, -Geoffrey Zweig, Hybrid Code Networks: practical and efficient end-to-end dialog control -with supervised and reinforcement learning – 2017 `__. -It allows to predict responses in a goal-oriented dialog. The model is -customizable: embeddings, slot filler and intent classifier can be switched on and off on demand. 
- -Available pre-trained models and their comparison with existing benchmarks: - -+-----------------------------------+------+------------------------------------------------------------------------------------+---------------+-----------+---------------+ -| Dataset | Lang | Model | Metric | Test | Downloads | -+===================================+======+====================================================================================+===============+===========+===============+ -| `DSTC 2`_ | En | :config:`basic bot ` | Turn Accuracy | 0.380 | 10 Mb | -+ (:ref:`modified `) + +------------------------------------------------------------------------------------+ +-----------+---------------+ -| | | :config:`bot with slot filler ` | | 0.542 | 400 Mb | -+ + +------------------------------------------------------------------------------------+ +-----------+---------------+ -| | | :config:`bot with slot filler, intents & attention ` | | **0.553** | 8.5 Gb | -+-----------------------------------+ +------------------------------------------------------------------------------------+ +-----------+---------------+ -| `DSTC 2`_ | | Bordes and Weston (2016) | | 0.411 | -- | -+ + +------------------------------------------------------------------------------------+ +-----------+---------------+ -| | | Eric and Manning (2017) | | 0.480 | -- | -+ + +------------------------------------------------------------------------------------+ +-----------+---------------+ -| | | Perez and Liu (2016) | | 0.487 | -- | -+ + +------------------------------------------------------------------------------------+ +-----------+---------------+ -| | | Williams et al. (2017) | | **0.556** | -- | -+-----------------------------------+------+------------------------------------------------------------------------------------+---------------+-----------+---------------+ - - ODQA :doc:`[docs] ` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -489,13 +226,9 @@ based on its Wikipedia knowledge. +----------------+--------------------------------------------------------------------+-----------------------+--------+-----------+ | Dataset | Model config | Wiki dump | F1 | Downloads | +================+====================================================================+=======================+========+===========+ -| `SQuAD-v1.1`_ | :config:`ODQA ` | enwiki (2018-02-11) | 35.89 | 9.7Gb | -+----------------+--------------------------------------------------------------------+-----------------------+--------+-----------+ -| `SQuAD-v1.1`_ | :config:`ODQA ` | enwiki (2016-12-21) | 37.83 | 9.3Gb | +| `SQuAD-v1.1`_ | :config:`ODQA ` | enwiki (2018-02-11) | 46.24 | 9.7Gb | +----------------+--------------------------------------------------------------------+-----------------------+--------+-----------+ -| `SDSJ Task B`_ | :config:`ODQA ` | ruwiki (2018-04-01) | 28.56 | 7.7Gb | -+----------------+--------------------------------------------------------------------+-----------------------+--------+-----------+ -| `SDSJ Task B`_ | :config:`ODQA with RuBERT ` | ruwiki (2018-04-01) | 37.83 | 4.3Gb | +| `SDSJ Task B`_ | :config:`ODQA with RuBERT ` | ruwiki (2018-04-01) | 37.83 | 4.3Gb | +----------------+--------------------------------------------------------------------+-----------------------+--------+-----------+ @@ -522,53 +255,25 @@ Word vectors for the Russian language trained on joint `Russian Wikipedia - -- Run goal-oriented bot with console interface: +- Run insults detection model with console interface: .. 
code-block:: bash - python -m deeppavlov interact gobot_dstc2 -d + python -m deeppavlov interact insults_kaggle_bert -d -- Run goal-oriented bot with REST API: +- Run insults detection model with REST API: .. code-block:: bash - python -m deeppavlov riseapi gobot_dstc2 -d + python -m deeppavlov riseapi insults_kaggle_bert -d -- Run slot-filling model with Telegram interface: +- Predict whether it is an insult on every line in a file: .. code-block:: bash - python -m deeppavlov telegram slotfill_dstc2 -d -t - -- Run slot-filling model with console interface: - - .. code-block:: bash - - python -m deeppavlov interact slotfill_dstc2 -d - -- Run slot-filling model with REST API: - - .. code-block:: bash - - python -m deeppavlov riseapi slotfill_dstc2 -d - -- Predict intents on every line in a file: - - .. code-block:: bash - - python -m deeppavlov predict intents_snips -d --batch-size 15 < /data/in.txt > /data/out.txt - - -View `video demo `__ of deployment of a -goal-oriented bot and a slot-filling model with Telegram UI. + python -m deeppavlov predict insults_kaggle_bert -d --batch-size 15 < /data/in.txt > /data/out.txt .. _`SQuAD-v1.1`: https://arxiv.org/abs/1606.05250 +.. _`SQuAD-v2.0`: https://arxiv.org/abs/1806.03822 .. _`SDSJ Task B`: https://arxiv.org/abs/1912.09723 -.. _`DRCD`: https://arxiv.org/abs/1806.00920 diff --git a/docs/features/pretrained_vectors.rst b/docs/features/pretrained_vectors.rst index ee8d6d01e0..63a72b5a58 100644 --- a/docs/features/pretrained_vectors.rst +++ b/docs/features/pretrained_vectors.rst @@ -28,58 +28,72 @@ The ``TensorFlow`` models can be run with the original `BERT repo `__ library. The download links are: -+----------------------------+---------------------------------------+--------------------------------------------------------------------------------------------------------------------+ -| Description | Model parameters | Download links | -+============================+=======================================+====================================================================================================================+ -| RuBERT | vocab size = 120K, parameters = 180M, | `[tensorflow] `__, | -| | size = 632MB | `[pytorch] `__ | -+----------------------------+---------------------------------------+--------------------------------------------------------------------------------------------------------------------+ -| Slavic BERT | vocab size = 120K, parameters = 180M, | `[tensorflow] `__, | -| | size = 632MB | `[pytorch] `__ | -+----------------------------+---------------------------------------+--------------------------------------------------------------------------------------------------------------------+ -| Conversational BERT | vocab size = 30K, parameters = 110M, | `[tensorflow] `__, | -| | size = 385MB | `[pytorch] `__ | -+----------------------------+---------------------------------------+--------------------------------------------------------------------------------------------------------------------+ -| Conversational RuBERT | vocab size = 120K, parameters = 180M, | `[tensorflow] `__, | -| | size = 630MB | `[pytorch] `__ | -+----------------------------+---------------------------------------+--------------------------------------------------------------------------------------------------------------------+ -| Sentence Multilingual BERT | vocab size = 120K, parameters = 180M, | `[tensorflow] `__, | -| | size = 630MB | `[pytorch] `__ | 
-+----------------------------+---------------------------------------+--------------------------------------------------------------------------------------------------------------------+ -| Sentence RuBERT | vocab size = 120K, parameters = 180M, | `[tensorflow] `__, | -| | size = 630MB | `[pytorch] `__ | -+----------------------------+---------------------------------------+--------------------------------------------------------------------------------------------------------------------+ ++----------------------------+---------------------------------------+----------------------------------------------------------------------------------------------------------------------+ +| Description | Model parameters | Download links | ++============================+=======================================+======================================================================================================================+ +| RuBERT | vocab size = 120K, parameters = 180M, | `[pytorch] `__, | +| | size = 632MB | `[tensorflow] `__ | ++----------------------------+---------------------------------------+----------------------------------------------------------------------------------------------------------------------+ +| Slavic BERT | vocab size = 120K, parameters = 180M, | `[pytorch] `__, | +| | size = 632MB | `[tensorflow] `__ | ++----------------------------+---------------------------------------+----------------------------------------------------------------------------------------------------------------------+ +| Conversational BERT | vocab size = 30K, parameters = 110M, | `[pytorch] `__, | +| | size = 385MB | `[tensorflow] `__ | ++----------------------------+---------------------------------------+----------------------------------------------------------------------------------------------------------------------+ +| Conversational RuBERT | vocab size = 120K, parameters = 180M, | `[pytorch] `__,| +| | size = 630MB | `[tensorflow] `__ | ++----------------------------+---------------------------------------+----------------------------------------------------------------------------------------------------------------------+ +| Sentence Multilingual BERT | vocab size = 120K, parameters = 180M, | `[pytorch] `__, | +| | size = 630MB | `[tensorflow] `__ | ++----------------------------+---------------------------------------+----------------------------------------------------------------------------------------------------------------------+ +| Sentence RuBERT | vocab size = 120K, parameters = 180M, | `[pytorch] `__, | +| | size = 630MB | `[tensorflow] `__ | ++----------------------------+---------------------------------------+----------------------------------------------------------------------------------------------------------------------+ ELMo ---- -| We are publishing :class:`Russian language ELMo embeddings model ` for tensorflow-hub and :class:`LM model ` for training and fine-tuning ELMo as LM model. -| ELMo (Embeddings from Language Models) representations are pre-trained contextual representations from - large-scale bidirectional language models. See a paper `Deep contextualized word representations - `__ for more information about the algorithm and a detailed analysis. +The ELMo can used via Python code as following: -License -~~~~~~~ +.. 
code:: python + + import tensorflow as tf + import tensorflow_hub as hub + elmo = hub.Module("http://files.deeppavlov.ai/deeppavlov_data/elmo_ru-news_wmt11-16_1.5M_steps.tar.gz", trainable=True) + sess = tf.Session() + sess.run(tf.global_variables_initializer()) + embeddings = elmo(["это предложение", "word"], signature="default", as_dict=True)["elmo"] + sess.run(embeddings) + + +TensorFlow Hub module also supports tokenized sentences in the following format. + +.. code:: python + + tokens_input = [["мама", "мыла", "раму"], ["рама", "", ""]] + tokens_length = [3, 1] + embeddings = elmo(inputs={"tokens": tokens_input,"sequence_len": tokens_length},signature="tokens",as_dict=True)["elmo"] + sess.run(embeddings) -The pre-trained models are distributed under the `License Apache -2.0 `__. Downloads ~~~~~~~~~ -The models can be downloaded and run by configuration file or tensorflow hub module from: +The models can be downloaded and run by tensorflow hub module from: + +--------------------------------------------------------------------+---------------------------------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| Description | Dataset parameters | Perplexity | Configuration file and tensorflow hub module | +| Description | Dataset parameters | Perplexity | Tensorflow hub module | +====================================================================+=============================================+==================+=======================================================================================================================================================================================================================================+ -| ELMo on `Russian Wikipedia `__ | lines = 1M, tokens = 386M, size = 5GB | 43.692 | `config_file `__, `module_spec `__ | +| ELMo on `Russian Wikipedia `__ | lines = 1M, tokens = 386M, size = 5GB | 43.692 | `module_spec `__ | +--------------------------------------------------------------------+---------------------------------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| ELMo on `Russian WMT News `__ | lines = 63M, tokens = 946M, size = 12GB | 49.876 | `config_file `__, `module_spec `__ | +| ELMo on `Russian WMT News `__ | lines = 63M, tokens = 946M, size = 12GB | 49.876 | `module_spec `__ | +--------------------------------------------------------------------+---------------------------------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| ELMo on `Russian Twitter `__ | lines = 104M, tokens = 810M, size = 8.5GB | 94.145 | `config_file `__, `module_spec `__ | +| ELMo on `Russian Twitter `__ | lines = 104M, tokens = 810M, size = 8.5GB | 94.145 | `module_spec `__ | 
+--------------------------------------------------------------------+---------------------------------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + fastText -------- @@ -94,8 +108,7 @@ All vectors are 300-dimensional. We used fastText skip-gram (see `Bojanowski et al. (2016) `__) for vectors training as well as various preprocessing options (see below). -You can get vectors either in binary or in text (vec) formats both for -fastText and GloVe. +You can get vectors either in binary or in text (vec) formats for FastText. License ~~~~~~~ diff --git a/docs/features/skills/aiml_skill.rst b/docs/features/skills/aiml_skill.rst deleted file mode 100644 index ac385c7ea9..0000000000 --- a/docs/features/skills/aiml_skill.rst +++ /dev/null @@ -1,44 +0,0 @@ -AIML Skill -====================== - -An :doc:`AIML scripts wrapper implementation` that reads a folder with AIML scripts -(provided by `path_to_aiml_scripts` argument), loads it into AIML's Kernel and responds for incoming utterances -accroding to patterns learned by AIML Kernel. - -For the case when AIML kernel matched utterance and found response the AIML Wrapper outputs response with confidence -value (as specified by`positive_confidence` argument). - -For the case when no match occured the wrapper returns the argument `null_response` as utterance and sets confidence to -`null_confidence` attribute. - - -Quick Start ------------ -To setup AIML Skill you need load your AIML scripts to some folder and specify path to it with initilization -parameter `path_to_aiml_scripts`. - -You can download bunch of free and ready for use AIML scripts from pandorabots repo: -https://github.com/pandorabots/Free-AIML - -DeepPavlov library has default config for AIMLSkill here: :config:`configs/skills/aiml_skill.json ` - -Usage -^^^^^^^^ - -.. code:: python - - from deeppavlov.skills.aiml_skill import AIMLSkill - - aiml_skill_config = { - 'positive_confidence': 0.66, - 'path_to_aiml_scripts': , - 'null_response': "I don't know what to answer you", - 'null_confidence': 0.33 - } - - aiml_skill = AIMLSkill(**aiml_skill_config) - - states_batch = None - for utterance in ["Hello", "Hello to the same user_id"]: - responses_batch, confidences_batch, states_batch = aiml_skill([utterance], states_batch) - print(responses_batch[0]) diff --git a/docs/features/skills/dsl_skill.rst b/docs/features/skills/dsl_skill.rst deleted file mode 100644 index 919b1661a5..0000000000 --- a/docs/features/skills/dsl_skill.rst +++ /dev/null @@ -1,42 +0,0 @@ -DSL Skill -====================== - -A :doc:`DSL implementation`. DSL helps to easily create user-defined skills for dialog systems. - -For the case when DSL skill matched utterance and found response it outputs response with confidence value. - -For the case when no match occurred DSL skill returns the argument `on_invalid_command` ("Простите, я вас не понял" by delault) as utterance and sets confidence to `null_confidence` attribute (0 by default). - -`on_invalid_command` and `null_confidence` can be changed in model config - - -Quick Start ------------ - -DeepPavlov library has default config for DSLSkill here: :config:`configs/skills/dsl_skill.json ` - -Usage -^^^^^^^^ - -.. 
code:: python - - from deeppavlov import configs, build_model - from deeppavlov.core.common.file import read_json - from deeppavlov.skills.dsl_skill import DSLMeta - - - class DSLSkill(metaclass=DSLMeta): - @DSLMeta.handler(commands=["hello", "hi", "sup", "greetings"]) - def greeting(context): - response = "Hello, my friend!" - confidence = 1.0 - return response, confidence - - - skill_config = read_json(configs.skills.dsl_skill) - - skill = build_model(skill_config, download=True) - utterance = "Hello" - user_id = 1 - response = skill([utterance], [user_id]) - print(response) diff --git a/docs/features/skills/go_bot.rst b/docs/features/skills/go_bot.rst deleted file mode 100644 index e585ab8e55..0000000000 --- a/docs/features/skills/go_bot.rst +++ /dev/null @@ -1,640 +0,0 @@ -Go-Bot Framework -################ - -Overview -******** - -Go-Bot is an ML-driven framework designed to enable development of the goal-oriented skills for DeepPavlov Dream AI Assistant Platform. - -These goal-oriented skills can be written in Python (enabling using their corresponding Go-Bot-trained models natively) or in any other programming language (requiring running their corresponding Go-Bot-trained models as microservices). - -To build a Go-Bot-based goal-oriented skill, you need to provide Go-Bot framework with a dataset (in RASA v1 or DSTC2 formats), train model, download it, and then use it by either calling them natively from Python or by rising them as microservices and then calling them via its standard DeepPavlov REST API. - -Currently, we support two different approaches to define domain model and behavior of a given goal-oriented skill - using either a subset of the v1 of the RASA DSLs (domain.yml, nlu.md, stories.md) or a DSTC2 format. As of the latest release, the following subset of functionality is supported: - -* Intents -* Slots (simple slots requiring custom classifiers for custom data types) -* Stories (w/o 1:1 mapping between intents and responses) -* Templated Responses (w/o variables) -* **Form-Filling** (basic, added in **v0.14 release**) - -In the future, we will expand support for RASA DSLs where appropriate to enable backward compatibility, add integration with the upcoming Intent Catcher component available as part of the DeepPavlov component library, and so on. - -To experiment with the Go-Bot you can follow tutorials for using RASA DSLs, or pick one of the two available pre-trained models designed around the DSTSC2 dataset (English). - -RASA DSLs Format Support -************************ - -Overview -======== -While DSTC-2 schema format is quite rich, preparing this kind of dataset with all required annotations might be challenging. To simplify the process of building goal-oriented bots using DeepPavlov technology, in `v0.12.0 `_ we have introduced a (limited) support for defining them using RASA DSLs. - -.. note:: - DSLs, known as Domain-Specific Languages, provide a rich mechanism to define the behavior, or "the what", while the underlying system uses the parser to transform these definitions into commands that implement this behavior, or "the how" using the system's components. - -RASA.ai is an another well-known Open Source Conversational AI Framework. Their approach to defining the domain model and behavior of the goal-oriented bots is quite simple for building simple goal-oriented bots. In this section you will learn how to use key parts of RASA DSLs (configuration files) to build your own goal-oriented skill based on the DeepPavlov's Go-Bot framework. 
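
For orientation, here is a minimal sketch of how a trained Go-Bot model is called from Python. It relies on the DSTC2-based config that the Quick Demo section below also uses; a skill defined with RASA DSLs would be loaded from its own config in the same way:

.. code:: python

    from deeppavlov import build_model, configs

    # Build the pretrained DSTC2-based goal-oriented bot;
    # ``download=True`` fetches the model files on first run.
    bot = build_model(configs.go_bot.gobot_dstc2, download=True)

    # The bot consumes a batch of user utterances and returns a batch of responses.
    print(bot(['hi, i want some cheap food'])[0])
    print(bot(['bye'])[0])
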
- - - -While there are several configuration files used by the RASA platform, each with their own corresponding DSL (mostly re-purposed Markdown and YAML), for now only three essential files: ``stories.md``, -``nlu.md``, ``domain.yml`` are supported by the DeepPavlov Go-Bot Framework. - -These files allows you to define user stories that match intents and bot actions, intents with slots and entities, as well as the training data for the NLU components. - -.. note:: - As mentioned in our `blog post `__, **this is the very beginning of our work** focused on supporting RASA DSLs as a way to configure DeepPavlov-based goal-oriented chatbots. - -Currently, only a subset of the functionality in these files is supported by now. - -stories.md -^^^^^^^^^^ - -``stories.md`` is a mechanism used to teach your chatbot how to respond -to user messages. It allows you to control your chatbot's dialog -management. - -The full RASA functionality is described in the `original -documentation `__. - -The format supported by DeepPavlov is the subset of features described -in `"What makes up a story" -section `__. - -The original format features are: *User Messages*, *Actions*, *Events*, -*Checkpoints*, *OR Statements*, *End-to-End Story Evaluation Format*. - -- We **do support** all the functionality of User Messages format - feature. - -- We **do support only** utterance actions of the Actions format - feature. Custom actions are **not supported yet**. - -- We **do partially support** Form Filling (starting with v0.14.0 release). - -- We **do not support** Events, Checkpoints and OR Statements format - features. - -format -"""""" - -see the `original -documentation `__ for the -detailed ``stories.md`` format description. - -Stories file is a markdown file of the following format: - -.. code:: md - - ## story_title (not used by algorithm, but useful to work with for humans) - * user_action_label{"1st_slot_present_in_action": "slot1_value", .., "Nth_slot_present_in_action": "slotN_value"} - - system_respective_utterance - * another_user_action_of_the_same_format - - another_system_response - ... - - ## another_story_title - ... - - ## formfilling dialogue - * greet - - form{"name": "zoo_form"} - - utter_api_call - - -nlu.md -^^^^^^ - -``nlu.md`` represents an NLU model of your chatbot. It allows you to -provide training examples that show how your chatbot should -understand user messages, and then train a model through these -examples. - -We do support the format described in the `Markdown -format `__ -section of the original RASA documentation with the following -limitations: - -- an extended entities annotation format - (``[]{"entity": "", "role": "", ...}``) - is **not supported** -- *synonyms*, *regex features* and *lookup tables* format features are - **not supported** - -format -"""""" - -see the `original -documentation `__ -on the RASA NLU markdown format for the detailed ``nlu.md`` format -description. - -NLU file is a markdown file of the following format: - -.. code:: md - - ## intent:possible_user_action_label_1 - - An example of user text that has the possible_user_action_label_1 action label - - Another example of user text that has the possible_user_action_label_1 action label - ... - - ## intent:possible_user_action_label_N - - An example of user text that has the (possible_user_action_label_N)[action_label] action label - - ... 
- - -domain.yml -^^^^^^^^^^ - -``domain.yml`` helps you to define the universe your chatbot lives in: -what user inputs it expects to get, what actions it should be able to -predict, -how to respond, and what information to store. - -The format supported by DeepPavlov is the same as the described in the -`original documentation `__ -with the following limitations: - -- only textual slots are allowed -- only slot classes are allowed as entity classes -- only textual response actions are allowed with currently no variables - support - -format -"""""" - -see the `original -documentation `__ on the RASA -Domains YAML config format for the detailed ``domain.yml`` format -description. - -Domain file is a YAML file of the following format: - -.. code:: yaml - - # slots section lists the possible slot names (aka slot types) - # that are used in the domain (i.e. relevant for bot's tasks) - # currently only type: text is supported - slots: - slot1_name: - type: text - ... - slotN_name: - type: text - - # entities list now follows the slots list 2nd level keys - # and is present to support upcoming features. Stay tuned for updates with this! - entities: - - slot1_name - ... - - slotN_name - - # intents section lists the intents that can appear in the stories - # being kept together they do describe the user-side part of go-bot's experience - intents: - - user_action_label - - another_user_action_of_the_same_format - ... - - # responses section lists the system response templates. - # Despite system response' titles being usually informative themselves - # (one could even find them more appropriate when no actual "Natural Language" is needed - # (e.g. for buttons actions in bot apps)) - # It is though extremely useful to be able to serialize the response title to text. - # That's what this section content is needed for. - responses: - system_utterance_1: - - text: "The text that system responds with" - another_system_response: - - text: "Here some text again" - - forms: - zoo_form: - animal: - - type: from_entity - entity: animal - -How Do I: Build Go-Bot Skill with RASA DSLs (v1) -================================================ - -Tutorials -^^^^^^^^^ - -We encourage you to explore the tutorials below to get better understanding of how to build basic and more advanced goal-oriented skills with these RASA DSLs: - -* `Original Tutorial Notebook Featuring Simple and DSTC2-based Skills `_ - -* `Tutorial Notebook Featuring Harvesters Maintenance Go-Bot Skill from Deepy 3000 Demo `_ - - -How Do I: Integrate Go-Bot-based Goal-Oriented Skill into DeepPavlov Deepy -============================================================================ - -To integrate your Go-Bot-based goal-oriented skill into your Multiskill AI Assistant built using DeepPavlov Conversational AI Stack, follow the following instructions: - -1. Clone `Deepy repository `_ -2. Replace ``docker-compose.yml`` in the root of the repository and ``pipeline_conf.json`` in the ``/agent/`` subdirectory with the corresponding files from the `deepy_gobot_base `_ **Deepy Distribution** -3. Clone the second `Tutorial Notebook `_ -4. Change its ``domain.yml``, ``nlu.md``, and ``stories.md`` based on your project needs with your custom **intents**, **slots**, **forms**, and write your own **stories** -5. Train the go-bot model in your copy of the Tutorial Notebook -6. Download and put saved data from your copy of the Tutorial Notebook into the `Harvesters Maintenance Go-Bot Skill `_ -7. 
[Optional] Unless you need a Chit-Chat skill remove `it `_ from at both the ``/agent/pipeline_conf.json`` and from ``docker-compose.yml`` -8. Use ``docker-compose up --build`` command to build and run your DeepPavlov-based Multiskill AI Assistant - -.. note:: - In the coming version of the DeepPavlov Library we will provide a more comprehensive update to the documentation to further simplify the process of building goal-oriented skills with DeepPavlov Conversational AI technology stack. Stay tuned! - -How Do I: Use Form-Filling in Go-Bot Skill with RASA DSLs (v1) -================================================================ - -Tutorials -^^^^^^^^^ - -Follow this tutorial to experiment with the Form-Filling functionality in Go-Bot-based goal-oriented skills built using RASA DSLs (v1): - -* `Tutorial Notebook Featuring Basic Form-Filling `_ - - -DSTC2 Format Support -********************** - -Overview -========== - -The DeepPavlov Go-Bot Framework is based on [1]_ which introduces -Hybrid Code Networks (HCNs) that combine an RNN with domain-specific -knowledge and system action templates. Originally, the DSTC2 format was used for the dataset to train a Go-Bot-based goal-oriented skills upon. - -|alt text| **Diagram 1.** Dotted lines correspond to unrequired -(optional) modules, black squares to trained models, trapezes are -modules that depend on a dataset and must be provided by software -developer. - -Here is a simple example of interaction with a trained goal-oriented skill -(available for download): - -.. note:: - - Models that rely on fasttext word embeddings will require 8.5 GB of disk space, those that use only one-hot encoding of words will require less than 300Mb of disk space. - -.. code:: bash - - >> Hello, welcome to the Cambridge restaurant system. You can ask for restaurants by area, price range or food type. How may I help you? - x::hi, i want cheap food - 2018-05-16 12:28:48.459 INFO in 'deeppavlov.models.go_bot.bot'['bot'] at line 346: Made api_call with {'pricerange': 'cheap'}, got 22 results. - >> The lucky star serves chinese food in the cheap price range. - x::is there anything with french food? - 2018-05-16 12:29:02.795 INFO in 'deeppavlov.models.go_bot.bot'['bot'] at line 346: Made api_call with {'pricerange': 'cheap', 'food': 'french'}, got 0 results. - >> Sorry there is no french restaurant in the cheap price range. - x::ok, maybe there is something in expensive pricerange? - 2018-05-16 12:29:23.337 INFO in 'deeppavlov.models.go_bot.bot'['bot'] at line 346: Made api_call with {'pricerange': 'expensive', 'food': 'french'}, got 2 results. - >> Cote serves french food in the expensive price range. - x::in what part of town is it? - 2018-05-16 12:29:31.193 INFO in 'deeppavlov.models.go_bot.bot'['bot'] at line 346: Made api_call with {'pricerange': 'expensive', 'food': 'french'}, got 2 results. - >> Restaurant two two is a nice place in the north of town serving tasty french food. - x::ok, give me their phone number - >> The phone number of restaurant two two is 01223 351880. - x::address and post code too - >> Restaurant two two is on 22 chesterton road chesterton, c.b 4, 3 a.x. - x::ok, thank you! - >> Restaurant two two serves french food in the expensive price range. - x::and if i wanted an italian restuarant? - 2018-05-16 12:30:15.98 INFO in 'deeppavlov.models.go_bot.bot'['bot'] at line 346: Made api_call with {'pricerange': 'expensive', 'food': 'italian'}, got 4 results. - >> Frankie and bennys serves italian food in the expensive price range. 
- x::more cheaply priced? - 2018-05-16 12:30:36.560 INFO in 'deeppavlov.models.go_bot.bot'['bot'] at line 346: Made api_call with {'pricerange': 'cheap', 'food': 'italian'}, got 5 results. - >> Da vinci pizzeria serves italian food in the cheap price range. - x::ok, give me their phone number and address, please - >> The phone number of da vinci pizzeria is 01223 351707 and it is on 20 milton road chesterton. - x::thanks - >> Da vinci pizzeria serves italian food in the cheap price range. - x::bye - >> You are welcome! - - -Quick Demo -============ - -To quickly try out the Go-Bot capabilities you can use one of the two available pretrained models for DSTC2 dataset (English). Check them out by running this code: - -.. code:: python - - from deeppavlov import build_model, configs - - bot1 = build_model(configs.go_bot.gobot_dstc2, download=True) - - bot1(['hi, i want restaurant in the cheap pricerange']) - bot1(['bye']) - - bot2 = build_model(configs.go_bot.gobot_dstc2_best, download=True) - - bot2(['hi, i want chinese restaurant']) - bot2(['bye']) - -If some required packages are missing, install all the requirements by running in command line: - -.. code:: bash - - python -m deeppavlov install gobot_dstc2 - -How Do I: Build Go-Bot with DSTC2 -=================================== -DSTC is a set of competitions originally known as "Dialog State Tracking Challenges" (DSTC, for short). First challenge -was organized in 2012-2013. Starting as an initiative to provide a common testbed for the task of Dialog State Tracking, -the first Dialog State Tracking Challenge (DSTC) was organized in 2013, followed by DSTC2&3 in 2014, DSTC4 in 2015, -and DSTC5 in 2016. Given the remarkable success of the first five editions, and understanding both, the complexity -of the dialog phenomenon and the interest of the research community in a wider variety of dialog related problems, -the DSTC rebranded itself as "Dialog System Technology Challenges" for its sixth edition. Then, DSTC6 and DSTC7 have -been completed in 2017 and 2018, respectively. - -DSTC-2 released a large number of training dialogs related to restaurant search. Compared to DSTC (which was in the bus -timetables domain), DSTC 2 introduced changing user goals, tracking 'requested slots' as well as the new Restaurants domain. - -Historically, DeepPavlov's Go-Bot used this DSTC-2 approach to defining domain model and behavior of the goal-oriented bots. -In this section you will learn how to use this approach to build a DSTC-2-based Go-Bot. - -Requirements -^^^^^^^^^^^^ - -**TO TRAIN** a go\_bot model you should have: - -1. (*optional, but recommended*) pretrained named entity recognition model (NER) - - - config :config:`configs/ner/slotfill_dstc2.json ` is recommended -2. (*optional, but recommended*) pretrained intents classifier model - - - config :config:`configs/classifiers/intents_dstc2_big.json ` is recommended -3. (*optional*) any sentence (word) embeddings for english - - - fasttext embeddings can be downloaded - - - via link https://s3-us-west-1.amazonaws.com/fasttext-vectors/wiki.en.zip - - or using deeppavlov with :code:`python3 -m deeppavlov download `, - where ```` is one of the :config:`provided config files `. - -**TO INFER** from a go\_bot model you should **additionally** have: - -4. pretrained vocabulary of dataset utterance tokens - - - it is trained in the same config as go\_bot model - -5. 
pretrained goal-oriented bot model - - - config :config:`configs/go_bot/gobot_dstc2.json ` is recommended - - ``slot_filler`` section of go\_bot's config should match NER's configuration - - ``intent_classifier`` section of go\_bot's config should match classifier's configuration - -Configs -^^^^^^^ - -For a working exemplary config see -:config:`configs/go_bot/gobot_dstc2.json ` (model without embeddings). - -A minimal model without ``slot_filler``, ``intent_classifier`` and ``embedder`` is configured -in :config:`configs/go_bot/gobot_dstc2_minimal.json `. - -The best state-of-the-art model (with attention mechanism, relies on ``embedder`` and -does not use bag-of-words) is configured in -:config:`configs/go_bot/gobot_dstc2_best.json `. - -Usage example -^^^^^^^^^^^^^ - -To interact with a pretrained go\_bot model using commandline run: - -.. code:: bash - - python -m deeppavlov interact [-d] - -where ```` is one of the :config:`provided config files `. - -You can also train your own model by running: - -.. code:: bash - - python -m deeppavlov train [-d] - -The ``-d`` parameter downloads - - - data required to train your model (embeddings, etc.); - - a pretrained model if available (provided not for all configs). - -**Pretrained for DSTC2** models are available for - - - :config:`configs/go_bot/gobot_dstc2.json ` and - - :config:`configs/go_bot/gobot_dstc2.json `. - -After downloading required files you can use the configs in your python code. -To infer from a pretrained model with config path equal to ````: - -.. code:: python - - from deeppavlov import build_model - - CONFIG_PATH = '' - model = build_model(CONFIG_PATH) - - utterance = "" - while utterance != 'exit': - print(">> " + model([utterance])[0]) - utterance = input(':: ') - -Config parameters -^^^^^^^^^^^^^^^^^ - -To configure your own pipelines that contain a ``"go_bot"`` component, refer to documentation for :class:`~deeppavlov.models.go_bot.bot.GoalOrientedBot` and :class:`~deeppavlov.models.go_bot.network.GoalOrientedBotNetwork` classes. - -Datasets -======== - -.. _dstc2_dataset: - -DSTC2 -^^^^^ - -The Hybrid Code Network model was trained and evaluated on a modification of a dataset from Dialogue State Tracking -Challenge 2 [2]_. The modifications were as follows: - -- **new turns with api calls** - - - added api\_calls to restaurant database (example: - ``{"text": "api_call area=\"south\" food=\"dontcare\" pricerange=\"cheap\"", "dialog_acts": ["api_call"]}``) - -- **new actions** - - - bot dialog actions were concatenated into one action (example: - ``{"dialog_acts": ["ask", "request"]}`` -> - ``{"dialog_acts": ["ask_request"]}``) - - if a slot key was associated with the dialog action, the new act - was a concatenation of an act and a slot key (example: - ``{"dialog_acts": ["ask"], "slot_vals": ["area"]}`` -> - ``{"dialog_acts": ["ask_area"]}``) - -- **new train/dev/test split** - - - original dstc2 consisted of three different MDP policies, the original train - and dev datasets (consisting of two policies) were merged and - randomly split into train/dev/test - -- **minor fixes** - - - fixed several dialogs, where actions were wrongly annotated - - uppercased first letter of bot responses - - unified punctuation for bot responses - -See :class:`deeppavlov.dataset_readers.dstc2_reader.DSTC2DatasetReader` for implementation. 
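
The listed modifications can be inspected directly with this reader. A minimal sketch, assuming an arbitrary local ``data_path`` (missing files are downloaded into it automatically, as noted in the next subsection):

.. code:: python

    from deeppavlov.dataset_readers.dstc2_reader import DSTC2DatasetReader

    # Read the modified DSTC2 data; 'dstc2' here is just a local target directory.
    data = DSTC2DatasetReader().read(data_path='dstc2')

    # Each training element is a (user_turn, system_turn) pair of dicts with a 'text' field.
    user_turn, system_turn = data['train'][0]
    print(user_turn['text'], '->', system_turn['text'])
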
- -Your data -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Dialogs -""""""" - -If your model uses DSTC2 and relies on ``"dstc2_reader"`` -(:class:`~deeppavlov.dataset_readers.dstc2_reader.DSTC2DatasetReader`), -all needed files, if not present in the -:attr:`DSTC2DatasetReader.data_path ` directory, -will be downloaded from web. - -If your model needs to be trained on different data, you have several ways of -achieving that (sorted by increase in the amount of code): - -1. Use ``"dialog_iterator"`` in dataset iterator config section and - ``"dstc2_reader"`` in dataset reader config section - (**the simplest, but not the best way**): - - - set ``dataset_reader.data_path`` to your data directory; - - your data files should have the same format as expected in - :meth:`DSTC2DatasetReader.read() ` - method. - -2. Use ``"dialog_iterator"`` in dataset iterator config section and - ``"your_dataset_reader"`` in dataset reader config section (**recommended**): - - - clone :class:`deeppavlov.dataset_readers.dstc2_reader.DSTC2DatasetReader` to - ``YourDatasetReader``; - - register as ``"your_dataset_reader"``; - - rewrite so that it implements the same interface as the origin. - Particularly, ``YourDatasetReader.read()`` must have the same output as - :meth:`DSTC2DatasetReader.read() `. - - - ``train`` — training dialog turns consisting of tuples: - - - first tuple element contains first user's utterance info - (as dictionary with the following fields): - - - ``text`` — utterance string - - ``intents`` — list of string intents, associated with user's utterance - - ``db_result`` — a database response *(optional)* - - ``episode_done`` — set to ``true``, if current utterance is - the start of a new dialog, and ``false`` (or skipped) otherwise *(optional)* - - - second tuple element contains second user's response info - - - ``text`` — utterance string - - ``act`` — an act, associated with the user's utterance - - - ``valid`` — validation dialog turns in the same format - - ``test`` — test dialog turns in the same format - -3. Use your own dataset iterator and dataset reader (**if 2. doesn't work for you**): - - - your ``YourDatasetIterator.gen_batches()`` class method output should match the - input format for chainer from - :config:`configs/go_bot/gobot_dstc2.json `. - -Templates -""""""""" - -You should provide a maping from actions to text templates in the format - -.. code:: text - - action1template1 - action2template2 - ... - actionNtemplateN - -where filled slots in templates should start with "#" and mustn't contain whitespaces. - -For example, - -.. code:: text - - bye You are welcome! - canthear Sorry, I can't hear you. - expl-conf_area Did you say you are looking for a restaurant in the #area of town? - inform_area+inform_food+offer_name #name is a nice place in the #area of town serving tasty #food food. - -It is recommended to use ``"DefaultTemplate"`` value for ``template_type`` parameter. - - -Database (Optional) -===================== - -If your dataset doesn't imply any api calls to an external database, just do not set -``database`` and ``api_call_action`` parameters and skip the section below. - -Otherwise, you should - -1. provide sql table with requested items or -2. construct such table from provided in train samples ``db_result`` items. - This can be done with the following script: - - - .. 
code:: bash - - python -m deeppavlov train configs/go_bot/database_.json - - where ``configs/go_bot/database_.json`` is a copy - of ``configs/go_bot/database_dstc2.json`` with configured - ``save_path``, ``primary_keys`` and ``unknown_value``. - -Comparison -************ - -Scores for different modifications of our bot model and comparison with existing benchmarks: - -+-----------------------------------+------+------------------------------------------------------------------------------------+---------------+-----------+---------------+ -| Dataset | Lang | Model | Metric | Test | Downloads | -+===================================+======+====================================================================================+===============+===========+===============+ -| `DSTC 2`_ | En | :config:`basic bot ` | Turn Accuracy | 0.380 | 10 Mb | -+ (:ref:`modified `) + +------------------------------------------------------------------------------------+ +-----------+---------------+ -| | | :config:`bot with slot filler ` | | 0.542 | 400 Mb | -+ + +------------------------------------------------------------------------------------+ +-----------+---------------+ -| | | :config:`bot with slot filler, intents & attention ` | | **0.553** | 8.5 Gb | -+-----------------------------------+ +------------------------------------------------------------------------------------+ +-----------+---------------+ -| `DSTC 2`_ | | Bordes and Weston (2016) [3]_ | | 0.411 | -- | -+ + +------------------------------------------------------------------------------------+ +-----------+---------------+ -| | | Eric and Manning (2017) [4]_ | | 0.480 | -- | -+ + +------------------------------------------------------------------------------------+ +-----------+---------------+ -| | | Perez and Liu (2016) [5]_ | | 0.487 | -- | -+ + +------------------------------------------------------------------------------------+ +-----------+---------------+ -| | | Williams et al. (2017) [1]_ | | **0.556** | -- | -+-----------------------------------+------+------------------------------------------------------------------------------------+---------------+-----------+---------------+ - -.. _`DSTC 2`: http://camdial.org/~mh521/dstc/ - -References -************ - -.. [1] `Jason D. Williams, Kavosh Asadi, Geoffrey Zweig "Hybrid Code - Networks: practical and efficient end-to-end dialog control with - supervised and reinforcement learning" – - 2017 `_ - -.. [2] `Dialog State Tracking Challenge 2 - dataset `_ - -.. [3] `Antoine Bordes, Y-Lan Boureau & Jason Weston "Learning end-to-end - goal-oriented dialog" - 2017 `_ - -.. [4] `Mihail Eric, Christopher D. Manning "A Copy-Augmented - Sequence-to-Sequence Architecture Gives Good Performance on - Task-Oriented Dialogue" - 2017 `_ - -.. [5] `Fei Liu, Julien Perez "Gated End-to-end Memory Networks" - - 2016 `_ - - -.. |alt text| image:: ../../_static/gobot_diagram.png diff --git a/docs/features/skills/odqa.rst b/docs/features/skills/odqa.rst index d71c1f1504..e4f316dce3 100644 --- a/docs/features/skills/odqa.rst +++ b/docs/features/skills/odqa.rst @@ -27,18 +27,18 @@ Training (if you have your own data) .. 
code:: python - from deeppavlov import configs, train_evaluate_model_from_config + from deeppavlov import train_evaluate_model_from_config - train_evaluate_model_from_config(configs.doc_retrieval.en_ranker_tfidf_wiki, download=True) - train_evaluate_model_from_config(configs.squad.multi_squad_noans, download=True) + train_evaluate_model_from_config('en_ranker_tfidf_wiki', download=True) + train_evaluate_model_from_config('qa_squad2_bert', download=True) Building .. code:: python - from deeppavlov import build_model, configs + from deeppavlov import build_model - odqa = build_model(configs.odqa.en_odqa_infer_wiki, download=True) + odqa = build_model('en_odqa_infer_wiki', download=True) Inference @@ -73,7 +73,7 @@ Running ODQA .. note:: - About **24 GB of RAM** required. + About **22 GB of RAM** required. It is possible to run on a 16 GB machine, but than swap size should be at least 8 GB. Training @@ -121,10 +121,6 @@ There are several ODQA configs available: | | of TF-IDF ranker and reader. Searches for an | | | answer in ``enwiki20180211`` Wikipedia dump. | +----------------------------------------------------------------------------------------+-------------------------------------------------+ -|:config:`en_odqa_infer_enwiki20161221 ` | Basic config for **English** language. Consists | -| | of TF-IDF ranker and reader. Searches for an | -| | answer in ``enwiki20161221`` Wikipedia dump. | -+----------------------------------------------------------------------------------------+-------------------------------------------------+ |:config:`ru_odqa_infer_wiki ` | Basic config for **Russian** language. Consists | | | of TF-IDF ranker and reader. Searches for an | | | answer in ``ruwiki20180401`` Wikipedia dump. | @@ -140,23 +136,19 @@ Comparison Scores for **ODQA** skill: -+-------------------------------------------------------------------------------------+------+----------------------+----------------+---------------------+---------------------+ -| | | | | Ranker@5 | Ranker@25 | -| | | | +----------+----------+-----------+---------+ -| Model | Lang | Dataset | WikiDump | F1 | EM | F1 | EM | -+-------------------------------------------------------------------------------------+------+----------------------+----------------+----------+----------+-----------+---------+ -|:config:`DeppPavlov ` | | | enwiki20180211 | 35.89 | 29.21 | 39.96 | 32.64 | -+-------------------------------------------------------------------------------------+ + +----------------+----------+----------+-----------+---------+ -|:config:`DeepPavlov ` | En | SQuAD (dev) | | **37.83**|**31.26** | 41.86 | 34.73 | -+-------------------------------------------------------------------------------------+ + + +----------+----------+-----------+---------+ -|`DrQA`_ | | | | \- | 27.1 | \- | \- | -+-------------------------------------------------------------------------------------+ + + +----------+----------+-----------+---------+ -|`R3`_ | | | enwiki20161221 | 37.5 | 29.1 | \- | \- | -+-------------------------------------------------------------------------------------+------+----------------------+----------------+----------+----------+-----------+---------+ -|:config:`DeepPavlov with RuBERT reader ` | | | | **42.02**|**29.56** | \- | \- | -+-------------------------------------------------------------------------------------+ Ru + SDSJ Task B (dev) + ruwiki20180401 +----------+----------+-----------+---------+ -|:config:`DeepPavlov ` | | | | 28.56 | 18.17 | \- | \- | 
-+-------------------------------------------------------------------------------------+------+----------------------+----------------+----------+----------+-----------+---------+ ++----------------------------------------------------------------------------------------------------------------------------------+------+----------------------+----------------+---------------------+---------------------+ +| | | | | Ranker@5 | Ranker@25 | +| | | | +----------+----------+-----------+---------+ +| Model | Lang | Dataset | WikiDump | F1 | EM | F1 | EM | ++----------------------------------------------------------------------------------------------------------------------------------+------+----------------------+----------------+----------+----------+-----------+---------+ +|:config:`DeppPavlov ` | En | | enwiki20180211 | 29.03 | 22.75 | 31.38 | 25.96 | ++----------------------------------------------------------------------------------------------------------------------------------+ + +----------------+----------+----------+-----------+---------+ +|`DrQA`_ | | | | \- | 27.1 | \- | \- | ++----------------------------------------------------------------------------------------------------------------------------------+ + + +----------+----------+-----------+---------+ +|`R3`_ | | | enwiki20161221 | 37.5 | 29.1 | \- | \- | ++----------------------------------------------------------------------------------------------------------------------------------+------+----------------------+----------------+----------+----------+-----------+---------+ +|:config:`DeepPavlov with RuBERT reader ` | Ru | SDSJ Task B (dev) | ruwiki20180401 | **42.02**|**29.56** | \- | \- | ++----------------------------------------------------------------------------------------------------------------------------------+------+----------------------+----------------+----------+----------+-----------+---------+ EM stands for "exact-match accuracy". Metrics are counted for top 5 and top 25 documents returned by retrieval module. diff --git a/docs/features/skills/rasa_skill.rst b/docs/features/skills/rasa_skill.rst deleted file mode 100644 index 5f8ffdd3db..0000000000 --- a/docs/features/skills/rasa_skill.rst +++ /dev/null @@ -1,50 +0,0 @@ -Rasa Skill -====================== - -A :class:`Rasa wrapper implementation` that reads a folder with Rasa models -(provided by ``path_to_models`` argument), initializes Rasa Agent with this configuration and responds for incoming -utterances according to responses predicted by Rasa. Each response has confidence value estimated as product of -scores of executed actions by Rasa system in the current prediction step (each prediction step in Rasa usually consists of -multiple actions). If Rasa responds with multiple ``BotUttered`` actions, then such phrases are merged into one utterance -divided by ``'\n'``. - -Quick Start ------------ -To setup a Rasa Skill you need to have a working Rasa project at some path, then you can specify the path to Rasa's -models (usually it is a folder with name ``models`` inside the project path) at initialization of Rasa Skill class -by providing ``path_to_models`` attribute. - -Dummy Rasa project ------------------- -DeepPavlov library has :config:`a template config for RASASkill`. -This project is in essence a working Rasa project created with ``rasa init`` and ``rasa train`` commands -with minimal additions. The Rasa bot can greet, answer about what he can do and detect user's mood sentiment. 
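
Assuming the template project has already been fetched through that config, loading it from Python might look like the sketch below; the ``'rasa_skill'`` config name is only a placeholder for the actual template config shipped with the library:

.. code:: python

    from deeppavlov import build_model

    # Placeholder config name: substitute the real name or path of the RASASkill template config.
    rasa_bot = build_model('rasa_skill', download=True)

    # The exact inputs and outputs follow the ``chainer.in``/``chainer.out`` of that config;
    # with a single-component pipeline this is typically a batch of utterances in, responses out.
    print(rasa_bot(['Hello']))
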
- -The template DeepPavlov config specifies only one component (RASASkill) in :doc:`a pipeline`. -The ``metadata.download`` field in configuration allows to download and unpack the gzipped template project into -subdir ``{DOWNLOADS_PATH}``. - -If you create a configuration for a Rasa project hosted on your machine, you don't need to specify ``metadata.download`` -and just need to correctly set ``path_to_models`` of the ``rasa_skill`` component. -``path_to_models`` needs to be a path to your Rasa's ``models`` directory. - -See `Rasa's documentation `_ for explanation on how -to create project. - -Usage without DeepPavlov configuration files -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -.. code:: python - - from deeppavlov.skills.rasa_skill import RASASkill - - rasa_skill_config = { - 'path_to_models': , - } - - rasa_skill = RASASkill(**rasa_skill_config) - - states_batch = None - for utterance in ["Hello", "Hello to the same user_id"]: - responses_batch, confidences_batch, states_batch = rasa_skill([utterance], states_batch) - print(responses_batch[0]) diff --git a/docs/index.rst b/docs/index.rst index 1f5f795c33..47be64eb43 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -9,7 +9,6 @@ Welcome to DeepPavlov's documentation! QuickStart General concepts Configuration file - Choosing The Framework Models/Skills overview @@ -28,21 +27,15 @@ Welcome to DeepPavlov's documentation! :caption: Models BERT-based models - Multitask BERT Context Question Answering Classification - Entity Linking - Morphological Tagger + Entity Extraction Named Entity Recognition Neural Ranking - Slot filling - Speech recognition and synthesis Spelling Correction - Syntactic Parser TF-IDF Ranking Popularity Ranking Knowledge Base Question answering - Intent Catcher Relation Extraction @@ -51,12 +44,8 @@ Welcome to DeepPavlov's documentation! :maxdepth: 1 :caption: Skills - Goal-Oriented Dialogue Bot Open-Domain Question Answering Frequently Asked Questions Answering - AIML - Rasa - DSL .. toctree:: @@ -67,10 +56,6 @@ Welcome to DeepPavlov's documentation! REST API Socket API DeepPavlov Agent RabbitMQ integration - Telegram integration - Yandex Alice integration - Amazon Alexa integration - Microsoft Bot Framework integration Amazon AWS deployment DeepPavlov settings diff --git a/docs/integrations/amazon_alexa.rst b/docs/integrations/amazon_alexa.rst deleted file mode 100644 index 5cf3f2d034..0000000000 --- a/docs/integrations/amazon_alexa.rst +++ /dev/null @@ -1,202 +0,0 @@ -Amazon Alexa integration -======================== - -DeepPavlov models can be made available for inference via Amazon Alexa. Because of Alexa predominantly -conversational nature (raw text in, raw text out), the best results can be achieved with models with raw text both -in input and output (ODQA, SQuAD, etc.). - -Also we **highly** recommend you to study `Alexa skills building basics `__ -and `Alexa Developer console `__ -to make you familiar with main Alexa development concepts and terminology. - -Further instructions are given counting on the fact that you are already familiar with them. - -The whole integrations process takes two main steps: - -1. Skill setup in Amazon Alexa Developer console -2. DeepPavlov skill/model REST service mounting - -1. Skill setup --------------- - -The main feature of Alexa integration is that Alexa API does not provide direct ways to pass raw user text to your custom skill. 
-You will define at least one intent in Developer Console (you will even not be able to compile your skill without one) -and at least one slot (without it you will not be able to pass any user input). Of course, you can not cover infinite -possible user inputs with list of predefined intents and slots. There are to ways to hack it: - -**1. AMAZON.SearchQuery slot type** - -This hack uses AMAZON.SearchQuery slot type which grabs raw text (speech) user input. Bad news that sample utterance -can not consist only of AMAZON.SearchQuery slot and requires some carrier phrase (one word carrier phrase will work). -So you should define this phrase and restrict your user to use it before or after you query. - -Here is JSON config example for Skill Developer console with *"tell"* carrier phrase: - -.. code:: json - - { - "interactionModel": { - "languageModel": { - "invocationName": "my beautiful sandbox skill", - "intents": [ - { - "name": "AMAZON.CancelIntent", - "samples": [] - }, - { - "name": "AMAZON.HelpIntent", - "samples": [] - }, - { - "name": "AMAZON.StopIntent", - "samples": [] - }, - { - "name": "AMAZON.NavigateHomeIntent", - "samples": [] - }, - { - "name": "AskDeepPavlov", - "slots": [ - { - "name": "raw_input", - "type": "AMAZON.SearchQuery" - } - ], - "samples": [ - "tell {raw_input}" - ] - } - ], - "types": [] - } - } - } - -**2. Custom slot type** - -This is kind of "black market hack" but it gives the exact result we want. The idea is to use -`custom slot types `__. -In our case, we will need only one slot type. We will rely on the fact, that, according the docs values outside the -predefined custom slot values list are still returned if recognized by the spoken language understanding system. -Although input to a custom slot type is weighted towards the values in the list, it is not constrained to just the -items on the list. - -The other good news is that custom slot does not require any wrapper words and will grab exact user speech. - -So, the recipe is to define only one intent with only one sample utterance which in turn will consist of your only custom slot. -Custom slot values list should consist of several "abracadabra" entries. Here is JSON config example for Skill Developer -console: - -.. 
code:: json - - { - "interactionModel": { - "languageModel": { - "invocationName": "my beautiful sandbox skill", - "intents": [ - { - "name": "AMAZON.CancelIntent", - "samples": [] - }, - { - "name": "AMAZON.HelpIntent", - "samples": [] - }, - { - "name": "AMAZON.StopIntent", - "samples": [] - }, - { - "name": "AMAZON.NavigateHomeIntent", - "samples": [] - }, - { - "name": "AskDeepPavlov", - "slots": [ - { - "name": "raw_input", - "type": "GetInput" - } - ], - "samples": [ - "{raw_input}" - ] - } - ], - "types": [ - { - "name": "GetInput", - "values": [ - { - "name": { - "value": "Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum" - } - }, - { - "name": { - "value": "Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur" - } - }, - { - "name": { - "value": "quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat" - } - }, - { - "name": { - "value": "Ut enim ad minim veniam" - } - }, - { - "name": { - "value": "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua" - } - }, - { - "name": { - "value": "Lorem ipsum dolor sit amet, consectetur adipiscing elit" - } - } - ] - } - ] - } - } - } - -Please note, that in both cases you should have only one intent with only one slot defined in Alexa Development Console. - -2. DeepPavlov skill/model REST service mounting ---------------------------------------------------- - -Alexa sends request to the https endpoint which was set in the **Endpoint** section of Alexa Development Console. - -You should deploy DeepPavlov skill/model REST service on this -endpoint or redirect it to your REST service. Full REST endpoint URL -can be obtained by the swagger ``docs/`` endpoint. We remind you that Alexa requires https endpoint -with valid certificate from CA. `Here is the guide `__ -for running custom skill service with self-signed certificates in test mode. - -Your intent and slot names defined in Alexa Development Console should be the same with values defined in -DeepPavlov settings file ``deeppavlov/utils/settings/server_config.json``. JSON examples from this guide use default values from -the settings file. - -DeepPavlov skill/model can be made available for Amazon Alexa as a REST service by: - -.. code:: bash - - python -m deeppavlov alexa [--https] [--key ] \ - [--cert ] [-d] [-p ] - -If you redirect requests to your skills service from some https endpoint, you may want to run it in http mode by -omitting ``--https``, ``--key``, ``--cert`` keys. - -Optional ``-d`` key can be provided for dependencies download -before service start. - -Optional ``-p`` key can be provided to override the port value from a settings file. -for **each** conversation. - -REST service properties (host, port, https options) are provided in ``deeppavlov/utils/settings/server_config.json``. Please note, -that all command line parameters override corresponding config ones. diff --git a/docs/integrations/ms_bot.rst b/docs/integrations/ms_bot.rst deleted file mode 100644 index e27d603803..0000000000 --- a/docs/integrations/ms_bot.rst +++ /dev/null @@ -1,104 +0,0 @@ -Microsoft Bot Framework integration -=================================== - -Each library model or skill can be made available for -inference via Microsoft Bot Framework. - -The whole process takes two main steps: - -1. Web App Bot setup in Microsoft Azure -2. DeepPavlov skill/model REST service mounting - -1. Web App Bot setup --------------------- - -1. 
Web App Bot setup guide presumes that you already have - active Microsoft Azure account and logged in to the main Azure dashboard - -2. **Create Web App Bot**: - - 2.1 Go to the *All resources* menu. - - 2.2 Click *Add*. - - 2.3 Type "bot" in the search pane and select *Web App Bot*. - - .. image:: ../_static/ms_bot_framework/01_web_app_bot.png - :width: 800 - - 2.4 Press *"Create"* button on the next screen. - - 2.5 Select Web App Bot creation settings. - - 2.6 Pay attention to the *Pricing tier*, be sure to select free one: - *F0 (10K Premium Messages)*. - - 2.7 Press *"Create"* button. - - .. image:: ../_static/ms_bot_framework/02_web_app_bot_settings.png - :width: 800 - - 2.8 Navigate to your bot control dashboard. - - .. image:: ../_static/ms_bot_framework/03_navigate_to_bot.png - :width: 1500 - -3. **Web App Bot connection configuration**: - - 3.1 Navigate to your bot *Settings* menu. - - 3.2 Input your DeepPavlov skill/model REST service URL - to the *Messaging endpoint* pane. Note, that Microsoft Bot - Framework requires https endpoint with valid certificate from CA. - - 3.3 Save somewhere *Microsoft App ID* (*App ID*). To create *App Secret* - you need to proceed to the *Manage* link near the *Microsoft App ID* pane. - You will need both during your DeepPavlov skill/model REST service start. - - .. image:: ../_static/ms_bot_framework/04_bot_settings.png - :width: 1500 - -4. **Web App Bot channels configuration** - - 4.1 Microsoft Bot Framework allows your bot to communicate - to the outer world via different channels. To set up these channels - navigate to the *Channels* menu, select channel and follow further instructions. - - .. image:: ../_static/ms_bot_framework/05_bot_channels.png - :width: 1500 - -2. DeepPavlov skill/model REST service mounting ---------------------------------------------------- - -MS Bot Framework sends messages from all channels to the https endpoint -which was set in the **Web App Bot connection configuration** section. - -You should deploy DeepPavlov skill/model REST service on this -endpoint or terminate it to your REST service. Full REST endpoint URL -can be obtained by the swagger ``docs/`` endpoint. We remind you that Microsoft Bot Framework requires https endpoint -with valid certificate from CA. - -Each DeepPavlov skill/model can be made available for MS Bot Framework -as a REST service by: - -.. code:: bash - - python -m deeppavlov msbot [-i ] [-s ] \ - [--https] [--key ] [--cert ] [-d] [-p ] - -Use *Microsoft App ID* and *Microsoft App Secret* obtained -in the **Web App Bot connection configuration** section. - -If you redirect requests to your skills service from some https endpoint, you may want to run it in http mode by -omitting ``--https``, ``--key``, ``--cert`` keys. - -Optional ``-d`` key can be provided for dependencies download -before service start. - -Optional ``-p`` key can be provided to override the port value from a settings file. -for **each** conversation. - -REST service properties (host, port) are provided in ``deeppavlov/utils/settings/server_config.json``. You can also store your -app id and app secret in appropriate section of ``server_config.json``. Please note, that all command line parameters -override corresponding config ones. - diff --git a/docs/integrations/rest_api.rst b/docs/integrations/rest_api.rst index 12245ab073..64152252ce 100644 --- a/docs/integrations/rest_api.rst +++ b/docs/integrations/rest_api.rst @@ -72,8 +72,8 @@ to the model by ``server_utils`` label in ``metadata`` section of the model config. 
Value of ``server_utils`` label from model config should match with properties key from ``model_defaults`` section of ``server_config.json``. -For example, adding ``metadata/server_utils`` key to ``go_bot/gobot_dstc2.json`` -with value *GoalOrientedBot* will initiate the search of *GoalOrientedBot* tag +For example, adding ``metadata/server_utils`` key to ``kbqa/kbqa_cq.json`` +with value *KBQA* will initiate the search of *KBQA* tag at ``model_defaults`` section of ``server_config.json``. Therefore, if this section is present, all parameters with non empty (i.e. not ``""``, not ``[]`` etc.) values stored by this tag will overwrite the parameter values diff --git a/docs/integrations/socket_api.rst b/docs/integrations/socket_api.rst index 48214a6196..adb76f618f 100644 --- a/docs/integrations/socket_api.rst +++ b/docs/integrations/socket_api.rst @@ -42,8 +42,8 @@ to the model by ``server_utils`` label in ``metadata`` section of the model config. Value of ``server_utils`` label from model config should match with properties key from ``model_defaults`` section of ``server_config.json``. -For example, adding ``metadata/server_utils`` key to ``go_bot/gobot_dstc2.json`` -with value *GoalOrientedBot* will initiate the search of *GoalOrientedBot* tag +For example, adding ``metadata/server_utils`` key to ``kbqa/kbqa_cq.json`` +with value *KBQA* will initiate the search of *KBQA* tag at ``model_defaults`` section of ``server_config.json``. Therefore, if this section is present, all parameters with non empty (i.e. not ``""``, not ``[]`` etc.) values stored by this tag will overwrite the parameter values diff --git a/docs/integrations/telegram.rst b/docs/integrations/telegram.rst deleted file mode 100644 index 561cb94891..0000000000 --- a/docs/integrations/telegram.rst +++ /dev/null @@ -1,39 +0,0 @@ - -Telegram integration -======================== - -Any model specified by a DeepPavlov config can be launched as a Telegram bot. -You can do it using command line interface or using python. - -Command line interface -~~~~~~~~~~~~~~~~~~~~~~ - -To run a model specified by the ```` config file as a Telegram bot -with a ````: - -.. code:: bash - - python -m deeppavlov telegram [-t ] [-d] - - -* ``-t ``: specifies telegram token as ````. Overrides - default value from ``deeppavlov/utils/settings/server_config.json``. -* ``-d``: downloads model specific data before starting the service. - -The command will print info message ``Bot initiated`` when starts bot. - -``/start`` and ``/help`` Telegram bot messages can be modified via changing -``telegram.start_message`` and ``telegram.help_message`` -in `deeppavlov/utils/settings/server_config.json`. - -Python -~~~~~~ - -To run a model specified by a DeepPavlov config ```` as -Telegram bot, you have to run following code: - -.. code:: python - - from deeppavlov.utils.telegram import interact_model_by_telegram - - interact_model_by_telegram(model_config=, token=) diff --git a/docs/integrations/yandex_alice.rst b/docs/integrations/yandex_alice.rst deleted file mode 100644 index 8a30bc6bdf..0000000000 --- a/docs/integrations/yandex_alice.rst +++ /dev/null @@ -1,59 +0,0 @@ -Yandex Alice integration -======================== - -Any model specified by a DeepPavlov config can be launched as a skill for -Yandex.Alice. You can do it using command line interface or using python. - -Command line interface -~~~~~~~~~~~~~~~~~~~~~~ - -To interact with Alice you will require your own HTTPS certificate. 
To generate -a new one -- run: - -:: - - openssl req -new -newkey rsa:4096 -days 365 -nodes -x509 -subj "/CN=MY_DOMAIN_OR_IP" -keyout my.key -out my.crt - -To run a model specified by the ```` config file as an Alice -skill, run: - -:: - - python -m deeppavlov alice --https --key my.key --cert my.crt [-d] [-p ] - -* ``-d``: download model specific data before starting the service. - -The command will print the used host and port. Default web service properties -(host, port, model endpoint, GET request arguments, paths to ssl cert and key, -https mode) can be modified via changing -``deeppavlov/utils/settings/server_config.json`` file. ``--https``, ``--key``, -``--cert``, ``-p`` arguments override default values from ``server_config.json``. -Advanced API configuration is described in -:doc:`REST API ` section. - -Now set up and test your dialog (https://dialogs.yandex.ru/developer/). -Detailed documentation of the platform could be found on -https://tech.yandex.ru/dialogs/alice/doc/about-docpage/. Advanced API -configuration is described in :doc:`REST API ` section. - - -Python -~~~~~~ - -To run a model specified by a DeepPavlov config ```` as an Alice -skill using python, you have to run following code: - -.. code:: python - - from deeppavlov.utils.alice import start_alice_server - - start_alice_server(, - host=, - port=, - endpoint=, - https=True, - ssl_key='my.key', - ssl_cert='my.crt') - -All arguments except ```` are optional. Optional arguments override -corresponding values from ``deeppavlov/utils/settings/server_config.json``. diff --git a/docs/intro/choose_framework.rst b/docs/intro/choose_framework.rst deleted file mode 100644 index aa208cb7c0..0000000000 --- a/docs/intro/choose_framework.rst +++ /dev/null @@ -1,135 +0,0 @@ -Choose the Framework -==================== - -DeepPavlov is built on top of the machine learning frameworks -`TensorFlow `__, -`Keras `__ and `PyTorch `__: - -* BERT-based models on TensorFlow and PyTorch; -* Text classification on Keras and PyTorch; -* Text ranking and morpho-tagging on Keras; -* All other models on TensorFlow. - -First, follow the instructions on :doc:`Installation page ` -to install the ``deeppavlov`` package for Python 3.6/3.7. - -Depending on the considered NLP task, you need to choose one of the available frameworks. -The full list of available models is :doc:`here `. - -- To install the requirements for the considered model, you can find the config file with the same configuration - in terms of used components, and install the requirements in the following way: - - .. code:: bash - - python -m deeppavlov install -d - - where ```` is path to the chosen model's config file (e.g. ``deeppavlov/configs/ner/slotfill_dstc2.json``) - or just the file name without the `.json` extension (e.g. ``slotfill_dstc2``); - ``-d`` downloads required data -- pretrained model files and embeddings (optional). - -Trainer -------- - -If you are going to use models on Keras or TensorFlow, in ``config["train"]``, you need to set ``"class_name": "nn_trainer"``; -If using PyTorch, you need to use ``"class_name": "torch_trainer"``, which differs from ``nn_trainer`` -only in assigning ``torch.nn.Module.train()`` and ``torch.nn.Module.eval()`` models for PyTorch modules. - - -Text Classification on Keras or PyTorch ---------------------------------------- - -If you want to build your own architecture for **text classification** tasks, do the following in **Keras** or in **PyTorch**: - - .. 
code:: python - - # Keras - from deeppavlov.models.classifiers.keras_classification_model import KerasClassificationModel - # PyTorch - # from deeppavlov.models.classifiers.torch_classification_model import TorchTextClassificationModel - - # Keras - class MyModel(KerasClassificationModel): - # Torch - # class MyModel(TorchTextClassificationModel): - - def my_network_architecture(self, **kwargs): - model = - return model - - In the config file, assign ``"class_name": "module.path.to.my.model.file:MyModel"`` - and ``"model_name": "my_network_architecture"`` - in the dictionary with the main model. - Don't forget to set ``torch_trainer`` or ``nn_trainer`` (for PyTorch) or ``nn_trainer`` (for TensorFlow and Keras). - -Other NLP-tasks on TensorFlow, Keras, or PyTorch ------------------------------------------------- - -- If you want to build your own model for **some other NLP** task, do the following in **Keras** or **PyTorch**: - - .. code:: python - - # Keras - from deeppavlov.core.models.keras_model import LRScheduledKerasModel - # PyTorch - # from deeppavlov.core.models.torch_model import TorchModel - - # Keras - class MyModel(LRScheduledKerasModel): - # Torch - # class MyModel(TorchModel): - - def train_on_batch(x, y, *args, **kwargs): - - return loss - - def __call__(data, *args, **kwargs): - - return predictions - - def my_network_architecture(self, **kwargs): - model = - return model - - In the config file, assign ``"class_name": "module.path.to.my.model.file:MyModel"`` - and ``"model_name": "my_network_architecture"`` - in the dictionary with the main model. - Don't forget to set ``torch_trainer`` (for PyTorch) or ``nn_trainer`` (for TensorFlow and Keras). - - -- If you want to build your own model for **some other NLP** task, do the following in **TensorFlow**: - - .. code:: python - - from deeppavlov.core.models.tf_model import LRScheduledTFModel - - class MyModel(LRScheduledTFModel): - - def _init_graph(self): - - - def _init_placeholders(self): - - - def _init_optimizer(self): - - - def _build_feed_dict(self, *variables): - - return feed_dict - - def train_on_batch(x, y, *args, **kwargs): - - feed_dict = self._build_feed_dict(*variables) - loss, _ = self.sess.run([self.loss, self.train_op], feed_dict=feed_dict) - return {"loss": loss} - - def __call__(data, *args, **kwargs): - - feed_dict = self._build_feed_dict(*variables) - predictions = self.sess.run([self.predictions], feed_dict=feed_dict) - return predictions.tolist() - - In the config file, assign ``"class_name": "module.path.to.my.model.file:MyModel"`` - and ``"model_name": "my_network_architecture"`` - in the dictionary with the main model; Also, set all the necessary parameters in the same dictionary. - Don't forget to set ``nn_trainer`` (for TensorFlow). diff --git a/docs/intro/configuration.rst b/docs/intro/configuration.rst index 9f873c5e9c..88cca82df8 100644 --- a/docs/intro/configuration.rst +++ b/docs/intro/configuration.rst @@ -61,6 +61,38 @@ parameters: }, +Nested configuration files +-------------------------- + +Any configuration file could be used inside another configuration file as an element of the +:class:`~deeppavlov.core.common.chainer.Chainer` or as a field of another component using ``config_path`` key. +Any field of the nested configuration file could be overwritten using ``overwrite`` field: + +.. code:: + + "chainer": { + "pipe": { + ... 
+ { + "class_name": "ner_chunk_model", + "ner": { + "config_path": "{CONFIGS_PATH}/ner/ner_ontonotes_bert.json", + "overwrite": { + "chainer.out": ["x_tokens", "tokens_offsets", "y_pred", "probas"] + } + }, + ... + } + } + } + +In this example ``ner_ontonotes_bert.json`` is used as the ``ner`` argument value in the ``ner_chunk_model`` component. +The ``chainer.out`` value is overwritten with the new list. Overwritten field names are defined using dot notation. In this +notation, numeric fields are treated as list indexes. For example, to change the ``class_name`` value of the second +element of the pipe to ``ner_chunker`` (1 is the index of the second element), use the +``"chainer.pipe.1.class_name": "ner_chunker"`` key-value pair. + + Variables --------- @@ -83,7 +115,7 @@ from ``metadata.variables`` element: { "in": ["x"], "out": ["y_predicted"], - "config_path": "{CONFIGS_PATH}/classifiers/intents_snips.json" + "config_path": "{CONFIGS_PATH}/classifiers/insults_kaggle_bert.json" } ], "out": ["y_predicted"] @@ -177,18 +209,15 @@ and ``train``: Simplified version of training pipeline contains two elements: ``dataset`` and ``train``. The ``dataset`` element -currently can be used for train from classification data in ``csv`` and ``json`` formats. You can find complete examples -of how to use simplified training pipeline in -:config:`intents_sample_csv.json ` and -:config:`intents_sample_json.json ` config files. +can currently be used to train on classification data in ``csv`` and ``json`` formats. Train Parameters ~~~~~~~~~~~~~~~~ ``train`` element can contain a ``class_name`` parameter that references a trainer class (default value is -:class:`nn_trainer `). All other parameters will be passed as keyword arguments -to the trainer class's constructor. +:class:`torch_trainer `). +All other parameters will be passed as keyword arguments to the trainer class's constructor. Metrics _______ .. code:: python "train": { - "class_name": "nn_trainer", + "class_name": "torch_trainer", "metrics": [ "f1", { @@ -205,22 +234,27 @@ _______ "inputs": ["y", "y_labels"] }, { - "name": "roc_auc", - "inputs": ["y", "y_probabilities"] + "name": "sklearn.metrics:accuracy_score", + "alias": "unnormalized_accuracy", + "inputs": ["y", "y_labels"], + "normalize": false } ], ... } -| The first metric in the list is used for early stopping. -| -| Each metric can be described as a JSON object with ``name`` and ``inputs`` properties, where ``name`` - is a registered name of a metric function and ``inputs`` is a list of parameter names from chainer's - inner memory that will be passed to the metric function. -| -| If a metric is described as a single string, this string is interpreted as a registered name. -| -| Default value for ``inputs`` parameter is a concatenation of chainer's ``in_y`` and ``out`` parameters. +The first metric in the list is used for early stopping. + +Each metric can be described as a JSON object with ``name``, ``alias`` and ``inputs`` properties, where: + + - ``name`` is either a registered name of a metric function or ``module.submodules:function_name``. + - ``alias`` is a metric name. Its default value is the ``name`` value. + - ``inputs`` is a list of parameter names from chainer's inner memory that will be passed to the metric function. + The default value is a concatenation of chainer's ``in_y`` and ``out`` parameters. + +All other arguments are interpreted as kwargs when the metric is called. +If a metric is given as a string, this string is interpreted as a metric name, i.e.
``"f1"`` in the example +above is equivalent to ``{"name": "f1"}``. DatasetReader @@ -235,8 +269,8 @@ A concrete :class:`DatasetReader` class should be inherited from this base class from deeppavlov.core.common.registry import register from deeppavlov.core.data.dataset_reader import DatasetReader - @register('dstc2_datasetreader') - class DSTC2DatasetReader(DatasetReader): + @register('conll2003_reader') + class Conll2003DatasetReader(DatasetReader): DataLearningIterator and DataFittingIterator @@ -284,18 +318,10 @@ Preprocessor is a component that processes batch of samples. * Already implemented universal preprocessors of **tokenized texts** (each sample is a list of tokens): - - :class:`~deeppavlov.models.preprocessors.char_splitter.CharSplitter` - (registered as ``char_splitter``) splits every token in given batch of - tokenized samples to a sequence of characters. - - :class:`~deeppavlov.models.preprocessors.mask.Mask` (registered as ``mask``) returns binary mask of corresponding length (padding up to the maximum length per batch. - - :class:`~deeppavlov.models.preprocessors.russian_lemmatizer.PymorphyRussianLemmatizer` - (registered as ``pymorphy_russian_lemmatizer``) performs lemmatization - for Russian language. - - :class:`~deeppavlov.models.preprocessors.sanitizer.Sanitizer` (registered as ``sanitizer``) removes all combining characters like diacritical marks from tokens. @@ -327,9 +353,6 @@ Tokenizers Tokenizer is a component that processes batch of samples (each sample is a text string). - - :class:`~deeppavlov.models.tokenizers.lazy_tokenizer.LazyTokenizer` - (registered as ``lazy_tokenizer``) tokenizes using ``nltk.word_tokenize``. - - :class:`~deeppavlov.models.tokenizers.nltk_tokenizer.NLTKTokenizer` (registered as ``nltk_tokenizer``) tokenizes using tokenizers from ``nltk.tokenize``, e.g. ``nltk.tokenize.wordpunct_tokenize``. @@ -339,10 +362,6 @@ string). ``nltk.tokenize.moses.MosesDetokenizer``, ``nltk.tokenize.moses.MosesTokenizer``. - - :class:`~deeppavlov.models.tokenizers.ru_sent_tokenizer.RuSentTokenizer` - (registered as ``ru_sent_tokenizer``) is a rule-based tokenizer for - Russian language. - - :class:`~deeppavlov.models.tokenizers.ru_tokenizer.RussianTokenizer` (registered as ``ru_tokenizer``) tokenizes or lemmatizes Russian texts using ``nltk.tokenize.toktok.ToktokTokenizer``. @@ -363,21 +382,11 @@ Embedder is a component that converts every token in a tokenized batch to a vector of a particular dimension (optionally, returns a single vector per sample). - - :class:`~deeppavlov.models.embedders.glove_embedder.GloVeEmbedder` - (registered as ``glove``) reads embedding file in GloVe format (file - starts with ``number_of_words embeddings_dim line`` followed by lines - ``word embedding_vector``). If ``mean`` returns one vector per - sample --- mean of embedding vectors of tokens. - - :class:`~deeppavlov.models.embedders.fasttext_embedder.FasttextEmbedder` (registered as ``fasttext``) reads embedding file in fastText format. If ``mean`` returns one vector per sample - mean of embedding vectors of tokens. - - :class:`~deeppavlov.models.embedders.bow_embedder.BoWEmbedder` - (registered as ``bow``) performs one-hot encoding of tokens using - pre-built vocabulary. - - :class:`~deeppavlov.models.embedders.tfidf_weighted_embedder.TfidfWeightedEmbedder` (registered as ``tfidf_weighted``) accepts embedder, tokenizer (for detokenization, by default, detokenize with joining with space), TFIDF @@ -385,11 +394,6 @@ sample). 
assign additional multiplcative weights to particular tags). If ``mean`` returns one vector per sample - mean of embedding vectors of tokens. - - :class:`~deeppavlov.models.embedders.elmo_embedder.ELMoEmbedder` - (registered as ``elmo``) converts tokens to pre-trained contextual - representations from large-scale bidirectional language models. See - examples `here `__. - Vectorizers ~~~~~~~~~~~ diff --git a/docs/intro/overview.rst b/docs/intro/overview.rst index d5f0e48d95..d7c38464a7 100644 --- a/docs/intro/overview.rst +++ b/docs/intro/overview.rst @@ -48,8 +48,5 @@ the input and output of a ``Skill`` should both be strings. Therefore, ``Skill``\ s are usually associated with dialogue tasks. -DeepPavlov is built on top of the machine learning frameworks -`TensorFlow `__, -`Keras `__ and `PyTorch `__. Other external libraries can be used to -build basic components. - +Most of DeepPavlov models are built on top of `PyTorch `__. +Other external libraries can be used to build basic components. diff --git a/docs/intro/quick_start.rst b/docs/intro/quick_start.rst index 9f31ee475f..1008966e9d 100644 --- a/docs/intro/quick_start.rst +++ b/docs/intro/quick_start.rst @@ -2,7 +2,7 @@ QuickStart ------------ First, follow instructions on :doc:`Installation page ` -to install ``deeppavlov`` package for Python 3.6/3.7. +to install ``deeppavlov`` package for Python 3.6/3.7/3.8/3.9. DeepPavlov contains a bunch of great pre-trained NLP models. Each model is determined by its config file. List of models is available on @@ -27,8 +27,8 @@ Before making choice of an interface, install model's package requirements python -m deeppavlov install * where ```` is path to the chosen model's config file (e.g. - ``deeppavlov/configs/ner/slotfill_dstc2.json``) or just name without - `.json` extension (e.g. ``slotfill_dstc2``) + ``deeppavlov/configs/classifiers/insults_kaggle_bert.json``) or just name without + `.json` extension (e.g. ``insults_kaggle_bert``) Command line interface (CLI) @@ -71,10 +71,6 @@ There are even more actions you can perform with configs: `), * ``risesocket`` to run a socket API server (see :doc:`docs `), - * ``telegram`` to run as a Telegram bot (see :doc:`docs - `), - * ``msbot`` to run a Miscrosoft Bot Framework server (see - :doc:`docs `), * ``predict`` to get prediction for samples from `stdin` or from `` if ``-f `` is specified. * ```` specifies path (or name) of model's config file @@ -111,7 +107,7 @@ You can train it in the same simple way: model = train_model(, download=True) * ``download=True`` downloads pretrained model, therefore the pretrained - model will be, first, loaded and then train (optional). + model will be, first, loaded and then trained (optional). Dataset will be downloaded regardless of whether there was ``-d`` flag or not. @@ -128,26 +124,23 @@ You can also calculate metrics on the dataset specified in your config file: model = evaluate_model(, download=True) -There are also available integrations with various messengers, see -:doc:`Telegram Bot doc page ` and others in the -Integrations section for more info. - Using GPU ~~~~~~~~~ -To run or train **TensorFlow**-based DeepPavlov models on GPU you should have `CUDA `__ 10.0 -installed on your host machine and TensorFlow with GPU support (``tensorflow-gpu``) -installed in your python environment. Current supported TensorFlow version is 1.15.5. Run +To run or train **PyTorch**-based DeepPavlov models on GPU you should have `CUDA `__ +installed on your host machine, and install model's package requirements. 
CUDA version should be compatible with +DeepPavlov :dp_file:`required PyTorch version `. - .. code:: bash +.. warning:: + If you use latest NVIDIA architecture, PyTorch installed from PyPI using DeepPavlov could not support your device + CUDA capability. You will receive incompatible device warning after model initialization. You can install compatible + package from `download.pytorch.org `_. For example: - pip install tensorflow-gpu==1.15.5 + .. code:: bash -before installing model's package requirements to install supported ``tensorflow-gpu`` version. + pip3 install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html -To run or train **PyTorch**-based DeepPavlov models on GPU you should also have `CUDA `__ 9.0 or 10.0 -installed on your host machine, and install model's package requirements. If you want to run the code on GPU, just make the device visible for the script. If you want to use a particular device, you may set it in command line: @@ -207,15 +200,13 @@ a paragraph of text), where the answer to the question is a segment of the conte .. table:: :widths: auto - +----------+------------------------------------------------------------------------------------------------+-------------------------------------------+ - | Language | DeepPavlov config | Demo | - +==========+================================================================================================+===========================================+ - | Multi | :config:`squad_bert_multilingual_freezed_emb ` | https://demo.deeppavlov.ai/#/mu/textqa | - +----------+------------------------------------------------------------------------------------------------+-------------------------------------------+ - | En | :config:`squad_bert_infer ` | https://demo.deeppavlov.ai/#/en/textqa | - +----------+------------------------------------------------------------------------------------------------+-------------------------------------------+ - | Ru | :config:`squad_ru_bert_infer ` | https://demo.deeppavlov.ai/#/ru/textqa | - +----------+------------------------------------------------------------------------------------------------+-------------------------------------------+ + +----------+------------------------------------------------------------------------------------+-------------------------------------------+ + | Language | DeepPavlov config | Demo | + +==========+====================================================================================+===========================================+ + | En | :config:`squad_bert ` | https://demo.deeppavlov.ai/#/en/textqa | + +----------+------------------------------------------------------------------------------------+-------------------------------------------+ + | Ru | :config:`squad_ru_bert ` | https://demo.deeppavlov.ai/#/ru/textqa | + +----------+------------------------------------------------------------------------------------+-------------------------------------------+ Name Entity Recognition @@ -252,22 +243,7 @@ related to. 
+----------+------------------------------------------------------------------------------------------------+-------------------------------------------+ | Language | DeepPavlov config | Demo | +==========+================================================================================================+===========================================+ - | En | :config:`insults_kaggle_conv_bert ` | https://demo.deeppavlov.ai/#/en/insult | - +----------+------------------------------------------------------------------------------------------------+-------------------------------------------+ - - -Sentiment Analysis -================== - -Classify text according to a prevailing emotion (positive, negative, etc.) in it. - -.. table:: - :widths: auto - - +----------+------------------------------------------------------------------------------------------------+-------------------------------------------+ - | Language | DeepPavlov config | Demo | - +==========+================================================================================================+===========================================+ - | Ru | :config:`rusentiment_elmo_twitter_cnn ` | https://demo.deeppavlov.ai/#/ru/sentiment | + | En | :config:`insults_kaggle_bert ` | https://demo.deeppavlov.ai/#/en/insult | +----------+------------------------------------------------------------------------------------------------+-------------------------------------------+ @@ -282,7 +258,5 @@ Detect if two given texts have the same meaning. +----------+------------------------------------------------------------------------------------------------+-------------------------------------------+ | Language | DeepPavlov config | Demo | +==========+================================================================================================+===========================================+ - | En | :config:`paraphraser_bert ` | None | - +----------+------------------------------------------------------------------------------------------------+-------------------------------------------+ | Ru | :config:`paraphraser_rubert ` | None | +----------+------------------------------------------------------------------------------------------------+-------------------------------------------+ diff --git a/examples/Pseudo-labeling for classification.ipynb b/examples/Pseudo-labeling for classification.ipynb deleted file mode 100644 index 8d01922069..0000000000 --- a/examples/Pseudo-labeling for classification.ipynb +++ /dev/null @@ -1,210 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import json\n", - "from pathlib import Path\n", - "import numpy as np\n", - "from copy import deepcopy\n", - "import pandas as pd\n", - "\n", - "from deeppavlov.core.commands.train import read_data_by_config, train_evaluate_model_from_config\n", - "from deeppavlov.core.commands.infer import interact_model, build_model\n", - "from deeppavlov.core.commands.utils import expand_path, parse_config\n", - "from deeppavlov.core.common.params import from_params\n", - "from deeppavlov.core.common.errors import ConfigError" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# read unlabelled data for label propagation\n", - "def read_unlabelled_data(UNLABELLED_DATA_PATH):\n", - " with open(UNLABELLED_DATA_PATH, \"r\") as f:\n", - " unlabelled_data = f.read().splitlines()\n", - " unlabelled_data = [x for x in unlabelled_data if x != '']\n", - " return 
unlabelled_data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "def make_pl_config(CONFIG_PATH):\n", - " config_path_pl = Path(CONFIG_PATH).parent / Path(Path(CONFIG_PATH).stem + \"_pl.json\")\n", - "\n", - " with open(CONFIG_PATH, \"r\") as f:\n", - " config = json.load(f)\n", - " \n", - " config_pl = deepcopy(config)\n", - " config_pl[\"dataset_reader\"][\"train\"] = Path(config_pl[\"dataset_reader\"].get(\"train\", \"train.csv\")).stem + \"_pl.csv\"\n", - " \n", - " with open(config_path_pl, \"w\") as f:\n", - " json.dump(config_pl, f, indent=2)\n", - " \n", - " return config, config_pl" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def save_extended_data(config, samples, labels, new_config = None):\n", - " train_data = read_data_by_config(deepcopy(config))\n", - " \n", - " for i in range(len(samples)):\n", - " train_data[\"train\"].append((samples[i], labels[i]))\n", - " df = pd.DataFrame(train_data[\"train\"], \n", - " columns=[config[\"dataset_reader\"][\"x\"], \n", - " config[\"dataset_reader\"][\"y\"]])\n", - " df[config[\"dataset_reader\"][\"y\"]] = df[config[\"dataset_reader\"][\"y\"]].apply(\n", - " lambda x: config[\"dataset_reader\"].get(\"class_sep\", \",\").join(x))\n", - " \n", - " if new_config is not None:\n", - " config = new_config\n", - " file = expand_path(Path(config[\"dataset_reader\"][\"data_path\"]) / \n", - " Path(config[\"dataset_reader\"][\"train\"]))\n", - "\n", - " if config[\"dataset_reader\"].get(\"format\", \"csv\") == \"csv\":\n", - " keys = ('sep', 'header', 'names')\n", - " df.to_csv(file, \n", - " index=False,\n", - " sep=config[\"dataset_reader\"].get(\"sep\", \",\")\n", - " )\n", - " elif config[\"dataset_reader\"].get(\"format\", \"csv\") == \"json\":\n", - " keys = ('orient', 'lines')\n", - " df.to_json(file, \n", - " index=False,\n", - " orient=config[\"dataset_reader\"].get(\"orient\", None),\n", - " lines=config[\"dataset_reader\"].get(\"lines\", False)\n", - " )\n", - " else:\n", - " raise ConfigError(\"Can not work with current data format\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "# manually given parameters for pseudo-labeling\n", - "\n", - "# path to config file\n", - "CONFIG_PATH = \"../deeppavlov/configs/classifiers/convers_vs_info.json\"\n", - "# read config, compose new one, save it\n", - "config, config_pl = make_pl_config(CONFIG_PATH)\n", - "config, config_pl = parse_config(config), parse_config(config_pl)\n", - "config" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# path to file with unlabelled data\n", - "UNLABELLED_DATA_PATH = expand_path(Path(config[\"dataset_reader\"][\"data_path\"])) / Path(\"question_L6.txt\")\n", - "# number of samples that are going to be labelled during one iteration of label propagation\n", - "ONE_ITERATION_PORTION = 100\n", - "# number of iterations\n", - "N_ITERATIONS = 10\n", - "CLASSES_VOCAB_ID_IN_PIPE = 0\n", - "CONFIDENT_PROBA = 0.9" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# read unlabelled dataset\n", - "unlabelled_data = read_unlabelled_data(UNLABELLED_DATA_PATH)\n", - "\n", - "# save initial dataset as extended\n", - "save_extended_data(config, [], [], new_config=config_pl)" - ] - }, - { - "cell_type": 
"code", - "execution_count": null, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "available_unlabelled_ids = np.arange(len(unlabelled_data))\n", - "\n", - "np.random.seed(42)\n", - "\n", - "for i in range(N_ITERATIONS):\n", - " samples = []\n", - " labels = []\n", - " \n", - " ids_to_label = available_unlabelled_ids[\n", - " np.random.randint(low=0, \n", - " high=len(available_unlabelled_ids), \n", - " size=ONE_ITERATION_PORTION)]\n", - " available_unlabelled_ids = np.delete(available_unlabelled_ids, ids_to_label)\n", - " train_evaluate_model_from_config(deepcopy(config_pl))\n", - " model = build_model(deepcopy(config_pl))\n", - " classes = np.array(list(from_params(\n", - " deepcopy(config_pl[\"chainer\"][\"pipe\"][CLASSES_VOCAB_ID_IN_PIPE])).keys()))\n", - "\n", - " for j, sample_id in enumerate(ids_to_label):\n", - " prediction = model([unlabelled_data[sample_id]])[0]\n", - " if len(np.where(np.array(prediction) > CONFIDENT_PROBA)[0]):\n", - " samples.append(unlabelled_data[sample_id])\n", - " labels.append(classes[np.where(np.array(prediction) > CONFIDENT_PROBA)])\n", - " \n", - " print(\"Iteration {}: add {} samples to train dataset\".format(i, len(samples)))\n", - " save_extended_data(config_pl, samples, labels)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - }, - "accelerator": "GPU", - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/examples/README.md b/examples/README.md deleted file mode 100644 index e594ae7003..0000000000 --- a/examples/README.md +++ /dev/null @@ -1,19 +0,0 @@ -# Examples & Tutorials - -* Tutorial for simple bot [[notebook]](gobot_tutorial.ipynb) [[colab]](https://colab.research.google.com/github/deepmipt/DeepPavlov/blob/master/examples/gobot_tutorial.ipynb) - -* Tutorial for advanced goal-oriented bot [[notebook]](gobot_extended_tutorial.ipynb) [[colab]](https://colab.research.google.com/github/deepmipt/DeepPavlov/blob/master/examples/gobot_extended_tutorial.ipynb) - -* Tutorial for intent classifier [[notebook]](classification_tutorial.ipynb) [[colab]](https://colab.research.google.com/github/deepmipt/DeepPavlov/blob/master/examples/classification_tutorial.ipynb) - -* Morphotagger model usage example [[notebook]](morphotagger_example.ipynb) [[colab]](https://colab.research.google.com/github/deepmipt/DeepPavlov/blob/master/examples/morphotagger_example.ipynb) - -* Pseudo-labeling for classication task [[notebook]](Pseudo-labeling%20for%20classification.ipynb) [[colab]](https://colab.research.google.com/github/deepmipt/DeepPavlov/blob/master/examples/Pseudo-labeling%20for%20classification.ipynb) - -* Optimal learning rate search in DeepPavlov [[notebook]](super_convergence_tutorial.ipynb) [[colab]](https://colab.research.google.com/github/deepmipt/DeepPavlov/blob/master/examples/super_convergence_tutorial.ipynb) - -# Links - -More examples are available: -* [github.com/deepmipt/dp_tutorials/](https://github.com/deepmipt/dp_tutorials) -* [github.com/deepmipt/db_notebooks/](https://github.com/deepmipt/dp_notebooks). 
diff --git a/examples/classification_tutorial.ipynb b/examples/classification_tutorial.ipynb deleted file mode 100644 index e7792ccd1e..0000000000 --- a/examples/classification_tutorial.ipynb +++ /dev/null @@ -1,2961 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## You can also run the notebook in [COLAB](https://colab.research.google.com/github/deepmipt/DeepPavlov/blob/master/examples/classification_tutorial.ipynb)." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "!pip3 install deeppavlov" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Classification on DeepPavlov" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Task**:\n", - "Intent recognition on SNIPS dataset: https://github.com/snipsco/nlu-benchmark/tree/master/2017-06-custom-intent-engines that has already been recomposed to `csv` format and can be downloaded from http://files.deeppavlov.ai/datasets/snips_intents/train.csv\n", - "\n", - "FastText English word embeddings ~8Gb: http://files.deeppavlov.ai/deeppavlov_data/embeddings/wiki.en.bin" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Plan of the notebook with documentation links:\n", - "\n", - "1. [Data aggregation](#Data-aggregation)\n", - " * [DatasetReader](#DatasetReader): [docs link](https://deeppavlov.readthedocs.io/en/latest/apiref/dataset_readers.html)\n", - " * [DatasetIterator](#DatasetIterator): [docs link](https://deeppavlov.readthedocs.io/en/latest/apiref/dataset_iterators.html)\n", - "2. [Data preprocessing](#Data-preprocessing): [docs link](https://deeppavlov.readthedocs.io/en/latest/components/data_processors.html)\n", - " * [Lowercasing](#Lowercasing)\n", - " * [Tokenization](#Tokenization)\n", - " * [Vocabulary](#Vocabulary)\n", - "3. [Featurization](#Featurization): [docs link](https://deeppavlov.readthedocs.io/en/latest/components/data_processors.html), [pre-trained embeddings link](https://deeppavlov.readthedocs.io/en/latest/intro/pretrained_vectors.html)\n", - " * [Bag-of-words embedder](#Bag-of-words)\n", - " * [TF-IDF vectorizer](#TF-IDF-Vectorizer)\n", - " * [GloVe embedder](#GloVe-embedder)\n", - " * [Mean GloVe embedder](#Mean-GloVe-embedder)\n", - " * [GloVe weighted by TF-IDF embedder](#GloVe-weighted-by-TF-IDF-embedder)\n", - "4. 
[Models](#Models): [docs link](https://deeppavlov.readthedocs.io/en/latest/components/classifiers.html)\n", - " * [Building models in python](#Models-in-python)\n", - " - [Sklearn component classifiers](#SklearnComponent-classifier-on-Tfidf-features-in-python)\n", - " - [Keras classification model on GloVe emb](#KerasClassificationModel-on-GloVe-embeddings-in-python)\n", - " - [Sklearn component classifier on GloVe weighted emb](#SklearnComponent-classifier-on-GloVe-weighted-by-TF-IDF-embeddings-in-python)\n", - " * [Building models from configs](#Models-from-configs)\n", - " - [Sklearn component classifiers](#SklearnComponent-classifier-on-Tfidf-features-from-config)\n", - " - [Keras classification model](#KerasClassificationModel-on-fastText-embeddings-from-config)\n", - " - [Sklearn component classifier on GloVe weighted emb](#SklearnComponent-classifier-on-GloVe-weighted-by-TF-IDF-embeddings-from-config)\n", - " * [Bonus: pre-trained CNN model in DeepPavlov](#Bonus:-pre-trained-CNN-model-in-DeepPavlov)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Data aggregation" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First of all, let's download and look into data we will work with." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": { - "scrolled": true - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:14:21.101 INFO in 'deeppavlov.core.data.utils'['utils'] at line 63: Downloading from http://files.deeppavlov.ai/datasets/snips_intents/train.csv to snips/train.csv\n", - "100%|██████████| 981k/981k [00:00<00:00, 63.5MB/s]\n" - ] - } - ], - "source": [ - "from deeppavlov.core.data.utils import simple_download\n", - "\n", - "#download train data file for SNIPS\n", - "simple_download(url=\"http://files.deeppavlov.ai/datasets/snips_intents/train.csv\", \n", - " destination=\"./snips/train.csv\")" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "text,intents\r\n", - "Add another song to the Cita RomГЎntica playlist. ,AddToPlaylist\r\n", - "add clem burke in my playlist Pre-Party R&B Jams,AddToPlaylist\r\n", - "Add Live from Aragon Ballroom to Trapeo,AddToPlaylist\r\n", - "add Unite and Win to my night out,AddToPlaylist\r\n", - "Add track to my Digster Future Hits,AddToPlaylist\r\n", - "add the piano bar to my Cindy Wilson,AddToPlaylist\r\n", - "Add Spanish Harlem Incident to cleaning the house,AddToPlaylist\r\n", - "add The Greyest of Blue Skies in Indie EspaГ±ol my playlist,AddToPlaylist\r\n", - "Add the name kids in the street to the plylist New Indie Mix,AddToPlaylist\r\n", - "add album radar latino,AddToPlaylist\r\n", - "Add Tranquility to the Latin Pop Rising playlist. ,AddToPlaylist\r\n", - "Add d flame to the Dcode2016 playlist.,AddToPlaylist\r\n", - "Add album to my fairy tales,AddToPlaylist\r\n", - "I need another artist in the New Indie Mix playlist. ,AddToPlaylist\r\n" - ] - } - ], - "source": [ - "! 
head -n 15 snips/train.csv" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### DatasetReader\n", - "\n", - "Read data using `BasicClassificationDatasetReader` из DeepPavlov" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "from deeppavlov.dataset_readers.basic_classification_reader import BasicClassificationDatasetReader" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:14:23.376 WARNING in 'deeppavlov.dataset_readers.basic_classification_reader'['basic_classification_reader'] at line 96: Cannot find snips/valid.csv file\n", - "2019-02-12 12:14:23.376 WARNING in 'deeppavlov.dataset_readers.basic_classification_reader'['basic_classification_reader'] at line 96: Cannot find snips/test.csv file\n" - ] - } - ], - "source": [ - "# read data from particular columns of `.csv` file\n", - "dr = BasicClassificationDatasetReader().read(\n", - " data_path='./snips/',\n", - " train='train.csv',\n", - " x = 'text',\n", - " y = 'intents'\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We don't have a ready train/valid/test split." - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[('train', 15884), ('valid', 0), ('test', 0)]" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# check train/valid/test sizes\n", - "[(k, len(dr[k])) for k in dr.keys()]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### DatasetIterator\n", - "\n", - "Use `BasicClassificationDatasetIterator` to split `train` on `train` and `valid` and to generate batches of samples." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "from deeppavlov.dataset_iterators.basic_classification_iterator import BasicClassificationDatasetIterator" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:14:23.557 INFO in 'deeppavlov.dataset_iterators.basic_classification_iterator'['basic_classification_iterator'] at line 73: Splitting field <> to new fields <<['train', 'valid']>>\n" - ] - } - ], - "source": [ - "# initialize data iterator splitting `train` field to `train` and `valid` in proportion 0.8/0.2\n", - "train_iterator = BasicClassificationDatasetIterator(\n", - " data=dr,\n", - " field_to_split='train', # field that will be splitted\n", - " split_fields=['train', 'valid'], # fields to which the fiald above will be splitted\n", - " split_proportions=[0.8, 0.2], #proportions for splitting\n", - " split_seed=23, # seed for splitting dataset\n", - " seed=42) # seed for iteration over dataset" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's look into training samples. 
" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "x: Is it freezing in Offerman, California?\n", - "y: ['GetWeather']\n", - "=================\n", - "x: put this song in the playlist Trap Land\n", - "y: ['AddToPlaylist']\n", - "=================\n", - "x: show me a textbook with a rating of 2 and a maximum rating of 6 that is current\n", - "y: ['RateBook']\n", - "=================\n", - "x: Will the weather be okay in Northern Luzon Heroes Hill National Park 4 and a half months from now?\n", - "y: ['GetWeather']\n", - "=================\n", - "x: Rate the current album a four\n", - "y: ['RateBook']\n", - "=================\n" - ] - } - ], - "source": [ - "# one can get train instances (or any other data type including `all`)\n", - "x_train, y_train = train_iterator.get_instances(data_type='train')\n", - "for x, y in list(zip(x_train, y_train))[:5]:\n", - " print('x:', x)\n", - " print('y:', y)\n", - " print('=================')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Data preprocessing" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We will be using lowercasing and tokenization as data preparation. \n", - "\n", - "DeepPavlov also contains several other preprocessors and tokenizers." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Lowercasing" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "`str_lower` lowercases texts." - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[nltk_data] Downloading package punkt to /home/vimary/nltk_data...\n", - "[nltk_data] Package punkt is already up-to-date!\n", - "[nltk_data] Downloading package stopwords to /home/vimary/nltk_data...\n", - "[nltk_data] Package stopwords is already up-to-date!\n", - "[nltk_data] Downloading package perluniprops to\n", - "[nltk_data] /home/vimary/nltk_data...\n", - "[nltk_data] Package perluniprops is already up-to-date!\n", - "[nltk_data] Downloading package nonbreaking_prefixes to\n", - "[nltk_data] /home/vimary/nltk_data...\n", - "[nltk_data] Package nonbreaking_prefixes is already up-to-date!\n" - ] - } - ], - "source": [ - "from deeppavlov.models.preprocessors.str_lower import str_lower" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['is it freezing in offerman, california?']" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "str_lower(['Is it freezing in Offerman, California?'])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Tokenization\n", - "\n", - "`NLTKTokenizer` can split string to tokens." 
- ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "from deeppavlov.models.tokenizers.nltk_moses_tokenizer import NLTKMosesTokenizer" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[['Is', 'it', 'freezing', 'in', 'Offerman', ',', 'California', '?']]" - ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "tokenizer = NLTKMosesTokenizer()\n", - "tokenizer(['Is it freezing in Offerman, California?'])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's preprocess all `train` part of the dataset." - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "train_x_lower_tokenized = str_lower(tokenizer(train_iterator.get_instances(data_type='train')[0]))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Vocabulary\n", - "\n", - "Now we are ready to use `vocab`. They are very usefull for:\n", - "* extracting class labels and converting labels to indices and vice versa,\n", - "* building of characters or tokens vocabularies." - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "from deeppavlov.core.data.simple_vocab import SimpleVocabulary" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [], - "source": [ - "# initialize simple vocabulary to collect all appeared in the dataset classes\n", - "classes_vocab = SimpleVocabulary(\n", - " save_path='./snips/classes.dict',\n", - " load_path='./snips/classes.dict')" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:14:25.35 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 89: [saving vocabulary to /home/vimary/ipavlov/Pilot/examples/tutorials/snips/classes.dict]\n" - ] - } - ], - "source": [ - "classes_vocab.fit((train_iterator.get_instances(data_type='train')[1]))\n", - "classes_vocab.save()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's see what classes the dataset contains and their indices in the vocabulary." 
- ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[('GetWeather', 0),\n", - " ('PlayMusic', 1),\n", - " ('SearchScreeningEvent', 2),\n", - " ('BookRestaurant', 3),\n", - " ('RateBook', 4),\n", - " ('SearchCreativeWork', 5),\n", - " ('AddToPlaylist', 6)]" - ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "list(classes_vocab.items())" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [], - "source": [ - "# also one can collect vocabulary of textual tokens appeared 2 and more times in the dataset\n", - "token_vocab = SimpleVocabulary(\n", - " save_path='./snips/tokens.dict',\n", - " load_path='./snips/tokens.dict',\n", - " min_freq=2,\n", - " special_tokens=('', '',),\n", - " unk_token='')" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:14:25.157 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 89: [saving vocabulary to /home/vimary/ipavlov/Pilot/examples/tutorials/snips/tokens.dict]\n" - ] - } - ], - "source": [ - "token_vocab.fit(train_x_lower_tokenized)\n", - "token_vocab.save()" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "4564" - ] - }, - "execution_count": 20, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# number of tokens in dictionary\n", - "len(token_vocab)" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[('the', 6953),\n", - " ('a', 3917),\n", - " ('in', 3265),\n", - " ('to', 3203),\n", - " ('for', 2814),\n", - " ('of', 2401),\n", - " ('.', 2400),\n", - " ('i', 2079),\n", - " ('at', 1935),\n", - " ('play', 1703)]" - ] - }, - "execution_count": 21, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# 10 most common words and number of times their appeared\n", - "token_vocab.freqs.most_common()[:10]" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[[13, 36, 244, 4, 1, 29, 996, 20]]" - ] - }, - "execution_count": 22, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "token_ids = token_vocab(str_lower(tokenizer(['Is it freezing in Offerman, California?'])))\n", - "token_ids" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['is it freezing in , california?']" - ] - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "tokenizer(token_vocab(token_ids))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Featurization\n", - "\n", - "This part contains several possible ways of featurization of text samples. One can chose any appropriate vectorizer/embedder according to available resources and given task.\n", - "\n", - "Bag-of-words (BoW) and TF-IDF vectorizers converts text samples to vectors (one vector per sample) while fastText, GloVe, fastText weighted by TF-IDF embedders either produce an embedding vector per token or an embedding vector per text sample (if `mean` set to True)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Bag-of-words\n", - "\n", - "Matches a vector to each text sample indicating which words appeared in the given sample: text -> binary vector $v$: \\[0, 1, 0, 0, 0, 1, ..., ...1, 0, 1\\]. \n", - "\n", - "Dimensionality of vector $v$ is equal to vocabulary size.\n", - "\n", - "$v_i$ == 1, if word $i$ is in the text,\n", - "\n", - "$v_i$ == 0, else." - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "from deeppavlov.models.embedders.bow_embedder import BoWEmbedder" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[array([0, 1, 0, ..., 0, 0, 0], dtype=int32)]" - ] - }, - "execution_count": 25, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# initialize bag-of-words embedder giving total number of tokens\n", - "bow = BoWEmbedder(depth=token_vocab.len)\n", - "# it assumes indexed tokenized samples\n", - "bow(token_vocab(str_lower(tokenizer(['Is it freezing in Offerman, California?']))))" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "8" - ] - }, - "execution_count": 26, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# all 8 tokens are in the vocabulary\n", - "sum(bow(token_vocab(str_lower(tokenizer(['Is it freezing in Offerman, California?']))))[0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### TF-IDF Vectorizer\n", - "\n", - "Matches a vector to each text sample: text -> vector $v$ from $R^N$ where $N$ is a vocabulary size.\n", - "\n", - "$TF-IDF(token, document) = TF(token, document) * IDF(token, document)$\n", - "\n", - "$TF$ is a term frequency:\n", - "\n", - "$TF(token, document) = \\frac{n_{token}}{\\sum_{k}n_k}.$\n", - "\n", - "$IDF$ is a inverse document frequency:\n", - "\n", - "$IDF(token, all\\_documents) = \\frac{Total\\ number\\ of\\ documents}{number\\ of\\ documents\\ where\\ token\\ appeared}.$" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "`SklearnComponent` in DeepPavlov is a universal wrapper for any vecotirzer/estimator from `sklearn` package. The only requirement to specify component usage is following: model class and name of infer method should be passed as parameters." 
- ] - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [], - "source": [ - "from deeppavlov.models.sklearn import SklearnComponent" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:14:25.268 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 218: Cannot load model from /home/vimary/ipavlov/Pilot/examples/tutorials/tfidf_v0.pkl\n", - "2019-02-12 12:14:25.269 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 165: Initializing model sklearn.feature_extraction.text:TfidfVectorizer from scratch\n" - ] - } - ], - "source": [ - "# initialize TF-IDF vectorizer sklearn component with `transform` as infer method\n", - "tfidf = SklearnComponent(\n", - " model_class=\"sklearn.feature_extraction.text:TfidfVectorizer\",\n", - " infer_method=\"transform\",\n", - " save_path='./tfidf_v0.pkl',\n", - " load_path='./tfidf_v0.pkl',\n", - " mode='train')" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:14:25.296 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 108: Fitting model sklearn.feature_extraction.text:TfidfVectorizer\n", - "2019-02-12 12:14:25.395 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 240: Saving model to /home/vimary/ipavlov/Pilot/examples/tutorials/tfidf_v0.pkl\n" - ] - } - ], - "source": [ - "# fit on textual train instances and save it\n", - "tfidf.fit(str_lower(train_iterator.get_instances(data_type='train')[0]))\n", - "tfidf.save()" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "<1x10709 sparse matrix of type ''\n", - "\twith 6 stored elements in Compressed Sparse Row format>" - ] - }, - "execution_count": 30, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "tfidf(str_lower(['Is it freezing in Offerman, California?']))" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "10709" - ] - }, - "execution_count": 31, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# number of tokens in the TF-IDF vocabulary\n", - "len(tfidf.model.vocabulary_)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### GloVe embedder\n", - "\n", - "[GloVe](https://nlp.stanford.edu/projects/glove/) is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space." 
- ] - }, - { - "cell_type": "code", - "execution_count": 32, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Using TensorFlow backend.\n" - ] - } - ], - "source": [ - "from deeppavlov.models.embedders.glove_embedder import GloVeEmbedder" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's download GloVe embedding file" - ] - }, - { - "cell_type": "code", - "execution_count": 33, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:14:26.153 INFO in 'deeppavlov.core.data.utils'['utils'] at line 63: Downloading from http://files.deeppavlov.ai/embeddings/glove.6B.100d.txt to glove.6B.100d.txt\n", - "347MB [00:06, 50.0MB/s] \n" - ] - } - ], - "source": [ - "simple_download(url=\"http://files.deeppavlov.ai/embeddings/glove.6B.100d.txt\", \n", - " destination=\"./glove.6B.100d.txt\")" - ] - }, - { - "cell_type": "code", - "execution_count": 34, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:14:33.99 INFO in 'deeppavlov.models.embedders.glove_embedder'['glove_embedder'] at line 52: [loading GloVe embeddings from `/home/vimary/ipavlov/Pilot/examples/tutorials/glove.6B.100d.txt`]\n" - ] - } - ], - "source": [ - "embedder = GloVeEmbedder(load_path='./glove.6B.100d.txt',\n", - " dim=100, pad_zero=True)" - ] - }, - { - "cell_type": "code", - "execution_count": 35, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(1, 8, (100,))" - ] - }, - "execution_count": 35, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# output shape is (batch_size x max_num_tokens_in_the_batch x embedding_dim)\n", - "embedded_batch = embedder(str_lower(tokenizer(['Is it freezing in Offerman, California?']))) \n", - "len(embedded_batch), len(embedded_batch[0]), embedded_batch[0][0].shape" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Mean GloVe embedder" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Embedder returns a vector per token while we want to get a vector per text sample. Therefore, let's calculate mean vector of embeddings of tokens. \n", - "For that we can either init `GloVeEmbedder` with `mean=True` parameter (`mean=false` by default), or pass `mean=true` while calling function (this way `mean` value is assigned only for this call)." - ] - }, - { - "cell_type": "code", - "execution_count": 36, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(1, (100,))" - ] - }, - "execution_count": 36, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# output shape is (batch_size x embedding_dim)\n", - "embedded_batch = embedder(str_lower(tokenizer(['Is it freezing in Offerman, California?'])), mean=True) \n", - "len(embedded_batch), embedded_batch[0].shape" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### GloVe weighted by TF-IDF embedder\n", - "\n", - "One of the possible ways to combine TF-IDF vectorizer and any token embedder is to weigh token embeddings by TF-IDF coefficients (therefore, `mean` set to True is obligatory to obtain embeddings of interest while it still **by default** returns embeddings of tokens." 
- ] - }, - { - "cell_type": "code", - "execution_count": 37, - "metadata": {}, - "outputs": [], - "source": [ - "from deeppavlov.models.embedders.tfidf_weighted_embedder import TfidfWeightedEmbedder" - ] - }, - { - "cell_type": "code", - "execution_count": 38, - "metadata": {}, - "outputs": [], - "source": [ - "weighted_embedder = TfidfWeightedEmbedder(\n", - " embedder=embedder, # our GloVe embedder instance\n", - " tokenizer=tokenizer, # our tokenizer instance\n", - " mean=True, # to return one vector per sample\n", - " vectorizer=tfidf # our TF-IDF vectorizer\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 39, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(1, (100,))" - ] - }, - "execution_count": 39, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# output shape is (batch_size x embedding_dim)\n", - "embedded_batch = weighted_embedder(str_lower(tokenizer(['Is it freezing in Offerman, California?']))) \n", - "len(embedded_batch), embedded_batch[0].shape" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Models" - ] - }, - { - "cell_type": "code", - "execution_count": 40, - "metadata": {}, - "outputs": [], - "source": [ - "from deeppavlov.metrics.accuracy import sets_accuracy" - ] - }, - { - "cell_type": "code", - "execution_count": 41, - "metadata": {}, - "outputs": [], - "source": [ - "# get all train and valid data from iterator\n", - "x_train, y_train = train_iterator.get_instances(data_type=\"train\")\n", - "x_valid, y_valid = train_iterator.get_instances(data_type=\"valid\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Models in python" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### SklearnComponent classifier on Tfidf-features in python" - ] - }, - { - "cell_type": "code", - "execution_count": 42, - "metadata": { - "scrolled": true - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:14:53.75 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 218: Cannot load model from /home/vimary/ipavlov/Pilot/examples/tutorials/logreg_v0.pkl\n", - "2019-02-12 12:14:53.75 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 165: Initializing model sklearn.linear_model:LogisticRegression from scratch\n" - ] - } - ], - "source": [ - "# initialize sklearn classifier, all parameters for classifier could be passed\n", - "cls = SklearnComponent(\n", - " model_class=\"sklearn.linear_model:LogisticRegression\",\n", - " infer_method=\"predict\",\n", - " save_path='./logreg_v0.pkl',\n", - " load_path='./logreg_v0.pkl',\n", - " C=1,\n", - " mode='train')" - ] - }, - { - "cell_type": "code", - "execution_count": 43, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:14:53.591 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 108: Fitting model sklearn.linear_model:LogisticRegression\n", - "2019-02-12 12:14:53.756 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 240: Saving model to /home/vimary/ipavlov/Pilot/examples/tutorials/logreg_v0.pkl\n" - ] - } - ], - "source": [ - "# fit sklearn classifier and save it\n", - "cls.fit(tfidf(x_train), y_train)\n", - "cls.save()" - ] - }, - { - "cell_type": "code", - "execution_count": 44, - "metadata": {}, - "outputs": [], - "source": [ - "y_valid_pred = 
cls(tfidf(x_valid))" - ] - }, - { - "cell_type": "code", - "execution_count": 45, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Text sample: I need seating at Floating restaurant in Tennessee for a group of 9\n", - "True label: ['BookRestaurant']\n", - "Predicted label: BookRestaurant\n" - ] - } - ], - "source": [ - "# Let's look into obtained result\n", - "print(\"Text sample: {}\".format(x_valid[0]))\n", - "print(\"True label: {}\".format(y_valid[0]))\n", - "print(\"Predicted label: {}\".format(y_valid_pred[0]))" - ] - }, - { - "cell_type": "code", - "execution_count": 46, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0.982373308152345" - ] - }, - "execution_count": 46, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# let's calculate sets accuracy (because each element is a list of labels)\n", - "sets_accuracy(np.squeeze(y_valid), y_valid_pred)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### KerasClassificationModel on GloVe embeddings in python" - ] - }, - { - "cell_type": "code", - "execution_count": 47, - "metadata": {}, - "outputs": [], - "source": [ - "from deeppavlov.models.classifiers.keras_classification_model import KerasClassificationModel\n", - "from deeppavlov.models.preprocessors.one_hotter import OneHotter\n", - "from deeppavlov.models.classifiers.proba2labels import Proba2Labels" - ] - }, - { - "cell_type": "code", - "execution_count": 48, - "metadata": { - "scrolled": true - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:14:54.421 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 272: [initializing `KerasClassificationModel` from scratch as cnn_model]\n", - "2019-02-12 12:14:54.818 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 136: Model was successfully initialized!\n", - "Model summary:\n", - "__________________________________________________________________________________________________\n", - "Layer (type) Output Shape Param # Connected to \n", - "==================================================================================================\n", - "input_1 (InputLayer) (None, 15, 100) 0 \n", - "__________________________________________________________________________________________________\n", - "conv1d_1 (Conv1D) (None, 15, 128) 38528 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "conv1d_2 (Conv1D) (None, 15, 128) 64128 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "conv1d_3 (Conv1D) (None, 15, 128) 89728 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_1 (BatchNor (None, 15, 128) 512 conv1d_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_2 (BatchNor (None, 15, 128) 512 conv1d_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_3 (BatchNor (None, 15, 128) 512 conv1d_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_1 (Activation) (None, 15, 
128) 0 batch_normalization_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_2 (Activation) (None, 15, 128) 0 batch_normalization_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_3 (Activation) (None, 15, 128) 0 batch_normalization_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_1 (GlobalM (None, 128) 0 activation_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_2 (GlobalM (None, 128) 0 activation_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_3 (GlobalM (None, 128) 0 activation_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "concatenate_1 (Concatenate) (None, 384) 0 global_max_pooling1d_1[0][0] \n", - " global_max_pooling1d_2[0][0] \n", - " global_max_pooling1d_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "dropout_1 (Dropout) (None, 384) 0 concatenate_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "dense_1 (Dense) (None, 100) 38500 dropout_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_4 (BatchNor (None, 100) 400 dense_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_4 (Activation) (None, 100) 0 batch_normalization_4[0][0] \n", - "__________________________________________________________________________________________________\n", - "dropout_2 (Dropout) (None, 100) 0 activation_4[0][0] \n", - "__________________________________________________________________________________________________\n", - "dense_2 (Dense) (None, 7) 707 dropout_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_5 (BatchNor (None, 7) 28 dense_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_5 (Activation) (None, 7) 0 batch_normalization_5[0][0] \n", - "==================================================================================================\n", - "Total params: 233,555\n", - "Trainable params: 232,573\n", - "Non-trainable params: 982\n", - "__________________________________________________________________________________________________\n" - ] - } - ], - "source": [ - "# Intialize `KerasClassificationModel` that composes CNN shallow-and-wide network \n", - "# (name here as`cnn_model`)\n", - "cls = KerasClassificationModel(save_path=\"./cnn_model_v0\", \n", - " load_path=\"./cnn_model_v0\", \n", - " embedding_size=embedder.dim,\n", - " n_classes=classes_vocab.len,\n", - " model_name=\"cnn_model\",\n", - " text_size=15, # number of tokens\n", - " kernel_sizes_cnn=[3, 5, 7],\n", - " filters_cnn=128,\n", - " dense_size=100,\n", - " optimizer=\"Adam\",\n", - " learning_rate=0.1,\n", - " learning_rate_decay=0.01,\n", - " loss=\"categorical_crossentropy\")" - ] - }, - { - "cell_type": "code", - 
"execution_count": 49, - "metadata": {}, - "outputs": [], - "source": [ - "# `KerasClassificationModel` assumes one-hotted distribution of classes per sample.\n", - "# `OneHotter` converts indices to one-hot vectors representation.\n", - "# To obtain indices we can use our `classes_vocab` intialized and fitted above\n", - "onehotter = OneHotter(depth=classes_vocab.len, single_vector=True)" - ] - }, - { - "cell_type": "code", - "execution_count": 50, - "metadata": {}, - "outputs": [], - "source": [ - "# Train for 10 epochs\n", - "for ep in range(10):\n", - " for x, y in train_iterator.gen_batches(batch_size=64, \n", - " data_type=\"train\"):\n", - " x_embed = embedder(tokenizer(str_lower(x)))\n", - " y_onehot = onehotter(classes_vocab(y))\n", - " cls.train_on_batch(x_embed, y_onehot)" - ] - }, - { - "cell_type": "code", - "execution_count": 51, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:15:22.184 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 386: [saving model to /home/vimary/ipavlov/Pilot/examples/tutorials/cnn_model_v0_opt.json]\n" - ] - } - ], - "source": [ - "# Save model weights and parameters\n", - "cls.save()" - ] - }, - { - "cell_type": "code", - "execution_count": 52, - "metadata": {}, - "outputs": [], - "source": [ - "# Infering on validation data we get probability distribution on given data.\n", - "y_valid_pred = cls(embedder(tokenizer(str_lower(x_valid))))" - ] - }, - { - "cell_type": "code", - "execution_count": 53, - "metadata": {}, - "outputs": [], - "source": [ - "# To convert probability distribution to labels, \n", - "# we first need to convert probabilities to indices,\n", - "# and then using vocabulary `classes_vocab` convert indices to labels.\n", - "# \n", - "# `Proba2Labels` converts probabilities to indices and supports three different modes:\n", - "# if `max_proba` is true, returns indices of the highest probabilities\n", - "# if `confidence_threshold` is given, returns indices with probabiltiies higher than threshold\n", - "# if `top_n` is given, returns `top_n` indices with highest probabilities\n", - "prob2labels = Proba2Labels(max_proba=True)" - ] - }, - { - "cell_type": "code", - "execution_count": 54, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Text sample: I need seating at Floating restaurant in Tennessee for a group of 9\n", - "True label: ['BookRestaurant']\n", - "Predicted probability distribution: {'GetWeather': 4.443174475454725e-05, 'PlayMusic': 0.0002085473679471761, 'SearchScreeningEvent': 6.492184911621734e-05, 'BookRestaurant': 0.9995043277740479, 'RateBook': 0.00021818796813022345, 'SearchCreativeWork': 0.0013526129769161344, 'AddToPlaylist': 8.029041782720014e-05}\n", - "Predicted label: ['BookRestaurant']\n" - ] - } - ], - "source": [ - "# Let's look into obtained result\n", - "print(\"Text sample: {}\".format(x_valid[0]))\n", - "print(\"True label: {}\".format(y_valid[0]))\n", - "print(\"Predicted probability distribution: {}\".format(dict(zip(classes_vocab.keys(), \n", - " y_valid_pred[0]))))\n", - "print(\"Predicted label: {}\".format(classes_vocab(prob2labels(y_valid_pred))[0]))" - ] - }, - { - "cell_type": "code", - "execution_count": 55, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0.982373308152345" - ] - }, - "execution_count": 55, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# calculate sets 
accuracy\n", - "sets_accuracy(y_valid, classes_vocab(prob2labels(y_valid_pred)))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### SklearnComponent classifier on GloVe weighted by TF-IDF embeddings in python" - ] - }, - { - "cell_type": "code", - "execution_count": 56, - "metadata": { - "scrolled": true - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:15:22.961 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 218: Cannot load model from /home/vimary/ipavlov/Pilot/examples/tutorials/logreg_v1.pkl\n", - "2019-02-12 12:15:22.962 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 165: Initializing model sklearn.linear_model:LogisticRegression from scratch\n" - ] - } - ], - "source": [ - "# initialize sklearn classifier, all parameters for classifier could be passed\n", - "cls = SklearnComponent(\n", - " model_class=\"sklearn.linear_model:LogisticRegression\",\n", - " infer_method=\"predict\",\n", - " save_path='./logreg_v1.pkl',\n", - " load_path='./logreg_v1.pkl',\n", - " C=1,\n", - " mode='train')" - ] - }, - { - "cell_type": "code", - "execution_count": 57, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:15:44.521 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 108: Fitting model sklearn.linear_model:LogisticRegression\n", - "2019-02-12 12:15:46.59 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 240: Saving model to /home/vimary/ipavlov/Pilot/examples/tutorials/logreg_v1.pkl\n" - ] - } - ], - "source": [ - "# fit sklearn classifier and save it\n", - "cls.fit(weighted_embedder(str_lower(tokenizer(x_train))), y_train)\n", - "cls.save()" - ] - }, - { - "cell_type": "code", - "execution_count": 58, - "metadata": {}, - "outputs": [], - "source": [ - "y_valid_pred = cls(weighted_embedder(str_lower(tokenizer(x_valid))))" - ] - }, - { - "cell_type": "code", - "execution_count": 59, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Text sample: I need seating at Floating restaurant in Tennessee for a group of 9\n", - "True label: ['BookRestaurant']\n", - "Predicted label: BookRestaurant\n" - ] - } - ], - "source": [ - "# Let's look into obtained result\n", - "print(\"Text sample: {}\".format(x_valid[0]))\n", - "print(\"True label: {}\".format(y_valid[0]))\n", - "print(\"Predicted label: {}\".format(y_valid_pred[0]))" - ] - }, - { - "cell_type": "code", - "execution_count": 60, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0.9184765502045955" - ] - }, - "execution_count": 60, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# let's calculate sets accuracy (because each element is a list of labels)\n", - "sets_accuracy(np.squeeze(y_valid), y_valid_pred)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Let's free our memory from embeddings and models" - ] - }, - { - "cell_type": "code", - "execution_count": 61, - "metadata": {}, - "outputs": [], - "source": [ - "embedder.reset()\n", - "cls.reset()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Models from configs" - ] - }, - { - "cell_type": "code", - "execution_count": 62, - "metadata": {}, - "outputs": [], - "source": [ - "from deeppavlov import build_model\n", - "from deeppavlov import train_model" - ] - 
}, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### SklearnComponent classifier on Tfidf-features from config" - ] - }, - { - "cell_type": "code", - "execution_count": 63, - "metadata": {}, - "outputs": [], - "source": [ - "logreg_config = {\n", - " \"dataset_reader\": {\n", - " \"class_name\": \"basic_classification_reader\",\n", - " \"x\": \"text\",\n", - " \"y\": \"intents\",\n", - " \"data_path\": \"./snips\"\n", - " },\n", - " \"dataset_iterator\": {\n", - " \"class_name\": \"basic_classification_iterator\",\n", - " \"seed\": 42,\n", - " \"split_seed\": 23,\n", - " \"field_to_split\": \"train\",\n", - " \"split_fields\": [\n", - " \"train\",\n", - " \"valid\"\n", - " ],\n", - " \"split_proportions\": [\n", - " 0.9,\n", - " 0.1\n", - " ]\n", - " },\n", - " \"chainer\": {\n", - " \"in\": [\n", - " \"x\"\n", - " ],\n", - " \"in_y\": [\n", - " \"y\"\n", - " ],\n", - " \"pipe\": [\n", - " {\n", - " \"id\": \"classes_vocab\",\n", - " \"class_name\": \"simple_vocab\",\n", - " \"fit_on\": [\n", - " \"y\"\n", - " ],\n", - " \"save_path\": \"./snips/classes.dict\",\n", - " \"load_path\": \"./snips/classes.dict\",\n", - " \"in\": \"y\",\n", - " \"out\": \"y_ids\"\n", - " },\n", - " {\n", - " \"in\": [\n", - " \"x\"\n", - " ],\n", - " \"out\": [\n", - " \"x_vec\"\n", - " ],\n", - " \"fit_on\": [\n", - " \"x\",\n", - " \"y_ids\"\n", - " ],\n", - " \"id\": \"tfidf_vec\",\n", - " \"class_name\": \"sklearn_component\",\n", - " \"save_path\": \"tfidf_v1.pkl\",\n", - " \"load_path\": \"tfidf_v1.pkl\",\n", - " \"model_class\": \"sklearn.feature_extraction.text:TfidfVectorizer\",\n", - " \"infer_method\": \"transform\"\n", - " },\n", - " {\n", - " \"in\": \"x\",\n", - " \"out\": \"x_tok\",\n", - " \"id\": \"my_tokenizer\",\n", - " \"class_name\": \"nltk_moses_tokenizer\",\n", - " \"tokenizer\": \"wordpunct_tokenize\"\n", - " },\n", - " {\n", - " \"in\": [\n", - " \"x_vec\"\n", - " ],\n", - " \"out\": [\n", - " \"y_pred\"\n", - " ],\n", - " \"fit_on\": [\n", - " \"x_vec\",\n", - " \"y\"\n", - " ],\n", - " \"class_name\": \"sklearn_component\",\n", - " \"main\": True,\n", - " \"save_path\": \"logreg_v2.pkl\",\n", - " \"load_path\": \"logreg_v2.pkl\",\n", - " \"model_class\": \"sklearn.linear_model:LogisticRegression\",\n", - " \"infer_method\": \"predict\",\n", - " \"ensure_list_output\": True\n", - " }\n", - " ],\n", - " \"out\": [\n", - " \"y_pred\"\n", - " ]\n", - " },\n", - " \"train\": {\n", - " \"batch_size\": 64,\n", - " \"metrics\": [\n", - " \"accuracy\"\n", - " ],\n", - " \"validate_best\": True,\n", - " \"test_best\": False\n", - " }\n", - "}\n" - ] - }, - { - "cell_type": "code", - "execution_count": 64, - "metadata": { - "scrolled": true - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:15:52.310 WARNING in 'deeppavlov.dataset_readers.basic_classification_reader'['basic_classification_reader'] at line 96: Cannot find /home/vimary/ipavlov/Pilot/examples/tutorials/snips/valid.csv file\n", - "2019-02-12 12:15:52.310 WARNING in 'deeppavlov.dataset_readers.basic_classification_reader'['basic_classification_reader'] at line 96: Cannot find /home/vimary/ipavlov/Pilot/examples/tutorials/snips/test.csv file\n", - "2019-02-12 12:15:52.311 INFO in 'deeppavlov.dataset_iterators.basic_classification_iterator'['basic_classification_iterator'] at line 73: Splitting field <> to new fields <<['train', 'valid']>>\n", - "2019-02-12 12:15:52.314 WARNING in 'deeppavlov.core.commands.train'['train'] at line 108: \"validate_best\" and 
\"test_best\" parameters are deprecated. Please, use \"evaluation_targets\" list instead\n", - "2019-02-12 12:15:52.322 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 103: [loading vocabulary from /home/vimary/ipavlov/Pilot/examples/tutorials/snips/classes.dict]\n", - "2019-02-12 12:15:52.339 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 89: [saving vocabulary to /home/vimary/ipavlov/Pilot/examples/tutorials/snips/classes.dict]\n", - "2019-02-12 12:15:52.340 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 218: Cannot load model from /home/vimary/ipavlov/Pilot/examples/tutorials/tfidf_v1.pkl\n", - "2019-02-12 12:15:52.341 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 165: Initializing model sklearn.feature_extraction.text:TfidfVectorizer from scratch\n", - "2019-02-12 12:15:52.389 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 108: Fitting model sklearn.feature_extraction.text:TfidfVectorizer\n", - "2019-02-12 12:15:52.493 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 240: Saving model to /home/vimary/ipavlov/Pilot/examples/tutorials/tfidf_v1.pkl\n", - "2019-02-12 12:15:52.510 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 218: Cannot load model from /home/vimary/ipavlov/Pilot/examples/tutorials/logreg_v2.pkl\n", - "2019-02-12 12:15:52.510 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 165: Initializing model sklearn.linear_model:LogisticRegression from scratch\n", - "2019-02-12 12:15:53.67 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 108: Fitting model sklearn.linear_model:LogisticRegression\n", - "2019-02-12 12:15:53.254 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 240: Saving model to /home/vimary/ipavlov/Pilot/examples/tutorials/logreg_v2.pkl\n", - "2019-02-12 12:15:53.255 WARNING in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 295: Using NNTrainer for a pipeline without batched training\n", - "2019-02-12 12:15:53.256 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 240: Saving model to /home/vimary/ipavlov/Pilot/examples/tutorials/logreg_v2.pkl\n", - "2019-02-12 12:15:53.257 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 103: [loading vocabulary from /home/vimary/ipavlov/Pilot/examples/tutorials/snips/classes.dict]\n", - "2019-02-12 12:15:53.258 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 202: Loading model sklearn.feature_extraction.text:TfidfVectorizer from /home/vimary/ipavlov/Pilot/examples/tutorials/tfidf_v1.pkl\n", - "2019-02-12 12:15:53.263 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 209: Model sklearn.feature_extraction.textTfidfVectorizer loaded with parameters\n", - "2019-02-12 12:15:53.264 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 215: Fitting of loaded model can not be continued. 
Model can be fitted from scratch.If one needs to continue fitting, please, look at `warm_start` parameter\n", - "2019-02-12 12:15:53.266 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 202: Loading model sklearn.linear_model:LogisticRegression from /home/vimary/ipavlov/Pilot/examples/tutorials/logreg_v2.pkl\n", - "2019-02-12 12:15:53.266 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 209: Model sklearn.linear_model.logisticLogisticRegression loaded with parameters\n", - "2019-02-12 12:15:53.267 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 215: Fitting of loaded model can not be continued. Model can be fitted from scratch.If one needs to continue fitting, please, look at `warm_start` parameter\n", - "2019-02-12 12:15:53.346 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 103: [loading vocabulary from /home/vimary/ipavlov/Pilot/examples/tutorials/snips/classes.dict]\n", - "2019-02-12 12:15:53.347 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 202: Loading model sklearn.feature_extraction.text:TfidfVectorizer from /home/vimary/ipavlov/Pilot/examples/tutorials/tfidf_v1.pkl\n", - "2019-02-12 12:15:53.352 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 209: Model sklearn.feature_extraction.textTfidfVectorizer loaded with parameters\n", - "2019-02-12 12:15:53.352 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 215: Fitting of loaded model can not be continued. Model can be fitted from scratch.If one needs to continue fitting, please, look at `warm_start` parameter\n", - "2019-02-12 12:15:53.354 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 202: Loading model sklearn.linear_model:LogisticRegression from /home/vimary/ipavlov/Pilot/examples/tutorials/logreg_v2.pkl\n", - "2019-02-12 12:15:53.354 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 209: Model sklearn.linear_model.logisticLogisticRegression loaded with parameters\n", - "2019-02-12 12:15:53.355 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 215: Fitting of loaded model can not be continued. 
Model can be fitted from scratch.If one needs to continue fitting, please, look at `warm_start` parameter\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\"valid\": {\"eval_examples_count\": 1589, \"metrics\": {\"accuracy\": 0.983}, \"time_spent\": \"0:00:01\"}}\n" - ] - } - ], - "source": [ - "# we can train and evaluate model from config\n", - "m = train_model(logreg_config)" - ] - }, - { - "cell_type": "code", - "execution_count": 65, - "metadata": { - "scrolled": true - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:15:53.359 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 103: [loading vocabulary from /home/vimary/ipavlov/Pilot/examples/tutorials/snips/classes.dict]\n", - "2019-02-12 12:15:53.360 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 202: Loading model sklearn.feature_extraction.text:TfidfVectorizer from /home/vimary/ipavlov/Pilot/examples/tutorials/tfidf_v1.pkl\n", - "2019-02-12 12:15:53.366 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 209: Model sklearn.feature_extraction.textTfidfVectorizer loaded with parameters\n", - "2019-02-12 12:15:53.367 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 215: Fitting of loaded model can not be continued. Model can be fitted from scratch.If one needs to continue fitting, please, look at `warm_start` parameter\n", - "2019-02-12 12:15:53.368 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 202: Loading model sklearn.linear_model:LogisticRegression from /home/vimary/ipavlov/Pilot/examples/tutorials/logreg_v2.pkl\n", - "2019-02-12 12:15:53.369 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 209: Model sklearn.linear_model.logisticLogisticRegression loaded with parameters\n", - "2019-02-12 12:15:53.369 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 215: Fitting of loaded model can not be continued. 
Model can be fitted from scratch.If one needs to continue fitting, please, look at `warm_start` parameter\n" - ] - } - ], - "source": [ - "# or we can just load pre-trained model (conicides with what we did above)\n", - "m = build_model(logreg_config)" - ] - }, - { - "cell_type": "code", - "execution_count": 66, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[['GetWeather']]" - ] - }, - "execution_count": 66, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "m([\"Is it freezing in Offerman, California?\"])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### KerasClassificationModel on GloVe embeddings from config" - ] - }, - { - "cell_type": "code", - "execution_count": 67, - "metadata": {}, - "outputs": [], - "source": [ - "cnn_config = {\n", - " \"dataset_reader\": {\n", - " \"class_name\": \"basic_classification_reader\",\n", - " \"x\": \"text\",\n", - " \"y\": \"intents\",\n", - " \"data_path\": \"snips\"\n", - " },\n", - " \"dataset_iterator\": {\n", - " \"class_name\": \"basic_classification_iterator\",\n", - " \"seed\": 42,\n", - " \"split_seed\": 23,\n", - " \"field_to_split\": \"train\",\n", - " \"split_fields\": [\n", - " \"train\",\n", - " \"valid\"\n", - " ],\n", - " \"split_proportions\": [\n", - " 0.9,\n", - " 0.1\n", - " ]\n", - " },\n", - " \"chainer\": {\n", - " \"in\": [\n", - " \"x\"\n", - " ],\n", - " \"in_y\": [\n", - " \"y\"\n", - " ],\n", - " \"pipe\": [\n", - " {\n", - " \"id\": \"classes_vocab\",\n", - " \"class_name\": \"simple_vocab\",\n", - " \"fit_on\": [\n", - " \"y\"\n", - " ],\n", - " \"level\": \"token\",\n", - " \"save_path\": \"./snips/classes.dict\",\n", - " \"load_path\": \"./snips/classes.dict\",\n", - " \"in\": \"y\",\n", - " \"out\": \"y_ids\"\n", - " },\n", - " {\n", - " \"in\": \"x\",\n", - " \"out\": \"x_tok\",\n", - " \"id\": \"my_tokenizer\",\n", - " \"class_name\": \"nltk_tokenizer\",\n", - " \"tokenizer\": \"wordpunct_tokenize\"\n", - " },\n", - " {\n", - " \"in\": \"x_tok\",\n", - " \"out\": \"x_emb\",\n", - " \"id\": \"my_embedder\",\n", - " \"class_name\": \"glove\",\n", - " \"load_path\": \"./glove.6B.100d.txt\",\n", - " \"dim\": 100,\n", - " \"pad_zero\": True\n", - " },\n", - " {\n", - " \"in\": \"y_ids\",\n", - " \"out\": \"y_onehot\",\n", - " \"class_name\": \"one_hotter\",\n", - " \"depth\": \"#classes_vocab.len\",\n", - " \"single_vector\": True\n", - " },\n", - " {\n", - " \"in\": [\n", - " \"x_emb\"\n", - " ],\n", - " \"in_y\": [\n", - " \"y_onehot\"\n", - " ],\n", - " \"out\": [\n", - " \"y_pred_probas\"\n", - " ],\n", - " \"main\": True,\n", - " \"class_name\": \"keras_classification_model\",\n", - " \"save_path\": \"./cnn_model_v1\",\n", - " \"load_path\": \"./cnn_model_v1\",\n", - " \"embedding_size\": \"#my_embedder.dim\",\n", - " \"n_classes\": \"#classes_vocab.len\",\n", - " \"kernel_sizes_cnn\": [\n", - " 1,\n", - " 2,\n", - " 3\n", - " ],\n", - " \"filters_cnn\": 256,\n", - " \"optimizer\": \"Adam\",\n", - " \"learning_rate\": 0.01,\n", - " \"learning_rate_decay\": 0.1,\n", - " \"loss\": \"categorical_crossentropy\",\n", - " \"coef_reg_cnn\": 1e-4,\n", - " \"coef_reg_den\": 1e-4,\n", - " \"dropout_rate\": 0.5,\n", - " \"dense_size\": 100,\n", - " \"model_name\": \"cnn_model\"\n", - " },\n", - " {\n", - " \"in\": \"y_pred_probas\",\n", - " \"out\": \"y_pred_ids\",\n", - " \"class_name\": \"proba2labels\",\n", - " \"max_proba\": True\n", - " },\n", - " {\n", - " \"in\": \"y_pred_ids\",\n", - " \"out\": \"y_pred_labels\",\n", - " \"ref\": 
\"classes_vocab\"\n", - " }\n", - " ],\n", - " \"out\": [\n", - " \"y_pred_labels\"\n", - " ]\n", - " },\n", - " \"train\": {\n", - " \"epochs\": 10,\n", - " \"batch_size\": 64,\n", - " \"metrics\": [\n", - " \"sets_accuracy\",\n", - " \"f1_macro\",\n", - " {\n", - " \"name\": \"roc_auc\",\n", - " \"inputs\": [\"y_onehot\", \"y_pred_probas\"]\n", - " }\n", - " ],\n", - " \"validation_patience\": 5,\n", - " \"val_every_n_epochs\": 1,\n", - " \"log_every_n_epochs\": 1,\n", - " \"show_examples\": True,\n", - " \"validate_best\": True,\n", - " \"test_best\": False\n", - " }\n", - "}\n" - ] - }, - { - "cell_type": "code", - "execution_count": 68, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:15:54.311 WARNING in 'deeppavlov.dataset_readers.basic_classification_reader'['basic_classification_reader'] at line 96: Cannot find /home/vimary/ipavlov/Pilot/examples/tutorials/snips/valid.csv file\n", - "2019-02-12 12:15:54.312 WARNING in 'deeppavlov.dataset_readers.basic_classification_reader'['basic_classification_reader'] at line 96: Cannot find /home/vimary/ipavlov/Pilot/examples/tutorials/snips/test.csv file\n", - "2019-02-12 12:15:54.313 INFO in 'deeppavlov.dataset_iterators.basic_classification_iterator'['basic_classification_iterator'] at line 73: Splitting field <> to new fields <<['train', 'valid']>>\n", - "2019-02-12 12:15:54.316 WARNING in 'deeppavlov.core.commands.train'['train'] at line 108: \"validate_best\" and \"test_best\" parameters are deprecated. Please, use \"evaluation_targets\" list instead\n", - "2019-02-12 12:15:54.319 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 103: [loading vocabulary from /home/vimary/ipavlov/Pilot/examples/tutorials/snips/classes.dict]\n", - "2019-02-12 12:15:54.335 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 89: [saving vocabulary to /home/vimary/ipavlov/Pilot/examples/tutorials/snips/classes.dict]\n", - "2019-02-12 12:15:54.337 INFO in 'deeppavlov.models.embedders.glove_embedder'['glove_embedder'] at line 52: [loading GloVe embeddings from `/home/vimary/ipavlov/Pilot/examples/tutorials/glove.6B.100d.txt`]\n", - "2019-02-12 12:16:14.207 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 272: [initializing `KerasClassificationModel` from scratch as cnn_model]\n", - "2019-02-12 12:16:14.548 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 136: Model was successfully initialized!\n", - "Model summary:\n", - "__________________________________________________________________________________________________\n", - "Layer (type) Output Shape Param # Connected to \n", - "==================================================================================================\n", - "input_1 (InputLayer) (None, None, 100) 0 \n", - "__________________________________________________________________________________________________\n", - "conv1d_1 (Conv1D) (None, None, 256) 25856 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "conv1d_2 (Conv1D) (None, None, 256) 51456 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "conv1d_3 (Conv1D) (None, None, 256) 77056 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - 
"batch_normalization_1 (BatchNor (None, None, 256) 1024 conv1d_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_2 (BatchNor (None, None, 256) 1024 conv1d_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_3 (BatchNor (None, None, 256) 1024 conv1d_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_1 (Activation) (None, None, 256) 0 batch_normalization_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_2 (Activation) (None, None, 256) 0 batch_normalization_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_3 (Activation) (None, None, 256) 0 batch_normalization_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_1 (GlobalM (None, 256) 0 activation_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_2 (GlobalM (None, 256) 0 activation_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_3 (GlobalM (None, 256) 0 activation_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "concatenate_1 (Concatenate) (None, 768) 0 global_max_pooling1d_1[0][0] \n", - " global_max_pooling1d_2[0][0] \n", - " global_max_pooling1d_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "dropout_1 (Dropout) (None, 768) 0 concatenate_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "dense_1 (Dense) (None, 100) 76900 dropout_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_4 (BatchNor (None, 100) 400 dense_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_4 (Activation) (None, 100) 0 batch_normalization_4[0][0] \n", - "__________________________________________________________________________________________________\n", - "dropout_2 (Dropout) (None, 100) 0 activation_4[0][0] \n", - "__________________________________________________________________________________________________\n", - "dense_2 (Dense) (None, 7) 707 dropout_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_5 (BatchNor (None, 7) 28 dense_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_5 (Activation) (None, 7) 0 batch_normalization_5[0][0] \n", - "==================================================================================================\n", - "Total params: 235,475\n", - "Trainable params: 233,725\n", - "Non-trainable params: 1,750\n", - "__________________________________________________________________________________________________\n", - 
"/home/vimary/tensorflow/lib/python3.6/site-packages/sklearn/metrics/classification.py:1135: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.\n", - " 'precision', 'predicted', average, warn_for)\n", - "2019-02-12 12:16:14.932 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 163: New best sets_accuracy of 0.1479\n", - "2019-02-12 12:16:14.932 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 165: Saving model\n", - "2019-02-12 12:16:14.933 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 386: [saving model to /home/vimary/ipavlov/Pilot/examples/tutorials/cnn_model_v1_opt.json]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\"valid\": {\"eval_examples_count\": 1589, \"metrics\": {\"sets_accuracy\": 0.1479, \"f1_macro\": 0.044, \"roc_auc\": 0.5499}, \"time_spent\": \"0:00:01\", \"examples\": [{\"x\": \"Book a table at Carter House Inn in Saint Bonaventure, Alaska.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Rate the current textbook one of 6 stars\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find a nearby movie schedule for movies\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"what is the Mississippi for the week\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play me a song from 1968 on Spotify\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Book a table for me, naomi and elisabeth at a brasserie with wifi\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"The current album gets three out of 6 points\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find Goodrich Quality Theaters films\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"show me the picture Unfinished Monkey Business\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"When is The Third Eye showing at Dickinson Theatres?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Please get me the Welcome to the Rileys game.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find a song called Bronco Billy.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Rate this essay five stars\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"add tune to my relax & unwind playlist\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"play 2007 tunes by Bunny Berigan\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"book a table for ten downtown at a close-by restaurant\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Find the schedule for for Corn at eleven A.M. 
at Loews Cineplex Entertainment.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"1 minute from now, I will need reservations at a restaurant in Vanlue.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Play hanging in the balance by Nik Kershaw on Zvooq.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Will it be windy at 4 Pm in NY?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate my current textbook 1 out of 6 points\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What are the weather conditions in Noel?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Add this artist to the laugh list\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I am rating Book of Challenges four stars\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"rate this textbook a 4\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Put an album by max richter into my this is Rosana playlist. \", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"where can i watch animated movies around here\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Is A Man, a Woman, and a Bank showing in the nearest Neighborhood Cinema Group\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"book a popular food truck in Kentucky\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Show me animated movies that are playig at Great Escape Theatres\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"add Sara Carter to my Nothing But A Party R&B\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I would like an outdoor cafeteria for 3\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"rate the book Whit a zero\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Find a show called Time Is Just the Same.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"I need the weather in Hubbardston, will it be chillier?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"rate the previous essay four of 6 points\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Add wiktor coj to the Sleep playlist.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Rate Dixie Lullaby: A Story of Music, Race and New Beginnings in a New South five out of 6 points\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"please put live with me onto my playlist named CARГЃCTER LATINO\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Please add tobymac's song onto the indiespensables playlist.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Show me the movie schedule for Caribbean Cinemas\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Can you put this song on the metal xplorer playlist\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Add this tune to my rage radio playlist\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I want to go see A Troll in 
Central Park.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give the current series a one.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like to watch animated movies at National Amusements\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"my jazz for loving couples needs more push the button\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"What are the movie schedules for Kerasotes Theatres\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"rate the Dry series two out of 6 stars\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want a list of showings of Days of Fire at Harkins Theatres\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give White House Diary two points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like the weather forecast in Gang Mills four years from now.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate Tropic of Capricorn two stars\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"RateBook\"]}], \"epochs_done\": 0, \"batches_seen\": 0, \"train_examples_seen\": 0, \"impatience\": 0, \"patience_limit\": 5}}\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:16:19.387 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 163: New best sets_accuracy of 0.9434\n", - "2019-02-12 12:16:19.388 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 165: Saving model\n", - "2019-02-12 12:16:19.388 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 386: [saving model to /home/vimary/ipavlov/Pilot/examples/tutorials/cnn_model_v1_opt.json]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\"train\": {\"eval_examples_count\": 64, \"metrics\": {\"sets_accuracy\": 0.9375, \"f1_macro\": 0.9421, \"roc_auc\": 0.9938}, \"time_spent\": \"0:00:05\", \"examples\": [{\"x\": \"Please find me the work, Instrumental Directions.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"What weather will it be in Battlement Mesa?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"play theme by Yanni on Vimeo\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"rate the Beyond Black saga a one\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Find the schedule for The Tooth Will Out at sunrise.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Rate Lords of the Rim zero stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"play an Masaki Aiba tune\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"I need a table for 5 at the restaurant I ate at last Oct.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"book a table at a restaurant in Lucerne Valley that serves chicken nugget\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"add the tune to my viajes playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Play some thrash metal.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": 
\"Need to find the soundtrack called Fire in the Valley\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Get Jump Down painting\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Is it chillier in Baconton KY\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Find I Could Fall in Love.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Evolution and the Theory of Games gets a five out of 6.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Is Outlaw of Gor showing at thenearest movie house at 5 A.M.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Rate Pillar of Fire and Other Plays a three\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Play something by Holly Cole on lastfm \", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"I need some ambient music. \", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Rate Steps two out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What are the movie times at the Loews Cineplex\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Find THUNDER IN THE EAST.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Rate this album zero stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"play music from the sixties\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Find a television show called Twisted.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Show The Late Great Townes Van Zandt\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Will it get hotter around elevenses in KS?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play a cohesive playlist for me\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Will there be rainfall at one PM in Catahoula\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play the greatest record by Leroi Moore\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"this essay should get 1 of the points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"book a table in Connecticut in Robinette for one second from now\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Rate the current textbook 1 of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"how's the weather going to be at fourteen o'clock in Falkland Islands\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"She me the Sons of Satan Praise the Lord picture\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Please book a Uncommon Grounds Coffeehouse restaurant \", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Make me a reservation at Illinois Central Railroad Freight Depot in Singapore with vickie rodriguez, lila reyes and ruby\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Is Love Is a Ball playing right now?\", \"y_predicted\": 
[\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"The Far Side of the World chronicle deserves three out of 6 points.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Book a restaurant for one in AL.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"add tune to my instrumental funk playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Rate my current book 1 out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Which films are playing at the closest movie house?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"rate this book series zero out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Find a painting called Beyond the Neighbourhood.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"When is Longwave going to be playing?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Show me Dangers of the Canadian Mounted\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"around here find movie schedule for films\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"I want to see Teeny Little Super Guy at Malco Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Will it be windy in NM?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Locate the best pub in Apache Junction\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"party for 2 in Cleveland\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"find Dickinson Theatres showing From Bondage to Freedom\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"add potje met vet to my electronic gaming playlis\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I'm looking for the movie called The Beast that Shouted Love at the Heart of the World.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Play White Noise.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Find Just South of Heaven\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Put Al Jarreau on the ConcentraciГіn playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"add banking violence and the inner life today to my retro gaming playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"find a movie house with Colic: The Movie that is nearest\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"I am looking for any creative work with the title of Journal of Pharmacy and Pharmacology\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"add the album by Cham to my Cloud Rap playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Find an album called List of Re: Hamatora episodes.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}], \"epochs_done\": 1, \"batches_seen\": 224, 
\"train_examples_seen\": 14295, \"loss\": 1.386025922106845}}\n", - "{\"valid\": {\"eval_examples_count\": 1589, \"metrics\": {\"sets_accuracy\": 0.9434, \"f1_macro\": 0.9427, \"roc_auc\": 0.9965}, \"time_spent\": \"0:00:05\", \"examples\": [{\"x\": \"Book a table at Carter House Inn in Saint Bonaventure, Alaska.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Rate the current textbook one of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find a nearby movie schedule for movies\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"what is the Mississippi for the week\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play me a song from 1968 on Spotify\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Book a table for me, naomi and elisabeth at a brasserie with wifi\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"The current album gets three out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find Goodrich Quality Theaters films\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"show me the picture Unfinished Monkey Business\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"When is The Third Eye showing at Dickinson Theatres?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Please get me the Welcome to the Rileys game.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find a song called Bronco Billy.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Rate this essay five stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"add tune to my relax & unwind playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"play 2007 tunes by Bunny Berigan\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"book a table for ten downtown at a close-by restaurant\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Find the schedule for for Corn at eleven A.M. at Loews Cineplex Entertainment.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"1 minute from now, I will need reservations at a restaurant in Vanlue.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Play hanging in the balance by Nik Kershaw on Zvooq.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Will it be windy at 4 Pm in NY?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate my current textbook 1 out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What are the weather conditions in Noel?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Add this artist to the laugh list\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I am rating Book of Challenges four stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"rate this textbook a 4\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Put an album by max richter into my this is Rosana playlist. 
\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"where can i watch animated movies around here\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Is A Man, a Woman, and a Bank showing in the nearest Neighborhood Cinema Group\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"book a popular food truck in Kentucky\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Show me animated movies that are playig at Great Escape Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"add Sara Carter to my Nothing But A Party R&B\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I would like an outdoor cafeteria for 3\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"rate the book Whit a zero\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Find a show called Time Is Just the Same.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"I need the weather in Hubbardston, will it be chillier?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"rate the previous essay four of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Add wiktor coj to the Sleep playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Rate Dixie Lullaby: A Story of Music, Race and New Beginnings in a New South five out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"please put live with me onto my playlist named CARГЃCTER LATINO\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Please add tobymac's song onto the indiespensables playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Show me the movie schedule for Caribbean Cinemas\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Can you put this song on the metal xplorer playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Add this tune to my rage radio playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I want to go see A Troll in Central Park.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give the current series a one.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like to watch animated movies at National Amusements\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"my jazz for loving couples needs more push the button\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"What are the movie schedules for Kerasotes Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"rate the Dry series two out of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want a list of showings of Days of Fire at Harkins Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give White House Diary two points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like the weather forecast in Gang 
Mills four years from now.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate Tropic of Capricorn two stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}], \"epochs_done\": 1, \"batches_seen\": 224, \"train_examples_seen\": 14295, \"impatience\": 0, \"patience_limit\": 5}}\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:16:21.734 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 163: New best sets_accuracy of 0.9515\n", - "2019-02-12 12:16:21.735 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 165: Saving model\n", - "2019-02-12 12:16:21.735 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 386: [saving model to /home/vimary/ipavlov/Pilot/examples/tutorials/cnn_model_v1_opt.json]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\"train\": {\"eval_examples_count\": 64, \"metrics\": {\"sets_accuracy\": 0.9688, \"f1_macro\": 0.9623, \"roc_auc\": 0.999}, \"time_spent\": \"0:00:08\", \"examples\": [{\"x\": \"She me movie times\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"I'd like a table in a smoking room in a taverna on sep. 23, 2023\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"find a movie called No More Sadface\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"A rating of 5 of 6 points goes to Dickson McCunn trilogy\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"For the book The Mirrored Heavens I give one of a possiable 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What are the weather conditions in Patagonia, South Africa?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Where can I watch the trailer for Home Economics\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Book a restaurant on san jacinto day in Anderson for me and my colleagues.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"What time is The Man Who Dared playing at the movie theatre?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"What will the temperature be at midnight in NE\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"I need a reservation for two at a diner in Venezuela\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Show me the forecast for the distant area of ME at three pm\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"I want to eat at the Trout Creek restaurant for 9 people for bougatsa that is the best\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Rate the current novel four out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What will the weather be at Noon in Durbin OH?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Find the album Follow That Camel\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Can you play some Andrew Cash music on Slacker\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"I need a table right now for four in ME\", \"y_predicted\": 
[\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"play Peja tunes\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"add this track by clem burke to my atmospheric black metal playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I want to play the game Show Me the Wonder\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"I need a table for four at a restaurant in AL\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"what is the forecast for Orienta for hotter weather\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"when are animated movies playing at Goodrich Quality Theaters\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Will there be a blizzard in Egypt?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Find me the soundtrack called Enter the Chicken\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"find the video game called Turnin Me On\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Check movie schedules and find which animated movies are being aired in the neighborhood\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"renee sanders, marlene and jewel want to go to a gastropub in the spa\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"I want to watch The Original Recordings\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"book a table in Fort Loudon at a restaurant for 5\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"in 1 week is there going to be a depression in Washington\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Find the schedule for Grand Canyon Trail.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Where Can I watch Chaos and Desire?\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"add the album to the Six string peacefulness playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"What movies are playing at the closest cinema\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Search for the Halfway Home TV show\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find a painting called The Book of Folly.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"What is the weather forecast here?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"play In The Arms Of God on Zvooq by Nimal Mendis\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Will it get colder in Cape Fair\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play some seventies track from top Rie Tomosaka\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Will it be temperate near Neylandville\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"play the most popular album on Google Music by sasu ripatti\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": 
[\"PlayMusic\"]}, {\"x\": \"rate the book series Sons of Destiny a five\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What's the movie schedule for B&B Theatres?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Show the movie schedules at KB Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Put the bill berry track on elrow Guest List\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Tell me the weather forecast in 4 years and a half in GA\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"will the weather be warm far from Niger at 15 o'clock\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"find the closest cinema for films\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"I am giving this current book album 0 out of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Add song to my Pop Brasil\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Find a reservation at a brasserie restaurant nearby SC for a party of ten\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"can you find Leadership in my library, please?\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"give four stars out of 6 to current book\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Let's hear something from Elena Risteska\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Include dschiwan gasparjan in beth's rare groove playlist. 
\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Book a table at a brasserie type restaurant that serves jain for a party of 8\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Book PM Park, Clear Lake, Iowa at 5 am for 6 people.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Tell me the weather forecast for Molino, Washington\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"I want to add a tune to my spanish metalblood playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Can you find me the Back When I Knew It All album?\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Rate the current novel four of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}], \"epochs_done\": 2, \"batches_seen\": 448, \"train_examples_seen\": 28590, \"loss\": 1.2655515175844942}}\n", - "{\"valid\": {\"eval_examples_count\": 1589, \"metrics\": {\"sets_accuracy\": 0.9515, \"f1_macro\": 0.9509, \"roc_auc\": 0.9973}, \"time_spent\": \"0:00:08\", \"examples\": [{\"x\": \"Book a table at Carter House Inn in Saint Bonaventure, Alaska.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Rate the current textbook one of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find a nearby movie schedule for movies\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"what is the Mississippi for the week\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play me a song from 1968 on Spotify\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Book a table for me, naomi and elisabeth at a brasserie with wifi\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"The current album gets three out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find Goodrich Quality Theaters films\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"show me the picture Unfinished Monkey Business\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"When is The Third Eye showing at Dickinson Theatres?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Please get me the Welcome to the Rileys game.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find a song called Bronco Billy.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Rate this essay five stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"add tune to my relax & unwind playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"play 2007 tunes by Bunny Berigan\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"book a table for ten downtown at a close-by restaurant\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Find the schedule for for Corn at eleven A.M. 
at Loews Cineplex Entertainment.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"1 minute from now, I will need reservations at a restaurant in Vanlue.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Play hanging in the balance by Nik Kershaw on Zvooq.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Will it be windy at 4 Pm in NY?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate my current textbook 1 out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What are the weather conditions in Noel?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Add this artist to the laugh list\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I am rating Book of Challenges four stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"rate this textbook a 4\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Put an album by max richter into my this is Rosana playlist. \", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"where can i watch animated movies around here\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Is A Man, a Woman, and a Bank showing in the nearest Neighborhood Cinema Group\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"book a popular food truck in Kentucky\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Show me animated movies that are playig at Great Escape Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"add Sara Carter to my Nothing But A Party R&B\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I would like an outdoor cafeteria for 3\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"rate the book Whit a zero\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Find a show called Time Is Just the Same.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"I need the weather in Hubbardston, will it be chillier?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"rate the previous essay four of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Add wiktor coj to the Sleep playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Rate Dixie Lullaby: A Story of Music, Race and New Beginnings in a New South five out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"please put live with me onto my playlist named CARГЃCTER LATINO\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Please add tobymac's song onto the indiespensables playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Show me the movie schedule for Caribbean Cinemas\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Can you put this song on the metal xplorer playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Add this tune to my rage radio playlist\", \"y_predicted\": 
[\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I want to go see A Troll in Central Park.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give the current series a one.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like to watch animated movies at National Amusements\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"my jazz for loving couples needs more push the button\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"What are the movie schedules for Kerasotes Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"rate the Dry series two out of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want a list of showings of Days of Fire at Harkins Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give White House Diary two points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like the weather forecast in Gang Mills four years from now.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate Tropic of Capricorn two stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}], \"epochs_done\": 2, \"batches_seen\": 448, \"train_examples_seen\": 28590, \"impatience\": 0, \"patience_limit\": 5}}\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:16:24.94 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 163: New best sets_accuracy of 0.9553\n", - "2019-02-12 12:16:24.94 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 165: Saving model\n", - "2019-02-12 12:16:24.95 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 386: [saving model to /home/vimary/ipavlov/Pilot/examples/tutorials/cnn_model_v1_opt.json]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\"train\": {\"eval_examples_count\": 64, \"metrics\": {\"sets_accuracy\": 0.9844, \"f1_macro\": 0.9859, \"roc_auc\": 0.9998}, \"time_spent\": \"0:00:10\", \"examples\": [{\"x\": \"find the trailer for Hit the Ice\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Can I get the movies showtimes for the closest movie house.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"I want to give this book zero\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"give The Creator zero points out of 6\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Find the movie schedules for Cineplex Odeon Corporation.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Get soundtrack of Comprehensive Knowledge Archive Network\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"play Pandora tunes from the fourties\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"A Sport and a Pastime is a solid 5 out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I give Life During Wartime a one out of 6.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Rate this essay two stars\", \"y_predicted\": [\"RateBook\"], 
\"y_true\": [\"RateBook\"]}, {\"x\": \"What is the cloud coverage in my current place\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Find the movie schedule for animated movies in the neighborhood.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"add ali lohan songs in Club Hits\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Play some alternative music on Vimeo\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Tell me if it'll be freezing here in 21 seconds\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Add inconfundible to the piano in the background playlist. \", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Are there any animated movies playing in the neighborhood?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Showtimes for animated movies in Malco Theatres?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Show me the movie schedule for movies opening today close by\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Rate this album a 3\\n\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Can you add this tune to the night out playlist?\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"rate this textbook 3 out 6\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Rate the Michel Strogoff saga four of 6\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want to listen to the soundtrack Bed of Roses\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"What weather will HI have will there be hail twenty one minutes from now\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"what is the forecast for in 1 second at Monte Sereno for freezing temps\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"give Jackass Investing a three\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Give this album a three\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"play 1951 tunes\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Can you please find me Season of Glass?\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"play Going Down To The River on Pandora\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Play 2004 on pandora\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Add radhae unakku kobam aagathadi to my Women of Metal playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Play Alone, Again from Mike Viola\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"For the current essay I rate 1 out of 6\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"will there be a cloud next year in Kewanee\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"what is the weather of Sri Lanka\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"I give the Knife of Dreams saga a 0 of 6\", \"y_predicted\": [\"RateBook\"], \"y_true\": 
[\"RateBook\"]}, {\"x\": \"I want to give The Plague Lords of Ruel 0 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Can I see Standing on the Edge of the Noise in the nearest cinema\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Play the top music from Epic Mazur.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Play Suite Sudarmoricaine by Afi on itunes\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Book a reservation for 6 at a restaurant in Deersville\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"play laura love songs from 1959\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"what is the MT forecast for 22\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play the top caleigh peters.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"rate this novel five points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"book a table for me and bettye at Washington, D.C. Jewish Community Center in Montana\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"add Puzzles Like You in my playlist Reggae\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"book for 3 in U.S. Virgin Islands\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Play the top fifty record from Alan Jardine\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Book a reservation for three at a top-rated sicilian restaurant in Portugal\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"book cornelia and bettie a table at a brasserie restaurant in Colombia\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Will it get warmer in Berkley\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"add the artist to my emotron playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Add in the heart of the world to the Epic Gaming playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Find me the movie times\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Can you give me a local and current movie schedule \", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give the current chronicle 2 stars.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Add the name the magnificent tree to playlist this is Rosana\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Play a new ballad by Valy on Iheart\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Book spot for 7 at NH Theressa\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Add Jermaine Fagan to spring music\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I need a table for 7 people at a bar that specialises in being a protein bar.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}], \"epochs_done\": 3, \"batches_seen\": 672, \"train_examples_seen\": 42885, \"loss\": 1.2179965520543712}}\n", - "{\"valid\": {\"eval_examples_count\": 1589, 
\"metrics\": {\"sets_accuracy\": 0.9553, \"f1_macro\": 0.9546, \"roc_auc\": 0.9977}, \"time_spent\": \"0:00:10\", \"examples\": [{\"x\": \"Book a table at Carter House Inn in Saint Bonaventure, Alaska.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Rate the current textbook one of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find a nearby movie schedule for movies\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"what is the Mississippi for the week\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play me a song from 1968 on Spotify\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Book a table for me, naomi and elisabeth at a brasserie with wifi\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"The current album gets three out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find Goodrich Quality Theaters films\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"show me the picture Unfinished Monkey Business\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"When is The Third Eye showing at Dickinson Theatres?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Please get me the Welcome to the Rileys game.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find a song called Bronco Billy.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Rate this essay five stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"add tune to my relax & unwind playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"play 2007 tunes by Bunny Berigan\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"book a table for ten downtown at a close-by restaurant\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Find the schedule for for Corn at eleven A.M. at Loews Cineplex Entertainment.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"1 minute from now, I will need reservations at a restaurant in Vanlue.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Play hanging in the balance by Nik Kershaw on Zvooq.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Will it be windy at 4 Pm in NY?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate my current textbook 1 out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What are the weather conditions in Noel?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Add this artist to the laugh list\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I am rating Book of Challenges four stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"rate this textbook a 4\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Put an album by max richter into my this is Rosana playlist. 
\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"where can i watch animated movies around here\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Is A Man, a Woman, and a Bank showing in the nearest Neighborhood Cinema Group\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"book a popular food truck in Kentucky\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Show me animated movies that are playig at Great Escape Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"add Sara Carter to my Nothing But A Party R&B\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I would like an outdoor cafeteria for 3\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"rate the book Whit a zero\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Find a show called Time Is Just the Same.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"I need the weather in Hubbardston, will it be chillier?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"rate the previous essay four of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Add wiktor coj to the Sleep playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Rate Dixie Lullaby: A Story of Music, Race and New Beginnings in a New South five out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"please put live with me onto my playlist named CARГЃCTER LATINO\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Please add tobymac's song onto the indiespensables playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Show me the movie schedule for Caribbean Cinemas\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Can you put this song on the metal xplorer playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Add this tune to my rage radio playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I want to go see A Troll in Central Park.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give the current series a one.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like to watch animated movies at National Amusements\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"my jazz for loving couples needs more push the button\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"What are the movie schedules for Kerasotes Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"rate the Dry series two out of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want a list of showings of Days of Fire at Harkins Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give White House Diary two points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like the weather forecast in 
Gang Mills four years from now.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate Tropic of Capricorn two stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}], \"epochs_done\": 3, \"batches_seen\": 672, \"train_examples_seen\": 42885, \"impatience\": 0, \"patience_limit\": 5}}\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:16:26.435 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 163: New best sets_accuracy of 0.9566\n", - "2019-02-12 12:16:26.435 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 165: Saving model\n", - "2019-02-12 12:16:26.436 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 386: [saving model to /home/vimary/ipavlov/Pilot/examples/tutorials/cnn_model_v1_opt.json]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\"train\": {\"eval_examples_count\": 64, \"metrics\": {\"sets_accuracy\": 0.9531, \"f1_macro\": 0.9521, \"roc_auc\": 0.999}, \"time_spent\": \"0:00:12\", \"examples\": [{\"x\": \"Book a northeastern brazilian restaurant for 10 am\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Rate The Life and Loves of a She-Devil 5 out of 6\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"what is the forecast for Montana at dinner\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Where is The Toxic Avenger II playing\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Play some music on Last Fm\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Tell me the weather forecast one year from now in Kulpsville, Togo\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"show freezing forcast now within the same area in North Dakota\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play some music from 1993 on Itunes.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Which animated movies are playing at the nearest movie house?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"I want a table for 4 in Florida\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"book spot for my mother in law and I at 18 o'clock\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Play some House music\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"book a spot for me and sonja at a popular pizzeria\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"I'm in the mood to listen to meditative music.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Play a new tune by louis silvers.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"rate this series 5 out of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want to listen to the soundtrack Bed of Roses\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Will the sun be out close-by Admiralty Island National Monument?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play Wynton Kelly music on Netflix sort by popular\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": 
[\"PlayMusic\"]}, {\"x\": \"Rate Who Moved My Cheese? a one\\n\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Please find me Glass Cloud – Single.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Show me The Courts of Chaos\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Look for Hail Satanas We Are The Black Legions\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Play the greatest soundtrack by Nhat Son on Last Fm.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Can you find me the trailer of the Hippocratic Oath?\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"what movies are at the nearest movie house\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Play Barbra Streisand music from 1997.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Please use pandora to play a record from 1993\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Play the theme music from 1963 by Yuki Koyanagi\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Will it be windy in Tequesta?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Weather for Coaldale Arkansas \", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Where can I find the album The Adventures of Lolo II\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Add Buddy DeSylva to my this is j balvin playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"What are the movie times for movies premiering in the neighbourhood \", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"add Curse song to my playlist Guest List Engadget\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"book a table 1 year from now for corinne, tisha and I at a restaurant in Guernsey that is top-rated\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"I'd like to watch Wish You Were Dead\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Add to my list the tune summer of love\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I want to listen to Roger Daltrey from the sixties on slacker\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"rate this novel a two\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"book me a reservation at a highly rated tavern in Hornersville\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Give the current chronicle 2 stars.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Show me the Caribbean Blue television show\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Play a song from the thirties by Bruno Pelletier\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Open Vimeo and play music.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Rate the current album a 5 out of 6\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, 
{\"x\": \"Book me a table for one at Blue Ribbon Barbecue\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"I need to book a restaurant with a smoking room in AL\\n\\n\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Will it be hotter neighboring ME on august eighteenth, 2025?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Will it get colder in Alaska?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Will it snow in AMy\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Find the schedule for Vanishing of the Bees at a movie house.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"find To Each His Own Cinema, an album\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"add Funtwo to disco fever track\\n\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Can you play some music from my road trip album\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Can I get the movie schedule for the Bow Tie Cinemas.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"What is the movie schedules for films in the neighborhood\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"rate this book a zero\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Show me the album Til the Morning\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Add album to my Country Hits\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"give one rank to this album\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"add this Roy Orbison song onto Women of Comedy\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Play the newest Roger Troutman track possible\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Book a restaurant at sixteen o'clock in SC\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}], \"epochs_done\": 4, \"batches_seen\": 896, \"train_examples_seen\": 57180, \"loss\": 1.1904606600957257}}\n", - "{\"valid\": {\"eval_examples_count\": 1589, \"metrics\": {\"sets_accuracy\": 0.9566, \"f1_macro\": 0.9559, \"roc_auc\": 0.9978}, \"time_spent\": \"0:00:12\", \"examples\": [{\"x\": \"Book a table at Carter House Inn in Saint Bonaventure, Alaska.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Rate the current textbook one of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find a nearby movie schedule for movies\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"what is the Mississippi for the week\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play me a song from 1968 on Spotify\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Book a table for me, naomi and elisabeth at a brasserie with wifi\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"The current album gets three out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find Goodrich Quality Theaters 
films\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"show me the picture Unfinished Monkey Business\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"When is The Third Eye showing at Dickinson Theatres?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Please get me the Welcome to the Rileys game.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find a song called Bronco Billy.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Rate this essay five stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"add tune to my relax & unwind playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"play 2007 tunes by Bunny Berigan\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"book a table for ten downtown at a close-by restaurant\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Find the schedule for for Corn at eleven A.M. at Loews Cineplex Entertainment.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"1 minute from now, I will need reservations at a restaurant in Vanlue.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Play hanging in the balance by Nik Kershaw on Zvooq.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Will it be windy at 4 Pm in NY?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate my current textbook 1 out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What are the weather conditions in Noel?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Add this artist to the laugh list\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I am rating Book of Challenges four stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"rate this textbook a 4\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Put an album by max richter into my this is Rosana playlist. 
\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"where can i watch animated movies around here\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Is A Man, a Woman, and a Bank showing in the nearest Neighborhood Cinema Group\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"book a popular food truck in Kentucky\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Show me animated movies that are playig at Great Escape Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"add Sara Carter to my Nothing But A Party R&B\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I would like an outdoor cafeteria for 3\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"rate the book Whit a zero\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Find a show called Time Is Just the Same.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"I need the weather in Hubbardston, will it be chillier?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"rate the previous essay four of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Add wiktor coj to the Sleep playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Rate Dixie Lullaby: A Story of Music, Race and New Beginnings in a New South five out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"please put live with me onto my playlist named CARГЃCTER LATINO\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Please add tobymac's song onto the indiespensables playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Show me the movie schedule for Caribbean Cinemas\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Can you put this song on the metal xplorer playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Add this tune to my rage radio playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I want to go see A Troll in Central Park.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give the current series a one.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like to watch animated movies at National Amusements\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"my jazz for loving couples needs more push the button\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"What are the movie schedules for Kerasotes Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"rate the Dry series two out of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want a list of showings of Days of Fire at Harkins Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give White House Diary two points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like the weather forecast in 
Gang Mills four years from now.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate Tropic of Capricorn two stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}], \"epochs_done\": 4, \"batches_seen\": 896, \"train_examples_seen\": 57180, \"impatience\": 0, \"patience_limit\": 5}}\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:16:28.776 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 163: New best sets_accuracy of 0.9585\n", - "2019-02-12 12:16:28.776 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 165: Saving model\n", - "2019-02-12 12:16:28.777 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 386: [saving model to /home/vimary/ipavlov/Pilot/examples/tutorials/cnn_model_v1_opt.json]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\"train\": {\"eval_examples_count\": 64, \"metrics\": {\"sets_accuracy\": 0.9688, \"f1_macro\": 0.9702, \"roc_auc\": 1.0}, \"time_spent\": \"0:00:15\", \"examples\": [{\"x\": \"Play Pandora on Last Fm\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"add this artist to my SinfonГ­a Hipster\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"play some movement by Franky Gee\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"When and where is Nefertiti, Queen of the Nile playing?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"What movies are playing at Loews Cineplex?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Include hohenfriedberger marsch to my Novedades Pop list.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Find movie schedules at IMAX Corporation\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Rate A Tale of Love and Darkness 0 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Play ClГЎsicos del Hip Hop EspaГ±ol\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"book for 10 in a restaurant\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Find the Return to Grace saga\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"What is the movie schedule 1 second from now\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"add tune to my this is animal collective\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Play new track from the fifties\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"add Blag Dahlia to Pura Vida\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Book a restasurant in Pohick Delaware.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"show creative photograph of Icewind Dale: Heart of Winter\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"I want to book a delicatessen serving testaroli in Somalia for 7/25/2027.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Book spot in Fults in Federated States Of Micronesia\", \"y_predicted\": [\"BookRestaurant\"], 
\"y_true\": [\"BookRestaurant\"]}, {\"x\": \"can you find me a showing for Before the Music Dies in one second ?\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"find Remedial Chaos Theory, a soundtrack\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Show me the photograph A Woman from the Street\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Add this artist to my This Is Philip Glass playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"i need to book a table for three in Lesotho\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Book me a restaurant reservation at 3\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Need a table for sep. first in Haiti for a party of three\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"book a restaurant for three on feb. 18\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"I'd like to watch movies at Amco Entertainment\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"play the Gary Chaw album\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Add this tomoyasu hotei song to my concentraciГіn playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Rate the current textbook 1 of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like to listen to Space music\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"find Dickinson Theatres showing From Bondage to Freedom\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"will it rain on Jan. 18th, 2029 in Kanopolis Arkansas\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Find the schedule for animated movies nearby\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"find Plitt Theatres movie schedules\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Add this tune by Rafet el Roman to my Latin Pop Rising playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Book a reservation for a southern brazilian restaurant for 10 people within walking distance of Broadway-Lafayette St\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Play music from the list Indie Electronics\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Give this book a rating of four out of 6.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"my feelin' good playlist needs some Mai Selim in it. 
\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I would like to book a restaurant in Poncha Springs for 8 at 00:32 am\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Rate this current essay a 5.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Find a restaurant for marylou and I within walking distance of my mum's hotel\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"find a movie theatre showing The Tailor of Panama\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"show creative game Elements of Life: Remixed\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Play the best music by Arthur Johnston.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"add Diarios de Bicicleta to my la la playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I want to add michelle heaton to this is chopin\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Is it going to be warm here for brunch\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Add an album to my week end playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Please play something that's freak folk on Google Music\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Play the most popular Johnny Clarke on Deezer.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Please find me the work, Instrumental Directions.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find me the novel called Ressha Sentai ToQger\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"what animated movies are at the nearest movie house\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Play breed the killers on Itunes\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"book a restaurant that serves rolled oyster in Merkel\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"What will the humidity be like on june eighteenth in my current location\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Tell me when it'll be cloudy in Woodport\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"show movie times\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"what time is Bordertown Trail showing\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"book a table for one in a bar serving saucisse for meal in Calistoga CO\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Will there be snowfall in KY?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}], \"epochs_done\": 5, \"batches_seen\": 1120, \"train_examples_seen\": 71475, \"loss\": 1.1707027325672763}}\n", - "{\"valid\": {\"eval_examples_count\": 1589, \"metrics\": {\"sets_accuracy\": 0.9585, \"f1_macro\": 0.9579, \"roc_auc\": 0.998}, \"time_spent\": \"0:00:15\", \"examples\": [{\"x\": \"Book a table at Carter House Inn in Saint Bonaventure, Alaska.\", \"y_predicted\": [\"BookRestaurant\"], 
\"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Rate the current textbook one of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find a nearby movie schedule for movies\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"what is the Mississippi for the week\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play me a song from 1968 on Spotify\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Book a table for me, naomi and elisabeth at a brasserie with wifi\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"The current album gets three out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find Goodrich Quality Theaters films\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"show me the picture Unfinished Monkey Business\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"When is The Third Eye showing at Dickinson Theatres?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Please get me the Welcome to the Rileys game.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find a song called Bronco Billy.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Rate this essay five stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"add tune to my relax & unwind playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"play 2007 tunes by Bunny Berigan\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"book a table for ten downtown at a close-by restaurant\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Find the schedule for for Corn at eleven A.M. at Loews Cineplex Entertainment.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"1 minute from now, I will need reservations at a restaurant in Vanlue.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Play hanging in the balance by Nik Kershaw on Zvooq.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Will it be windy at 4 Pm in NY?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate my current textbook 1 out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What are the weather conditions in Noel?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Add this artist to the laugh list\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I am rating Book of Challenges four stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"rate this textbook a 4\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Put an album by max richter into my this is Rosana playlist. 
\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"where can i watch animated movies around here\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Is A Man, a Woman, and a Bank showing in the nearest Neighborhood Cinema Group\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"book a popular food truck in Kentucky\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Show me animated movies that are playig at Great Escape Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"add Sara Carter to my Nothing But A Party R&B\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I would like an outdoor cafeteria for 3\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"rate the book Whit a zero\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Find a show called Time Is Just the Same.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"I need the weather in Hubbardston, will it be chillier?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"rate the previous essay four of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Add wiktor coj to the Sleep playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Rate Dixie Lullaby: A Story of Music, Race and New Beginnings in a New South five out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"please put live with me onto my playlist named CARГЃCTER LATINO\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Please add tobymac's song onto the indiespensables playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Show me the movie schedule for Caribbean Cinemas\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Can you put this song on the metal xplorer playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Add this tune to my rage radio playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I want to go see A Troll in Central Park.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give the current series a one.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like to watch animated movies at National Amusements\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"my jazz for loving couples needs more push the button\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"What are the movie schedules for Kerasotes Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"rate the Dry series two out of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want a list of showings of Days of Fire at Harkins Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give White House Diary two points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like the weather forecast in 
Gang Mills four years from now.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate Tropic of Capricorn two stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}], \"epochs_done\": 5, \"batches_seen\": 1120, \"train_examples_seen\": 71475, \"impatience\": 0, \"patience_limit\": 5}}\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:16:31.141 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 163: New best sets_accuracy of 0.9604\n", - "2019-02-12 12:16:31.141 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 165: Saving model\n", - "2019-02-12 12:16:31.142 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 386: [saving model to /home/vimary/ipavlov/Pilot/examples/tutorials/cnn_model_v1_opt.json]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\"train\": {\"eval_examples_count\": 64, \"metrics\": {\"sets_accuracy\": 0.9844, \"f1_macro\": 0.9808, \"roc_auc\": 0.9935}, \"time_spent\": \"0:00:17\", \"examples\": [{\"x\": \"Will it be freezing on 4/20/2038 in AMerican Beach NC\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"put live and rare into dancehall official\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"What is the weather like in Wyatte\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"rate The Descendants two points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Find the movie schedule for animated movies in the neighbourhood.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"What time is The Bride’s Journey playing at Star Theatres?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Help me find the saga titled The Eternal Return\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"add iemand als jij to my playlist named In The Name Of Blues\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Make me a reservation in Hardesty at a joint the is indoor\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Play a song from 2003\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"play Disney Sing It! – High School Musical 3: Senior Year\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"I want to book a restaurant in 40 weeks in Iowa.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Will it be hot in Keachi\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"rate this textbook a one\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What time is A Man for Burning playing\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"rate this book three points out of 6\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"give Heartland chronicle four points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Turn on Spotify to Tiny Tim ep\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"The Postman is awful and only gets a 1 out of 6. 
\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Will the weather this week be warmer in Crystal River?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate this novel four stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"In Wynnedale AK will it blizzard\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Please give me the movie schedule for Pacific Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Find me the novel of A Dictionary of Slang and Unconventional English\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Tell me when sunrise is in Tennessee\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Please search for Columbia Records 1958–1986.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find the movie schedule close by\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Can you play some music by andrew diamond\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Is The Strange Case of the End of Civilization as We Know It playing at the movie theatre\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Rate The Astonishing Life of Octavian Nothing, Traitor to the Nation, Volume II: The Kingdom on the Waves series 2 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want to see Wenn Lucy springt now at a movie theatre.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"give 0 out of 6 points to current book\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What will the weather be in Rwanda?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"add the artist a j pero to my Country Gold playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"For this series I give the rating of four of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Locate the Koi to Senkyo to Chocolate television show\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"What's the forecast for Pipe Spring National Monument?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"this winter, meredith, betty and erika want to food at a gastropub that is in the same area as fran's location.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Give me the schedule now at the nearest movie house\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"I want to book a restaurant in Reily VT.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"How cloudy is it in Morrisonville, Kentucky\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play some noise music on Netflix.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Rate this textbook 4 out of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What time are movies showing at Megaplex Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"book a spot 
at a highly rated afghan restaurant\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"What time is The Bride Wore Boots playing\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"I give Ruled Britannia a rating of five out of 6.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want to play the game The Carny\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"add agua y sal in Classic Jazz Funk\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Give the current album 1 star\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find a book called The Mad Magician\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"What's the weather in Gabon\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Can you play a top song from a chyi chin concerto\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Play the album 21st Century Live by Chet Lam on Itunes.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"I need some Hardcore Hip Hop\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Put gregory douglass in Halloween Teens please\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Find the movie schedule for ArcLight Hollywood.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Book something for my girlfriend and I at a food truck that has pizzas in Brookwood on October fifteenth, 2020\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Can you put Musiri Subramania Iyer's song onto the lo-fi love soundtrack?\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"what is the weather forecast for Cuba at eleven am\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Go to the movie The Best of Pirates of the Mississippi\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"search for a photograph of Road Hogs\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Give The Turning Point a 0 out of 6.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find the movie schedule\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}], \"epochs_done\": 6, \"batches_seen\": 1344, \"train_examples_seen\": 85770, \"loss\": 1.153284895100764}}\n", - "{\"valid\": {\"eval_examples_count\": 1589, \"metrics\": {\"sets_accuracy\": 0.9604, \"f1_macro\": 0.9598, \"roc_auc\": 0.9981}, \"time_spent\": \"0:00:17\", \"examples\": [{\"x\": \"Book a table at Carter House Inn in Saint Bonaventure, Alaska.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Rate the current textbook one of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find a nearby movie schedule for movies\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"what is the Mississippi for the week\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play me a song from 1968 on Spotify\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, 
{\"x\": \"Book a table for me, naomi and elisabeth at a brasserie with wifi\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"The current album gets three out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find Goodrich Quality Theaters films\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"show me the picture Unfinished Monkey Business\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"When is The Third Eye showing at Dickinson Theatres?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Please get me the Welcome to the Rileys game.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find a song called Bronco Billy.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Rate this essay five stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"add tune to my relax & unwind playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"play 2007 tunes by Bunny Berigan\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"book a table for ten downtown at a close-by restaurant\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Find the schedule for for Corn at eleven A.M. at Loews Cineplex Entertainment.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"1 minute from now, I will need reservations at a restaurant in Vanlue.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Play hanging in the balance by Nik Kershaw on Zvooq.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Will it be windy at 4 Pm in NY?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate my current textbook 1 out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What are the weather conditions in Noel?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Add this artist to the laugh list\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I am rating Book of Challenges four stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"rate this textbook a 4\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Put an album by max richter into my this is Rosana playlist. 
\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"where can i watch animated movies around here\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Is A Man, a Woman, and a Bank showing in the nearest Neighborhood Cinema Group\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"book a popular food truck in Kentucky\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Show me animated movies that are playig at Great Escape Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"add Sara Carter to my Nothing But A Party R&B\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I would like an outdoor cafeteria for 3\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"rate the book Whit a zero\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Find a show called Time Is Just the Same.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"I need the weather in Hubbardston, will it be chillier?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"rate the previous essay four of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Add wiktor coj to the Sleep playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Rate Dixie Lullaby: A Story of Music, Race and New Beginnings in a New South five out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"please put live with me onto my playlist named CARГЃCTER LATINO\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Please add tobymac's song onto the indiespensables playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Show me the movie schedule for Caribbean Cinemas\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Can you put this song on the metal xplorer playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Add this tune to my rage radio playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I want to go see A Troll in Central Park.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give the current series a one.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like to watch animated movies at National Amusements\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"my jazz for loving couples needs more push the button\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"What are the movie schedules for Kerasotes Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"rate the Dry series two out of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want a list of showings of Days of Fire at Harkins Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give White House Diary two points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like the weather forecast in 
Gang Mills four years from now.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate Tropic of Capricorn two stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}], \"epochs_done\": 6, \"batches_seen\": 1344, \"train_examples_seen\": 85770, \"impatience\": 0, \"patience_limit\": 5}}\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:16:33.547 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 163: New best sets_accuracy of 0.9622\n", - "2019-02-12 12:16:33.548 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 165: Saving model\n", - "2019-02-12 12:16:33.548 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 386: [saving model to /home/vimary/ipavlov/Pilot/examples/tutorials/cnn_model_v1_opt.json]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\"train\": {\"eval_examples_count\": 64, \"metrics\": {\"sets_accuracy\": 0.9375, \"f1_macro\": 0.9374, \"roc_auc\": 0.997}, \"time_spent\": \"0:00:19\", \"examples\": [{\"x\": \"How much wind will there be in NM on november 11th\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"find The Many Loves of Dobie Gillis\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Put Jazzy B on Lazy Chill Afternoon playlist\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"What time is The Bride from Hell playing at Malco Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"I am giving the book After Henry a rating of 0 out of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I need to add an artist to one of my playlists, Classical New Releases Spotify Picks.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Will it be warm here in one hour\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Is it freezing in Kelso\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"give The Story of the Last Thought a five\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Add Nazad, nazad, Kalino mome to Escapada\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Add outside the dream syndicate to millicent's fresh electronic playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Rate this album two out of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Book a reservation for me and my step sister in Nebraska in two seconds\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"I want to play music from 1979 on Groove Shark.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"book a restaurant for tortelloni for eight\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Add Shooter Jennings to the All Out 70s playlist. 
\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"find the book Metallica Through the Never\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Add Hanging On to my just dance by aftercluv playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"play a good John Maher record with Netflix\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"I want to rate my current book three out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Which animated movies are showing close by?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give 4 stars to the current essay\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want to see Sympathy for the Devil\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"She me movie times at Mann Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Looking for the saga called The Scofflaw\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Will there be a snowstorm in Pomona, New Mexico?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"What will the weather be in Deer River?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"What's the weather in South Punta Gorda Heights\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Find movie times for Great Escape Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Make me a reservation in Colorado at nine am at National Cash Register Building\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Play Elitsa Todorova music\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"add this artist to the playlist cool jazz\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"rate this album one out of 6\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What will the weather be like at my current spot on january the 19th\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Please look up the television show, Noel Hill & Tony Linnane.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Can you put this tune onto Latin Dance Cardio?\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Rate current novel two stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Play some sixties on netflix\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Find a soundtrack called The Dragon.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Play Trace Adkins' music from the thirties.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Can i see the Boat People?\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Add the song to my R&B Movement playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"When is Robotix playing?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, 
{\"x\": \"Find me An Echo in the Darkness\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Where can I watch the television show called Fangs of the Arctic?\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"rate this book 5 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"give one out of 6 stars to Free Market Fairness\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Add nina hagen to essential folk\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"play The Edge by Deezer on Vans Warped Tour Compilation 2003\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"I need a reservation for January 9 at a restaurant that serves souvlaki nearby Cypress Av for a party of 1\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Find time for College Rock Stars at any movie theatre\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"add Elvis Presley and America in my playlist Electro Workout\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Give one 6 stars to this book\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"add Michael Hayvoronsky to Lo Que Suena Los Angeles\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"How do I rate this book 4 stars?\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"want to eat somewhere windy in NM\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Open Netflix and find a movie with the song heartful\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"What film is playing nearby\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Put any 1972 record on.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Rate the Beyond This Place chronicle three of 6\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"will Dick Tracy e il gas misterioso start twenty one hours from now\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"play a sound track by Vegard Sverre Tveitan\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Play the song Victim Of Changes from Hawkshaw Hawkins on Spotify\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"I need to book a restaurant for eight nearby Limerick one year from now that serves jerky \", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}], \"epochs_done\": 7, \"batches_seen\": 1568, \"train_examples_seen\": 100065, \"loss\": 1.1379986719361372}}\n", - "{\"valid\": {\"eval_examples_count\": 1589, \"metrics\": {\"sets_accuracy\": 0.9622, \"f1_macro\": 0.9617, \"roc_auc\": 0.9982}, \"time_spent\": \"0:00:19\", \"examples\": [{\"x\": \"Book a table at Carter House Inn in Saint Bonaventure, Alaska.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Rate the current textbook one of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find a nearby movie schedule for movies\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"what is the Mississippi for the week\", 
\"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play me a song from 1968 on Spotify\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Book a table for me, naomi and elisabeth at a brasserie with wifi\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"The current album gets three out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find Goodrich Quality Theaters films\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"show me the picture Unfinished Monkey Business\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"When is The Third Eye showing at Dickinson Theatres?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Please get me the Welcome to the Rileys game.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find a song called Bronco Billy.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Rate this essay five stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"add tune to my relax & unwind playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"play 2007 tunes by Bunny Berigan\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"book a table for ten downtown at a close-by restaurant\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Find the schedule for for Corn at eleven A.M. at Loews Cineplex Entertainment.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"1 minute from now, I will need reservations at a restaurant in Vanlue.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Play hanging in the balance by Nik Kershaw on Zvooq.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Will it be windy at 4 Pm in NY?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate my current textbook 1 out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What are the weather conditions in Noel?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Add this artist to the laugh list\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I am rating Book of Challenges four stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"rate this textbook a 4\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Put an album by max richter into my this is Rosana playlist. 
\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"where can i watch animated movies around here\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Is A Man, a Woman, and a Bank showing in the nearest Neighborhood Cinema Group\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"book a popular food truck in Kentucky\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Show me animated movies that are playig at Great Escape Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"add Sara Carter to my Nothing But A Party R&B\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I would like an outdoor cafeteria for 3\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"rate the book Whit a zero\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Find a show called Time Is Just the Same.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"I need the weather in Hubbardston, will it be chillier?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"rate the previous essay four of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Add wiktor coj to the Sleep playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Rate Dixie Lullaby: A Story of Music, Race and New Beginnings in a New South five out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"please put live with me onto my playlist named CARГЃCTER LATINO\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Please add tobymac's song onto the indiespensables playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Show me the movie schedule for Caribbean Cinemas\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Can you put this song on the metal xplorer playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Add this tune to my rage radio playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I want to go see A Troll in Central Park.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give the current series a one.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like to watch animated movies at National Amusements\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"my jazz for loving couples needs more push the button\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"What are the movie schedules for Kerasotes Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"rate the Dry series two out of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want a list of showings of Days of Fire at Harkins Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give White House Diary two points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like the weather forecast in 
Gang Mills four years from now.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate Tropic of Capricorn two stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}], \"epochs_done\": 7, \"batches_seen\": 1568, \"train_examples_seen\": 100065, \"impatience\": 0, \"patience_limit\": 5}}\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:16:35.979 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 169: Did not improve on the sets_accuracy of 0.9622\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\"train\": {\"eval_examples_count\": 64, \"metrics\": {\"sets_accuracy\": 0.9844, \"f1_macro\": 0.9849, \"roc_auc\": 1.0}, \"time_spent\": \"0:00:22\", \"examples\": [{\"x\": \"can you get me the trailer of The Multiversity?\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find the films at ArcLight Hollywood.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Will the weather be temperate 22 minutes from now in Alba\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"I'm looking for a picture titled Rock Painting\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"What's the weather forecast for Haigler?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Plpay my Disco Fever playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Add artist to playlist Epic Gaming\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Show me Rapid City Muscle Car.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find the schedule for Evening Clothes in 1 second.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"add pete shelley to Is It New Wave\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"play some fifties tunes by Mike Mccready\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"play the new Feist on deezer\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Find a show called Ichibyōgoto ni Love for You.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Play some Rockwell from around 1996\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Add Jarvis Cocker to my Chillin' on a Dirt Road playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Please play With Echoes In The Movement Of Stone by Faith Evans.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Is there a blizzard in Tennessee Colony, KS\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"What is the local movie schedule\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"I want to listen to Swing music on Iheart\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"how's the forecast for my current spot\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"my step aunt and I want to go cheese fries at the tavern\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Play me something by Funtwo\", 
\"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"I need to add a tune by Amanda Stern to the playlist cloud rap.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Please get me the Before Crisis: Final Fantasy VII television show.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Book a popular bar in Chowchilla\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Find the TV series I Build the Tower \", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find the schedule for Metallica Through the Never.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"show creativity of Doomsday Comfort\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"is it going to be foggy in Jewell Cemetery State Historic Site 7 weeks from now\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"play Punk Essentials on Zvooq\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Which films are playing at the closest cinema?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"I want to find the video game Masada Anniversary Edition Vol. 3: The Unknown Masada\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"rate this album one stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Is The Right to Strike playing at Star Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"play Latin Dinner\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Rate this current essay a 5.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want to hear that tune from 2010\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Play thirties concerto music on Google Music\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"show movie times at sunrise\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Please look up for the work titled We Own The Night.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Play a song from Helena Iren Michaelsen on Spotify\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"rate the current book two stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Play Elizeth Cardoso to my Nothing But A Party R&B playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Play the most popular music by Ronald Isley on Google Music\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"I want to hear something eclectic\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Go to the saga The Quantum Thief\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"I'd like to see weather conditions for Ennis.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play some songs from the fifties\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"play By The Sleepy Lagoon by Greg Kurstin\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": 
[\"PlayMusic\"]}, {\"x\": \"I want to book a restaurant in the same area where I live in MA for ebony and yolanda.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"What is the weather forecast for Agate Fossil Beds National Monument\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"add this track to my global funk\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"play Zvooq\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"I'd like to rate this textbook 4 out of 6\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Book a reservation for one at a highly rated restaurant in Datil\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"play a symphony that is good from 2000\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"What is the movie schedule today at Neighborhood Cinema Group?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"I'd like to eat salads at a restaurant\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"find Plitt Theatres movie schedules\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"I think this novel only deserves 2 points out of 6.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Is it rainy at the Edward L. Ryerson Conservation Area?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Will it be chillier on october 17 nearby East Glacier Park\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Give the current series a rating of three.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"book an oyster bar in AMerican Samoa for lunch\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}], \"epochs_done\": 8, \"batches_seen\": 1792, \"train_examples_seen\": 114360, \"loss\": 1.127019821533135}}\n", - "{\"valid\": {\"eval_examples_count\": 1589, \"metrics\": {\"sets_accuracy\": 0.9622, \"f1_macro\": 0.9617, \"roc_auc\": 0.9983}, \"time_spent\": \"0:00:22\", \"examples\": [{\"x\": \"Book a table at Carter House Inn in Saint Bonaventure, Alaska.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Rate the current textbook one of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find a nearby movie schedule for movies\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"what is the Mississippi for the week\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play me a song from 1968 on Spotify\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Book a table for me, naomi and elisabeth at a brasserie with wifi\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"The current album gets three out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find Goodrich Quality Theaters films\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"show me the picture Unfinished Monkey Business\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"When is The Third Eye showing at Dickinson Theatres?\", 
\"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Please get me the Welcome to the Rileys game.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find a song called Bronco Billy.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Rate this essay five stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"add tune to my relax & unwind playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"play 2007 tunes by Bunny Berigan\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"book a table for ten downtown at a close-by restaurant\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Find the schedule for for Corn at eleven A.M. at Loews Cineplex Entertainment.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"1 minute from now, I will need reservations at a restaurant in Vanlue.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Play hanging in the balance by Nik Kershaw on Zvooq.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Will it be windy at 4 Pm in NY?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate my current textbook 1 out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What are the weather conditions in Noel?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Add this artist to the laugh list\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I am rating Book of Challenges four stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"rate this textbook a 4\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Put an album by max richter into my this is Rosana playlist. 
\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"where can i watch animated movies around here\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Is A Man, a Woman, and a Bank showing in the nearest Neighborhood Cinema Group\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"book a popular food truck in Kentucky\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Show me animated movies that are playig at Great Escape Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"add Sara Carter to my Nothing But A Party R&B\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I would like an outdoor cafeteria for 3\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"rate the book Whit a zero\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Find a show called Time Is Just the Same.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"I need the weather in Hubbardston, will it be chillier?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"rate the previous essay four of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Add wiktor coj to the Sleep playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Rate Dixie Lullaby: A Story of Music, Race and New Beginnings in a New South five out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"please put live with me onto my playlist named CARГЃCTER LATINO\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Please add tobymac's song onto the indiespensables playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Show me the movie schedule for Caribbean Cinemas\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Can you put this song on the metal xplorer playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Add this tune to my rage radio playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I want to go see A Troll in Central Park.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give the current series a one.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like to watch animated movies at National Amusements\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"my jazz for loving couples needs more push the button\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"What are the movie schedules for Kerasotes Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"rate the Dry series two out of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want a list of showings of Days of Fire at Harkins Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give White House Diary two points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like the weather forecast in 
Gang Mills four years from now.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate Tropic of Capricorn two stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}], \"epochs_done\": 8, \"batches_seen\": 1792, \"train_examples_seen\": 114360, \"impatience\": 1, \"patience_limit\": 5}}\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:16:38.311 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 163: New best sets_accuracy of 0.9629\n", - "2019-02-12 12:16:38.312 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 165: Saving model\n", - "2019-02-12 12:16:38.312 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 386: [saving model to /home/vimary/ipavlov/Pilot/examples/tutorials/cnn_model_v1_opt.json]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\"train\": {\"eval_examples_count\": 64, \"metrics\": {\"sets_accuracy\": 0.9844, \"f1_macro\": 0.9837, \"roc_auc\": 0.9983}, \"time_spent\": \"0:00:24\", \"examples\": [{\"x\": \"Rate my current essay 1 out of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What's the weather in FL?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"play me some Dom Pachino\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Is cloudy in Lyncourt?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Is temperature in Hanksville freezing ?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"play some Bertine Zetlitz record\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"play latest George Ducas music\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"For The Curious Incident of the Dog in the Nightdress I rate it 2 of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"book verdure serving restaurant in Bloom City\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"I want another song in my rock espaГ±ol playlist. \", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Give the current essay five points / 6.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Play show of Cissy Houston\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Add the artist Gwenno Pipette to the sexy as folk playlist. \", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I rate Egg Collecting and Bird Life of Australia a zero out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Book me a tibetan restaurant for my boss and I.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"I give The Logic of Sense a zero\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"rate The Ape-Man Within 4\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"play In The Disco by Danny Hutton\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"What is the weather like right now for Fort Adams?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Add Grey Cloudy Lies to the hip hop playlist. 
\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I'd like to hear the song In a Reverie\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Add por tu maldito amor to my orgullo gay\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Where can I buy The Lying Game\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"rate the current novel 0 of 6\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Please find the movie, A Jingle with Jillian.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"I want to find a restaurant that has a table for two at 5 AM\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"tell me how Bellwood weather is\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Find a table for madge and tami at a faraway joint on Sterling St that serves chicken divan\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"What cinema has the closest movies\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give The Blue Equinox series 5 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Add this album to my hot house playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I need a table booking for a highly rated sardinian pub.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Can you put some monifah on my disco fever playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I'd like to see movie schedules for animated movies around here\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Is F.I.S.T. at Malco Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"What is the weather going to be like in Klondike Gold Rush National Historical Park on february the 28th, 2034?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"show creative picture of The Secret Doctrine\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Book a restaurant in CA for my parents and I on oct. 
the seventeenth\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Need to find the TV series called Administrative Behavior\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find the work I Looked Up\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"rate this album book zero out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want to listen to Merrily We Roll Along by Marko Desantis.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Play the album entitled Se Potrei Avere Te.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Please help me find the video game John Michael Montgomery discography.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Can you find the album SimpleScreenRecorder\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Will it be warm in Kipp Rhode Island one hour and 9 seconds from now?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Find the movie times at Bow Tie Cinemas.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Add Gary Valenciano to the power gaming playlist. \", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Add this tune to the Leche con Chocolate playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"show me the movie times in the neighbourhood\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give The Street five points.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want to book a restaurant in Ayer for 2 people.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Find the Endangered Species song\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"book a gibassier serving tavern in Vermont for nine\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Book me a restaurant reservation for a party of 8 ten hours from now\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Is it warm in Albania at noon\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Is it going to get any hotter in Kerrick?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"book a table for ten in Pollock PA\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"book Guenther House for 6 on Oct. 
24, 2035 in Waddy\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"play my melodious playlist\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Add Hallucinations of Despair to my this is trey songz playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I need to book a restaurant in Burkettsville in 2 years for rhoda adams, roxanne and I\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"give me the local movie times\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"open Itunes and play Kenny Cox most popular concerto\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}], \"epochs_done\": 9, \"batches_seen\": 2016, \"train_examples_seen\": 128655, \"loss\": 1.1169953814574651}}\n", - "{\"valid\": {\"eval_examples_count\": 1589, \"metrics\": {\"sets_accuracy\": 0.9629, \"f1_macro\": 0.9623, \"roc_auc\": 0.9983}, \"time_spent\": \"0:00:24\", \"examples\": [{\"x\": \"Book a table at Carter House Inn in Saint Bonaventure, Alaska.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Rate the current textbook one of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find a nearby movie schedule for movies\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"what is the Mississippi for the week\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play me a song from 1968 on Spotify\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Book a table for me, naomi and elisabeth at a brasserie with wifi\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"The current album gets three out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find Goodrich Quality Theaters films\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"show me the picture Unfinished Monkey Business\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"When is The Third Eye showing at Dickinson Theatres?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Please get me the Welcome to the Rileys game.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find a song called Bronco Billy.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Rate this essay five stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"add tune to my relax & unwind playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"play 2007 tunes by Bunny Berigan\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"book a table for ten downtown at a close-by restaurant\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Find the schedule for for Corn at eleven A.M. 
at Loews Cineplex Entertainment.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"1 minute from now, I will need reservations at a restaurant in Vanlue.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Play hanging in the balance by Nik Kershaw on Zvooq.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Will it be windy at 4 Pm in NY?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate my current textbook 1 out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What are the weather conditions in Noel?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Add this artist to the laugh list\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I am rating Book of Challenges four stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"rate this textbook a 4\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Put an album by max richter into my this is Rosana playlist. \", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"where can i watch animated movies around here\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Is A Man, a Woman, and a Bank showing in the nearest Neighborhood Cinema Group\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"book a popular food truck in Kentucky\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Show me animated movies that are playig at Great Escape Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"add Sara Carter to my Nothing But A Party R&B\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I would like an outdoor cafeteria for 3\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"rate the book Whit a zero\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Find a show called Time Is Just the Same.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"I need the weather in Hubbardston, will it be chillier?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"rate the previous essay four of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Add wiktor coj to the Sleep playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Rate Dixie Lullaby: A Story of Music, Race and New Beginnings in a New South five out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"please put live with me onto my playlist named CARГЃCTER LATINO\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Please add tobymac's song onto the indiespensables playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Show me the movie schedule for Caribbean Cinemas\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Can you put this song on the metal xplorer playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Add this tune to my rage radio playlist\", \"y_predicted\": 
[\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I want to go see A Troll in Central Park.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give the current series a one.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like to watch animated movies at National Amusements\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"my jazz for loving couples needs more push the button\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"What are the movie schedules for Kerasotes Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"rate the Dry series two out of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want a list of showings of Days of Fire at Harkins Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give White House Diary two points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like the weather forecast in Gang Mills four years from now.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate Tropic of Capricorn two stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}], \"epochs_done\": 9, \"batches_seen\": 2016, \"train_examples_seen\": 128655, \"impatience\": 0, \"patience_limit\": 5}}\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:16:40.661 INFO in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 169: Did not improve on the sets_accuracy of 0.9629\n", - "2019-02-12 12:16:40.693 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 103: [loading vocabulary from /home/vimary/ipavlov/Pilot/examples/tutorials/snips/classes.dict]\n", - "2019-02-12 12:16:40.693 INFO in 'deeppavlov.models.embedders.glove_embedder'['glove_embedder'] at line 52: [loading GloVe embeddings from `/home/vimary/ipavlov/Pilot/examples/tutorials/glove.6B.100d.txt`]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\"train\": {\"eval_examples_count\": 64, \"metrics\": {\"sets_accuracy\": 1.0, \"f1_macro\": 1.0, \"roc_auc\": 0.9996}, \"time_spent\": \"0:00:26\", \"examples\": [{\"x\": \"book in town for 3 at a restaurant outdoor that is not far\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Need a table for the day after tomorrow in Clarenceville at the Black Rapids Roadhouse\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"What will the weather be like this tuesday in the area neighboring Rendezvous Mountain Educational State Forest?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate The CIA and the Cult of Intelligence a 5.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Is the forecast windy in Nigeria on Nov. 
the 6th\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Book the nearby Meriton Grand Hotel Tallinn in Missouri.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"please give me the movie schedule\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Book a reservation for 4 for Cherry Hut at Noon\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"I think Memorial Day should have a rating value of 3 and a best rating of 6\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Play some G. V. Prakash Kumar\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"is The Clowns at the nearest cinema \", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"when is Letters from a Porcupine showing at Alamo Drafthouse Cinema\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Play some fun-punk\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"book Guenther House for 6 on Oct. 24, 2035 in Waddy\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"what is the forecast in North Carolina\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"what's the movie schedules for in the neighborhood at the movies\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Book a highly rated food court for 2 people on jul. 4th.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Can you put freddie freeloader on the playlist instrumental madness\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Please get me the Just the Hits 2 TV show.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Book a reservation for seven people at Fraser Mansion in IL\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Use Spotify to play Who Was In My Room Last Night?\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"What is the forecast in Lono\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"I'd like to watch Sherlock Holmes à New York at KB Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Let's listen to the most popular Marty Friedman songs on Zvooq.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"weather in Tioga Colorado\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"I give The Monkey and the Tiger a rating of 2 points.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Find WxHexEditor.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Book a table for my granddaughter and I at the highly rated restaurant that is close by in Tuvalu.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Add The Maid of Amsterdam to my 80s smash hits\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"find the picture Louder Than Bombs\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Will it rain in Barberville\", \"y_predicted\": 
[\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Find a show called Ichibyōgoto ni Love for You.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Will it be warmer near here on jan. the fifteenth?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate The Travels of Lao Can five out of 6\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What time is Sontha Ooru playing\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Rate The Hindus: An Alternative History 3 of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"play latest George Ducas music\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"find movie schedules for Dickinson Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Rate this book a five \", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find me a restaurant in Pembine Montana\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"What is the movie schedule right now for movies around here\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Is it going to be foggy at two am in Barberville\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"What will be the forecast for Belarus in the future around sep. the 22nd, 2020?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"I need to find the work Brotherly Love\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find me the Spartan: Total Warrior painting\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"I'd like to see the television show Best-Of: Design of a Decade 2003–2013\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Add this album to my spotify orchestra cello playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Give this textbook a rating of three.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Give The Irish Filmography saga a rating of 2 out of 6.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What is the weather forecast for Theodore Roosevelt Inaugural National Historic Site\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"play Isaac Hayes on Pandora from love, sweat and beer ep\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Will it be colder in Oswego 16 weeks from now ?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"What is the movies playing at North American Cinemas\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"What will the weather be a nine in Willow River State Park?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Find me the photograph The Late Music\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Get me a Johnny Cool photograph\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"add artist my laundry playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Is the 
forecast colder in Idaho 1 second from now\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"book a spot for 3 at the pizza place\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Find me a showing of The Vanquished that starts nine hours and 1 second from now.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Please help me find the video game John Michael Montgomery discography.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Search for To Heart 2\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Add Nightmares That Surface from Shallow Sleep to michael's Rock Solid playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Find a show called The Inheritors\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}], \"epochs_done\": 10, \"batches_seen\": 2240, \"train_examples_seen\": 142950, \"loss\": 1.10787156862872}}\n", - "{\"valid\": {\"eval_examples_count\": 1589, \"metrics\": {\"sets_accuracy\": 0.9629, \"f1_macro\": 0.9623, \"roc_auc\": 0.9983}, \"time_spent\": \"0:00:27\", \"examples\": [{\"x\": \"Book a table at Carter House Inn in Saint Bonaventure, Alaska.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Rate the current textbook one of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find a nearby movie schedule for movies\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"what is the Mississippi for the week\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play me a song from 1968 on Spotify\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Book a table for me, naomi and elisabeth at a brasserie with wifi\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"The current album gets three out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find Goodrich Quality Theaters films\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"show me the picture Unfinished Monkey Business\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"When is The Third Eye showing at Dickinson Theatres?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Please get me the Welcome to the Rileys game.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find a song called Bronco Billy.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Rate this essay five stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"add tune to my relax & unwind playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"play 2007 tunes by Bunny Berigan\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"book a table for ten downtown at a close-by restaurant\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Find the schedule for for Corn at eleven A.M. 
at Loews Cineplex Entertainment.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"1 minute from now, I will need reservations at a restaurant in Vanlue.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Play hanging in the balance by Nik Kershaw on Zvooq.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Will it be windy at 4 Pm in NY?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate my current textbook 1 out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What are the weather conditions in Noel?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Add this artist to the laugh list\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I am rating Book of Challenges four stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"rate this textbook a 4\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Put an album by max richter into my this is Rosana playlist. \", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"where can i watch animated movies around here\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Is A Man, a Woman, and a Bank showing in the nearest Neighborhood Cinema Group\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"book a popular food truck in Kentucky\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Show me animated movies that are playig at Great Escape Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"add Sara Carter to my Nothing But A Party R&B\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I would like an outdoor cafeteria for 3\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"rate the book Whit a zero\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Find a show called Time Is Just the Same.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"I need the weather in Hubbardston, will it be chillier?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"rate the previous essay four of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Add wiktor coj to the Sleep playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Rate Dixie Lullaby: A Story of Music, Race and New Beginnings in a New South five out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"please put live with me onto my playlist named CARГЃCTER LATINO\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Please add tobymac's song onto the indiespensables playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Show me the movie schedule for Caribbean Cinemas\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Can you put this song on the metal xplorer playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Add this tune to my rage radio playlist\", \"y_predicted\": 
[\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I want to go see A Troll in Central Park.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give the current series a one.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like to watch animated movies at National Amusements\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"my jazz for loving couples needs more push the button\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"What are the movie schedules for Kerasotes Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"rate the Dry series two out of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want a list of showings of Days of Fire at Harkins Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give White House Diary two points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like the weather forecast in Gang Mills four years from now.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate Tropic of Capricorn two stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}], \"epochs_done\": 10, \"batches_seen\": 2240, \"train_examples_seen\": 142950, \"impatience\": 1, \"patience_limit\": 5}}\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:17:00.634 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 302: [initializing `KerasClassificationModel` from saved]\n", - "2019-02-12 12:17:00.963 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 312: [loading weights from cnn_model_v1.h5]\n", - "2019-02-12 12:17:01.131 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 136: Model was successfully initialized!\n", - "Model summary:\n", - "__________________________________________________________________________________________________\n", - "Layer (type) Output Shape Param # Connected to \n", - "==================================================================================================\n", - "input_1 (InputLayer) (None, None, 100) 0 \n", - "__________________________________________________________________________________________________\n", - "conv1d_1 (Conv1D) (None, None, 256) 25856 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "conv1d_2 (Conv1D) (None, None, 256) 51456 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "conv1d_3 (Conv1D) (None, None, 256) 77056 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_1 (BatchNor (None, None, 256) 1024 conv1d_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_2 (BatchNor (None, None, 256) 1024 conv1d_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_3 (BatchNor (None, None, 256) 1024 conv1d_3[0][0] \n", - 
"__________________________________________________________________________________________________\n", - "activation_1 (Activation) (None, None, 256) 0 batch_normalization_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_2 (Activation) (None, None, 256) 0 batch_normalization_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_3 (Activation) (None, None, 256) 0 batch_normalization_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_1 (GlobalM (None, 256) 0 activation_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_2 (GlobalM (None, 256) 0 activation_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_3 (GlobalM (None, 256) 0 activation_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "concatenate_1 (Concatenate) (None, 768) 0 global_max_pooling1d_1[0][0] \n", - " global_max_pooling1d_2[0][0] \n", - " global_max_pooling1d_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "dropout_1 (Dropout) (None, 768) 0 concatenate_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "dense_1 (Dense) (None, 100) 76900 dropout_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_4 (BatchNor (None, 100) 400 dense_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_4 (Activation) (None, 100) 0 batch_normalization_4[0][0] \n", - "__________________________________________________________________________________________________\n", - "dropout_2 (Dropout) (None, 100) 0 activation_4[0][0] \n", - "__________________________________________________________________________________________________\n", - "dense_2 (Dense) (None, 7) 707 dropout_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_5 (BatchNor (None, 7) 28 dense_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_5 (Activation) (None, 7) 0 batch_normalization_5[0][0] \n", - "==================================================================================================\n", - "Total params: 235,475\n", - "Trainable params: 233,725\n", - "Non-trainable params: 1,750\n", - "__________________________________________________________________________________________________\n", - "2019-02-12 12:17:01.431 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 103: [loading vocabulary from /home/vimary/ipavlov/Pilot/examples/tutorials/snips/classes.dict]\n", - "2019-02-12 12:17:01.431 INFO in 'deeppavlov.models.embedders.glove_embedder'['glove_embedder'] at line 52: [loading GloVe embeddings from `/home/vimary/ipavlov/Pilot/examples/tutorials/glove.6B.100d.txt`]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\"valid\": 
{\"eval_examples_count\": 1589, \"metrics\": {\"sets_accuracy\": 0.9629, \"f1_macro\": 0.9623, \"roc_auc\": 0.9983}, \"time_spent\": \"0:00:01\", \"examples\": [{\"x\": \"Book a table at Carter House Inn in Saint Bonaventure, Alaska.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Rate the current textbook one of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find a nearby movie schedule for movies\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"what is the Mississippi for the week\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Play me a song from 1968 on Spotify\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Book a table for me, naomi and elisabeth at a brasserie with wifi\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"The current album gets three out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"find Goodrich Quality Theaters films\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"show me the picture Unfinished Monkey Business\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"When is The Third Eye showing at Dickinson Theatres?\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Please get me the Welcome to the Rileys game.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Find a song called Bronco Billy.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"Rate this essay five stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"add tune to my relax & unwind playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"play 2007 tunes by Bunny Berigan\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"book a table for ten downtown at a close-by restaurant\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Find the schedule for for Corn at eleven A.M. at Loews Cineplex Entertainment.\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"1 minute from now, I will need reservations at a restaurant in Vanlue.\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Play hanging in the balance by Nik Kershaw on Zvooq.\", \"y_predicted\": [\"PlayMusic\"], \"y_true\": [\"PlayMusic\"]}, {\"x\": \"Will it be windy at 4 Pm in NY?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate my current textbook 1 out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"What are the weather conditions in Noel?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Add this artist to the laugh list\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I am rating Book of Challenges four stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"rate this textbook a 4\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Put an album by max richter into my this is Rosana playlist. 
\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"where can i watch animated movies around here\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Is A Man, a Woman, and a Bank showing in the nearest Neighborhood Cinema Group\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"book a popular food truck in Kentucky\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"Show me animated movies that are playig at Great Escape Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"add Sara Carter to my Nothing But A Party R&B\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I would like an outdoor cafeteria for 3\", \"y_predicted\": [\"BookRestaurant\"], \"y_true\": [\"BookRestaurant\"]}, {\"x\": \"rate the book Whit a zero\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Find a show called Time Is Just the Same.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchCreativeWork\"]}, {\"x\": \"I need the weather in Hubbardston, will it be chillier?\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"rate the previous essay four of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"Add wiktor coj to the Sleep playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Rate Dixie Lullaby: A Story of Music, Race and New Beginnings in a New South five out of 6 points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"please put live with me onto my playlist named CARГЃCTER LATINO\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Please add tobymac's song onto the indiespensables playlist.\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Show me the movie schedule for Caribbean Cinemas\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Can you put this song on the metal xplorer playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"Add this tune to my rage radio playlist\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"I want to go see A Troll in Central Park.\", \"y_predicted\": [\"SearchCreativeWork\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give the current series a one.\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like to watch animated movies at National Amusements\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"my jazz for loving couples needs more push the button\", \"y_predicted\": [\"AddToPlaylist\"], \"y_true\": [\"AddToPlaylist\"]}, {\"x\": \"What are the movie schedules for Kerasotes Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"rate the Dry series two out of 6 stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I want a list of showings of Days of Fire at Harkins Theatres\", \"y_predicted\": [\"SearchScreeningEvent\"], \"y_true\": [\"SearchScreeningEvent\"]}, {\"x\": \"Give White House Diary two points\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}, {\"x\": \"I'd like the weather forecast in 
Gang Mills four years from now.\", \"y_predicted\": [\"GetWeather\"], \"y_true\": [\"GetWeather\"]}, {\"x\": \"Rate Tropic of Capricorn two stars\", \"y_predicted\": [\"RateBook\"], \"y_true\": [\"RateBook\"]}]}}\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:17:21.399 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 302: [initializing `KerasClassificationModel` from saved]\n", - "2019-02-12 12:17:21.744 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 312: [loading weights from cnn_model_v1.h5]\n", - "2019-02-12 12:17:21.909 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 136: Model was successfully initialized!\n", - "Model summary:\n", - "__________________________________________________________________________________________________\n", - "Layer (type) Output Shape Param # Connected to \n", - "==================================================================================================\n", - "input_1 (InputLayer) (None, None, 100) 0 \n", - "__________________________________________________________________________________________________\n", - "conv1d_1 (Conv1D) (None, None, 256) 25856 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "conv1d_2 (Conv1D) (None, None, 256) 51456 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "conv1d_3 (Conv1D) (None, None, 256) 77056 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_1 (BatchNor (None, None, 256) 1024 conv1d_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_2 (BatchNor (None, None, 256) 1024 conv1d_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_3 (BatchNor (None, None, 256) 1024 conv1d_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_1 (Activation) (None, None, 256) 0 batch_normalization_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_2 (Activation) (None, None, 256) 0 batch_normalization_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_3 (Activation) (None, None, 256) 0 batch_normalization_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_1 (GlobalM (None, 256) 0 activation_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_2 (GlobalM (None, 256) 0 activation_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_3 (GlobalM (None, 256) 0 activation_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "concatenate_1 (Concatenate) (None, 768) 0 global_max_pooling1d_1[0][0] \n", - " 
global_max_pooling1d_2[0][0] \n", - " global_max_pooling1d_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "dropout_1 (Dropout) (None, 768) 0 concatenate_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "dense_1 (Dense) (None, 100) 76900 dropout_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_4 (BatchNor (None, 100) 400 dense_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_4 (Activation) (None, 100) 0 batch_normalization_4[0][0] \n", - "__________________________________________________________________________________________________\n", - "dropout_2 (Dropout) (None, 100) 0 activation_4[0][0] \n", - "__________________________________________________________________________________________________\n", - "dense_2 (Dense) (None, 7) 707 dropout_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_5 (BatchNor (None, 7) 28 dense_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_5 (Activation) (None, 7) 0 batch_normalization_5[0][0] \n", - "==================================================================================================\n", - "Total params: 235,475\n", - "Trainable params: 233,725\n", - "Non-trainable params: 1,750\n", - "__________________________________________________________________________________________________\n" - ] - } - ], - "source": [ - "# we can train and evaluate model from config\n", - "m = train_model(cnn_config)" - ] - }, - { - "cell_type": "code", - "execution_count": 69, - "metadata": { - "scrolled": true - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:17:21.914 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 103: [loading vocabulary from /home/vimary/ipavlov/Pilot/examples/tutorials/snips/classes.dict]\n", - "2019-02-12 12:17:21.915 INFO in 'deeppavlov.models.embedders.glove_embedder'['glove_embedder'] at line 52: [loading GloVe embeddings from `/home/vimary/ipavlov/Pilot/examples/tutorials/glove.6B.100d.txt`]\n", - "2019-02-12 12:17:42.89 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 302: [initializing `KerasClassificationModel` from saved]\n", - "2019-02-12 12:17:42.406 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 312: [loading weights from cnn_model_v1.h5]\n", - "2019-02-12 12:17:42.569 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 136: Model was successfully initialized!\n", - "Model summary:\n", - "__________________________________________________________________________________________________\n", - "Layer (type) Output Shape Param # Connected to \n", - "==================================================================================================\n", - "input_1 (InputLayer) (None, None, 100) 0 \n", - "__________________________________________________________________________________________________\n", - "conv1d_1 (Conv1D) (None, None, 256) 25856 input_1[0][0] \n", - 
"__________________________________________________________________________________________________\n", - "conv1d_2 (Conv1D) (None, None, 256) 51456 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "conv1d_3 (Conv1D) (None, None, 256) 77056 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_1 (BatchNor (None, None, 256) 1024 conv1d_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_2 (BatchNor (None, None, 256) 1024 conv1d_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_3 (BatchNor (None, None, 256) 1024 conv1d_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_1 (Activation) (None, None, 256) 0 batch_normalization_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_2 (Activation) (None, None, 256) 0 batch_normalization_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_3 (Activation) (None, None, 256) 0 batch_normalization_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_1 (GlobalM (None, 256) 0 activation_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_2 (GlobalM (None, 256) 0 activation_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_3 (GlobalM (None, 256) 0 activation_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "concatenate_1 (Concatenate) (None, 768) 0 global_max_pooling1d_1[0][0] \n", - " global_max_pooling1d_2[0][0] \n", - " global_max_pooling1d_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "dropout_1 (Dropout) (None, 768) 0 concatenate_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "dense_1 (Dense) (None, 100) 76900 dropout_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_4 (BatchNor (None, 100) 400 dense_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_4 (Activation) (None, 100) 0 batch_normalization_4[0][0] \n", - "__________________________________________________________________________________________________\n", - "dropout_2 (Dropout) (None, 100) 0 activation_4[0][0] \n", - "__________________________________________________________________________________________________\n", - "dense_2 (Dense) (None, 7) 707 dropout_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_5 (BatchNor (None, 7) 28 dense_2[0][0] \n", - "__________________________________________________________________________________________________\n", - 
"activation_5 (Activation) (None, 7) 0 batch_normalization_5[0][0] \n", - "==================================================================================================\n", - "Total params: 235,475\n", - "Trainable params: 233,725\n", - "Non-trainable params: 1,750\n", - "__________________________________________________________________________________________________\n" - ] - } - ], - "source": [ - "# or we can just load pre-trained model (conicides with what we did above)\n", - "m = build_model(cnn_config)" - ] - }, - { - "cell_type": "code", - "execution_count": 70, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[['GetWeather']]" - ] - }, - "execution_count": 70, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "m([\"Is it freezing in Offerman, California?\"])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### SklearnComponent classifier on GloVe weighted by TF-IDF embeddings from config" - ] - }, - { - "cell_type": "code", - "execution_count": 71, - "metadata": {}, - "outputs": [], - "source": [ - "logreg_config = {\n", - " \"dataset_reader\": {\n", - " \"class_name\": \"basic_classification_reader\",\n", - " \"x\": \"text\",\n", - " \"y\": \"intents\",\n", - " \"data_path\": \"snips\"\n", - " },\n", - " \"dataset_iterator\": {\n", - " \"class_name\": \"basic_classification_iterator\",\n", - " \"seed\": 42,\n", - " \"split_seed\": 23,\n", - " \"field_to_split\": \"train\",\n", - " \"split_fields\": [\n", - " \"train\",\n", - " \"valid\"\n", - " ],\n", - " \"split_proportions\": [\n", - " 0.9,\n", - " 0.1\n", - " ]\n", - " },\n", - " \"chainer\": {\n", - " \"in\": [\n", - " \"x\"\n", - " ],\n", - " \"in_y\": [\n", - " \"y\"\n", - " ],\n", - " \"pipe\": [\n", - " {\n", - " \"id\": \"classes_vocab\",\n", - " \"class_name\": \"simple_vocab\",\n", - " \"fit_on\": [\n", - " \"y\"\n", - " ],\n", - " \"save_path\": \"./snips/classes.dict\",\n", - " \"load_path\": \"./snips/classes.dict\",\n", - " \"in\": \"y\",\n", - " \"out\": \"y_ids\"\n", - " },\n", - " {\n", - " \"in\": [\n", - " \"x\"\n", - " ],\n", - " \"out\": [\n", - " \"x_vec\"\n", - " ],\n", - " \"fit_on\": [\n", - " \"x\",\n", - " \"y_ids\"\n", - " ],\n", - " \"id\": \"my_tfidf_vectorizer\",\n", - " \"class_name\": \"sklearn_component\",\n", - " \"save_path\": \"tfidf_v2.pkl\",\n", - " \"load_path\": \"tfidf_v2.pkl\",\n", - " \"model_class\": \"sklearn.feature_extraction.text:TfidfVectorizer\",\n", - " \"infer_method\": \"transform\"\n", - " },\n", - " {\n", - " \"in\": \"x\",\n", - " \"out\": \"x_tok\",\n", - " \"id\": \"my_tokenizer\",\n", - " \"class_name\": \"nltk_moses_tokenizer\"\n", - " },\n", - " {\n", - " \"in\": \"x_tok\",\n", - " \"out\": \"x_emb\",\n", - " \"id\": \"my_embedder\",\n", - " \"class_name\": \"glove\",\n", - " \"save_path\": \"./glove.6B.100d.txt\",\n", - " \"load_path\": \"./glove.6B.100d.txt\",\n", - " \"dim\": 100,\n", - " \"pad_zero\": True\n", - " },\n", - " {\n", - " \"class_name\": \"one_hotter\",\n", - " \"id\": \"my_onehotter\",\n", - " \"depth\": \"#classes_vocab.len\",\n", - " \"in\": \"y_ids\",\n", - " \"out\": \"y_onehot\",\n", - " \"single_vector\": True\n", - " },\n", - " {\n", - " \"in\": \"x_tok\",\n", - " \"out\": \"x_weighted_emb\",\n", - " \"class_name\": \"tfidf_weighted\",\n", - " \"id\": \"my_weighted_embedder\",\n", - " \"embedder\": \"#my_embedder\",\n", - " \"tokenizer\": \"#my_tokenizer\",\n", - " \"vectorizer\": \"#my_tfidf_vectorizer\",\n", - " \"mean\": True\n", - " },\n", - " {\n", - " 
\"in\": [\n", - " \"x_weighted_emb\"\n", - " ],\n", - " \"out\": [\n", - " \"y_pred\"\n", - " ],\n", - " \"fit_on\": [\n", - " \"x_weighted_emb\",\n", - " \"y\"\n", - " ],\n", - " \"class_name\": \"sklearn_component\",\n", - " \"main\": True,\n", - " \"save_path\": \"logreg_v3.pkl\",\n", - " \"load_path\": \"logreg_v3.pkl\",\n", - " \"model_class\": \"sklearn.linear_model:LogisticRegression\",\n", - " \"infer_method\": \"predict\",\n", - " \"ensure_list_output\": True\n", - " }\n", - " ],\n", - " \"out\": [\n", - " \"y_pred\"\n", - " ]\n", - " },\n", - " \"train\": {\n", - " \"epochs\": 10,\n", - " \"batch_size\": 64,\n", - " \"metrics\": [\n", - " \"sets_accuracy\"\n", - " ],\n", - " \"show_examples\": False,\n", - " \"validate_best\": True,\n", - " \"test_best\": False\n", - " }\n", - "}\n" - ] - }, - { - "cell_type": "code", - "execution_count": 72, - "metadata": { - "scrolled": true - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:32:01.417 WARNING in 'deeppavlov.dataset_readers.basic_classification_reader'['basic_classification_reader'] at line 96: Cannot find /home/vimary/ipavlov/Pilot/examples/tutorials/snips/valid.csv file\n", - "2019-02-12 12:32:01.417 WARNING in 'deeppavlov.dataset_readers.basic_classification_reader'['basic_classification_reader'] at line 96: Cannot find /home/vimary/ipavlov/Pilot/examples/tutorials/snips/test.csv file\n", - "2019-02-12 12:32:01.418 INFO in 'deeppavlov.dataset_iterators.basic_classification_iterator'['basic_classification_iterator'] at line 73: Splitting field <> to new fields <<['train', 'valid']>>\n", - "2019-02-12 12:32:01.420 WARNING in 'deeppavlov.core.commands.train'['train'] at line 108: \"validate_best\" and \"test_best\" parameters are deprecated. 
Please, use \"evaluation_targets\" list instead\n", - "2019-02-12 12:32:01.421 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 103: [loading vocabulary from /home/vimary/ipavlov/Pilot/examples/tutorials/snips/classes.dict]\n", - "2019-02-12 12:32:01.439 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 89: [saving vocabulary to /home/vimary/ipavlov/Pilot/examples/tutorials/snips/classes.dict]\n", - "2019-02-12 12:32:01.440 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 218: Cannot load model from /home/vimary/ipavlov/Pilot/examples/tutorials/tfidf_v2.pkl\n", - "2019-02-12 12:32:01.441 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 165: Initializing model sklearn.feature_extraction.text:TfidfVectorizer from scratch\n", - "2019-02-12 12:32:01.486 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 108: Fitting model sklearn.feature_extraction.text:TfidfVectorizer\n", - "2019-02-12 12:32:01.587 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 240: Saving model to /home/vimary/ipavlov/Pilot/examples/tutorials/tfidf_v2.pkl\n", - "2019-02-12 12:32:01.603 INFO in 'deeppavlov.models.embedders.glove_embedder'['glove_embedder'] at line 52: [loading GloVe embeddings from `/home/vimary/ipavlov/Pilot/examples/tutorials/glove.6B.100d.txt`]\n", - "2019-02-12 12:32:21.226 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 218: Cannot load model from /home/vimary/ipavlov/Pilot/examples/tutorials/logreg_v3.pkl\n", - "2019-02-12 12:32:21.227 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 165: Initializing model sklearn.linear_model:LogisticRegression from scratch\n", - "2019-02-12 12:32:43.431 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 108: Fitting model sklearn.linear_model:LogisticRegression\n", - "2019-02-12 12:32:45.621 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 240: Saving model to /home/vimary/ipavlov/Pilot/examples/tutorials/logreg_v3.pkl\n", - "2019-02-12 12:32:45.626 WARNING in 'deeppavlov.core.trainers.nn_trainer'['nn_trainer'] at line 295: Using NNTrainer for a pipeline without batched training\n", - "2019-02-12 12:32:45.626 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 240: Saving model to /home/vimary/ipavlov/Pilot/examples/tutorials/logreg_v3.pkl\n", - "2019-02-12 12:32:45.658 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 103: [loading vocabulary from /home/vimary/ipavlov/Pilot/examples/tutorials/snips/classes.dict]\n", - "2019-02-12 12:32:45.659 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 202: Loading model sklearn.feature_extraction.text:TfidfVectorizer from /home/vimary/ipavlov/Pilot/examples/tutorials/tfidf_v2.pkl\n", - "2019-02-12 12:32:45.664 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 209: Model sklearn.feature_extraction.textTfidfVectorizer loaded with parameters\n", - "2019-02-12 12:32:45.665 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 215: Fitting of loaded model can not be continued. 
Model can be fitted from scratch.If one needs to continue fitting, please, look at `warm_start` parameter\n", - "2019-02-12 12:32:45.666 INFO in 'deeppavlov.models.embedders.glove_embedder'['glove_embedder'] at line 52: [loading GloVe embeddings from `/home/vimary/ipavlov/Pilot/examples/tutorials/glove.6B.100d.txt`]\n", - "2019-02-12 12:33:05.258 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 202: Loading model sklearn.linear_model:LogisticRegression from /home/vimary/ipavlov/Pilot/examples/tutorials/logreg_v3.pkl\n", - "2019-02-12 12:33:05.259 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 209: Model sklearn.linear_model.logisticLogisticRegression loaded with parameters\n", - "2019-02-12 12:33:05.259 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 215: Fitting of loaded model can not be continued. Model can be fitted from scratch.If one needs to continue fitting, please, look at `warm_start` parameter\n", - "2019-02-12 12:33:07.749 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 103: [loading vocabulary from /home/vimary/ipavlov/Pilot/examples/tutorials/snips/classes.dict]\n", - "2019-02-12 12:33:07.750 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 202: Loading model sklearn.feature_extraction.text:TfidfVectorizer from /home/vimary/ipavlov/Pilot/examples/tutorials/tfidf_v2.pkl\n", - "2019-02-12 12:33:07.755 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 209: Model sklearn.feature_extraction.textTfidfVectorizer loaded with parameters\n", - "2019-02-12 12:33:07.755 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 215: Fitting of loaded model can not be continued. Model can be fitted from scratch.If one needs to continue fitting, please, look at `warm_start` parameter\n", - "2019-02-12 12:33:07.756 INFO in 'deeppavlov.models.embedders.glove_embedder'['glove_embedder'] at line 52: [loading GloVe embeddings from `/home/vimary/ipavlov/Pilot/examples/tutorials/glove.6B.100d.txt`]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\"valid\": {\"eval_examples_count\": 1589, \"metrics\": {\"sets_accuracy\": 0.9283}, \"time_spent\": \"0:00:03\"}}\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:33:27.702 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 202: Loading model sklearn.linear_model:LogisticRegression from /home/vimary/ipavlov/Pilot/examples/tutorials/logreg_v3.pkl\n", - "2019-02-12 12:33:27.702 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 209: Model sklearn.linear_model.logisticLogisticRegression loaded with parameters\n", - "2019-02-12 12:33:27.703 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 215: Fitting of loaded model can not be continued. 
Model can be fitted from scratch.If one needs to continue fitting, please, look at `warm_start` parameter\n" - ] - } - ], - "source": [ - "# we can train and evaluate model from config\n", - "m = train_model(logreg_config)" - ] - }, - { - "cell_type": "code", - "execution_count": 73, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2019-02-12 12:33:27.742 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 103: [loading vocabulary from /home/vimary/ipavlov/Pilot/examples/tutorials/snips/classes.dict]\n", - "2019-02-12 12:33:27.743 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 202: Loading model sklearn.feature_extraction.text:TfidfVectorizer from /home/vimary/ipavlov/Pilot/examples/tutorials/tfidf_v2.pkl\n", - "2019-02-12 12:33:27.748 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 209: Model sklearn.feature_extraction.textTfidfVectorizer loaded with parameters\n", - "2019-02-12 12:33:27.749 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 215: Fitting of loaded model can not be continued. Model can be fitted from scratch.If one needs to continue fitting, please, look at `warm_start` parameter\n", - "2019-02-12 12:33:27.750 INFO in 'deeppavlov.models.embedders.glove_embedder'['glove_embedder'] at line 52: [loading GloVe embeddings from `/home/vimary/ipavlov/Pilot/examples/tutorials/glove.6B.100d.txt`]\n", - "2019-02-12 12:33:47.483 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 202: Loading model sklearn.linear_model:LogisticRegression from /home/vimary/ipavlov/Pilot/examples/tutorials/logreg_v3.pkl\n", - "2019-02-12 12:33:47.484 INFO in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 209: Model sklearn.linear_model.logisticLogisticRegression loaded with parameters\n", - "2019-02-12 12:33:47.484 WARNING in 'deeppavlov.models.sklearn.sklearn_component'['sklearn_component'] at line 215: Fitting of loaded model can not be continued. Model can be fitted from scratch.If one needs to continue fitting, please, look at `warm_start` parameter\n" - ] - } - ], - "source": [ - "# or we can just load pre-trained model (conicides with what we did above)\n", - "m = build_model(logreg_config)" - ] - }, - { - "cell_type": "code", - "execution_count": 74, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[['GetWeather']]" - ] - }, - "execution_count": 74, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "m([\"Is it freezing in Offerman, California?\"])" - ] - }, - { - "cell_type": "code", - "execution_count": 75, - "metadata": {}, - "outputs": [], - "source": [ - "# let's free memory\n", - "del m" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Bonus: pre-trained CNN model in DeepPavlov" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Download model files (`wiki.en.bin` 8Gb embeddings):" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "! python -m deeppavlov download intents_snips_big" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Evaluate metrics on validation set (no test set provided):" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "! 
python -m deeppavlov evaluate intents_snips_big" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Or one can use model from python code:" - ] - }, - { - "cell_type": "code", - "execution_count": 77, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "from pathlib import Path\n", - "\n", - "import deeppavlov\n", - "from deeppavlov import build_model, evaluate_model\n", - "from deeppavlov.download import deep_download\n", - "\n", - "config_path = Path(deeppavlov.__file__).parent.joinpath('configs/classifiers/intents_snips_big.json')" - ] - }, - { - "cell_type": "code", - "execution_count": 78, - "metadata": { - "scrolled": true - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2018-12-13 18:44:55.284 DEBUG in 'urllib3.connectionpool'['connectionpool'] at line 205: Starting new HTTP connection (1): files.deeppavlov.ai:80\n", - "2018-12-13 18:44:55.341 DEBUG in 'urllib3.connectionpool'['connectionpool'] at line 393: http://files.deeppavlov.ai:80 \"GET /datasets/snips_intents/train.csv.md5 HTTP/1.1\" 200 44\n", - "2018-12-13 18:44:55.346 INFO in 'deeppavlov.download'['download'] at line 115: Skipped http://files.deeppavlov.ai/datasets/snips_intents/train.csv download because of matching hashes\n", - "2018-12-13 18:44:55.348 DEBUG in 'urllib3.connectionpool'['connectionpool'] at line 205: Starting new HTTP connection (1): files.deeppavlov.ai:80\n", - "2018-12-13 18:44:55.540 DEBUG in 'urllib3.connectionpool'['connectionpool'] at line 393: http://files.deeppavlov.ai:80 \"GET /deeppavlov_data/classifiers/intents_snips_v10.tar.gz.md5 HTTP/1.1\" 200 193\n", - "2018-12-13 18:44:55.589 INFO in 'deeppavlov.download'['download'] at line 115: Skipped http://files.deeppavlov.ai/deeppavlov_data/classifiers/intents_snips_v10.tar.gz download because of matching hashes\n", - "2018-12-13 18:44:55.593 DEBUG in 'urllib3.connectionpool'['connectionpool'] at line 205: Starting new HTTP connection (1): files.deeppavlov.ai:80\n", - "2018-12-13 18:44:55.629 DEBUG in 'urllib3.connectionpool'['connectionpool'] at line 393: http://files.deeppavlov.ai:80 \"GET /deeppavlov_data/embeddings/wiki.en.bin.md5 HTTP/1.1\" 200 46\n", - "2018-12-13 18:45:11.617 INFO in 'deeppavlov.download'['download'] at line 115: Skipped http://files.deeppavlov.ai/deeppavlov_data/embeddings/wiki.en.bin download because of matching hashes\n" - ] - } - ], - "source": [ - "# let's download all the required data - model files, embeddings, vocabularies\n", - "deep_download(config_path)" - ] - }, - { - "cell_type": "code", - "execution_count": 79, - "metadata": { - "scrolled": true - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2018-12-13 18:45:11.621 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 100: [loading vocabulary from /home/dilyara/.deeppavlov/models/classifiers/intents_snips_v10/classes.dict]\n", - "2018-12-13 18:45:11.632 INFO in 'deeppavlov.models.embedders.fasttext_embedder'['fasttext_embedder'] at line 52: [loading fastText embeddings from `/home/dilyara/.deeppavlov/downloads/embeddings/wiki.en.bin`]\n", - "2018-12-13 18:45:32.229 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 287: [initializing `KerasClassificationModel` from saved]\n", - "2018-12-13 18:45:32.554 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 297: [loading weights from model.h5]\n", - "2018-12-13 18:45:32.772 INFO 
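The same Python API also allows editing a config before building it, which is handy when the default paths do not suit the environment. A small sketch, assuming this config defines a `MODELS_PATH` variable in its `metadata.variables` section (variable names differ between configs, so check the json first):

```python
# Sketch: read a bundled config, override a path variable, then build the model.
from deeppavlov import build_model, configs
from deeppavlov.core.common.file import read_json

config = read_json(configs.classifiers.intents_snips_big)
# MODELS_PATH is assumed to exist in this config's metadata.variables;
# adjust the variable name if the config uses a different one.
config['metadata']['variables']['MODELS_PATH'] = './my_models'
m = build_model(config, download=True)  # download=True fetches model files and embeddings first
print(m(["Is it freezing in Offerman, California?"]))
```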
in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 137: Model was successfully initialized!\n", - "Model summary:\n", - "__________________________________________________________________________________________________\n", - "Layer (type) Output Shape Param # Connected to \n", - "==================================================================================================\n", - "input_1 (InputLayer) (None, None, 300) 0 \n", - "__________________________________________________________________________________________________\n", - "conv1d_1 (Conv1D) (None, None, 256) 230656 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "conv1d_2 (Conv1D) (None, None, 256) 384256 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "conv1d_3 (Conv1D) (None, None, 256) 537856 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_1 (BatchNor (None, None, 256) 1024 conv1d_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_2 (BatchNor (None, None, 256) 1024 conv1d_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_3 (BatchNor (None, None, 256) 1024 conv1d_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_1 (Activation) (None, None, 256) 0 batch_normalization_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_2 (Activation) (None, None, 256) 0 batch_normalization_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_3 (Activation) (None, None, 256) 0 batch_normalization_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_1 (GlobalM (None, 256) 0 activation_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_2 (GlobalM (None, 256) 0 activation_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_3 (GlobalM (None, 256) 0 activation_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "concatenate_1 (Concatenate) (None, 768) 0 global_max_pooling1d_1[0][0] \n", - " global_max_pooling1d_2[0][0] \n", - " global_max_pooling1d_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "dropout_1 (Dropout) (None, 768) 0 concatenate_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "dense_1 (Dense) (None, 100) 76900 dropout_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_4 (BatchNor (None, 100) 400 dense_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_4 (Activation) 
(None, 100) 0 batch_normalization_4[0][0] \n", - "__________________________________________________________________________________________________\n", - "dropout_2 (Dropout) (None, 100) 0 activation_4[0][0] \n", - "__________________________________________________________________________________________________\n", - "dense_2 (Dense) (None, 7) 707 dropout_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_5 (BatchNor (None, 7) 28 dense_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_5 (Activation) (None, 7) 0 batch_normalization_5[0][0] \n", - "==================================================================================================\n", - "Total params: 1,233,875\n", - "Trainable params: 1,232,125\n", - "Non-trainable params: 1,750\n", - "__________________________________________________________________________________________________\n" - ] - } - ], - "source": [ - "# now one can initialize model\n", - "m = build_model(config_path)" - ] - }, - { - "cell_type": "code", - "execution_count": 80, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[['GetWeather']]" - ] - }, - "execution_count": 80, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "m([\"Is it freezing in Offerman, California?\"])" - ] - }, - { - "cell_type": "code", - "execution_count": 81, - "metadata": {}, - "outputs": [], - "source": [ - "# let's free memory\n", - "del m" - ] - }, - { - "cell_type": "code", - "execution_count": 82, - "metadata": { - "scrolled": true - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2018-12-13 18:45:33.675 WARNING in 'deeppavlov.dataset_readers.basic_classification_reader'['basic_classification_reader'] at line 97: Cannot find /home/dilyara/.deeppavlov/downloads/snips/valid.csv file\n", - "2018-12-13 18:45:33.675 WARNING in 'deeppavlov.dataset_readers.basic_classification_reader'['basic_classification_reader'] at line 97: Cannot find /home/dilyara/.deeppavlov/downloads/snips/test.csv file\n", - "2018-12-13 18:45:33.676 INFO in 'deeppavlov.dataset_iterators.basic_classification_iterator'['basic_classification_iterator'] at line 73: Splitting field <> to new fields <<['train', 'valid']>>\n", - "2018-12-13 18:45:33.679 INFO in 'deeppavlov.core.data.simple_vocab'['simple_vocab'] at line 100: [loading vocabulary from /home/dilyara/.deeppavlov/models/classifiers/intents_snips_v10/classes.dict]\n", - "2018-12-13 18:45:33.680 INFO in 'deeppavlov.models.embedders.fasttext_embedder'['fasttext_embedder'] at line 52: [loading fastText embeddings from `/home/dilyara/.deeppavlov/downloads/embeddings/wiki.en.bin`]\n", - "2018-12-13 18:45:54.568 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 287: [initializing `KerasClassificationModel` from saved]\n", - "2018-12-13 18:45:54.913 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 297: [loading weights from model.h5]\n", - "2018-12-13 18:45:55.112 INFO in 'deeppavlov.models.classifiers.keras_classification_model'['keras_classification_model'] at line 137: Model was successfully initialized!\n", - "Model summary:\n", - "__________________________________________________________________________________________________\n", - "Layer (type) Output Shape Param # Connected to \n", - 
"==================================================================================================\n", - "input_1 (InputLayer) (None, None, 300) 0 \n", - "__________________________________________________________________________________________________\n", - "conv1d_1 (Conv1D) (None, None, 256) 230656 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "conv1d_2 (Conv1D) (None, None, 256) 384256 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "conv1d_3 (Conv1D) (None, None, 256) 537856 input_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_1 (BatchNor (None, None, 256) 1024 conv1d_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_2 (BatchNor (None, None, 256) 1024 conv1d_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_3 (BatchNor (None, None, 256) 1024 conv1d_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_1 (Activation) (None, None, 256) 0 batch_normalization_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_2 (Activation) (None, None, 256) 0 batch_normalization_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_3 (Activation) (None, None, 256) 0 batch_normalization_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_1 (GlobalM (None, 256) 0 activation_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_2 (GlobalM (None, 256) 0 activation_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "global_max_pooling1d_3 (GlobalM (None, 256) 0 activation_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "concatenate_1 (Concatenate) (None, 768) 0 global_max_pooling1d_1[0][0] \n", - " global_max_pooling1d_2[0][0] \n", - " global_max_pooling1d_3[0][0] \n", - "__________________________________________________________________________________________________\n", - "dropout_1 (Dropout) (None, 768) 0 concatenate_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "dense_1 (Dense) (None, 100) 76900 dropout_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_4 (BatchNor (None, 100) 400 dense_1[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_4 (Activation) (None, 100) 0 batch_normalization_4[0][0] \n", - "__________________________________________________________________________________________________\n", - "dropout_2 (Dropout) (None, 100) 0 activation_4[0][0] \n", - "__________________________________________________________________________________________________\n", - "dense_2 
(Dense) (None, 7) 707 dropout_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "batch_normalization_5 (BatchNor (None, 7) 28 dense_2[0][0] \n", - "__________________________________________________________________________________________________\n", - "activation_5 (Activation) (None, 7) 0 batch_normalization_5[0][0] \n", - "==================================================================================================\n", - "Total params: 1,233,875\n", - "Trainable params: 1,232,125\n", - "Non-trainable params: 1,750\n", - "__________________________________________________________________________________________________\n", - "2018-12-13 18:45:55.113 INFO in 'deeppavlov.core.commands.train'['train'] at line 207: Testing the best saved model\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{\"valid\": {\"eval_examples_count\": 1589, \"metrics\": {\"sets_accuracy\": 0.9824, \"f1_macro\": 0.982, \"roc_auc\": 0.9986}, \"time_spent\": \"0:00:01\"}}\n" - ] - }, - { - "data": { - "text/plain": [ - "{'valid': OrderedDict([('sets_accuracy', 0.9824),\n", - " ('f1_macro', 0.982),\n", - " ('roc_auc', 0.9986)])}" - ] - }, - "execution_count": 82, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# or one can evaluate model WITHOUT training\n", - "evaluate_model(config_path)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - }, - "accelerator": "GPU", - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/examples/gobot_extended_tutorial.ipynb b/examples/gobot_extended_tutorial.ipynb deleted file mode 100644 index d3173c5ea0..0000000000 --- a/examples/gobot_extended_tutorial.ipynb +++ /dev/null @@ -1,1387 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "K7nBJnADTgUw" - }, - "source": [ - "### You can also run the notebook in [COLAB](https://colab.research.google.com/github/deepmipt/DeepPavlov/blob/master/examples/gobot_extended_tutorial.ipynb)." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "iPbAiv8KTgU4" - }, - "source": [ - "# Goal-oriented bot in DeepPavlov" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "us6IsTUETgU5" - }, - "source": [ - "This tutorial describes how to build a goal/task-oriented dialogue system with DeepPavlov framework. It covers the following steps:\n", - "\n", - "0. [Data preparation](#0.-Data-Preparation)\n", - "1. [Build Database of items](#1.-Build-Database-of-items)\n", - "2. [Build Slot Filler](#2.-Build-Slot-Filler)\n", - "3. [Build and Train a Bot](#3.-Build-and-Train-a-Bot)\n", - "4. 
[Interact with bot](#4.-Interact-with-Bot)\n", - "\n", - "An example of the final model served as a telegram bot:\n", - "\n", - "![gobot_example.png](https://github.com/deepmipt/DeepPavlov/blob/master/examples/img/gobot_example.png?raw=1)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 806 - }, - "colab_type": "code", - "id": "Vtu-7ns2TgUz", - "outputId": "8cdc252f-1a35-4ed3-bf0a-f54046d8c6a8" - }, - "outputs": [], - "source": [ - "!pip install deeppavlov\n", - "!python -m deeppavlov install gobot_simple_dstc2" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "4R066YWhTgU6" - }, - "source": [ - "## 0. Data Preparation" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "gppbVe-HTgU7" - }, - "source": [ - "In this tutorial we build a chatbot for restaurant booking. To train our chatbot we use [Dialogue State Tracking Challenge 2 (DSTC-2)](http://camdial.org/~mh521/dstc/) dataset. DSTC-2 provides dialogues of a human talking to a booking system labelled with slots and dialogue actions. These labels will be used for training a dialogue policy network.\n", - "\n", - "First of all let's take a quick look at the data for the task. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 137 - }, - "colab_type": "code", - "id": "K9lF3QFJTgU8", - "outputId": "6ab259e2-3f88-4b25-9371-21d3f38fcef3" - }, - "outputs": [], - "source": [ - "from deeppavlov.dataset_readers.dstc2_reader import SimpleDSTC2DatasetReader\n", - "\n", - "data = SimpleDSTC2DatasetReader().read('my_data')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 50 - }, - "colab_type": "code", - "id": "uu56jAGJTgVD", - "outputId": "1536bb2c-6c1f-45a6-c0a7-a92106ed7dfe" - }, - "outputs": [], - "source": [ - "!ls my_data" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "HmNmE80MTgVG" - }, - "source": [ - "The training/validation/test data are stored in json files (`simple-dstc2-trn.json`, `simple-dstc2-val.json` and `simple-dstc2-tst.json`):" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 1000 - }, - "colab_type": "code", - "id": "LIm9DQyzTgVH", - "outputId": "0a82c3f1-8afb-42d5-e3e3-0e9dd9178a20" - }, - "outputs": [], - "source": [ - "!head -n 101 my_data/simple-dstc2-trn.json" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "zO4CWg0XYNSw" - }, - "source": [ - "To iterate over batches of preprocessed DSTC-2 we need to import `DatasetIterator`." 
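Before wrapping the data into an iterator, it can be useful to peek at what the reader returned: a plain dict with `train`, `valid` and `test` splits, each holding a list of dialogues. A short sketch, assuming the `data` variable produced by the reader call above:

```python
from pprint import pprint

# Sketch: inspect the dict returned by SimpleDSTC2DatasetReader.read().
for split in ('train', 'valid', 'test'):
    print(f"{split}: {len(data.get(split, []))} dialogues")

# Look at the first couple of turns of the first training dialogue
# (exact keys of the turn dicts may differ between reader versions).
pprint(data['train'][0][:2])
```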
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": {}, - "colab_type": "code", - "id": "piBBcw9ZTgVK", - "scrolled": true - }, - "outputs": [], - "source": [ - "from deeppavlov.dataset_iterators.dialog_iterator import DialogDatasetIterator\n", - "\n", - "iterator = DialogDatasetIterator(data)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "jVU5JGnTTgVM" - }, - "source": [ - "You can now iterate over batches of preprocessed DSTC-2 dialogs:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 1000 - }, - "colab_type": "code", - "id": "1RSwEH3CTgVN", - "outputId": "b2a0ecdb-89d1-4784-eeb9-749f7b754ff6" - }, - "outputs": [], - "source": [ - "from pprint import pprint\n", - "\n", - "for dialog in iterator.gen_batches(batch_size=1, data_type='train'):\n", - " turns_x, turns_y = dialog\n", - " \n", - " print(\"User utterances:\\n----------------\\n\")\n", - " pprint(turns_x[0], indent=4)\n", - " print(\"\\nSystem responses:\\n-----------------\\n\")\n", - " pprint(turns_y[0], indent=4)\n", - " \n", - " break" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "AKTZWtm8ZtPi" - }, - "source": [ - "In real-life annotation of data is expensive. To make our tutorial closer to production use-cases we take only 50 dialogues for training." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": {}, - "colab_type": "code", - "id": "UlappYTbTgVT" - }, - "outputs": [], - "source": [ - "!cp my_data/simple-dstc2-trn.json my_data/simple-dstc2-trn.full.json" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 33 - }, - "colab_type": "code", - "id": "tTU9yM-CTgVX", - "outputId": "1568aaed-7f8e-4f77-a637-cda5a9556740" - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "NUM_TRAIN = 50\n", - "\n", - "with open('my_data/simple-dstc2-trn.full.json', 'rt') as fin:\n", - " data = json.load(fin)\n", - "with open('my_data/simple-dstc2-trn.json', 'wt') as fout:\n", - " json.dump(data[:NUM_TRAIN], fout, indent=2)\n", - "print(f\"Train set is reduced to {NUM_TRAIN} dialogues (out of {len(data)}).\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "l5mjRphbTgVb" - }, - "source": [ - "## 1. Build Database of items" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "n597CLhqjqcd" - }, - "source": [ - "### Building database of restaurants" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "nJFkgfjTTgVf" - }, - "source": [ - "To assist with restaurant booking the chatbot should have access to a `database` of restaurants. 
The `database` contains task-specific information such as type of food, price range, location, etc.\n", - "\n", - " >> database([{'pricerange': 'cheap', 'area': 'south'}])\n", - " \n", - " Out[1]: \n", - " [[{'name': 'the lucky star',\n", - " 'food': 'chinese',\n", - " 'pricerange': 'cheap',\n", - " 'area': 'south',\n", - " 'addr': 'cambridge leisure park clifton way cherry hinton',\n", - " 'phone': '01223 244277',\n", - " 'postcode': 'c.b 1, 7 d.y'},\n", - " {'name': 'nandos',\n", - " 'food': 'portuguese',\n", - " 'pricerange': 'cheap',\n", - " 'area': 'south',\n", - " 'addr': 'cambridge leisure park clifton way',\n", - " 'phone': '01223 327908',\n", - " 'postcode': 'c.b 1, 7 d.y'}]]\n", - " " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "rNpewHp-TgVd" - }, - "source": [ - " \n", - "![gobot_database.png](https://github.com/deepmipt/DeepPavlov/blob/master/examples/img/gobot_database.png?raw=1)\n", - " " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "-TU-NLnNa9tk" - }, - "source": [ - "The chatbot should be trained to make api calls. For this, training dataset contains a `\"db_result\"` dictionary key. It annotates turns where system performs an api call to the database of items. Rusulting value is stored in `\"db_result\"`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "colab_type": "code", - "id": "EVNRZmeiTgVh", - "outputId": "edba5e2b-235f-423f-8bfa-8d02506c4c7e" - }, - "outputs": [], - "source": [ - "!head -n 78 my_data/simple-dstc2-trn.json | tail +51" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "GT4YBHMnl0Xd" - }, - "source": [ - "Set `primary_keys` to a list of slot names that have unique values for different items (common SQL term). For the case of DSTC-2, the primary slot is a restaurant name." 
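Under the hood this is essentially an ordinary SQLite table whose rows are restaurants and whose primary key is the `name` slot. A rough stand-alone illustration with the standard `sqlite3` module; the table name, columns and rows below are illustrative, while the DeepPavlov component infers the schema from the items it is fitted on:

```python
# Sketch: what a restaurant database keyed by "name" roughly looks like in raw SQLite.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE restaurants (name TEXT PRIMARY KEY, food TEXT, pricerange TEXT, area TEXT)")

rows = [('the lucky star', 'chinese', 'cheap', 'south'),
        ('nandos', 'portuguese', 'cheap', 'south')]
# INSERT OR REPLACE keeps a single row per primary key, so re-adding an item simply updates it.
conn.executemany("INSERT OR REPLACE INTO restaurants VALUES (?, ?, ?, ?)", rows)

# The api_call the bot learns to issue is essentially a filtered SELECT:
query = "SELECT * FROM restaurants WHERE pricerange = ? AND area = ?"
print(conn.execute(query, ('cheap', 'south')).fetchall())
```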
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "colab_type": "code", - "id": "JjKbIAyaTgVk", - "outputId": "07620401-80f5-490a-cff2-5d5f013a365b" - }, - "outputs": [], - "source": [ - "from deeppavlov.core.data.sqlite_database import Sqlite3Database\n", - "\n", - "database = Sqlite3Database(primary_keys=[\"name\"],\n", - " save_path=\"my_bot/db.sqlite\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "a2e1u-z0TgVo" - }, - "source": [ - "\n", - "Let's find all `\"db_result\"` api call results and add them to our database of restaurants:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "colab_type": "code", - "id": "RlKg5UtqTgVp", - "outputId": "a387df1f-4418-498b-a125-9e351a8e0cf9" - }, - "outputs": [], - "source": [ - "db_results = []\n", - "\n", - "for dialog in iterator.gen_batches(batch_size=1, data_type='all'):\n", - " turns_x, turns_y = dialog\n", - " db_results.extend(x['db_result'] for x in turns_x[0] if x.get('db_result'))\n", - "\n", - "print(f\"Adding {len(db_results)} items.\")\n", - "if db_results:\n", - " database.fit(db_results)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "XeJMI9qaTgVt" - }, - "source": [ - "### Interacting with database" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "2JLUF2b_TgVu" - }, - "source": [ - "We can now play with the database and make requests to it:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "colab_type": "code", - "id": "VRCU_MJnTgVv", - "outputId": "017803c4-36ab-49bc-ae40-7df87356f5c2" - }, - "outputs": [], - "source": [ - "database([{'pricerange': 'cheap', 'area': 'south'}])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "colab_type": "code", - "id": "U2wOAIlpTgV1", - "outputId": "e83e53b9-3431-4d1c-9bed-0e841d2b6fc4" - }, - "outputs": [], - "source": [ - "!ls my_bot" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "mBoO34NzTgV4" - }, - "source": [ - "## 2. 
Build Slot Filler" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "TGlJRwTCYkiQ" - }, - "source": [ - "`Slot Filler` is a component that finds slot values in user input:\n", - "\n", - " >> slot_filler(['I would like some chineese food'])\n", - " \n", - " Out[1]: [{'food': 'chinese'}]\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "5RqXeLdTTgV4" - }, - "source": [ - " \n", - "![gobot_slotfiller.png](https://github.com/deepmipt/DeepPavlov/blob/master/examples/img/gobot_slotfiller.png?raw=1)\n", - " " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "TcJGPFq4TgV5" - }, - "source": [ - "To implement a `Slot Filler` you need to provide\n", - " \n", - " - **slot types**,\n", - " - all possible **slot values**,\n", - " - also, it is good to have examples of mentions for every value of each slot.\n", - " \n", - "In this tutorial, a schema for `slot types` and `slot values` should be defined in `slot_vals.json` with the following format:\n", - "\n", - " {\n", - " 'food': {\n", - " 'chinese': ['chinese', 'chineese', 'chines'],\n", - " 'french': ['french', 'freench'],\n", - " 'dontcare': ['any food', 'any type of food']\n", - " }\n", - " }\n", - " \n", - "\n", - "Let's use a simple non-trainable slot filler that relies on Levenshtein distance." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "colab_type": "code", - "id": "zVi5XynnTgV6", - "outputId": "e9d68c8c-3bbb-4f80-98a5-92cbfe0eb5ac" - }, - "outputs": [], - "source": [ - "from deeppavlov.download import download_decompress\n", - "\n", - "download_decompress(url='http://files.deeppavlov.ai/deeppavlov_data/dstc_slot_vals.tar.gz',\n", - " download_path='my_bot/slotfill')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "colab_type": "code", - "id": "NR1S3PXCTgV9", - "outputId": "013e9dba-427c-4255-aad5-0627477157e8" - }, - "outputs": [], - "source": [ - "!ls my_bot/slotfill" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "-OZ9TqDKZ6Fv" - }, - "source": [ - "Print some `slot types` and `slot values`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "colab_type": "code", - "id": "KqgfYr4RTgWE", - "outputId": "a6830aa3-0bcc-4011-a4ab-5b5e48e6a20f" - }, - "outputs": [], - "source": [ - "!head -n 10 my_bot/slotfill/dstc_slot_vals.json" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "eIufDAvATgWN" - }, - "source": [ - "Check performance of our slot filler on DSTC-2 dataset." 
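To make the Levenshtein-based matching concrete, here is a minimal fuzzy slot filler in plain Python. The tiny slot schema mirrors the example above, and the distance threshold is an arbitrary choice rather than the one used by DeepPavlov's component:

```python
# Sketch: a tiny fuzzy slot filler based on Levenshtein (edit) distance.
SLOT_VALS = {
    'food': {
        'chinese': ['chinese', 'chineese', 'chines'],
        'french': ['french', 'freench'],
    }
}

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def fill_slots(utterance: str, max_dist: int = 1) -> dict:
    """Return {slot: canonical_value} for every token close enough to a known mention."""
    found = {}
    for token in utterance.lower().split():
        for slot, values in SLOT_VALS.items():
            for canonical, mentions in values.items():
                if any(levenshtein(token, m) <= max_dist for m in mentions):
                    found[slot] = canonical
    return found

print(fill_slots("I would like some chineese food"))  # -> {'food': 'chinese'}
```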
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": {}, - "colab_type": "code", - "id": "XUSj5R3uTgWP" - }, - "outputs": [], - "source": [ - "from deeppavlov import configs\n", - "from deeppavlov.core.common.file import read_json\n", - "\n", - "slotfill_config = read_json(configs.ner.slotfill_simple_dstc2_raw)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "pFda6_LBTgWT" - }, - "source": [ - "We take [original DSTC2 slot-filling config](https://github.com/deepmipt/DeepPavlov/blob/master/deeppavlov/configs/ner/slotfill_dstc2_raw.json) from DeepPavlov and change variables determining data paths:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": {}, - "colab_type": "code", - "id": "yr8MbFLwTgWV" - }, - "outputs": [], - "source": [ - "slotfill_config['metadata']['variables']['DATA_PATH'] = 'my_data'\n", - "slotfill_config['metadata']['variables']['SLOT_VALS_PATH'] = 'my_bot/slotfill/dstc_slot_vals.json'" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "ZxMTySrpaZVP" - }, - "source": [ - "Run evaluation." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "colab_type": "code", - "id": "CdrDW4bVTgWZ", - "outputId": "ac56ae74-b368-437e-c70f-01b418ba883f" - }, - "outputs": [], - "source": [ - "from deeppavlov import evaluate_model\n", - "\n", - "slotfill = evaluate_model(slotfill_config);" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "azulujiLTgWb" - }, - "source": [ - "We've got slot accuracy of **93% on valid** set and **95% on test** set." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "FkZvQ-yNig1u" - }, - "source": [ - "Building `Slot Filler` model from DeepPavlov config." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": {}, - "colab_type": "code", - "id": "uWeXTtVhTgWc" - }, - "outputs": [], - "source": [ - "from deeppavlov import build_model\n", - "\n", - "slotfill = build_model(slotfill_config)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "ihi4lpXUi-_V" - }, - "source": [ - "Testing the model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "colab_type": "code", - "id": "bMRSU_bnTgWf", - "outputId": "d224e4be-1537-428d-ff67-55076224946d" - }, - "outputs": [], - "source": [ - "slotfill(['i want cheap chinee food'])" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "U2PUxB5fTgWl" - }, - "source": [ - "Saving slotfill config file to disk (we will require it's path later)." 
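Once the cell below has written `my_bot/slotfill_config.json`, the slot filler can be rebuilt later from that file alone, which is why the tutorial keeps its path. A small sketch:

```python
# Sketch: rebuild the slot filler from the saved config file in a later session.
from deeppavlov import build_model
from deeppavlov.core.common.file import read_json

slotfill = build_model(read_json('my_bot/slotfill_config.json'))
print(slotfill(['i want cheap chinee food']))
```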
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": {}, - "colab_type": "code", - "id": "5MyFaEM7TgWl" - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "json.dump(slotfill_config, open('my_bot/slotfill_config.json', 'wt'))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "colab_type": "code", - "id": "_ZlRvicuTgWo", - "outputId": "4f1c3d46-d3b1-4923-823e-e2df1027fc6f" - }, - "outputs": [], - "source": [ - "!ls my_bot" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "E_InRKO6TgWt" - }, - "source": [ - "## 3. Build and Train a Bot" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "ySe2m9-5m6iW" - }, - "source": [ - "### Dialogue policy and response templates" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "qjwbkeDl3TBg" - }, - "source": [ - "A policy module of the bot decides what action should be taken in the current dialogue state. The policy in our bot is implemented as a recurrent neural network (recurrency over user utterances) followed by a dense layer with softmax function on top. The network classifies user input into one of predefined system actions. Examples of possible actions are to say hello, to request user's location or to make api call to a database. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "wLE1iebG3WJc" - }, - "source": [ - "![gobot_policy.png](https://github.com/deepmipt/DeepPavlov/blob/master/examples/img/gobot_policy.png?raw=1)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "ghF-W56m3iW-" - }, - "source": [ - "All actions available for the system should be listed in a `simple-dstc2-templates.txt` file. Also, every action should be associated with a template string of the corresponding system response." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "TjDnGyiN3nIr" - }, - "source": [ - "![gobot_templates.png](https://github.com/deepmipt/DeepPavlov/blob/master/examples/img/gobot_templates.png?raw=1)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "colab_type": "text", - "id": "-xqGKtXBTgWu" - }, - "source": [ - "Templates for responses should be in the format `TAB
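Each line of the templates file pairs an action name with a response template, separated by a tab. A rough sketch of how such a file can be read and a response rendered; the action names, the `#slot` placeholder syntax and the example templates are illustrative rather than copied from the dataset:

```python
# Sketch: load a TAB-separated "action<TAB>template" file and render a response.
# Placeholder syntax (#slot) and the example lines are illustrative.
from io import StringIO

TEMPLATES_TXT = (
    "greet\tHello, welcome to the Cambridge restaurant system.\n"
    "inform_food\t#name serves #food food.\n"
    "request_area\tWhat part of town do you have in mind?\n"
)

def load_templates(fobj):
    templates = {}
    for line in fobj:
        action, template = line.rstrip('\n').split('\t', 1)
        templates[action] = template
    return templates

def render(action: str, slots: dict, templates: dict) -> str:
    response = templates[action]
    for slot, value in slots.items():
        response = response.replace('#' + slot, value)
    return response

templates = load_templates(StringIO(TEMPLATES_TXT))
print(render('inform_food', {'name': 'the lucky star', 'food': 'chinese'}, templates))
```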

[git binary patch data omitted]
diff --git a/docs/_static/social/f_logo_RGB-Blue_58.png b/docs/_static/social/f_logo_RGB-Blue_58.png
deleted file mode 100644
index 743ec2d28b193e1ee5005f38646ac3651e52e519..0000000000000000000000000000000000000000
GIT binary patch
[git binary patch data omitted]

Problem? Ask a Question or try our Demo