19 Jun 14:07

nicolay-r

2a7fa1b

AREkit-0.23.1

Main Updates

Full Changelog

Implemented enhancements:

NativeCsvWriter -- sync delimiter with other CSV formatters #486

v0.23.1-rc (2023-06-02)

Full Changelog

Implemented enhancements:

filters=[] -- consider the case of None by default [Paper feedback] #479
opinions=[] -- simplify usage of API [paper feedback] #478
BaseSerializerPipelineItem -- required by arekit-ss #476
Neural Network Serializer -- rows_provider should be declared outside [paper backlog/arekit_ss project] #475
Streaming -- support JSON output format #474
RuAttitudesDocumentProvider -- refactor to follow the structure of the rest resources #470
Support None for get_doc_existed_opinion_func [user/paper feedback] #469
SynonymsCollection -- setup default value of iter_group_values_lists to [] #468
DOC_ID column -- remove int type limitation #463
Streaming -- provide header column names for CSV #462
tqdm -- display amount of processed documents in progress-bar [Project Gutenberg backlog] #461
OpinionCollection -- iter_sentiment method is not in use anymore #456
OpinionCollection -- the case of None for opinion results in incomplete initialization #455
OpinionCollection -- copy method is not in use anymore #454
OpinionCollection -- consider opinions=[] by default in, i.e. empty collection. #453
synonyms.py -- is empty and might be removed [QUICK check and fix] #451
Pandas -- completely remove dependencies #450
BertTextBTemplates -- switch name to prompts #446
RuSentRel -- embed train and test indices in collection #444
SentiNEREL -- entity filter #443
SentiNEREL -- move from another project [NIVTS project backlog, RuSentNE competitions] #439
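
Several items above (#479, #478, #455, #453) concern accepting `None` where a list such as `opinions` or `filters` is expected. The following is a minimal sketch of that pattern only; `OpinionCollectionSketch` is a hypothetical stand-in and not AREkit's actual class or signature.

```python
from typing import Iterable, List, Optional


class OpinionCollectionSketch:
    """Hypothetical collection that treats `opinions=None` as empty."""

    def __init__(self, opinions: Optional[Iterable[str]] = None):
        # None is used as the default instead of a mutable `[]`, so that
        # instances never share one list; None means "empty collection".
        self._opinions: List[str] = list(opinions) if opinions is not None else []

    def __len__(self) -> int:
        return len(self._opinions)

    def add(self, opinion: str) -> None:
        self._opinions.append(opinion)


empty = OpinionCollectionSketch()       # no argument at all
other = OpinionCollectionSketch(None)   # explicit None
empty.add("subject->object")
```

With this default, adding to one empty collection does not affect another, and callers are not forced to pass an empty list explicitly.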

Fixed bugs:

Network module -- context constant has a predefined text value which is limited for networks only #485
read_ruattitudes_to_brat_in_memory -- case of keep_doc_ids_only==True causes exception #482
prompt -- object non subscriptable #481
fill -- in case of None rows count tqdm throws exception #458
create_sample_provider -- misused parameter #445
CroppedBertSampleRowProvider -- might crop with references outside of the bounds [googletranslate-feedback] #440
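
The cropping bug (#440) is about a cropped window leaving entity references outside of its bounds. As an illustration of the general idea, assuming a sentence given as a list of terms and two entity positions, a window can be clamped to the sentence and the indices shifted accordingly. The function `crop_window` and its parameters are hypothetical and do not reproduce the actual fix.

```python
from typing import List, Tuple


def crop_window(terms: List[str], s_ind: int, t_ind: int,
                window: int) -> Tuple[List[str], int, int]:
    """Crop `terms` to at most `window` items that contain both positions.

    Returns the cropped terms and the two indices shifted into the crop.
    """
    if not (0 <= s_ind < len(terms) and 0 <= t_ind < len(terms)):
        raise IndexError("entity index is outside of the sentence")
    left, right = min(s_ind, t_ind), max(s_ind, t_ind)
    if right - left + 1 > window:
        raise ValueError("both entities do not fit into the window")

    # Spread the spare room around the pair, then clamp to the sentence.
    spare = window - (right - left + 1)
    start = max(0, left - spare // 2)
    end = min(len(terms), start + window)
    start = max(0, end - window)

    return terms[start:end], s_ind - start, t_ind - start


cropped, s, t = crop_window(list("abcdefghij"), s_ind=8, t_ind=9, window=4)
```

Here the pair sits at the end of the sentence, so the window is pulled back to stay within bounds while both shifted indices still point at the original terms.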

Closed issues:

Shortening to RuSentRelOpinions.iter_from_doc #480
InputTextOpinionProvider -- rename to ContentsProvider #473
RuSentiFramesCollection.read -- rename method read_collection to read [paper feedback] #472
DocumentOperation -- provide directory-based document provider by default [Project Gutenberg feedback] #467
Stream writing #459
dist_in_sent=0 by default #452
Evaluation -- is not a part of the AREkit soon #449
Prompting -- collect base classes that allows such input processing #447
SentiNEREL -- move split_fixed.txt into the data SentiNEREL data archive. #442
What's new in 0.23.0 #401

Merged pull requests:

CVE-2007-4559 Patch #412 (TrellixVulnTeam)
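
CVE-2007-4559 is a path traversal issue in extraction with Python's `tarfile` module: an archive member named like `../x` can be written outside of the target directory. The sketch below shows the kind of check such patches commonly add before extraction; it is an illustration under that assumption, and the exact code merged in #412 is not reproduced here.

```python
import os
import tarfile


def is_within_directory(directory: str, target: str) -> bool:
    """True if `target` resolves to a path inside `directory`."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extract(tar: tarfile.TarFile, path: str = ".") -> None:
    """Extract all members only if none of them escapes `path`."""
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError("Attempted path traversal in tar file: " + member.name)
    tar.extractall(path)
```

Calling `safe_extract(tar, target_dir)` instead of `tar.extractall(target_dir)` rejects the whole archive as soon as one member points outside of `target_dir`.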

* This Changelog was automatically generated by github_changelog_generator

21 Jan 11:15

nicolay-r

v0.23.0-rc

a2f6fe8

AREkit-0.23.0-ChineseNY

What's new: Globalization and Internalization

Globalization for any language is the major aspect of 0.23.0, since we announce AREnets and sample-transfer.
We generalize several aspects in order to support languages other than the original one (Russian).
We introduce CompoundEntities which may include other entities.

Major

Nested/Compound entities support! #398
Detaching networks contrib module #423 -> AREnets
Appearance of transfer: https://github.com/nicolay-r/arekit-googletrans-sampler

Fixed bugs

Refactored BRAT parser, fixed bugs for other languages/collections.

Minor

#375
Internalization (#435)

Full Changelog

Implemented enhancements:

PipelineContext -- support parent contexts in case of the nested pipelines. #433
Idle mode -- provide such flag into main pipeline #432
MapPipelineItem -- provide ctx parameter in order to reach out parent Pipeline Context [Idle mode] #431
NetworkSerializer -- support the case of Vectorizers==Null [Without embedding, google-trans-sampler backlog] #430
ParsedRow -- depends on pandas, while it might be switched to dict type instead [AREnets backlog] #427
Remove unused code after AREnets movement #425
AREnets -- separated project for networks contrib part, which provides NN implementation based on Tensorflow #423
Entity -- Adopt DisplayValue property for CSV serialization #419
TsvWriter -- Remove Dataframe dependency #408
OpenNREJsonWriter -- df.sort is not an inplace by default #407
NeuralNetworkModelIO -- simplify implementation #406
Brat -- support nested entities (CompoundEntity type) [simple implementation] #398
What's New -- 0.22.1 Release #323
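
Issues #433, #432 and #431 describe nested pipelines whose items can reach the parent pipeline context, for example to read an idle-mode flag. A minimal sketch of such a lookup chain follows; `ContextSketch` and its method names are hypothetical and are not taken from AREkit's `PipelineContext` API.

```python
from typing import Any, Dict, Optional


class ContextSketch:
    """Hypothetical key-value context that falls back to a parent context."""

    def __init__(self, d: Optional[Dict[str, Any]] = None,
                 parent: Optional["ContextSketch"] = None):
        self._d = dict(d) if d is not None else {}
        self._parent = parent

    def update(self, key: str, value: Any) -> None:
        # Values are always written to the local (nested) context.
        self._d[key] = value

    def provide_or_none(self, key: str) -> Any:
        # Local values win; otherwise ask the parent chain, if any.
        if key in self._d:
            return self._d[key]
        if self._parent is not None:
            return self._parent.provide_or_none(key)
        return None


main_ctx = ContextSketch({"idle_mode": True})
nested_ctx = ContextSketch({"doc_id": 1}, parent=main_ctx)
idle = nested_ctx.provide_or_none("idle_mode")
```

A nested item can then check a flag set once on the main pipeline without the flag being copied into every nested context.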

Fixed bugs:

Brat -- incorrect parsing approach may sometimes result in a mismatched value (use t) #437
VocabRepositoryUtils -- numpy API considers # by default in vocabulary on load #428
LabelsScaler -- uint dict and dict might have different sizes #426

Closed issues:

read_ruattitudes_to_brat_in_memory -- no need to pass label scaler #436
PosTags -- make them optional parameter for neural networks #435
RuSentiFrames -- clarify tqdm caption when loading (ARElight backlog) #434
Sync with AREnets updates #429
BERT -- provide cropped sampler #422
googletrans -- move to the separated project #421
_provide_sentence_terms -- consider s_ind and t_ind as well, since they may be combined and modified at the same time [nivts_project backlog] #420
Entity -- provide DisplayValue property (which is Value by default) #418
googletrans -- TranslatorPipelineItem for parsed texts #416
Instant downloading -- simplify data downloading #413
PandasBasedRowsStorage -- implement the nested type from the BaseRowsStorage #410
Readers/Writers -- make a part of the contrib #409
TextOpinion Annotation -- particular filtering rules for SentiNEREL and Russian texts. [pipeline items] #404
Evaluation -- enhancing error log analysis #400
Statistical Folding provided via file #399
Balancing as a side part of the Storage #380

Merged pull requests:

CVE-2007-4559 Patch #412 (TrellixVulnTeam)

* This Changelog was automatically generated by github_changelog_generator

06 Sep 08:36

nicolay-r

v0.22.1-rc

fadd6d8

arekit-0.22.1

Release Notes 🎉

Full Changelog

WHAT'S NEW:

📓 Provide BRAT-based reader (refactoring) of documents and mentioned entities in it! 🥳
🔧 Provide verbose treatment of values for SynonymsCollection (#327)
🔧 Fixed embedding issues for Entity type for neural networks (#308)
🔧 Refactoring RuSentRel reader, which now represents a build on top of BRAT. (#287)
🔧 Attitude annotation performed on the fly within a pipeline! (#281)
🔧 Opinion annotation does not depend on the experiment (#250)
🔧 #347
🆕 added utils contrib part, to which the following were moved 🥳
- evaluation (2-3 scale)
- cv-splittings (#324)
- entity formatters
- synonyms collections templates: stemmer-based
- experiment handlers (#325)
- np_utils -- utils to interact with np-serialized data (#348)
- pipelines ➿ for opinions extraction and data serialization, text processing: we are now able to declare a custom pipeline and adopt serialization for a variety of RE tasks (#322, #326, #351)
🆕 API for conversion of external text_opinions into parsed_news (#338)
🆕 API for a variety of pipelines for data preparation, depending on DataType (#343)
🆕 DataType now includes Dev and Etalon by default (#345)
🆕 Evaluation refactoring, and support TextOpinion level results evaluation (#355)
🗑️ experimential_rusentrel contrib part removed (#321)
🗑️ OpinionRowsProvider should be removed [ARElight backlog] (#282)
fixed: #356

Implemented enhancements:

RuSentiFrames stat -- move script from source to the related UnitTest dir #391
Vocabulary for Embedding -- save it in .txt format. #388
BratSentence -- entities should be initialized via parameter #383
ModelIO -- move vocab and embedding related API to EmbeddingIO #382
BERT -- formatter differs only in TextB. #381
Provide JSON writer for OpenNRE library #378
ExperimentSerializationContext -- some parameters might be optional [Remove them] #369
ExperimentSerializationContext -- Annotator property is not used. #368
DocumentOperations -- iter_doc_ids actually wraps the ExperimentContext functionality #367
iter_tagget_doc_ids -- this might be treated as iter_doc_ids of an another instance #366
ExperimentIterationHandler -- switch to the PipelineItem for NN and BERT serialization [Remove ExperimentEngine and ExperimentHandler] #365
FixedFolding -- intersected parts are not supported [NIVTS project backlog] #364
InputDataSerializationHelper -- refactoring #362
exp_io.balance_samples-- remove Dependency from DataType.Train #360
NeuralNetwork -- for the fine-tuning it is impossible to pick a default embedding/vocabulary. #359
Evaluation -- support results evaluation for TextOpinion #355
DefaultOpinionAnnotator -- etalon_opinion logic might be moved outside [Remove DataType dependency, backlog] #354
StatesCount, StateIndex and iter_states of BaseDataFolding -- this is a part of CV-based method #353
Evaluator refactoring #352
Processing module -- Multiple Languages Scaling [Eng/Rus] [Contents Relocation] #351
ExperimentContext -- remove Evaluator from the base class. #349
np_utils -- move from networks to utils contrib part #348
StringWithEmbeddingNetworkTermMapping -- has hard-coded algorithms for tokens and terms embedding creation. #347
Existed in Embedding -- log (remove print) #346
DataType -- provide Dev and Etalon default types [QUICK fix] #345
Data Serialization -- update API that allow to provide a particular pipeline processor for each DataType [Backlog] #343
Model io utils -- move into contrib part #342
Engine -- provide states iterator as a parameter instead of DataFolding #341
Brat -- provide stability #340
BaseParsedNewsServiceProvider -- support conversion from Entity to DocumentEntity #338
OpinionEntityType -- this should be generalized #335
BratTextEntitiesParser and StringPartitioning -- nested entities are not supported. [Temp fix] #334
RuAttitudesLabelConverter -- required only for conversion (not for parsing) #332
SentenceOpinion -- no need to store entity values #331
Utils -- provide opinion converters from brat #330
RuAttitudes -- move SentenceOpinion to brat #329
BratEntityCollectionHelper -- extract_entities considering for rows prefixed with T #328
SynonymsCollection -- value_to_group_id_func does not support expansion by default. #327
BERT and Network Serialization -- refactoring duplicated serialization implementations #322
exp_joined -- removed such experiment at experiment_rusentrel contrib #321
rusentrel_experiment -- organize a separated python project #320
"Uknown}" -- specific to RuSentRel entity case #319
BertExperimentInputSerializerIterationHandler -- Simplify API [Blog example backlog] #318
BaseRowsStorage -- consider rows shuffling [ARElight backlog] #316
EntityIds -- expected to be a part of the BaseSampleRowProvider [ARElight backlog] #312
iter_synonym_groups [Sources]-- refactor to common method [ARElight backlog] #310
term-embedding-pairs -- refactor chain of the parameter dependencies. #304
Move EntityFormatters outside #302
Sources -- RusentRel collection based on brat toolkit serialization format #287
BaseOpinionsRowProvider -- useless class and hence should be removed [refactoring IOUtils] #282
IOUtils -- replace experiment instance (and dependency) with string provider. #252
Annotator and algorithm is not related to experiment. #250
DocumentOperations -- parsed docs related API is not related to the experiment concepts. #249
Remove sep_doc_id variable #131
Update Framework Description #74

Fixed bugs:

StringWithEmbeddingNetworkTermMapping -- map_token expects a particular type of embedding which returns the embedding only #395
NetworksTrainingPipelineItem -- pass labels count #379
BertDefaultStringTextTermsMapper -- non masked entity values might be with separation between words #377
iter_rows_linked_by_text_opinions -- fixed bug with incorrect check. Removed doc-related check. #356
TextOpinion should be a part of a single sentence -- this limitation is not emphasized in any way of exceptions and assertions #339
BaseParsedNewsServiceProvider -- incorrect IDs assignation #337
Example -- Documents become mixed [RuAt...

17 Mar 11:46

nicolay-r

v0.22.0-rc-p1

3a11aa3

arekit-0.22.0

Release Notes 🎉

Pipelines integration!
- Now utilized in text processing, which can be divided into tokenization, entities assignation, and frames assignation stages.
Repositories for opinions and network input samples!
Storage kernel customizations support for opinion and samples! Using Pandas by default.
Opinion-related services turned into providers: pairs, opinions, text-opinions, etc.
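
The pipeline idea mentioned in these notes, where text processing is a chain of tokenization, entities assignation and frames assignation stages, can be sketched as a list of callables applied one after another. All names and the toy rules below are assumptions for illustration; they do not reproduce AREkit's pipeline classes.

```python
from functools import reduce
from typing import Any, Callable, List


def run_pipeline(items: List[Callable[[Any], Any]], data: Any) -> Any:
    """Apply every pipeline item to the output of the previous one."""
    return reduce(lambda value, item: item(value), items, data)


def tokenize(text: str) -> List[str]:
    # Stage 1: whitespace tokenization.
    return text.split()


def assign_entities(terms: List[str]) -> List[str]:
    # Stage 2 (toy rule): capitalized terms are treated as entities.
    return ["<E>" if term[0].isupper() else term for term in terms]


def assign_frames(terms: List[str]) -> List[str]:
    # Stage 3 (toy lexicon): mark known frame variants.
    frame_variants = {"supports", "criticizes"}
    return ["<F:{}>".format(t) if t in frame_variants else t for t in terms]


result = run_pipeline([tokenize, assign_entities, assign_frames],
                      "Alice supports Bob")
```

Because each stage only depends on the output of the previous one, a stage can be replaced or dropped without touching the others.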

NOTE: issue #232 has been moved to the next release.
This version does not support RuAttitudes collection news parsing!
Will be fixed in the upcoming project.

Changelog

v0.22.0-rc (2022-03-17)

Full Changelog

Changes

Implemented enhancements:

create_term_embedding -- Embedding algorithm based on parts requires useless check #298
UnitTests -- BertOntoNotes is no longer below the core processing #293
SingleLabelScaler -- provide [QUICK] #291
BRAT visualization -- support processing in case of multiple documents. #286
Entity -- IDs Refactoring #280
BaseSampleRowProvider -- provide sentence id #279
BRAT tool -- adopt ui as a callback for the predict pipeline #275
ExperimentIterationHandler -- add Labeled Output Samples conversion to OpinionCollection #270
InferenceContext -- split bags and samples extraction from a single method [Quick] #268
DataFolding -- organize united data folding. #267
BaseDataFolding -- iter_index is not related to the base implementation #266
DataFolding -- move into experiment context #264
DataIO (exp_data var) -- rename it to ExperimentContext #263
ExperimentIterationHandler (Callback before) -- organize ExperimentEvaluationCallback #262
NetworkCallback -- this callback should not inherit experiment base Callback #261
Neural Network Hidden states writers and providers refactoring #260
TrainingCallback -- separate onto TrainingTerminationCallback and HiddenWriterCallback classes. #259
BaseTensorflowModel -- simplify fit and predict operations. #258
LabeledCollection -- remove is_empty and reset_labels api #257
NetworkCallback -- move train/predict notification info into callback #256
Tensorflow saver -- move the related logic outside of the model implementation #255
DefaultSingleLabelAnnotationAlgorithm -- single label is not a part of the algo #244
ThreeScaleTaskAnnotator -- rename and move into core. #243
Data/output -- create pipelines directory with the related output processing #240
Examples -- document parsing executes twice #239
Might be utilized pipeline implementation #238
OpinionsProvider -- performs two actions, including ids assignation #236
entity_to_group_func -- BaseExperiment should not provide this method. #235
TextOpinionHelper -- to news/parsed/providers (implement the latter as a provider) #233
DefaultSingleLabelAnnotationAlgorithm -- iter_opinion duplicates the generalized pair opinion pair creation approach #231
Common languages dir -- move its contents into processing contrib. #229
Linked Text Opinions Refactoring. #228
Lemmatization should be a part of the frames processing pipeline stage #226
DefaultTextParser -- this class is actually a Tokenizer #225
News -- text-opinions provider and entities access API might be a part of a ParsedNews by means of NewsParser (new class) #224
StringLabelsFormatter -- switch to label_types instead of label instances. #223
AnnotationAlgorithm -- iter_opinions requires EntitiesCollection while the latter utilized for entities iteration #222
TextParseOptions -- add keep_tokens #221
FrameVariantsParser -- return modified terms only #218
FramesAnnotation -- is_inverted flag and processing should be a pipeline item #217
FramesCollection -- use FrameConnotationProvider instead #216
FrameVariantsParser -- move into processing subfolder. #215
OpinionOperations -- remove try_read_annotated_opinion_collection #213
DocumentOperation -- unify iter_doc_ids operation into one with tag parameter. #212
OpinionOperations -- move readers* into IO. #211
OpinionCollectionsProvider -- serialization should not be a part of this class #210
data -- separate data-related information from the experiment #209
BaseInputReader -- class stores _df, however it should replaced with BaseRowsStorage #207
Repositories -- fill method should be a part of a storage rather than provider. #204
BaseStorage -- exclude save method into separated class BaseRowsWriter #202
Experiments -- rename formats to api (QUICK) #201
Embedding and Vocabulary -- organize Storage/Repository with serialize/load operations. #200
Sample -- remove dependency from DefaultNetworkConfig. #199
BaseOutputFormatter -- both provider and formatter mixes df usage #198
OpinionProvider -- remove dependency from Opinion and Document Operation instances. #197
Repositories -- add this class which unites all the providers for data writing #195
Add column providers #194
NetworkSampleFormatter -- switch to provider #193
BaseSampleStorage -- use store_labels instead of data_type passing (QUICK) #192
NetworkOutputEncoder -- separate formatting from serialization. #191
BaseSampleFormatter -- __create_row is not related to the Formatter, should be moved. #190
BaseDocumentStatGenerator -- provider depends on IO files. #189
OpinonFormatter -- use the latter in experiment io. #188
News -- remove return_text parameter from iter_sentences method (QUICK) #187
BaseRowsFormatter -- move format method in another class #185
BaseSampleFormatter -- _iter_sentence_terms should not be a part of this class. (QUICK) #184
BaseSampleFormatter -- _provide_rows behavior depends on row_ids_provider instance type. #182
BaseSampleFormatter -- remove data_type parameter from ctor #181
BaseObjectParser -- parse method should return object of the same type as sentence #179
News -- remove entities_parser instance from News class. #178
BaseEntitiesParser -- generalize to BaseObjectsParser. [#177](https://github.com/nicolay-r/AREkit/issu...

15 Aug 10:14

nicolay-r

v0.21.0-rc

43e5745

arekit-0.21.0

Changelog

v0.21.0-rc (2021-08-15)

Full Changelog

Implemented enhancements:

Sources -- clarify do_overwrite and refactor check_uniqueness flags RuSentiFrames #150
Compose Python Library #145
Sources -- provide local storage at home directory #144
Enum -- clarify enum34 package using instead of the enum. #143
OpinionCollectionsFormatter -- support to save/load only supported by label_formatter opinions #139
UnitTests -- gather all tests into single folder #125
BaseAnnotator -- initialize method is useless, as the passed parameters are required only by the serialize_missed_collections method. #123
NeutralAnnotator -- Rename to annotator, as neutral prefix is related to a specifics of the particular task #122
NeutralAnnot -- use a predefined template for names, based on labels count, instead of Name property #121
DefaultNeutralAlgo -- provide dist in sentence parameter #120
NeutralAnnot -- Two/Three scale annotators considered to be a part of the related experiment #119
Evaluation Metrics -- such functions considered to be a part of the particular experiment #115
Embedding -- set_stemmer method is not declared in base class #114
FrameVariantsCollection -- remove stemmer from __init__ params. #113
Bag (NeuralNetworks) -- label could be presented as uint. #110
experiment_rusentrel -- Group all folders by a single exp prefix #108
BaseModel -- Replace epochs_count parameter with generalized parameter structure. #107
OpinionCollection -- provide set of supported labels (opinion filtration by labels) #106
LabelCalculationMode -- make it enum #105
BaseModel -- replace epochs_count with model options #104
ThreeLabelsScaler -- remove dependencies of the latter in NeuralNetwork contrib #103
RuAttitudes -- use int_to_label function instead of label scaler #102
Labels -- Move Scaler into common/labels #101
Labels -- Provide unique labels for the particular experiment in contrib #100
Experiments -- reorganize rusentrel experiments data within the related new folder #97

Fixed bugs:

RuAttitudes-v1.2. -- fix downloading link #155
sources -- Remove data folder #149
Entity -- type could be None while there is no restriction for that #148
RuSentRelOpinionCollectionFormatter -- label could not be found during neural network training. #137
frame_variant -- label scaler receives NoLabel while experiment based on NeutralLabel #136
BaseEvaluator -- opinion labels might be incompatible with the one utilized in ResultEvaluator. #124

Closed issues:

UnitTests -- Run all unit tests via bash script #156
Remove release_notes.md file and move the related content into Releases descriptions. #146
Tutorial -- Clarify on how we perform optimization #90

29 Jul 12:08

nicolay-r

v0.20.5-rc

3125d91

AREkit-0.20.5

Release

Fixed:

Using custom check of duplicated opinions during OpinionCollection initialization.
Changes:
Speed-up and engine optimizations:
- Optionally loading neutral annotator.
Multi-Instance networks: we now assume that the next appearing context always continues the prior one
(check out multi-instance bags creation for details)
Shuffling in models is now performed for bags, not for bag groups.
Networks: added allow_growth=True flag for tensorflow based neural networks.
Memory fraction parameter has been removed.

The collection of parsed news is now detached from the text opinions collection.

News parsing is now assumed to be performed via a TextParser.parse(news, options) call. Related refactoring.
- Stemmer application has been removed from the RuAttitudes parser.
Removed dependency from RelatedParsedNewCollection in TextOpinionCollection.
Labeling is now separated from the LinkedTextOpinion collection.
ParsedText class has been refactored, unused methods removed. Keep tokens has been discarded.
BERT tsv-format-encoders are now in a Factory (at contrib directory).
Fixed: RuSentRelTextOpinion replaced with TextOpinion, and independent from OpinionRef.
Single/Multi models no longer exist, as these prefixes affect only the batch type selection. Refactoring.

29 Jul 11:32

nicolay-r

v0.20.4-rc

7e92387

AREkit-0.20.4

Release Notes

Labels conversion to_str and from_str is now a part of external formatters (unique for each source, experiment, etc.).
Added labels-scaler; label casting (to int or uint) now depends on the scaler;
Added bert exporter in contribution folder: with related formatters according to the [paper]:
- NLI -- (natural language inference) format, which provides an additional sentence describing the attitude that should be extracted
- QA -- (question answering) format, which provides an additional question about the attitude sentiment.
With label encoding in the following formats:
- Multiple -- all the supported sentiment labels (positive, negative, neutral)
- Binary -- (YES, NO) according to the mention (additional sentence) provided by the NLI and QA formatters.
Refactoring experiments in order to apply the latter also for classifiers (models from scikit-learn)
Updated nn-engine API
Refactoring tf-based neural network implementation.
Bert now moved into separated folder from contrib directory.
frame_variants moved to frames directory.
Frame variants labeling in news now performed during parse operation.
DataType is now an enumeration. The list of supported data types is now a part of the experiment.
The latter were moved onto sample level.
Service folder removed, as the latter is assumed to be kept apart from this repository.
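
The NLI and QA formats with Multiple and Binary label encodings, listed in these notes for the BERT exporter, can be illustrated with a small sketch. The template wording, the function names and the row layout `(text_a, text_b, label)` are assumptions made for this example; the actual AREkit formatters and the templates from the referenced paper may differ.

```python
from typing import Callable, List, Optional, Tuple

LABELS = ["positive", "negative", "neutral"]
Row = Tuple[str, str, str]  # (text_a, text_b, label)


def nli_text_b(subject: str, obj: str, label: Optional[str] = None) -> str:
    # NLI: an additional sentence describing the attitude to extract.
    base = "{} towards {}".format(subject, obj)
    return base if label is None else "{} is {}".format(base, label)


def qa_text_b(subject: str, obj: str, label: Optional[str] = None) -> str:
    # QA: an additional question about the attitude sentiment.
    if label is None:
        return "What is the attitude of {} towards {}?".format(subject, obj)
    return "Is the attitude of {} towards {} {}?".format(subject, obj, label)


def multiple_rows(text_a: str, subject: str, obj: str, gold: str,
                  make_text_b: Callable[..., str]) -> List[Row]:
    # Multiple: one row, labeled with one of all supported sentiment labels.
    return [(text_a, make_text_b(subject, obj), gold)]


def binary_rows(text_a: str, subject: str, obj: str, gold: str,
                make_text_b: Callable[..., str]) -> List[Row]:
    # Binary: one row per candidate label, answered with YES or NO.
    return [(text_a, make_text_b(subject, obj, label),
             "YES" if label == gold else "NO") for label in LABELS]


rows = binary_rows("A praised B.", "A", "B", "positive", qa_text_b)
```

In the Binary encoding every context is expanded into one row per candidate label, so the classifier only has to answer whether the additional sentence or question matches the context.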

Releases: nicolay-r/AREkit

AREkit-0.25.1

Changeset

Major

List of the release related updates #550

Minor changes

🪶 Lightweight the framework

Moved Resources

Removed sampling-related components

AREkit-0.25.0

Release notes

Support batching for effectively integrating LLMs into text processing pipelines

Flexibility and Performance Enhancements

Fixed bugs

Minor Updates

Minor

Changeset

AREkit-0.24.0

Improvements

Generalization

Changes and Simplifications

Minor
