Rus_summarizer. Text summarization tools for the Russian language

This repository contains algorithms for extractive summarization of Russian-language texts.

The thesis and presentation are available in the description folder (here and here).

The algorithms are based on two approaches:

  1. TextRank.
  2. Sentence clustering using K-Means.
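
To illustrate the first approach, here is a minimal TextRank sketch in plain Python. It is not the code from this repository: the function names, the naive sentence splitter, the bag-of-words cosine similarity, and the parameters `n_sentences`, `damping` and `iterations` are all assumptions made for the example. It builds a sentence similarity graph, ranks the sentences with a PageRank-style iteration, and returns the top-ranked sentences in their original order.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(sentence):
    # Lowercased word counts; \w also matches Cyrillic letters in Python 3.
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def textrank_summary(text, n_sentences=2, damping=0.85, iterations=50):
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []
    vectors = [bag_of_words(s) for s in sentences]
    # Weighted similarity graph without self-loops.
    sim = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    # PageRank-style power iteration over the weighted graph.
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping
            * sum(
                sim[j][i] * scores[j] / out_weight[j]
                for j in range(n)
                if out_weight[j]
            )
            for i in range(n)
        ]
    # Pick the highest-scoring sentences, then restore document order.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]
```

Calling `textrank_summary(text, n_sentences=3)` returns a list of three sentences. The second approach (K-Means clustering of sentence vectors) is not covered by this sketch.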

Several text feature extraction models were studied:

  1. Bag of words + TF-IDF.
  2. FastText (pretrained model from DeepPavlov lib).
  3. RuBERT (pretrained model from DeepPavlov lib).
  4. RuSBERT (pretrained model from DeepPavlov lib).
  5. MlSBERT (self-trained model using Sentence BERT for English).
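
As an illustration of the simplest model in this list, the sketch below computes TF-IDF weights over a bag of words, treating each sentence as a separate document. The function name `tfidf_vectors` and the smoothed IDF formula are assumptions chosen for the example; the repository may use a different tokenizer or weighting. The embedding models (FastText, RuBERT, RuSBERT, MlSBERT) require pretrained weights and are not shown.

```python
import math
import re
from collections import Counter


def tfidf_vectors(sentences):
    # Treat each sentence as a "document" for the IDF statistics.
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(tokenized)
    # Number of sentences in which each word occurs at least once.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Smoothed IDF (assumption): stays positive even for words
            # that appear in every sentence.
            word: (count / total) * (math.log((1 + n) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        })
    return vectors
```

Each returned dictionary maps a word to its weight in that sentence; words that occur in fewer sentences receive higher weights than words shared by all of them.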

The research showed that the best summarization algorithm is "Mixed", which is based on the union of the TextRank algorithm and MlSBERT_KMeans.
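
One possible reading of "union" is that the sentences selected by the two summarizers are merged into a single summary. The sketch below follows that reading; it is an assumption, not the repository's implementation, and the function `mixed_summary` and its index-list arguments are hypothetical.

```python
def mixed_summary(sentences, textrank_indices, kmeans_indices):
    # Union of both selections (assumed meaning of "Mixed"),
    # with duplicates removed and original document order restored.
    chosen = sorted(set(textrank_indices) | set(kmeans_indices))
    return [sentences[i] for i in chosen]
```

Here `textrank_indices` and `kmeans_indices` stand for the positions of the sentences chosen by each summarizer.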

All algorithms are in the folder "src/Rus_summarizers".