mehulsuresh/TextPhraseExtraction

TextPhraseExtraction

Program to find the most used phrases in a text file

How to use

  1. Place the text corpus in a text file called Input.txt

  2. Run phrase_extract.py

  3. Enter the size of the phrases (number of words per phrase).

  4. Advanced options are available for the following:

    1. Soft delete - delete words found in the middle of phrases.
    2. Hard replace - replace words.
    3. Soft replace - replace the words that hard replace could not catch.
    4. Alternatively, you can place the words in a Delete.txt file before running the program.
  5. Outputs appear in Output.xls
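
As a rough illustration of the Delete.txt option in the steps above, the sketch below drops listed words from the text before any phrases are counted. It is not taken from phrase_extract.py: the function names `load_delete_words` and `remove_words`, the whitespace-separated file layout, and the case-insensitive matching are all assumptions made for this example.

```python
import re


def load_delete_words(path="Delete.txt"):
    """Return the words listed in the file as a lowercase set.

    Assumption: words are separated by whitespace or newlines.
    A missing file is treated as an empty delete list.
    """
    try:
        with open(path, encoding="utf-8") as fh:
            return {word.lower() for word in fh.read().split()}
    except FileNotFoundError:
        return set()


def remove_words(text, delete_words):
    """Lowercase the text, split it into word tokens, and drop listed words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [token for token in tokens if token not in delete_words]
```

For example, `remove_words("The cat and the dog", {"the", "and"})` keeps only `cat` and `dog`, so phrases are later built from the remaining words.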

Techniques Used

Regex, n-grams
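
A minimal sketch of how regex tokenization and n-grams can be combined to find the most used phrases. This is an illustration of the general technique, not the code in phrase_extract.py: the names `tokenize`, `ngrams`, and `most_common_phrases`, and the token pattern, are assumptions for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens using a regular expression."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined into one phrase."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def most_common_phrases(text, n, top=10):
    """Count all n-word phrases and return the `top` most frequent ones."""
    return Counter(ngrams(tokenize(text), n)).most_common(top)


if __name__ == "__main__":
    sample = "the cat sat. The cat ran."
    print(most_common_phrases(sample, n=2, top=1))
```

Note that this sketch ignores punctuation, so phrases can span sentence boundaries; a stricter version might split the text into sentences first.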
