Program to find the most used phrases in a text file
-
Place the text corpus in a text file called Input.txt
-
Run phrase_extract.py
-
Enter the size of the phrases (number of words per phrase)
-
Advanced options are available for the following:
- Soft delete - delete words found in the middle of phrases
- Hard replace - replace words.
- Soft replace - replace the words that hard replace could not catch.
- Alternatively, you can place words in a Delete.txt file before running the program.
-
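
As a rough illustration of the Delete.txt option, the sketch below drops listed words from a word list before phrases are built. It assumes one word per line and case-insensitive matching; the actual file format and matching rules used by phrase_extract.py may differ, and the function names `load_delete_words` and `remove_words` are hypothetical.

```python
def load_delete_words(lines):
    """Build a set of lowercase words from an iterable of lines.

    Assumption: one word per line, blank lines ignored.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def remove_words(words, delete_words):
    """Return the words that are not in the delete set (case-insensitive)."""
    return [w for w in words if w.lower() not in delete_words]


# Stand-in for the lines of a Delete.txt file.
delete_words = load_delete_words(["the\n", "\n", "And\n"])
kept = remove_words(["The", "cat", "and", "the", "dog"], delete_words)
print(kept)
```

To read from a real file, an open file object can be passed directly, for example `load_delete_words(open("Delete.txt"))`.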
Outputs appear in Output.xls
Regex ngrams
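
A minimal sketch of the regex n-gram idea, not the code in phrase_extract.py: split the text into words with a regular expression, slide a window of the chosen phrase size over them, and count each phrase. The function name `count_phrases` and the tokenizing pattern are assumptions made for illustration.

```python
import re
from collections import Counter


def count_phrases(text, size):
    """Count every run of `size` consecutive words in `text`."""
    # Assumed tokenizer: lowercase letters, digits and apostrophes form a word.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Slide a window of `size` words across the word list.
    phrases = (
        " ".join(words[i:i + size])
        for i in range(len(words) - size + 1)
    )
    return Counter(phrases)


sample = "the cat sat on the mat and the cat sat down"
counts = count_phrases(sample, 2)
for phrase, n in counts.most_common(2):
    print(phrase, n)
```

`Counter.most_common` returns the phrases sorted by frequency, which is the ranking that would be written to the output file.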