GitHub - MatthewRuiz/N-Gram-Language-Model

INTRODUCTION

This program provides a set of natural language tools written in Python. It reads a .csv file, extracts the text from a specific column (which the user supplies when using their own .csv file), and generates unigram, bigram, and trigram models. The text is normalized before the models are built; see the source documentation for the explanation. Once the models are generated, you can retrieve the observed frequency and probability of a given n-gram, and run other commands. Ultimately, all of the tools are used to randomly generate sentences from the different n-gram models.
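
To make the idea concrete, here is a minimal sketch of bigram counting, probability lookup, and random sentence generation. It is not the code in this repository: the function names, the `<s>` and `</s>` padding tokens, and the normalization (lower-casing and whitespace splitting) are all assumptions chosen for illustration.

```python
import random
from collections import Counter

# Hypothetical sentence-boundary markers (not taken from the repository).
START, END = "<s>", "</s>"


def ngrams(tokens, n):
    """Return every n-gram in a token list as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train_bigrams(sentences):
    """Count unigrams and bigrams over padded, lower-cased sentences."""
    unigram_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        # Assumed normalization: lower-case and split on whitespace.
        tokens = [START] + sentence.lower().split() + [END]
        unigram_counts.update(ngrams(tokens, 1))
        bigram_counts.update(ngrams(tokens, 2))
    return unigram_counts, bigram_counts


def bigram_probability(bigram, unigram_counts, bigram_counts):
    """Maximum-likelihood estimate: count(w1, w2) / count(w1)."""
    context_count = unigram_counts[(bigram[0],)]
    return bigram_counts[bigram] / context_count if context_count else 0.0


def generate_sentence(bigram_counts, rng, max_words=20):
    """Sample one word at a time, weighted by observed bigram counts."""
    word, words = START, []
    for _ in range(max_words):
        candidates = [(b[1], c) for b, c in bigram_counts.items() if b[0] == word]
        if not candidates:
            break
        choices, weights = zip(*candidates)
        word = rng.choices(choices, weights=weights)[0]
        if word == END:
            break
        words.append(word)
    return " ".join(words)


corpus = ["the wine is good", "the wine is dry", "the coffee is good"]
unigram_counts, bigram_counts = train_bigrams(corpus)

frequency = bigram_counts[("the", "wine")]  # observed frequency
probability = bigram_probability(("the", "wine"), unigram_counts, bigram_counts)
sentence = generate_sentence(bigram_counts, random.Random(0))
print(frequency, probability, sentence)
```

The same pattern extends to trigrams by conditioning on the previous two words instead of one.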

REQUIREMENTS

This program requires the following packages:

* argparse (https://pypi.python.org/pypi/argparse)
* NLTK (https://www.nltk.org/data.html)
* PrettyTable (https://pypi.python.org/pypi/PrettyTable)

This program requires the following .csv files:

* Amazon Fine Food Reviews
	(https://www.kaggle.com/snap/amazon-fine-food-reviews/data)
* Wine Reviews
	(https://www.kaggle.com/zynicide/wine-reviews/data)

Changing the name of either file will result in a FileNotFoundError.

BUILD INSTRUCTIONS

There are three optional parameters:

*   -f FILE, --file FILE
            The name of the .csv file, in the format filename.csv.
*   -l LINES, --lines LINES
            The number of lines of text to be processed.
*   -c COLUMN, --column COLUMN
            If the user wishes to use their own .csv file, the
            zero-based index of the column that holds the text
            should be entered. For example, the text in the
            winemag-data_first150k.csv file is in the 3rd
            column, so a 2 would be entered.
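
The zero-based column index and the line limit can be illustrated with a short sketch built on Python's standard csv module. This is not the repository's read_csv.py: the function name `read_text_column`, the assumption that the first row is a header, and the sample rows are all hypothetical.

```python
import csv
import io
from itertools import islice


def read_text_column(csv_file, column_index, max_lines):
    """Yield the text in one column for at most max_lines data rows."""
    reader = csv.reader(csv_file)
    next(reader, None)  # assumption: the first row is a header and is skipped
    for row in islice(reader, max_lines):
        if column_index < len(row):  # ignore rows that are too short
            yield row[column_index]


# Invented sample data; the text sits in the 3rd column, so the index is 2.
sample = io.StringIO(
    "id,country,description\n"
    "0,US,Bright and crisp\n"
    '1,Spain,"Ripe, fruity"\n'
)
texts = list(read_text_column(sample, 2, 50000))
print(texts)
```

With a real file, the `io.StringIO` object would be replaced by something like `open("winemag-data_first150k.csv", newline="", encoding="utf-8")`.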

Should no parameters be given, the following default parameters will be used:

* -f Reviews.csv
* -l 50000
* -c 9
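
For reference, the three options and their defaults could be declared with argparse roughly as follows. This is a sketch of one possible setup, not the contents of main.py: the `build_parser` name, the help strings, and the use of `type=int` for the numeric options are assumptions.

```python
import argparse


def build_parser():
    """Build a parser with the three optional parameters described above."""
    parser = argparse.ArgumentParser(
        description="Generate n-gram models from a column of a .csv file."
    )
    parser.add_argument("-f", "--file", default="Reviews.csv",
                        help="name of the .csv file, e.g. filename.csv")
    # Assumption: numeric options are converted to int by the parser.
    parser.add_argument("-l", "--lines", type=int, default=50000,
                        help="number of lines of text to process")
    parser.add_argument("-c", "--column", type=int, default=9,
                        help="zero-based index of the column holding the text")
    return parser


# No arguments: the defaults listed above apply.
defaults = build_parser().parse_args([])
print(defaults.file, defaults.lines, defaults.column)

# Equivalent to: python main.py -f winemag-data_first150k.csv -c 2
wine = build_parser().parse_args(["-f", "winemag-data_first150k.csv", "-c", "2"])
print(wine.file, wine.lines, wine.column)
```

Because every option has a default, the script can be run with any subset of the three flags.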

If you are using the terminal, navigate to the directory containing the main.py file.

Running the command python main.py -h prints the instructions above.

If you are using an IDE, you can also run the main.py file with or without providing parameters.
