GitHub - Pragalbhv/NLP-Project: NLP project

Dataset/cranfield
output
ME17B162_MM19B012_ProjectReport.pdf
NLP PragSartha.ipynb
README.txt
Report_A1_P2_ME17B162_MM19B012.pdf
evaluation.py
evaluation2.py
inflectionReduction.py
informationRetrieval.py
main.py
sentenceSegmentation.py
stopwordRemoval.py
tokenization.py
util.py


The following were implemented:

- Preprocessing and evaluation metrics
- Vector Space Model (VSM)
- Latent Semantic Analysis (LSA)
- K-means and LDA
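As a rough illustration of the VSM retrieval and evaluation steps listed above, the sketch below ranks documents by cosine similarity between TF-IDF vectors and scores the ranking with precision@k. It is a minimal pure-Python sketch, not the code in informationRetrieval.py or evaluation.py; the function names, the log(N/df) IDF weighting and the toy data in the usage are assumptions made for illustration.

```python
import math
from collections import Counter


def build_tfidf(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per tokenized doc, plus the IDF table."""
    n_docs = len(docs)
    # Document frequency: number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, doc_vectors, idf):
    """Return document indices sorted by descending similarity to the tokenized query."""
    # Query terms that never occur in the collection get weight 0.
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    scores = [(cosine(q_vec, d), i) for i, d in enumerate(doc_vectors)]
    # Highest score first; ties broken by document index.
    return [i for _, i in sorted(scores, key=lambda x: (-x[0], x[1]))]


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are in the relevant set (k > 0)."""
    return sum(1 for d in ranked[:k] if d in relevant) / k
```

Documents and queries are assumed to be already preprocessed into token lists, for example `docs = [["aerodynamic", "wing", "flow"], ["boundary", "layer", "flow"], ["heat", "transfer"]]` and `query = ["wing", "flow"]`. Under this weighting a term that occurs in every document gets an IDF of 0, so it cannot affect the ranking.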


**********************************************************************

This folder contains the template code for a search engine application. 

main.py - The main module that contains the outline of the Search Engine. Do not change anything in this file.
util.py - An extra file where you can add any additional processing or utility functions that you may need for any of the sub-tasks.
sentenceSegmentation.py, tokenization.py, inflectionReduction.py and stopwordRemoval.py - Implement the corresponding sub-tasks inside the functions in these files.
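A minimal sketch of three of these sub-tasks, roughly in the spirit of the "naive" segmenter and tokenizer options named in the usage line below. The function names and the tiny stopword list are hypothetical. The punkt and ptb options presumably correspond to NLTK's PunktSentenceTokenizer and TreebankWordTokenizer, and inflection reduction is usually done with a stemmer or lemmatizer such as NLTK's PorterStemmer or WordNetLemmatizer, so it is left out here.

```python
import re

# Hypothetical tiny stopword list for illustration only; a real run would use
# a full list such as NLTK's English stopwords corpus.
STOPWORDS = {"a", "an", "the", "of", "on", "in", "and", "is", "are", "for", "to"}


def naive_segment(text):
    """Split text into sentences at whitespace that follows '.', '?' or '!'."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]


def naive_tokenize(sentences):
    """Lowercase each sentence and keep runs of letters/digits as tokens (punctuation is dropped)."""
    return [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]


def remove_stopwords(token_lists):
    """Drop tokens found in STOPWORDS, keeping the per-sentence structure."""
    return [[t for t in tokens if t not in STOPWORDS] for tokens in token_lists]
```

Splitting on end punctuation breaks on abbreviations such as "e.g.", which is the usual motivation for a trained segmenter like Punkt.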

More files corresponding to each sub-task will be provided as the assignment progresses, along with updated versions of main.py.

To test your code, run main.py with the appropriate arguments:
Usage: main.py [-custom] [-dataset DATASET FOLDER] [-out_folder OUTPUT FOLDER]
               [-segmenter SEGMENTER TYPE (naive|punkt)] [-tokenizer TOKENIZER TYPE (naive|ptb)] 
When the -custom flag is passed, the system takes a query from the user as input. When the flag is not passed, all the queries in the Cranfield dataset are used. For example, to enter a custom query:
> python main.py -custom
> Enter query below
> Papers on Aerodynamics
This will generate *queries.txt files in the OUTPUT FOLDER after each stage of preprocessing of the query and *docs.txt files in the OUTPUT FOLDER after each stage of preprocessing of the documents.
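For reference, the flags in the usage line above can be parsed with the standard argparse module as sketched below. This is an illustration, not the parser in main.py: the default folders and the default segmenter and tokenizer choices are assumptions.

```python
import argparse


def build_parser():
    """Parser mirroring the documented usage line; all defaults are assumptions, not values from main.py."""
    parser = argparse.ArgumentParser(prog="main.py")
    # -custom: read one query from the user instead of all Cranfield queries.
    parser.add_argument("-custom", action="store_true",
                        help="take a single query from the user as input")
    parser.add_argument("-dataset", default="Dataset/cranfield/", help="dataset folder")
    parser.add_argument("-out_folder", default="output/", help="output folder")
    parser.add_argument("-segmenter", default="punkt", choices=["naive", "punkt"],
                        help="sentence segmenter type")
    parser.add_argument("-tokenizer", default="ptb", choices=["naive", "ptb"],
                        help="tokenizer type")
    return parser
```

With this parser, `build_parser().parse_args(["-custom", "-segmenter", "naive"])` sets `custom=True` and `segmenter="naive"` and keeps the assumed defaults for the other options; an unsupported choice such as `-tokenizer foo` is rejected with an error.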
