Pragalbhv/NLP-Project
The following were implemented:
__> Preprocessing and setting up evaluation metrics
__> VSM model
__> LSA model
__> K-means and LDA
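As a rough illustration of the VSM item above, here is a minimal TF-IDF and cosine-similarity ranking sketch in plain Python. It is not the repository's implementation: the function names (`build_tfidf`, `cosine`, `rank`) and the unsmoothed `log(N / df)` weighting are assumptions chosen for brevity.

```python
import math
from collections import Counter


def build_tfidf(docs):
    """docs: list of token lists. Returns (per-document {term: weight} dicts, idf dict)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed weighting: plain log(N / df), no smoothing.
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_tokens, vectors, idf):
    """Return document indices ordered by decreasing similarity to the query."""
    # Query terms unseen in the collection are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(query_tokens).items() if t in idf}
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    # Ties are broken by document index so the ordering is deterministic.
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]


docs = [["wing", "lift"], ["boundary", "layer", "flow"], ["wing", "flow"]]
vectors, idf = build_tfidf(docs)
ranking = rank(["boundary", "layer"], vectors, idf)
```

Here `ranking` places the second document first, since it is the only one sharing terms with the query.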
**********************************************************************
This folder contains the template code for a search engine application.
main.py - The main module that contains the outline of the Search Engine. Do not change anything in this file.
util.py - An extra file where you can add any additional processing or utility functions that you may need for any of the sub-tasks.
sentenceSegmentation.py, tokenization.py, inflectionReduction.py and stopwordRemoval.py - Implement the corresponding sub-tasks inside the functions in these files.
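The four sub-tasks listed above chain into one preprocessing pipeline. The sketch below shows one possible naive version of each stage. The function names, the tiny stopword set, and the suffix-stripping rule are all hypothetical stand-ins, not the template's actual interfaces; a real solution would more likely use an established stemmer or lemmatizer and a full stopword list.

```python
import re

# Illustrative stopword set only; an assumption, not the assignment's list.
STOPWORDS = {"a", "an", "the", "of", "on", "in", "is", "are", "and", "to"}


def segment(text):
    """Naive sentence segmentation: split on whitespace following '.', '?' or '!'."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentences):
    """Naive tokenization: lowercase each sentence and keep alphanumeric runs."""
    return [re.findall(r"[A-Za-z0-9]+", s.lower()) for s in sentences]


def reduce_inflection(tokenized):
    """Crude suffix stripping, standing in for a proper stemmer or lemmatizer."""
    def strip(token):
        for suffix in ("ing", "es", "s"):
            # Only strip when at least three characters would remain.
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                return token[: -len(suffix)]
        return token
    return [[strip(t) for t in sentence] for sentence in tokenized]


def remove_stopwords(tokenized):
    """Drop tokens that appear in the stopword set."""
    return [[t for t in sentence if t not in STOPWORDS] for sentence in tokenized]


def preprocess(text):
    """Run the four stages in order; returns a list of token lists, one per sentence."""
    return remove_stopwords(reduce_inflection(tokenize(segment(text))))


result = preprocess("Papers on Aerodynamics. The wings are lifting!")
```

Each stage takes the previous stage's output, which mirrors the way the template writes intermediate results after every step.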
More files corresponding to each sub-task will be provided as the assignment progresses, along with updated versions of main.py.
To test your code, run main.py with the appropriate arguments.
Usage: main.py [-custom] [-dataset DATASET FOLDER] [-out_folder OUTPUT FOLDER]
[-segmenter SEGMENTER TYPE (naive|punkt)] [-tokenizer TOKENIZER TYPE (naive|ptb)]
When the -custom flag is passed, the system takes a single query from the user as input. When the flag is not passed, all the queries in the Cranfield dataset are processed. For example:
> python main.py -custom
> Enter query below
> Papers on Aerodynamics
After each preprocessing stage, this writes *queries.txt files (for the queries) and *docs.txt files (for the documents) to the OUTPUT FOLDER.
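For reference, the command-line interface in the usage text above could be declared with argparse roughly as follows. This is a sketch, not the contents of main.py: the helper name `build_parser` and every default value (`cranfield/`, `output/`, `punkt`, `ptb`) are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser matching the documented flags; all defaults are assumed."""
    parser = argparse.ArgumentParser(description="main.py")
    parser.add_argument("-custom", action="store_true",
                        help="Take a custom query as input")
    parser.add_argument("-dataset", default="cranfield/",   # assumed default
                        help="Path to the dataset folder")
    parser.add_argument("-out_folder", default="output/",   # assumed default
                        help="Path to the output folder")
    parser.add_argument("-segmenter", default="punkt",      # assumed default
                        choices=["naive", "punkt"],
                        help="Sentence segmenter type")
    parser.add_argument("-tokenizer", default="ptb",        # assumed default
                        choices=["naive", "ptb"],
                        help="Tokenizer type")
    return parser


# Parse an explicit argument list rather than sys.argv so the example is reproducible.
args = build_parser().parse_args(["-custom", "-segmenter", "naive"])
```

With these arguments, `args.custom` is `True` and `args.segmenter` is `"naive"`, while the remaining options fall back to the assumed defaults.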