Download w2a.zip or the folder asg2. The code snippets required for Assignment 2 can be found in Assignment 2 work-sheet.ipynb.
- For questions 5, 6 and 7, use the function `levenshtein`.
- For question 6, modify the variable `substitutions` in the function `levenshtein`.
- For question 8, use the function `jaro_winkler`, which is defined in the file Edistance.py.
- For questions 5 to 10, the function `uniFreq` is needed to count the unigrams in the corpus C3.
- For question 9, the function `bigramFreq` is needed to count the bigrams in the corpus C3.
- For question 10, use the code snippet given in the last cell of the worksheet.
- Use unigram.csv for questions 5, 6, 7 and 8.
- Use bigrams.csv for questions 9 and 10.
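The sketch below illustrates what questions 5 to 7 compute. It is the standard dynamic-programming Levenshtein distance, with insertions and deletions each costing 1. The name `edit_distance` and the `sub_cost` parameter are hypothetical and do not come from the worksheet. `sub_cost` only shows the kind of change question 6 asks for when it refers to the `substitutions` variable. The cost the assignment actually expects may differ.

```python
def edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    """Levenshtein distance with unit insert/delete cost and a configurable
    substitution cost (hypothetical stand-in for the worksheet's `levenshtein`)."""
    n, m = len(source), len(target)
    # dist[i][j] = minimum cost of turning source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i characters of the prefix
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j characters of the prefix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # A match costs nothing; a mismatch costs sub_cost.
            substitution = 0 if source[i - 1] == target[j - 1] else sub_cost
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # substitution or match
            )
    return dist[n][m]


print(edit_distance("kitten", "sitting"))                  # 3
print(edit_distance("intention", "execution", sub_cost=2))  # 8
```

With `sub_cost=2`, a substitution is never cheaper than a deletion followed by an insertion. As a result, the distance between "intention" and "execution" rises from 5 to 8.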
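For question 8, the following is a minimal sketch of the standard Jaro-Winkler similarity. It is not the `jaro_winkler` function from Edistance.py. The names `jaro` and `jaro_winkler_sketch`, the prefix scale `p=0.1` and the four-character prefix cap are common defaults assumed here. The worksheet's version may use different constants or may return a distance instead of a similarity.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Equal characters only count as matching within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters appearing in a different order, halved.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler_sketch(s1: str, s2: str, p: float = 0.1,
                        max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (assumed defaults)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


print(round(jaro_winkler_sketch("MARTHA", "MARHTA"), 3))  # 0.961
```

The prefix bonus rewards strings that agree at the start. This is why Jaro-Winkler is often preferred for short strings such as names.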
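The sketch below shows the kind of counting that `uniFreq` and `bigramFreq` are described as doing, using `collections.Counter`. The helper names, the whitespace tokenization and the toy sentence are assumptions for illustration only. The worksheet's functions may read C3 or the CSV files differently. The `bigram_prob` helper shows one common reason to need both counts, the maximum-likelihood estimate P(w2 | w1) = C(w1 w2) / C(w1). The assignment text does not itself state this use.

```python
from collections import Counter


def unigram_counts(tokens):
    """Count how often each token occurs (hypothetical stand-in for uniFreq)."""
    return Counter(tokens)


def bigram_counts(tokens):
    """Count adjacent token pairs (hypothetical stand-in for bigramFreq)."""
    return Counter(zip(tokens, tokens[1:]))


def bigram_prob(w1, w2, uni, bi):
    """Maximum-likelihood estimate of P(w2 | w1); 0.0 if w1 was never seen."""
    if uni[w1] == 0:  # Counter returns 0 for missing keys
        return 0.0
    return bi[(w1, w2)] / uni[w1]


tokens = "the cat sat on the mat".split()  # assumed whitespace tokenization
uni = unigram_counts(tokens)
bi = bigram_counts(tokens)
print(uni["the"])                          # 2
print(bi[("the", "cat")])                  # 1
print(bigram_prob("the", "cat", uni, bi))  # 0.5
```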