Miniproject: Machine Learning
Chaitanya Sharma edited this page Jun 18, 2021
1. We created sections using ami3. A typical section (here, an acknowledgments section) looks like this:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<ack>
<title>Acknowledgments</title>
<p>The authors are grateful to CNPq-Programa “Ciências sem fronteiras” (Grant No. 233761/2014-4) for financial support.</p>
</ack>
```
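A minimal sketch of reading one such section back into Python with the standard library's `xml.etree.ElementTree`. It assumes each section is a small XML document with an optional `<title>` and one or more `<p>` elements, as in the example above. The section is inlined as a string so the sketch runs on its own; for files on disk, `ET.parse(path).getroot()` would replace `ET.fromstring`. The function name `read_section` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The acknowledgments section shown above, inlined so the sketch runs on its own.
SECTION_XML = """<?xml version="1.0" encoding="UTF-8"?>
<ack>
<title>Acknowledgments</title>
<p>The authors are grateful to CNPq-Programa “Ciências sem fronteiras” (Grant No. 233761/2014-4) for financial support.</p>
</ack>"""


def read_section(xml_bytes):
    """Hypothetical helper: return (root tag, title, paragraph texts) for one section."""
    root = ET.fromstring(xml_bytes)
    # Assumption: <title> may be missing, so fall back to an empty string.
    title = (root.findtext("title") or "").strip()
    # itertext() also collects text nested inside inline markup within a <p>.
    paragraphs = ["".join(p.itertext()).strip() for p in root.iter("p")]
    return root.tag, title, paragraphs


# Pass bytes so the parser decodes according to the UTF-8 declaration.
tag, title, paragraphs = read_section(SECTION_XML.encode("utf-8"))
print(tag, "|", title, "|", paragraphs[0])
```

The root tag (`ack`) is kept alongside the title because, as noted below, titles are not always labelled consistently across articles.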
- Not all sections are labelled with a universally accepted vocabulary, so the same kind of section can appear under different titles in different articles.
- We want to improve our knowledge resource by clustering similar articles on a paragraph or section basis. For example, if unsupervised learning shows that "gas chromatography" is a frequently used phrase, we can use it as a label to group together other articles that mention gas chromatography.
- We plan to extract keywords and phrases using RAKE (Rapid Automatic Keyword Extraction) with NLTK. We then build bag-of-words and tf-idf representations of the data and manually agree on the labels we want to use for topic modelling. https://en.wikipedia.org/wiki/Tf%E2%80%93idf
- We want to try out different Python tools and libraries and find the ones that best serve our purpose:
  - scikit-learn clustering models
  - gensim
  - CountVectorizer
  - tf-idf
  - LDA (Latent Dirichlet Allocation)
  - cosine similarity
  - spaCy
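The phrase-based grouping described above (find a frequent phrase, then use it as a label for every article that mentions it) can be sketched as follows. This is an illustration under simplifying assumptions. The "unsupervised" step is replaced by a plain count of how many articles contain each two-word phrase, and stopword-containing bigrams are skipped. The stopword list, the article ids (`paper_a` and so on), and the helper names are all hypothetical. In practice, RAKE scores, tf-idf weights, or LDA topics could supply the labels instead.

```python
import re
from collections import Counter, defaultdict

# Hypothetical minimal stopword list, only for this illustration.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "of", "the", "to", "was", "were", "with"}


def bigrams(text):
    """Return the set of two-word phrases in text that contain no stopword."""
    words = re.findall(r"[a-z0-9-]+", text.lower())
    return {
        f"{w1} {w2}"
        for w1, w2 in zip(words, words[1:])
        if w1 not in STOPWORDS and w2 not in STOPWORDS
    }


def group_by_frequent_phrases(articles, top_n=1):
    """Use the top_n bigrams by document frequency as labels and group article ids under them."""
    doc_freq = Counter()
    for text in articles.values():
        # Sorting keeps tie-breaking in most_common() deterministic.
        doc_freq.update(sorted(bigrams(text)))
    labels = [phrase for phrase, _ in doc_freq.most_common(top_n)]
    groups = defaultdict(list)
    for article_id, text in articles.items():
        found = bigrams(text)
        for label in labels:
            if label in found:
                groups[label].append(article_id)
    return dict(groups)


# Hypothetical section texts keyed by made-up article ids.
articles = {
    "paper_a": "Volatile compounds were identified by gas chromatography.",
    "paper_b": "Gas chromatography with flame ionization detection was used.",
    "paper_c": "Leaf extracts were screened for antioxidant activity.",
    "paper_d": "Fatty acid methyl esters were quantified by gas chromatography.",
}
groups = group_by_frequent_phrases(articles)
print(groups)
```

Here "gas chromatography" occurs in three of the four texts, so it becomes the label and `paper_c` is left ungrouped.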
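To show what the RAKE step computes, here is a from-scratch sketch of its scoring. It is not the API of any NLTK-based RAKE package. Candidate phrases are runs of words between stopwords or punctuation. Each word is scored as degree/frequency, where a word's degree is the sum of the lengths of the candidate phrases it occurs in. A phrase's score is the sum of its word scores. The small stopword list stands in for NLTK's English stopword corpus and is an assumption, as are the helper names and the sample text.

```python
import re
from collections import Counter, defaultdict

# Hypothetical small stopword list; an NLTK-based setup would use NLTK's English stopwords.
STOPWORDS = {
    "a", "an", "and", "are", "as", "by", "for", "from", "in", "is", "of",
    "on", "or", "the", "to", "used", "was", "were", "with",
}


def candidate_phrases(text):
    """Split text into runs of non-stopwords (RAKE's candidate phrases)."""
    phrases = []
    # Punctuation acts as a phrase boundary, just like a stopword.
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9-]+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake(text, top_n=3):
    """Rank candidate phrases by the sum of their words' degree/frequency scores."""
    phrases = candidate_phrases(text)
    freq = Counter()
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # A word's degree grows with the length of every phrase it appears in.
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in set(phrases)}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


text = ("Volatile compounds were analysed by gas chromatography. "
        "Gas chromatography and mass spectrometry were used for identification.")
ranked = rake(text)
for phrase, score in ranked:
    print(f"{score:.2f}  {phrase}")
```

Because RAKE favours multi-word phrases over single words, "gas chromatography" outranks one-word candidates such as "identification". That makes RAKE output a natural source of candidate labels to agree on manually.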
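The bag-of-words, tf-idf, and cosine-similarity steps listed above can be sketched with the standard library alone. This sketch uses tf-idf(t, d) = count(t, d) · log(N / df(t)), where N is the number of sections and df(t) is the number of sections containing term t. scikit-learn's `CountVectorizer`/`TfidfVectorizer` and `sklearn.metrics.pairwise.cosine_similarity` cover the same ground. `TfidfVectorizer` uses a smoothed idf and L2 normalisation by default, so its weights will differ from these. The sample sections and helper names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document: raw count times log(N / df)."""
    bags = [Counter(tokenize(d)) for d in docs]  # bag-of-words counts
    n_docs = len(docs)
    # Iterating a Counter yields each term once, so this counts documents per term.
    df = Counter(term for bag in bags for term in bag)
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in bag.items()}
        for bag in bags
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical section texts.
sections = [
    "Essential oil composition was determined by gas chromatography.",
    "Samples were analysed by gas chromatography and mass spectrometry.",
    "The authors thank the funding agency for financial support.",
]
vectors = tf_idf(sections)
print(f"sim(0, 1) = {cosine(vectors[0], vectors[1]):.3f}")
print(f"sim(0, 2) = {cosine(vectors[0], vectors[2]):.3f}")
```

The two methods sections share "gas chromatography" and score above zero. The acknowledgments-style section shares no terms with the first one and scores exactly zero. Pairwise scores like these are what a clustering model would group on.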