Python project dedicated to exploring data classification through Bayesian algorithms and different approaches to feature extraction.
Done in 3 phases:\
I) Document collection through web scraping\
II) Feature extraction via Bag of Words and TF-IDF\
III) Document classification and accuracy analysis via Multinomial Naive Bayes (supervised) machine learning.
Libraries used:
[scikit-learn](https://scikit-learn.org/stable/index.html)
[BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
Python modules: requests, os, shutil, pathlib
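
As a rough illustration of phases II and III, the sketch below compares Bag of Words and TF-IDF features feeding a Multinomial Naive Bayes classifier in scikit-learn. It is not taken from this repository: the toy corpus, the labels, and the `evaluate` helper are hypothetical stand-ins for the scraped documents and the project's own code.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for documents collected in phase I.
train_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the election results were announced today",
    "parliament passed the new budget law",
]
train_labels = ["sports", "sports", "politics", "politics"]

test_docs = ["the goalkeeper saved the match", "the parliament debated the law"]
test_labels = ["sports", "politics"]


def evaluate(vectorizer):
    """Fit vectorizer + Multinomial Naive Bayes, return predictions and accuracy."""
    model = make_pipeline(vectorizer, MultinomialNB())
    model.fit(train_docs, train_labels)
    predictions = model.predict(test_docs)
    return predictions, accuracy_score(test_labels, predictions)


# Phase II: two feature extraction approaches; phase III: classify and score.
bow_predictions, bow_accuracy = evaluate(CountVectorizer())
tfidf_predictions, tfidf_accuracy = evaluate(TfidfVectorizer())

print("Bag of Words accuracy:", bow_accuracy)
print("TF-IDF accuracy:", tfidf_accuracy)
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary fitted on the training documents only, so the two feature extraction approaches can be swapped without touching the classification step.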
darkocejkov/webclassifier
Duplication of categorycrawler to accommodate feature extraction and document classification algorithms.