Duplication of categorycrawler to accommodate feature extraction and document classification algorithms

darkocejkov/webclassifier

webclassifier

Python project dedicated to exploring data classification through Bayesian algorithms and different approaches to feature extraction.
Done in 3 phases:\
I) Document collection through web scraping\
II) Feature extraction via Bag of Words and TF-IDF\
III) Document classification and accuracy analysis via Multinomial Naive Bayes (supervised machine learning)
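
Phases II and III can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's actual code: the tiny corpus, the labels, and the helper name `train_and_predict` are all made up for the example. It assumes Bag of Words maps to `CountVectorizer` and TF-IDF to `TfidfVectorizer`, each feeding a `MultinomialNB` classifier.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for the scraped documents.
TRAIN_DOCS = [
    "the team won the football match",
    "the striker scored a goal in the match",
    "the new phone has a fast processor",
    "the laptop ships with a faster processor and more memory",
]
TRAIN_LABELS = ["sports", "sports", "tech", "tech"]

TEST_DOCS = ["a goal in the football match", "a phone with more memory"]
TEST_LABELS = ["sports", "tech"]


def train_and_predict(vectorizer, train_docs, train_labels, test_docs):
    """Fit vectorizer + Multinomial Naive Bayes, then label the test documents."""
    model = make_pipeline(vectorizer, MultinomialNB())
    model.fit(train_docs, train_labels)
    return list(model.predict(test_docs))


# Compare the two feature extraction approaches on the same data.
bow_predictions = train_and_predict(
    CountVectorizer(), TRAIN_DOCS, TRAIN_LABELS, TEST_DOCS
)
tfidf_predictions = train_and_predict(
    TfidfVectorizer(), TRAIN_DOCS, TRAIN_LABELS, TEST_DOCS
)

print("Bag of Words accuracy:", accuracy_score(TEST_LABELS, bow_predictions))
print("TF-IDF accuracy:", accuracy_score(TEST_LABELS, tfidf_predictions))
```

On a real corpus the test documents would come from a held-out split (for example via `sklearn.model_selection.train_test_split`) rather than a hand-written list, and the two accuracies would typically differ.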

Libraries used:
[scikit-learn] https://scikit-learn.org/stable/index.html
[BeautifulSoup] https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Python modules: requests (third-party); os, shutil, pathlib (standard library)
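
As a rough idea of how phase I might combine these libraries, the sketch below downloads a page with requests, strips the markup with BeautifulSoup, and stores the text under a per-category folder with pathlib. The function names, the `root/category/N.txt` layout, and the file naming scheme are assumptions made for this example; the repository's own crawler may be organized differently.

```python
from pathlib import Path

import requests
from bs4 import BeautifulSoup


def extract_text(html: str) -> str:
    """Return the visible text of an HTML page with whitespace collapsed."""
    soup = BeautifulSoup(html, "html.parser")
    # Script and style contents are not document text, so drop them first.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return " ".join(soup.get_text(separator=" ").split())


def save_document(url: str, category: str, root: str = "documents") -> Path:
    """Download one page and store its text under root/category (hypothetical layout)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()

    target_dir = Path(root) / category
    target_dir.mkdir(parents=True, exist_ok=True)

    # Number files sequentially within the category folder.
    index = len(list(target_dir.glob("*.txt")))
    target = target_dir / f"{index}.txt"
    target.write_text(extract_text(response.text), encoding="utf-8")
    return target


# Offline demonstration of the parsing step on an inline HTML snippet.
SAMPLE_HTML = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Match report</h1><script>var x = 1;</script>"
    "<p>The striker   scored.</p></body></html>"
)
sample_text = extract_text(SAMPLE_HTML)
print(sample_text)
```

Storing one folder per category keeps the label of each document recoverable from its path, which is convenient when the files are later loaded for classification.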
