This repository contains the code, data, and documentation from a research internship at PipeCandy.
The work designed ML-informed heuristic algorithms for various e-commerce platforms
that efficiently automate scraping for product URLs
and also generate a sitemap. All code was reviewed, perfected, and pushed to production.
Abstract of the Research Internship:
• Developed heuristic algorithms that efficiently automate scraping e-commerce web pages for product page URLs and, in the process, generate a sitemap for each e-commerce site.
• Applied ML models to product URLs and non-product URLs from various e-commerce sites to obtain a bag of words.
• Used the keywords obtained above to design heuristic algorithms for various e-commerce platforms that scrape their web pages efficiently.
• All code was reviewed, perfected, and pushed to production.
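
The keyword-to-heuristic idea described above can be illustrated with a minimal sketch. It is not the production code from the internship: it swaps the ML models for plain token counting, and the sample URLs, the function names (tokenize, bag_of_words, product_keywords, looks_like_product_url), and the min_count threshold are all assumptions made for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical labelled samples; real data would come from crawled sites.
PRODUCT_URLS = [
    "https://shop.example.com/products/blue-cotton-shirt",
    "https://shop.example.com/products/leather-wallet",
    "https://store.example.org/product/running-shoes",
]
NON_PRODUCT_URLS = [
    "https://shop.example.com/pages/about-us",
    "https://shop.example.com/blogs/news/summer-sale",
    "https://store.example.org/cart",
]


def tokenize(url):
    """Split the URL path into lowercase alphabetic tokens."""
    path = urlparse(url).path.lower()
    return re.findall(r"[a-z]+", path)


def bag_of_words(urls):
    """Count, for each token, the number of URLs it appears in."""
    counts = Counter()
    for url in urls:
        counts.update(set(tokenize(url)))
    return counts


def product_keywords(product_urls, non_product_urls, min_count=2):
    """Keep tokens seen in at least min_count product URLs and in no non-product URL."""
    pos = bag_of_words(product_urls)
    neg = bag_of_words(non_product_urls)
    return {tok for tok, n in pos.items() if n >= min_count and neg[tok] == 0}


def looks_like_product_url(url, keywords):
    """Heuristic: a URL is treated as a product page if its path contains any keyword."""
    return any(tok in keywords for tok in tokenize(url))


KEYWORDS = product_keywords(PRODUCT_URLS, NON_PRODUCT_URLS)
```

A crawler could call looks_like_product_url on every link it discovers, recording matches as product pages and adding every visited URL to the sitemap. In practice the keyword set would likely differ per platform, which is why the heuristics are described as platform-specific.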