Devansh-Seth-DEV/News_Scraping

Web Scraping

News Reader-Selenium Project

Web scraping is the automated gathering of content and data from a website or any other resource available on the internet. Unlike screen scraping, which captures only what is displayed, web scraping extracts the underlying HTML of the webpage. Users can then process that HTML to extract data and carry out data cleaning, manipulation, and analysis.
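As an illustration of processing a page's HTML to extract and clean data, here is a minimal sketch that uses only Python's standard library. It assumes headlines live in `<h3>` elements, which is a hypothetical choice for this example and not necessarily the markup of any real news site; `HeadlineParser` and `extract_headlines` are illustrative names.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text of every <h3> element (an assumed headline tag)."""

    def __init__(self):
        super().__init__()
        self._in_headline = False
        self._parts = []
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_headline = True
            self._parts = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. links) is collected as well.
        if self._in_headline:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_headline:
            # Data cleaning: collapse whitespace and drop empty headlines.
            text = " ".join("".join(self._parts).split())
            if text:
                self.headlines.append(text)
            self._in_headline = False


def extract_headlines(html):
    """Return the cleaned headline strings found in an HTML document."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines


sample = "<div><h3> Markets  rally <a href='#'>today</a></h3><p>Body</p><h3></h3></div>"
print(extract_headlines(sample))  # ['Markets rally today']
```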

Project Description

In this project, a web browser is driven with Selenium. News headlines are fetched from a specified source (e.g. Hindustan Times), printed, and saved to a text file. The headlines are also converted to speech using Google Text-to-Speech (gTTS) and saved as an audio file (MP3).
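The saving and text-to-speech stages described above could look roughly like the sketch below. It is not the code in `news_scraperBOT.py`: the function names and the dated file names are hypothetical, while the `docs/headlines` and `audio` directories are the ones listed under OUTPUT. gTTS is imported lazily because it needs the package installed and network access.

```python
from datetime import date
from pathlib import Path


def save_headlines(headlines, out_dir="docs/headlines"):
    """Write numbered headlines to a dated text file and return its path."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    # Hypothetical file name; the project may name its files differently.
    path = folder / f"headlines_{date.today().isoformat()}.txt"
    lines = [f"{i}. {text}" for i, text in enumerate(headlines, start=1)]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def headlines_to_speech(headlines, out_dir="audio", lang="en"):
    """Convert the headlines to an MP3 with gTTS (requires network access)."""
    from gtts import gTTS  # lazy import so the text step works without gTTS

    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"headlines_{date.today().isoformat()}.mp3"
    gTTS(text=". ".join(headlines), lang=lang).save(str(path))
    return path
```

A caller would pass the same list of headline strings to both functions, for example `save_headlines(headlines)` followed by `headlines_to_speech(headlines)`.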

Designed for Linux. Not yet tested on Windows or macOS!

Installation

STEP1: Clone this repository

~$ git clone https://github.com/Devansh-Seth-DEV/News_Scraping.git

STEP2: Create a virtual environment

Open your favourite terminal

~$ cd <path to cloned repository>/News_Scraping
News_Scraping:~$ pip3 install virtualenv
News_Scraping:~$ virtualenv <venv_name>

STEP3: Activate virtual environment

News_Scraping:~$ source <venv_name>/bin/activate

STEP4: Give execute permission to the Firefox driver (geckodriver)

(<venv_name>) News_Scraping:~$ chmod +x drivers/FirefoxDriver/geckodriver
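Selenium can only launch geckodriver if the file is executable, which is why this step is needed. The following sketch checks that permission from Python and then starts Firefox with the Selenium 4 `Service` API. The helper names `is_executable` and `start_firefox` and the headless default are assumptions made for this example; the driver path is the one used in the `chmod` command above.

```python
import os
from pathlib import Path

GECKODRIVER = Path("drivers/FirefoxDriver/geckodriver")  # path from STEP4


def is_executable(path):
    """Return True if path is an existing file with execute permission."""
    return Path(path).is_file() and os.access(path, os.X_OK)


def start_firefox(driver_path=GECKODRIVER, headless=True):
    """Start Firefox through Selenium 4 using the bundled geckodriver."""
    if not is_executable(driver_path):
        raise PermissionError(
            f"{driver_path} is missing or not executable; run chmod +x on it"
        )
    # Selenium is imported here so the permission check works without it.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.firefox.service import Service

    options = Options()
    if headless:
        options.add_argument("-headless")  # assumed default: no browser window
    service = Service(executable_path=str(driver_path))
    return webdriver.Firefox(service=service, options=options)
```

Remember to call `driver.quit()` on the returned driver when finished so the browser process is closed.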

STEP5: Install Selenium, webdriver-manager, and gTTS (Google-Text-To-Speech)

Assuming that the virtual environment is activated

(<venv_name>) News_Scraping:~$ pip3 install selenium
(<venv_name>) News_Scraping:~$ pip3 install webdriver-manager
(<venv_name>) News_Scraping:~$ pip3 install gTTS
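To confirm that the three packages landed in the active virtual environment, a quick check such as the following can help. It is a sketch and not part of the repository; `missing_packages` is a hypothetical helper. Note that two of the import names differ from the names used with `pip3`.

```python
from importlib.util import find_spec

# pip name -> import name (webdriver-manager -> webdriver_manager, gTTS -> gtts)
REQUIRED = {
    "selenium": "selenium",
    "webdriver-manager": "webdriver_manager",
    "gTTS": "gtts",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose modules cannot be found in this environment."""
    return [pip_name for pip_name, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("All dependencies found." if not missing else f"Missing: {', '.join(missing)}")
```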

RUN

Assuming that the virtual environment is activated

(<venv_name>) News_Scraping:~$ python3 news_scraperBOT.py

OUTPUT

News Text-Files Directory

(<venv_name>) News_Scraping:~$ cd ./docs/headlines

News-Text Audio Directory

(<venv_name>) News_Scraping:~$ cd ./audio/
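Instead of browsing the two directories by hand, the most recent output files can be located with a few lines of Python. This is a sketch under the assumption that the text files end in `.txt` and the audio files in `.mp3`; `newest_file` is a hypothetical helper, not something the project provides.

```python
from pathlib import Path


def newest_file(folder, pattern="*"):
    """Return the most recently modified file in folder matching pattern, or None."""
    folder = Path(folder)
    if not folder.is_dir():
        return None
    files = [p for p in folder.glob(pattern) if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


# Directories from the OUTPUT section; the extensions are assumed.
for folder, pattern in (("docs/headlines", "*.txt"), ("audio", "*.mp3")):
    print(folder, "->", newest_file(folder, pattern))
```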

Deactivating Virtual Environment

(<venv_name>) News_Scraping:~$ deactivate
