Scraper

A library for performing simple web scraping of a search engine's results page for data analysis tasks. (Note: For personal, non-commercial use only. Follow all web scraping guidelines before getting started. Be kind to servers.)

Requires Python 3.6 or greater.

Getting Started

This library is intended for personal use only to get search results from a search engine for downstream analysis.

1. Clone Project Repo

git clone https://github.com/meads2/scraper.git
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pwd

2. Enter search terms to scrape results

python scraper 'my favorite team'

You can pass additional flags to change the scraper's behavior. Any flag you omit falls back to its default.

Parameters

terms - Search terms to pass to the scraper, as a single string (e.g. 'Python Tips and Tricks').

--selfie - If present, Selenium takes a screenshot of the browser window showing the search results.

--dest (FUTURE) - If specified, saves results to the given location.

--showme (FUTURE) - If present, opens a visible browser window at runtime so you can watch execution; useful for debugging.

--engine (FUTURE) - If specified, uses that search engine; defaults to Google. ['Bing' - Microsoft Bing, 'duck' - DuckDuckGo, 'google' - Google, 'Yahoo' - Yahoo]
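
The parameters above could be wired up with Python's standard argparse module. The following is only a sketch, not the project's actual code: the function name build_parser, the help strings, and the choice of None as the default for --dest are assumptions made for illustration. The engine keys are copied as listed above.

```python
import argparse

# Engine keys exactly as listed in the parameter description above.
ENGINES = ["Bing", "duck", "google", "Yahoo"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented parameters."""
    parser = argparse.ArgumentParser(
        prog="scraper",
        description="Scrape a search engine's results page for the given terms.",
    )
    # Positional search terms, passed as one quoted string.
    parser.add_argument("terms", help="search terms, e.g. 'Python Tips and Tricks'")
    # Boolean switches: off unless the flag is present.
    parser.add_argument("--selfie", action="store_true",
                        help="take a screenshot of the results window")
    parser.add_argument("--showme", action="store_true",
                        help="(FUTURE) open a visible browser window")
    # Optional values; the None default for --dest is an assumption.
    parser.add_argument("--dest", default=None,
                        help="(FUTURE) location to save results")
    parser.add_argument("--engine", choices=ENGINES, default="google",
                        help="(FUTURE) search engine to use")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(["daily news near me", "--selfie"])
print(args.terms, args.selfie, args.engine)
```

Using store_true for --selfie and --showme matches the "if present" wording: the flag takes no value and is simply on or off.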

Examples

Basic Example

python scraper 'daily news near me'
### ... running and scraping quietly
### Check your downloads for a surprise!

Screenshot Example

python scraper 'daily news near me' --selfie
### ... running and scraping quietly
### Check your downloads for a surprise!
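
For readers curious how a --selfie step might work, here is a minimal sketch. It assumes a Selenium WebDriver instance already exists and has loaded the results page. The helper names screenshot_path and take_selfie, the file-naming scheme, and the dest argument are hypothetical; the only Selenium call used is the driver's save_screenshot method.

```python
import re
from pathlib import Path


def screenshot_path(terms: str, dest: str = ".") -> Path:
    """Build a filesystem-safe PNG path from the search terms (hypothetical naming)."""
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", terms.lower()).strip("-")
    return Path(dest) / f"{slug or 'search'}.png"


def take_selfie(driver, terms: str, dest: str = ".") -> Path:
    """Save a screenshot of the driver's current window and return its path.

    `driver` is assumed to be a Selenium WebDriver that has already
    navigated to the search results page.
    """
    path = screenshot_path(terms, dest)
    path.parent.mkdir(parents=True, exist_ok=True)
    # save_screenshot returns False if the file could not be written.
    if not driver.save_screenshot(str(path)):
        raise RuntimeError(f"could not save screenshot to {path}")
    return path
```

Passing the driver in, rather than creating it inside the function, keeps the screenshot step independent of which browser or options are used.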

Verbose Example

python scraper 'daily news near me' --showme
### ... running and scraping right before your eyes
### Check your downloads for a surprise!

Custom Save Example

python scraper 'daily news near me' --dest '../some/location/'
### ... running and scraping quietly to your defined location
### Check your downloads for a surprise!