OpenGovUS Business Scraper

Structured business listings → clean CSVs, with post-processing & summaries.

A production-style Python scraper that collects business registration data from OpenGovUS and writes tidy CSVs. It renders pages with Playwright, parses with BeautifulSoup, and includes optional post-processing and summary steps so clients can use the data immediately.

🔍 Key Features

Dynamic rendering with Playwright (Chromium) to handle JS.
Structured fields exported to CSV: Business Name, Address, Category, Date Registered.
Pagination across multiple result pages.
Basic stealth tactics to reduce trivial bot detection.
Post-processing script to dedupe, clean, and sort records.
Summary generator (plain text + Markdown) for quick insights.
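
The pagination and delay features above can be sketched roughly as follows. This is a minimal illustration, not the code in script.py: the `?page=N` query parameter, the `page_urls` and `crawl` names, and the delay bounds are all assumptions. The `fetch` callable is injected, so in practice it could be a Playwright-based renderer.

```python
import random
import time
from typing import Callable, Iterable, Iterator, List
from urllib.parse import urlencode


def page_urls(base_url: str, max_pages: int) -> Iterator[str]:
    """Yield one URL per result page (assumes a hypothetical ?page=N parameter)."""
    for page in range(1, max_pages + 1):
        yield f"{base_url}?{urlencode({'page': page})}"


def crawl(
    fetch: Callable[[str], str],
    urls: Iterable[str],
    min_delay: float = 1.0,
    max_delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in order, pausing a random interval between requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            # Randomized pause so requests are not evenly spaced.
            sleep(random.uniform(min_delay, max_delay))
        pages.append(fetch(url))
    return pages
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real waiting.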

⚙️ Quick Start

Prerequisites

Python 3.10+
Git
Playwright browsers (install step below)

Installation

# 1) Clone
git clone https://github.com/mdugan8186/opengovus-scraper.git
cd opengovus-scraper

# 2) (optional) Virtual environment
python -m venv .venv
# macOS/Linux:
source .venv/bin/activate
# Windows:
.venv\Scripts\activate

# 3) Install dependencies
pip install -r requirements.txt

# 4) Install Playwright browsers (first run only)
python -m playwright install chromium

Run the Scraper

python script.py

Writes the raw CSV to: output/opengovus_listings.csv

Optional: Post-process & Summarize

# Clean & sort, save to samples/cleaned_listings.csv
python postprocess.py

# Create text + markdown summaries from the cleaned CSV
python summarize_data.py

Outputs:
- samples/cleaned_listings.csv
- output/summary.txt
- samples/summary.md
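
For a sense of what the clean and summarize steps involve, here is a standard-library sketch. The project itself uses pandas, so this is an illustration rather than the logic in postprocess.py or summarize_data.py. The function names, the case-insensitive duplicate key, and the choice to drop rows without a business name are assumptions.

```python
from collections import Counter

COLUMNS = ["Business Name", "Address", "Category", "Date Registered"]


def clean_rows(rows):
    """Normalize whitespace, drop duplicates, and sort by business name."""
    seen = set()
    cleaned = []
    for row in rows:
        # Collapse runs of whitespace and trim each expected column.
        tidy = {col: " ".join((row.get(col) or "").split()) for col in COLUMNS}
        # Assumption: two rows are duplicates if all fields match ignoring case.
        key = tuple(tidy[col].lower() for col in COLUMNS)
        if not tidy["Business Name"] or key in seen:
            continue
        seen.add(key)
        cleaned.append(tidy)
    return sorted(cleaned, key=lambda r: r["Business Name"].lower())


def summarize_markdown(rows):
    """Build a small Markdown summary: total count plus records per category."""
    counts = Counter(r["Category"] or "Unknown" for r in rows)
    lines = [
        f"Total records: {len(rows)}",
        "",
        "| Category | Count |",
        "| --- | --- |",
    ]
    for category, count in counts.most_common():
        lines.append(f"| {category} | {count} |")
    return "\n".join(lines)
```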

📁 Output

Primary CSV: output/opengovus_listings.csv
Cleaned CSV (optional): samples/cleaned_listings.csv

Columns

Business Name, Address, Category, Date Registered
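
If you consume the CSV from your own code, a small reader/writer pair built on the standard `csv` module keeps the column order consistent. This is a sketch under assumptions: `write_listings` and `read_listings` are hypothetical helper names, not functions from this repository.

```python
import csv
import io

COLUMNS = ["Business Name", "Address", "Category", "Date Registered"]


def write_listings(rows, handle):
    """Write rows to an open text handle using the four documented columns."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        # Keep only the documented columns; missing values become empty strings.
        writer.writerow({col: row.get(col, "") for col in COLUMNS})


def read_listings(handle):
    """Read rows back, failing early if an expected column is absent."""
    reader = csv.DictReader(handle)
    missing = [col for col in COLUMNS if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_listings([{"Business Name": "Acme, Inc.", "Address": "1 Main St"}], buffer)
    print(buffer.getvalue())
```

When working with real files rather than `io.StringIO`, open them with `newline=""` as the `csv` module documentation recommends.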

🧩 Configuration & Selectors

CSS selectors and parsing logic live in the code (script.py). If the site HTML changes, update the selectors there.
For long-term maintainability, you can extract selectors into a config/ JSON (future enhancement).
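
One possible shape for that future enhancement is sketched below: built-in defaults that a JSON document can override. The selector values and key names here are placeholders invented for illustration. The real selectors live in script.py and are not reproduced.

```python
import json

# Hypothetical defaults for illustration only; not the selectors used in script.py.
DEFAULT_SELECTORS = {
    "row": "table tbody tr",
    "business_name": "td:nth-of-type(1)",
    "address": "td:nth-of-type(2)",
    "category": "td:nth-of-type(3)",
    "date_registered": "td:nth-of-type(4)",
}


def load_selectors(json_text=None):
    """Return the default selectors, updated with any overrides from JSON text."""
    selectors = dict(DEFAULT_SELECTORS)
    if json_text:
        overrides = json.loads(json_text)
        # Reject misspelled keys so a typo does not silently keep a stale default.
        unknown = set(overrides) - set(DEFAULT_SELECTORS)
        if unknown:
            raise KeyError(f"unknown selector keys: {sorted(unknown)}")
        selectors.update(overrides)
    return selectors
```

With this layout, a site change could be handled by editing a JSON file (for example, one under config/) and passing its contents to `load_selectors`, without touching the parsing code.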

🎥 Demo

Example of the scraper output:

The full dataset is saved as a CSV: output/opengovus_listings.csv

🧪 Testing & Dev Notes

See TESTING.md for a step-by-step sanity flow (render → extract → clean → summarize), selector maintenance notes, and data-quality checks.

🛠️ Tech Stack

Playwright (Python) for rendering
BeautifulSoup for parsing
pandas for cleaning & summaries
CSV outputs for easy analysis

⚖️ Legal & Ethical Use

This scraper includes basic measures (delays, browser automation) to reduce trivial blocking and make data collection more reliable.
It is provided for educational and demonstration purposes only. Please review and comply with the target site’s terms of service and robots.txt before running it at scale.
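
A robots.txt check can be automated with Python's standard `urllib.robotparser`. The sketch below is not part of this repository. `is_allowed` is a hypothetical helper that takes the robots.txt text directly, so it can be tried without a network request.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    print(is_allowed(rules, "https://example.com/business"))
    print(is_allowed(rules, "https://example.com/private/page"))
```

To check a live site instead, `RobotFileParser` also offers `set_url()` and `read()` to download and parse the file. Note that robots.txt does not replace reading the site's terms of service.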

📄 License

This project is licensed under the MIT License. See LICENSE.

👤 About

Mike Dugan — Python Web Scraper & Automation Developer

GitHub: @mdugan8186
Portfolio Website: scraping-portfolio
LinkedIn: View my profile
Fiverr: Hire me for web scraping and custom scrapers
Upwork: Hire me for web scraping and Python automation
Email: mdugan8186.work@gmail.com
