review-scraper

Pull every review off any G2 product page or Clutch agency page. Clean JSON. One CSV. Feed it to an LLM.


Quickstart · Use cases · Output · Compare · FAQ

If this saves you an afternoon, give it a star. It's the signal I use to decide what to build next.


Why this exists

G2 sells access to its review data for thousands of dollars a month. Clutch is the same. If you're a PMM, founder, or researcher doing competitive intel, persona research, or voice-of-customer work, that price is absurd for a weekend deliverable.

The reviews are public on the web. You just need something that collects them faster than copy-paste. This does that. Point it at a URL, come back in 5 minutes, get every review with rating, title, body, reviewer role and company, date, and pros/cons in clean JSON plus a spreadsheet.

Quickstart

git clone https://github.com/mothivenkatesh/review-scraper.git
cd review-scraper
pip install -r requirements.txt && python -m scrapling install

python scrape.py --site g2 --url https://www.g2.com/products/stripe-payments/reviews
python to_csv.py

No login. No API key. No Reddit-dev-account ceremony. Just go.

What you can do with it

| Use case | What to run | What you get |
|---|---|---|
| Competitive teardown | Scrape top 3 competitors | Every user complaint and rave in one CSV |
| Persona research | Scrape your own product | Actual job titles of real users, not interview guesses |
| Voice-of-customer for positioning | Scrape you + top competitor | Diff the vocabulary. Gaps are positioning opportunities. |
| Agency shortlisting | Scrape a Clutch category | Build your own shortlist without the "Top 10" bait sites |
| Feed an LLM for theme extraction | 500 reviews → Claude | "What are the top 10 pain points users mention?" |
| Pricing research | Clutch reviews show project cost | Real budgets clients paid, not agency rate cards |
| Win/loss analysis | Scrape competitor mentions in your category | What users switched from and why |
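For the theme-extraction use case, a minimal sketch of turning a scrape result into an LLM prompt. The inline payload is a trimmed stand-in with the same shape as the sample output; in practice you'd load a file from review-scrape/data/ instead:

```python
import json

# Trimmed inline stand-in shaped like a scrape result; in practice,
# load review-scrape/data/<site>_<product>_reviews.json instead.
payload = json.loads("""
{"review_count": 2,
 "reviews": [
   {"body": "The API documentation is the best in class..."},
   {"body": "Support tickets take 3-5 days for non-Enterprise accounts."}
 ]}
""")

# Concatenate review bodies under the question you want answered.
bodies = [r["body"] for r in payload["reviews"]]
prompt = ("What are the top 10 pain points users mention?\n\n"
          + "\n---\n".join(bodies))
print(prompt)
```

Paste the resulting prompt into Claude (or any LLM); 200-500 reviews typically fit in a single context window.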

Sample output

{
  "site": "g2",
  "url": "https://www.g2.com/products/stripe-payments/reviews",
  "scraped_at": "2026-04-22T08:12:44Z",
  "review_count": 287,
  "reviews": [
    {
      "rating": 4.5,
      "title": "Dev-friendly, but support is slow",
      "body": "The API documentation is the best in class...",
      "author": "Sandeep K.",
      "author_title": "Engineering Lead",
      "author_company": "Mid-Market (51-1000 emp.)",
      "date": "2026-03-14",
      "pros": "Excellent docs, webhooks are reliable, good test mode",
      "cons": "Support tickets take 3-5 days for non-Enterprise accounts"
    }
  ]
}
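A quick sanity check you can run on any result file with this shape, e.g. the average rating across the reviews array. A sketch; the inline payload here is a trimmed two-review stand-in:

```python
import json

# Two-review stand-in with the same shape as the sample output above.
payload = json.loads("""
{"review_count": 2,
 "reviews": [{"rating": 4.5, "title": "Dev-friendly, but support is slow"},
             {"rating": 3.0, "title": "Solid, occasional hiccups"}]}
""")

# Average rating over all scraped reviews.
ratings = [r["rating"] for r in payload["reviews"]]
avg = round(sum(ratings) / len(ratings), 2)
print(avg)  # 3.75
```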

How it compares

| | review-scraper | G2 Data API | Clutch Data API | Manual copy-paste |
|---|---|---|---|---|
| Cost | Free | $2,500/mo+ | $1,000/mo+ | Your time |
| Setup time | 10 min | 2 weeks + procurement | Similar | 0 |
| Reviewer role + company | Yes | Yes | Yes | Yes |
| Pros/cons breakdown | Yes (G2) | Yes | N/A | Yes |
| Project cost/type | Yes (Clutch) | N/A | Yes | Yes |
| Output format | JSON + CSV | JSON | JSON | Doc or sheet |
| Rate-limited | Lightly | No | No | By your patience |
| Scales to 100 URLs | Yes, overnight | Yes | Yes | No |

Who this is for

  • Product marketers running competitive intel
  • Product managers doing persona research and ICP validation
  • Founders building positioning from real user language
  • Strategy consultants producing fast voice-of-customer decks
  • Researchers building datasets for NLP sentiment work

Setup walkthrough (for non-developers)

1. Python

python --version

If it's missing or below 3.10, install from python.org. On Windows, tick "Add Python to PATH".

2. Download and install

git clone https://github.com/mothivenkatesh/review-scraper.git
cd review-scraper
pip install -r requirements.txt
python -m scrapling install

The scrapling install line downloads a browser engine it uses to solve Cloudflare challenges (both G2 and Clutch sit behind Cloudflare). Takes a couple minutes, one time only.

3. Run a single scrape

G2 product reviews (use the /reviews URL):

python scrape.py --site g2 --url https://www.g2.com/products/stripe-payments/reviews

Clutch agency profile:

python scrape.py --site clutch --url https://clutch.co/profile/accenture

4. Run many at once

Make urls.txt with one URL per line:

https://www.g2.com/products/stripe-payments/reviews
https://www.g2.com/products/paypal/reviews
https://www.g2.com/products/razorpay/reviews

Then run:

python scrape.py --site g2 --file urls.txt

5. Build the spreadsheet

python to_csv.py

Writes review-scrape/all_reviews.csv. Open in Excel or Sheets. One row per review with columns for site, URL, rating, title, body, reviewer details, date, and site-specific extras.
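Once all_reviews.csv exists, slicing it for analysis takes a few lines. A sketch with Python's csv module, assuming the columns named above; a two-row inline stand-in replaces the real file here:

```python
import csv
import io

# Two-row stand-in; in practice: open("review-scrape/all_reviews.csv").
sample = io.StringIO(
    "site,rating,title,cons\n"
    "g2,4.5,Dev-friendly,Slow support for small accounts\n"
    "g2,2.0,Buggy webhooks,Frequent outages\n"
)
rows = list(csv.DictReader(sample))

# Pull the complaints attached to low-rated reviews.
complaints = [r["cons"] for r in rows if float(r["rating"]) < 3]
print(complaints)
```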

Common options

| Flag | Does |
|---|---|
| --max-pages 30 | Go deeper. Default is 20 pages (~200-400 reviews per URL). |
| --dump-html | Save the first page's raw HTML. Useful when selectors break. |
| --out-dir my-research | Write to a custom folder instead of ./review-scrape/ |

Where everything lands

review-scrape/
  data/
    g2_stripe-payments_reviews.json
    g2_paypal_reviews.json
    clutch_accenture.json
  all_reviews.csv

When it breaks: selector drift

G2 and Clutch rename their HTML classes every few months. Symptom: scrape finishes with 0 reviews.

Fix in 2 minutes:

  1. Re-run with --dump-html:
    python scrape.py --site g2 --url <URL> --dump-html
    
  2. Open the saved .html file in your browser, right-click a review, pick Inspect. Find the new container class name (usually div[data-testid="..."] or article.some-class).
  3. Open scrape.py, find SITE_CONFIG near the top, add your new selector to the comma-separated list for that field. Keep the old ones as fallback.
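For orientation, a hypothetical sketch of what a selector entry might look like. The real SITE_CONFIG structure in scrape.py may differ, but the principle holds: append the new selector and keep the old ones as fallbacks:

```python
# Hypothetical shape for illustration; check the real SITE_CONFIG
# near the top of scrape.py before editing.
SITE_CONFIG = {
    "g2": {
        # Comma-separated CSS selectors, tried in order.
        "review_container": (
            'div[itemprop="review"], '          # old class, kept as fallback
            'div[data-testid="review-card"]'    # new class found via Inspect
        ),
    }
}

# The comma-separated string splits into one selector per attempt.
selectors = [s.strip()
             for s in SITE_CONFIG["g2"]["review_container"].split(",")]
print(selectors)
```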

If you fix it, please open a PR. It helps everyone.

FAQ

Is this legal?
Reviews are public data. Reading them is fine. Don't redistribute review text as your own content. Don't resell the scraped data. For commercial use at scale, talk to G2 or Clutch directly.

Why Scrapling instead of plain requests?
Both sites use Cloudflare. Plain requests gets blocked. Scrapling solves the Cloudflare challenge automatically.

Can I use this on Capterra/TrustRadius/Gartner Peer Insights?
Not out of the box, but the pattern extends. Add a new entry in SITE_CONFIG with that site's selectors and you're there. PRs welcome.

How many URLs can I scrape per day?
Keep it under 100. Both sites have rate limits that tighten if you hammer them. The scraper already waits 3-6 seconds between pages and 5-10 between URLs.

Does it handle login-gated reviews?
No. Only public reviews. Clutch shows almost everything publicly; G2 hides a small amount behind login.

Can I run this on a schedule?
Yes. Use cron (Mac/Linux) or Task Scheduler (Windows). There's no --resume flag yet, but adding one would be an easy PR, and PRs are welcome.

For Claude Code users

Drop this folder into ~/.claude/skills/review-scrape/. You get /review-scrape g2 <URL>, /review-scrape clutch <URL>, and /review-scrape csv. See SKILL.md.

Credits

Built on Scrapling by @D4Vinci.

License

MIT. Use it, fork it, ship it.


If this saved you an afternoon, star the repo. It genuinely helps.

Report a bug · Request a feature · Follow me on X
