review-scraper

Pull every review off any G2 product page or Clutch agency page. Clean JSON. One CSV. Feed it to an LLM.


Quickstart · Use cases · Output · Compare · FAQ

If this saves you an afternoon, give it a star. It's the signal I use to decide what to build next.


Why this exists

G2 sells access to its review data for thousands of dollars a month. Clutch is the same. If you're a PMM, founder, or researcher doing competitive intel, persona research, or voice-of-customer work, that price is absurd for a weekend deliverable.

The reviews are public on the web. You just need something that collects them faster than copy-paste. This does that. Point it at a URL, come back in 5 minutes, get every review with rating, title, body, reviewer role and company, date, and pros/cons in clean JSON plus a spreadsheet.

Quickstart

git clone https://github.com/mothivenkatesh/review-scraper.git
cd review-scraper
pip install -r requirements.txt && python -m scrapling install

python scrape.py --site g2 --url https://www.g2.com/products/stripe-payments/reviews
python to_csv.py

No login. No API key. No Reddit-dev-account ceremony. Just go.

What you can do with it

| Use case | What to run | What you get |
|---|---|---|
| Competitive teardown | Scrape top 3 competitors | Every user complaint and rave in one CSV |
| Persona research | Scrape your own product | Actual job titles of real users, not interview guesses |
| Voice-of-customer for positioning | Scrape you + top competitor | Diff the vocabulary. Gaps are positioning opportunities. |
| Agency shortlisting | Scrape a Clutch category | Build your own shortlist without the "Top 10" bait sites |
| Feed an LLM for theme extraction | 500 reviews → Claude | "What are the top 10 pain points users mention?" |
| Pricing research | Clutch reviews show project cost | Real budgets clients paid, not agency rate cards |
| Win/loss analysis | Scrape competitor mentions in your category | What users switched from and why |
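For the theme-extraction use case, a minimal sketch of turning a scrape result into an LLM prompt. The inline payload is a trimmed stand-in with the same shape as the sample output; in practice you'd load a file from review-scrape/data/ instead:

```python
import json

# Trimmed inline stand-in shaped like a scrape result; in practice,
# load review-scrape/data/<site>_<product>_reviews.json instead.
payload = json.loads("""
{"review_count": 2,
 "reviews": [
   {"body": "The API documentation is the best in class..."},
   {"body": "Support tickets take 3-5 days for non-Enterprise accounts."}
 ]}
""")

# Concatenate review bodies under the question you want answered.
bodies = [r["body"] for r in payload["reviews"]]
prompt = ("What are the top 10 pain points users mention?\n\n"
          + "\n---\n".join(bodies))
print(prompt)
```

Paste the resulting prompt into Claude (or any LLM); 200-500 reviews typically fit in a single context window.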

Sample output

{
  "site": "g2",
  "url": "https://www.g2.com/products/stripe-payments/reviews",
  "scraped_at": "2026-04-22T08:12:44Z",
  "review_count": 287,
  "reviews": [
    {
      "rating": 4.5,
      "title": "Dev-friendly, but support is slow",
      "body": "The API documentation is the best in class...",
      "author": "Sandeep K.",
      "author_title": "Engineering Lead",
      "author_company": "Mid-Market (51-1000 emp.)",
      "date": "2026-03-14",
      "pros": "Excellent docs, webhooks are reliable, good test mode",
      "cons": "Support tickets take 3-5 days for non-Enterprise accounts"
    }
  ]
}
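A quick sanity check you can run on any result file with this shape, e.g. the average rating across the reviews array. A sketch; the inline payload here is a trimmed two-review stand-in:

```python
import json

# Two-review stand-in with the same shape as the sample output above.
payload = json.loads("""
{"review_count": 2,
 "reviews": [{"rating": 4.5, "title": "Dev-friendly, but support is slow"},
             {"rating": 3.0, "title": "Solid, occasional hiccups"}]}
""")

# Average rating over all scraped reviews.
ratings = [r["rating"] for r in payload["reviews"]]
avg = round(sum(ratings) / len(ratings), 2)
print(avg)  # 3.75
```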

How it compares

| | review-scraper | G2 Data API | Clutch Data API | Manual copy-paste |
|---|---|---|---|---|
| Cost | Free | $2,500/mo+ | $1,000/mo+ | Your time |
| Setup time | 10 min | 2 weeks + procurement | Similar | 0 |
| Reviewer role + company | Yes | Yes | Yes | Yes |
| Pros/cons breakdown | Yes (G2) | Yes | N/A | Yes |
| Project cost/type | Yes (Clutch) | N/A | Yes | Yes |
| Output format | JSON + CSV | JSON | JSON | Doc or sheet |
| Rate-limited | Lightly | No | No | By your patience |
| Scales to 100 URLs | Yes, overnight | Yes | Yes | No |

Who this is for

  • Product marketers running competitive intel
  • Product managers doing persona research and ICP validation
  • Founders building positioning from real user language
  • Strategy consultants producing fast voice-of-customer decks
  • Researchers building datasets for NLP sentiment work

Setup walkthrough (for non-developers)

1. Python

python --version

If it's missing or below 3.10, install from python.org. On Windows, tick "Add Python to PATH".

2. Download and install

git clone https://github.com/mothivenkatesh/review-scraper.git
cd review-scraper
pip install -r requirements.txt
python -m scrapling install

The scrapling install line downloads a browser engine it uses to solve Cloudflare challenges (both G2 and Clutch sit behind Cloudflare). Takes a couple minutes, one time only.

3. Run a single scrape

G2 product reviews (use the /reviews URL):

python scrape.py --site g2 --url https://www.g2.com/products/stripe-payments/reviews

Clutch agency profile:

python scrape.py --site clutch --url https://clutch.co/profile/accenture

4. Run many at once

Make urls.txt with one URL per line:

https://www.g2.com/products/stripe-payments/reviews
https://www.g2.com/products/paypal/reviews
https://www.g2.com/products/razorpay/reviews

Then run:

python scrape.py --site g2 --file urls.txt

5. Build the spreadsheet

python to_csv.py

Writes review-scrape/all_reviews.csv. Open in Excel or Sheets. One row per review with columns for site, URL, rating, title, body, reviewer details, date, and site-specific extras.
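Once all_reviews.csv exists, slicing it for analysis takes a few lines. A sketch with Python's csv module, assuming the columns named above; a two-row inline stand-in replaces the real file here:

```python
import csv
import io

# Two-row stand-in; in practice: open("review-scrape/all_reviews.csv").
sample = io.StringIO(
    "site,rating,title,cons\n"
    "g2,4.5,Dev-friendly,Slow support for small accounts\n"
    "g2,2.0,Buggy webhooks,Frequent outages\n"
)
rows = list(csv.DictReader(sample))

# Pull the complaints attached to low-rated reviews.
complaints = [r["cons"] for r in rows if float(r["rating"]) < 3]
print(complaints)
```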

Common options

| Flag | Does |
|---|---|
| --max-pages 30 | Go deeper. Default is 20 pages (~200-400 reviews per URL). |
| --dump-html | Save the first page's raw HTML. Useful when selectors break. |
| --out-dir my-research | Write to a custom folder instead of ./review-scrape/ |

Where everything lands

review-scrape/
  data/
    g2_stripe-payments_reviews.json
    g2_paypal_reviews.json
    clutch_accenture.json
  all_reviews.csv

When it breaks: selector drift

G2 and Clutch rename their HTML classes every few months. Symptom: scrape finishes with 0 reviews.

Fix in 2 minutes:

  1. Re-run with --dump-html:
    python scrape.py --site g2 --url <URL> --dump-html
    
  2. Open the saved .html file in your browser, right-click a review, pick Inspect. Find the new container class name (usually div[data-testid="..."] or article.some-class).
  3. Open scrape.py, find SITE_CONFIG near the top, add your new selector to the comma-separated list for that field. Keep the old ones as fallback.
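For orientation, a hypothetical sketch of what a selector entry might look like. The real SITE_CONFIG structure in scrape.py may differ, but the principle holds: append the new selector and keep the old ones as fallbacks:

```python
# Hypothetical shape for illustration; check the real SITE_CONFIG
# near the top of scrape.py before editing.
SITE_CONFIG = {
    "g2": {
        # Comma-separated CSS selectors, tried in order.
        "review_container": (
            'div[itemprop="review"], '          # old class, kept as fallback
            'div[data-testid="review-card"]'    # new class found via Inspect
        ),
    }
}

# The comma-separated string splits into one selector per attempt.
selectors = [s.strip()
             for s in SITE_CONFIG["g2"]["review_container"].split(",")]
print(selectors)
```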

If you fix it, please open a PR. It helps everyone.

FAQ

Is this legal?
Reviews are public data. Reading them is fine. Don't redistribute review text as your own content. Don't resell the scraped data. For commercial use at scale, talk to G2 or Clutch directly.

Why Scrapling instead of plain requests?
Both sites use Cloudflare. Plain requests gets blocked. Scrapling solves the Cloudflare challenge automatically.

Can I use this on Capterra/TrustRadius/Gartner Peer Insights?
Not out of the box, but the pattern extends. Add a new entry in SITE_CONFIG with that site's selectors and you're there. PRs welcome.

How many URLs can I scrape per day?
Keep it under 100. Both sites have rate limits that tighten if you hammer them. The scraper already waits 3-6 seconds between pages and 5-10 between URLs.

Does it handle login-gated reviews?
No. Only public reviews. Clutch shows almost everything publicly; G2 hides a small amount behind login.

Can I run this on a schedule?
Yes. Use cron (Mac/Linux) or Task Scheduler (Windows). There's no --resume flag yet, but adding one would be an easy PR, and PRs are welcome.

For Claude Code users

Drop this folder into ~/.claude/skills/review-scrape/. You get /review-scrape g2 <URL>, /review-scrape clutch <URL>, and /review-scrape csv. See SKILL.md.

Credits

Built on Scrapling by @D4Vinci.

License

MIT. Use it, fork it, ship it.


If this saved you an afternoon, star the repo. It genuinely helps.

Report a bug · Request a feature · Follow me on X
