This document explains the various job scraping APIs and services integrated into the New Grad Jobs aggregator.
Status: Fully Implemented
JobSpy is an open-source Python library that scrapes jobs from popular job sites including LinkedIn, Indeed, and Glassdoor.
Configuration:
    jobspy:
      enabled: true
      sites:
        - "indeed"
        - "linkedin"
        - "glassdoor"
      search_terms:
        - "new grad software engineer"
        - "entry level software engineer"
      location: "United States"
      results_wanted: 50
      hours_old: 72

Features:
- No API key required (free and open source)
- Supports multiple major job sites
- Built-in filtering by location and posting date
- Handles proxy rotation and anti-bot measures
- Greenhouse API: Company-specific job boards
- Lever API: Company-specific job boards
- Google Careers API: Direct Google job searches
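As a rough sketch of how the jobspy configuration block earlier in this document might be handed to JobSpy: the helper names `build_jobspy_kwargs` and `run_jobspy` are hypothetical and are not taken from update_jobs.py. `scrape_jobs` and its `site_name`, `search_term`, `location`, `results_wanted`, and `hours_old` parameters come from the python-jobspy package, which is imported lazily so the config mapping can be read and checked without it installed.

```python
from typing import Any, Dict, List

# Mirrors the jobspy section of config.yml.
JOBSPY_CONFIG: Dict[str, Any] = {
    "enabled": True,
    "sites": ["indeed", "linkedin", "glassdoor"],
    "search_terms": [
        "new grad software engineer",
        "entry level software engineer",
    ],
    "location": "United States",
    "results_wanted": 50,
    "hours_old": 72,
}


def build_jobspy_kwargs(cfg: Dict[str, Any], search_term: str) -> Dict[str, Any]:
    """Translate config keys into keyword arguments for jobspy.scrape_jobs."""
    return {
        "site_name": list(cfg["sites"]),
        "search_term": search_term,
        "location": cfg["location"],
        "results_wanted": cfg["results_wanted"],
        "hours_old": cfg["hours_old"],
    }


def run_jobspy(cfg: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run one scrape per search term and return the rows as dicts."""
    if not cfg.get("enabled", False):
        return []
    # Lazy import (assumes `pip install python-jobspy`).
    from jobspy import scrape_jobs

    rows: List[Dict[str, Any]] = []
    for term in cfg["search_terms"]:
        frame = scrape_jobs(**build_jobspy_kwargs(cfg, term))  # pandas DataFrame
        rows.extend(frame.to_dict("records"))
    return rows
```

Running one scrape per search term keeps each request simple; the combined rows would still need the shared filtering and deduplication described in this document.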
Status: Configuration Ready
SerpApi provides access to Google Jobs search results in a structured format.
Setup:
- Sign up at serpapi.com
- Get your API key
- Set environment variable:
    export SERP_API_KEY="your_api_key"
- Enable in config.yml:
    scraper_apis:
      serp_api:
        enabled: true
        api_key: "${SERP_API_KEY}"
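A minimal sketch of how the `${SERP_API_KEY}` placeholder could be resolved and a Google Jobs request built, assuming the aggregator expands environment variables itself. The function names are hypothetical. The `engine=google_jobs` parameter and the `jobs_results` response key follow SerpApi's Google Jobs API, and the fields kept here are only a small subset of what it returns.

```python
import os
from typing import Any, Dict, List
from urllib.parse import urlencode

SERPAPI_ENDPOINT = "https://serpapi.com/search.json"


def resolve_api_key(raw: str) -> str:
    """Expand a "${VAR}" placeholder from config.yml using the environment."""
    return os.path.expandvars(raw)


def build_serpapi_url(query: str, api_key: str, location: str = "United States") -> str:
    """Build a Google Jobs request URL for SerpApi."""
    params = {
        "engine": "google_jobs",
        "q": query,
        "location": location,
        "api_key": api_key,
    }
    return f"{SERPAPI_ENDPOINT}?{urlencode(params)}"


def parse_serpapi_jobs(payload: Dict[str, Any]) -> List[Dict[str, str]]:
    """Keep a few common fields from each entry under "jobs_results"."""
    return [
        {
            "title": job.get("title", ""),
            "company": job.get("company_name", ""),
            "location": job.get("location", ""),
        }
        for job in payload.get("jobs_results", [])
    ]
```

Note that `os.path.expandvars` leaves the placeholder unchanged when the variable is unset, so a caller may want to check for a leftover `${` before sending a request.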
Status: Configuration Ready
ScraperAPI handles proxy rotation, CAPTCHA solving, and JavaScript rendering for scraping any job site.
Setup:
- Sign up at scraperapi.com
- Get your API key
- Set environment variable:
    export SCRAPER_API_KEY="your_api_key"
- Enable in config.yml:
    scraper_apis:
      scraper_api:
        enabled: true
        api_key: "${SCRAPER_API_KEY}"
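To illustrate how a job page might be routed through ScraperAPI, here is a hedged sketch. `build_scraperapi_url` and `fetch_via_scraperapi` are hypothetical helpers, not functions from update_jobs.py. The `api_key`, `url`, and `render` query parameters follow ScraperAPI's request format, and the timeout value is an arbitrary assumption.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

SCRAPERAPI_ENDPOINT = "https://api.scraperapi.com/"


def build_scraperapi_url(target_url: str, api_key: str, render_js: bool = False) -> str:
    """Wrap a target page URL in a ScraperAPI request URL."""
    params = {"api_key": api_key, "url": target_url}
    if render_js:
        params["render"] = "true"  # ask the service to execute JavaScript
    return f"{SCRAPERAPI_ENDPOINT}?{urlencode(params)}"


def fetch_via_scraperapi(target_url: str, api_key: str, render_js: bool = False) -> str:
    """Fetch the page HTML through the proxy service (performs a network call)."""
    request_url = build_scraperapi_url(target_url, api_key, render_js)
    with urlopen(request_url, timeout=60) as response:  # timeout is an assumption
        return response.read().decode("utf-8", errors="replace")
```

The target URL is percent-encoded by `urlencode`, which matters when the job site's own URL carries query parameters.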
The following enterprise-grade APIs are configured in the system but require API keys and subscriptions:
- Zyte (Scrapinghub): Enterprise job scraping service
- Bright Data: Large-scale proxy network with job scraper API
- ScrapingBee: AI-powered web scraping with CAPTCHA solving
- Comprehensive Coverage: Different APIs cover different job sites and companies
- Redundancy: If one API fails, others continue working
- Rate Limit Management: Distribute requests across multiple services
- Cost Optimization: Use free APIs where possible, paid APIs for specialized needs
To add a new job scraping API:
- Update config.yml with the new API configuration
- Add a new function fetch_[api_name]_jobs() in update_jobs.py
- Add the function call to the main aggregation loop
- Update this documentation
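The steps above can be sketched for a made-up API called "example". Everything here is hypothetical: the `scraper_apis.example` config key, the `FETCHERS` list, the `aggregate` function, and the returned record are placeholders that show the expected shape, not code from update_jobs.py.

```python
from typing import Any, Callable, Dict, List

Job = Dict[str, str]


def fetch_example_jobs(config: Dict[str, Any]) -> List[Job]:
    """Hypothetical fetcher following the fetch_[api_name]_jobs() naming."""
    api_cfg = config.get("scraper_apis", {}).get("example", {})
    if not api_cfg.get("enabled", False):
        return []  # disabled sources contribute nothing
    # A real implementation would call the API here. This stub returns one
    # placeholder record so the output shape is visible.
    return [
        {
            "title": "New Grad Software Engineer",
            "company": "Example Corp",
            "location": "United States",
            "url": "https://example.com/jobs/1",
            "source": "example",
        }
    ]


# Register the function where the aggregation loop can find it.
FETCHERS: List[Callable[[Dict[str, Any]], List[Job]]] = [fetch_example_jobs]


def aggregate(config: Dict[str, Any]) -> List[Job]:
    """Simplified aggregation loop: call each registered fetcher in order."""
    jobs: List[Job] = []
    for fetch in FETCHERS:
        jobs.extend(fetch(config))
    return jobs
```

Having each fetcher read its own `enabled` flag means the loop itself does not need to change when an API is switched on or off in config.yml.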
- JobSpy is enabled by default as it's free and effective
- Paid APIs require environment variables for API keys
- All APIs respect the same filtering criteria (new grad signals, location, recency)
- Results are merged and deduplicated before final output
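As one possible reading of the last two notes, the sketch below filters on a title keyword pattern and deduplicates on a normalized (company, title, location) key. The keyword list and the choice of key are assumptions made for illustration; the aggregator's actual new grad signals and deduplication rule are not specified in this document.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

Job = Dict[str, str]

# Assumed keyword pattern standing in for the real "new grad signals".
NEW_GRAD_PATTERN = re.compile(
    r"\b(new grad(uate)?|entry[- ]level|university grad(uate)?)\b",
    re.IGNORECASE,
)


def has_new_grad_signal(job: Job) -> bool:
    """True when the job title matches the assumed keyword pattern."""
    return bool(NEW_GRAD_PATTERN.search(job.get("title", "")))


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical strings compare equal."""
    return " ".join(text.lower().split())


def dedupe_key(job: Job) -> Tuple[str, str, str]:
    """Key on company, title, and location (an assumed notion of 'same job')."""
    return (
        _normalize(job.get("company", "")),
        _normalize(job.get("title", "")),
        _normalize(job.get("location", "")),
    )


def merge_and_dedupe(batches: Iterable[List[Job]]) -> List[Job]:
    """Merge per-source batches, keeping the first copy of each matching job."""
    seen: Set[Tuple[str, str, str]] = set()
    merged: List[Job] = []
    for batch in batches:
        for job in batch:
            if not has_new_grad_signal(job):
                continue
            key = dedupe_key(job)
            if key in seen:
                continue
            seen.add(key)
            merged.append(job)
    return merged
```

Because the first occurrence wins, the order in which sources are merged decides which copy of a duplicated posting survives.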