This comprehensive guide covers every step to get your Apollo.io scraper up and running.
- Prerequisites
- Deployment to Apify
- Local Development Setup
- Configuration
- Testing
- Scheduling Automated Runs
- Troubleshooting
✅ Apify Account (Free)
- Go to apify.com
- Click "Sign up for free"
- Confirm your email
- You get $5/month free credit
✅ Apollo.io Account (Required for scraping)
- Sign up at apollo.io
- Log in before using the scraper
- Make sure you have access to lists
✅ GitHub Account (For deployment)
- Sign up at github.com
- Free account works perfectly
- Git installed locally
- Node.js 18+ (for local testing)
- VS Code or code editor (for customization)
- Go to this repository on GitHub
- Click "Fork" button (top right)
- This creates a copy in your GitHub account
- Log into Apify Console
- Click "Actors" in left sidebar
- Click "Create new" button
- Select "From Git repository"
Fill in the form:
Git URL: https://github.com/YOUR_USERNAME/apollo-data-scraper
Name: apollo-data-scraper
Title: Apollo.io Data Scraper
Build tag: latest
Click "Create"
- Click "Build" button
- Wait 2-3 minutes for Docker image to build
- Look for green checkmark ✅
- Click "Start" button
- Enter this input:
```json
{
  "url": "https://app.apollo.io/#/people?page=1",
  "numberOfPages": 2,
  "timeBetweenPages": 5
}
```
- Click "Start"
- Wait for completion (~30-60 seconds)
- Click "Dataset" tab
- Click "Export"
- Choose "CSV" format
- Download your contacts! 🎉
```bash
npm install -g apify-cli
```
```bash
git clone https://github.com/YOUR_USERNAME/apollo-data-scraper.git
cd apollo-data-scraper
```
```bash
apify login
```
This opens a browser for authentication.
```bash
apify push
```
This will:
- Create actor on Apify
- Upload your code
- Build Docker image
- Make it ready to run
```bash
apify call --input '{
  "url": "https://app.apollo.io/#/people?page=1",
  "numberOfPages": 5,
  "timeBetweenPages": 5
}'
```
```bash
# Clone the repository
git clone https://github.com/YOUR_USERNAME/apollo-data-scraper.git
cd apollo-data-scraper
# Install npm packages
npm install
```
Edit `.actor/input.json`:
```json
{
  "url": "https://app.apollo.io/#/people?page=1",
  "numberOfPages": 2,
  "timeBetweenPages": 5
}
```
```bash
# Quick test (1 page, visible browser)
npm run test:quick
# Full test (5 pages)
npm run test:full
# Custom test
node test-local.js --url "YOUR_URL" --pages 10 --delay 5
# Run as Apify actor (production mode)
npm start
```
```json
"url": "https://app.apollo.io/#/people?finderViewId=12345&page=1"
```
- Must start with `https://app.apollo.io/`
- Should be a list or search URL
- Get it from your Apollo.io browser address bar
```json
"numberOfPages": 10
```
- Minimum: 1
- Maximum: 100
- Each page has ~25 contacts
- 10 pages = ~250 contacts
```json
"timeBetweenPages": 5
```
- Minimum: 2 seconds
- Maximum: 30 seconds
- Recommended: 5-6 seconds
- Higher = safer, lower = faster
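As a rough planning aid, the page and delay settings can be turned into an estimate before starting a run. The sketch below uses the "~25 contacts per page" figure from this guide; `estimateRun` is a hypothetical helper, and it assumes the delay is applied once between each pair of consecutive pages, which may not match the scraper exactly. Page load time is not included, so treat the wait as a lower bound.

```javascript
// Approximate figure taken from this guide, not a guarantee.
const CONTACTS_PER_PAGE = 25;

// Hypothetical helper: estimates contact count and the minimum time spent waiting.
function estimateRun(numberOfPages, timeBetweenPages) {
  const approxContacts = numberOfPages * CONTACTS_PER_PAGE;
  // Assumption: one pause between consecutive pages, so (pages - 1) pauses in total.
  const minWaitSeconds = Math.max(numberOfPages - 1, 0) * timeBetweenPages;
  return { approxContacts, minWaitSeconds };
}

console.log(estimateRun(10, 5)); // { approxContacts: 250, minWaitSeconds: 45 }
```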
```json
"proxyConfiguration": {
  "useApifyProxy": true
}
```
- Recommended for reliability
- Included in free tier
- Prevents IP blocking
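The limits listed for `url`, `numberOfPages`, and `timeBetweenPages` can be checked locally before a run is started. This is a minimal sketch, not the actor's own validation: `validateInput` is a hypothetical helper, and requiring whole numbers for both numeric fields is an assumption.

```javascript
// Limits copied from the parameter descriptions in this guide.
const LIMITS = {
  numberOfPages: { min: 1, max: 100 },
  timeBetweenPages: { min: 2, max: 30 },
};
const REQUIRED_PREFIX = 'https://app.apollo.io/';

// Hypothetical helper: returns a list of problems, empty when the input looks fine.
function validateInput(input) {
  const errors = [];
  if (typeof input.url !== 'string' || !input.url.startsWith(REQUIRED_PREFIX)) {
    errors.push(`url must start with ${REQUIRED_PREFIX}`);
  }
  for (const [field, { min, max }] of Object.entries(LIMITS)) {
    const value = input[field];
    // Assumption: both numeric settings are whole numbers.
    if (!Number.isInteger(value) || value < min || value > max) {
      errors.push(`${field} must be an integer between ${min} and ${max}`);
    }
  }
  return errors;
}

const goodInput = { url: 'https://app.apollo.io/#/people?page=1', numberOfPages: 20, timeBetweenPages: 6 };
const badInput = { url: 'https://example.com/', numberOfPages: 0, timeBetweenPages: 60 };
console.log(validateInput(goodInput)); // no problems reported
console.log(validateInput(badInput)); // one message per failing field
```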
```json
{
  "url": "https://app.apollo.io/#/people?page=1",
  "numberOfPages": 20,
  "timeBetweenPages": 6,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```
```bash
npm run test:quick
```
Purpose: Verify URL works and data format is correct
Time: ~10 seconds
Output: JSON file with ~25 contacts
```bash
npm run test:full
```
Purpose: Test pagination and delays
Time: ~30-60 seconds
Output: JSON file with ~50-125 contacts
```bash
node test-local.js --pages 10 --delay 5
```
Purpose: Production-like test
Time: ~1-2 minutes
Output: JSON file with ~250 contacts
After each test, verify:
- All contacts have `fullName`
- Phone numbers are formatted correctly
- Emails look valid (if present)
- No weird characters in names
- Company names are clean
- Job titles are readable
- No duplicate contacts
- JSON file saved successfully
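Part of this checklist can be automated once the test output is loaded as an array of contact objects. The sketch below covers three of the checks (missing `fullName`, malformed emails, duplicates). `checkContacts` is a hypothetical helper, and the `email` field name is an assumption about the output format; only `fullName` is named in this guide.

```javascript
// Hypothetical helper: returns human-readable problems found in the contact list.
function checkContacts(contacts) {
  const problems = [];
  const seen = new Set();
  contacts.forEach((contact, index) => {
    const name = typeof contact.fullName === 'string' ? contact.fullName.trim() : '';
    if (name === '') problems.push(`#${index}: missing fullName`);
    // Loose shape check only; it does not prove the address exists.
    // Assumption: the email lives in a field called `email`.
    const email = String(contact.email || '');
    if (email !== '' && !/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
      problems.push(`#${index}: suspicious email "${email}"`);
    }
    // Treat two records with the same name and email as duplicates.
    const key = `${name.toLowerCase()}|${email.toLowerCase()}`;
    if (seen.has(key)) problems.push(`#${index}: duplicate of an earlier contact`);
    seen.add(key);
  });
  return problems;
}

const sample = [
  { fullName: 'Ada Lovelace', email: 'ada@example.com' },
  { fullName: 'Ada Lovelace', email: 'ada@example.com' },
  { fullName: '', email: 'not-an-email' },
];
console.log(checkContacts(sample));
```

Phone formatting and "clean" company names are harder to judge mechanically, so those checks are left to a manual look at the file.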
- Go to your Actor in Apify Console
- Click "Schedules" tab
- Click "Create new"
Name: Daily Apollo Scrape
Cron expression: 0 9 * * * (9 AM daily)
Timezone: Your timezone
Input:
{
"url": "https://app.apollo.io/#/people?page=1",
"numberOfPages": 20,
"timeBetweenPages": 5
}
- Toggle "Enabled" to ON
- Click "Save"
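The cron expression in the schedule uses the standard five-field order: minute, hour, day of month, month, day of week. As a quick sanity check before saving, the sketch below labels each field; `splitCron` is a hypothetical helper that only splits the expression and does not validate ranges or compute run times.

```javascript
// Standard five-field cron order.
const CRON_FIELDS = ['minute', 'hour', 'dayOfMonth', 'month', 'dayOfWeek'];

// Hypothetical helper: maps each cron field to its name.
function splitCron(expression) {
  const parts = expression.trim().split(/\s+/);
  if (parts.length !== CRON_FIELDS.length) {
    throw new Error(`Expected 5 fields, got ${parts.length}`);
  }
  return Object.fromEntries(CRON_FIELDS.map((name, i) => [name, parts[i]]));
}

console.log(splitCron('0 9 * * *'));
// { minute: '0', hour: '9', dayOfMonth: '*', month: '*', dayOfWeek: '*' }
```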
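```
# Every day at 9 AM
0 9 * * *
# Every Monday at 10 AM
0 10 * * 1
# Every hour
0 * * * *
# Every 6 hours
0 */6 * * *
# First day of month at midnight
0 0 1 * *
```
```javascript
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({ token: 'YOUR_TOKEN' });

// In a CommonJS script, await is only valid inside an async function.
async function createSchedule() {
  return client.schedules().create({
    name: 'Daily Apollo Scrape',
    isEnabled: true,
    cronExpression: '0 9 * * *',
    timezone: 'America/New_York',
    actions: [{
      type: 'RUN_ACTOR',
      actorId: 'YOUR_ACTOR_ID',
      // The Apify API takes the Actor input as a serialized body under runInput.
      runInput: {
        contentType: 'application/json; charset=utf-8',
        body: JSON.stringify({
          url: 'https://app.apollo.io/#/people?page=1',
          numberOfPages: 20,
          timeBetweenPages: 5
        })
      }
    }]
  });
}

createSchedule().catch(console.error);
```
Symptoms: Actor completes but returns 0 contacts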
Causes:
- Not logged into Apollo.io
- Wrong URL format
- Apollo.io page structure changed
Solutions:
- Verify the URL works in your browser
- Make sure you're logged into Apollo.io
- Check URL starts with `https://app.apollo.io/`
- Try a different list URL
Symptoms: Build error in Apify Console
Causes:
- Missing files in repository
- Syntax errors in code
- Wrong Node.js version
Solutions:
- Check all files are committed to Git
- Verify `package.json` exists
- Make sure `Dockerfile` is present
- Check Apify build logs for details
```bash
# Verify files locally
git status
git add .
git commit -m "Fix missing files"
git push
```
Symptoms: "Navigation timeout" or "Waiting for selector timeout"
Causes:
- Slow network connection
- Apollo.io is slow to respond
- Page requires more time to load
Solutions:
- Increase timeout in main.js:
```javascript
await page.goto(pageUrl, {
  waitUntil: 'networkidle',
  timeout: 120000 // Changed from 60000 to 120000
});
```
- Increase selector timeout:
```javascript
await page.waitForSelector('table', {
  timeout: 60000 // Changed from 30000
});
```
- Add more wait time:
```javascript
await page.waitForTimeout(5000); // Wait 5 seconds
```
Symptoms: Run completes successfully but dataset is empty
Causes:
- Authentication required
- Credits exhausted on Apollo.io
- Page structure different than expected
Solutions:
- Check Apollo.io account status
- Verify you can see contacts manually
- Try a different list
- Check if you have credits available
- Increase delays:
"timeBetweenPages": 10
Symptoms: Contacts have names but no emails/phones
This is NORMAL!
Reasons:
- Apollo.io requires credits to reveal contact info
- Some contacts don't have public emails/phones
- Privacy settings prevent data sharing
What to do:
- Check your Apollo.io credits
- Upgrade your Apollo.io plan
- Accept that some data may be incomplete
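To see how much contact data is actually missing, it can help to count filled and empty fields across a scraped batch. The sketch below assumes contacts are plain objects with `email` and `phone` fields; both field names, and the `summarizeCoverage` helper itself, are illustrative assumptions, not the scraper's documented output.

```javascript
// Hypothetical helper: counts how many contacts have each field filled in.
function summarizeCoverage(contacts, fields = ['email', 'phone']) {
  const summary = {};
  for (const field of fields) {
    // A value counts as filled when it is present and not just whitespace.
    const filled = contacts.filter(
      (c) => c[field] != null && String(c[field]).trim() !== ''
    ).length;
    summary[field] = { filled, missing: contacts.length - filled };
  }
  return summary;
}

const scraped = [
  { fullName: 'Ada Lovelace', email: 'ada@example.com', phone: '+1 555 0100' },
  { fullName: 'Alan Turing', email: 'alan@example.com', phone: '' },
  { fullName: 'Grace Hopper' },
];
console.log(summarizeCoverage(scraped));
// { email: { filled: 2, missing: 1 }, phone: { filled: 1, missing: 2 } }
```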
Symptoms:
- Requests fail after a few pages
- "Too many requests" error
- Slow response times
Solutions:
- Increase delay between pages:
```json
{
  "timeBetweenPages": 10
}
```
- Enable Apify proxy:
```json
{
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```
- Run during off-peak hours
- Split into smaller runs
Symptoms: "JavaScript heap out of memory"
Solutions:
- Reduce pages per run (for example, 50 instead of 100):
```json
{
  "numberOfPages": 50
}
```
- Increase memory in Apify (Console → Run options):
Memory: 512 MB → 1024 MB
- Process in batches:
- Run 1: Pages 1-50
- Run 2: Pages 51-100
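The batch split above can be generated instead of written by hand. This sketch builds one input object per run. It assumes the scraper starts at whatever page number appears in the URL's `page` parameter, which this guide does not confirm; `buildBatches` is a hypothetical helper.

```javascript
// Hypothetical helper: splits a large job into several smaller run inputs.
function buildBatches(baseUrl, totalPages, pagesPerRun, timeBetweenPages = 5) {
  const batches = [];
  for (let start = 1; start <= totalPages; start += pagesPerRun) {
    const count = Math.min(pagesPerRun, totalPages - start + 1);
    batches.push({
      // Assumption: the start page is controlled by the page=N part of the URL.
      url: baseUrl.replace(/([?&]page=)\d+/, (_, prefix) => `${prefix}${start}`),
      numberOfPages: count,
      timeBetweenPages,
    });
  }
  return batches;
}

const batches = buildBatches('https://app.apollo.io/#/people?page=1', 100, 50);
console.log(batches.map((b) => `${b.url} (${b.numberOfPages} pages)`));
```

Each object can then be used as the input for a separate run, for example from the Console or with `apify call`.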
If you're still stuck:
- Check documentation:
  - README.md
  - USAGE.md
  - PROJECT_SUMMARY.md
- Search GitHub Issues:
  - Someone may have had the same problem
  - Check closed issues too
- Create an Issue:
  - Include error logs
  - Share your input configuration
  - Describe what you've tried
- Apify Support:
Once setup is complete:
- ✅ Schedule regular runs - Automate your data collection
- ✅ Set up webhooks - Get notified when scraping completes
- ✅ Integrate with tools - Connect to Zapier, Make.com, or your CRM
- ✅ Customize the code - Add your own features
- ✅ Share feedback - Help us improve!
You're all set! Happy scraping! 🚀
Need quick help? Check QUICK_START.md for the 5-minute version.