Thank you for your interest in contributing! This document provides guidelines and instructions for contributing to this project.
- 🐛 Report bugs - Found an issue? Let us know!
- 💡 Suggest features - Have an idea? We'd love to hear it!
- 📝 Improve documentation - Help others understand the project better
- 🔧 Submit code - Fix bugs or implement new features
- 🧪 Write tests - Help us maintain quality
- 📣 Spread the word - Star the repo, share with others
- Node.js 18+ installed
- npm or yarn
- Git
- Apify account (for testing on the platform)
- Apollo.io account (for testing scraping functionality)
- Fork the repository
  # Click "Fork" on GitHub, then clone your fork
  git clone https://github.com/YOUR_USERNAME/apollo-data-scraper.git
  cd apollo-data-scraper
- Install dependencies
  npm install
- Create a test input file
  # Edit .actor/input.json with your test data
  {
    "url": "https://app.apollo.io/#/people?page=1",
    "numberOfPages": 2,
    "timeBetweenPages": 5
  }
- Run locally
  # Test the scraper
  node test-local.js
  # Or run the actor
  npm start
git checkout -b feature/your-feature-name
# or
git checkout -b fix/bug-description
Branch naming conventions:
- feature/ - New features
- fix/ - Bug fixes
- docs/ - Documentation changes
- refactor/ - Code refactoring
- test/ - Test additions/changes
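If you want to check a branch name before pushing, the prefixes above can be expressed as a small check. This is only a sketch: `isValidBranchName` is a hypothetical helper, not part of this repository, and the lowercase kebab-case rule for the description is an assumption based on the two examples shown.

```javascript
// Hypothetical helper, not part of the project: checks a branch name
// against the prefixes listed in the naming conventions.
const BRANCH_PREFIXES = ['feature', 'fix', 'docs', 'refactor', 'test'];

function isValidBranchName(name) {
  // Assumption: "<prefix>/<lowercase-kebab-case-description>".
  const pattern = new RegExp(
    `^(${BRANCH_PREFIXES.join('|')})/[a-z0-9]+(-[a-z0-9]+)*$`
  );
  return pattern.test(name);
}

console.log(isValidBranchName('feature/your-feature-name')); // true
console.log(isValidBranchName('fix/bug-description'));       // true
console.log(isValidBranchName('my-branch'));                 // false
```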
- Write clean, readable code
- Follow the existing code style
- Add comments for complex logic
- Update documentation if needed
# Run local test
node test-local.js --url "YOUR_TEST_URL" --pages 2
# Test different scenarios
node test-local.js --pages 1 --delay 3
node test-local.js --pages 5 --delay 10
git add .
git commit -m "Description of your changes"
Commit message format:
type: Brief description
Longer explanation if needed
- Bullet points for details
- More details
Fixes #issue_number (if applicable)
Types: feat, fix, docs, style, refactor, test, chore
Examples:
feat: Add email validation before saving contacts
fix: Handle timeout errors gracefully
- Increased default timeout to 60s
- Added retry logic for failed pages
Fixes #123
docs: Update README with new examples
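To make the format concrete, here is a minimal sketch that checks the first line of a commit message against the listed types. `parseCommitHeader` is a hypothetical helper written for illustration; the project does not necessarily enforce the format this way (or at all).

```javascript
// Types taken from the list above.
const COMMIT_TYPES = ['feat', 'fix', 'docs', 'style', 'refactor', 'test', 'chore'];

// Hypothetical helper: returns { type, description } for a well-formed
// header line, or null when the header does not follow "type: description".
function parseCommitHeader(message) {
  const header = message.split('\n')[0];
  const match = /^([a-z]+): (.+)$/.exec(header);
  if (!match || !COMMIT_TYPES.includes(match[1])) {
    return null;
  }
  return { type: match[1], description: match[2] };
}

const parsed = parseCommitHeader(
  'fix: Handle timeout errors gracefully\n\n- Increased default timeout to 60s\n\nFixes #123'
);
console.log(parsed.type);        // fix
console.log(parsed.description); // Handle timeout errors gracefully
console.log(parseCommitHeader('Updated stuff')); // null
```

Only the header line is inspected; the longer explanation and the `Fixes #...` footer are free-form.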
git push origin feature/your-feature-name
Then go to GitHub and create a Pull Request.
// ✅ Good
async function extractTableData(page) {
  const data = await page.evaluate(() => {
    // Implementation
  });
  return data;
}
// ❌ Avoid
async function extractTableData(page)
{
  const data=await page.evaluate(()=>{
    // Implementation
  })
  return data
}
- Use async/await instead of callbacks
- Handle errors with try/catch
- Log important steps for debugging
- Validate inputs before processing
- Comment complex logic
- Keep functions small and focused
- Use meaningful variable names
async function scrapePage(page, url, pageNumber) {
  try {
    console.log(`Scraping page ${pageNumber}: ${url}`);
    // Navigate with timeout
    await page.goto(url, {
      waitUntil: 'networkidle',
      timeout: 60000
    });
    // Wait for content
    await page.waitForSelector('table', { timeout: 30000 });
    // Extract data
    const data = await extractTableData(page);
    console.log(`Extracted ${data.length} contacts`);
    return data;
  } catch (error) {
    console.error(`Error scraping page ${pageNumber}:`, error.message);
    throw error;
  }
}
Before submitting a PR, test:
- Scraping with 1 page works
- Scraping with multiple pages works
- Different time delays work
- Error handling works (invalid URL, timeout, etc.)
- Data is correctly formatted
- Phone numbers are properly formatted
- Empty fields are handled correctly
- Special characters are removed
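The last three checks are easier to verify with small, pure helpers that can be exercised without opening a browser. The sketch below is illustrative only: `cleanField` and `normalizePhone` are hypothetical names, and the target formats (blank values become `null`, phone numbers reduced to digits with an optional leading `+`) are assumptions. The real rules are whatever the scraper source implements.

```javascript
// Hypothetical cleanup helpers; the formats used here are assumptions.

// Replace control characters, collapse whitespace, treat blanks as null.
function cleanField(value) {
  if (value === null || value === undefined) return null;
  const cleaned = String(value)
    .replace(/[\u0000-\u001F\u007F]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
  return cleaned === '' ? null : cleaned;
}

// Assumed target format: optional leading "+" followed by digits only.
function normalizePhone(value) {
  const cleaned = cleanField(value);
  if (cleaned === null) return null;
  const digits = cleaned.replace(/\D/g, '');
  if (digits === '') return null;
  return cleaned.startsWith('+') ? `+${digits}` : digits;
}

console.log(normalizePhone(' +1 (555) 010-9999 ')); // +15550109999
console.log(cleanField('  Jane\tDoe\n'));            // Jane Doe
console.log(cleanField('   '));                      // null
```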
- Valid input
  {"url": "https://app.apollo.io/#/people?page=1", "numberOfPages": 2}
- Invalid URL
  {"url": "https://google.com", "numberOfPages": 1}
- Large dataset
  {"url": "https://app.apollo.io/#/people?page=1", "numberOfPages": 50}
- Edge cases
  - Empty table
  - Slow loading page
  - Network interruption
  - Authentication required
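The valid and invalid inputs above can be told apart before any page is opened. The following is a rough sketch, assuming that only `app.apollo.io` URLs are acceptable and that `numberOfPages` must be a positive integer; `validateInput` is a hypothetical function, not the actor's actual validation.

```javascript
// Hypothetical validator for the input shape used in the test cases.
// Returns a list of problems; an empty list means the input looks usable.
function validateInput(input) {
  const errors = [];

  let host = null;
  try {
    host = new URL(input.url).hostname;
  } catch {
    errors.push('url is not a valid URL');
  }
  // Assumption: only Apollo app URLs are accepted.
  if (host !== null && host !== 'app.apollo.io') {
    errors.push(`url must point to app.apollo.io, got ${host}`);
  }

  if (!Number.isInteger(input.numberOfPages) || input.numberOfPages < 1) {
    errors.push('numberOfPages must be a positive integer');
  }

  return errors;
}

const valid = { url: 'https://app.apollo.io/#/people?page=1', numberOfPages: 2 };
console.log(validateInput(valid).length); // 0

const invalid = { url: 'https://google.com', numberOfPages: 1 };
console.log(validateInput(invalid)[0]);
// url must point to app.apollo.io, got google.com
```

Returning a list of messages rather than throwing on the first problem lets a test report every issue with an input at once.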
When adding features, update:
- README.md - Main documentation
- USAGE.md - Usage examples
- DEPLOYMENT.md - Deployment instructions (if applicable)
- CHANGELOG.md - Add your changes
- Code comments - Explain complex logic
Good bug reports include:
- Title - Clear, descriptive summary
- Description - What happened vs what you expected
- Steps to reproduce - How to recreate the issue
- Environment - OS, Node version, Apify platform, etc.
- Screenshots/logs - If applicable
- Input configuration - What input caused the issue
## Bug Description
A clear description of what the bug is.
## To Reproduce
Steps to reproduce the behavior:
1. Use this input: `{"url": "...", "numberOfPages": 5}`
2. Run the actor
3. See error
## Expected Behavior
What you expected to happen.
## Actual Behavior
What actually happened.
## Environment
- Node.js version: 18.x
- Playwright version: 1.40.0
- Apify platform: Yes/No
- OS: Windows 10 / macOS / Linux
## Logs
Paste relevant logs here
## Screenshots
If applicable, add screenshots.
## Additional Context
Any other context about the problem.
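To fill in the Environment section of the template without guessing, a few lines of Node.js can print the relevant values. This is a small sketch using only built-in modules; `describeEnvironment` is a hypothetical name, and the Playwright version is left out because it depends on what is installed in your checkout (running `npm ls playwright` in the project folder should show it, assuming Playwright is a dependency).

```javascript
const os = require('node:os');

// Builds the Node.js and OS lines of the bug report template
// from the machine the script runs on.
function describeEnvironment() {
  return [
    `- Node.js version: ${process.version}`,
    `- OS: ${os.type()} ${os.release()} (${os.arch()})`,
  ].join('\n');
}

console.log(describeEnvironment());
```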
When suggesting features:
- Problem - Describe the problem you're trying to solve
- Solution - Propose a solution
- Alternatives - Any alternative solutions considered
- Use cases - Real-world examples
## Problem
Describe the problem this feature would solve.
## Proposed Solution
How should this feature work?
## Alternatives Considered
What other solutions did you consider?
## Use Cases
- Use case 1
- Use case 2
## Additional Context
Any mockups, examples, or references.
- Create an issue first to discuss the feature
- Wait for approval before starting work
- Create a branch from main
- Implement the feature following coding standards
- Test thoroughly with various inputs
- Update documentation
- Submit a PR with clear description
// 1. Add helper function
function isValidEmail(email) {
  const regex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return regex.test(email);
}
// 2. Use in extraction logic
if (text && text !== 'No email' && text !== 'NA') {
  if (isValidEmail(text)) {
    rowData.email = text;
  } else {
    console.warn(`Invalid email format: ${text}`);
  }
}
// 3. Update INPUT_SCHEMA.json if needed
{
  "validateEmails": {
    "title": "Validate Emails",
    "type": "boolean",
    "description": "Only save contacts with valid email addresses",
    "default": false
  }
}
// 4. Update README.md with new feature
// 5. Add to CHANGELOG.md
- Respond to feedback promptly
- Be open to suggestions
- Make requested changes
- Keep the PR focused on one feature/fix
- Be respectful and constructive
- Explain why changes are needed
- Approve when ready
- Test the changes if possible
- Update version in package.json
- Update CHANGELOG.md with changes
- Create a git tag: git tag v1.1.0
- Push tag: git push --tags
- Create GitHub release
- Deploy to Apify
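The version and tag steps have to agree with each other, so it can help to derive both from one place. The sketch below assumes the project uses plain MAJOR.MINOR.PATCH versions; `nextVersion` is a hypothetical helper and `1.0.3` is a made-up starting version used only for illustration.

```javascript
// Hypothetical helper: computes the next version string for a release.
// Assumption: versions are plain MAJOR.MINOR.PATCH with no pre-release suffix.
function nextVersion(current, releaseType) {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(current);
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${current}`);
  }
  const [major, minor, patch] = match.slice(1).map(Number);
  switch (releaseType) {
    case 'major': return `${major + 1}.0.0`;
    case 'minor': return `${major}.${minor + 1}.0`;
    case 'patch': return `${major}.${minor}.${patch + 1}`;
    default: throw new Error(`Unknown release type: ${releaseType}`);
  }
}

const version = nextVersion('1.0.3', 'minor');
console.log(version);       // 1.1.0
console.log(`v${version}`); // v1.1.0
```

In a git checkout, `npm version minor` performs a similar bump on package.json and creates the matching `v`-prefixed tag in one step, which may be simpler than doing it by hand.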
Contributors will be:
- Added to CONTRIBUTORS.md
- Mentioned in release notes
- Thanked in the community
- 💬 Open a GitHub Discussion
- 🐛 Create an Issue
- 📧 Email: your-email@example.com
We pledge to make participation in our project a harassment-free experience for everyone.
Positive behavior:
- Using welcoming language
- Being respectful
- Accepting constructive criticism
- Focusing on what's best for the community
Unacceptable behavior:
- Harassment or discriminatory language
- Trolling or insulting comments
- Publishing others' private information
- Other unprofessional conduct
By contributing, you agree that your contributions will be licensed under the MIT License.
Thank you for contributing! 🎉
Your help makes this project better for everyone.