# Jobs Scraping

A Django-based platform for scraping and managing job listings from various sources.
## Features

- Multi-source job scraping (cv.ee, LinkedIn, etc.)
- Job listing management and filtering
- Export functionality (CSV, Excel)
- Scraper management interface
- Real-time job status updates
- Email notifications
- Telegram bot integration
## Requirements

- Python 3.8+
- Django 3.2+
- Redis
- Celery
- PostgreSQL (recommended) or SQLite
- Chrome/Chromium (for Selenium scrapers)
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/jobs_scraping.git
  cd jobs_scraping
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # Linux/Mac
  venv\Scripts\activate     # Windows
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables:

  ```bash
  cp .env.example .env
  # Edit .env with your configuration
  ```

- Run migrations:

  ```bash
  python manage.py migrate
  ```

- Create a superuser:

  ```bash
  python manage.py createsuperuser
  ```

- Start the Redis server:

  ```bash
  redis-server
  ```

- Start the Celery worker:

  ```bash
  celery -A config worker -l info
  ```

- Start Celery beat (an example schedule is shown after these steps):

  ```bash
  celery -A config beat -l info
  ```

- Run the development server:

  ```bash
  python manage.py runserver
  ```
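How and when the scrapers run on a schedule is up to your Celery beat configuration. The snippet below is a minimal sketch of such a schedule in `config/settings.py`; the task path `apps.scraping.tasks.run_all_scrapers` is an assumption for illustration, not a name confirmed by this README.

```python
# config/settings.py -- example beat schedule; the task path is hypothetical
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    # Run every configured scraper once a day at 06:00.
    "run-all-scrapers-daily": {
        "task": "apps.scraping.tasks.run_all_scrapers",  # assumed task name
        "schedule": crontab(hour=6, minute=0),
    },
}
```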
## Configuration

Create a `.env` file with the following variables:

```env
DEBUG=True
SECRET_KEY=your-secret-key
DATABASE_URL=postgresql://user:password@localhost:5432/dbname
EMAIL_HOST=smtp.gmail.com
EMAIL_PORT=587
EMAIL_HOST_USER=[email protected]
EMAIL_HOST_PASSWORD=your-app-password
TELEGRAM_BOT_TOKEN=your-bot-token
```
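A common way to load these variables into Django is `django-environ`; the sketch below assumes that library and a standard project layout, and the actual `config/settings.py` may read them differently.

```python
# config/settings.py -- minimal sketch, assuming django-environ
from pathlib import Path

import environ

BASE_DIR = Path(__file__).resolve().parent.parent

env = environ.Env(DEBUG=(bool, False))
environ.Env.read_env(BASE_DIR / ".env")  # load the .env file created above

DEBUG = env("DEBUG")
SECRET_KEY = env("SECRET_KEY")
DATABASES = {"default": env.db("DATABASE_URL")}  # parses the postgresql:// URL
EMAIL_HOST = env("EMAIL_HOST")
EMAIL_PORT = env.int("EMAIL_PORT")
EMAIL_HOST_USER = env("EMAIL_HOST_USER")
EMAIL_HOST_PASSWORD = env("EMAIL_HOST_PASSWORD")
TELEGRAM_BOT_TOKEN = env("TELEGRAM_BOT_TOKEN")
```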
Scraper settings can be configured in `config/settings.py`:

```python
SCRAPER_CONFIG = {
    'cv_ee': {
        'base_url': 'https://www.cv.ee',
        'search_url': 'https://www.cv.ee/toopakkumised',
        'max_pages': 10,
    },
    'linkedin': {
        'base_url': 'https://www.linkedin.com',
        'search_url': 'https://www.linkedin.com/jobs/search',
        'max_pages': 5,
    },
}
```
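Individual scrapers can then read their block of this dict through Django's settings object. A short sketch (the fallback value is illustrative):

```python
from django.conf import settings

config = settings.SCRAPER_CONFIG["cv_ee"]
search_url = config["search_url"]       # "https://www.cv.ee/toopakkumised"
max_pages = config.get("max_pages", 5)  # default if a source omits the key
```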
## Usage

### Running scrapers

- Access the scraper management interface at `/scrapers/`
- Click "Run Now" to start a specific scraper
- Use "Run All Scrapers" to start all configured scrapers

### Browsing jobs

- Access the job list at `/`
- Use the filters to find specific jobs
- Click on a job to view its details

### Exporting jobs

- Access the export options at `/export/csv/` or `/export/excel/` (a sketch of the CSV view follows this list)
- Download the file in your preferred format
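Under the hood, a CSV endpoint like this is typically a small Django view that writes the queryset into the response. The sketch below assumes a `Job` model in `apps.scraping.models` with `title`, `company`, and `url` fields; all of those names are illustrative, not confirmed by this README.

```python
import csv

from django.http import HttpResponse

from apps.scraping.models import Job  # assumed model location and name


def export_csv(request):
    """Stream all stored jobs as a CSV attachment."""
    response = HttpResponse(content_type="text/csv")
    response["Content-Disposition"] = 'attachment; filename="jobs.csv"'

    writer = csv.writer(response)
    writer.writerow(["Title", "Company", "URL"])
    for job in Job.objects.all():
        writer.writerow([job.title, job.company, job.url])
    return response
```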
## Project Structure

```
jobs_scraping/
├── apps/
│   ├── accounts/
│   └── scraping/
│       ├── management/
│       ├── scrapers/
│       ├── templates/
│       └── tests/
├── config/
├── logs/
├── media/
├── static/
└── templates/
```
## Adding a New Scraper

- Create a new scraper class in `apps/scraping/scrapers/`
- Implement the required methods: `__init__`, `run`, and `parse_job`
- Add the scraper to `SCRAPER_CONFIG` in settings
- Register the scraper in `apps/scraping/tasks.py` (a sketch of these steps is shown below)
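As an illustration of those steps, here is a minimal sketch of a new scraper. The project's real base class, selectors, and field names are not shown in this README, so everything below (the `example_board` config key, the CSS selectors, the returned fields) is an assumption.

```python
# apps/scraping/scrapers/example_board.py -- illustrative sketch only
import requests
from bs4 import BeautifulSoup
from django.conf import settings


class ExampleBoardScraper:
    """Scrapes job cards from a hypothetical 'example_board' source."""

    def __init__(self):
        # Pull this scraper's block from SCRAPER_CONFIG (see Configuration).
        self.config = settings.SCRAPER_CONFIG["example_board"]

    def run(self):
        """Fetch each listing page and parse every job card on it."""
        jobs = []
        for page in range(1, self.config["max_pages"] + 1):
            html = requests.get(self.config["search_url"], params={"page": page}).text
            soup = BeautifulSoup(html, "html.parser")
            for card in soup.select(".job-card"):  # selector is illustrative
                jobs.append(self.parse_job(card))
        return jobs

    def parse_job(self, card):
        """Extract the fields the platform stores for one listing."""
        return {
            "title": card.select_one(".title").get_text(strip=True),
            "company": card.select_one(".company").get_text(strip=True),
            "url": card.select_one("a")["href"],
        }
```

After adding an `example_board` entry to `SCRAPER_CONFIG`, the class would be wired into a Celery task in `apps/scraping/tasks.py` so the management interface can trigger it.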
## Contributing

- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.