SciScrape is a work-in-progress Python tool designed to scrape scientific job postings from various institutional job boards and deliver a daily digest directly to your inbox.
Finding research positions, especially postdoc jobs relevant to your field, is a tedious process. I spent many hours cycling through the job boards on several university and institute websites during my own job search. As a fun side-project, I decided to create a tool to automate some of this, while also building familiarity with web scraping and deepening my Python skills.
"SciScrape" was made to help with:
- Fetching job listings from configured institutional websites (e.g., Humboldt University, Max Planck Institute).
- Filtering and parsing the job data.
- Generating a clean, HTML-formatted email digest.
- Sending the digest to your email address.
- Automated Scraping: currently supports parsing for the Humboldt University (HU) and Max Planck Institute (MPI) job boards. (At least, it did in late 2025. The HU job board has since received a significant overhaul, so no guarantees there!)
- Email Notifications: Sends a daily summary of new job postings using SMTP.
- Configurable: Easily add new sites or change user details via `config.json`.
- HTML Templates: Uses Jinja2 to render beautiful email reports.
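To give a feel for the scraping side, here is a minimal sketch of a site parser built on `requests`-style HTML and BeautifulSoup. The HTML structure (a hypothetical `<li class="job">` listing with a title link) and the function name `parse_jobs` are illustrative assumptions, not the actual logic in `parser_hu.py` or `parser_mpi.py`:

```python
# Minimal parser sketch. The selector "li.job a" assumes a hypothetical
# job-board layout; real boards (HU, MPI) each need their own selectors.
from bs4 import BeautifulSoup

def parse_jobs(html: str) -> list[dict]:
    """Extract (title, url) pairs from a listing page's HTML."""
    # "html.parser" is the stdlib backend; "lxml" works too if installed
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for link in soup.select("li.job a"):
        jobs.append({
            "title": link.get_text(strip=True),
            "url": link.get("href"),
        })
    return jobs
```

In practice, each institution gets its own parser module with selectors tuned to that site's markup, which is why a site redesign (like HU's) can silently break things.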
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd SciScrape
  ```
- Install dependencies: Ensure you have Python installed. You will need the following packages:

  - `requests`
  - `beautifulsoup4`
  - `python-dotenv`
  - `jinja2`
  - `lxml` (optional, but recommended for BeautifulSoup)

  You can install them via pip:

  ```bash
  pip install requests beautifulsoup4 python-dotenv jinja2 lxml
  ```
- Environment Variables: Create a `.env` file in the root directory to store your email credentials:

  ```
  SENDER_EMAIL=your_email@gmail.com
  APP_PASSWORD=your_google_app_password
  ```
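As a rough sketch of how these credentials get picked up: `python-dotenv`'s `load_dotenv()` copies the `.env` entries into the process environment, after which they can be read with the stdlib `os` module. The helper name `get_credentials` is an assumption for illustration, not the project's actual API:

```python
# Hedged sketch: assumes load_dotenv() has already been called (or the
# variables are exported in the shell). Fails fast if either is missing.
import os

def get_credentials() -> tuple[str, str]:
    sender = os.environ.get("SENDER_EMAIL")
    password = os.environ.get("APP_PASSWORD")
    if not sender or not password:
        raise RuntimeError("Set SENDER_EMAIL and APP_PASSWORD in .env")
    return sender, password
```

Note that for Gmail, `APP_PASSWORD` must be a Google App Password (generated under account security settings), not your normal account password.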
- Configuration: Update `config.json` with your details and target sites:

  ```json
  {
    "user_name": ["Your Name"],
    "user_email": "your.email@example.com",
    "sites": []
  }
  ```
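Loading this configuration is a one-liner with the stdlib `json` module. The function name below is illustrative, and the exact schema of the `"sites"` entries is an assumption; only the top-level keys shown above are known:

```python
# Sketch of config loading, assuming the top-level keys from the example
# config.json. The "sites" entry schema is project-specific.
import json

def load_config(path: str = "config.json") -> dict:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```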
Run the main script to scrape the configured sites and send the email:

```bash
python main.py
```

- `main.py`: The entry point of the application. Orchestrates scraping and emailing.
- `config.json`: Configuration file for user details and site URLs.
- `parser_hu.py`: Scraper logic for Humboldt University.
- `parser_mpi.py`: Scraper logic for Max Planck Institute.
- `emailer.py`: Handles SMTP email sending.
- `email_template.py`: Generates the HTML email content.
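For a sense of what the emailing step involves, here is a hedged sketch of building and sending an HTML digest with the stdlib `email` and `smtplib` modules. The Gmail host/port and the function names are assumptions for illustration; `emailer.py` may well do this differently:

```python
# Sketch of the digest email: an HTML body with a plain-text fallback,
# sent over SMTP-with-SSL. Host/port assume a Gmail sender account.
import smtplib
import ssl
from email.message import EmailMessage

def build_digest(sender: str, recipient: str, html_body: str) -> EmailMessage:
    msg = EmailMessage()
    msg["Subject"] = "SciScrape daily job digest"
    msg["From"] = sender
    msg["To"] = recipient
    # Plain-text part first, then the HTML alternative
    msg.set_content("Your mail client does not support HTML.")
    msg.add_alternative(html_body, subtype="html")
    return msg

def send_digest(msg: EmailMessage, password: str) -> None:
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL("smtp.gmail.com", 465, context=context) as server:
        server.login(msg["From"], password)
        server.send_message(msg)
```

Setting both a plain-text and an HTML part (`multipart/alternative`) keeps the digest readable even in text-only mail clients.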
This project is currently under (semi-)active development.
- Support for more institutions (Charité, FU Berlin, TU Berlin) is planned.
- Refactoring of scraper logic to be more modular.
- Improved keyword filtering.
Happy job hunting! :)