amyromanello/sci-scrape

SciScrape

SciScrape is a work-in-progress Python tool designed to scrape scientific job postings from various institutional job boards and deliver a daily digest directly to your inbox.

Description

Finding research positions, especially postdoc jobs relevant to your field, is a tedious process. I spent many hours cycling through the job boards on several university and institute websites during my own job search. As a fun side-project, I decided to create a tool to automate some of this, while also building familiarity with web scraping and deepening my Python skills.

SciScrape was made to help with:

  1. Fetching job listings from configured institutional websites (e.g., Humboldt University, Max Planck Institute).
  2. Filtering and parsing the job data.
  3. Generating a clean, HTML-formatted email digest.
  4. Sending the digest to your email address.
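
The emailing step (4) can be sketched with just the standard library. This is a minimal illustration, not the project's actual emailer.py: the subject line, function names, and the Gmail SMTP host are assumptions.

```python
import smtplib
from email.mime.text import MIMEText

def build_digest_message(sender, recipient, html_body):
    """Wrap a rendered HTML digest in a MIME message."""
    msg = MIMEText(html_body, "html")
    msg["Subject"] = "SciScrape daily job digest"  # hypothetical subject line
    msg["From"] = sender
    msg["To"] = recipient
    return msg

def send_digest(msg, sender, app_password):
    """Send over SSL; a Gmail app password is required, not the account password."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(sender, app_password)
        server.send_message(msg)
```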

Features

  • Automated Scraping: Currently supports parsing for the Humboldt University (HU) and Max Planck Institute (MPI) job boards. (At least, it did in late 2025; the HU job board has recently undergone a significant overhaul, so no guarantees there!)
  • Email Notifications: Sends a daily summary of new job postings using SMTP.
  • Configurable: Easily add new sites or change user details via config.json.
  • HTML Templates: Uses Jinja2 to render beautiful email reports.
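
As an illustration of the Jinja2 rendering, here is a hedged sketch; the inline template string and job fields (title, url, site) are assumptions, since the project's real template lives in email_template.py.

```python
from jinja2 import Template

# Hypothetical inline template; the project's actual markup is in email_template.py.
DIGEST_TEMPLATE = Template(
    "<h1>Jobs for {{ name }}</h1>"
    "<ul>{% for job in jobs %}"
    '<li><a href="{{ job.url }}">{{ job.title }}</a> ({{ job.site }})</li>'
    "{% endfor %}</ul>"
)

def render_digest(name, jobs):
    """Render the HTML body for the daily digest email."""
    return DIGEST_TEMPLATE.render(name=name, jobs=jobs)
```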

Setup & Installation

  1. Clone the repository:

    git clone <repository-url>
    cd SciScrape
  2. Install dependencies: Ensure you have Python installed. You will need the following packages:

    • requests
    • beautifulsoup4
    • python-dotenv
    • jinja2
    • lxml (optional, but recommended for BeautifulSoup)

    You can install them via pip:

    pip install requests beautifulsoup4 python-dotenv jinja2 lxml
  3. Environment Variables: Create a .env file in the root directory to store your email credentials:

    SENDER_EMAIL=your_email@gmail.com
    APP_PASSWORD=your_google_app_password
  4. Configuration: Update config.json with your details and target sites:

    {
      "user_name": ["Your Name"],
      "user_email": "your.email@example.com",
      "sites": [  ]
    }
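
Wiring the two configuration sources together might look like the following sketch; the exact names main.py uses are assumptions.

```python
import json
import os

def load_config(path="config.json"):
    """Read user details and target sites from the JSON config file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

# Credentials come from .env; with python-dotenv installed you would call:
#   from dotenv import load_dotenv
#   load_dotenv()
# after which the variables are readable via os.getenv:
SENDER_EMAIL = os.getenv("SENDER_EMAIL")
APP_PASSWORD = os.getenv("APP_PASSWORD")
```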

Usage

Run the main script to scrape the configured sites and send the email:

python main.py

Project Structure

  • main.py: The entry point of the application. Orchestrates scraping and emailing.
  • config.json: Configuration file for user details and site URLs.
  • parser_hu.py: Scraper logic for Humboldt University.
  • parser_mpi.py: Scraper logic for Max Planck Institute.
  • emailer.py: Handles SMTP email sending.
  • email_template.py: Generates the HTML email content.
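
A parser module along the lines of parser_hu.py might be structured like this sketch; the CSS selector and returned fields are assumptions, since each board's real markup differs (fetching the page would use requests, omitted here).

```python
from bs4 import BeautifulSoup

def parse_jobs(html, site):
    """Extract job postings from a listing page's HTML."""
    soup = BeautifulSoup(html, "html.parser")  # swap in "lxml" if installed
    jobs = []
    for link in soup.select("a.job-listing"):  # hypothetical CSS class
        jobs.append({
            "title": link.get_text(strip=True),
            "url": link.get("href"),
            "site": site,
        })
    return jobs
```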

Work in Progress

This project is currently under (semi-)active development.

  • Support for more institutions (Charité, FU Berlin, TU Berlin) is planned.
  • Refactoring of scraper logic to be more modular.
  • Improved keyword filtering.
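
The keyword filtering mentioned above could start as simply as the following; case-insensitive substring matching on the title is an assumption about the planned behaviour.

```python
def filter_jobs(jobs, keywords):
    """Keep only postings whose title contains at least one keyword."""
    lowered = [k.lower() for k in keywords]
    return [job for job in jobs
            if any(k in job["title"].lower() for k in lowered)]
```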

Happy job hunting! :)
