arturovaine/airflow-web-scraper-etl

Dockerized Airflow Web ETL

This project sets up a fully dockerized Apache Airflow pipeline to scrape quotes from the web, transform the data, and load it into a PostgreSQL database.
Everything runs through Docker Compose, and the entire project can be bootstrapped with a single shell script.


📦 Quickstart

1. Make the shell script executable

chmod +x create_airflow_scraper_project.sh

2. Run the project setup

./create_airflow_scraper_project.sh

3. Activate the virtual environment

source venv/bin/activate

4. Build and start the containers

docker-compose up --build

Wait a few moments for the containers to finish starting.

🌐 Access the Airflow UI

URL: http://localhost:8080

Username: admin

Password: admin
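If you would rather not guess how long "a few moments" is, a small polling helper can tell you when the UI starts answering. This is only a sketch: the helper names `is_up` and `wait_for`, the attempt count, and the delay are illustrative choices, not part of the project. It treats any HTTP response from the URL above as "up".

```python
import time
import urllib.error
import urllib.request


def is_up(url: str) -> bool:
    """Return True if the server at `url` answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=5):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status; it is still reachable.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the container is probably still starting.
        return False


def wait_for(url: str, attempts: int = 30, delay: float = 2.0, probe=is_up) -> bool:
    """Poll `url` until `probe` succeeds or `attempts` runs out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for("http://localhost:8080")` returns `True` once the webserver responds, or `False` after roughly a minute with the defaults shown. The `probe` parameter exists so the loop can be exercised without a running server.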

πŸƒ Run the DAG

Trigger the etl_web_scraper DAG from the Airflow UI. It will scrape quote data, store it in PostgreSQL, and log each step.
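The transform step between scraping and loading might look roughly like the sketch below. It is not the code in etl_web_scraper.py. The input keys (`text`, `author`, `tags`), the duplicate filtering, and the choice to store tags as a comma-separated string are all assumptions made for illustration; only the output column names come from the query shown later in this README.

```python
from datetime import datetime, timezone


def transform(raw_quotes, now=None):
    """Normalize scraped records into rows shaped like the quotes table."""
    # One extraction timestamp for the whole batch (assumed behaviour).
    created_at = now or datetime.now(timezone.utc)
    rows = []
    seen = set()
    for item in raw_quotes:
        # Trim whitespace and any straight or curly quotation marks around the text.
        quote = item["text"].strip().strip("\u201c\u201d\"").strip()
        author = item["author"].strip()
        # Skip empty quotes and exact (quote, author) repeats.
        if not quote or (quote, author) in seen:
            continue
        seen.add((quote, author))
        # Assumption: tags arrive as a list and are stored as one string.
        tags = ", ".join(t.strip() for t in item.get("tags", []) if t.strip())
        rows.append(
            {"quote": quote, "author": author, "tags": tags, "created_at": created_at}
        )
    return rows
```

Passing `now` explicitly keeps the function deterministic, which makes it easier to test outside Airflow.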

🐘 Check the PostgreSQL Database

Enter the database container:

docker exec -it airflow-postgres psql -U airflow -d scraperdb

Run:

\c scraperdb
\dt
SELECT quote, author, tags, created_at FROM quotes LIMIT 5;
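To see the shape of the data that query returns without starting the containers, the sketch below runs the same SELECT against an in-memory SQLite database. SQLite is a stand-in for PostgreSQL here, and the column types, the `id` column, and the sample rows are placeholders assumed for illustration; only the table name and the four selected columns come from the query above.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL scraperdb database.
conn = sqlite3.connect(":memory:")

# Assumed schema: only the quote, author, tags and created_at column names
# are taken from the README; the types and the id column are guesses.
conn.execute(
    """
    CREATE TABLE quotes (
        id INTEGER PRIMARY KEY,
        quote TEXT NOT NULL,
        author TEXT NOT NULL,
        tags TEXT,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
    """
)

# Placeholder rows, not real scraped data.
conn.executemany(
    "INSERT INTO quotes (quote, author, tags) VALUES (?, ?, ?)",
    [
        ("Sample quote one.", "Author A", "tag1, tag2"),
        ("Sample quote two.", "Author B", "tag3"),
    ],
)
conn.commit()

# Same query as in the psql session.
rows = conn.execute(
    "SELECT quote, author, tags, created_at FROM quotes LIMIT 5"
).fetchall()
for row in rows:
    print(row)
```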

✅ Output

Each quote includes:

  • Quote text

  • Author

  • Tags

  • Timestamp of extraction (created_at)

πŸ“ Project Structure

airflow-webscraper/
├── dags/
│   └── etl_web_scraper.py
├── logs/
├── plugins/
├── docker-compose.yaml
├── requirements.txt
├── .gitignore
└── create_airflow_scraper_project.sh

📚 Tech Stack

  • Python + Airflow

  • Docker Compose

  • PostgreSQL

  • BeautifulSoup for web scraping
