This repository is part of the Volatility Spillovers project. It provides an automated pipeline to download, update, and store electricity market data from the ENTSO-E Transparency Platform.
The project uses Apache Airflow to orchestrate tasks and Docker Compose for local development. Data is stored in Parquet format and organized for downstream processing with dbt and visualization.
- Download electricity time series data from ENTSO-E.
- Store data incrementally:
  - If no local file exists → fetch from a user-defined start date up to the latest available date.
  - If a file exists → fetch only the missing dates and append them to the existing dataset.
- Standardized storage in Parquet files.
- Daily scheduling with Airflow.
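The incremental-update rule above boils down to deciding which date range is still missing. A minimal sketch of that decision (the function name is illustrative, not taken from the repo):

```python
from datetime import date, timedelta
from typing import Optional, Tuple

def missing_range(last_stored: Optional[date],
                  default_start: date,
                  latest: date) -> Optional[Tuple[date, date]]:
    """Return the inclusive (start, end) date range to fetch,
    or None if the local file is already up to date."""
    if last_stored is None:                 # no local file yet
        return (default_start, latest)
    if last_stored >= latest:               # nothing missing
        return None
    return (last_stored + timedelta(days=1), latest)
```

The fetched range is then appended to the existing Parquet dataset, so repeated runs only download new data.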
```
electricity-data-collector/
├── dags/                 # Airflow DAG definitions
├── edc/                  # ETL logic (ENTSO-E client, file handlers, etc.)
├── notebooks/            # Notebook for visualizing downloaded data
├── docker-compose.yml    # Local Docker setup
├── .env.example          # Example environment configuration
└── README.md
```
- Docker and Docker Compose
- An ENTSO-E API key (request one via the ENTSO-E Transparency Platform)
- Clone the repository:

  ```bash
  git clone https://github.com/franrolotti/electricity-data-collector.git
  cd electricity-data-collector
  ```

- Set up environment variables. Copy the example file and edit it:

  ```bash
  cp .env.example .env
  ```

  Update it with your values:

  ```
  ENTSOE_API_TOKEN=your_api_token_here
  EDC_TIMEZONE=Europe/Madrid  # or your local timezone
  ```
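At runtime the ETL code reads these values from the environment. A minimal sketch of how that might look (the `load_config` helper is illustrative, not the repo's actual code; only the variable names come from `.env.example`):

```python
import os

def load_config() -> dict:
    """Read the .env-provided settings from the environment."""
    token = os.environ.get("ENTSOE_API_TOKEN")
    if not token:
        raise RuntimeError("ENTSOE_API_TOKEN is not set (copy .env.example to .env)")
    return {
        "api_token": token,
        "timezone": os.environ.get("EDC_TIMEZONE", "Europe/Madrid"),
    }
```

Failing fast on a missing token surfaces configuration mistakes before any Airflow task tries to call the API.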
- Initialize Airflow:

  ```bash
  docker compose run --rm airflow-init
  ```

  This creates/migrates the metadata DB and the default user.
- Start the services:

  ```bash
  docker compose up -d
  ```

  This starts:

  - `airflow-db` (PostgreSQL)
  - `airflow` (scheduler + webserver)
In the Airflow UI, toggle the `entsoe_sync` DAG to ON.
To inspect the downloaded data in a Jupyter notebook after the Airflow task has run, open:

```
notebooks/show_parket.ipynb
```

Related repositories:

- Collector (this repo): electricity-data-collector
- Methodology (dbt + spillover models): volatility-spillovers-methodology
- Visualization (Streamlit/Dash): electricity-visualization