# Electricity Data Collector

This repository is part of the Volatility Spillovers project. It provides an automated pipeline to download, update, and store electricity market data from the ENTSO-E Transparency Platform.

The project uses Apache Airflow to orchestrate tasks and Docker Compose for local development. Data is stored in Parquet format and organized for downstream processing with dbt and visualization.


## Features

- Download electricity time series data from ENTSO-E.
- Store data incrementally:
  - If no local file exists, fetch from the user-defined start date up to the latest available date.
  - If a file exists, fetch only the missing dates and append them to the existing dataset.
- Standardized storage in Parquet files.
- Daily scheduling with Airflow.
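The incremental logic above can be sketched as a small pure function (the name `missing_range` is illustrative, not the actual `edc` API):

```python
from datetime import date, timedelta
from typing import Optional, Tuple

def missing_range(
    last_stored: Optional[date],   # last date already in the Parquet file, or None
    default_start: date,           # user-defined start date
    latest_available: date,        # latest date published by ENTSO-E
) -> Optional[Tuple[date, date]]:
    """Return the (start, end) range still to be fetched, or None if up to date."""
    if last_stored is None:
        start = default_start                    # no local file yet
    else:
        start = last_stored + timedelta(days=1)  # resume after the last stored day
    if start > latest_available:
        return None                              # nothing new to fetch
    return start, latest_available
```

The ENTSO-E client is then called only for the returned range, and the result is appended to the existing Parquet file.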


## Repository structure

```
electricity-data-collector/
├── dags/                # Airflow DAG definitions
├── edc/                 # ETL logic (ENTSO-E client, file handlers, etc.)
├── notebooks/           # Notebook for visualizing downloaded data
├── docker-compose.yml   # Local Docker setup
├── .env.example         # Example environment configuration
└── README.md
```

## Requirements

- Docker and Docker Compose
- An ENTSO-E Transparency Platform API token
## Quickstart (local with Docker + Airflow)

1. Clone the repository

```shell
git clone https://github.com/franrolotti/electricity-data-collector.git
cd electricity-data-collector
```

2. Set up environment variables

Copy the example file and edit it:

```shell
cp .env.example .env
```

Update it with your values:

```
ENTSOE_API_TOKEN=your_api_token_here
EDC_TIMEZONE=Europe/Madrid   # or your local timezone
```
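Inside the container, these two variables are exposed through the environment. A minimal sketch of how such configuration can be read in Python (`load_config` is a hypothetical helper, not necessarily how `edc` does it):

```python
import os

def load_config(env=os.environ):
    """Return (api_token, timezone) from the environment.

    ENTSOE_API_TOKEN is required; EDC_TIMEZONE falls back to Europe/Madrid,
    matching the example .env file.
    """
    token = env["ENTSOE_API_TOKEN"]          # raises KeyError if unset
    tz = env.get("EDC_TIMEZONE", "Europe/Madrid")
    return token, tz
```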

3. Initialize Airflow (only the first time)

```shell
docker compose run --rm airflow-init
```

This creates/migrates the metadata DB and the default user.

4. Start Airflow + Postgres

```shell
docker compose up -d
```

This starts:

- `airflow-db` (PostgreSQL)
- `airflow` (scheduler + webserver)
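For orientation, that service layout corresponds roughly to a Compose file like the following. This is an illustrative sketch only; the repository's actual `docker-compose.yml` is authoritative, and the image tags, volumes, and ports shown here are assumptions:

```yaml
# Sketch of the service layout (illustrative, not the repo's actual file).
services:
  airflow-db:
    image: postgres:15           # Airflow metadata database
  airflow:
    image: apache/airflow:2.9.0  # scheduler + webserver
    env_file: .env               # ENTSOE_API_TOKEN, EDC_TIMEZONE, ...
    depends_on:
      - airflow-db
    ports:
      - "8080:8080"              # Airflow UI
```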

5. Access the Airflow UI

Open http://localhost:8080

6. Enable the DAG

In the UI, toggle the `entsoe_sync` DAG to **On**.


## Reading the data

To inspect the downloaded file after the Airflow task has run, open the Jupyter notebook:

```
notebooks/show_parket.ipynb
```
