This project generates a comprehensive timestamp dimension table for data lake analytics.
The generated table includes:
- Timestamp: Hourly timestamps in UTC (24 per day)
- Country Code: PT, NL, FR, ES, DE, IT (Portugal, Netherlands, France, Spain, Germany, Italy)
- Timezone: Corresponding timezone for each country
- Holiday: Binary indicator for national holidays
- Daylight Savings: Binary indicator for daylight saving time periods
- Working Hours: Binary indicator for business hours (9 AM - 6 PM local time)
- Weekend: Binary indicator for weekends
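As a rough illustration of how the time-based columns above could be derived for a single UTC timestamp, here is a minimal sketch using only the standard library. The `COUNTRY_TIMEZONES` mapping and the `local_flags` helper are hypothetical names chosen for this example, not taken from the generator script. The sketch assumes one mainland IANA zone per country and treats working hours as the half-open interval from 09:00 up to, but not including, 18:00 local time. The holiday column is left out because it needs a per-country holiday calendar.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: one (mainland) IANA timezone per country code.
COUNTRY_TIMEZONES = {
    "PT": "Europe/Lisbon",
    "NL": "Europe/Amsterdam",
    "FR": "Europe/Paris",
    "ES": "Europe/Madrid",
    "DE": "Europe/Berlin",
    "IT": "Europe/Rome",
}


def local_flags(ts_utc: datetime, country_code: str) -> dict:
    """Derive the binary time flags for one UTC timestamp (hypothetical helper)."""
    tz_name = COUNTRY_TIMEZONES[country_code]
    # Convert the UTC timestamp to the country's local wall-clock time.
    local = ts_utc.astimezone(ZoneInfo(tz_name))
    return {
        "timezone": tz_name,
        # A non-zero DST offset means daylight saving time is in effect.
        "daylight_savings": int(local.dst() != timedelta(0)),
        # Assumption: 9 AM inclusive to 6 PM exclusive, local time.
        "working_hours": int(9 <= local.hour < 18),
        # weekday() returns 5 for Saturday and 6 for Sunday.
        "weekend": int(local.weekday() >= 5),
    }


summer = local_flags(datetime(2024, 7, 1, 10, tzinfo=timezone.utc), "DE")
winter = local_flags(datetime(2024, 1, 6, 20, tzinfo=timezone.utc), "PT")
```

Evaluating the flags in local time rather than UTC matters near midnight and around DST transitions, where the local weekday or hour can differ from the UTC one.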
Install Poetry if you haven't already:
curl -sSL https://install.python-poetry.org | python3 -
Install dependencies:
poetry install
Run the generator:
poetry run python generate_timestamp_dimension.py
The script writes a CSV file, timestamp_dimension.csv, containing one row for every combination of hourly timestamp and country for the year 2024.
You can modify the date range by editing the generate_dimension_table() function call in the main function:
df = generate_dimension_table(start_date='2023-01-01', end_date='2025-12-31')
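When changing the date range, it can help to estimate the size of the output first. The sketch below is a back-of-the-envelope check, not part of the generator. `expected_rows` is a hypothetical helper, and it assumes that both dates are inclusive and that all 24 UTC hours of the end date are generated, which the actual script may handle differently.

```python
from datetime import date


def expected_rows(start_date: str, end_date: str, n_countries: int = 6) -> int:
    """Estimate the row count: days x 24 hourly timestamps x countries."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    # Assumption: the end date is included in full.
    days = (end - start).days + 1
    return days * 24 * n_countries


rows_2024 = expected_rows("2024-01-01", "2024-12-31")        # 366 days (leap year)
rows_three_years = expected_rows("2023-01-01", "2025-12-31")  # 1096 days
```

Under these assumptions the default 2024 range gives 52,704 rows and the three-year range gives 157,824 rows. A noticeably different row count in the CSV would suggest the end date is treated differently.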