This project is a data engineering pipeline built with Apache Airflow on Amazon Web Services (AWS) for processing OpenWeather data. The pipeline extracts weather data from the OpenWeather API, transforms it, and loads it into Amazon S3 for downstream analysis and visualization.
- OpenWeather API: Data is extracted from the OpenWeather API, which provides current weather data, forecasts, and historical weather data.
- Amazon S3 (Simple Storage Service): Used for storing intermediate data files and any necessary configuration files.
- Apache Airflow: Used for orchestrating the data pipeline.
  - DAGs (Directed Acyclic Graphs): Define the workflow of the pipeline.
  - Operators: Perform tasks such as data extraction, transformation, and loading.
  - Scheduler: Schedules and monitors the execution of tasks.
- Extract Data: Use Airflow to call the OpenWeather API and retrieve weather data for specified locations and time periods.
- Transform Data: Cleanse and transform the raw weather data into a structured format suitable for analysis.
- Load Data into S3: Load the transformed weather data into S3 for storage and analysis (a full DAG sketch follows these steps).
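A minimal sketch of how these three steps might map onto a single DAG, assuming Airflow 2.x with the `apache-airflow-providers-amazon` package installed; the DAG id `openweather_etl`, the city list, the bucket name, and the `aws_default` connection id are placeholder assumptions, not project-specific values:

```python
from datetime import datetime
import json

import requests
from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

API_URL = "https://api.openweathermap.org/data/2.5/weather"
CITIES = ["London", "New York", "Tokyo"]   # placeholder locations
BUCKET = "my-openweather-bucket"           # placeholder S3 bucket


def extract(**context):
    """Call the OpenWeather API for each city and return the raw payloads."""
    api_key = Variable.get("openweather_api_key")
    payloads = []
    for city in CITIES:
        resp = requests.get(
            API_URL,
            params={"q": city, "appid": api_key, "units": "metric"},
            timeout=30,
        )
        resp.raise_for_status()
        payloads.append(resp.json())
    return payloads


def transform(ti, **context):
    """Keep only the fields needed for analysis."""
    raw = ti.xcom_pull(task_ids="extract")
    return [
        {
            "city": r["name"],
            "temperature_c": r["main"]["temp"],
            "humidity_pct": r["main"]["humidity"],
            "observed_at": r["dt"],
        }
        for r in raw
    ]


def load(ti, **context):
    """Write the transformed records to S3 as one JSON file per run."""
    records = ti.xcom_pull(task_ids="transform")
    key = f"weather/{context['ts_nodash']}.json"
    S3Hook(aws_conn_id="aws_default").load_string(
        json.dumps(records), key=key, bucket_name=BUCKET, replace=True
    )


with DAG(
    dag_id="openweather_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task
```

Once deployed to the DAGs folder, `airflow dags trigger openweather_etl` runs it on demand. Passing the payloads through XCom is fine here because they are small; larger extracts would usually be staged directly in S3 between tasks.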
- AWS Setup:
  - Set up an AWS account if you haven't already.
  - Create an S3 bucket for storing intermediate data files (see the boto3 sketch below).
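Bucket creation can be done from the console, the AWS CLI, or a few lines of boto3. A sketch, with the bucket name and region as placeholders:

```python
import boto3

REGION = "us-east-1"              # placeholder region
BUCKET = "my-openweather-bucket"  # placeholder; bucket names are globally unique

s3 = boto3.client("s3", region_name=REGION)

# us-east-1 is the default location and rejects an explicit LocationConstraint;
# every other region requires one.
if REGION == "us-east-1":
    s3.create_bucket(Bucket=BUCKET)
else:
    s3.create_bucket(
        Bucket=BUCKET,
        CreateBucketConfiguration={"LocationConstraint": REGION},
    )
```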
- OpenWeather API Setup:
  - Sign up for an OpenWeather API account and obtain an API key.
- Apache Airflow Setup:
  - Deploy an Airflow environment on AWS, for example with Amazon MWAA (Managed Workflows for Apache Airflow) or self-managed on ECS or EC2.
  - Define the DAGs, operators, and connections in Airflow for orchestrating the pipeline.
- Configuration:
  - Configure Airflow to use the OpenWeather API key (for example, as an Airflow Variable).
  - Set up a connection to Amazon S3 in Airflow (see the sketch below).
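One common way to wire this up, assuming Airflow's metadata database holds the secrets: store the API key as an Airflow Variable (Admin -> Variables in the UI, or the `AIRFLOW_VAR_OPENWEATHER_API_KEY` environment variable) and the AWS credentials as a connection (Admin -> Connections, type "Amazon Web Services"). A quick sanity-check script, reusing the placeholder names from the DAG sketch above, run wherever the Airflow environment's configuration is reachable:

```python
from airflow.models import Variable
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

# The API key is read the same way the DAG reads it: from an Airflow Variable.
api_key = Variable.get("openweather_api_key")
print("API key loaded:", bool(api_key))

# S3 access goes through an Airflow connection; "aws_default" is the
# conventional connection id for the Amazon provider.
hook = S3Hook(aws_conn_id="aws_default")
print("Bucket reachable:", hook.check_for_bucket("my-openweather-bucket"))  # placeholder name
```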
- Run the Pipeline:
  - Trigger the DAGs in Airflow to start the data pipeline execution.
  - Monitor AWS costs, especially S3 storage usage.
  - Consider setting up monitoring and alerting for the pipeline to detect and handle failures.
  - Ensure proper error handling and logging in the Airflow DAGs for troubleshooting (see the `default_args` sketch below).
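Much of the error handling and alerting mentioned above can be expressed in the DAG's `default_args`. The retry counts, email address, and callback below are illustrative assumptions, not project settings:

```python
from datetime import timedelta


def notify_failure(context):
    """Hypothetical callback: forward the failing task and its log URL to an alerting tool."""
    ti = context["task_instance"]
    print(f"Task {ti.task_id} failed; logs at {ti.log_url}")


default_args = {
    "owner": "data-engineering",      # illustrative
    "retries": 3,                     # retry transient API/S3 errors
    "retry_delay": timedelta(minutes=5),
    "email": ["alerts@example.com"],  # requires SMTP to be configured in Airflow
    "email_on_failure": True,
    "on_failure_callback": notify_failure,
}
```

Passing `default_args=default_args` to the `DAG(...)` call applies these settings to every task. On the cost point, storage of raw payloads is usually the main driver, so an S3 lifecycle rule that expires or archives old objects is worth considering.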

