Welcome to the Cloudfide Senior Data Engineer Challenge! In this project, you'll demonstrate your cloud data engineering skills by designing a data lake architecture, implementing data pipelines, and optimizing data processes. Your mission is a real-world scenario: create a cloud data lake using Databricks, build data models, and ensure data quality.
- Python 3.8+
- pip
- Azure account
- Databricks account
- Clone the repository:
git clone <repository-url>
- Navigate to the project directory:
cd cloudfide-senior-data-engineer-challenge
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
- On Windows:
.\venv\Scripts\activate
- On macOS and Linux:
source venv/bin/activate
- Install the required packages:
pip install -r requirements.txt
- Follow the task instructions to implement the data lake and pipelines.
- data-ingestion/: Contains scripts for data ingestion.
- data-models/: Contains data model definitions.
- scripts/: Utility scripts.
- Design and implement a cloud data lake architecture using Azure.
- Implement a data ingestion pipeline with Databricks.
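For the architecture task, one common approach is a medallion-style layout (bronze/silver/gold layers) on Azure Data Lake Storage. The sketch below shows how layer paths might be organized; the storage account and container names are assumptions for illustration, not part of the challenge spec.

```python
# Hypothetical medallion-style layout for an Azure data lake.
# STORAGE_ACCOUNT and CONTAINER are assumed names, not given by the challenge.

STORAGE_ACCOUNT = "cloudfidedatalake"
CONTAINER = "lake"

# Each layer holds data at a different refinement stage.
LAYERS = {
    "bronze": "raw ingested data, stored as-is",
    "silver": "cleaned and conformed data",
    "gold": "aggregated, analytics-ready data",
}

def layer_path(layer: str) -> str:
    """Build an ABFSS URI for a given medallion layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return f"abfss://{CONTAINER}@{STORAGE_ACCOUNT}.dfs.core.windows.net/{layer}"
```

A Databricks notebook could then read from and write to these paths, promoting data from bronze toward gold as it is cleaned and aggregated.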
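For the ingestion task, the core logic is usually validate-then-load: accept records that pass data quality checks and divert the rest to a reject queue. On Databricks this would typically be written with PySpark DataFrames; the minimal sketch below uses plain Python so it runs anywhere, and the field names ("id", "amount") are illustrative assumptions.

```python
# Minimal local sketch of a validate-then-load ingestion step.
# In the actual pipeline this logic would operate on PySpark DataFrames;
# plain dicts are used here to keep the sketch self-contained.

from typing import Iterable

def validate(record: dict) -> bool:
    """Basic data quality check: id present and amount numeric."""
    return (
        record.get("id") is not None
        and isinstance(record.get("amount"), (int, float))
    )

def ingest(records: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Split incoming records into accepted rows and a reject queue."""
    accepted: list[dict] = []
    rejected: list[dict] = []
    for rec in records:
        (accepted if validate(rec) else rejected).append(rec)
    return accepted, rejected
```

Keeping rejected records (rather than dropping them) makes data quality issues auditable, which is the usual expectation in a production pipeline.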
This project is licensed under the MIT License.