Welcome to the Cloudfide Senior Data Engineer Challenge! In this project, you'll demonstrate your cloud data engineering skills by designing a data lake architecture, implementing data pipelines, and optimizing data processes. Your mission is a real-world scenario: create a cloud data lake using Databricks, build data models, and ensure data quality.
- Python 3.8+
- pip
- Azure account
- Databricks account
- Clone the repository:
git clone <repository-url>
- Navigate to the project directory:
cd cloudfide-senior-data-engineer-challenge
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
- On Windows:
.\venv\Scripts\activate
- On macOS and Linux:
source venv/bin/activate
- Install the required packages:
pip install -r requirements.txt
- Follow the task instructions to implement the data lake and pipelines.
- data-ingestion/: Contains scripts for data ingestion.
- data-models/: Contains data model definitions.
- scripts/: Utility scripts.
- Design and implement a cloud data lake architecture using Azure.
- Implement a data ingestion pipeline with Databricks.
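For the architecture task, one common approach is a medallion-style layout (bronze/silver/gold layers) on Azure Data Lake Storage. The sketch below shows how layer paths might be organized; the storage account and container names are assumptions for illustration, not part of the challenge spec.

```python
# Hypothetical medallion-style layout for an Azure data lake.
# STORAGE_ACCOUNT and CONTAINER are assumed names, not given by the challenge.

STORAGE_ACCOUNT = "cloudfidedatalake"
CONTAINER = "lake"

# Each layer holds data at a different refinement stage.
LAYERS = {
    "bronze": "raw ingested data, stored as-is",
    "silver": "cleaned and conformed data",
    "gold": "aggregated, analytics-ready data",
}

def layer_path(layer: str) -> str:
    """Build an ABFSS URI for a given medallion layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return f"abfss://{CONTAINER}@{STORAGE_ACCOUNT}.dfs.core.windows.net/{layer}"
```

A Databricks notebook could then read from and write to these paths, promoting data from bronze toward gold as it is cleaned and aggregated.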
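For the ingestion task, the core logic is usually validate-then-load: accept records that pass data quality checks and divert the rest to a reject queue. On Databricks this would typically be written with PySpark DataFrames; the minimal sketch below uses plain Python so it runs anywhere, and the field names ("id", "amount") are illustrative assumptions.

```python
# Minimal local sketch of a validate-then-load ingestion step.
# In the actual pipeline this logic would operate on PySpark DataFrames;
# plain dicts are used here to keep the sketch self-contained.

from typing import Iterable

def validate(record: dict) -> bool:
    """Basic data quality check: id present and amount numeric."""
    return (
        record.get("id") is not None
        and isinstance(record.get("amount"), (int, float))
    )

def ingest(records: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Split incoming records into accepted rows and a reject queue."""
    accepted: list[dict] = []
    rejected: list[dict] = []
    for rec in records:
        (accepted if validate(rec) else rejected).append(rec)
    return accepted, rejected
```

Keeping rejected records (rather than dropping them) makes data quality issues auditable, which is the usual expectation in a production pipeline.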
This project is licensed under the MIT License.