
Cloudfide Senior Data Engineer Challenge

Welcome to the Cloudfide Senior Data Engineer Challenge! In this project, you'll demonstrate your skills in cloud data engineering by designing a data lake architecture, implementing data pipelines, and optimizing data processes. Your mission is to tackle a real-world scenario that involves creating a cloud data lake using Databricks, building data models, and ensuring data quality.

Getting Started

Prerequisites

  • Python 3.8+
  • pip
  • Azure account
  • Databricks account

Installation

  1. Clone the repository:
    git clone <repository-url>
  2. Navigate to the project directory:
    cd cloudfide-senior-data-engineer-challenge
  3. Create a virtual environment:
    python -m venv venv
  4. Activate the virtual environment:
    • On Windows:
      .\venv\Scripts\activate
    • On macOS and Linux:
      source venv/bin/activate
  5. Install the required packages:
    pip install -r requirements.txt

Running the Project

  • Follow the instructions in the Tasks section below to implement the data lake and the data pipelines.

Project Structure

  • data-ingestion/: Contains scripts for data ingestion.
  • data-models/: Contains data model definitions.
  • scripts/: Utility scripts.

Tasks

  • Design and implement a cloud data lake architecture using Azure.
  • Implement a data ingestion pipeline with Databricks.
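The ingestion pipeline will typically include a data-quality gate before records land in the lake, matching the data-quality goal in the overview. Below is a minimal, framework-free sketch of such a gate; the function name `validate_records` and the choice of required fields are illustrative assumptions, not part of the challenge specification:

```python
# Illustrative data-quality gate, assuming ingested records arrive as
# dictionaries. Names and required fields here are hypothetical.

def validate_records(records, required_fields=("id", "timestamp")):
    """Split records into (valid, rejected) lists based on required fields."""
    valid, rejected = [], []
    for record in records:
        # A record passes only if every required field is present and non-empty.
        if all(record.get(field) not in (None, "") for field in required_fields):
            valid.append(record)
        else:
            rejected.append(record)
    return valid, rejected
```

In a Databricks notebook the same check would usually be expressed as a DataFrame filter, but the logic (accept complete records, quarantine the rest for inspection) is the same.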

License

This project is licensed under the MIT License.