Cloudfide Senior Data Engineer Challenge

Welcome to the Cloudfide Senior Data Engineer Challenge! In this project you'll demonstrate your cloud data engineering skills on a real-world scenario: designing a cloud data lake architecture on Azure, implementing data ingestion pipelines with Databricks, building data models, and ensuring data quality.

Getting Started

Prerequisites

  • Python 3.8+
  • pip
  • Azure account
  • Databricks account

Installation

  1. Clone the repository:
    git clone <repository-url>
  2. Navigate to the project directory:
    cd cloudfide-senior-data-engineer-challenge
  3. Create a virtual environment:
    python -m venv venv
  4. Activate the virtual environment:
    • On Windows:
      .\venv\Scripts\activate
    • On macOS and Linux:
      source venv/bin/activate
  5. Install the required packages:
    pip install -r requirements.txt

Running the Project

  • Follow the task instructions to implement the data lake and pipelines.

Project Structure

  • data-ingestion/: Contains scripts for data ingestion.
  • data-models/: Contains data model definitions.
  • scripts/: Utility scripts.
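As a rough illustration of the kind of script that could live in data-ingestion/, here is a minimal sketch of a quality-gated CSV ingestion step. The field names and quality rules below are hypothetical placeholders, not part of the challenge specification:

```python
import csv
import io

REQUIRED_FIELDS = ("trade_id", "price", "quantity")  # hypothetical schema

def ingest(csv_text):
    """Parse raw CSV text and split rows into clean vs. rejected,
    applying basic data-quality checks (required fields, numeric types)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    clean, rejected = [], []
    for row in reader:
        try:
            # Quality rule 1: every required field must be present and non-empty.
            if any(not row.get(field) for field in REQUIRED_FIELDS):
                raise ValueError("missing required field")
            # Quality rule 2: price and quantity must be numeric.
            row["price"] = float(row["price"])
            row["quantity"] = float(row["quantity"])
            clean.append(row)
        except ValueError:
            # Keep rejects for auditing rather than silently dropping them.
            rejected.append(row)
    return clean, rejected
```

In a Databricks pipeline the same checks would typically run on a Spark DataFrame; the standard-library version above just makes the validation logic explicit.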

Tasks

  • Design and implement a cloud data lake architecture using Azure.
  • Implement a data ingestion pipeline with Databricks.

License

This project is licensed under the MIT License.
