
Cardoso-leo/crypto-databricks-pipeline


Crypto Analytics Pipeline with Databricks & Delta Lake

This project implements a batch data pipeline using Databricks, PySpark and Delta Lake, following a Bronze–Silver–Gold architecture.

Architecture

  • Bronze: Raw ingestion from the Coinbase API
  • Silver: Data cleaning, type casting and normalization
  • Gold: Analytical layer with window functions (daily price variation)
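The three layers can be illustrated with a small plain-Python sketch. It is not the repository's code. The raw field names (`base`, `currency`, `amount`, `date`), the sample prices, and the helpers `to_silver` and `to_gold` are hypothetical. The sketch assumes the API delivers prices as strings, which is why the Silver step casts them. The Gold step reproduces in plain Python what a Spark job would typically express as `pyspark.sql.functions.lag("price")` over `Window.partitionBy("symbol").orderBy("date")`, so it can run without a cluster.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from itertools import groupby

# Hypothetical Bronze records: raw API payloads kept as-is, with prices as strings.
# Field names and values are illustrative assumptions, not the real Coinbase schema.
BRONZE = [
    {"base": "btc", "currency": "USD", "amount": "100.00", "date": "2024-01-01"},
    {"base": "btc", "currency": "USD", "amount": "110.00", "date": "2024-01-02"},
    {"base": "eth", "currency": "usd", "amount": "50.00", "date": "2024-01-01"},
    {"base": "eth", "currency": "USD", "amount": "45.00", "date": "2024-01-02"},
]


@dataclass(frozen=True)
class SilverRow:
    symbol: str
    currency: str
    price: Decimal
    day: date


def to_silver(raw: dict) -> SilverRow:
    """Silver step: cast string fields to proper types and normalize codes."""
    return SilverRow(
        symbol=raw["base"].strip().upper(),
        currency=raw["currency"].strip().upper(),
        price=Decimal(raw["amount"]),
        day=date.fromisoformat(raw["date"]),
    )


def to_gold(rows: list[SilverRow]) -> list[dict]:
    """Gold step: daily price variation per symbol, mirroring
    LAG(price) OVER (PARTITION BY symbol ORDER BY day)."""
    out = []
    ordered = sorted(rows, key=lambda r: (r.symbol, r.day))
    for symbol, group in groupby(ordered, key=lambda r: r.symbol):
        prev_price = None  # like LAG, the first row of each partition has no predecessor
        for r in group:
            change = None if prev_price is None else r.price - prev_price
            # Guard against a missing or zero previous price before dividing.
            pct = change / prev_price * 100 if prev_price else None
            out.append({
                "symbol": symbol,
                "day": r.day,
                "price": r.price,
                "prev_price": prev_price,
                "change": change,
                "pct_change": pct,
            })
            prev_price = r.price
    return out


silver = [to_silver(r) for r in BRONZE]
gold = to_gold(silver)
for row in gold:
    print(row["symbol"], row["day"], row["change"], row["pct_change"])
```

Keeping the Silver output typed (`Decimal` for prices, `date` for days) lets the Gold step do arithmetic directly. The first row of each symbol has no previous price, so its variation is `None`, just as `LAG` yields `NULL`.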

Technologies

  • Databricks
  • Apache Spark (PySpark)
  • Delta Lake
  • REST API

Key Concepts Applied

  • Schema evolution (mergeSchema)
  • Window functions (LAG)
  • Medallion Architecture
  • Data type handling from external APIs
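Schema evolution with `mergeSchema` is easiest to picture as a column union. In Delta Lake it is enabled on an append write with `.option("mergeSchema", "true")`. New columns from the incoming batch are added to the table schema, and existing rows read them as null. The plain-Python sketch below models only that column-addition case, not type changes. The helpers `merge_schema` and `append_with_merge` and the `volume_24h` column are hypothetical.

```python
def merge_schema(table_cols: list[str], incoming_cols: list[str]) -> list[str]:
    """Union of column names: existing order is kept, new columns are appended."""
    merged = list(table_cols)
    for col in incoming_cols:
        if col not in merged:
            merged.append(col)
    return merged


def append_with_merge(table: list[dict], table_cols: list[str], batch: list[dict]):
    """Append a batch that may carry extra columns.

    Rows missing a column (older rows, or batch rows without it) get None,
    the plain-Python stand-in for SQL NULL.
    """
    batch_cols = [c for row in batch for c in row]
    cols = merge_schema(table_cols, batch_cols)
    rows = [{c: row.get(c) for c in cols} for row in table + batch]
    return rows, cols


# Hypothetical example: a later API batch adds a 24h volume field.
table = [{"symbol": "BTC", "price": 100.0}]
cols = ["symbol", "price"]
batch = [{"symbol": "ETH", "price": 50.0, "volume_24h": 1200.0}]

rows, cols = append_with_merge(table, cols, batch)
print(cols)     # ['symbol', 'price', 'volume_24h']
print(rows[0])  # {'symbol': 'BTC', 'price': 100.0, 'volume_24h': None}
```

Without schema merging, Delta Lake rejects an append whose schema does not match the table, which is why the option matters when an external API starts returning new fields.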

Motivation

This project is part of a personal 50-day challenge focused on deepening my skills in data engineering and analytics.
