Journey

Sanjay Janardhan edited this page Mar 21, 2025 · 1 revision

I began exploring Apache Spark in 2022, the year we had our first baby. My journey started with the Udemy course Taming Big Data.

This course focused heavily on RDDs and required significant effort to set up on my laptop. After completing it, I pursued the Databricks Certified Data Engineer Associate certification, using materials from the Databricks Academy along with another Udemy course by Derar Alhussein.

I successfully passed the certification exam. However, due to limited project opportunities, my Spark journey gradually faded into the background.

Revisiting Apache Spark in 2025

In March 2025, I realized that most of my work involved SQL, which pushed me to explore areas I previously found challenging in the data domain. This led me to machine learning, where I started by learning Python and Pandas. My deep-dive into data engineering truly began after mastering Python.

Now, I am revisiting Apache Spark with the following resources:

Study Materials

  1. 8 Steps for a Developer to Learn Apache Spark
  2. Spark: The Definitive Guide by Bill Chambers and Matei Zaharia
  3. Advanced Apache Spark Training - Sameer Farooqui

Hands-on Practice

  1. Visual Studio Professional Subscription
  2. Google Colab
