Harnessing Apache Spark for Next-Gen Analytics in Microsoft Fabric

Unlock the full potential of Apache Spark in Microsoft Fabric with this comprehensive, full-day workshop. Tailored for data engineers and data developers, this session offers hands-on experience in creating and optimizing Spark workflows to build a data analytics platform with a medallion architecture on the industry-standard Delta Lake format. Dive deep into Spark's capabilities in data transformation, parallel processing, job scheduling, and performance tuning, all within the Microsoft Fabric ecosystem. This workshop will empower Spark newcomers to tackle complex data challenges with confidence and build an AI-ready data analytics platform.

By the end of the workshop, you will be able to:

  • Develop Apache Spark-based applications in Microsoft Fabric.
  • Utilize Delta Lake and the Lakehouse to construct a medallion architecture for your data analytics platform (a short sketch follows this list).
  • Use the immersive, rich authoring and development experience of Fabric Notebooks and Visual Studio Code - gain proficiency in writing and executing Spark code within notebooks, and learn notebook features that improve the authoring experience (live versioning, display, notebookutils; see the sketch after this list).
  • Use your preferred programming language to build data analytics applications, and leverage your existing SQL skills to quickly get started with Spark (a Spark SQL sketch follows this list).
  • Manage, monitor, and debug your Spark applications in Microsoft Fabric. Debug Spark jobs with notebook in-context monitoring, the Spark details page, and the OSS Spark UI.
  • Discover how to seamlessly integrate Spark with other Fabric workloads such as Data Factory, Data Warehouse, and Power BI.
  • Learn how to use Library Management to extend your Spark applications with public and/or custom libraries.
  • Extra/Bonus - Learn tips and tricks to optimize your Spark applications and how to scale them to handle large datasets efficiently.
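
A minimal PySpark sketch of the bronze-to-silver step in a medallion architecture on Delta Lake is shown below. The table names (bronze_orders, silver_orders) and columns are illustrative assumptions rather than part of the workshop materials; in a Fabric notebook the spark session is already provided.

```python
from pyspark.sql import functions as F

# Read the raw (bronze) Delta table registered in the Lakehouse.
# "bronze_orders" is a hypothetical table name used only for illustration.
bronze_df = spark.read.table("bronze_orders")

# Basic cleansing for the silver layer: drop duplicates, filter out
# malformed rows, and stamp each row with a processing timestamp.
silver_df = (
    bronze_df
    .dropDuplicates(["order_id"])
    .filter(F.col("order_amount") > 0)
    .withColumn("processed_at", F.current_timestamp())
)

# Persist the cleansed data as a managed Delta table (silver layer).
silver_df.write.format("delta").mode("overwrite").saveAsTable("silver_orders")
```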
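
The notebook authoring objective mentions display and notebookutils; the cell below sketches how they typically appear. Both are pre-provisioned in Fabric notebooks, and the exact notebookutils surface depends on your runtime version, so treat these calls as assumptions to verify in the workshop environment.

```python
# Render a DataFrame with the interactive notebook visualizer.
df = spark.read.table("silver_orders")  # hypothetical table from the sketch above
display(df)

# List files in the attached Lakehouse from code. notebookutils is
# assumed to be available without an explicit import in Fabric notebooks;
# "Files/" refers to the Lakehouse file area.
for item in notebookutils.fs.ls("Files/"):
    print(item.name, item.size)
```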
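
And for the SQL objective, a minimal sketch of running a familiar SQL aggregation through Spark from Python; the table name is again a hypothetical placeholder.

```python
# Spark SQL returns a DataFrame, so SQL and DataFrame code compose freely.
top_customers = spark.sql("""
    SELECT customer_id, SUM(order_amount) AS total_spend
    FROM silver_orders
    GROUP BY customer_id
    ORDER BY total_spend DESC
    LIMIT 10
""")
top_customers.show()
```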

Agenda

Tip

You can progress through these exercises at your own pace. While we have structured logical breaks within the session, these are merely suggestions; you are not required to stop if you prefer to keep working. The breaks are provided to accommodate those who may need them, so feel free to move through the material in whatever way fits your learning style and needs.

Important

9:00 am - 9:20 am - Introduction, Set Up and Overview of Fabric Analytics Platform

9:20 am - 10:30 am - Module 1 - Developing Spark Applications

10:30 am - 10:45 am - Break

10:45 am - 12:00 pm - Module 2 - Orchestrating Spark

12:00 pm - 1:00 pm - Lunch Break

1:00 pm - 2:10 pm - Module 3 - Job Scheduling, Monitoring, and Debugging

2:10 pm - 2:20 pm - Break

2:20 pm - 3:30 pm - Module 4 - Performance Tuning, Optimizing, and Scaling

3:30 pm - 4:00 pm - Q&A
