Unlock the full potential of Apache Spark in Microsoft Fabric with this comprehensive, full-day workshop. Tailored for data engineers and data developers, this session offers hands-on experience in creating and optimizing Spark workflows while building a data analytics platform with a medallion architecture on industry-standard Delta Lake. Dive deep into Spark's capabilities in data transformation, parallel processing, job scheduling, and performance tuning, all within the Microsoft Fabric ecosystem. The workshop empowers Spark newcomers to tackle complex data challenges with confidence and build an AI-ready data analytics platform.
- Develop Apache Spark-based applications in Microsoft Fabric.
- Utilize Delta Lake and the Lakehouse to construct a medallion architecture for your data analytics platform (a short code sketch follows this list).
- Use the rich, immersive authoring and development experience of Fabric Notebooks and Visual Studio Code. Gain proficiency in writing and executing Spark code within notebooks, and learn notebook features that improve the authoring experience (live versioning, display, notebookutils).
- Use your preferred programming language to build data analytics applications and leverage your existing SQL skills to quickly get started with Spark.
- Manage, monitor, and debug your Spark applications in Microsoft Fabric. Debug Spark jobs with notebook in-context monitoring, the Spark details page, and the OSS Spark UI.
- Discover how to seamlessly integrate Spark with other Fabric workloads such as Data Factory, Data Warehouse, and Power BI.
- Discover how to use Library Management to extend your Spark applications with public and custom libraries.
- Bonus: Learn tips and tricks to optimize your Spark applications and how to scale them to handle large datasets efficiently.
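
As a preview of what Module 1 covers, the sketch below shows a minimal bronze-to-silver step you might run in a Fabric notebook with PySpark. It assumes the notebook's built-in `spark` session and `display` helper; the table and column names (`raw_orders`, `silver_orders`, `order_id`, `amount`) are illustrative placeholders rather than workshop assets.

```python
# Minimal sketch of a bronze-to-silver transformation in a Fabric notebook.
# Assumes the built-in `spark` session and an attached Lakehouse; the table
# and column names below are hypothetical placeholders.
from pyspark.sql import functions as F

# Read a raw (bronze) Delta table from the attached Lakehouse.
bronze_df = spark.read.table("raw_orders")

# Apply a simple cleanup for the silver layer.
silver_df = (
    bronze_df
    .dropDuplicates(["order_id"])
    .withColumn("amount", F.col("amount").cast("double"))
)

# Persist the result as a managed Delta table (Delta is the default table format in Fabric).
silver_df.write.mode("overwrite").saveAsTable("silver_orders")

# Existing SQL skills carry over directly via spark.sql.
display(spark.sql("SELECT COUNT(*) AS order_count FROM silver_orders"))
```

The workshop walks through each of these pieces in depth, from authoring in the notebook to monitoring the resulting Spark job.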
Tip
You can progress through these exercises at your own pace. The breaks built into the schedule are suggestions for those who need them; you are not required to stop and may continue through the material as best fits your learning style and needs.
Important
9:00 am - 9:20 am - Introduction, Set Up and Overview of Fabric Analytics Platform
9:20 am - 10:30 am - Module 1 - Developing Spark Applications
10:30 am - 10:45 am - Break
10:45 am - 12:00 pm - Module 2 - Orchestrating Spark
12:00 pm - 1:00 pm - Lunch Break
1:00 pm - 2:10 pm - Module 3 - Job Scheduling, Monitoring, and Debugging
2:10 pm - 2:20 pm - Break
2:20 pm - 3:30 pm - Module 4 - Performance Tuning, Optimizing, and Scaling
3:30 pm - 4:00 pm - Q&A