Onboarding resources for new team members in the 883 Group, covering data operations, customer engagement, Hadoop, Spark, Airflow, Kubernetes, and more. Get started on your journey to success with us!

883G/Onboarding-Newbies

883 Group Onboarding - Newbies

Welcome to the 883 Group Onboarding for Newbies! This repository is your gateway to a structured 10-chapter program, designed to seamlessly integrate you into our dynamic Data Ops team. The program is organized into sessions with tasks, goals, and resources to guide you through the learning process. You'll also have the opportunity to engage in discussions, Q&A sessions, and hands-on exercises to reinforce your understanding and apply your knowledge in real-world scenarios.

Table of Contents

Chapter 0: Foundations in the 883 Group

  • Intro: Welcome & Introduction - Understand the onboarding process and the 883 Group's vision and mission 🎯
  • Big Data Concepts: Introduction to Big Data - Core Concepts
  • System: Get familiar with the system and Linux. 🏹

Chapter 01: Introduction to Storage

This chapter covers foundational storage concepts, moving from general file system theory to modern data lake technologies. The days are organized thematically rather than by tool to help you build a coherent mental model.

  • File Systems Fundamentals – general file system concepts, hierarchy, metadata, allocation, permissions, consistency.

  • Hadoop Distributed File System (HDFS) – architecture, NameNode/DataNode, replication, HA, federation, etc.

  • S3-Compatible Object Storage – object model, APIs, metadata, consistency, security, ecosystem.

  • HBase – wide-column store architecture, regions, WAL, compactions, ZooKeeper.

  • Hive Metastore & Table Format – metadata service and table formatting (no execution engines).

  • Catalogs & Table Formats – metadata catalogs and modern table formats (Iceberg, Delta, Hudi).

  • Data Partitioning – partitioning strategy, pruning, maintenance, and interaction with formats.

  • Apache Iceberg – deep-dive questions on the Iceberg table format.

  • Practical Exercise 01: Meet the Hadoop Ecosystem in Action

  • Showcase 01: Develop a high-level understanding of the Hadoop ecosystem and its role in big data processing. 🔎

Contribution 🙌

If you have suggestions or ideas to improve the onboarding for future members, feel free to contribute: fork the repository, create a feature branch, commit your changes, and open a pull request.

Note: “Overlap 1”, “Overlap 2”, etc. are just placeholders showing that multiple sessions may run concurrently in a week.

Timeline

A detailed timeline is forthcoming; the original Excel schedule will be added here or linked once it is finalized.
