Welcome to the 883 Group Onboarding for Newbies! This repository hosts a structured 10-chapter program that prepares you to work in our Data Ops team. The program is organized into sessions, each with tasks, goals, and resources to guide your learning. You will also take part in discussions, Q&A sessions, and hands-on exercises that reinforce what you learn and apply it to real-world scenarios.
- Intro: Welcome & Introduction - Understand the onboarding process and the 883 Group's vision and mission
- Big Data Concepts: Introduction to Big Data - Core Concepts
- System: Get familiar with the system and Linux. :bow_and_arrow:
This chapter covers foundational storage concepts, moving from general file system theory to modern data lake technologies. The days are organized thematically rather than by tool to help you build a coherent mental model.
- File Systems Fundamentals: general file system concepts, hierarchy, metadata, allocation, permissions, consistency.
- Hadoop Distributed File System (HDFS): architecture, NameNode/DataNode, replication, HA, federation, etc.
- S3-Compatible Object Storage: object model, APIs, metadata, consistency, security, ecosystem.
- HBase: wide-column store architecture, regions, WAL, compactions, ZooKeeper.
- Hive Metastore & Table Format: the metadata service and the Hive table format (no execution engines).
- Catalogs & Table Formats: metadata catalogs and modern table formats (Iceberg, Delta, Hudi).
- Data Partitioning: partitioning strategy, pruning, maintenance, and interaction with formats.
- Apache Iceberg: deep dive into the Iceberg table format.
- Practical Exercise 01: Meet the Hadoop Ecosystem in Action
- Showcase 01: Develop a high-level understanding of the Hadoop ecosystem and its role in big data processing.
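
As a small preview of the Data Partitioning session listed above, the sketch below shows the idea behind partition pruning: when data files live under Hive-style `key=value` directories, a query filter can skip whole directories without opening any file. This is only an illustration under stated assumptions. The `events` table, the file paths, and the helper names `parse_partitions` and `prune` are hypothetical and are not part of this repository or of any library.

```python
from typing import Dict, List


def parse_partitions(path: str) -> Dict[str, str]:
    """Extract key=value directory segments from a Hive-style path."""
    parts: Dict[str, str] = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            parts[key] = value
    return parts


def prune(paths: List[str], **wanted: str) -> List[str]:
    """Keep only paths whose partition values match every requested filter."""
    kept = []
    for path in paths:
        parts = parse_partitions(path)
        if all(parts.get(key) == value for key, value in wanted.items()):
            kept.append(path)
    return kept


# Hypothetical file listing for an illustrative "events" table.
files = [
    "events/year=2024/month=01/part-0000.parquet",
    "events/year=2024/month=02/part-0000.parquet",
    "events/year=2025/month=01/part-0000.parquet",
]

# Only files under year=2024/month=02 survive the filter.
selected = prune(files, year="2024", month="02")
print(selected)
```

Real engines typically prune using partition metadata held in a catalog or in the table format itself instead of parsing paths, which is part of what the catalog and table format sessions cover.
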
If you have suggestions or ideas to improve the onboarding for future members, feel free to contribute: fork the repository, create a feature branch, commit your changes, and open a pull request.
Note: "Overlap 1", "Overlap 2", etc. are just placeholders showing that multiple sessions may run concurrently in a week.
A detailed timeline is forthcoming; the original Excel schedule will be added here or linked once it is finalized.