
Commit 62b26c0

HDDS-9857. Migrate core concepts documentation from legacy site.
This migration includes comprehensive documentation for:

- Architecture overview with detailed component explanations
- Complete namespace documentation (volumes, buckets, layouts)
- Data replication and storage container concepts
- Security documentation including Kerberos and ACL configurations
- All associated diagrams and images

The content has been reorganized to follow the new site structure and all internal links have been updated to work with the new paths.

Generated-by: Claude and Gemini
1 parent 0126e8a commit 62b26c0

82 files changed, +3016 −151 lines


cspell.yaml

Lines changed: 38 additions & 0 deletions
@@ -87,3 +87,41 @@ words:
- UX
- devs
- CLI
# Technical terms from core concepts
- lakehouse
- filesystems
- keyspace
- frontmatter
- backporting
- rebalancing
- checksummed
- setquota
- clrquota
- HCFS
- Boto
- Tez
- nameserver
- kinit
- keytab
- keytabs
- testuser
- setacl
- getacl
- removeacl
- addacl
- rwdlx
- Flink
- markdownlintignore
- ajv
# Additional technical terms
- ETL
- BEK
- DEK
- EDEK
- rangerkms
- bek
- datastream
- Xceiver
- ISA
- dn
- rw
Lines changed: 324 additions & 2 deletions
@@ -1,5 +1,327 @@

# Architecture

-**TODO:** File a subtask under [HDDS-9857](https://issues.apache.org/jira/browse/HDDS-9857) and complete this page or section.
-A high level overview of Ozone's components and what they do.

Apache Ozone is a scalable, distributed, and highly available object store designed to handle billions of objects of any size. This document provides an overview of Ozone's architecture, including its core components, data organization, and operational concepts.

## Ozone Namespace

Ozone organizes data in a hierarchical namespace consisting of three levels:

### Volumes

Volumes are the top-level entities in the Ozone namespace, conceptually similar to filesystems in traditional storage systems. They typically represent:

- Multi-tenant boundaries
- Organizational divisions
- Project groupings

Volumes provide isolation and can have their own admins and quota limits.

### Buckets

Buckets exist within volumes and act as containers for objects (keys). Each bucket can be configured with specific properties:

- Storage type and replication factor
- Encryption settings
- Access control policies
- Quota limits

Buckets are analogous to directories in a filesystem or buckets in cloud object stores.

### Keys (Objects)

Keys are the actual data objects stored in buckets. They can be:

- Files of any size
- Binary data
- Named using a path-like structure depending on bucket layout

For more details about Ozone's namespace, see the namespace documentation.
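As a concrete illustration of this hierarchy, here is a minimal sketch of namespace operations using Ozone's Java client library. The volume and bucket names are placeholders, and the unconfigured `OzoneConfiguration` assumes an `ozone-site.xml` with the OM address is on the classpath.

```java
import org.apache.hadoop.hdds.conf.OzoneConfiguration;
import org.apache.hadoop.ozone.client.ObjectStore;
import org.apache.hadoop.ozone.client.OzoneBucket;
import org.apache.hadoop.ozone.client.OzoneClient;
import org.apache.hadoop.ozone.client.OzoneClientFactory;
import org.apache.hadoop.ozone.client.OzoneVolume;

public class NamespaceExample {
  public static void main(String[] args) throws Exception {
    // Loads ozone-site.xml from the classpath; ozone.om.address must point at the OM.
    OzoneConfiguration conf = new OzoneConfiguration();
    OzoneClient client = OzoneClientFactory.getRpcClient(conf);
    try {
      ObjectStore store = client.getObjectStore();

      // Volume: top-level entity, e.g. one per tenant or project.
      store.createVolume("sales");
      OzoneVolume volume = store.getVolume("sales");

      // Bucket: container for keys inside the volume.
      volume.createBucket("reports");
      OzoneBucket bucket = volume.getBucket("reports");

      // Keys (objects) are written into the bucket; see the write path sketch below.
      System.out.println("Created " + volume.getName() + "/" + bucket.getName());
    } finally {
      client.close();
    }
  }
}
```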
### Ozone Bucket Types

Ozone supports two distinct bucket layouts, each optimized for different use cases:

#### Object Store Layout

The Object Store layout (OBS) works like traditional object stores with a flat namespace:

- Objects are stored with their full key path
- No concept of directories (though path-like naming is supported)
- Optimized for object storage workloads
- Compatible with S3-style access patterns

#### File System Optimized Layout

The File System Optimized layout (FSO) provides a hierarchical directory structure:

- Directories are first-class entities
- Supports efficient directory operations (listing, renaming)
- Includes filesystem features like trash
- Better performance for filesystem-style workloads
- Default layout type

The bucket layout determines how data is organized and accessed within a bucket.

![Ozone Namespace Hierarchy showing Volumes containing Buckets containing Keys](../../static/img/ozone/ozone-namespace.svg)
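A bucket's layout is chosen at creation time. As a rough sketch (reusing the `OzoneVolume` from the previous example), the Java client's `BucketArgs` builder can request a specific layout; `setBucketLayout` and the `BucketLayout` enum are assumed to be available in the client version in use, and the bucket names are illustrative.

```java
import org.apache.hadoop.ozone.client.BucketArgs;
import org.apache.hadoop.ozone.client.OzoneVolume;
import org.apache.hadoop.ozone.om.helpers.BucketLayout;

public class BucketLayoutExample {
  // Creates one bucket per layout inside an existing volume.
  static void createBuckets(OzoneVolume volume) throws Exception {
    // FSO: hierarchical namespace with efficient directory operations and trash.
    BucketArgs fsoArgs = BucketArgs.newBuilder()
        .setBucketLayout(BucketLayout.FILE_SYSTEM_OPTIMIZED)
        .build();
    volume.createBucket("warehouse", fsoArgs);

    // OBS: flat keyspace, matching S3-style access patterns.
    BucketArgs obsArgs = BucketArgs.newBuilder()
        .setBucketLayout(BucketLayout.OBJECT_STORE)
        .build();
    volume.createBucket("objects", obsArgs);
  }
}
```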
## Core Components

Ozone has a modular architecture with several key components that work together to provide a scalable and reliable storage system.

### Ozone Manager (OM)

The Ozone Manager is the metadata server that manages the namespace:

- Handles all volume, bucket, and key operations
- Maintains metadata in RocksDB
- Allocates blocks for data storage
- Manages access control
- Supports HA deployment with the Ratis consensus protocol

The OM is the entry point for all namespace operations. It tracks which objects exist and where they are stored.

### Storage Container Manager (SCM)

The Storage Container Manager orchestrates the container lifecycle and coordinates datanodes:

- Manages container creation and allocation
- Tracks datanode status and health
- Handles container replication and erasure coding (EC)
- Issues block allocation requests
- Coordinates container balancing
- Supports HA deployment with Ratis

SCM is the control plane for container management.

### Datanode

Datanodes are the workhorses that store the actual data:

- Store data in containers on local disks
- Serve read and write requests
- Report container statistics to SCM
- Participate in replication pipelines
- Handle data integrity checks

Each datanode manages a set of containers and serves read/write requests from clients.

### Recon

Recon is the analytics and monitoring component:

- Collects and aggregates metrics
- Provides a web UI for monitoring
- Offers a consolidated view of the cluster
- Helps identify issues and bottlenecks
- Syncs data from OM, SCM, and Datanodes

Recon is an optional but recommended component for operational visibility.

### S3 Gateway

The S3 Gateway provides S3-compatible API access:

- Implements the S3 REST API
- Translates S3 operations to Ozone operations
- Supports most S3 features and SDKs
- Provides authentication and authorization

The S3 Gateway enables applications built for S3 to work with Ozone.
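As an illustration of S3-style access, the sketch below points the AWS SDK for Java (v1) at an S3 Gateway endpoint. The host, port (9878 is the usual S3 Gateway HTTP port), region string, and credentials are deployment-specific placeholders; on a secured cluster the access key pair is obtained via `ozone s3 getsecret`.

```java
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class S3GatewayExample {
  public static void main(String[] args) {
    // Endpoint and credentials are placeholders for a real deployment.
    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withEndpointConfiguration(
            new AwsClientBuilder.EndpointConfiguration("http://s3g-host:9878", "us-east-1"))
        .withPathStyleAccessEnabled(true) // Ozone buckets are addressed by path, not virtual host
        .withCredentials(new AWSStaticCredentialsProvider(
            new BasicAWSCredentials("accessKey", "secretKey")))
        .build();

    // The gateway maps this to an Ozone bucket/key (under the "s3v" volume by default).
    s3.putObject("reports", "2024/summary.txt", "hello from the S3 API");
    System.out.println(s3.getObjectAsString("reports", "2024/summary.txt"));
  }
}
```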
### HttpFS

HttpFS provides a REST interface compatible with WebHDFS:

- Enables HTTP access to Ozone
- Compatible with HDFS clients
- Supports read/write operations
- Facilitates integration with web applications

HttpFS allows web applications to interact with Ozone using familiar APIs.
### Ozone Client

The Ozone client is the software component that enables applications to interact with the Ozone storage system:

- Provides Java libraries for programmatic access
- Handles communication with OM for namespace operations
- Manages direct data transfer with datanodes
- Implements client-side caching for improved performance
- Offers pluggable interfaces for different protocols (S3, OFS)
- Handles authentication and token management

The client library abstracts away the complexity of the distributed system, providing applications with a simple, consistent interface to Ozone storage.

### Component Interactions

The components of Ozone interact in well-defined patterns for different operations:

![Diagram illustrating Ozone client interactions with OM, SCM, and Datanodes for read/write operations.](../../static/img/ozone/ozone-client-interaction.svg)

#### Write Path Sequence

The typical write sequence follows these steps:

1. **Namespace Operations**: The client contacts the Ozone Manager to create or locate the key in the namespace
2. **Block Allocation**: The Ozone Manager requests blocks from the Storage Container Manager
3. **Data Transfer**: The client directly writes data to the selected Datanodes according to the replication pipeline
4. **Key Commit**: After successful data transfer, the client commits the key to the Ozone Manager
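The same flow can be seen from application code. Below is a minimal Java client sketch with comments mapping onto the steps above (the block allocation in step 2 happens inside the OM/SCM and is not visible to the caller); the bucket object and key name are placeholders.

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.ozone.client.OzoneBucket;
import org.apache.hadoop.ozone.client.io.OzoneOutputStream;

public class WritePathExample {
  static void writeKey(OzoneBucket bucket) throws Exception {
    byte[] data = "hello ozone".getBytes(StandardCharsets.UTF_8);

    // Step 1: the client asks the OM to open the key in the namespace.
    // Step 2: the OM obtains block allocations from SCM behind the scenes.
    OzoneOutputStream out = bucket.createKey("logs/2024-01-01.txt", data.length);

    // Step 3: data is streamed directly to the datanodes in the pipeline.
    out.write(data);

    // Step 4: closing the stream commits the key to the OM.
    out.close();
  }
}
```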
#### Read Path Sequence

For reads, the process is simpler:

1. The client requests key location information from the Ozone Manager
2. Using the block location information, the client reads data directly from Datanodes
3. In case of failures, the client retries with alternative replicas
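A matching read sketch, assuming the same placeholder bucket and key as the write example:

```java
import java.io.ByteArrayOutputStream;
import org.apache.hadoop.ozone.client.OzoneBucket;
import org.apache.hadoop.ozone.client.io.OzoneInputStream;

public class ReadPathExample {
  static byte[] readKey(OzoneBucket bucket) throws Exception {
    // Step 1: the OM returns the key's block locations to the client.
    // Step 2: the client streams data directly from the datanodes;
    //         retries against other replicas are handled inside the stream.
    try (OzoneInputStream in = bucket.readKey("logs/2024-01-01.txt")) {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      byte[] chunk = new byte[4096];
      int n;
      while ((n = in.read(chunk)) != -1) {
        buffer.write(chunk, 0, n);
      }
      return buffer.toByteArray();
    }
  }
}
```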
#### Monitoring and Management

![Diagram showing Recon collecting data from OM, SCM, and Datanodes for monitoring.](../../static/img/ozone/recon.svg)

The Recon service continuously:

- Collects metrics from the Ozone Manager, Storage Container Manager, and Datanodes
- Provides consolidated views of system health and performance
- Facilitates troubleshooting and management

## Ozone Internals

Understanding Ozone's internal structure helps to grasp how data is organized and protected.

### Containers

Containers are the fundamental storage units in Ozone:

- Fixed-size (typically 5 GB) units of storage
- Managed by the Storage Container Manager (SCM)
- Replicated or erasure-coded across datanodes
- Contain multiple blocks
- Include metadata and chunk files

Containers are self-contained units that include all necessary metadata and data. They are the unit of replication in Ozone.

### Blocks

Blocks are logical units of data within containers:

- Represent portions of objects/keys
- Created when clients write data
- Referenced by object metadata
- Allocated by the Ozone Manager
- Secured with block tokens

When a client writes a key, the OM allocates blocks from SCM, and the client writes the data to those blocks through datanodes.

### Chunks

Chunks are the physical data units stored on disk:

- Fixed-size portions of blocks
- Written sequentially in container data files
- Checksummed for data integrity
- Optimized for disk I/O

Chunks are the smallest units of data stored on disk and include checksums for integrity verification.

![Diagram showing the structure of an Ozone container with metadata and chunks.](../../static/img/ozone/container.svg)

### Replicated Containers

Ozone provides durability through container replication:

- Default replication factor is 3
- Uses the Ratis (Raft) consensus protocol
- Synchronously replicates data across datanodes
- Provides strong consistency guarantees
- Handles node failures transparently

Replicated containers ensure data durability by storing multiple copies of each container across different datanodes.

![Diagram illustrating how data blocks and chunks are stored across datanodes in a replicated container setup.](../../static/img/ozone/ozone-storage-hierarchy-replicated.svg)

### Erasure Coded Containers

Erasure coding provides space-efficient durability:

- Splits data across multiple datanodes with parity
- Supports various coding schemes (e.g., RS-3-2-1024k)
- Reduces storage overhead compared to replication
- Trades some performance for storage efficiency
- Suitable for cold data storage

Erasure coding allows for data durability with less storage overhead than full replication.

![Diagram illustrating how data and parity blocks are stored across datanodes in an erasure coded container setup.](../../static/img/ozone/ozone-storage-hierarchy-ec.svg)
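To illustrate how an application might pick between the two schemes, the sketch below passes an explicit replication configuration when creating a key. The exact `createKey` overload, `ReplicationConfig.fromTypeAndFactor`, and the string form accepted by `ECReplicationConfig` are assumptions about the client version in use; replication can also be configured as a bucket default instead of per key.

```java
import java.util.HashMap;
import org.apache.hadoop.hdds.client.ECReplicationConfig;
import org.apache.hadoop.hdds.client.ReplicationConfig;
import org.apache.hadoop.hdds.client.ReplicationFactor;
import org.apache.hadoop.hdds.client.ReplicationType;
import org.apache.hadoop.ozone.client.OzoneBucket;
import org.apache.hadoop.ozone.client.io.OzoneOutputStream;

public class ReplicationExample {
  static void writeWithReplication(OzoneBucket bucket, byte[] data) throws Exception {
    // Ratis/THREE: three full copies, strong consistency, 3x storage overhead.
    ReplicationConfig ratis =
        ReplicationConfig.fromTypeAndFactor(ReplicationType.RATIS, ReplicationFactor.THREE);
    try (OzoneOutputStream out =
        bucket.createKey("hot/data.bin", data.length, ratis, new HashMap<>())) {
      out.write(data);
    }

    // EC rs-3-2-1024k: 3 data + 2 parity stripes, roughly 1.67x storage overhead.
    ReplicationConfig ec = new ECReplicationConfig("rs-3-2-1024k");
    try (OzoneOutputStream out =
        bucket.createKey("cold/data.bin", data.length, ec, new HashMap<>())) {
      out.write(data);
    }
  }
}
```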
### Pipelines

Pipelines are groups of datanodes that work together to store data:

- Managed by SCM
- Consist of multiple datanodes
- Handle write operations as a unit
- Support different replication strategies

For detailed information, see [Write Pipelines](/docs/core-concepts/replication/write-pipelines).

#### Ratis Replicated

Ratis pipelines use the Raft consensus protocol:

- Typically three datanodes per pipeline
- One leader and multiple followers
- Synchronous replication
- Strong consistency guarantees
- Automatic leader election on failure

#### Erasure Coded

Erasure coded pipelines distribute data and parity:

- Datanodes store data or parity chunks
- EC pipeline size depends on the coding scheme
- Handles reconstruction on node failure
- Optimized for storage efficiency

## Multi-Protocol Access

Ozone supports multiple access protocols, making it versatile for different applications:

### Native API

- Command-line interface and Java API
- Full feature access
- Highest performance

### Ozone File System (OFS)

- Hadoop-compatible filesystem interface
- Works with all Hadoop ecosystem applications
- Path format: `ofs://om-host/volume/bucket/key`
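For instance, any application written against the Hadoop FileSystem API can address Ozone through `ofs://` paths, as in the rough sketch below. The OM host, volume, and bucket names are placeholders, and the Ozone filesystem client jar (e.g., `ozone-filesystem-hadoop3`) is assumed to be on the classpath so the `ofs` scheme resolves.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OfsExample {
  public static void main(String[] args) throws Exception {
    // The ofs:// root exposes every volume and bucket under one filesystem.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("ofs://om-host/"), conf);

    // Paths follow /volume/bucket/key; directory operations work on FSO buckets.
    Path dir = new Path("/sales/reports/2024");
    fs.mkdirs(dir);

    try (FSDataOutputStream out = fs.create(new Path(dir, "summary.txt"))) {
      out.writeBytes("written through the Hadoop FileSystem API\n");
    }

    fs.close();
  }
}
```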
### S3 Compatible

- REST API compatible with Amazon S3
- Works with S3 clients and SDKs
- Path format: `http://s3g-host/bucket/key`

### HttpFS

- REST API compatible with WebHDFS
- Enables web applications to access Ozone
- Path format: `http://httpfs-host/webhdfs/v1/volume/bucket/key`

The multi-protocol architecture allows for flexible integration with existing applications and workflows.

## Summary

Apache Ozone's architecture provides:

1. **Scalability** through separation of metadata and data paths
2. **Reliability** via replication and erasure coding
3. **Flexibility** with multiple access protocols
4. **Performance** through optimized data paths
5. **Manageability** with comprehensive monitoring

This architecture enables Ozone to serve as both an object store and a filesystem, making it suitable for a wide range of applications from big data analytics to cloud-native workloads.

<!-- For more detailed information about Ozone's components and internals, see the System Internals section when available. -->
