Storage light module is available on zbus over the same channel as the full storage module:

| module | object | version |
|--------|--------|---------|
| storage | [storage](#interface) | 0.0.1 |

## Introduction
`storage_light` is a lightweight variant of the [storage module](../storage/readme.md). It implements the same `StorageModule` interface and provides identical functionality to consumers, but has enhanced device initialization logic designed for nodes with pre-partitioned disks.
Both modules are interchangeable at the zbus level — other modules access storage via the same `StorageModuleStub` regardless of which variant is running.
## Differences from Storage
The key difference is in the **device initialization** phase during boot. The standard storage module treats each whole disk as a single btrfs pool. The light variant adds:
### 1. Partition-Aware Initialization
Instead of requiring whole disks, `storage_light` can work with individual partitions:
- Detects if a disk is already partitioned (has child partitions)
- Scans for unallocated space on partitioned disks using `parted`
- Creates new partitions in free space (minimum 5 GiB) for btrfs pools
- Refreshes device info after partition table changes

This allows ZOS to coexist with other operating systems or PXE boot partitions on the same disk.
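The unallocated-space scan can be sketched as a small parser over `parted`'s machine-readable output (`parted -m <dev> unit B print free`). The function, field layout handling, and sample line below are illustrative only, not the module's actual code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// FreeSpace describes one unallocated region on a disk, in bytes.
type FreeSpace struct{ Start, End, Size uint64 }

// Minimum partition size mentioned above: 5 GiB.
const minPartitionSize = 5 << 30

// parseFreeSpaces extracts free regions from `parted -m <dev> unit B print free`
// output. Free-space lines have the shape `1:1048576B:10737418240B:10736369664B:free;`.
// Regions smaller than 5 GiB are ignored, matching the module's minimum.
func parseFreeSpaces(out string) []FreeSpace {
	var spaces []FreeSpace
	for _, line := range strings.Split(out, "\n") {
		fields := strings.Split(strings.TrimSuffix(strings.TrimSpace(line), ";"), ":")
		if len(fields) != 5 || fields[4] != "free" {
			continue // header lines and real partitions are skipped
		}
		start, err1 := strconv.ParseUint(strings.TrimSuffix(fields[1], "B"), 10, 64)
		end, err2 := strconv.ParseUint(strings.TrimSuffix(fields[2], "B"), 10, 64)
		size, err3 := strconv.ParseUint(strings.TrimSuffix(fields[3], "B"), 10, 64)
		if err1 != nil || err2 != nil || err3 != nil || size < minPartitionSize {
			continue
		}
		spaces = append(spaces, FreeSpace{Start: start, End: end, Size: size})
	}
	return spaces
}

func main() {
	// Hypothetical output: one 10 GiB free region after the disk header.
	sample := "BYT;\n/dev/sda:21474836480B:scsi:512:512:gpt:QEMU HARDDISK:;\n1:1048576B:10737418240B:10736369664B:free;"
	fmt.Println(len(parseFreeSpaces(sample))) // prints 1
}
```
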
### 2. PXE Partition Detection
Partitions labeled `ZOSPXE` are automatically skipped during initialization. This prevents the storage module from claiming boot partitions used for PXE network booting.
### 3. Enhanced Device Manager
The filesystem subpackage in `storage_light` extends the device manager with:
- `Children []DeviceInfo` field on `DeviceInfo` to track child partitions
- `UUID` field for btrfs filesystem identification
- `IsPartitioned()` method to check if a disk has child partitions
- `IsPXEPartition()` method to detect PXE boot partitions
- `GetUnallocatedSpaces()` method using `parted` to find free disk space
- `AllocateEmptySpace()` method to create partitions in free space
- `RefreshDeviceInfo()` method to reload device info after changes
- `ClearCache()` on the device manager interface for refreshing the device list

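A minimal sketch of the extended `DeviceInfo`, limited to the fields and methods listed above (the real struct in the filesystem subpackage carries additional lsblk-derived fields; the labels in `main` are hypothetical):

```go
package main

import "fmt"

// DeviceInfo is a sketch limited to the fields listed above.
type DeviceInfo struct {
	Path     string
	Label    string
	UUID     string       // btrfs filesystem UUID
	Children []DeviceInfo // child partitions, if any
}

// IsPartitioned reports whether the disk has child partitions.
func (d DeviceInfo) IsPartitioned() bool { return len(d.Children) > 0 }

// IsPXEPartition reports whether this is a PXE boot partition,
// which storage_light must never claim.
func (d DeviceInfo) IsPXEPartition() bool { return d.Label == "ZOSPXE" }

func main() {
	disk := DeviceInfo{
		Path: "/dev/sda",
		Children: []DeviceInfo{
			{Path: "/dev/sda1", Label: "ZOSPXE"},
			{Path: "/dev/sda2", Label: "pool-a"}, // hypothetical labels
		},
	}
	fmt.Println(disk.IsPartitioned(), disk.Children[0].IsPXEPartition()) // true true
}
```
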
## Initialization Flow
The boot process is similar to the standard storage module but with added partition handling:

- **If whole disk (not partitioned)**: Create btrfs pool on the entire device (same as standard)
- **If partitioned**:
  - Skip partitions labeled `ZOSPXE`
  - Process existing partitions that have btrfs filesystems
  - Scan for unallocated space using `parted`
  - Create new partitions in free space >= 5 GiB
  - Create btrfs pools on new partitions
- Mount pool, detect device type (SSD/HDD)
- Add to SSD or HDD pool arrays

4. Ensure cache exists (create if needed, start monitoring)
5. Shut down unused HDD pools
6. Start periodic disk power management

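The branch between whole and partitioned disks can be sketched as follows. The `device` type and `planDevice` helper are hypothetical names for illustration; the real flow also creates btrfs pools, mounts them, and scans for unallocated space:

```go
package main

import "fmt"

// device is a minimal model for this sketch; names are hypothetical.
type device struct {
	path     string
	label    string
	children []device
}

// planDevice sketches the per-disk branch above: a whole disk becomes one
// btrfs pool, while a partitioned disk is walked partition by partition,
// skipping PXE boot partitions. A real pass would also scan for >= 5 GiB
// of unallocated space and create new partitions there.
func planDevice(d device) []string {
	if len(d.children) == 0 {
		return []string{"pool:" + d.path} // whole disk, same as the standard module
	}
	var plan []string
	for _, p := range d.children {
		if p.label == "ZOSPXE" {
			continue // never claim PXE boot partitions
		}
		plan = append(plan, "pool:"+p.path)
	}
	return plan
}

func main() {
	d := device{path: "/dev/sda", children: []device{
		{path: "/dev/sda1", label: "ZOSPXE"},
		{path: "/dev/sda2", label: "data"},
	}}
	fmt.Println(planDevice(d)) // only /dev/sda2 is used
}
```
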
## When to Use Storage Light
Use `storage_light` instead of `storage` when:

- The node has disks with existing partition tables that must be preserved
- PXE boot partitions exist on the same disks
- The node dual-boots or shares disks with other systems
- Disks have been partially allocated and have free space that should be used

## Architecture
The overall architecture (pool types, mount points, cache management, volume/disk/device operations) is identical to the [standard storage module](../storage/readme.md); refer to that document for details.

## Interface

Same as the [standard storage module](../storage/readme.md#interface). Both variants implement the same `StorageModule` interface defined in `pkg/storage.go`.
---

# Storage

Storage module is available on zbus over the following channel:

| module | object | version |
|--------|--------|---------|
| storage | [storage](#interface) | 0.0.1 |

## Introduction
This module is responsible for managing everything related to storage. On start, storaged takes ownership of all node disks and separates them into two sets:
- **SSD pools**: One btrfs pool per SSD disk. Used for subvolumes (read-write layers), virtual disks (VM storage), and system cache.
- **HDD pools**: One btrfs pool per HDD disk. Used exclusively for 0-DB device allocation.

The module provides three storage primitives:
- **Subvolume** (with quota): A btrfs subvolume used by `flistd` to support read-write operations on flists. Used as rootfs for containers and VMs. Only created on SSD pools.
  - On boot, a permanent subvolume `zos-cache` is always created (starting at 5 GiB) and bind-mounted at `/var/cache`. This volume holds system state and downloaded file caches.
- **VDisk** (virtual disk): A sparse file with Copy-on-Write disabled (`FS_NOCOW_FL`), used as block storage for virtual machines. Only created on SSD pools inside a `vdisks` subvolume.
- **Device**: A btrfs subvolume named `zdb` inside an HDD pool, allocated to a single 0-DB service. One 0-DB instance can serve multiple namespaces for multiple users. Only created on HDD pools.
ZOS can operate without HDDs (it will not serve ZDB workloads), but not without SSDs. A node with no SSD will never register on the grid.
### Device Type Detection

The module determines whether a disk is SSD or HDD using:

1. A `.seektime` file persisted at the pool root (survives reboots)
2. Fallback to the `seektime` tool or device rotational flag from `lsblk`

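The rotational fallback boils down to a sysfs flag, which `lsblk` reports as its ROTA column. A hedged sketch (the parsing function below is illustrative, not the module's code):

```go
package main

import (
	"fmt"
	"strings"
)

// isRotational interprets the contents of /sys/block/<dev>/queue/rotational:
// "1" marks a spinning disk (HDD); anything else is treated as SSD in this
// sketch.
func isRotational(sysfsContent string) bool {
	return strings.TrimSpace(sysfsContent) == "1"
}

func main() {
	// The real module would read the flag from sysfs, e.g.:
	//   data, _ := os.ReadFile("/sys/block/sda/queue/rotational")
	fmt.Println(isRotational("1\n"), isRotational("0\n")) // true false
}
```
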
### Mount Points
| Resource | Path |
|----------|------|
| Pools | `/mnt/<pool-label>` |
| Cache | `/var/cache` (bind mount to `zos-cache` subvolume) |
| Volumes | `/mnt/<pool-label>/<volume-name>` |
| VDisks | `/mnt/<pool-label>/vdisks/<disk-id>` |
| Devices (0-DB) | `/mnt/<pool-label>/zdb` |
## On Node Booting
When the module boots:
1. Scans all available block devices using `lsblk`
2. For each device not already used by a pool, creates a new btrfs filesystem (all pools use `RaidSingle` policy)
3. Mounts all available pools
4. Detects device type (SSD/HDD) for each pool
5. Ensures a cache subvolume exists. If none is found, creates one on an SSD pool and bind-mounts it at `/var/cache`. Falls back to tmpfs if no SSD is available (sets `LimitedCache` flag)
6. Starts cache monitoring goroutine (checks every 5 minutes, auto-grows at 60% utilization, shrinks below 20%)
7. Shuts down and spins down unused HDD pools to save power
8. Starts periodic disk power management
### zinit unit
The zinit unit file specifies the command line, test command, and boot ordering.
The storage module is a dependency for almost all other system modules, so zinit gives it high boot precedence (calculated at boot from the service configuration).
The storage module is only considered running if (and only if) `/var/cache` is ready:
```yaml
exec: storaged
test: mountpoint /var/cache
```
## Cache Management
The system cache is a special btrfs subvolume (`zos-cache`) that stores persistent system state and downloaded files.
| Parameter | Value |
|-----------|-------|
| Initial size | 5 GiB |
| Check interval | 5 minutes |
| Grow threshold | 60% utilization |
| Shrink threshold | 20% utilization |
| Fallback | tmpfs (if no SSD available) |
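The grow/shrink decision from the table can be sketched as a pure function; the names and signature here are assumptions, only the thresholds come from the table:

```go
package main

import "fmt"

// Thresholds from the table above.
const (
	growAt   = 0.60 // grow when utilization reaches 60%
	shrinkAt = 0.20 // shrink when utilization drops below 20%
)

type action int

const (
	keep action = iota
	grow
	shrink
)

// cacheAction decides what the monitoring loop should do with the
// zos-cache volume, given used and total bytes (names hypothetical).
func cacheAction(used, size uint64) action {
	u := float64(used) / float64(size)
	switch {
	case u >= growAt:
		return grow
	case u < shrinkAt:
		return shrink
	default:
		return keep
	}
}

func main() {
	// 4 GiB used of a 5 GiB volume is 80% utilization: time to grow.
	fmt.Println(cacheAction(4<<30, 5<<30) == grow) // true
}
```
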
## Pool Selection Policies
When creating volumes or disks, the module selects a pool using one of these policies:

- **SSD Only**: Only considers SSD pools (used for volumes and vdisks)
- **HDD Only**: Only considers HDD pools (used for 0-DB device allocation)
- **SSD First**: Prefers SSD pools, falls back to HDD

Mounted pools are always prioritized over unmounted ones to avoid unnecessary spin-ups.
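One plausible reading of these rules as a sketch: mounted pools are tried before unmounted ones, with SSD preferred at each tier. The exact tie-break ordering is an assumption, as are all the names here:

```go
package main

import "fmt"

type poolType int

const (
	ssd poolType = iota
	hdd
)

type pool struct {
	name    string
	kind    poolType
	mounted bool
}

// pickPool sketches "SSD first" selection: mounted SSDs, then mounted HDDs,
// then unmounted SSDs, then unmounted HDDs. Preferring mounted pools avoids
// unnecessary spin-ups of parked disks.
func pickPool(pools []pool) *pool {
	for _, wantMounted := range []bool{true, false} {
		for _, kind := range []poolType{ssd, hdd} {
			for i := range pools {
				p := &pools[i]
				if p.kind == kind && p.mounted == wantMounted {
					return p
				}
			}
		}
	}
	return nil
}

func main() {
	pools := []pool{{"hdd-a", hdd, true}, {"ssd-a", ssd, false}}
	if p := pickPool(pools); p != nil {
		fmt.Println(p.name) // the mounted HDD wins over the unmounted SSD
	}
}
```
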
## Error Handling
The module tracks two categories of failures:

- **Broken Pools**: Pools that fail to mount. Tracked and reported via `BrokenPools()`.
- **Broken Devices**: Devices that fail formatting, mounting, or type detection. Tracked and reported via `BrokenDevices()`.

These are exposed through the interface for monitoring and diagnostics.
## Thread Safety
All pool and volume operations are protected by a `sync.RWMutex`. Concurrent reads (lookups, listings) are allowed, while writes (create, delete, resize) are serialized.
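The locking pattern can be sketched with a small guarded map; the payload (volume name to quota) is illustrative, not the module's actual types:

```go
package main

import (
	"fmt"
	"sync"
)

// volumes demonstrates the pattern above: an RWMutex lets concurrent
// lookups proceed in parallel while create/delete/resize are serialized.
type volumes struct {
	mu   sync.RWMutex
	byID map[string]uint64 // name -> quota in bytes (illustrative)
}

func (v *volumes) Lookup(name string) (uint64, bool) {
	v.mu.RLock() // shared lock: many readers at once
	defer v.mu.RUnlock()
	q, ok := v.byID[name]
	return q, ok
}

func (v *volumes) Create(name string, quota uint64) {
	v.mu.Lock() // exclusive lock: one writer at a time
	defer v.mu.Unlock()
	v.byID[name] = quota
}

func main() {
	v := &volumes{byID: map[string]uint64{}}
	v.Create("zos-cache", 5<<30)
	q, _ := v.Lookup("zos-cache")
	fmt.Println(q >> 30) // quota in GiB: 5
}
```
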
## Consumers
Other modules access storage via zbus stubs:

| Consumer | Operations Used |
|----------|----------------|
| VM provisioner (`pkg/primitives/vm/`) | DiskCreate, DiskFormat, DiskWrite, DiskDelete |