Commit a5cb560

HDDS-12196. Document ozone repair cli (apache#8849)
1 parent 0fd649e commit a5cb560

1 file changed: 252 additions & 0 deletions

@@ -0,0 +1,252 @@
---
title: "Ozone Repair"
date: 2025-07-22
summary: Advanced tool to repair Ozone.
---
<!---
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->

Ozone Repair (`ozone repair`) is an advanced tool to repair Ozone. The nodes being repaired must be stopped before the tool is run.
Note: All repair commands support a `--dry-run` option, which shows what a command would repair without making any changes to the cluster.
Use the `--force` flag to override the running service check in false-positive cases.

```bash
Usage: ozone repair [-hV] [--verbose] [-conf=<configurationPath>]
                    [-D=<String=String>]... [COMMAND]
Advanced tool to repair Ozone. The nodes being repaired must be stopped before
the tool is run.
      -conf=<configurationPath>

  -D, --set=<String=String>

  -h, --help      Show this help message and exit.
  -V, --version   Print version information and exit.
      --verbose   More verbose output. Show the stack trace of the errors.
Commands:
  datanode  Tools to repair Datanode
  ldb       Operational tool to repair ldb.
  om        Operational tool to repair OM.
  scm       Operational tool to repair SCM.
```
For more detailed usage see the output of `--help` for each of the subcommands.
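
As a rough sketch of the preview workflow described in the note on `--dry-run`, a small shell helper can append the flag to any repair subcommand. The function name `repair_preview` and the `OZONE` override variable are assumptions made for this illustration and are not part of the tool; the path and column family in the usage comment are placeholders.

```bash
# Hypothetical helper: run any `ozone repair` subcommand in preview mode.
# OZONE defaults to the `ozone` launcher on PATH; set it to use another
# install location (an assumption of this sketch, not a tool feature).
repair_preview() {
  "${OZONE:-ozone}" repair "$@" --dry-run
}

# Usage (placeholder values):
#   repair_preview ldb compact --db=<dbPath> --cf=<columnFamilyName>
```

Keeping the flag in one place makes it harder to run a real repair by accident while still inspecting what a command would do.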

## ozone repair datanode
Operational tool to repair datanode.

### upgrade-container-schema
Upgrade all schema V2 containers to schema V3 for a datanode in offline mode.
Optionally takes a `--volume` option to specify which volume needs the upgrade.

## ozone repair ldb
Operational tool to repair ldb.

### compact
Compact a column family in the DB to clean up tombstones while the service is offline.
```bash
Usage: ozone repair ldb compact [-hV] [--dry-run] [--force] [--verbose]
                                --cf=<columnFamilyName> --db=<dbPath>
CLI to compact a column-family in the DB while the service is offline.
Note: If om.db is compacted with this tool then it will negatively impact the
Ozone Manager's efficient snapshot diff.
      --cf, --column-family, --column_family=<columnFamilyName>
                      Column family name
      --db=<dbPath>   Database File Path
```
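
Because the usage above takes a single `--cf` per invocation, compacting several column families means running the command once per family. The loop below is a minimal sketch of that, assuming the service that owns the DB is already stopped; `compact_column_families` and the `OZONE` variable are hypothetical names introduced here.

```bash
# Sketch: compact several column families of one offline DB, one call each.
# Arguments: <dbPath> <columnFamilyName>...
# OZONE defaults to `ozone`; the override exists only for this illustration.
compact_column_families() {
  db_path=$1
  shift
  for cf_name in "$@"; do
    # Stop at the first failure so later families are not touched.
    "${OZONE:-ozone}" repair ldb compact --db="$db_path" --cf="$cf_name" || return 1
  done
}

# Usage (placeholder values):
#   compact_column_families <dbPath> <cf1> <cf2>
```

The note in the help text still applies: compacting om.db this way affects the Ozone Manager's efficient snapshot diff.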

## ozone repair om
Operational tool to repair OM.

#### Subcommands under OM
- fso-tree
- snapshot
- update-transaction
- quota
- compact
- skip-ratis-transaction

### fso-tree
Identify and repair a disconnected FSO tree by marking unreferenced entries for deletion.
Reports the reachable, unreachable (pending delete) and unreferenced (orphaned) directories and files.
OM should be stopped while this tool is run.
```bash
Usage: ozone repair om fso-tree [-hV] [--dry-run] [--force] [--verbose]
                                [-b=<bucketFilter>] --db=<omDBPath>
                                [-v=<volumeFilter>]
Identify and repair a disconnected FSO tree by marking unreferenced entries for
deletion. OM should be stopped while this tool is run.
  -b, --bucket=<bucketFilter>
                        Filter by bucket name
      --db=<omDBPath>   Path to OM RocksDB
  -v, --volume=<volumeFilter>
                        Filter by volume name. Add '/' before the volume name.
```
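
The volume filter expects a leading `/`, which is easy to forget. The following sketch wraps a dry run of `fso-tree` for one bucket and adds the slash when it is missing. The function name `fso_tree_preview` and the `OZONE` variable are assumptions for illustration only.

```bash
# Sketch: preview an fso-tree repair limited to one volume and bucket.
# Arguments: <omDBPath> <volume> <bucket>
fso_tree_preview() {
  om_db=$1
  volume=$2
  bucket=$3
  # The help text asks for a '/' before the volume name; add it if absent.
  case $volume in
    /*) ;;
    *) volume="/$volume" ;;
  esac
  "${OZONE:-ozone}" repair om fso-tree --db="$om_db" \
    --volume="$volume" --bucket="$bucket" --dry-run
}

# Usage (placeholder values):
#   fso_tree_preview <omDBPath> vol1 bucket1
```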

### snapshot
Subcommand for all snapshot-related repairs.

#### chain
Update the global and path previous snapshots of a snapshot when the snapshot chain is corrupted.
```bash
Usage: ozone repair om snapshot chain [-hV] [--dry-run] [--force] [--verbose]
                                      --db=<dbPath>
                                      --gp=<globalPreviousSnapshotId>
                                      --pp=<pathPreviousSnapshotId> <value>
                                      <snapshotName>
CLI to update global and path previous snapshot for a snapshot in case snapshot
chain is corrupted.
      <value>          URI of the bucket (format: volume/bucket).
      <snapshotName>   Snapshot name to update
      --db=<dbPath>    Database File Path
      --gp, --global-previous=<globalPreviousSnapshotId>
                       Global previous snapshotId to set for the given snapshot
      --pp, --path-previous=<pathPreviousSnapshotId>
                       Path previous snapshotId to set for the given snapshot
```
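
The `chain` command mixes two positional parameters with three required options, so the argument order is worth spelling out. This sketch previews a chain update and rejects a bucket URI that does not look like `volume/bucket`; the function name, the `OZONE` variable, and the URI check itself are assumptions added for illustration.

```bash
# Sketch: preview a snapshot chain repair.
# Arguments: <dbPath> <volume/bucket> <snapshotName> <globalPrevId> <pathPrevId>
snapshot_chain_preview() {
  om_db=$1
  bucket_uri=$2
  snapshot_name=$3
  global_prev=$4
  path_prev=$5
  # Hypothetical sanity check: the URI must contain a '/' separator.
  case $bucket_uri in
    */*) ;;
    *) echo "bucket URI must be volume/bucket: '$bucket_uri'" >&2; return 2 ;;
  esac
  "${OZONE:-ozone}" repair om snapshot chain --db="$om_db" \
    --global-previous="$global_prev" --path-previous="$path_prev" \
    --dry-run "$bucket_uri" "$snapshot_name"
}
```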

### update-transaction
To update only the latest applied transaction without modifying the Ratis logs, use the `update-transaction` command.
This updates the highest transaction index in the OM transaction info table.
```bash
Usage: ozone repair om update-transaction [-hV] [--dry-run] [--force]
       [--verbose] --db=<dbPath> --index=<highestTransactionIndex>
       --term=<highestTransactionTerm>
CLI to update the highest index in transaction info table.
      --db=<dbPath>   Database File Path
      --index=<highestTransactionIndex>
                      Highest index to set. The input should be non-zero long
                        integer.
      --term=<highestTransactionTerm>
                      Highest term to set. The input should be non-zero long
                        integer.
```
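
Since both `--index` and `--term` must be non-zero integers, a wrapper can catch bad input before the tool is invoked. The sketch below is stricter than the help text in one respect: it accepts only positive decimal values, which is an assumption of this example. The function name and the `OZONE` variable are likewise hypothetical.

```bash
# Sketch: validate index and term, then preview the update.
# Arguments: <dbPath> <index> <term>
update_om_transaction_preview() {
  om_db=$1
  index=$2
  term=$3
  for value in "$index" "$term"; do
    # Reject empty strings and anything containing a non-digit.
    case $value in
      ''|*[!0-9]*) echo "not a positive integer: '$value'" >&2; return 2 ;;
    esac
    # Reject values made only of zeros, such as 0 or 000.
    case $value in
      *[1-9]*) ;;
      *) echo "must be non-zero: '$value'" >&2; return 2 ;;
    esac
  done
  "${OZONE:-ozone}" repair om update-transaction --db="$om_db" \
    --index="$index" --term="$term" --dry-run
}
```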

### quota
Operational tool to repair quota in OM DB.

#### start
To trigger quota repair, use the `start` command.
```bash
Usage: ozone repair om quota start [-hV] [--dry-run] [--force] [--verbose]
                                   [--buckets=<buckets>]
                                   [--service-host=<omHost>]
                                   [--service-id=<omServiceId>]
CLI to trigger quota repair.
      --buckets=<buckets>   start quota repair for specific buckets. Input will
                              be list of uri separated by comma as
                              /<volume>/<bucket>[,...]
      --service-host=<omHost>
                            Ozone Manager Host. If OM HA is enabled, use
                              --service-id instead. If you must use
                              --service-host with OM HA, this must point
                              directly to the leader OM. This option is
                              required when --service-id is not provided or
                              when HA is not enabled.
      --service-id, --om-service-id=<omServiceId>
                            Ozone Manager Service ID
```
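
The `--buckets` value is a single comma-separated list of `/<volume>/<bucket>` URIs. As a small sketch, the helper below builds that list from separate arguments and targets an HA cluster through `--service-id`. The function name `quota_start` and the `OZONE` variable are assumptions of this example.

```bash
# Sketch: trigger quota repair for specific buckets on an HA cluster.
# Arguments: <omServiceId> [/<volume>/<bucket>]...
quota_start() {
  service_id=$1
  shift
  if [ "$#" -eq 0 ]; then
    # No bucket URIs given: leave out --buckets entirely.
    "${OZONE:-ozone}" repair om quota start --service-id="$service_id"
    return
  fi
  # Join the remaining arguments with commas inside a subshell so the
  # caller's IFS is left untouched.
  buckets=$(IFS=,; printf '%s' "$*")
  "${OZONE:-ozone}" repair om quota start --service-id="$service_id" \
    --buckets="$buckets"
}
```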

#### status
Get the status of the last triggered quota repair.
```bash
Usage: ozone repair om quota status [-hV] [--verbose] [--service-host=<omHost>]
                                    [--service-id=<omServiceId>]
CLI to get the status of last trigger quota repair if available.
      --service-host=<omHost>
                  Ozone Manager Host. If OM HA is enabled, use --service-id
                    instead. If you must use --service-host with OM HA, this
                    must point directly to the leader OM. This option is
                    required when --service-id is not provided or when HA is
                    not enabled.
      --service-id, --om-service-id=<omServiceId>
                  Ozone Manager Service ID
```
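
The help text prefers `--service-id` when OM HA is enabled and `--service-host` otherwise. One way to encode that preference is sketched below; the environment variables `OM_SERVICE_ID` and `OM_HOST`, the function name, and the `OZONE` variable are all hypothetical names chosen for this example.

```bash
# Sketch: query quota repair status, preferring the HA service ID.
# Reads OM_SERVICE_ID (HA) or OM_HOST (non-HA) from the environment.
quota_status() {
  if [ -n "${OM_SERVICE_ID:-}" ]; then
    "${OZONE:-ozone}" repair om quota status --service-id="$OM_SERVICE_ID"
  elif [ -n "${OM_HOST:-}" ]; then
    "${OZONE:-ozone}" repair om quota status --service-host="$OM_HOST"
  else
    echo "set OM_SERVICE_ID (HA) or OM_HOST (non-HA)" >&2
    return 2
  fi
}
```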

### compact
Compact a column family in the OM DB to clean up tombstones. The compaction happens asynchronously. Requires admin privileges.
```bash
Usage: ozone repair om compact [-hV] [--dry-run] [--force] [--verbose]
                               --cf=<columnFamilyName> [--node-id=<nodeId>]
                               [--service-id=<omServiceId>]
CLI to compact a column family in the om.db. The compaction happens
asynchronously. Requires admin privileges.
      --cf, --column-family, --column_family=<columnFamilyName>
                           Column family name
      --node-id=<nodeId>   NodeID of the OM for which db needs to be compacted.
      --service-id, --om-service-id=<omServiceId>
                           Ozone Manager Service ID
```
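
Since `--node-id` selects a single OM, compacting the same column family on every OM of an HA service takes one request per node. The loop below sketches that under the assumption that the caller knows the node IDs; `compact_om_nodes` and the `OZONE` variable are hypothetical names.

```bash
# Sketch: request compaction of one column family on several OM nodes.
# Arguments: <omServiceId> <columnFamilyName> <nodeId>...
compact_om_nodes() {
  service_id=$1
  cf_name=$2
  shift 2
  for node_id in "$@"; do
    "${OZONE:-ozone}" repair om compact --cf="$cf_name" \
      --service-id="$service_id" --node-id="$node_id" || return 1
  done
}
```

Because the compaction happens asynchronously, the loop finishing only means the requests were accepted, not that compaction has completed.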

### skip-ratis-transaction, srt
Omit a raft log in a ratis segment file by replacing the specified index with a dummy EchoOM command.
This is an offline tool meant to be used only when all 3 OMs crash on the same transaction.
If the issue is isolated to one OM, manually copy the DB from a healthy OM instead.
```bash
Usage: ozone repair om skip-ratis-transaction [-hV] [--dry-run] [--force]
       [--verbose] -b=<backupDir> --index=<index> (-s=<segmentFile> |
       -d=<logDir>)
CLI to omit a raft log in a ratis segment file. The raft log at the index
specified is replaced with an EchoOM command (which is a dummy command). It is
an offline command i.e., doesn't require OM to be running. The command should
be run for the same transaction on all 3 OMs only when all the OMs are crashing
while applying the same transaction. If only one OM is crashing and the other
OMs have executed the log successfully, then the DB should be manually copied
from one of the good OMs to the crashing OM instead.
  -b, --backup=<backupDir>   Directory to put the backup of the original
                               repaired segment file before the repair.
  -d, --ratis-log-dir=<logDir>
                             Path of the ratis log directory
      --index=<index>        Index of the failing transaction that should be
                               removed
  -s, --segment-path=<segmentFile>
                             Path of the input segment file
```
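
The usage requires exactly one of `-s` (a segment file) or `-d` (the ratis log directory). The sketch below picks the option from the kind of path it is given and runs the command as a dry run. The function name, the `OZONE` variable, and the rule "directory means `--ratis-log-dir`, anything else means `--segment-path`" are assumptions of this example.

```bash
# Sketch: preview skipping one raft log entry.
# Arguments: <backupDir> <index> <segmentFile-or-logDir>
skip_ratis_transaction_preview() {
  backup_dir=$1
  index=$2
  log_path=$3
  # A directory is passed as the ratis log dir, any other path as a segment.
  if [ -d "$log_path" ]; then
    location="--ratis-log-dir=$log_path"
  else
    location="--segment-path=$log_path"
  fi
  "${OZONE:-ozone}" repair om skip-ratis-transaction --backup="$backup_dir" \
    --index="$index" "$location" --dry-run
}
```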

## ozone repair scm
Operational tool to repair SCM.

#### Subcommands under SCM
- cert
- update-transaction

### cert
Subcommand for all certificate-related repairs on SCM.

#### recover
Recover a deleted SCM certificate from RocksDB.
```bash
Usage: ozone repair scm cert recover [-hV] [--dry-run] [--force] [--verbose]
                                     --db=<dbPath>
Recover Deleted SCM Certificate from RocksDB
      --db=<dbPath>   SCM DB Path
```

### update-transaction
To update only the latest applied transaction without modifying the Ratis logs, use the `update-transaction` command.
This updates the highest transaction index in the SCM transaction info table.
```bash
Usage: ozone repair scm update-transaction [-hV] [--dry-run] [--force]
       [--verbose] --db=<dbPath> --index=<highestTransactionIndex>
       --term=<highestTransactionTerm>
CLI to update the highest index in transaction info table.
      --db=<dbPath>   Database File Path
      --index=<highestTransactionIndex>
                      Highest index to set. The input should be non-zero long
                        integer.
      --term=<highestTransactionTerm>
                      Highest term to set. The input should be non-zero long
                        integer.
```
