HDDS-14293. Improve Dynamic Property Reload documentation #9582
Changes from all commits
23bb7e3
5f98a89
aae3c93
ccccd79
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,10 +1,10 @@ | ||
| --- | ||
| title: "Reconfigurability" | ||
| title: "Dynamic Property Reload" | ||
| weight: 11 | ||
| menu: | ||
| main: | ||
| parent: Features | ||
| summary: Dynamic reloading configuration. | ||
| summary: Dynamically reload configuration properties without restarting Ozone services. | ||
| --- | ||
| <!--- | ||
| Licensed to the Apache Software Foundation (ASF) under one or more | ||
|
|
@@ -23,93 +23,184 @@ summary: Dynamic reloading configuration. | |
| limitations under the License. | ||
| --> | ||
|
|
||
| Ozone supports dynamic loading of certain properties without restarting the service. | ||
| If a property is reconfigurable, you can modify it in the configuration file (`ozone-site.xml`) and then invoke the command to flush it to memory. | ||
| Ozone supports dynamic reloading of certain configuration properties without restarting services. This enables operators to tune cluster behavior, adjust limits, and update settings in production without service disruption. | ||
|
|
||
| ## Overview | ||
|
|
||
| When a property is marked as reconfigurable, you can: | ||
| 1. Modify the property value in the configuration file (`ozone-site.xml`) | ||
| 2. Invoke the reconfig command to apply the changes to the running service | ||
|
|
||
| The reconfiguration is performed asynchronously, and you can check the status to verify completion. | ||
|
|
||
| ## Command Reference | ||
|
|
||
| command: | ||
| ```shell | ||
| ozone admin reconfig --service=[OM|SCM|DATANODE] --address=<ip:port> start|status|properties | ||
| ozone admin reconfig --service=[OM|SCM|DATANODE] --address=<ip:port|hostname:port> <operation> | ||
| ``` | ||
|
|
||
| The meaning of command options: | ||
| - **--service**: The node type of the server specified with --address | ||
| - **--address**: RPC address for one server | ||
| - Three operations are provided: | ||
| - **start**: Execute the reconfig operation asynchronously | ||
| - **status**: Check reconfig status | ||
| - **properties**: List reconfigurable properties | ||
|
|
||
| ## Retrieve the reconfigurable properties list | ||
| To retrieve all the reconfigurable properties list for a specific component in Ozone, | ||
| you can use the command: `ozone admin reconfig --service=[OM|SCM|DATANODE] --address=<ip:port> properties`. | ||
| This command will list all the properties that can be dynamically reconfigured at runtime for specific component.<br> | ||
|
|
||
| > For example, get the Ozone OM reconfigurable properties list. | ||
| > | ||
| >$ `ozone admin reconfig --service=OM --address=hadoop1:9862 properties`<br> | ||
| OM: Node [hadoop1:9862] Reconfigurable properties:<br> | ||
| ### Options | ||
|
|
||
| | Option | Description | | ||
| |--------|-------------| | ||
| | `--service` | The service type: `OM`, `SCM`, or `DATANODE` | | ||
| | `--address` | RPC address of the target server (e.g., `hadoop1:9862` or `192.168.1.10:9862`). Required unless `--in-service-datanodes` is specified. | | ||
| | `--in-service-datanodes` | (DataNode only) Apply to all IN_SERVICE datanodes | | ||
|
|
||
| ### Operations | ||
|
|
||
| | Operation | Description | | ||
| |-----------|-------------| | ||
| | `start` | Execute reconfiguration asynchronously | | ||
| | `status` | Check the status of a reconfiguration task | | ||
| | `properties` | List all reconfigurable properties for the service | | ||
|
|
||
| ## Reconfigurable Properties Reference | ||
|
|
||
| ### Ozone Manager (OM) | ||
|
|
||
| | Property | Default | Description | | ||
| |----------|---------|-------------| | ||
| | `ozone.administrators` | - | Comma-separated list of Ozone administrators | | ||
| | `ozone.readonly.administrators` | - | Comma-separated list of read-only administrators | | ||
| | `ozone.om.server.list.max.size` | `1000` | Maximum server-side response size for list operations | | ||
| | `ozone.om.volume.listall.allowed` | `true` | Allow all users to list all volumes | | ||
| | `ozone.om.follower.read.local.lease.enabled` | `false` | Enable local lease for follower read optimization | | ||
|
Contributor: this
Contributor: added recently by HDDS-13954 |
||
| | `ozone.om.follower.read.local.lease.lag.limit` | `10000` | Maximum log lag for follower reads | | ||
|
Contributor: this
Contributor: added recently by HDDS-13954 |
||
| | `ozone.om.follower.read.local.lease.time.ms` | `5000` | Lease time in milliseconds for follower reads | | ||
| | `ozone.key.deleting.limit.per.task` | `50000` | Maximum keys to delete per task | | ||
| | `ozone.directory.deleting.service.interval` | `60s` | Directory deletion service run interval | | ||
| | `ozone.thread.number.dir.deletion` | `10` | Number of threads for directory deletion | | ||
| | `ozone.snapshot.filtering.service.interval` | `60s` | Snapshot SST filtering service run interval | | ||
|
|
||
| ### Storage Container Manager (SCM) | ||
vyalamar marked this conversation as resolved.
|
||
|
|
||
| | Property | Default | Description | | ||
| |----------|---------|-------------| | ||
| | `ozone.administrators` | - | Comma-separated list of Ozone administrators | | ||
| | `ozone.readonly.administrators` | - | Comma-separated list of read-only administrators | | ||
| | `hdds.scm.block.deletion.per-interval.max` | `500000` | Maximum blocks SCM processes per deletion interval | | ||
| | `hdds.scm.replication.thread.interval` | `300s` | Interval for the replication monitor thread | | ||
| | `hdds.scm.replication.under.replicated.interval` | `30s` | Frequency to check the under-replicated queue | | ||
| | `hdds.scm.replication.over.replicated.interval` | `30s` | Frequency to check the over-replicated queue | | ||
| | `hdds.scm.replication.event.timeout` | `12m` | Timeout for replication/deletion commands | | ||
| | `hdds.scm.replication.event.timeout.datanode.offset` | `6m` | Offset subtracted from event timeout for datanode deadline | | ||
| | `hdds.scm.replication.maintenance.replica.minimum` | `2` | Minimum replicas required for node maintenance | | ||
| | `hdds.scm.replication.maintenance.remaining.redundancy` | `1` | Remaining redundancy required for maintenance (EC) | | ||
| | `hdds.scm.replication.datanode.replication.limit` | `20` | Max replication commands queued per datanode | | ||
| | `hdds.scm.replication.datanode.reconstruction.weight` | `3` | Weight multiplier for reconstruction commands | | ||
| | `hdds.scm.replication.datanode.delete.container.limit` | `40` | Max delete container commands queued per datanode | | ||
| | `hdds.scm.replication.inflight.limit.factor` | `0.75` | Factor to scale cluster-wide replication limit | | ||
| | `hdds.scm.replication.container.sample.limit` | `100` | Number of containers sampled per state for debugging | | ||
|
Contributor: this
Contributor: added a few days ago in the https://issues.apache.org/jira/browse/HDDS-5713 disk balancer branch. |
||
| | `ozone.scm.ec.pipeline.minimum` | `5` | Minimum EC pipelines to keep open | | ||
| | `ozone.scm.ec.pipeline.per.volume.factor` | `1` | Factor for calculating EC pipelines based on volumes | | ||
|
|
||
| ### DataNode | ||
vyalamar marked this conversation as resolved.
|
||
|
|
||
| | Property | Default | Description | | ||
| |----------|---------|-------------| | ||
| | `hdds.datanode.block.deleting.limit.per.interval` | `20000` | Maximum blocks deleted per interval on a datanode | | ||
| | `hdds.datanode.block.delete.threads.max` | `5` | Maximum threads for block deletion | | ||
| | `ozone.block.deleting.service.workers` | `10` | Number of block deletion service workers | | ||
| | `ozone.block.deleting.service.interval` | `60s` | Block deletion service run interval | | ||
| | `ozone.block.deleting.service.timeout` | `300s` | Block deletion service timeout | | ||
| | `hdds.datanode.replication.streams.limit` | `10` | Maximum replication streams per datanode | | ||
|
|
||
| ## Usage Examples | ||
|
|
||
| ### List Reconfigurable Properties | ||
|
|
||
| To view all properties that can be dynamically reconfigured: | ||
|
|
||
| ```shell | ||
| $ ozone admin reconfig --service=OM --address=hadoop1:9862 properties | ||
| OM: Node [hadoop1:9862] Reconfigurable properties: | ||
| ozone.administrators | ||
| ozone.om.server.list.max.size | ||
| ozone.om.volume.listall.allowed | ||
| ozone.om.follower.read.local.lease.enabled | ||
| ozone.om.follower.read.local.lease.lag.limit | ||
| ozone.om.follower.read.local.lease.time.ms | ||
| ``` | ||
|
|
||
| ### OM Reconfiguration Example | ||
|
|
||
| ## OM Reconfigurability | ||
| >For example, modify `ozone.administrators` in ozone-site.xml and execute: | ||
| > | ||
| > $ `ozone admin reconfig --service=OM --address=hadoop1:9862 start`<br> | ||
| OM: Started OM reconfiguration task on node [hadoop1:9862]. | ||
| > | ||
| >$ `ozone admin reconfig --service=OM --address=hadoop1:9862 status`<br> | ||
| OM: Reconfiguring status for node [hadoop1:9862]: started at Wed Dec 28 19:04:44 CST 2022 and finished at Wed Dec 28 19:04:44 CST 2022.<br> | ||
| SUCCESS: Changed property ozone.administrators<br> | ||
| From: "hadoop"<br> | ||
| Modify `ozone.administrators` in `ozone-site.xml`, then execute: | ||
|
|
||
| ```shell | ||
| $ ozone admin reconfig --service=OM --address=hadoop1:9862 start | ||
| OM: Started reconfiguration task on node [hadoop1:9862]. | ||
|
|
||
| $ ozone admin reconfig --service=OM --address=hadoop1:9862 status | ||
| OM: Reconfiguring status for node [hadoop1:9862]: started at Wed Dec 28 19:04:44 CST 2022 and finished at Wed Dec 28 19:04:44 CST 2022. | ||
| SUCCESS: Changed property ozone.administrators | ||
| From: "hadoop" | ||
| To: "hadoop,bigdata" | ||
| > | ||
| > $ `ozone admin reconfig --service=OM -address=hadoop1:9862 properties`<br> | ||
| OM: Node [hadoop1:9862] Reconfigurable properties:<br> | ||
| ozone.administrators | ||
| ``` | ||
|
|
||
| ### SCM Reconfiguration Example | ||
|
|
||
| ## SCM Reconfigurability | ||
| >For example, modify `ozone.administrators` in ozone-site.xml and execute: | ||
| > | ||
| > $ `ozone admin reconfig --service=SCM --address=hadoop1:9860 start`<br> | ||
| SCM: Started OM reconfiguration task on node [hadoop1:9860]. | ||
| > | ||
| >$ `ozone admin reconfig --service=SCM --address=hadoop1:9860 status`<br> | ||
| SCM: Reconfiguring status for node [hadoop1:9860]: started at Wed Dec 28 19:04:44 CST 2022 and finished at Wed Dec 28 19:04:44 CST 2022.<br> | ||
| SUCCESS: Changed property ozone.administrators<br> | ||
| From: "hadoop"<br> | ||
| Modify `ozone.administrators` in `ozone-site.xml`, then execute: | ||
|
|
||
| ```shell | ||
| $ ozone admin reconfig --service=SCM --address=hadoop1:9860 start | ||
| SCM: Started reconfiguration task on node [hadoop1:9860]. | ||
|
|
||
| $ ozone admin reconfig --service=SCM --address=hadoop1:9860 status | ||
| SCM: Reconfiguring status for node [hadoop1:9860]: started at Wed Dec 28 19:04:44 CST 2022 and finished at Wed Dec 28 19:04:44 CST 2022. | ||
| SUCCESS: Changed property ozone.administrators | ||
| From: "hadoop" | ||
| To: "hadoop,bigdata" | ||
| > | ||
| > $ `ozone admin reconfig --service=SCM -address=hadoop1:9860 properties`<br> | ||
| SCM: Node [hadoop1:9860] Reconfigurable properties:<br> | ||
| ozone.administrators | ||
| ``` | ||
|
|
||
| ### DataNode Reconfiguration Example | ||
|
|
||
| Modify `hdds.datanode.block.deleting.limit.per.interval` in `ozone-site.xml`, then execute: | ||
|
|
||
| ```shell | ||
| $ ozone admin reconfig --service=DATANODE --address=hadoop1:19864 start | ||
| Datanode: Started reconfiguration task on node [hadoop1:19864]. | ||
|
|
||
| $ ozone admin reconfig --service=DATANODE --address=hadoop1:19864 status | ||
| Datanode: Reconfiguring status for node [hadoop1:19864]: started at Wed Dec 28 19:04:44 CST 2022 and finished at Wed Dec 28 19:04:44 CST 2022. | ||
| SUCCESS: Changed property hdds.datanode.block.deleting.limit.per.interval | ||
| From: "20000" | ||
| To: "30000" | ||
| ``` | ||
|
|
||
| ### Batch Operations (DataNode Only) | ||
|
|
||
| ## Datanode Reconfigurability | ||
| >For example, modify `ozone.example.config` in ozone-site.xml and execute: | ||
| > | ||
| > $ `ozone admin reconfig --service=DATANODE --address=hadoop1:19864 start`<br> | ||
| To perform reconfiguration on all IN_SERVICE datanodes simultaneously: | ||
|
|
||
| ```shell | ||
| $ ozone admin reconfig --service=DATANODE --in-service-datanodes start | ||
| Datanode: Started reconfiguration task on node [hadoop1:19864]. | ||
| > | ||
| >$ `ozone admin reconfig --service=DATANODE --address=hadoop1:19864 status`<br> | ||
| Datanode: Reconfiguring status for node [hadoop1:19864]: started at Wed Dec 28 19:04:44 CST 2022 and finished at Wed Dec 28 19:04:44 CST 2022.<br> | ||
| SUCCESS: Changed property ozone.example.config<br> | ||
| From: "old"<br> | ||
| To: "new" | ||
| > | ||
| > $ `ozone admin reconfig --service=DATANODE -address=hadoop1:19864 properties`<br> | ||
| Datanode: Node [hadoop1:19864] Reconfigurable properties:<br> | ||
| ozone.example.config | ||
|
|
||
| ### Batch operation | ||
| If you want to perform a batch operations on the Datanode, you can set the `--in-service-datanodes` flag. | ||
| This will send reconfiguration requests to all available DataNodes in the `IN_SERVICE`operational state.<br> | ||
| Currently, only Datanode supports batch operations<br> | ||
|
|
||
|
|
||
| >For example, to list the reconfigurable properties of all Datanodes:<br> | ||
| > $ `ozone admin reconfig --service=DATANODE --in-service-datanodes properties`<br> | ||
| Datanode: Node [hadoop1:19864] Reconfigurable properties:<br> | ||
| ozone.example.config<br> | ||
| Datanode: Node [hadoop2:19864] Reconfigurable properties:<br> | ||
| ozone.example.config<br> | ||
| Datanode: Node [hadoop3:19864] Reconfigurable properties:<br> | ||
| ozone.example.config<br> | ||
| Reconfig successfully 3 nodes, failure 0 nodes.<br> | ||
| Datanode: Started reconfiguration task on node [hadoop2:19864]. | ||
| Datanode: Started reconfiguration task on node [hadoop3:19864]. | ||
| Reconfig successfully 3 nodes, failure 0 nodes. | ||
| ``` | ||
|
|
||
| To list properties across all datanodes: | ||
|
|
||
| ```shell | ||
| $ ozone admin reconfig --service=DATANODE --in-service-datanodes properties | ||
| Datanode: Node [hadoop1:19864] Reconfigurable properties: | ||
| hdds.datanode.block.deleting.limit.per.interval | ||
| Datanode: Node [hadoop2:19864] Reconfigurable properties: | ||
| hdds.datanode.block.deleting.limit.per.interval | ||
| Datanode: Node [hadoop3:19864] Reconfigurable properties: | ||
| hdds.datanode.block.deleting.limit.per.interval | ||
| Reconfig successfully 3 nodes, failure 0 nodes. | ||
| ``` | ||
|
|
||
| ## Best Practices | ||
|
|
||
| 1. **Test in non-production first**: Always validate configuration changes in a test environment before applying to production. | ||
|
|
||
| 2. **Change one property at a time**: When making multiple changes, apply them incrementally to isolate the impact of each change. | ||
|
|
||
| 3. **Monitor after changes**: Watch cluster metrics and logs after reconfiguration to ensure the changes have the desired effect. | ||
|
|
||
| 4. **Document changes**: Keep a record of configuration changes for troubleshooting and audit purposes. | ||
|
|
||
| 5. **Use batch operations carefully**: Before using `--in-service-datanodes`, confirm that every IN_SERVICE datanode is meant to receive the same configuration.
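
Because the document describes reconfiguration as asynchronous, with `status` used to verify completion, the two steps could be wrapped in a small helper. The following is only a sketch: the function name `reconfig_and_wait`, the retry count, and the delay are hypothetical, and it assumes that a completed task prints `finished at` in its status output, as the sample output in the usage examples does. It uses only the `start` and `status` operations and the `--service` and `--address` options shown in the diff.

```shell
#!/usr/bin/env bash
# Sketch: start a reconfiguration task, then poll its status until it completes.
# Assumption: a completed task includes "finished at" in its status output,
# as in the sample output; the helper name and defaults are hypothetical.

reconfig_and_wait() {
  local service="$1" address="$2" attempts="${3:-10}" delay="${4:-2}"
  local out i=1

  # Start the asynchronous reconfiguration task on the target server.
  ozone admin reconfig --service="$service" --address="$address" start || return 1

  while [ "$i" -le "$attempts" ]; do
    # Query the task status; abort if the command itself fails.
    out="$(ozone admin reconfig --service="$service" --address="$address" status)" || return 1
    case "$out" in
      *"finished at"*)
        # Print the full status report, including any changed properties.
        printf '%s\n' "$out"
        return 0
        ;;
    esac
    sleep "$delay"
    i=$((i + 1))
  done

  echo "reconfiguration on $address did not finish after $attempts checks" >&2
  return 2
}
```

A call such as `reconfig_and_wait OM hadoop1:9862` would start the task and print the final status report, so the `SUCCESS` lines can be reviewed in one place. The helper returns 2 if no completion is observed within the allotted checks.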