


@tanabarr tanabarr commented Jan 8, 2026

Set the NVMe power management value for SSDs by setting the new
engine environment variable DAOS_NVME_POWER_MGMT to an integer
(normally 0-4). SPDK applies the value to the devices attached to
an engine process. The value is not reset when the engine exits.

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

@tanabarr tanabarr self-assigned this Jan 8, 2026
@tanabarr tanabarr requested review from a team as code owners January 8, 2026 12:25

github-actions bot commented Jan 8, 2026

Ticket title is 'NVMe power control feature'
Status is 'In Progress'
https://daosio.atlassian.net/browse/DAOS-18431

@Michael-Hennecke Michael-Hennecke self-requested a review January 8, 2026 12:51
@daosbuild3 (Collaborator)

@tanabarr tanabarr force-pushed the tanabarr/bio-powmanage-patch branch from 5e0062d to 0be43f7 Compare January 8, 2026 17:26
@tanabarr tanabarr requested review from NiuYawei and kjacque January 8, 2026 17:35
@tanabarr tanabarr force-pushed the tanabarr/bio-powmanage-patch branch from 0be43f7 to 8b31c49 Compare January 8, 2026 17:38

@kjacque kjacque left a comment


Overall this looks reasonable. I'm just trying to understand the right way to interact with the SPDK layer.

If we're letting users configure this generally, it might be good to add an explicit parameter to the config file with more intuitively named values, based on whatever the 0-4 stand for. Not necessary in this PR, but something to think about.

Comment on lines +1110 to +1114
if (get_bdev_type(bdev) != BDEV_CLASS_NVME) {
	D_DEBUG(DB_MGMT, "Device %s is not NVMe, skipping power management\n",
		d_bdev->bb_name);
	return 0;
}

Seems like returning an error like DER_NOTSUPPORTED might be meaningful here. Are we calling this for simulated devices?
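One way to act on this suggestion, sketched under assumptions: the enum and error constant below are hypothetical stand-ins for DAOS's `get_bdev_type()` classes and `DER_NOTSUPPORTED`, and the function names are invented for illustration. Returning an error keeps the decision visible to the caller instead of silently reporting success for non-NVMe (for example, emulated) devices.

```c
#include <stdio.h>

/* Hypothetical stand-ins: the real code uses get_bdev_type() and the
 * BDEV_CLASS_* values from DAOS's bio module. */
enum sketch_bdev_class {
	SKETCH_CLASS_NVME,
	SKETCH_CLASS_MALLOC, /* e.g. an emulated, memory-backed device */
	SKETCH_CLASS_AIO,    /* e.g. an emulated, file-backed device */
};

/* Hypothetical stand-in for DER_NOTSUPPORTED (the real value lives in DAOS's
 * error-code header); returned negated, following the DAOS convention. */
#define SKETCH_DER_NOTSUPPORTED 1

static int
sketch_set_power_mgmt(enum sketch_bdev_class cls, const char *name)
{
	if (cls != SKETCH_CLASS_NVME) {
		fprintf(stderr, "Device %s is not NVMe, power management not supported\n",
			name);
		return -SKETCH_DER_NOTSUPPORTED;
	}
	/* ... build and submit the Set Features admin command here ... */
	return 0;
}

/* The caller owns the policy: here "not supported" is tolerated, not fatal. */
static int
sketch_apply_to_device(enum sketch_bdev_class cls, const char *name)
{
	int rc = sketch_set_power_mgmt(cls, name);

	if (rc == -SKETCH_DER_NOTSUPPORTED)
		return 0;
	return rc;
}
```

With this split, engine init can still proceed on emulated devices, while other callers (or tests) can tell "skipped" apart from "applied".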

set_power_mgmt_completion(struct spdk_bdev_io *bdev_io, bool success, void *cb_arg)
{
	struct bio_bdev *d_bdev = cb_arg;
	int sc, sct;

nit: our coding guidelines say to declare one variable per line.

D_ASSERT(channel != NULL);
bb->bb_dev_health.bdh_io_channel = channel;

/* Set NVMe power management to 0x1 */

minor: the value now comes from an environment variable, so the comment shouldn't say 0x1.

Suggested change
/* Set NVMe power management to 0x1 */
/* Set up NVMe power management */

memset(&cmd, 0, sizeof(cmd));
cmd.opc = SPDK_NVME_OPC_SET_FEATURES;
cmd.cdw10 = SPDK_NVME_FEAT_POWER_MANAGEMENT;
cmd.cdw11 = bio_spdk_power_mgmt_val;

union spdk_nvme_cmd_cdw11 has a member, union spdk_nvme_feat_power_management feat_power_management, that contains more than just the power state field (it also has the workload hint). I think we should use it to make sure we're setting the value correctly.
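A minimal sketch of the point above, assuming the NVMe Power Management feature (FID 02h) layout that `union spdk_nvme_feat_power_management` models: power state (PS) in CDW11 bits 4:0 and workload hint (WH) in bits 7:5, with the rest reserved. The `pm_*` names are hypothetical; masks and shifts are used instead of bitfields so the encoding does not depend on the compiler's bitfield layout.

```c
#include <stdint.h>

/* CDW11 layout for Set Features / Power Management:
 *   bits 4:0   PS (power state)
 *   bits 7:5   WH (workload hint)
 *   bits 31:8  reserved, must be zero */
#define PM_PS_SHIFT 0
#define PM_PS_MASK  0x1Fu
#define PM_WH_SHIFT 5
#define PM_WH_MASK  0x7u

/* Build CDW11 from individual fields; returns -1 if a field does not fit,
 * so out-of-range input can never spill into WH or the reserved bits. */
static int
pm_encode_cdw11(uint32_t ps, uint32_t wh, uint32_t *cdw11)
{
	if (ps > PM_PS_MASK || wh > PM_WH_MASK)
		return -1;
	*cdw11 = (ps << PM_PS_SHIFT) | (wh << PM_WH_SHIFT);
	return 0;
}
```

With a raw assignment of the environment value into cdw11, an input such as 33 (0x21) would be accepted and would silently set WH=1 as well as PS=1; encoding through the fields rejects it.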

IMO we should also check that the value taken from the env variable is in an acceptable range, preferably when we first ingest it. I had some trouble finding definitions for the power state values. Does SPDK define those somewhere?
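On the range question, as far as the NVMe specification goes, power states are device-specific: the Identify Controller data reports NPSS (the zero-based number of supported power states) plus one descriptor per state, so "0-4" is only valid on a drive with NPSS >= 4. A sketch of a two-level check under that assumption, with hypothetical names: a cheap check at ingest (unset, or fits the 5-bit PS field) and a per-device check against NPSS before sending the command.

```c
#include <stdbool.h>
#include <stdint.h>

#define PM_UNSET  UINT32_MAX /* sentinel meaning "env var not set" */
#define PM_PS_MAX 31u        /* PS is a 5-bit field in CDW11 */

/* Ingest-time check: the value is either unset or fits the PS field. */
static bool
pm_value_valid_at_ingest(uint32_t val)
{
	return val == PM_UNSET || val <= PM_PS_MAX;
}

/* Per-device check: NPSS is zero-based, so the controller supports
 * power states 0..npss inclusive. */
static bool
pm_state_supported(uint32_t ps, uint8_t npss)
{
	return ps <= npss;
}
```

The per-device half can only run once the controller data is available, which is why it is separate from the ingest-time check.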

Comment on lines +275 to +278
d_getenv_uint("DAOS_NVME_POWER_MGMT", &bio_spdk_power_mgmt_val);
if (bio_spdk_power_mgmt_val != UINT32_MAX)
	D_INFO("NVMe power management setting to be applied is %u\n",
	       bio_spdk_power_mgmt_val);

So if I understand the flow correctly:

  1. The environment variable is set (probably in the daos_server config file).
  2. daos_engine starts and this function saves the environment variable's value.
  3. In bio_init_health_monitoring, we apply this value, or do nothing if it was unset.
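The three-step flow can be sketched as follows. This is a minimal sketch, not the PR's code: `sketch_parse_power_mgmt()` is a hypothetical stand-in for `d_getenv_uint()` (it takes the raw string, e.g. from `getenv("DAOS_NVME_POWER_MGMT")`), `sketch_power_mgmt_val` stands in for `bio_spdk_power_mgmt_val`, and `UINT32_MAX` is the "unset" sentinel as in the PR.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_PM_UNSET UINT32_MAX

/* Hypothetical stand-in for bio_spdk_power_mgmt_val. */
static uint32_t sketch_power_mgmt_val = SKETCH_PM_UNSET;

/* Step 2: parse the raw env string; returns SKETCH_PM_UNSET if it is
 * absent, empty, not a number, or collides with the sentinel. */
static uint32_t
sketch_parse_power_mgmt(const char *s)
{
	char          *end;
	unsigned long  v;

	if (s == NULL || *s == '\0')
		return SKETCH_PM_UNSET;
	errno = 0;
	v = strtoul(s, &end, 0); /* base 0 also accepts hex such as "0x1" */
	if (errno != 0 || *end != '\0' || v >= SKETCH_PM_UNSET) {
		fprintf(stderr, "ignoring invalid power management value '%s'\n", s);
		return SKETCH_PM_UNSET;
	}
	return (uint32_t)v;
}

/* Step 3: at health-monitoring init, act only if a value was provided.
 * Returns 1 if the Set Features command would be submitted, 0 if skipped. */
static int
sketch_maybe_apply(void)
{
	if (sketch_power_mgmt_val == SKETCH_PM_UNSET)
		return 0; /* unset: DAOS sends nothing to the drive */
	/* ... submit Set Features (Power Management) with sketch_power_mgmt_val ... */
	return 1;
}
```

Because step 3 sends nothing when the variable is unset, any reset after the variable is removed would have to come from the drive itself, consistent with the PR description's note that the value is not reset on engine exit.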

Do we know for sure that the value doesn't persist in the drives between daos_engine restarts? I understand the engine isn't maintaining the value, but do the drives reset this kind of setting themselves every time an engine restarts?

I imagine a case where someone stops the system, unsets the environment variable, and restarts. Do the drives keep the same power state as before the restart, or do they reset to the default without DAOS code doing anything? If they don't reset on their own, maybe we should define a sane default.


Labels

None yet


4 participants