DAOS-18387 test: recovery/ddb.py test_recovery_ddb_ls MD-on-SSD Support #17332
base: master
Conversation
To support MD-on-SSD for ddb, we need to support two commands: ddb prov_mem and ddb ls with --db_path.

Update ddb_utils.py to support the new commands.
Add check_ram_used in recovery_utils.py to detect whether the system is MD-on-SSD.
Update test_recovery_ddb_ls to support MD-on-SSD with the new ddb commands.

We need to update the test yaml to run on MD-on-SSD/HW Medium, but that will break the other tests in ddb.py because they don't support MD-on-SSD yet. Keep the original tests as ddb_pmem.py and ddb_pmem.yaml and keep running them on VM (except test_recovery_ddb_ls, because that's updated in this PR).

Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium: false
Test-tag: test_recovery_ddb_ls DdbPMEMTest

Signed-off-by: Makito Kano <[email protected]>
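A minimal sketch of that flow, assuming the check_ram_used helper and prov_mem wrapper added here (the list_component name and its db_path keyword are assumptions for illustration, not the exact test code):

    def run_ddb_ls(ddb_command, md_on_ssd, db_path=None, tmpfs_mount=None):
        """Run ddb ls, provisioning the pool metadata into tmpfs first on MD-on-SSD."""
        if md_on_ssd:
            # New MD-on-SSD path: restore the pool dir, then list it via --db_path.
            ddb_command.prov_mem(db_path, tmpfs_mount)
            return ddb_command.list_component(db_path=db_path)  # assumed kwarg for --db_path
        # Original PMEM path: ddb ls directly against the VOS file.
        return ddb_command.list_component()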
Ticket title is 'CR Test Update - recovery/ddb.py test_recovery_ddb_ls MD-on-SSD Support'
Test stage Functional on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-17332/1/display/redirect
Test stage Functional Hardware Medium completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17332/5/execution/node/857/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17332/5/execution/node/898/log
@phender @dinghwah I have two questions:
Thanks.
    md_on_ssd = check_ram_used(server_manager=self.server_managers[0], log=self.log)
    if md_on_ssd:
We already have a DaosServerManager method to determine if we're using MD on SSD:
Suggested change:
-    md_on_ssd = check_ram_used(server_manager=self.server_managers[0], log=self.log)
+    md_on_ssd = self.server_managers[0].manager.job.using_control_metadata
     if md_on_ssd:
    return policies


def check_ram_used(server_manager, log):
If we're just using this to determine if we are using MD on SSD, we already have self.server_managers[0].manager.job.using_control_metadata.
    if md_on_ssd:
        self.log_step(f"MD-on-SSD: Load pool dir to {daos_load_path}")
        db_path = os.path.join(
            self.log_dir, "control_metadata", "daos_control", "engine0")
Shouldn't the control metadata path be obtained via self.server_managers[0].job.yaml.metadata_params.path.value?
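For example, a sketch of that suggestion, reusing the attribute chain from this comment and the directory suffix from the quoted test code:

    # Sketch only: derive db_path from the configured control metadata path
    # instead of hard-coding it under self.log_dir.
    metadata_root = self.server_managers[0].job.yaml.metadata_params.path.value
    db_path = os.path.join(metadata_root, "daos_control", "engine0")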
        return self.run()


    def prov_mem(self, db_path, tmpfs_mount):
        """Call ddb "" prov_mem <db_path> <tmpfs_mount>.
Are we always calling this command with an empty vos_path, or is it specific to the only test currently using this command?
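If other callers ever need a non-empty vos_path, one option is to expose it as a keyword argument that defaults to the empty string; a sketch with a hypothetical signature (body elided):

    def prov_mem(self, db_path, tmpfs_mount, vos_path=""):
        """Call ddb <vos_path> prov_mem <db_path> <tmpfs_mount>.

        Defaults to an empty vos_path, which is all the current test needs.
        """
        ...  # build the argument list as the current method does, then return self.run()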
    command_result = run_remote(
        log=self.log, hosts=self.hostlist_servers, command=command_root).passed
    if not command_result:
        self.fail(f"{command} failed!")
If you want, we can also report on which host(s) the command failed:
Suggested change:
-    command_result = run_remote(
-        log=self.log, hosts=self.hostlist_servers, command=command_root).passed
-    if not command_result:
-        self.fail(f"{command} failed!")
+    result = run_remote(
+        log=self.log, hosts=self.hostlist_servers, command=command_root)
+    if not result.passed:
+        self.fail(f"{command} failed on {result.failed_hosts}!")
        Args:
            remote_file_path (str): File path to copy to local.
            test_dir (str): Test directory. Usually self.test_dir.
            remote (str): Remote hostname to copy file from.
get_clush_command requires a NodeSet.
Suggested change:
-            remote (str): Remote hostname to copy file from.
+            remote (NodeSet): Remote hostname to copy file from.
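Callers that currently hold a plain hostname string would wrap it before passing it along; for example, using ClusterShell's NodeSet (the type get_clush_command expects per the comment above):

    from ClusterShell.NodeSet import NodeSet

    remote = NodeSet(server_hostname)  # server_hostname is an illustrative local variable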
| f"ERROR: Copying {remote_file_path} from {remote}: {error}") from error | ||
|
|
||
| # Remove the appended .<server_hostname> from the copied file. | ||
| current_file_path = "".join([remote_file_path, ".", remote]) |
This will be a problem if there are multiple hosts specified in the remote argument. If the test is only ever going to use one remote host, is clush rcopy even needed?
To handle multiple hosts, this function could just return the paths of the copied files (with the hostname extension) and the caller could loop over the list of files to process them.
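A rough sketch of that idea, keeping the .<hostname> naming from the quoted code (the function name is a placeholder, not the PR's API):

    def copied_file_paths(remote_file_path, remote):
        """Return the local path of each file copied by clush rcopy.

        Args:
            remote_file_path (str): file path that was copied from each host.
            remote (NodeSet): hosts the file was copied from.
        """
        # clush rcopy appends .<hostname> to each copied file.
        return ["".join([remote_file_path, ".", host]) for host in remote]

    # The caller could then loop over the returned paths and process each host's copy.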