DAOS-18347 control: Add rebuild states to pool query #17322
base: master
Ticket title is 'Rebuild state reported in pool query human-readable output needs refinement'
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17322/1/execution/node/301/log
Test stage Build on EL 9.6 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17322/1/execution/node/309/log
Test stage Build on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17322/1/execution/node/317/log
Test stage Build on Leap 15.5 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17322/1/execution/node/405/log
Features: control Signed-off-by: Tom Nabarro <[email protected]>
Force-pushed from 6eae49c to 61cc7aa
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17322/2/execution/node/301/log
Test stage Build on EL 8.8 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17322/2/execution/node/317/log
Test stage Build on EL 9.6 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17322/2/execution/node/311/log
Test stage Build on Leap 15.5 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17322/2/execution/node/405/log
Features: control Signed-off-by: Tom Nabarro <[email protected]>
coverage Features: control Signed-off-by: Tom Nabarro <[email protected]>
…build-states Features: control Signed-off-by: Tom Nabarro <[email protected]>
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17322/5/testReport/
kccain left a comment:
I thought I had submitted this feedback earlier, but GitHub still shows it as pending. Trying again. Sorry for the inadvertent delay.
    FAILING = 5;
    FAILED = 6;
  }
  State state = 2;
To avoid reassigning the official status/state before printing human-readable or JSON output (i.e., to always stay aligned with what a libdaos API caller would see), one solution could be to add a derived_state here, used only by the tools. Both human-readable and JSON output could then consistently show all three values (e.g., "derived_state (state, status)").
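A minimal Python sketch of the proposed presentation, assuming a simplified record with the three fields named in the comment. `RebuildStatus`, `format_rebuild` and `to_json_dict` are hypothetical names for illustration, not the generated protobuf class or the actual dmg/daos formatting code. The point it shows is that both output paths read the official state and status unchanged and only add derived_state alongside them.

```python
from dataclasses import dataclass


@dataclass
class RebuildStatus:
    """Hypothetical, simplified view of the pool query rebuild fields."""
    state: str          # official state: "busy", "idle" or "done"
    status: int         # official DAOS errno, passed through unmodified
    derived_state: str  # tool-only refinement, e.g. "stopping"


def format_rebuild(rs: RebuildStatus) -> str:
    """Human-readable form: 'derived_state (state, status)'."""
    return f"{rs.derived_state} ({rs.state}, {rs.status})"


def to_json_dict(rs: RebuildStatus) -> dict:
    """JSON form carries the same three values; none of them is rewritten."""
    return {
        "state": rs.state,
        "status": rs.status,
        "derived_state": rs.derived_state,
    }


if __name__ == "__main__":
    print(format_rebuild(RebuildStatus("busy", 0, "busy")))  # busy (busy, 0)
```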
Regardless of whether we adopt the above suggestion, let's inform @jamesanunez and @daltonbohning of the potential changes here, in case functional testing (which mostly uses dmg/daos pool query, and the libdaos API less so) is affected.
In the proposal, state would only ever be busy/idle/done and would never be translated to a new value. status (errno) would similarly never be manipulated: it would carry the special -DER_OP_CANCELED value for the stopping/stopped conditions, and it would always be presented to the caller of the dmg/daos pool query utilities.
derived_state could take on any of the above State values: busy/idle/done if there is no derived condition such as stopping, stopped, failing or failed. If there is a derived condition, derived_state could take on one of the new State values to provide the qualifying detail. In that case, stopping and failing are modifiers of the busy state, stopped is a modifier of idle, and failed is a modifier of done.
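The modifier rules above can be sketched as a pure function of the official (state, status) pair, so the official fields are never rewritten. This is a hypothetical Python illustration, not the control-plane implementation. Two loud assumptions: `OP_CANCELED` is a placeholder standing in for -DER_OP_CANCELED (the real value lives in DAOS's error definitions and is not -1), and "failing/failed" is assumed to mean any other nonzero status, which the comment does not spell out.

```python
# ASSUMPTION: placeholder for -DER_OP_CANCELED; NOT the real DAOS errno value.
OP_CANCELED = -1

# Modifier rules from the proposal: (official state, condition) -> derived state.
# stopping/failing modify busy, stopped modifies idle, failed modifies done.
_MODIFIERS = {
    ("busy", "canceled"): "stopping",
    ("idle", "canceled"): "stopped",
    ("busy", "error"): "failing",
    ("done", "error"): "failed",
}


def derive_state(state: str, status: int) -> str:
    """Return derived_state without touching the official state or status.

    ASSUMPTION: status == OP_CANCELED means a stop was requested; any other
    nonzero status means rebuild hit an error.
    """
    if state not in ("busy", "idle", "done"):
        raise ValueError(f"unexpected official state: {state!r}")
    if status == 0:
        return state  # no derived condition: mirror the official state
    condition = "canceled" if status == OP_CANCELED else "error"
    # Combinations the proposal does not name fall back to the official state.
    return _MODIFIERS.get((state, condition), state)
```

Keeping the derivation a side-effect-free lookup means a libdaos caller and a dmg/daos user always see the same state and status; only the extra derived_state column differs.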
@kccain @daltonbohning changes applied, please review and verify when you can. TIA
So if state and status are never manipulated, this seems to be backward compatible with existing tests. Switching the tests over to the new derived_state should probably be a separate PR, because the ftest code that detects rebuild is currently pretty messy.
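Why the change should be backward compatible for existing checks can be shown with a small sketch: a consumer that reads only state and status from parsed JSON is unaffected by an extra derived_state key. The helper `rebuild_is_done` and the `{"rebuild": {...}}` nesting are hypothetical simplifications, not the real ftest helper or the exact dmg JSON schema.

```python
import json


def rebuild_is_done(pool_query_json: str) -> bool:
    """Hypothetical ftest-style check that reads only 'state' and 'status'.

    The JSON layout here is illustrative, not the exact dmg output schema.
    """
    rebuild = json.loads(pool_query_json)["rebuild"]
    return rebuild["state"] == "done" and rebuild["status"] == 0


old_output = '{"rebuild": {"state": "done", "status": 0}}'
new_output = '{"rebuild": {"state": "done", "status": 0, "derived_state": "done"}}'

# The extra derived_state key is simply ignored, so the check is unchanged.
print(rebuild_is_done(old_output), rebuild_is_done(new_output))  # True True
```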
src/proto/mgmt/pool.proto
    IDLE = 1;
    DONE = 2;
    STOPPING = 3;
    StOPPED = 4;
(change to all upper case letters)
Suggested change:
-    StOPPED = 4;
+    STOPPED = 4;
Signed-off-by: Tom Nabarro <[email protected]>
Features: control Signed-off-by: Tom Nabarro <[email protected]>
…build-states Features: control Signed-off-by: Tom Nabarro <[email protected]>
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17322/7/testReport/
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-17322/7/display/redirect
Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-17322/7/display/redirect
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos/job/PR-17322/7/display/redirect
Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17322/8/execution/node/507/log
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17322/8/execution/node/482/log
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17322/8/testReport/
…build-states Features: control Signed-off-by: Tom Nabarro <[email protected]>
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17322/9/testReport/
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17322/9/execution/node/1282/log
@daltonbohning I need to make some test changes to adjust for the addition of
For
Most of the tests are failing with errors like this that I don't understand
Features: control Signed-off-by: Tom Nabarro <[email protected]>
…build-states Features: control Signed-off-by: Tom Nabarro <[email protected]>
daltonbohning left a comment:
ftest LGTM
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17322/11/testReport/
awaiting reviews
Nice update which should be very helpful for support tasks 👍
Add an intermediate "derived" rebuild state field to indicate temporal
pool rebuild conditions. Preserve the rebuild state value (idle/done/busy)
whilst adding intermediate states in the derived_state field
(stopped/stopping/failed/failing) to better inform the administrator.
Features: control