-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Fix windows service tests #137451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix windows service tests #137451
Conversation
|
Pinging @elastic/es-core-infra (Team:Core/Infra) |
| assertThat(result.stdout(), containsString("Status : " + status)); | ||
| } | ||
|
|
||
| private void waitForStop(String id, Duration timeout) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider replacing the service control calls in the windows service cli with sc.exe calls (and this wait in stop)? Then these changes could benefit users too, not just tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I raised this question on the slack thread too :)
I think this is a good idea, but I'd prefer to be sure.
My proposal is to make these changes in tests, then let them run for a week or so. If the issue is definitely fixed, we can either use sc.exe in most of the ProcrunCommands, or consider replacing them with direct calls to Win32 functions (should be rather easy, and would make the code for wait a lot nicer -- no external script needed).
|
This is the race condition in procrun: In a normal execution, the thread serving the stop request and the thread doing/waiting for the JNI "stop" operation generate a reasonable sequence of SetServiceStatus calls: In many executions however the thread handling the stop request overlaps with the other threads, trying to re-set the status "back": Notice the This does not seem to have a visible effect; Windows recognizes the service has stopped, and invalidated the handle to it. However if it turns out that it does have bad effects we can fix it (feel free to ping me) |
rjernst
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR attempts to fix (definitely) Windows services tests.
We used
procrunas a service control tool; however,procrun(as a service) suffers from a race condition andprocrunas a tool seems to suffer from it.Furthermore, the implementation of
stopinprocrunis (at the very least), "strange": it seems to wait for he service to stop, but it's not guaranteed to wait till the end, returning an error (potentially)In general, this is true for all (most) of the service control tools: the return when the service is still in a STOP_PENDING state. We actually have to "busy wait" for the service to be really STOPPED.
This PR adds the busy wait and adjusts how we stop the service slightly. It also moves to the standard Windows
sc.exetool for service control operations (start,stopanddelete) in tests.Fixes: #113177
Fixes: #113160
Fixes: #113219
Fixes: #113313