OHOS: Robustify the runners #46

Narfinger · 2025-07-24T13:10:50Z

This involves a couple of changes:

- Upgrade dependencies.
- Add a new function `check_and_inc_retries` that panics if there are too many retries without a success. In that case we exit the process and crash. On success we reset the retries to zero.
- Unify the processor timeout throughout the main loop. Some paths through the main loop had timeouts while others had none. We now have a single timeout of 5 seconds at the end (before killing offline runners).
- Read the env variable "RUNNER_SUFFIX" to add a suffix to the runner name, so we do not have to hardcode it.
- Introduce a new function `call_github_runner_api`, which encapsulates simple GitHub API calls made with gh. `spawn_runner` now uses this function for spawning. Previously some
- Introduce `kill_offline_runners`, which calls the GitHub API for the current runners, takes the runners that are offline and have a name starting with `dresden-hos`, and removes them. This happens at the end, 5 seconds after we started the runners, because the runners need some time to start and connect to GitHub properly. The timeout seems OK, but we can probably increase it.
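As a rough illustration of two of the changes above, here is a minimal sketch of a retry counter and of the offline-runner filter. It is not the code from this PR: the `RetryCounter` and `Runner` types, the `MAX_RETRIES` value, the `stale_runners` helper, and the assumption that the API reports a `status` string of `"offline"` are all hypothetical and only serve to show the idea.

```rust
/// Hypothetical upper bound; the PR does not state the real limit.
const MAX_RETRIES: u32 = 5;

/// Tracks consecutive failures (illustrative type, not from the PR).
#[derive(Debug, Default)]
struct RetryCounter {
    retries: u32,
}

impl RetryCounter {
    /// Record one failed attempt and panic once the limit is exceeded.
    fn check_and_inc_retries(&mut self) {
        self.retries += 1;
        if self.retries > MAX_RETRIES {
            panic!("giving up after {} consecutive failures", self.retries);
        }
    }

    /// Record a success, which resets the counter to zero.
    fn reset(&mut self) {
        self.retries = 0;
    }
}

/// Minimal view of a runner as listed by the API (assumed fields).
#[derive(Debug, Clone, PartialEq)]
struct Runner {
    id: u64,
    name: String,
    status: String,
}

/// Select runners that are offline and whose name starts with `prefix`.
/// These are the candidates that a cleanup step would remove.
fn stale_runners<'a>(runners: &'a [Runner], prefix: &str) -> Vec<&'a Runner> {
    runners
        .iter()
        .filter(|r| r.status == "offline" && r.name.starts_with(prefix))
        .collect()
}
```

Keeping the filter as a pure function over an already fetched list makes it easy to check the selection logic without touching the GitHub API.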

Testing: Tested on runner CI2. It successfully removed old offline runners on GitHub and started new ones.
I plan to let CI2 run this new code for a bit to see if there are any regressions without changing CI1.

Hopefully this fixes: #44

Sorry for the massive changes but I think the codebase is so small that
it is allowed :D


jschwe

Did you also have a look at the github runner run.sh and check if we can determine if the runner spawned successfully by wrapping run.sh?


jschwe · 2025-07-25T01:27:35Z

docker/docker_jit_monitor/src/main.rs

+
+        thread::sleep(Duration::from_secs(5));
+        // Check if some still running images are listed as offline from github api point of view
+        if let Err(e) = kill_offline_runners(&servo_ci_scope) {


I don't think we want to do this all the time, only when we have detected a problem. GitHub API calls are rate-limited, and the limit is not very high.
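One possible way to act on this, sketched under assumptions: only run the cleanup when a problem was detected, and additionally enforce a minimum interval between cleanups. The `CleanupGate` type, its method names, and the interval used in the usage note are hypothetical and not part of the PR.

```rust
use std::time::{Duration, Instant};

/// Decides whether an API-backed cleanup may run (illustrative only).
struct CleanupGate {
    /// Minimum time between two cleanups.
    min_interval: Duration,
    /// When the last cleanup was allowed, if ever.
    last_run: Option<Instant>,
}

impl CleanupGate {
    fn new(min_interval: Duration) -> Self {
        CleanupGate { min_interval, last_run: None }
    }

    /// Returns true only if a problem was detected and the last
    /// allowed cleanup is at least `min_interval` in the past.
    fn should_run(&mut self, problem_detected: bool, now: Instant) -> bool {
        if !problem_detected {
            return false;
        }
        let due = match self.last_run {
            Some(prev) => now.duration_since(prev) >= self.min_interval,
            None => true,
        };
        if due {
            // Remember this run so later calls are throttled.
            self.last_run = Some(now);
        }
        due
    }
}
```

In the main loop this could guard the call, for example `if gate.should_run(spawn_failed, Instant::now()) { /* call kill_offline_runners */ }`, where `spawn_failed` stands for whatever signal the loop uses to detect a problem.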


Signed-off-by: Narfinger <[email protected]>

Narfinger force-pushed the robustify branch from eee2d0a to 9396dbb Compare July 24, 2025 13:14

jschwe reviewed Jul 25, 2025


Narfinger force-pushed the robustify branch from 90e34bc to 9faf393 Compare July 25, 2025 08:38

Requested changes

9faf393


Narfinger marked this pull request as ready for review August 5, 2025 07:03

delan force-pushed the main branch 2 times, most recently from 87b9cb4 to 2f00564 Compare October 16, 2025 02:01
