Contributor

@Narfinger Narfinger commented Jul 24, 2025

This involves a couple of changes:

  • Upgrade dependencies.
  • Add a new function check_and_inc_retries that panics if too many
    retries happen without a success, so the process exits and crashes.
    On success, the retry count is reset to zero.
  • Unify the processor timeout throughout the main loop. Some paths
    through the main loop had timeouts while others had none. There is now
    a single 5-second timeout at the end (before killing offline runners).
  • Read the env variable "RUNNER_SUFFIX" to add a suffix to the runner
    name, so we do not have to hardcode it.
  • Introduce a new function call_github_runner_api which encapsulates
    simple GitHub API calls with gh. spawn_runner now uses this new
    functionality for spawning. Previously some
  • Introduce kill_offline_runners, which calls the GitHub API for the
    current runners, selects those that are offline and have a name
    starting with dresden-hos, and removes them. This happens at the end,
    5 seconds after we started the runners, because the runners need
    some time to start and connect to GitHub properly. The timeout seems
    OK, but we can probably increase it.
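
For readers who want a concrete picture of the retry behaviour described in the list above, here is a minimal sketch. It is not the code from this PR: the `RetryState` struct, the `succeeded` parameter, and the `MAX_RETRIES` value are assumptions made for illustration, and only the name check_and_inc_retries comes from the description.

```rust
/// Hypothetical retry budget; the PR description does not state the real limit.
const MAX_RETRIES: u32 = 5;

/// Tracks consecutive failures across iterations of the main loop.
/// (Assumed shape, not taken from the PR.)
#[derive(Debug, Default)]
struct RetryState {
    retries: u32,
}

impl RetryState {
    /// Record the outcome of one attempt.
    /// A success resets the counter to zero. A failure increments it and
    /// panics once the budget is exceeded, which ends the process.
    fn check_and_inc_retries(&mut self, succeeded: bool) {
        if succeeded {
            self.retries = 0;
            return;
        }
        self.retries += 1;
        if self.retries > MAX_RETRIES {
            panic!("giving up after {} consecutive failures", self.retries);
        }
    }
}
```

Resetting on success means only an unbroken run of failures can exhaust the budget, which matches the "too many retries without a success" wording.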

Testing: Tested on runner CI2. It successfully removed old offline runners on GitHub and started new ones.
I plan to let CI2 run this new code for a while to check for regressions, without changing CI1.
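
The selection step of kill_offline_runners (offline runners whose name starts with dresden-hos) could look roughly like the sketch below. The `Runner` struct and the `offline_runner_ids` helper are hypothetical names for illustration. The sketch assumes the runner list has already been fetched and parsed, and that each entry carries an id, a name, and a status string of "online" or "offline".

```rust
/// Minimal view of a self-hosted runner (assumed fields, for illustration).
#[derive(Debug, Clone, PartialEq)]
struct Runner {
    id: u64,
    name: String,
    /// Assumed to be "online" or "offline".
    status: String,
}

/// Return the ids of runners that are offline and whose name starts with
/// `prefix`. Runners with other names are never selected, so unrelated
/// machines registered in the same scope are left alone.
fn offline_runner_ids(runners: &[Runner], prefix: &str) -> Vec<u64> {
    runners
        .iter()
        .filter(|r| r.status == "offline" && r.name.starts_with(prefix))
        .map(|r| r.id)
        .collect()
}
```

Keeping the filter as a pure function separates the decision about which runners to remove from the API calls that remove them, so it can be checked without touching GitHub.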

Hopefully this fixes: #44

Sorry for the massive changes but I think the codebase is so small that
it is allowed :D

Signed-off-by: Narfinger <[email protected]>
Member

@jschwe jschwe left a comment

Did you also have a look at the GitHub runner run.sh and check whether we can determine if the runner spawned successfully by wrapping run.sh?


thread::sleep(Duration::from_secs(5));
// Check if some still running images are listed as offline from github api point of view
if let Err(e) = kill_offline_runners(&servo_ci_scope) {
Member

I don't think we want to do this all the time, only when we have detected a problem. GitHub API calls are rate-limited, and the limit is not very high.
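
One possible way to gate the cleanup, sketched under assumptions: the function name, the `spawn_failures` counter, and the minimum interval are all hypothetical and not part of this PR. The idea is to spend API calls only after a failure has been seen, and not more often than some interval.

```rust
use std::time::{Duration, Instant};

/// Decide whether to spend GitHub API calls on removing offline runners.
/// Returns true only if at least one spawn failure was observed and the
/// previous cleanup (if any) is at least `min_interval` in the past.
fn should_kill_offline_runners(
    spawn_failures: u32,
    last_cleanup: Option<Instant>,
    now: Instant,
    min_interval: Duration,
) -> bool {
    if spawn_failures == 0 {
        // Nothing looked wrong, so skip the API call entirely.
        return false;
    }
    match last_cleanup {
        None => true,
        Some(t) => now.duration_since(t) >= min_interval,
    }
}
```

Passing `now` in as a parameter keeps the decision deterministic, so it can be exercised without sleeping.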

Signed-off-by: Narfinger <[email protected]>
@Narfinger Narfinger marked this pull request as ready for review August 5, 2025 07:03
@delan delan force-pushed the main branch 2 times, most recently from 87b9cb4 to 2f00564 Compare October 16, 2025 02:01

Development

Successfully merging this pull request may close these issues.

We are rate limited on the GitHub generate-jitconfig API