| 1 | +# Launching Phoenix Runners |
| 2 | + |
| 3 | +The Phoenix runners were repeatedly failing due to a network error. |
| 4 | +Spencer managed to fix it via [this PR](https://github.com/MFlowCode/MFC/pull/933) and by running things through a SOCKS5 proxy on each login node that holds a runner. |
| 5 | +The steps are documented here for Spencer or his next of kin. |
| 6 | + |
| 7 | +__The runners are started via the following process__ |
| 8 | + |
| 9 | +1. Log in to login node `<x>` via `ssh login-phoenix-rh9-<x>.pace.gatech.edu`. `<x>` can be `1` through `6` on Phoenix. |
| 10 | + * Detour: Make sure no stray `ssh` daemons are sitting around: `pkill -9 sshd`. This may also kill your current session. |
| 11 | + * To keep your terminal alive, you can probably use `fuser -k -9 ~/nohup.out` instead, which kills (signal 9) only the process writing to that no-hangup file (the daemon we care about). |
| 12 | +2. Log back into the same login node, because you may have just nuked your session. |
| 13 | + * Detour: Make sure stray runners on that login node are dead (one liner): `pkill -9 -f -E 'run.sh|Runner.listener|Runner.helper'` |
| 14 | + * If cautious, check that no runner processes are left over: run `top`, press `u`, type your user name, and press return. |
| 15 | +3. Execute from your home directory: `nohup ssh -N -D 1080 -vvv login-phoenix-rh9-<x>.pace.gatech.edu &`, replacing `<x>` with the login node number |
| 16 | + * This starts a SOCKS5 proxy on port 1080 that the runner's traffic is tunneled through. |
| 17 | +4. Navigate to your runner's directory (or create a runner directory if you need one). |
| 18 | + * Right now they are in Spencer's `scratch/mfc-runners/action-runner-<runner#>` |
| 19 | +5. Run the alias `start_runner`, which dumps output to `~/runner.out`. |
| 20 | + * If you don't have this alias yet, add it to your `.bashrc` (or similar) and source that file: |
| 21 | +```bash |
| 22 | +alias start_runner=' \ |
| 23 | + http_proxy="socks5://localhost:1080" \ |
| 24 | + https_proxy="socks5://localhost:1080" \ |
| 25 | + no_proxy="localhost,127.0.0.1,github.com,api.github.com,pipelines.actions.githubusercontent.com,alive.github.com,pypi.org,files.pythonhosted.org,fftw.org,www.fftw.org" \ |
| 26 | + NO_PROXY="localhost,127.0.0.1,github.com,api.github.com,pipelines.actions.githubusercontent.com,alive.github.com,pypi.org,files.pythonhosted.org,fftw.org,www.fftw.org" \ |
| 27 | + RUNNER_DEBUG=1 \ |
| 28 | + ACTIONS_STEP_DEBUG=1 \ |
| 29 | + GITHUB_ACTIONS_RUNNER_PREFER_IP_FAMILY=ipv4 \ |
| 30 | + DOTNET_SYSTEM_NET_SOCKETS_KEEPALIVE_TIME=00:01:00 \ |
| 31 | + DOTNET_SYSTEM_NET_SOCKETS_KEEPALIVE_INTERVAL=00:00:20 \ |
| 32 | + DOTNET_SYSTEM_NET_SOCKETS_KEEPALIVE_RETRYCOUNT=5 \ |
| 33 | + nohup ./run.sh > ~/runner.out 2>&1 &' |
| 34 | +``` |
| 35 | +6. You're done |
| 36 | + |
| 37 | + |
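Before step 5, it can be worth confirming that the proxy from step 3 is actually listening. The check below is a minimal sketch and not part of the original procedure: `proxy_is_up` is a hypothetical helper name, and it relies on Bash's `/dev/tcp` pseudo-device, so it assumes `bash` rather than a plain POSIX shell.

```bash
# Hypothetical helper (not part of the runner tooling): succeeds if something
# accepts TCP connections on the given local port (default 1080).
proxy_is_up() {
    local port="${1:-1080}"
    # Opening /dev/tcp/HOST/PORT makes Bash attempt a TCP connection.
    # The subshell exits right away, which closes the descriptor again.
    (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null
}

if proxy_is_up 1080; then
    echo "Something is listening on localhost:1080"
else
    echo "Nothing is listening on localhost:1080; rerun step 3"
fi
```

This only shows that a process is listening on the port, not that it is the `ssh` proxy or that traffic gets through it.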
| 38 | +### For inquisitive minds |
| 39 | + |
| 40 | +__Why the `start_runner` alias?__ |
| 41 | + |
| 42 | +1. `alias start_runner='…'` |
| 43 | + Defines a new shell alias named `start_runner`. Whenever you run `start_runner`, the shell will execute everything between the single quotes as if you’d typed it at the prompt. |
| 44 | + |
| 45 | +2. `http_proxy="socks5://localhost:1080"` |
| 46 | + Sets the `http_proxy` environment variable so that any HTTP traffic from the runner is sent through a SOCKS5 proxy listening on `localhost:1080`. |
| 47 | + |
| 48 | +3. `https_proxy="socks5://localhost:1080"` |
| 49 | + Tells HTTPS-aware tools to use that same local SOCKS5 proxy for HTTPS requests. |
| 50 | + |
| 51 | +4. `no_proxy="localhost,127.0.0.1,github.com,api.github.com,pipelines.actions.githubusercontent.com,alive.github.com,pypi.org,files.pythonhosted.org,fftw.org,www.fftw.org"` |
| 52 | + Lists hosts and domains that should bypass the proxy entirely. Commonly used for internal or high-volume endpoints where you don’t want proxy overhead. |
| 53 | + |
| 54 | +5. `NO_PROXY="localhost,127.0.0.1,github.com,api.github.com,pipelines.actions.githubusercontent.com,alive.github.com,pypi.org,files.pythonhosted.org,fftw.org,www.fftw.org"` |
| 55 | + Same list as `no_proxy`—some programs only check the uppercase `NO_PROXY` variable. |
| 56 | + |
| 57 | +6. `RUNNER_DEBUG=1` |
| 58 | + Enables debug-level logging in the GitHub Actions runner itself, so you’ll see more verbose internal messages in its logs. |
| 59 | + |
| 60 | +7. `ACTIONS_STEP_DEBUG=1` |
| 61 | + Turns on step-level debug logging for actions you invoke—handy if you need to trace exactly what each action is doing under the hood. |
| 62 | + |
| 63 | +8. `GITHUB_ACTIONS_RUNNER_PREFER_IP_FAMILY=ipv4` |
| 64 | + Forces the runner to resolve DNS names to IPv4 addresses only. Useful if your proxy or network has spotty IPv6 support. |
| 65 | + |
| 66 | +9. `DOTNET_SYSTEM_NET_SOCKETS_KEEPALIVE_TIME=00:01:00` |
| 67 | + For .NET–based tasks: sets the initial TCP keepalive timeout to 1 minute (after 1 minute of idle, a keepalive probe is sent). |
| 68 | + |
| 69 | +10. `DOTNET_SYSTEM_NET_SOCKETS_KEEPALIVE_INTERVAL=00:00:20` |
| 70 | + If the first keepalive probe gets no response, wait 20 seconds between subsequent probes. |
| 71 | + |
| 72 | +11. `DOTNET_SYSTEM_NET_SOCKETS_KEEPALIVE_RETRYCOUNT=5` |
| 73 | + If probes continue to go unanswered, retry up to 5 times before declaring the connection dead. |
| 74 | + |
| 75 | +12. `nohup ./run.sh > ~/runner.out 2>&1 &` |
| 76 | + - `nohup … &` runs `./run.sh` in the background and makes it immune to hangups (so it keeps running if you log out). |
| 77 | + - `> ~/runner.out` redirects **stdout** to the file `runner.out` in your home directory. |
| 78 | + - `2>&1` redirects **stderr** into the same file, so you get a combined log of everything the script prints. |
| 79 | + |
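The alias puts every variable in front of the final command rather than using `export`. In a shell, assignments written that way apply only to the environment of that one command, so the proxy settings reach `run.sh` without changing your login session. The snippet below illustrates this with a made-up variable, `DEMO_PROXY`; the helper names are hypothetical too.

```bash
# DEMO_PROXY is an illustrative variable, not one the runner reads.
unset DEMO_PROXY

# The assignment prefix applies only to the child `sh` process.
child_value() {
    DEMO_PROXY="socks5://localhost:1080" sh -c 'printf "%s\n" "${DEMO_PROXY}"'
}

# The calling shell never had the variable set.
parent_value() {
    printf '%s\n' "${DEMO_PROXY:-unset}"
}

child_value    # prints: socks5://localhost:1080
parent_value   # prints: unset
```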
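| 80 | +__Why the extra ssh command?__ It provides the proxy the runner talks to: `-D 1080` opens a local SOCKS5 proxy on port 1080, `-N` skips running a remote command, `-vvv` logs verbosely, and `nohup … &` keeps it running in the background after you log out. The runner settings that go with it are recapped below. |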
| 81 | + |
| 82 | +1. `http_proxy="socks5://localhost:1080"` |
| 83 | + Routes all HTTP traffic through a local SOCKS5 proxy on port 1080. |
| 84 | + |
| 85 | +2. `https_proxy="socks5://localhost:1080"` |
| 86 | + Routes all HTTPS traffic through the same proxy. |
| 87 | + |
| 88 | +3. `no_proxy="localhost,127.0.0.1,github.com,api.github.com,pipelines.actions.githubusercontent.com,alive.github.com,pypi.org,files.pythonhosted.org,fftw.org,www.fftw.org"` |
| 89 | + Specifies hosts and domains that bypass the proxy entirely. The list includes hosts that MFC's CMake fetches with `wget` (e.g., `fftw`) or another non-`git` command, and it allows `git clone` to work. |
| 90 | + |
| 91 | +4. `NO_PROXY="localhost,127.0.0.1,github.com,api.github.com,pipelines.actions.githubusercontent.com,alive.github.com,pypi.org,files.pythonhosted.org,fftw.org,www.fftw.org"` |
| 92 | + Same bypass list for applications that only check the uppercase variable. |
| 93 | + |
| 94 | +5. `RUNNER_DEBUG=1` |
| 95 | + Enables verbose internal logging in the GitHub Actions runner. |
| 96 | + |
| 97 | +6. `GITHUB_ACTIONS_RUNNER_PREFER_IP_FAMILY=ipv4` |
| 98 | + Forces DNS resolution to IPv4 to avoid IPv6 issues. |
| 99 | + |
| 100 | +7. `DOTNET_SYSTEM_NET_SOCKETS_KEEPALIVE_TIME=00:01:00` |
| 101 | + (For .NET tasks) sends the first TCP keepalive probe after 1 minute of idle. |
| 102 | + |
| 103 | +8. `DOTNET_SYSTEM_NET_SOCKETS_KEEPALIVE_INTERVAL=00:00:20` |
| 104 | + Waits 20 seconds between subsequent TCP keepalive probes. |
| 105 | + |
| 106 | +9. `DOTNET_SYSTEM_NET_SOCKETS_KEEPALIVE_RETRYCOUNT=5` |
| 107 | + Retries keepalive probes up to 5 times before closing the connection. |
| 108 | + |
| 109 | +10. `nohup ./run.sh > ~/runner.out 2>&1 &` |
| 110 | + Runs `run.sh` in the background, immune to hangups, redirecting both stdout and stderr to `~/runner.out`. |
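If a CI step fails while downloading something, one quick thing to check is whether the host is in the bypass list. The function below is a rough sketch under stated assumptions: `in_no_proxy` is a hypothetical helper, it only does exact string matches, and real tools may treat `no_proxy` entries differently (for example, as domain suffixes).

```bash
# Hypothetical helper: succeeds if HOST appears verbatim in a
# comma-separated bypass list. Exact matches only.
in_no_proxy() {
    local host="$1" list="$2" entry
    local IFS=','            # split the list on commas inside this function only
    for entry in $list; do
        if [ "$host" = "$entry" ]; then
            return 0
        fi
    done
    return 1
}

# Shortened list for illustration; pass "$no_proxy" to check the real one.
bypass="localhost,127.0.0.1,github.com,fftw.org"
for h in fftw.org example.com; do
    if in_no_proxy "$h" "$bypass"; then
        echo "$h bypasses the proxy"
    else
        echo "$h goes through the proxy"
    fi
done
```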