@@ -170,3 +170,54 @@ These runners are most likely failing due to image pull failures which was one
170170 of our original hypotheses on the issue. Recent changes to GitHub ARC
171171 in https://github.com/actions/actions-runner-controller/pull/4059 should help
172172 with this issue, although further testing is needed.
173+
174+ Before that patch makes it into a release, an important maintenance step is
175+ to periodically (every couple of days should be fine) go through and delete
176+ failed `ephemeralrunner` instances. This can be done by looking at all
177+ `ephemeralrunner` instances and then deleting any that are failed. To get the
178+ list of runners, run the following command:
179+
180+ ``` bash
181+ kubectl get ephemeralrunner --all-namespaces
182+ ```
183+
184+ This will produce output like the following:
185+
186+ ```
187+ NAMESPACE NAME GITHUB CONFIG URL RUNNERID STATUS JOBREPOSITORY JOBWORKFLOWREF WORKFLOWRUNID JOBDISPLAYNAME MESSAGE AGE
188+ llvm-premerge-linux-runners llvm-premerge-linux-runners-dhdwg-runner-kbh9v https://github.com/llvm 434949 Running llvm/llvm-project llvm/llvm-project/.github/workflows/premerge.yaml@refs/pull/141711/merge 3m4s
189+ llvm-premerge-windows-runners llvm-premerge-windows-runners-4pgkh-runner-4wv5w https://github.com/llvm 434901 Running llvm/llvm-project llvm/llvm-project/.github/workflows/premerge.yaml@refs/pull/141601/merge 64m
190+ llvm-premerge-windows-runners llvm-premerge-windows-runners-4pgkh-runner-92hgr https://github.com/llvm 434557 Failed Pod has failed to start more than 5 times: 7h18m
191+ llvm-premerge-windows-runners llvm-premerge-windows-runners-4pgkh-runner-9jrtj https://github.com/llvm 434898 Running llvm/llvm-project llvm/llvm-project/.github/workflows/premerge.yaml@refs/pull/140937/merge 69m
192+ llvm-premerge-windows-runners llvm-premerge-windows-runners-4pgkh-runner-d2bbd https://github.com/llvm 434941 Running llvm/llvm-project llvm/llvm-project/.github/workflows/premerge.yaml@refs/pull/141965/merge 19m
193+ llvm-premerge-windows-runners llvm-premerge-windows-runners-4pgkh-runner-f7gzn https://github.com/llvm 434924 Running llvm/llvm-project llvm/llvm-project/.github/workflows/premerge.yaml@refs/pull/141966/merge 39m
194+ llvm-premerge-windows-runners llvm-premerge-windows-runners-4pgkh-runner-l6v2k https://github.com/llvm 434948 3m4s
195+ llvm-premerge-windows-runners llvm-premerge-windows-runners-4pgkh-runner-lvt4f https://github.com/llvm 434923 Running llvm/llvm-project llvm/llvm-project/.github/workflows/premerge.yaml@refs/pull/141151/merge 39m
196+ llvm-premerge-windows-runners llvm-premerge-windows-runners-4pgkh-runner-rbtpz https://github.com/llvm 434944 Running llvm/llvm-project llvm/llvm-project/.github/workflows/premerge.yaml@refs/pull/137727/merge 11m
197+ llvm-premerge-windows-runners llvm-premerge-windows-runners-4pgkh-runner-vc5k4 https://github.com/llvm 434916 Running llvm/llvm-project llvm/llvm-project/.github/workflows/premerge.yaml@refs/pull/141963/merge 56m
198+ ```
199+
200+ Notice that one of the runners has failed. It can be cleaned up by running
201+ the following command (note that we also specify the namespace the runner is
202+ in):
203+
204+ ``` bash
205+ kubectl delete ephemeralrunner llvm-premerge-windows-runners-4pgkh-runner-92hgr -n llvm-premerge-windows-runners
206+ ```
207+
208+ That command should complete quickly and remove the failed runner.
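
When several runners have failed, the list-then-delete steps above can be scripted. The following is a minimal sketch, not a tested maintenance tool. It assumes that the `STATUS` column shown by `kubectl get ephemeralrunner` is backed by the `.status.phase` field; the function names and the `DRY_RUN` variable are hypothetical and only used for illustration.

```bash
# Sketch: find (and optionally delete) failed ephemeralrunner instances in the
# cluster that kubectl currently points at.
# Assumption: the STATUS column is backed by the .status.phase field.

# Reads "namespace name status" lines on stdin and prints "namespace name"
# for every runner whose status is exactly "Failed".
filter_failed() {
  awk '$3 == "Failed" { print $1, $2 }'
}

# Prints one "namespace name status" line per ephemeralrunner.
list_runners() {
  kubectl get ephemeralrunner --all-namespaces --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,STATUS:.status.phase'
}

# Deletes every failed runner. With DRY_RUN=1 (the default) it only prints
# the delete commands so they can be reviewed first.
delete_failed() {
  list_runners | filter_failed | while read -r ns name; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "kubectl delete ephemeralrunner $name -n $ns"
    else
      kubectl delete ephemeralrunner "$name" -n "$ns" </dev/null
    fi
  done
}
```

After sourcing these functions, running `delete_failed` prints the commands it would run, and `DRY_RUN=0 delete_failed` actually deletes the failed runners. Runners with an empty status are left alone, since only an exact `Failed` status matches.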
209+
210+ **IMPORTANT:** These steps need to be performed separately on both
211+ `llvm-premerge-cluster-us-central` and `llvm-premerge-cluster-us-west`. You can switch
212+ between them using the standard `gcloud` authentication commands. For
213+ `llvm-premerge-cluster-us-central` you would run:
214+
215+ ``` bash
216+ gcloud container clusters get-credentials llvm-premerge-cluster-us-central --location us-central1-a
217+ ```
218+
219+ and the following for `llvm-premerge-cluster-us-west`:
220+
221+ ``` bash
222+ gcloud container clusters get-credentials llvm-premerge-cluster-us-west --location us-west1
223+ ```
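
Because the same check has to be repeated on each cluster, the credential switch can be wrapped in a small loop. This is only a sketch: the helper name `for_each_cluster` is hypothetical, and the cluster names and locations are copied from the two `gcloud` commands above.

```bash
# Sketch: run one command against each premerge cluster in turn.
# The name:location pairs are taken from the gcloud commands above.
for_each_cluster() {
  for entry in \
    "llvm-premerge-cluster-us-central:us-central1-a" \
    "llvm-premerge-cluster-us-west:us-west1"; do
    name=${entry%%:*}      # text before the first colon
    location=${entry#*:}   # text after the first colon
    echo "== $name ($location) =="
    # Point kubectl at this cluster, then run the requested command.
    gcloud container clusters get-credentials "$name" --location "$location" || return 1
    "$@" || return 1
  done
}
```

For example, `for_each_cluster kubectl get ephemeralrunner --all-namespaces` lists the runners on both clusters. Note that afterwards `kubectl` is left pointing at the last cluster in the list.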