[CI] Update documentation after libcxx runner sets

boomanaiden154 · web-flow · commit 18c7e8081460 · 2025-07-18T17:46:05.000-07:00
There are a couple of places where we need to update the documentation to reflect changes made due to the introduction of the libc++ runner sets. I glanced through the documentation and updated everything I saw that was incorrect, but this patch might not be exhaustive.

Reviewers: lnihlen, gburgessiv, dschuff, Keenuts, cmtice

Reviewed By: cmtice

Pull Request: #510
diff --git a/premerge/architecture.md b/premerge/architecture.md
@@ -20,9 +20,9 @@ To balance cost/performance, we keep both types.
  - building & testing LLVM shall be done on self-hosted runners.
 
 LLVM has several flavor of self-hosted runners:
- - libcxx runners.
  - MacOS runners for HLSL managed by Microsoft.
  - GCP windows/linux runners managed by Google.
+ - GCP linux runners set up for libcxx, managed by Google.
 
 This document only focuses on Google's GCP hosted runners.
 
@@ -47,10 +47,11 @@ Any relevant differences are explicitly enumerated.
 
 Our runners are hosted on GCP Kubernetes clusters, and use the
 [Action Runner Controller (ARC)](https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/about-actions-runner-controller).
-The clusters have 3 pools:
+The clusters have 4 main pools:
   - llvm-premerge-linux
   - llvm-premerge-linux-service
   - llvm-premerge-windows
+  - llvm-premerge-libcxx
 
 **llvm-premerge-linux-service** is a fixed pool, only used to host the
 services required to manage the premerge infra (controller, listeners,
@@ -64,6 +65,11 @@ are `n2d-standard-64` due to quota limitations.
 VMs. Similar to the Linux pool, but this time it runs Windows workflows. In the
 US West cluster, the machines are `n2d-standard-32` due to quota limitations.
 
+**llvm-premerge-libcxx** is an auto-scaling pool with large `n2-standard-32`
+VMs. This is similar to the Linux pool but with smaller machines tailored
+to the libcxx testing workflows. In the US West cluster, the machines are
+`n2d-standard-32` due to quota limitations.
+
 ### Service pool: llvm-premerge-linux-service
 
 This pool runs all the services managing the presubmit infra.
@@ -87,7 +93,7 @@ How a job is run:
  - If the instance is not reused in the next 10 minutes, the autoscaler
    will turn down the instance, freeing resources.
 
-### Worker pools : llvm-premerge-linux, llvm-premerge-windows
+### Worker pools : llvm-premerge-linux, llvm-premerge-windows, llvm-premerge-libcxx
 
 To make sure each runner pod is scheduled on the correct pool (linux or
 windows, avoiding the service pool), we use labels and taints.
@@ -98,6 +104,7 @@ So if we do not enforce limits, the controller could schedule 2 runners on
 the same instance, forcing containers to share resources.
 
 Those bits are configures in the
-[linux runner configuration](linux_runners_values.yaml) and
-[windows runner configuration](windows_runner_values.yaml).
+[linux runner configuration](linux_runners_values.yaml),
+[windows runner configuration](windows_runner_values.yaml), and
+[libcxx runner configuration](libcxx_runners_values.yaml).
 
diff --git a/premerge/cluster-management.md b/premerge/cluster-management.md
@@ -57,6 +57,7 @@ will see 3 node pools:
 - llvm-premerge-linux
 - llvm-premerge-linux-service
 - llvm-premerge-windows
+- llvm-premerge-libcxx
 
 Definitions for each pool are in [Architecture overview](architecture.md).
 
@@ -96,9 +97,11 @@ To apply any changes to the cluster:
 terraform apply -target module.premerge_cluster_us_central.google_container_node_pool.llvm_premerge_linux_service
 terraform apply -target module.premerge_cluster_us_central.google_container_node_pool.llvm_premerge_linux
 terraform apply -target module.premerge_cluster_us_central.google_container_node_pool.llvm_premerge_windows
+terraform apply -target module.premerge_cluster_us_central.google_container_node_pool.llvm_premerge_libcxx
 terraform apply -target module.premerge_cluster_us_west.google_container_node_pool.llvm_premerge_linux_service
 terraform apply -target module.premerge_cluster_us_west.google_container_node_pool.llvm_premerge_linux
 terraform apply -target module.premerge_cluster_us_west.google_container_node_pool.llvm_premerge_windows
+terraform apply -target module.premerge_cluster_us_west.google_container_node_pool.llvm_premerge_libcxx
 terraform apply
 ```
 
@@ -145,6 +148,9 @@ on a kubernetes destroy command:
 ```bash
 terraform destroy -target module.premerge_cluster_us_central_resources.helm_release.github_actions_runner_set_linux
 terraform destroy -target module.premerge_cluster_us_central_resources.helm_release.github_actions_runner_set_windows
+terraform destroy -target module.premerge_cluster_us_central_resources.helm_release.github_actions_runner_set_libcxx
+terraform destroy -target module.premerge_cluster_us_central_resources.helm_release.github_actions_runner_set_libcxx_release
+terraform destroy -target module.premerge_cluster_us_central_resources.helm_release.github_actions_runner_set_libcxx_next
 ```
 
 These should complete, but if they do not, we are still able to get things
@@ -157,6 +163,9 @@ commands by deleting the kubernetes namespaces all the resources live in:
 ```bash
 terraform destroy -target module.premerge_cluster_us_central_resources.kubernetes_namespace.llvm_premerge_linux_runners
 terraform destroy -target module.premerge_cluster_us_central_resources.kubernetes_namespace.llvm_premerge_windows_runners
+terraform destroy -target module.premerge_cluster_us_central_resources.kubernetes_namespace.llvm_premerge_libcxx_runners
+terraform destroy -target module.premerge_cluster_us_central_resources.kubernetes_namespace.llvm_premerge_libcxx_release_runners
+terraform destroy -target module.premerge_cluster_us_central_resources.kubernetes_namespace.llvm_premerge_libcxx_next_runners
 ```
 
 If things go smoothly, these should complete quickly. If they do not complete,