| 1 | +# Notes about the continuous integration workflow |
| 2 | + |
| 3 | +The continuous integration workflow in [`ci.yml`](ci.yml) triggers automatically |
| 4 | +on pushes, pull requests, and merge queue events. It can also be [triggered |
| 5 | +manually](https://docs.github.com/en/actions/how-tos/manage-workflow-runs/manually-run-a-workflow) |
| 6 | +from the Actions tab on GitHub or through the GitHub APIs. Such manual |
| 7 | +invocations are useful for limited testing and debugging (particularly when a |
| 8 | +problem seems to show up only on GitHub), but for more substantial development |
| 9 | +and testing of the workflow itself, we use the |
| 10 | +[`act`](https://github.com/nektos/act) extension for GitHub's CLI program |
| 11 | +[`gh`](https://cli.github.com/) to run the workflow on a local computer. |
| 12 | + |
| 13 | +For the benefit of future Chromobius maintainers, this document summarizes how |
| 14 | +to set up an environment for working with [`act`](https://github.com/nektos/act) |
| 15 | +to run and test the CI workflow. |
| 16 | + |
| 17 | +## Local testing of the CI workflow with `act` |
| 18 | + |
| 19 | +The overall process consists of these steps, which are described in more detail |
| 20 | +in the subsections below: |
| 21 | + |
| 22 | +1. Clone the Chromobius repository to a local Linux computer. |
| 23 | + |
| 24 | +2. Install and configure the following programs: |
| 25 | + |
| 26 | + * The GitHub CLI program [`gh`](https://cli.github.com/) |
| 27 | + * The [`act` extension](https://nektosact.com/installation/gh.html) for `gh` |
| 28 | + * The free and open-source Docker Community Edition (CE) version of |
| 29 | + [Docker Engine](https://docs.docker.com/engine/#licensing) (note: this is not |
| 30 | + the same as Docker Desktop, which is _not_ needed) |
| 31 | + |
| 32 | +3. Create a Docker image that will be used by `gh act` to run the GitHub |
| 33 | + Actions workflow in `ci.yml`. |
| 34 | + |
| 35 | +4. Run `gh act` with specific arguments, observe the results of the run, edit |
| 36 | + the workflow file (if necessary), and repeat until satisfied. |
| 37 | + |
| 38 | +Note that the CI workflow in `ci.yml` contains a build step with a matrix of |
| 39 | +Linux, macOS, and Windows operating systems. It is not possible to run all of |
| 40 | +them on the same machine because of architectural differences, so when we test |
| 41 | +the workflow locally, we tell `gh act` to select a subset of the matrix. This is |
| 42 | +explained below. |
| 43 | + |
| 44 | +<a class="anchor" id="creating-runner-images"></a> |
| 45 | +### Creation of Docker images to use as workflow job runners |
| 46 | + |
| 47 | +For `gh act` to run a workflow, it needs to be told what Docker images to use |
| 48 | +for job runners. It is usually not possible to use GitHub's actual runner images |
| 49 | +(even though they are made freely available by GitHub) due to differences in |
| 50 | +hardware assumptions. Thankfully, some approximations to the GitHub images are |
| 51 | +available from other sources. For our testing, we create customized versions of |
| 52 | +runners that pre-install some software known to be provided on GitHub. |
| 53 | + |
| 54 | +For this project, here is the `Dockerfile` we use for the Linux runner: |
| 55 | + |
| 56 | +```dockerfile |
| 57 | +# Start from a base image that is already configured for act. |
| 58 | +# The hash below is for the image tagged act-24.04-20251102. |
| 59 | +FROM ghcr.io/catthehacker/ubuntu@sha256:8943e69edcada5141b8c1fcc1a84bab15568a49f438387bd858cb3e4df5a436d |
| 60 | + |
| 61 | +# Switch to the root user to have permission to install packages. |
| 62 | +USER root |
| 63 | + |
| 64 | +# Add some software that is pre-installed on GitHub Linux runners. |
| 65 | +RUN apt-get update && \ |
| 66 | + apt-get install -y --no-install-recommends \ |
| 67 | + clang \ |
| 68 | + cmake \ |
| 69 | + golang-go \ |
| 70 | + libclang-dev \ |
| 71 | + libclang-rt-dev \ |
| 72 | + ninja-build \ |
| 73 | + python3 python3-dev cython3 \ |
| 74 | + shellcheck \ |
| 75 | + yamllint \ |
| 76 | + && \ |
| 77 | + # Clean up the apt cache to keep the image small. |
| 78 | + rm -rf /var/lib/apt/lists/* |
| 79 | +``` |
| 80 | + |
| 81 | +Here are the shell commands used to build the image: |
| 82 | + |
| 83 | +```shell |
| 84 | +docker build -t ubuntu-act:latest . |
| 85 | +docker image prune |
| 86 | +``` |
| 87 | + |
| 88 | +The Docker image will be named `ubuntu-act`. This name is mapped to the names of |
| 89 | +GitHub runners used in `ci.yml` in a way explained in the next subsection. |
| 90 | + |
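Because `gh act` is pointed at a locally built image, it can help to confirm that the image exists before starting a run. The following is a minimal sketch, not part of the project's tooling: the function name `runner_image_exists` is hypothetical, and it relies only on `docker image inspect`, which exits with a nonzero status when the named image is not present locally.

```shell
# Hypothetical helper (not part of the Chromobius repository): succeed only if
# the named Docker image exists locally. With no argument, it checks the
# image name used in this document.
runner_image_exists() {
    docker image inspect "${1:-ubuntu-act:latest}" >/dev/null 2>&1
}
```

For example, running `runner_image_exists || docker build -t ubuntu-act:latest .` from the directory containing the `Dockerfile` would rebuild the image only when it is missing.
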
| 91 | +### Configuration of `act` |
| 92 | + |
| 93 | +`gh act` reads a configuration file that can be used to set some run-time |
| 94 | +parameters. This can be used to map the name of the Docker image built in the |
| 95 | +step above to the name of the runners used in the workflow. Certain other |
| 96 | +parameters are also essential to provide, notably `--pull=false`. Here is an |
| 97 | +example of a `~/.actrc` file: |
| 98 | + |
| 99 | +```shell |
| 100 | +# The -P flag maps a GitHub runner name (inside the workflow file) to the name |
| 101 | +# of a Docker image on the local computer. The following maps the runner named |
| 102 | +# "ubuntu-24.04" (used in ci.yml) to the local docker image "ubuntu-act". |
| 103 | +-P ubuntu-24.04=ubuntu-act:latest |
| 104 | + |
| 105 | +# If using a local docker image for the job runners, need to use --pull=false |
| 106 | +# or else will get the error "Error response from daemon: pull access denied". |
| 107 | +--pull=false |
| 108 | + |
| 109 | +# This tells act where to put artifacts saved using `actions/upload-artifact`. |
| 110 | +--artifact-server-path /tmp/act-artifacts |
| 111 | + |
| 112 | +# These are some miscellaneous performance improvements. |
| 113 | +--use-new-action-cache |
| 114 | +--action-offline-mode |
| 115 | + |
| 116 | +# This tells act to remove containers after workflow failures. |
| 117 | +--rm |
| 118 | +``` |
| 119 | + |
| 120 | +### Running `gh act` |
| 121 | + |
| 122 | +The following is an example of a command we use to run the workflow in debug |
| 123 | +mode. The command is meant to be executed from the top level of the Chromobius |
| 124 | +source directory. Note that this example shows how to select a specific OS from |
| 125 | +the matrix in `build_dist` (namely the entries using `ubuntu-24.04` as the |
| 126 | +operating system); this matrix selection value would need to be changed when |
| 127 | +running this command on a different operating system and hardware architecture. |
| 128 | + |
| 129 | +```shell |
| 130 | +gh act workflow_dispatch \ |
| 131 | + --matrix os:ubuntu-24.04 \ |
| 132 | + --input debug=true \ |
| 133 | + --input upload_to_pypi=false \ |
| 134 | + --env GITHUB_WORKFLOW_REF=refs/heads/main \ |
| 135 | + --no-recurse -W .github/workflows/ci.yml |
| 136 | +``` |
| 137 | + |
| 138 | +The `--input` options in the command line above are used to set variables that |
| 139 | +are used in the workflow to change some behaviors when debugging. The `--env` |
| 140 | +option sets the `GITHUB_WORKFLOW_REF` environment variable that is normally set |
| 141 | +by GitHub when a workflow is running in that environment. |
| 142 | + |
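When iterating on the workflow, retyping the full command gets tedious. One possible convenience, sketched here under the assumption that a small shell function is acceptable in your environment, is a wrapper that takes the matrix OS as its first argument and forwards any further options to `gh act`. The name `run_ci_locally` is hypothetical; every flag it passes is taken from the command above.

```shell
# Hypothetical wrapper (not part of the Chromobius repository) around the
# `gh act` command shown in this section. The first argument selects the
# matrix OS (default: ubuntu-24.04); any remaining arguments are passed
# through to `gh act` unchanged.
run_ci_locally() {
    local os="${1:-ubuntu-24.04}"
    # Drop the OS argument, if one was given, so "$@" holds only the extras.
    if [ "$#" -gt 0 ]; then shift; fi
    gh act workflow_dispatch \
        --matrix "os:${os}" \
        --input debug=true \
        --input upload_to_pypi=false \
        --env GITHUB_WORKFLOW_REF=refs/heads/main \
        --no-recurse -W .github/workflows/ci.yml \
        "$@"
}
```

Run from the top level of the Chromobius source directory, `run_ci_locally ubuntu-24.04 --verbose` would then be equivalent to the command above with `--verbose` appended.
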
| 143 | +### Miscellaneous tips |
| 144 | + |
| 145 | +Sometimes it's useful to add the `--verbose` option to the `gh act` command |
| 146 | +above to get more information about what is happening. |
| 147 | + |
| 148 | +If the workflow running inside `act` inexplicably starts producing inconsistent |
| 149 | +errors, such as a program like `bazel` not being found on one run when it was |
| 150 | +found on the previous run, or well-known actions (e.g., `actions/setup-python`) |
| 151 | +generating internal errors, the first thing to suspect is problems with caching. |
| 152 | +Here are some things to try: |
| 153 | + |
| 154 | +1. A possible cause of random workflow errors is corruption in the `act` cache. |
| 155 | + (This can happen when runs are terminated using, e.g., |
| 156 | + <kbd>control</kbd><kbd>c</kbd>.) To resolve this, delete the cache contents: |
| 157 | + |
| 158 | + 1. Delete all artifacts in the `act` artifact directory. Assuming you are |
| 159 | + using `/tmp/act-artifacts` for the artifact directory, do this: |
| 160 | + |
| 161 | + ```shell |
| 162 | + rm -rf /tmp/act-artifacts/* |
| 163 | + ``` |
| 164 | + |
| 165 | + 2. Delete everything in the `act` cache directories (by default, |
| 166 | + `$HOME/.cache/act/` and `$HOME/.cache/actcache/`): |
| 167 | + |
| 168 | + ```shell |
| 169 | + rm -rf ~/.cache/act/* |
| 170 | + rm -rf ~/.cache/actcache/* |
| 171 | + ``` |
| 172 | + |
| 173 | +2. If clearing the artifacts and caches as described above does not stop |
| 174 | + random flaky behavior, the next thing to try is to add the option |
| 175 | + `--no-cache-server` to the `gh act` command. If the random errors stop, then |
| 176 | + the cache server is the likely culprit. You can then try to get |
| 177 | + some parallelism back by replacing `--no-cache-server` with the |
| 178 | + `--concurrent-jobs` option and a low number like 4 or 2. Reducing the |
| 179 | + maximum concurrent jobs will reduce performance, but that may be the price |
| 180 | + for avoiding random errors. If random errors resurface, then it may be |
| 181 | + necessary to use `--no-cache-server` all the time on your system. |
| 182 | + |
| 183 | +3. If the steps above did not stop random errors, the last thing to try is to |
| 184 | + delete the Docker containers and volumes: |
| 185 | + |
| 186 | + ```shell |
| 187 | + docker system prune --all |
| 188 | + docker volume prune --all |
| 189 | + ``` |
| 190 | + |
| 191 | + This is a bit of a sledgehammer, unfortunately: if you created a Docker |
| 192 | + image as described in the [section above](#creating-runner-images), then |
| 193 | + running the pruning commands will remove it and you will need to recreate the |
| 194 | + image. |
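
The artifact and cache cleanup from step 1 above can be collected into one function, which is handy when flaky runs recur. This is only a sketch under stated assumptions: the function name `clear_act_state` and the three environment variables used to override the paths are hypothetical, and the default paths are the ones mentioned in this document. Unlike `rm -rf dir/*`, it also removes hidden files inside each directory.

```shell
# Hypothetical helper (not part of the Chromobius repository): empty the act
# artifact directory and the two act cache directories, keeping the
# directories themselves. The ACT_* variables are illustrative overrides,
# not settings that act itself reads.
clear_act_state() {
    local dir
    for dir in \
        "${ACT_ARTIFACT_DIR:-/tmp/act-artifacts}" \
        "${ACT_CACHE_DIR:-$HOME/.cache/act}" \
        "${ACT_CACHE_SERVER_DIR:-$HOME/.cache/actcache}"
    do
        # Skip directories that do not exist.
        [ -d "$dir" ] || continue
        # Delete everything below the directory, including dotfiles.
        find "$dir" -mindepth 1 -delete
        echo "cleared $dir"
    done
}
```

Calling `clear_act_state` with no arguments clears the default locations. It does not touch Docker containers, images, or volumes, so the runner image survives.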