We can catch more bugs earlier if we have a test case that does the following:
- set up a coupled simulation
- run for 1 step
- check that the fluxes in the atmosphere match those of each surface
This can be a separate script under experiments/ that runs in Buildkite for each PR, so it doesn't slow down runtests.
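
The check in the last bullet could look roughly like the sketch below. It is written in plain Python only to illustrate the logic and is not the coupler's API. The names `combined_surface_flux` and `fluxes_match`, the dictionary layout, and the tolerances are all hypothetical. It also assumes that "match" means the atmosphere's flux equals the area-fraction-weighted sum of the per-surface fluxes at each grid point, which may not be the exact invariant intended here.

```python
import math


def combined_surface_flux(surface_fluxes, area_fractions):
    """Area-fraction-weighted sum of the per-surface fluxes at each grid point.

    Both arguments are hypothetical dicts mapping a surface name
    (e.g. "land", "ocean") to a list of values, one per grid point.
    """
    names = list(surface_fluxes)
    n_points = len(surface_fluxes[names[0]])
    return [
        sum(area_fractions[name][i] * surface_fluxes[name][i] for name in names)
        for i in range(n_points)
    ]


def fluxes_match(atmos_flux, surface_fluxes, area_fractions, rtol=1e-10, atol=1e-12):
    """Return True if the atmosphere flux agrees with the combined surface flux."""
    combined = combined_surface_flux(surface_fluxes, area_fractions)
    if len(combined) != len(atmos_flux):
        return False
    return all(
        math.isclose(a, c, rel_tol=rtol, abs_tol=atol)
        for a, c in zip(atmos_flux, combined)
    )


# Made-up stand-in values for the state after one coupled step.
surface_fluxes = {"land": [10.0, 20.0, 0.0], "ocean": [5.0, 0.0, 8.0]}
area_fractions = {"land": [0.5, 1.0, 0.0], "ocean": [0.5, 0.0, 1.0]}
atmos_flux = [7.5, 20.0, 8.0]

if not fluxes_match(atmos_flux, surface_fluxes, area_fractions):
    raise SystemExit("atmosphere and surface fluxes disagree after 1 step")
```

In the actual experiments/ script, the stand-in values would be replaced by the fields read from the coupled simulation after one step, and a nonzero exit would fail the Buildkite job.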