As a next step, run EvolveGhSingleBlackHole on the GPU with all control systems disabled. At that point profiling becomes meaningful, since the GH equations should do enough work per grid point to hide some latency.
Requires:
- support generic internal boundaries (i.e. mortars)
- Kokkos kernel for ExponentialFilter
- port over TimeDerivative, UpwindPenalty, freezing outer boundary corrections
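
For the ExponentialFilter item, a rough host-side reference for what the kernel needs to compute may help when checking a port. This is only a sketch. It assumes the usual modal form of the filter, where coefficient i is scaled by exp(-alpha * (i / N)^(2 * half_power)) and N is the highest mode index. The function names `exponential_filter_factors` and `apply_filter` are hypothetical and are not taken from the existing code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: per-mode damping factors for a 1D exponential filter.
// Assumed form: factor[i] = exp(-alpha * (i / N)^(2 * half_power)),
// with N = num_modes - 1 the highest mode index.
std::vector<double> exponential_filter_factors(const std::size_t num_modes,
                                               const double alpha,
                                               const unsigned half_power) {
  std::vector<double> factors(num_modes, 1.0);
  if (num_modes < 2) {
    return factors;  // a single mode is left untouched
  }
  const double max_mode = static_cast<double>(num_modes - 1);
  for (std::size_t i = 0; i < num_modes; ++i) {
    const double ratio = static_cast<double>(i) / max_mode;
    factors[i] = std::exp(-alpha * std::pow(ratio, 2.0 * half_power));
  }
  return factors;
}

// Hypothetical helper: scale modal coefficients in place. Each entry is
// independent of the others, so this loop is the part a device kernel
// would parallelize.
void apply_filter(std::vector<double>& modal_coeffs,
                  const std::vector<double>& factors) {
  assert(modal_coeffs.size() == factors.size());
  for (std::size_t i = 0; i < modal_coeffs.size(); ++i) {
    modal_coeffs[i] *= factors[i];
  }
}
```

The factors depend only on the number of modes and the filter parameters, so one option is to compute them once on the host and copy them to the device. Whether that fits the actual ExponentialFilter implementation is untested here.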