When running the distributed test with a CUDA backend, there's a segfault. It can be fixed easily with CUDA_LAUNCH_BLOCKING, but that's not ideal. Below are the commands to repro:
Server:
$ bash examples/distributed/serve.sh
Client:
$ bash examples/distributed/client.sh