Enable autotuning for distributed kernels launched via torchrun. An event-based benchmarker is available at `autotuner/benchmarker.py` in PR https://github.com/pytorch/helion/pull/393.

## How to enable

1. Make sure all torchrun workers benchmark the same configs in the same order.
2. The master rank (rank 0) decides on the best config and communicates it to all processes.
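The two requirements above can be sketched as a small protocol, independent of any particular collective library. This is a conceptual sketch, not the PR's implementation: `config_order`, `autotune`, and the `broadcast_from_rank0` callback are hypothetical names, and the fake in-process "broadcast" below stands in for a real collective such as `torch.distributed.broadcast_object_list`.

```python
def config_order(configs):
    # Requirement 1: every rank must benchmark the same configs in the
    # same order. Sorting by a stable key (here: repr) makes the order
    # deterministic regardless of how each rank enumerated the space.
    return sorted(configs, key=repr)

def autotune(rank, configs, benchmark, broadcast_from_rank0):
    ordered = config_order(configs)
    timings = [benchmark(c) for c in ordered]  # all ranks benchmark in lockstep
    # Requirement 2: only rank 0 picks the winner...
    best = min(range(len(ordered)), key=timings.__getitem__) if rank == 0 else None
    # ...and broadcasts its index so every rank ends up with the same config.
    best = broadcast_from_rank0(best)
    return ordered[best]

# Single-process simulation of a 2-rank job: a shared dict plays the
# role of the collective broadcast (rank 0 writes, others read).
shared = {}
def make_broadcast(rank):
    def bcast(value):
        if rank == 0:
            shared["best"] = value
        return shared["best"]
    return bcast

configs = [{"block": 64}, {"block": 128}, {"block": 32}]
fake_benchmark = lambda c: abs(c["block"] - 64)  # pretend block=64 is fastest
picked_rank0 = autotune(0, configs, fake_benchmark, make_broadcast(0))
picked_rank1 = autotune(1, configs, fake_benchmark, make_broadcast(1))
assert picked_rank0 == picked_rank1 == {"block": 64}
```

In a real torchrun job the broadcast callback would wrap a collective over the process group, so the decision made on rank 0 reaches every worker before any rank compiles with the chosen config.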