[RL][Feature Request] Turn on torch.compile + cudagraphs for trainer definition

With the Generator sped up by vLLM's `support_torch_compile`, the trainer is the new bottleneck.

Let's enable `torch.compile` and CUDA graphs in the trainer to get a similar speedup.
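A minimal sketch of what this could look like, assuming the trainer's model is a plain `torch.nn.Module`. The helper names `choose_compile_mode` and `compile_trainer_model` are hypothetical and not part of this repo. The only real API used is `torch.compile` with its `mode` argument, where `"reduce-overhead"` is the mode that uses CUDA graphs.

```python
from typing import Any


def choose_compile_mode(enable_cudagraphs: bool, cuda_available: bool) -> str:
    """Pick a torch.compile mode (hypothetical helper).

    "reduce-overhead" is the torch.compile mode that uses CUDA graphs,
    so only select it when CUDA graphs are requested and a GPU is present.
    """
    if enable_cudagraphs and cuda_available:
        return "reduce-overhead"
    return "default"


def compile_trainer_model(model: Any, enable_cudagraphs: bool = True) -> Any:
    """Wrap the trainer's model with torch.compile (hypothetical helper)."""
    import torch  # imported lazily so the mode helper works without torch

    mode = choose_compile_mode(enable_cudagraphs, torch.cuda.is_available())
    return torch.compile(model, mode=mode)
```

CUDA graphs replay work recorded for fixed input shapes, so the actual gain likely depends on how static the trainer's batch shapes are. Falling back to `"default"` keeps the same code path usable on machines without a GPU.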