I think the CUDA kernel files in `detection/ops/src/cuda` do not support fp16 or bf16. However, the config enables fp16 for mixed-precision training. I am confused about whether fp16 is actually supported in this project.