After converting the ONNX fp32 model to a TensorRT fp16 engine, numerical overflow occurs during inference. The model was trained in PyTorch with automatic mixed precision (AMP), and inference with the ONNX fp32 model works correctly. So it may be necessary to manually pin certain layers to fp32 in TensorRT. Is it possible to determine which TensorRT layers should stay in full precision from the dtypes that the ops actually ran in during PyTorch mixed-precision training? How can this be done?
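One way we imagine checking this on the PyTorch side (a minimal sketch, not something we have validated end to end) is to run a single forward pass under autocast and record the output dtype of every module with forward hooks; modules whose outputs stay `torch.float32` under autocast are candidates to pin to fp32 in TensorRT. The model/input names here are placeholders:

```python
import torch

def record_autocast_dtypes(model, example_input):
    """Run one forward pass under autocast and record each module's output dtype."""
    dtypes = {}

    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor):
                dtypes[name] = output.dtype  # float16 vs float32 under autocast
        return hook

    handles = [m.register_forward_hook(make_hook(n)) for n, m in model.named_modules()]
    try:
        with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
            model(example_input)
    finally:
        for h in handles:
            h.remove()
    return dtypes

# dtypes = record_autocast_dtypes(model.cuda().eval(), dummy_input.cuda())
# fp32_modules = [name for name, dt in dtypes.items() if dt == torch.float32]
```

Is mapping these module names back onto the corresponding ONNX/TensorRT layers a reasonable approach, or is there a more direct way?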
On the TensorRT side, we have already tried keeping some of the usual layers in fp32 (Softmax, LayerNorm, Sigmoid, ...), with no luck.
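For reference, this is roughly how we set the per-layer precision constraints (a sketch with the ONNX path as a placeholder; the exact `trt.LayerType` values available, e.g. `NORMALIZATION`, depend on the TensorRT version, and precision constraints are only honored when the corresponding builder flag is set):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Without this flag the builder may ignore per-layer precision settings
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

# Layer types we tried to pin to fp32 (version-dependent set)
fp32_types = {trt.LayerType.SOFTMAX, trt.LayerType.ACTIVATION}

for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.type in fp32_types:
        layer.precision = trt.float32
        layer.set_output_type(0, trt.float32)

engine_bytes = builder.build_serialized_network(network, config)
```

The engine still overflows with this, so either we are pinning the wrong layers or the constraints are not being applied the way we expect.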