Labels
Module:Quantization (Issues related to Quantization), Module:Runtime (Other generic runtime issues that do not fall into other modules)
Description
Title
Jetson Thor (SM 110) advertises FP8/FP4 throughput but TensorRT silently falls back to FP32 when FP8/FP4 flags are enabled
Summary
- Expectation: Jetson Thor datasheet claims 1,035 TFLOPS (Dense FP4 | Sparse FP8 | Sparse INT8) and 517 TFLOPS (Dense FP8 | Sparse FP16), so the platform should accelerate FP8/FP4 in TensorRT.
- Reality: On Jetson Thor (compute capability 11.0), TensorRT 10.13.3.9 accepts BuilderFlag.FP8 / BuilderFlag.FP4 but silently builds FP32 engines (larger files, DataType.FLOAT outputs, FP32-scale weights). No error or warning indicates the fallback.
- Impact: Users targeting the advertised low-precision formats waste time debugging. Real throughput stays at FP32, contrary to the product specs.
Environment
- Device: NVIDIA Jetson Thor developer kit (GPU compute capability 11.0 / SM 110)
- OS: Jetson Linux (default Thor image)
- CUDA: 13.0
- TensorRT: 10.13.3.9 (Python API via /usr/bin/python3)
- Python: 3.12
- Model: examples/gpt2.onnx from https://github.com/commaai/bodyjim/blob/master/examples/roam.py (comma.ai bodyjim)
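The TensorRT and Python versions listed above can be confirmed directly on the device; a minimal check (nothing beyond the standard library and the tensorrt package is assumed):

```python
import sys
import tensorrt as trt

# Print the interpreter and TensorRT versions to confirm the environment above.
print("Python:", sys.version.split()[0])
print("TensorRT:", trt.__version__)
```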
Reproduction Steps
- Prepare Jetson Thor with TensorRT 10.13.3.9 and CUDA 13.0.
- Parse an ONNX model and request FP8:
```python
import tensorrt as trt
from pathlib import Path

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

onnx_path = Path("examples/gpt2.onnx")
with open(onnx_path, "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 << 30)
config.set_flag(trt.BuilderFlag.FP8)  # Same behavior with FP4

serialized_engine = builder.build_serialized_network(network, config)
Path("gpt2_fp8.plan").write_bytes(serialized_engine)
```
- Inspect the plan:
```python
runtime = trt.Runtime(logger)
engine = runtime.deserialize_cuda_engine(Path("gpt2_fp8.plan").read_bytes())
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    dtype = engine.get_tensor_dtype(name)
    print(name, dtype)
```
- Compare plan sizes:
```
# FP16 reference (TensorRT flag FP16): gpt2_fp16.plan ≈ 199 MB
# FP8 request:                         gpt2_fp8.plan  ≈ 382 MB (roughly double FP16, matching FP32)
```
- Observe the TensorRT console logs:
```
[TRT] [I] Total Weights Memory: 399224448 bytes
...
Outputs: logits → DataType.FLOAT
```
- Same behavior with the FP4 flag: plan size ≈ 382 MB, outputs still DataType.FLOAT.
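As a follow-up to the "Inspect the plan" step, per-layer precision can also be dumped with the engine inspector; a minimal sketch, assuming the plan is rebuilt with detailed profiling verbosity so per-layer details are recorded:

```python
import tensorrt as trt
from pathlib import Path

# Sketch: dump per-layer engine information to see each layer's actual precision.
# Assumes the builder config also set
#     config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
# before building; otherwise the output may omit per-layer precision details.
logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
engine = runtime.deserialize_cuda_engine(Path("gpt2_fp8.plan").read_bytes())
inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))
```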
Actual Behavior
- build_serialized_network returns a plan without errors.
- Generated engine sizes match FP32, not FP8/FP4.
- Runtime reports outputs as FP32 (DataType.FLOAT).
- No warning indicates the fallback. Users believe FP8 succeeded but receive FP32 performance.
Expected Behavior
- Builder should fail fast (or emit an explicit warning) if FP8/FP4 are unsupported on the target SM.
- Alternatively, TensorRT should honor the hardware claims and produce true FP8/FP4 engines on Jetson Thor.
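Until the builder fails fast as requested above, a user-side guard can at least catch the silent fallback; below is a sketch (the helper name is ours, and it only checks the symptom visible in this report, i.e. FP32 I/O dtypes on a plan built with the FP8/FP4 flags):

```python
import tensorrt as trt
from pathlib import Path

def check_no_fp32_fallback(plan_path: str, logger: trt.Logger) -> None:
    """Sketch of a post-build guard: raise if a plan that requested FP8/FP4
    still reports FP32 I/O tensors, the symptom described in this report."""
    runtime = trt.Runtime(logger)
    engine = runtime.deserialize_cuda_engine(Path(plan_path).read_bytes())
    fp32_io = [
        engine.get_tensor_name(i)
        for i in range(engine.num_io_tensors)
        if engine.get_tensor_dtype(engine.get_tensor_name(i)) == trt.DataType.FLOAT
    ]
    if fp32_io:
        raise RuntimeError(
            f"{plan_path}: FP8/FP4 was requested but these I/O tensors are FP32: {fp32_io}"
        )

# Usage with the plan from the reproduction above:
# check_no_fp32_fallback("gpt2_fp8.plan", trt.Logger(trt.Logger.WARNING))
```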
Additional Context
- Jetson Thor architecture sheet advertises:
- 1,035 TFLOPS (Dense FP4 | Sparse FP8 | Sparse INT8)
- 517 TFLOPS (Dense FP8 | Sparse FP16)
- 2070 TFLOPS (Sparse FP4)
- Jetson Thor GPU is compute capability 11.0, so FP8/FP4 should be available if the specs are accurate (a quick way to confirm the reported capability is sketched after this list).
- FP16 builds succeed with the expected size (~199 MB) and DataType.HALF outputs.
- Without the verification check we added, customers silently run FP32.
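A quick way to confirm the reported compute capability, assuming pycuda is available on the device (any CUDA binding that exposes device properties works equally well):

```python
# Sketch: confirm the GPU reports compute capability 11.0 (SM 110).
# Assumes the pycuda package is installed on the Jetson Thor device.
import pycuda.driver as cuda

cuda.init()
major, minor = cuda.Device(0).compute_capability()
print(f"GPU 0 compute capability: {major}.{minor}")
```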
Request
Please clarify:
- Is FP8/FP4 officially supported on Jetson Thor in TensorRT 10.13.3.9?
- If not, can TensorRT fail with an explicit message instead of silently building FP32?
- If yes, how can we produce true FP8/FP4 engines on SM 110?
We’re ready to share full scripts, logs, and plan files if needed.