Jetson Thor (SM 110) advertises FP8/FP4 throughput but TensorRT silently falls back to FP32 when FP8/FP4 flags are enabled #4590

@bccw2021

Description

Summary

  • Expectation: Jetson Thor datasheet claims 1,035 TFLOPS (Dense FP4 | Sparse FP8 | Sparse INT8) and 517 TFLOPS (Dense FP8 | Sparse FP16), so the platform should accelerate FP8/FP4 in TensorRT.
  • Reality: On Jetson Thor (compute capability 11.0), TensorRT 10.13.3.9 accepts BuilderFlag.FP8/BuilderFlag.FP4 but silently builds FP32 engines (FP32-sized plan files, DataType.FLOAT outputs, FP32-sized weight memory). No error or warning indicates the fallback.
  • Impact: Users targeting advertised low-precision formats waste time debugging. Real throughput stays at FP32, contrary to product specs.

Environment

  • Platform: Jetson Thor (GPU compute capability 11.0 / SM 110)
  • TensorRT: 10.13.3.9
  • CUDA: 13.0

Reproduction Steps

  1. Prepare Jetson Thor with TensorRT 10.13.3.9 and CUDA 13.0.
  2. Parse an ONNX model and request FP8:
    import tensorrt as trt
    from pathlib import Path
    
    logger = trt.Logger(trt.Logger.VERBOSE)
    builder = trt.Builder(logger)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    
    onnx_path = Path("examples/gpt2.onnx")
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):  # surface parser errors instead of ignoring them
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError(f"failed to parse {onnx_path}")
    
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 4 << 30)  # 4 GiB workspace
    config.set_flag(trt.BuilderFlag.FP8)          # same behavior with trt.BuilderFlag.FP4
    
    serialized_engine = builder.build_serialized_network(network, config)
    Path("gpt2_fp8.plan").write_bytes(serialized_engine)
  3. Inspect the plan's I/O dtypes (a per-layer precision check with the engine inspector is sketched after this list):
    runtime = trt.Runtime(logger)
    engine = runtime.deserialize_cuda_engine(Path("gpt2_fp8.plan").read_bytes())
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        dtype = engine.get_tensor_dtype(name)
        print(name, dtype)
  4. Compare plan sizes:
    # FP16 reference (TensorRT flag FP16):
    gpt2_fp16.plan ≈ 199 MB
    
    # FP8 request:
    gpt2_fp8.plan ≈ 382 MB  (roughly double FP16, matching FP32)
    
  5. Observe TensorRT console logs:
    [TRT] [I] Total Weights Memory: 399224448 bytes
    ...
    Outputs:
      logits → DataType.FLOAT
    
  6. Same behavior with FP4 flag: plan size ≈ 382 MB, outputs still DataType.FLOAT.
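
Beyond I/O dtypes, per-layer precision can be checked with the engine inspector; note that per-layer details are only emitted when the engine is built with config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED (the default reports layer names only). A minimal sketch against the plan from step 2:

    import json
    from pathlib import Path
    
    import tensorrt as trt
    
    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    engine = runtime.deserialize_cuda_engine(Path("gpt2_fp8.plan").read_bytes())
    
    # Dump per-layer information as JSON and search it for FP8.
    inspector = engine.create_engine_inspector()
    info = json.loads(inspector.get_engine_information(trt.LayerInformationFormat.JSON))
    layers = info.get("Layers", [])
    fp8_layers = [layer for layer in layers if "FP8" in json.dumps(layer)]
    print(f"{len(fp8_layers)} of {len(layers)} layers mention FP8")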

Actual Behavior

  • build_serialized_network returns a plan without errors.
  • Generated engine sizes match FP32, not FP8/FP4.
  • Runtime reports outputs as FP32 (DataType.FLOAT).
  • No warning indicates the fallback; users believe FP8 succeeded but receive FP32 performance (a logger-capture sketch documenting this follows this list).
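
To document that no fallback warning is emitted, the build can be run with a trt.ILogger subclass that records every warning; a minimal sketch (use it as the logger in the step 2 script):

    import tensorrt as trt
    
    class CapturingLogger(trt.ILogger):
        """Records WARNING-and-worse messages emitted during the build."""
        SEVERE = (trt.ILogger.Severity.INTERNAL_ERROR,
                  trt.ILogger.Severity.ERROR,
                  trt.ILogger.Severity.WARNING)
    
        def __init__(self):
            trt.ILogger.__init__(self)
            self.warnings = []
    
        def log(self, severity, msg):
            if severity in self.SEVERE:
                self.warnings.append(msg)
    
    logger = CapturingLogger()
    # ... run the step 2 build with this logger, then:
    print([m for m in logger.warnings if "FP8" in m.upper()])
    # Expected to print an empty list, matching the silent fallback described above.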

Expected Behavior

  • The builder should fail fast (or emit an explicit warning) if FP8/FP4 are unsupported on the target SM; a user-side guard approximating this is sketched after this list.
  • Alternatively, TensorRT should honor the hardware claims and produce true FP8/FP4 engines on Jetson Thor.
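
Until the builder itself does this, a user-side guard can approximate fail-fast behavior: build, deserialize, and raise if the requested format never appears in the layer information. A minimal sketch (build_or_fail is our name, not a TensorRT API; it relies on the DETAILED profiling verbosity noted in the inspector sketch above):

    import tensorrt as trt
    
    def build_or_fail(builder, network, config, logger, precision="FP8"):
        """Build an engine; raise if `precision` never appears in any layer."""
        config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
        plan = builder.build_serialized_network(network, config)
        if plan is None:
            raise RuntimeError("build_serialized_network failed")
        engine = trt.Runtime(logger).deserialize_cuda_engine(plan)
        inspector = engine.create_engine_inspector()
        info = inspector.get_engine_information(trt.LayerInformationFormat.JSON)
        if precision not in info:
            raise RuntimeError(
                f"{precision} was requested but no {precision} layer was built; "
                "the engine silently fell back to a higher precision")
        return plan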

Additional Context

  • Jetson Thor architecture sheet advertises:
    • 1,035 TFLOPS (Dense FP4 | Sparse FP8 | Sparse INT8)
    • 517 TFLOPS (Dense FP8 | Sparse FP16)
    • 2070 TFLOPS (Sparse FP4)
  • Jetson Thor GPU is compute capability 11.0, so FP8/FP4 should be available if the specs are accurate.
  • FP16 builds succeed with expected size (~199 MB) and DataType.HALF outputs.
  • Without the verification we added (sketched below), customers would silently run FP32.
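
That verification amounts to a size-and-dtype comparison against the FP16 reference; a minimal sketch (plan file names as in the reproduction steps):

    from pathlib import Path
    
    import tensorrt as trt
    
    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    
    for plan in ("gpt2_fp16.plan", "gpt2_fp8.plan"):
        data = Path(plan).read_bytes()
        engine = runtime.deserialize_cuda_engine(data)
        dtypes = {engine.get_tensor_dtype(engine.get_tensor_name(i))
                  for i in range(engine.num_io_tensors)}
        print(f"{plan}: {len(data) / 2**20:.0f} MB, I/O dtypes: {dtypes}")
    # An "FP8" plan roughly twice the FP16 size with FLOAT-only outputs
    # indicates the FP8 flag was ignored.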

Request

Please clarify:

  1. Is FP8/FP4 officially supported on Jetson Thor in TensorRT 10.13.3.9?
  2. If not, can TensorRT fail with an explicit message instead of silently building FP32?
  3. If yes, how can we produce true FP8/FP4 engines on SM 110? (A tentative explicit-quantization sketch follows.)

We’re ready to share full scripts, logs, and plan files if needed.
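
For question 3: on architectures where TensorRT does build FP8 engines, the documented route is explicit quantization, i.e. inserting Q/DQ nodes into the model before export (for example with NVIDIA's nvidia-modelopt package) rather than relying on BuilderFlag.FP8 alone. Whether that path works on SM 110 is exactly what question 1 asks. A sketch of the usual workflow on a toy PyTorch model, assuming nvidia-modelopt is installed; export details may vary by modelopt version:

    import modelopt.torch.quantization as mtq
    import torch
    
    model = torch.nn.Linear(16, 16)                  # stand-in for the real model
    calib_batches = [torch.randn(1, 16) for _ in range(8)]
    
    def calibrate(m):
        # Run a few representative batches so activation ranges can be collected.
        for batch in calib_batches:
            m(batch)
    
    model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop=calibrate)
    # Export with Q/DQ nodes, then parse the ONNX as in step 2 with BuilderFlag.FP8 set.
    torch.onnx.export(model, torch.randn(1, 16), "model_fp8_qdq.onnx")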


Labels

Module:Quantization (Issues related to Quantization), Module:Runtime (Other generic runtime issues that do not fall into other modules)
