
How to use half precision ONNX models? #1447

@richarddd

Description


Question

Hi,

I just exported a detection model to fp16 using optimum (--dtype fp16).

This is my pipeline:

const buffer = await fs.readFile("image3.jpg");
const blob = new Blob([buffer]);
const image = await RawImage.fromBlob(blob);

const model = await AutoModel.from_pretrained(
  "./onnx_llama",
  { dtype: "fp16", device: "cpu" }
);
const processor = await AutoProcessor.from_pretrained("./onnx_llama");

const { pixel_values, reshaped_input_sizes } = await processor(image);
const { output0 } = await model({ pixel_values });

Running this results in:
An error occurred during model execution: "Error: Unexpected input data type. Actual: (tensor(float)), expected: (tensor(float16))".

Which makes sense. However, when I try to convert the input to fp16 "manually":

const fp16data = Float16Array.from(pixel_values.data); // also tried float32ArrayToUint16Array(pixel_values.data)
const tensor = new Tensor("float16", fp16data, pixel_values.dims);
const { output0 } = await model({ pixel_values: tensor });

I get:
Tensor.data must be a typed array (4) for float16 tensors, but got typed array (0).

What's going on here? I tried converting pixel_values.data to a Uint16Array manually, but that has no effect, since it gets converted to a Float16Array in the Tensor constructor anyway.
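For context, in runtimes without native Float16Array support, one workaround is to pack the raw IEEE 754 half-precision bit patterns into a Uint16Array yourself. Below is a minimal sketch of what a float32-to-fp16 bit conversion (round-to-nearest) could look like; the helper names are only illustrative and are not part of transformers.js:

```javascript
// Scratch buffers for reinterpreting a float32's bits as an int32.
const f32 = new Float32Array(1);
const i32 = new Int32Array(f32.buffer);

// Convert one float32 value to its IEEE 754 half-precision bit pattern.
// Handles normals, subnormals, infinities, and NaN; rounds to nearest.
function toHalfBits(val) {
  f32[0] = val;
  const x = i32[0];
  let bits = (x >> 16) & 0x8000;            // sign bit
  let m = (x >> 12) & 0x07ff;               // mantissa (with round bit)
  const e = (x >> 23) & 0xff;               // biased float32 exponent
  if (e < 103) return bits;                 // too small: flush to signed zero
  if (e > 142) {                            // overflow: +/-Infinity or NaN
    bits |= 0x7c00;
    bits |= e === 255 && (x & 0x007fffff) ? 0x0200 : 0; // keep NaN as NaN
    return bits;
  }
  if (e < 113) {                            // subnormal half
    m |= 0x0800;
    bits |= (m >> (114 - e)) + ((m >> (113 - e)) & 1);
    return bits;
  }
  bits |= ((e - 112) << 10) | (m >> 1);
  bits += m & 1;                            // round to nearest
  return bits;
}

// Pack a whole Float32Array into raw fp16 bits.
function float32ArrayToUint16Array(src) {
  const out = new Uint16Array(src.length);
  for (let i = 0; i < src.length; i++) out[i] = toHalfBits(src[i]);
  return out;
}
```

Whether the Tensor constructor accepts a Uint16Array of raw bits for a "float16" tensor depends on the library version, so this only sidesteps the missing-Float16Array problem, not the type check described above.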

Help is much appreciated!

Thanks
