Ideas on how to use less VRAM for ONNX models? #14472
elephantpanda asked this question in General (Unanswered)
I have some ONNX models loaded onto the GPU with InferenceSession("model.onnx") in DirectML mode.
My three ONNX files come to a total of 1.77 GB (float16).
Loading the models uses about 2.5 GB of GPU memory, and a further 5.2 GB is allocated once the models have run their first inference.
Overall this takes 8.2 GB of GPU memory.
Since the most common amount of VRAM people have is 8 GB, I need roughly a 10-20% saving in GPU memory.
Does anyone have any tips on how to decrease my VRAM usage a bit without sacrificing too much speed?
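For reference, here is a minimal sketch of the loading setup described above, assuming the onnxruntime-directml package is installed and using the hypothetical file name `model.onnx`. The session options shown are one possible starting point for trimming memory use (ONNX Runtime's DirectML execution provider does not support memory pattern optimization or parallel execution, so these are disabled explicitly), not a confirmed fix for the numbers quoted above.

```python
import onnxruntime as ort

# Session options tuned for the DirectML execution provider.
sess_options = ort.SessionOptions()

# The DirectML EP does not support memory pattern optimization or parallel
# execution; disabling them explicitly avoids errors and skips pre-planned
# activation buffer allocation based on the first inference.
sess_options.enable_mem_pattern = False
sess_options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL

# Load one of the float16 models on the GPU via DirectML,
# falling back to CPU if DirectML is unavailable.
session = ort.InferenceSession(
    "model.onnx",
    sess_options,
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)
```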