How to run float16 on CUDAExecutionProvider #13145
Replies: 3 comments 3 replies
-
The CUDA EP supports float16, but you need to make sure your model itself uses float16 first (for example, the weights should be stored as float16). You can use https://github.com/microsoft/onnxconverter-common/blob/master/onnxconverter_common/float16.py to convert a model from float32 to float16.
-
Hi @tianleiwu, thanks a lot! Do you think converting my model's weights to fp16 in PyTorch before exporting is sufficient?
-
@FrancescoSaverioZuppichini, I recommend exporting an fp32 model, then running graph optimization, and finally converting to fp16. Some optimizations require this order: the extra Cast nodes introduced by an fp16 export can block graph fusions.
-
As the title says: how do I run float16 inference with CUDAExecutionProvider?
Thanks!
Cheers