Why is "FP16 is not supported on CPU; using FP32 instead" on Ampere A1? #978
-
I'm running whisper on a free instance on Oracle Cloud. I'm only performing basic usage and do not yet have a sound understanding of the internals.
According to https://learncloudnative.com/blog/2022-03-16-running_ai_on_ampere_instance, FP16 is actually supported on Ampere A1. How do I forcefully enable FP16? Is it as simple as commenting away the check at https://github.com/openai/whisper/blob/main/whisper/transcribe.py#L79? Here is the CPU information.
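For reference, the check at that line looks roughly like this (abridged from whisper/transcribe.py; the exact line numbers drift between versions):

```python
# whisper/transcribe.py (abridged)
dtype = torch.float16 if decode_options.get("fp16", True) else torch.float32
if model.device == torch.device("cpu"):
    if torch.cuda.is_available():
        warnings.warn("Performing inference on CPU when CUDA is available")
    if dtype == torch.float16:
        warnings.warn("FP16 is not supported on CPU; using FP32 instead")
        dtype = torch.float32
```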
Replies: 3 comments · 3 replies
-
So, I tried commenting out the check for CPU and got a runtime error. Looks like PyTorch does not implement this.
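A quick probe (my own sketch, not from whisper) to check whether a given PyTorch build has FP16 CPU kernels for the convolutions whisper's audio encoder needs:

```python
import torch

# Whisper's audio encoder begins with Conv1d layers; on many PyTorch
# builds these have no FP16 CPU kernel and raise a RuntimeError.
x = torch.randn(1, 80, 3000, dtype=torch.float16)
conv = torch.nn.Conv1d(80, 384, kernel_size=3, padding=1).half()
try:
    print("FP16 Conv1d on CPU works:", conv(x).dtype)
except RuntimeError as e:
    print("FP16 Conv1d on CPU unsupported:", e)
```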
-
Hi, you may find this implementation of Whisper handy: https://github.com/ggerganov/whisper.cpp
FP16 can be enabled there on Ampere A1 systems by modifying the Makefile.
Also, you might find Ampere-optimized PyTorch interesting: https://amperecomputing.com/solutions/ampere-ai
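On the A1's Neoverse N1 cores (armv8.2-a), the change is presumably to the aarch64 compiler flags, e.g. building with -march=armv8.2-a+fp16 so the compiler emits native FP16 arithmetic. The exact Makefile lines vary between whisper.cpp versions, so treat that flag as an educated guess rather than the confirmed edit.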
-
result = whisper.decode(model, mel, options, fp16=False)
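In context, a minimal end-to-end sketch (following the README-style pipeline; the filename is a placeholder, and fp16=False can equivalently be set on DecodingOptions):

```python
import whisper

model = whisper.load_model("base")  # stays on CPU when no GPU is available

# Load audio, pad/trim to 30 seconds, and compute the log-Mel spectrogram.
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# fp16=False keeps decoding in FP32, so the CPU warning never fires.
options = whisper.DecodingOptions(fp16=False)
result = whisper.decode(model, mel, options)
print(result.text)
```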