[export] fix stft decomp and make it consistent with cpp impl. (pytorch#149232)

ydwu4 · amathewc · commit bd5dde434fd6 · 2025-04-17T07:03:12.000+03:00
Summary: We change the fake impl of `stft` to follow its C++ implementation more closely ([here](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/SpectralOps.cpp#L951-L963)), where `n_frames = 1 + (len - n_fft) / hop_length;` is also an integer division.

Test Plan: Existing tests and `buck2 build --flagfile fbcode//mode/dev fbcode//executorch/examples/models/fb/llama4:speech_transform.pte`

Differential Revision: D71209142

Edit: we kept the original path unchanged.

Pull Request resolved: pytorch#149232
Approved by: https://github.com/jackzhxng
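For illustration, here is a minimal pure-Python sketch of the frame-count formula quoted from the C++ code, checked against a sliding-window split of a 1-D signal. `n_frames` and `unfold` are hypothetical helpers written for this note. `unfold` only mimics `Tensor.unfold(dimension=-1, size=..., step=...)` on a plain list; it is not the PyTorch implementation.

```python
def n_frames(length: int, n_fft: int, hop_length: int) -> int:
    """Frame count per the C++ formula n_frames = 1 + (len - n_fft) / hop_length.

    C++ integer `/` truncates toward zero and Python `//` floors; the two agree
    here because length - n_fft is non-negative in the valid case n_fft <= length.
    """
    if not (0 < n_fft <= length) or hop_length <= 0:
        raise ValueError("expected 0 < n_fft <= length and hop_length > 0")
    return 1 + (length - n_fft) // hop_length


def unfold(signal: list, size: int, step: int) -> list:
    """Hypothetical 1-D stand-in for Tensor.unfold(dimension=-1, size=size, step=step).

    Windows start at 0, step, 2*step, ...; a trailing partial window is dropped.
    """
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


if __name__ == "__main__":
    x = list(range(10))
    # 1 + (10 - 4) // 3 = 3 frames
    print(n_frames(len(x), 4, 3), unfold(x, 4, 3))
    # prints: 3 [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Because the last partial window is discarded, the number of windows `unfold` yields is exactly the floored count, which is why the decomposition and the C++ formula agree.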
diff --git a/torch/_refs/__init__.py b/torch/_refs/__init__.py
@@ -3451,10 +3451,12 @@ def stft(
         left = (n_fft - win_length_) // 2
         window = aten.constant_pad_nd(window, [left, n_fft - win_length_ - left])
 
-    input = input.unfold(dimension=-1, size=n_fft, step=hop_length_)
     if not center and align_to_window:
         input_pad_amount = (n_fft - win_length_) // 2
         input = aten.pad(input, [input_pad_amount, input_pad_amount], pad_mode)
+
+    input = input.unfold(dimension=-1, size=n_fft, step=hop_length_)
+
     if window is not None:
         input = input * window
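Reading only the lines visible in this hunk, the change moves `input.unfold(...)` after the `not center and align_to_window` padding. In the old order, `aten.pad` ran on the already-unfolded tensor, so the pad was applied to each frame's last (length-`n_fft`) dimension rather than to the signal. The sketch below contrasts the two orders on a 1-D list. Assumptions: zero padding stands in for `pad_mode`, and `unfold`, `pad_zeros` and the two `frames_*` helpers are hypothetical names, not PyTorch APIs.

```python
def unfold(signal: list, size: int, step: int) -> list:
    """Hypothetical 1-D stand-in for Tensor.unfold(dimension=-1, size=size, step=step)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def pad_zeros(seq: list, amount: int) -> list:
    """Constant zero padding on both ends (assumption: the reference uses pad_mode)."""
    return [0.0] * amount + list(seq) + [0.0] * amount


def frames_pad_then_unfold(signal, n_fft, hop_length, win_length):
    """New order: pad the signal, then cut it into n_fft-long frames."""
    amount = (n_fft - win_length) // 2
    return unfold(pad_zeros(signal, amount), n_fft, hop_length)


def frames_unfold_then_pad(signal, n_fft, hop_length, win_length):
    """Old order: cut frames first, so the pad lands on each frame's last dimension."""
    amount = (n_fft - win_length) // 2
    return [pad_zeros(frame, amount) for frame in unfold(signal, n_fft, hop_length)]


if __name__ == "__main__":
    signal = [1.0] * 16
    n_fft, hop_length, win_length = 8, 4, 4  # pad amount = (8 - 4) // 2 = 2
    new = frames_pad_then_unfold(signal, n_fft, hop_length, win_length)
    old = frames_unfold_then_pad(signal, n_fft, hop_length, win_length)
    # new: 1 + (20 - 8) // 4 = 4 frames, each of length 8 (== n_fft)
    # old: 1 + (16 - 8) // 4 = 3 frames, each of length 12 (== n_fft + 4)
    print(len(new), len(new[0]), len(old), len(old[0]))
    # prints: 4 8 3 12
```

Since the lines just above pad `window` to `n_fft`, only the new order keeps every frame `n_fft` long so that `input * window` lines up whenever the pad amount is non-zero.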