Skip to content

Conversation

@NimaSarajpoor
Copy link
Collaborator

Tested numpy_fft but with a different rfft in which some computations are combined. See the changed files. For convenience, sharing it here as well:

import math
import numpy as np

from numba import njit

@njit(fastmath=True)
def _rfft_preprocess_TQ(T, Q):
    n = len(T)
    m = len(Q)

    Q0 = np.empty(n, dtype=np.float64)
    Q0[:m] = Q[::-1]
    Q0[m:] = 0.0

    half_n = n // 2
    out = np.empty((2, half_n), dtype=np.complex128)
    
    m_stop = m // 2
    for i in range(m_stop):
        out[0, i] = T[2 * i] + 1j * T[2 * i + 1]
        out[1, i] = Q0[2 * i] + 1j * Q0[2 * i + 1]

    for i in range(m_stop, half_n):
        out[0, i] = T[2 * i] + 1j * T[2 * i + 1]
    out[1, m_stop:] = 0.0
        
    return out


@njit(fastmath=True)
def _rfft_postprocess_TQ(TQ):
    x_T = TQ[0]
    x_Q = TQ[1]

    n_x = len(x_T)
    half_n_x = n_x // 2

    F = np.empty(n_x + 1, dtype=np.complex128)

    # 0th element, half_n_x, and n_x
    F[0] = (x_T[0].real + x_T[0].imag) * (x_Q[0].real + x_Q[0].imag)
    F[half_n_x] = (x_T[half_n_x] * x_Q[half_n_x]).conjugate()
    F[n_x] = (x_T[0].real - x_T[0].imag) * (x_Q[0].real - x_Q[0].imag)

    theta0 = math.pi / n_x
    factor = math.cos(theta0) - 1j * math.sin(theta0)
    w = 0.5j
    for k in range(1, half_n_x):
        w = w * factor

        val_T = (x_T[k] - x_T[n_x - k].conjugate()) * (0.5 + w)
        val_Q = (x_Q[k] - x_Q[n_x - k].conjugate()) * (0.5 + w)

        F[k] = (x_T[k] - val_T) * (x_Q[k] - val_Q)
        F[n_x - k] = (x_T[n_x - k] + val_T.conjugate()) * (x_Q[n_x - k] + val_Q.conjugate())

    return F


def _rfft_TQ(T, Q):
    TQ = _rfft_preprocess_TQ(T, Q)
    np.fft.fft(TQ, axis=1, out=TQ)
    return _rfft_postprocess_TQ(TQ)


def sliding_dot_product(Q, T):
    n = len(T)
    m = len(Q)
    
    F = _rfft_TQ(T, Q)

    return np.fft.irfft(F)[m - 1 : n]


def setup(Q, T):
    return sliding_dot_product(Q, T)

Note that I wanted to set niter to a large number, and therefore I decided to only check performance for a shorter range of p. So, that's why I set pmin and pmax to 6 and 15, respectively. I also did not consider njit as it slows down the total running time.

image

# in test.sh

./test.py -niter 10000 -pmin 6 -pmax 15 pyfftw_sdp numpy_fft_sdp challenger_sdp > timing.csv

@NimaSarajpoor
Copy link
Collaborator Author

NimaSarajpoor commented Mar 1, 2025

To better understand the idea, I need to first share RFFT algorithm obtained from OTFFT's doc.

def rfft(T):
    n = len(T)   # should be power of two (??)
    half_n = n // 2
    y = np.empty(half_n + 1, dtype=np.complex128)
    
    x = np.empty(half_n, dtype=np.complex128)
    for i in range(half_n):
        x[i] = T[2 * i] + 1j * T[2 * i + 1]

    COMPUTE_FFT(x, y[:half_n]) # here we can use any FFT's function. I used `np.fft.fft`
    
    y[0] = x[0].real + x[0].imag
    y[n // 4] = x[n // 4].conjugate()
    y[half_n] = x[0].real - x[0].imag
    
    theta0 = math.pi / half_n
    factor = math.cos(theta0) - 1j * math.sin(theta0)
    w = 0.5j
    for k in range(1, n // 4):
        w = w * factor
        val = (x[k] - x[half_n - k].conjugate()) * (0.5 + w)
        y[k] = x[k] - val
        y[half_n - k] = x[half_n - k] + val.conjugate()
        
    return y

Note that we do RFFT(T) * RFFT(Q) in numpy_fft. So, we can use RFFT's function above... then change RFFT function such that it takes both T and Q, and compute "their RFFT AND their multiplication" on one go. Note that the function can be njit-decorated except the COMPUTE_FFT part.

@NimaSarajpoor
Copy link
Collaborator Author

NimaSarajpoor commented Mar 1, 2025

For the most cases, the results are close to numpy_fft. It is slightly(?) better but the code is more complicated now.

@seanlaw
Copy link
Contributor

seanlaw commented Mar 2, 2025

For the most cases, the results are close to numpy_fft. It is slightly(?) better but the code is more complicated now.

Yeah, the difference is insignificant and probably not worth the add complexity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants