Advanced Computer Architectures and High Performance Processors and Systems project.
Many scientific applications, including the computation of the Fast Fourier Transform (FFT), require an all-to-all operation. Performing this operation on a general-purpose processor can be very costly, since the memory hierarchy of such processors is not well suited to the intensive data exchange that an all-to-all requires. The operation can, however, be sped up with a custom hardware accelerator designed specifically for this task. In this project, we present an implementation of such an accelerator, based on a broadcast algorithm that further exploits the parallelism in message transmission. Finally, we validate the accelerator through RTL software simulation and compare the obtained performance against the available bandwidth.
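For readers unfamiliar with the operation, the following is a minimal software sketch of the data movement an all-to-all performs, assuming the personalized-exchange form (each node holds one distinct block for every node, as in the transpose step of a distributed FFT). It is not the accelerator's broadcast algorithm; the function name `all_to_all` and the list-of-lists layout are hypothetical and chosen only for illustration.

```python
def all_to_all(send_buffers):
    """Reference model of an all-to-all exchange (illustrative only).

    send_buffers[i][j] is the block that node i sends to node j.
    The result recv[j][i] is the block that node j received from node i.
    """
    n = len(send_buffers)
    # Every node must provide exactly one block per destination node.
    if any(len(row) != n for row in send_buffers):
        raise ValueError("each node must hold one block per destination")
    # Node j collects block j from every source node, in source order.
    return [[send_buffers[i][j] for i in range(n)] for j in range(n)]


# Three nodes, each holding one labelled block per destination.
send = [[f"{src}->{dst}" for dst in range(3)] for src in range(3)]
recv = all_to_all(send)
print(recv[1])  # ['0->1', '1->1', '2->1']
```

With n nodes the exchange moves n × n blocks, and every block crosses from one node's buffer to another's, which is why the operation stresses the memory hierarchy of a general-purpose processor.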
Full project report available here.