This repository contains the implementation and analysis of 2D convolution algorithms using CUDA, focusing on optimizing performance through shared memory utilization. The study was conducted as part of the Master's Program in High-Performance Computing Engineering at Politecnico di Milano.
- CUDA-based 2D convolution implementation:
  - Without Tiling: Direct shared memory usage.
  - With Tiling: Optimized shared memory usage for larger matrices and kernels.
- Performance benchmarks conducted on a Tesla T4 GPU.
- Detailed report with methodology, results, and discussion.
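The difference between the two variants listed above can be illustrated on the CPU. The following is a minimal sketch in plain Python, not the notebook's kernel: it assumes an odd-sized square kernel, zero padding at the borders, and no kernel flip. The function names `conv2d_direct` and `conv2d_tiled` and the `tile` parameter are hypothetical. Each loop iteration over `(ty, tx)` stands in for one thread block, and `buf` stands in for that block's shared memory, holding the tile plus a halo of `k // 2` cells on each side.

```python
def conv2d_direct(image, kernel):
    """Reference: every output reads its K x K neighbourhood from the full image."""
    h, w = len(image), len(image[0])
    k = len(kernel)          # assumed odd and square
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - r, x + j - r
                    if 0 <= yy < h and 0 <= xx < w:   # zero padding outside the image
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def conv2d_tiled(image, kernel, tile=4):
    """Tiled variant: stage a (tile + K - 1)^2 patch, then compute from the patch only."""
    h, w = len(image), len(image[0])
    k = len(kernel)          # assumed odd and square
    r = k // 2
    side = tile + 2 * r      # tile plus halo on both sides
    out = [[0.0] * w for _ in range(h)]
    for ty in range(0, h, tile):          # one (ty, tx) pair ~ one thread block
        for tx in range(0, w, tile):
            # Load phase: copy tile + halo into the local buffer, zero outside the image.
            buf = [[0.0] * side for _ in range(side)]
            for by in range(side):
                for bx in range(side):
                    yy, xx = ty + by - r, tx + bx - r
                    if 0 <= yy < h and 0 <= xx < w:
                        buf[by][bx] = image[yy][xx]
            # Compute phase: each output in the tile reads only from the buffer.
            for ly in range(min(tile, h - ty)):
                for lx in range(min(tile, w - tx)):
                    acc = 0.0
                    for i in range(k):
                        for j in range(k):
                            acc += buf[ly + i][lx + j] * kernel[i][j]
                    out[ty + ly][tx + lx] = acc
    return out
```

On a GPU the load phase would be followed by a block-level barrier before the compute phase; the sequential loops here make that ordering implicit. Both functions should return the same result for any input, which makes `conv2d_direct` usable as a correctness check for a tiled kernel.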
- cuda_example.ipynb: Jupyter notebook containing the CUDA implementation and experiments.
- report/2D_Convolution_Report.pdf: Academic report detailing the project.
- README.md: Overview of the repository and usage instructions.
- NVIDIA GPU with CUDA support.
- CUDA Toolkit installed.
- Python with the following packages:
  - numpy
  - matplotlib
  - numba
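One way to confirm that the Python packages above are importable before opening the notebook is a small standard-library check. This is only a sketch: the helper name `missing_packages` is hypothetical, and it checks that each package can be found, not that a GPU or the CUDA Toolkit is usable.

```python
import importlib.util

# Package names taken from the prerequisites list.
REQUIRED = ("numpy", "matplotlib", "numba")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages were found.")
```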
- Clone the repository:
    git clone https://github.com/your-username/2D-Convolution-CUDA.git
    cd 2D-Convolution-CUDA
- Open the Jupyter notebook:
    jupyter notebook cuda_example.ipynb
- Execute the cells to run the 2D convolution experiments.
Key findings from the experiments:
- Tiling significantly enhances performance for larger matrices and kernels.
- Memory utilization strategies are critical for high-performance GPU computations.
For detailed results and discussions, refer to the report.
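A rough back-of-the-envelope calculation suggests why tiling matters as matrices grow. The sketch below is not taken from the report: it assumes 4-byte elements, a square tile with a halo of `k - 1` extra cells per dimension, and the common default of 48 KB of static shared memory per block. The function names are hypothetical, and the `k * k` baseline refers to a hypothetical kernel that reads every operand from global memory, which may differ from the non-tiled variant in this repository.

```python
SHARED_LIMIT_BYTES = 48 * 1024  # assumed default static shared memory per block


def tile_footprint_bytes(tile, k, bytes_per_elem=4):
    """Shared memory needed for one tile plus its halo."""
    side = tile + k - 1
    return side * side * bytes_per_elem


def matrix_footprint_bytes(n, bytes_per_elem=4):
    """Memory needed to hold a whole n x n matrix."""
    return n * n * bytes_per_elem


def loads_per_output(tile, k):
    """Staged elements per output when a block loads tile + halo once."""
    side = tile + k - 1
    return side * side / (tile * tile)


if __name__ == "__main__":
    print(tile_footprint_bytes(16, 5))                         # 1600
    print(matrix_footprint_bytes(1024) > SHARED_LIMIT_BYTES)   # True
    print(loads_per_output(16, 5), "vs", 5 * 5)                # 1.5625 vs 25
```

Under these assumptions a 16 x 16 tile with a 5 x 5 kernel needs 1600 bytes per block, while a full 1024 x 1024 matrix needs 4 MiB and cannot be held in one block's shared memory, so the tile size rather than the matrix size bounds the per-block footprint.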
- Salvatore Mariano Librici