
geokoko/ntua-parlab


Parallel Processing Systems - ECE NTUA

Course assignments for Parallel Processing Systems at the National Technical University of Athens (9th semester). The four assignments cover shared-memory (OpenMP, Pthreads), GPU (CUDA), and distributed-memory (MPI) parallel programming.

Students Involved

Repository Structure

a1/   - OpenMP Game of Life
a2/   - Shared-memory parallel algorithms (OpenMP, Pthreads)
a3/   - CUDA GPU k-means implementations
a4/   - MPI distributed heat transfer & k-means

Assignments

a1: Conway's Game of Life on an N x N grid, parallelized with OpenMP. Supports configurable grid size, number of timesteps, and optional GIF output.
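As an illustration of how a single generation can be parallelized, here is a minimal sketch in C with OpenMP. It is not taken from the repository: the names `life_step` and `live_neighbours`, the `unsigned char` cell type, and the dead-border boundary rule are all assumptions made for the example. Each cell of the next generation depends only on the current generation, so the rows can be split among threads without any locking.

```c
#include <stddef.h>

/* Count the live neighbours of cell (i, j) on an n x n grid.
   Assumption: cells outside the grid are treated as dead. */
static int live_neighbours(const unsigned char *grid, int n, int i, int j)
{
    int count = 0;
    for (int di = -1; di <= 1; di++) {
        for (int dj = -1; dj <= 1; dj++) {
            if (di == 0 && dj == 0)
                continue;                       /* skip the cell itself */
            int r = i + di, c = j + dj;
            if (r >= 0 && r < n && c >= 0 && c < n)
                count += grid[(size_t)r * n + c];
        }
    }
    return count;
}

/* Compute one generation: read only from `cur`, write only to `next`.
   Rows are independent, so they can be distributed across threads. */
void life_step(const unsigned char *cur, unsigned char *next, int n)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int k = live_neighbours(cur, n, i, j);
            int alive = cur[(size_t)i * n + j];
            /* Birth on exactly 3 neighbours, survival on 2 or 3. */
            next[(size_t)i * n + j] = (unsigned char)(k == 3 || (alive && k == 2));
        }
    }
}
```

A driver would call `life_step` once per timestep and swap the `cur` and `next` pointers afterwards. Compiled without OpenMP support, the pragma is ignored and the same code runs serially.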

a2: Four sub-projects exploring different parallelization and synchronization strategies:

  • Concurrent Linked List — Six linked list implementations benchmarking different synchronization: serial, coarse-grain lock, fine-grain lock, optimistic, lazy deletion, and lock-free (CAS)
  • Floyd-Warshall — All-pairs shortest path: standard, tiled (cache-optimized), and scale-and-recurse (recursive blocking), parallelized with OpenMP
  • K-Means Clustering — Sequential, naive OpenMP, and reduction-based OpenMP with false-sharing analysis
  • K-Means with Lock Variants — Nine synchronization strategies, including OpenMP critical, TAS, TTAS, CLH, array lock, and pthread mutex/spinlock
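To make the TAS/TTAS distinction concrete, here is a minimal sketch of a test-and-test-and-set (TTAS) spinlock using C11 atomics. It is an illustration only, not the repository's implementation, and the `ttas_*` names are hypothetical. A plain TAS lock performs an atomic exchange on every spin iteration. TTAS first reads the flag with an ordinary load and attempts the exchange only when the lock appears free, which reduces cache-coherence traffic under contention.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_bool locked;   /* true while some thread holds the lock */
} ttas_lock_t;

void ttas_init(ttas_lock_t *l)
{
    atomic_init(&l->locked, false);
}

/* One acquisition attempt; returns true if the lock was taken. */
bool ttas_try_lock(ttas_lock_t *l)
{
    /* "Test": a plain load that can be served from the local cache. */
    if (atomic_load_explicit(&l->locked, memory_order_relaxed))
        return false;
    /* "Test-and-set": the exchange returns the previous value,
       so we own the lock only if that value was false. */
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

void ttas_lock(ttas_lock_t *l)
{
    while (!ttas_try_lock(l))
        ;   /* spin until an attempt succeeds */
}

void ttas_unlock(ttas_lock_t *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

In a k-means update, a critical section such as accumulating a point into its cluster's running sum would sit between `ttas_lock` and `ttas_unlock`.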

a3: GPU-accelerated k-means with eight CUDA kernel variants exploring shared memory, coalesced memory access, kernel fusion, and parallel reduction.

Requirements: CUDA toolkit, NVIDIA Tesla V100 or similar GPU.
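For reference, the per-point work that a k-means kernel parallelizes can be sketched in plain C. This is a serial illustration under assumptions, not one of the repository's kernels: the function names, the row-major `points[n][dim]` layout, and the use of `double` are all hypothetical. In a GPU version, each thread would typically run the body of the outer loop for one point.

```c
#include <float.h>
#include <stddef.h>

/* Index of the centroid closest to point p (squared Euclidean distance). */
int nearest_centroid(const double *p, const double *centroids, int k, int dim)
{
    int best = 0;
    double best_dist = DBL_MAX;
    for (int c = 0; c < k; c++) {
        double dist = 0.0;
        for (int d = 0; d < dim; d++) {
            double diff = p[d] - centroids[(size_t)c * dim + d];
            dist += diff * diff;
        }
        if (dist < best_dist) {
            best_dist = dist;
            best = c;
        }
    }
    return best;
}

/* Assignment step: label every point with its nearest centroid.
   Returns the number of points whose label changed, which a driver
   can use as a convergence check. */
int assign_points(const double *points, int n, const double *centroids,
                  int k, int dim, int *labels)
{
    int changed = 0;
    for (int i = 0; i < n; i++) {
        int c = nearest_centroid(&points[(size_t)i * dim], centroids, k, dim);
        if (labels[i] != c) {
            labels[i] = c;
            changed++;
        }
    }
    return changed;
}
```

The centroid update that follows (summing the points of each cluster and dividing by the count) is the part that needs a reduction across threads.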

a4: Two sub-projects using MPI:

  • MPI K-Means — Distributed k-means using MPI data partitioning and collective centroid updates
  • Heat Transfer — 2D heat equation with three solvers (Jacobi, Gauss-Seidel SOR, Red-Black SOR), each with serial and MPI versions using 2D Cartesian domain decomposition and ghost cell exchanges
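The role of ghost cells in the heat-transfer solvers can be illustrated with a single Jacobi sweep over one process's local block. This is a minimal sketch under assumptions, not the repository's code: `jacobi_sweep` is a hypothetical name, and the local array is assumed to be stored row-major with one extra ring of ghost or boundary cells around the `rows x cols` interior. In an MPI version, the ghost ring would typically be refreshed from the neighbouring processes (for example with `MPI_Sendrecv`) before each sweep, and the returned local maximum change would be combined across processes (for example with `MPI_Allreduce`) to test convergence.

```c
#include <stddef.h>

/* One Jacobi sweep of the five-point stencil over the interior of a
   local block. Both arrays have (rows + 2) x (cols + 2) entries; the
   outer ring holds ghost or boundary values and is only read.
   Returns the largest absolute change of any interior cell. */
double jacobi_sweep(const double *u_old, double *u_new, int rows, int cols)
{
    size_t stride = (size_t)cols + 2;   /* row length including ghosts */
    double max_diff = 0.0;

    for (int i = 1; i <= rows; i++) {
        for (int j = 1; j <= cols; j++) {
            size_t idx = (size_t)i * stride + (size_t)j;
            /* New value is the average of the four neighbours. */
            u_new[idx] = 0.25 * (u_old[idx - stride] + u_old[idx + stride]
                               + u_old[idx - 1] + u_old[idx + 1]);
            double diff = u_new[idx] - u_old[idx];
            if (diff < 0.0)
                diff = -diff;
            if (diff > max_diff)
                max_diff = diff;
        }
    }
    return max_diff;
}
```

Jacobi reads only from `u_old`, so the sweep order does not matter. Gauss-Seidel and Red-Black SOR update in place, which is why their parallel versions need more care about the order in which cells and ghost values are updated.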

Technologies used

  • The core implementations are written in C.
  • Shared-memory parallelism (a1, a2, a4) uses OpenMP.
  • GPU parallelism (a3) uses CUDA.
  • Distributed-memory parallelism (a4) uses MPI (Message Passing Interface).
  • Python's matplotlib library was used for plotting and performance analysis.

Building

Each assignment directory contains its own Makefile. See the individual assignment READMEs for specific compilation and usage instructions.

Queue Usage Summary

Experiments are submitted via PBS queue scripts:

  • A4 (MPI): qsub -q parlab -l nodes=...:ppn=... script.sh
  • A3 (CUDA): qsub -q serial -l nodes=silver1:ppn=40 script.sh
  • A1/A2 (OpenMP): qsub -q serial -l nodes=sandman:ppn=64 script.sh

Replace script.sh with the appropriate make_on_queue.sh or run_on_queue.sh script for each subdirectory.

Utilities

scirouter/ contains scripts (push.sh, pull.sh) for transferring data to/from CSLab's scirouter server. To use these scripts, follow the instructions here.
