Our codebase contains CUDA synchronisations, many of which are unnecessary (e.g., #432, #433, and #434). Each unnecessary synchronisation stalls the host until the device has finished its queued work, which hurts performance. The aim is to identify and remove these redundant device synchronisations.