To run MERGE operation, 2 join queries are required. The first one is joining the whole target table and source data, to find *touched* files including any matched row. The other one is for actual MERGE operation only with *touched* files of the target table. The first join query is lighter as it only reads columns in the matching condition. Although Delta performs the first join to reduce the amount of data for the actual merge process, still a huge number of *unmodified* rows in *touched* files could go through the second join process which includes heavy shuffling process.
0 commit comments