Hi,
I'm hitting serious performance issues when marking missing rows between source/target tables.
```sql
EXPLAIN
UPDATE pgcompare.dc_source s
SET compare_result = 'm'
WHERE s.tid = 282
  AND NOT EXISTS (
    SELECT 1
    FROM pgcompare.dc_target t
    WHERE t.tid = 282
      AND s.pk_hash = t.pk_hash
  );
```
The plan:

```
Update on dc_source s  (cost=1083743.09..5648966.56 rows=0 width=0)
  ->  Hash Anti Join  (cost=1083743.09..5648966.56 rows=41306268 width=20)
        Hash Cond: ((s.pk_hash)::text = (t.pk_hash)::text)
        ->  Seq Scan on dc_source s  (cost=0.00..2574867.50 rows=62358713 width=39)
              Filter: (tid = 282)
        ->  Hash  (cost=824329.60..824329.60 rows=12771079 width=39)
              ->  Seq Scan on dc_target t  (cost=0.00..824329.60 rows=12771079 width=39)
                    Filter: (tid = 282)
```

I think it's hard to achieve a meaningful speedup just by tuning the SQL, so I have two questions:
- For tables with tens of millions of rows, is there a recommended batch size to use as a reference?
- For a table this large, is it possible to avoid a full-table comparison and compare only a specific time range instead?
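For reference, this is the kind of batched variant I have been sketching. It splits the anti-join `UPDATE` into chunks over the `pk_hash` range so each statement touches a bounded number of rows; the `:batch_start` / `:batch_end` parameters are placeholders I invented for illustration, and I have not verified this against pgcompare's own batching:

```sql
-- Hypothetical batched version of the UPDATE above.
-- Only tid, pk_hash and compare_result come from the actual plan;
-- the range parameters are assumptions, supplied by a driving loop.
UPDATE pgcompare.dc_source s
SET compare_result = 'm'
WHERE s.tid = 282
  AND s.pk_hash >= :batch_start
  AND s.pk_hash <  :batch_end
  AND NOT EXISTS (
    SELECT 1
    FROM pgcompare.dc_target t
    WHERE t.tid = 282
      AND t.pk_hash = s.pk_hash
  );
```

Presumably this only helps if a composite index such as `(tid, pk_hash)` exists on both tables, so each chunk avoids the full sequential scans shown in the plan.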
Thanks for any advice.