[ENG-8221][ENG-8038] less database data; part 1 #877
mfraezz
left a comment
One minor note, otherwise LGTM. Benchmarking indicated that this may take a while, but a manageable amount of time.
)
if continue_after is not None:
    _raw_qs = _raw_qs.filter(pk__gt=continue_after)
for _raw_pk_chunk in pk_chunked(_raw_qs, chunk_size):
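For readers unfamiliar with the pattern, the resumable, read-only chunking in the loop above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `pk_chunks` is a hypothetical stand-in for the project's `pk_chunked` helper, the `rawdatum` table is an in-memory SQLite toy, and primary keys are assumed to be non-negative integers.

```python
import sqlite3
from typing import Iterator, List, Optional


def pk_chunks(
    conn: sqlite3.Connection,
    chunk_size: int,
    continue_after: Optional[int] = None,
) -> Iterator[List[int]]:
    """Yield ascending lists of primary keys, at most chunk_size per list.

    Hypothetical stand-in for pk_chunked; it only reads from the table.
    """
    # Assumption: pks are non-negative, so -1 means "start from the beginning".
    last_pk = -1 if continue_after is None else continue_after
    while True:
        rows = conn.execute(
            "SELECT id FROM rawdatum WHERE id > ? ORDER BY id LIMIT ?",
            (last_pk, chunk_size),
        ).fetchall()
        if not rows:
            return
        chunk = [row[0] for row in rows]
        yield chunk
        # The last pk of each chunk is a safe resume point after an interruption.
        last_pk = chunk[-1]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rawdatum (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO rawdatum (id) VALUES (?)", [(i,) for i in range(1, 8)])

all_chunks = list(pk_chunks(conn, chunk_size=3))
resumed_chunks = list(pk_chunks(conn, chunk_size=3, continue_after=3))
```

Because each chunk is selected by `id > last_pk` rather than by offset, an interrupted run can pass the last processed pk as `continue_after` and pick up where it stopped.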
Rewrote this loop slightly to be non-modifying, and did some benchmarking. It is a little slow to start due to `_raw_qs` and the size of the `RawDatum` table, but appears workable. Ran `EXPLAIN ANALYZE` on the slowest query and found no apparent way to speed it up:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using share_rawdatum_expiration_idx on share_rawdatum (cost=0.57..42489486.91 rows=3096 width=1069) (actual time=0.284..13910.434 rows=483200 loops=1)
   Index Cond: (expiration_date > '2025-06-16'::date)
   Filter: (id = (SubPlan 2))
   Rows Removed by Filter: 172539
   SubPlan 1
     -> Limit (cost=67.95..67.95 rows=1 width=12) (actual time=0.006..0.006 rows=1 loops=483200)
          -> Sort (cost=67.95..68.20 rows=102 width=12) (actual time=0.005..0.005 rows=1 loops=483200)
               Sort Key: (COALESCE(u0.datestamp, u0.date_created)) DESC NULLS LAST
               Sort Method: quicksort Memory: 25kB
               -> Index Scan using share_rawdatum_01248af3 on share_rawdatum u0 (cost=0.57..67.44 rows=102 width=12) (actual time=0.002..0.004 rows=4 loops=483200)
                    Index Cond: (suid_id = share_rawdatum.suid_id)
   SubPlan 2
     -> Limit (cost=67.95..67.95 rows=1 width=12) (actual time=0.014..0.014 rows=1 loops=655739)
          -> Sort (cost=67.95..68.20 rows=102 width=12) (actual time=0.014..0.014 rows=1 loops=655739)
               Sort Key: (COALESCE(u0_1.datestamp, u0_1.date_created)) DESC NULLS LAST
               Sort Method: quicksort Memory: 25kB
               -> Index Scan using share_rawdatum_01248af3 on share_rawdatum u0_1 (cost=0.57..67.44 rows=102 width=12) (actual time=0.005..0.012 rows=5 loops=655739)
                    Index Cond: (suid_id = share_rawdatum.suid_id)
 Planning time: 6.497 ms
 Execution time: 13939.482 ms
(20 rows)
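For context, the plan above appears to come from a "latest raw datum per suid, not yet expired" query: a correlated subquery picks, for each row's `suid_id`, the row with the greatest `COALESCE(datestamp, date_created)`, and the outer filter keeps only rows whose `id` matches it. The sketch below reproduces that shape on a toy SQLite table. It is inferred from the plan rather than taken from the actual queryset; the sample rows are invented, it includes the subquery only once where the plan shows two subplans, and it leaves out `NULLS LAST` because `COALESCE` falls back to the non-null `date_created` here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE share_rawdatum (
        id INTEGER PRIMARY KEY,
        suid_id INTEGER NOT NULL,
        datestamp TEXT,
        date_created TEXT NOT NULL,
        expiration_date TEXT
    )
    """
)
conn.execute("CREATE INDEX rawdatum_suid_idx ON share_rawdatum (suid_id)")
rows = [
    # id, suid_id, datestamp, date_created, expiration_date (invented sample data)
    (1, 10, "2025-01-01", "2025-01-01", "2025-12-31"),  # superseded by id 2
    (2, 10, "2025-03-01", "2025-03-01", "2025-12-31"),  # latest for suid 10
    (3, 20, None, "2025-02-01", "2025-01-15"),  # latest for suid 20, already expired
    (4, 30, None, "2025-04-01", "2026-01-01"),  # latest for suid 30 via date_created
]
conn.executemany("INSERT INTO share_rawdatum VALUES (?, ?, ?, ?, ?)", rows)

# Outer condition mirrors "Index Cond: expiration_date > cutoff";
# the correlated subquery mirrors "Filter: (id = (SubPlan 2))".
LATEST_UNEXPIRED_SQL = """
    SELECT r.id
    FROM share_rawdatum AS r
    WHERE r.expiration_date > ?
      AND r.id = (
          SELECT u0.id
          FROM share_rawdatum AS u0
          WHERE u0.suid_id = r.suid_id
          ORDER BY COALESCE(u0.datestamp, u0.date_created) DESC
          LIMIT 1
      )
    ORDER BY r.id
"""
latest_unexpired = [
    row[0] for row in conn.execute(LATEST_UNEXPIRED_SQL, ("2025-06-16",))
]
```

Since the subquery runs once per candidate row (the `loops=655739` in the plan), the total cost scales with the number of unexpired rows even though each individual lookup is a cheap index scan on `suid_id`.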
first step of a less reckless approach to #876 -- mirror `RawDatum.expiration_date` to `IndexcardRdf.expiration_date` in preparation for dropping `RawDatum` (and before eventually renaming `IndexcardRdf` to `ResourceDescription`)

- `expiration_date` field to abstract `IndexcardRdf` model (base for `ArchivedIndexcardRdf`, `LatestIndexcardRdf`, `SupplementaryIndexcardRdf`)
- `expiration_date` in those tables as well as `RawDatum`
- `trove` management command `migrate_rawdatum_expiration` to copy existing `expiration_date` from `RawDatum` to `SupplementaryIndexcardRdf` (the only kind that expires, in current production usage)

ENG-8221