

@aaxelb
Contributor

@aaxelb aaxelb commented Jun 9, 2025

First step of a less reckless approach to #876: mirror RawDatum.expiration_date to IndexcardRdf.expiration_date in preparation for dropping RawDatum (ahead of eventually renaming IndexcardRdf to ResourceDescription).

  • add an expiration_date field to the abstract IndexcardRdf model (the base for ArchivedIndexcardRdf, LatestIndexcardRdf, and SupplementaryIndexcardRdf)
  • when ingesting, save expiration_date in those tables as well as in RawDatum
  • add a trove management command, migrate_rawdatum_expiration, to copy existing expiration_date values from RawDatum to SupplementaryIndexcardRdf (the only kind that expires in current production usage)

ENG-8221

@aaxelb aaxelb marked this pull request as ready for review June 9, 2025 18:47
@aaxelb aaxelb requested a review from mfraezz June 9, 2025 18:48
@coveralls

coveralls commented Jun 9, 2025


coverage: 81.24% (-0.004%) from 81.244% when pulling 08d4804 on aaxelb:feature/eng-8038--migrate-expirations into 4a73dfe on CenterForOpenScience:develop.

@aaxelb aaxelb changed the title from "[ENG-8038] less database data; part 1" to "[ENG-8221][ENG-8038] less database data; part 1" Jun 13, 2025
Member

@mfraezz mfraezz left a comment


One minor note; otherwise LGTM. Benchmarking indicated that this may take a while, though the total time should be manageable.

)
if continue_after is not None:
    _raw_qs = _raw_qs.filter(pk__gt=continue_after)
for _raw_pk_chunk in pk_chunked(_raw_qs, chunk_size):
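The loop above walks RawDatum in primary-key chunks and can resume from `continue_after`. The body of `pk_chunked` isn't shown in this thread, so the following is only a minimal in-memory sketch of the assumed behavior (ascending pks, fixed-size chunks, resume strictly after a given pk). `iter_chunks` is a hypothetical name, and a plain Python list stands in for the queryset.

```python
from typing import Iterable, Iterator, List, Optional


def pk_chunked(pks: Iterable[int], chunk_size: int) -> Iterator[List[int]]:
    """Hypothetical stand-in for the PR's pk_chunked helper:
    yield primary keys in ascending order, at most chunk_size per list."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    chunk: List[int] = []
    for pk in sorted(pks):
        chunk.append(pk)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # trailing partial chunk
        yield chunk


def iter_chunks(
    all_pks: Iterable[int], chunk_size: int, continue_after: Optional[int] = None
) -> Iterator[List[int]]:
    """Mirror the `pk__gt=continue_after` filter, then chunk what remains."""
    pks = list(all_pks)
    if continue_after is not None:
        pks = [pk for pk in pks if pk > continue_after]
    yield from pk_chunked(pks, chunk_size)


print(list(iter_chunks([5, 1, 3, 2, 4, 7], 2, continue_after=2)))  # [[3, 4], [5, 7]]
```

Passing the last pk of the last fully processed chunk as `continue_after` lets an interrupted run pick up where it stopped without redoing earlier chunks.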
Member


Rewrote this loop slightly to be non-modifying and did some benchmarking. It's a little slow to start, due to _raw_qs and the size of the RawDatum table, but appears workable. Ran EXPLAIN ANALYZE on the slowest query; there's no apparent way to speed it up:

                                                                                QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using share_rawdatum_expiration_idx on share_rawdatum  (cost=0.57..42489486.91 rows=3096 width=1069) (actual time=0.284..13910.434 rows=483200 loops=1)
   Index Cond: (expiration_date > '2025-06-16'::date)
   Filter: (id = (SubPlan 2))
   Rows Removed by Filter: 172539
   SubPlan 1
     ->  Limit  (cost=67.95..67.95 rows=1 width=12) (actual time=0.006..0.006 rows=1 loops=483200)
           ->  Sort  (cost=67.95..68.20 rows=102 width=12) (actual time=0.005..0.005 rows=1 loops=483200)
                 Sort Key: (COALESCE(u0.datestamp, u0.date_created)) DESC NULLS LAST
                 Sort Method: quicksort  Memory: 25kB
                 ->  Index Scan using share_rawdatum_01248af3 on share_rawdatum u0  (cost=0.57..67.44 rows=102 width=12) (actual time=0.002..0.004 rows=4 loops=483200)
                       Index Cond: (suid_id = share_rawdatum.suid_id)
   SubPlan 2
     ->  Limit  (cost=67.95..67.95 rows=1 width=12) (actual time=0.014..0.014 rows=1 loops=655739)
           ->  Sort  (cost=67.95..68.20 rows=102 width=12) (actual time=0.014..0.014 rows=1 loops=655739)
                 Sort Key: (COALESCE(u0_1.datestamp, u0_1.date_created)) DESC NULLS LAST
                 Sort Method: quicksort  Memory: 25kB
                 ->  Index Scan using share_rawdatum_01248af3 on share_rawdatum u0_1  (cost=0.57..67.44 rows=102 width=12) (actual time=0.005..0.012 rows=5 loops=655739)
                       Index Cond: (suid_id = share_rawdatum.suid_id)
 Planning time: 6.497 ms
 Execution time: 13939.482 ms
(20 rows)
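Reading the plan: the outer index scan selects rows with `expiration_date > '2025-06-16'`, and the filter `id = (SubPlan 2)` keeps only the row that sorts first for its `suid_id` by `COALESCE(datestamp, date_created) DESC NULLS LAST`, i.e. the latest RawDatum per suid. The correlated subplan runs once per scanned row: 483,200 rows kept plus 172,539 removed by the filter equals the 655,739 loops reported for SubPlan 2. A rough in-memory sketch of the same selection is below; the `Raw` dataclass and `latest_unexpired` are hypothetical stand-ins, not code from the PR, and `date_created` is assumed to be non-null.

```python
import datetime
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Raw:
    """Hypothetical stand-in for a share_rawdatum row."""
    id: int
    suid_id: int
    datestamp: Optional[datetime.datetime]
    date_created: datetime.datetime  # assumed non-null
    expiration_date: Optional[datetime.date]


def _recency(raw: Raw) -> datetime.datetime:
    # COALESCE(datestamp, date_created)
    return raw.datestamp if raw.datestamp is not None else raw.date_created


def latest_unexpired(raws: Iterable[Raw], today: datetime.date) -> List[Raw]:
    """Keep the most recent row per suid_id, then only those expiring after `today`."""
    latest_by_suid: Dict[int, Raw] = {}
    for raw in raws:
        current = latest_by_suid.get(raw.suid_id)
        # on ties the first row seen wins; SQL's LIMIT 1 makes no such promise
        if current is None or _recency(raw) > _recency(current):
            latest_by_suid[raw.suid_id] = raw
    return sorted(
        (r for r in latest_by_suid.values()
         if r.expiration_date is not None and r.expiration_date > today),
        key=lambda r: r.id,
    )
```

Computing the latest row over all of a suid's rows and then filtering by expiration gives the same result as the plan's order (filter by expiration first, then check "is this the latest?"), since the subplan looks at every row for the suid regardless of expiration.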

@aaxelb aaxelb merged commit c5e2669 into CenterForOpenScience:develop Jun 16, 2025
2 of 3 checks passed
@aaxelb aaxelb deleted the feature/eng-8038--migrate-expirations branch June 17, 2025 13:21


3 participants