@aaxelb
Contributor

@aaxelb aaxelb commented Jun 3, 2025

mainly:

  • delete RawDatum model
  • knit the rest back together
    • update trove.digestive_tract
      • /trove/ingest view now calls digestive_tract.ingest (which extracts and saves the given record) instead of digestive_tract.swallow (which would save the request body as RawDatum for later extraction)
      • task__extract_and_derive removed (extraction should now be synchronous)
    • (update: moved to "[ENG-8221][ENG-8038] less database data; part 1", #877) add expiration_date to base ResourceDescription
      • replaces RawDatum.expiration_date
      • add data migration to copy expiration dates from RawDatum to SupplementaryResourceDescription
        (expiration_date is used by osf only for supplements)
    • update SourceUniqueIdentifier.get_date_first_seen
      • use ArchivedResourceDescription.created instead of RawDatum.date_created
  • let non-successful CeleryTaskResults get deleted eventually (but keep them longer than successes)
    • CELERY_RESULT_EXPIRES, used for successes, now set from environment variable (with previous hard-coding as default)
    • FAILED_CELERY_RESULT_EXPIRES new environment variable, for all non-success statuses, default 11 days (different prefix because it's not used by celery directly)
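
A rough sketch of how the two expiration settings above might fit together, for illustration only. Assumptions not stated in this PR: both values are in seconds, the success status string is "SUCCESS", the fallback for CELERY_RESULT_EXPIRES (the previous hard-coded value is not given here, so a placeholder is used), and the helper names `load_expiry_settings` and `is_expired`, which are hypothetical.

```python
import os
from datetime import datetime, timedelta, timezone

# Placeholder only: the PR keeps the previous hard-coded value as the default
# but does not say what it was.
_ASSUMED_SUCCESS_DEFAULT = int(timedelta(days=3).total_seconds())
# "default 11 days" comes from the PR description; seconds as the unit is an assumption.
_FAILED_DEFAULT = int(timedelta(days=11).total_seconds())


def load_expiry_settings(environ=os.environ):
    """Read both expiration settings from the environment, with defaults."""
    return {
        "CELERY_RESULT_EXPIRES": int(
            environ.get("CELERY_RESULT_EXPIRES", _ASSUMED_SUCCESS_DEFAULT)
        ),
        "FAILED_CELERY_RESULT_EXPIRES": int(
            environ.get("FAILED_CELERY_RESULT_EXPIRES", _FAILED_DEFAULT)
        ),
    }


def is_expired(status, date_modified, settings, now=None):
    """Successes use CELERY_RESULT_EXPIRES; every other status uses the longer one."""
    now = now or datetime.now(timezone.utc)
    key = (
        "CELERY_RESULT_EXPIRES"
        if status == "SUCCESS"  # assumed status string
        else "FAILED_CELERY_RESULT_EXPIRES"
    )
    return now - date_modified > timedelta(seconds=settings[key])
```

With these defaults, a failed result would be kept for 11 days before becoming eligible for deletion, while a success would expire sooner.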

also, along the way:

  • small fixes, as encountered
    • error at share.example/api/v2/ root -- fix rss/atom feed link generation
    • pagination at share.example/api/v2/sourceconfigs/
    • instead of hard-coded default admin username/password (and undocumented one-off env var)...
      • move env-var access to settings/environment:
        • SHARE_ADMIN_USERNAME, default "admin"
        • SHARE_ADMIN_PASSWORD, no default except when DEBUG
  • delete some dead code
    • share.exceptions
  • rename IndexcardRdf (and kids) to ResourceDescription, parallelling the existing ResourceIdentifier, to better convey intent (got confused multiple times by my own bad naming)
    • (even if "rdf" weren't foul arcana, the usage is wrong/confusing -- "this is a resource description" more helpful than nonsensical "this is an indexcard rdf")
    • move from trove.models.indexcard to trove.models.resource_description
    • ArchivedIndexcardRdf => ArchivedResourceDescription
    • LatestIndexcardRdf => LatestResourceDescription
    • SupplementaryIndexcardRdf => SupplementaryResourceDescription
    • indexcard_rdf => resource_description
    • indexcardrdf => resourcedescription
    • latest_rdf => latest_resource_description
    • latestrdf => latestresourcedescription
    • update_rdf => update_resource_description
    • update_supplementary_rdf => update_supplementary_description
  • move more models to their own files (while touching share.models.ingest and trove.models.indexcard)
    • move DerivedIndexcard to trove.models.derived_indexcard (from trove.models.indexcard)
    • leave Indexcard the only model in trove.models.indexcard, appropriately
    • move Source to share.models.source and SourceConfig to share.models.source_config (from share.models.ingest)
  • change github actions to stop on first failure (could consider separate jobs for short-circuit and full-run, in future)
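
For the admin-credential change listed under "small fixes", a minimal sketch of reading SHARE_ADMIN_USERNAME and SHARE_ADMIN_PASSWORD in one place. The function name `load_admin_credentials`, the debug-only placeholder password, and returning `None` when no password is set outside DEBUG are all assumptions for illustration, not what the settings module in this PR necessarily does.

```python
import os


def load_admin_credentials(environ=os.environ, debug=False):
    """Return (username, password) for the default admin user.

    SHARE_ADMIN_USERNAME defaults to "admin"; SHARE_ADMIN_PASSWORD has no
    default unless debug is on.
    """
    username = environ.get("SHARE_ADMIN_USERNAME", "admin")
    password = environ.get("SHARE_ADMIN_PASSWORD")
    if password is None and debug:
        # Hypothetical placeholder; the real debug default is not given in the PR.
        password = "debug-only-password"
    # Outside DEBUG the password may be None; the caller decides what to do then.
    return username, password
```

Keeping the environment lookup in settings means code that creates the admin user only sees the resolved values and never touches `os.environ` itself.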

ENG-8038

@coveralls

coveralls commented Jun 4, 2025

Coverage Status

coverage: 81.714% (+0.5%) from 81.24%
when pulling 85bf0ad on aaxelb:feature/eng-8038--less-blobs
into f0e657b on CenterForOpenScience:develop.

@aaxelb aaxelb marked this pull request as ready for review June 4, 2025 17:48
@aaxelb aaxelb force-pushed the feature/eng-8038--less-blobs branch 3 times, most recently from 779d011 to f5b4dbb June 4, 2025 19:21
@aaxelb aaxelb changed the title Feature/eng 8038 less blobs [ENG-8038] drop RawDatum Jun 4, 2025
@aaxelb aaxelb changed the title [ENG-8038] drop RawDatum [ENG-8038] fewer blobs in database Jun 4, 2025
Member

@felliott felliott left a comment

LGTM!

@aaxelb
Contributor Author

aaxelb commented Jun 9, 2025

may be splitting this across a few deployments -- will return to this and resolve conflicts after #877

@aaxelb aaxelb force-pushed the feature/eng-8038--less-blobs branch from f5b4dbb to e797a90 June 9, 2025 20:46
@aaxelb aaxelb force-pushed the feature/eng-8038--less-blobs branch from 3690a91 to 013825f June 16, 2025 19:59
@aaxelb aaxelb force-pushed the feature/eng-8038--less-blobs branch from 013825f to 85bf0ad June 24, 2025 13:04
@aaxelb aaxelb merged commit 2a1e49d into CenterForOpenScience:develop Jun 24, 2025
3 checks passed
@aaxelb aaxelb deleted the feature/eng-8038--less-blobs branch June 24, 2025 14:53

3 participants