Collaborator

@davereinhart davereinhart commented Oct 20, 2025

Summary
Adds column metadata support and a multipart PATCH endpoint that updates score set metadata and variant data together, with conditional variant regeneration.

Key Changes

  • New dataset_columns (score/count column names + metadata)
  • New ScoreSetUpdateAllOptional model with as_form for multipart parsing
  • PATCH /api/v1/score-sets-with-variants/{urn} accepting:
    • scores_file, counts_file (optional)
    • score_columns_metadata_file, count_columns_metadata_file (JSON)
    • Partial metadata fields (title, publications, target genes, scoreRanges, etc.)
  • Target gene find_or_create helpers to prevent duplicate rows
  • Conditional variant rebuild (only on files or target gene changes)
  • Permission split: UPDATE vs SET_SCORES for file/variant changes
  • Processing state set to processing before enqueue
  • Worker fixtures for column metadata
  • Updated tests for new models and flows

Rationale
Unifies metadata and data updates, reduces redundant variant rebuilds, and introduces structured column annotations for downstream processing.

Backwards Compatibility
PUT still works; new fields are additive. No behavior change for metadata-only updates.

Introduce dataset_columns field on score sets allowing definition of score and count column metadata.
Establish base plumbing for persisting and returning column structure with score set resources.
Add dataset_columns to score set updates

Allows two keys of dataset_columns to be set via score_set_update: score_columns_metadata and count_columns_metadata. These can be populated with JSON, keyed on column name, with values describing the columns listed in the score_columns and count_columns values of dataset_columns. Both score and count column metadata are optional, but if present they will be used by the UI to display more information about the custom columns in each dataset.
…nts as part of updating variants

Reworks the previous implementation, which included score and count column metadata in the JSON body, so that they are instead submitted as optional JSON files alongside the scores and counts CSV files. This change also validates each JSON file against the CSV file it corresponds to. Adds pydantic models for DatasetColumns and DatasetColumnMetadata to correspond with ScoreSet.dataset_columns and its score_columns_metadata and count_columns_metadata fields. Ensures background jobs have structured column annotations available for downstream processing.
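To illustrate the kind of check described above, here is a minimal sketch that validates a column metadata JSON document against the header of its CSV file. It uses only the standard library; the function name `validate_columns_metadata`, the `description` key, and the sample column names are assumptions for illustration, not the PR's actual code.

```python
import csv
import io
import json


def validate_columns_metadata(csv_text: str, metadata_json: str) -> dict:
    """Return the parsed metadata if every key names a column in the CSV header."""
    # Read only the first row of the CSV, which holds the column names.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    metadata = json.loads(metadata_json)
    if not isinstance(metadata, dict):
        raise ValueError("column metadata must be a JSON object keyed on column name")
    # Any metadata key that is not a CSV column is treated as an error.
    unknown = sorted(set(metadata) - set(header))
    if unknown:
        raise ValueError(f"metadata refers to columns missing from the CSV: {unknown}")
    return metadata


# Hypothetical sample data: one custom column ("sd") with a description.
scores_csv = "hgvs_nt,score,sd\nc.1A>G,0.5,0.1\n"
scores_metadata = json.dumps({"sd": {"description": "Standard deviation of the score"}})
validated = validate_columns_metadata(scores_csv, scores_metadata)
```

Metadata for a subset of columns passes, since the description says metadata is optional; only keys that match no CSV column are rejected in this sketch.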
Create update model where all fields are optional plus as_form classmethod to parse multipart form data.
Enables PATCH/PUT endpoints to accept partial updates including JSON-encoded nested objects.
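The idea of an all-optional update model that is built from multipart form fields can be sketched as follows. This is a standard-library stand-in, not the PR's pydantic model: the class `ScoreSetUpdateSketch`, its three fields, and the classmethod `from_form_fields` are hypothetical, and the real as_form signature is not shown in this conversation.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class ScoreSetUpdateSketch:
    # Hypothetical subset of fields; each is optional so a PATCH can omit it.
    title: Optional[str] = None
    target_genes: Optional[list] = None
    extra_metadata: Optional[dict] = None

    # Fields assumed to arrive as JSON-encoded strings in the multipart form.
    # (No annotation, so the dataclass does not treat this as a field.)
    _json_fields = ("target_genes", "extra_metadata")

    @classmethod
    def from_form_fields(cls, form: dict[str, str]) -> "ScoreSetUpdateSketch":
        """Build an update from raw form values, decoding nested JSON fields."""
        known = {f.name for f in fields(cls)}
        values: dict[str, Any] = {}
        for name, raw in form.items():
            if name not in known:
                continue  # ignore form fields the model does not define
            values[name] = json.loads(raw) if name in cls._json_fields else raw
        return cls(**values)


update = ScoreSetUpdateSketch.from_form_fields(
    {"title": "New title", "extra_metadata": '{"assay": "growth"}'}
)
```

Fields that are absent from the form stay at their default of None here; telling "absent" apart from "explicitly cleared" needs extra bookkeeping that this sketch leaves out.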
Extract dataset column models into view_models/score_set_dataset_columns for cohesion and reuse.
Improves maintainability by separating concerns from core score set models.
Remove test relying on camelization since dataset_columns now backed by explicit pydantic models.
Adjust wrapper function to satisfy mypy type checking.
Add recordType to SavedDatasetColumns for clearer differentiation of stored column metadata groups.
Update related constants and tests to reflect new attribute.
…pload

Implement /score-sets-with-variants/{urn} PATCH handling multipart form (files + metadata).
Supports simultaneous update of score set fields, scores CSV, counts CSV, and column metadata JSON.
Provide utility functions to avoid redundant target gene recreation on update.
Improves efficiency and correctness of target gene handling during score set modifications.
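A find-or-create helper of the kind mentioned above might look like the sketch below. It uses an in-memory SQLite table so that it runs on its own; the table layout, the column names, and the function `find_or_create_target_gene` are assumptions, and the PR's helpers presumably work against the project's own ORM models instead.

```python
import sqlite3


def find_or_create_target_gene(
    conn: sqlite3.Connection, name: str, category: str
) -> tuple[int, bool]:
    """Return (row id, created) for the gene, inserting only when no match exists."""
    row = conn.execute(
        "SELECT id FROM target_genes WHERE name = ? AND category = ?",
        (name, category),
    ).fetchone()
    if row is not None:
        return row[0], False  # reuse the existing row instead of duplicating it
    cursor = conn.execute(
        "INSERT INTO target_genes (name, category) VALUES (?, ?)",
        (name, category),
    )
    return cursor.lastrowid, True


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE target_genes (id INTEGER PRIMARY KEY, name TEXT, category TEXT)"
)
first_id, created_first = find_or_create_target_gene(conn, "BRCA1", "protein_coding")
second_id, created_second = find_or_create_target_gene(conn, "BRCA1", "protein_coding")
```

Returning the `created` flag alongside the id lets the caller tell whether the target genes actually changed, which is one plausible input to the decision about rebuilding variants.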
Rename internal variables for consistency (score_columns_metadata, count_columns_metadata).
Adjust route logic to accept both score/count files and their metadata JSON counterparts.
Integrate file parsing (scores, counts, column metadata) into PATCH workflow.
Adds validation + conditional variant regeneration based on uploaded content and target gene changes.
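The conditional regeneration rule (rebuild only when files are uploaded or target genes change) can be written as a small predicate. The sketch below is an assumption about how such a check could be structured; the function name and argument shapes are hypothetical, and whether a change to column metadata alone triggers a rebuild is not stated here, so it is left out.

```python
from typing import Optional


def should_regenerate_variants(
    scores_file: Optional[bytes],
    counts_file: Optional[bytes],
    existing_target_genes: list[dict],
    updated_target_genes: Optional[list[dict]],
) -> bool:
    """Decide whether a variant rebuild job needs to be enqueued."""
    # Any uploaded data file means the variants must be rebuilt.
    if scores_file is not None or counts_file is not None:
        return True
    # Target genes were not part of this update, so nothing changed there.
    if updated_target_genes is None:
        return False
    # Rebuild only if the submitted target genes differ from the stored ones.
    return updated_target_genes != existing_target_genes


stored_genes = [{"name": "BRCA1"}]
metadata_only = should_regenerate_variants(None, None, stored_genes, None)
same_genes = should_regenerate_variants(None, None, stored_genes, [{"name": "BRCA1"}])
new_scores = should_regenerate_variants(b"hgvs_nt,score\n", None, stored_genes, None)
changed_genes = should_regenerate_variants(None, None, stored_genes, [{"name": "TP53"}])
```

Under this rule a metadata-only PATCH returns False and no rebuild is enqueued, which matches the stated goal of reducing redundant variant rebuilds.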
Revise existing tests and introduce new ones to cover optional update model and multipart handling.
Ensures regression coverage for newly added endpoint behaviors.
… usage

Add targeted tests for optional update model.
Apply model validation within router context to assert coherent partial update semantics.
Introduce score_columns_metadata.json and count metadata test files.
Enable worker job tests to parse realistic column annotations during variant ingestion pipeline.
@davereinhart davereinhart linked an issue Oct 21, 2025 that may be closed by this pull request
Copy link
Collaborator

@bencap bencap left a comment

Looks really nice and should be a solid base for anything else we do with column metadata. Thanks for the related changes to the enqueue duplication; even though that was ancillary, it is great to get cleaned up!

Comments are pretty minor and I left a couple about refactoring out some methods / classes such that they could be re-used later.

Before merging I would love to get a couple more test cases that cover the two find_or_create_target functions (two each for find and create), plus two tests on the score_set_update function that assert the boolean field is set correctly for a target that doesn't need an enqueue and for a target that does. After that, I think the logic all looks good to merge.

@davereinhart davereinhart force-pushed the davereinhart/scoreset-column-metadata branch from f5d8180 to 2950b2e Compare October 27, 2025 19:49
…tadata as JSON

Updates the score set router endpoints so that score and count column metadata are included in the request payload as JSON data rather than as uploaded files. This is more consistent with how the extra_metadata field is handled, and will allow more flexibility to design an interactive UI for creating column metadata in the future (either in addition to or in place of the current JSON file form inputs).

This commit also fixes an existing bug where None values were excluded from score set updates, which left no way to clear existing values from fields.
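The bug described above comes down to treating "field sent as null" the same as "field not sent". A minimal sketch of the distinction, using plain dictionaries and hypothetical field names rather than the project's models:

```python
def apply_update_dropping_none(current: dict, payload: dict) -> dict:
    """Buggy variant: None values are skipped, so a field can never be cleared."""
    updated = dict(current)
    for key, value in payload.items():
        if value is not None:
            updated[key] = value
    return updated


def apply_update_keeping_none(current: dict, payload: dict) -> dict:
    """Fixed variant: only keys absent from the payload are left untouched."""
    updated = dict(current)
    for key, value in payload.items():
        updated[key] = value  # an explicit None clears the stored value
    return updated


# Hypothetical stored record and a PATCH payload that clears one field.
stored = {"title": "Old title", "abstract": "Old abstract"}
patch = {"abstract": None}
buggy = apply_update_dropping_none(stored, patch)
fixed = apply_update_keeping_none(stored, patch)
```

In pydantic terms this is roughly the difference between exporting a model with `exclude_none` and exporting it with `exclude_unset`, though the exact mechanism used in this commit is not stated here.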
…as JSON

Also adds the test_add_score_set_variants_scores_counts_and_column_metadata_endpoint unit test, which sends score and count CSVs along with the corresponding column metadata.
Copy link
Collaborator

@bencap bencap left a comment

Looks great, thanks for the updates! Just had one comment on how we handle certain special fields in score set updates. After we address that this looks gtg.

@davereinhart davereinhart force-pushed the davereinhart/scoreset-column-metadata branch from ad5609b to b3a5a88 Compare November 5, 2025 19:10
@davereinhart davereinhart merged commit b155ed6 into release-2025.5.0 Nov 7, 2025
6 checks passed
@bencap bencap mentioned this pull request Nov 13, 2025
@bencap bencap deleted the davereinhart/scoreset-column-metadata branch November 14, 2025 20:06
Development

Successfully merging this pull request may close these issues.

Structured metadata for column descriptions

3 participants