Adds column metadata support and a multipart PATCH endpoint to update score sets and variants #546
…st to simplify build process
…null falsey values
Introduce dataset_columns field on score sets, allowing definition of score and count column metadata. Establish base plumbing for persisting and returning column structure with score set resources. Add dataset_columns to score set updates: allows two keys of dataset_columns to be set on score set update, score_columns_metadata and count_columns_metadata. These can be populated with JSON, keyed on column name, with values describing the columns listed in the score_columns and count_columns values of dataset_columns. Both score and count column metadata are optional, but if present they are used by the UI to display more information about the custom columns present in each dataset.
…nts as part of updating variants Reworks previous implementation that includes scores and counts column metadata as part of JSON body to instead have them submitted as optional JSON files along with scores and counts CSV files. This change also includes validation of the JSON files with the CSV files they correspond to. Added pydantic models for DatasetColumns and DatasetColumnMetadata to correspond with ScoreSet.dataset_columns and its score_columns_metadata and count_columns_metadata fields. Ensures background jobs have structured column annotations available for downstream processing.
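The validation of a column metadata JSON file against its CSV file can be pictured roughly as follows. This is a dependency-free sketch, not the code from this PR: the function name `validate_columns_metadata`, the `STANDARD_COLUMNS` set, and the `description` key in the usage example are assumptions for illustration, and the actual implementation uses the pydantic models named above.

```python
import csv
import io
import json

# Assumption: columns treated as structural rather than custom data columns.
STANDARD_COLUMNS = {"hgvs_nt", "hgvs_splice", "hgvs_pro"}


def validate_columns_metadata(csv_text: str, metadata_json: str) -> dict:
    """Parse the metadata JSON and check that every key names a custom CSV column."""
    # Only the header row is needed to know which columns the CSV provides.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    custom_columns = {name for name in header if name not in STANDARD_COLUMNS}

    metadata = json.loads(metadata_json)
    if not isinstance(metadata, dict):
        raise ValueError("column metadata must be a JSON object keyed on column name")

    # Metadata may describe a subset of the columns, but never an absent column.
    unknown = sorted(set(metadata) - custom_columns)
    if unknown:
        raise ValueError(f"metadata describes columns missing from the CSV: {unknown}")
    return metadata
```

For example, with a scores CSV whose header is `hgvs_nt,score,se`, metadata keyed on `se` passes, while metadata keyed on a column that is not in the header raises `ValueError`.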
Create update model where all fields are optional plus as_form classmethod to parse multipart form data. Enables PATCH/PUT endpoints to accept partial updates including JSON-encoded nested objects.
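A minimal sketch of the all-optional update model with an `as_form` classmethod, assuming form values arrive as strings and nested objects are JSON-encoded. It uses a standard-library dataclass in place of the PR's pydantic model so that it runs without dependencies; the class name `ScoreSetUpdateSketch`, the `title` field, and the `fields_set` attribute are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScoreSetUpdateSketch:
    # Every field is optional so a request may carry any subset of them.
    title: Optional[str] = None
    extra_metadata: Optional[dict] = None
    # Names of the fields the client actually sent (hypothetical bookkeeping).
    fields_set: set = field(default_factory=set)

    # Un-annotated, so this is a plain class attribute, not a dataclass field.
    JSON_FIELDS = ("extra_metadata",)

    @classmethod
    def as_form(cls, form: dict) -> "ScoreSetUpdateSketch":
        """Build an update from multipart form values, decoding JSON-encoded fields."""
        values = {}
        for name in ("title", "extra_metadata"):
            if name not in form:
                continue  # absent fields stay at their default and are not recorded
            raw = form[name]
            values[name] = json.loads(raw) if name in cls.JSON_FIELDS else raw
        return cls(fields_set=set(values), **values)
```

Recording which fields were sent lets a handler tell "not provided" apart from "provided as null", which matters for partial updates.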
Extract dataset column models into view_models/score_set_dataset_columns for cohesion and reuse. Improves maintainability by separating concerns from core score set models.
Remove test relying on camelization since dataset_columns now backed by explicit pydantic models. Adjust wrapper function to satisfy mypy type checking.
Add recordType to SavedDatasetColumns for clearer differentiation of stored column metadata groups. Update related constants and tests to reflect new attribute.
…pload
Implement /score-sets-with-variants/{urn} PATCH handling multipart form (files + metadata).
Supports simultaneous update of score set fields, scores CSV, counts CSV, and column metadata JSON.
Provide utility functions to avoid redundant target gene recreation on update. Improves efficiency and correctness of target gene handling during score set modifications.
Rename internal variables for consistency (score_columns_metadata, count_columns_metadata). Adjust route logic to accept both score/count files and their metadata JSON counterparts.
Integrate file parsing (scores, counts, column metadata) into PATCH workflow. Adds validation + conditional variant regeneration based on uploaded content and target gene changes.
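The target gene reuse and conditional regeneration described in the commits above could look something like the sketch below. All names here (`TargetGeneSketch`, `find_or_create_target`, `should_regenerate_variants`) and the matching rule (same name and sequence) are assumptions for illustration; the PR's own helpers and criteria may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetGeneSketch:
    name: str
    sequence: str


def find_or_create_target(existing: list, name: str, sequence: str):
    """Return (target, created), reusing an existing target when it matches exactly."""
    for target in existing:
        if target.name == name and target.sequence == sequence:
            return target, False  # unchanged target: nothing to recreate
    return TargetGeneSketch(name, sequence), True


def should_regenerate_variants(files_uploaded: bool, any_target_created: bool) -> bool:
    """Assumed rule: rebuild variants only when new data files or new targets arrive."""
    return files_uploaded or any_target_created
```

Under these assumptions, a metadata-only update that resubmits identical target genes reuses the stored targets and does not enqueue variant regeneration.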
Revise existing tests and introduce new ones to cover optional update model and multipart handling. Ensures regression coverage for newly added endpoint behaviors.
… usage Add targeted tests for optional update model. Apply model validation within router context to assert coherent partial update semantics.
Introduce score_columns_metadata.json and count metadata test files. Enable worker job tests to parse realistic column annotations during variant ingestion pipeline.
bencap
left a comment
Looks really nice and should be a solid base for anything else we do with column metadata. Thanks for the related changes to the enqueue duplication; even though that was ancillary, it is great to get cleaned up!
Comments are pretty minor and I left a couple about refactoring out some methods / classes such that they could be re-used later.
Before merging I would love to get a couple more test cases that cover the two find_or_create_target functions (two each for find and create), plus two tests on the score_set_update function that assert the boolean field is set correctly for a target that doesn't need an enqueue and for a target that does. After that, I think the logic all looks good to merge.
f5d8180 to
2950b2e
…tadata as JSON Updates the score set router endpoints so that score and count column metadata are included in the request payload as JSON data rather than as uploaded files. This is more consistent with how the extra_metadata field is handled, and allows more flexibility to design an interactive UI for creating column metadata in the future (either in addition to or in place of the current JSON file form inputs). This commit also fixes an existing bug where None values were excluded from score set updates, which left no way to clear existing values from fields.
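The None-handling bug mentioned in this commit can be illustrated with plain dictionaries. This is a sketch of the general pattern, not the PR's code; both function names and the record fields are hypothetical.

```python
def apply_update_dropping_none(current: dict, provided: dict) -> dict:
    """Buggy pattern: None values are filtered out, so a field can never be cleared."""
    kept = {name: value for name, value in provided.items() if value is not None}
    return {**current, **kept}


def apply_update(current: dict, provided: dict) -> dict:
    """Fixed pattern: every provided field is applied, including an explicit None."""
    updated = dict(current)
    for name, value in provided.items():
        updated[name] = value  # None is kept so the stored value is cleared
    return updated


record = {"title": "A", "extra_metadata": {"k": 1}}
# The buggy version leaves extra_metadata untouched.
print(apply_update_dropping_none(record, {"extra_metadata": None}))
# The fixed version clears it while leaving title alone.
print(apply_update(record, {"extra_metadata": None}))
```

In pydantic terms this is roughly the difference between dumping an update with `exclude_none=True` and dumping it with `exclude_unset=True`, though the exact change made in this PR is not shown here.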
…as JSON Also adds test_add_score_set_variants_scores_counts_and_column_metadata_endpoint unit test, to test sending score and count CSVs along with corresponding column metadata
9c02542 to
b3a5a88
bencap
left a comment
Looks great, thanks for the updates! Just had one comment on how we handle certain special fields in score set updates. After we address that this looks gtg.
ad5609b to
b3a5a88
Summary
Adds column metadata support and a multipart PATCH endpoint to update score set metadata and variant data together with conditional variant regeneration.
Key Changes
Rationale
Unifies metadata and data updates, reduces redundant variant rebuilds, and introduces structured column annotations for downstream processing.
Backwards Compatibility
PUT still works; new fields are additive. No behavior change for metadata-only updates.