Adds column metadata support and a multipart PATCH endpoint to update score sets and variants #546

davereinhart · 2025-10-20T17:33:01Z

Summary
Adds column metadata support and a multipart PATCH endpoint that updates score set metadata and variant data together, regenerating variants only when needed.

Key Changes

- New dataset_columns (score/count column names + metadata)
- New ScoreSetUpdateAllOptional model with as_form for multipart parsing
- PATCH /api/v1/score-sets-with-variants/{urn} accepting:
  - scores_file, counts_file (optional)
  - score_columns_metadata_file, count_columns_metadata_file (JSON)
  - Partial metadata fields (title, publications, target genes, scoreRanges, etc.)
- Target gene find_or_create helpers to prevent duplicate rows
- Conditional variant rebuild (only on files or target gene changes)
- Permission split: UPDATE vs SET_SCORES for file/variant changes
- Processing state set to processing before enqueue
- Worker fixtures for column metadata
- Updated tests for new models and flows
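To illustrate the permission split listed above, here is a minimal sketch of how a request could be mapped to the actions it needs. The `Action` enum and `required_actions` helper are hypothetical names, and the rule that UPDATE is always required while SET_SCORES is added only for file uploads is an assumption, not something the PR description spells out.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical stand-in for the permission actions named in the PR."""

    UPDATE = "update"
    SET_SCORES = "set_scores"


def required_actions(files_uploaded: bool) -> set[Action]:
    """Return the actions a caller must hold for this request (assumed rule)."""
    actions = {Action.UPDATE}
    if files_uploaded:
        # Assumption: replacing scores/counts files needs the stricter permission too.
        actions.add(Action.SET_SCORES)
    return actions
```

Under this assumed rule, a metadata-only PATCH would only check UPDATE, which matches the claim that metadata-only updates behave as before.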

Rationale
Unifies metadata and data updates, reduces redundant variant rebuilds, and introduces structured column annotations for downstream processing.

Backwards Compatibility
PUT still works; new fields are additive. No behavior change for metadata-only updates.

Introduce dataset_columns field on score sets, allowing definition of score and count column metadata. Establish base plumbing for persisting and returning column structure with score set resources.

Add dataset_columns to score set updates. Allows two keys of dataset_columns to be set on score_set_update: score_columns_metadata and count_columns_metadata. These can be populated with JSON, keyed on column name, with values describing the fields listed in the score_columns and count_columns values of dataset_columns. Both score and count column metadata are optional, but if present they will be used by the UI to display more information about custom columns present in each dataset.
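As a rough illustration of the shape described above, the sketch below builds a dataset_columns value and lists the columns that have no metadata entry. The four top-level keys come from the commit message; the column names, the `description` key inside each metadata entry, and the `undescribed_columns` helper are assumptions made for the example.

```python
from typing import Any

# Hypothetical dataset_columns payload; metadata is keyed on column name.
dataset_columns: dict[str, Any] = {
    "score_columns": ["score", "se"],
    "count_columns": ["c_0", "c_1"],
    # "description" is an assumed metadata field, not confirmed by the PR.
    "score_columns_metadata": {
        "se": {"description": "Standard error of the score"},
    },
    # count_columns_metadata is optional and omitted here.
}


def undescribed_columns(columns: dict[str, Any], kind: str) -> list[str]:
    """Return the columns of the given kind ("score" or "count") with no metadata entry."""
    names = columns.get(f"{kind}_columns", [])
    metadata = columns.get(f"{kind}_columns_metadata") or {}
    return [name for name in names if name not in metadata]
```

Because both metadata keys are optional, the helper treats a missing or null metadata object the same as an empty one.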

…nts as part of updating variants Reworks the previous implementation, which included scores and counts column metadata in the JSON body, so that they are instead submitted as optional JSON files along with the scores and counts CSV files. This change also validates the JSON files against the CSV files they correspond to. Added pydantic models for DatasetColumns and DatasetColumnMetadata to correspond with ScoreSet.dataset_columns and its score_columns_metadata and count_columns_metadata fields. Ensures background jobs have structured column annotations available for downstream processing.
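The cross-check between a metadata JSON file and its CSV could look roughly like the sketch below, which uses only the standard library. The helper name `unknown_metadata_columns` and the rule that every metadata key must match a CSV header are assumptions; the PR's actual validation lives in the dataframe validation module and may be stricter.

```python
import csv
import io
import json


def unknown_metadata_columns(csv_text: str, metadata_json: str) -> list[str]:
    """Return metadata keys that do not match any column in the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    metadata = json.loads(metadata_json)
    if not isinstance(metadata, dict):
        raise ValueError("column metadata must be a JSON object keyed on column name")
    return sorted(set(metadata) - set(header))


scores_csv = "hgvs_nt,score,se\nc.1A>G,0.5,0.1\n"
good = json.dumps({"se": {"description": "standard error"}})
bad = json.dumps({"sd": {"description": "typo for se"}})

print(unknown_metadata_columns(scores_csv, good))  # []
print(unknown_metadata_columns(scores_csv, bad))   # ['sd']
```

Returning the offending keys rather than a boolean makes it easy to build an error message that names the mismatched columns.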

bencap

Looks really nice and should be a solid base for anything else we do with column metadata. Thanks for the related changes to the enqueue duplication; even though that was ancillary, it is great to get cleaned up!

Comments are pretty minor and I left a couple about refactoring out some methods / classes such that they could be re-used later.

Before merging I would love to get a couple more test cases that cover the two find_or_create_target functions (two each for find and create), plus two tests on the score_set_update function that assert the boolean field is set correctly: one for a target that doesn't need an enqueue and one for a target that does. After that, I think the logic all looks good to merge.

src/mavedb/view_models/score_set.py

src/mavedb/view_models/target_gene.py

src/mavedb/lib/validation/dataframe/dataframe.py

src/mavedb/view_models/score_set.py

tests/view_models/test_score_set.py

…tadata as JSON Updates the score set router endpoints so that score and count columns metadata are included in the request payload as JSON data rather than as uploaded files. This is more consistent with how the extra_metadata field is handled, and will allow more flexibility to design an interactive UI for creating column metadata in the future (either in addition to or in place of the current JSON file form inputs). This commit also fixes an existing bug where None values were excluded from score set updates, which made it impossible to clear existing values from fields.
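The None-handling bug described above can be shown with a small sketch. Both functions and the record fields are hypothetical; the point is only the difference between skipping None values and treating an explicit None as "clear this field".

```python
from typing import Any


def apply_update_skipping_none(current: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Buggy pattern: None values are dropped, so a field can never be cleared."""
    result = dict(current)
    result.update({key: value for key, value in update.items() if value is not None})
    return result


def apply_update(current: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Fixed pattern: absent keys are untouched, an explicit None clears the field."""
    result = dict(current)
    result.update(update)
    return result


record = {"title": "My score set", "abstract_text": "Old abstract"}
still_set = apply_update_skipping_none(record, {"abstract_text": None})
cleared = apply_update(record, {"abstract_text": None})
```

In the fixed version, what matters is whether a key is present in the update, not whether its value is truthy or non-null.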

bencap

Looks great, thanks for the updates! Just had one comment on how we handle certain special fields in score set updates. After we address that this looks gtg.

src/mavedb/routers/score_sets.py

davereinhart added 16 commits October 14, 2025 16:38

Add build path to docker-compose-dev.yaml for dcd-mapping and cdot-re…

61aa6ef

…st to simplify build process

Update redis environment variables in template file

73b3a1a

Update value check on score_set and experiment updates to handle non-…

12d6c0e

…null falsey values

feat: add ScoreSetUpdateAllOptional with multipart form helper

f28c8e7

Create update model where all fields are optional plus as_form classmethod to parse multipart form data. Enables PATCH/PUT endpoints to accept partial updates including JSON-encoded nested objects.
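The "all fields optional" idea can be sketched with standard-library dataclasses. This is an analogue only: the PR uses pydantic models and a decorator, while `ScoreSetFields` and `all_fields_optional` here are hypothetical names, and the sketch does not cover the as_form multipart parsing.

```python
from dataclasses import dataclass, field, fields, make_dataclass
from typing import Optional


@dataclass
class ScoreSetFields:
    """Hypothetical stand-in for an update model with required fields."""

    title: str
    short_description: str


def all_fields_optional(cls: type) -> type:
    """Build a dataclass with the same field names, each Optional and defaulting to None."""
    specs = [(f.name, Optional[f.type], field(default=None)) for f in fields(cls)]
    return make_dataclass(f"{cls.__name__}AllOptional", specs)


ScoreSetFieldsAllOptional = all_fields_optional(ScoreSetFields)
patch = ScoreSetFieldsAllOptional(title="New title")
```

Deriving the optional variant from the strict model keeps the two field lists from drifting apart, which is presumably the motivation for generating it rather than writing it by hand.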

refactor: move dataset column pydantic models to dedicated module

5cb60e0

Extract dataset column models into view_models/score_set_dataset_columns for cohesion and reuse. Improves maintainability by separating concerns from core score set models.

refactor: replace dynamic camelization test with explicit model

bbcb3f2

Remove test relying on camelization since dataset_columns now backed by explicit pydantic models. Adjust wrapper function to satisfy mypy type checking.

feat: extend SavedDatasetColumns with recordType

454cf86

Add recordType to SavedDatasetColumns for clearer differentiation of stored column metadata groups. Update related constants and tests to reflect new attribute.

feat: add PATCH endpoint supporting variants + score/count metadata u…

e3812b1

…pload Implement /score-sets-with-variants/{urn} PATCH handling multipart form (files + metadata). Supports simultaneous update of score set fields, scores CSV, counts CSV, and column metadata JSON.

feat: add target gene find_or_create helpers for sequence/accession

45c195d

Provide utility functions to avoid redundant target gene recreation on update. Improves efficiency and correctness of target gene handling during score set modifications.
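A find-or-create helper follows a simple pattern: look for an existing row that matches, and insert only if none is found. The sketch below uses an in-memory list in place of the database session; `TargetSequence`, `InMemoryStore`, and the match on the raw sequence string are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TargetSequence:
    id: int
    sequence: str


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a database table of target sequences."""

    rows: list[TargetSequence] = field(default_factory=list)

    def find_or_create(self, sequence: str) -> tuple[TargetSequence, bool]:
        """Return (row, created); reuse an existing row when the sequence matches."""
        for row in self.rows:
            if row.sequence == sequence:
                return row, False
        row = TargetSequence(id=len(self.rows) + 1, sequence=sequence)
        self.rows.append(row)
        return row, True


store = InMemoryStore()
first, created_first = store.find_or_create("ACGT")
second, created_second = store.find_or_create("ACGT")
```

Returning the `created` flag alongside the row lets the caller tell whether the target actually changed, which is useful when deciding whether downstream work is needed.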

refactor: unify variable names for score/count metadata & extend routes

de87fc4

Rename internal variables for consistency (score_columns_metadata, count_columns_metadata). Adjust route logic to accept both score/count files and their metadata JSON counterparts.

feat: enhance PATCH endpoint to fully process uploaded files

7fe4ed8

Integrate file parsing (scores, counts, column metadata) into PATCH workflow. Adds validation + conditional variant regeneration based on uploaded content and target gene changes.
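The conditional regeneration described above reduces to a small decision: rebuild only when files were uploaded or target genes changed, and mark the score set as processing before the job is queued. In this sketch `ScoreSetStub`, the plain list used as a queue, and the state value "success" are assumptions; only the "processing" state and the two triggers come from the PR description.

```python
from dataclasses import dataclass


@dataclass
class ScoreSetStub:
    """Hypothetical stand-in for a score set record."""

    urn: str
    processing_state: str = "success"  # assumed idle state name


def update_and_maybe_enqueue(
    score_set: ScoreSetStub,
    queue: list[str],
    *,
    files_uploaded: bool,
    target_genes_changed: bool,
) -> bool:
    """Enqueue a variant rebuild only when uploaded content or target genes changed."""
    should_rebuild = files_uploaded or target_genes_changed
    if should_rebuild:
        # Set the state first so readers never see a queued job with a stale state.
        score_set.processing_state = "processing"
        queue.append(score_set.urn)
    return should_rebuild


jobs: list[str] = []
metadata_only = ScoreSetStub(urn="urn:example:1")
with_files = ScoreSetStub(urn="urn:example:2")
update_and_maybe_enqueue(metadata_only, jobs, files_uploaded=False, target_genes_changed=False)
update_and_maybe_enqueue(with_files, jobs, files_uploaded=True, target_genes_changed=False)
```

The returned boolean is the kind of flag the reviewer asks to have tested: false for an update that needs no enqueue, true for one that does.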

test: update and add unit tests aligned with new score set update flow

cf89e6e

Revise existing tests and introduce new ones to cover optional update model and multipart handling. Ensures regression coverage for newly added endpoint behaviors.

test: add ScoreSetUpdateAllOptional model tests and router validation…

d49fb27

… usage Add targeted tests for optional update model. Apply model validation within router context to assert coherent partial update semantics.

feat: add worker job fixtures for score/count column metadata

4fa4995

Introduce score_columns_metadata.json and count metadata test files. Enable worker job tests to parse realistic column annotations during variant ingestion pipeline.

davereinhart linked an issue Oct 21, 2025 that may be closed by this pull request

Structured metadata for column descriptions #38

Closed

bencap reviewed Oct 25, 2025

davereinhart added 4 commits October 27, 2025 10:46

Apply ruff format and organize import on files in this branch

b57c11f

Cleanup

8391f67

Move all_fields_optional_model decorator to view models utils module …

5077533

…for reusability

Add unit tests for all_fields_optional_model decorator

2950b2e

davereinhart force-pushed the davereinhart/scoreset-column-metadata branch from f5d8180 to 2950b2e Compare October 27, 2025 19:49

davereinhart added 2 commits October 29, 2025 11:40

Update unit tests to include score and count columns metadata fields …

b3a5a88

…as JSON Also adds test_add_score_set_variants_scores_counts_and_column_metadata_endpoint unit test, to test sending score and count CSVs along with corresponding column metadata

davereinhart force-pushed the davereinhart/scoreset-column-metadata branch from 9c02542 to b3a5a88 Compare October 29, 2025 20:57

davereinhart mentioned this pull request Oct 29, 2025

Include score and counts column metadata fields on ScoreSetCreator and ScoreSetEditor components VariantEffect/mavedb-ui#540

Merged

bencap reviewed Nov 4, 2025

src/mavedb/routers/score_sets.py

davereinhart force-pushed the davereinhart/scoreset-column-metadata branch from ad5609b to b3a5a88 Compare November 5, 2025 19:10

bencap approved these changes Nov 5, 2025

Merge branch 'release-2025.5.0' into davereinhart/scoreset-column-met…

2f37d6b

…adata

davereinhart merged commit b155ed6 into release-2025.5.0 Nov 7, 2025
6 checks passed

bencap mentioned this pull request Nov 13, 2025

Release 2025.5.0 #575

Merged

bencap deleted the davereinhart/scoreset-column-metadata branch November 14, 2025 20:06

3 participants
