@bencap bencap commented Nov 10, 2025

Features

Bug Fixes

Maintenance

Deployment Note
To migrate the database to support this branch, follow these steps:

1. Check out the commit 2cbf857d78ecd38caaa105eeecada9b6eda46f99 and upgrade to the alembic head.

```shell
git checkout 129d7daefb0c18d0f70f20ee633cb17a02a3cc37
DB_PORT=5434 DB_DATABASE_NAME=mavedb DB_USERNAME=postgres DB_PASSWORD=XXX poetry run alembic upgrade head
```

2. Run the manual migration.

```shell
DB_PORT=5434 DB_DATABASE_NAME=mavedb DB_USERNAME=postgres DB_PASSWORD=XXX poetry run python3 alembic/manual_migrations/migrate_score_ranges_to_calibrations.py
```

3. Check out the commit 3da28919a6f6173ad0d65c768b172dfffbf730fc and upgrade to the alembic head.

```shell
git checkout 3da28919a6f6173ad0d65c768b172dfffbf730fc
DB_PORT=5434 DB_DATABASE_NAME=mavedb DB_USERNAME=postgres DB_PASSWORD=XXX poetry run alembic upgrade head
```

4. Check out the head commit at main and upgrade to the alembic head.

```shell
git checkout main
DB_PORT=5434 DB_DATABASE_NAME=mavedb DB_USERNAME=postgres DB_PASSWORD=XXX poetry run alembic upgrade head
```

bencap and others added 30 commits October 6, 2025 13:40
Adds a standalone score calibration model to replace the `score_ranges` property of score sets.
This model better supports generically typed score ranges, publication identifiers directly associated with score ranges, and odds paths provided independently of functional classes.
Introduce a dataset_columns field on score sets, allowing definition of score and count column metadata.
Establish base plumbing for persisting and returning column structure with score set resources.
Add dataset_columns to score set updates

Allows two keys of dataset_columns to be set via score_set_update: score_columns_metadata and count_columns_metadata. These can be populated with JSON, keyed on column name, with values describing the fields listed in the score_columns and count_columns values of dataset_columns. Both score and count column metadata are optional, but if present they will be used by the UI to display more information about the custom columns present in each dataset.
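To make the shape concrete, here is a hypothetical payload for the dataset_columns portion of a score set update. The two top-level keys come from the change description above; the per-column fields (here just "description") are illustrative assumptions, as is the validation helper.

```python
# Hypothetical dataset_columns update payload. The top-level keys
# (score_columns_metadata, count_columns_metadata) come from the PR;
# the per-column "description" field is an illustrative assumption.
dataset_columns_update = {
    "score_columns_metadata": {
        "score": {"description": "Functional score from the primary assay"},
        "se": {"description": "Standard error of the score"},
    },
    "count_columns_metadata": {
        "c_0": {"description": "Pre-selection read count"},
    },
}

def unknown_metadata_keys(metadata: dict, dataset_columns: list) -> list:
    """Return metadata keys that do not name a column in the dataset."""
    return [name for name in metadata if name not in dataset_columns]
```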
…nts as part of updating variants

Reworks the previous implementation, which included scores and counts column metadata in the JSON body, to instead have them submitted as optional JSON files alongside the scores and counts CSV files. This change also validates the JSON files against the CSV files they correspond to. Adds pydantic models DatasetColumns and DatasetColumnMetadata corresponding to ScoreSet.dataset_columns and its score_columns_metadata and count_columns_metadata fields. Ensures background jobs have structured column annotations available for downstream processing.
Create an update model where all fields are optional, plus an as_form classmethod to parse multipart form data.
Enables PATCH/PUT endpoints to accept partial updates including JSON-encoded nested objects.
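The pattern described above, an all-optional update model with an as_form constructor that decodes JSON-encoded nested objects out of multipart form fields, might look roughly like this. The real models are pydantic and the form parsing is FastAPI's; this stdlib dataclass version is only a sketch with assumed field names.

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class ScoreSetUpdate:
    # All fields optional so the model supports partial (PATCH) updates.
    title: Optional[str] = None
    dataset_columns: Optional[dict] = None

    @classmethod
    def as_form(cls, title: Optional[str] = None,
                dataset_columns: Optional[str] = None) -> "ScoreSetUpdate":
        # Nested objects arrive as JSON strings inside multipart form
        # data, so decode them before constructing the model.
        parsed = json.loads(dataset_columns) if dataset_columns else None
        return cls(title=title, dataset_columns=parsed)
```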
Extract dataset column models into view_models/score_set_dataset_columns for cohesion and reuse.
Improves maintainability by separating concerns from core score set models.
Remove test relying on camelization, since dataset_columns is now backed by explicit pydantic models.
Adjust wrapper function to satisfy mypy type checking.
Add recordType to SavedDatasetColumns for clearer differentiation of stored column metadata groups.
Update related constants and tests to reflect new attribute.
…pload

Implement /score-sets-with-variants/{urn} PATCH handling multipart form (files + metadata).
Supports simultaneous update of score set fields, scores CSV, counts CSV, and column metadata JSON.
Provide utility functions to avoid redundant target gene recreation on update.
Improves efficiency and correctness of target gene handling during score set modifications.
Rename internal variables for consistency (score_columns_metadata, count_columns_metadata).
Adjust route logic to accept both score/count files and their metadata JSON counterparts.
Integrate file parsing (scores, counts, column metadata) into PATCH workflow.
Adds validation + conditional variant regeneration based on uploaded content and target gene changes.
Revise existing tests and introduce new ones to cover optional update model and multipart handling.
Ensures regression coverage for newly added endpoint behaviors.
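The conditional-regeneration decision described above can be sketched minimally as follows; the function and parameter names are assumptions for illustration, not the PR's actual code.

```python
def should_regenerate_variants(scores_uploaded: bool,
                               counts_uploaded: bool,
                               target_genes_changed: bool) -> bool:
    """Regenerate variants only when new data files were uploaded or the
    target genes changed; plain metadata edits leave variants untouched."""
    return scores_uploaded or counts_uploaded or target_genes_changed
```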
… usage

Add targeted tests for optional update model.
Apply model validation within router context to assert coherent partial update semantics.
Introduce score_columns_metadata.json and count metadata test files.
Enable worker job tests to parse realistic column annotations during variant ingestion pipeline.
- New /api/v1/alphafold-files/version endpoint (CORS-safe)
- Fetches upstream XML index, handles optional namespace
- Extracts model version from NextMarker (model_vN) and returns {"version": "v6"} (lowercased)
- Error responses: upstream fetch failure (status passthrough), malformed ID (500), XML parse error (502)
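The version-extraction step could be sketched as below, assuming an S3-style XML listing whose NextMarker value contains a model_vN token; the exact upstream XML layout is an assumption.

```python
import re
import xml.etree.ElementTree as ET

def extract_model_version(xml_text: str) -> str:
    """Pull the model version (e.g. 'v6') out of an S3-style XML index.

    Handles an optional default namespace and expects the NextMarker
    value to contain a 'model_vN' token.
    """
    root = ET.fromstring(xml_text)
    # The listing may or may not carry a default namespace.
    ns_match = re.match(r"\{(.*)\}", root.tag)
    ns = {"s3": ns_match.group(1)} if ns_match else {}
    marker = root.find("s3:NextMarker", ns) if ns else root.find("NextMarker")
    if marker is None or not marker.text:
        raise ValueError("NextMarker not found")
    version = re.search(r"model_v(\d+)", marker.text)
    if version is None:
        raise ValueError(f"Malformed marker: {marker.text}")
    return f"v{version.group(1)}"
```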
…t scoreset URNs and count from response

Including the scoreset URNs and count for all score sets of the parent experiment is computationally expensive and not necessary in all contexts (e.g. the main score set search page). This adds an optional boolean parameter to the score set search parameters indicating whether to include this information in the response; it defaults to true to maintain existing behavior.
- Add a limit option to score set search queries. Require a limit of at most 100 for searches of all score sets, while searches for the user's own score sets can have no limit.
- Extract the score set search query filter clause logic into a new function.
- Use limit + 1 in the search query, and if limit is exceeded, run a second query to count available rows. In either case, return the number of available rows with the limited search result.
- Instead of searching all score sets and then replacing un-superseded ones with their successors, revise the database query to search only un-superseded score sets.
- In the main search endpoint (but not in the "my score sets" endpoint), mandate that the search be only for published score sets.
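The limit + 1 pattern in the third bullet can be sketched as a small helper (hypothetical names, not the actual MaveDB query code): fetch one extra row to detect overflow, and only run the expensive count query when the limit was actually exceeded.

```python
def search_with_limit(run_query, count_rows, limit: int):
    """Fetch limit + 1 rows to detect overflow; only run the (expensive)
    count query when more rows exist than the limit allows."""
    rows = run_query(limit + 1)
    if len(rows) > limit:
        return rows[:limit], count_rows()
    return rows, len(rows)
```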
…pecified, but limit the number of publication IDs.
sallybg and others added 27 commits November 12, 2025 15:33
The Rxiv server recently updated its API to remove the `preprint_` string prepended to various fields. This change removes the prepended string from our parsing, allowing us to parse these fields out again.
In newer versions of UTA, the gene table has become overrun with mRNAs, mitochondrial genes, etc. that pollute the result list. Since the real purpose of this endpoint is to list the genes for which we can link transcripts, this change edits the query to select distinct HGNC names from the transcripts table.
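The effect of selecting distinct gene symbols from the transcript table can be illustrated with an in-memory table; the table and column names here are simplified assumptions, not UTA's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transcript (ac TEXT, hgnc TEXT)")
conn.executemany(
    "INSERT INTO transcript VALUES (?, ?)",
    [
        ("NM_000546.6", "TP53"),
        ("NM_001126112.3", "TP53"),  # duplicate gene symbol collapses
        ("NM_007294.4", "BRCA1"),
    ],
)
# Select distinct gene symbols from the transcript table, so only genes
# with at least one linkable transcript appear in the result list.
genes = [row[0] for row in conn.execute(
    "SELECT DISTINCT hgnc FROM transcript ORDER BY hgnc"
)]
```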
… and clarify error logging for variant updates
Populate and surface post-mapped HGVS expressions and VEP functional consequence, and surface gnomAD AF
@bencap bencap marked this pull request as ready for review November 14, 2025 03:51
@bencap bencap merged commit 37659db into main Nov 14, 2025
6 checks passed
@bencap bencap deleted the release-2025.5.0 branch November 14, 2025 19:41