DNA Meth: Initial attempt to implement a Gaussian process model for region-level DMR#4299
DNA Meth: Initial attempt to implement a Gaussian process model for region-level DMR#4299
Conversation
9076b22 to
2ceace3
Compare
2ceace3 to
269edb8
Compare
There was a problem hiding this comment.
Pull request overview
Adds region-level differential methylation analysis (GP-based DMR calling) that can be launched from DNA methylation volcano plot points, returning DMR intervals for display as a custom genome browser track.
Changes:
- Extends the
termdb/dmrAPI contract to return per-DMR stats (width, max Δβ, direction, probability) and accept optional annotation/nan-threshold parameters. - Replaces the prior
dmr.Rexecution with a Python GPDM pipeline (gpdm_analysis.py+gpdm/library) invoked viarun_python(). - Adds client-side click behavior for DNA methylation volcano points to call
termdb/dmrand render DMRs as abedjtrack in a new sandboxed genome browser Block.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| shared/types/src/routes/termdb.dmr.ts | Updates request/response typing for the new GPDM-backed DMR API. |
| server/routes/termdb.dmr.ts | Implements the Node route calling Python GPDM and returning DMRs. |
| python/src/gpdm_analysis.py | New Python entry point: reads HDF5 beta values, runs GPDM, returns JSON (and optional plot). |
| python/src/gpdm/core.py | New GPDM modeling library (naive + domain-partitioned GP, DMR calling, plotting). |
| python/src/gpdm/init.py | Exposes GPDM public API symbols. |
| client/plots/volcano/view/DataPointMouseEvents.ts | Routes DNA methylation volcano clicks to GPDM instead of boxplot. |
| client/plots/volcano/interactions/VolcanoInteractions.ts | Adds launchGpdm() to fetch DMRs and display a Block with a DMR bedj track. |
| client/plots/gb/view/View.ts | Passes hlregions through when launching Block. |
| release.txt | Adds a release note entry for the new feature. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| const time1 = Date.now() | ||
| const result = JSON.parse(await run_python('gpdm_analysis.py', JSON.stringify(gpdmInput))) | ||
| mayLog('DMR analysis time:', formatElapsedTime(Date.now() - time1)) |
There was a problem hiding this comment.
Will address in the future. Can generalize the grin2 manager and use it in both places
| def run_gpdm(params): | ||
| """ | ||
| Execute the full GPDM analysis pipeline for a single genomic region. | ||
|
|
||
| Pipeline steps: | ||
| 1. Read beta matrix from HDF5 for all requested samples | ||
| 2. Validate minimum sample and probe counts | ||
| 3. Drop probes with high NaN fraction (default > 50%) | ||
| 4. Impute remaining NaNs with per-group column means | ||
| 5. Initialize RegionalDMAnalysis and load the prepared data | ||
| 6. Add any caller-supplied annotations (from the ProteinPaint termdb) | ||
| 7. Run both naive and annotation-aware GP models | ||
| 8. Serialize results to a dict for JSON output | ||
|
|
There was a problem hiding this comment.
Will add tests to cover this at a later date
server/routes/termdb.dmr.ts
Outdated
| const plotPath = path.join(cachedir_gpdm, `dmr_${crypto.randomBytes(16).toString('hex')}.png`) | ||
|
|
||
| const gpdmInput = { | ||
| h5file: ds.queries.dnaMethylation.file, | ||
| chr: q.chr, | ||
| start: q.start, | ||
| stop: q.stop | ||
| stop: q.stop, | ||
| group1, | ||
| group2, | ||
| annotations: q.annotations || [], | ||
| nan_threshold: q.nan_threshold ?? 0.5, | ||
| plot_path: plotPath | ||
| } | ||
|
|
||
| const result: any = JSON.parse(await run_R('dmr.R', JSON.stringify(arg))) | ||
| const time1 = Date.now() | ||
| const result = JSON.parse(await run_python('gpdm_analysis.py', JSON.stringify(gpdmInput))) | ||
| mayLog('DMR analysis time:', formatElapsedTime(Date.now() - time1)) | ||
| if (result.error) throw new Error(result.error) | ||
| res.send(result as TermdbDmrSuccessResponse) | ||
|
|
||
| // PNG is written to cachedir_gpdm by Python and kept there for reference | ||
| res.send({ status: 'ok', dmrs: result.dmrs } as TermdbDmrSuccessResponse) |
There was a problem hiding this comment.
No cap is needed for now as we are still in testing. Will implement a cleanup later. I imagine we have existing infrastructure to handle this cleanup
There was a problem hiding this comment.
delete serverside rendering at next pr.
viz in dmr plot is done via block tracks
279a0f6 to
9aa617f
Compare
| this.addTooltipRow(table, 'log<sub>2</sub>(fold-change)', roundValueAuto(d.fold_change)) | ||
| this.addTooltipRow(table, 'Original p-value', roundValueAuto(d.original_p_value)) | ||
| this.addTooltipRow(table, 'Adjusted p-value', roundValueAuto(d.adjusted_p_value)) | ||
| if (this.termType === DNA_METHYLATION && d.gene_name) { |
There was a problem hiding this comment.
requiring gene_name will disable analysis for gene-less promoters
at next pr, need to query by promoter encode id and retrieve position, which will be used for dmr
same querying method could later be re-used in dna meth variable building in someway
There was a problem hiding this comment.
Agreed. Will address this in next pr
81010de to
df3ea38
Compare
client/plots/dmr/DmrTypes.ts
Outdated
| probability: number | ||
| } | ||
|
|
||
| export type DmrResult = { |
There was a problem hiding this comment.
This can be deleted. This should match the response type defined in #shared/types, TermdbDmrSuccessResponse.
| config: { | ||
| chartType: 'dmr', | ||
| headerText: promoterId ? `DMR: ${geneName} (${promoterId})` : `DMR: ${geneName}`, | ||
| genome: this.app.vocabApi.vocab.genome, |
There was a problem hiding this comment.
If the genome and dslabel are already defined in vocabApi, then it's not necessary to pass it into the config. Simply use this.app.vocabApi.vocab in the dmr plot code.
There was a problem hiding this comment.
Ah good point. Now just using this.app.vocabApi.vocab
…on specific DMR analysis with the initial route wired up
…e dmrs paint on block instead of returnpng
…de based on feedback. Other cleanups
0a2f3b5 to
5e70c10
Compare
Description
Motivation
The existing volcano plot shows one point per gene (aggregated promoter methylation). GPDM provides a drill-down that answers: where exactly in this gene is the differential methylation, and how confident are we? Classical tools (bumphunter, DMRcate) smooth p-values after the fact. GPDM models spatial correlation as a first-class part of the statistical model, producing a continuous posterior over the locus rather than a list of pre-called regions.
What was built
Backend (python/src/gpdm/)
core.py— GPDM library. DomainPartitionedGP fits separate annotation-aware GPs per regulatory domain with biologically-informed priors, stitched via distance-weighted blendinggpdm_analysis.py— ProteinPaint entry point via run_python(). Reads JSON from stdin, queries HDF5, runs NaN filtering + per-group mean imputation, calls GPDM, returns JSON with DMRs and metadataRoute (server/)
termdb.dmr.ts— route at termdb/dmr. Validates request, resolves HDF5 path from dataset config, invokesgpdm_analysis.pyClient (client/plots/)
dmr/DmrPlot.ts— Mass-native plot extending PlotBase. Resolves gene name to coordinates via genelookup, pads by configurable amount (default 2000bp both directions), fetches DMRs fromtermdb/dmr, builds a bedj track with bedItems for DMR regions, and renders a genome browser Block with RefGene + DMR tracksdmr/DmrTypes.ts— type definitions for config, result, and DOMdmr/settings/Settings.ts— DMRSettings type (blockWidth, pad)dmr/settings/defaults.ts— default settings factory with override supportvolcano/interactions/VolcanoInteractions.ts—launchDmr()dispatches aplot_createwith just the gene name; all coordinate resolution and padding handled by DmrPlotvolcano/view/DataPointMouseEvents.ts— click handler on DNA methylation volcano points dispatches tolaunchDmr()How the model works
Known limitations / future work
read_region_from_h5()duplicates some logic fromquery_beta_values.py— should be unifiedCloses
Closes stjude/sjpp#1232
Checklist
Check each task that has been performed or verified to be not applicable.