DNA Meth: Initial attempt to implement a Gaussian process model for region-level DMR#4299

Merged: xzhou82 merged 13 commits into master from dna.meth.pp.3 on Mar 16, 2026

@compbiolover (Contributor) commented Mar 11, 2026

Description

  • Adds region-level differential methylation analysis (GPDM) accessible from the DNA methylation volcano plot
  • Clicking a gene in the volcano plot launches a DMR plot that resolves gene coordinates, fetches beta values from HDF5, fits a Bayesian Gaussian Process model to both sample groups, and identifies DMRs
  • DMRs are displayed as a bedj track with colored bedItems on the genome browser Block (orange = hyper, blue = hypo)

Motivation

The existing volcano plot shows one point per gene (aggregated promoter methylation). GPDM provides a drill-down that answers: where exactly in this gene is the differential methylation, and how confident are we? Classical tools (bumphunter, DMRcate) smooth p-values after the fact. GPDM models spatial correlation as a first-class part of the statistical model, producing a continuous posterior over the locus rather than a list of pre-called regions.

What was built

Backend (python/src/gpdm/)

  • core.py — GPDM library. DomainPartitionedGP fits separate annotation-aware GPs per regulatory domain with biologically-informed priors, stitched via distance-weighted blending
  • gpdm_analysis.py — ProteinPaint entry point via run_python(). Reads JSON from stdin, queries HDF5, runs NaN filtering + per-group mean imputation, calls GPDM, returns JSON with DMRs and metadata
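
The NaN filtering and per-group mean imputation mentioned above could look roughly like the sketch below. It is not the code in gpdm_analysis.py: the function name, the list-of-rows layout, and the choice to compute the NaN fraction across all samples (rather than per group) are assumptions made for illustration. Only the default threshold of 0.5 comes from this PR.

```python
import math


def filter_and_impute(beta, group1_idx, group2_idx, nan_threshold=0.5):
    """Hypothetical helper, not the function in gpdm_analysis.py.

    beta: list of rows (one per sample); each row holds one beta value per
    probe (float, possibly NaN). group*_idx: row indices of each group.
    Returns (kept_probe_indices, imputed_matrix).
    """
    n_samples = len(beta)
    n_probes = len(beta[0]) if beta else 0

    # Step 1: drop probes whose NaN fraction exceeds the threshold.
    # Assumption: the fraction is computed over all samples together.
    kept = []
    for j in range(n_probes):
        n_nan = sum(1 for i in range(n_samples) if math.isnan(beta[i][j]))
        if n_nan / n_samples <= nan_threshold:
            kept.append(j)

    # Step 2: copy the kept columns, then fill each remaining NaN with the
    # mean of that probe among the observed samples of the same group.
    out = [[beta[i][j] for j in kept] for i in range(n_samples)]
    for group in (group1_idx, group2_idx):
        for c in range(len(kept)):
            observed = [out[i][c] for i in group if not math.isnan(out[i][c])]
            if not observed:
                continue  # whole group missing at this probe: left as NaN here
            mean = sum(observed) / len(observed)
            for i in group:
                if math.isnan(out[i][c]):
                    out[i][c] = mean
    return kept, out
```

Imputing within each group rather than across all samples avoids pulling the two group means toward each other, which would shrink the very difference the model is trying to detect.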

Route (server/)

  • termdb.dmr.ts — route at termdb/dmr. Validates request, resolves HDF5 path from dataset config, invokes gpdm_analysis.py
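
On the Python side, the stdin/stdout JSON contract that the route relies on can be sketched as follows. The request keys (h5file, chr, start, stop, group1, group2) and the error field mirror the route code quoted later in this thread. The functions run_analysis, handle, and main are hypothetical stand-ins, and the placeholder result does not reflect the real response shape beyond the dmrs key.

```python
import json
import sys

REQUIRED_KEYS = ("h5file", "chr", "start", "stop", "group1", "group2")


def run_analysis(params):
    """Hypothetical stand-in for the real pipeline: validates input only."""
    for key in REQUIRED_KEYS:
        if key not in params:
            raise ValueError(f"missing required parameter: {key}")
    if params["start"] >= params["stop"]:
        raise ValueError("start must be less than stop")
    return {"dmrs": []}  # placeholder; the real pipeline would fill this in


def handle(raw):
    """Parse the JSON request, run the analysis, and serialize the reply.

    Any failure is reported as {"error": "..."} so the caller can check
    result.error instead of parsing a traceback.
    """
    try:
        result = run_analysis(json.loads(raw))
    except Exception as exc:
        result = {"error": str(exc)}
    return json.dumps(result)


def main():
    # A script would call main() under `if __name__ == "__main__":`.
    sys.stdout.write(handle(sys.stdin.read()))
```

Keeping the parse/run/serialize logic in handle() rather than in main() makes it testable without spawning a process.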

Client (client/plots/)

  • dmr/DmrPlot.ts — Mass-native plot extending PlotBase. Resolves gene name to coordinates via genelookup, pads by configurable amount (default 2000bp both directions), fetches DMRs from termdb/dmr, builds a bedj track with bedItems for DMR regions, and renders a genome browser Block with RefGene + DMR tracks
  • dmr/DmrTypes.ts — type definitions for config, result, and DOM
  • dmr/settings/Settings.ts — DMRSettings type (blockWidth, pad)
  • dmr/settings/defaults.ts — default settings factory with override support
  • volcano/interactions/VolcanoInteractions.ts — launchDmr() dispatches a plot_create with just the gene name; all coordinate resolution and padding are handled by DmrPlot
  • volcano/view/DataPointMouseEvents.ts — click handler on DNA methylation volcano points dispatches to launchDmr()

How the model works

  • Beta values extracted from HDF5 for all samples in both groups
  • Per-CpG group means and standard errors computed (SE passed as heteroscedastic noise to GP)
  • Two independent annotation-aware GPs fitted per group; domain-specific prior means subtracted before fitting, with per-domain length-scale priors and 150bp overlap margins for smooth stitching
  • Posterior difference Δ(x) = pred_B − pred_A computed at 500 grid points; Var[Δ] = Var[A] + Var[B]
  • 95% credible interval derived; grid points where CI excludes zero are significant
  • Contiguous significant runs are merged into DMRs (≥50bp). Each DMR is characterized by max delta-beta, mean posterior probability, and direction (hyper/hypo, based on the sign of mean_delta_beta), and these values are returned to the client
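
The last three steps (posterior difference, 95% credible interval, run merging) can be sketched in plain Python. This is an illustration under stated assumptions, not the code in core.py: it takes the per-group posterior means and variances on the grid as given, treats each grid point's posterior as Gaussian, and defines "probability" as the posterior probability that Δ has the same sign as its mean. The function name call_dmrs and the start/stop keys are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def call_dmrs(grid, mean_a, var_a, mean_b, var_b, min_width=50):
    """grid: sorted positions (bp); mean_*/var_*: posterior mean and
    variance of each group's GP at every grid point."""
    delta, prob, signif = [], [], []
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        d = mb - ma                # Δ(x) = pred_B − pred_A
        sd = math.sqrt(va + vb)    # Var[Δ] = Var[A] + Var[B]
        delta.append(d)
        # Assumed definition: P(Δ has the same sign as its posterior mean).
        if sd > 0:
            prob.append(normal_cdf(abs(d) / sd))
        else:
            prob.append(1.0 if d != 0 else 0.5)
        signif.append(abs(d) > Z95 * sd)  # 95% CI excludes zero

    dmrs = []
    i, n = 0, len(grid)
    while i < n:
        if not signif[i]:
            i += 1
            continue
        j = i
        while j + 1 < n and signif[j + 1]:  # extend the contiguous run
            j += 1
        width = grid[j] - grid[i]
        if width >= min_width:
            run = delta[i:j + 1]
            mean_delta = sum(run) / len(run)
            dmrs.append({
                "start": grid[i],
                "stop": grid[j],
                "width": width,
                "max_delta_beta": max(run, key=abs),  # signed extreme
                "mean_delta_beta": mean_delta,
                "probability": sum(prob[i:j + 1]) / len(run),
                "direction": "hyper" if mean_delta > 0 else "hypo",
            })
        i = j + 1
    return dmrs
```

Summing the two variances is only valid because the two group GPs are fitted independently; a joint model would need the cross-covariance term as well.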

Known limitations / future work

  • read_region_from_h5() duplicates some logic from query_beta_values.py — should be unified
  • NaN imputation is per-group column mean; kNN or multiple imputation could be explored
  • No covariate control yet (e.g. EPIC vs 450k array type)
  • Annotations currently passed from client; could be auto-populated from termdb annotation tracks
  • Promoters with no gene TSS will not work — need to extend by allowing a selection instead of only gene point clicks

Closes

Closes stjude/sjpp#1232

Checklist

Check each task that has been performed or verified to be not applicable.

  • Tests: Added and/or passed unit and integration tests, or N/A
  • Todos: Commented or documented, or N/A
  • Notable Changes: updated release.txt, prefixed a commit message with "fix:" or "feat:", added to an internal tracking document, or N/A
  • Rust: Checked to see whether Rust needs to be re-compiled because of this PR, or N/A

@compbiolover added the DA (Differential Analysis tool) label Mar 11, 2026
@compbiolover force-pushed the dna.meth.pp.3 branch 3 times, most recently from 9076b22 to 2ceace3 on March 13, 2026 02:35
@compbiolover changed the title from "WIP: Initial attempt to implement a Gaussian process model for probe-level DMR" to "Initial attempt to implement a Gaussian process model for region-level DMR" Mar 13, 2026
@compbiolover changed the title from "Initial attempt to implement a Gaussian process model for region-level DMR" to "DNA Meth: Initial attempt to implement a Gaussian process model for region-level DMR" Mar 13, 2026
@compbiolover added the DNA METH (For the DNA Meth project) label Mar 13, 2026
@compbiolover marked this pull request as ready for review March 13, 2026 03:35
Copilot AI review requested due to automatic review settings March 13, 2026 03:35
Contributor

Copilot AI left a comment

Pull request overview

Adds region-level differential methylation analysis (GP-based DMR calling) that can be launched from DNA methylation volcano plot points, returning DMR intervals for display as a custom genome browser track.

Changes:

  • Extends the termdb/dmr API contract to return per-DMR stats (width, max Δβ, direction, probability) and accept optional annotation/nan-threshold parameters.
  • Replaces the prior dmr.R execution with a Python GPDM pipeline (gpdm_analysis.py + gpdm/ library) invoked via run_python().
  • Adds client-side click behavior for DNA methylation volcano points to call termdb/dmr and render DMRs as a bedj track in a new sandboxed genome browser Block.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| shared/types/src/routes/termdb.dmr.ts | Updates request/response typing for the new GPDM-backed DMR API. |
| server/routes/termdb.dmr.ts | Implements the Node route calling Python GPDM and returning DMRs. |
| python/src/gpdm_analysis.py | New Python entry point: reads HDF5 beta values, runs GPDM, returns JSON (and optional plot). |
| python/src/gpdm/core.py | New GPDM modeling library (naive + domain-partitioned GP, DMR calling, plotting). |
| python/src/gpdm/__init__.py | Exposes GPDM public API symbols. |
| client/plots/volcano/view/DataPointMouseEvents.ts | Routes DNA methylation volcano clicks to GPDM instead of boxplot. |
| client/plots/volcano/interactions/VolcanoInteractions.ts | Adds launchGpdm() to fetch DMRs and display a Block with a DMR bedj track. |
| client/plots/gb/view/View.ts | Passes hlregions through when launching Block. |
| release.txt | Adds a release note entry for the new feature. |

Comment on lines +65 to +67
const time1 = Date.now()
const result = JSON.parse(await run_python('gpdm_analysis.py', JSON.stringify(gpdmInput)))
mayLog('DMR analysis time:', formatElapsedTime(Date.now() - time1))
Contributor Author

Will address in the future. Can generalize the grin2 manager and use it in both places

Comment on lines +196 to +209
def run_gpdm(params):
    """
    Execute the full GPDM analysis pipeline for a single genomic region.

    Pipeline steps:
    1. Read beta matrix from HDF5 for all requested samples
    2. Validate minimum sample and probe counts
    3. Drop probes with high NaN fraction (default > 50%)
    4. Impute remaining NaNs with per-group column means
    5. Initialize RegionalDMAnalysis and load the prepared data
    6. Add any caller-supplied annotations (from the ProteinPaint termdb)
    7. Run both naive and annotation-aware GP models
    8. Serialize results to a dict for JSON output

Contributor Author

Will add tests to cover this at a later date

Comment on lines +51 to +71
  const plotPath = path.join(cachedir_gpdm, `dmr_${crypto.randomBytes(16).toString('hex')}.png`)

  const gpdmInput = {
      h5file: ds.queries.dnaMethylation.file,
      chr: q.chr,
      start: q.start,
-     stop: q.stop
      stop: q.stop,
      group1,
      group2,
      annotations: q.annotations || [],
      nan_threshold: q.nan_threshold ?? 0.5,
      plot_path: plotPath
  }

- const result: any = JSON.parse(await run_R('dmr.R', JSON.stringify(arg)))
  const time1 = Date.now()
  const result = JSON.parse(await run_python('gpdm_analysis.py', JSON.stringify(gpdmInput)))
  mayLog('DMR analysis time:', formatElapsedTime(Date.now() - time1))
  if (result.error) throw new Error(result.error)
- res.send(result as TermdbDmrSuccessResponse)

  // PNG is written to cachedir_gpdm by Python and kept there for reference
  res.send({ status: 'ok', dmrs: result.dmrs } as TermdbDmrSuccessResponse)
Contributor Author

No cap is needed for now as we are still in testing. Will implement a cleanup later. I imagine we have existing infrastructure to handle this cleanup

Collaborator

Delete the server-side rendering in the next PR.
Visualization in the DMR plot is done via Block tracks.

@compbiolover requested a review from creilly8 March 13, 2026 16:37
this.addTooltipRow(table, 'log<sub>2</sub>(fold-change)', roundValueAuto(d.fold_change))
this.addTooltipRow(table, 'Original p-value', roundValueAuto(d.original_p_value))
this.addTooltipRow(table, 'Adjusted p-value', roundValueAuto(d.adjusted_p_value))
if (this.termType === DNA_METHYLATION && d.gene_name) {
Collaborator

Requiring gene_name will disable analysis for gene-less promoters.
In the next PR, we need to query by promoter ENCODE ID and retrieve the position, which will then be used for DMR.
The same querying method could later be reused in some way when building DNA methylation variables.

Contributor Author

Agreed. Will address this in next pr

probability: number
}

export type DmrResult = {
Collaborator

This can be deleted. This should match the response type defined in #shared/types, TermdbDmrSuccessResponse.

Contributor Author

Deleted

config: {
chartType: 'dmr',
headerText: promoterId ? `DMR: ${geneName} (${promoterId})` : `DMR: ${geneName}`,
genome: this.app.vocabApi.vocab.genome,
Collaborator

@creilly8 creilly8 Mar 13, 2026

If the genome and dslabel are already defined in vocabApi, then it's not necessary to pass them into the config. Simply use this.app.vocabApi.vocab in the DMR plot code.

Contributor Author

Ah good point. Now just using this.app.vocabApi.vocab

@xzhou82 merged commit 22f20e1 into master Mar 16, 2026
3 checks passed
@xzhou82 deleted the dna.meth.pp.3 branch March 16, 2026 17:57

Labels

DA Differential Analysis tool DNA METH For the DNA Meth project

4 participants