Skip to content

Comments

Fix NULL row names in NMF matrix for sig_tally() Wang method#474

Draft
Copilot wants to merge 3 commits intomasterfrom
copilot/fix-null-row-names-nmf-matrix
Draft

Fix NULL row names in NMF matrix for sig_tally() Wang method#474
Copilot wants to merge 3 commits intomasterfrom
copilot/fix-null-row-names-nmf-matrix

Conversation

Copy link
Contributor

Copilot AI commented Dec 25, 2025

When using sig_tally() with the Wang method, the NMF matrix row names are NULL, particularly for single-sample cases. The Steele method correctly preserves sample IDs as row names.

Root Cause

The matrix generation transposes a components-by-samples matrix to samples-by-components. R's transpose doesn't reliably preserve dimension names in edge cases (e.g., single-column matrices).

Changes

  • R/sig_tally.R: Explicitly capture sample names before transpose and assign as row names after
  • tests/testthat/test-sig_tally_rownames.R: Add tests for single-sample and multi-sample cases
  • NEWS.md: Document bug fix

Example

# Single sample case (previously returned NULL row names)
cn <- read_copynumber(singleSampleSegTabs, ...)
tally_wang <- sig_tally(cn, method = "Wang")
rownames(tally_wang$nmf_matrix)  # Now returns: "TCGA-A8-A07S-01A-11D-A036-01"

The fix ensures consistent behavior across all sample counts and aligns with the Steele method's row name handling.

Original prompt

This section details on the original issue you should resolve

<issue_title>NULL Row Names in NMF Matrix When Using sig_tally() with Wang Method</issue_title>
<issue_description>Hi @ShixiangWang,

Hope you're well.

Just noticed that when using the sig_tally() function with the Wang method, the row names in the resulting NMF matrix (tally_wang$nmf_matrix) are unexpectedly NULL. This behavior is inconsistent with the expected outcome, where the row names would ideally be the sample IDs, as observed when using the S (Steele) method. Reprex below:

# Load the sigminer library
library(sigminer)
#> Registered S3 method overwritten by 'sigminer':
#>   method      from
#>   print.bytes Rcpp
#> sigminer version 2.3.2
#> - Star me at https://github.com/ShixiangWang/sigminer
#> - Run hello() to see usage and citation.
# Load a toy segmentation table included with the sigminer package
load(system.file("extdata", "toy_segTab.RData",
                 package = "sigminer", mustWork = TRUE
))

# Set a seed for reproducibility
set.seed(1234)

# Add a new column 'minor_cn' with random values of either 0 or 1
segTabs$minor_cn <- sample(c(0, 1), size = nrow(segTabs), replace = TRUE)

# Subset the segmentation table for a single sample
singleSampleSegTabs <- subset(segTabs, sample == "TCGA-A8-A07S-01A-11D-A036-01")

# Read the copy number data for the single sample
cn <- read_copynumber(singleSampleSegTabs,
                      seg_cols = c("chromosome", "start", "end", "segVal"),
                      genome_measure = "wg", complement = TRUE, add_loh = TRUE
)
#> ℹ [2024-09-05 14:58:13.990813]: Started.
#> ℹ [2024-09-05 14:58:13.995483]: Genome build  : hg19.
#> ℹ [2024-09-05 14:58:13.996023]: Genome measure: wg.
#> ℹ [2024-09-05 14:58:13.996513]: When add_loh is TRUE, use_all is forced to TRUE.
#> Please drop columns you don't want to keep before reading.
#> ✔ [2024-09-05 14:58:14.002538]: Chromosome size database for build obtained.
#> ℹ [2024-09-05 14:58:14.003157]: Reading input.
#> ✔ [2024-09-05 14:58:14.003672]: A data frame as input detected.
#> ✔ [2024-09-05 14:58:14.004327]: Column names checked.
#> ✔ [2024-09-05 14:58:14.005043]: Column order set.
#> ✔ [2024-09-05 14:58:14.006148]: Chromosomes unified.
#> ✔ [2024-09-05 14:58:14.008141]: Value 2 (normal copy) filled to uncalled chromosomes.
#> ✔ [2024-09-05 14:58:14.009723]: Data imported.
#> ℹ [2024-09-05 14:58:14.010245]: Segments info:
#> ℹ [2024-09-05 14:58:14.010774]:     Keep - 45
#> ℹ [2024-09-05 14:58:14.011272]:     Drop - 0
#> ✔ [2024-09-05 14:58:14.011938]: Segments sorted.
#> ℹ [2024-09-05 14:58:14.012445]: Adding LOH labels...
#> ℹ [2024-09-05 14:58:14.013251]: Joining adjacent segments with same copy number value. Be patient...
#> ✔ [2024-09-05 14:58:14.018461]: 39 segments left after joining.
#> ✔ [2024-09-05 14:58:14.019082]: Segmental table cleaned.
#> ℹ [2024-09-05 14:58:14.019575]: Annotating.
#> ✔ [2024-09-05 14:58:14.024584]: Annotation done.
#> ℹ [2024-09-05 14:58:14.02513]: Summarizing per sample.
#> ✔ [2024-09-05 14:58:14.033565]: Summarized.
#> ℹ [2024-09-05 14:58:14.034175]: Generating CopyNumber object.
#> ✔ [2024-09-05 14:58:14.034964]: Generated.
#> ℹ [2024-09-05 14:58:14.03548]: Validating object.
#> ✔ [2024-09-05 14:58:14.036014]: Done.
#> ℹ [2024-09-05 14:58:14.036588]: 0.046 secs elapsed.
# Tally signatures using the Wang method
tally_wang <- sigminer::sig_tally(cn, method = "Wang", keep_only_matrix = FALSE)
#> ℹ [2024-09-05 14:58:14.03747]: Started.
#> ℹ [2024-09-05 14:58:14.038619]: Step: getting copy number features.
#> ℹ [2024-09-05 14:58:14.076979]: Getting breakpoint count per 10 Mb...
#> ℹ [2024-09-05 14:58:14.104202]: Getting breakpoint count per chromosome arm...
#> ℹ [2024-09-05 14:58:14.111208]: Getting copy number...
#> ℹ [2024-09-05 14:58:14.112797]: Getting change-point copy number change...
#> ℹ [2024-09-05 14:58:14.118523]: Getting length of chains of oscillating copy number...
#> ℹ [2024-09-05 14:58:14.124469]: Getting (log10 based) segment size...
#> ℹ [2024-09-05 14:58:14.125993]: Getting the minimal number of chromosome with 50% CNV...
#> ℹ [2024-09-05 14:58:14.133015]: Getting burden of chromosome...
#> ✔ [2024-09-05 14:58:14.140732]: Gotten.
#> ℹ [2024-09-05 14:58:14.14139]: Step: generating copy number components.
#> ✔ [2024-09-05 14:58:14.141898]: `feature_setting` checked.
#> ℹ [2024-09-05 14:58:14.143294]: Step: counting components.
#> ✔ [2024-09-05 14:58:14.250663]: Counted.
#> ℹ [2024-09-05 14:58:14.251405]: Step: generating components by sample matrix.
#> ✔ [2024-09-05 14:58:14.252393]: Matrix generated.
#> ℹ [2024-09-05 14:58:14.252925]: 0.215 secs elapsed.
# Tally signatures using the Steele method
tally_steele <- sigminer::sig_tally(cn, method = "S", keep_only_matrix = FALSE)
#> ℹ [2024-09-05 14:58:14.253776]: Started.
#> ℹ [2024-09-05 14:58:14.25461]...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

- Fixes ShixiangWang/sigminer#465

<!-- START COPILOT CODING AGENT TIPS -->
---Let Copilot coding agent [set things up for you](https://github.com/ShixiangWang/sigminer/issues/new?title=+Set+up+Copilot+instructions&body=Configure%20instructions%20for%20this%20repository%20as%20documented%20in%20%5BBest%20practices%20for%20Copilot%20coding%20agent%20in%20your%20repository%5D%28https://gh.io/copilot-coding-agent-tips%29%2E%0A%0A%3COnboard%20this%20repo%3E&assignees=copilot) — coding agent works faster and does higher quality work when set up for your repo.

Co-authored-by: ShixiangWang <25057508+ShixiangWang@users.noreply.github.com>
Co-authored-by: ShixiangWang <25057508+ShixiangWang@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix NULL row names in NMF matrix with sig_tally() Wang method Fix NULL row names in NMF matrix for sig_tally() Wang method Dec 25, 2025
Copilot AI requested a review from ShixiangWang December 25, 2025 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants