Skip to content

Optimize/x ref phase2#133

Closed
gaow wants to merge 2 commits intomainfrom
optimize/x-ref-phase2
Closed

Optimize/x ref phase2#133
gaow wants to merge 2 commits intomainfrom
optimize/x-ref-phase2

Conversation

@gaow
Copy link
Copy Markdown
Contributor

@gaow gaow commented Mar 29, 2026

No description provided.

gaow and others added 2 commits March 29, 2026 07:00
… O(P^2) computation

In summary statistics mode, XtX %*% beta was computed 3 separate times per
iteration per outcome: in residual update, profile loglikelihood, and
correlation update (get_correlation). This commit caches the product once
after the beta update and reuses it, reducing the dominant O(P^2) cost by 3x.

Also precomputes per-outcome constants (scaling_factor, beta_scaling) during
model initialization to avoid repeated conditional evaluation per iteration.

Benchmark results (micro-benchmark on XtX %*% beta):
  P=1000, L=2,  M=100:  0.43s -> 0.14s (3x speedup)
  P=2000, L=5,  M=200:  9.75s -> 3.25s (3x speedup)
  P=5000, L=10, M=500:  324s  -> 108s  (3x speedup, saves 216s)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…f LD

When the number of variants P is large, the P×P LD matrix may be too large
to fit in memory. Users can now pass X_ref (N_ref × P reference panel) directly
instead of precomputing LD = cor(X_ref). When N_ref < P, ColocBoost computes
LD products on the fly via t(X_ref) %*% (X_ref %*% v) / (N_ref - 1), avoiding
the P×P memory cost. When N_ref >= P, LD is precomputed internally.

Key design:
- get_genotype_matrix(): returns $X or $X_ref from a data entry (mutually
  exclusive per entry: individual-level has $X, summary stats has $XtX or $X_ref)
- compute_xtx_product(v, XtX, X_ref): unified XtX %*% v from either source
- LD lookup functions (get_LD_jk, get_LD_jk1_jk2, get_LD_jk_each) reuse
  existing X path via get_genotype_matrix — no X_ref-specific code needed
- dict_sumstatLD works for both LD and X_ref mapping

Also:
- Replace all Rfast:: calls with proper @importFrom (correls, standardise,
  upper_tri, med) and remove redundant local aliases
- Add vignette section 3.4 with X_ref usage example
- 23 new tests covering numerical equivalence, edge cases, error handling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 29, 2026

Codecov Report

❌ Patch coverage is 86.47059% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.72%. Comparing base (2b5e1e1) to head (3595e60).
⚠️ Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
R/colocboost.R 78.43% 11 Missing ⚠️
R/colocboost_init.R 85.10% 7 Missing ⚠️
R/colocboost_check_update_jk.R 55.55% 4 Missing ⚠️
R/colocboost_inference.R 95.83% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #133      +/-   ##
==========================================
+ Coverage   84.05%   84.72%   +0.66%     
==========================================
  Files          14       14              
  Lines        4828     4889      +61     
==========================================
+ Hits         4058     4142      +84     
+ Misses        770      747      -23     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@gaow gaow closed this Apr 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant