Commit 4797dec

feat(eqtl): comprehensive eQTL analysis methods, config, and visualization
New modules:
- gwas/finemapping/eqtl.py: cis/trans eQTL scan, conditional analysis, effect sizes
- gwas/visualization/eqtl_visualization.py: volcano, boxplot, LocusZoom plots

New config:
- config/eqtl/eqtl_amellifera.yaml: A. mellifera eQTL config linking RNA + GWAS

Tests: 17 passing tests for eQTL integration
1 parent 096b3f8 commit 4797dec

20 files changed, +1262 -0 lines changed

config/eqtl/README.md

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
# eQTL Configuration

Configuration files for expression Quantitative Trait Loci (eQTL) analysis.

## Files

| File | Description |
|------|-------------|
| `eqtl_amellifera.yaml` | *Apis mellifera* eQTL analysis config |

## Quick Start

```bash
# Run eQTL analysis with config
uv run python scripts/eqtl/run_eqtl_analysis.py --config config/eqtl/eqtl_amellifera.yaml
```

## Configuration Sections

- **expression**: RNA-seq data paths, normalization settings
- **variants**: VCF file, MAF filters
- **annotations**: Gene positions for cis-window
- **cis_eqtl**: Window size, FDR thresholds
- **trans_eqtl**: Trans analysis settings (optional)
- **colocalization**: GWAS integration
- **output**: Results and plots directories

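As a rough illustration, a script could sanity-check a parsed config against the section names listed above. This is a minimal sketch, not the pipeline's own validation: `check_sections`, `REQUIRED_SECTIONS`, and `OPTIONAL_SECTIONS` are hypothetical names, and treating every section except `trans_eqtl` as required is an assumption based only on the "(optional)" note.

```python
from typing import Any, Mapping

# Section names come from the README list. Which ones the pipeline strictly
# requires is an assumption; only trans_eqtl is marked optional there.
REQUIRED_SECTIONS = (
    "expression",
    "variants",
    "annotations",
    "cis_eqtl",
    "colocalization",
    "output",
)
OPTIONAL_SECTIONS = ("trans_eqtl",)


def check_sections(config: Mapping[str, Any]) -> list[str]:
    """Return the required section names that are absent from the config."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


# Stand-in for a parsed YAML file (for example, the result of yaml.safe_load).
example_config = {
    "expression": {"level": "transcript"},
    "variants": {"min_maf": 0.05},
    "annotations": {"feature_type": "gene"},
    "cis_eqtl": {"window": 1_000_000},
}

missing = check_sections(example_config)
print("missing sections:", missing)
```

Returning the list of missing names, rather than raising on the first one, lets a caller report every problem in a single pass.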
## Data Sources

Requires:

1. **Expression data**: `output/amalgkit/apis_mellifera_all/work/quant/`
2. **Variants**: `output/gwas/amellifera/variants/`
3. **Annotations**: `output/gwas/amellifera/genome/genomic.gff`

## See Also

- [config/gwas/](../gwas/) - GWAS configuration
- [config/amalgkit/](../amalgkit/) - RNA-seq pipeline config
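
The three inputs listed under Data Sources can be checked before starting a run. The sketch below is one possible pre-flight check under stated assumptions: `DATA_SOURCES` and `missing_sources` are hypothetical names, and the paths are taken to be relative to the repository root.

```python
from pathlib import Path

# Paths copied from the Data Sources list; assumed relative to the repo root.
DATA_SOURCES = {
    "expression": "output/amalgkit/apis_mellifera_all/work/quant/",
    "variants": "output/gwas/amellifera/variants/",
    "annotations": "output/gwas/amellifera/genome/genomic.gff",
}


def missing_sources(root: Path) -> dict[str, Path]:
    """Map each data-source label to its path when that path does not exist."""
    missing = {}
    for label, relative in DATA_SOURCES.items():
        path = root / relative
        if not path.exists():
            missing[label] = path
    return missing


if __name__ == "__main__":
    for label, path in missing_sources(Path(".")).items():
        print(f"missing {label}: {path}")
```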

config/eqtl/eqtl_amellifera.yaml

Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
# METAINFORMANT eQTL Configuration
# Species: Apis mellifera (Western Honey Bee)
# Purpose: Expression QTL analysis linking GWAS variants to RNA expression

# =============================================================================
# EXPRESSION DATA
# =============================================================================
# RNA-seq expression data from kallisto quantification
expression:
  # Directory containing sample abundance files
  quant_dir: output/amalgkit/apis_mellifera_all/work/quant/

  # Expression matrix file (if pre-merged)
  # expression_matrix: output/amalgkit/apis_mellifera_all/work/merge/expression_matrix.tsv

  # Gene-level or transcript-level
  level: transcript

  # TPM threshold for expressed genes
  min_tpm: 1.0

  # Minimum samples with expression
  min_samples_expressed: 10

  # Normalization method: tpm, tmm, or quantile
  normalization: tmm

# =============================================================================
# VARIANT DATA
# =============================================================================
# GWAS variant data for eQTL association
variants:
  # VCF file with genotype dosages
  vcf_file: output/gwas/amellifera/variants/amellifera_population.vcf

  # Minimum minor allele frequency
  min_maf: 0.05

  # Maximum missing rate
  max_missing: 0.1

  # Variant positions file (if separate from VCF)
  # positions_file: output/gwas/amellifera/variants/variant_positions.tsv

# =============================================================================
# GENE ANNOTATIONS
# =============================================================================
# Gene/transcript position information for cis-window definition
annotations:
  # GFF3 or GTF file
  gff_file: output/gwas/amellifera/genome/genomic.gff

  # Gene positions TSV (gene_id, chrom, tss_position)
  # gene_positions: output/gwas/amellifera/annotations/gene_positions.tsv

  # Feature type to extract (gene, mRNA, exon)
  feature_type: gene

# =============================================================================
# CIS-eQTL ANALYSIS
# =============================================================================
cis_eqtl:
  # Enable cis-eQTL analysis
  enabled: true

  # Window size around TSS (bp)
  window: 1000000 # 1Mb

  # Minimum samples for testing
  min_samples: 30

  # FDR threshold for significance
  fdr_threshold: 0.05

  # Permutation-based significance (slower but more accurate)
  permutations: 0 # 0 = no permutation, 1000 recommended for final analysis

# =============================================================================
# TRANS-eQTL ANALYSIS
# =============================================================================
trans_eqtl:
  # Enable trans-eQTL analysis
  enabled: false # Computationally expensive

  # P-value threshold for reporting (pre-FDR)
  pvalue_threshold: 1.0e-6

  # FDR threshold for significance
  fdr_threshold: 0.01

# =============================================================================
# COLOCALIZATION
# =============================================================================
# Colocalization with GWAS phenotype associations
colocalization:
  enabled: true

  # GWAS summary statistics file
  gwas_sumstats: output/gwas/amellifera/results/varroa_resistance.tsv

  # Colocalization method: coloc or clpp
  method: coloc

  # Posterior probability threshold for shared causal variant (H4)
  pp_h4_threshold: 0.75

# =============================================================================
# SAMPLE MATCHING
# =============================================================================
# Sample ID mapping between expression and genotype data
samples:
  # Sample ID mapping file (expression_id, genotype_id)
  # mapping_file: output/eqtl/amellifera/sample_mapping.tsv

  # Covariates to include (population PCs, batch, etc.)
  covariates:
    - PC1
    - PC2
    - PC3

  # Covariate file path
  # covariates_file: output/gwas/amellifera/structure/pca_covariates.tsv

# =============================================================================
# OUTPUT
# =============================================================================
output:
  # Output directory
  results_dir: output/eqtl/amellifera/results

  # Plots directory
  plots_dir: output/eqtl/amellifera/plots

  # Output format: tsv, json, parquet
  format: tsv

  # Generate summary report
  summary_report: true

# =============================================================================
# COMPUTATIONAL
# =============================================================================
compute:
  threads: 8
  memory_gb: 16

  # Batch processing for large datasets
  batch_size: 1000 # genes per batch

  # Random seed for reproducibility
  seed: 42
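
The `cis_eqtl` settings in this config imply two steps: selecting variants near each gene's TSS, and correcting association p-values for multiple testing. The sketch below illustrates both under stated assumptions. `window` is treated as a symmetric distance of 1,000,000 bp on either side of the TSS, and the FDR procedure is taken to be Benjamini-Hochberg, which the config does not name. `Variant`, `cis_variants`, and `benjamini_hochberg` are hypothetical names, not functions from `gwas/finemapping/eqtl.py`, and the chromosome labels and p-values are made up for illustration.

```python
from typing import NamedTuple, Sequence


class Variant(NamedTuple):
    variant_id: str
    chrom: str
    pos: int


def cis_variants(
    variants: Sequence[Variant], chrom: str, tss: int, window: int = 1_000_000
) -> list[Variant]:
    """Variants on the gene's chromosome within `window` bp of its TSS."""
    return [v for v in variants if v.chrom == chrom and abs(v.pos - tss) <= window]


def benjamini_hochberg(pvalues: Sequence[float]) -> list[float]:
    """Return BH-adjusted p-values (q-values) in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum of
    # its own p * m / rank and the q-values at all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative inputs only.
variants = [
    Variant("v1", "chr1", 500_000),
    Variant("v2", "chr1", 2_600_000),
    Variant("v3", "chr2", 500_000),
]
nearby = cis_variants(variants, chrom="chr1", tss=1_500_000)

pvalues = [0.001, 0.2, 0.03, 0.5]
qvalues = benjamini_hochberg(pvalues)
significant = [q <= 0.05 for q in qvalues]  # 0.05 mirrors cis_eqtl.fdr_threshold
```

With `permutations` set above 0, a pipeline would typically replace the nominal p-values with permutation-derived ones before this correction step; that part is not sketched here.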
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
{
    "n_targets": 28355,
    "n_bootstraps": 0,
    "n_processed": 36298267,
    "n_pseudoaligned": 29365958,
    "n_unique": 12893561,
    "p_pseudoaligned": 80.9,
    "p_unique": 35.5,
    "kallisto_version": "0.51.1",
    "index_version": 13,
    "k-mer length": 31,
    "start_time": "Thu Feb 5 09:20:13 2026",
    "call": "kallisto quant -i output/amalgkit/apis_mellifera_all/work/index/Apis_mellifera_transcripts.idx -o output/amalgkit/apis_mellifera_all/work/quant/SRR10030225 -t 2 --single -l 200 -s 30 /Volumes/blue/data/apis_mellifera/SRR10030225/SRR10030225.fastq.gz"
}
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
{
    "n_targets": 28355,
    "n_bootstraps": 0,
    "n_processed": 33933530,
    "n_pseudoaligned": 24992338,
    "n_unique": 9178145,
    "p_pseudoaligned": 73.7,
    "p_unique": 27.0,
    "kallisto_version": "0.51.1",
    "index_version": 13,
    "k-mer length": 31,
    "start_time": "Thu Feb 5 08:51:50 2026",
    "call": "kallisto quant -i output/amalgkit/apis_mellifera_all/work/index/Apis_mellifera_transcripts.idx -o output/amalgkit/apis_mellifera_all/work/quant/SRR10030260 -t 2 --single -l 200 -s 30 /Volumes/blue/data/apis_mellifera/SRR10030260/SRR10030260.fastq.gz"
}
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
{
    "n_targets": 28355,
    "n_bootstraps": 0,
    "n_processed": 7586964,
    "n_pseudoaligned": 5956799,
    "n_unique": 3611675,
    "p_pseudoaligned": 78.5,
    "p_unique": 47.6,
    "kallisto_version": "0.51.1",
    "index_version": 13,
    "k-mer length": 31,
    "start_time": "Thu Feb 5 09:17:52 2026",
    "call": "kallisto quant -i output/amalgkit/apis_mellifera_all/work/index/Apis_mellifera_transcripts.idx -o output/amalgkit/apis_mellifera_all/work/quant/SRR1254943 -t 2 --single -l 200 -s 30 /Volumes/blue/data/apis_mellifera/SRR1254943/SRR1254943.fastq.gz"
}
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
{
    "n_targets": 28355,
    "n_bootstraps": 0,
    "n_processed": 5381117,
    "n_pseudoaligned": 5044398,
    "n_unique": 2990289,
    "p_pseudoaligned": 93.7,
    "p_unique": 55.6,
    "kallisto_version": "0.51.1",
    "index_version": 13,
    "k-mer length": 31,
    "start_time": "Thu Feb 5 09:22:34 2026",
    "call": "kallisto quant -i output/amalgkit/apis_mellifera_all/work/index/Apis_mellifera_transcripts.idx -o output/amalgkit/apis_mellifera_all/work/quant/SRR14494818 -t 2 /Volumes/blue/data/apis_mellifera/SRR14494818/SRR14494818_1.fastq.gz /Volumes/blue/data/apis_mellifera/SRR14494818/SRR14494818_2.fastq.gz"
}
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
{
    "n_targets": 28355,
    "n_bootstraps": 0,
    "n_processed": 9286835,
    "n_pseudoaligned": 5979778,
    "n_unique": 1030023,
    "p_pseudoaligned": 64.4,
    "p_unique": 11.1,
    "kallisto_version": "0.51.1",
    "index_version": 13,
    "k-mer length": 31,
    "start_time": "Thu Feb 5 09:21:28 2026",
    "call": "kallisto quant -i output/amalgkit/apis_mellifera_all/work/index/Apis_mellifera_transcripts.idx -o output/amalgkit/apis_mellifera_all/work/quant/SRR26149819 -t 2 --single -l 200 -s 30 /Volumes/blue/data/apis_mellifera/SRR26149819/SRR26149819.fastq.gz"
}
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
{
    "n_targets": 28355,
    "n_bootstraps": 0,
    "n_processed": 8846554,
    "n_pseudoaligned": 4614179,
    "n_unique": 697600,
    "p_pseudoaligned": 52.2,
    "p_unique": 7.9,
    "kallisto_version": "0.51.1",
    "index_version": 13,
    "k-mer length": 31,
    "start_time": "Thu Feb 5 09:16:56 2026",
    "call": "kallisto quant -i output/amalgkit/apis_mellifera_all/work/index/Apis_mellifera_transcripts.idx -o output/amalgkit/apis_mellifera_all/work/quant/SRR26149866 -t 2 --single -l 200 -s 30 /Volumes/blue/data/apis_mellifera/SRR26149866/SRR26149866.fastq.gz"
}
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
{
    "n_targets": 28355,
    "n_bootstraps": 0,
    "n_processed": 9181469,
    "n_pseudoaligned": 4045193,
    "n_unique": 624440,
    "p_pseudoaligned": 44.1,
    "p_unique": 6.8,
    "kallisto_version": "0.51.1",
    "index_version": 13,
    "k-mer length": 31,
    "start_time": "Thu Feb 5 09:22:30 2026",
    "call": "kallisto quant -i output/amalgkit/apis_mellifera_all/work/index/Apis_mellifera_transcripts.idx -o output/amalgkit/apis_mellifera_all/work/quant/SRR26150166 -t 2 --single -l 200 -s 30 /Volumes/blue/data/apis_mellifera/SRR26150166/SRR26150166.fastq.gz"
}
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
{
    "n_targets": 28355,
    "n_bootstraps": 0,
    "n_processed": 16406968,
    "n_pseudoaligned": 12455073,
    "n_unique": 8542421,
    "p_pseudoaligned": 75.9,
    "p_unique": 52.1,
    "kallisto_version": "0.51.1",
    "index_version": 13,
    "k-mer length": 31,
    "start_time": "Thu Feb 5 09:21:46 2026",
    "call": "kallisto quant -i output/amalgkit/apis_mellifera_all/work/index/Apis_mellifera_transcripts.idx -o output/amalgkit/apis_mellifera_all/work/quant/SRR9705235 -t 2 --single -l 200 -s 30 /Volumes/blue/data/apis_mellifera/SRR9705235/SRR9705235.fastq.gz"
}
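
The kallisto run records in this commit carry enough information to recompute the pseudoalignment rate and flag weak samples before they feed into the eQTL scan. The sketch below does that for two of the records, with the fields copied from the JSON shown in this diff. `pseudoalignment_rate` and `MIN_RATE` are hypothetical, and the 50% cutoff is an illustrative assumption, not a threshold defined anywhere in the commit.

```python
import json

# Fields copied from two of the run records in this commit; sample IDs come
# from the -o path in each record's "call" string.
RUN_INFO = {
    "SRR10030225": '{"n_processed": 36298267, "n_pseudoaligned": 29365958, "p_pseudoaligned": 80.9}',
    "SRR26150166": '{"n_processed": 9181469, "n_pseudoaligned": 4045193, "p_pseudoaligned": 44.1}',
}

MIN_RATE = 50.0  # hypothetical QC cutoff in percent, not taken from the source


def pseudoalignment_rate(run_info: dict) -> float:
    """Percent of processed reads that pseudoaligned, rounded to one decimal."""
    return round(100.0 * run_info["n_pseudoaligned"] / run_info["n_processed"], 1)


records = {sample: json.loads(text) for sample, text in RUN_INFO.items()}
rates = {sample: pseudoalignment_rate(info) for sample, info in records.items()}
low_mapping = [sample for sample, rate in rates.items() if rate < MIN_RATE]

for sample, rate in rates.items():
    flag = "LOW" if sample in low_mapping else "ok"
    print(f"{sample}\t{rate}\t{flag}")
```

Recomputing the rate from the raw counts, rather than trusting `p_pseudoaligned` alone, also acts as a consistency check on the record itself.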
