Commit dd922b0

fix: test
1 parent 817f80a commit dd922b0

6 files changed: +453 -86 lines changed


.gitignore

Lines changed: 1 addition & 0 deletions
@@ -17,3 +17,4 @@ hail*.log
 .python-version
 .idea
 .venv/
+notebooks

FEATURE_TRANS_PQTL_COLOC.md

Lines changed: 227 additions & 0 deletions
@@ -0,0 +1,227 @@
# Trans-pQTL Colocalisation Feature for L2G Prediction

## Overview

This feature adds trans-pQTL (trans-acting protein quantitative trait locus) colocalisation scoring to the Locus-to-Gene (L2G) prediction pipeline. It scores the evidence that a disease-associated locus and a trans-pQTL signal share a causal variant, and uses that score to help prioritise genes at the locus.

## Feature Details

### What It Does

The `TransPQtlColocH4MaximumFeature` extracts the maximum H4 colocalisation probability between:

- **Left side**: Disease-associated credible sets (GWAS loci)
- **Right side**: Trans-pQTL study loci

For each gene in a locus, it takes the maximum H4 score across all of that gene's trans-pQTL colocalisations. Genes without any trans-pQTL colocalisation receive a score of 0.

### Biological Rationale

Trans-pQTLs are variants associated with the abundance of a protein whose encoding gene lies far from the variant, often on a different chromosome. Cis-pQTLs, by contrast, sit near the gene they affect. When a disease association colocalises with a trans-pQTL, it suggests that:

- The implicated gene's protein abundance is a plausible mediator of disease risk
- Trans-acting effects, which may reflect regulation across pathways or tissues, add evidence to the causal inference
- Gene prioritisation can rest on mechanistic evidence rather than correlation alone

## Implementation

### New Components

#### 1. **`TransPQtlColocH4MaximumFeature` Class**

- Located in: `src/gentropy/dataset/l2g_features/colocalisation.py`
- Inherits from: `L2GFeature`
- Feature name: `transPQtlColocH4Maximum`
- Dependency types: `[Colocalisation, StudyIndex, StudyLocus]`

#### 2. **`common_trans_pqtl_colocalisation_feature_logic()` Function**

- Implements the core logic for trans-pQTL feature computation
- Filters colocalisation dataset for trans-pQTL-specific results
- Returns features in long format (studyLocusId, geneId, featureName, featureValue)

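To make the long format concrete, the sketch below builds a few illustrative rows with the four columns named above and pivots them into one entry per (studyLocusId, geneId) pair. The row values, the gene identifiers and the helper `long_to_wide` are hypothetical, and plain Python lists stand in for the datasets the pipeline actually uses.

```python
from collections import defaultdict

# Illustrative long-format rows: one row per (locus, gene, feature).
# All identifiers and values here are made up for the example.
long_rows = [
    {"studyLocusId": "sl1", "geneId": "ENSG_A",
     "featureName": "transPQtlColocH4Maximum", "featureValue": 0.92},
    {"studyLocusId": "sl1", "geneId": "ENSG_B",
     "featureName": "transPQtlColocH4Maximum", "featureValue": 0.0},
    {"studyLocusId": "sl1", "geneId": "ENSG_A",
     "featureName": "pQtlColocH4Maximum", "featureValue": 0.40},
]


def long_to_wide(rows):
    """Pivot long rows into {(studyLocusId, geneId): {featureName: value}}."""
    wide = defaultdict(dict)
    for row in rows:
        key = (row["studyLocusId"], row["geneId"])
        wide[key][row["featureName"]] = row["featureValue"]
    return dict(wide)


wide = long_to_wide(long_rows)
for key, feats in sorted(wide.items()):
    print(key, feats)
```

Keeping features in long format means a new feature only adds rows, so the schema stays the same until the final pivot into a feature matrix.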
#### 3. **Feature Factory Registration**

- Added to `src/gentropy/method/l2g/feature_factory.py`
- Enables feature discovery and automatic instantiation
- Maps feature name `"transPQtlColocH4Maximum"` to the feature class

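The registration pattern can be illustrated with a small name-to-class registry. This is only a sketch of the general idea: the classes `L2GFeatureSketch`, `TransPQtlColocH4MaximumSketch` and `PQtlColocH4MaximumSketch`, and the helper `build_features`, are hypothetical stand-ins and do not reproduce the real `FeatureFactory` code.

```python
class L2GFeatureSketch:
    """Hypothetical stand-in for the real `L2GFeature` base class."""

    feature_name = ""

    def __init__(self, **inputs):
        # Keep whatever input datasets were passed in.
        self.inputs = inputs


class TransPQtlColocH4MaximumSketch(L2GFeatureSketch):
    feature_name = "transPQtlColocH4Maximum"


class PQtlColocH4MaximumSketch(L2GFeatureSketch):
    feature_name = "pQtlColocH4Maximum"


# Registry mapping each feature name to the class that computes it.
feature_mapper = {
    cls.feature_name: cls
    for cls in (TransPQtlColocH4MaximumSketch, PQtlColocH4MaximumSketch)
}


def build_features(features_list, **inputs):
    """Instantiate one feature object per requested name."""
    unknown = [name for name in features_list if name not in feature_mapper]
    if unknown:
        raise ValueError(f"Unknown features: {unknown}")
    return [feature_mapper[name](**inputs) for name in features_list]


built = build_features(["transPQtlColocH4Maximum"], colocalisation="placeholder")
print(type(built[0]).__name__)
```

With this kind of mapping, adding a feature is a matter of defining the class and adding one dictionary entry, which matches the two-line change reported for `feature_factory.py`.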
### Algorithm

1. **Identify trans-pQTL study loci**: Filter study locus dataset for `isTransQtl == True`
2. **Filter colocalisation results**: Keep only colocalisations where the right study is a trans-pQTL
3. **Extract gene information**: Join with study index to map genes to trans-pQTL studies
4. **Compute maximum**: For each left study locus and gene pair, find the maximum H4 score
5. **Handle missing values**: Genes without trans-pQTL colocalisations get score 0.0

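The five steps above can be sketched in plain Python on toy data. This is not the implementation in `colocalisation.py`: the function name `trans_pqtl_h4_maximum`, the field names `leftStudyLocusId` and `rightStudyLocusId`, the dictionary-based study index and all values are assumptions made for illustration.

```python
# Toy inputs; every identifier and value below is hypothetical.
study_loci = [
    {"studyLocusId": "gwas1", "studyId": "GWAS_1", "isTransQtl": False},
    {"studyLocusId": "trans1", "studyId": "PQTL_A", "isTransQtl": True},
    {"studyLocusId": "trans2", "studyId": "PQTL_A", "isTransQtl": True},
    {"studyLocusId": "cis1", "studyId": "PQTL_B", "isTransQtl": False},
]
study_index = {"PQTL_A": "ENSG_A", "PQTL_B": "ENSG_B"}  # studyId -> geneId
colocalisation = [
    {"leftStudyLocusId": "gwas1", "rightStudyLocusId": "trans1", "h4": 0.70},
    {"leftStudyLocusId": "gwas1", "rightStudyLocusId": "trans2", "h4": 0.95},
    {"leftStudyLocusId": "gwas1", "rightStudyLocusId": "cis1", "h4": 0.99},
]
# (left study locus, gene) pairs that need a feature value.
candidate_genes = [("gwas1", "ENSG_A"), ("gwas1", "ENSG_B")]


def trans_pqtl_h4_maximum(study_loci, study_index, colocalisation, candidate_genes):
    # Step 1: keep only study loci flagged as trans-pQTLs.
    trans = {
        sl["studyLocusId"]: sl["studyId"] for sl in study_loci if sl["isTransQtl"]
    }
    best = {}
    for row in colocalisation:
        # Step 2: drop rows whose right side is not a trans-pQTL.
        if row["rightStudyLocusId"] not in trans:
            continue
        # Step 3: map the trans-pQTL study to its gene via the study index.
        gene = study_index[trans[row["rightStudyLocusId"]]]
        key = (row["leftStudyLocusId"], gene)
        # Step 4: keep the maximum H4 per (left locus, gene) pair.
        best[key] = max(best.get(key, 0.0), row["h4"])
    # Step 5: pairs with no trans-pQTL colocalisation get 0.0.
    return {pair: best.get(pair, 0.0) for pair in candidate_genes}


scores = trans_pqtl_h4_maximum(study_loci, study_index, colocalisation, candidate_genes)
print(scores)  # {('gwas1', 'ENSG_A'): 0.95, ('gwas1', 'ENSG_B'): 0.0}
```

Note that the cis row with H4 of 0.99 does not contribute to `ENSG_B`: it is removed in step 2, so that gene falls through to the 0.0 default in step 5.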
### Integration with L2G Pipeline

The feature plugs into the existing L2G infrastructure through the feature factory:

```python
from gentropy.method.l2g.feature_factory import FeatureFactory, L2GFeatureInputLoader

# Feature is automatically available in the feature mapper
features_list = ["transPQtlColocH4Maximum", "pQtlColocH4Maximum", ...]

# study_loci, coloc_dataset, study_index and study_locus are assumed to be
# datasets loaded earlier in the pipeline.
feature_factory = FeatureFactory(study_loci, features_list)
features = feature_factory.generate_features(
    L2GFeatureInputLoader(
        colocalisation=coloc_dataset,
        study_index=study_index,
        study_locus=study_locus,
    )
)
```

## Testing

### Test Coverage

All tests are located in `tests/gentropy/dataset/test_l2g_feature.py`, in the `TestTransPQtlColocH4Feature` class:

1. **`test_trans_pqtl_coloc_h4_maximum`**

   - Verifies feature computation with trans-pQTL data
   - Tests correct column structure
   - Tests feature name in long format DataFrame

2. **`test_trans_pqtl_coloc_with_no_trans_qtls`**

   - Verifies genes without trans-pQTL colocalisations receive score 0
   - Tests handling of cis-only study loci

3. **`test_trans_pqtl_feature_factory_inclusion`**

   - Tests feature factory registration
   - Verifies correct class mapping
   - Validates feature discoverability

4. **Parametrized factory test**
   - Included in existing `test_feature_factory_return_type` test
   - Verifies feature returns proper L2GFeature instance
   - Tests dependency injection

### Running Tests

```bash
# Run all trans-pQTL feature tests
pytest tests/gentropy/dataset/test_l2g_feature.py::TestTransPQtlColocH4Feature -v

# Run with coverage
pytest tests/gentropy/dataset/test_l2g_feature.py::TestTransPQtlColocH4Feature \
  --cov=src/gentropy/dataset/l2g_features/colocalisation \
  --cov-report=term-missing

# Run factory test for new feature
pytest tests/gentropy/dataset/test_l2g_feature.py::test_feature_factory_return_type \
  -k "TransPQtl" -v
```

## Example Usage

```python
from gentropy.dataset.l2g_feature_matrix import L2GFeatureMatrix
from gentropy.method.l2g.feature_factory import L2GFeatureInputLoader
from gentropy.method.l2g.trainer import LocusToGeneTrainer

# credible_set, coloc_dataset, study_index, study_locus and model are assumed
# to be defined earlier in the pipeline.

# Create feature matrix with trans-pQTL feature
feature_matrix = L2GFeatureMatrix.from_features_list(
    study_loci_to_annotate=credible_set,
    features_list=["transPQtlColocH4Maximum", "pQtlColocH4Maximum", ...],
    features_input_loader=L2GFeatureInputLoader(
        colocalisation=coloc_dataset,
        study_index=study_index,
        study_locus=study_locus,
    ),
)

# Use in L2G model training
trainer = LocusToGeneTrainer(
    model=model,
    feature_matrix=feature_matrix,
    features_list=["transPQtlColocH4Maximum", ...],
)

trained_model = trainer.fit()
```

## Files Modified

1. **src/gentropy/dataset/l2g_features/colocalisation.py**

   - Added `common_trans_pqtl_colocalisation_feature_logic()` function
   - Added `TransPQtlColocH4MaximumFeature` class
   - Total: ~140 lines added

2. **src/gentropy/method/l2g/feature_factory.py**

   - Updated import to include `TransPQtlColocH4MaximumFeature`
   - Added feature to `feature_mapper` dictionary
   - Total: 2 lines added

3. **tests/gentropy/dataset/test_l2g_feature.py**
   - Added `TestTransPQtlColocH4Feature` test class
   - Added import for `TransPQtlColocH4MaximumFeature`
   - Updated parametrized test to include new feature
   - Total: ~200 lines added

## Design Decisions

### Why H4 vs CLPP?

The feature uses H4 (the posterior probability that both traits share a single causal variant) rather than CLPP (the colocalisation posterior probability, as computed by eCAVIAR) because:

- H4 is directly interpretable as the probability of a shared causal variant at the locus
- Consistent with existing pQTL features
- Better calibrated for L2G training

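To illustrate how the two metrics differ, the sketch below computes each on toy numbers. It assumes the usual definitions: H4 is the normalised posterior weight of the shared-causal-variant hypothesis among the five coloc hypotheses (H0 to H4), and CLPP is taken here as the sum, over variants present in both credible sets, of the product of the two posterior inclusion probabilities (one common locus-level aggregation). The function names and all numbers are hypothetical.

```python
def h4_posterior(support):
    """Return the H4 share of unnormalised support for H0..H4."""
    total = sum(support)
    return support[4] / total


def clpp(pip_left, pip_right):
    """Sum the product of posterior inclusion probabilities over shared variants."""
    shared = pip_left.keys() & pip_right.keys()
    return sum(pip_left[v] * pip_right[v] for v in shared)


# Unnormalised support for H0, H1, H2, H3, H4 (made-up values).
support = [1.0, 2.0, 2.0, 5.0, 90.0]

# Posterior inclusion probabilities per variant for each credible set.
pip_gwas = {"rs1": 0.5, "rs2": 0.5}
pip_pqtl = {"rs1": 0.5, "rs3": 0.5}

print(h4_posterior(support))     # 0.9
print(clpp(pip_gwas, pip_pqtl))  # 0.25
```

In this toy case the two credible sets overlap only at `rs1`, so CLPP depends on how sharply each signal is fine-mapped, while H4 summarises support for a shared variant across the whole locus.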
### Why Trans-pQTL Specific?

A dedicated feature for trans-pQTLs captures:

- Cross-tissue protein effects
- Broader biological impact
- Mechanistic evidence for disease causality
- Distinct from cis-pQTL effects (already covered by `pQtlColocH4Maximum`)

### No Neighbourhood Feature

As requested, only the single feature is implemented (not a neighbourhood variant). This design:

- Focuses on strongest mechanistic evidence
- Reduces feature dimensionality
- Avoids overfitting on weak trans-effects

## Performance Characteristics

- **Computation time**: Expected to be roughly linear in colocalisation dataset size
- **Memory usage**: Expected to be low (only filters and aggregates)
- **Sparsity**: Likely high (most genes have no trans-pQTL colocalisations)
- **Distribution**: Skewed towards 0, with occasional high values

204+
## Future Enhancements
205+
206+
Potential extensions for future work:
207+
208+
1. Add trans-pQTL features for other colocalisation metrics (CLPP)
209+
2. Implement neighbourhood aggregation (if needed)
210+
3. Add tissue-specific trans-pQTL features
211+
4. Integration with drug target predictions
212+
5. Validation studies using orthogonal methods
213+
## References

- Original trans-pQTL analysis in notebook: `07_trans_pQTLs_CHEMBL_enrich.ipynb`
- L2G feature framework: `src/gentropy/dataset/l2g_features/`
- Colocalisation methods: `src/gentropy/method/colocalisation.py`
- L2G prediction: `src/gentropy/method/l2g/`

## Questions?

For questions about this feature:

1. Check the implementation in `src/gentropy/dataset/l2g_features/colocalisation.py`
2. Review tests in `tests/gentropy/dataset/test_l2g_feature.py`
3. See the original analysis in the notebook `07_trans_pQTLs_CHEMBL_enrich.ipynb`

src/gentropy/config.py

Lines changed: 1 addition & 0 deletions
@@ -361,6 +361,7 @@ class LocusToGeneFeatureMatrixConfig(StepConfig):
     study_index_path: str | None = None
     target_index_path: str | None = None
     intervals_path: str | None = None
+    gene_interactions_path: str | None = None
     feature_matrix_path: str = MISSING
     features_list: list[str] = field(
         default_factory=lambda: [
