Open
Description
This happens due to the redundancy filtering. The issue is here:
data-leakage-ppi-prediction/create_gold_standard.py
Lines 162 to 175 in 227ea4c
```python
with open(f'Datasets_PPIs/Hippiev2.3/Intra{block}_pos.txt', 'r') as f:
    for line in f:
        pos_interactions += 1
        prot_a, prot_b = line.strip().split(' ')
        if prot_a not in redundant_proteins and prot_b not in redundant_proteins and prot_a not in intra_sims and prot_b not in intra_sims:
            block_pos.add((prot_a, prot_b))
print(f'Positives: {len(block_pos)} / {pos_interactions} remained! Filtered {pos_interactions - len(block_pos)} PPIs ...')
neg_interactions = 0
with open(f'Datasets_PPIs/Hippiev2.3/Intra{block}_neg.txt', 'r') as f:
    for line in f:
        neg_interactions += 1
        prot_a, prot_b = line.strip().split(' ')
        if prot_a not in redundant_proteins and prot_b not in redundant_proteins and prot_a not in intra_sims and prot_b not in intra_sims:
            block_neg.add((prot_a, prot_b))
```
If a protein interacts only with redundant proteins in the positive dataset but with non-redundant proteins in the negative dataset, all of its positive pairs are filtered out while its negative pairs survive, so it ends up with only negative edges (example: Q9NR71).
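To make the failure mode concrete, here is a minimal sketch on toy data. The helper names (`filter_pairs`, `proteins_in`, `negative_only_proteins`) and the protein IDs are hypothetical and not part of `create_gold_standard.py`; `excluded` stands in for the union of `redundant_proteins` and `intra_sims`.

```python
def filter_pairs(pairs, excluded):
    """Keep only pairs in which neither protein is in the excluded set."""
    return {(a, b) for a, b in pairs if a not in excluded and b not in excluded}


def proteins_in(pairs):
    """Return the set of all proteins that occur in at least one pair."""
    return {p for pair in pairs for p in pair}


def negative_only_proteins(pos_pairs, neg_pairs, excluded):
    """Proteins that keep negative edges but lose every positive edge."""
    pos = filter_pairs(pos_pairs, excluded)
    neg = filter_pairs(neg_pairs, excluded)
    return proteins_in(neg) - proteins_in(pos)


# Toy data (hypothetical IDs): the only positive partner of P1 is R1,
# which is redundant, while its negative partner P2 is not.
pos_pairs = {("P1", "R1"), ("P2", "P3"), ("P3", "P4")}
neg_pairs = {("P1", "P2"), ("P2", "P4")}
excluded = {"R1"}

print(negative_only_proteins(pos_pairs, neg_pairs, excluded))  # {'P1'}
```

Running the same check on the real filtered `block_pos` and `block_neg` sets should list proteins such as Q9NR71.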
Solution:
Why did I even sample the negative dataset before the redundancy reduction? Reverse the order of the steps:
- Make the partitioning
- Kick out redundant proteins within and between the blocks
- Sample the negatives by expected degree sampling
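The reordered steps could look roughly like the sketch below. It is not the repository's implementation: the function names are hypothetical, the partitioning step is assumed to have happened already, and "expected degree sampling" is interpreted here as drawing both endpoints of a negative pair with probability proportional to their degree in the filtered positive set.

```python
import random
from collections import Counter


def filter_pairs(pairs, excluded):
    """Keep only pairs in which neither protein is in the excluded set."""
    return {(a, b) for a, b in pairs if a not in excluded and b not in excluded}


def sample_negatives(pos_pairs, n_neg, seed=0, max_tries=100_000):
    """Sample negatives from the filtered positives (assumed interpretation).

    Endpoints are drawn with probability proportional to their positive
    degree, so every sampled protein also has at least one positive edge.
    May return fewer than n_neg pairs if too few candidates exist.
    """
    rng = random.Random(seed)
    degree = Counter(p for pair in pos_pairs for p in pair)
    proteins = sorted(degree)
    weights = [degree[p] for p in proteins]
    known = {frozenset(pair) for pair in pos_pairs}
    negatives = set()
    for _ in range(max_tries):
        if len(negatives) == n_neg:
            break
        a, b = rng.choices(proteins, weights=weights, k=2)
        pair = frozenset((a, b))
        # Skip self-pairs, known positives, and already sampled negatives.
        if a == b or pair in known or pair in negatives:
            continue
        negatives.add(pair)
    return {tuple(sorted(pair)) for pair in negatives}


# Toy block (hypothetical IDs); R1 plays the role of a redundant protein.
pos_raw = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "R1"), ("A", "D")}
excluded = {"R1"}

pos_filtered = filter_pairs(pos_raw, excluded)        # filter first
negatives = sample_negatives(pos_filtered, n_neg=2)   # then sample
print(sorted(negatives))
```

Because negatives are drawn only from proteins that survive the filtering, no protein can end up with negative edges alone.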