Commit dd7c560

feat: Add biologically-informed peptide mutation
- Implements peptide mutation using empirical substitution probabilities from cancer data.
- Adds interactive and command-line mutation workflows.
- Updates README with usage instructions.
- Cleans up old data files and adds .gitignore.
1 parent 911d8fd commit dd7c560
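
As a rough illustration of what the commit message describes, substitution-weighted mutation can be sketched as below. This is not the code from this commit: the `SUBSTITUTION_PROBS` table holds made-up toy values rather than the empirical cancer-derived frequencies, and the names `mutate_peptide` and `SUBSTITUTION_PROBS` are hypothetical.

```python
import random

# Hypothetical toy table: SUBSTITUTION_PROBS[original][replacement] = weight.
# The values are invented for illustration and are NOT the empirical
# frequencies used by the repository.
SUBSTITUTION_PROBS = {
    "A": {"V": 0.5, "T": 0.3, "S": 0.2},
    "R": {"H": 0.4, "C": 0.3, "Q": 0.3},
    "G": {"D": 0.4, "S": 0.3, "V": 0.3},
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate_peptide(peptide, n_mutations=1, rng=None):
    """Return a copy of `peptide` with `n_mutations` distinct positions substituted."""
    rng = rng or random.Random()
    if n_mutations > len(peptide):
        raise ValueError("more mutations requested than positions available")
    residues = list(peptide)
    # Pick distinct positions so each requested mutation changes a new site.
    for pos in rng.sample(range(len(residues)), n_mutations):
        original = residues[pos]
        table = SUBSTITUTION_PROBS.get(original)
        if table:
            # Weighted draw from the substitution table for this residue.
            choices, weights = zip(*table.items())
            residues[pos] = rng.choices(choices, weights=weights, k=1)[0]
        else:
            # Assumed fallback: uniform draw over the other standard residues.
            residues[pos] = rng.choice([aa for aa in AMINO_ACIDS if aa != original])
    return "".join(residues)


if __name__ == "__main__":
    print(mutate_peptide("ARGARGARG", n_mutations=2, rng=random.Random(0)))
```

Passing a seeded `random.Random` keeps runs reproducible, which fits the project's stated goal of reproducible peptide sets.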

9 files changed: +495 -60137 lines changed

.gitignore

Lines changed: 4 additions & 34 deletions
@@ -1,39 +1,9 @@
-.DS_Store
-.DS_Store
-.DS_Store
-.DS_Store
-.DS_Store
-data/GCF_000001405.40/protein.faa
-Context.md
-CLAUDE.md
-test.fasta
-process-todos.md
-.claude
-.codebuddy
 
-# Generated pVACbind outputs
-results/pvacbind*/
+# Data and output files
+*.fasta
+*.tsv
 
-# Large reference proteome file (too big for GitHub)
-data/protein.faa
-
-# Python cache files
+# Python cache
 __pycache__/
 *.pyc
-*.pyo
-*.pyd
-.Python
-build/
-develop-eggs/
-dist/
-downloads/
-eggs/
-.eggs/
-lib/
-lib64/
-parts/
-sdist/
-var/
-wheels/
-
 
README.md

Lines changed: 35 additions & 0 deletions
@@ -13,6 +13,7 @@ A tool for generating reference peptide sets
 - **Command-line interface** for batch processing
 - **Progress indicators** for all generation methods
 - **Reproducible and documented**: All parameters and code are managed in git
+- **Biologically-Informed Peptide Mutation**: Mutate peptides using empirical substitution frequencies derived from ~1.6 million cancer mutations.
 
 ## Requirements
 
@@ -56,6 +57,40 @@ Generate 100 9-mer peptides using ESM2 (direct generation):
 python scripts/generation/generate_control_peptides.py --source llm --llm_model esm2 --length 9 --count 100 --output ESM2-9mer-100.fasta
 ```
 
+### Peptide Mutation
+
+After generating any set of peptides, you will be prompted to optionally mutate them. You can also apply mutations directly or to an existing FASTA file.
+
+**1. Interactive Mutation (After Generation)**
+
+Run any generation command, and an interactive prompt will appear:
+
+```bash
+python scripts/generation/generate_control_peptides.py --source random --length 8 --count 10
+
+...
+
+🧬 Would you like to mutate these peptides? (y/n): y
+How many mutations per peptide? (default: 1): 2
+```
+This will create a new file, e.g., `control_peptides_mutated_2x.fasta`.
+
+**2. Direct Mutation (With Flags)**
+
+Use the `--mutate` and `--mutations` flags to apply mutations automatically:
+
+```bash
+python scripts/generation/generate_control_peptides.py --source random --length 9 --count 100 --mutate --mutations 1
+```
+
+**3. Mutate an Existing FASTA File**
+
+Use the `peptide_mutations.py` script to mutate an existing peptide set:
+
+```bash
+python scripts/generation/peptide_mutations.py --fasta_file path/to/your/peptides.fasta --mutations 1
+```
+
 ## Docker Usage
 
 Run the tool using Docker (no local setup required):
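
Returning to the "Mutate an Existing FASTA File" workflow added in the README hunk above, the following is a minimal sketch of how a FASTA file might be parsed, mutated, and given an output name. It is not the contents of `peptide_mutations.py`. The `_mutated_<n>x` naming is inferred from the README's `control_peptides_mutated_2x.fasta` example, the `_mut<n>` header suffix is an assumption, and `mutate_uniform` is a placeholder that draws substitutions uniformly instead of using the empirical substitution frequencies.

```python
import random
from pathlib import Path

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def mutated_path(path, n_mutations):
    """Derive an output name such as control_peptides_mutated_2x.fasta (assumed pattern)."""
    p = Path(path)
    return p.with_name(f"{p.stem}_mutated_{n_mutations}x{p.suffix}")


def mutate_uniform(seq, n_mutations, rng):
    """Placeholder mutator: uniform substitutions at distinct positions."""
    residues = list(seq)
    for pos in rng.sample(range(len(residues)), n_mutations):
        residues[pos] = rng.choice([aa for aa in AMINO_ACIDS if aa != residues[pos]])
    return "".join(residues)


def mutate_fasta_text(text, n_mutations=1, seed=None):
    """Return FASTA text in which every record carries `n_mutations` substitutions."""
    rng = random.Random(seed)
    lines = []
    for header, seq in parse_fasta(text):
        lines.append(f">{header}_mut{n_mutations}")  # hypothetical header suffix
        lines.append(mutate_uniform(seq, n_mutations, rng))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(mutated_path("control_peptides.fasta", 2))
    print(mutate_fasta_text(">pep1\nACDEFGHIK\n>pep2\nLMNPQRSTV\n", n_mutations=2, seed=0))
```

Keeping parsing, naming, and mutation as separate functions makes it easy to swap the placeholder mutator for a substitution-weighted one without touching the file handling.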
