Commit 91a335f

Remove GUI components and simplify README
- Remove peptide_gui.py and PySimpleGUI dependency
- Simplify README with clear copy-paste commands for each method
- Add ESM2 method to usage examples
- All commands generate 100 peptides of length 9 for consistency
- Remove GUI references from documentation
- Update acknowledgements to include ESM2
- Exclude large protein.faa file (already in .gitignore)
1 parent 82c2748 commit 91a335f

11 files changed: +376 -148 lines changed

.idea/vcs.xml

Lines changed: 6 additions & 0 deletions

README.md

Lines changed: 23 additions & 24 deletions
@@ -4,21 +4,22 @@ A tool for generating reference peptide sets
 
 ## Features
 
-- **Three peptide generation modes:**
+- **Four peptide generation modes:**
   - Random (from 20 amino acids)
   - Sampled from a user-supplied FASTA file
-  - Generated by protein language models (ProtGPT2 or ESM-2 via HuggingFace Transformers)
+  - Generated by ProtGPT2 language model (interactive proteome workflow)
+  - Generated by ESM2 language model (direct generation)
 - **FASTA output** compatible with pVACtools
 - **Command-line interface** for batch processing
-- **Simple GUI** (using PySimpleGUI) for non-technical users
+- **Progress indicators** for all generation methods
 - **Reproducible and documented**: All parameters and code are managed in git
 
 ## Requirements
 
 - Python 3.8+
-- [PySimpleGUI](https://pysimplegui.readthedocs.io/)
 - [transformers](https://huggingface.co/docs/transformers/index)
 - [torch](https://pytorch.org/)
+- [tqdm](https://tqdm.github.io/) (for progress bars)
 
 Install dependencies with:
 ```bash
@@ -30,34 +31,32 @@ pip install -r requirements.txt
 
 ## Usage
 
-### Command Line
-
+### Random Generation
+Generate 100 random 9-mer peptides:
 ```bash
-# Random generation - Generate 1000 random 9-mer peptides
-python scripts/generation/generate_control_peptides.py --source random --length 9 --count 1000 --output Random-9mer-1000.fasta --seed 42
-
-# FASTA sampling - Sample 1000 9-mer peptides from reference proteome
-python scripts/generation/generate_control_peptides.py --source fasta --length 9 --count 1000 --fasta_file data/protein.faa --output RefProteome-9mer-1000.fasta
-
-# ProtGPT2 generation - Interactive workflow to generate peptides from synthetic proteins
-python scripts/generation/generate_control_peptides.py --source llm --llm_model protgpt2 --length 9 --count 1000 --output ProtGPT2-9mer-1000.fasta
-# This will prompt you to either:
-# 1) Use an existing ProtGPT2-generated proteome, or
-# 2) Generate a new synthetic proteome using ProtGPT2
+python scripts/generation/generate_control_peptides.py --source random --length 9 --count 100 --output Random-9mer-100.fasta
+```
 
-# Generate different peptide lengths with descriptive names
-python scripts/generation/generate_control_peptides.py --source random --length 8 --count 5000 --output Random-8mer-5000.fasta
-python scripts/generation/generate_control_peptides.py --source llm --llm_model protgpt2 --length 10 --count 2000 --output ProtGPT2-10mer-2000.fasta
+### FASTA Sampling
+Sample 100 9-mer peptides from a reference proteome:
+```bash
+python scripts/generation/generate_control_peptides.py --source fasta --length 9 --count 100 --fasta_file data/protein.faa --output RefProteome-9mer-100.fasta
 ```
 
-### GUI
+### ProtGPT2 Generation
+Generate 100 9-mer peptides using ProtGPT2 (interactive workflow):
+```bash
+python scripts/generation/generate_control_peptides.py --source llm --llm_model protgpt2 --length 9 --count 100 --output ProtGPT2-9mer-100.fasta
+```
+*This will prompt you to either use an existing proteome or generate a new synthetic proteome.*
 
+### ESM2 Generation
+Generate 100 9-mer peptides using ESM2 (direct generation):
 ```bash
-python peptide_gui.py
+python scripts/generation/generate_control_peptides.py --source llm --llm_model esm2 --length 9 --count 100 --output ESM2-9mer-100.fasta
 ```
-Follow the prompts to select generation mode, parameters, and output file.
 
 ## Acknowledgements
 
 - [ProtGPT2](https://huggingface.co/nferruz/ProtGPT2)
-- [PySimpleGUI](https://pysimplegui.readthedocs.io/)
+- [ESM2](https://huggingface.co/facebook/esm2_t6_8M_UR50D)
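Review note: the README describes the random mode only as "Random (from 20 amino acids)", and the new example command no longer passes `--seed 42`. As a rough illustration of what seeded random generation could look like, here is a minimal sketch. The function name `random_peptides` and its `seed` parameter are hypothetical and are not taken from `generate_control_peptides.py`.

```python
import random
from typing import List, Optional

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptides(length: int, count: int, seed: Optional[int] = None) -> List[str]:
    """Hypothetical helper: draw `count` peptides of `length` residues uniformly at random."""
    rng = random.Random(seed)  # a private RNG keeps the result reproducible for a given seed
    return ["".join(rng.choices(AMINO_ACIDS, k=length)) for _ in range(count)]


# Mirrors the README example: 100 random 9-mers.
peptides = random_peptides(length=9, count=100, seed=42)
```

With a fixed seed the same peptide set is produced on every run, which is what the "Reproducible and documented" feature bullet seems to call for.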

data/protein.faa.zip

27.1 MB
Binary file not shown.
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+{
+    "keep": {
+        "days": true,
+        "amount": 14
+    },
+    "auditLog": "/Users/chris/Desktop/Griffith Lab/Peptide Sequence Synthesis/logs/.2241a41cac6b783a91d7f3cae270c084b996b6de-audit.json",
+    "files": [
+        {
+            "date": 1753800742818,
+            "name": "/Users/chris/Desktop/Griffith Lab/Peptide Sequence Synthesis/logs/mcp-puppeteer-2025-07-29.log",
+            "hash": "ffe980fcea0fb7d27f5f6750563aee7ea2f76495534bbff4701002e4c2471ccf"
+        },
+        {
+            "date": 1753940970612,
+            "name": "/Users/chris/Desktop/Griffith Lab/Peptide Sequence Synthesis/logs/mcp-puppeteer-2025-07-31.log",
+            "hash": "ada19ae3c4ec575d27da2c5a1bca035c836d7f844ba154cf8977dbc1a09e3634"
+        }
+    ],
+    "hashType": "sha256"
+}

logs/mcp-puppeteer-2025-07-29.log

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+{"level":"info","message":"Starting MCP server","service":"mcp-puppeteer","timestamp":"2025-07-29 10:52:22.865"}
+{"level":"info","message":"MCP server started successfully","service":"mcp-puppeteer","timestamp":"2025-07-29 10:52:22.866"}

logs/mcp-puppeteer-2025-07-31.log

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+{"level":"info","message":"Starting MCP server","service":"mcp-puppeteer","timestamp":"2025-07-31 01:49:30.658"}
+{"level":"info","message":"MCP server started successfully","service":"mcp-puppeteer","timestamp":"2025-07-31 01:49:30.659"}
+{"level":"info","message":"Starting MCP server","service":"mcp-puppeteer","timestamp":"2025-07-31 12:48:39.866"}
+{"level":"info","message":"MCP server started successfully","service":"mcp-puppeteer","timestamp":"2025-07-31 12:48:39.867"}
+{"level":"info","message":"Starting MCP server","service":"mcp-puppeteer","timestamp":"2025-07-31 12:49:51.620"}
+{"level":"info","message":"MCP server started successfully","service":"mcp-puppeteer","timestamp":"2025-07-31 12:49:51.621"}

requirements.txt

Lines changed: 1 addition & 2 deletions
@@ -1,5 +1,4 @@
-PySimpleGUI>=4.60.0
-transformers>=4.20.0
+transformers>=4.21.0
 torch>=1.12.0
 # Data analysis / plotting
 pandas>=1.5.0

run_docker_pvactools.sh

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+#!/bin/bash
+#
+# Run PVACtools analysis using official Griffith Lab Docker image
+# Addresses previous Docker "no output" issues
+#
+
+echo "🧬 Starting PVACtools Docker Analysis"
+echo "====================================="
+
+# Set project directory
+PROJECT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+echo "Project directory: $PROJECT_DIR"
+
+# Make script executable and run
+chmod +x "$PROJECT_DIR/scripts/tools/docker_pvactools_runner.py"
+python3 "$PROJECT_DIR/scripts/tools/docker_pvactools_runner.py"
+
+echo ""
+echo "Analysis complete. Check results in: $PROJECT_DIR/results/pvacbind_docker/"

scripts/generation/generate_control_peptides.py

Lines changed: 55 additions & 33 deletions
@@ -40,15 +40,23 @@ def parse_fasta_sequences(fasta_path: Path) -> List[str]:
     return sequences
 
 def sample_peptides_from_fasta(fasta_path: Path, length: int, count: int) -> List[str]:
+    print(f"Parsing FASTA file: {fasta_path}")
     sequences = parse_fasta_sequences(fasta_path)
+
+    print(f"Extracting {length}-mer peptides from {len(sequences)} proteins...")
     all_subseqs = set()  # Use set to automatically collapse duplicates
-    for seq in sequences:
+
+    # Use progress bar for subsequence extraction
+    for seq in tqdm(sequences, desc="Processing proteins", unit="protein"):
         if len(seq) >= length:
             for i in range(len(seq) - length + 1):
                 all_subseqs.add(seq[i:i+length])
+
     if not all_subseqs:
         raise ValueError(f"No subsequences of length {length} found in {fasta_path}")
 
+    print(f"Found {len(all_subseqs)} unique {length}-mer peptides")
+
     # Convert set back to list for sampling
     unique_subseqs = list(all_subseqs)
 
@@ -58,6 +66,7 @@ def sample_peptides_from_fasta(fasta_path: Path, length: int, count: int) -> Lis
         return unique_subseqs
 
     # Sample without replacement
+    print(f"Sampling {count} peptides without replacement...")
     return random.sample(unique_subseqs, k=count)
 
 def generate_llm_peptides(length: int, count: int, model_name: str = "protgpt2", top_k: int = 950, top_p: float = 0.9, repetition_penalty: float = 1.2) -> List[str]:
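Review note: the two hunks above only add logging and a progress bar; the sampling logic itself is unchanged. For readers without the full file, the sliding-window extraction and sampling without replacement can be sketched in isolation as below. The names `unique_kmers` and `sample_kmers` are illustrative, not from the script. Returning the whole pool when `count` is at least the pool size is an assumption, because the condition guarding `return unique_subseqs` falls outside the diff context.

```python
import random
from typing import Iterable, List, Optional, Set


def unique_kmers(sequences: Iterable[str], length: int) -> Set[str]:
    """Collect every distinct window of `length` residues across all sequences."""
    kmers: Set[str] = set()
    for seq in sequences:
        # range() is empty when the sequence is shorter than `length`,
        # so short sequences are skipped without an explicit check.
        for i in range(len(seq) - length + 1):
            kmers.add(seq[i:i + length])
    return kmers


def sample_kmers(sequences: Iterable[str], length: int, count: int,
                 seed: Optional[int] = None) -> List[str]:
    """Sample `count` distinct k-mers without replacement (hypothetical helper)."""
    pool = sorted(unique_kmers(sequences, length))  # sorted so a seed gives stable output
    if not pool:
        raise ValueError(f"No subsequences of length {length} found")
    if count >= len(pool):
        return pool  # assumed fallback: fewer unique k-mers than requested
    return random.Random(seed).sample(pool, k=count)


example = sample_kmers(["ABCDE", "BCDEF"], length=3, count=2, seed=0)
```

Sorting the pool before sampling is a design choice of this sketch: set iteration order for strings can vary between interpreter runs, so sampling directly from `list(a_set)` is not reproducible even with a fixed seed.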
@@ -332,9 +341,11 @@ def generate_fake_proteome(num_proteins: int, target_lengths: List[int], model_n
         sys.exit(1)
 
 def write_fasta(peptides: List[str], output_path: Path, prefix: str = "peptide"):
+    print(f"Writing {len(peptides)} peptides to {output_path}...")
     with open(output_path, 'w') as f:
-        for i, pep in enumerate(peptides, 1):
-            f.write(f">{prefix}_{i}\n{pep}\n")
+        for i, pep in enumerate(tqdm(peptides, desc="Writing peptides", unit="peptide")):
+            f.write(f">{prefix}_{i+1}\n{pep}\n")
+    print(f"✅ Successfully wrote {len(peptides)} peptides to {output_path}")
 
 def main():
     parser = argparse.ArgumentParser(description="Generate control peptides for neoantigen analysis.")
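Review note: the `write_fasta` change swaps `enumerate(peptides, 1)` with `{i}` for `enumerate(...)` with `{i+1}`, so record IDs still start at 1. The sketch below shows the record layout on an in-memory buffer. The function name `write_fasta_records` and its file-handle parameter are illustrative only; the real function takes an output path and wraps the loop in `tqdm`.

```python
import io
from typing import Iterable, TextIO


def write_fasta_records(peptides: Iterable[str], handle: TextIO, prefix: str = "peptide") -> None:
    """Write one two-line FASTA record per peptide, with IDs numbered from 1."""
    for i, pep in enumerate(peptides, start=1):
        handle.write(f">{prefix}_{i}\n{pep}\n")


# Demonstrate on an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
write_fasta_records(["ACDEFGHIK", "LMNPQRSTV"], buffer)
fasta_text = buffer.getvalue()
```

Taking a handle rather than a path makes the function easy to test without touching the filesystem; a progress bar could still be added by wrapping `peptides` at the call site.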
@@ -381,39 +392,50 @@ def main():
             sys.exit(1)
         peptides = sample_peptides_from_fasta(args.fasta_file, args.length, args.count)
     elif args.source == 'llm':
-        # Interactive workflow for LLM-based peptide generation
-        print(f"\nGenerating {args.count} peptides of length {args.length} using LLM approach...")
-        print("This approach generates a fake proteome first, then samples peptides from it.")
-
-        # Check if user has existing proteome
-        has_existing = get_user_input("\nDo you have an existing ProtGPT2-generated proteome file? (y/n): ").lower().startswith('y')
-
-        if has_existing:
-            proteome_path = get_existing_proteome_path()
-            print(f"Using existing proteome: {proteome_path}")
+        # Choose workflow based on the LLM model
+        if args.llm_model.lower() == 'protgpt2':
+            # Interactive proteome workflow for ProtGPT2 (to avoid M bias)
+            print(f"\nGenerating {args.count} peptides of length {args.length} using ProtGPT2 proteome approach...")
+            print("This approach generates a fake proteome first, then samples peptides from it.")
+        elif args.llm_model.lower() == 'esm2':
+            # Direct generation for ESM2 (no M bias issue)
+            print(f"\nGenerating {args.count} peptides of length {args.length} using ESM2 direct generation...")
+            peptides = generate_llm_peptides(args.length, args.count, args.llm_model, args.top_k, args.top_p, args.repetition_penalty)
         else:
-            # Ask if user wants to generate new proteome
-            generate_new = get_user_input("Would you like to generate a new fake proteome? (y/n): ").lower().startswith('y')
-
-            if not generate_new:
-                print("Cannot proceed without a proteome. Exiting.")
-                sys.exit(1)
-
-            # Configure proteome generation (no reference needed)
-            num_proteins, target_lengths = configure_proteome_generation()
+            print(f"Error: Unsupported LLM model '{args.llm_model}'", file=sys.stderr)
+            sys.exit(1)
+
+        # Only run interactive proteome workflow for ProtGPT2
+        if args.llm_model.lower() == 'protgpt2':
+            # Check if user has existing proteome
+            has_existing = get_user_input("\nDo you have an existing ProtGPT2-generated proteome file? (y/n): ").lower().startswith('y')
 
-            # Generate the fake proteome
-            fake_proteins = generate_fake_proteome(num_proteins, target_lengths, args.llm_model)
+            if has_existing:
+                proteome_path = get_existing_proteome_path()
+                print(f"Using existing proteome: {proteome_path}")
+            else:
+                # Ask if user wants to generate new proteome
+                generate_new = get_user_input("Would you like to generate a new fake proteome? (y/n): ").lower().startswith('y')
+
+                if not generate_new:
+                    print("Cannot proceed without a proteome. Exiting.")
+                    sys.exit(1)
+
+                # Configure proteome generation (no reference needed)
+                num_proteins, target_lengths = configure_proteome_generation()
+
+                # Generate the fake proteome
+                fake_proteins = generate_fake_proteome(num_proteins, target_lengths, args.llm_model)
+
+                # Save the generated proteome
+                proteome_output = Path(f'fake_proteome_{num_proteins}proteins.fasta')
+                write_fasta(fake_proteins, proteome_output, prefix="protein")
+                print(f"\nGenerated fake proteome saved to: {proteome_output}")
+                proteome_path = proteome_output
 
-            # Save the generated proteome
-            proteome_output = Path(f'fake_proteome_{num_proteins}proteins.fasta')
-            write_fasta(fake_proteins, proteome_output, prefix="protein")
-            print(f"\nGenerated fake proteome saved to: {proteome_output}")
-            proteome_path = proteome_output
-
-        # Now sample peptides from the proteome using FASTA method
-        print(f"\nSampling {args.count} peptides from the proteome...")
-        peptides = sample_peptides_from_fasta(proteome_path, args.length, args.count)
+            # Now sample peptides from the proteome using FASTA method
+            print(f"\nSampling {args.count} peptides from the proteome...")
+            peptides = sample_peptides_from_fasta(proteome_path, args.length, args.count)
     else:
         print(f"Unknown source: {args.source}", file=sys.stderr)
         sys.exit(1)
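Review note: `main()` now lower-cases `args.llm_model` in several places and exits manually on unsupported values. One possible alternative is to let `argparse` normalise and validate the value at parse time, as in the sketch below. Only the `--source` and `--llm_model` flag names come from the README commands; the `choices` lists, the `type=str.lower` normalisation, and the default are assumptions, and the real parser may already do some of this outside the diff context.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser fragment: validate --llm_model before any generation code runs."""
    parser = argparse.ArgumentParser(description="Generate control peptides (sketch).")
    parser.add_argument("--source", choices=["random", "fasta", "llm"], required=True)
    parser.add_argument(
        "--llm_model",
        type=str.lower,                # normalise case first, so "ESM2" is accepted
        choices=["protgpt2", "esm2"],  # argparse rejects anything else with a usage error
        default="protgpt2",            # assumed default, not confirmed by the diff
    )
    return parser


args = build_parser().parse_args(["--source", "llm", "--llm_model", "ESM2"])
```

With this in place the `else` branch that prints "Unsupported LLM model" would be unreachable, and the repeated `.lower()` calls could be dropped.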
