griffithlab
diff --git a/.idea/vcs.xml b/.idea/vcs.xml
Lines changed: 6 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 23 additions & 24 deletions
diff --git a/data/protein.faa.zip b/data/protein.faa.zip
27.1 MB
diff --git a/logs/.2241a41cac6b783a91d7f3cae270c084b996b6de-audit.json b/logs/.2241a41cac6b783a91d7f3cae270c084b996b6de-audit.json
Lines changed: 20 additions & 0 deletions
diff --git a/logs/mcp-puppeteer-2025-07-29.log b/logs/mcp-puppeteer-2025-07-29.log
Lines changed: 2 additions & 0 deletions
diff --git a/logs/mcp-puppeteer-2025-07-31.log b/logs/mcp-puppeteer-2025-07-31.log
Lines changed: 6 additions & 0 deletions
diff --git a/requirements.txt b/requirements.txt
Lines changed: 1 addition & 2 deletions
diff --git a/run_docker_pvactools.sh b/run_docker_pvactools.sh
Lines changed: 19 additions & 0 deletions
diff --git a/scripts/generation/generate_control_peptides.py b/scripts/generation/generate_control_peptides.py
Lines changed: 55 additions & 33 deletions
@@ -4,21 +4,22 @@ A tool for generating reference peptide sets
 
 ## Features
 
-- **Three peptide generation modes:**
+- **Four peptide generation modes:**
   - Random (from 20 amino acids)
   - Sampled from a user-supplied FASTA file
-  - Generated by protein language models (ProtGPT2 or ESM-2 via HuggingFace Transformers)
+  - Generated by ProtGPT2 language model (interactive proteome workflow)
+  - Generated by ESM2 language model (direct generation)
 - **FASTA output** compatible with pVACtools
 - **Command-line interface** for batch processing
-- **Simple GUI** (using PySimpleGUI) for non-technical users
+- **Progress indicators** for all generation methods
 - **Reproducible and documented**: All parameters and code are managed in git
 
 ## Requirements
 
 - Python 3.8+
-- [PySimpleGUI](https://pysimplegui.readthedocs.io/)
 - [transformers](https://huggingface.co/docs/transformers/index)
 - [torch](https://pytorch.org/)
+- [tqdm](https://tqdm.github.io/) (for progress bars)
 
 Install dependencies with:
 ```bash
@@ -30,34 +31,32 @@ pip install -r requirements.txt
 
 ## Usage
 
-### Command Line
-
+### Random Generation
+Generate 100 random 9-mer peptides:
 ```bash
-# Random generation - Generate 1000 random 9-mer peptides
-python scripts/generation/generate_control_peptides.py --source random --length 9 --count 1000 --output Random-9mer-1000.fasta --seed 42
-
-# FASTA sampling - Sample 1000 9-mer peptides from reference proteome
-python scripts/generation/generate_control_peptides.py --source fasta --length 9 --count 1000 --fasta_file data/protein.faa --output RefProteome-9mer-1000.fasta
-
-# ProtGPT2 generation - Interactive workflow to generate peptides from synthetic proteins
-python scripts/generation/generate_control_peptides.py --source llm --llm_model protgpt2 --length 9 --count 1000 --output ProtGPT2-9mer-1000.fasta
-# This will prompt you to either:
-# 1) Use an existing ProtGPT2-generated proteome, or
-# 2) Generate a new synthetic proteome using ProtGPT2
+python scripts/generation/generate_control_peptides.py --source random --length 9 --count 100 --output Random-9mer-100.fasta
+```
 
-# Generate different peptide lengths with descriptive names
-python scripts/generation/generate_control_peptides.py --source random --length 8 --count 5000 --output Random-8mer-5000.fasta
-python scripts/generation/generate_control_peptides.py --source llm --llm_model protgpt2 --length 10 --count 2000 --output ProtGPT2-10mer-2000.fasta
+### FASTA Sampling
+Sample 100 9-mer peptides from a reference proteome:
+```bash
+python scripts/generation/generate_control_peptides.py --source fasta --length 9 --count 100 --fasta_file data/protein.faa --output RefProteome-9mer-100.fasta
 ```
 
-### GUI
+### ProtGPT2 Generation
+Generate 100 9-mer peptides using ProtGPT2 (interactive workflow):
+```bash
+python scripts/generation/generate_control_peptides.py --source llm --llm_model protgpt2 --length 9 --count 100 --output ProtGPT2-9mer-100.fasta
+```
+*This will prompt you to either use an existing proteome or generate a new synthetic proteome.*
 
+### ESM2 Generation
+Generate 100 9-mer peptides using ESM2 (direct generation):
 ```bash
-python peptide_gui.py
+python scripts/generation/generate_control_peptides.py --source llm --llm_model esm2 --length 9 --count 100 --output ESM2-9mer-100.fasta
 ```
-Follow the prompts to select generation mode, parameters, and output file.
 
 ## Acknowledgements
 
 - [ProtGPT2](https://huggingface.co/nferruz/ProtGPT2)
-- [PySimpleGUI](https://pysimplegui.readthedocs.io/)
+- [ESM2](https://huggingface.co/facebook/esm2_t6_8M_UR50D)
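
Review note: the README lists a "Random (from 20 amino acids)" mode, and this change drops `--seed` from the random example. The function that implements this mode is not part of the diff, so the following is only a minimal sketch of what seeded random generation could look like. The names `AMINO_ACIDS` and `random_peptides` are hypothetical and are not taken from `generate_control_peptides.py`.

```python
import random
from typing import List, Optional

# The 20 standard amino acids, one-letter codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptides(length: int, count: int, seed: Optional[int] = None) -> List[str]:
    """Return `count` uniformly random peptides of `length` residues.

    Hypothetical helper: a private Random instance keeps the global
    random state untouched and makes a given seed reproducible.
    """
    rng = random.Random(seed)
    return ["".join(rng.choices(AMINO_ACIDS, k=length)) for _ in range(count)]


peptides = random_peptides(length=9, count=100, seed=42)
```

Calling the helper twice with the same seed returns the same list, which is the property a `--seed` flag in the README example would document.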
@@ -0,0 +1,20 @@
+{
+    "keep": {
+        "days": true,
+        "amount": 14
+    },
+    "auditLog": "/Users/chris/Desktop/Griffith Lab/Peptide Sequence Synthesis/logs/.2241a41cac6b783a91d7f3cae270c084b996b6de-audit.json",
+    "files": [
+        {
+            "date": 1753800742818,
+            "name": "/Users/chris/Desktop/Griffith Lab/Peptide Sequence Synthesis/logs/mcp-puppeteer-2025-07-29.log",
+            "hash": "ffe980fcea0fb7d27f5f6750563aee7ea2f76495534bbff4701002e4c2471ccf"
+        },
+        {
+            "date": 1753940970612,
+            "name": "/Users/chris/Desktop/Griffith Lab/Peptide Sequence Synthesis/logs/mcp-puppeteer-2025-07-31.log",
+            "hash": "ada19ae3c4ec575d27da2c5a1bca035c836d7f844ba154cf8977dbc1a09e3634"
+        }
+    ],
+    "hashType": "sha256"
+}
@@ -0,0 +1,2 @@
+{"level":"info","message":"Starting MCP server","service":"mcp-puppeteer","timestamp":"2025-07-29 10:52:22.865"}
+{"level":"info","message":"MCP server started successfully","service":"mcp-puppeteer","timestamp":"2025-07-29 10:52:22.866"}
@@ -0,0 +1,6 @@
+{"level":"info","message":"Starting MCP server","service":"mcp-puppeteer","timestamp":"2025-07-31 01:49:30.658"}
+{"level":"info","message":"MCP server started successfully","service":"mcp-puppeteer","timestamp":"2025-07-31 01:49:30.659"}
+{"level":"info","message":"Starting MCP server","service":"mcp-puppeteer","timestamp":"2025-07-31 12:48:39.866"}
+{"level":"info","message":"MCP server started successfully","service":"mcp-puppeteer","timestamp":"2025-07-31 12:48:39.867"}
+{"level":"info","message":"Starting MCP server","service":"mcp-puppeteer","timestamp":"2025-07-31 12:49:51.620"}
+{"level":"info","message":"MCP server started successfully","service":"mcp-puppeteer","timestamp":"2025-07-31 12:49:51.621"}
@@ -1,5 +1,4 @@
-PySimpleGUI>=4.60.0
-transformers>=4.20.0
+transformers>=4.21.0
 torch>=1.12.0
 # Data analysis / plotting
 pandas>=1.5.0
 
@@ -0,0 +1,19 @@
+#!/bin/bash
+#
+# Run PVACtools analysis using official Griffith Lab Docker image
+# Addresses previous Docker "no output" issues
+#
+
+echo "🧬 Starting PVACtools Docker Analysis"
+echo "====================================="
+
+# Set project directory
+PROJECT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+echo "Project directory: $PROJECT_DIR"
+
+# Make script executable and run
+chmod +x "$PROJECT_DIR/scripts/tools/docker_pvactools_runner.py"
+python3 "$PROJECT_DIR/scripts/tools/docker_pvactools_runner.py"
+
+echo ""
+echo "Analysis complete. Check results in: $PROJECT_DIR/results/pvacbind_docker/"
@@ -40,15 +40,23 @@ def parse_fasta_sequences(fasta_path: Path) -> List[str]:
     return sequences
 
 def sample_peptides_from_fasta(fasta_path: Path, length: int, count: int) -> List[str]:
+    print(f"Parsing FASTA file: {fasta_path}")
     sequences = parse_fasta_sequences(fasta_path)
+    
+    print(f"Extracting {length}-mer peptides from {len(sequences)} proteins...")
     all_subseqs = set()  # Use set to automatically collapse duplicates
-    for seq in sequences:
+    
+    # Use progress bar for subsequence extraction
+    for seq in tqdm(sequences, desc="Processing proteins", unit="protein"):
         if len(seq) >= length:
             for i in range(len(seq) - length + 1):
                 all_subseqs.add(seq[i:i+length])
+    
     if not all_subseqs:
         raise ValueError(f"No subsequences of length {length} found in {fasta_path}")
 
+    print(f"Found {len(all_subseqs)} unique {length}-mer peptides")
+    
     # Convert set back to list for sampling
     unique_subseqs = list(all_subseqs)
 
@@ -58,6 +66,7 @@ def sample_peptides_from_fasta(fasta_path: Path, length: int, count: int) -> Lis
         return unique_subseqs
 
     # Sample without replacement
+    print(f"Sampling {count} peptides without replacement...")
     return random.sample(unique_subseqs, k=count)
 
 def generate_llm_peptides(length: int, count: int, model_name: str = "protgpt2", top_k: int = 950, top_p: float = 0.9, repetition_penalty: float = 1.2) -> List[str]:
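
Review note: `sample_peptides_from_fasta` collects every window of `length` residues into a set, converts the set to a list, and samples without replacement. Because string hashing is randomized per process by default, the order of `list(all_subseqs)` can differ between runs, so a fixed random seed would not by itself reproduce the same sample. Below is a minimal sketch of the same windowing and sampling steps with a sorted k-mer list. The names `unique_kmers` and `sample_kmers` and the `seed` parameter are hypothetical and are not in the diff.

```python
import random
from typing import List, Optional


def unique_kmers(sequences: List[str], length: int) -> List[str]:
    """Collect every distinct window of `length` residues, in sorted order."""
    seen = set()
    for seq in sequences:
        # range() is empty when the sequence is shorter than `length`.
        for i in range(len(seq) - length + 1):
            seen.add(seq[i:i + length])
    # Sorting gives a stable order regardless of hash randomization.
    return sorted(seen)


def sample_kmers(sequences: List[str], length: int, count: int,
                 seed: Optional[int] = None) -> List[str]:
    """Sample `count` distinct k-mers without replacement."""
    kmers = unique_kmers(sequences, length)
    if not kmers:
        raise ValueError(f"No subsequences of length {length} found")
    if count >= len(kmers):
        # Fewer unique k-mers than requested: return all of them.
        return kmers
    return random.Random(seed).sample(kmers, k=count)


proteins = ["ABCDE", "BCDEF"]
all_3mers = unique_kmers(proteins, 3)
picked = sample_kmers(proteins, 3, 2, seed=1)
```

Sorting costs O(n log n) on the unique k-mers, which may matter for a full proteome; whether that trade-off is worth the reproducibility is a judgment call for the authors.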
@@ -332,9 +341,11 @@ def generate_fake_proteome(num_proteins: int, target_lengths: List[int], model_n
         sys.exit(1)
 
 def write_fasta(peptides: List[str], output_path: Path, prefix: str = "peptide"):
+    print(f"Writing {len(peptides)} peptides to {output_path}...")
     with open(output_path, 'w') as f:
-        for i, pep in enumerate(peptides, 1):
-            f.write(f">{prefix}_{i}\n{pep}\n")
+        for i, pep in enumerate(tqdm(peptides, desc="Writing peptides", unit="peptide")):
+            f.write(f">{prefix}_{i+1}\n{pep}\n")
+    print(f"✅ Successfully wrote {len(peptides)} peptides to {output_path}")
 
 def main():
     parser = argparse.ArgumentParser(description="Generate control peptides for neoantigen analysis.")
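
Review note: the `write_fasta` change switches from `enumerate(peptides, 1)` to `enumerate(tqdm(...))` with `i+1` in the header. `enumerate` accepts any iterable, so the 1-based `start` argument could presumably be kept even when the list is wrapped in a progress bar. The sketch below shows the record numbering on its own, writing to an in-memory buffer and leaving out the progress bar so that it needs only the standard library. `write_fasta_records` is a hypothetical name.

```python
import io
from typing import Iterable, TextIO


def write_fasta_records(peptides: Iterable[str], handle: TextIO,
                        prefix: str = "peptide") -> int:
    """Write peptides as FASTA records numbered from 1; return the record count."""
    written = 0
    # start=1 gives 1-based headers without an explicit i + 1.
    for i, pep in enumerate(peptides, start=1):
        handle.write(f">{prefix}_{i}\n{pep}\n")
        written = i
    return written


buffer = io.StringIO()
n_written = write_fasta_records(["ACDEFGHIK", "LMNPQRSTV"], buffer)
fasta_text = buffer.getvalue()
```

Taking a file handle instead of a path also makes the function easy to test without touching the filesystem.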
@@ -381,39 +392,50 @@ def main():
             sys.exit(1)
         peptides = sample_peptides_from_fasta(args.fasta_file, args.length, args.count)
     elif args.source == 'llm':
-        # Interactive workflow for LLM-based peptide generation
-        print(f"\nGenerating {args.count} peptides of length {args.length} using LLM approach...")
-        print("This approach generates a fake proteome first, then samples peptides from it.")
-        
-        # Check if user has existing proteome
-        has_existing = get_user_input("\nDo you have an existing ProtGPT2-generated proteome file? (y/n): ").lower().startswith('y')
-        
-        if has_existing:
-            proteome_path = get_existing_proteome_path()
-            print(f"Using existing proteome: {proteome_path}")
+        # Choose workflow based on the LLM model
+        if args.llm_model.lower() == 'protgpt2':
+            # Interactive proteome workflow for ProtGPT2 (to avoid M bias)
+            print(f"\nGenerating {args.count} peptides of length {args.length} using ProtGPT2 proteome approach...")
+            print("This approach generates a fake proteome first, then samples peptides from it.")
+        elif args.llm_model.lower() == 'esm2':
+            # Direct generation for ESM2 (no M bias issue)
+            print(f"\nGenerating {args.count} peptides of length {args.length} using ESM2 direct generation...")
+            peptides = generate_llm_peptides(args.length, args.count, args.llm_model, args.top_k, args.top_p, args.repetition_penalty)
         else:
-            # Ask if user wants to generate new proteome
-            generate_new = get_user_input("Would you like to generate a new fake proteome? (y/n): ").lower().startswith('y')
-            
-            if not generate_new:
-                print("Cannot proceed without a proteome. Exiting.")
-                sys.exit(1)
-            
-            # Configure proteome generation (no reference needed)
-            num_proteins, target_lengths = configure_proteome_generation()
+            print(f"Error: Unsupported LLM model '{args.llm_model}'", file=sys.stderr)
+            sys.exit(1)
+        
+        # Only run interactive proteome workflow for ProtGPT2
+        if args.llm_model.lower() == 'protgpt2':
+            # Check if user has existing proteome
+            has_existing = get_user_input("\nDo you have an existing ProtGPT2-generated proteome file? (y/n): ").lower().startswith('y')
 
-            # Generate the fake proteome
-            fake_proteins = generate_fake_proteome(num_proteins, target_lengths, args.llm_model)
+            if has_existing:
+                proteome_path = get_existing_proteome_path()
+                print(f"Using existing proteome: {proteome_path}")
+            else:
+                # Ask if user wants to generate new proteome
+                generate_new = get_user_input("Would you like to generate a new fake proteome? (y/n): ").lower().startswith('y')
+                
+                if not generate_new:
+                    print("Cannot proceed without a proteome. Exiting.")
+                    sys.exit(1)
+                
+                # Configure proteome generation (no reference needed)
+                num_proteins, target_lengths = configure_proteome_generation()
+                
+                # Generate the fake proteome
+                fake_proteins = generate_fake_proteome(num_proteins, target_lengths, args.llm_model)
+                
+                # Save the generated proteome
+                proteome_output = Path(f'fake_proteome_{num_proteins}proteins.fasta')
+                write_fasta(fake_proteins, proteome_output, prefix="protein")
+                print(f"\nGenerated fake proteome saved to: {proteome_output}")
+                proteome_path = proteome_output
 
-            # Save the generated proteome
-            proteome_output = Path(f'fake_proteome_{num_proteins}proteins.fasta')
-            write_fasta(fake_proteins, proteome_output, prefix="protein")
-            print(f"\nGenerated fake proteome saved to: {proteome_output}")
-            proteome_path = proteome_output
-        
-        # Now sample peptides from the proteome using FASTA method
-        print(f"\nSampling {args.count} peptides from the proteome...")
-        peptides = sample_peptides_from_fasta(proteome_path, args.length, args.count)
+            # Now sample peptides from the proteome using FASTA method
+            print(f"\nSampling {args.count} peptides from the proteome...")
+            peptides = sample_peptides_from_fasta(proteome_path, args.length, args.count)
     else:
         print(f"Unknown source: {args.source}", file=sys.stderr)
         sys.exit(1)
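
Review note: the new `llm` branch tests `args.llm_model.lower() == 'protgpt2'` twice, once to print the banner and again to run the interactive workflow. One possible alternative is a lookup table keyed by the normalized model name, sketched below. The two workflow functions are hypothetical stand-ins that return placeholder strings; they are not the real `generate_llm_peptides` or proteome workflow from this PR.

```python
from typing import Callable, Dict, List


def protgpt2_workflow(length: int, count: int) -> List[str]:
    # Hypothetical stand-in for the interactive proteome workflow.
    return ["A" * length for _ in range(count)]


def esm2_workflow(length: int, count: int) -> List[str]:
    # Hypothetical stand-in for direct ESM2 generation.
    return ["G" * length for _ in range(count)]


# One entry per supported model; adding a model means adding one line here.
LLM_WORKFLOWS: Dict[str, Callable[[int, int], List[str]]] = {
    "protgpt2": protgpt2_workflow,
    "esm2": esm2_workflow,
}


def generate_with_llm(model: str, length: int, count: int) -> List[str]:
    """Dispatch to the workflow registered for `model` (case-insensitive)."""
    try:
        workflow = LLM_WORKFLOWS[model.lower()]
    except KeyError:
        raise ValueError(f"Unsupported LLM model '{model}'") from None
    return workflow(length, count)


esm2_peptides = generate_with_llm("ESM2", length=9, count=3)
```

In `main()`, the `ValueError` could be caught and reported on stderr before `sys.exit(1)`, which would keep the existing error message and exit code.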