An Open OnDemand Batch Connect app that provides a web-based interface for running protein structure prediction jobs using AlphaFold 2 and AlphaFold 3 on the ICDS Roar cluster. The app simplifies the process of submitting and monitoring AlphaFold jobs by providing a user-friendly interface and automated job management.
This app uses the Batch Connect basic template with Slurm. It executes a two-phase
workflow: a CPU phase for MSA generation and a GPU phase for structure prediction,
with the GPU job submitted as a dependency of the CPU job.
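
The dependency between the two phases can be sketched as follows. This is a minimal illustration, not the app's actual submission code: the script names are hypothetical, and the `afterok` dependency type is an assumption (the text above only says the GPU job depends on the CPU job). `--parsable` and `--dependency` are standard `sbatch` options.

```python
from typing import List


def cpu_submit_command(cpu_script: str) -> List[str]:
    """Build the sbatch call for the CPU (MSA generation) phase.

    --parsable makes sbatch print only the job ID, which is easy to capture.
    """
    return ["sbatch", "--parsable", cpu_script]


def parse_job_id(sbatch_output: str) -> str:
    """Extract the job ID from `sbatch --parsable` output ("jobid" or "jobid;cluster")."""
    return sbatch_output.strip().split(";")[0]


def gpu_submit_command(gpu_script: str, cpu_job_id: str) -> List[str]:
    """Build the sbatch call for the GPU (structure prediction) phase.

    Assumption: "afterok" is used, so the GPU job starts only if the
    CPU job finishes with exit code 0.
    """
    return [
        "sbatch",
        "--parsable",
        f"--dependency=afterok:{cpu_job_id}",
        gpu_script,
    ]
```

In practice the first command would be run (for example with `subprocess.run`), its output passed through `parse_job_id`, and the result fed into `gpu_submit_command`.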
- Upstream project: AlphaFold by DeepMind
- Batch Connect template: basic
- Scheduler: Slurm
- Container runtime: Singularity
Screenshots: model selection, partition, and working directory (left); JSON input and terms of service (right).
- Leveraging AlphaFold in Graduate Research
- OSC News: Inaugural GOOD Conference Draws Strong Attendance from 10 Countries
Presented as a talk at the Global Open On Demand Conference 2025, Harvard University
Date: March 19, 2025, 4:00 PM – 4:25 PM (25 min)
Title: AlphaFold accessibility: an optimized open-source OOD app for Protein Structure Prediction
Speakers: Vinay Saji Mathew [Pennsylvania State University], William Lai [Cornell], Matt Hansen [Pennsylvania State University]
Track: Application Track [featuring AI OnDemand]
Location: Tsai Auditorium (CGIS S010)
- AlphaFold 2:
  - Supports AlphaFold v2.3.2 for protein structure prediction
  - Handles both monomer and multimer predictions
  - Uses full database configuration for maximum accuracy
  - Automated MSA generation and template search
- AlphaFold 3 (New!):
  - Latest version of AlphaFold with improved accuracy
  - Supports protein-protein, protein-DNA/RNA, and protein-ligand complexes
  - Enhanced diffusion-based structure prediction
  - Requires acceptance of Google's terms of service
- Two-phase execution:
  - CPU phase for MSA/templates
  - GPU phase for prediction (set as a dependency)
- Real-time job status monitoring
- Detailed progress tracking
- Automatic error handling and recovery
- Flexible Input Formats:
  - FASTA sequence input for AlphaFold 2
  - JSON format input for AlphaFold 3 (following official specifications)
- GPU allocation selection
- Working directory customization
- Real-time progress visualization
- Direct access to output files
- AlphaFold 2:
  - PDB structure files (ranked by confidence)
  - Multiple Sequence Alignment (MSA) files
  - Detailed prediction metrics and confidence scores
  - Comprehensive log files
- AlphaFold 3:
  - CIF structure files
  - Ranking scores for multiple predictions
  - Detailed model outputs and metrics
  - Complete execution logs
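
As an illustration of working with the ranked AlphaFold 2 outputs, the sketch below reads the `ranking_debug.json` file that AlphaFold 2 writes alongside its `ranked_*.pdb` files. It assumes the app leaves that file in place; where it ends up inside the run directory is not specified here, so the directory is passed in by the caller. The function name is hypothetical.

```python
import json
from pathlib import Path
from typing import List, Tuple


def ranked_models(output_dir: str) -> List[Tuple[str, float]]:
    """Return (model_name, score) pairs in ranked order, best first."""
    data = json.loads((Path(output_dir) / "ranking_debug.json").read_text())
    # Monomer runs store scores under "plddts"; multimer runs under "iptm+ptm".
    scores = data.get("plddts") or data.get("iptm+ptm")
    if scores is None:
        raise KeyError("no 'plddts' or 'iptm+ptm' key in ranking_debug.json")
    # "order" lists model names from most to least confident.
    return [(name, scores[name]) for name in data["order"]]
```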
- Slurm scheduler
- Tested with Open OnDemand v3 and v4
Both AlphaFold versions require genetic databases that must be set up before using the app:
- AlphaFold 2: Download using script from AlphaFold 2 repository
- AlphaFold 3: Additional databases required. Setup instructions available here
The app uses Singularity containers for execution:
- AlphaFold 2: Download from Sylabs
- AlphaFold 3: Requires the official container from Google (subject to its terms of use). The model weights needed to run AlphaFold 3 must be requested from Google.
- Clone this repository into your Open OnDemand apps directory
- Configure paths in `template/alphafold_env.sh`
- Ensure all required databases are properly set up
- Verify GPU compute capabilities.
Edit form.yml.erb and update these values for your cluster:
| Attribute | ICDS Default | Change to |
|---|---|---|
| `cluster` | `rc` | Your cluster name |
| `auto_accounts` | (dynamic) | GPU account selection for your site |
| `auto_queues` | (dynamic) | Queue/partition for your site |
| `working_directory` | `/scratch/<user>` | Default scratch path on your site |
In before.sh.erb, the app sources alphafold_env.sh to set environment variables
for database paths, container paths, and working directories. You must configure this
file for your site.
| Attribute | Widget | Description | Default |
|---|---|---|---|
| `session_type` | select | Prediction engine (AlphaFold 2 or AlphaFold 3) | AlphaFold 2 |
| `auto_accounts` | select | GPU account for job submission | (dynamic) |
| `auto_queues` | select | Queue/partition for job submission | (dynamic) |
| `working_directory` | path_selector | Output directory (scratch space recommended) | `/scratch/<user>` |
| `protein_sequence` | text_area | Input sequence (FASTA for AF2, JSON for AF3) | (empty) |
| `agree_terms` | check_box | Accept Google's Terms of Service (AF3 only) | unchecked |
| `bc_email_on_started` | check_box | Email notification on job start/completion | unchecked |
- Access the Open OnDemand dashboard
- Navigate to "Interactive Apps"
- Select "Protein Structure Prediction"
- Choose prediction engine (AlphaFold 2 or 3)
- Fill out the form:
- For AlphaFold 2: Enter protein sequence in FASTA format
- For AlphaFold 3: Provide input in JSON format
- Select GPU allocation
- Choose working directory
- Accept terms of service (required for AlphaFold 3)
- Submit the job
For AlphaFold 2, the app accepts protein sequences in FASTA format.

Example:

```text
>sequence_name
MVKVGVNGFGRIGRLVTRAAFNSGKVDIVAINDPFIDLNYMVYMFQYDSTHGKFHGTVKA
ENGKLVINGNPITIFQERDPSKIKWGDAGAEYVVESTGVFTTMEKAGAHLQGGAKRVIIS
```
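
Before pasting a sequence into the form, it can help to check it locally. The following is a minimal sketch of a FASTA check, not the validation the app itself performs; the function name and the restriction to the 20 standard amino acid letters are assumptions made for illustration.

```python
from typing import Dict, Optional

# Assumption: only the 20 standard amino acid letters are accepted.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, raising ValueError on bad input."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            if not name:
                raise ValueError("empty FASTA header")
            if name in records:
                raise ValueError(f"duplicate FASTA header: {name}")
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            seq = line.upper()
            bad = set(seq) - VALID_RESIDUES
            if bad:
                raise ValueError(f"invalid residue(s): {''.join(sorted(bad))}")
            records[name] += seq  # sequences may span several lines
    if not records:
        raise ValueError("no FASTA records found")
    empty = [n for n, s in records.items() if not s]
    if empty:
        raise ValueError(f"record(s) without sequence: {', '.join(empty)}")
    return records
```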
For AlphaFold 3, the app accepts input in JSON format.

Example:

```json
{
  "name": "example_complex",
  "sequences": [
    {
      "protein": {
        "id": "protein_chain_A",
        "sequence": "MVKVGVNG..."
      }
    }
  ],
  "modelSeeds": [1, 2, 3]
}
```
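
For inputs with several chains, writing the JSON by hand gets error-prone. The sketch below generates a document with the same three fields as the example above (`name`, `sequences`, `modelSeeds`). The helper name is hypothetical, only protein chains are covered, and the official AlphaFold 3 input specification defines further fields that are not handled here.

```python
import json
from typing import Dict, List


def build_af3_input(name: str, chains: Dict[str, str], seeds: List[int]) -> str:
    """Return AlphaFold 3 style JSON text for protein-only input.

    chains maps a chain ID to its amino acid sequence.
    """
    if not chains:
        raise ValueError("at least one protein chain is required")
    if not seeds:
        raise ValueError("at least one model seed is required")
    payload = {
        "name": name,
        "sequences": [
            {"protein": {"id": chain_id, "sequence": sequence}}
            for chain_id, sequence in chains.items()
        ],
        "modelSeeds": list(seeds),
    }
    return json.dumps(payload, indent=2)
```

The returned string can be pasted into the input field of the form.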
The app generates the following output structure:
```text
working_directory/
└── run_YYYYMMDD_HHMMSS/
    ├── input/
    │   ├── [structure files]    # Predicted structures
    │   ├── [prediction data]    # Detailed predictions
    │   └── msas/                # Multiple sequence alignments
    ├── logs/                    # Job logs
    ├── CPU-SLURM/               # CPU phase files
    └── GPU-SLURM/               # GPU phase files
```
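
The timestamped run directory naming can be reproduced with a few lines of code, which is handy when scripting around the app's outputs. This is a sketch of the layout shown above, not the app's own implementation; the function name is hypothetical and the use of local time for the timestamp is an assumption.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Subdirectories taken from the layout above.
SUBDIRS = ("input/msas", "logs", "CPU-SLURM", "GPU-SLURM")


def create_run_dir(working_directory: str, now: Optional[datetime] = None) -> Path:
    """Create working_directory/run_YYYYMMDD_HHMMSS/ with its subdirectories."""
    # Assumption: the timestamp is local time at submission.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    run_dir = Path(working_directory) / f"run_{stamp}"
    for sub in SUBDIRS:
        (run_dir / sub).mkdir(parents=True, exist_ok=True)
    return run_dir
```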
The app provides real-time monitoring of:
- MSA generation progress
- Template search status
- Structure prediction progress
- Model relaxation status
Common issues and solutions:
- Job fails in CPU phase:
  - Check available disk space
  - Verify database paths
  - Examine CPU phase logs
- GPU phase errors:
  - Verify GPU allocation
  - Check memory requirements
  - Review GPU phase logs
  - For AlphaFold 3: verify that the allocated GPU meets the required compute capability
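
The first two CPU-phase checks (disk space and database paths) can be automated before submitting a job. The sketch below is one possible pre-flight check; the function name, the labels, and the free-space threshold are all hypothetical and should be adapted to your site.

```python
import shutil
from pathlib import Path
from typing import Dict, List


def preflight(
    working_directory: str,
    database_paths: Dict[str, str],
    min_free_gib: float,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []

    # Free space on the filesystem that holds the working directory.
    free_gib = shutil.disk_usage(working_directory).free / 2**30
    if free_gib < min_free_gib:
        problems.append(
            f"only {free_gib:.1f} GiB free in {working_directory}, "
            f"need at least {min_free_gib:.1f} GiB"
        )

    # Every configured database path must exist.
    for label, path in database_paths.items():
        if not Path(path).exists():
            problems.append(f"{label}: path not found: {path}")

    return problems
```

A call such as `preflight("/scratch/<user>", {"databases": "<database path>"}, 100.0)` would return any problems found; the paths and the 100 GiB threshold here are placeholders, not values required by the app.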
For bugs or feature requests, open an issue.
- AlphaFold 2 -- protein structure prediction by DeepMind
- AlphaFold 3 -- latest version with protein-ligand complex support
- Open OnDemand -- the HPC portal framework
- OOD Batch Connect app development docs
MIT (see LICENSE file)
- AlphaFold by DeepMind Technologies Limited
- Singularity container by prehensilecode
- The research project is generously funded by the Cornell University BRC Epigenomics Core Facility (RRID:SCR_021287), the Penn State Institute for Computational and Data Sciences (RRID:SCR_025154), and the Penn State University Center for Applications of Artificial Intelligence and Machine Learning to Industry Core Facility (AIMI) (RRID:SCR_022867), and is supported by a gift to AIMI research from Dell Technologies.
- Computational support was provided by NSF ACCESS to William KM Lai and Gretta Kellogg through BIO230041.
For questions or issues, please contact:
- Technical support: vinaysmathew@psu.edu
- ICDS support: icds@psu.edu



