Simulated annealing optimization for X-ray scattering molecular structure refinement.
We recommend using Miniconda or Anaconda for environment setup.
-
Create a new conda environment:
conda create --name sa python=3.10 conda activate sa
-
Install the following required packages from
conda-forge:conda install -c conda-forge pyscf numba rdkit openbabel openmm openff-toolkit openff-forcefields pytest
-
Optional (CUDA backend): install CuPy matching your CUDA version, for example:
conda install -c conda-forge cupy
Run with default configuration (uses input.toml):
python3 run.pyTip: The
input.tomlfile contains detailed comments explaining all parameters. See that file for comprehensive parameter documentation.
Run with a different config file:
python3 run.py --config my_config.tomlShow help message:
python3 run.py --helppython3 run.py --mode test --molecule chd --run-id test_runpython3 run.py \
--start-xyz-file xyz/my_start.xyz \
--reference-xyz-file xyz/my_reference.xyz \
--target-file xyz/my_target.xyz \
--forcefield-file forcefields/my_forcefield.offxmlpython3 run.py --run-pyscf-modes --pyscf-basis 6-31gpython3 run.py --verbosepython3 run.py --gpu-backend cudapython3 run.py --gpu-backend cuda --gpu-emulationpython3 run.py \
--qmin 0.1 \
--qmax 8.0 \
--qlen 81 \
--inelastic \
--pcd-modepython3 run.py \
--sa-starting-temp 1.0 \
--sa-nsteps 4000 \
--sa-step-size 0.012 \
--greedy-algorithm \
--ga-nsteps 20000 \
--ga-step-size 0.012python3 run.py --bonds --angles --torsionspython3 run.py \
--nrestarts 5 \
--ntotalruns 5python3 run.py --ewald-modepython3 run.py \
--tmin 0.0 \
--tmax 1.0 \
--tlen 21 \
--pmin 0.0 \
--pmax 2.0 \
--plen 21python3 run.py --sampling --boltzmann-temperature 300.0python3 run.py \
--noise-value 0.05 \
--noise-data-file noise/my_noise.datpython3 run.py \
--tuning-ratio-target 0.9 \
--c-tuning-initial 0.001python3 run.py --non-h-modes-onlypython3 run.py --hf-energypython3 run.py --write-dat-filepython3 run.py \
--config input.toml \
--mode test \
--molecule chd \
--run-id production_run_001 \
--results-dir results/production \
--start-xyz-file xyz/start.xyz \
--reference-xyz-file xyz/chd_reference.xyz \
--target-file xyz/target.xyz \
--forcefield-file forcefields/openff_unconstrained-2.0.0.offxml \
--run-pyscf-modes \
--pyscf-basis 6-31g* \
--verbose \
--inelastic \
--pcd-mode \
--ewald-mode \
--qmin 1e-9 \
--qmax 8.0 \
--qlen 81 \
--tmin 0.0 \
--tmax 1.0 \
--tlen 21 \
--pmin 0.0 \
--pmax 2.0 \
--plen 21 \
--sa-starting-temp 1.0 \
--sa-nsteps 4000 \
--greedy-algorithm \
--ga-nsteps 20000 \
--sa-step-size 0.012 \
--ga-step-size 0.012 \
--nrestarts 5 \
--ntotalruns 5 \
--bonds \
--angles \
--torsions \
--tuning-ratio-target 0.9 \
--c-tuning-initial 0.001The calculate_iam.py script is a standalone tool for calculating IAM (Independent Atom Model) scattering signals from XYZ files without running the full simulated annealing optimization. This is useful for:
- Quick IAM calculations from single structures or trajectories
- Generating reference signals for comparison
- Testing scattering calculations independently
- Batch processing multiple structures
Calculate IAM signal from a single XYZ file:
python3 calculate_iam.py input.xyz output.datCalculate IAM from an XYZ trajectory (multiple structures):
python3 calculate_iam.py trajectory.xyz output.datConfigure the q-vector range and resolution:
python3 calculate_iam.py input.xyz output.dat \
--qmin 0.1 \
--qmax 8.0 \
--qlen 100Calculate PCD signal relative to a reference structure:
python3 calculate_iam.py input.xyz output.dat \
--reference reference.xyz \
--pcdPCD is calculated as: PCD = 100 × (IAM / reference_IAM - 1)
Include Compton (inelastic) scattering contributions:
python3 calculate_iam.py input.xyz output.dat --inelasticNote: Requires data/Compton_Scattering_Intensities.npz file. If not found, the script will continue without inelastic scattering.
Calculate 3D scattering in Ewald sphere mode:
python3 calculate_iam.py input.xyz output.dat --ewaldConfigure Ewald parameters (theta and phi angles):
python3 calculate_iam.py input.xyz output.dat \
--ewald \
--tmin 0.0 \
--tmax 1.0 \
--tlen 21 \
--pmin 0.0 \
--pmax 2.0 \
--plen 21Note: Theta and phi are specified in units of π (e.g., tmax=1.0 means π radians).
Example with all options enabled:
python3 calculate_iam.py input.xyz output.dat \
--reference reference.xyz \
--pcd \
--inelastic \
--ewald \
--qmin 0.1 \
--qmax 8.0 \
--qlen 100 \
--tmin 0.0 \
--tmax 1.0 \
--tlen 21 \
--pmin 0.0 \
--pmax 2.0 \
--plen 21- Single structure: Output file contains two columns:
q (Å⁻¹)andIAM signal - Trajectory: Output file contains multiple columns:
q (Å⁻¹),IAM_1,IAM_2,IAM_3, ...
| Argument | Type | Default | Description |
|---|---|---|---|
input_xyz |
required | - | Input XYZ file (single structure or trajectory) |
output_dat |
required | - | Output DAT file with IAM signal |
--qmin |
float | 0.1 | Minimum q value (Å⁻¹) |
--qmax |
float | 8.0 | Maximum q value (Å⁻¹) |
--qlen |
int | 100 | Number of q points |
--reference |
str | None | Reference XYZ file for PCD calculation |
--pcd |
flag | False | Calculate PCD instead of IAM |
--inelastic |
flag | False | Include inelastic (Compton) scattering |
--ewald |
flag | False | Use Ewald sphere mode (3D scattering) |
--tmin |
float | 0.0 | Minimum theta (units of π) |
--tmax |
float | 1.0 | Maximum theta (units of π) |
--tlen |
int | 21 | Number of theta points |
--pmin |
float | 0.0 | Minimum phi (units of π) |
--pmax |
float | 2.0 | Maximum phi (units of π) |
--plen |
int | 21 | Number of phi points |
View all available options:
python3 calculate_iam.py --helpThe optimal_path.py script finds the globally optimal "smoothest" trajectory through multiple candidate structures at each timestep. It uses dynamic programming to minimize a weighted sum of:
- Fit factor: Per-candidate fit quality (parsed from filename)
- Signal delta: Signal difference between consecutive timesteps (MSE on interpolated common q-grid)
- Structure delta: Structural difference between consecutive timesteps (Kabsch-aligned RMSD)
This is useful for:
- Selecting optimal trajectories from multiple candidate structures per timestep
- Smoothing trajectories by minimizing structural and signal jumps
- Post-processing simulation results to find the best path through candidate structures
The script expects files with a specific naming pattern:
01_000.17533577.dat
01_000.17533577.xyz
02_000.46654335.dat
02_000.46654335.xyz
Where:
- First number (
01,02, etc.) is the timestep - Second number (
0.17533577, etc.) is the fit factor (lower is better) - Each timestep should have matching
.datand.xyzfiles
Find optimal path in a directory containing candidate files:
python3 optimal_path.py results/This will:
- Load all candidate structures and signals
- Find the optimal path through timesteps
- Write the selected trajectory to
optimal_trajectory.xyz - Print the chosen path and statistics
Randomly sample files per timestep before pruning (useful for testing or exploring subsets):
python3 optimal_path.py results/ --random-sample 500This randomly selects up to 500 files per timestep before applying topM/delta pruning. Useful for:
- Testing with smaller datasets
- Reducing computational cost
- Exploring different random subsets
- Quick prototyping
- Ensuring consistent coverage across all timesteps
Note: Sampling happens per timestep after grouping, so each timestep gets up to N files. If a timestep has fewer than N files, all files are kept. The same --seed is used for reproducibility.
Combine with other options:
python3 optimal_path.py results/ --random-sample 1000 --topM 50 --delta 0.1This will:
- Group files by timestep
- Randomly sample up to 1000 files per timestep (seed from
--seed) - Apply delta pruning (fit <= best_fit + 0.1)
- Apply topM pruning (keep top 50)
Limit the number of candidates considered per timestep:
Keep top M candidates (by fit factor):
python3 optimal_path.py results/ --topM 50Keep candidates within a fit window:
python3 optimal_path.py results/ --delta 0.1This keeps candidates with fit <= best_fit + 0.1 at each timestep.
Combine both (delta applied first, then topM):
python3 optimal_path.py results/ --delta 0.1 --topM 50Control the relative importance of fit, signal, and RMSD terms:
python3 optimal_path.py results/ \
--fit-weight 1.0 \
--signal-weight 1.0 \
--rmsd-weight 1.0Auto-scaling (enabled by default) normalizes weights so they're comparable:
python3 optimal_path.py results/ # Auto-scaling enabledDisable auto-scaling to use raw weights:
python3 optimal_path.py results/ --no-autoscaleSelect which atoms to include in RMSD calculations:
python3 optimal_path.py results/ --rmsd-indices "0,1,2,5"Use comma-separated atom indices (0-based). If not specified, all atoms are used.
Example: Calculate RMSD using only heavy atoms or specific regions:
python3 optimal_path.py results/ --rmsd-indices "3,5,6,10,12"Custom output filename:
python3 optimal_path.py results/ --xyz-out my_trajectory.xyzAuto-scaling parameters:
python3 optimal_path.py results/ \
--edge-sample-cap 5000 \
--seed 42--edge-sample-cap: Number of random edges sampled for auto-scaling (default: 3000)--seed: Random seed for edge sampling (default: 0)
python3 optimal_path.py results/ \
--random-sample 2000 \
--topM 100 \
--delta 0.2 \
--fit-weight 1.0 \
--signal-weight 1.0 \
--rmsd-weight 1.0 \
--rmsd-indices "3,5,6,10,12" \
--xyz-out optimal_trajectory.xyz \
--edge-sample-cap 5000 \
--seed 42The script provides:
-
Console output:
- Auto-scaling information (if enabled)
- Progress for each timestep transition
- Optimal path with timestep, fit, and filenames
- Unweighted totals (sum of fit factors, signal MSE, RMSD)
- Best weighted cost
-
XYZ trajectory file: Multi-frame XYZ file containing the selected structures, with comment lines recording timestep and fit values.
| Argument | Type | Default | Description |
|---|---|---|---|
directory |
required | - | Directory containing *.dat and *.xyz files |
--random-sample |
int | None | Randomly sample N files before pruning (uses --seed for reproducibility) |
--topM |
int | 100 | Number of lowest-fit candidates to keep per timestep |
--delta |
float | None | Fit window: keep candidates with fit <= best_fit + delta |
--fit-weight |
float | 1.0 | Weight for fit factor term |
--signal-weight |
float | 1.0 | Weight for signal MSE term |
--rmsd-weight |
float | 1.0 | Weight for RMSD term |
--no-autoscale |
flag | False | Disable automatic scaling of cost terms |
--rmsd-indices |
str | None | Comma-separated atom indices (0-based) for RMSD calculation |
--xyz-out |
str | optimal_trajectory.xyz | Output XYZ trajectory filename |
--edge-sample-cap |
int | 3000 | Number of edges sampled for auto-scaling |
--seed |
int | 0 | RNG seed for random sampling and auto-scale edge sampling |
View all available options:
python3 optimal_path.py --helpThe analyze_geometry.py script is a standalone tool for analyzing molecular geometry from XYZ files or trajectories. It can calculate:
- Bond lengths: Distance between two atoms
- Angles: Angle between three atoms (i-j-k)
- Dihedral angles: Torsion angle between four atoms (i-j-k-l)
This is useful for:
- Analyzing molecular structures from simulations or optimizations
- Tracking geometric parameters over trajectories
- Comparing structures by specific geometric features
- Extracting key structural parameters for analysis
Calculate bond length between atoms 0 and 1:
python3 analyze_geometry.py input.xyz --bond 0 1Calculate angle between atoms 0, 1, and 2:
python3 analyze_geometry.py input.xyz --angle 0 1 2Calculate dihedral angle between atoms 0, 1, 2, and 3:
python3 analyze_geometry.py input.xyz --dihedral 0 1 2 3Calculate multiple geometric parameters in one run:
python3 analyze_geometry.py input.xyz \
--bond 0 1 \
--bond 1 2 \
--angle 0 1 2 \
--dihedral 0 1 2 3For XYZ trajectories (multiple structures), output to CSV:
python3 analyze_geometry.py trajectory.xyz \
--bond 0 1 \
--angle 0 1 2 \
--output results.csvThe CSV file will contain one row per structure with all calculated values.
Bond lengths can be specified in Angstrom (default) or Bohr:
python3 analyze_geometry.py input.xyz --bond 0 1 --units bohrAngles and dihedral angles are always reported in degrees.
Single structure (no --output specified):
- Human-readable format printed to stdout
- Shows structure comment and all calculated values
Trajectory or with --output:
- CSV format with one column per calculation (no header, no frame/comment columns)
- Each row represents one structure in the trajectory
- Column order matches the order of calculations specified (bonds first, then angles, then dihedrals)
Important: Atom indices are 0-based (first atom is index 0, second is index 1, etc.).
To determine atom indices:
- Open the XYZ file in a text editor
- Count atoms starting from 0 (first atom = 0, second = 1, etc.)
- Or use a molecular viewer that shows atom indices
Analyze a single structure:
python3 analyze_geometry.py optimized.xyz \
--bond 0 1 \
--angle 0 1 2 \
--dihedral 0 1 2 3Analyze a trajectory:
python3 analyze_geometry.py simulation.xyz \
--bond 0 1 \
--angle 0 1 2 \
--dihedral 0 1 2 3 \
--output geometry_analysis.csvMultiple bonds and angles:
python3 analyze_geometry.py structure.xyz \
--bond 0 1 \
--bond 1 2 \
--bond 2 3 \
--angle 0 1 2 \
--angle 1 2 3 \
--dihedral 0 1 2 3| Argument | Type | Description |
|---|---|---|
input_xyz |
required | Input XYZ file (single structure or trajectory) |
--bond I J |
int, int | Calculate bond length between atoms I and J (0-indexed). Can be specified multiple times. |
--angle I J K |
int, int, int | Calculate angle I-J-K in degrees (0-indexed). Can be specified multiple times. |
--dihedral I J K L |
int, int, int, int | Calculate dihedral angle I-J-K-L in degrees (0-indexed). Can be specified multiple times. |
--output OUTPUT |
str | Output file (CSV format). If not specified, prints to stdout. |
--units {angstrom,bohr} |
str | Units for bond lengths (default: angstrom) |
View all available options:
python3 analyze_geometry.py --helpThe compare_random_subsets.py script automates the process of comparing multiple random subsets from your results directory. It:
- Runs
optimal_path.pymultiple times with different random seeds - Analyzes specified geometric parameters (bonds, angles, dihedrals) from each optimal trajectory
- Creates a comparison plot showing all subsets together
This is useful for:
- Evaluating robustness: See how different random subsets affect the optimal path
- Statistical analysis: Compare multiple trajectories from different data subsets
- Quick comparisons: Automate the workflow of running optimal_path → analyze → plot
Compare 5 random subsets (default) with dihedral 2-3-4-5 and random sample size 500:
python3 compare_random_subsets.py results/ --dihedral 2 3 4 5Compare 10 random subsets:
python3 compare_random_subsets.py results/ --n-subsets 10 --dihedral 2 3 4 5You can analyze bonds, angles, and/or dihedrals. At least one must be specified:
Bond length:
python3 compare_random_subsets.py results/ --bond 0 1Angle:
python3 compare_random_subsets.py results/ --angle 0 1 2Dihedral angle:
python3 compare_random_subsets.py results/ --dihedral 2 3 4 5Multiple calculations (all will be plotted):
python3 compare_random_subsets.py results/ \
--bond 0 1 \
--angle 0 1 2 \
--dihedral 2 3 4 5If you plan to run many subsets (e.g. 100s) and want a single summary curve instead of plotting every subset, use --aggregate.
This computes the mean and standard deviation across subsets at each timepoint and plots mean ± 1σ as a shaded band.
python3 compare_random_subsets.py results/ \
--n-subsets 200 \
--random-sample 500 \
--topM 50 \
--dihedral 2 3 4 5 \
--aggregate \
--output-plot dihedral_mean_std.pngAfter plotting, the script also selects a representative subset: the subset whose dihedral trajectory is closest (smallest RMS difference) to the mean trajectory. The corresponding optimal-path XYZ is saved to:
subset_comparison/representative_trajectory.xyz(or your chosen--output-dir)
Use a different random sample size (default is 500):
python3 compare_random_subsets.py results/ --random-sample 1000 --dihedral 2 3 4 5Specify output directory and plot filename:
python3 compare_random_subsets.py results/ \
--output-dir my_comparison \
--output-plot my_plot.pngIf you do not set --output-plot (i.e. you leave it as the default), the script will auto-generate a descriptive filename based on your settings (e.g. n-subsets, random-sample, topM, geometry indices, seed-start, and whether --aggregate is enabled).
Control the starting seed for reproducibility:
python3 compare_random_subsets.py results/ \
--n-subsets 5 \
--seed-start 42Each subset uses seed_start + subset_index (e.g., 42, 43, 44, 45, 46).
python3 compare_random_subsets.py results/ \
--n-subsets 8 \
--random-sample 500 \
--topM 50 \
--seed-start 0 \
--bond 0 1 \
--angle 0 1 2 \
--dihedral 2 3 4 5 \
--output-dir subset_analysis \
--output-plot geometry_comparison.png-
Runs optimal_path.py for each subset with:
--random-sample N(configurable via--random-sample, default: 500)--topM M(configurable via--topM, default: 50)--fit-weight 0--rmsd-weight 1--signal-weight 1--no-autoscale- Different seed for each subset
-
Analyzes geometry using
analyze_geometry.py:- Extracts the specified bonds, angles, and/or dihedrals from each trajectory
- Saves results as CSV files (one column per calculation)
-
Creates comparison plot using
plot_geometry.py:- Plots all subsets on the same graph
- If multiple calculations are specified, plots each calculation type separately
- Labels each subset with its seed and calculation type
- Saves as PNG
- Trajectory files:
optimal_trajectory_subset_0.xyz,optimal_trajectory_subset_1.xyz, ... - Analysis files:
geometry_subset_0.csv,geometry_subset_1.csv, ... (contains one column per calculation) - Comparison plot:
geometry_comparison.png(or custom filename)
All files are saved in the output directory (default: subset_comparison/).
| Argument | Type | Default | Description |
|---|---|---|---|
directory |
required | - | Directory containing *.dat and *.xyz files |
--n-subsets |
int | 5 | Number of random subsets to compare |
--random-sample |
int | 500 | Number of files to randomly sample for each subset |
--topM |
int | 50 | Number of lowest-fit candidates to keep per timestep |
--seed-start |
int | 0 | Starting seed (each subset uses seed_start + index) |
--output-dir |
str | subset_comparison | Output directory for intermediate files |
--output-plot |
str | geometry_comparison.png | Output plot filename |
--bond |
int, int | None | Bond atom indices (I J). At least one of --bond, --angle, or --dihedral must be specified. |
--angle |
int, int, int | None | Angle atom indices (I J K). At least one of --bond, --angle, or --dihedral must be specified. |
--dihedral |
int, int, int, int | None | Dihedral atom indices (I J K L). At least one of --bond, --angle, or --dihedral must be specified. |
View all available options:
python3 compare_random_subsets.py --helpThe plot_fit_histograms.py script creates histograms of fit values from filenames for each timestep separately. This helps you:
- Visualize fit distributions across timesteps
- Identify converged regions where most good runs fall
- Determine outlier thresholds for filtering
- Compare fit distributions between timesteps
Plot histograms for all timesteps:
python3 plot_fit_histograms.py results/Specify output filename:
python3 plot_fit_histograms.py results/ --output my_histograms.pngChange the number of histogram bins:
python3 plot_fit_histograms.py results/ --bins 100The script creates:
-
Histogram plot: A grid of histograms, one per timestep, showing:
- Distribution of fit values
- Statistics box (min, Q25, median, mean, Q75, max)
- Number of files per timestep
-
Summary statistics: Printed to console with:
- Count of files per timestep
- Min, Q25, median, Q75, max fit values per timestep
Summary Statistics (across 20 timesteps):
============================================================
Timestep Count Min Q25 Median Q75 Max
------------------------------------------------------------
01 1500 0.123456 0.234567 0.345678 0.456789 0.987654
02 1500 0.112345 0.223456 0.334567 0.445678 0.876543
...
| Argument | Type | Default | Description |
|---|---|---|---|
directory |
required | - | Directory containing *.dat and *.xyz files |
--output |
str | fit_histograms.png | Output plot filename |
--bins |
int | 50 | Number of bins for histograms |
--dat-ext |
str | .dat | Extension for DAT files |
--xyz-ext |
str | .xyz | Extension for XYZ files |
View all available options:
python3 plot_fit_histograms.py --helpThe plot_geometry.py script creates plots from CSV files generated by analyze_geometry.py. It can visualize geometric parameters over time (or frame number) and supports multiple customization options. It can plot multiple files simultaneously for comparison.
Plot column 0 from a single CSV file:
python3 plot_geometry.py data.csv output.pngPlot specific columns (0-indexed) from a single file:
python3 plot_geometry.py data.csv output.png --columns 0 1Plot the same column from multiple files for comparison:
python3 plot_geometry.py file1.csv file2.csv file3.csv output.pngThis will plot column 0 from each file. Use --columns to specify which column(s) to plot from all files:
python3 plot_geometry.py file1.csv file2.csv output.png --columns 0Custom labels for each file:
python3 plot_geometry.py run1.csv run2.csv run3.csv output.png \
--labels "Run 1" "Run 2" "Run 3" \
--title "Comparison of Multiple Runs"Labels and titles:
python3 plot_geometry.py data.csv output.png \
--title "Bond Length Evolution" \
--xlabel "time (fs)" \
--ylabel "Bond Length (Å)" \
--labels "C-C bond" "C-H bond"Note: The x-axis defaults to "time (fs)" where each frame represents 20 fs, with the first frame at 10 fs. The plot starts at 0 fs to show blank space before the first data point. You can override the x-axis label with --xlabel if needed.
Axis limits:
python3 plot_geometry.py data.csv output.png \
--xmin 0 --xmax 100 \
--ymin 1.0 --ymax 2.0Plot style:
# Line plot (default)
python3 plot_geometry.py data.csv output.png --style line
# Scatter plot
python3 plot_geometry.py data.csv output.png --style scatter
# Both line and scatter
python3 plot_geometry.py data.csv output.png --style bothFigure size and resolution:
python3 plot_geometry.py data.csv output.png \
--figsize 12 8 \
--dpi 300Add grid:
python3 plot_geometry.py data.csv output.png --gridAnalyze geometry and plot results:
# Step 1: Analyze trajectory
python3 analyze_geometry.py trajectory.xyz \
--bond 0 1 \
--angle 0 1 2 \
--output geometry_data.csv
# Step 2: Plot the results
python3 plot_geometry.py geometry_data.csv geometry_plot.png \
--title "Molecular Geometry Evolution" \
--xlabel "time (fs)" \
--ylabel "Value" \
--labels "Bond 0-1 (Å)" "Angle 0-1-2 (°)" \
--grid \
--style lineCompare multiple trajectories:
# Analyze multiple trajectories
python3 analyze_geometry.py traj1.xyz --bond 0 1 --output traj1.csv
python3 analyze_geometry.py traj2.xyz --bond 0 1 --output traj2.csv
python3 analyze_geometry.py traj3.xyz --bond 0 1 --output traj3.csv
# Plot all together for comparison
python3 plot_geometry.py traj1.csv traj2.csv traj3.csv comparison.png \
--title "Bond Length Comparison" \
--ylabel "Bond Length (Å)" \
--labels "Run 1" "Run 2" "Run 3" \
--grid| Argument | Type | Description |
|---|---|---|
input_csv |
required | Input CSV file(s) (from analyze_geometry.py, no header). Can specify multiple files for comparison. |
output_png |
required | Output PNG file |
--columns |
int+ | Column indices to plot (0-indexed). Applies to all files. If not specified, plots column 0 from each file. |
--xlabel |
str | X-axis label (default: "time (fs)") |
--ylabel |
str | Y-axis label (default: "Value") |
--labels |
str+ | Labels for each column (legend). Number should match number of columns plotted. |
--title |
str | Plot title |
--xmin, --xmax |
float | X-axis limits |
--ymin, --ymax |
float | Y-axis limits |
--figsize |
float, float | Figure size in inches (default: 10 6) |
--dpi |
int | Output resolution in DPI (default: 300) |
--style |
str | Plot style: line, scatter, or both (default: line) |
--grid |
flag | Show grid on plot |
View all available options:
python3 plot_geometry.py --helpNote: Detailed explanations of all parameters, including their units, purpose, and relationships, are provided as comments in the
input.tomlconfiguration file. Refer to that file for comprehensive parameter documentation.
--run-id: Run identifier--molecule: Molecule name (must match entry inmolecule_paramssection)--results-dir: Directory for output files
--forcefield-file: Force field file path--start-xyz-file: Starting XYZ structure--start-sdf-file: Starting SDF structure--reference-xyz-file: Reference XYZ structure--target-file: Target XYZ or DAT file
--run-pyscf-modes: Enable PySCF normal modes calculation--pyscf-basis: Basis set for PySCF (e.g., "6-31g*", "sto-3g")--verbose: Enable verbose output--write-dat-file: Write DAT file output--mm-param-method: MM parameter retrieval method (see "MM Parameter Retrieval Methods" below)
--sampling: Enable sampling--boltzmann-temperature: Temperature for Boltzmann sampling
--inelastic: Enable inelastic scattering--pcd-mode: Enable PCD mode--excitation-factor: Excitation factor--ewald-mode: Enable Ewald mode--qmin,--qmax,--qlen: Q-vector parameters--tmin,--tmax,--tlen: Theta parameters--pmin,--pmax,--plen: Phi parameters--noise-value: Noise value--noise-data-file: Noise data file path
--sa-starting-temp: Starting temperature--sa-nsteps: Number of SA steps--greedy-algorithm: Use greedy algorithm--ga-nsteps: Number of GA steps--sa-step-size: Step size for SA--ga-step-size: Step size for GA--nrestarts: Number of restarts--ntotalruns: Total number of runs--bonds: Enable bond constraints--angles: Enable angle constraints--torsions: Enable torsion constraints--tuning-ratio-target: Target tuning ratio--c-tuning-initial: Initial C tuning value--non-h-modes-only: Only use non-hydrogen modes--hf-energy: Run PySCF HF energy calculation
The program supports two methods for retrieving molecular mechanics (MM) force field parameters:
You can choose the method using the mm_param_method parameter in input.toml:
[options]
mm_param_method = "sdf" # or "basic"Or via command line:
python3 run.py --mm-param-method sdfDefault method. This method attempts to use force field parameters from OpenFF Toolkit by working with SDF files.
Process:
- Robust SDF method: Tries to create an OpenMM system from the SDF file with:
- Automatic radical fixing (removes radicals that OpenFF doesn't support)
- Bond order simplification (converts double/triple/aromatic bonds to single bonds if needed)
- Direct RDKit-to-OpenFF conversion
- Original SDF method: If robust SDF fails, tries the original SDF method (standard OpenFF parameterization)
- Basic method fallback: If both SDF methods fail, automatically falls back to the basic method (geometry extraction with generic force constants)
When to use:
- Default choice for most molecules
- When you want to use accurate force field parameters from OpenFF
- When your molecule is compatible with OpenFF (or can be made compatible with radical fixing/bond simplification)
- When you have an SDF file or can create one from your XYZ file
Advantages:
- Uses accurate, atom-type-specific force field parameters
- Handles radicals and unusual bond orders automatically
- Falls back gracefully to basic method if needed
Requirements:
- SDF file (will be created from XYZ if not present)
- OpenFF Toolkit installed
- Force field file (e.g.,
openff_unconstrained-2.0.0.offxml)
This method extracts parameters directly from the molecular geometry using generic force constants, bypassing OpenFF entirely.
Process:
- Reads the starting XYZ structure
- Identifies bonds, angles, and torsions from atomic connectivity
- Uses current geometry as equilibrium values
- Applies generic force constants (see "Generic Force Constants" section below)
When to use:
- When OpenFF parameterization consistently fails
- When you want to skip SDF file creation/conversion
- When you need guaranteed parameter extraction (no dependency on OpenFF)
- For quick testing or when approximate parameters are sufficient
- For molecules with unusual structures that OpenFF cannot handle
Advantages:
- Always works: No dependency on OpenFF compatibility
- Fast: No SDF conversion or OpenFF parameterization
- Simple: Direct geometry-based extraction
- Reliable: Guaranteed to produce parameters
Limitations:
- Uses generic force constants (not atom-type-specific)
- Less accurate than force field parameters
- See "Generic Force Constants" section for details
| Feature | SDF Method | Basic Method |
|---|---|---|
| Accuracy | High (force field parameters) | Moderate (generic constants) |
| Speed | Slower (SDF conversion + OpenFF) | Faster (direct extraction) |
| Reliability | May fail for unusual molecules | Always works |
| Dependencies | Requires OpenFF + SDF | No special dependencies |
| Fallback | Falls back to basic if needed | No fallback needed |
| Use case | Production runs, accurate parameters | Quick testing, guaranteed extraction |
- Start with SDF method (
mm_param_method = "sdf"): It will automatically fall back to basic method if needed - Use basic method if:
- You consistently get errors with SDF method
- You're doing quick tests and don't need high accuracy
- You want to avoid SDF file dependencies
- Your molecule has unusual structures that OpenFF cannot handle
When the program cannot obtain molecular mechanics (MM) force field parameters from OpenFF Toolkit (e.g., due to radicals, unusual bond orders, or other compatibility issues), it automatically falls back to a geometry-based parameter extraction method that uses generic force constants.
The generic force constants are used in two scenarios:
- When
mm_param_method = "basic": Used directly as the primary method - When
mm_param_method = "sdf": Used as a fallback when:- The robust SDF method fails (after radical fixing and bond simplification)
- The original SDF method fails
If the SDF method succeeds, the program uses the force field parameters from OpenFF Toolkit instead of generic constants.
The fallback method extracts parameters directly from your starting geometry:
-
Equilibrium values: Calculated from the current XYZ coordinates
- Bond lengths: Current distances between bonded atoms
- Bond angles: Current angles between three connected atoms
- Torsion angles: Current dihedral angles between four connected atoms
-
Force constants: Generic values based on typical molecular mechanics force fields:
k_bond = 400.0 + 50.0 × (average atomic number) [kcal/mol/Ų]
Examples:
- C-C bond (both atomic number 6):
400 + 50×6 = 700 kcal/mol/Ų - C-H bond (avg atomic number ~3.5):
400 + 50×3.5 = 575 kcal/mol/Ų - C-O bond (avg atomic number ~7):
400 + 50×7 = 750 kcal/mol/Ų
Reference: Typical C-C bonds in OpenFF are ~529 kcal/mol/Ų, so these values are in a similar range (slightly higher for safety).
k_angle = 60.0 [kcal/mol/rad²]
This is a fixed value for all angles. Typical range in force fields: 50-100 kcal/mol/rad².
k_torsion = 2.0 [kcal/mol]
This is a fixed value for all torsions. Typical range in force fields: 0.1-5 kcal/mol.
These generic force constants are based on typical values from standard molecular mechanics force fields:
- AMBER: Bond constants ~300-600 kcal/mol/Ų, angle constants ~40-100 kcal/mol/rad², torsion constants ~0.1-3 kcal/mol
- CHARMM: Similar ranges to AMBER
- OpenFF: Bond constants ~400-800 kcal/mol/Ų, angle constants ~50-100 kcal/mol/rad², torsion constants ~0.1-5 kcal/mol
The values chosen are:
- Conservative (slightly higher than typical) to provide reasonable constraints
- Within typical ranges used in standard force fields
- Simple heuristics that work for most organic molecules
The generic force constants are approximations and have the following limitations:
- Not atom-type specific: All C-C bonds get the same force constant, regardless of hybridization (sp³ vs sp²)
- Not bond-order specific: Single, double, and triple bonds are treated the same
- Not environment-specific: Aromatic vs aliphatic bonds are treated the same
- Fixed angle/torsion constants: All angles and torsions use the same force constant regardless of atom types
Despite these limitations, the generic force constants work well for simulated annealing because:
- Reasonable constraints: They provide sufficient constraints to keep the structure near the starting geometry
- Typical ranges: The values are within typical force field ranges
- Equilibrium from geometry: The equilibrium values come from your actual starting structure, so they're appropriate for your molecule
- Simulated annealing tolerance: For optimization purposes, approximate constraints are often sufficient
When the fallback method is used, you'll see output like:
Both robust method and SDF method failed!
Attempting final fallback: extracting parameters directly from geometry...
(This uses starting geometry as equilibrium values with generic force constants)
Successfully extracted parameters from geometry!
Found X bonds, Y angles, Z torsions
Note: Using generic force constants and starting geometry as equilibrium values.
If you see this message, the program is using generic force constants instead of OpenFF parameters.
-
Default Values: All parameters have default values defined in the TOML config file (
input.tomlby default). Command-line arguments override these defaults. -
Boolean Flags: Boolean parameters (like
--verbose,--inelastic) are enabled by including the flag. They default toFalseif not specified. -
Parameter Overrides: Only parameters explicitly provided on the command line override the TOML file values. Unspecified parameters use TOML defaults.
-
Mode Validation: The
--modeargument must be either"normal"or"test"(case-insensitive). -
Molecule Parameters: The
--moleculeargument must match a section inmolecule_paramsin the TOML file (e.g.,[molecule_params.chd]).
To see all available command-line options with descriptions:
python3 run.py --helpThis will display all arguments organized by group with their descriptions and default values.
Run the test suite:
pytest tests/For more information on tests, see tests/README.md.