MDAnalysis Hydrogenbond calculation taking too much time #4898

M15160058 · 2025-01-23T14:28:00Z

Hello everyone,
I am trying to calculate hydrogen bonds using MDAnalysis, but it is taking too much time (more than a day). Even for the first frame alone, it takes five or six hours.
Yes, the xtc file is large (770,667 atoms, 300 ns long), but I want to calculate the hydrogen bonds for only four chains (only 3600 atoms).
```python
import pandas as pd
import MDAnalysis as mda
from MDAnalysis.analysis.hydrogenbonds import HydrogenBondAnalysis
import matplotlib.pyplot as plt

# Load the Universe
u = mda.Universe("clpb_tetramer_traj1_npt_50.tpr", "concate_cdab.xtc")

# Select specific parts (e.g., two chains)
selection = u.select_atoms(
    "segid seg_6_Protein_chain_G or segid seg_7_Protein_chain_H or segid seg_8_Protein_chain_I or segid seg_9_Protein_chain_J"
)

# Use the selection for analysis
print("Number of atoms in selection:", len(selection))

# Define chain selections using segid
chains = {
    "A": u.select_atoms("segid seg_6_Protein_chain_G"),
    "B": u.select_atoms("segid seg_7_Protein_chain_H"),
    "C": u.select_atoms("segid seg_8_Protein_chain_I"),
    "D": u.select_atoms("segid seg_9_Protein_chain_J"),
}

# Define all chain pair combinations (for inter-chain only)
chain_pairs = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("B", "D"), ("C", "D")
]

# Pre-calculate donor and acceptor selections
donor_selectors = {}
acceptor_selectors = {}
for chain_id in chains:
    donor_selectors[chain_id] = f"segid {chains[chain_id].segids[0]} and (name N or name O)"  # Example: Select N and O atoms as donors
    acceptor_selectors[chain_id] = f"segid {chains[chain_id].segids[0]} and name O"  # Example: Select O atoms as acceptors

# File 1: H-Bond details for the first frame
first_frame_data = []

# Process the first frame only
u.trajectory[0]
print("Processing the first frame:")

for i, (chain1, chain2) in enumerate(chain_pairs, start=1):
    print(f" Chain pair {i}/{len(chain_pairs)}: {chain1}-{chain2}")
    hbonds = HydrogenBondAnalysis(
        universe=u,
        donors_sel=donor_selectors[chain1],
        acceptors_sel=acceptor_selectors[chain2],
        d_a_cutoff=3.5,
        d_h_a_angle_cutoff=120,
    )
    hbonds.run()

    # Process the results for this pair
    for hbond in hbonds.results.hbonds:
        frame, donor, hydrogen, acceptor, distance, angle = hbond
        donor_atom = u.atoms[int(donor)]
        acceptor_atom = u.atoms[int(acceptor)]

        # Extract chain, residue, and atom details
        donor_details = f"{chain1}-{donor_atom.resname}-{donor_atom.resid}-{donor_atom.index+1}-{donor_atom.name}"
        acceptor_details = f"{chain2}-{acceptor_atom.resname}-{acceptor_atom.resid}-{acceptor_atom.index+1}-{acceptor_atom.name}"

        # Append the data for the current hydrogen bond
        first_frame_data.append({
            "Chain Pair": f"{chain1}-{chain2}",
            "Donor (Chain-Residue-Index-Atom)": donor_details,
            "Acceptor (Chain-Residue-Index-Atom)": acceptor_details,
            "Distance (Å)": f"{distance:.2f}",
            "Angle (°)": f"{angle:.2f}"
        })

# Save the first frame data to a CSV file
first_frame_df = pd.DataFrame(first_frame_data)
first_frame_df.to_csv("first_frame_hbonds.csv", index=False)
print("First frame H-Bond details saved to 'first_frame_hbonds.csv'.")

# File 2: Time vs Number of H-Bonds across trajectory
time_data = []

# Iterate over all frames in the trajectory
print("\nProcessing all trajectory frames:")
for ts in u.trajectory:
    current_time = ts.time
    print(f" Frame {u.trajectory.frame}/{len(u.trajectory)} at time {current_time:.2f} ps")
    total_hbonds = 0

    for i, (chain1, chain2) in enumerate(chain_pairs, start=1):
        print(f"    Chain pair {i}/{len(chain_pairs)}: {chain1}-{chain2}")
        hbonds = HydrogenBondAnalysis(
            universe=u,
            donors_sel=donor_selectors[chain1],
            acceptors_sel=acceptor_selectors[chain2],
            d_a_cutoff=3.5,
            d_h_a_angle_cutoff=120,
        )
        hbonds.run()

        # Add the number of H-bonds for this pair
        total_hbonds += len(hbonds.results.hbonds)

    # Append the total H-bond count for this frame to the time data
    time_data.append({"Time (ps)": current_time, "Number of H-bonds": total_hbonds})

# Save the time vs H-Bond data to a CSV file
time_df = pd.DataFrame(time_data)
time_df.to_csv("time_vs_hbonds.csv", index=False)
print("Time vs Number of H-Bonds saved to 'time_vs_hbonds.csv'.")

# Plot the results
plt.figure(figsize=(10, 6))
plt.plot(time_df["Time (ps)"], time_df["Number of H-bonds"], marker='o')
plt.xlabel("Time (ps)")
plt.ylabel("Number of Hydrogen Bonds")
plt.title("Time vs. Number of Hydrogen Bonds")
plt.grid()
plt.savefig("time_vs_hbonds_plot.png")
plt.show()
```

p-j-smith · 2025-02-11T09:31:33Z

p-j-smith
Feb 11, 2025
Maintainer

Hi @M15160058, several hours for a single frame sounds surprisingly slow. I had a look at your code and there are a few things you can do to speed it up:

- Set `update_selections` to `False` when instantiating `HydrogenBondAnalysis` to avoid re-calculating donor-hydrogen pairs at each frame (as suggested in the User Guide).
- Use the `between` keyword to specify that you would like to find hydrogen bonds between specific atom selections. This way you only need to perform the calculation once, rather than iterating over pairs of chains.
- Pass `hydrogens_sel` to `HydrogenBondAnalysis` but not `donors_sel`. From the User Guide: 'By not providing a donor atom selection (donor_sel) we will use the topology bond information to find donor-hydrogen pairs.' If you pass in `donors_sel`, then donor-hydrogen pairs will be identified by distance rather than topology, which is both less reliable and much slower.
- You can use `guess_acceptors` and `guess_hydrogens` to help generate the atom selections.
- In your section 'Process the first frame only', you are actually running the analysis on all frames. In the call to `hbonds.run()` you can specify the `start`, `stop`, and `step` keywords to limit the analysis to a specific frame. The default is to analyse all frames.
- Use the `count_by_time` helper function to get the number of hydrogen bonds over time.
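
To make the last point concrete, here is a plain-Python sketch of what a per-frame count amounts to. It assumes each result row has the layout unpacked in the original script (frame, donor index, hydrogen index, acceptor index, distance, angle). The helper name `count_per_frame` and the sample rows are made up for illustration, and this is not MDAnalysis's own implementation of `count_by_time`.

```python
from collections import Counter


def count_per_frame(hbond_rows, n_frames):
    """Return a list holding the number of hydrogen bonds found in each frame.

    hbond_rows: iterable of (frame, donor_ix, hydrogen_ix, acceptor_ix, distance, angle)
    n_frames:   number of analysed frames, so frames without bonds still get a 0
    """
    per_frame = Counter(int(row[0]) for row in hbond_rows)
    return [per_frame.get(frame, 0) for frame in range(n_frames)]


# Hypothetical rows: two bonds in frame 0, none in frame 1, one in frame 2.
rows = [
    (0, 34955, 34956, 37893, 3.07, 177.4),
    (0, 35095, 35096, 36022, 3.18, 126.9),
    (2, 35135, 35136, 36022, 2.86, 162.2),
]
counts = count_per_frame(rows, n_frames=3)
print(counts)  # [2, 0, 1]
```

Because every row already carries its frame number, a single analysis over the whole trajectory is enough to build the time series afterwards.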

It looks like what you're currently doing is:

1. iterating over each frame
2. for each frame, iterating over each pair of chains
3. for each pair of chains at each frame, calculating hbonds between the chains for all frames of the trajectory and adding them to the count of hydrogen bonds
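
A back-of-the-envelope count shows why this nesting is so expensive. The sketch below only tallies how many single-frame analyses each approach performs. The frame count of 1000 is an assumed figure, since the thread does not state how many frames the trajectory holds, and both function names are illustrative.

```python
def frame_analyses_nested(n_frames, n_pairs):
    """Outer loop over frames, inner loop over chain pairs, and every run() covers all frames."""
    return n_frames * n_pairs * n_frames


def frame_analyses_single_run(n_frames):
    """One run() call that visits each frame exactly once."""
    return n_frames


n_frames = 1000  # assumption: the real number of frames is not given in the thread
n_pairs = 6      # the six chain pairs from the original script

nested = frame_analyses_nested(n_frames, n_pairs)
single = frame_analyses_single_run(n_frames)
print(nested, single, nested // single)  # 6000000 1000 6000
```

The nested version grows with the square of the number of frames, so the gap widens quickly for longer trajectories.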

Instead, you should only need to make one call to HydrogenBondAnalysis.run():

```python
import matplotlib.pyplot as plt  # needed for the plot at the end
import MDAnalysis as mda
from MDAnalysis.analysis.hydrogenbonds import HydrogenBondAnalysis

u = mda.Universe("clpb_tetramer_traj1_npt_50.tpr", "concate_cdab.xtc")

chainA = "segid seg_6_Protein_chain_G"
chainB = "segid seg_7_Protein_chain_H"
chainC = "segid seg_8_Protein_chain_I"
chainD = "segid seg_9_Protein_chain_J"

hbonds = HydrogenBondAnalysis(
    universe=u,
    d_a_cutoff=3.5,
    d_h_a_angle_cutoff=120,
    between=[
        [chainA, chainB],
        [chainA, chainC],
        [chainA, chainD],
        [chainB, chainC],
        [chainB, chainD],
        [chainC, chainD],
    ],
)
hbonds.hydrogens_sel = hbonds.guess_hydrogens(f"({chainA}) or ({chainB}) or ({chainC}) or ({chainD})")
hbonds.acceptors_sel = hbonds.guess_acceptors(f"({chainA}) or ({chainB}) or ({chainC}) or ({chainD})")

# run the analysis over all frames and print progress
hbonds.run(verbose=True)

# Plot hbonds over time
plt.plot(hbonds.times, hbonds.count_by_time(), lw=2)
```

I haven't tested this, but it should give you an idea of what to do. I'd also recommend reading through the User Guide for HydrogenBondAnalysis to see how best to run the analysis.

1 reply

M15160058 Feb 13, 2025
Author

Hello p-j-smith,
It works. Thank you so much. Could you please help me solve the following problem?
I am trying to store the data for further analysis, but I have run into a different problem, and I believe it is due to MDAnalysis's 0-based indexing.

```python
# Create a DataFrame with appropriate column names (without type conversion issues)
df_detailed = pd.DataFrame(hbond_data, columns=["Time", "Donor_ix", "Hydrogen_ix", "Acceptor_ix", "Distance", "Angle"])

# For each hydrogen bond, add donor and acceptor atom information
# (Convert indices to integers for proper indexing)
df_detailed["Donor_ix"] = (df_detailed["Donor_ix"]).astype(int)
df_detailed["Acceptor_ix"] = (df_detailed["Acceptor_ix"]).astype(int)
df_detailed["Hydrogen_ix"] = (df_detailed["Hydrogen_ix"]).astype(int)

df_detailed["Donor_resname"] = [u.atoms[i].resname for i in df_detailed["Donor_ix"]]
df_detailed["Acceptor_resname"] = [u.atoms[i].resname for i in df_detailed["Acceptor_ix"]]
df_detailed["Donor_resid"] = [u.atoms[i].resid for i in df_detailed["Donor_ix"]]
df_detailed["Acceptor_resid"] = [u.atoms[i].resid for i in df_detailed["Acceptor_ix"]]
df_detailed["Donor_name"] = [u.atoms[i].name for i in df_detailed["Donor_ix"]]
df_detailed["Acceptor_name"] = [u.atoms[i].name for i in df_detailed["Acceptor_ix"]]
```

Output:

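| Time | Donor_ix | Hydrogen_ix | Acceptor_ix | Distance | Angle | Donor_bynum | Hydrogen_bynum | Acceptor_bynum | Donor_resname | Acceptor_resname | Donor_resid | Acceptor_resid | Donor_name | Acceptor_name |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 34955 | 34956 | 37893 | 3.07403224399592 | 177.448731863319 | 34954 | 34955 | 37892 | ILE | GLU | 3518 | 3830 | N | OE2 |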

So, if I check the 'gro' file, I see that atom 34955, for example, is not the N atom: it is an oxygen atom of the previous residue, and the atom number of the N is 34956. How can I solve this issue? Should I add 1 here, in `df_detailed["Donor_ix"] = (df_detailed["Donor_ix"]).astype(int)` and the similar lines? But if I add 1 here, the donor name changes to H.

Secondly, this code does not find hydrogen bonds between backbone atoms; it only reports hydrogen bonds that involve side-chain atoms. I need to check whether there are any hydrogen bonds between the backbone atoms of two chains. Please see the following:

| Donor_ix | Hydrogen_ix | Acceptor_ix | Distance | Angle | Donor_bynum | Hydrogen_bynum | Acceptor_bynum | Donor_resname | Acceptor_resname | Donor_resid | Acceptor_resid | Donor_name | Acceptor_name |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 34955 | 34956 | 37893 | 3.07403224399592 | 177.448731863319 | 34954 | 34955 | 37892 | ILE | GLU | 3518 | 3830 | N | OE2 |
| 35095 | 35096 | 36022 | 3.17745225008398 | 126.88743332875 | 35094 | 35095 | 36021 | GLY | GLU | 3532 | 3627 | N | OE1 |
| 35095 | 35096 | 36023 | 3.11302718047391 | 167.127356438271 | 35094 | 35095 | 36022 | GLY | GLU | 3532 | 3627 | N | OE2 |
| 35135 | 35136 | 36022 | 2.86455482740786 | 162.203299332871 | 35134 | 35135 | 36021 | HIS | GLU | 3536 | 3627 | NE2 | OE1 |
| 35198 | 35199 | 36084 | 2.68437711059177 | 170.530697003709 | 35197 | 35198 | 36083 | SER | ASP | 3542 | 3634 | OG | OD1 |
| 35360 | 35361 | 37818 | 2.85831070530492 | 164.205014625103 | 35359 | 35360 | 37817 | THR | GLU | 3558 | 3823 | OG1 | OE2 |
| 35531 | 35532 | 38149 | 3.11623134419752 | 153.123601159224 | 35530 | 35531 | 38148 | HIS | GLU | 3577 | 3857 | N | OE2 |
| 35545 | 35546 | 38148 | 3.49281936224657 | 156.807122233207 | 35544 | 35545 | 38147 | ASN | GLU | 3578 | 3857 | N | OE1 |
| 35551 | 35552 | 38148 | 2.9096449425214 | 170.842882329205 | 35550 | 35551 | 38147 | ASN | GLU | 3578 | 3857 | ND2 | OE1 |
| 35556 | 35557 | 38149 | 2.89686118693986 | 178.919212741302 | 35555 | 35556 | 38148 | CYS | GLU | 3579 | 3857 | N | OE2 |
| 35571 | 35573 | 38157 | 2.85535580439461 | 146.544185355871 | 35570 | 35572 | 38156 | GLN | ASP | 3580 | 3858 | NE2 | OD1 |
| 35599 | 35600 | 38157 | 2.87870827532216 | 163.711835083156 | 35598 | 35599 | 38156 | THR | ASP | 3584 | 3858 | N | OD1 |
| 35603 | 35604 | 38158 | 2.92910774058586 | 152.929247158435 | 35602 | 35603 | 38157 | THR | ASP | 3584 | 3858 | OG1 | OD2 |
| 35603 | 35604 | 38157 | 2.86036528920919 | 139.108255586934 | 35602 | 35603 | 38156 | THR | ASP | 3584 | 3858 | OG1 | OD1 |
| 35714 | 35715 | 37340 | 3.35435810169038 | 148.954777330417 | 35713 | 35714 | 37339 | SER | GLU | 3596 | 3764 | OG | OE2 |
| 35771 | 35772 | 38501 | 2.87967703492741 | 157.795679615902 | 35770 | 35771 | 38500 | LYS | GLU | 3603 | 3894 | N | OE1 |
| 35771 | 35772 | 38502 | 3.35874543702678 | 126.536482965135 | 35770 | 35771 | 38501 | LYS | GLU | 3603 | 3894 | N | OE2 |
| 35784 | 35785 | 38502 | 3.11012491351276 | 161.440666701482 | 35783 | 35784 | 38501 | GLU | GLU | 3604 | 3894 | N | OE2 |
| 35960 | 35961 | 36827 | 2.87852897693886 | 162.346244180717 | 35959 | 35960 | 36826 | GLU | GLU | 3622 | 3711 | N | OE2 |
| 36207 | 36208 | 35375 | 3.43432424861092 | 154.211516382296 | 36206 | 36207 | 35374 | LEU | SER | 3646 | 3560 | N | OG |
| 36220 | 36221 | 35392 | 2.809079446699 | 174.034867849015 | 36219 | 36220 | 35391 | THR | ASP | 3647 | 3562 | OG1 | OD2 |
| 36231 | 36232 | 37112 | 3.17821503069237 | 126.337795348992 | 36230 | 36231 | 37111 | SER | ASP | 3649 | 3740 | N | OD2 |
| 36424 | 36425 | 35619 | 3.1247771131576 | 131.320982902158 | 36423 | 36424 | 35618 | GLN | GLU | 3669 | 3586 | N | OE1 |
| 36424 | 36425 | 35620 | 2.90340371377794 | 153.979208306504 | 36423 | 36424 | 35619 | GLN | GLU | 3669 | 3586 | N | OE2 |
| 36436 | 36437 | 35619 | 2.68946621618473 | 174.782429244648 | 36435 | 36436 | 35618 | LEU | GLU | 3670 | 3586 | N | OE1 |
| 36855 | 36856 | 36023 | 2.77512427590067 | 163.300241528102 | 36854 | 36855 | 36022 | HIS | GLU | 3714 | 3627 | NE2 | OE2 |
| 36982 | 36984 | 34989 | 3.46065075886794 | 132.702770594101 | 36981 | 36983 | 34988 | GLN | GLU | 3727 | 3521 | NE2 | OE2 |
| 36982 | 36984 | 34988 | 2.723769974967 | 165.795543715905 | 36981 | 36983 | 34987 | GLN | GLU | 3727 | 3521 | NE2 | OE1 |
| 37456 | 37458 | 38425 | 2.87395673162463 | 156.976658253072 | 37455 | 37457 | 38424 | ASN | SER | 3777 | 3886 | ND2 | OG |
| 37755 | 37756 | 36755 | 2.96171431619803 | 145.217014859511 | 37754 | 37755 | 36754 | VAL | TYR | 3817 | 3703 | N | OH |
| 37966 | 37967 | 36945 | 2.75262890925577 | 150.95745131733 | 37965 | 37966 | 36944 | GLN | ASP | 3839 | 3723 | N | OD2 |
| 37973 | 37974 | 36945 | 2.77839022353195 | 165.421341960379 | 37972 | 37973 | 36944 | GLN | ASP | 3839 | 3723 | NE2 | OD2 |
| 38021 | 38022 | 37157 | 3.29370094917603 | 133.013813816329 | 38020 | 38021 | 37156 | LYS | GLU | 3843 | 3745 | N | OE1 |

My goal is to calculate hbond among all atoms and among backbone atoms only.

Thanks again.

p-j-smith · 2025-02-13T08:29:35Z

p-j-smith
Feb 13, 2025
Maintainer

Good to hear it works for you!

> if I check the 'gro' file I see the atom for example 34955 is not the N atom. The atom number of the N is 34956.

As you suggested, I think you're comparing atom indices in your results with atom numbers in your gro file, so there's an offset of 1. I would recommend working with the indices as these are guaranteed to be unique (in a gro file, atom numbers repeat once they reach 99999).
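
As a rough illustration of that offset, the sketch below converts between 0-based indices and the atom numbers printed in a gro file. It assumes the number column wraps modulo 100000 (so the atom after 99999 is printed as 0). The thread only says that numbers repeat once they reach 99999, so treat the exact wrap rule and both helper names as assumptions.

```python
GRO_WRAP = 100_000  # assumption: the 5-character atom-number column wraps after 99999


def index_to_gro_number(index):
    """Map a 0-based atom index to the atom number printed in a .gro file."""
    return (index + 1) % GRO_WRAP


def gro_number_to_candidate_indices(number, n_atoms):
    """Return every 0-based index that would be printed as `number` in the file."""
    first = (number - 1) % GRO_WRAP
    return list(range(first, n_atoms, GRO_WRAP))


# Donor index 34955 from the results corresponds to atom number 34956 in the file.
gro_number = index_to_gro_number(34955)
print(gro_number)  # 34956

# With 770667 atoms, the same printed number is shared by several atoms.
candidates = gro_number_to_candidate_indices(34956, n_atoms=770667)
print(len(candidates), candidates[:2])  # 8 [34955, 134955]
```

Going from index to file number is unambiguous, but the reverse is not, which is why keeping the indices in the stored results is the safer choice.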

> My goal is to calculate hbond among all atoms and among backbone atoms only.

Does the current selection include atoms in both the backbone and the side chains? If so, there are two ways you could do this:

1. Modify your atom selections to select only backbone atoms and re-run your analysis.
2. Post-process your current results to check whether the donor and acceptor atoms in each hydrogen bond are both part of a backbone.
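
A minimal sketch of the post-processing route, in plain Python. It assumes the results have been stored with `Donor_name` and `Acceptor_name` fields as in the table earlier in the thread, and that backbone atoms carry the standard names N, CA, C and O. Terminal residues or other force fields may use different names, so the name set and the helper `backbone_only` are illustrative only.

```python
BACKBONE_NAMES = {"N", "CA", "C", "O"}  # assumption: standard protein backbone atom names


def backbone_only(rows):
    """Keep only hydrogen bonds whose donor and acceptor are both backbone atoms."""
    return [
        row for row in rows
        if row["Donor_name"] in BACKBONE_NAMES and row["Acceptor_name"] in BACKBONE_NAMES
    ]


rows = [
    # Two rows taken from the table above: both involve a side-chain atom.
    {"Donor_resid": 3518, "Donor_name": "N", "Acceptor_resid": 3830, "Acceptor_name": "OE2"},
    {"Donor_resid": 3542, "Donor_name": "OG", "Acceptor_resid": 3634, "Acceptor_name": "OD1"},
    # Hypothetical backbone-backbone bond, added only to show a row that passes the filter.
    {"Donor_resid": 3600, "Donor_name": "N", "Acceptor_resid": 3700, "Acceptor_name": "O"},
]

kept = backbone_only(rows)
print(len(kept))  # 1
```

If this filter returns nothing for the real results, the acceptor selection is worth checking first, since a bond can only be found if its acceptor atom was selected.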
