
Convert TorchMDNetPotential to use PythonForce#117

Open
peastman wants to merge 3 commits into openmm:main from peastman:torchmdnet


@peastman
Member

peastman commented Feb 5, 2026

With this change, it no longer relies on compiling the model to TorchScript. I also added support for periodic systems, which fixes #115.

@sef43 can you review this to make sure it looks correct? I'm especially suspicious about the periodic boundary conditions. In a test of alanine dipeptide in a box of water, turning on the periodic boundary conditions only changes the energy from -68146 to -68502, which seems doubtful. That's with ASE, not just OpenMM.

@sef43
Contributor

sef43 commented Feb 6, 2026

Thanks for doing this implementation!

Your suspicion is correct. For the AceFF-2.0 model, PBCs will not work, because the torchmdnet package does not currently implement PBCs in the Coulomb term. For AceFF-1.0/1.1 and other torchmdnet models that use a standard scalar output module, this implementation should work. I will do some testing.

@sef43
Contributor

sef43 commented Feb 25, 2026

The changes needed in torchmdnet have now been implemented. When using AceFF2 with PBCs, you will need to pass a new keyword argument to load_model giving the cutoff distance for the Coulomb interaction in angstroms, e.g. load_model(..., coulomb_cutoff=10.0)

The other models (AceFF1.0/1.1) don't need this argument.

@peastman
Member Author

Thanks. Is there a recommended value?

@sef43
Contributor

sef43 commented Feb 26, 2026

It is hard to do proper testing to assess that. The models I have trained are primarily designed for small molecules in vacuum or with mechanical embedding in MM/ML, and in both cases we would use no PBC or cutoff for the Coulomb term.
In practice, the cutoff needs to be less than half the box length (as in OpenMM). Since it uses the same reaction field equation as OpenMM's NonbondedForce with a cutoff, I would suggest following the same recommendations as for that, or just making it as large as you can get away with. The cost of the Coulomb term should be insignificant compared to the rest of the MLIP.
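To make the two constraints above concrete, here is a minimal sketch of the reaction field pair energy in the form given in the OpenMM documentation for NonbondedForce, plus a half-box check. The function names, the rectangular-box assumption, and the use of OpenMM units (nm, kJ/mol) are choices made for this illustration; torchmdnet takes its cutoff in angstroms and its internal implementation is not shown here.

```python
# Coulomb constant 1/(4*pi*eps0) in kJ*nm/(mol*e^2), as used in OpenMM's unit system.
ONE_4PI_EPS0 = 138.935456


def reaction_field_energy(q1, q2, r, cutoff, solvent_dielectric=78.3):
    """Reaction field pair energy in kJ/mol (hypothetical helper).

    q1, q2 are charges in units of e; r and cutoff are in nm.
    78.3 is OpenMM's default reaction field dielectric.
    """
    if r > cutoff:
        return 0.0
    eps = solvent_dielectric
    k_rf = (eps - 1.0) / ((2.0 * eps + 1.0) * cutoff**3)
    c_rf = 3.0 * eps / ((2.0 * eps + 1.0) * cutoff)
    return ONE_4PI_EPS0 * q1 * q2 * (1.0 / r + k_rf * r * r - c_rf)


def check_cutoff(cutoff, box_lengths):
    """Raise if the cutoff is not less than half the shortest box edge.

    Assumes a rectangular box; a triclinic box needs a more careful test.
    """
    limit = 0.5 * min(box_lengths)
    if cutoff >= limit:
        raise ValueError(f"cutoff {cutoff} must be less than {limit}")


# Example: a 1.2 nm (12 angstrom) cutoff in a 3 nm cubic box.
check_cutoff(1.2, (3.0, 3.0, 3.0))
energy_at_half_nm = reaction_field_energy(1.0, -1.0, 0.5, 1.2)
```

The constant c_rf shifts the energy so that it goes to zero at the cutoff, which is why the term stays well behaved when pairs cross the cutoff distance.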

@peastman
Member Author

I'll use 12 Å as the default. That's generally a reasonable value with reaction field.

I've run into a couple of problems trying to implement this. If I include the coulomb_cutoff option when creating a model that doesn't include a Coulomb term, such as AceFF-1.1, it fails with an exception:

E           RuntimeError: Error(s) in loading state_dict for TorchMD_Net:
E           	Unexpected key(s) in state_dict: "output_model.distance.box".

../../../miniconda3/envs/openmmml/lib/python3.13/site-packages/torch/nn/modules/module.py:2629: RuntimeError

Since the user can specify an arbitrary model file, I don't know in advance whether it includes that term or not.
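One possible workaround, sketched here under assumptions, is to try loading with the cutoff and fall back to loading without it when the state_dict error appears. The helper name and the `loader` callable are hypothetical stand-ins for illustration; the only detail taken from the error above is the "Unexpected key(s) in state_dict" message text.

```python
def load_with_optional_cutoff(loader, path, coulomb_cutoff):
    """Try loader(path, coulomb_cutoff=...), retrying without the cutoff.

    `loader` is any callable with a load_model-like signature (an assumption
    for this sketch). Only the specific state_dict mismatch triggers the
    fallback; every other error is re-raised unchanged.
    """
    try:
        return loader(path, coulomb_cutoff=coulomb_cutoff)
    except RuntimeError as err:
        if "Unexpected key(s) in state_dict" not in str(err):
            raise
        return loader(path)


def fake_loader(path, coulomb_cutoff=None):
    """Stand-in loader that mimics a model without a Coulomb term."""
    if coulomb_cutoff is not None:
        raise RuntimeError(
            "Error(s) in loading state_dict for TorchMD_Net:\n"
            '\tUnexpected key(s) in state_dict: "output_model.distance.box".'
        )
    return {"path": path, "coulomb_cutoff": None}


model = load_with_optional_cutoff(fake_loader, "model.ckpt", 12.0)
```

Matching on the message string is fragile, so this is only a stopgap compared with the loader itself ignoring the argument for models that have no Coulomb term.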

When testing AceFF-2.0 it fails with a different exception:

Traceback (most recent call last):
  File "/Users/peastman/workspace/openmm-ml/openmmml/models/torchmdnetpotential.py", line 219, in _computeTorchMDNet
    energy = model(z=numbers, pos=positions/lengthScale, batch=batch, q=charge, box=cell)[0]*energyScale
             ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/peastman/miniconda3/envs/openmmml/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/peastman/miniconda3/envs/openmmml/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/peastman/miniconda3/envs/openmmml/lib/python3.13/site-packages/torchmdnet/models/model.py", line 513, in forward
    x = self.output_model.pre_reduce(x, v, z, pos, batch, box=box)
  File "/Users/peastman/miniconda3/envs/openmmml/lib/python3.13/site-packages/torchmdnet/models/output_modules.py", line 595, in pre_reduce
    e_i = e_i.index_add(0, edge_index[0], e_ij)
IndexError: index out of range in self

@sef43
Contributor

sef43 commented Mar 6, 2026

The first error is already fixed. The second error came from not handling static shapes properly and will be fixed in the latest PR to torchmd-net: torchmd/torchmd-net#386

@sef43
Contributor

sef43 commented Mar 6, 2026

Another issue is that we really want to be using torch.compile. Particularly when there are fewer than 100 atoms, we want to use static_shapes=True and torch.compile with:

    self.compiled_model = torch.compile(
        self.model,
        backend="inductor",
        dynamic=False,
        fullgraph=True,
        mode="reduce-overhead",
    )

as done here: https://github.com/torchmd/torchmd-net/blob/df4cc6af6a3a05cc078bc80e80ac6c56061f3e2e/torchmdnet/calculators.py#L297-L303

These settings fuse operations and, importantly, capture the whole model in a graph and use CUDA graphs. This can increase the speed from tens of steps per second to hundreds of steps per second, almost reaching 1 ms per step latency. Without CUDA graphs and compilation, the models get stuck at 10 ms per step due to PyTorch overheads.

@peastman
Member Author

peastman commented Mar 6, 2026

Thanks, I made the changes. Do they look ok?


Development

Successfully merging this pull request may close these issues.

PBC not applied in TorchMDNetPotential
