Good AIMD RMSE but unstable MD; adding distortions breaks TRAIN error without fixing MD stability #1342
Summary
I am training MACE on a 10 ps DFT-AIMD trajectory.
Case 1: AIMD-only training (10 ps)
Training/validation/test split by continuous time blocks. Despite these errors, running MD with this model leads to:
Case 2: AIMD + distorted geometries (distortions added only to TRAIN)
Observations:
Training settings
--foundation_model="small" \
--energy_key="REF_energy" \
--multiheads_finetuning=False \
--forces_key="REF_forces" \
--model="MACE" \
--E0s="average" \
--num_channels=256 \
--max_L=2 \
--correlation=3 \
--max_num_epochs=500 \
--batch_size=10 \
--patience=50 \
--valid_batch_size=10 \
--lr=0.001 \
--energy_weight=1.0 \
--forces_weight=100.0 \
--weight_decay=1e-8 \
--error_table='PerAtomMAE' \
--ema \
--ema_decay=0.99 \
--amsgrad \
--restart_latest \
--default_dtype="float64" \
--device=cuda \
--seed=1 \
--scaling='rms_forces_scaling' \
--save_cpu
Questions
Goal
My goal is not to replace AIMD, but to:
Any guidance on best practices for this workflow would be greatly appreciated, as I am a new MACE user.
Replies: 7 comments
Hello! Thanks for trying our code. We'll help you diagnose the problem. On the face of it, your strategy is not wrong. Generally we find that potentials trained only on short AIMD trajectories are stable when you use them for MD at the same composition and temperature. (I say short because MD frames are highly correlated: you may have 10 ps and 10K frames, but perhaps only 20-30 of them are actually independent!) And yes, adding distorted structures (or frames from higher-temperature MD runs, or different compositions) does generally help stability. However, the fact that adding new structures to your training set increases the training RMSE so dramatically suggests to me that your two types of data are incompatible. To test this, take one of the AIMD frames that you put into the training set and recompute its energy and forces with exactly the settings you used for the distorted configurations. Then compare the energy, forces and stress components between the original AIMD-derived data and your single-point recomputation. Also, it would help if you posted the kinds of plots that MACE training outputs, i.e. reference vs. MACE predictions for energies, force components and stress components (even better if you colour the points from the AIMD and the distorted structures separately for your second training).
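On the point about correlated frames: below is a minimal sketch, assuming the frames are stored in time order, of how a trajectory might be thinned with a stride and split into contiguous time blocks, so that validation and test frames are not near-duplicates of training frames. The function name `block_split`, the stride of 10 and the 80/10/10 fractions are illustrative assumptions, not MACE options.

```python
def block_split(n_frames, stride=10, fractions=(0.8, 0.1, 0.1)):
    """Return (train, valid, test) frame indices.

    Each subset comes from its own contiguous time block, and only every
    `stride`-th frame inside a block is kept to reduce correlation.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    n_train = int(n_frames * fractions[0])
    n_valid = int(n_frames * fractions[1])
    # Block boundaries in frame-index space: [train | valid | test]
    bounds = [0, n_train, n_train + n_valid, n_frames]
    return tuple(
        list(range(lo, hi, stride)) for lo, hi in zip(bounds[:-1], bounds[1:])
    )


train_idx, valid_idx, test_idx = block_split(10_000, stride=10)
print(len(train_idx), len(valid_idx), len(test_idx))
```

A stride only thins the data. It does not by itself make the remaining frames independent, so the appropriate value depends on how quickly the system decorrelates.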
Hello, Thank you very much for your reply and for sharing your excellent code. I believe the very large training force error (TRAIN: RMSE F ≈ 6814 meV/Å) was caused by including highly distorted configurations in the training set. These structures contained broken bonds or atoms that had effectively "flown away". I have since removed those configurations and instead added MD frames generated at higher but controlled temperatures (600 K and 900 K). With this revised dataset, the training now converges to much more reasonable values. For example, at epoch 2009 I obtain: loss ≈ 0.0010, MAE energy per atom ≈ 28.35 meV, and MAE force ≈ 12.32 meV/Å. As you suggested, I also selected a single configuration from the training set and performed a single-point calculation using the same settings, and the result was almost the same. That said, I am still unsure whether I should train entirely from scratch or start from a small foundation model. I have tried both approaches, but the MD is similarly unstable in each case. Additionally, I would appreciate your guidance on dataset size: does it matter significantly whether I use a smaller split (e.g., 800 training, 100 validation, 100 test frames) or a larger one (e.g., 4000 training, 500 validation, 500 test frames)? Finally, I did not fully understand what you meant by the "kinds of plots" produced by MACE during training, specifically reference versus MACE predictions for energies, force components, and stress components. I am not considering stress, as my systems are non-periodic and I do not require PBCs. I apologize if this is a basic question; I am still a new user. Thank you again for your time and support. Kind regards,
Your energy error is very high. Typically decent potentials achieve a couple of meV/atom, and often you can get it down to ~1 meV/atom (but that requires converged basis sets, no CP2K, and elimination of all noise from your training data). What electronic structure code are you using? We find that you will get better models when you fine-tune a foundation model, but it does require a little bit of experience: you are managing both the old data and the new data, with multiple heads on the model (one for parts of the old data, one for the new data). So my suggestion is that you first get a reasonable model trained from scratch, and then try to improve on it by fine-tuning. More data will help you get lower errors, especially if the new data is uncorrelated with the previous data. The amount depends on what accuracy you are looking for. You are using "average" E0s (i.e. the energies of the isolated atoms) rather than DFT-computed isolated-atom energies. The latter are important if you want the potential to describe bond breaking and dissociation, and not using isolated-atom energies is indeed why your training was much worse when you included broken configurations where atoms "flew away". When you train a MACE model, it produces parity plots (reference vs. MACE predictions) for energies and forces (and for stresses if you have them). You don't have to use these; you can make your own. Please upload such plots here. It will help a lot in understanding what you are trying to do and how to get the energy error down.
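For anyone making their own parity plots, here is a minimal sketch of the numbers behind them, assuming the reference and predicted values have already been flattened into 1D arrays with one subset label per point (for example "aimd" or "distorted"). The function names and labels are illustrative and are not part of MACE.

```python
import numpy as np


def parity_stats(reference, predicted):
    """MAE, RMSE and Pearson r between reference and predicted values."""
    ref = np.asarray(reference, dtype=float).ravel()
    pred = np.asarray(predicted, dtype=float).ravel()
    err = pred - ref
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        # Points on the y = x parity line give r close to +1.
        "pearson_r": float(np.corrcoef(ref, pred)[0, 1]),
    }


def parity_stats_by_group(reference, predicted, labels):
    """Same statistics, computed separately for each subset label."""
    ref = np.asarray(reference, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    labels = np.asarray(labels)
    return {
        str(g): parity_stats(ref[labels == g], pred[labels == g])
        for g in np.unique(labels)
    }
```

Scattering `predicted` against `reference` with a y = x guide line, one colour per label, gives the parity plot itself. A subset whose points fall along y = -x instead shows up here as a `pearson_r` close to -1.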
You have some very strong negative correlations. My theory is that when you fit to AIMD data you do have the forces in the dataset, but when you do the single-point calculations on the distorted structures, you have the negative of the force, i.e. the gradient. To confirm, please do the test I suggested, where you compare the two electronic structure calculations, one from the AIMD and one from your single-point calculation, on exactly the same structure.
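The comparison described here can be sketched as follows, assuming the two force arrays for the same structure are already loaded as `(n_atoms, 3)` arrays in the same units and atom order. The function name is hypothetical.

```python
import numpy as np


def compare_forces(forces_aimd, forces_single_point):
    """Compare two force arrays computed for the same structure."""
    a = np.asarray(forces_aimd, dtype=float).ravel()
    b = np.asarray(forces_single_point, dtype=float).ravel()
    diff = a - b
    return {
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "max_abs_diff": float(np.max(np.abs(diff))),
        # Close to +1: consistent data. Close to -1: one set has the
        # opposite sign convention (gradient instead of force).
        "pearson_r": float(np.corrcoef(a, b)[0, 1]),
    }
```

If the data is stored in files that ASE can read, the arrays could come from something like `ase.io.read(...)` followed by `atoms.get_forces()`, depending on how the energies and forces were written. Note that this test only detects a mismatch between the two sources; it cannot tell whether both share the same wrong sign.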
It's also possible that both your AIMD and single-point calculations contain the gradient rather than the force (the negative gradient).
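When both data sources might share the same sign convention, comparing them with each other cannot reveal the problem. One possible check, sketched here under stated assumptions, uses consecutive AIMD frames: for true forces the change in potential energy between two nearby frames should satisfy

$$\Delta E \approx -\sum_i \bar{\mathbf{F}}_i \cdot \Delta \mathbf{R}_i,$$

where $\bar{\mathbf{F}}_i$ is the average force on atom $i$ over the two frames. The sketch assumes potential (not total) energies, consecutive frames that are close together, and positions that are not wrapped across periodic boundaries. The function name is hypothetical.

```python
import numpy as np


def force_sign_check(positions, energies, forces):
    """Correlate energy changes with the work done along a trajectory.

    positions, forces: arrays of shape (n_frames, n_atoms, 3)
    energies: array of shape (n_frames,), potential energies

    Returns a Pearson correlation: near +1 if `forces` are true forces
    (F = -dE/dR), near -1 if they are actually gradients (+dE/dR).
    """
    pos = np.asarray(positions, dtype=float)
    e = np.asarray(energies, dtype=float)
    f = np.asarray(forces, dtype=float)
    d_e = np.diff(e)                      # energy change per step
    d_r = np.diff(pos, axis=0)            # displacement per step
    f_mid = 0.5 * (f[1:] + f[:-1])        # trapezoidal average force
    work = np.sum(f_mid * d_r, axis=(1, 2))
    # For true forces, d_e is approximately -work.
    return float(np.corrcoef(d_e, -work)[0, 1])
```

A value near -1 on the AIMD trajectory would suggest that the stored "forces" need a sign flip before training.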
Thank you very much. You were completely right: I was using gradients for both the AIMD and the single-point calculations. I have now achieved an energy MAE of ~0.1 meV/atom and a force RMSE of ~16 meV/Å (I plan to increase the number of epochs to further reduce the force error). The MD simulations are now running stably. I have one additional question: if I want to retrain the model using new data, is it acceptable to continue training from the existing checkpoint, or is it better practice to restart from the beginning and include all structures (old and new) in a single training set?