-
Notifications
You must be signed in to change notification settings - Fork 133
Add NEB, ApproxNEB jobs / workflows #1007
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add NEB, ApproxNEB jobs / workflows #1007
Conversation
for clarity
|
@JaGeo should be ready to merge in. There are a few simple tests of the workflows in the PR. From more extensive tests (especially of the VASP Approx / NEB) in our group, these seem to be performant. Avoiding adding more for now since these are longer-running flows Over time, we may want to modify the document schema a bit (I've already found at least one spot where metadata could be improved) but I would lean towards merging in since these are only additions @davidwaroquiers I'll release a new version after merging |
gpetretto
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @esoteric-ephemera, thanks a lot for the great work implementing this.
I did not go through the code in details, but testing it I came across a couple of potential issues that I mentioned in the comments below.
As an additional point I wanted to check if I am getting right that at the moment there is no equivalent of the NebFromEndpointsMaker for the ASE NEB. It is easy to create such a flow manually, but it may be convenient to have something like this as well.
| "You must pip install `pymatgen-analysis-diffusion` " | ||
| "to generate images with IDPP." | ||
| ) from exc | ||
| return IDPPSolver.from_endpoints( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that here an instance of IDPPSolver is returned. I think that the output of this function should be the output of the IDPPSolver.run() method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Will add some logic for separating out the IDPPSolver.run and constructor kwargs
src/atomate2/common/jobs/neb.py
Outdated
| """ | ||
| if interpolation_method == NebInterpolation.LINEAR: | ||
| return endpoints[0].interpolate( | ||
| endpoints[1], nimages=num_images, **interpolation_kwargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have seen that the pymatgen interpolate method behaves in a somewhat weird way: nimages+1 structures are returned: https://github.com/materialsproject/pymatgen/blob/f9d9fe8e0ce09ef30cc03bcc4e9937d27afd5a6a/src/pymatgen/core/structure.py#L2443
So you get a set of structures which is not nimages in total nor nimages plus the initial and final structure. Which I found quite unexpected.
In addition this is at variance with what the IDPPSolver does. In that case a total of nimages+2 structures is returned.
So the meaning of nimages varies depending on the inteprolation method.
I would suggest passing num_images+1 to interpolate. This would maintain consistency among the methods and would also seem more reasonable.
|
@gpetretto thanks for reviewing and testing! |
|
Thanks @gpetretto! Have drafted up the ASE NEB from endpoints maker, still needs to be incorporated with the forcefields. I'll get back to this next week |
Thanks a lot for putting all of this together! |
gpetretto
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if this comes as a second round of comments, but I have now tested also the VASP and ApproxNEB and encountered a few issues there as well.
See also the related PR on emmet: materialsproject/emmet#1255
src/atomate2/vasp/jobs/neb.py
Outdated
| }, | ||
| } | ||
| ) | ||
| lclimb: bool = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is supposed to be propagated to the NebSetGenerator or if it is a leftover of a previous option, but it looks like it is not used anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a holdover from earlier logic, thanks for the catch
src/atomate2/vasp/jobs/neb.py
Outdated
| } | ||
| ) | ||
| lclimb: bool = True | ||
| kpoints_kludge: Kpoints | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it may be good to add the docstrings for this option, as its use is quite obscure without looking at the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I'll move it into the main docstr (the explanation was buried in the maker) - will also change the default behavior to use this fix
| num_sites = len(images[0]) | ||
|
|
||
| tags = [os.getcwd()] | ||
| is_force_conv = all( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not try to do the math myself to see what the problem could be with this point, but in one of my tests I had an issue. The optimizer considered the forces converged and stopped the optimization after few loops, but this check was False and thus the task was marked as failed.
More in general, is there a need for this check specifically? optimize.run already returns a bool to specify if the optimization converged or not, why not using that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably has to do with the difference in NEB forces (interatomic + spring) vs plain interatomic forces. I'll take this out - I like the idea of using optimize.run to determine the task state, I'll update the other jobs as well
| run_vasp(**self.run_vasp_kwargs) | ||
|
|
||
| # parse vasp outputs | ||
| task_doc = get_vasp_task_document( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a generic comment on the NebFromImagesMaker, but I wanted to mention that I found somewhat confusing the fact that this job returns a NebTaskDoc, because by construction this will only contain the energies of the intermediate images and not those of the endpoints. As a consequence all the barrier analysis and values reported in the output document are wrong.
Even assuming that this is always used inside the NebFromEndpointsMaker the output database will contain two NebTaskDoc coming from the same flow, one of which is incomplete.
I am afraid I don't have a much better way of proceeding to propose. One potential suggestion would be that the output of this job should only be a minimal document with the list of folders and some information about the vasp calculation, relying on the fact that a collect_neb_output Job will be run as a subsequent step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, maybe there should be an NebIntermediateImageResult class that has this minimal info. Wouldn't need to be in emmet since this is just for book-keeping on the atomate2 side
| "PREC": "Normal", | ||
| "NSW": 99, | ||
| "LCHARG": False, | ||
| "IBRION": 2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if it was just because my example was too trivial, but with the default EDIFF the forces were not converging even for a large number of steps, probably due to the SCF not being properly converged. It instead converged was quite fast as soon as I lowered EDIFF. I don't have many test cases, but if you also encountered this issue it may be worth considering using a lower value as a default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always hard to say with NEB but EDIFF = 1e-6 or 1e-7 might be a safer default
| ) | ||
| run_vasp_kwargs: dict = field( | ||
| default_factory=lambda: { | ||
| "job_type": JobType.DOUBLE_RELAXATION, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I am missing something but it seems that with set_type="image" the INCAR containts ISIF=2 and thus volume is not changing. So what is the point of the double relaxation here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly for continuity / reproducibility of the original atomate version. Might be good to have a legacy / current split of the input sets, just like we do for the EOS workflows
| # compatibility with legacy input (list) | ||
| if isinstance(inserted_coords_dict, list): | ||
| inserted_coords_dict = dict(enumerate(inserted_coords_dict)) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to have a check here to verify that inserted_coords_dict and inserted_coords_combo are consistent. I had a typo in my inserted_coords_combo and this caused an error only late in the Flow execution, with no obvious link to the cause of the error.
| the mobile species in ApproxNEB | ||
| inserted_coords_dict: dict or list | ||
| a dictionary containing site coords (endpoints) for working ions | ||
| in the simulation cell |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better specify that these should be fractional coordinates. I had to look at the code to know if they had to be fractional or cartesian.
| """ | ||
| ep_jobs: list[Job] = [] | ||
| ep_output: dict[str, dict[str, Any]] = {} | ||
| for idx, ep in enumerate(end_point_structures): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is mentioned in the docstrings, but I would add a check that only two structures are actually passed in the end_point_structures.
In principle I suppose there would be no particular issue implementing this Flow accepting more than 2 endpoints and calculating all the subsequent jumps. Before checking the docstrings I was expecting it would have been possible to pass more than 2.
| ep_relax_output = {} | ||
| ep_relax_jobs = [] | ||
| for ep_index, ep_coords in endpoint_coords.items(): | ||
| if int(ep_index) in ep_distinct: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is redundant with the other check I am suggesting in CommonApproxNebMaker, but in case this Job ends up being used outside that flow it may be good to make a check here as well. In this loop if one of the keys in ep_distinct is not endpoint_coords the code will just pass without creating the relaxation job for that endpoint, but fail later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having both is a good fallback
| name: str = "Forcefield NEB from images" | ||
| force_field_name: str | MLFF = MLFF.Forcefield | ||
|
|
||
| @job(data=_FORCEFIELD_DATA_OBJECTS, schema=NebResult) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I just realized that here and in all the other jobs the schema is set through the schema argument, but it should be output_schema. Jobflow does not raise an error since a job accepts kwargs.
|
@gpetretto @JaGeo I think I've addressed all comments - any objections to merging? |
|
Also @fraricci I've added the checks you suggested for NPTBerendsen, plus a test for correctly setting the MD kwargs |
|
@JaGeo noticed I had some typos in the MACE + explicit D3 calculator, fixed those, added a better test, and am going to merge this PR |
|
Sound good @esoteric-ephemera ! |
Summary
Adding NEB workflows and schemas for VASP / ASE / MLFF. This hopefully brings atomate2 to full parity with the feature set of atomate
@hmlli is leading the port of ApproxNEB flows for VASP
Very open to suggestions, particularly for document schema structure
Key updates:
Miscellaneous