Describe the bug
I am running CREST on relatively large transition-metal complexes (>100 atoms) with GFN2-xTB. Some jobs take days on 48 cores, and the MTD step is the bottleneck.
One example:
-----------------
Wall Time Summary
-----------------
CREST runtime (total) 4 d, 10 h, 16 min, 1.196 sec
------------------------------------------------------------------
Trial metadynamics (MTD) ... 20 min, 57.004 sec ( 0.329%)
Metadynamics (MTD) ... 3809 min, 42.823 sec ( 59.751%)
Geometry optimization ... 1244 min, 51.169 sec ( 19.524%)
Molecular dynamics (MD) ... 1275 min, 33.403 sec ( 20.006%)
Genetic crossing (GC) ... 24 min, 40.559 sec ( 0.387%)
I/O and setup ... 0 min, 16.238 sec ( 0.004%)
------------------------------------------------------------------
* wall-time: 4 d, 10 h, 16 min, 1.196 sec
* cpu-time: 107 d, 23 h, 49 min, 24.465 sec
* ratio c/w: 24.390 speedup
------------------------------------------------------------------
* Total number of energy+grad calls: 4512415
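For reference, the c/w ratio in the summary can be reproduced from the two reported times. This is a minimal Python sketch: the helper `to_seconds` is my own and not part of CREST, and the 48-core count is simply what the job was given.

```python
import re

def to_seconds(text):
    """Convert a duration like '4 d, 10 h, 16 min, 1.196 sec' to seconds."""
    units = {"d": 86400.0, "h": 3600.0, "min": 60.0, "sec": 1.0}
    total = 0.0
    # Each field is a number followed by one of the unit labels.
    for value, unit in re.findall(r"([\d.]+)\s*(d|h|min|sec)\b", text):
        total += float(value) * units[unit]
    return total

wall = to_seconds("4 d, 10 h, 16 min, 1.196 sec")     # wall-time from the summary
cpu = to_seconds("107 d, 23 h, 49 min, 24.465 sec")   # cpu-time from the summary

ratio = cpu / wall        # average number of busy cores
efficiency = ratio / 48   # fraction of the 48 requested cores (my job setup)

print(f"c/w ratio: {ratio:.3f}, parallel efficiency: {efficiency:.1%}")
```

The ratio matches the reported 24.390, so on average only about half of the 48 cores were busy over the whole run.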
The MTDs themselves seem to be quick:
*MTD 9 completed successfully ... 15 min, 4.139 sec
*MTD 11 completed successfully ... 15 min, 22.552 sec
*MTD 8 completed successfully ... 18 min, 59.209 sec
*MTD 12 completed successfully ... 27 min, 34.236 sec
*MTD 6 completed successfully ... 31 min, 41.620 sec
*MTD 13 completed successfully ... 32 min, 11.275 sec
*MTD 2 completed successfully ... 38 min, 13.662 sec
*MTD 5 completed successfully ... 39 min, 32.141 sec
*MTD 3 completed successfully ... 40 min, 51.536 sec
*MTD 10 completed successfully ... 51 min, 4.949 sec
*MTD 7 completed successfully ... 57 min, 24.290 sec
*MTD 1 completed successfully ... 57 min, 56.692 sec
*MTD 4 completed successfully ... 59 min, 36.460 sec
*MTD 14 completed successfully ... 16 min, 1.645 sec
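To put these numbers next to the MTD entry in the wall-time summary, here is a rough Python sketch that sums the per-run times. It assumes each reported time is the elapsed time of that single run, which is my reading of the log and not something I have confirmed.

```python
import re

mtd_log = """\
*MTD 9 completed successfully ... 15 min, 4.139 sec
*MTD 11 completed successfully ... 15 min, 22.552 sec
*MTD 8 completed successfully ... 18 min, 59.209 sec
*MTD 12 completed successfully ... 27 min, 34.236 sec
*MTD 6 completed successfully ... 31 min, 41.620 sec
*MTD 13 completed successfully ... 32 min, 11.275 sec
*MTD 2 completed successfully ... 38 min, 13.662 sec
*MTD 5 completed successfully ... 39 min, 32.141 sec
*MTD 3 completed successfully ... 40 min, 51.536 sec
*MTD 10 completed successfully ... 51 min, 4.949 sec
*MTD 7 completed successfully ... 57 min, 24.290 sec
*MTD 1 completed successfully ... 57 min, 56.692 sec
*MTD 4 completed successfully ... 59 min, 36.460 sec
*MTD 14 completed successfully ... 16 min, 1.645 sec
"""

pattern = re.compile(
    r"\*MTD\s+(\d+) completed successfully\s*\.\.\.\s*(\d+) min,\s*([\d.]+) sec"
)

# Map MTD index -> run time in seconds.
runs = {int(i): int(m) * 60 + float(s) for i, m, s in pattern.findall(mtd_log)}

total_runs = sum(runs.values())   # upper bound: as if all runs were serial
mtd_step = 3809 * 60 + 42.823     # "Metadynamics (MTD)" entry in the summary
fraction = total_runs / mtd_step

print(f"{len(runs)} runs, {total_runs / 60:.1f} min in total, "
      f"{fraction:.1%} of the MTD step")
```

Even if the 14 runs had executed strictly one after another, they would add up to roughly 502 min, or about 13% of the 3809 min attributed to the MTD step. Most of that time therefore seems to be spent elsewhere.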
Tracking stdout sporadically, I notice that the jobs typically stall at this point of the MTD simulations:
...
========================================
MTD Simulations done
========================================
Collecting ensmbles.
CREGEN> running RMSDs ... done.
CREGEN> E lowest : -192.88698
45 structures remain within 6.00 kcal/mol window
init_shake: metal bond 1 2 not constrained
init_shake: metal bond 1 4 not constrained
init_shake: metal bond 1 21 not constrained
init_shake: metal bond 1 61 not constrained
init_shake: metal bond 2 49 not constrained
init_shake: metal bond 45 49 not constrained
init_shake: metal bond 49 51 not constrained
>>> the jobs will stall here <<<
===============================================
Additional regular MDs on lowest 4 conformer(s)
===============================================
...
top reports CPU utilization of approximately 6000%, so crest is apparently still running. Structures are also still being added to MD_FILES/crest_trj.*, so crest appears to be doing work despite printing nothing to stdout, but the additions are slow (roughly one every 2-3 minutes).
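To put a number on how slowly structures are appended, something like the following could be run periodically on one of the trajectory files. It is only a sketch: `count_xyz_frames` is a hypothetical helper of mine, and it assumes the `MD_FILES/crest_trj.*` files are plain multi-frame XYZ (an atom-count line, a comment line, then one line per atom), which I have not verified. The sample data is synthetic.

```python
def count_xyz_frames(lines):
    """Count complete frames in a multi-frame XYZ file given as a list of lines."""
    frames = 0
    i = 0
    while i < len(lines):
        header = lines[i].strip()
        if not header:
            i += 1            # skip stray blank lines between frames
            continue
        n_atoms = int(header)
        # A frame needs the count line, a comment line and n_atoms coordinate lines.
        if i + n_atoms + 2 > len(lines):
            break             # last frame is still being written
        frames += 1
        i += n_atoms + 2
    return frames

# Synthetic example: two complete 2-atom frames plus one partial frame.
sample = """\
2
frame 1
H 0.0 0.0 0.0
H 0.0 0.0 0.74
2
frame 2
H 0.0 0.0 0.0
H 0.0 0.0 0.75
2
frame 3
H 0.0 0.0 0.0
""".splitlines()

print(count_xyz_frames(sample))
```

For a real file, passing `open(path).read().splitlines()` at two points in time and dividing the difference in frame count by the elapsed time would give an append rate.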
I have removed the structure from the attached stdout log (happy to provide it via DM for testing). I am also considering switching this workflow to --gfn2//gfnff, which runs much faster on some other test cases.