Summary:
When cadCAD runs in parallel mode, it underutilizes the CPU: roughly 75% of the available cores sit idle. The problem is not that cadCAD fails to use the CPU. It thrashes the CPU by creating a new process pool for every config it wants to run in parallel. This in turn makes the process manager thrash, because memory is constantly allocated and then freed as those pools are created and torn down.
Motivation:
Improve cadCAD performance so it can use 100% of the CPU on a multi-core machine, which can save hours on a large simulation. It would also prevent too many process file handles from being opened while a simulation runs; I believe this may be related to #350.
Solution:
Refactor execution.py to use a single process pool, as the creators of the package intended: create the pool once and reuse CPUs as they become available, instead of creating a new pool for every config. I also suggest adding an option to write intermediate results to disk and to read them back without loading the entire dataset into memory.
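A minimal sketch of the "create the pool once" idea using the standard library's `concurrent.futures`. It is not cadCAD's execution.py or the code in the PR: `simulate_one`, `make_pool` and `run_all` are hypothetical names, and `simulate_one` merely stands in for a single simulation run. Every (config, run) job is submitted to one shared pool, so a worker that finishes a job immediately picks up the next queued one.

```python
import os
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Optional


def simulate_one(config_id: int, run: int) -> dict:
    # Hypothetical stand-in for a single cadCAD simulation run.
    return {"config": config_id, "run": run, "value": config_id * 100 + run}


def make_pool(max_workers: Optional[int] = None, use_processes: bool = True) -> Executor:
    # Create the pool ONCE for the whole simulation. Processes are what
    # actually spread CPU-bound Python work across cores; threads are only
    # convenient for quick debugging.
    cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    return cls(max_workers=max_workers or os.cpu_count())


def run_all(configs: list[int], runs_per_config: int, pool: Executor) -> list[dict]:
    # Submit every (config, run) job to the same pool up front, instead of
    # building and tearing down a new pool per config.
    futures = [
        pool.submit(simulate_one, cfg, run)
        for cfg in configs
        for run in range(runs_per_config)
    ]
    # Collect results in submission order so the output is deterministic.
    return [f.result() for f in futures]


if __name__ == "__main__":
    with make_pool() as pool:  # one pool, reused for all configs
        results = run_all([0, 1, 2], runs_per_config=4, pool=pool)
    print(len(results))
```

The `if __name__ == "__main__":` guard matters on platforms that start worker processes with `spawn`, where the main module is re-imported in each child.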
I found that once I increased parallelization, it was easy to run out of memory on a large simulation, simply because all intermediate results were held in memory. My temporary solution is to write these intermediate results to temporary files on disk. This prevents processes from running out of memory in a highly parallel environment (think 16 cores or more). The downside is that when the simulation completes, cadCAD currently requires you to load everything back into memory, which is memory-intensive as well.
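One way to spill intermediate results, sketched under the assumption that each run's output is a list of per-timestep dicts: the worker pickles its result into a temporary directory and returns only the file path, so the parent collects short strings rather than whole result sets. `simulate_and_spill` and the file-naming scheme are hypothetical, not part of cadCAD or the PR.

```python
import pickle
import tempfile
from pathlib import Path


def simulate_and_spill(config_id: int, run: int, out_dir: str) -> str:
    # Hypothetical stand-in for one run: a list of per-timestep records.
    records = [
        {"config": config_id, "run": run, "timestep": t, "value": t * (run + 1)}
        for t in range(5)
    ]
    path = Path(out_dir) / f"config{config_id}_run{run}.pkl"
    # Persist the full result and hand back only its path, so the parent
    # process does not accumulate every run's data in memory.
    with open(path, "wb") as fh:
        pickle.dump(records, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return str(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = [simulate_and_spill(0, run, tmp) for run in range(3)]
        print(paths)
```

Because `simulate_and_spill` is a module-level function, it can be submitted to a shared process pool with the temporary directory passed as an extra argument.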
The fix is to continue the refactor so that the final data is loaded into memory iteratively rather than all at once. My initial experiments to fix this problem used cadCAD 0.4.28. I am hopeful that the datacopy enhancement will reduce overall memory usage, which might make this less likely to be an issue, but I think the real solution is to load the results iteratively.
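Iterative loading could look roughly like the following sketch, which assumes results were spilled as one pickle file per run (as in a spill-to-disk approach). A generator yields records one file at a time, and a streaming aggregate consumes them, so at most one run's data is in memory at once. `iter_spilled_records` and `mean_value` are hypothetical helpers, not cadCAD functions.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def iter_spilled_records(out_dir: str) -> Iterator[dict]:
    # Yield records from each spilled run file, one file at a time.
    # Sorting the paths gives a deterministic order.
    for path in sorted(Path(out_dir).glob("*.pkl")):
        with open(path, "rb") as fh:
            records = pickle.load(fh)  # only unpickle files you wrote yourself
        yield from records


def mean_value(records: Iterable[dict]) -> float:
    # Streaming aggregate: keeps a running total and count, never a full list.
    total, count = 0.0, 0
    for rec in records:
        total += rec["value"]
        count += 1
    return total / count if count else float("nan")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Fake two spilled runs to demonstrate the reader.
        for run in range(2):
            with open(Path(tmp) / f"run{run}.pkl", "wb") as fh:
                pickle.dump([{"run": run, "value": float(v)} for v in range(3)], fh)
        print(mean_value(iter_spilled_records(tmp)))
```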
I'd be happy to make this configurable. The PR I've written always writes to and reads from disk; this should be a config option. Is this a direction worth pursuing? If so, I can make it production-ready.