@owlas @arokem I'm running `fci.random_forest_error` on a fairly large dataset:
train: 3,334,431 × 200
test: 13,703,350 × 200
(train is smaller after undersampling)
I'm trying to use both the `memory_constrained` version and the `low_memory` version (#74).
I ran the `memory_constrained` version on an m5.24xlarge EC2 instance with 384 GiB of memory and 96 vCPUs. I gave it a `memory_limit` of 100000 MB (100 GB). This only utilized about half of the memory on the instance, and it ran for over 48 hours until I just terminated the instance.
I'm currently running the `low_memory` option on an m5.12xlarge (192 GiB memory, 48 vCPUs), which has been running for 15 hours straight and hasn't finished yet. Using `top` I can see that all of the CPUs are being utilized at 100%.
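For scale, here is a back-of-envelope estimate of the chunking these sizes imply. It assumes (based on my reading of the code, so treat the exact shape as a guess) that the expensive intermediate is roughly an (n_train, chunk) float64 matrix per chunk of test points:

```python
# Rough sizing of the memory_constrained chunking.
# Assumption: the dominant intermediate is an (n_train, n_test_chunk)
# float64 matrix (e.g. the dot product of centered in-bag counts with
# centered per-tree test predictions).
n_train = 3_334_431
n_test = 13_703_350
bytes_per_float = 8

# Memory for one column (one test point) of the intermediate matrix, in MB.
mb_per_test_point = n_train * bytes_per_float / 1e6

# With memory_limit = 100_000 MB, the largest test chunk that fits:
memory_limit_mb = 100_000
chunk_size = int(memory_limit_mb // mb_per_test_point)

# Number of sequential passes needed to cover the whole test set
# (ceiling division).
n_chunks = -(-n_test // chunk_size)

print(f"{mb_per_test_point:.1f} MB per test point")
print(f"chunks of {chunk_size} test points, {n_chunks} sequential passes")
```

If that assumption is right, a 100 GB limit means thousands of sequential passes over chunks of only a few thousand test points each, which would go some way toward explaining the multi-day runtimes.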
I have a few questions:

- How can I estimate the ideal size of `memory_limit`? I understand from the docs that it's the maximum size of the intermediate matrices, but it wasn't clear to me how many intermediate matrices are created at a time. Is it sequential, i.e. can I just give it the whole RAM?
- Is the `memory_constrained` version faster than the `low_memory` option, given a large enough memory limit? It wasn't clear to me which one I should expect to complete.
- Is there any way to show a progress indicator (even if I have to hack in a print for now)? I'd like to know how close I am to completion with jobs that seem to take multiple days to run.
- Overall, I'm looking to precompute as much as possible and then run this model on live predictions one at a time. Looking at the code I believe this should be possible; does this sound doable to you?
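On the progress-indicator question: as far as I can tell forestci doesn't expose a callback, so the hack would be a print inside whatever chunk loop is being patched. A minimal sketch of that pattern (the function and names here are hypothetical, not forestci's API):

```python
import time

def process_in_chunks(n_items, chunk_size, work):
    """Run `work` on [start, stop) slices, printing progress after each chunk.

    Returns the total number of chunks processed.
    """
    n_chunks = -(-n_items // chunk_size)  # ceiling division
    t0 = time.monotonic()
    for i, start in enumerate(range(0, n_items, chunk_size), 1):
        stop = min(start + chunk_size, n_items)
        work(start, stop)
        elapsed = time.monotonic() - t0
        eta = elapsed / i * (n_chunks - i)  # naive linear extrapolation
        print(f"chunk {i}/{n_chunks} ({100 * i / n_chunks:.1f}%), "
              f"elapsed {elapsed:.0f}s, eta ~{eta:.0f}s")
    return n_chunks

# Toy run: 10 items in chunks of 4 -> 3 chunks, 3 progress lines.
process_in_chunks(10, 4, lambda start, stop: None)
```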
 
Any other advice you can offer would be most helpful.
Thanks!
EDIT: I'm trying the `memory_constrained` option again with a `memory_limit` of 300000 MB (300 GB). That limit does indeed appear to apply sequentially: the memory slowly crawls up to the max, all the cores kick in for a few minutes, and then the memory comes back down again.
Notably, while the memory is slowly filling up, only one core is being used. Only when the memory fills up all the way do most of the cores kick into action, for about 2 minutes. Then it takes about a minute for the memory to come back down, again with a single core in use. The cycle then starts again.
Is there perhaps some optimization that would let all the cores be used more efficiently? It seems the majority of the time is currently spent on a single core filling memory up to the limit, with the actual multicore computation happening only after that.
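One possible optimization, if the single-core phase is building the per-chunk data and the multicore burst is the BLAS dot product, would be to pipeline the two so the next chunk is prepared while the current one is computed. A rough sketch with a one-worker thread pool (this assumes the fill step releases the GIL, as NumPy and scikit-learn calls mostly do; otherwise a process pool would be needed):

```python
from concurrent.futures import ThreadPoolExecutor

def pipeline(chunks, produce, consume):
    """Overlap producing the next chunk with consuming the current one.

    `produce(chunk)` stands in for the slow single-threaded fill step and
    `consume(data)` for the multicore compute step. While `consume` runs on
    the main thread, the worker is already producing the next chunk.
    """
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(produce, chunks[0])
        for nxt in chunks[1:]:
            data = future.result()            # wait for the current chunk
            future = pool.submit(produce, nxt)  # prefetch the next chunk
            results.append(consume(data))     # compute while prefetching
        results.append(consume(future.result()))
    return results

# Toy run: produce doubles each chunk, consume adds one.
print(pipeline([1, 2, 3], lambda c: c * 2, lambda d: d + 1))
```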
As mentioned before, the `low_memory` option, on the other hand, constantly uses all cores at 100%.