@owlas @arokem I'm running `fci.random_forest_error` on a fairly large dataset:
train: 3,334,431 × 200
test: 13,703,350 × 200
(train is smaller after undersampling)
I'm trying to use both the `memory_constrained` version and the `low_memory` version (#74).
I ran the `memory_constrained` version on an m5.24xlarge EC2 instance with 384 GiB of memory and 96 vCPUs. I gave it a `memory_limit` of 100000 MB (100 GB). This only utilized about half of the memory on the instance, and it ran for over 48 hours until I just terminated the instance.
I'm currently running the `low_memory` option on an m5.12xlarge (192 GiB memory, 48 vCPUs), which has been running for 15 hours straight and hasn't finished yet. Using `top` I can see that all of the CPUs are being utilized at 100%.
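For scale, here is a back-of-envelope estimate of the chunking these sizes imply. It assumes (based on my reading of the code, so treat the exact shape as a guess) that the expensive intermediate is roughly an (n_train, chunk) float64 matrix per chunk of test points:

```python
# Rough sizing of the memory_constrained chunking.
# Assumption: the dominant intermediate is an (n_train, n_test_chunk)
# float64 matrix (e.g. the dot product of centered in-bag counts with
# centered per-tree test predictions).
n_train = 3_334_431
n_test = 13_703_350
bytes_per_float = 8

# Memory for one column (one test point) of the intermediate matrix, in MB.
mb_per_test_point = n_train * bytes_per_float / 1e6

# With memory_limit = 100_000 MB, the largest test chunk that fits:
memory_limit_mb = 100_000
chunk_size = int(memory_limit_mb // mb_per_test_point)

# Number of sequential passes needed to cover the whole test set
# (ceiling division).
n_chunks = -(-n_test // chunk_size)

print(f"{mb_per_test_point:.1f} MB per test point")
print(f"chunks of {chunk_size} test points, {n_chunks} sequential passes")
```

If that assumption is right, a 100 GB limit means thousands of sequential passes over chunks of only a few thousand test points each, which would go some way toward explaining the multi-day runtimes.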
I have a few questions:

- How can I estimate the ideal size of `memory_limit`? I understand from the docs that it's the maximum size of the intermediate matrices, but it wasn't clear to me how many intermediate matrices are created at a time. Is it sequential, i.e. can I just give it the whole RAM?
- Is the `memory_constrained` version faster than the `low_memory` option, given a large enough memory limit? It wasn't clear to me which one I should expect to complete.
- Is there any way to show a progress indicator (even if I have to hack in a print for now)? I'd like to know how close I am to completion with jobs that seem to take multiple days to run.
- Overall, I'm looking to precompute as much as possible and then run this model on live predictions one at a time. Looking at the code I believe this should be possible; does this sound doable to you?
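On the progress-indicator question: as far as I can tell forestci doesn't expose a callback, so the hack would be a print inside whatever chunk loop is being patched. A minimal sketch of that pattern (the function and names here are hypothetical, not forestci's API):

```python
import time

def process_in_chunks(n_items, chunk_size, work):
    """Run `work` on [start, stop) slices, printing progress after each chunk.

    Returns the total number of chunks processed.
    """
    n_chunks = -(-n_items // chunk_size)  # ceiling division
    t0 = time.monotonic()
    for i, start in enumerate(range(0, n_items, chunk_size), 1):
        stop = min(start + chunk_size, n_items)
        work(start, stop)
        elapsed = time.monotonic() - t0
        eta = elapsed / i * (n_chunks - i)  # naive linear extrapolation
        print(f"chunk {i}/{n_chunks} ({100 * i / n_chunks:.1f}%), "
              f"elapsed {elapsed:.0f}s, eta ~{eta:.0f}s")
    return n_chunks

# Toy run: 10 items in chunks of 4 -> 3 chunks, 3 progress lines.
process_in_chunks(10, 4, lambda start, stop: None)
```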
 
Any other advice you can offer would be most helpful.
Thanks!
EDIT: I'm trying the `memory_constrained` option again with a `memory_limit` of 300000 MB (300 GB). That limit does indeed appear to apply sequentially: the memory slowly crawls up to the max, all the cores kick in for a few minutes, and then the memory comes back down again.
Notably, while the memory is slowly filling up, only one core is being used. Only when the memory fills up all the way do most of the cores kick into action, for about 2 minutes. Then it takes about a minute for the memory to come back down, again with a single core in use. The cycle then starts again.
Is there perhaps some optimization that would let all the cores be used more efficiently? It seems the majority of the time is currently spent on a single core filling memory up to the limit, with the actual multicore computation happening only after that.
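One possible optimization, if the single-core phase is building the per-chunk data and the multicore burst is the BLAS dot product, would be to pipeline the two so the next chunk is prepared while the current one is computed. A rough sketch with a one-worker thread pool (this assumes the fill step releases the GIL, as NumPy and scikit-learn calls mostly do; otherwise a process pool would be needed):

```python
from concurrent.futures import ThreadPoolExecutor

def pipeline(chunks, produce, consume):
    """Overlap producing the next chunk with consuming the current one.

    `produce(chunk)` stands in for the slow single-threaded fill step and
    `consume(data)` for the multicore compute step. While `consume` runs on
    the main thread, the worker is already producing the next chunk.
    """
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(produce, chunks[0])
        for nxt in chunks[1:]:
            data = future.result()            # wait for the current chunk
            future = pool.submit(produce, nxt)  # prefetch the next chunk
            results.append(consume(data))     # compute while prefetching
        results.append(consume(future.result()))
    return results

# Toy run: produce doubles each chunk, consume adds one.
print(pipeline([1, 2, 3], lambda c: c * 2, lambda d: d + 1))
```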
As mentioned before, the `low_memory` option, on the other hand, constantly uses all cores at 100%.