Understanding the behaviour of Modin #6226
Replies: 1 comment 2 replies
Hi @overseek944!
We read each part of the csv file into a temporary buffer, and then pass that buffer as input to the read function of pandas itself. That is, at the peak moment, we can have on each process both a buffer and a pandas dataframe created from it, which can be roughly estimated as 2 times more memory. This is also true for reading json files if they are created with
I believe that in this case the reason is only a lack of RAM. Therefore, you need to either reduce the number of files that you use, or increase the amount of RAM on the machine. |
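The "reduce the number of files" suggestion can be sketched with plain pandas by loading files in small batches, so that the temporary buffers plus the resulting frames for one batch, rather than for all 5000+ files, are what set the peak. Everything here is hypothetical, not Modin's API: the `read_json_in_batches` helper, the batch size, and the assumption that the files are newline-delimited JSON (`lines=True`).

```python
import glob
import pandas as pd  # plain pandas, to keep the sketch dependency-free

def read_json_in_batches(pattern, batch_size=100):
    """Read JSON files matching `pattern` in groups of `batch_size`.

    Only one batch of per-file frames is alive at a time, so peak memory
    is bounded by roughly 2x the size of a single batch rather than 2x
    the size of the whole dataset.
    """
    paths = sorted(glob.glob(pattern))
    frames = []
    for i in range(0, len(paths), batch_size):
        # Assumes newline-delimited JSON; drop lines=True for array-style files.
        batch = [pd.read_json(p, lines=True) for p in paths[i:i + batch_size]]
        frames.append(pd.concat(batch, ignore_index=True))
    return pd.concat(frames, ignore_index=True)
```

Note that the final `concat` still needs RAM for the full combined frame, so this only helps when the bottleneck is the transient read overhead, not the size of the end result.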
Data details
Number of files - 5000+ JSON files
File size - ~70 MB each
Total size - ~350 GB
Instance details
Type - ml.m5.24xlarge
Memory - 384 GiB
vCPUs - 96
Hi Team,
I want to understand the potential reasons for this failure and how it can be fixed.
I am running this in a TrainingJob using SKLearnProcessor. It is not distributed, so I am using just a single ml.m5.24xlarge instance.
Reference -
https://stackoverflow.com/questions/76043804/ray-workers-being-killed-because-of-oom-pressure
I have gone through the above post, but I still do not understand how reading can take up to 2x memory overhead.
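A back-of-envelope estimate from the numbers above shows why the job runs out of memory under the ~2x read overhead described in the reply (temporary buffer plus the resulting DataFrame). All figures are taken from this thread; nothing here is measured:

```python
# Rough peak-memory estimate for reading all files at once,
# assuming ~2x overhead (read buffer + resulting pandas frame).
n_files = 5000
file_size_gb = 0.07                 # ~70 MB per file
data_gb = n_files * file_size_gb    # ~350 GB of raw JSON
overhead_factor = 2                 # buffer + DataFrame held simultaneously
peak_gb = data_gb * overhead_factor # ~700 GB at peak
ram_gb = 384                        # ml.m5.24xlarge memory

print(f"estimated peak: {peak_gb:.0f} GB vs {ram_gb} GB of RAM")
```

Even before any per-process duplication, ~700 GB of estimated peak usage comfortably exceeds the instance's 384 GiB, which matches the OOM kills seen in the linked Stack Overflow post.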