map_sync with pandas operation function does not finish. #844

@yun881201

map_sync with a pandas operation function does not finish.

I have a very long DataFrame, so I split it into 40 sub-DataFrames and apply a pandas operation to all 40 in parallel using map_sync. The pandas operation is just a groupby followed by an apply.

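My code looks like this:

    import numpy as np
    import ipyparallel as ipp

    PEN = 40  # number of engines, and number of sub-DataFrames
    dfs = np.array_split(target_df, PEN)
    c = ipp.Cluster(n=PEN)
    with c as rc:
        e_all = rc[:]  # DirectView over all engines
        results = e_all.map_sync(FUNCTION, dfs)
    results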

I have 30 target_dfs. For the first 10, map_sync worked fine, but after that map_sync never completed.
Without parallelism, the same pandas job applied to a single target_df completes in under 2 hours.
I am on Windows, and my ipyparallel version is the latest.
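
For anyone trying to reproduce this, here is a minimal sketch of the serial version of the same pattern, with no ipyparallel involved. Everything in it is an assumption made for illustration: summarize_groups is a hypothetical stand-in for FUNCTION, the column names key and value are made up, and the tiny synthetic frame stands in for one target_df. The split uses positional indices with iloc rather than passing the DataFrame itself to np.array_split.

```python
import numpy as np
import pandas as pd


def summarize_groups(df):
    """Hypothetical stand-in for FUNCTION: a groupby followed by apply."""
    return df.groupby("key")["value"].apply(lambda s: s.max() - s.min())


def split_rows(df, n_chunks):
    """Split a DataFrame into n_chunks contiguous row blocks by position."""
    positions = np.array_split(np.arange(len(df)), n_chunks)
    return [df.iloc[idx] for idx in positions]


# Small synthetic frame standing in for one target_df (assumed columns).
target_df = pd.DataFrame(
    {"key": np.arange(1000) % 7, "value": np.arange(1000.0)}
)

chunks = split_rows(target_df, 40)

# Serial baseline: the built-in map is called the same way as map_sync,
# so the same function and chunks can be timed without any engines.
results = list(map(summarize_groups, chunks))
```

Note that splitting by row position means a single group can end up spread over several chunks, so the per-chunk results are not necessarily the same as running the groupby on the whole frame.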
