Open
Description
map_sync with a pandas operation function does not finish.
I have a very long DataFrame, so I split it into 40 sub-DataFrames and apply a pandas operation to all 40 in parallel using map_sync. The pandas operation is just a groupby followed by an apply.
My code is like this:
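import numpy as np
import ipyparallel as ipp

PEN = 40  # number of engines, and also the number of sub-DataFrames
dfs = np.array_split(target_df, PEN)  # split target_df by rows
c = ipp.Cluster(n=PEN)
with c as rc:
    e_all = rc[:]  # DirectView over all engines
    results = e_all.map_sync(FUNCTION, dfs)  # FUNCTION is the groupby/apply job
results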
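For anyone trying to reproduce this, a minimal stand-in for the job might look like the sketch below. Everything in it is hypothetical: the column names `key` and `value`, the helper names `summarize` and `split_rows`, and the toy data are assumptions, not the real FUNCTION or target_df. It also shows one thing worth checking when splitting by rows: a group whose rows land in different chunks is summarized once per chunk, so the per-chunk results are not necessarily the same as a single groupby over the whole frame.

```python
import numpy as np
import pandas as pd


def summarize(chunk):
    """Hypothetical stand-in for FUNCTION: a groupby followed by an apply."""
    return chunk.groupby("key")["value"].apply(lambda s: s.max() - s.min())


def split_rows(df, n):
    """Split df into n contiguous row blocks (roughly what a row-wise split does)."""
    bounds = np.linspace(0, len(df), n + 1).astype(int)
    return [df.iloc[a:b] for a, b in zip(bounds[:-1], bounds[1:])]


# Toy data (assumption): two groups whose rows are interleaved.
df = pd.DataFrame(
    {"key": ["a", "a", "b", "b", "a", "b"], "value": [1, 5, 2, 9, 7, 4]}
)

serial = summarize(df)  # one groupby over the whole frame
chunked = pd.concat([summarize(c) for c in split_rows(df, 3)])  # per-chunk results

# Groups that span several chunks show up more than once in `chunked`.
print(serial)
print(chunked)
```

If the real job needs each group to be processed whole, splitting on the group key instead of on row position would avoid the duplicated groups.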
I have 30 target_dfs. For the first 10, map_sync worked fine, but after that it never completed.
I have found that without parallelism, the pandas job applied to target_df completes in under 2 hours.
I am on Windows, and my ipyparallel version is the latest.
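One way to narrow down where the run stalls is to swap the blocking map_sync for map_async and poll the returned AsyncResult against a deadline. The helper below is only a sketch: the name `wait_with_deadline` and its parameters are hypothetical, and it relies only on `ready()`, `wait(timeout)`, `progress`, and `len()` of the result object. It does not fix the hang; it just reports how many tasks finished before the deadline.

```python
import time


def wait_with_deadline(ar, deadline_s, poll_s=5.0, report=print):
    """Poll an AsyncResult-like object until it is ready or deadline_s elapses.

    Returns True if the result became ready, False if the deadline passed first.
    """
    start = time.monotonic()
    while not ar.ready():
        elapsed = time.monotonic() - start
        if elapsed >= deadline_s:
            report(f"gave up after {elapsed:.0f}s: {ar.progress}/{len(ar)} tasks done")
            return False
        report(f"{ar.progress}/{len(ar)} tasks done after {elapsed:.0f}s")
        # Block for at most poll_s, and never past the deadline.
        ar.wait(min(poll_s, deadline_s - elapsed))
    return True


# Intended use inside the `with c as rc:` block (not run here):
#   ar = e_all.map_async(FUNCTION, dfs)
#   if wait_with_deadline(ar, deadline_s=3 * 3600):
#       results = ar.get()
#   else:
#       print(rc.queue_status())  # see which engines still hold unfinished work
```

A deadline somewhat above the serial runtime (under 2 hours here) seems a reasonable starting point, since a parallel run that takes longer than that is already not helping.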