Open
Description
I was wondering if it would be pertinent to add a halve method for a GroupedDataFrame (as a package extension). It is fairly simple to write:
function halve(gdf::GroupedDataFrame)
    (left, right) = halve(keys(gdf))
    return (gdf[left], gdf[right])
end
It is useful because DataFrames will still pick and choose when to spawn threads in its combine method, even when you have the threads=true kwarg set. In practice I have found a few cases recently where the DataFrames implementation runs single-threaded only, whereas a Folds-based reducer like this one has utilized all CPU cores and sped up my computations:
init = DataFrame(...) # empty, correct columns and types
Folds.mapreduce(vcat, groupby(df, :key); init) do subdf
    ...
end