|
| 1 | +# TidierData.jl is built on DataFrames.jl. |
| 2 | + |
| 3 | +# This section compares the syntax of the two packages side by side. |
| 4 | +# |
| 5 | +# It is adapted from the DataFrames.jl documentation [comparing different workflows](https://dataframes.juliadata.org/stable/man/comparisons/#Comparison-with-the-R-package-dplyr). |
| 6 | + |
| 7 | +# To run the examples, first create these two data frames. |
| 8 | + |
| 9 | +# ```julia |
| 10 | +# using DataFrames, TidierData # TidierData re-exports Statistics.jl, so it does not need to be loaded explicitly. |
| 11 | +# df = DataFrame(grp = repeat(1:2, 3), x = 6:-1:1, y = 4:9, z = [3:7; missing], id = 'a':'f') |
| 12 | +# df2 = DataFrame(grp = [1, 3], w = [10, 11]) |
| 13 | +# ``` |
| 14 | + |
| 15 | +# ## Basic Operations |
| 16 | +# | Operation | TidierData.jl | DataFrames.jl | |
| 17 | +# |:-------------------------|:-------------------------------------|:---------------------------------------| |
| 18 | +# | Reduce multiple values | `@summarize(df, mean_x = mean(x))` | `combine(df, :x => mean)` | |
| 19 | +# | Add new columns          | `@mutate(df, mean_x = mean(x))`      | `transform(df, :x => mean => :mean_x)` | |
| 20 | +# | Rename columns | `@rename(df, x_new = x)` | `rename(df, :x => :x_new)` | |
| 21 | +# | Pick columns | `@select(df, x, y)` | `select(df, :x, :y)` | |
| 22 | +# | Pick & transform columns | `@transmute(df, mean_x = mean(x), y)`| `select(df, :x => mean, :y)` | |
| 23 | +# | Pick rows | `@filter(df, x >= 1)` | `subset(df, :x => ByRow(x -> x >= 1))` | |
| 24 | +# | Sort rows | `@arrange(df, x)` | `sort(df, :x)` | |
| 25 | + |
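The equivalences in the table above can be checked directly. The following is a minimal sketch, assuming both packages are installed and using the `df` defined earlier; the variable names (`tidier_pick`, `frames_pick`, and so on) are illustrative and not part of either package. `isequal` is used instead of `==` because `df` contains a `missing` value.

```julia
using DataFrames, TidierData

df = DataFrame(grp = repeat(1:2, 3), x = 6:-1:1, y = 4:9, z = [3:7; missing], id = 'a':'f')

# Pick columns: both calls should keep only x and y.
tidier_pick = @select(df, x, y)
frames_pick = select(df, :x, :y)
@show isequal(tidier_pick, frames_pick)

# Rename columns: x becomes x_new in both results.
tidier_renamed = @rename(df, x_new = x)
frames_renamed = rename(df, :x => :x_new)
@show isequal(tidier_renamed, frames_renamed)

# Pick rows: the TidierData condition is written without an explicit ByRow.
tidier_rows = @filter(df, x >= 3)
frames_rows = subset(df, :x => ByRow(x -> x >= 3))
@show isequal(tidier_rows, frames_rows)

# Sort rows in ascending order of x.
tidier_sorted = @arrange(df, x)
frames_sorted = sort(df, :x)
@show isequal(tidier_sorted, frames_sorted)
```

Each `@show` line prints whether the two results hold the same columns and values, which makes it easy to spot a row where the two syntaxes diverge.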
| 26 | +# As in DataFrames.jl, some of these operations can be applied by group to a grouped data frame. |
| 27 | +# The examples below chain TidierData macros together with `@chain`. |
| 28 | + |
| 29 | +# ## Grouped DataFrames |
| 30 | +# | Operation | TidierData.jl | DataFrames.jl | |
| 31 | +# |:-------------------------|:-----------------------------------------------------------|:--------------------------------------------| |
| 32 | +# | Reduce multiple values | `@chain df @group_by(grp) @summarize(mean_x = mean(x))` | `combine(groupby(df, :grp), :x => mean)` | |
| 33 | +# | Add new columns | `@chain df @group_by(grp) @mutate(mean_x = mean(x))` | `transform(groupby(df, :grp), :x => mean)` | |
| 34 | +# | Pick & transform columns | `@chain df @group_by(grp) @select(mean_x = mean(x), y)` | `select(groupby(df, :grp), :x => mean, :y)` | |
| 35 | + |
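The one-line chains in the table can also be written in the multi-line `begin ... end` form of `@chain`, which is often easier to read. This is a sketch under the assumption that `@chain` is available after `using TidierData`, as the table rows imply; `tidier_means` and `frames_means` are illustrative names. Loading Statistics explicitly is not required here and is shown only to keep the snippet self-contained.

```julia
using DataFrames, TidierData
using Statistics  # optional: shown so that `mean` is clearly in scope

df = DataFrame(grp = repeat(1:2, 3), x = 6:-1:1, y = 4:9, z = [3:7; missing], id = 'a':'f')

# Each step receives the result of the previous step as its first argument.
tidier_means = @chain df begin
    @group_by(grp)
    @summarize(mean_x = mean(x))
end

# DataFrames.jl names the output column automatically (x_mean).
frames_means = combine(groupby(df, :grp), :x => mean)

# The column names differ, so compare the values after sorting by group.
@show sort(tidier_means, :grp).mean_x
@show sort(frames_means, :grp).x_mean
```

The two results differ only in the name of the summary column: TidierData uses the name given on the left of `=`, while DataFrames.jl generates one unless a target name is supplied.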
| 36 | +# ## Advanced Operations |
| 37 | + |
| 38 | +# | Operation | TidierData.jl | DataFrames.jl | |
| 39 | +# |:--------------------------|:----------------------------------------------------------|:---------------------------------------------------------------------------| |
| 40 | +# | Complex Function | `@summarize(df, mean_x = mean(skipmissing(x)))` | `combine(df, :x => x -> mean(skipmissing(x)))` | |
| 41 | +# | Transform several columns | `@summarize(df, x_max = maximum(x), y_min = minimum(y))` | `combine(df, :x => maximum => :x_max, :y => minimum => :y_min)` | |
| 42 | +# | | `@summarize(df, across((x, y), mean))` | `combine(df, [:x, :y] .=> mean)` | |
| 43 | +# | | `@summarize(df, across(starts_with("x"), mean))` | `combine(df, names(df, r"^x") .=> mean)` | |
| 44 | +# | | `@summarize(df, across((x, y), (maximum, minimum)))` | `combine(df, ([:x, :y] .=> [maximum minimum])...)` | |
| 45 | +# | DataFrame as output       | `@summarize(df, value = [minimum(x), maximum(x)])`        | `combine(df, :x => (x -> (value = [minimum(x), maximum(x)],)) => AsTable)` | |
| 46 | + |
| 47 | + |
| 48 | +# ## Joining DataFrames |
| 49 | + |
| 50 | +# | Operation | TidierData.jl | DataFrames.jl | |
| 51 | +# |:----------------------|:------------------------------------------------|:--------------------------------| |
| 52 | +# | Inner join | `@inner_join(df, df2, grp)` | `innerjoin(df, df2, on = :grp)` | |
| 53 | +# | Outer join            | `@full_join(df, df2, grp)`                      | `outerjoin(df, df2, on = :grp)` | |
| 54 | +# | Left join | `@left_join(df, df2, grp)` | `leftjoin(df, df2, on = :grp)` | |
| 55 | +# | Right join | `@right_join(df, df2, grp)` | `rightjoin(df, df2, on = :grp)` | |
| 56 | +# | Anti join (filtering) | `@anti_join(df, df2, grp)` | `antijoin(df, df2, on = :grp)` | |
| 57 | +# | Semi join (filtering) | `@semi_join(df, df2, grp)` | `semijoin(df, df2, on = :grp)` | |
| 58 | + |
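To see how the mutating and filtering joins differ, the sketch below runs an inner, semi, and anti join in both syntaxes on the `df` and `df2` defined earlier. It assumes both packages are installed; the result names are illustrative. Only shapes are compared, since this sketch makes no assumption about row order.

```julia
using DataFrames, TidierData

df = DataFrame(grp = repeat(1:2, 3), x = 6:-1:1, y = 4:9, z = [3:7; missing], id = 'a':'f')
df2 = DataFrame(grp = [1, 3], w = [10, 11])

# Inner join: keeps rows whose grp value appears in both tables and adds w.
tidier_inner = @inner_join(df, df2, grp)
frames_inner = innerjoin(df, df2, on = :grp)

# Semi join: keeps rows of df that have a match in df2, without adding columns.
tidier_semi = @semi_join(df, df2, grp)
frames_semi = semijoin(df, df2, on = :grp)

# Anti join: keeps rows of df that have no match in df2.
tidier_anti = @anti_join(df, df2, grp)
frames_anti = antijoin(df, df2, on = :grp)

# Print (rows, columns) for each pair of results.
for (label, t, f) in [("inner", tidier_inner, frames_inner),
                      ("semi", tidier_semi, frames_semi),
                      ("anti", tidier_anti, frames_anti)]
    println(label, ": TidierData ", size(t), ", DataFrames ", size(f))
end
```

The filtering joins return only the columns of `df`, which is why their column count is lower than that of the inner join.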