Commit 70b35d4

drizk1 and kdpsingh authored
adds comparison docs (#112)
* adds comparison docs
* added missing space
* fixes `n` slice_min/max bug (#110)
* fixes `n` slice_min/max bug
* adds `@head`
* Clean up documentation in prep for release, bump version to v0.16.2.
* Fix doctest.
---------
Co-authored-by: Karandeep Singh <[email protected]>
* adds extra for sep and remove for unite (#113)
* adds extra for sep and remove for unite
* switch from `warn` ex to `drop` ex in docstring
* add :cat_other, :cat_replace_missing, :cat_recode to donotvec list
* fixes `n` slice_min/max bug (#110)
* fixes `n` slice_min/max bug
* adds `@head`
* Clean up documentation in prep for release, bump version to v0.16.2.
* Fix doctest.
---------
Co-authored-by: Karandeep Singh <[email protected]>
* Cleaned up docstrings.
* Clean up NEWS.md
---------
Co-authored-by: Karandeep Singh <[email protected]>
* Clean up comparison docs.
---------
Co-authored-by: Karandeep Singh <[email protected]>
1 parent ad1e8b5 commit 70b35d4

2 files changed: +59 -0 lines changed
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+# TidierData.jl is built on DataFrames.jl.
+
+# This section directly compares the syntax of the two packages.
+#
+# This documentation is based directly on the DataFrames.jl documentation [comparing different workflows](https://dataframes.juliadata.org/stable/man/comparisons/#Comparison-with-the-R-package-dplyr).
+
+# To run these examples, use these two data frames.
+
+# ```julia
+# using DataFrames, TidierData # TidierData re-exports Statistics.jl, so it does not need to be loaded explicitly.
+# df = DataFrame(grp = repeat(1:2, 3), x = 6:-1:1, y = 4:9, z = [3:7; missing], id = 'a':'f')
+# df2 = DataFrame(grp = [1, 3], w = [10, 11])
+# ```
+
+# ## Basic Operations
+# | Operation                | TidierData.jl                        | DataFrames.jl                          |
+# |:-------------------------|:-------------------------------------|:---------------------------------------|
+# | Reduce multiple values   | `@summarize(df, mean_x = mean(x))`   | `combine(df, :x => mean)`              |
+# | Add new columns          | `@mutate(df, mean_x = mean(x))`      | `transform(df, :x => mean => :x_mean)` |
+# | Rename columns           | `@rename(df, x_new = x)`             | `rename(df, :x => :x_new)`             |
+# | Pick columns             | `@select(df, x, y)`                  | `select(df, :x, :y)`                   |
+# | Pick & transform columns | `@transmute(df, mean_x = mean(x), y)` | `select(df, :x => mean, :y)`          |
+# | Pick rows                | `@filter(df, x >= 1)`                | `subset(df, :x => ByRow(x -> x >= 1))` |
+# | Sort rows                | `@arrange(df, x)`                    | `sort(df, :x)`                         |
+
+# As in DataFrames.jl, some of these functions can operate by group on a grouped data frame.
+# The examples below chain TidierData macros together with `@chain`.
+
+# ## Grouped DataFrames
+# | Operation                | TidierData.jl                                              | DataFrames.jl                               |
+# |:-------------------------|:-----------------------------------------------------------|:--------------------------------------------|
+# | Reduce multiple values   | `@chain df @group_by(grp) @summarize(mean_x = mean(x))`    | `combine(groupby(df, :grp), :x => mean)`    |
+# | Add new columns          | `@chain df @group_by(grp) @mutate(mean_x = mean(x))`       | `transform(groupby(df, :grp), :x => mean)`  |
+# | Pick & transform columns | `@chain df @group_by(grp) @transmute(mean_x = mean(x), y)` | `select(groupby(df, :grp), :x => mean, :y)` |
+
+# ## More Advanced Commands
+
+# | Operation                 | TidierData.jl                                             | DataFrames.jl                                                              |
+# |:--------------------------|:----------------------------------------------------------|:---------------------------------------------------------------------------|
+# | Complex function          | `@summarize(df, mean_x = mean(skipmissing(x)))`           | `combine(df, :x => x -> mean(skipmissing(x)))`                             |
+# | Transform several columns | `@summarize(df, x_max = maximum(x), y_min = minimum(y))`  | `combine(df, :x => maximum => :x_max, :y => minimum => :y_min)`            |
+# |                           | `@summarize(df, across((x, y), mean))`                    | `combine(df, [:x, :y] .=> mean)`                                           |
+# |                           | `@summarize(df, across(starts_with("x"), mean))`          | `combine(df, names(df, r"^x") .=> mean)`                                   |
+# |                           | `@summarize(df, across((x, y), (maximum, minimum)))`      | `combine(df, ([:x, :y] .=> [maximum minimum])...)`                         |
+# | DataFrame as output       | `@summarize(df, test = [minimum(x), maximum(x)])`         | `combine(df, :x => (x -> (value = [minimum(x), maximum(x)],)) => AsTable)` |
+
+
+# ## Joining DataFrames
+
+# | Operation             | TidierData.jl                                   | DataFrames.jl                   |
+# |:----------------------|:------------------------------------------------|:--------------------------------|
+# | Inner join            | `@inner_join(df, df2, grp)`                     | `innerjoin(df, df2, on = :grp)` |
+# | Outer join            | `@full_join(df, df2, grp)`                      | `outerjoin(df, df2, on = :grp)` |
+# | Left join             | `@left_join(df, df2, grp)`                      | `leftjoin(df, df2, on = :grp)`  |
+# | Right join            | `@right_join(df, df2, grp)`                     | `rightjoin(df, df2, on = :grp)` |
+# | Anti join (filtering) | `@anti_join(df, df2, grp)`                      | `antijoin(df, df2, on = :grp)`  |
+# | Semi join (filtering) | `@semi_join(df, df2, grp)`                      | `semijoin(df, df2, on = :grp)`  |
+

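The "Reduce multiple values" row of the grouped table can be checked with a short sketch that runs both forms on the `df` defined in the added docs. This is an illustration, not part of the commit. It assumes DataFrames.jl and TidierData.jl are installed, and the names `tidier_out` and `dataframes_out` are hypothetical. The DataFrames.jl call names its output column `:mean_x` explicitly so the two results line up. The table row omits that, so its output column would be `x_mean`.

```julia
using DataFrames, TidierData  # per the docs, TidierData re-exports Statistics, so `mean` is available

# Same example data frame as in the added documentation
df = DataFrame(grp = repeat(1:2, 3), x = 6:-1:1, y = 4:9, z = [3:7; missing], id = 'a':'f')

# TidierData.jl: group, then summarize within each group
tidier_out = @chain df begin
    @group_by(grp)
    @summarize(mean_x = mean(x))
end

# DataFrames.jl: the same reduction, with the output column named explicitly
dataframes_out = combine(groupby(df, :grp), :x => mean => :mean_x)

# Group 1 holds x = 6, 4, 2 and group 2 holds x = 5, 3, 1,
# so both results should report means of 4.0 and 3.0.
println(tidier_out)
println(dataframes_out)
```

The multi-line `@chain ... begin ... end` form is used here only for readability; the one-line form in the table is intended to express the same pipeline.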
docs/mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -139,5 +139,6 @@ nav:
 - "Interpolation" : "examples/generated/UserGuide/interpolation.md"
 - "Auto-vectorization" : "examples/generated/UserGuide/autovec.md"
 # - "Benchmarking" : "examples/generated/UserGuide/benchmarking.md"
+- "Comparison to DF.jl" : "examples/generated/UserGuide/comparisons.md"
 - "Contribute" : "examples/generated/Contributors/Howto.md"
 - "Reference" : "reference.md"
