|
| 1 | +--- |
| 2 | +output: hugodown::hugo_document |
| 3 | + |
| 4 | +slug: duckplyr-1-0-0 |
| 5 | +title: duckplyr fully joins the tidyverse! |
| 6 | +date: 2025-02-11 |
| 7 | +author: Kirill Müller and Maëlle Salmon |
| 8 | +description: > |
| 9 | + duckplyr 1.0.0 is on CRAN and part of the tidyverse! duckplyr is a drop-in |
| 10 | + replacement for dplyr, powered by DuckDB for speed. |
| 11 | + |
| 12 | +photo: |
| 13 | + url: https://www.pexels.com/photo/a-mallard-duck-on-water-6918877/ |
| 14 | + author: Kiril Gruev |
| 15 | + |
| 16 | +# one of: "deep-dive", "learn", "package", "programming", "roundup", or "other" |
| 17 | +categories: [package] |
| 18 | +tags: |
| 19 | + - duckplyr |
| 20 | + - dplyr |
| 21 | + - tidyverse |
| 22 | +--- |
| 23 | + |
| 24 | +<!-- |
| 25 | +TODO: |
| 26 | +* [x] Look over / edit the post's title in the yaml |
| 27 | +* [x] Edit (or delete) the description; note this appears in the Twitter card |
| 28 | +* [x] Pick category and tags (see existing with `hugodown::tidy_show_meta()`) |
| 29 | +* [x] Find photo & update yaml metadata |
| 30 | +* [ ] Create `thumbnail-sq.jpg`; height and width should be equal |
| 31 | +* [ ] Create `thumbnail-wd.jpg`; width should be >5x height |
| 32 | +* [ ] `hugodown::use_tidy_thumbnails()` |
| 33 | +* [x] Add intro sentence, e.g. the standard tagline for the package |
| 34 | +* [x] `usethis::use_tidy_thanks()` |
| 35 | +--> |
| 36 | + |
| 37 | +We're very chuffed to announce the release of [duckplyr](https://duckplyr.tidyverse.org) 1.0.0. |
| 38 | +duckplyr is a drop-in replacement for dplyr, powered by DuckDB for speed. |
| 39 | +It joins the ranks of dplyr backends together with [dtplyr](https://dtplyr.tidyverse.org) and [dbplyr](https://dbplyr.tidyverse.org). |
| 40 | + |
| 41 | +You can install it from CRAN with: |
| 42 | + |
| 43 | +```{r, eval = FALSE} |
| 44 | +install.packages("duckplyr") |
| 45 | +``` |
| 46 | + |
| 47 | +In this article, we'll introduce you to the basic usage of duckplyr, show how it can help you handle large data, and explain how you can help improve the package. |
| 48 | + |
| 49 | + |
| 50 | +## A drop-in replacement for dplyr |
| 51 | + |
| 52 | +The duckplyr package is a drop-in replacement for dplyr that uses DuckDB for speed. |
| 53 | + |
| 54 | +First, data is read in using either conversion functions (for data already in memory) or ingestion functions (for data in files). |
| 55 | +Alternatively, calling `library(duckplyr)` overwrites dplyr methods, enabling duckplyr for the entire session no matter how data.frames are created. |
| 56 | + |
| 57 | +```{r load} |
| 58 | +library(conflicted) |
| 59 | +library(duckplyr) |
| 60 | +conflict_prefer("filter", "dplyr", quiet = TRUE) |
| 61 | +``` |
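The conversion and ingestion routes mentioned above can be sketched as follows. This is a minimal illustration, not part of the original post: the toy data frame and the temporary CSV file are made up for the example, and it assumes duckplyr 1.0.0 is installed so that `as_duckdb_tibble()` and `read_csv_duckdb()` are available.

```r
# Conversion: wrap a data frame that already lives in memory.
# (`df_in_memory` is a hypothetical toy dataset for this sketch.)
df_in_memory <- data.frame(id = 1:3, value = c(10, 20, 30))
duck_from_memory <- duckplyr::as_duckdb_tibble(df_in_memory)

# Ingestion: point DuckDB at a file instead of loading it into R first.
# The CSV file is a throwaway created only for this illustration.
csv_path <- tempfile(fileext = ".csv")
write.csv(df_in_memory, csv_path, row.names = FALSE)
duck_from_file <- duckplyr::read_csv_duckdb(csv_path)

# Inspect the classes of both objects.
class(duck_from_memory)
class(duck_from_file)
```

Either object can then be passed to the usual dplyr verbs.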
| 62 | + |
| 63 | +Then, the data manipulation pipeline uses the exact same syntax as a dplyr pipeline. |
| 64 | +The duckplyr package performs the computation using DuckDB or, if a specific operation is not supported, falls back to dplyr. |
| 65 | + |
| 66 | + |
| 67 | +```{r} |
| 68 | +library("babynames") |
| 69 | +out <- babynames |> |
| 70 | + filter(n > 1000) |> |
| 71 | + summarize( |
| 72 | + .by = c(sex, year), |
| 73 | + babies_n = sum(n) |
| 74 | + ) |> |
| 75 | + filter(sex == "F") |
| 76 | +``` |
| 77 | + |
| 78 | +Finally, the result can be materialized to memory, computed temporarily, or computed to a file. |
| 79 | + |
| 80 | +```{r} |
| 81 | +# to memory |
| 82 | +out |
| 83 | + |
| 84 | +# to a file |
| 85 | +csv_file <- withr::local_tempfile() |
| 86 | +file.size(csv_file) |
| 87 | +compute_csv(out, csv_file) |
| 88 | +file.size(csv_file) |
| 89 | +``` |
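The chunk above covers the memory and file cases. As a hedged sketch of the remaining option, assuming the `collect()` and `compute()` methods that duckplyr provides for its data frames, explicit materialization and temporary computation might look like this (the toy data is made up for the example):

```r
# A small duckplyr data frame (hypothetical toy data for this sketch).
df <- duckplyr::duckdb_tibble(x = 1:3, y = c(2, 4, 6))

# Build a pipeline with a regular dplyr verb.
res <- dplyr::mutate(df, z = x + y)

# Explicitly materialize the result in memory.
in_memory <- dplyr::collect(res)

# Compute the result temporarily; the output can be used in further pipelines.
temp_result <- dplyr::compute(res)
```

Computing temporarily is mostly useful when an intermediate result is reused by several later pipelines.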
| 90 | + |
| 91 | +When duckplyr itself does not support specific functionality, it falls back to dplyr. |
| 92 | +For instance, row names are not supported yet: |
| 93 | + |
| 94 | +```{r} |
| 95 | +mtcars |> |
| 96 | + summarize( |
| 97 | + .by = cyl, |
| 98 | + mean_disp = mean(disp, na.rm = TRUE), |
| 99 | + sd_disp = sd(disp, na.rm = TRUE) |
| 100 | + ) |
| 101 | +``` |
| 102 | + |
| 103 | +Current limitations are documented in [`vignette("limits")`](https://duckplyr.tidyverse.org/articles/limits.html). |
| 104 | +You can change the verbosity of fallbacks; see [`duckplyr::fallback_sitrep()`](https://duckplyr.tidyverse.org/reference/fallback.html). |
| 105 | + |
| 106 | + |
| 107 | + |
| 108 | +## A handy tool for large data |
| 109 | + |
| 110 | +## Help us improve duckplyr! |
| 111 | + |
| 112 | +Our goals for future development of duckplyr include: |
| 113 | + |
| 114 | +- Enabling users to provide custom translations of dplyr functionality; |
| 115 | +- Making it easier to contribute code to duckplyr. |
| 116 | + |
| 117 | +You can already help though, in three main ways: |
| 118 | + |
| 119 | +- Please report any issue, especially regarding unknown incompatibilities. See [`vignette("limits")`](https://duckplyr.tidyverse.org/articles/limits.html). |
| 120 | +- Contribute to the codebase after reading duckplyr's contributing guide. |
| 121 | +- Turn on telemetry to help us hear about the most frequent fallbacks so we can prioritize working on the corresponding missing dplyr translations. See [`vignette("telemetry")`](https://duckplyr.tidyverse.org/articles/telemetry.html) and the [`duckplyr::fallback_sitrep()`](https://duckplyr.tidyverse.org/reference/fallback.html) function. |
| 122 | + |
| 123 | +## Acknowledgements |
| 124 | + |
| 125 | +A big thanks to all 54 folks who filed issues, created PRs and generally helped to improve duckplyr! |
| 126 | + |
| 127 | +[@adamschwing](https://github.com/adamschwing), [@andreranza](https://github.com/andreranza), [@apalacio9502](https://github.com/apalacio9502), [@apsteinmetz](https://github.com/apsteinmetz), [@barracuda156](https://github.com/barracuda156), [@beniaminogreen](https://github.com/beniaminogreen), [@bob-rietveld](https://github.com/bob-rietveld), [@brichards920](https://github.com/brichards920), [@cboettig](https://github.com/cboettig), [@davidjayjackson](https://github.com/davidjayjackson), [@DavisVaughan](https://github.com/DavisVaughan), [@Ed2uiz](https://github.com/Ed2uiz), [@eitsupi](https://github.com/eitsupi), [@era127](https://github.com/era127), [@etiennebacher](https://github.com/etiennebacher), [@eutwt](https://github.com/eutwt), [@fmichonneau](https://github.com/fmichonneau), [@github-actions[bot]](https://github.com/github-actions[bot]), [@hadley](https://github.com/hadley), [@hannes](https://github.com/hannes), [@hawkfish](https://github.com/hawkfish), [@IndrajeetPatil](https://github.com/IndrajeetPatil), [@JanSulavik](https://github.com/JanSulavik), [@JavOrraca](https://github.com/JavOrraca), [@jeroen](https://github.com/jeroen), [@jhk0530](https://github.com/jhk0530), [@joakimlinde](https://github.com/joakimlinde), [@JosiahParry](https://github.com/JosiahParry), [@krlmlr](https://github.com/krlmlr), [@larry77](https://github.com/larry77), [@lnkuiper](https://github.com/lnkuiper), [@lorenzwalthert](https://github.com/lorenzwalthert), [@luisDVA](https://github.com/luisDVA), [@maelle](https://github.com/maelle), [@math-mcshane](https://github.com/math-mcshane), [@meersel](https://github.com/meersel), [@multimeric](https://github.com/multimeric), [@mytarmail](https://github.com/mytarmail), [@nicki-dese](https://github.com/nicki-dese), [@PMassicotte](https://github.com/PMassicotte), [@prasundutta87](https://github.com/prasundutta87), [@rafapereirabr](https://github.com/rafapereirabr), [@Robinlovelace](https://github.com/Robinlovelace), 
[@romainfrancois](https://github.com/romainfrancois), [@sparrow925](https://github.com/sparrow925), [@stefanlinner](https://github.com/stefanlinner), [@thomasp85](https://github.com/thomasp85), [@TimTaylor](https://github.com/TimTaylor), [@Tmonster](https://github.com/Tmonster), [@toppyy](https://github.com/toppyy), [@wibeasley](https://github.com/wibeasley), [@yjunechoe](https://github.com/yjunechoe), [@ywhcuhk](https://github.com/ywhcuhk), and [@zhjx19](https://github.com/zhjx19). |