
Improvements to the poorman package

Chun-Yu Chen edited this page Apr 10, 2021 · 5 revisions

Background

{poorman} is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges:

  • select() picks variables based on their names.
  • mutate() adds new variables that are functions of existing variables.
  • filter() picks cases based on their values.
  • summarise() reduces multiple values down to a single summary.
  • arrange() changes the ordering of the rows.

Related work

{poorman} is a package that unapologetically attempts to recreate the {dplyr} API in a dependency-free way using only {base} R. {poorman} is still under development and does not yet cover all of {dplyr}’s functionality, but what is considered the "core" functionality is included. The idea behind {poorman} is that a user should be able to take a {dplyr}-based script and run it using {poorman} without any hiccups.
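To make the "only {base} R" idea concrete, here is a minimal sketch of base R counterparts for the five verbs, using the built-in mtcars data. These one-liners are illustrative assumptions about what each verb amounts to; they are not {poorman}'s actual source code, and the variable names are made up for the example.

```r
# Base R counterparts of the five verbs on the built-in mtcars data.
# Illustrative only: this is not how {poorman} is implemented internally.
df <- mtcars

# select(): pick variables by name
selected <- df[, c("mpg", "cyl", "wt"), drop = FALSE]

# mutate(): add a variable computed from existing ones
# (wt is recorded in units of 1000 lbs, so this converts it to kilograms)
mutated <- transform(selected, wt_kg = wt * 453.592)

# filter(): keep the rows whose values satisfy a condition
filtered <- mutated[mutated$cyl == 4, , drop = FALSE]

# summarise(): reduce many values to a single summary
summarised <- data.frame(mean_mpg = mean(filtered$mpg))

# arrange(): reorder the rows by a variable
arranged <- filtered[order(filtered$mpg), , drop = FALSE]
```

A {dplyr}-style API wraps operations like these in functions that take a data frame as their first argument, which is what allows them to be chained.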

Details of your coding project

Students will primarily be working on adding the rowwise(), pivot_wider() and pivot_longer() features to {poorman} using a test-driven development process.

All code written for this project should be fully documented using {roxygen2} and fully tested using {tinytest}. Tests should ideally aim to replicate the testing done by {dplyr}, though, depending on the implementation, custom tests may also be required.
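As a rough illustration of that workflow, the sketch below documents a small function with {roxygen2} comments and checks it with {tinytest}-style expectations. The helper row_sums() is hypothetical and exists only for this example; it is not part of {poorman}. The check falls back to stopifnot() when {tinytest} is not installed.

```r
#' Compute the sum of each row of a data frame (hypothetical helper)
#'
#' @param .data A data.frame whose columns are all numeric.
#' @return An unnamed numeric vector with one sum per row.
#' @examples
#' row_sums(data.frame(x = 1:2, y = 3:4))
row_sums <- function(.data) {
  stopifnot(is.data.frame(.data))
  unname(rowSums(.data))
}

# In a package these expectations would live in inst/tinytest/;
# they run inline here so that the example is self-contained.
input <- data.frame(x = 1:2, y = 3:4)
if (requireNamespace("tinytest", quietly = TRUE)) {
  tinytest::expect_equal(row_sums(input), c(4, 6))
  tinytest::expect_error(row_sums(1:3))
} else {
  stopifnot(isTRUE(all.equal(row_sums(input), c(4, 6))))
}
```

In a test-driven process the expectations would be written first, and the function body filled in until they pass.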

If the student completes this work within the allotted time, further work can be drawn from the features list, which tracks the features still to be developed for {poorman}. Some of these features may have been handled by the package author or other contributors by the time this project starts, but it is not expected that all of them will have been implemented. Because this list is always evolving, the student will be expected to agree on a set of features from it with the mentors close to the start date.

The student should be comfortable with R package development and using version control software, ideally git.

Expected impact

If successful, {poorman} will be closer to its goal of being able to run {dplyr} scripts without the need to have {dplyr} installed. The aim is for the student to contribute enough code to release a new version of {poorman} on CRAN.

Mentors

Tests

  1. Add new columns to the following dataset containing the mean, median and sum of each row.
df <- data.frame(x = runif(6), y = runif(6), z = runif(6))
  2. Using stats::reshape(), change the following dataset to wide format such that there is one row per sample and then reshape it back to the original shape.
df <- data.frame(sample = c(rep(1, 10), rep(2, 10)), test = rep(1:10, 2), values = runif(20))
  3. Using stats::reshape(), reshape the following data to long format such that there are five columns: sex, age, id, exam and score. Then reshape the data back into the original format.
df <- structure(list(
  sex = c(1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 1L),
  age = rep(c(15L, 16L), 5),
  exam1 = c(34L, 47L, 41L, 44L, 47L, 42L, 57L, 61L, 53L, 42L),
  exam2 = c(46L, 54L, 47L, 41L, 65L, 41L, 62L, 59L, 61L, 39L),
  exam3 = c(45L, 49L, 40L, 40L, 60L, 57L, 63L, 49L, 61L, 42L),
  exam4 = c(39L, 53L, 39L, 50L, 50L, 72L, 55L, 44L, 57L, 42L),
  exam5 = c(36L, 61L, 51L, 26L, 56L, 31L, 41L, 66L, 56L, 41L)
), class = "data.frame", row.names = c(NA, -10L))
  4. Using stats::reshape(), reshape the following data to a wide format with the resulting columns: id, min.1, max.1, min.2, max.2. Then reshape the data back to the original shape.
df <- data.frame(id = rep(1:4, each = 2), sample = rep(c(1, 2), 4), min = 1:8, max = 3:10)
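
For orientation, here is a minimal sketch of a long-to-wide-to-long round trip with stats::reshape(). It uses a small toy dataset invented for this example rather than any of the datasets above, so it shows the calling convention only and is not a solution to the tests.

```r
# Toy long-format data (invented for illustration): one row per id and time.
long <- data.frame(
  id    = rep(1:2, each = 3),
  time  = rep(1:3, times = 2),
  value = c(10, 20, 30, 40, 50, 60)
)

# Long -> wide: one row per id, one value.<time> column per time point.
wide <- reshape(long, direction = "wide",
                idvar = "id", timevar = "time", v.names = "value")

# Wide -> long: name the columns to stack and the time each one stands for.
long_again <- reshape(wide, direction = "long",
                      idvar = "id", timevar = "time", v.names = "value",
                      varying = c("value.1", "value.2", "value.3"),
                      times = 1:3)

# reshape() stacks the rows by time, so restore the original id/time order.
long_again <- long_again[order(long_again$id, long_again$time), ]
rownames(long_again) <- NULL
```

After sorting, long_again holds the same id, time and value columns as long; the only differences are the bookkeeping attributes that reshape() attaches to its result.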

Solutions of tests

Students, please post a link to your test results here.

S No. | STUDENT NAME | GITHUB PROFILE | TEST RESULTS LINK
1 | Miguel Gutierrez | https://github.com/juanmigutierrez | https://github.com/juanmigutierrez/Poorman_Test
2 | Chun-Yu Chen | https://github.com/yo80106 | https://github.com/yo80106/GSoC-2021