Improvements to the poorman package
{poorman} is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges:
- select() picks variables based on their names.
- mutate() adds new variables that are functions of existing variables.
- filter() picks cases based on their values.
- summarise() reduces multiple values down to a single summary.
- arrange() changes the ordering of the rows.
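{poorman} relies on base R alone, so each of these verbs ultimately maps onto ordinary base R operations. The sketch below shows a rough base R counterpart of each verb, applied to the built-in mtcars data set. It illustrates the semantics only and is not {poorman}'s actual implementation.

```r
# Rough base R counterparts of the five core verbs (illustrative only;
# this is not how {poorman} implements them internally).
data(mtcars)

# select(): pick columns by name
selected <- mtcars[, c("mpg", "cyl", "hp")]

# filter(): pick rows based on a condition on their values
filtered <- selected[selected$cyl == 4, ]

# mutate(): add a new column computed from existing columns
mutated <- transform(filtered, hp_per_cyl = hp / cyl)

# arrange(): reorder the rows, here by descending mpg
arranged <- mutated[order(mutated$mpg, decreasing = TRUE), ]

# summarise(): reduce many values down to a single summary row
summarised <- data.frame(n = nrow(arranged), mean_mpg = mean(arranged$mpg))
print(summarised)
```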
{poorman} is a package that unapologetically attempts to recreate the {dplyr} API in a dependency-free way, using only {base} R. {poorman} is still under development and doesn’t have all of {dplyr}’s functionality, but what is considered the "core" functionality is included. The idea behind {poorman} is that a user should be able to take a {dplyr}-based script and run it using {poorman} without any hiccups.
Students will primarily be working on adding the rowwise(), pivot_wider() and pivot_longer() features to {poorman} using a test-driven development process.
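The following base R sketch makes the target behaviour of these three verbs concrete, using only apply() and stats::reshape(). It is not the {dplyr}/{tidyr} implementation or a proposed {poorman} one. The data frame scores and its columns are made up for illustration. The results match the verbs named in the comments only up to row order and row names.

```r
# Hypothetical example data: one row per id, one column per measurement.
scores <- data.frame(id = 1:3, a = c(1, 4, 7), b = c(2, 5, 8), c = c(3, 6, 10))

# rowwise() %>% mutate(total = sum(c(a, b, c))) computes one value per row.
# apply() over MARGIN = 1 does the same and works for any summary function
# (mean, median, ...), not just sums.
scores$total <- apply(scores[, c("a", "b", "c")], 1, sum)

# pivot_longer(cols = a:c, names_to = "name", values_to = "value"):
# stats::reshape() stacks the a/b/c columns into a single "value" column.
# Note: rows come out grouped by name, whereas tidyr groups them by id.
long <- stats::reshape(
  scores[, c("id", "a", "b", "c")],
  direction = "long",
  varying   = c("a", "b", "c"),
  v.names   = "value",
  timevar   = "name",
  times     = c("a", "b", "c"),
  idvar     = "id"
)

# pivot_wider(names_from = name, values_from = value):
# spreads "value" back out, producing columns value.a, value.b, value.c.
wide <- stats::reshape(
  long,
  direction = "wide",
  idvar     = "id",
  timevar   = "name",
  v.names   = "value"
)
print(wide)
```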
All code written for this project should be fully documented using {roxygen2} and fully tested using {tinytest}. Tests should ideally aim to replicate the testing done by {dplyr}, though depending on the implementation, custom tests may be required as well.
If the student completes this work within the allotted time, they can move on to other features still to be developed for {poorman}, which are tracked in the features list. Some of these features may already have been handled by the package author or other contributors by the time this project starts, but it is not expected that all of them will have been implemented. As this list is always evolving, the student will be expected to agree on a set of features from it with the mentors near the start date.
The student should be comfortable with R package development and using version control software, ideally git.
If successful, {poorman} will be closer to its goal of being able to run {dplyr} scripts without the need to have {dplyr} installed. The aim is for the student to contribute enough code to release a new version of {poorman} on CRAN.
- EVALUATING MENTOR: Nathan Eastwood, [email protected]. Nathan is the author of {poorman} and has been coding in R since 2009.
- Co-mentor: Justin Shea, [email protected]
Tests
- Add new columns to the following dataset that equal the mean, median and sum of each row.
df <- data.frame(x = runif(6), y = runif(6), z = runif(6))
- Using stats::reshape(), change the following dataset to wide format such that there is one row per sample, and then reshape it back to the original shape.
df <- data.frame(sample = c(rep(1, 10), rep(2, 10)), test = rep(1:10, 2), values = runif(20))
- Using stats::reshape(), reshape the following data to long format such that there are five columns: sex, age, id, exam and score. Then reshape the data back into the original format.
df <- structure(list(
  sex = c(1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 1L),
  age = rep(c(15L, 16L), 5),
  exam1 = c(34L, 47L, 41L, 44L, 47L, 42L, 57L, 61L, 53L, 42L),
  exam2 = c(46L, 54L, 47L, 41L, 65L, 41L, 62L, 59L, 61L, 39L),
  exam3 = c(45L, 49L, 40L, 40L, 60L, 57L, 63L, 49L, 61L, 42L),
  exam4 = c(39L, 53L, 39L, 50L, 50L, 72L, 55L, 44L, 57L, 42L),
  exam5 = c(36L, 61L, 51L, 26L, 56L, 31L, 41L, 66L, 56L, 41L)
), class = "data.frame", row.names = c(NA, -10L))
- Using stats::reshape(), reshape the following data to wide format with the resulting columns: id, min.1, max.1, min.2, max.2. Then reshape the data back to the original shape.
df <- data.frame(id = rep(1:4, each = 2), sample = rep(c(1, 2), 4), min = 1:8, max = 3:10)

Students, please post a link to your test results here.
| S No. | STUDENT NAME | GITHUB PROFILE | TEST RESULTS LINK |
|---|---|---|---|
| 1 | Miguel Gutierrez | https://github.com/juanmigutierrez | https://github.com/juanmigutierrez/Poorman_Test |
| 2 | Chun-Yu Chen | https://github.com/yo80106 | https://github.com/yo80106/GSoC-2021 |