Noninvasive source code formatting
As the tidyverse style guide puts it:
Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.
Especially in collaborative projects, a consistent coding style helps communicate the code's intent. A style guide helps define what code should look like before it is accepted into the main development branch. This can be a document or, much better, a tool that automatically formats the code according to that style guide.
However, existing solutions to pretty-printing R code do slightly too much: any intent the developer might have put in the formatting is usually lost after formatting. This project aims at implementing a formatting solution that only alters the formatting where this is absolutely necessary according to the style guide in effect.
The general idea is to use the information in the parse tree (obtained from utils::getParseData()) to add or remove whitespace and line breaks as dictated by the style guide, but leave everything else untouched. The result will be code formatted according to the style guide, with minimal differences from the original formatting.
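The idea can be illustrated with a minimal sketch for a single rule, "put a space on both sides of the assignment operator". It is not the project's implementation: `space_around_assign()` is a hypothetical helper written only for this example, and it assumes ASCII input without tab characters so that the column numbers reported by utils::getParseData() match character positions.

```r
# Hypothetical helper (illustration only): add missing spaces around `<-`,
# located via the parse data, and leave all other formatting untouched.
# Assumes ASCII source without tabs, so parse-data columns equal character positions.
space_around_assign <- function(text) {
  lines <- strsplit(text, "\n", fixed = TRUE)[[1]]
  exprs <- parse(text = lines, keep.source = TRUE)
  pd <- utils::getParseData(exprs)

  # Only real assignment tokens are selected; a "<-" inside a string is a
  # STR_CONST token and is therefore never touched.
  ops <- pd[pd$token == "LEFT_ASSIGN", ]

  # Work from the end of the file backwards so that inserting a space
  # does not shift the positions of operators still to be processed.
  ops <- ops[order(-ops$line1, -ops$col1), ]

  for (i in seq_len(nrow(ops))) {
    ln <- ops$line1[i]
    line <- lines[ln]
    before <- substr(line, 1, ops$col1[i] - 1)
    op <- substr(line, ops$col1[i], ops$col2[i])
    after <- substr(line, ops$col2[i] + 1, nchar(line))

    # Add a space only where none exists; existing whitespace
    # (for example manual alignment) is kept exactly as written.
    if (nzchar(before) && !grepl("\\s$", before)) before <- paste0(before, " ")
    if (nzchar(after) && !grepl("^\\s", after)) after <- paste0(" ", after)

    lines[ln] <- paste0(before, op, after)
  }

  paste(lines, collapse = "\n")
}

cat(space_around_assign("x<-1\ny  <-   c(1,\n       2)"), "\n")
```

In this sketch the first line is changed because it violates the rule, while the manual alignment and the line break inside `c()` on the following lines are preserved, which is the kind of noninvasive behavior described above.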
- formatR: Currently powers pretty-printing in knitr. Parses code and uses R's deparsing mechanism to emit formatted source code. Applies base R's notion of source code formatting. Has problems with certain edge cases.
- Google's R formatter: Aims at producing "aesthetically appealing" formatting by using an optimization approach (technical report). Requires Python.
- "Reformat code" command in RStudio: Cannot be easily automated, implemented in Java. Applied style is just slightly different from the tidyverse style guide.
- lintr: Only detects style violations, cannot currently fix them.
All of the above solutions operate with a hard-coded notion of style which cannot be changed easily. The code is edited too heavily, or (in the case of lintr) not at all.
What exactly do you want your student to code in the 3-month deadline? What functions? What do they do? Docs? Tests? Vignettes?
Mentors, please explain how this project will produce a useful package for the R community.
Each project needs 2 mentors. One should be an expert R programmer with previous package development experience, and the other can be a domain expert in some other field or application area (optimization, bioinformatics, machine learning, data viz, etc). Ideally one of the two mentors should have previous experience with GSOC (either as a student or mentor).
Several tests that potential students can do to demonstrate their capabilities for this particular project. Ask some hard questions that will give you insight about how the students write code to solve problems. You'll see that the harder the questions that you ask, the easier it will be for you to choose between the students that apply for your project!
Please modify the suggestions below to make them specific for your project.
- Easy: something that any useR should be able to do, e.g. download some existing package listed in the Related Work, and run it on some example data.
- Medium: something a bit more complicated. You can encourage students to write a script or some functions that show their R coding abilities.
- Hard: Can the student write a package with Rd files, tests, and vignettes? If your package interfaces with non-R code, can the student write in that other language?
Students, please post a link to your test results here.