
Noninvasive source code formatting

Kirill Müller edited this page Mar 17, 2017 · 21 revisions

Background

As the tidyverse style guide puts it:

Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.

Especially in collaborative projects, a consistent coding style helps communicate the code's intent. A style guide helps define how code should look before it is accepted into the main development branch. This can be a document or, much more conveniently, a facility that automatically formats the code according to that style guide.

However, existing solutions to pretty-printing R code do slightly too much: any intent the developer might have put in the formatting is usually lost after pretty-printing. This project aims at implementing a pretty-printing solution that only alters the formatting where this is absolutely necessary according to the style guide in effect.

The general idea is to use the information in the parse tree (obtained from utils::getParseData()) to add/remove whitespace and line breaks as dictated by the style guide, but leave everything else untouched. The result will be code formatted according to the style guide with minimal differences to the original formatting.
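As a rough illustration of the information available, the sketch below parses a one-line snippet and lists its terminal tokens together with the number of spaces that follow each one. The snippet and the `spaces_after` column are made up for this example and are not part of the proof of concept; the sketch also assumes a single line of source without tabs.

```r
# Parse a one-line snippet; keep.source = TRUE is required so that
# utils::getParseData() can return token positions.
code <- "x<-1+ 2  # keep me"
parsed <- parse(text = code, keep.source = TRUE)
pd <- utils::getParseData(parsed)

# Keep only terminal tokens (the leaves of the parse tree), in source order.
tokens <- pd[pd$terminal, c("line1", "col1", "col2", "token", "text")]
tokens <- tokens[order(tokens$line1, tokens$col1), ]

# Illustrative column: number of spaces between a token and its successor.
# Only meaningful here because all tokens are on the same line.
n <- nrow(tokens)
tokens$spaces_after <- c(tokens$col1[-1] - tokens$col2[-n] - 1L, NA)

print(tokens)
```

Comments show up as ordinary `COMMENT` tokens with their own positions, which is what makes it possible to change whitespace without losing them.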

Related work

  • formatR: Currently powers pretty-printing in knitr. Parses code and uses R's deparsing mechanism to emit formatted source code. Applies base R's notion of source code formatting. Has problems with certain edge cases.
  • Google's R formatter: Aims at producing "aesthetically appealing" formatting by using an optimization approach (technical report); requires Python.
  • "Reformat code" command in RStudio: Cannot be easily automated; implemented in Java. The applied style differs slightly from the tidyverse style guide.
  • lintr: Only detects style violations, cannot currently fix them.

All of the above solutions operate with a hard-coded notion of style which cannot be changed easily. The code is edited too heavily, or (in the case of lintr) not at all.

Details of your coding project

A proof of concept is able to add/remove whitespace around operators, but currently cannot fix broken indentation. The project will aim at getting this draft ready for production, and show its utility by reformatting several existing mid-size R packages and analysis scripts. The package tests should be enhanced to cover all important use cases; test-driven development looks like a good strategy for this project anyway.

The package will support formatting entire R packages, formatting entire R source trees, and checking consistent formatting. Development will focus primarily on implementing the tidyverse style guide, but these rules should be implemented in a way that allows for replacement with rules that support other coding conventions.
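One possible way to keep rules replaceable is to treat each rule as a function that adjusts the whitespace recorded in a token table. The sketch below is only an assumption about how this could look: `tokenize_line()`, `render_line()`, `style_line()`, and `rule_space_around_op()` are hypothetical names, the rule handles a small set of binary operators and would mishandle unary `+`, and only single lines without tabs or wide characters are supported.

```r
# Hypothetical helper: terminal tokens of one source line, with the number
# of spaces preceding each token.
tokenize_line <- function(line) {
  pd <- utils::getParseData(parse(text = line, keep.source = TRUE))
  tok <- pd[pd$terminal, c("col1", "col2", "token", "text")]
  tok <- tok[order(tok$col1), ]
  prev_end <- c(0L, tok$col2[-nrow(tok)])
  tok$spaces_before <- tok$col1 - prev_end - 1L
  tok
}

# Hypothetical helper: turn a token table back into a line of source code.
render_line <- function(tok) {
  paste0(strrep(" ", tok$spaces_before), tok$text, collapse = "")
}

# Hypothetical rule: exactly one space before and after selected operators.
# All other whitespace, including indentation, is left untouched.
rule_space_around_op <- function(tok) {
  is_op <- tok$token %in% c("LEFT_ASSIGN", "'+'", "'*'", "'/'")
  after_op <- c(FALSE, is_op[-length(is_op)])
  tok$spaces_before[is_op | after_op] <- 1L
  tok
}

# Apply a list of rules in order; a different style guide would supply a
# different list.
style_line <- function(line, rules) {
  tok <- tokenize_line(line)
  for (rule in rules) tok <- rule(tok)
  render_line(tok)
}

line <- "x<-1+ 2  # keep me"
styled <- style_line(line, list(rule_space_around_op))
print(styled)

# Checking for consistent formatting: a line is consistent if styling it
# changes nothing.
is_consistent <- identical(style_line(styled, list(rule_space_around_op)), styled)
print(is_consistent)
```

With an empty rule list the line is reproduced unchanged, and the two spaces before the comment survive the operator rule, which is the noninvasive behavior the project is after.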

All functions will be documented, and a short vignette will demonstrate how to use the package.

Further extensions are possible:

  • Support for a second, third, ... style guide
  • A Shiny application that allows previewing formatted source code
  • Configuration: Selection of a particular style guide for a codebase
  • Support for scripts with wide characters (requiring two or more columns in a fixed-width font) or zero-width characters
  • IDE integration (RStudio, emacs-ess, ...)

Expected impact

The resulting package is a tool that both package authors and R users can benefit from, especially in collaborative settings. Package authors gain a simple tool to define how their code should look, and to check and enforce this style. R users can effortlessly format their scripts with a style guide of their choice, which will make the scripts easier to understand later.

Mentors

Each project needs 2 mentors. One should be an expert R programmer with previous package development experience, and the other can be a domain expert in some other field or application area (optimization, bioinformatics, machine learning, data viz, etc). Ideally one of the two mentors should have previous experience with GSOC (either as a student or mentor).

  1. Kirill Müller has a computer science background and has been using R since 2012. He maintains and has developed several CRAN packages and is an active contributor to the tidyverse.
  2. Yihui Xie has authored knitr, bookdown, formatR, highr, shiny, and a great many other R packages.

Tests

Several tests that potential students can do to demonstrate their capabilities for this particular project. Ask some hard questions that will give you insight about how the students write code to solve problems. You'll see that the harder the questions that you ask, the easier it will be for you to choose between the students that apply for your project!

Please modify the suggestions below to make them specific for your project.

  • Easy: something that any useR should be able to do, e.g. download some existing package listed in the Related Work, and run it on some example data.
  • Medium: something a bit more complicated. You can encourage students to write a script or some functions that show their R coding abilities.
  • Hard: Can the student write a package with Rd files, tests, and vignettes? If your package interfaces with non-R code, can the student write in that other language?

Solutions of tests

Students, please post a link to your test results here.
