Writing robust, performant linters

Accumulated knowledge on best practices for writing linters

The {lintr} codebase has a lot of accumulated knowledge about how to write robust and fast linters. This Wiki exists as a repository for tidbits on these topics.

It is kept as a Wiki so that anyone can edit it with little overhead.

Tips for robustness

When writing a test for logical constants like TRUE or FALSE, if you also want the condition to match the shorthands T and F, note that TRUE and FALSE are NUM_CONST tokens while T and F are SYMBOL tokens (cf. getParseData(parse(text = "TRUE; T"))).
Keep pipes (%>%, |>) in mind when writing lints based on positional logic (e.g. if it's a lint for the 2nd argument to meet some condition, that will usually become the 1st argument inside a pipe chain).
The magrittr pipe %>% and the "native pipe" |> show up differently on the parse tree: SPECIAL and PIPE, respectively. Note that all %-delimited infix operators (e.g. %%, %in%, %*%) show up as SPECIAL, so you'll need to test the text() as well to single out magrittr pipes.
Often, it's better to anchor on EQ_SUB instead of SYMBOL_SUB when writing conditions around named arguments. The latter need not be present in all cases, e.g. in foo("a" = 1), which is valid R code, the parse tree will have a STR_CONST for "a", not a SYMBOL_SUB.
Be wary of * searches like preceding-sibling::*[1]. Are you sure every node type should count? One common mistake is to include <COMMENT> nodes here, so the XPath lands on a comment instead of the intended expression. Exclude comments with a predicate, as in preceding-sibling::*[not(self::COMMENT)][1].
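
The token distinctions in the tips above can be checked directly from the parse data. The following is a minimal sketch, assuming R >= 4.1 (for |>) and that {xml2} and {xmlparsedata} are installed; terminal_tokens() is a hypothetical helper written for this illustration, not part of {lintr}.

```r
library(xml2)          # XPath queries
library(xmlparsedata)  # converts R parse data to XML

# Hypothetical helper: terminal tokens of a snippet, named by their source text.
terminal_tokens <- function(code) {
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  pd <- pd[pd$terminal, ]
  stats::setNames(pd$token, pd$text)
}

# TRUE is a NUM_CONST, but the shorthand T is an ordinary SYMBOL.
logical_tokens <- terminal_tokens("TRUE; T")

# %>% and %in% are both SPECIAL; only |> gets its own PIPE token.
pipe_tokens <- terminal_tokens("x %>% f(); x |> f(); x %in% y")

# A quoted argument name is a STR_CONST, yet EQ_SUB is present either way.
quoted_arg <- terminal_tokens('foo("a" = 1)')
bare_arg <- terminal_tokens("foo(a = 1)")

# Comments are siblings of expressions, so a bare * can land on one.
code <- "{\n  x <- 1\n  # explain y\n  y <- 2\n}"
xml <- read_xml(xml_parse_data(parse(text = code, keep.source = TRUE)))
y_assign <- xml_find_first(
  xml, "//LEFT_ASSIGN/parent::expr[expr[1]/SYMBOL[text() = 'y']]"
)
# Nearest preceding sibling of any type.
naive <- xml_name(xml_find_first(y_assign, "preceding-sibling::*[1]"))
# Nearest preceding sibling that is not a comment.
robust <- xml_name(
  xml_find_first(y_assign, "preceding-sibling::*[not(self::COMMENT)][1]")
)
```

Here naive names the comment node while robust names the preceding assignment's expr, which is usually what a linter intends to inspect.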

Tips for performance

Avoid //* XPaths like the plague! At least in the current {xml2}, they are almost always slower than the alternatives. A good example is https://github.com/r-lib/lintr/pull/2025, which shows a 3x speed-up from avoiding //* even though the replacement is a long, inefficient-seeming chain of //A[expr] | //B[expr]-style repetitive expressions.
Similarly, avoid //expr XPaths. See https://github.com/r-lib/lintr/issues/1358 -- more than 1/3 of all nodes are <expr>, so //expr only eliminates a relatively small portion of the parse tree. The more specific a node you can anchor on, the better, but the difference among nodes besides <expr> is not as important, so err on the side of readability/comprehensibility.
If you use //SYMBOL_FUNCTION_CALL as an entry point, use the xml_find_function_calls() helper instead, because it returns cached results much faster, especially when testing for multiple options of text() = 'foo'.
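
As an illustration of the first two tips, the sketch below shows that a union of specific node names selects the same nodes as a //* search filtered by name, so the targeted form can be swapped in without changing results. It assumes {xml2} and {xmlparsedata} are installed; the sample code and variable names are made up for this example, and no timings are claimed here.

```r
library(xml2)
library(xmlparsedata)

code <- "x <- c(1, 2)\nif (x[1] > 0) print(sum(x)) else stop('neg')"
xml <- read_xml(xml_parse_data(parse(text = code, keep.source = TRUE)))

# Broad: visit every node, then filter by node name.
broad <- xml_find_all(xml, "//*[self::NUM_CONST or self::STR_CONST]")

# Targeted: anchor on the specific node names directly.
targeted <- xml_find_all(xml, "//NUM_CONST | //STR_CONST")

# Both queries return the same constants in document order.
same_constants <- identical(xml_text(broad), xml_text(targeted))

# Share of <expr> nodes in this tree, to see how little //expr narrows things.
expr_share <- mean(xml_name(xml_find_all(xml, "//*")) == "expr")

# Anchoring on function calls by name; inside a linter, the
# xml_find_function_calls() helper would take the place of this query.
calls <- xml_find_all(
  xml, "//SYMBOL_FUNCTION_CALL[text() = 'print' or text() = 'stop']"
)
```

To compare speed on a realistic file, wrap each xml_find_all() call in system.time() against the parse tree of a large source file.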
