grep(pattern, ...) et al.: Optionally escalate warning on "argument 'pattern' has length > 1 ..." to an error #8
HenrikBengtsson
started this conversation in
Ideas
Replies: 1 comment
-
I've submitted this to BugZilla, cf. PR18577. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Issue
The
grep()
family of functions produces a warning ifpattern
specified more than one regular expression, e.g.I cannot come up with a valid use case where using
length(pattern) != 1
is not a mistake. Because of this, I argue it's a bug. Because warnings might be ignored, or drown in the noise of other warnings, this is almost as problematic as a silent bug. There is a risk that there is code out there that produces incorrect results, without being noticed.I've checked R 4.3.1 and found the following functions with this lax behavior:
In the base package:
agrep()
agrepl()
grep()
grepl()
grepRaw()
regexpr()
gregexpr()
regexec()
gsub()
sub()
There might be more.
Suggestion
For the sprint
As a starter, introduce an "internal" environment variable, e.g.
_R_CHECK_LENGTH_1_PATTERN_
, that can be used to escalate the warning to an error:Long-term
I'd argue that using
length(pattern) > 1
is a bug, similar to howlength(cond) > 1
is a bug inif (cond) { ... }
. For the latter, we started out with_R_CHECK_LENGTH_1_CONDITION_=true
as above in R (>= 3.4.0), but since that was too harsh to enable on CRAN, support for per-package checking was added as:in R (>= 4.0.0). That would only escalate to an error, if the buggy code was part of the package tested, otherwise it remained a warning. This allowed CRAN to have packages fixed one by one without breaking the package ecosystem. Eventually, enough packages were fixed, and they made
_R_CHECK_LENGTH_1_CONDITION_=true
the default. And in R (>= 4.2.0), erroring was the only outcome and_R_CHECK_LENGTH_1_CONDITION_
could be dropped.A similar path forward can be made to fix
length(pattern) > 1
bugs, which means the next step after the above would be to add support for:There is no need to implement the latter during the sprint. Having
_R_CHECK_LENGTH_1_PATTERN_=true
will be a big step forward.Comment: I think it was @kalibera who implemented
_R_CHECK_LENGTH_1_CONDITION_=package:_R_CHECK_PACKAGE_NAME_,verbose
.PS. This proposal was taken from HenrikBengtsson/Wishlist-for-R#122.
Beta Was this translation helpful? Give feedback.
All reactions