Description
On Bugzilla: https://bugs.r-project.org/show_bug.cgi?id=16750
Currently,

    R> summary(LETTERS)
       Length     Class      Mode
           26 character character
is not very informative: it reports only the length, class, and mode. This is also the only information shown for character columns in a data frame summary, and such columns are more common since R 4.0.0 changed the default to stringsAsFactors = FALSE. Compare:
    R> summary(data.frame("character" = LETTERS, "factor" = factor(LETTERS)))
      character            factor
     Length:26          A      : 1
     Class :character   B      : 1
     Mode  :character   C      : 1
                        D      : 1
                        E      : 1
                        F      : 1
                        (Other):20
A character vector is not necessarily categorical, so always using the factor-style summary is not ideal either. Furthermore, the data frame summary should still make clear whether a variable is a factor or a "bare" character vector.
summary.default() has an argument quantile.type to customize numeric summaries. For character input, the idea would be to add an argument such as character.proxy = c("nchar", "factor", "none"), where "none" gives the current generic summary (for backward compatibility) and "nchar" produces a summary based on nchar(object), e.g.
         Mode nchar.min nchar.max      NA's
    character         0         1         2
(where the NA's element is included only if it is > 0, as in numeric summaries), and "factor" uses summary.factor() while still including the Mode as the first element. An alternative would be to offer options "nchar2" and "nchar5", which use range() and quantile() of nchar(object), respectively.
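The proposal can be prototyped as a standalone function. This is only a sketch: `summary_chr()` is a hypothetical name, and neither it nor a `character.proxy` argument exists in base R. The "nchar" option here uses range(), so it also covers the suggested "nchar2"; "nchar5" uses quantile() with its default type. The result is wrapped in noquote() purely so that it prints without quotes.

```r
# Hypothetical prototype of the proposed character.proxy behaviour.
# summary_chr() is NOT part of base R; summary.default() has no such argument.
summary_chr <- function(object,
                        character.proxy = c("none", "nchar", "nchar5", "factor")) {
  stopifnot(is.character(object))
  character.proxy <- match.arg(character.proxy)

  # "none": today's generic Length/Class/Mode summary (backward compatible)
  if (character.proxy == "none") {
    return(summary.default(object))
  }

  # "factor": counts per distinct value (plus NA's, if any) as produced by
  # summary.factor(), but with Mode first so the output still shows that the
  # input was a bare character vector rather than a factor
  if (character.proxy == "factor") {
    return(noquote(c(Mode = mode(object), summary(factor(object)))))
  }

  # nchar() returns NA for NA_character_ with the default type = "chars"
  n <- nchar(object)
  n_na <- sum(is.na(n))
  ok <- n[!is.na(n)]

  if (character.proxy == "nchar") {
    # Two-number summary via range(); guard against empty input
    stats <- if (length(ok)) range(ok) else c(NA, NA)
    names(stats) <- c("nchar.min", "nchar.max")
  } else {
    # "nchar5": five-number summary via quantile() (default type 7)
    stats <- if (length(ok)) quantile(ok, names = FALSE) else rep(NA, 5)
    names(stats) <- c("nchar.min", "nchar.q1", "nchar.median",
                      "nchar.q3", "nchar.max")
  }

  # Combining with Mode coerces the numbers to character, keeping names
  res <- c(Mode = mode(object), stats)
  # As in numeric summaries, report NA's only when there are any
  if (n_na > 0) res <- c(res, "NA's" = n_na)
  noquote(res)
}
```

For example, summary_chr(c("", "A", NA, NA), "nchar") yields the values from the example table above (Mode character, nchar.min 0, nchar.max 1, NA's 2), while summary_chr(LETTERS, "factor") gives Mode followed by one count per letter.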