Bug 16750: improve summary.default() for character vectors #117

@bastistician


On Bugzilla: https://bugs.r-project.org/show_bug.cgi?id=16750
Currently,

R> summary(LETTERS)
   Length     Class      Mode 
       26 character character 

is not very useful. This is also the only information presented for character columns in data frames, which occur more frequently since the stringsAsFactors=FALSE change in R 4.0.0. Compare:

R> summary(data.frame("character" = LETTERS, "factor" = factor(LETTERS)))
  character             factor  
 Length:26          A      : 1  
 Class :character   B      : 1  
 Mode  :character   C      : 1  
                    D      : 1  
                    E      : 1  
                    F      : 1  
                    (Other):20  

A character vector is not necessarily categorical, so always using the factor-style summary is not ideal either. Furthermore, it should remain clear from the data-frame summary whether a variable is a factor or a "bare" character vector.

summary.default() has an argument quantile.type to customize numeric summaries. For character input, the idea would be to add an argument, say character.proxy = c("nchar", "factor", "none"). Here, "none" corresponds to the current generic summary (for back-compatibility), and "nchar" produces a summary based on nchar(object), e.g.

       Mode   nchar.min    nchar.max    NA's
  character           0            1       2

where the last element is only included if it is >0, similar to numeric summaries. "factor" uses summary.factor() but still includes the Mode as the first element. An alternative would be to have options "nchar2" and "nchar5" to use range() and quantile() summaries, respectively.
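
To make the proposal concrete, here is a minimal sketch of the three options as a standalone function. The name summary_character() is hypothetical and not part of base R. The real change would add the argument to summary.default() itself, and the exact return class and formatting would be up to R core. The "nchar2" and "nchar5" alternatives are not covered.

```r
# Hypothetical helper (NOT base R): emulates the proposed 'character.proxy'
# argument of summary.default() for character vectors only.
summary_character <- function(object,
                              character.proxy = c("nchar", "factor", "none")) {
  stopifnot(is.character(object))
  character.proxy <- match.arg(character.proxy)
  switch(character.proxy,
    # current behaviour: Length / Class / Mode
    none = summary.default(object),
    nchar = {
      nas <- sum(is.na(object))
      # count characters of the non-missing strings only
      n <- nchar(object[!is.na(object)])
      # assumption: report NA for min/max if there are no non-missing strings
      rng <- if (length(n)) range(n) else c(NA_integer_, NA_integer_)
      out <- c(Mode = "character", nchar.min = rng[1L], nchar.max = rng[2L])
      # like numeric summaries, only report NA's if there are any
      if (nas > 0L) out <- c(out, "NA's" = nas)
      out
    },
    # factor-style counts, but keep the Mode as the first element
    factor = c(Mode = "character", summary(factor(object)))
  )
}

x <- c("a", "", NA, "b", NA)
summary_character(x)                        # "nchar" proxy (first choice)
summary_character(c("x", "y", "x"), "factor")
```

In this sketch all results are named character vectors, because combining "character" with counts coerces everything to character. A real implementation would more likely return a classed object with its own format and print methods, as summary.default() does for numeric input.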

Metadata

    Labels

    Misc: Issues that cannot be classified otherwise
    R: Issue should require knowledge of R only
    RSECon25
    needs analysis: Track down the cause of the bug, or identify as not a bug