
Commit dd0dc35

Copy changes from gdata repo
1 parent 7794f88 commit dd0dc35


4 files changed (+532, -19 lines)


.Rbuildignore

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
^.*\.Rproj$ # Automatically added by RStudio,
^\.Rproj\.user$ # used for temporary files.
^README\.Rmd$ # An Rmarkdown file used to generate README.md
^cran-comments\.md$ # Comments for CRAN submission
^NEWS\.md$ # A news file written in Markdown
^\.travis\.yml$ # Used for continuous integration testing with travis

\.aux$
\.tex$
\.bbl$
\.rds$

README.md

Lines changed: 19 additions & 19 deletions
@@ -1,25 +1,25 @@
 # Various R Programming Tools for Data Manipulation
 
 Various R programming tools for data manipulation, including:
-- medical unit conversions ('ConvertMedUnits', 'MedUnits'),
-- combining objects ('bindData', 'cbindX', 'combine', 'interleave'),
-- character vector operations ('centerText', 'startsWith', 'trim'),
-- factor manipulation ('levels', 'reorder.factor', 'mapLevels'),
-- obtaining information about R objects ('object.size', 'elem', 'env',
-'humanReadable', 'is.what', 'll', 'keep', 'ls.funs',
-'Args','nPairs', 'nobs'),
-- manipulating MS-Excel formatted files ('read.xls',
-'installXLSXsupport', 'sheetCount', 'xlsFormats'),
-- generating fixed-width format files ('write.fwf'),
-- extricating components of date & time objects ('getYear', 'getMonth',
-'getDay', 'getHour', 'getMin', 'getSec'),
-- operations on columns of data frames ('matchcols', 'rename.vars'),
-- matrix operations ('unmatrix', 'upperTriangle', 'lowerTriangle'),
-- operations on vectors ('case', 'unknownToNA', 'duplicated2', 'trimSum'),
-- operations on data frames ('frameApply', 'wideByFactor'),
-- value of last evaluated expression ('ans'), and
-- wrapper for 'sample' that ensures consistent behavior for both
-scalar and vector arguments ('resample').
+- medical unit conversions (`ConvertMedUnits`, `MedUnits`),
+- combining objects (`bindData`, `cbindX`, `combine`, `interleave`),
+- character vector operations (`centerText`, `startsWith`, `trim`),
+- factor manipulation (`levels`, `reorder.factor`, `mapLevels`),
+- obtaining information about R objects (`object.size`, `elem`, `env`,
+`humanReadable`, `is.what`, `ll`, `keep`, `ls.funs`,
+`Args`,`nPairs`, `nobs`),
+- manipulating MS-Excel formatted files (`read.xls`,
+`installXLSXsupport`, `sheetCount`, `xlsFormats`),
+- generating fixed-width format files (`write.fwf`),
+- extricating components of date & time objects (`getYear`, `getMonth`,
+`getDay`, `getHour`, `getMin`, `getSec`),
+- operations on columns of data frames (`matchcols`, `rename.vars`),
+- matrix operations (`unmatrix`, `upperTriangle`, `lowerTriangle`),
+- operations on vectors (`case`, `unknownToNA`, `duplicated2`, `trimSum`),
+- operations on data frames (`frameApply`, `wideByFactor`),
+- value of last evaluated expression (`ans`), and
+- wrapper for `sample` that ensures consistent behavior for both
+scalar and vector arguments (`resample`).
 
 Authors: Gregory R. Warnes, Ben Bolker, Gregor Gorjanc, Gabor
 Grothendieck, Ales Korosec, Thomas Lumley, Don MacQueen, Arni

inst/doc/mapLevels.Rnw

Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
%\VignetteIndexEntry{Mapping levels of a factor}
%\VignettePackage{gdata}
%\VignetteKeywords{levels, factor, manip}

\documentclass[a4paper]{report}
\usepackage{Rnews}
\usepackage[round]{natbib}
\bibliographystyle{abbrvnat}

\usepackage{Sweave}
\SweaveOpts{strip.white=all, keep.source=TRUE}

\begin{document}
\SweaveOpts{concordance=TRUE}

\begin{article}

\title{Mapping levels of a factor}
\subtitle{The \pkg{gdata} package}
\author{by Gregor Gorjanc}

\maketitle

\section{Introduction}

Factors use the \code{levels} attribute to store the mapping between
internal integer codes and character values, i.e.\ levels. The first level
is mapped to internal integer code 1, the second to 2, and so on. Although
some users do not like factors, they are more efficient in terms of storage
than character vectors. Additionally, many functions in base \R{} provide
extra functionality for factors. Sometimes users need to work with the
internal integer codes and map them back to a factor, especially when
interfacing with external programs. Mapping information is also of interest
when many factors should share the same set of levels. This note describes
\code{mapLevels}, a utility function for mapping the levels of a factor,
available in the \pkg{gdata} package\footnote{From version 2.3.1 onwards.}
\citep{WarnesGdata}.
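
The following chunk is a minimal sketch of this storage model in base \R{}
only (it does not use \pkg{gdata}). \code{unclass} exposes the internal
integer codes together with the \code{levels} attribute, and indexing the
levels by the codes recovers the original values.

<<introCodes>>=
## Base R only: a factor is an integer vector with a "levels" attribute
fac <- factor(c("B", "A", "Z", "D"))
unclass(fac)    # integer codes 2 1 4 3 plus attr(,"levels")
levels(fac)     # levels are sorted: "A" "B" "D" "Z"
codes <- as.integer(fac)
## Map the codes back to a factor with the same set of levels
back <- factor(levels(fac)[codes], levels = levels(fac))
identical(back, fac)
@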

\section{Description with examples}

Function \code{mapLevels()} is an S3 generic function that works on the
\code{factor} and \code{character} atomic classes. It also works on
\code{list} and \code{data.frame} objects whose components are of these
classes. Function \code{mapLevels} produces a so-called ``map'' with names
and values. The names are levels, while the values can be either internal
integer codes or (possibly other) levels; this is clarified later on. The
class of this ``map'' is \code{levelsMap} if \code{x} in \code{mapLevels()}
is atomic, and \code{listLevelsMap} otherwise, i.e.\ for the \code{list}
and \code{data.frame} classes. The following example shows the creation and
printout of such a ``map''.

<<ex01>>=
library(gdata)
(fac <- factor(c("B", "A", "Z", "D")))
(map <- mapLevels(x=fac))
@

If we have to work with internal integer codes, we can transform a factor
to integer and still get ``back the original factor'' by passing the
``map'' to the \code{mapLevels<-} function, as shown below.
\code{mapLevels<-} is also an S3 generic function and works on the same
classes as \code{mapLevels}, plus the \code{integer} atomic class.

<<ex02>>=
(int <- as.integer(fac))
mapLevels(x=int) <- map
int
identical(fac, int)
@

Internally, a ``map'' (class \code{levelsMap}) is a \code{list} (see
below), but its print method unlists it for ease of inspection. The
``map'' in the example above has all components of length 1. This is not
mandatory, as \code{mapLevels<-} is only a wrapper around the workhorse
function \code{levels<-}, and the latter accepts a \code{list} with
components of various lengths.

<<ex03>>=
str(map)
@
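
Because \code{mapLevels<-} delegates to \code{levels<-}, the effect of a
component with more than one value can be previewed in base \R{} alone.
In this minimal sketch the factor \code{g} is a hypothetical example: the
list component \code{X} has length two, so the old levels \code{a} and
\code{b} are collapsed into the single new level \code{X}.

<<collapseLevels>>=
## Base R only: levels<- accepts a named list whose components may
## differ in length; each name becomes a new level
g <- factor(c("a", "b", "c", "a"))
levels(g) <- list(X = c("a", "b"), Y = "c")
g               # X X Y X, with levels X and Y
as.integer(g)   # 1 1 2 1
@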
82+
83+
Although not of primary importance, this ``map'' can also be used to remap
84+
factor levels as shown bellow. Components ``later'' in the map take over
85+
the ``previous'' ones. Since this is not optimal I would rather recommend
86+
other approaches for ``remapping'' the levels of a \code{factor}, say
87+
\code{recode} in \pkg{car} package \citep{FoxCar}.
88+
89+
<<ex04>>=
90+
map[[2]] <- as.integer(c(1, 2))
91+
map
92+
int <- as.integer(fac)
93+
mapLevels(x=int) <- map
94+
int
95+
@
96+
97+
Up to now examples showed ``map'' with internal integer codes for values
98+
and levels for names. I call this integer ``map''. On the other hand
99+
character ``map'' uses levels for values and (possibly other) levels for
100+
names. This feature is a bit odd at first sight, but can be used to easily
101+
unify levels and internal integer codes across several factors. Imagine
102+
you have a factor that is for some reason split into two factors \code{f1}
103+
and \code{f2} and that each factor does not have all levels. This is not
104+
uncommon situation.
105+
106+
<<ex05>>=
107+
(f1 <- factor(c("A", "D", "C")))
108+
(f2 <- factor(c("B", "D", "C")))
109+
@
110+
111+
If we work with this factors, we need to be careful as they do not have the
112+
same set of levels. This can be solved with appropriately specifying
113+
\code{levels} argument in creation of factors i.e. \code{levels=c("A", "B",
114+
"C", "D")} or with proper use of \code{levels<-} function. I say proper
115+
as it is very tempting to use:
116+
117+
<<ex06>>=
118+
fTest <- f1
119+
levels(fTest) <- c("A", "B", "C", "D")
120+
fTest
121+
@
122+
123+
Above example extends set of levels, but also changes level of 2nd and 3rd
124+
element in \code{fTest}! Proper use of \code{levels<-} (as shown in
125+
\code{levels} help page) would be:
126+
127+
<<ex07>>=
128+
fTest <- f1
129+
levels(fTest) <- list(A="A", B="B",
130+
C="C", D="D")
131+
fTest
132+
@
133+
134+
Function \code{mapLevels} with character ``map'' can help us in such
135+
scenarios to unify levels and internal integer codes across several
136+
factors. Again the workhorse under this process is \code{levels<-} function
137+
from base \R{}! Function \code{mapLevels<-} just controls the assignment of
138+
(integer or character) ``map'' to \code{x}. Levels in \code{x} that match
139+
``map'' values (internal integer codes or levels) are changed to ``map''
140+
names (possibly other levels) as shown in \code{levels} help page. Levels
141+
that do not match are converted to \code{NA}. Integer ``map'' can be
142+
applied to \code{integer} or \code{factor}, while character ``map'' can be
143+
applied to \code{character} or \code{factor}. Result of \code{mapLevels<-}
144+
is always a \code{factor} with possibly ``remapped'' levels.
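
The conversion of non-matching levels to \code{NA} can be checked with
\code{levels<-} directly. In this minimal base-\R{} sketch the character
``map'' is written by hand as a named list (the names \code{X} and
\code{Y} are arbitrary), and \code{f3} is a hypothetical factor with the
same values as \code{f1}; its level \code{D} matches no ``map'' value and
therefore becomes \code{NA}.

<<noMatch>>=
## Base R only: names are the new levels, values are the old levels
f3 <- factor(c("A", "D", "C"))
levels(f3) <- list(X = "A", Y = "C")
f3              # X <NA> Y, with levels X and Y
is.na(f3)       # FALSE TRUE FALSE
@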

To get one joint character ``map'' for several factors, we need to put the
factors in a \code{list} or \code{data.frame} and use the arguments
\code{codes=FALSE} and \code{combine=TRUE}. Such a map can then be used to
unify levels and internal integer codes.

<<ex08>>=
(bigMap <- mapLevels(x=list(f1, f2),
                     codes=FALSE,
                     combine=TRUE))
mapLevels(f1) <- bigMap
mapLevels(f2) <- bigMap
f1
f2
cbind(as.character(f1), as.integer(f1),
      as.character(f2), as.integer(f2))
@

If we do not specify \code{combine=TRUE}, i.e.\ under the default
behaviour, and \code{x} is a \code{list} or \code{data.frame},
\code{mapLevels} returns a ``map'' of class \code{listLevelsMap}, which is
internally a \code{list} of ``maps'' (\code{levelsMap} objects). Both
\code{listLevelsMap} and \code{levelsMap} objects can be passed to
\code{mapLevels<-} for a \code{list} or \code{data.frame}. When the length
of a \code{listLevelsMap} differs from the number of components (columns)
of the \code{list} (\code{data.frame}), the ``maps'' are recycled.
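
The recycling rule can be illustrated with a minimal base-\R{} sketch. The
helper \code{applyMaps} is hypothetical and is not the \pkg{gdata}
implementation; it only mimics the behaviour described above by recycling
a list of character ``maps'' with \code{rep\_len} and applying each one
with \code{levels<-}.

<<recycleMaps>>=
## Hypothetical helper (not part of gdata): apply a list of maps to the
## columns of a data frame, recycling the maps when there are fewer
applyMaps <- function(dat, maps) {
  maps <- rep_len(maps, ncol(dat))    # recycle maps over the columns
  for (j in seq_along(dat)) {
    col <- factor(dat[[j]])
    levels(col) <- maps[[j]]          # same workhorse as mapLevels<-
    dat[[j]] <- col
  }
  dat
}
dat <- data.frame(a = c("A", "C"), b = c("C", "B"))
oneMap <- list(list(A = "A", B = "B", C = "C"))  # a single map
res <- applyMaps(dat, oneMap)
lapply(res, levels)   # both columns now have levels A, B, C
@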

Additional convenience methods are also implemented to ease work with
``maps'':

\begin{itemize}

\item \code{is.levelsMap}, \code{is.listLevelsMap}, \code{as.levelsMap} and
  \code{as.listLevelsMap} for testing and coercion of user-defined
  ``maps'',

\item \code{"["} for subsetting,

\item \code{c} for combining \code{levelsMap} or \code{listLevelsMap}
  objects; the argument \code{recursive=TRUE} can be used to coerce a
  \code{listLevelsMap} to a \code{levelsMap}, for example
  \code{c(llm1, llm2, recursive=TRUE)}, and

\item \code{unique} and \code{sort} for \code{levelsMap}.

\end{itemize}

\section{Summary}

The functions \code{mapLevels} and \code{mapLevels<-} help users map
internal integer codes to factor levels and unify both levels and internal
integer codes across several factors. I welcome any comments or
suggestions.

% \bibliography{refs}
\begin{thebibliography}{1}
\providecommand{\natexlab}[1]{#1}
\providecommand{\url}[1]{\texttt{#1}}
\expandafter\ifx\csname urlstyle\endcsname\relax
  \providecommand{\doi}[1]{doi: #1}\else
  \providecommand{\doi}{doi: \begingroup \urlstyle{rm}\Url}\fi

\bibitem[Fox(2006)]{FoxCar}
J.~Fox.
\newblock \emph{car: Companion to Applied Regression}, 2006.
\newblock URL \url{http://socserv.socsci.mcmaster.ca/jfox/}.
\newblock R package version 1.1-1.

\bibitem[Warnes(2006)]{WarnesGdata}
G.~R. Warnes.
\newblock \emph{gdata: Various R programming tools for data manipulation},
  2006.
\newblock URL
  \url{http://cran.r-project.org/src/contrib/Descriptions/gdata.html}.
\newblock R package version 2.3.1. Includes R source code and/or documentation
  contributed by Ben Bolker, Gregor Gorjanc and Thomas Lumley.

\end{thebibliography}

\address{Gregor Gorjanc\\
University of Ljubljana, Slovenia\\
\email{gregor.gorjanc@bfro.uni-lj.si}}

\end{article}

\end{document}
