Yet another operator to simplify data preparation with memisc
The recently published version 0.99.31.6 of the memisc package also contains
the operator %$$%, which simplifies routine data preparation steps that hitherto
involved calls to the function within(). It is analogous to the operator %$%,
which is provided by the “magrittr” package but is also defined by memisc.
These operators are illustrated by the following code examples.
library(magrittr)
library(memisc)
set.seed(42)
Here we create a simple example data frame:
df <- data.frame(a = 1:7, x = rnorm(7))
df
a x
1 1 1.3709584
2 2 -0.5646982
3 3 0.3631284
4 4 0.6328626
5 5 0.4042683
6 6 -0.1061245
7 7 1.5115220
The following code creates two new variables b and x.sq in the data frame
using within():
df <- within(df,{
    b <- a + 4
    x.sq <- x^2
})
df
a x x.sq b
1 1 1.3709584 1.87952706 5
2 2 -0.5646982 0.31888402 6
3 3 0.3631284 0.13186224 7
4 4 0.6328626 0.40051508 8
5 5 0.4042683 0.16343288 9
6 6 -0.1061245 0.01126241 10
7 7 1.5115220 2.28469875 11
This is a bit tedious, because we have to write the name of the data frame
(i.e. “df”) twice. Using the operator %<>% from the magrittr package, one
needs to write the name of the data frame only once:
df %<>% within({
    b <- a + 4
    x.sq <- x^2
})
df
a x x.sq b
1 1 1.3709584 1.87952706 5
2 2 -0.5646982 0.31888402 6
3 3 0.3631284 0.13186224 7
4 4 0.6328626 0.40051508 8
5 5 0.4042683 0.16343288 9
6 6 -0.1061245 0.01126241 10
7 7 1.5115220 2.28469875 11
The magrittr package defines an operator %$% that can be used as a shorthand
for with():
with(df, mean(x))
[1] 0.5159882
df %$% mean(x)
[1] 0.5159882
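To make the idea of an operator shorthand for with() concrete, here is a minimal sketch in base R. The operator name %with% and the data frame df_demo are hypothetical and chosen only for illustration; this is not how magrittr or memisc implement %$%, just one way such an operator could be written.

```r
# Hypothetical operator (not magrittr's or memisc's implementation):
# evaluate the right-hand expression with the columns of `data` in scope.
`%with%` <- function(data, expr) {
  # substitute() captures the expression unevaluated; eval() then looks up
  # names in `data` first and in the caller's environment second.
  eval(substitute(expr), data, parent.frame())
}

df_demo <- data.frame(x = c(1, 2, 3, 6))

with(df_demo, mean(x))   # 3
df_demo %with% mean(x)   # 3
```

Both calls evaluate mean(x) with the column x of df_demo in scope, so they return the same value.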
It is therefore natural to use an analogous shorthand for within(), and the
most recent version of memisc defines one:
df[c("b","x.sq")] <- NULL
df %$$% {
    b <- a + 4
    x.sq <- x^2
}
df
a x b x.sq
1 1 1.3709584 5 1.87952706
2 2 -0.5646982 6 0.31888402
3 3 0.3631284 7 0.13186224
4 4 0.6328626 8 0.40051508
5 5 0.4042683 9 0.16343288
6 6 -0.1061245 10 0.01126241
7 7 1.5115220 11 2.28469875
Besides being shorter than a call to within(), it results in a data frame (or
data set) in which the variables are ordered by their creation: variables
created first appear first in the resulting data frame.
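The difference in variable order can be reproduced with base R alone. The sketch below uses a hypothetical helper, within_in_order(), which is not part of memisc or base R and is not meant to describe how %$$% is implemented. It only illustrates one way to append new variables in the order of their creation, by evaluating the statements of the block one at a time.

```r
# Hypothetical helper (not part of memisc or base R): like within(), but new
# variables are appended in the order in which they were created.
within_in_order <- function(data, expr) {
  block <- substitute(expr)
  # Evaluate in a fresh environment that holds the columns of `data`.
  env <- list2env(as.list(data), parent = parent.frame())
  # Split a `{ ... }` block into its statements; wrap a single expression.
  stmts <- if (is.call(block) && identical(block[[1]], as.name("{"))) {
    as.list(block)[-1]
  } else {
    list(block)
  }
  new_names <- character(0)
  for (stmt in stmts) {
    eval(stmt, env)
    # Record names that appeared with this statement, keeping creation order.
    seen <- c(names(data), new_names)
    new_names <- c(new_names, setdiff(ls(env, all.names = TRUE), seen))
  }
  # Copy existing (possibly modified) and new variables back, in that order.
  for (nm in c(names(data), new_names)) {
    data[[nm]] <- get(nm, envir = env, inherits = FALSE)
  }
  data
}

df_demo <- data.frame(a = 1:3, x = c(0.5, -1, 2))

# Base R within(): new variables come out in reverse order of creation.
names(within(df_demo, { b <- a + 4; x.sq <- x^2 }))
# "a" "x" "x.sq" "b"

# The sketch: new variables come out in creation order.
names(within_in_order(df_demo, { b <- a + 4; x.sq <- x^2 }))
# "a" "x" "b" "x.sq"
```

The helper returns a modified copy rather than changing its argument, so the result has to be assigned explicitly. It also ignores cases such as variables removed inside the block, which a full implementation would need to handle.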