Yet another operator to simplify data preparation with memisc

The recently published version 0.99.31.6 of the memisc package also contains an operator %$$% that simplifies routine data preparation steps which would previously have required calls to the function within(). It is analogous to the operator %$%, which is provided by the “magrittr” package and is also defined by memisc.

These operators are illustrated by the following code examples.

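library(magrittr)  # provides %<>% and %$%
library(memisc)    # provides %$$% (version 0.99.31.6)
set.seed(42)       # makes the rnorm() output below reproducible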

Here we create a simple example data frame:

df <- data.frame(a = 1:7, x = rnorm(7))
df
  a          x
1 1  1.3709584
2 2 -0.5646982
3 3  0.3631284
4 4  0.6328626
5 5  0.4042683
6 6 -0.1061245
7 7  1.5115220

The following code creates two new variables b and x.sq in the data frame using within():

df <- within(df,{
    b <- a + 4
    x.sq <- x^2
})
df
  a          x       x.sq  b
1 1  1.3709584 1.87952706  5
2 2 -0.5646982 0.31888402  6
3 3  0.3631284 0.13186224  7
4 4  0.6328626 0.40051508  8
5 5  0.4042683 0.16343288  9
6 6 -0.1061245 0.01126241 10
7 7  1.5115220 2.28469875 11

This is a bit tedious because the name of the data frame (here “df”) has to be written twice. With the operator %<>% from the magrittr package, the name of the data frame needs to be written only once:

df %<>% within({
    b <- a + 4
    x.sq <- x^2
})
df
  a          x       x.sq  b
1 1  1.3709584 1.87952706  5
2 2 -0.5646982 0.31888402  6
3 3  0.3631284 0.13186224  7
4 4  0.6328626 0.40051508  8
5 5  0.4042683 0.16343288  9
6 6 -0.1061245 0.01126241 10
7 7  1.5115220 2.28469875 11

The magrittr package defines an operator %$% that can be used as a shorthand for with():

with(df, mean(x))
[1] 0.5159882
df %$% mean(x)
[1] 0.5159882
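The idea behind such a shorthand can be illustrated with a one-line base-R sketch: the right-hand side is captured unevaluated and then evaluated with the columns of the data frame in scope, falling back to the caller's environment for everything else. The operator name %with_sketch% is hypothetical, and this is not claimed to be how magrittr implements %$%.

```r
# Hypothetical operator, NOT magrittr's %$%: evaluate the right-hand side
# with the columns of `data` visible; other names are looked up in the caller
`%with_sketch%` <- function(data, expr) {
  eval(substitute(expr), envir = data, enclos = parent.frame())
}

d <- data.frame(x = c(1, 2, 3))
d %with_sketch% mean(x)
# [1] 2
```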

It therefore does not seem far-fetched to use an analogous shorthand for within(), and the most recent version of memisc defines one: %$$%. After dropping the two variables created above, they can be recreated with it:

df[c("b","x.sq")] <- NULL

df %$$% {
    b <- a + 4
    x.sq <- x^2
}
df
  a          x  b       x.sq
1 1  1.3709584  5 1.87952706
2 2 -0.5646982  6 0.31888402
3 3  0.3631284  7 0.13186224
4 4  0.6328626  8 0.40051508
5 5  0.4042683  9 0.16343288
6 6 -0.1061245 10 0.01126241
7 7  1.5115220 11 2.28469875

Besides being shorter than a call to within(), %$$% produces a data frame (or data set) in which the variables are ordered by their creation: variables created first appear first in the resulting data frame. In the within() examples above, by contrast, x.sq came before b.
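To make these two properties concrete (writing the result back to the left-hand variable, and appending new variables in creation order), here is a minimal base-R sketch. It is not memisc's implementation. The operator name %within_sketch% is hypothetical, and the sketch assumes that the left-hand side is a plain variable name holding an ordinary data frame (not a memisc data set).

```r
# Hypothetical operator, NOT part of memisc: a sketch of how a %$$%-like
# shorthand for within() could work in base R.
`%within_sketch%` <- function(data, expr) {
  target <- substitute(data)   # the left-hand side, e.g. the symbol df
  code   <- substitute(expr)   # the unevaluated right-hand side
  parent <- parent.frame()
  # Assumption: the left-hand side is a plain variable name bound to a data frame
  stopifnot(is.name(target), is.data.frame(data))

  # Environment holding the columns, enclosed by the caller's environment
  env <- list2env(as.list(data), parent = parent)

  # Split a { ... } block into its statements; anything else is one statement
  stmts <- if (is.call(code) && identical(code[[1]], as.name("{"))) {
    as.list(code)[-1]
  } else {
    list(code)
  }

  # Evaluate statement by statement, recording newly bound names in order
  created <- character(0)
  for (s in stmts) {
    before <- ls(env, all.names = TRUE)
    eval(s, env)
    created <- c(created, setdiff(ls(env, all.names = TRUE), before))
  }

  final <- ls(env, all.names = TRUE)
  kept  <- intersect(names(data), final)  # original columns, original order
  added <- intersect(created, final)      # new variables, creation order
  out <- data[kept]
  for (nm in c(kept, added)) {
    # A NULL value drops the column, mirroring within()
    out[[nm]] <- get(nm, envir = env, inherits = FALSE)
  }

  # Write the result back to the variable named on the left-hand side
  assign(as.character(target), out, envir = parent)
  invisible(out)
}
```

Used as `df %within_sketch% { b <- a + 4; x.sq <- x^2 }`, it should leave df with the columns a, x, b and x.sq in that order. Evaluating the block one statement at a time is what makes the creation order observable here; within() evaluates the whole expression first and then reads the variables back from an environment, which does not keep track of when each binding was created.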