An R equivalent to Stata’s ‘replace if’ in memisc

Version 0.99.31.6 of the package “memisc” was recently (3rd March 2023) published on CRAN. One of the new features of this version is the %if% operator, which assigns values to subsets of observations. To see how it works, consider the following example:

library(memisc) 
Loading required package: lattice
Loading required package: MASS

Attaching package: ‘memisc’

The following objects are masked from ‘package:stats’:

    contr.sum, contr.treatment, contrasts

The following object is masked from ‘package:base’:

    as.array

x <- 1:7

(y <- 1) %if% (x > 5)
(y <- 2) %if% (x <= 5)
(y <- 3) %if% (x <= 3)

data.frame(y,x,check.names=FALSE)
  y x
1 3 1
2 3 2
3 3 3
4 2 4
5 2 5
6 1 6
7 1 7

I implemented this feature at the suggestion of a colleague who missed such a feature for data preparation.

While the %if% operator is supposed to mimic the functionality of Stata’s replace if, there are three notable differences:

  • The variable to which values are assigned does not have to exist prior to the assignment. In this case, the newly created vector will have the same length as the logical vector that holds the condition for the assignment. Its elements will be missing (i.e. NA) where the condition is FALSE; otherwise they will equal the right-hand side of the assignment.
  • The right-hand side of the assignment does not have to have the same length as the left-hand side. Alternatively, it can have as many elements as there are TRUE values in the condition vector, or a single element, which is recycled to the appropriate length.
  • Due to the fixed operator precedence in R, both the assignment operation and the logical condition need to be enclosed in parentheses.
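The point about operator precedence can be checked without running any assignment. The sketch below only asks R's parser how it groups three spellings of the same statement; quote() does not evaluate anything, so memisc does not need to be loaded. The helper top_level() is a name made up for this illustration.

```r
# Capture the expressions unevaluated; only the parser is involved here.
no_parens      <- quote(y <- 1 %if% (x > 5))
with_parens    <- quote((y <- 1) %if% (x > 5))
no_cond_parens <- quote((y <- 1) %if% x > 5)

# Hypothetical helper: name of the outermost function of a call,
# i.e. the operator that binds last.
top_level <- function(e) as.character(e[[1]])

top_level(no_parens)       # "<-"   : parsed as y <- (1 %if% (x > 5))
top_level(with_parens)     # "%if%" : the assignment is the left operand
top_level(no_cond_parens)  # ">"    : parsed as ((y <- 1) %if% x) > 5
```

Only the fully parenthesized form hands both the assignment and the condition to %if% as its two operands.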

Of course, similar results can be obtained using cases() (from the “memisc” package) or ifelse() from base R,[1] but this syntax should make it easier to translate data preparation scripts from Stata to R.
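The rules for a missing target variable and for the length of the right-hand side, as listed above, can also be imitated in plain base R. The following is a minimal sketch under that reading of the rules, not the code used by memisc: replace_if() is a hypothetical helper written for this illustration, it takes NULL to stand for a variable that does not exist yet, and it assumes a condition without missing values.

```r
# Hypothetical helper (not part of memisc): returns `target` with `value`
# written into the positions where `cond` is TRUE.
replace_if <- function(target, value, cond) {
  stopifnot(is.logical(cond), !anyNA(cond))
  n <- length(cond)
  # A target that does not exist yet (passed as NULL here) starts as all NA.
  if (is.null(target)) target <- rep(NA, n)
  stopifnot(length(target) == n)
  if (length(value) == 1L || length(value) == n) {
    # Single value (recycled) or full-length vector: take the matching elements.
    target[cond] <- rep_len(value, n)[cond]
  } else if (length(value) == sum(cond)) {
    # One value per TRUE element of the condition.
    target[cond] <- value
  } else {
    stop("length of `value` does not match the condition")
  }
  target
}

x <- 1:7
y <- replace_if(NULL, 1, x > 5)            # new vector, NA where x <= 5
y <- replace_if(y, c(30, 20, 10), x <= 3)  # one value per TRUE element
y <- replace_if(y, x * 100, x %in% 4:5)    # full-length right-hand side
y
```

After the three calls, y holds 30, 20, 10, 400, 500, 1, 1: each call touches only the elements selected by its condition and leaves the rest as they were.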

For comparison, see the following example with ifelse():

y <- ifelse(x <= 3,3,
            ifelse(x <= 5,2,1))
data.frame(y,x,check.names=FALSE)                           
  y x
1 3 1
2 3 2
3 3 3
4 2 4
5 2 5
6 1 6
7 1 7

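Note that in every variant one has to take into account the order in which the conditions are checked. The same applies to cases(), which here uses the first condition that holds and warns that the conditions are not mutually exclusive: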

y <- cases(x >  5 -> 1,
           x >  3 -> 2,
           x <= 3 -> 3)

data.frame(y,x,check.names=FALSE)
Warning message:
In cases(1 <- x > 5, 2 <- x > 3, 3 <- x <= 3) :
  Conditions are not mutually exclusive
  y x
1 3 1
2 3 2
3 3 3
4 2 4
5 2 5
6 1 6
7 1 7
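To make the point about the order of conditions concrete, here is a small base-R sketch; the variable names y_good, y_bad and y_idx are chosen for this illustration only. With nested ifelse() the first matching condition wins, whereas with successive conditional assignments the last matching one wins, so the two styles need their conditions in opposite orders.

```r
x <- 1:7

# Narrow condition first: every value of x reaches the intended branch.
y_good <- ifelse(x <= 3, 3,
                 ifelse(x <= 5, 2, 1))

# Broad condition first: x <= 5 already captures x <= 3,
# so the value 3 is never assigned.
y_bad <- ifelse(x <= 5, 2,
                ifelse(x <= 3, 3, 1))

# Successive indexed assignments work the other way round:
# later assignments overwrite earlier ones, so the broad
# condition has to come before the narrow one.
y_idx <- rep(NA, length(x))
y_idx[x > 5]  <- 1
y_idx[x <= 5] <- 2
y_idx[x <= 3] <- 3

data.frame(x, y_good, y_bad, y_idx)
```

Here y_good and y_idx agree, while y_bad contains 2 for every x up to 5.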

[1] In fact, %if% uses ifelse() internally, albeit with some appropriate length checks.