An R equivalent to Stata's 'replace if' in memisc

Version 0.99.31.6 of the "memisc" package was recently (3 March 2023) published on CRAN. One of the new features of this version is the %if% operator, which allows values to be assigned to subsets of observations. To see how it works, consider the following example:

library(memisc) 
Loading required package: lattice
Loading required package: MASS

Attaching package: ‘memisc’

The following objects are masked from ‘package:stats’:

contr.sum, contr.treatment, contrasts

The following object is masked from ‘package:base’:

as.array

x <- 1:7

(y <- 1) %if% (x > 5)
(y <- 2) %if% (x <= 5)
(y <- 3) %if% (x <= 3)

data.frame(y,x,check.names=FALSE)
  y x
1 3 1
2 3 2
3 3 3
4 2 4
5 2 5
6 1 6
7 1 7

I implemented this feature at the suggestion of a colleague who missed such a feature for data preparation.

While the %if% operator is meant to mimic the functionality of Stata's replace if, there are two notable differences:

Of course, similar results can be obtained using cases() (from the "memisc" package) or ifelse() from base R,[1] but this syntax should make it easier to translate data preparation scripts from Stata to R.
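For readers coming from Stata, a third way to get the same result is plain logical indexing in base R. The following is only a minimal sketch for comparison, not part of "memisc"; it assumes the same x as in the example above and, unlike the %if% example, has to create y before values can be assigned to subsets of it.

```r
# Base R sketch: conditional replacement via logical indexing.
x <- 1:7

# y must exist before it can be indexed, so start with all-missing values.
y <- rep(NA_real_, length(x))

# Each line overwrites y only where the condition holds;
# later assignments overwrite earlier ones.
y[x > 5]  <- 1
y[x <= 5] <- 2
y[x <= 3] <- 3

data.frame(y, x)
```

As with the %if% version, the last matching assignment wins, so the statements have to be written in the same order.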


  1. In fact, %if% uses ifelse() internally, albeit with some appropriate length checks. For comparison, see the following example with ifelse():

y <- ifelse(x <= 3, 3,
     ifelse(x <= 5, 2, 1))
data.frame(y,x,check.names=FALSE)
  y x
1 3 1
2 3 2
3 3 3
4 2 4
5 2 5
6 1 6
7 1 7

Note that in any variant one has to take into account the order in which the conditions are checked.

y <- cases(x >  5 -> 1,
           x >  3 -> 2,
           x <= 3 -> 3)

data.frame(y,x,check.names=FALSE)
Warning in cases(1 <- x > 5, 2 <- x > 3, 3 <- x <= 3):
Conditions are not mutually exclusive
  y x
1 3 1
2 3 2
3 3 3
4 2 4
5 2 5
6 1 6
7 1 7
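
The note above about the order of conditions can be made concrete with nested ifelse() calls. The sketch below uses only base R; the variable names y_specific_first and y_general_first are made up for this illustration. Because the conditions x <= 3 and x <= 5 overlap, swapping their order changes the result.

```r
x <- 1:7

# More specific condition checked first: x <= 3 gets 3, then x <= 5 gets 2.
y_specific_first <- ifelse(x <= 3, 3,
                    ifelse(x <= 5, 2, 1))

# More general condition checked first: x <= 5 already catches 1..5,
# so the inner test x <= 3 can never be TRUE where it is evaluated.
y_general_first <- ifelse(x <= 5, 2,
                   ifelse(x <= 3, 3, 1))

data.frame(x, y_specific_first, y_general_first)
```

In the second variant the value 3 is never assigned, which is the kind of silent error one has to watch for when translating a sequence of conditional replacements.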