recode memisc 0.99.20.1

Recode Items, Factors and Numeric Vectors

Description

recode replaces old values of a factor or a numeric vector with new ones, much like the recoding facilities in some commercial statistical packages.

Usage

recode(x,...,
       copy=getOption("recode_copy",identical(otherwise,"copy")),
       otherwise=NA)
## S4 method for signature 'vector'
recode(x,...,
    copy=getOption("recode_copy",identical(otherwise,"copy")),
    otherwise=NA)
## S4 method for signature 'factor'
recode(x,...,
    copy=getOption("recode_copy",identical(otherwise,"copy")),
    otherwise=NA)
## S4 method for signature 'item'
recode(x,...,
    copy=getOption("recode_copy",identical(otherwise,"copy")),
    otherwise=NA)

Arguments

x

An object

...

One or more assignment expressions, each of the form new.value <- old.values. new.value should be a scalar numeric value or a character string. If one of the new.values is a character string, the return value of recode will be a factor, and each new.value will be coerced to a character string that labels a level of the factor. The old.values in an assignment expression may be a (numeric or character) vector. If x is numeric, such an assignment expression may have the form new.value <- range(lower,upper). In that case, values between lower and upper are replaced by new.value. If one of the arguments to range is min, it is substituted by the minimum of x. If one of the arguments to range is max, it is substituted by the maximum of x. In the method for labelled vectors, the tags of arguments of the form tag = new.value <- old.values define the labels of the new codes. If the old.values of different assignment expressions overlap, an error is raised because the recoding is ambiguous.

copy

logical; should those values of x that are not given an explicit new code be copied into the resulting vector?

otherwise

a character string or some other value that the result may take. If equal to NA or "NA", original codes not given an explicit new code are recoded into NA. If equal to "copy", original codes not given an explicit new code are copied.
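The interplay of range(lower,upper) requests and the otherwise argument can be illustrated with a small base-R emulation. This is only a sketch of the documented behaviour, not memisc's implementation: recode_sketch and its rules argument are hypothetical names introduced here, ranges are assumed to include both endpoints (as the examples with range(min,2) suggest), only otherwise=NA and otherwise="copy" are handled, and overlap is detected on the matched elements rather than on the requests themselves.

```r
# Hypothetical base-R emulation of the documented semantics;
# this is NOT the memisc implementation of recode().
recode_sketch <- function(x, rules, otherwise = NA) {
  # rules: a list of list(new = <scalar>, lower = <number>, upper = <number>)
  # Start from a copy of x ("copy") or from an all-missing vector (NA).
  out <- if (identical(otherwise, "copy")) x else rep(NA_real_, length(x))
  hits <- integer(length(x))  # number of rules matching each element
  for (r in rules) {
    # Both endpoints are treated as part of the range (an assumption).
    sel <- !is.na(x) & x >= r$lower & x <= r$upper
    hits <- hits + sel
    out[sel] <- r$new
  }
  # Overlapping old values make the request ambiguous.
  if (any(hits > 1)) stop("recoding request is ambiguous")
  out
}

x <- c(2, 1, 5, 3, 6, 4)
rules <- list(
  list(new = 1, lower = min(x), upper = 2),  # like 1 <- range(min,2)
  list(new = 2, lower = 3,      upper = 4)   # like 2 <- range(3,4)
)

res_na   <- recode_sketch(x, rules)                      # unmatched become NA
res_copy <- recode_sketch(x, rules, otherwise = "copy")  # unmatched are kept
print(res_na)
print(res_copy)
```

With otherwise = "copy" the values 5 and 6 survive unchanged, which corresponds to what copy=TRUE does in the examples further down.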

Value

A numeric vector, a factor, or an item object.

Details

recode relies on the lazy evaluation mechanism of R: arguments are not evaluated until the function they are passed to requires them. recode never evaluates the arguments that appear in ...; instead, it parses them. Therefore, although an expression like 1 <- 1:4 would raise an error if evaluated anywhere else in R, it does not raise an error when passed to recode as an argument. However, a call of the form recode(x,1=1:4) would be a syntax error.
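The general mechanism can be sketched in base R. The helper below, capture_recodings, is a hypothetical name and not part of memisc; it only shows how a function can capture the ... arguments unevaluated with substitute() and take each new.value <- old.values request apart. As a simplification, it evaluates the right-hand side in the caller's environment, which is an assumption of this sketch and not necessarily what recode does internally.

```r
# Hypothetical helper, not part of memisc: captures the '...' arguments
# without evaluating them and splits each `new <- old` request.
capture_recodings <- function(...) {
  env <- parent.frame()
  # substitute() returns the unevaluated call list(...); dropping the
  # first element (the function name) leaves the argument expressions.
  requests <- as.list(substitute(list(...)))[-1]
  lapply(requests, function(expr) {
    # Each request must be a call to `<-`.
    if (!is.call(expr) || !identical(expr[[1]], as.name("<-")))
      stop("invalid recoding request")
    list(new = expr[[2]],             # left-hand side, taken as is
         old = eval(expr[[3]], env))  # right-hand side, evaluated here
  })
}

# `1 <- 1:4` is never evaluated as an assignment, so no error occurs.
parsed <- capture_recodings(1 <- 1:4, "b" <- c(5, 6))
str(parsed)
```

Because 1 <- 1:4 is syntactically valid, R's parser accepts it and the error would arise only on evaluation. By contrast, 1=1:4 in argument position is rejected by the parser before any function is called.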

If John Fox's package “car” is installed, recode can also be called with the syntax of the recode function of that package.

See also

recode of package “car”.

Examples

x <- as.item(sample(1:6,20,replace=TRUE),
       labels=c( a=1,
                 b=2,
                 c=3,
                 d=4,
                 e=5,
                 f=6))
print(x)
[1] b a a e c f e d d c f a f c f f f f a b
# A recoded version of x is returned
# containing the values 1, 2, 3, which are
# labelled as "A", "B", "C".
recode(x,
 A = 1 <- range(min,2),
 B = 2 <- 3:4,
 C = 3 <- range(5,max), # this last comma is ignored
 )
Item (measurement: nominal, type: integer, length = 20)

 [1:20] A A A C B C C B B B C A C B C C C C A A
# This raises an error: the sets
# of original values overlap.
try(recode(x,
 A = 1 <- range(min,2),
 B = 2 <- 2:4,
 C = 3 <- range(5,max)
 ))
Error in recode(x, A = 1 <- range(min, 2), B = 2 <- 2:4, C = 3 <- range(5,  :
  recoding request is ambiguous
recode(x,
 A = 1 <- range(min,2),
 B = 2 <- 3:4,
 C = 3 <- range(5,6),
 D = 4 <- 7
 )
Warning in recode(x, A = 1 <- range(min, 2), B = 2 <- 3:4, C = 3 <- range(5,  :
  recoding 4 <- 7 has no consequences

Item (measurement: nominal, type: integer, length = 20)

 [1:20] A A A C B C C B B B C A C B C C C C A A
# This results in an all-missing vector:
recode(x,
 D = 4 <- 7,
 E = 5 <- 8
 )
Warning in recode(x, D = 4 <- 7, E = 5 <- 8) :
  recodings 4 <- 7, 5 <- 8 have no consequences

Item (measurement: nominal, type: integer, length = 20)

 [1:20] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
f <- as.factor(x)
x <- as.integer(x)
recode(x,
 1 <- range(min,2),
 2 <- 3:4,
 3 <- range(5,max)
 )
[1] 1 1 1 3 2 3 3 2 2 2 3 1 3 2 3 3 3 3 1 1
# This raises another error:
# the third argument is an invalid
# expression for a recoding.
try(recode(x,
 1 <- range(min,2),
 3:4,
 3 <- range(5,max)
 ))
Error in recode(x, 1 <- range(min, 2), 3:4, 3 <- range(5, max)) :
  invalid recoding request
# The new values are character strings,
# therefore a factor is returned.
recode(x,
 "a" <- range(min,2),
 "b" <- 3:4,
 "c" <- range(5,6)
 )
 [1] a a a c b c c b b b c a c b c c c c a a
Levels: a b c
recode(x,
 1 <- 1:3,
 2 <- 4:6
 )
[1] 1 1 1 2 1 2 2 2 2 1 2 1 2 1 2 2 2 2 1 1
recode(x,
 4 <- 7,
 5 <- 8,
 otherwise = "copy"
 )
Warning in recode(x, 4 <- 7, 5 <- 8, otherwise = "copy") :
  recodings 4 <- 7, 5 <- 8 have no consequences
 [1] 2 1 1 5 3 6 5 4 4 3 6 1 6 3 6 6 6 6 1 2
recode(f,
 "A" <- c("a","b"),
 "B" <- c("c","d"),
 otherwise="copy"
 )
 [1] A A A e B f e B B B f A f B f f f f A A
Levels: A B e f
recode(f,
 "A" <- c("a","b"),
 "B" <- c("c","d"),
 otherwise="C"
 )
 [1] A A A C B C C B B B C A C B C C C C A A
Levels: A B C
recode(f,
 "A" <- c("a","b"),
 "B" <- c("c","d")
 )
 [1] A    A    A    <NA> B    <NA> <NA> B    B    B    <NA> A    <NA> B    <NA>
[16] <NA> <NA> <NA> A    A
Levels: A B
DS <- data.set(x=as.item(sample(1:6,20,replace=TRUE),
       labels=c( a=1,
                 b=2,
                 c=3,
                 d=4,
                 e=5,
                 f=6)))
print(DS)
   x
 1 c
 2 b
 3 d
 4 c
 5 e
 6 c
 7 a
 8 c
 9 a
10 b
11 f
12 b
13 e
14 c
15 a
16 d
17 e
18 b
19 b
20 f
DS <- within(DS,{
   xf <- recode(x,
                "a" <- range(min,2),
                "b" <- 3:4,
                "c" <- range(5,6)
                )
   xn <- x@.Data
   xc <- recode(xn,
                "a" <- range(min,2),
                "b" <- 3:4,
                "c" <- range(5,6)
                )
   xc <- as.character(x)
   xcc <- recode(xc,
                 1 <- letters[1:2],
                 2 <- letters[3:4],
                 3 <- letters[5:6]
                 )
})
DS
Data set with 20 observations and 5 variables

   x xf xn xc xcc
 1 c  b  3  c   2
 2 b  a  2  b   1
 3 d  b  4  d   2
 4 c  b  3  c   2
 5 e  c  5  e   3
 6 c  b  3  c   2
 7 a  a  1  a   1
 8 c  b  3  c   2
 9 a  a  1  a   1
10 b  a  2  b   1
11 f  c  6  f   3
12 b  a  2  b   1
13 e  c  5  e   3
14 c  b  3  c   2
15 a  a  1  a   1
16 d  b  4  d   2
17 e  c  5  e   3
18 b  a  2  b   1
19 b  a  2  b   1
20 f  c  6  f   3
DS <- within(DS,{
   xf <- recode(x,
                "a" <- range(min,2),
                "b" <- 3:4,
                "c" <- range(5,6)
                )
   x1 <- recode(x,
                1 <- range(1,2),
                2 <- range(3,4),
                copy=TRUE
                )
   xf1 <- recode(x,
                "A" <- range(1,2),
                "B" <- range(3,4),
                copy=TRUE
                )
})
DS
Data set with 20 observations and 7 variables

   x xf xn xc xcc x1 xf1
 1 c  b  3  c   2  2   B
 2 b  a  2  b   1  1   A
 3 d  b  4  d   2  2   B
 4 c  b  3  c   2  2   B
 5 e  c  5  e   3  5   5
 6 c  b  3  c   2  2   B
 7 a  a  1  a   1  1   A
 8 c  b  3  c   2  2   B
 9 a  a  1  a   1  1   A
10 b  a  2  b   1  1   A
11 f  c  6  f   3  6   6
12 b  a  2  b   1  1   A
13 e  c  5  e   3  5   5
14 c  b  3  c   2  2   B
15 a  a  1  a   1  1   A
16 d  b  4  d   2  2   B
17 e  c  5  e   3  5   5
18 b  a  2  b   1  1   A
19 b  a  2  b   1  1   A
20 f  c  6  f   3  6   6
codebook(DS)
================================================================================

   x

--------------------------------------------------------------------------------

   Storage mode: integer
   Measurement: nominal

   Values and labels    N    Percent

             1   'a'    3   15.0
             2   'b'    5   25.0
             3   'c'    5   25.0
             4   'd'    2   10.0
             5   'e'    3   15.0
             6   'f'    2   10.0

================================================================================

   xf

--------------------------------------------------------------------------------

   Storage mode: integer
   Measurement: nominal

   Values and labels    N    Percent

             1   'a'    8   40.0
             2   'b'    7   35.0
             3   'c'    5   25.0

================================================================================

   xn

--------------------------------------------------------------------------------

   Storage mode: integer
   Measurement: interval

            Min:   1.000
            Max:   6.000
           Mean:   3.150
       Std.Dev.:   1.558
       Skewness:   0.385
       Kurtosis:  -0.955

================================================================================

   xc

--------------------------------------------------------------------------------

   Storage mode: character
   Measurement: nominal

       Min:  a
       Max:  f

================================================================================

   xcc

--------------------------------------------------------------------------------

   Storage mode: integer
   Measurement: nominal

   Values and labels    N    Percent

             1   '1'    8   40.0
             2   '2'    7   35.0
             3   '3'    5   25.0

================================================================================

   x1

--------------------------------------------------------------------------------

   Storage mode: integer
   Measurement: nominal

             Values     N     Percent

       (unlab.val.)    20   100.0

================================================================================

   xf1

--------------------------------------------------------------------------------

   Storage mode: integer
   Measurement: nominal

   Values and labels    N    Percent

    1   'A'             8   40.0
    2   'B'             7   35.0
        (unlab.val.)    5   25.0
DF <- data.frame(x=rep(1:6,4))
DF <- within(DF,{
   xf <- recode(x,
                "a" <- range(min,2),
                "b" <- 3:4,
                "c" <- range(5,6)
                )
   x1 <- recode(x,
                1 <- range(1,2),
                2 <- range(3,4),
                copy=TRUE
                )
   xf1 <- recode(x,
                "A" <- range(1,2),
                "B" <- range(3,4),
                copy=TRUE
                )
   xf2 <- recode(x,
                "B" <- range(3,4),
                "A" <- range(1,2),
                copy=TRUE
                )
})
DF
   x xf2 xf1 x1 xf
1  1   A   A  1  a
2  2   A   A  1  a
3  3   B   B  2  b
4  4   B   B  2  b
5  5   5   5  5  c
6  6   6   6  6  c
7  1   A   A  1  a
8  2   A   A  1  a
9  3   B   B  2  b
10 4   B   B  2  b
11 5   5   5  5  c
12 6   6   6  6  c
13 1   A   A  1  a
14 2   A   A  1  a
15 3   B   B  2  b
16 4   B   B  2  b
17 5   5   5  5  c
18 6   6   6  6  c
19 1   A   A  1  a
20 2   A   A  1  a
21 3   B   B  2  b
22 4   B   B  2  b
23 5   5   5  5  c
24 6   6   6  6  c
codebook(DF)
================================================================================

   x

--------------------------------------------------------------------------------

   Storage mode: integer

          Min.:  1.000
       1st Qu.:  2.000
        Median:  3.500
          Mean:  3.500
       3rd Qu.:  5.000
          Max.:  6.000

================================================================================

   xf2

--------------------------------------------------------------------------------

   Storage mode: integer
   Factor with 4 levels

   Levels and labels    N    Percent

               1 'B'    8   33.3
               2 'A'    8   33.3
               3 '5'    4   16.7
               4 '6'    4   16.7

================================================================================

   xf1

--------------------------------------------------------------------------------

   Storage mode: integer
   Factor with 4 levels

   Levels and labels    N    Percent

               1 'A'    8   33.3
               2 'B'    8   33.3
               3 '5'    4   16.7
               4 '6'    4   16.7

================================================================================

   x1

--------------------------------------------------------------------------------

   Storage mode: double

          Min.:  1.000
       1st Qu.:  1.000
        Median:  2.000
          Mean:  2.833
       3rd Qu.:  5.000
          Max.:  6.000

================================================================================

   xf

--------------------------------------------------------------------------------

   Storage mode: integer
   Factor with 3 levels

   Levels and labels    N    Percent

               1 'a'    8   33.3
               2 'b'    8   33.3
               3 'c'    8   33.3