Constructing a survey design object from data of the 2016 American National Election Study (ANES).

The following makes use of the memisc package. You may need to install it from CRAN using the code install.packages("memisc") if you want to run this on your computer. (The package is already installed on the notebook container, however.)
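
If you are unsure which of the packages used in this notebook are already available on your computer, a check along the following lines may help. This is only a sketch: it assumes that memisc, magrittr, and survey (the three packages loaded in this notebook) are all that is needed, and it merely reports missing packages rather than installing anything. The names `needed`, `available`, and `missing_pkgs` are illustrative.

```r
# Packages loaded in this notebook (assumed to be the complete list)
needed <- c("memisc", "magrittr", "survey")

# requireNamespace() returns FALSE instead of failing if a package is absent
available <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!available]

if (length(missing_pkgs) > 0) {
    message("Missing packages: ", paste(missing_pkgs, collapse = ", "),
            "; install them with install.packages().")
} else {
    message("All required packages are available.")
}
```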

library(memisc)
Loading required package: lattice
Loading required package: MASS

Attaching package: 'memisc'

The following objects are masked from 'package:stats':

    contr.sum, contr.treatment, contrasts

The following object is masked from 'package:base':

    as.array

The code makes use of the data file “anes_timeseries_2016.sav”, which is not included in the supporting material. To obtain this data file (and run this notebook successfully), you need to download it from the ANES website for 2016 and upload it to the virtual machine that runs this notebook. To do this:

  1. Pull down the “File” menu item and select “Open”.
  2. An overview of the folder that contains the notebook opens.
  3. The folder view has a button labelled “Upload”. Use this to upload the file that you downloaded from the ANES website.

Note that the uploaded data will disappear once you “Quit” the notebook (and the Jupyter instance).
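
Before running the next cell, it may be worth confirming that the upload actually worked. The following sketch assumes the file sits in the notebook's working directory under the name used in this notebook; `sav_path` and `file_found` are illustrative names.

```r
# File name as used in this notebook; adjust if you saved it elsewhere
sav_path <- "anes_timeseries_2016.sav"

file_found <- file.exists(sav_path)
if (file_found) {
    message("Found ", sav_path, " (", file.size(sav_path), " bytes).")
} else {
    message("File ", sav_path, " not found in ", getwd(),
            "; upload it before running the next cell.")
}
```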

anes_2016_sav <- spss.file("anes_timeseries_2016.sav")
File character set is 'UTF-8'.
Converting character set to the local 'ascii'.

Loading a subset: Only pre-election waves and only face-to-face interviews

anes_2016_pre_work_ds <- subset(anes_2016_sav,
                                V160501 == 1,
                                select=c(
                                # According to docs, these are the
                                # sample weights for the
                                # face-to-face component
                                pre_w_f2f     = V160101f,
                                # Face-to-face strata
                                strat_f2f     = V160201f,
                                # Face-to-face PSUs (clusters)
                                psu_f2f       = V160202f,
                                pre_voted12   = V161005,
                                pre_recall12  = V161006,
                                pre_voted     = V161026,
                                pre_vote      = V161027,
                                pre_intov     = V161030,
                                pre_voteint   = V161031
                           ))
library(magrittr) # For the '%<>%' operator
anes_2016_pre_work_ds %<>% within({
    # Setting up recalled votes of 2012
    # Since a "default" value for the remaining conditions
    # is used, we use 'check.xor = FALSE' to avoid warnings.
    recall12 <- cases(
        'Did not vote' = 9 <- pre_voted12  == 2,
        'Obama'        = 1 <- pre_recall12 == 1,
        'Romney'       = 2 <- pre_recall12 == 2,
        'Other'        = 3 <- pre_recall12 == 5,
        'Inap'         = 99 <- TRUE, check.xor = FALSE
    )
    # Early voters 
    vote16_1 <- cases(
        'Clinton' = 1 <- pre_voted == 1 & pre_vote == 1,
        'Trump'   = 2 <- pre_voted == 1 & pre_vote == 2,
        'Other'   = 3 <- pre_voted == 1 & pre_vote %in% 3:5,
        'Inap'    = 99 <- TRUE, check.xor = FALSE)
    # Vote intentions
    vote16 <- cases(
        'Clinton' = 1 <- pre_intov == 1 & pre_voteint == 1,
        'Trump'   = 2 <- pre_intov == 1 & pre_voteint == 2,
        'Other'   = 3 <- pre_intov == 1 & pre_voteint %in% 3:6,
        'Will not vote/Not registered' = 8 <- pre_intov %in% c(-1,2),
        'Inap'    = 99 <- TRUE, check.xor = FALSE)
    vote16[] <- ifelse(vote16 == 99 & vote16_1 != 99,
                       vote16_1,
                       vote16)
    measurement(pre_w_f2f) <- "ratio"
})
anes_2016_prevote <- as.data.frame(anes_2016_pre_work_ds)
save(anes_2016_prevote,file="anes-2016-prevote.RData")
# Unweighted cross-tabulation
xtabs(~ vote16 + recall12,
      data=anes_2016_prevote)
                              recall12
vote16                         Obama Romney Other Did not vote Inap
  Clinton                        326     12     2           59    6
  Trump                           29    242     5           70    8
  Other                           30     28     7           16    4
  Will not vote/Not registered    28     41     0          139    5
  Inap                            46     27     2           31   17
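
The recoding step further up fills in the reported vote of early voters wherever no vote intention was recorded. The following base-R sketch shows that fallback logic in isolation. The two vectors are made-up toy codes (with 99 standing for “Inap”), not ANES data, and the names `intention`, `early_vote`, and `combined` are illustrative.

```r
# Toy codes: 1 = Clinton, 2 = Trump, 99 = Inap (made-up values)
intention  <- c(1, 99, 2, 99, 99)   # stands in for vote16
early_vote <- c(99, 2, 99, 99, 1)   # stands in for vote16_1

# Use the early vote only where the intention is missing
# and an early vote has actually been reported
combined <- ifelse(intention == 99 & early_vote != 99,
                   early_vote,
                   intention)
print(combined)
```

Cases in which both variables are coded 99 stay at 99, which is why an “Inap” category survives in the cross-table.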

The following makes use of the survey package. You may need to install it from CRAN using the code install.packages("survey") if you want to run this on your computer. (The package is already installed on the notebook container, however.)

library(survey)
Loading required package: grid
Loading required package: Matrix
Loading required package: survival

Attaching package: 'survey'

The following object is masked from 'package:graphics':

    dotchart
anes_2016_prevote_desgn <- svydesign(id = ~psu_f2f,
                                     strata = ~strat_f2f,
                                     weights = ~pre_w_f2f,
                                     data = anes_2016_prevote,
                                     nest = TRUE)
anes_2016_prevote_desgn
Stratified 1 - level Cluster Sampling design (with replacement)
With (65) clusters.
svydesign(id = ~psu_f2f, strata = ~strat_f2f, weights = ~pre_w_f2f, 
    data = anes_2016_prevote, nest = TRUE)
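
The `weights` argument is what makes tables computed from this design differ from the unweighted cross-table shown earlier: cell entries become sums of weights rather than counts of respondents. A minimal base-R sketch with made-up data (the data frame `toy` and its columns are hypothetical) illustrates the difference. It only mimics the point estimates; the strata and PSUs matter for standard errors and tests, which this sketch does not cover.

```r
# Made-up toy data: two categorical variables and a weight
toy <- data.frame(vote   = c("A", "A", "B", "B"),
                  recall = c("X", "Y", "X", "X"),
                  w      = c(0.5, 1.5, 2.0, 1.0))

# Unweighted: each row counts once
unweighted <- xtabs(~ vote + recall, data = toy)

# Weighted: each row contributes its weight to its cell
weighted <- xtabs(w ~ vote + recall, data = toy)

print(unweighted)
print(weighted)
```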

In order to later make use of the survey design object, we save it into a file.

save(anes_2016_prevote_desgn,file="anes-2016-prevote-desgn.RData")
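
An object saved this way can be restored in a later session with `load()`, which recreates it under its original name. The sketch below demonstrates the round trip with a small stand-in object and a temporary file, so that it runs without the ANES data. For the real file one would presumably call `load("anes-2016-prevote-desgn.RData")` after `library(survey)`.

```r
# Stand-in for the survey design object (hypothetical placeholder)
stand_in <- list(note = "placeholder for a survey design object")

# Save to a temporary file and remove the object from the workspace
tmp_file <- tempfile(fileext = ".RData")
save(stand_in, file = tmp_file)
rm(stand_in)

# load() restores the object and returns the names of what it restored
restored_names <- load(tmp_file)
print(restored_names)
```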

We reduce the number of significant digits shown in the output …

ops <- options(digits=2)
(tab <- svytable(~ vote16 + recall12,
                 design = anes_2016_prevote_desgn))
                              recall12
vote16                         Obama Romney Other Did not vote  Inap
  Clinton                      316.0   11.7   1.1         69.9   8.6
  Trump                         35.9  228.8   4.2         73.0   5.1
  Other                         34.1   24.4   6.6         13.9   5.3
  Will not vote/Not registered  28.8   41.4   0.0        150.2   4.3
  Inap                          44.8   25.0   1.9         28.3  16.0

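and drop the counts of non-valid responses on recall12 (the “Inap” column) before we compute column percentages. Note that the “Inap” row of vote16 stays in the table, as the output shows.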

percentages(vote16 ~ recall12, data=tab[-6,-5])
                              recall12
vote16                         Obama Romney Other Did not vote
  Clinton                       68.8    3.5   8.0         20.8
  Trump                          7.8   69.1  30.6         21.8
  Other                          7.4    7.4  47.6          4.1
  Will not vote/Not registered   6.3   12.5   0.0         44.8
  Inap                           9.7    7.5  13.9          8.4
options(ops) # To undo the change in the options.
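
The percentages above are column percentages: within each category of recall12, the weighted counts of vote16 are divided by the column total. As a rough cross-check, they can be approximated in base R from the rounded weighted counts printed earlier. Because those inputs are rounded to one decimal, small deviations from the `percentages()` output are to be expected. The matrix `wtd` is typed in by hand from that output.

```r
# Weighted counts copied from the printed table, without the "Inap" column
wtd <- matrix(c(316.0,  11.7, 1.1,  69.9,
                 35.9, 228.8, 4.2,  73.0,
                 34.1,  24.4, 6.6,  13.9,
                 28.8,  41.4, 0.0, 150.2,
                 44.8,  25.0, 1.9,  28.3),
              nrow = 5, byrow = TRUE,
              dimnames = list(
                  vote16   = c("Clinton", "Trump", "Other",
                               "Will not vote/Not registered", "Inap"),
                  recall12 = c("Obama", "Romney", "Other", "Did not vote")))

# Divide each cell by its column total and express as a percentage
col_pct <- 100 * prop.table(wtd, margin = 2)
print(round(col_pct, 1))
```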

Here we compute an F-test of independence for the table, which uses the Rao-Scott second-order correction with a Satterthwaite approximation of the denominator degrees of freedom.

summary(tab)
Warning in chisq.test(svytable(formula, design, Ntotal = N), correct = FALSE):
Chi-squared approximation may be incorrect
                              recall12
vote16                         Obama Romney Other Did not vote Inap
  Clinton                        316     12     1           70    9
  Trump                           36    229     4           73    5
  Other                           34     24     7           14    5
  Will not vote/Not registered    29     41     0          150    4
  Inap                            45     25     2           28   16

	Pearson's X^2: Rao & Scott adjustment

data:  svychisq(~vote16 + recall12, design = anes_2016_prevote_desgn,     statistic = "F")
F = 29.235, ndf = 9.3968, ddf = 310.0952, p-value < 2.2e-16
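
The reported p-value and the relation between the two degrees of freedom can be checked from the printed numbers alone. The sketch below recomputes the upper tail of the F distribution with `pf()` and computes the ratio ddf/ndf, which comes out at almost exactly 33. Interpreting this ratio as the design degrees of freedom (number of PSUs minus number of strata) is an assumption here, since the number of strata is not shown in the output.

```r
# Values copied from the printed test result
f_stat <- 29.235
ndf    <- 9.3968
ddf    <- 310.0952

# Upper-tail probability of the F distribution
p_value <- pf(f_stat, df1 = ndf, df2 = ddf, lower.tail = FALSE)
print(p_value < 2.2e-16)

# Ratio of denominator to numerator degrees of freedom
print(ddf / ndf)
```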

The more conventional Pearson chi-squared test, adjusted with a design-effect estimate, is obtained by a slight modification.

summary(tab, statistic="Chisq")
Warning in chisq.test(svytable(formula, design, Ntotal = N), correct = FALSE):
Chi-squared approximation may be incorrect
                              recall12
vote16                         Obama Romney Other Did not vote Inap
  Clinton                        316     12     1           70    9
  Trump                           36    229     4           73    5
  Other                           34     24     7           14    5
  Will not vote/Not registered    29     41     0          150    4
  Inap                            45     25     2           28   16

	Pearson's X^2: Rao & Scott adjustment

data:  svychisq(~vote16 + recall12, design = anes_2016_prevote_desgn,     statistic = "Chisq")
X-squared = 778.41, df = 16, p-value < 2.2e-16