Constructing a survey design object from data of the 2016 American National Election Study.

The following makes use of the memisc package. You may need to install it from CRAN using the code install.packages("memisc") if you want to run this on your computer. (The package is already installed on the notebook container, however.)
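Before running the notebook locally, it can help to check which of the packages it loads are already available. The following is a minimal sketch in base R; the vector `needed` simply lists the packages attached further down in this notebook, and the names `available` and `missing_pkgs` are illustrative. It only reports what is missing and installs nothing by itself.

```r
# Packages attached in this notebook
needed <- c("memisc", "magrittr", "survey")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
available <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!available]

if (length(missing_pkgs) > 0) {
  # Print the install command instead of running it
  message("Missing packages; install them with: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  message("All required packages are available.")
}
```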


library(memisc)
Loading required package: lattice

Loading required package: MASS


Attaching package: ‘memisc’


The following objects are masked from ‘package:stats’:

    contr.sum, contr.treatment, contrasts


The following object is masked from ‘package:base’:

    as.array


The code makes use of the data file “anes_timeseries_2016.sav”, which is not included in the supporting material. In order to obtain this data file (and run this notebook successfully), you need to download it from the ANES website for 2016 and upload it to the virtual machine that runs this notebook. To do this,

  1. Pull down the “File” menu and select “Open”.
  2. An overview of the folder that contains the notebook opens.
  3. The folder view has a button labelled “Upload”. Use this to upload the file that you downloaded from the ANES website.

Note that the uploaded data will disappear once you “Quit” the notebook (and the Jupyter instance).
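Since the import fails if the upload was skipped or the file has disappeared after a restart, a quick check beforehand can save some confusion. This is a small sketch in base R; it assumes that the file sits in the current working directory under the name used in this notebook, and the variable names are illustrative.

```r
# File name as used in this notebook; looked up in the working directory
data_file <- "anes_timeseries_2016.sav"

data_file_found <- file.exists(data_file)
if (!data_file_found) {
  message("File '", data_file, "' not found in ", getwd(),
          "; upload it before calling spss.file().")
}
```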


anes_2016_sav <- spss.file("anes_timeseries_2016.sav")
File character set is 'UTF-8'.

Converting character set to the local 'utf-8'.

Loading a subset: Only pre-election waves and only face-to-face interviews


anes_2016_pre_work_ds <- subset(anes_2016_sav,
                                V160501 == 1,
                                select=c(
                                # According to docs, these are the
                                # sample weights for the
                                # face-to-face component
                                pre_w_f2f     = V160101f,
                                # Face-to-face strata and PSUs
                                strat_f2f     = V160201f,
                                psu_f2f       = V160202f,
                                # Turnout and vote recall for 2012
                                pre_voted12   = V161005,
                                pre_recall12  = V161006,
                                # Early voting in 2016
                                pre_voted     = V161026,
                                pre_vote      = V161027,
                                # Vote intention for 2016
                                pre_intov     = V161030,
                                pre_voteint   = V161031
                           ))

library(magrittr) # For the '%<>%' operator

anes_2016_pre_work_ds %<>% within({
    # Setting up recalled votes of 2012
    # Since a "default" value for the remaining conditions
    # is used, we use 'check.xor = FALSE' to avoid warnings.
    recall12 <- cases(
        'Did not vote' = 9 <- pre_voted12  == 2,
        'Obama'        = 1 <- pre_recall12 == 1,
        'Romney'       = 2 <- pre_recall12 == 2,
        'Other'        = 3 <- pre_recall12 == 5,
        'Inap'         = 99 <- TRUE, check.xor = FALSE
    )
    # Early voters
    vote16_1 <- cases(
        'Clinton' = 1 <- pre_voted == 1 & pre_vote == 1,
        'Trump'   = 2 <- pre_voted == 1 & pre_vote == 2,
        'Other'   = 3 <- pre_voted == 1 & pre_vote %in% 3:5,
        'Inap'    = 99 <- TRUE, check.xor = FALSE)
    # Vote intentions
    vote16 <- cases(
        'Clinton' = 1 <- pre_intov == 1 & pre_voteint == 1,
        'Trump'   = 2 <- pre_intov == 1 & pre_voteint == 2,
        'Other'   = 3 <- pre_intov == 1 & pre_voteint %in% 3:6,
        'Will not vote/Not registered' = 8 <- pre_intov %in% c(-1,2),
        'Inap'    = 99 <- TRUE, check.xor = FALSE)
    vote16[] <- ifelse(vote16 == 99 & vote16_1 != 99,
                       vote16_1,
                       vote16)
    measurement(pre_w_f2f) <- "ratio"
})
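The recodes above appear to rely on the first matching condition determining the result, with the final TRUE acting as a catch-all for everything else. The following base R sketch mimics that logic on made-up values. It is not how cases() is implemented; the toy vectors and the names `conds`, `codes`, `first_true`, `recall12_toy` and `merged` are assumptions made for illustration only.

```r
# Made-up responses for five hypothetical respondents
pre_voted12  <- c(1, 1, 2, 1, 1)
pre_recall12 <- c(1, 2, -1, 5, -9)

# One column per condition, in the same order as in the recode above;
# the last column is the catch-all that is TRUE for everybody
conds <- cbind(pre_voted12 == 2,
               pre_recall12 == 1,
               pre_recall12 == 2,
               pre_recall12 == 5,
               TRUE)
codes <- c(9, 1, 2, 3, 99)

# For each respondent, pick the first condition that holds
first_true <- apply(conds, 1, function(row) which(row)[1])
recall12_toy <- factor(codes[first_true],
                       levels = c(1, 2, 3, 9, 99),
                       labels = c("Obama", "Romney", "Other",
                                  "Did not vote", "Inap"))
print(recall12_toy)

# Filling in early votes where no vote intention is recorded,
# analogous to the ifelse() call above
vote_intent <- c(1, 99, 99)
early_vote  <- c(99, 2, 99)
merged <- ifelse(vote_intent == 99 & early_vote != 99,
                 early_vote,
                 vote_intent)
print(merged)
```

Because the catch-all condition comes last, it only takes effect for respondents who match none of the substantive conditions.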

anes_2016_prevote <- as.data.frame(anes_2016_pre_work_ds)
save(anes_2016_prevote,file="anes-2016-prevote.RData")

# Unweighted cross-tabulation
xtabs(~ vote16 + recall12,
      data=anes_2016_prevote)
                              recall12
vote16                         Obama Romney Other Did not vote Inap
  Clinton                        326     12     2           59    6
  Trump                           29    242     5           70    8
  Other                           30     28     7           16    4
  Will not vote/Not registered    28     41     0          139    5
  Inap                            46     27     2           31   17

The following makes use of the survey package. You may need to install it from CRAN using the code install.packages("survey") if you want to run this on your computer. (The package is already installed on the notebook container, however.)


library(survey)
Loading required package: grid

Loading required package: Matrix

Loading required package: survival


Attaching package: ‘survey’


The following object is masked from ‘package:graphics’:

    dotchart



anes_2016_prevote_desgn <- svydesign(id = ~psu_f2f,
                                     strata = ~strat_f2f,
                                     weights = ~pre_w_f2f,
                                     data = anes_2016_prevote,
                                     nest = TRUE)
anes_2016_prevote_desgn
Stratified 1 - level Cluster Sampling design (with replacement)
With (65) clusters.
svydesign(id = ~psu_f2f, strata = ~strat_f2f, weights = ~pre_w_f2f,
    data = anes_2016_prevote, nest = TRUE)
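To see what the arguments of svydesign() do, it may help to try them on a tiny artificial data set. The sketch below uses made-up data (the data frame `toy` and all of its values are assumptions, not ANES data) with two strata and two PSUs per stratum. The PSU labels restart at 1 in each stratum, which is the situation that nest = TRUE is meant for.

```r
library(survey)

# Made-up data: 2 strata x 2 PSUs x 3 respondents
toy <- data.frame(
  stratum = rep(c(1, 2), each = 6),
  psu     = rep(rep(c(1, 2), each = 3), times = 2),
  weight  = c(1.0, 1.5, 0.5, 2.0, 1.0, 1.0,
              0.5, 0.5, 1.0, 1.5, 1.0, 0.5),
  vote    = factor(c("A", "B", "A", "B", "A", "A",
                     "B", "B", "A", "A", "B", "A"))
)

# PSU labels are only unique within strata, hence nest = TRUE
toy_design <- svydesign(ids = ~psu,
                        strata = ~stratum,
                        weights = ~weight,
                        data = toy,
                        nest = TRUE)

# Weighted counts: the cell entries are sums of weights
toy_tab <- svytable(~vote, design = toy_design)
print(toy_tab)
```

The weighted table adds up to the sum of the weights rather than to the number of respondents, which is why the weighted ANES table further down contains non-integer counts.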

To make use of the survey design object later, we save it to a file.


save(anes_2016_prevote_desgn,file="anes-2016-prevote-desgn.RData")
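In a later session, an object saved this way can be restored with load(), which recreates it under its original name. The round trip is sketched here with a small stand-in object and a temporary file, so that it runs without the ANES data; `toy_obj` and `tmp` are illustrative names.

```r
# A stand-in object and a temporary file instead of the real design object
toy_obj <- list(a = 1:3)
tmp <- tempfile(fileext = ".RData")

save(toy_obj, file = tmp)
rm(toy_obj)                 # simulate a fresh session

# load() restores the object and returns the names of what it restored
loaded_names <- load(tmp)
print(loaded_names)
print(toy_obj$a)
```

For the design object itself, the corresponding call would be load("anes-2016-prevote-desgn.RData"), with the survey package attached before the object is used.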

We reduce the number of significant digits shown in the output …


ops <- options(digits=2)
(tab <- svytable(~ vote16 + recall12,
                 design = anes_2016_prevote_desgn))
                              recall12
vote16                         Obama Romney Other Did not vote  Inap
  Clinton                      316.0   11.7   1.1         69.9   8.6
  Trump                         35.9  228.8   4.2         73.0   5.1
  Other                         34.1   24.4   6.6         13.9   5.3
  Will not vote/Not registered  28.8   41.4   0.0        150.2   4.3
  Inap                          44.8   25.0   1.9         28.3  16.0

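and drop the counts of non-valid recall responses (the “Inap” column) before we compute percentages. Note that the row index -6 lies outside the five rows of the table and is therefore ignored, so the “Inap” row of vote16 still appears in the result.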


percentages(vote16 ~ recall12, data=tab[-6,-5])
                              recall12
vote16                         Obama Romney Other Did not vote
  Clinton                       68.8    3.5   8.0         20.8
  Trump                          7.8   69.1  30.6         21.8
  Other                          7.4    7.4  47.6          4.1
  Will not vote/Not registered   6.3   12.5   0.0         44.8
  Inap                           9.7    7.5  13.9          8.4

options(ops) # To undo the change in the options.
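The percentages above are column percentages: within each category of recall12, the weighted counts are divided by their column total. As a rough check, this can be redone in base R with prop.table() on the rounded weighted counts printed earlier. The sketch uses only the “Obama” and “Romney” columns, copied by hand into the matrix `counts`, so results may differ slightly from the exact figures because of rounding.

```r
# Rounded weighted counts copied from the table printed above
counts <- matrix(c(316.0,  35.9, 34.1, 28.8, 44.8,
                    11.7, 228.8, 24.4, 41.4, 25.0),
                 ncol = 2,
                 dimnames = list(
                   vote16 = c("Clinton", "Trump", "Other",
                              "Will not vote/Not registered", "Inap"),
                   recall12 = c("Obama", "Romney")))

# margin = 2: divide each cell by its column total
col_pct <- 100 * prop.table(counts, margin = 2)
print(round(col_pct, 1))
```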

Here we compute an F-test of independence for the table. It uses the Rao-Scott second-order correction with a Satterthwaite approximation of the denominator degrees of freedom.


summary(tab)
                              recall12
vote16                         Obama Romney Other Did not vote Inap
  Clinton                        316     12     1           70    9
  Trump                           36    229     4           73    5
  Other                           34     24     7           14    5
  Will not vote/Not registered    29     41     0          150    4
  Inap                            45     25     2           28   16

        Pearson's X^2: Rao & Scott adjustment

data:  svychisq(~vote16 + recall12, design = anes_2016_prevote_desgn,     statistic = "F")
F = 29.235, ndf = 9.3968, ddf = 310.0952, p-value < 2.2e-16

The more conventional Pearson chi-squared test, adjusted with a design-effect estimate, is obtained by a slight modification: passing statistic="Chisq".


summary(tab, statistic="Chisq")
                              recall12
vote16                         Obama Romney Other Did not vote Inap
  Clinton                        316     12     1           70    9
  Trump                           36    229     4           73    5
  Other                           34     24     7           14    5
  Will not vote/Not registered    29     41     0          150    4
  Inap                            45     25     2           28   16

        Pearson's X^2: Rao & Scott adjustment

data:  svychisq(~vote16 + recall12, design = anes_2016_prevote_desgn,     statistic = "Chisq")
X-squared = 778.41, df = 16, p-value < 2.2e-16
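As the “data:” lines in the output indicate, summary() of a survey table runs svychisq() behind the scenes, so the two tests can also be requested directly. The sketch below does this with the `api` example data that ships with the survey package, not with the ANES data, so that it runs without the uploaded file; the design `dclus1` follows the example in the package documentation.

```r
library(survey)

# Example data shipped with the survey package (not ANES data)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw,
                    data = apiclus1, fpc = ~fpc)

# Rao-Scott second-order correction, referred to an F distribution
test_F <- svychisq(~sch.wide + stype, design = dclus1,
                   statistic = "F")

# Pearson statistic adjusted by a design-effect estimate
test_Chisq <- svychisq(~sch.wide + stype, design = dclus1,
                       statistic = "Chisq")

print(test_F)
print(test_Chisq)
```

Both calls return ordinary test objects, so the statistic and the p-value can be extracted with `$statistic` and `$p.value` for further use.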

Downloadable R script and interactive version

Explanation

The link with the “jupyterhub” icon directs you to an interactive Jupyter [1] notebook, which runs inside a Docker container [2]. There are two variants of the interactive notebook. One shuts down after 60 seconds and does not require a sign-in. The other requires signing in with your ORCID [3] credentials, but shuts down only after 24 hours. (There is no guarantee that such a container persists that long; it may be shut down earlier for maintenance purposes.) After shutdown, all data within the container will be reset, i.e. all files created by the user will be deleted. [4]

Above you see a rendered version of the Jupyter notebook. [5]

[1] For more information about Jupyter see http://jupyter.org. The Jupyter notebooks make use of the IRkernel package.

[2] For more information about Docker see https://docs.docker.com/. The container images were created with repo2docker, while containers are run with docker spawner.

[3] ORCID is a free service for the authentication of researchers. It also allows researchers to showcase publications and contributions to the academic community such as peer review. See https://info.orcid.org/what-is-orcid/ for more information.

[4] The Jupyter notebooks come with NO WARRANTY whatsoever. They are provided for educational and illustrative purposes only. Do not use them for production work.

[5] The notebook is rendered with the help of the nbsphinx extension.