Comparing poststratification, raking, and calibration with ANES data

The following makes use of the packages survey and memisc. You may need to install them from CRAN using the code install.packages(c("survey","memisc")) if you want to run this on your computer. (The packages are already installed on the notebook container, however.)

library(survey)
Loading required package: grid
Loading required package: Matrix
Loading required package: survival

Attaching package: 'survey'

The following object is masked from 'package:graphics':

    dotchart

library(memisc)
Loading required package: lattice
Loading required package: MASS

Attaching package: 'memisc'

The following object is masked from 'package:Matrix':

    as.array

The following objects are masked from 'package:stats':

    contr.sum, contr.treatment, contrasts

The following object is masked from 'package:base':

    as.array

This loads data files created in earlier examples.

load("anes-2016-vprevote-design.RData")
load("anes-2016-prevote-desgn-post.RData")
load("anes-2016-prevote-desgn-rake.RData")
load("anes-2016-prevote-desgn-calib.RData")

Let’s compare the effect of poststratification, raking, and calibration on the relation between variables. First, we create a table from the original survey design object.

tab <- svytable(~ vote16 + recall12,
                 design = anes_2016_vprevote_desgn)
percentages(vote16 ~ recall12, data=tab)
         recall12
vote16        Obama    Romney     Other   No vote
  Clinton 76.187161  3.813877  9.241746 22.757762
  Trump    8.644019 74.683608 35.531586 23.783194
  Other    8.228826  7.974947 55.226668  4.516499
  No vote  6.939995 13.527568  0.000000 48.942545

Second, we create a table from the poststratified data.

tab_post <- svytable(~ vote16 + recall12,
                 design = anes_2016_prevote_desgn_post)
percentages(vote16 ~ recall12, data=tab_post)
         recall12
vote16        Obama    Romney     Other   No vote
  Clinton 76.187161  3.813877  9.241746 22.757762
  Trump    8.644019 74.683608 35.531586 23.783194
  Other    8.228826  7.974947 55.226668  4.516499
  No vote  6.939995 13.527568  0.000000 48.942545

Third, we create a table from the raked data.

tab_rak <- svytable(~ vote16 + recall12,
                    design = anes_2016_prevote_desgn_rake)
percentages(vote16 ~ recall12, data=tab_rak)
         recall12
vote16        Obama    Romney     Other   No vote
  Clinton 70.403656  3.152370 12.125475 12.579417
  Trump    8.177831 63.198219 47.727460 13.458918
  Other    4.213195  3.652234 40.147065  1.383226
  No vote 17.205318 29.997177  0.000000 72.578439

Fourth, we create a table from the calibrated data.

tab_calib <- svytable(~ vote16 + recall12,
                    design = anes_2016_prevote_desgn_calib)
percentages(vote16 ~ recall12, data=tab_calib)
         recall12
vote16        Obama    Romney     Other   No vote
  Clinton 69.137748  3.114927 11.193203 13.406539
  Trump    8.016145 62.183500 43.631304 14.227998
  Other    3.637356  3.547990 45.175493  1.694680
  No vote 19.208751 31.153583  0.000000 70.670783

Poststratification does not alter percentages that are conditional on the variable used for poststratification, which is why the first two tables are identical. Raking and calibration, by contrast, do change the conditional percentages.
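Why the poststratified percentages stay the same can be illustrated with a toy table in base R. This is only a sketch: the counts in `tab_toy`, the target totals in `target`, and the helper `col_pct` are made up for illustration and have nothing to do with the ANES data or with the internals of the survey package. It assumes that poststratification amounts to rescaling each column (each stratum) so that it matches a known total.

```r
# Toy table of weighted counts (hypothetical numbers, not ANES data):
# rows = vote16, columns = recall12 (the poststratification variable)
tab_toy <- matrix(c(60, 20, 10, 50), nrow = 2,
                  dimnames = list(vote16   = c("Clinton", "Trump"),
                                  recall12 = c("Obama", "Romney")))

# Hypothetical known population totals for the recall12 categories
target <- c(Obama = 100, Romney = 40)

# Rescale every column so that its total matches the target
post_factor  <- target / colSums(tab_toy)
tab_toy_post <- sweep(tab_toy, 2, post_factor, `*`)

# Percentages conditional on recall12 (column percentages)
col_pct <- function(x) 100 * prop.table(x, margin = 2)

colSums(tab_toy_post)   # columns now match the targets
all.equal(col_pct(tab_toy), col_pct(tab_toy_post))
```

Because every cell in a column is multiplied by the same factor, the factor cancels when column percentages are formed.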

To examine whether raking affects relations between recalled vote in 2012 and vote in 2016 we compute log-odds ratios:

log.odds <- function(x) log((x[1,1]/x[1,2])/(x[2,1]/x[2,2]))

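Log-odds ratios are a way to describe the relation between two dichotomous variables. Like correlations between continuous variables, they are not affected by the marginal distribution. Note that the function log.odds uses only the upper-left 2×2 block of the table, i.e. Clinton versus Trump by Obama versus Romney.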

log.odds(tab)
[1] 5.15094
log.odds(tab_post)
[1] 5.15094
log.odds(tab_rak)
[1] 5.15094
log.odds(tab_calib)
[1] 5.148527

Clearly, both poststratification and raking leave log-odds ratios unaffected. Calibration has an effect, but this appears to be minor (at least in the present case).
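The invariance of the log-odds ratio under raking can be checked on a toy table. This is a minimal sketch under stated assumptions: raking is imitated by plain iterative proportional fitting on the two margins of the table, and `rake_toy`, `log_odds_toy`, the counts, and the target margins are all hypothetical, not the procedure or data used to create the raked ANES design.

```r
# Log-odds ratio of a 2x2 table (same formula as log.odds above)
log_odds_toy <- function(x) log((x[1, 1] / x[1, 2]) / (x[2, 1] / x[2, 2]))

# Iterative proportional fitting: alternately rescale rows and columns
# until both margins match their targets (hypothetical helper)
rake_toy <- function(x, row_target, col_target, iterations = 100) {
  for (i in seq_len(iterations)) {
    x <- sweep(x, 1, row_target / rowSums(x), `*`)
    x <- sweep(x, 2, col_target / colSums(x), `*`)
  }
  x
}

# Toy table of weighted counts (made-up numbers, not ANES data)
tab_toy <- matrix(c(60, 20, 10, 50), nrow = 2,
                  dimnames = list(vote16   = c("Clinton", "Trump"),
                                  recall12 = c("Obama", "Romney")))

# Hypothetical target margins; both must add up to the same grand total
tab_toy_rake <- rake_toy(tab_toy,
                         row_target = c(50, 90),
                         col_target = c(100, 40))

rowSums(tab_toy_rake)   # close to the row targets
colSums(tab_toy_rake)   # equal to the column targets
all.equal(log_odds_toy(tab_toy), log_odds_toy(tab_toy_rake))
```

Each step multiplies whole rows or whole columns by a constant, and such constants cancel in the odds ratio. An adjustment that does not have this purely multiplicative row-by-column form need not preserve the log-odds ratio exactly.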

Downloadable R script and interactive version

Explanation

The link with the “jupyterhub” icon directs you to an interactive Jupyter [1] notebook, which runs inside a Docker container [2]. There are two variants of the interactive notebook. One shuts down after 60 seconds and does not require a sign-in. The other requires signing in with your ORCID [3] credentials, yet shuts down only after 24 hours. (There is no guarantee that such a container persists that long; it may be shut down earlier for maintenance purposes.) After shutdown, all data within the container will be reset, i.e. all files created by the user will be deleted. [4]

Above you see a rendered version of the Jupyter notebook. [5]

[1] For more information about Jupyter see http://jupyter.org. The Jupyter notebooks make use of the RKernel package.

[2] For more information about Docker see https://docs.docker.com/. The container images are run with docker spawner.

[3] ORCID is a free service for the authentication of researchers. It also allows researchers to showcase publications and contributions to the academic community such as peer review. See https://info.orcid.org/what-is-orcid/ for more information.

[4] The Jupyter notebooks come with NO WARRANTY whatsoever. They are provided for educational and illustrative purposes only. Do not use them for production work.

[5] The notebook is rendered with the help of the nbsphinx extension.