Poststratification of 2016 American National Election Study data

The data used here were created in an earlier script or notebook.


The stratifying variables must not contain missing values, so we drop all respondents for whom vote16 or recall12 is "Inap" (inapplicable).

# Keep only respondents with valid values on both stratifying variables
anes_2016_vprevote <- subset(anes_2016_prevote,
                             vote16 != "Inap" &
                             recall12 != "Inap")
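One detail of subset() is easy to miss: a factor keeps all of its original levels, including "Inap", even when no observation has that level any more. The toy example below uses a made-up factor (not the ANES data) to show this; it is the reason why unused levels are dropped with [, drop = TRUE] before relabeling.

```r
# Hypothetical toy data: a factor that includes an "Inap" category
toy <- data.frame(
  recall12 = factor(c("Obama", "Romney", "Inap", "Did not vote", "Obama"))
)

# Remove the inapplicable cases, as in the subset() call above
toy_valid <- subset(toy, recall12 != "Inap")

# The "Inap" level survives with a count of zero
print(table(toy_valid$recall12))

# Dropping unused levels removes it
toy_valid$recall12 <- droplevels(toy_valid$recall12)
print(levels(toy_valid$recall12))
```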

Poststratification requires that the levels of the stratifying variables match the categories of the population data. We therefore drop unused factor levels and relabel the non-voting categories of the variables “recall12” and “vote16” as “No vote”.

The following makes use of the memisc package. If you want to run this on your own computer, you may need to install it from CRAN with install.packages("memisc"). (The package is already installed in the notebook container.)

library(memisc)

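anes_2016_vprevote <- within(anes_2016_vprevote, {
    # Drop unused factor levels such as "Inap"
    recall12 <- recall12[, drop = TRUE]
    vote16 <- vote16[, drop = TRUE]
    # Use the same label for non-voters as in the population data
    recall12 <- relabel(recall12, "Did not vote" = "No vote")
    vote16 <- relabel(vote16,
                      "Will not vote/Not registered" = "No vote")
})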

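The same relabeling can be done in base R by assigning to levels(), which may help readers who do not use memisc. The sketch below uses a made-up factor; the final check confirms that the factor levels coincide with the category names of the population totals, which is the condition poststratification needs. The names recall12 and pop_names here are illustrative only.

```r
# Hypothetical toy factor standing in for recall12 after dropping "Inap"
recall12 <- factor(c("Obama", "Romney", "Did not vote", "Other", "Obama"),
                   levels = c("Obama", "Romney", "Other", "Did not vote"))

# Base R counterpart of memisc::relabel(recall12, "Did not vote" = "No vote")
levels(recall12)[levels(recall12) == "Did not vote"] <- "No vote"

# Category names used for the population totals (as in result.2012 below)
pop_names <- c("Obama", "Romney", "Other", "No vote")

# Sample levels must match the population categories
stopifnot(setequal(levels(recall12), pop_names))
print(table(recall12))
```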
Finally, we set up a survey design object. The following makes use of the survey package. If you want to run this on your own computer, you may need to install it from CRAN with install.packages("survey"). (The package is already installed in the notebook container.)

library(survey)

anes_2016_vprevote_desgn <- svydesign(id = ~psu_f2f,
                                       strata = ~strat_f2f,
                                       weights = ~pre_w_f2f,
                                       data = anes_2016_vprevote,
                                       nest = TRUE)
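For a factor such as recall12, the point estimate that svymean() reports for each category is a weighted proportion: the sum of the weights of respondents in that category divided by the sum of all weights. The base R sketch below computes these point estimates for made-up data and weights; it ignores clusters and strata, which affect the standard errors but not the point estimates.

```r
# Hypothetical respondents: recalled 2012 vote and design weights
recall12 <- factor(c("Obama", "Romney", "No vote", "Obama", "Romney"),
                   levels = c("Obama", "Romney", "Other", "No vote"))
w <- c(1.2, 0.8, 2.0, 1.0, 1.0)

# Weighted count per category (empty categories get 0), then proportions
weighted_share <- prop.table(xtabs(w ~ recall12))
print(round(100 * weighted_share, 2))
```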

To prepare the poststratification, we collect the official results of the 2012 presidential election.

result.2012 <- c(Obama  = 65915795,
                 Romney = 60933504,
                 # Other candidates are combined
                 Other  = sum(c(
                     Johnson = 1275971,
                     Stein   =  469627,
                     Others  =  490510
                 )))

The number of non-voters is computed as the voting-age population, taken from census data, minus the total number of votes cast for all candidates.

result.2012 <- c(result.2012,
                 "No vote" = 235248000 - sum(result.2012))

# Here we collect the results for 2016
result.2016 <- c(Clinton = 65853514,
                 Trump   = 62984828,
                 # Other candidates are combined
                 Other   = sum(c(
                     Johnson  = 4489341,
                     Stein    = 1457218,
                     McMullin =  731991,
                     Others   = 1154084
                 )))

result.2016 <- c(result.2016,
                 "No vote" = 250056000 - sum(result.2016))

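For concreteness, the combined "Other" category and the resulting non-voter counts work out as follows for the two elections:

```latex
\begin{aligned}
\text{Other}_{2012} &= 1\,275\,971 + 469\,627 + 490\,510 = 2\,236\,108\\
\text{No vote}_{2012} &= 235\,248\,000 - (65\,915\,795 + 60\,933\,504 + 2\,236\,108) = 106\,162\,593\\
\text{Other}_{2016} &= 4\,489\,341 + 1\,457\,218 + 731\,991 + 1\,154\,084 = 7\,832\,634\\
\text{No vote}_{2016} &= 250\,056\,000 - (65\,853\,514 + 62\,984\,828 + 7\,832\,634) = 113\,385\,024
\end{aligned}
```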
The poststratification function postStratify() expects the population totals in the form of data frames:

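pop.vote16 <- data.frame(
    vote16 = names(result.2016),
    Freq   = result.2016)

pop.recall12 <- data.frame(
    recall12 = names(result.2012),
    Freq     = result.2012)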


We poststratify the survey design object by recalled vote in 2012.

anes_2016_prevote_desgn_post <- postStratify(anes_2016_vprevote_desgn,
                                             strata = ~recall12,
                                             population = pop.recall12)
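Conceptually, poststratification rescales the design weights within each category h of the stratifying variable so that the weighted totals reproduce the known population totals: each respondent in category h gets the weight w_i * N_h / W_h, where N_h is the population total and W_h is the sum of the sample weights in that category. The base R sketch below applies this rule to made-up data; the helper category_totals() and all variable names are illustrative. postStratify() makes the corresponding adjustment to the design object and also takes it into account when computing standard errors.

```r
# Hypothetical sample: recalled 2012 vote and design weights
recall12 <- factor(c("Obama", "Obama", "Romney", "Other", "No vote", "No vote"),
                   levels = c("Obama", "Romney", "Other", "No vote"))
w <- c(1.0, 1.5, 2.0, 1.0, 0.5, 1.0)

# Population totals N_h: the 2012 figures, with "Other" and "No vote" computed
pop_totals <- c(Obama = 65915795, Romney = 60933504,
                Other = 2236108, "No vote" = 106162593)

# Sum of weights per category, named by factor level
category_totals <- function(weights, f) {
  sapply(levels(f), function(h) sum(weights[f == h]))
}

# Poststratification factor N_h / W_h for each category h
sample_totals <- category_totals(w, recall12)
ps_factor <- pop_totals[names(sample_totals)] / sample_totals

# Multiply each respondent's weight by the factor of their category
w_post <- w * unname(ps_factor[as.character(recall12)])

# The adjusted weighted totals reproduce the population totals
post_totals <- category_totals(w_post, recall12)
print(all.equal(post_totals, pop_totals))
```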

We compare the estimated percentages of 2012 votes before (first table) and after (second table) poststratification:

                   mean     SE
recall12Obama   39.8844 0.0233
recall12Romney  29.4551 0.0198
recall12Other    1.1429 0.0035
recall12No vote 29.5176 0.0222

                    mean SE
recall12Obama   28.01970  0
recall12Romney  25.90182  0
recall12Other    0.95053  0
recall12No vote 45.12795  0

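As expected, poststratification eliminates the sampling uncertainty about the 2012 vote shares: their standard errors are zero because the weighted shares now equal the official results. It also corrects for the overreporting of turnout: the estimated share of 2012 non-voters rises from 29.5% to 45.1%.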
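Because the poststratified weights reproduce the 2012 population totals, the poststratified percentages are simply the official counts divided by the voting-age population of 235,248,000. The base R check below re-enters the figures from result.2012 and also contrasts the turnout implied by the unadjusted estimate (100 minus 29.5176, about 70.5%) with the official turnout relative to the voting-age population.

```r
# 2012 totals as in result.2012, with "Other" and "No vote" already computed
result.2012 <- c(Obama = 65915795, Romney = 60933504,
                 Other = 2236108, "No vote" = 106162593)

# Share of the voting-age population in each category (percent);
# these correspond to the poststratified estimates in the second table
pct.2012 <- 100 * result.2012 / sum(result.2012)
print(round(pct.2012, 5))

# Turnout implied by the unadjusted survey estimate versus official turnout
turnout.survey   <- 100 - 29.5176
turnout.official <- 100 - pct.2012[["No vote"]]
print(c(survey = turnout.survey, official = turnout.official))
```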

We now compare the estimated percentages of 2016 votes before (first table) and after (second table) poststratification:

                 mean     SE
vote16Clinton 38.3334 0.0291
vote16Trump   32.8720 0.0222
vote16Other    7.5954 0.0104
vote16No vote 21.1992 0.0200

                 mean     SE
vote16Clinton 32.6932 0.0222
vote16Trump   32.8370 0.0152
vote16Other    6.9345 0.0101
vote16No vote 27.5352 0.0228

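After poststratification, the estimated shares of Clinton and Trump voters are much closer to each other (32.7% and 32.8%, compared with 38.3% and 32.9% before). The estimated share of non-voters also rises, from 21.2% to 27.5%.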
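The narrowing can be expressed as the Clinton minus Trump difference in percentage points, computed here from the estimates reported in the two 2016 tables; the object names are illustrative.

```r
# Estimated 2016 vote shares (percent), copied from the two tables above
before <- c(Clinton = 38.3334, Trump = 32.8720)
after  <- c(Clinton = 32.6932, Trump = 32.8370)

# Clinton minus Trump, in percentage points
gap <- c(before = before[["Clinton"]] - before[["Trump"]],
         after  = after[["Clinton"]]  - after[["Trump"]])
print(gap)
```

The lead shrinks from about 5.5 points to about -0.1 points, so after poststratification Trump is marginally ahead in these estimates.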

We save the poststratified data for later use.


Downloadable R script and interactive version


The link with the “jupyterhub” icon directs you to an interactive Jupyter[1] notebook, which runs inside a Docker container[2]. There are two variants of the interactive notebook. One shuts down after 60 seconds and does not require signing in. The other requires signing in with your ORCID[3] credentials, but shuts down only after 24 hours. (There is no guarantee that such a container persists that long; it may be shut down earlier for maintenance purposes.) After shutdown, all data within the container are reset, i.e. all files created by the user are deleted.[4]

Above you see a rendered version of the Jupyter notebook.[5]


[1] The Jupyter notebooks make use of the IRKernel package.


[2] The container images were created with repo2docker, and the containers are run with DockerSpawner.


[3] ORCID is a free service for the authentication of researchers. It also allows researchers to showcase their publications and other contributions to the academic community, such as peer review.


[4] The Jupyter notebooks come with NO WARRANTY whatsoever. They are provided for educational and illustrative purposes only. Do not use them for production work.


[5] The notebook is rendered with the help of the nbsphinx extension.