Using the replicate weights provided with CHIS data

The following makes use of the survey package. You may need to install it from CRAN using the code install.packages("survey") if you want to run this on your computer. (The package is already installed on the notebook container, however.)


library(survey)
Loading required package: grid

Loading required package: Matrix

Loading required package: survival


Attaching package: ‘survey’


The following object is masked from ‘package:graphics’:

    dotchart



library(foreign)

The file “adult.dta” is downloaded from http://healthpolicy.ucla.edu/chis/data/Pages/GetCHISData.aspx and contains the 2005 wave of the California Health Interview Survey. Redistribution of the data is prohibited, so readers who want to reproduce the following will need to download their own copy of the data set and upload it to the virtual machine that runs this notebook. To do this:

  1. Pull down the “File” menu and select “Open”.
  2. An overview of the folder that contains the notebook opens.
  3. The folder view has a button labelled “Upload”. Use this to upload the file that you downloaded from the website.

Note that the uploaded data will disappear once you “Quit” the notebook (and the Jupyter instance).


adult_chis <- read.dta("adult.dta",
                       warn.missing.labels=FALSE)

The data set contains 80 sets of (raked) replicate weights. They are in the variables named “rakedw1” through “rakedw80”. The raked sampling weights are in “rakedw0”.

We obtain the column numbers of these variables, making use of our knowledge of the name pattern:


repw <- which(names(adult_chis) %in% paste0("rakedw",1:80))
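The same columns can also be picked out with a regular expression instead of an explicit list of names. The following is a minimal sketch on a toy data frame; the column values and the name `repw_alt` are made up for illustration and are not part of the CHIS file.

```r
# Toy data frame standing in for the CHIS data (hypothetical values)
toy <- data.frame(age      = c(30, 45),
                  rakedw0  = c(1.0, 2.0),
                  rakedw1  = c(0.9, 2.1),
                  rakedw2  = c(1.1, 1.9),
                  rakedw10 = c(1.0, 2.0))

# "rakedw" followed by a number that does not start with 0,
# so that the sampling weight rakedw0 is not matched
repw_alt <- grep("^rakedw[1-9][0-9]*$", names(toy))

names(toy)[repw_alt]
```

Anchoring the pattern with `^` and `$` keeps it from matching other variables that merely contain the string “rakedw”.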

To specify replicate weights, we call the function svrepdesign(). The first argument specifies the variables that will be used for data analysis. The weights= argument specifies the sampling weights, while the repweights= argument specifies the replicate weights. The data= argument specifies the data frame where the data all come from. The combined.weights= argument is needed here, because the replicate weights were constructed from sampling weights and “pure” replicate weights. Since we do not know the way the replicate weights were constructed, we have to specify type="other".


adult_chis_rd <- svrepdesign(adult_chis[-repw],
                             weights=~rakedw0,
                             repweights=adult_chis[repw],
                             data=adult_chis,
                             combined.weights=TRUE,
                             type="other",
                             scale=1,rscales=1)
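To make the role of scale= and rscales= more concrete, here is a rough base-R sketch of the kind of computation a replicate-weight variance estimate involves. It uses simulated data, not CHIS. It also assumes that the squared deviations of the replicate estimates are taken around their mean, which is my understanding of the survey package's default (mse=FALSE); treat that as an assumption. All object names are hypothetical.

```r
set.seed(42)
n <- 500   # toy sample size
R <- 80    # number of replicate weights, as in CHIS

y  <- rbinom(n, 1, 0.2)     # toy indicator, e.g. "uninsured"
w0 <- runif(n, 50, 150)     # toy sampling weights

# Toy "combined" replicate weights: each column already contains the
# sampling weight, perturbed by a random factor (purely illustrative)
repw_mat <- w0 * matrix(runif(n * R, 0.8, 1.2), nrow = n, ncol = R)

wmean <- function(w) sum(w * y) / sum(w)

theta_hat <- wmean(w0)                  # full-sample estimate
theta_rep <- apply(repw_mat, 2, wmean)  # one estimate per replicate

scale   <- 1
rscales <- rep(1, R)

# Variance: scaled sum of squared deviations of the replicate estimates
# (assumption: centred at the mean of the replicate estimates)
var_hat <- scale * sum(rscales * (theta_rep - mean(theta_rep))^2)
se_hat  <- sqrt(var_hat)
```

With scale=1 and rscales=1 the squared deviations are simply summed, without any further rescaling.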

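With svymean() we get the estimated proportions of the various categories of health insurance status in California in 2005, along with standard errors. Multiplying by 100 gives percentages. Note that in the output below only the estimates are scaled; the SE column appears to remain on the proportion scale (for example, 0.0027 corresponds to about 0.27 percentage points).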


100*svymean(~instyp_p, design=adult_chis_rd)
                                    mean     SE
instyp_pUNINSURED                16.1204 0.0027
instyp_pMEDICARE & MEDICAID       4.0544 0.0011
instyp_pMEDICARE & OTHERS         9.5286 0.0010
instyp_pMEDICARE ONLY             2.0639 0.0007
instyp_pMEDICAID                  8.5105 0.0018
instyp_pEMPLOYMENT-BASED         51.9316 0.0030
instyp_pPRIVATELY PURCHASED       6.0567 0.0017
instyp_pHEALTHY FAM/OTHER PUBLIC  1.7339 0.0011
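If both the estimates and the standard errors are wanted on the percentage scale, one option is to extract them with coef() and SE() and scale each separately. The sketch below does this on a small simulated data set, so that it runs without the CHIS file; the variable names (ins, w0, w1, ...) and the way the toy replicate weights are generated are purely illustrative.

```r
library(survey)

set.seed(1)
n <- 300   # toy sample size
R <- 20    # toy number of replicate weights (CHIS has 80)

toy <- data.frame(
  ins = factor(sample(c("UNINSURED", "MEDICAID", "EMPLOYMENT-BASED"),
                      n, replace = TRUE)),
  w0  = runif(n, 50, 150)
)
# Hypothetical combined replicate weights: perturbed sampling weights
for (r in seq_len(R)) {
  toy[[paste0("w", r)]] <- toy$w0 * runif(n, 0.8, 1.2)
}
rep_cols <- which(names(toy) %in% paste0("w", seq_len(R)))

toy_rd <- svrepdesign(variables = toy[-rep_cols],
                      weights = ~w0,
                      repweights = toy[rep_cols],
                      data = toy,
                      combined.weights = TRUE,
                      type = "other",
                      scale = 1, rscales = 1)

est    <- svymean(~ins, design = toy_rd)
pct    <- 100 * coef(est)   # estimates in percent
se_pct <- 100 * SE(est)     # standard errors in percentage points

round(cbind(percent = pct, se = se_pct), 2)
```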

With svytotal() we obtain estimates of how many people have which health insurance status.


svytotal(~instyp_p, design=adult_chis_rd)
                                    total    SE
instyp_pUNINSURED                 4253792 72494
instyp_pMEDICARE & MEDICAID       1069871 28764
instyp_pMEDICARE & OTHERS         2514367 25892
instyp_pMEDICARE ONLY              544612 19018
instyp_pMEDICAID                  2245709 48474
instyp_pEMPLOYMENT-BASED         13703511 79679
instyp_pPRIVATELY PURCHASED       1598225 45184
instyp_pHEALTHY FAM/OTHER PUBLIC   457527 27854
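As a rough illustration of what these point estimates are, the following base-R sketch computes weighted totals and proportions by hand for five made-up respondents. The weights and categories are invented for the example; the standard errors, which require the replicate weights, are left aside.

```r
# Five hypothetical respondents with their sampling weights
ins <- factor(c("UNINSURED", "MEDICAID", "UNINSURED",
                "EMPLOYMENT-BASED", "EMPLOYMENT-BASED"))
w   <- c(100, 250, 150, 300, 200)

# Estimated population total per category: the sum of the weights
totals <- tapply(w, ins, sum)

# Estimated proportion per category: total divided by the sum of all weights
props <- totals / sum(w)

totals
props
```

Here the estimated total for “UNINSURED” is 100 + 150 = 250 and the estimated proportion is 250 / 1000 = 0.25. The same relation links the two tables above: each total equals the corresponding proportion times the sum of the sampling weights.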

Downloadable R script and interactive version

Explanation

The link with the “jupyterhub” icon directs you to an interactive Jupyter [1] notebook, which runs inside a Docker container [2]. There are two variants of the interactive notebook. One shuts down after 60 seconds and does not require a sign-in. The other requires signing in with your ORCID [3] credentials, but shuts down only after 24 hours. (There is no guarantee that such a container persists that long; it may be shut down earlier for maintenance purposes.) After shutdown all data within the container will be reset, i.e. all files created by the user will be deleted. [4]

Above you see a rendered version of the Jupyter notebook. [5]

[1] For more information about Jupyter see http://jupyter.org. The Jupyter notebooks make use of the IRKernel package.

[2] For more information about Docker see https://docs.docker.com/. The container images were created with repo2docker, while containers are run with docker spawner.

[3] ORCID is a free service for the authentication of researchers. It also allows researchers to showcase publications and contributions to the academic community, such as peer review. See https://info.orcid.org/what-is-orcid/ for more information.

[4] The Jupyter notebooks come with NO WARRANTY whatsoever. They are provided for educational and illustrative purposes only. Do not use them for production work.

[5] The notebook is rendered with the help of the nbsphinx extension.