Using the replicate weights provided with CHIS data
The following makes use of the survey package. You may need to install it from
CRAN using the code
install.packages("survey") if you want to run this on your computer. (The
package is already installed on the notebook container, however.)
library(survey)
Loading required package: grid
Loading required package: Matrix
Loading required package: survival

Attaching package: 'survey'

The following object is masked from 'package:graphics':

    dotchart
The file “
adult.dta” is downloaded from
and contains the 2005 wave of the California Health Interview Survey.
Redistribution of the data is prohibited, so readers who want to reproduce
the following will need to download their own copy of the data set and upload it to the virtual machine that runs this notebook. To do this:
- Pull down the “File” menu and select “Open”.
- An overview of the folder that contains the notebook opens.
- The folder view has a button labelled “Upload”. Use this to upload the file that you downloaded from the website.
Note that the uploaded data will disappear once you “Quit” the notebook (and the Jupyter instance).
library(foreign)  # provides read.dta() for Stata files
adult_chis <- read.dta("adult.dta", warn.missing.labels=FALSE)
The data set contains 80 sets of (raked) replicate weights. They are in the variables named “rakedw1” through “rakedw80”. Raked sampling weights are in “rakedw0”.
We obtain the column numbers of these variables, making use of our knowledge of the name pattern:
repw <- which(names(adult_chis) %in% paste0("rakedw",1:80))
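The lookup above matches column names against an explicit list. As a small illustration on a toy data frame (the data frame `toy` and its values are made up for this sketch and are not taken from the CHIS file), the same positions can also be found with a regular expression. Either way, the full-sample weight `rakedw0` stays out of the selection:

```r
# Toy stand-in for the CHIS data frame; the column names follow the pattern
# described in the text, the values are made up.
toy <- data.frame(
  age     = c(34, 51, 27),
  rakedw0 = c(110, 95, 130),
  rakedw1 = c(112, 93, 128),
  rakedw2 = c(108, 97, 133),
  rakedw3 = c(111, 96, 129)
)

# Approach used in the text: match against an explicit list of names.
repw_toy <- which(names(toy) %in% paste0("rakedw", 1:3))

# Alternative: a regular expression that does not match "rakedw0".
repw_regex <- grep("^rakedw[1-9][0-9]*$", names(toy))

repw_toy                 # column positions of the replicate weights
names(toy)[-repw_toy]    # columns that remain after dropping them
```

Negative indexing with these positions is what `adult_chis[-repw]` relies on to keep the replicate weights out of the analysis variables.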
To specify replicate weights, we call the function svrepdesign().
The first argument specifies the variables that will be used for data
analysis (here, all columns except the replicate weights). The
weights argument specifies sampling weights, while the
repweights argument specifies the replicate weights. The
data= argument specifies the data frame where the data all come from. The
combined.weights= argument is needed here, because the replicate weights
were constructed from sampling weights and “pure” replicate weights. Since we
do not know the way the replicate weights were constructed, we have to specify
type="other", together with the scaling factors scale=1 and rscales=1.
adult_chis_rd <- svrepdesign(adult_chis[-repw],
                             weights=~rakedw0,
                             repweights=adult_chis[repw],
                             data=adult_chis,
                             combined.weights=TRUE,
                             type="other",
                             scale=1, rscales=1)
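To make the role of `scale` and `rscales` more concrete, here is a minimal base-R sketch of a replicate-weight variance computation. It assumes the formula that, as far as I understand the survey package documentation, applies to designs of this kind: `scale` times the sum over replicates of `rscales` times the squared deviation of each replicate estimate from a center. All data and names below are simulated and hypothetical; this is not the CHIS construction.

```r
set.seed(1)
n <- 200   # simulated respondents
R <- 8     # simulated replicates

y  <- rbinom(n, 1, 0.3)        # hypothetical 0/1 outcome
w0 <- runif(n, 50, 150)        # hypothetical full-sample weights

# Hypothetical replicate weights: random perturbations of w0
# (real replicate weights come from the survey's own procedure).
repw_sim <- sapply(seq_len(R), function(r) w0 * runif(n, 0.9, 1.1))

theta  <- weighted.mean(y, w0)                                # full-sample estimate
thetas <- apply(repw_sim, 2, function(w) weighted.mean(y, w)) # one estimate per replicate

scale   <- 1
rscales <- rep(1, R)

# Deviations taken from the full-sample estimate ...
v_full <- scale * sum(rscales * (thetas - theta)^2)
# ... or from the mean of the replicate estimates.
v_mean <- scale * sum(rscales * (thetas - mean(thetas))^2)

c(estimate = theta, se_full = sqrt(v_full), se_mean = sqrt(v_mean))
```

With `scale=1` and `rscales=1` the variance is simply the sum of squared deviations over the replicates. Centering on the full-sample estimate can never give a smaller value than centering on the replicate mean.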
With svymean() we get the estimated proportions of the various categories of
health insurance status in California in 2005, along with standard errors.
Multiplying by 100, we get percentages. Note that in the output below only the mean column is in percent; the SE column is still on the proportion scale, so an SE of 0.0027 corresponds to 0.27 percentage points.
                                    mean     SE
instyp_pUNINSURED                16.1204 0.0027
instyp_pMEDICARE & MEDICAID       4.0544 0.0011
instyp_pMEDICARE & OTHERS         9.5286 0.0010
instyp_pMEDICARE ONLY             2.0639 0.0007
instyp_pMEDICAID                  8.5105 0.0018
instyp_pEMPLOYMENT-BASED         51.9316 0.0030
instyp_pPRIVATELY PURCHASED       6.0567 0.0017
instyp_pHEALTHY FAM/OTHER PUBLIC  1.7339 0.0011
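The call that produced this output is not shown. The following is a minimal sketch, on simulated data, of how percentages could be computed so that estimate and standard error end up on the same scale. The data frame `toy`, the variable `ins`, and the weights `rw0` through `rw10` are hypothetical stand-ins; only `svrepdesign()`, `svymean()`, `coef()`, and `SE()` come from the survey package.

```r
library(survey)

set.seed(42)
n <- 300   # simulated respondents
R <- 10    # simulated replicates

# Simulated stand-in for the CHIS file; every name below is hypothetical.
toy <- data.frame(
  ins = factor(sample(c("UNINSURED", "EMPLOYMENT-BASED", "OTHER"),
                      n, replace = TRUE, prob = c(0.2, 0.5, 0.3))),
  rw0 = runif(n, 50, 150)   # full-sample weight
)
# Fake replicate weights: small random perturbations of the full-sample weight.
for (r in seq_len(R)) toy[[paste0("rw", r)]] <- toy$rw0 * runif(n, 0.9, 1.1)

repcols <- which(names(toy) %in% paste0("rw", seq_len(R)))
toy_rd <- svrepdesign(toy[-repcols], weights = ~rw0,
                      repweights = toy[repcols], data = toy,
                      combined.weights = TRUE, type = "other",
                      scale = 1, rscales = rep(1, R))

m <- svymean(~ins, toy_rd)

# Rescale estimate and standard error together, so both are in percent.
pct <- data.frame(percent = 100 * coef(m), se = 100 * SE(m))
round(pct, 2)
```

Extracting the estimates with `coef()` and the standard errors with `SE()` before multiplying avoids a table in which one column is in percent and the other is not.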
With svytotal() we obtain estimates of how many people have which health
insurance status:
                                    total    SE
instyp_pUNINSURED                 4253792 72494
instyp_pMEDICARE & MEDICAID       1069871 28764
instyp_pMEDICARE & OTHERS         2514367 25892
instyp_pMEDICARE ONLY              544612 19018
instyp_pMEDICAID                  2245709 48474
instyp_pEMPLOYMENT-BASED         13703511 79679
instyp_pPRIVATELY PURCHASED       1598225 45184
instyp_pHEALTHY FAM/OTHER PUBLIC   457527 27854
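To give a feel for what these standard errors imply, here is a minimal sketch that turns the estimate and standard error of the UNINSURED row into an approximate 95% interval. It assumes that a normal approximation is adequate; the variable names are chosen for this illustration only.

```r
# Estimate and standard error for the UNINSURED row of the output above.
total_uninsured <- 4253792
se_uninsured    <- 72494

z  <- qnorm(0.975)   # about 1.96 for a 95% interval
ci <- c(lower = total_uninsured - z * se_uninsured,
        upper = total_uninsured + z * se_uninsured)
round(ci)

# Relative standard error (coefficient of variation), in percent.
cv_pct <- 100 * se_uninsured / total_uninsured
round(cv_pct, 2)
```

With the design object at hand, calling `confint()` on the result of svytotal() should give intervals of the same kind without copying numbers by hand.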
- R file: survey-replication-weights-CHIS.R
- Rmarkdown file: survey-replication-weights-CHIS.Rmd
- Jupyter notebook file: survey-replication-weights-CHIS.ipynb