Replication files for Elff, Martin, Jan Paul Heisig, Merlin Schaeffer, and Susumu Shikano: “Multilevel Analysis with Few Clusters: Improving Likelihood-based Methods to Provide Unbiased Estimates and Accurate Inference”
In this document, we give a brief overview of the replication files for our article and some general instructions for using them. The individual files include comments providing further details and orientation. To use the replication files, you need a working version of R (see detailed session info below), Stata Version 15.1, and a (demo) version of Mplus 8.2.
Performance of Point Estimates, “Mplus-replication” (Mplus and R)
- Figure 1 in our article and Figure B1 in the online appendix replicate some of Stegmueller’s results on the alleged bias of likelihood-based (frequentist) point estimates. The figures further show how these results change when we a) increase the number of Monte Carlo replications from 1,000 to 10,000 and b) use a different random number seed for each experimental condition. Figure 1 shows results for the case of a direct context effect (i.e., an additive linear effect of a contextual predictor). Figure B1 displays results for the case of a cross-level interaction between a contextual and a lower-level variable. The corresponding replication files can be found in the subfolders “2_macro_variables (type III IV)” and “3_random_slope (type V VI)”, respectively, both of which are located in the “Mplus-replication” folder. Each of these two folders contains two subfolders (“Linear_model” and “Probit_model”) for the linear and probit results. Each of these in turn contains three subfolders: “1000-replications”, “10000-replications”, and “diff seeds”.
Stegmueller conducted his Monte Carlo analysis in Mplus and used a separate Mplus script for each experimental condition. His original replication scripts (filename extension “.inp”) can be found in the “1000-replications” subfolders of the “Linear_model” and “Probit_model” folders. The “10000-replications” and “diff seeds” folders contain the same scripts, except for the two crucial modifications that we consider in the article. In the scripts located in the “10000-replications” folders, we increased the number of replications to 10,000 by changing the line
Nreps = 1000;
to
Nreps = 10000;
In the scripts located in the “diff seeds” folders, we changed the random number seed by modifying the line
Seed = 12345;
and replacing “12345” with a different 5-digit number such as “30377”. The alternative numbers were generated using www.random.org. The Excel file “Mplus_simresults.xlsx” in the “Mplus-replication” directory lists the random number seeds for all scripts used in the simulations underlying Figures 1 and B1.
To replicate the simulation results, you need to run each Mplus script (i.e., each “.inp” file) individually. Each script creates an output file of the same name (filename extension “.out”). You can run the scripts using a free demo version of Mplus (or a full version, if you have one). The Excel file “Mplus_simresults.xlsx” in the “Mplus-replication” folder contains the results for all experimental conditions.
- To replicate Figures 1 and B1, which visualize the simulation results listed in “Mplus_simresults.xlsx”, you need to:
2.1 Install the following R packages:
install.packages(c("dplyr", "ggplot2", "stringr", "xlsx"), dep = TRUE)
Note that, as reproducibility may also depend on the version of R and the various packages used, we included a screenshot of the R session info for our simulations with the replication materials; see the file “sessionInfo.png” in the “Mplus-replication” subfolder.
2.2 Open file “ReplicationFiles/Mplus-replication/Plot_Figures1nB1.R”. This R script creates Figure 1 of the main article and Figure B1 in the online appendix. Assign the path
YOURPATH/ReplicationFiles/Mplus-replication/
to the object loc, where “YOURPATH” refers to the appropriate directory on your computer.
2.3 Mark all and run.
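The two script modifications described in this section (more replications, a different seed) could also be applied programmatically rather than by hand. The following R sketch is only an illustration under stated assumptions: the helper name set_mplus_option and the file name in the usage comment are hypothetical and not part of the replication files, and the sketch assumes that each option appears in the form quoted above, such as "Nreps = 1000;".

```r
# Hypothetical helper (not part of the replication files): replace the numeric
# value of a "Key = value;" option in the lines of an Mplus input script.
set_mplus_option <- function(lines, key, value) {
  # Match e.g. "Nreps = 1000" case-insensitively; the trailing ";" is kept
  # because it lies outside the matched text.
  pattern <- paste0("(?i)\\b", key, "\\s*=\\s*[0-9]+")
  # format(..., scientific = FALSE) avoids output such as "1e+05".
  new_text <- paste0(key, " = ", format(value, scientific = FALSE))
  sub(pattern, new_text, lines, perl = TRUE)
}

# Example lines taken from the description above.
example_lines <- c("Nreps = 1000;", "Seed = 12345;")

more_reps <- set_mplus_option(example_lines, "Nreps", 10000)
new_seed  <- set_mplus_option(example_lines, "Seed", 30377)

# Usage on a real script (the file name is a placeholder):
# lines <- readLines("some_script.inp")
# writeLines(set_mplus_option(lines, "Nreps", 10000), "some_script.inp")
```

Editing the lines in memory and writing them back leaves the rest of each script untouched, which matters because the modified scripts are meant to differ from the originals only in these two settings.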
Performance of Variance and Interval Estimates, “R_Coverage_Replication” (R)
The folder “R_Coverage_Replication” contains replication files for Figures 2 to 4 in the main article as well as Figures B2 to B4 in the online appendix. All simulations underlying these results were implemented in R. They were entirely written by us and not by Stegmueller, although we tried to mimic the data-generating processes used in Stegmueller’s study as closely as possible. It was not possible to use Mplus for these parts of our analysis because the degrees of freedom approximations and REML-like estimation of the multilevel probit models could not be easily implemented in Mplus. Please proceed as follows to replicate the results:
- Install the following R packages:
install.packages(c("arrayhelpers", "boot", "Hmisc", "plyr", "popbio", "stringr", "reshape", "lmerTest", "memisc", "mvtnorm", "doBy", "foreign", "parallel", "car", "sandwich", "zoo", "SDMTools", "dplyr", "hglm"), dep = TRUE)
- Generate the simulation results.
2.1 Open file “ReplicationFiles/R_Coverage_Replication/RJobs/sim_01_MasterJobs.R”. This R script creates the simulation results and writes them into various subfolders of the “RSimulations” folder. Before running the script, assign the path
YOURPATH/ReplicationFiles/R_Coverage_Replication/
to the object loc, where “YOURPATH” refers to the appropriate directory on your computer.
The simulations for the different experimental conditions are executed by a loop at the bottom of the script. This loop runs from 1 to 84 and repeatedly calls the MLMaster function (which is defined in sim_01a_MasterFunction.R). Each iteration of the loop corresponds to one of the 84 experimental conditions. The reps and Ncores arguments of the MLMaster function specify the number of replications per condition (which should be set to 5,000 to replicate the results in the paper) and the number of cores to be used in parallel computation. The features of a given experimental condition (number of clusters, with or without cross-level interaction, …) are defined by various further arguments of the MLMaster function (clusts, cli, …). The seed argument specifies the random number seed used to initialize each experimental condition.
For reproducibility, the features of the various experimental conditions as well as the corresponding random number seeds are defined in a series of vectors (all of length 84) before the MLMaster function is called. For full reproducibility, it is vital that the order of these vectors is not modified. As reproducibility may also depend on the version of R and the various packages used, we included a screenshot of the R session info for our simulations with the replication materials; see the file “sessionInfo.png” in the “R_Coverage_Replication” subfolder.
Simulating all experimental conditions is quite time-consuming. It took approximately 30 hours to complete the simulations using 40 cores on a Linux server equipped with eight 8-core 2.4 GHz Intel Xeon CPUs. You may therefore want to simulate only a few conditions at a time. This can be controlled via the for loop:
for (i in 1:84) {
For instance, to simulate only the first five conditions, change this line to:
for (i in 1:5) {
To simulate conditions 10 to 12, change this line to:
for (i in 10:12) {
and so on.
2.2 Mark all and run.
- Summarize the simulation results.
3.1 Open file “ReplicationFiles/R_Coverage_Replication/RJobs/sim_04_dataprep.R”. This R script summarizes the various simulation results and combines them into one data frame. Assign the path
YOURPATH/ReplicationFiles/R_Coverage_Replication/
to the object loc, where “YOURPATH” refers to the appropriate directory on your computer.
3.2 Mark all and run.
- Visualize the simulation results.
4.1 Install the following R packages:
install.packages(c("stringr", "plyr", "dplyr", "ggplot2", "tidyr"), dep = TRUE)
4.2 Open file “ReplicationFiles/R_Coverage_Replication/RJobs/sim_05_Plot_Figures2toB4.R”. This R script creates Figures 2 to 4 of the main article and Figures B2 to B4 in the online appendix from the summarized simulation results. Assign the path
YOURPATH/ReplicationFiles/R_Coverage_Replication/
to the object loc, where “YOURPATH” refers to the appropriate directory on your computer.
4.3 Mark all and run.
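The role of the position-aligned condition vectors and of the loop range in the simulation step can be illustrated with a small, self-contained R sketch. Everything in it is an assumption made for illustration: run_condition is a hypothetical stand-in for MLMaster (whose full signature is not reproduced here), and the values in clusts, cli, and seeds are made up rather than taken from sim_01_MasterJobs.R.

```r
# Hypothetical stand-in for MLMaster(); the real function is defined in
# sim_01a_MasterFunction.R. Ncores is accepted but unused in this stub.
run_condition <- function(clusts, cli, seed, reps, Ncores) {
  set.seed(seed)  # the same seed yields the same random draws
  data.frame(clusts = clusts, cli = cli, seed = seed,
             reps = reps, draw = rnorm(1))
}

# Made-up condition vectors, all of length 84 and aligned by position:
# element i of each vector describes experimental condition i.
n_conditions <- 84
clusts <- rep(c(5, 10, 15, 20, 25, 30), length.out = n_conditions)
cli    <- rep(c(FALSE, TRUE), each = n_conditions / 2)
seeds  <- 10000 + seq_len(n_conditions)

# Run only a subset of conditions, analogous to changing the loop range
# from 1:84 to, say, 10:12.
run_subset <- function(conditions, reps = 10, Ncores = 1) {
  out <- lapply(conditions, function(i) {
    run_condition(clusts[i], cli[i], seeds[i], reps = reps, Ncores = Ncores)
  })
  do.call(rbind, out)
}

first_pass  <- run_subset(10:12)
second_pass <- run_subset(10:12)  # identical, because the seeds are identical
```

Because each condition looks up its settings and its seed by index, reordering any one vector would silently pair conditions with different seeds, which is why the order of the vectors must stay unchanged.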
Empirical Application, “EmpApplication_Replication” (Stata and R)
The folder “EmpApplication_Replication” contains nearly the same Stata and R scripts as Stegmueller’s replication files. The only difference is the additional “Plot_Figure5.R” script. To replicate the results of the empirical application, you need to:
- Download Steenbergen and Jones’s Eurobarometer data from: http://dx.doi.org/10.4232/1.10923. The data set is called “ZA2898_v1-0-1.dta” and should be saved directly into the “EmpApplication_Replication” folder. Note that the data set has been revised since the publication of Stegmueller’s study and that the previous versions no longer seem to be available. However, the only variable affected seems to be the alphanumeric country id variable. We are therefore confident that the publicly available data allow us to replicate the original results. We cannot verify this numerically because Stegmueller provided these results only in graphical form, but a comparison of our graph with the corresponding one in Stegmueller’s study indicates that the results are virtually identical.
- Open Stata and install the “dropmiss” and “parmest” ado-files:
findit dropmiss
and
findit parmest
- Change the working directory by typing the following into the command field:
cd "YOURPATH/ReplicationFiles/EmpApplication_Replication/"
- Run two do-files to replicate the results of the frequentist (likelihood-based) models by typing the following into the command field:
run "genData.do"
do "Model1.do"
- Install JAGS version 4.3.0 on your computer: http://mcmc-jags.sourceforge.net/
- Open R and install the following R packages:
install.packages(c("foreign", "lme4", "rjags", "ggplot2", "stringr", "tidyverse"), dep = TRUE)
Note that, as reproducibility may also depend on the version of R and the various packages used, we included a screenshot of the R session info for our simulations with the replication materials; see the file “sessionInfo.png” in the “EmpApplication_Replication” subfolder.
- Open file “ReplicationFiles/EmpApplication_Replication/Model2.R”. Write the path
YOURPATH/ReplicationFiles/EmpApplication_Replication/data.dta
into the object loc.
- Mark all and run; go get a coffee, this takes a while …
- Open file “YOURPATH/ReplicationFiles/EmpApplication_Replication/Plot_Figure5.R”. Write the path
YOURPATH/ReplicationFiles/EmpApplication_Replication/
into the object loc.
- Mark all and run.
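The package-installation steps in this document reinstall every listed package each time they are run. A possible alternative, sketched below in R, installs only what is missing. The helper missing_packages is hypothetical and not part of the replication files; the package list is the one given for the empirical application.

```r
# Hypothetical helper: return the packages in `required` that are not yet
# installed. `installed` defaults to the packages found in the current
# library paths and can be overridden, e.g. for testing.
missing_packages <- function(required,
                             installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

required <- c("foreign", "lme4", "rjags", "ggplot2", "stringr", "tidyverse")
to_install <- missing_packages(required)

# Install only in an interactive session, so that sourcing this snippet
# non-interactively has no side effects.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, dependencies = TRUE)
}
```

Calling sessionInfo() afterwards prints the R version and the versions of the loaded packages, which can be compared with the “sessionInfo.png” screenshots included in the replication materials.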