Reshaping data frames: An example with data from the British Election Study
First we load an R data file that contains data from the 2010 British Election Study (BES). The data set bes2010feelings-prepost.RData was prepared from the original, available at https://www.britishelectionstudy.com/data-object/2010-bes-cross-section/, by removing identifying information and scrambling the data.
load("bes2010feelings-prepost.RData")
names(bes2010flngs_pre)
[1] "flng.brown" "flng.cameron" "flng.clegg" "flng.salmond" "flng.jones"
[6] "flng.labour" "flng.cons" "flng.libdem" "flng.snp" "flng.pcym"
[11] "flng.green" "flng.ukip" "flng.bnp" "region"
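A sensible way to bring these data into long format is to treat the feelings towards the parties and the feelings towards their leaders as two variables, each measured once per party. The data contain eight party ratings but only five leader ratings, so the call below adds a placeholder variable na that contains only NAs and uses it in the three positions (Green, UKIP, BNP) where a leader rating is missing: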
bes2010flngs_pre_long <- reshape(
    # Add a placeholder variable 'na' that contains only NAs
    within(bes2010flngs_pre,
           na <- NA),
    varying=list(
        # Parties
        c("flng.cons","flng.labour","flng.libdem",
          "flng.snp","flng.pcym",
          "flng.green","flng.ukip","flng.bnp"),
        # Party leaders ("na" fills the positions without a leader rating)
        c("flng.cameron","flng.brown","flng.clegg",
          "flng.salmond","flng.jones",
          "na","na","na")
    ),
    # Names of the two resulting long-format variables
    v.names=c("flng.parties",
              "flng.leaders"),
    # One label per measurement occasion, in the order used in 'varying'
    times=c("Conservative","Labour","LibDem",
            "SNP","Plaid Cymru",
            "Green","UKIP","BNP"),
    timevar="party",
    direction="long")
head(bes2010flngs_pre_long,n=14)
region party flng.parties flng.leaders id
1.Conservative England Conservative 6 3 1
2.Conservative <NA> Conservative 6 7 2
3.Conservative England Conservative 4 7 3
4.Conservative England Conservative 6 4 4
5.Conservative <NA> Conservative 4 5 5
6.Conservative England Conservative 1 0 6
7.Conservative England Conservative 3 3 7
8.Conservative England Conservative 3 6 8
9.Conservative England Conservative 3 2 9
10.Conservative England Conservative 3 2 10
11.Conservative <NA> Conservative 6 4 11
12.Conservative England Conservative 3 2 12
13.Conservative England Conservative 0 4 13
14.Conservative England Conservative 5 5 14
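The placeholder technique used above can be tried without the BES file. The following is a minimal sketch in base R on a small invented data frame; the column names (party.a, leader.a, and so on), the labels "A", "B", "C" and all values are made up for illustration and are not taken from the BES data.

```r
# Toy wide data (all names and values are hypothetical): three party
# ratings per respondent, but only two leader ratings.
wide <- data.frame(
  region   = c("North", "South", "North"),
  party.a  = c(6, 4, 1),
  party.b  = c(5, 3, 5),
  party.c  = c(7, 5, 4),
  leader.a = c(3, 7, 0),
  leader.b = c(6, 8, 5)
)

long <- reshape(
  # Placeholder column 'na' stands in for the leader rating that does not exist
  within(wide, na <- NA),
  varying = list(
    c("party.a", "party.b", "party.c"),   # party ratings
    c("leader.a", "leader.b", "na")       # leader ratings, padded with 'na'
  ),
  v.names   = c("rating.party", "rating.leader"),
  times     = c("A", "B", "C"),
  timevar   = "party",
  direction = "long"
)

# Rows for party "C" have a party rating but a missing leader rating
long[long$party == "C", ]
```

Both vectors in varying must have the same length as times, which is why the shorter set of leader variables has to be padded.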
The following demonstrates Reshape(), a convenience variant of reshape() provided by the memisc package. You may need to install this package from CRAN using install.packages("memisc") if you want to run this on your computer. (The package is already installed in the notebook container, however.)
library(memisc)
Loading required package: lattice
Loading required package: MASS
Attaching package: 'memisc'
The following objects are masked from 'package:stats':
contr.sum, contr.treatment, contrasts
The following object is masked from 'package:base':
as.array
With the Reshape() function the syntax is simpler than with reshape() from the stats package:
bes2010flngs_pre_long <- Reshape(bes2010flngs_pre,
                                 # Note that "empty" places designate measurement
                                 # occasions that are to be filled with NAs.
                                 # In the present case these are feelings
                                 # about party leaders that were not
                                 # asked about in the BES 2010 questionnaires.
                                 flng.leaders=c(flng.cameron,flng.brown,
                                                flng.clegg,flng.salmond,
                                                flng.jones,,,),
                                 flng.parties=c(flng.cons,flng.labour,
                                                flng.libdem,flng.snp,
                                                flng.pcym,flng.green,
                                                flng.ukip,flng.bnp),
                                 party=c("Conservative","Labour","LibDem",
                                         "SNP","Plaid Cymru",
                                         "Green","UKIP","BNP"),
                                 direction="long")
In long format the observations are sorted such that the variable that distinguishes measurement occasions (the party variable) changes faster than the variable that distinguishes individuals:
head(bes2010flngs_pre_long)
region party flng.leaders flng.parties id
1.Conservative England Conservative 3 6 1
1.Labour England Labour 6 5 1
1.LibDem England LibDem 3 4 1
1.SNP England SNP NA NA 1
1.Plaid Cymru England Plaid Cymru 5 NA 1
1.Green England Green NA 7 1
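For comparison, the head() output of the earlier reshape() call shows that base R stacks the data one measurement occasion at a time (all respondents for "Conservative" come first). The sketch below, on an invented toy data frame with hypothetical column names, shows one way to bring base reshape() output into the respondent-by-respondent order seen above; it is an illustration, not a description of what Reshape() does internally.

```r
# Toy wide data (names and values are hypothetical)
wide <- data.frame(
  region   = c("North", "South"),
  score.t1 = c(1, 2),
  score.t2 = c(3, 4),
  score.t3 = c(5, 6)
)

occasions <- c("t1", "t2", "t3")

long <- reshape(
  wide,
  varying   = list(c("score.t1", "score.t2", "score.t3")),
  v.names   = "score",
  times     = occasions,
  timevar   = "occasion",
  direction = "long"
)

# Base reshape() output: the id changes fastest, the occasion slowest
long$id

# Turn the occasion into a factor so that sorting keeps the intended
# order of occasions rather than alphabetical order
long$occasion <- factor(long$occasion, levels = occasions)

# Sort by respondent first, then by occasion within respondent
long_sorted <- long[order(long$id, long$occasion), ]
long_sorted
```

Converting the occasion variable to a factor matters for labels such as party names, where alphabetical order would differ from the order given in times.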
As with reshape(), reshaping back from long into wide format takes (almost) the same syntax as reshaping from wide into long format:
bes2010flngs_pre_wide <- Reshape(bes2010flngs_pre_long,
                                 # Note that "empty" places designate measurement
                                 # occasions that are to be filled with NAs.
                                 # In the present case these are feelings
                                 # about party leaders that were not
                                 # asked about in the BES 2010 questionnaires.
                                 flng.leaders=c(flng.cameron,flng.brown,
                                                flng.clegg,flng.salmond,
                                                flng.jones,,,),
                                 flng.parties=c(flng.cons,flng.labour,
                                                flng.libdem,flng.snp,
                                                flng.pcym,flng.green,
                                                flng.ukip,flng.bnp),
                                 party=c("Conservative","Labour","LibDem",
                                         "SNP","Plaid Cymru",
                                         "Green","UKIP","BNP"),
                                 direction="wide")
After reshaping into wide format, the variables that correspond to multiple measures of the same variable are grouped together:
head(bes2010flngs_pre_wide)
region id flng.cameron flng.cons flng.brown flng.labour
1.Conservative England 1 3 6 6 5
2.Conservative <NA> 2 7 6 3 1
3.Conservative England 3 7 4 8 3
4.Conservative England 4 4 6 4 6
5.Conservative <NA> 5 5 4 5 8
6.Conservative England 6 0 1 5 5
flng.clegg flng.libdem flng.salmond flng.snp flng.jones
1.Conservative 3 4 NA NA 5
2.Conservative 5 7 NA NA 3
3.Conservative 4 5 NA NA 10
4.Conservative 3 5 NA NA 7
5.Conservative 5 5 NA NA 5
6.Conservative 4 4 NA NA 1
flng.pcym flng.green flng.ukip flng.bnp
1.Conservative NA 7 3 0
2.Conservative NA 6 0 0
3.Conservative NA 5 0 0
4.Conservative NA 5 3 2
5.Conservative NA 4 NA 2
6.Conservative NA 4 0 0
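The same long-to-wide step can be sketched with base reshape() for readers who do not have memisc installed. The data frame below is an invented toy example with hypothetical column names; in the wide direction, base reshape() needs to know which variable identifies the units (idvar), which one distinguishes the occasions (timevar), and which variables vary across occasions (v.names).

```r
# Toy long data (names and values are hypothetical):
# two respondents, three measurement occasions each
long <- data.frame(
  id       = rep(1:2, each = 3),
  occasion = rep(c("t1", "t2", "t3"), times = 2),
  score    = c(1, 3, 5, 2, 4, 6)
)

# Long to wide: one row per id, one 'score' column per occasion
wide <- reshape(
  long,
  idvar     = "id",
  timevar   = "occasion",
  v.names   = "score",
  direction = "wide"
)
wide

# A data frame produced by reshape() carries the information needed
# to undo the operation, so the way back needs no further arguments
back <- reshape(wide, direction = "long")
back
```

The new column names are built from the v.names entry and the occasion labels, joined by the default separator ".".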
save(bes2010flngs_pre_long,file="bes2010flngs-pre-long.RData")
- R file: reshaping-BES.R
- Rmarkdown file: reshaping-BES.Rmd
- Jupyter notebook file: reshaping-BES.ipynb