Reshaping data frames: An example with data from the British Election Study

First we load an R data file that contains data from the 2010 British Election Study. The data set bes2010feelings-prepost.RData was prepared from the original, available at https://www.britishelectionstudy.com/data-object/2010-bes-cross-section/, by removing identifying information and scrambling the data.


load("bes2010feelings-prepost.RData")

names(bes2010flngs_pre)
 [1] "flng.brown"   "flng.cameron" "flng.clegg"   "flng.salmond" "flng.jones"
 [6] "flng.labour"  "flng.cons"    "flng.libdem"  "flng.snp"     "flng.pcym"
[11] "flng.green"   "flng.ukip"    "flng.bnp"     "region"

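A sensible way to bring these data into long format is to treat the feelings towards the parties and the feelings towards their leaders as two variables, each measured repeatedly, once per party. The questionnaire has no leader items for the Greens, UKIP, and the BNP, so we add a placeholder variable na that contains only NAs and use it in place of these missing leader items: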


bes2010flngs_pre_long <- reshape(
              within(bes2010flngs_pre,
                     na <- NA),
              varying=list(
                  # Parties
                  c("flng.cons","flng.labour","flng.libdem",
                    "flng.snp","flng.pcym",
                    "flng.green","flng.ukip","flng.bnp"),
                  # Party leaders
                  c("flng.cameron","flng.brown","flng.clegg",
                    "flng.salmond","flng.jones",
                    "na","na","na")
              ),
              v.names=c("flng.parties",
                        "flng.leaders"),
              times=c("Conservative","Labour","LibDem",
                      "SNP","Plaid Cymru",
                      "Green","UKIP","BNP"),
              timevar="party",
              direction="long")
head(bes2010flngs_pre_long,n=14)
                region  party        flng.parties flng.leaders id
1.Conservative  England Conservative 6            3             1
2.Conservative  NA      Conservative 6            7             2
3.Conservative  England Conservative 4            7             3
4.Conservative  England Conservative 6            4             4
5.Conservative  NA      Conservative 4            5             5
6.Conservative  England Conservative 1            0             6
7.Conservative  England Conservative 3            3             7
8.Conservative  England Conservative 3            6             8
9.Conservative  England Conservative 3            2             9
10.Conservative England Conservative 3            2            10
11.Conservative NA      Conservative 6            4            11
12.Conservative England Conservative 3            2            12
13.Conservative England Conservative 0            4            13
14.Conservative England Conservative 5            5            14
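To see what the placeholder column does, here is a minimal sketch on a small made-up data frame. The variable names (flng.A, flng.leadA, and so on) and values are hypothetical, chosen only to mimic the BES naming. Party C has no leader item, so its slot in the second element of varying is filled with the all-NA column na, just as for the Greens, UKIP, and the BNP above:

```r
# Hypothetical toy data: two respondents, three parties, but leader
# feelings only for parties A and B (names and values are made up).
wide <- data.frame(
  region     = c("England", "Scotland"),  # variable that does not vary
  flng.A     = c(6, 4),                   # feeling towards party A
  flng.B     = c(5, 7),                   # feeling towards party B
  flng.C     = c(2, 3),                   # feeling towards party C
  flng.leadA = c(3, 8),                   # feeling towards leader of A
  flng.leadB = c(6, 1)                    # feeling towards leader of B
)

long <- reshape(
  within(wide, na <- NA),   # add an all-NA placeholder column
  varying   = list(
    c("flng.A", "flng.B", "flng.C"),      # parties, one per occasion
    c("flng.leadA", "flng.leadB", "na")   # leaders; C has no leader item
  ),
  v.names   = c("flng.parties", "flng.leaders"),
  times     = c("A", "B", "C"),
  timevar   = "party",
  direction = "long"
)
long
```

Each element of varying lists one variable per measurement occasion, in the same order as times. The result stacks the occasions, so all rows for party A come first, then those for party B, and so on.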

The following demonstrates the convenience variant of reshape() provided by the memisc package, the function Reshape(). To run this on your own computer, you may need to install the package from CRAN using install.packages("memisc"). (The package is already installed in the notebook container, however.)


library(memisc)
Loading required package: lattice

Loading required package: MASS


Attaching package: ‘memisc’


The following objects are masked from ‘package:stats’:

    contr.sum, contr.treatment, contrasts


The following object is masked from ‘package:base’:

    as.array


With the Reshape() function the syntax is simpler than with reshape() from the stats package:


bes2010flngs_pre_long <- Reshape(bes2010flngs_pre,
       # Note that "empty" places designate measurement
       # occasions that are to be filled with NAs.
       # In the present case these are feelings about
       # party leaders that were not asked about in the
       # BES 2010 questionnaires.
       flng.leaders=c(flng.cameron,flng.brown,
                      flng.clegg,flng.salmond,
                      flng.jones,,,),
       flng.parties=c(flng.cons,flng.labour,
                      flng.libdem,flng.snp,
                      flng.pcym,flng.green,
                      flng.ukip,flng.bnp),
       party=c("Conservative","Labour","LibDem",
               "SNP","Plaid Cymru",
               "Green","UKIP","BNP"),
       direction="long")

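In the long format created by Reshape(), the observations are sorted such that the variable that distinguishes measurement occasions (the party variable) changes faster than the variable that distinguishes individuals (id). This differs from the result of reshape() above, where all observations for the first party come first: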


head(bes2010flngs_pre_long)
               region  party        flng.leaders flng.parties id
1.Conservative England Conservative  3            6           1
1.Labour       England Labour        6            5           1
1.LibDem       England LibDem        3            4           1
1.SNP          England SNP          NA           NA           1
1.Plaid Cymru  England Plaid Cymru   5           NA           1
1.Green        England Green        NA            7           1
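If you use reshape() from the stats package but prefer the ordering shown here, you can sort its result by id and then by party. Below is a minimal sketch on a made-up data frame (hypothetical values). Converting party into a factor whose levels follow the order of first appearance keeps the parties in questionnaire order instead of sorting them alphabetically:

```r
# Hypothetical long data in the order produced by stats::reshape():
# all rows for the first measurement occasion, then the second, ...
long <- data.frame(
  party        = rep(c("Conservative", "Labour", "Green"), each = 2),
  flng.parties = c(6, 4, 5, 7, 2, 3),
  id           = rep(1:2, times = 3)
)

# Factor levels in order of first appearance, so that "Green" is not
# sorted before "Labour" alphabetically
long$party <- factor(long$party, levels = unique(long$party))

# Sort by respondent first, then by occasion, so that the occasion
# variable changes faster than the respondent identifier
long_by_id <- long[order(long$id, long$party), ]
long_by_id
```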

As with reshape(), reshaping back from long into wide format uses (almost) the same syntax as reshaping from wide into long format; only the data frame and the direction argument change:


bes2010flngs_pre_wide <- Reshape(bes2010flngs_pre_long,
       # Note that "empty" places designate measurement
       # occasions that are to be filled with NAs.
       # In the present case these are feelings about
       # party leaders that were not asked about in the
       # BES 2010 questionnaires.
       flng.leaders=c(flng.cameron,flng.brown,
                      flng.clegg,flng.salmond,
                      flng.jones,,,),
       flng.parties=c(flng.cons,flng.labour,
                      flng.libdem,flng.snp,
                      flng.pcym,flng.green,
                      flng.ukip,flng.bnp),
       party=c("Conservative","Labour","LibDem",
               "SNP","Plaid Cymru",
               "Green","UKIP","BNP"),
       direction="wide")

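After reshaping into wide format, the variables are grouped by measurement occasion: the feelings towards each party leader and towards the corresponding party appear next to each other (flng.cameron and flng.cons, flng.brown and flng.labour, and so on), so the column order differs from that of the original data frame. The three leader slots that were left empty produce no columns: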


head(bes2010flngs_pre_wide)
               region  id flng.cameron flng.cons flng.brown flng.labour
1.Conservative England 1  3            6         6          5
2.Conservative NA      2  7            6         3          1
3.Conservative England 3  7            4         8          3
4.Conservative England 4  4            6         4          6
5.Conservative NA      5  5            4         5          8
6.Conservative England 6  0            1         5          5
               flng.clegg flng.libdem flng.salmond flng.snp flng.jones
1.Conservative 3          4           NA           NA        5
2.Conservative 5          7           NA           NA        3
3.Conservative 4          5           NA           NA       10
4.Conservative 3          5           NA           NA        7
5.Conservative 5          5           NA           NA        5
6.Conservative 4          4           NA           NA        1
               flng.pcym flng.green flng.ukip flng.bnp
1.Conservative NA        7           3        0
2.Conservative NA        6           0        0
3.Conservative NA        5           0        0
4.Conservative NA        5           3        2
5.Conservative NA        4          NA        2
6.Conservative NA        4           0        0
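For comparison, the same long-to-wide step can be sketched with reshape() from the stats package. The data frame below is made up for illustration. The arguments idvar, timevar, and v.names tell reshape() which column identifies respondents, which one identifies measurement occasions, and which one holds the repeated measure. The names of the new columns are built by pasting v.names and the values of party together, separated by sep:

```r
# Hypothetical long data: one row per respondent (id) and party
long <- data.frame(
  id           = rep(1:2, each = 3),
  party        = rep(c("Conservative", "Labour", "Green"), times = 2),
  flng.parties = c(6, 5, 2, 4, 7, 3)
)

# One row per id, one column per party; the new column names are
# v.names and the party value joined by sep
wide <- reshape(long,
                idvar     = "id",
                timevar   = "party",
                v.names   = "flng.parties",
                direction = "wide",
                sep       = "_")
wide
```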

save(bes2010flngs_pre_long,file="bes2010flngs-pre-long.RData")

Downloadable R script and interactive version

Explanation

The link with the “jupyterhub” icon directs you to an interactive Jupyter[1] notebook, which runs inside a Docker container[2]. There are two variants of the interactive notebook. One shuts down after 60 seconds and does not require signing in. The other requires signing in with your ORCID[3] credentials, but shuts down only after 24 hours. (There is no guarantee that such a container persists that long; it may be shut down earlier for maintenance purposes.) After shutdown, all data within the container will be reset, i.e. all files created by the user will be deleted.[4]

Above you see a rendered version of the Jupyter notebook.[5]

[1] For more information about Jupyter see http://jupyter.org. The Jupyter notebooks make use of the IRKernel package.

[2] For more information about Docker see https://docs.docker.com/. The container images were created with repo2docker, while containers are run with docker spawner.

[3] ORCID is a free service for the authentication of researchers. It also allows researchers to showcase publications and contributions to the academic community, such as peer review. See https://info.orcid.org/what-is-orcid/ for more information.

[4] The Jupyter notebooks come with NO WARRANTY whatsoever. They are provided for educational and illustrative purposes only. Do not use them for production work.

[5] The notebook is rendered with the help of the nbsphinx extension.