Groupwise computations within data frames

This example uses data from the 2010 British Election Study. The data set bes2010feelings-pre-long.RData was prepared from the original, available at https://www.britishelectionstudy.com/data-object/2010-bes-cross-section/, by removing identifying information and scrambling the data.

load("bes2010feelings-pre-long.RData")
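
Note that the file name (bes2010feelings-pre-long.RData) and the name of the data frame used in the code (bes2010flngs_pre_long) differ, because load() restores objects under the names they had when they were saved. The following is a minimal sketch of that behaviour; the object `toy` and the temporary file are invented for illustration and are not part of the BES data.

```r
# Hypothetical object saved to a temporary .RData file
toy <- data.frame(id = 1:3, score = c(3, 6, 5))
tmp <- tempfile(fileext = ".RData")
save(toy, file = tmp)
rm(toy)

# load() restores objects under their saved names and
# invisibly returns those names as a character vector
restored <- load(tmp)
restored
str(toy)
```

Capturing the return value of load() in this way is a convenient check of which objects a file actually contains.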

Groupwise computations using split(), lapply(), and unsplit():

bes2010flngs_pre_long.splt <- split(bes2010flngs_pre_long,
                                    bes2010flngs_pre_long$id)

str(bes2010flngs_pre_long.splt[[1]])
'data.frame':	8 obs. of  5 variables:
 $ region      : Factor w/ 3 levels "England","Scotland",..: 1 1 1 1 1 1 1 1
 $ party       : Factor w/ 8 levels "Conservative",..: 1 2 3 4 5 6 7 8
 $ flng.leaders: num  3 6 3 NA 5 NA NA NA
 $ flng.parties: num  6 5 4 NA NA 7 3 0
 $ id          : int  1 1 1 1 1 1 1 1
 - attr(*, "reshapeLong")=List of 4
  ..$ varying:List of 2
  .. ..$ flng.leaders: chr [1:8] "flng.cameron" "flng.brown" "flng.clegg" "flng.salmond" ...
  .. ..$ flng.parties: chr [1:8] "flng.cons" "flng.labour" "flng.libdem" "flng.snp" ...
  ..$ v.names: chr [1:2] "flng.leaders" "flng.parties"
  ..$ idvar  : chr "id"
  ..$ timevar: chr "party"
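
# Mean() is a variant of mean() that ignores missing values
Mean <- function(x,...) mean(x,...,na.rm=TRUE)
# Within each group defined by id, centre the feelings scores on the group mean
bes2010flngs_pre_long.splt <- lapply(
    bes2010flngs_pre_long.splt,
    within,expr={
        rel.flng.parties <- flng.parties - Mean(flng.parties)
        rel.flng.leaders <- flng.leaders - Mean(flng.leaders)
    })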

str(bes2010flngs_pre_long.splt[[1]])
'data.frame':	8 obs. of  7 variables:
 $ region          : Factor w/ 3 levels "England","Scotland",..: 1 1 1 1 1 1 1 1
 $ party           : Factor w/ 8 levels "Conservative",..: 1 2 3 4 5 6 7 8
 $ flng.leaders    : num  3 6 3 NA 5 NA NA NA
 $ flng.parties    : num  6 5 4 NA NA 7 3 0
 $ id              : int  1 1 1 1 1 1 1 1
 $ rel.flng.leaders: num  -1.25 1.75 -1.25 NA 0.75 NA NA NA
 $ rel.flng.parties: num  1.833 0.833 -0.167 NA NA ...
 - attr(*, "reshapeLong")=List of 4
  ..$ varying:List of 2
  .. ..$ flng.leaders: chr [1:8] "flng.cameron" "flng.brown" "flng.clegg" "flng.salmond" ...
  .. ..$ flng.parties: chr [1:8] "flng.cons" "flng.labour" "flng.libdem" "flng.snp" ...
  ..$ v.names: chr [1:2] "flng.leaders" "flng.parties"
  ..$ idvar  : chr "id"
  ..$ timevar: chr "party"

# Recombine the pieces into a single data frame in the original row order
bes2010flngs_pre_long <- unsplit(bes2010flngs_pre_long.splt,
                                 bes2010flngs_pre_long$id)
str(bes2010flngs_pre_long)
'data.frame':	15480 obs. of  7 variables:
 $ region          : Factor w/ 3 levels "England","Scotland",..: 1 1 1 1 1 1 1 1 NA NA ...
 $ party           : Factor w/ 8 levels "Conservative",..: 1 2 3 4 5 6 7 8 1 2 ...
 $ flng.leaders    : num  3 6 3 NA 5 NA NA NA 7 3 ...
 $ flng.parties    : num  6 5 4 NA NA 7 3 0 6 1 ...
 $ id              : int  1 1 1 1 1 1 1 1 2 2 ...
 $ rel.flng.leaders: num  -1.25 1.75 -1.25 NA 0.75 NA NA NA 2.5 -1.5 ...
 $ rel.flng.parties: num  1.833 0.833 -0.167 NA NA ...
 - attr(*, "reshapeLong")=List of 4
  ..$ varying:List of 2
  .. ..$ flng.leaders: chr [1:8] "flng.cameron" "flng.brown" "flng.clegg" "flng.salmond" ...
  .. ..$ flng.parties: chr [1:8] "flng.cons" "flng.labour" "flng.libdem" "flng.snp" ...
  ..$ v.names: chr [1:2] "flng.leaders" "flng.parties"
  ..$ idvar  : chr "id"
  ..$ timevar: chr "party"
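
The split/lapply/unsplit steps above can also be tried without the BES file. The following is a minimal sketch on made-up data: the data frame `toy` and its columns `id` and `score` are invented for illustration, and Mean() is redefined as in the code above.

```r
# Hypothetical toy data: two groups with three ratings each
toy <- data.frame(
    id    = c(1, 1, 1, 2, 2, 2),
    score = c(3, 6, NA, 7, 3, 5)
)

# Mean that ignores missing values, as defined earlier
Mean <- function(x, ...) mean(x, ..., na.rm = TRUE)

# 1. Split into a list with one data frame per id
toy.splt <- split(toy, toy$id)

# 2. Centre the scores within each piece
toy.splt <- lapply(toy.splt, within, expr = {
    rel.score <- score - Mean(score)
})

# 3. Re-assemble the pieces in the original row order
toy <- unsplit(toy.splt, toy$id)
toy$rel.score
```

The mean of the non-missing scores is 4.5 in the first group and 5 in the second, so the centred scores are -1.5, 1.5, NA for id 1 and 2, -2, 0 for id 2. The missing value stays missing because only the group mean, not the subtraction, ignores NAs.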

Groupwise computations using withinGroups() from the memisc package. To run this on your own computer, you may first need to install the package from CRAN with install.packages("memisc"). (The package is already installed in the notebook container.)

library(memisc)
Loading required package: lattice
Loading required package: MASS

Attaching package: 'memisc'

The following object is masked _by_ '.GlobalEnv':

    Mean

The following objects are masked from 'package:stats':

    contr.sum, contr.treatment, contrasts

The following object is masked from 'package:base':

    as.array

bes2010flngs_pre_long <- withinGroups(bes2010flngs_pre_long,
                                      ~id,{
     rel.flng.parties <- flng.parties - Mean(flng.parties)
     rel.flng.leaders <- flng.leaders - Mean(flng.leaders)
    })

We use head() to look at the first 14 rows of the re-combined data frame, leaving out its first two columns (region and party):

head(bes2010flngs_pre_long[-(1:2)],n=14)
               flng.leaders flng.parties id rel.flng.leaders rel.flng.parties
1.Conservative            3            6  1            -1.25        1.8333333
1.Labour                  6            5  1             1.75        0.8333333
1.LibDem                  3            4  1            -1.25       -0.1666667
1.SNP                    NA           NA  1               NA               NA
1.Plaid Cymru             5           NA  1             0.75               NA
1.Green                  NA            7  1               NA        2.8333333
1.UKIP                   NA            3  1               NA       -1.1666667
1.BNP                    NA            0  1               NA       -4.1666667
2.Conservative            7            6  2             2.50        2.6666667
2.Labour                  3            1  2            -1.50       -2.3333333
2.LibDem                  5            7  2             0.50        3.6666667
2.SNP                    NA           NA  2               NA               NA
2.Plaid Cymru             3           NA  2            -1.50               NA
2.Green                  NA            6  2               NA        2.6666667
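
As a cross-check on the group-mean centring shown above, the same kind of computation can be written with ave() from base R. This is a different technique from the two used in this notebook, offered only as a sketch; the data frame `toy` and its columns are invented for illustration.

```r
# Hypothetical toy data: two groups with three ratings each
toy <- data.frame(
    id    = c(1, 1, 1, 2, 2, 2),
    score = c(3, 6, NA, 7, 3, 5)
)

# ave() returns, for every row, the summary of that row's group;
# FUN must be passed by name
grp.mean <- ave(toy$score, toy$id,
                FUN = function(x) mean(x, na.rm = TRUE))

# Subtract the group mean from each score
toy$rel.score <- toy$score - grp.mean
toy
```

Because ave() returns a vector as long as its input, no explicit splitting or re-combining is needed. That makes it handy for a single derived variable, whereas within() or withinGroups() is more convenient when several variables are computed at once.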

Downloadable R script and interactive version

Explanation

The link with the “jupyterhub” icon directs you to an interactive Jupyter [1] notebook, which runs inside a Docker container [2]. There are two variants of the interactive notebook. One shuts down after 60 seconds and does not require a sign-in. The other requires you to sign in with your ORCID [3] credentials, but shuts down only after 24 hours. (There is no guarantee that such a container persists that long; it may be shut down earlier for maintenance.) After shutdown, all data within the container are reset, i.e. all files created by the user are deleted. [4]

Above you see a rendered version of the Jupyter notebook. [5]

[1] For more information about Jupyter see http://jupyter.org. The Jupyter notebooks make use of the RKernel package.

[2] For more information about Docker see https://docs.docker.com/. The container images are run with docker spawner.

[3] ORCID is a free service for the authentication of researchers. It also allows researchers to showcase publications and contributions to the academic community, such as peer review. See https://info.orcid.org/what-is-orcid/ for more information.

[4] The Jupyter notebooks come with NO WARRANTY whatsoever. They are provided for educational and illustrative purposes only. Do not use them for production work.

[5] The notebook is rendered with the help of the nbsphinx extension.