Groupwise computations within data frames
Here we use data from the British Election Study 2010. The data set bes2010feelings-pre-long.RData is prepared from the original available at https://www.britishelectionstudy.com/data-object/2010-bes-cross-section/ by removing identifying information and scrambling the data.
load("bes2010feelings-pre-long.RData")
Groupwise computations using split():
bes2010flngs_pre_long.splt <- split(bes2010flngs_pre_long,
                                    bes2010flngs_pre_long$id)
str(bes2010flngs_pre_long.splt[[1]])
'data.frame': 8 obs. of 5 variables:
$ region : Factor w/ 3 levels "England","Scotland",..: 1 1 1 1 1 1 1 1
$ party : Factor w/ 8 levels "Conservative",..: 1 2 3 4 5 6 7 8
$ flng.leaders: num 3 6 3 NA 5 NA NA NA
$ flng.parties: num 6 5 4 NA NA 7 3 0
$ id : int 1 1 1 1 1 1 1 1
- attr(*, "reshapeLong")=List of 4
..$ varying:List of 2
.. ..$ flng.leaders: chr [1:8] "flng.cameron" "flng.brown" "flng.clegg" "flng.salmond" ...
.. ..$ flng.parties: chr [1:8] "flng.cons" "flng.labour" "flng.libdem" "flng.snp" ...
..$ v.names: chr [1:2] "flng.leaders" "flng.parties"
..$ idvar : chr "id"
..$ timevar: chr "party"
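# Mean() averages while ignoring missing values
Mean <- function(x,...) mean(x,...,na.rm=TRUE)
# For each respondent, center the feeling scores on that respondent's own mean
bes2010flngs_pre_long.splt <- lapply(
    bes2010flngs_pre_long.splt,
    within,expr={
        rel.flng.parties <- flng.parties - Mean(flng.parties)
        rel.flng.leaders <- flng.leaders - Mean(flng.leaders)
    })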
str(bes2010flngs_pre_long.splt[[1]])
'data.frame': 8 obs. of 7 variables:
$ region : Factor w/ 3 levels "England","Scotland",..: 1 1 1 1 1 1 1 1
$ party : Factor w/ 8 levels "Conservative",..: 1 2 3 4 5 6 7 8
$ flng.leaders : num 3 6 3 NA 5 NA NA NA
$ flng.parties : num 6 5 4 NA NA 7 3 0
$ id : int 1 1 1 1 1 1 1 1
$ rel.flng.leaders: num -1.25 1.75 -1.25 NA 0.75 NA NA NA
$ rel.flng.parties: num 1.833 0.833 -0.167 NA NA ...
- attr(*, "reshapeLong")=List of 4
..$ varying:List of 2
.. ..$ flng.leaders: chr [1:8] "flng.cameron" "flng.brown" "flng.clegg" "flng.salmond" ...
.. ..$ flng.parties: chr [1:8] "flng.cons" "flng.labour" "flng.libdem" "flng.snp" ...
..$ v.names: chr [1:2] "flng.leaders" "flng.parties"
..$ idvar : chr "id"
..$ timevar: chr "party"
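The centered values for respondent 1 can be checked by hand. The sketch below retypes the two score vectors shown in the str() output and repeats the centering outside the data frame; the vectors are copied here for illustration and are not read from the data file.

```r
# Same helper as in the text: a mean that skips missing values
Mean <- function(x, ...) mean(x, ..., na.rm = TRUE)

# Scores of respondent 1, retyped from the str() output (illustration only)
flng.leaders <- c(3, 6, 3, NA, 5, NA, NA, NA)
flng.parties <- c(6, 5, 4, NA, NA, 7, 3, 0)

# Respondent-specific means over the non-missing scores
Mean(flng.leaders)   # (3 + 6 + 3 + 5) / 4 = 4.25
Mean(flng.parties)   # (6 + 5 + 4 + 7 + 3 + 0) / 6 = 25/6

# Centering leaves missing scores missing
rel.flng.leaders <- flng.leaders - Mean(flng.leaders)
rel.flng.parties <- flng.parties - Mean(flng.parties)
print(rel.flng.leaders)
print(round(rel.flng.parties, 3))
```

Missing scores drop out of the mean but stay NA after centering, which is why the NA pattern of the new columns matches that of the original ones.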
bes2010flngs_pre_long <- unsplit(bes2010flngs_pre_long.splt,
                                 bes2010flngs_pre_long$id)
str(bes2010flngs_pre_long)
'data.frame': 15480 obs. of 7 variables:
$ region : Factor w/ 3 levels "England","Scotland",..: 1 1 1 1 1 1 1 1 NA NA ...
$ party : Factor w/ 8 levels "Conservative",..: 1 2 3 4 5 6 7 8 1 2 ...
$ flng.leaders : num 3 6 3 NA 5 NA NA NA 7 3 ...
$ flng.parties : num 6 5 4 NA NA 7 3 0 6 1 ...
$ id : int 1 1 1 1 1 1 1 1 2 2 ...
$ rel.flng.leaders: num -1.25 1.75 -1.25 NA 0.75 NA NA NA 2.5 -1.5 ...
$ rel.flng.parties: num 1.833 0.833 -0.167 NA NA ...
- attr(*, "reshapeLong")=List of 4
..$ varying:List of 2
.. ..$ flng.leaders: chr [1:8] "flng.cameron" "flng.brown" "flng.clegg" "flng.salmond" ...
.. ..$ flng.parties: chr [1:8] "flng.cons" "flng.labour" "flng.libdem" "flng.snp" ...
..$ v.names: chr [1:2] "flng.leaders" "flng.parties"
..$ idvar : chr "id"
..$ timevar: chr "party"
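To see the three steps (split, apply, combine) in isolation, here is a minimal sketch on a made-up data frame. The names toy, toy.splt, score and rel.score are hypothetical and the values are invented; the rows are deliberately interleaved to show that unsplit() puts each row back in its original position.

```r
# Toy data (invented values): two respondents, rows interleaved
toy <- data.frame(
  id    = c(1, 2, 1, 2, 1),
  score = c(3, 7, 6, NA, 3)
)

# 1. Split: one data frame per respondent
toy.splt <- split(toy, toy$id)

# 2. Apply: center 'score' on the respondent's own mean (ignoring NAs)
toy.splt <- lapply(toy.splt, within, expr = {
  rel.score <- score - mean(score, na.rm = TRUE)
})

# 3. Combine: unsplit() restores the original row order
toy <- unsplit(toy.splt, toy$id)
print(toy)
```

The grouping factor passed to unsplit() must be the same one that was used in split(), otherwise rows cannot be matched back to their positions.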
Groupwise computations using withinGroups() from the package memisc. You may need to install this package using install.packages("memisc") from CRAN if you want to run this on your computer. (The package is already installed on the notebook container, however.)
library(memisc)
Loading required package: lattice
Loading required package: MASS
Attaching package: 'memisc'
The following object is masked _by_ '.GlobalEnv':
Mean
The following objects are masked from 'package:stats':
contr.sum, contr.treatment, contrasts
The following object is masked from 'package:base':
as.array
bes2010flngs_pre_long <- withinGroups(bes2010flngs_pre_long,
                                      ~id,{
    rel.flng.parties <- flng.parties - Mean(flng.parties)
    rel.flng.leaders <- flng.leaders - Mean(flng.leaders)
})
We use head() to look at the first 14 rows of the recombined data frame, leaving out its first two columns (region and party):
head(bes2010flngs_pre_long[-(1:2)],n=14)
               flng.leaders flng.parties id rel.flng.leaders rel.flng.parties
1.Conservative            3            6  1            -1.25        1.8333333
1.Labour                  6            5  1             1.75        0.8333333
1.LibDem                  3            4  1            -1.25       -0.1666667
1.SNP                    NA           NA  1               NA               NA
1.Plaid Cymru             5           NA  1             0.75               NA
1.Green                  NA            7  1               NA        2.8333333
1.UKIP                   NA            3  1               NA       -1.1666667
1.BNP                    NA            0  1               NA       -4.1666667
2.Conservative            7            6  2             2.50        2.6666667
2.Labour                  3            1  2            -1.50       -2.3333333
2.LibDem                  5            7  2             0.50        3.6666667
2.SNP                    NA           NA  2               NA               NA
2.Plaid Cymru             3           NA  2            -1.50               NA
2.Green                  NA            6  2               NA        2.6666667
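A quick sanity check for any group-centering step is that the centered scores average to zero within each group. The sketch below shows this on invented data. Note that it uses base R's ave() and tapply() rather than withinGroups(), so that it runs without memisc; the names toy, flng and rel.flng are hypothetical.

```r
# Toy data (invented values): two respondents with three scores each
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  flng = c(3, 6, NA, 7, 3, 5)
)

# ave() returns, for every row, the mean of that row's group
grp.mean <- ave(toy$flng, toy$id,
                FUN = function(x) mean(x, na.rm = TRUE))

# Center each score on its group mean
toy$rel.flng <- toy$flng - grp.mean

# Within each group the centered scores should average to (numerically) zero
group.means <- tapply(toy$rel.flng, toy$id, mean, na.rm = TRUE)
print(toy)
print(group.means)
```

The same check can be applied to rel.flng.leaders and rel.flng.parties in the real data by grouping on id.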
- R file: groupwise-computations.R
- Rmarkdown file: groupwise-computations.Rmd
- Jupyter notebook file: groupwise-computations.ipynb