Merging data frames: An example with data from the British Election Study¶
Here we merge data from the British Election Study. Here we use data from the British Election Study 2010. The data set bes2010feelings-prepost-for-merge.RData is prepared from the original available at https://www.britishelectionstudy.com/data-object/2010-bes-cross-section/ by removing identifying information and scrambling the data.
load("bes2010feelings-prepost-for-merge.RData")
A peek into a data frame about respondents’ feelings about parties:
head(bes2010flngs_parties_pre)
id refno vote flng.labour flng.cons flng.libdem flng.snp flng.pcym
40103.1 40103 312 NA 5 6 4 NA NA
40107.1 40107 312 NA 1 6 7 NA NA
40109.1 40109 312 NA 3 4 5 NA NA
40110.1 40110 312 Labour 6 6 5 NA NA
40111.1 40111 312 Labour 8 4 5 NA NA
40112.1 40112 312 Labour 5 1 4 NA NA
flng.green flng.ukip flng.bnp region
40103.1 7 3 0 England
40107.1 6 0 0 NA
40109.1 5 0 0 England
40110.1 5 3 2 England
40111.1 4 NA 2 NA
40112.1 4 0 0 England
And anotehr peek into a data frame about respondents’ feelings about party leaders:
head(bes2010flngs_leaders_pre)
id flng.brown flng.cameron flng.clegg flng.salmond flng.jones
40103.1 40103 6 3 3 NA 5
40107.1 40107 3 7 5 NA 3
40109.1 40109 8 7 4 NA 10
40110.1 40110 4 4 3 NA 7
40111.1 40111 5 5 5 NA 5
40112.1 40112 5 0 4 NA 1
The variable that identifies individual respondents in both data frames is ‘id’, so we use this variable to match the rows in both data frames:
bes2010flngs_pre_merged <- merge(
bes2010flngs_parties_pre,
bes2010flngs_leaders_pre,
by="id"
)
merge()
also allows for identifier variables that may have
different names in the two data frame. In such cases
one can use the named arguments by.x=
and by.y=
:
bes2010flngs_pre_merged <- merge(
bes2010flngs_parties_pre,
bes2010flngs_leaders_pre,
by.x="id",
by.y="id"
)
It is not absolutely necessary to provide a by=
argument, if
the merged data frames share a variable (with the same name in both)
that idenfies cases or observations. Therefore, we can call merge()
here without any by=
, by.x=
, or by.y=
arguments:
bes2010flngs_pre_merged <- merge(
bes2010flngs_parties_pre,
bes2010flngs_leaders_pre
)
head(bes2010flngs_pre_merged)
id refno vote flng.labour flng.cons flng.libdem flng.snp flng.pcym
1 40103 312 NA 5 6 4 NA NA
2 40107 312 NA 1 6 7 NA NA
3 40109 312 NA 3 4 5 NA NA
4 40110 312 Labour 6 6 5 NA NA
5 40111 312 Labour 8 4 5 NA NA
6 40112 312 Labour 5 1 4 NA NA
flng.green flng.ukip flng.bnp region flng.brown flng.cameron flng.clegg
1 7 3 0 England 6 3 3
2 6 0 0 NA 3 7 5
3 5 0 0 England 8 7 4
4 5 3 2 England 4 4 3
5 4 NA 2 NA 5 5 5
6 4 0 0 England 5 0 4
flng.salmond flng.jones
1 NA 5
2 NA 3
3 NA 10
4 NA 7
5 NA 5
6 NA 1
The data frame constwin
contains data about relectoral districts, that is
which party won the respective district seat in 2005 and 2010. The
variable that identifies the electoral district is both in the individual-level
data frame and the district-level data frame is named refno
, so we use this
as a matching variable.
bes2010pre_merged <- merge(
bes2010flngs_pre_merged,
constwin,
by = "refno" # Not necessary in the present case, because
) # it is the same in both data frames.
As can be glimpsed from the output of str
, the result of merge
is
sorted by the matching variable, i.e. “refno”
head(bes2010pre_merged)
refno id vote flng.labour flng.cons flng.libdem flng.snp
1 1 77920 Plaid Cymru 6 5 5 NA
2 1 57911 NA 5 3 3 NA
3 1 57905 Labour 10 0 3 NA
4 1 57906 Labour 10 0 4 NA
5 1 57910 Conservatives 0 9 3 NA
6 1 57902 Conservatives 8 9 6 NA
flng.pcym flng.green flng.ukip ⋯ flng.brown flng.cameron flng.clegg
1 7 7 5 ⋯ 0 8 NA
2 3 4 0 ⋯ 6 5 5
3 5 3 4 ⋯ 8 0 4
4 3 0 6 ⋯ 10 0 0
5 3 6 2 ⋯ 0 9 0
6 5 5 6 ⋯ 4 8 6
flng.salmond flng.jones seat win05 win10 maj05 maj10
1 NA 4 Aberavon Labour Labour 46.3 35.7
2 NA 6 Aberavon Labour Labour 46.3 35.7
3 NA 0 Aberavon Labour Labour 46.3 35.7
4 NA 10 Aberavon Labour Labour 46.3 35.7
5 NA 0 Aberavon Labour Labour 46.3 35.7
6 NA 7 Aberavon Labour Labour 46.3 35.7
- R file: merging-BES.R
- Rmarkdown file: merging-BES.Rmd
- Jupyter notebook file: merging-BES.ipynb
- Interactive version of the Jupyter notebook (shuts down after 60s):
- Interactive version of the Jupyter notebook (sign in required):