Counting words in the UK Labour Party manifesto on the occasion of the 2017 election

The file “UKLabourParty_201706.csv” was downloaded from the Manifesto Project website. Redistribution of the data is prohibited, so readers who want to reproduce the following will need to download their own copy of the data set and upload it to the virtual machine that runs this notebook. To do this:

  1. Pull down the “File” menu and select “Open”.
  2. An overview of the folder that contains the notebook opens.
  3. The folder view has a button labelled “Upload”. Use it to upload the file that you downloaded from the Manifesto Project website.

Note that the uploaded data will disappear once you “Quit” the notebook (and the Jupyter instance).
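
Before running the cells below, it can help to confirm that the upload worked. The following is only a minimal sketch: the variable names `csv.file` and `found` and the wording of the messages are illustrative assumptions, not part of the original notebook.

```r
# Illustrative pre-flight check (not part of the original notebook):
# look for the uploaded file in the notebook's working directory.
csv.file <- "UKLabourParty_201706.csv"
found <- file.exists(csv.file)

if (found) {
  message("Found ", csv.file, "; it can now be read in.")
} else {
  message("Cannot find ", csv.file, "; please upload it first.")
}
```

If the second message appears, repeat the upload steps above and check that the file name matches exactly.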


# First, the data are read in
Labour.2017 <- read.csv("UKLabourParty_201706.csv",
                        stringsAsFactors=FALSE)

# Second, the UTF-8 right single quotation mark (bytes E2 80 99)
# is replaced by a plain ASCII apostrophe
Labour.2017$content <- gsub("\xE2\x80\x99","'",Labour.2017$content)
str(Labour.2017)
'data.frame':   1396 obs. of  3 variables:
 $ content : chr  "CREATING AN ECONOMY THAT WORKS FOR ALL" "Labour's economic strategy is about delivering a fairer, more prosperous society for the many, not just the few." "We will measure our economic success not by the number of billionaires, but by the ability of our people to live richer lives." "Labour understands that the creation of wealth is a collective endeavour between workers, entrepreneurs, invest"| __truncated__ ...
 $ cmp_code: chr  "H" "503" "503" "405" ...
 $ eu_code : logi  NA NA NA NA NA NA ...
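
The substitution above targets a single character: the byte sequence E2 80 99 is the UTF-8 encoding of the right single quotation mark (U+2019). The sketch below illustrates this on a made-up string; the names `curly`, `x` and `y` are assumptions introduced only for this example, and it uses the `\u2019` escape with `fixed = TRUE` rather than the raw bytes used in the notebook.

```r
# Toy illustration (not part of the original notebook).
curly <- "\u2019"                 # right single quotation mark

# Its UTF-8 encoding consists of the three bytes E2 80 99
as.integer(charToRaw(curly))      # 226 128 153

# Replace it by an ASCII apostrophe; fixed = TRUE treats the
# pattern as a literal string rather than a regular expression
x <- "Labour\u2019s economic strategy"
y <- gsub(curly, "'", x, fixed = TRUE)
y                                 # "Labour's economic strategy"
```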

# The variable 'content' contains the text of the manifesto
Labour.2017 <- Labour.2017$content
Labour.2017[1:5]
[1] "CREATING AN ECONOMY THAT WORKS FOR ALL"
[2] "Labour's economic strategy is about delivering a fairer, more prosperous society for the many, not just the few."
[3] "We will measure our economic success not by the number of billionaires, but by the ability of our people to live richer lives."
[4] "Labour understands that the creation of wealth is a collective endeavour between workers, entrepreneurs, investors and government."
[5] "Each contributes and each must share fairly in the rewards."

# The headings in the manifesto are all uppercase, which helps
# to identify them:
Labour.2017.hlno <- which(Labour.2017==toupper(Labour.2017))
Labour.2017.headings <- Labour.2017[Labour.2017.hlno]
Labour.2017.headings[1:4]
[1] "CREATING AN ECONOMY THAT WORKS FOR ALL"
[2] "A FAIR TAXATION SYSTEM"
[3] "BALANCING THE BOOKS"
[4] "INFRASTRUCTURE INVESTMENT"
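
The comparison with `toupper()` can be tried out on a few made-up lines. This is a sketch under stated assumptions: the vectors `toy.lines` and `no.headings` are invented for illustration. It also shows a caveat of dropping elements with a negative index built from `which()`: if no heading is found, `which()` returns an empty index and the negative subset is empty as well, whereas a logical index keeps all lines.

```r
# Toy illustration (not part of the original notebook).
toy.lines <- c("A FAIR TAXATION SYSTEM",
               "We will not raise VAT.",
               "BALANCING THE BOOKS")

# A line counts as a heading if upper-casing leaves it unchanged
is.heading <- toy.lines == toupper(toy.lines)
which(is.heading)        # positions of the headings
toy.lines[!is.heading]   # the non-heading text

# Caveat: with no headings at all, which() gives integer(0) ...
no.headings <- c("first line", "second line")
hlno <- which(no.headings == toupper(no.headings))
no.headings[-hlno]       # ... and the negative index drops everything

# A logical index does not have this problem
no.headings[no.headings != toupper(no.headings)]
```

In the manifesto data headings do occur, so the negative index used in the notebook behaves as intended there.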

# All non-heading text is changed to lowercase
labour.2017 <- tolower(Labour.2017[-Labour.2017.hlno])
labour.2017[1:5]
[1] "labour's economic strategy is about delivering a fairer, more prosperous society for the many, not just the few."
[2] "we will measure our economic success not by the number of billionaires, but by the ability of our people to live richer lives."
[3] "labour understands that the creation of wealth is a collective endeavour between workers, entrepreneurs, investors and government."
[4] "each contributes and each must share fairly in the rewards."
[5] "this manifesto sets out labour's plan to upgrade our economy and rewrite the rules of a rigged system, so that our economy really works for the many, and not only the few."

# All lines that contain the pattern 'econom' are collected
ecny.labour.2017 <- grep("econom",labour.2017,value=TRUE)
ecny.labour.2017[1:5]
[1] "labour's economic strategy is about delivering a fairer, more prosperous society for the many, not just the few."
[2] "we will measure our economic success not by the number of billionaires, but by the ability of our people to live richer lives."
[3] "this manifesto sets out labour's plan to upgrade our economy and rewrite the rules of a rigged system, so that our economy really works for the many, and not only the few."
[4] "britain is the only major developed economy where earnings have fallen even as growth has returned after the financial crisis."
[5] "we will upgrade our economy, breaking down the barriers that hold too many of us back,"
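
Because 'econom' is matched as a substring, it catches "economy" as well as "economic" and similar forms. The sketch below, on an invented vector `toy`, contrasts the three common ways of using such a pattern: positions, matching lines, and a count.

```r
# Toy illustration (not part of the original notebook).
toy <- c("our economy is growing",
         "we will build homes",
         "economic success for all")

grep("econom", toy)                # positions of matching lines: 1 3
grep("econom", toy, value = TRUE)  # the matching lines themselves
sum(grepl("econom", toy))          # number of matching lines: 2
```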

# Using 'strsplit()' the lines are split into words
labour.2017.words <- strsplit(labour.2017,"[ ,.;:]+")
str(labour.2017.words[1:5])
List of 5
 $ : chr [1:18] "labour's" "economic" "strategy" "is" ...
 $ : chr [1:23] "we" "will" "measure" "our" ...
 $ : chr [1:17] "labour" "understands" "that" "the" ...
 $ : chr [1:10] "each" "contributes" "and" "each" ...
 $ : chr [1:32] "this" "manifesto" "sets" "out" ...
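
The second argument of `strsplit()` is a regular expression; here it matches any run of spaces, commas, full stops, semicolons and colons. The following sketch uses made-up strings to show what this does and does not cover: a separator at the start of a line yields an empty first element, and characters outside the class, such as parentheses, stay attached to the words.

```r
# Toy illustration (not part of the original notebook).
sep <- "[ ,.;:]+"

# Runs of separators are treated as one; a trailing separator
# does not produce an empty string
strsplit("we will, we must: act.", sep)[[1]]
# "we" "will" "we" "must" "act"

# A separator at the very beginning yields an empty first element
strsplit(" leading space", sep)[[1]]
# "" "leading" "space"

# Parentheses are not in the character class and remain in place
strsplit("growth (see above)", sep)[[1]]
# "growth" "(see" "above)"
```

Empty strings and words with attached punctuation are counted as separate items in a later tabulation, so it may be worth checking for them in real data.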

# The result is a list. We change it into a character vector.
labour.2017.words <- unlist(labour.2017.words)
labour.2017.words[1:20]
 [1] "labour's"   "economic"   "strategy"   "is"         "about"
 [6] "delivering" "a"          "fairer"     "more"       "prosperous"
[11] "society"    "for"        "the"        "many"       "not"
[16] "just"       "the"        "few"        "we"         "will"

# We now count the words and look at the 20 most common ones.
labour.2017.nwords <- table(labour.2017.words)
labour.2017.nwords <- sort(labour.2017.nwords,decreasing=TRUE)
labour.2017.nwords[1:20]
labour.2017.words
   the    and     to   will     of      a     we     in labour    for    our
  1202    947    832    664    625    438    418    369    313    312    244
  that     on   with     by     is    are     as   have ensure
   232    212    185    161    161    134    112    108    104
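
The counting step can be followed on a small scale. In this sketch the vector `toy.words` and the name `toy.counts` are invented for illustration: `table()` counts how often each distinct word occurs, and `sort()` with `decreasing=TRUE` puts the most frequent word first.

```r
# Toy illustration (not part of the original notebook).
toy.words <- c("the", "many", "not", "the", "few", "the", "many")

# Count the occurrences of each distinct word ...
toy.counts <- table(toy.words)
# ... and order the counts from most to least frequent
toy.counts <- sort(toy.counts, decreasing = TRUE)
toy.counts

names(toy.counts)[1]            # the most frequent word
as.integer(toy.counts["the"])   # its count
```

As in the manifesto, function words such as "the" dominate the top of the list.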

Downloadable R script and interactive version

Explanation

The link with the “jupyterhub” icon directs you to an interactive Jupyter [1] notebook, which runs inside a Docker container [2]. There are two variants of the interactive notebook. One shuts down after 60 seconds and does not require a sign-in. The other requires signing in with your ORCID [3] credentials, but shuts down only after 24 hours. (There is no guarantee that such a container persists that long; it may be shut down earlier for maintenance.) After shutdown, all data within the container are reset, i.e. all files created by the user are deleted. [4]

Above you see a rendered version of the Jupyter notebook. [5]

[1] For more information about Jupyter, see http://jupyter.org. The Jupyter notebooks make use of the IRkernel package.

[2] For more information about Docker, see https://docs.docker.com/. The container images were created with repo2docker, while containers are run with DockerSpawner.

[3] ORCID is a free service for the authentication of researchers. It also allows researchers to showcase publications and contributions to the academic community, such as peer review. See https://info.orcid.org/what-is-orcid/ for more information.

[4] The Jupyter notebooks come with NO WARRANTY whatsoever. They are provided for educational and illustrative purposes only. Do not use them for production work.

[5] The notebook is rendered with the help of the nbsphinx extension.