Filling and completing with tidyr

The following makes use of the packages tidyr and readr. If you want to run this code on your own computer, you may need to install them from CRAN with install.packages(c("tidyr","readr")). (The packages are already installed in the notebook container, however.)
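If you run the code repeatedly, you may prefer to install only the packages that are missing. This is a minimal sketch. The variable names pkgs and missing_pkgs are illustrative, and the CRAN mirror URL is an assumption, since any CRAN mirror should work.

```r
# Packages used in this section
pkgs <- c("tidyr", "readr")

# requireNamespace() quietly returns FALSE for packages that are not installed
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

# Install only what is missing (mirror URL is an assumption; any CRAN mirror works)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```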

Filling missing values with fill()


library(tidyr)
library(readr)

messy_data_str <- "
country,  year,var1, var2
Rodinia,  1297,  67, -3.0
,         1298,  69, -2.9
,         1299,  70, -2.8
Pannotia, 1296,  73, -4.1
,         1297,  74, -3.9
,         1298,  75, -3.9
Pangaea,  1296,  54, -1.2
,         1297,  53, -1.1
,         1298,  52, -1.0
,         1299,  51, -0.9
"

# Parse the CSV text; empty country cells are read as NA
messy_data_str %>% read_csv() -> messy_data
messy_data
   country  year var1 var2
1  Rodinia  1297 67   -3.0
2  NA       1298 69   -2.9
3  NA       1299 70   -2.8
4  Pannotia 1296 73   -4.1
5  NA       1297 74   -3.9
6  NA       1298 75   -3.9
7  Pangaea  1296 54   -1.2
8  NA       1297 53   -1.1
9  NA       1298 52   -1.0
10 NA       1299 51   -0.9
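read_csv() turned the empty country cells into NA and ignored the padding spaces after the commas. This follows from two of readr's defaults: na = c("", "NA") reads empty fields as missing, and trim_ws = TRUE strips surrounding whitespace. The sketch below checks both on a two-row string. The names txt and parsed are only illustrative, and columns are accessed by position to avoid relying on how the header is trimmed.

```r
library(readr)

# Two data rows; the second has an empty 'country' field, and fields after commas are padded
txt <- "country, year\nRodinia, 1297\n, 1298\n"

# Defaults: na = c("", "NA") turns empty fields into NA, trim_ws = TRUE strips the padding
parsed <- read_csv(txt)

parsed[[1]]  # country column: "Rodinia" then NA
parsed[[2]]  # year column, parsed as numbers: 1297 1298
```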

# Carry each country name down into the NA rows that follow it
messy_data %>% fill(country) -> filled_data
filled_data
   country  year var1 var2
1  Rodinia  1297 67   -3.0
2  Rodinia  1298 69   -2.9
3  Rodinia  1299 70   -2.8
4  Pannotia 1296 73   -4.1
5  Pannotia 1297 74   -3.9
6  Pannotia 1298 75   -3.9
7  Pangaea  1296 54   -1.2
8  Pangaea  1297 53   -1.1
9  Pangaea  1298 52   -1.0
10 Pangaea  1299 51   -0.9
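By default, fill() copies the last non-missing value downwards. Its .direction argument also accepts "up", "downup" and "updown". The following sketch uses a hypothetical data frame df in which each country name sits on the last row of its block rather than the first.

```r
library(tidyr)

# Hypothetical data: each country name appears on the LAST row of its block
df <- data.frame(
  country = c(NA, NA, "Rodinia", NA, "Pangaea"),
  year = c(1297, 1298, 1299, 1296, 1297),
  stringsAsFactors = FALSE
)

# Default .direction = "down": the first two rows stay NA because nothing precedes them
down_filled <- fill(df, country)

# .direction = "up": each NA takes the next non-missing value below it
up_filled <- fill(df, country, .direction = "up")

down_filled$country  # NA NA "Rodinia" "Rodinia" "Pangaea"
up_filled$country    # "Rodinia" "Rodinia" "Rodinia" "Pangaea" "Pangaea"
```

For this layout, only "up" assigns every year to the right country. Filling downwards would wrongly label the fourth row as Rodinia.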

Completing data with missing values using complete()


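# Add a row for every country-year combination absent from the data; var1 and var2 are NA in those rows
filled_data %>% complete(crossing(country, year))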
   country  year var1 var2
1  Pangaea  1296 54   -1.2
2  Pangaea  1297 53   -1.1
3  Pangaea  1298 52   -1.0
4  Pangaea  1299 51   -0.9
5  Pannotia 1296 73   -4.1
6  Pannotia 1297 74   -3.9
7  Pannotia 1298 75   -3.9
8  Pannotia 1299 NA     NA
9  Rodinia  1296 NA     NA
10 Rodinia  1297 67   -3.0
11 Rodinia  1298 69   -2.9
12 Rodinia  1299 70   -2.8
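Inside complete(), listing the columns directly already crosses them, so complete(country, year) should give the same combinations as the crossing() call above. The fill argument supplies a value to use instead of NA in the newly created rows. The sketch below re-enters filled_data by hand so that it runs on its own. The replacement value 0 is arbitrary and chosen only for illustration.

```r
library(tidyr)

# Same values as filled_data above, entered directly so the snippet is self-contained
filled_data <- data.frame(
  country = rep(c("Rodinia", "Pannotia", "Pangaea"), times = c(3, 3, 4)),
  year = c(1297, 1298, 1299, 1296, 1297, 1298, 1296, 1297, 1298, 1299),
  var1 = c(67, 69, 70, 73, 74, 75, 54, 53, 52, 51),
  var2 = c(-3.0, -2.9, -2.8, -4.1, -3.9, -3.9, -1.2, -1.1, -1.0, -0.9),
  stringsAsFactors = FALSE
)

# Listing the columns crosses them, like crossing(country, year)
completed <- complete(filled_data, country, year)
nrow(completed)  # 12: 3 countries x 4 years

# Use 0 instead of NA for var1 in the added rows; var2 is not listed, so it stays NA
completed_zero <- complete(filled_data, country, year, fill = list(var1 = 0))
```

In tidyr 1.2.0 and later, fill also replaces NA values that were already present in the data unless explicit = FALSE is given. That makes no difference here, because filled_data contains no NA values.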

Downloadable R script and interactive version

Explanation

The link with the “jupyterhub” icon directs you to an interactive Jupyter[1] notebook, which runs inside a Docker container[2]. There are two variants of the interactive notebook. One shuts down after 60 seconds and does not require signing in. The other requires signing in with your ORCID[3] credentials, but shuts down only after 24 hours. (There is no guarantee that such a container persists that long; it may be shut down earlier for maintenance purposes.) After shutdown, all data within the container are reset, i.e. all files created by the user are deleted.[4]

Above you see a rendered version of the Jupyter notebook.[5]

[1] For more information about Jupyter see http://jupyter.org. The Jupyter notebooks make use of the IRKernel package.

[2] For more information about Docker see https://docs.docker.com/. The container images were created with repo2docker, while the containers are run with DockerSpawner.

[3] ORCID is a free service for the authentication of researchers. It also allows researchers to showcase their publications and contributions to the academic community, such as peer review. See https://info.orcid.org/what-is-orcid/ for more information.

[4] The Jupyter notebooks come with NO WARRANTY whatsoever. They are provided for educational and illustrative purposes only. Do not use them for production work.

[5] The notebook is rendered with the help of the nbsphinx extension.