The structure of data frames
Data frame construction
# First create a few vectors from which we construct the data frame:
population <- c(55619400,1885400,5424800,3125000)
area.sq.m <- c(50301,5460,30090,8023)
GVA.cap <- c(28096,20000,24800,19900)
# then we use 'data.frame' to construct the data frame:
UK <- data.frame(population,area.sq.m,GVA.cap)
UK
population area.sq.m GVA.cap
1 55619400 50301 28096
2 1885400 5460 20000
3 5424800 30090 24800
4 3125000 8023 19900
names(UK)
[1] "population" "area.sq.m" "GVA.cap"
names(UK) <- c("Population","Area","GVA")
UK
Population Area GVA
1 55619400 50301 28096
2 1885400 5460 20000
3 5424800 30090 24800
4 3125000 8023 19900
row.names(UK)
[1] "1" "2" "3" "4"
row.names(UK) <- c("England",
                   "Northern Ireland",
                   "Scotland",
                   "Wales")
UK
Population Area GVA
England 55619400 50301 28096
Northern Ireland 1885400 5460 20000
Scotland 5424800 30090 24800
Wales 3125000 8023 19900
# It is also possible to set the variable names and the row names directly
# in the call to 'data.frame', when this is more convenient:
UK <- data.frame(
    Population = c(55619400,1885400,5424800,3125000),
    Area = c(50301,5460,30090,8023),
    GVA = c(28096,20000,24800,19900),
    row.names = c("England",
                  "Northern Ireland",
                  "Scotland",
                  "Wales"))
UK
Population Area GVA
England 55619400 50301 28096
Northern Ireland 1885400 5460 20000
Scotland 5424800 30090 24800
Wales 3125000 8023 19900
nrow(UK)
[1] 4
ncol(UK)
[1] 3
dim(UK)
[1] 4 3
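The relation between 'nrow', 'ncol', and 'dim' can be checked directly, and 'str' gives a compact overview of the structure of a data frame. The following is a minimal sketch that rebuilds 'UK' as above so that it runs on its own; the use of 'str' and 'identical' is an addition for illustration and not part of the original script.

```r
# Rebuild the data frame 'UK' so that this example is self-contained
UK <- data.frame(
    Population = c(55619400,1885400,5424800,3125000),
    Area = c(50301,5460,30090,8023),
    GVA = c(28096,20000,24800,19900),
    row.names = c("England","Northern Ireland","Scotland","Wales"))

# Compact overview: number of observations, number of variables,
# and the type and first values of each variable
str(UK)

# 'dim' returns the number of rows followed by the number of columns,
# so it agrees with the results of 'nrow' and 'ncol'
identical(dim(UK), c(nrow(UK), ncol(UK)))
```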
In what follows we treat the data frame ‘UK’ as a list:
# Here we get the variable 'Population':
UK$Population
[1] 55619400 1885400 5424800 3125000
# Analogously, one can use the double-bracket operator ('[[]]')
# to get the variable 'Population':
UK[["Population"]]
[1] 55619400 1885400 5424800 3125000
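That '$' and '[[]]' work here follows from the fact that a data frame is a list whose elements are the variables. The sketch below, which assumes the data frame 'UK' as constructed above and rebuilds it to be self-contained, illustrates this with 'is.list' and 'length'; these checks are added for illustration only.

```r
# Rebuild the data frame 'UK' so that this example is self-contained
UK <- data.frame(
    Population = c(55619400,1885400,5424800,3125000),
    Area = c(50301,5460,30090,8023),
    GVA = c(28096,20000,24800,19900),
    row.names = c("England","Northern Ireland","Scotland","Wales"))

# A data frame is a list ...
is.list(UK)
# ... with one element per variable, so its length equals the number of columns
length(UK) == ncol(UK)

# '$' and '[[]]' extract the same vector
identical(UK$Population, UK[["Population"]])
# '[[]]' also accepts the position of the variable
identical(UK[[1]], UK$Population)
```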
# The single-bracket operator also works as it does with lists.
# Here we get a data frame that contains the first two variables
# of the data frame 'UK':
UK[1:2]
Population Area
England 55619400 50301
Northern Ireland 1885400 5460
Scotland 5424800 30090
Wales 3125000 8023
# Now we get a data frame with the variables named 'Population' and
# 'GVA'
UK[c("Population","GVA")]
Population GVA
England 55619400 28096
Northern Ireland 1885400 20000
Scotland 5424800 24800
Wales 3125000 19900
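As with lists, the single-bracket operator returns an object of the same kind as the original (here a data frame), while the double-bracket operator returns the element itself (here a numeric vector). A short sketch of this difference, again rebuilding 'UK' so that it runs on its own:

```r
# Rebuild the data frame 'UK' so that this example is self-contained
UK <- data.frame(
    Population = c(55619400,1885400,5424800,3125000),
    Area = c(50301,5460,30090,8023),
    GVA = c(28096,20000,24800,19900),
    row.names = c("England","Northern Ireland","Scotland","Wales"))

# Single brackets: the result is a data frame with one variable
class(UK["Population"])
# Double brackets: the result is the variable itself, a numeric vector
class(UK[["Population"]])
```

The first call gives "data.frame", the second "numeric".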
The next few lines show the selection of rows and columns of a data frame
# We select the first two rows of the
# data frame 'UK' by just using their numbers:
UK[1:2,]
Population Area GVA
England 55619400 50301 28096
Northern Ireland 1885400 5460 20000
# By referring to row names, we select Scotland and Wales:
UK[c("Scotland","Wales"),]
Population Area GVA
Scotland 5424800 30090 24800
Wales 3125000 8023 19900
# As in a previous example, we select the first two columns ...
UK[,1:2]
Population Area
England 55619400 50301
Northern Ireland 1885400 5460
Scotland 5424800 30090
Wales 3125000 8023
# and the variables named 'Population' and 'GVA'
UK[,c("Population","GVA")]
Population GVA
England 55619400 28096
Northern Ireland 1885400 20000
Scotland 5424800 24800
Wales 3125000 19900
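Two further points about the two-index form may be worth illustrating. If only a single column is selected, the result is by default a vector rather than a data frame, unless 'drop = FALSE' is given, and rows can also be selected by a logical condition. The following is a sketch under the assumption that 'UK' is constructed as above; the names 'pop_vec', 'pop_df', and 'large' and the threshold of 5 million are chosen purely for illustration.

```r
# Rebuild the data frame 'UK' so that this example is self-contained
UK <- data.frame(
    Population = c(55619400,1885400,5424800,3125000),
    Area = c(50301,5460,30090,8023),
    GVA = c(28096,20000,24800,19900),
    row.names = c("England","Northern Ireland","Scotland","Wales"))

# Selecting a single column returns a plain numeric vector ...
pop_vec <- UK[,"Population"]
is.data.frame(pop_vec)

# ... unless 'drop = FALSE' is used, which keeps the data frame
# structure and the row names
pop_df <- UK[,"Population",drop=FALSE]
pop_df

# Rows can also be selected by a logical condition on a variable;
# here we keep the rows with a population above 5 million
large <- UK[UK$Population > 5000000,]
row.names(large)
```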