Spatial and Geographical data¶
While single geographical locations can be identified with a single pair of coordinates, the geographical extent of trade routes or countries are series of such coordinate values and cannot easily accommodated with the observation-by-variable format of data frames. Instead such geographical data constitutes its own data type, which is discussed in this chapter. The chapter also discusses how such spatial data can be connected with other data about geographical entities, e.g. the population density or GDP per capita of countries. Further it discusses the import of geographical data from shape files and OpenStreetMap OSM files and the definition of and conversion between cartographic projections.
Below is the supporting material for the various sections of the chapter.
The Structure of Spatial Data¶
-
The structure of objects of classes defined in the sp package
-
Script file:
spatial-datastructures-sp.R
The script makes use of the two packages:
- the sp package, which is available from
https://cran.r-project.org/package=sp
- the cshapes package, which is available from
https://cran.r-project.org/package=cshapes
The sp package is a requirement of cshapes and provides the necessary data class definitions.
- the sp package, which is available from
-
Interactive notebook:
-
-
The structure of objects of classes defined in the sf package
-
Script file:
spatial-datastructures-sf.R
The script makes use of the sf package, which is available from
https://cran.r-project.org/package=sf
Data file used in the script:
ged191.RData
which is contained in the ZIP archiveged191-RData.zip
downloaded fromhttps://ucdp.uu.se/downloads/index.html#ged_global
. Athttps://ucdp.uu.se/downloads/index.html#ged_global
you will now find a newer version of these data, which however have a different format. -
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:# The data used here is available from https://ucdp.uu.se/downloads/ged/ged191-RData.zip # download.file("https://ucdp.uu.se/downloads/ged/ged191-RData.zip", # "ged191-RData.zip")
In [3]:unzip("ged191-RData.zip") load("ged191.RData")
In [4]:class(ged191)
In [5]:# Number of observations nrow(ged191)
In [6]:# Number of variables ncol(ged191)
In [7]:library(sf)
In [8]:options(width=140) # Using the adapted 'print()' method from the 'sf' package print(ged191)
In [9]:# Examining the structure of an "sf" object with 'str()' str(ged191)
In [10]:# Obtaining the geometry contained in 'ged191' gged191 <- st_geometry(ged191) gged191
In [11]:# Accessing an individual shape gged191[[1]]
In [12]:# The class of a shape object class(gged191[[1]])
-
-
Converting “sp” objects to “sf” objects:
-
Script file:
converting-sp-to-sf.R
The script makes use of the two packages:
- the sp package, which is available from
https://cran.r-project.org/package=sp
- the sf package, which is available from
https://cran.r-project.org/package=sf
- the sp package, which is available from
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:library(sf)
In [3]:library(cshapes)
In [4]:cshapes.1990 <- cshp(as.Date("1990-01-01")) cshapes.1990 <- as(cshapes.1990,"sf")
In [5]:options(width=200) print(cshapes.1990[1:10])
In [6]:SthAmCntry.names <- c( "Argentina", "Bolivia", "Brazil", "Chile", "Colombia", "Ecuador", "Guyana", "Paraguay", "Peru", "Suriname", "Uruguay", "Venezuela")
In [7]:SthAmCountries <- subset(cshapes.1990, CNTRY_NAME %in% SthAmCntry.names)
In [8]:Brazil <- subset(cshapes.1990,CNTRY_NAME=="Brazil") Chile <- subset(cshapes.1990,CNTRY_NAME=="Chile") Colombia <- subset(cshapes.1990,CNTRY_NAME=="Colombia")
In [9]:cap.latlong <- with(cshapes.1990,cbind(CAPLONG,CAPLAT))
In [10]:cap.latlong <- lapply(1:nrow(cap.latlong), function(i)cap.latlong[i,])
In [11]:cap.latlong <- lapply(cap.latlong,st_point) cap.latlong <- st_sfc(cap.latlong)
In [12]:cshapes.capitals.1990 <- cshapes.1990 st_geometry(cshapes.capitals.1990) <- cap.latlong
In [13]:st_crs(cshapes.capitals.1990) <- st_crs(cshapes.1990)
In [14]:print(cshapes.capitals.1990[1:10])
In [15]:Brasilia <- subset(cshapes.capitals.1990,CNTRY_NAME=="Brazil") Santiago <- subset(cshapes.capitals.1990,CNTRY_NAME=="Chile") Bogota <- subset(cshapes.capitals.1990,CNTRY_NAME=="Colombia")
In [16]:graypal <- function(n)gray.colors(n,start=.2,end=.9,alpha=.5) plot(SthAmCountries,pal=graypal)
In [17]:plot(st_geometry(SthAmCountries))
In [18]:plot(st_geometry(Brazil))
In [19]:save(cshapes.1990,cshapes.capitals.1990,file="cshapes-1990.RData") save(Brazil,Chile,Colombia, Brasilia,Santiago,Bogota, SthAmCountries, file="south-america-1990.RData")
-
Spatial Relations and Operations¶
-
Spatial relations
-
Script file:
spatial-relations.R
The script makes use of the two packages:
- the sf package, which is available from
https://cran.r-project.org/package=sf
- the cshapes package, which is available from
https://cran.r-project.org/package=cshapes
Data file used in the script:
south-america-1990.RData
created by earlier scriptconverting-sp-to-sf.R
- the sf package, which is available from
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:library(sf)
In [3]:library(cshapes)
In [4]:load("south-america-1990.RData")
In [5]:st_contains(Brazil,Brasilia,sparse=FALSE)
In [6]:st_contains(Brazil,Bogota,sparse=FALSE)
In [7]:load("cshapes-1990.RData") ThreeCountries <- subset(cshapes.1990, CNTRY_NAME %in% c("Brazil","Chile","Colombia")) rownames(ThreeCountries) <- ThreeCountries$CNTRY_NAME
In [8]:ThreeCapitals <- subset(cshapes.capitals.1990, CNTRY_NAME %in% c("Brazil","Chile","Colombia")) rownames(ThreeCapitals) <- ThreeCapitals$CAPNAME
In [9]:st_contains(ThreeCountries,ThreeCapitals,sparse=FALSE)
In [10]:structure( st_contains(ThreeCountries,ThreeCapitals,sparse=FALSE), dimnames=list(rownames(ThreeCountries),rownames(ThreeCapitals)) )
In [11]:st_touches(Brazil,Colombia,sparse=FALSE)
In [12]:st_touches(Brazil,Chile,sparse=FALSE)
In [13]:structure( st_touches(ThreeCountries,sparse=FALSE), dimnames=list(rownames(ThreeCountries),rownames(ThreeCountries)) )
In [14]:save(ThreeCountries,ThreeCapitals,file="three-countries.RData")
-
-
Properties of spatial objects
-
Script file:
spatial-proterties.R
The script makes use of the two packages:
- the sf package, which is available from
https://cran.r-project.org/package=sf
- the units package, which is available from
https://cran.r-project.org/package=units
Data files used in the script:
created by earlier scripts
spatial-datastructures-sf.R
andspatial-relations.R
- the sf package, which is available from
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:library(sf)
In [3]:load("south-america-1990.RData") load("three-countries.RData")
In [4]:st_area(Brazil)
In [5]:in_km2 <- function(x) units::set_units(x,"km^2") in_km2(st_area(Brazil))
In [6]:in_km2(st_area(SthAmCountries))
In [7]:structure(in_km2(st_area(SthAmCountries)), names=as.character(SthAmCountries$CNTRY_NAME))
In [8]:st_distance(Brasilia,Bogota)
In [9]:st_distance(Chile,Bogota)
In [10]:# This takes a while, because R needs to figure out which points of the borders are the closest to one another st_distance(Chile,Colombia)
In [11]:in_km <- function(x) units::set_units(x,"km") in_km(st_distance(Brasilia,Bogota))
In [12]:in_km(st_distance(ThreeCapitals))
-
-
Unions and intersections (of artificial geometric objects)
-
Script file:
artificial-unions-intersections.R
The script makes use of the sf package, which is available from
https://cran.r-project.org/package=sf
. -
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:library(sf)
In [3]:A <- rbind( c(0,0), c(2,0), c(0,2), c(0,0) ) A <- st_polygon(list(A)) B <- rbind( c( 1, 1), c( 1,-1), c(-1,-1), c(-1, 1), c( 1, 1) ) B <- st_polygon(list(B))
In [4]:A
In [5]:B
In [6]:op <- par(mai=c(0,0,0,0),mfrow=c(1,3),xpd=NA) plot(A,xlim=c(-1,2),ylim=c(-1,2)) plot(B,lty=2,add=TRUE) text(0,1.5,"A",pos=2) text(1,-.5,"B",pos=4) text(.5,-1.5,"Two shapes A and B",pos=1) plot(st_union(A,B),col="gray70",xlim=c(-1,2),ylim=c(-1,2)) plot(A,lty=3,add=TRUE) plot(B,lty=3,add=TRUE) text(.5,-1.5,"st_union(A,B)",pos=1) plot(A,lty=3,xlim=c(-1,2),ylim=c(-1,2)) plot(B,lty=3,add=TRUE) plot(st_intersection(A,B),add=TRUE,col="gray70") text(.5,-1.5,"st_intersection(A,B)",pos=1) par(op)
In [7]:op <- par(mai=c(0,0,0,0),mfrow=c(1,3),xpd=NA) plot(st_difference(A,B),col="gray70",,xlim=c(-1,2),ylim=c(-1,2)) plot(A,lty=3,add=TRUE) plot(B,lty=3,add=TRUE) text(.5,-1.5,"st_difference(A,B)",pos=1) plot(st_difference(B,A),col="gray70",,xlim=c(-1,2),ylim=c(-1,2)) plot(A,lty=3,add=TRUE) plot(B,lty=3,add=TRUE) text(.5,-1.5,"st_difference(B,A)",pos=1) plot(st_sym_difference(A,B),col="gray70",,xlim=c(-1,2),ylim=c(-1,2)) text(.5,-1.5,"st_sym_difference(A,B)",pos=1) par(op)
-
-
Subsetting spatial feature objects
-
Script file:
spatial-subsets.R
The script makes use of the sf package, which is available from
https://cran.r-project.org/package=sf
.Data files used in the script:
created by earlier scripts
spatial-datastructures-sf.R
andconverting-sp-to-sf.R
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:library(sf)
In [3]:load("south-america-1990.RData") load("ged191.RData")
In [4]:# This fails due to different coordinate reference systems Colombia.conflicts <- ged191[Colombia,]
In [5]:st_crs(Colombia)
In [6]:st_crs(ged191)
In [7]:# Here we make sure that the coordinate reference systems match ged191_ellips <- st_transform(ged191,st_crs(Colombia)) Colombia.conflicts <- ged191_ellips[Colombia,]
In [8]:mypal <- function(n)gray.colors(n,start=.2,end=.9,alpha=.5) plot(st_geometry(Colombia)) plot(Colombia.conflicts["deaths_civilians"], add=TRUE,pch=19,cex=.2, pal=mypal, nbreaks=30 )
In [9]:st_circ <- function(x,dist.km){ dist.degr <- 360*dist.km/40007.863 st_buffer(st_geometry(x),dist=dist.degr) }
In [10]:Bogota.200km <- st_circ(Bogota,dist.km=200)
In [11]:Bogota.conflicts <- ged191_ellips[Bogota.200km,]
In [12]:plot(st_geometry(Colombia)) plot(st_geometry(Colombia.conflicts), add=TRUE,pch=1,cex=.3,col="gray80" ) plot(st_geometry(Bogota.conflicts), add=TRUE,pch=19,cex=.3) plot(Bogota.200km,lty=3,add=TRUE)
-
-
Aggregating spatial feature objects
-
Script file:
spatial-aggregates.R
The script makes use of the sf package, which is available from
https://cran.r-project.org/package=sf
.Data files used in the script:
created by earlier scripts
spatial-datastructures-sf.R
andconverting-sp-to-sf.R
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:library(sf)
In [3]:load("south-america-1990.RData") load("ged191.RData") load("cshapes-1990.RData")
In [4]:ged191_ellips <- st_transform(ged191,st_crs(cshapes.1990))
In [5]:# Civilian deaths per country aggregate(ged191_ellips["deaths_civilians"],by=SthAmCountries,sum)
In [6]:# Civilian deaths per country, with country names within( aggregate(ged191_ellips["deaths_civilians"],by=SthAmCountries,sum), country <- SthAmCountries$CNTRY_NAME)
In [7]:st_circ <- function(x,dist.km){ dist.degr <- 360*dist.km/40007.863 st_buffer(st_geometry(x),dist=dist.degr) }
In [8]:Bogota.region <- st_circ(Bogota,dist.km=200) Colombia.rest <- st_difference(st_geometry(Colombia),Bogota.region)
In [9]:# Civilian deaths in the Bogota region and the rest of Colombia aggregate(ged191_ellips["deaths_civilians"], by=c(Bogota.region,Colombia.rest), sum)
-
Cartographic Projections¶
-
Script file:
cartographic-projections.R
The script makes use of the sf package, which is available from
https://cran.r-project.org/package=sf
.Data file used in the script:
US_flat.RData
which contains prepared data originally from the maps package, available fromhttps://cran.r-project.org/package=maps
. -
Interactive notebook:
Geographical projections of a mainland USA map (with apologies to Alaskans and Hawaiians)¶
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:library(sf)
In [3]:load("US_flat.RData") US_flat
In [4]:plot(st_geometry(US_flat), graticule=TRUE,axes=TRUE)
In [5]:bbox_US <- st_bbox(US_flat)
In [6]:c(xcenter = mean(bbox_US[c("xmin","xmax")]), ycenter = mean(bbox_US[c("ymin","ymax")]))
In [7]:laea <- st_crs("+proj=laea +lon_0=-95.8 +lat_0=37.3") US_proj <- st_transform(US_flat,laea) plot(st_geometry(US_proj), graticule=TRUE,axes=TRUE)