Dates, Times, and Time Series
Temporal data, consisting of dates and times, pose their own challenges. Time is measured in non-decimal units: hours, minutes, and seconds. Dates can be recorded according to various calendar systems and are complicated by leap days and leap seconds. R provides facilities to convert times and dates between calendar systems, to format temporal data, and to import temporal data recorded in different formats. This is the first topic of this chapter. The second topic is time series and similar data structures (such as panels). Basic time series consist of measurements taken at regular intervals, but besides these basic variants, R also supports irregular time series. The chapter therefore also discusses the construction and manipulation of regular and irregular time series.
Below is the supporting material for the various sections of the chapter.
Dates and Times
-
Date objects and date formatting
- Script file:
date-objects-date-formatting.R
-
Interactive notebook:
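The notebook for this section is not shown here. As a minimal sketch of the two directions involved, parsing a string into a "Date" object and formatting a "Date" object as a string, the following assumes an input written as day.month.year; the example date and format strings are illustrative choices, not taken from the script.

```r
# Parse a date written in a non-standard (day.month.year) layout;
# the layout is an assumption made for illustration.
d <- as.Date("24.12.2020", format = "%d.%m.%Y")
d

# Format the same date in different ways
format(d, "%Y-%m-%d")   # ISO 8601 layout
format(d, "%d %B %Y")   # the month name depends on the locale
weekdays(d)             # also depends on the locale
```

The same conversion codes (%d, %m, %Y, %B, ...) are used both for parsing and for formatting.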
-
Date arithmetic
- Script file:
date-arithmetic.R
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:# R knows the lengths of months, e.g. that March has 31 days:
d0 <- as.Date("1968-03-01")
d0 + 31
In [3]:# R also knows that 1968 was a leap year,
d1 <- as.Date("1968-02-28")
d1 + 1
In [4]:# that 1900 was not a leap year,
d2 <- as.Date("1900-02-28")
d2 + 1
In [5]:# that 2000 was a leap year,
d3 <- as.Date("2000-02-28")
d3 + 1
In [6]:# and that leap years are 366 days long:
d3 + 366
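The leap-year behaviour shown above follows the Gregorian rule: a year is a leap year if it is divisible by 4, except for century years, which are leap years only if divisible by 400. The sketch below writes this rule as a small helper (is_leap is a hypothetical name, not part of base R) and cross-checks it against R's own date arithmetic.

```r
# Gregorian leap-year rule (is_leap is a hypothetical helper)
is_leap <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0
}

years <- c(1900, 1968, 2000, 2019)
is_leap(years)

# Cross-check: February 28 plus one day is February 29 only in leap years
feb29 <- format(as.Date(paste0(years, "-02-28")) + 1, "%m-%d") == "02-29"
feb29
```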
-
POSIXct time objects
- Script file:
POSIXct-time-objects.R
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:as.POSIXct(7200,origin="1970-01-01")
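In [3]:# Either specify the time zone when creating the time point ...
t0 <- as.POSIXct(7200,origin="1970-01-01",tz="GMT")
# ... or set the "tzone" attribute afterwards:
t0 <- as.POSIXct(7200,origin="1970-01-01")
attr(t0,"tzone") <- "GMT"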
In [4]:as.POSIXct(c("97/11/12 12:45","98/01/23 14:20"), format="%y/%m/%d %H:%M",tz="GMT")
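A "POSIXct" value is stored as the number of seconds since 1970-01-01 00:00:00 UTC, and the "tzone" attribute only affects how that instant is displayed. The sketch below illustrates this; the choice of "Asia/Tokyo" (UTC+9, no daylight saving time in 1970) is an assumption for illustration and requires the system's time zone database.

```r
# 7200 seconds after the origin, displayed in GMT
t0 <- as.POSIXct(7200, origin = "1970-01-01", tz = "GMT")
as.numeric(t0)                 # the stored number of seconds
format(t0, "%Y-%m-%d %H:%M")

# Changing the display time zone does not change the instant
t1 <- t0
attr(t1, "tzone") <- "Asia/Tokyo"
format(t1, "%Y-%m-%d %H:%M")
as.numeric(t1) == as.numeric(t0)
```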
-
Time arithmetic
- Script file:
time-arithmetic.R
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:# When in standard format, a string does not need a format specification
# in order to be converted:
t0 <- as.POSIXct("2020-02-01 00:00",tz="GMT")
t0
In [3]:# Adding 3600 seconds means adding an hour:
t0 + 3600
In [4]:# Subtracting seconds may also change the date:
t0 - 1
In [5]:# A day is 24 times 3600 seconds:
day <- 24*3600
t0 + day
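Instead of computing offsets in seconds by hand, regular sequences of time points can be created with seq(), whose by argument accepts strings such as "15 min", "hour", or "day". The sketch below, using the same starting point as the notebook, checks that the result agrees with the seconds-based arithmetic.

```r
t0 <- as.POSIXct("2020-02-01 00:00", tz = "GMT")

# Five time points, 15 minutes apart
steps <- seq(t0, by = "15 min", length.out = 5)
format(steps, "%H:%M")

# The same time points via explicit seconds arithmetic
all.equal(as.numeric(steps), as.numeric(t0) + (0:4) * 15 * 60)
```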
-
POSIXlt time objects
- Script file:
POSIXlt-time-objects.R
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:t0 <- as.POSIXlt(0,origin="2020-02-01",tz="GMT")
In [3]:(t1 <- as.POSIXlt(t0 + 3630))
In [4]:# Get the seconds component of the time point
t1$sec
In [5]:# Get the minutes component of the time point
t1$min
In [6]:# Get the hours component
t1$hour
In [7]:# Get the day of the month
t1$mday
In [8]:# Get the (numeric) month; months are counted from 0, so January is 0
t1$mon
In [9]:# Get the (numeric) year; years are counted from 1900
t1$year
In [10]:# Get the (numeric) day of the week; 0 is Sunday
t1$wday
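Because the "POSIXlt" components follow the conventions of C's struct tm, the calendar year and month have to be recovered by adding offsets. A minimal sketch, using the same time point as t1 in the notebook (2020-02-01 01:00:30 GMT):

```r
t1 <- as.POSIXlt("2020-02-01 01:00:30", tz = "GMT")

# Convert the offset-based components to calendar values
c(year = t1$year + 1900, month = t1$mon + 1, day = t1$mday)

t1$wday   # 0 = Sunday, ..., 6 = Saturday
t1$yday   # day of the year, counted from 0
```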
-
Creation of date and time data for given years, months, and days
- Script file:
ISOdate.R
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:# Here we create the first days of all months in the year 2000.
# By default the time is noon:
ISOdate(2000,1:12,1)
In [3]:# To get the start of the day we have to set the hour to midnight:
ISOdate(2000,1:12,1,hour=0)
In [4]:# We can of course also create a sequence of days:
ISOdate(2000,2,1:29,hour=0)
In [5]:# 'Impossible' dates result in NA:
ISOdate(2000,2,29:31,hour=0)
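Since ISOdate() returns NA for impossible dates, the last day of a month is more safely obtained as the day before the first day of the following month. A minimal sketch for the twelve months of 2000; note that the year has to roll over to 2001 for the month following December.

```r
# First days of February 2000 through January 2001
first.next <- as.Date(ISOdate(c(rep(2000, 11), 2001), c(2:12, 1), 1))

# Stepping back one day gives the last day of each month of 2000
last.days <- first.next - 1
format(last.days, "%d")
```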
-
Time differences
- Script file:
time-differences.R
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:# It does not matter whether we have "POSIXct" or "POSIXlt" objects,
# we can always obtain differences between the times
t0 <- as.POSIXlt(0,origin="2020-02-01",tz="GMT")
t1 <- as.POSIXct(0,origin="2020-02-01 3:00",tz="GMT")
t2 <- as.POSIXlt(0,origin="2020-02-01 3:45",tz="GMT")
t3 <- as.POSIXct(0,origin="2020-02-01 3:45:06",tz="GMT")
In [3]:# The unit of measurement for time differences is selected
# automatically. Usually it is the largest sensible unit:
t1 - t0
In [4]:t2 - t1
In [5]:t3 - t2
In [6]:t3 - t0
In [7]:# The last difference is in hours and fractions of an hour. It might be more
# sensible to use seconds as the unit of measurement.
diff.t <- t3 - t0
units(diff.t) <- "secs"
diff.t
In [8]:# It is also possible to compute differences between dates:
d0 <- as.Date("2020-01-31")
d1 <- as.Date("2020-02-28")
d2 <- as.Date("2020-03-31")
In [9]:# Usually the difference is in days:
d1 - d0
In [10]:d2 - d0
In [11]:# We may also want to see the difference in hours:
diff.d <- d1 - d0
units(diff.d) <- "hours"
diff.d
In [12]:# It is also possible to create time durations from scratch,
# from strings:
as.difftime("0:30:00")
In [13]:# and from numbers; here it is necessary to specify the unit of measurement:
as.difftime(30, units="mins")
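Besides changing the units of a "difftime" object in place, it is often convenient to extract a plain number in a chosen unit, for example for further calculations. A minimal sketch using as.numeric() with its units argument and difftime() with an explicit unit; the time points match t0 and t3 from the notebook.

```r
t0 <- as.POSIXct("2020-02-01 00:00:00", tz = "GMT")
t3 <- as.POSIXct("2020-02-01 03:45:06", tz = "GMT")

d <- t3 - t0                      # unit chosen automatically (hours)
as.numeric(d, units = "secs")     # plain number of seconds
as.numeric(d, units = "mins")     # plain number of minutes

# Alternatively, fix the unit when computing the difference
difftime(t3, t0, units = "mins")
```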
Time Series
Regular time series
-
Approval of US presidents
- Script file:
presidents-timeseries.R
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
The following line is not strictly necessary; it is used here only to indicate that presidents is a pre-installed example data set.
In [2]:data(presidents)
The data contain quarterly approval ratings of US presidents. The function tsp() returns the time series properties: the start point, the end point, and the frequency, i.e. the number of measurements per year.
In [3]:tsp(presidents)
With the functions start(), end(), and frequency() we can obtain each of these properties separately.
In [4]:start(presidents)
In [5]:end(presidents)
In [6]:frequency(presidents)
In [7]:presidents[1:12]
In [8]:window(presidents, start=1945, end=c(1947,4))
In [9]:nixon <- window(presidents, start=1969, end=c(1974,2))
nixon
In [10]:plot(nixon)
In [11]:time(nixon)
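A regular time series like presidents can also be constructed from scratch with ts(), by giving the first observation as c(year, period) together with the frequency. A minimal sketch with placeholder values 1 to 6 (not real data), showing how tsp(), start(), and end() follow from these arguments.

```r
# Six quarterly observations starting in the first quarter of 1945;
# the values are placeholders for illustration only.
x <- ts(seq_len(6), start = c(1945, 1), frequency = 4)

tsp(x)     # start, end, and frequency as decimal years
start(x)   # year and quarter of the first observation
end(x)     # year and quarter of the last observation
time(x)
```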
-
OECD unemployment data
-
Script file:
OECD-unemployment.R
Data set used in the script:
unemployment.csv, which was originally downloaded from https://data.oecd.org
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:unemployment <- read.csv("unemployment.csv")
In [3]:unemployment.ts <- ts(unemployment[2:5], start = 1970)
In [4]:plot(unemployment.ts)
In [5]:window(unemployment.ts, start=1980, end=1989)
In [6]:delta.unemployment.ts <- diff(unemployment.ts)
In [7]:plot(delta.unemployment.ts)
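The function diff() used above computes first differences, x[t] - x[t-1]. The result is one observation shorter than the original series and starts one period later. A minimal sketch with made-up annual values (for illustration only):

```r
# Illustrative annual values starting in 1970
x <- ts(c(5.0, 5.5, 7.0, 6.5), start = 1970)

dx <- diff(x)
dx          # changes from one year to the next
start(dx)   # the differenced series starts in 1971
```

For a multivariate "ts" object, such as unemployment.ts, the differences are computed column by column.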
-
-
Artificial time series data
-
Script file:
time-arithmetic.R
Data set used in the script:
unemployment.csv
-
Interactive notebook:
-
Irregular time series and the zoo package
-
Creating a “zoo” object from the presidents time series
-
Script file:
creating-zoo-objects-presidents.R
The script makes use of the zoo package, which is available from
https://cran.r-project.org/package=zoo
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:npresidents <- as.numeric(presidents)
library(zoo)
In [3]:years <- 1945:1974
quarters <- 1:4
presi.times <- yearqtr(
    rep(years,each=4) +     # each year is repeated 4 times
    rep((quarters-1)/4,30)  # the quarters are repeated 30 times
)
zpresidents <- zoo(npresidents,order.by=presi.times)
zpresidents
In [4]:str(zpresidents)
In [5]:coredata(zpresidents)[1:15] # To save space we only look at the first 15 elements.
In [6]:index(zpresidents)[1:15]
In [7]:time(zpresidents)[1:15]
In [8]:zpresidents[1:8]
In [9]:# Saved for later use:
save(zpresidents,file="zpresidents.RData")
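The quarterly index constructed with rep() above can also be built from a single numeric sequence, since each quarter adds 1/4 to the year. A minimal sketch (presi.times2 is a hypothetical name) that checks both constructions give the same "yearqtr" vector; it requires the zoo package.

```r
library(zoo)

# 120 quarters from 1945 Q1 to 1974 Q4
presi.times2 <- yearqtr(1945 + (0:119) / 4)
presi.times2[c(1, 2, 120)]

# The rep()-based construction from the notebook
years <- 1945:1974
quarters <- 1:4
presi.times <- yearqtr(rep(years, each = 4) + rep((quarters - 1) / 4, 30))
identical(presi.times, presi.times2)
```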
-
-
Creating a “zoo” object from OECD unemployment data
-
Script file:
creating-zoo-objects-unemployment.R
Data set used in the script:
unemployment.csv, which was originally downloaded from https://data.oecd.org
The script makes use of the zoo package, which is available from
https://cran.r-project.org/package=zoo
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:unemployment <- read.csv("unemployment.csv")
library(zoo)
In [3]:unemployment.z <- zoo(unemployment[,2:7], order.by=as.Date( ISOdate(year=unemployment[,1], month=12, day=31)))
In [4]:dim(unemployment.z)
In [5]:class(unemployment.z)
In [6]:head(unemployment.z)
In [7]:start(unemployment.z)
In [8]:end(unemployment.z)
In [9]:end(unemployment.z) - start(unemployment.z)
In [10]:# Saved for later use:
save(unemployment.z,file="unemployment-z.RData")
-
-
Subsetting “zoo” objects
-
Script file:
subsetting-zoo-objects.R
The script makes use of the zoo package, which is available from
https://cran.r-project.org/package=zoo
Data file used in the script:
zpresidents.RData
created by the earlier script creating-zoo-objects-presidents.R
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:library(zoo)
In [3]:as.yearqtr("1945 Q2")
In [4]:load("zpresidents.RData")
In [5]:zpresidents[as.yearqtr("1945 Q2")]
In [6]:qtrs3 <- as.yearqtr(paste(1960:1969,"Q3"))
zpresidents[qtrs3]
In [7]:qtrs <- paste(rep(1960:1964,each=4),rep(4:1,5),sep="-")
qtrs
In [8]:zpresidents[as.yearqtr(qtrs)]
In [9]:load("unemployment-z.RData")
In [10]:unemployment.z[as.Date("1997-12-31")]
In [11]:window(zpresidents, start = as.yearqtr("1969-1"), end = as.yearqtr("1974-2"))
In [12]:window(unemployment.z, start = as.Date("1980-12-31"), end = as.Date("1989-12-31"))
-
-
Handling missing values
-
Script file:
handling-missing-values.R
The script makes use of the zoo package, which is available from
https://cran.r-project.org/package=zoo
Data file used in the script:
zpresidents.RData
created by the earlier script creating-zoo-objects-presidents.R
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:library(zoo)
load("zpresidents.RData")
In [3]:# Leads to an error, because the "ts" object contains internal NAs:
presidents.o <- na.omit(presidents)
In [4]:zpresidents.o <- na.omit(zpresidents)
In [5]:c("Original length" = length(zpresidents), "Length after dropping NAs" = length(zpresidents.o))
In [6]:plot(zpresidents,lty=3)
lines(na.contiguous(zpresidents),lwd=2)
In [7]:plot(zpresidents,lwd=2)
lines(na.approx(zpresidents),lty=2)
lines(na.spline(zpresidents),lty=3)
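To see what the NA handling functions do, it helps to apply them to a tiny series. The sketch below uses a toy "zoo" object (values made up for illustration) and contrasts na.omit() on "zoo" with na.omit() on a "ts" object that has internal NAs; it requires the zoo package.

```r
library(zoo)

z <- zoo(c(10, NA, NA, 40, 50), order.by = 1:5)   # toy series
na.approx(z)   # linear interpolation between 10 and 40
na.locf(z)     # last observation carried forward
na.omit(z)     # drops the NAs, keeping the index of the other observations

# A "ts" object cannot simply drop internal NAs, so na.omit() fails:
x <- ts(c(10, NA, NA, 40, 50))
res <- tryCatch(na.omit(x), error = function(e) conditionMessage(e))
res
```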
-
-
Rolling statistics
-
Script file:
rolling-statistics.R
The script makes use of the zoo package, which is available from
https://cran.r-project.org/package=zoo
Data file used in the script:
zpresidents.RData
created by the earlier script creating-zoo-objects-presidents.R
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:library(zoo)
load("zpresidents.RData")
In [3]:zpresidents.o <- na.omit(zpresidents)
In [4]:zpresidents.o8 <- zpresidents.o[1:8]
In [5]:rollmean(zpresidents.o8,k=7)
In [6]:rollmean(zpresidents.o8,k=7,align="left")
In [7]:rollmean(zpresidents.o8,k=7,align="right")
In [8]:zpresidents.s <- na.spline(zpresidents)
plot(zpresidents.s,lty=3)
In [10]:zpresidents.m <- rollmean(zpresidents.s,k=9)
plot(zpresidents.s,lty=3)
lines(zpresidents.m,lwd=2)
In [11]:zpresidents.sd <- rollapply(zpresidents.s, width=9, FUN=sd)
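In [12]:# 95% confidence band for the rolling mean over windows of 9 quarters:
# standard error = sd/sqrt(9), t quantile with 9 - 1 = 8 degrees of freedom
tv <- qt(.975,df=8)
zpresidents.u <- zpresidents.m+tv*zpresidents.sd/sqrt(9)
zpresidents.l <- zpresidents.m-tv*zpresidents.sd/sqrt(9)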
In [13]:plot(zpresidents.m,ylim=c(20,80))
lines(zpresidents.u,lty=2)
lines(zpresidents.l,lty=2)
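The alignment options of rollmean() determine which time point each window's result is attached to. A minimal sketch on a toy series of five values (made up for illustration) shows the resulting indices, checks that rollapply() with FUN=mean gives the same values, and computes a rolling standard deviation as used for the confidence band; it requires the zoo package.

```r
library(zoo)

z <- zoo(c(2, 4, 6, 8, 10), order.by = 1:5)   # toy series

m.center <- rollmean(z, k = 3)                  # default: centred windows
m.right  <- rollmean(z, k = 3, align = "right")
m.apply  <- rollapply(z, width = 3, FUN = mean) # general rolling function
s        <- rollapply(z, width = 3, FUN = sd)   # rolling standard deviation

index(m.center)   # windows centred on time points 2, 3, 4
index(m.right)    # windows ending at time points 3, 4, 5
```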
-
-
Time arithmetic with “zoo” objects
-
Script file:
time-arithmetic.R
The script makes use of the zoo package, which is available from
https://cran.r-project.org/package=zoo
-
Interactive notebook:
-
-
Merging (multivariate) time series
-
Script file:
merging-timeseries.R
The script makes use of the zoo package, which is available from
https://cran.r-project.org/package=zoo
Data file used in the script:
unemployment-z.RData
created by the earlier script creating-zoo-objects-unemployment.R
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:library(zoo)
load("unemployment-z.RData")
In [3]:Netherlands <- unemployment.z[,4]
length(Netherlands)
In [4]:Belgium <- unemployment.z[,5]
length(Belgium)
In [5]:Luxembourg <- na.omit(unemployment.z[,6])
length(Luxembourg)
In [6]:unemployment.benelux <- merge(Netherlands, Belgium, Luxembourg)
head(unemployment.benelux,n=10)
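When the merged series cover different time points, merge() by default takes the union of the indices and fills the gaps with NA; with all=FALSE only the common time points are kept. A minimal sketch with two toy series (values made up for illustration); it requires the zoo package.

```r
library(zoo)

a <- zoo(c(1, 2, 3), as.Date(c("2000-12-31", "2001-12-31", "2002-12-31")))
b <- zoo(c(10, 20), as.Date(c("2001-12-31", "2002-12-31")))

m.all    <- merge(a, b)               # union of the indices, NA for missing b
m.common <- merge(a, b, all = FALSE)  # intersection of the indices
m.all
m.common
```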
-
-
Importing data into “zoo” objects
-
Script file:
importing-zoo-objects.R
The script makes use of the zoo package, which is available from
https://cran.r-project.org/package=zoo
Data file used in the script:
unemployment-z.RData
created by the earlier script creating-zoo-objects-unemployment.R
-
Interactive notebook:
In [1]:options(jupyter.rich_display=FALSE) # Create output as usual in R
In [2]:library(zoo)
In [3]:unemployment_z <- read.csv.zoo("unemployment.csv")
str(unemployment_z)
In [4]:Text <- "2012/1/6 20
2012/1/7 30
2012/1/8 40
"
read.zoo(text=Text)
In [5]:read.zoo(text=Text,format="%Y/%m/%d")
In [6]:Text <- "date,time,x,y
2011-05-08,22:45:21,4,41
2011-05-08,22:45:22,5,42
2011-05-08,22:45:23,5,42
2011-05-08,22:45:24,6,43
"
zobj <- read.csv.zoo(text=Text, index.column=1:2)
zobj
-
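To see what read.zoo() produces, one can compare its result with a "zoo" object built by hand from the same values. A minimal sketch, assuming the three-line text input used in the notebook; note that read.zoo() reads the values as integers, so they are compared as numbers.

```r
library(zoo)

Text <- "2012/1/6 20
2012/1/7 30
2012/1/8 40
"
z <- read.zoo(text = Text, format = "%Y/%m/%d")

# The same series constructed directly
z2 <- zoo(c(20, 30, 40), as.Date(c("2012-01-06", "2012-01-07", "2012-01-08")))

all.equal(as.numeric(coredata(z)), as.numeric(coredata(z2)))
identical(index(z), index(z2))
```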