zoo FAQ
Abstract
This is a collection of frequently asked questions (FAQ) about the zoo package together
with their answers.
Keywords: irregular time series, ordered observations, time index, daily data, weekly data,
returns.
1. I know that duplicate times are not allowed but my data has them.
What do I do?
zoo objects should not normally contain duplicate times. If you try to create such an object
using zoo or read.zoo then warnings will be issued but the objects will be created. The user
then has the opportunity to fix them up – typically by using aggregate.zoo or duplicated.
Merging is not well defined for series with duplicate times, and rather than give an
undesired or unexpected result, merge.zoo issues an error message if it encounters such illegal
objects. Since merge.zoo is the workhorse behind many zoo functions, a significant portion
of zoo will not accept duplicates among the times.
Typically duplicates are eliminated by (1) averaging over them, (2) taking the last among
each run of duplicates or (3) interpolating the duplicates and deleting ones on the end that
cannot be interpolated. These three approaches are shown here using the aggregate.zoo
function. Another way to do this is to use the aggregate argument of read.zoo which will
aggregate the zoo object read in by read.zoo all in one step.
Note that identity in the example code below is the identity function (i.e. it just returns
its argument). It is an R core function:
A "zoo" series with duplicated indexes
1 2 2 2 3 4 5 5
1 2 3 4 5 6 7 8
1 2 3 4 5
1.0 3.0 5.0 6.0 7.5
1 2 3 4 5
1 4 5 6 8
If there is a run of equal times at the end, they wind up as NAs; since we cannot have NA times, those observations are dropped.
> z[!is.na(time(z))]
1 2 2.3333 2.6667 3 4 5
1 2 3 4 5 6 7
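The outputs above are consistent with the following minimal sketch of the three approaches. The series z is reconstructed from the printed output rather than taken from the original listing, and the names z.mean, z.last and zi are illustrative only.

```r
library(zoo)

# Series with duplicated times; zoo() warns about them, so the warning
# is suppressed here on purpose.
z <- suppressWarnings(zoo(1:8, c(1, 2, 2, 2, 3, 4, 5, 5)))

# (1) Average over each run of duplicates.
z.mean <- aggregate(z, identity, mean)

# (2) Keep the last value in each run of duplicates.
z.last <- aggregate(z, identity, tail, 1)

# (3) Replace duplicated times by NA, interpolate them, and drop any
#     trailing times that could not be interpolated.
zi <- z
time(zi) <- na.approx(ifelse(duplicated(time(zi)), NA, time(zi)), na.rm = FALSE)
zi <- zi[!is.na(time(zi))]
```

Approaches (1) and (2) leave one observation per distinct time, whereas approach (3) keeps every observation that can be given a unique interpolated time.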
The read.zoo command has an aggregate argument that supports arbitrary summarization.
For example, in the following we take the last value among any duplicate times and sum the
volumes among all duplicate times. We do this by reading the data twice, once for each
aggregate function. In this example, the first three columns are junk that we wish to suppress
which is why we specified colClasses; however, in most cases that argument would not be
necessary.
value volume
18:15:05 1100 217
18:15:06 80 201
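A minimal sketch of the read-twice idea described above. The input records, the column names and the helpers tail1 and to_time are all hypothetical; the original data are not reproduced here, and the clock times are attached to an arbitrary date only so that they can be parsed with base R.

```r
library(zoo)

# Hypothetical records: id | symbol | type | time | value | volume
Lines <- "1|AAA|EQ|18:15:05|600|1
2|AAA|EQ|18:15:05|610|99
3|BBB|EQ|18:15:06|43.2|10
4|BBB|EQ|18:15:06|46|90"

# "NULL" drops the first three columns while reading.
cls <- c("NULL", "NULL", "NULL", "character", "numeric", "numeric")
nms <- c("id", "symbol", "type", "time", "value", "volume")

# Illustrative helpers (not part of zoo).
tail1 <- function(x) tail(x, 1)
to_time <- function(x) as.POSIXct(paste("2000-01-01", x), tz = "UTC")

# Read once keeping the last record per time, once summing per time.
z  <- read.zoo(text = Lines, sep = "|", colClasses = cls, col.names = nms,
               FUN = to_time, aggregate = tail1)
z2 <- read.zoo(text = Lines, sep = "|", colClasses = cls, col.names = nms,
               FUN = to_time, aggregate = sum)

# Combine: last value per time, total volume per time.
z$volume <- z2$volume
```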
If the reason for the duplicate times is that the data is stored in long format, then use read.zoo
(particularly the split argument) to convert it to wide format. Wide format is typically a
time series whereas long format is not, so wide format is the suitable one for zoo.
IBM ORCL
2000-01-01 10 12
2000-01-02 11 13
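The wide table above can be obtained from long-format input along the following lines. This is a sketch in which the input text is reconstructed from the printed result, and the column names Date, Stock and Price are assumptions.

```r
library(zoo)

# Long format: one row per (date, stock) pair, so dates repeat.
Lines <- "Date Stock Price
2000-01-01 IBM 10
2000-01-02 IBM 11
2000-01-01 ORCL 12
2000-01-02 ORCL 13"

# split = "Stock" spreads the Price values into one column per stock.
stocks <- read.zoo(text = Lines, header = TRUE, split = "Stock")
```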
> set.seed(1)
> z.Date <- as.Date(paste(2003, 02, c(1, 3, 7, 9, 14), sep = "-"))
> z <- zoo(cbind(left = rnorm(5), right = rnorm(5, sd = 0.2)), z.Date)
> plot(z[,1], xlab = "Time", ylab = "")
> opar <- par(usr = c(par("usr")[1:2], range(z[,2])))
> lines(z[,2], lty = 2)
> axis(side = 4)
> legend("bottomright", lty = 1:2, legend = colnames(z), bty="n")
> par(opar)
[Figure: the "left" series (solid line, left axis) and the "right" series (dashed line, right axis) plotted against Time.]
4. I have a data frame with both numeric and factor columns. How do I
convert that to a "zoo" object?
A "zoo" object may be (1) a numeric vector, (2) a numeric matrix or (3) a factor but may
not contain both a numeric vector and a factor. The underlying reason for this constraint is
that "zoo" was intended to generalize R’s "ts" class, which is also based on matrices, to
irregularly spaced series with an arbitrary index class. The main reason to stick to matrices
is that operations on matrices in R are much faster than on data frames.
If you have a data frame with both numeric and factor variables that you want to convert to
"zoo", you can do one of the following.
Use two "zoo" variables instead:
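A minimal sketch with a hypothetical data frame DF. It shows the two-variable approach and, as a further option not stated above, recoding the factor as integer codes so that everything fits in one numeric matrix.

```r
library(zoo)

# Hypothetical data frame: a time column, a numeric column and a factor.
DF <- data.frame(time = 1:4,
                 x = c(10, 20, 30, 40),
                 f = factor(c("a", "a", "b", "b")))

# Option 1: one "zoo" series per column type.
zx <- zoo(DF$x, DF$time)
zf <- zoo(DF$f, DF$time)

# Option 2: data.matrix() replaces the factor by its integer codes,
# giving a single numeric matrix that zoo can hold.
z <- zoo(data.matrix(DF[c("x", "f")]), DF$time)
```

With option 2 the factor labels are lost from the series itself, so the levels need to be kept separately if they matter.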
5. Why does lag give slightly different results on a "zoo" and a "zooreg"
series which are otherwise the same?
To be definite, consider the following examples. Both lag and diff give a
different answer for the same input depending on whether its class is "zoo" or
"zooreg":
> lag(zr)
> diff(log(z))
> diff(log(zr))
2008-01-03 2008-01-04
0.08004271 0.07410797
lag.zoo and lag.zooreg work differently. For "zoo" objects the lagged version is obtained
by moving values to the adjacent time point that exists in the series but for "zooreg" objects
the time is lagged by deltat, the time between adjacent regular times.
A key implication is that "zooreg" can lag a point to a time point that did not previously
exist in the series and, in particular, can lag a series outside of the original time range whereas
that is not possible in a "zoo" series.
Note that lag.zoo has an na.pad= argument which in some cases may be what is being
sought here.
The difference between diff.zoo and diff.zooreg stems from the fact that diff(x) is
defined in terms of lag like this: x-lag(x,-1).
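The following sketch illustrates the difference. The series z is an assumption, chosen so that diff(log(zr)) matches the two values printed above; the names lz, lzr, d.z and d.zr are illustrative.

```r
library(zoo)

# Irregularly spaced daily series and its "zooreg" counterpart.
z  <- zoo(11:15, as.Date("2008-01-01") + c(-4, 1, 2, 3, 6))
zr <- as.zooreg(z)

# "zoo": values move to the adjacent existing time point, so one
# observation is lost.
lz <- lag(z)

# "zooreg": every time is shifted by deltat, so all observations are kept,
# including at times that were not in the original series.
lzr <- lag(zr)

# diff(x) is x - lag(x, -1), so it inherits the same distinction.
d.z  <- diff(log(z))   # one value for every time point but the first
d.zr <- diff(log(zr))  # values only where the previous day is also present
```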
> set.seed(123)
> z <- zoo(rnorm(100), as.Date("2007-01-01") + seq(0, by = 10, length = 100))
> z.demean1 <- z - ave(z, as.yearmon(time(z)))
This first generates some artificial data and then employs ave to compute monthly means.
To subtract the mean of all Januaries from each January, etc. try this:
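One way to do this, sketched under the assumption that grouping by the month number alone is what is wanted, is to pass the formatted month to ave. The name z.demean2 is illustrative.

```r
library(zoo)

# Same artificial data as above.
set.seed(123)
z <- zoo(rnorm(100), as.Date("2007-01-01") + seq(0, by = 10, length.out = 100))

# Mean per calendar month of each year (e.g. Jan 2007 separately from Jan 2008).
z.demean1 <- z - ave(z, as.yearmon(time(z)))

# Mean per month of the year (all Januaries pooled, all Februaries pooled, ...).
z.demean2 <- z - ave(z, format(time(z), "%m"))
```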
7. How do I create a monthly series but still keep track of the dates?
Create an S3 subclass of "yearmon" called "yearmon2" that stores the dates as names on
the time vector. It will be sufficient to create an as.yearmon2 generic together with an
as.yearmon2.Date method as well as the inverse: as.Date.yearmon2.
This new class acts the same as "yearmon" but also stores the dates and allows their recovery using
as.Date and aggregate.zoo.
Jan 2000 Feb 2000 Mar 2000 Apr 2000 May 2000
1 2 3 4 5
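A minimal sketch of such a subclass. The method bodies are a simplified assumption rather than the original listing: the dates are stored as character names, and as.Date.yearmon2 falls back to the ordinary "yearmon" conversion when no names are present.

```r
library(zoo)

as.yearmon2 <- function(x, ...) UseMethod("as.yearmon2")

# Convert dates to year-months, keeping the original dates as names.
as.yearmon2.Date <- function(x, ...) {
  y <- as.yearmon(x)
  names(y) <- as.character(x)
  structure(y, class = c("yearmon2", class(y)))
}

# Inverse: recover the stored dates if present, else behave like "yearmon".
as.Date.yearmon2 <- function(x, ...) {
  if (!is.null(names(x))) return(as.Date(names(x)))
  as.Date(structure(unclass(x), class = "yearmon"), ...)
}

# Five dates, 32 days apart, falling in January to May 2000.
dd <- seq(as.Date("2000-01-01"), length.out = 5, by = 32)
z <- zoo(1:5, as.yearmon2(dd))

# The series is indexed by month, but the exact dates can be recovered.
recovered <- as.Date(time(z))
```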
A multivariate series can be plotted either as (1) multiple single panel plots
or (2) as a multipanel plot. In the latter case any custom axis must be placed in a panel function.
9. Why is nothing plotted except axes when I plot an object with many
NAs?
Isolated points surrounded by NA values do not form lines:
> plot(na.omit(z))
> plot(na.approx(z))
Note that this is not specific to zoo. If we plot in R without zoo we get the same behavior.
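The effect of the two remedies can be seen without plotting. A small sketch, assuming a hypothetical series in which every second value is NA:

```r
library(zoo)

# Every observed point is isolated between NAs, so plot(z) draws no lines.
z <- zoo(c(1, NA, 2, NA, 3))

# Dropping the NAs leaves three points that are adjacent in the series.
z.omit <- na.omit(z)

# Interpolating fills the gaps linearly.
z.approx <- na.approx(z)

# Either result can then be passed to plot(), or the points can be
# drawn directly with plot(z, type = "p").
```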
> library("timeDate")
> dts <- c("1989-09-28", "2001-01-15", "2004-08-30", "1990-02-09")
> tms <- c( "23:12:55", "10:34:02", "08:30:00", "11:18:23")
> td <- timeDate(paste(dts, tms), format = "%Y-%m-%d %H:%M:%S")
> library("zoo")
> z <- zoo(1:4, td)
> zz <- merge(z, lag(z))
> plot(zz)
> library("timeSeries")
> zz
z lag(z)
1989-09-28 23:12:55 1 4
1990-02-09 11:18:23 4 2
2001-01-15 10:34:02 2 3
2004-08-30 08:30:00 3 NA
> as.timeSeries(zz)
GMT
z lag(z)
1989-09-28 23:12:55 1 4
1990-02-09 11:18:23 4 2
2001-01-15 10:34:02 2 3
2004-08-30 08:30:00 3 NA
> as.zoo(as.timeSeries(zz))
z lag(z)
1989-09-28 23:12:55 1 4
1990-02-09 11:18:23 4 2
2001-01-15 10:34:02 2 3
2004-08-30 08:30:00 3 NA
Depends
AER Applied Econometrics with R
BootPR Bootstrap Prediction Intervals and Bias-Corrected Forecasting
DMwR Functions and data for "Data Mining with R"
FinTS Companion to Tsay (2005) Analysis of Financial Time Series
MFDF Modeling Functional Data in Finance
Modalclust Hierarchical Modal Clustering
PerformanceAnalytics Econometric tools for performance and risk analysis
RBloomberg R/Bloomberg
RghcnV3 Global Historical Climate Network Version 3
StreamMetabolism Stream Metabolism-A package for calculating single station metabolism from diurnal Oxygen curves
TSfame Time Series Database Interface extensions for fame
TShistQuote Time Series Database Interface extensions for get.hist.quote
TSxls Time Series Database Interface extension to connect to spreadsheets
VhayuR Vhayu R Interface
delftfews delftfews R extensions
dyn Time Series Regression
dynlm Dynamic Linear Regression
fda Functional Data Analysis
forecast Forecasting functions for time series
fractalrock Generate fractal time series with non-normal returns distribution
fxregime Exchange Rate Regime Analysis
glogis Fitting and Testing Generalized Logistic Distributions
hydroTSM Time series management, analysis and interpolation for hydrological modelling
lmtest Testing Linear Regression Models
meboot Maximum Entropy Bootstrap for Time Series
mlogit multinomial logit model
party A Laboratory for Recursive Partytioning
quantmod Quantitative Financial Modelling Framework
rdatamarket Data access API for DataMarket.com
sandwich Robust Covariance Matrix Estimators
sde Simulation and Inference for Stochastic Differential Equations
solaR Solar Photovoltaic Systems
spacetime classes and methods for spatio-temporal data
strucchange Testing, Monitoring, and Dating Structural Changes
tawny Provides various portfolio optimization strategies including random matrix theory and shrinkage estimators
termstrc Zero-coupon Yield Curve Estimation
tgram Functions to compute and plot tracheidograms
tripEstimation Metropolis sampler and supporting functions for estimating animal movement from archival tags and satellite fixes
tseries Time series analysis and computational finance
wq Exploring water quality monitoring data
xts eXtensible Time Series
Enhances
chron Chronological objects which can handle dates and times
hydroTSM Time series management, analysis and interpolation for hydrological modelling
lubridate Make dealing with dates a little easier
tis Time Indexes and Time Indexed Series
Imports
fxregime Exchange Rate Regime Analysis
glogis Fitting and Testing Generalized Logistic Distributions
hydroGOF Goodness-of-fit functions for comparison of simulated and observed hydrological time series
openair Tools for the analysis of air pollution data
rasterVis Visualization methods for the raster package
Suggests
MeDiChI MeDiChI ChIP-chip deconvolution library
RQuantLib R interface to the QuantLib library
TSAgg Time series Aggregation
TSMySQL Time Series Database Interface extensions for MySQL
TSPostgreSQL Time Series Database Interface extensions for PostgreSQL
TSSQLite Time Series Database Interface extensions for SQLite
TSdbi Time Series Database Interface
TSodbc Time Series Database Interface extensions for ODBC
TSzip Time Series Database Interface extension to connect to zip files
UsingR Data sets for the text "Using R for Introductory Statistics"
Zelig Everyone’s Statistical Software
gsubfn Utilities for strings and function arguments
latticeExtra Extra Graphical Utilities Based on Lattice
mondate Keep track of dates in terms of months
playwith A GUI for interactive plots using GTK+
pscl Political Science Computational Laboratory, Stanford University
quantreg Quantile Regression
tframePlus Time Frame coding kernel extensions
objects with the same time index. zoo provides an ifelse.zoo function that should be used
instead. The .zoo part must be written out since ifelse is not generic.
2 3 4
1 -5 -10
> # ok
> ifelse.zoo(diff(z) > 4, -z, z)
1 2 3 4
NA 5 -10 -15
1 2 3 4
NA 5 -10 -15
2 3 4
1 -5 -10
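The printed results above are consistent with the following sketch. The series z is reconstructed from the output, and the names r1 and r2 are illustrative. The second variant pads diff with an NA so that all three arguments of the ordinary ifelse share the same time index.

```r
library(zoo)

z <- zoo(c(1, 5, 10, 15))

# diff(z) has one time point fewer than z, so the ordinary ifelse would
# pair values with the wrong times. ifelse.zoo aligns by time first.
r1 <- ifelse.zoo(diff(z) > 4, -z, z)

# Alternatively, pad the differences so the indexes already agree and
# the ordinary ifelse can be used.
r2 <- ifelse(diff(z, na.pad = TRUE) > 4, -z, z)
```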
13. In a series which is regular except for a few missing times or for
which we wish to align to a grid how is it filled or aligned?
Jan 2000 Feb 2000 Mar 2000 Apr 2000 May 2000 Jun 2000
1 2 3 3 4 5
A variation of this is where the grid is of a different date/time class than the original series.
In that case use the x argument. In the example that follows the series z is of "Date" class
whereas the grid is of "yearmon" class:
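A sketch of both cases using na.locf with the xout argument. The series and grids are assumptions: the first is chosen to reproduce the monthly output shown above, the second is a hypothetical "Date" series aligned to a "yearmon" grid. The names g, g2, filled and aligned are illustrative.

```r
library(zoo)

# Monthly series in which April 2000 is missing.
zym <- zoo(1:5, as.yearmon("2000-01") + c(0, 1, 2, 4, 5) / 12)

# Complete monthly grid from January to June 2000.
g <- as.yearmon(2000 + 0:5 / 12)

# Carry the last observation forward onto the grid.
filled <- na.locf(zym, xout = g)

# Series indexed by "Date", grid of class "yearmon": x supplies the
# series times converted to the class of the grid.
z  <- zoo(1:3, as.Date(c("2000-01-15", "2000-03-03", "2000-04-29")))
g2 <- as.yearmon(2000 + 0:3 / 12)
aligned <- na.locf(z, x = as.yearmon(time(z)), xout = g2)
```

Replacing na.locf by na.approx would interpolate linearly onto the grid instead of carrying values forward.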
What is the difference between as.Date in zoo and as.Date in the core of
R?
zoo has extended the origin argument of as.Date.numeric so that it has a default of
origin="1970-01-01" (whereas in the core of R it has no default and must always be
specified). Note that this is a strictly upwardly compatible extension to R and any usage of
as.Date in R will also work in zoo.
This makes it more convenient to use as.Date as a function input. For example, one can
shorten this:
2000-01-01 2000-01-02
1 2
to just this:
2000-01-01 2000-01-02
1 2
2000-01-01 2000-01-02
12 13
to this:
2000-01-01 2000-01-02
12 13
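The convenience can be sketched with a hypothetical series whose index is stored as a number of days since 1970-01-01. With zoo loaded, as.Date can be passed directly as a function instead of being wrapped to supply the origin. The names long and short are illustrative.

```r
library(zoo)

# Hypothetical series indexed by day counts (10957 is 2000-01-01).
z <- zoo(1:2, c(10957, 10958))

# Long form: wrap as.Date in order to supply the origin explicitly.
long <- aggregate(z, function(x) as.Date(x, origin = "1970-01-01"))

# Short form: rely on the default origin.
short <- aggregate(z, as.Date)
```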
Note to package developers of packages that use zoo: Other packages that work with zoo and
define as.Date methods should either import zoo or else should fully export their as.Date
methods in their NAMESPACE file, e.g. export(as.Date.X), in order that those methods be
registered with zoo’s as.Date generic and not just the as.Date generic in base.
> system.time({
+ zc <- coredata(z)
+ tt <- time(z)
+ zr <- sapply(seq_along(zc),
+ function(i) sum(zc[tt <= tt[i] & tt > tt[i] - 3]))
+ z2 <- zoo(zr, tt)
+ })
[1] TRUE
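The windowed sum used in the timing code above can be checked by hand on a small series. A minimal sketch, assuming a hypothetical five-point irregular series: for each time t it sums the values observed in the interval (t - 3, t].

```r
library(zoo)

# Hypothetical irregular series.
z <- zoo(c(1, 2, 4, 8, 16), c(1, 2, 4, 7, 8))

zc <- coredata(z)
tt <- time(z)

# For each observation, sum the values whose times lie in (tt[i] - 3, tt[i]].
zr <- sapply(seq_along(zc),
             function(i) sum(zc[tt <= tt[i] & tt > tt[i] - 3]))
z2 <- zoo(zr, tt)
```

For example, at time 8 the window (5, 8] contains the observations at times 7 and 8, giving 8 + 16 = 24.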
Affiliation:
zoo Development Team
R-Forge: http://R-Forge.R-project.org/projects/zoo/
Comprehensive R Archive Network: http://CRAN.R-project.org/package=zoo