Applied Nonparametric Econometrics
Material will be taken mainly from Li, Q. and J.S. Racine (2007), Nonparametric
Econometrics: Theory and Practice, Princeton University Press.
The np package for R will be used for the nonparametric and
semiparametric analysis. This can be installed directly from R via the Comprehensive R
Archive Network (cran.r-project.org).
We shall cover a range of topics in this workshop. In the early part of the workshop we shall
emphasize the statistical foundation for the methods studied. Having been exposed to the
underpinnings of nonparametric kernel-based methods, we will then cover a range of topics
of interest to applied researchers.
Jeffrey S. Racine
Department of Economics
McMaster University
Hamilton, Ontario, Canada
www.economics.mcmaster.ca/faculty/racinej
Topics Covered and Workshop Structure
Topics covered will include the following:
1. Nonparametric density and probability function estimation (Li & Racine (2007), chapters
1, 3, and 4)
2. Nonparametric regression (Li & Racine (2007), chapter 2)
3. Nonparametric testing of hypotheses (Li & Racine (2007), chapters 12 and 13)
Here is a zip file containing the R code for examples used in the lecture slides
(slides_code.zip).
Day 1: During the first morning’s lecture we will introduce nonparametric methods, present a
number of illustrative examples, compare and contrast nonparametric and parametric
models, and then study the underpinnings of nonparametric density estimation. The
afternoon lab session will introduce students to the R environment and the np package and
have them conduct some rudimentary analysis.
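To preview the sort of analysis we will conduct in the lab, here is a minimal sketch of kernel density estimation using base R's density() function; the simulated data and bandwidth choice are purely illustrative, and the commented-out lines show the np package analogue with data-driven bandwidth selection:

```r
# Kernel density estimation on simulated data (illustrative sketch)
set.seed(42)
x <- rnorm(500)                      # 500 draws from a standard normal

# Base-R kernel estimate: Gaussian kernel, rule-of-thumb bandwidth
fhat <- density(x, kernel = "gaussian", bw = "nrd0")
cat("bandwidth used:", fhat$bw, "\n")

# The np package analogue (requires install.packages("np")):
# library(np)
# bw   <- npudensbw(~ x)   # cross-validated bandwidth selection
# fhat <- npudens(bw)
# plot(fhat)
```

Note that density() uses a simple rule-of-thumb bandwidth by default, whereas the np package selects bandwidths by data-driven (cross-validation) methods, a distinction we will return to in the lectures.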
Day 2: During this morning’s lecture we will build on the density estimation framework and
then move into a regression framework. Motivating examples will be presented, then we will
study in detail the local constant and local linear estimators. The afternoon lab session will
involve fitting parametric and nonparametric regression models in the R environment.
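As a preview, the local constant (Nadaraya-Watson) estimator can be coded by hand in a few lines. The simulated data, bandwidth, and helper-function name below are illustrative only; in practice the np package's npregbw()/npreg() functions handle bandwidth selection for you:

```r
# Local constant (Nadaraya-Watson) regression with a Gaussian kernel
set.seed(1)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)

# ghat(x0) = sum_i K((x_i - x0)/h) * y_i / sum_i K((x_i - x0)/h)
nw <- function(x0, x, y, h) {
  w <- dnorm((x - x0) / h)   # kernel weights
  sum(w * y) / sum(w)        # locally weighted average
}

grid <- seq(0, 10, length.out = 50)
ghat <- sapply(grid, nw, x = x, y = y, h = 0.5)

# np package analogue (requires install.packages("np")):
# library(np)
# model <- npreg(npregbw(y ~ x, regtype = "lc"))  # "ll" for local linear
```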
Day 3: During this morning’s lecture we will build on the regression framework and will look
at constructing partial regression and gradient surfaces, assessing variability and conducting
forecasts. We then will consider consistent hypothesis testing in a nonparametric framework.
The afternoon lab session will continue to develop students' competence with R and will
consider methods for assessing relative performance of parametric and nonparametric
regression models.
Handouts
This page contains a syllabus along with a few supplementary handouts designed to assist
you with the material presented in this workshop. The handout on orders of magnitude is to
assist with notation found throughout the lectures. The handout on resampling methods is a
brief introduction intended to convey an overview of what resampling methods are, how they
can be used, and their place in nonparametric settings (virtually all software routines you will
encounter in this workshop use resampling methods for one thing or another).
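To make the resampling idea concrete, here is a minimal bootstrap sketch in base R; the statistic (the sample median), sample size, and number of replications are all illustrative choices:

```r
# Bootstrap estimate of the standard error of a sample median
set.seed(123)
x <- rexp(100)            # an illustrative skewed sample
B <- 999                  # number of bootstrap replications

# Resample with replacement and recompute the statistic each time
boot.med <- replicate(B, median(sample(x, replace = TRUE)))
se.hat <- sd(boot.med)    # bootstrap estimate of the standard error
```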
Lab Material
You are strongly encouraged to become acquainted with the application of methods outlined
during the lectures. To assist with this task I have structured some semi-formal labs to guide
you through some of the more popular estimators, and I encourage you to experiment and
discover features at your leisure. My hope is that these labs will get you working with your
own data in short order.
Each lab module presents you with a series of practical exercises to help you become
familiar with R and the np package. You can learn how to read datasets from a variety of
programs using the `foreign’ package in R. You will also have the opportunity for some
simple R programming where you will learn the mechanics of nonparametric estimation.
Actual and simulated datasets will be used for demonstration. As the labs progress over the
course of the workshop you are encouraged to explore your own unique datasets and
discuss these with me if you desire.
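For instance, the `foreign’ package (shipped with R) reads and writes Stata- and SPSS-format files. This hedged sketch round-trips a small illustrative data frame through a temporary .dta file; for files produced elsewhere you would simply call read.dta() or read.spss() on them directly:

```r
library(foreign)

# A small illustrative data frame
df <- data.frame(id = 1:5, wage = c(10.5, 12.0, 9.8, 15.2, 11.1))

# Round-trip through a Stata-format file
f <- tempfile(fileext = ".dta")
write.dta(df, f)          # write Stata format
back <- read.dta(f)       # read it back as a data frame
```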
Below you will find the exercises for each lab module:
1. TBA
These labs are a great opportunity to migrate a project you are working on or interested in
into the R framework and thereby the nonparametric methods developed during the lectures.
Using R and RStudio
In this workshop we shall be using `R’ for our data analysis (the underlying statistical engine)
and the R front-end `RStudio’. Since RStudio automatically calls R when both are installed, I
shall be using RStudio extensively as you will see. You can run R in `stand alone’ mode, but
RStudio is an integrated development environment that provides a much more intuitive front
end for the user (plus it is platform independent, so whether you use Linux, Mac OS X, MS
Windows etc. we will all have the identical menus/options available). Below we discuss each
in turn.
First, what is R? This is perhaps best answered by quoting directly from the R website (www.r-
project.org); see “What is R?” there for even more details, and also see two New York Times
articles for further background information (January 6 and January 8, 2009).
From the R website, we see that “R is a language and environment for statistical computing
and graphics. It is a GNU project which is similar to the S language and environment which
was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John
Chambers and colleagues. R can be considered as a different implementation of S. There
are some important differences, but much code written for S runs unaltered under R.
R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical
tests, time-series analysis, classification, clustering, and so forth) and graphical techniques,
and is highly extensible. The S language is often the vehicle of choice for research in
statistical methodology, and R provides an Open Source route to participation in that activity.
One of R's strengths is the ease with which well-designed publication-quality plots can be
produced, including mathematical symbols and formulae where needed. Great care has
been taken over the defaults for the minor design choices in graphics, but the user retains
full control.
R is available as Free Software under the terms of the Free Software Foundation's GNU
General Public License in source code form. It compiles and runs on a wide variety of UNIX
platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.”
For an introduction to R you have a range of options. One popular source that some may find
useful is “An Introduction to R” (PDF): R-intro.pdf. Or, having installed R, you
can browse the help facilities available within R itself. Or you can see the page
`Getting Help with R’ on the RStudio website (www.rstudio.org/docs/help_with_r).
Here is a link to a set (90+) of `two minute tutorial’ videos describing `how to do stuff in R in
two minutes or less’ (www.twotorials.com).
For a variety of documents that will assist with using RStudio, kindly see the FAQ
(www.rstudio.org/docs/faq) and the documents section of the RStudio website located at
www.rstudio.org/docs.
1. Racine, J.S. (2012), "RStudio: A Platform-Independent IDE for R and Sweave," Journal of
Applied Econometrics, Volume 27, Issue 1, 167-172.
2. Racine, J.S. and R. Hyndman (2002), "Using R to Teach Econometrics," Journal of
Applied Econometrics, March/April, Volume 17, Issue 2, 175-189.
Sweave/knitr
The `Sweave’ package for the R statistical computing environment enables the user to
construct a single file which includes both the code to be run in R and the TeX code
comprising the text of the document. Files containing both types of code are referred to as
.Rnw files. The various sections (`chunks') of R and TeX code are included in the file in the
order in which they are to be employed in the final document.
Sweave then weaves together the code chunks to produce a .tex file that may be compiled
using TeX. By using Sweave, an individual can create a dynamic document that includes
both the statistical analysis and the means by which the output underlying the analysis is
obtained. This process sidesteps a major source of research errors, namely the
misreporting of computer output.
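To make this concrete, a minimal .Rnw file might look like the following sketch (the chunk syntax is shared by Sweave and knitr; the chunk label and simulated data are illustrative):

```latex
\documentclass{article}
\begin{document}

% An R code chunk: the code runs and its output appears in the document
<<simulate, echo=TRUE>>=
set.seed(42)
x <- rnorm(100)
mean(x)
@

% Inline R results are embedded with \Sexpr{}
The sample mean of our simulated data is \Sexpr{round(mean(x), 3)}.

\end{document}
```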
`knitr’ is an R package that is similar to Sweave but adds much missing functionality, and
many find it preferable to Sweave. Both are supported in RStudio provided you have TeX
installed on your system and have installed the R package `knitr’. RStudio provides a
streamlined `one click' compilation of your project.
Here is a simple illustration using R and LaTeX (example.Rnw, example.bib). To run this:
1. Save both files to a directory.
2. Open RStudio and change to the directory where you saved the files (navigate the
RStudio menu Session -> Set Working Directory -> Choose Directory).
3. Open the example.Rnw file in RStudio (navigate the File -> Open File menu).
4. Change the default Sweave driver to knitr (navigate the RStudio menu Preferences ->
Sweave -> Weave rnw files using knitr) and select the option `pdflatex’ in the `Typeset
LaTeX into pdf using’ entry field (note that you only need to do this once).
5. Click the `Compile PDF’ icon and wait a minute for the R code to run; if all goes well you
will be presented with the PDF of the results.
This creates the document displayed in the figure on this page (click on the figure for the
PDF document itself). After the initial setup for knitr/pdflatex you simply click the `Compile
PDF’ icon to process a knitr document.
Meredith, E. and J.S. Racine (2009), "Towards Reproducible Econometric Research: The
Sweave Framework," Journal of Applied Econometrics, Volume 24, 366-374.