Spatial Econometrics with R
Last Updated: 08 Aug, 2024
Spatial econometrics is the study and modeling of spatial relationships and dependencies in econometric data. It is essential for understanding spatial phenomena in economics, geography, and other disciplines. This guide covers the theoretical foundation of spatial econometrics and practical examples using the R programming language.
Introduction to Spatial Econometrics
Spatial econometrics is a subfield of econometrics that deals with spatial interdependence and spatial heterogeneity in regression models. It is used to analyze spatial data in which observations are not independent but are influenced by their location. Key concepts include:
- Spatial Autocorrelation: Measures the degree to which similar values occur near each other in space.
- Spatial Lag Models (SLM): Incorporate spatially lagged dependent variables to account for spatial autocorrelation.
- Spatial Error Models (SEM): Incorporate spatially autocorrelated error terms to account for spatial dependency in residuals.
- Spatial Durbin Models (SDM): Extend spatial lag models by including spatial lags of both dependent and independent variables.
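The three model types above are usually written as follows. This uses standard textbook notation rather than symbols defined in this guide: y is the vector of responses, X the matrix of regressors, W the spatial weight matrix (introduced in the next section), and ε an independent error term.

```latex
\begin{aligned}
\text{SLM:}\quad & y = \rho W y + X\beta + \varepsilon \\
\text{SEM:}\quad & y = X\beta + u, \qquad u = \lambda W u + \varepsilon \\
\text{SDM:}\quad & y = \rho W y + X\beta + W X\theta + \varepsilon
\end{aligned}
```

Here ρ and λ are scalar spatial parameters and θ is a vector of coefficients on the spatially lagged regressors. Setting ρ, λ, and θ to zero reduces each model to an ordinary linear regression.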
Spatial Weight Matrices
A critical component in spatial econometrics is the spatial weight matrix (W), which defines the spatial structure of the data. Its entry for a pair of observations quantifies how strongly the two are related in space, and is zero for pairs that are not neighbors. Two common types are:
- Contiguity-Based Matrices: Define neighbors based on shared boundaries.
- Distance-Based Matrices: Define neighbors based on distance thresholds.
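As a minimal sketch of the distance-based case, the base R code below builds a weight matrix by hand. The coordinates and the 1.5-unit threshold are hypothetical values chosen only for illustration. In practice, spdep functions such as `poly2nb()` and `dnearneigh()` build the neighbor structure for you.

```r
# Hypothetical centroids for five regions (illustrative coordinates only)
coords <- cbind(x = c(0, 1, 2, 0, 5), y = c(0, 0, 0, 1, 5))

# Pairwise Euclidean distances between regions
d <- as.matrix(dist(coords))

# Binary distance-based weights: two regions are neighbors if they lie
# within 1.5 units of each other (arbitrary threshold for this sketch)
W_bin <- (d > 0 & d <= 1.5) * 1

# Row-standardize so that each non-empty row sums to 1;
# a region with no neighbors keeps a row of zeros
rs <- rowSums(W_bin)
W_std <- W_bin / ifelse(rs > 0, rs, 1)

print(round(W_std, 3))
print(rowSums(W_std))
```

The fifth region lies farther than the threshold from every other region, so its row stays at zero. Isolated observations like this need explicit handling when a distance band is too narrow.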
Practical Examples of Spatial Econometrics Using R
First, install and load the necessary packages for spatial econometrics:
install.packages(c("spdep", "sf", "spatialreg"))
library(spdep)
library(sf)
library(spatialreg)
Example 1: Spatial Autocorrelation
Load a spatial dataset and prepare the spatial weight matrix:
R
# Load an example dataset
nc <- st_read(system.file("shape/nc.shp", package="sf"))
# Create a spatial weight matrix based on contiguity
nb <- poly2nb(nc)
W <- nb2listw(nb, style="W")
head(nc)
Output:
Simple feature collection with 6 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -81.74107 ymin: 36.07282 xmax: -75.77316 ymax: 36.58965
Geodetic CRS: NAD27
AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74 NWBIR74
1 0.114 1.442 1825 1825 Ashe 37009 37009 5 1091 1 10
2 0.061 1.231 1827 1827 Alleghany 37005 37005 3 487 0 10
3 0.143 1.630 1828 1828 Surry 37171 37171 86 3188 5 208
4 0.070 2.968 1831 1831 Currituck 37053 37053 27 508 1 123
5 0.153 2.206 1832 1832 Northampton 37131 37131 66 1421 9 1066
6 0.097 1.670 1833 1833 Hertford 37091 37091 46 1452 7 954
BIR79 SID79 NWBIR79 geometry
1 1364 0 19 MULTIPOLYGON (((-81.47276 3...
2 542 3 12 MULTIPOLYGON (((-81.23989 3...
3 3616 6 260 MULTIPOLYGON (((-80.45634 3...
4 830 2 145 MULTIPOLYGON (((-76.00897 3...
5 1606 3 1197 MULTIPOLYGON (((-77.21767 3...
6 1838 5 1237 MULTIPOLYGON (((-76.74506 3...
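Before computing any statistic, it can help to check what the neighbor list and weights contain. The sketch below assumes the same nc data and spdep functions used above. It counts neighbors per county and checks that style = "W" row-standardizes the weights. The rook variant (queen = FALSE) is not used elsewhere in this guide and is included only for comparison.

```r
library(sf)
library(spdep)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Queen contiguity (the poly2nb default): shared edge or shared vertex
nb_queen <- poly2nb(nc)
# Rook contiguity: shared edge only
nb_rook <- poly2nb(nc, queen = FALSE)

# Number of neighbors for each county under each definition
n_queen <- card(nb_queen)
n_rook <- card(nb_rook)
print(summary(n_queen))
print(summary(n_rook))

# With style = "W", the weights attached to each county sum to 1
W <- nb2listw(nb_queen, style = "W")
row_sums <- sapply(W$weights, sum)
print(range(row_sums))
```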
Calculate Moran's I for the dataset
Moran's I is a measure of spatial autocorrelation:
R
# Calculate Moran's I for a variable
moran.test(nc$BIR74, listw = W)
Output:
Moran I test under randomisation
data: nc$BIR74
weights: W
Moran I statistic standard deviate = 2.4055, p-value = 0.008074
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.139319332 -0.010101010 0.003858258
- The Moran I statistic is 0.1393, indicating a slight positive spatial autocorrelation.
- The Z-score is 2.4055, suggesting the observed Moran I is 2.4 standard deviations above the expected value under the null hypothesis.
- The p-value is 0.008074, which is less than the common significance level of 0.05, indicating that the spatial autocorrelation is statistically significant.
In conclusion, there is significant evidence to reject the null hypothesis of no spatial autocorrelation in the BIR74 variable, suggesting that areas with high values of BIR74 tend to be near other areas with high values, and areas with low values of BIR74 tend to be near other areas with low values.
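To make the statistic less of a black box, the sketch below recomputes Moran's I directly from its usual definition, I = (n / S0) * (z'Wz) / (z'z), and compares it with the value reported by moran.test. Here z holds deviations from the mean and S0 is the sum of all weights. The variable names are illustrative, and the dense matrix from listw2mat is used only because the dataset is small.

```r
library(sf)
library(spdep)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
W <- nb2listw(poly2nb(nc), style = "W")

x <- nc$BIR74
n <- length(x)
z <- x - mean(x)        # deviations from the mean
Wmat <- listw2mat(W)    # dense n x n weight matrix
S0 <- sum(Wmat)         # sum of all weights (equals n when every row sums to 1)

# Moran's I = (n / S0) * (z' W z) / (z' z)
I_manual <- (n / S0) * as.numeric(t(z) %*% Wmat %*% z) / sum(z^2)

# Expected value under the null hypothesis of no spatial autocorrelation
E_I <- -1 / (n - 1)

# Value reported by spdep for comparison
I_spdep <- unname(moran.test(x, listw = W)$estimate["Moran I statistic"])

print(c(manual = I_manual, spdep = I_spdep, expectation = E_I))
```

With n = 100 counties, the expectation is -1/99, or about -0.0101, which matches the Expectation value in the test output above.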
Example 2: Spatial Lag Model (SLM)
Fit a spatial lag model using the lagsarlm function from the spatialreg package:
R
# Load necessary libraries
library(sf)
library(spdep)
library(spatialreg)
# Load the example dataset
nc <- st_read(system.file("shape/nc.shp", package="sf"))
# Create a spatial weight matrix based on contiguity
nb <- poly2nb(nc)
W <- nb2listw(nb, style="W")
# Fit a spatial lag model of SID74 on NWBIR74 and BIR74
slm <- lagsarlm(SID74 ~ NWBIR74 + BIR74, data = nc, listw = W)
summary(slm)
Output:
Call:lagsarlm(formula = SID74 ~ NWBIR74 + BIR74, data = nc, listw = W)
Residuals:
Min 1Q Median 3Q Max
-11.42866 -1.56581 -0.54971 1.13618 14.20567
Type: lag
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.89337654 0.63911040 1.3978 0.162160
NWBIR74 0.00357200 0.00054522 6.5515 5.696e-11
BIR74 0.00052249 0.00020037 2.6077 0.009116
Rho: 0.043388, LR test value: 0.33608, p-value: 0.5621
Asymptotic standard error: 0.0764
z-value: 0.5679, p-value: 0.5701
Wald statistic: 0.32251, p-value: 0.5701
Log likelihood: -261.2544 for lag model
ML residual variance (sigma squared): 10.879, (sigma: 3.2983)
Number of observations: 100
Number of parameters estimated: 5
AIC: 532.51, (AIC for lm: 530.84)
LM test for residual autocorrelation
test value: 0.021662, p-value: 0.88299
The spatial lag model for SID74 indicates that NWBIR74 and BIR74 have significant positive effects on SID74, with p-values of 5.696e-11 and 0.009116, respectively. The spatial autoregressive coefficient (Rho) is 0.043388, but it is not statistically significant (p-value: 0.5701). The model's residuals show no significant spatial autocorrelation (p-value: 0.88299). The model's log-likelihood is -261.2544, and the AIC is 532.51, indicating a slightly worse fit than a standard linear model (AIC for lm: 530.84).
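The spatial error and spatial Durbin models from the introduction can be fitted in much the same way. The sketch below assumes a recent spatialreg version, in which errorsarlm fits the error model and lagsarlm accepts Durbin = TRUE. It reuses the same formula purely for illustration and compares the fits by AIC. It is one possible workflow rather than a recommended specification, and the results are not shown here because they were not part of the original analysis.

```r
library(sf)
library(spdep)
library(spatialreg)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
W <- nb2listw(poly2nb(nc), style = "W")

# Same specification as the spatial lag example
f <- SID74 ~ NWBIR74 + BIR74

ols <- lm(f, data = nc)                                   # non-spatial baseline
slm <- lagsarlm(f, data = nc, listw = W)                  # spatial lag model
sem <- errorsarlm(f, data = nc, listw = W)                # spatial error model
sdm <- lagsarlm(f, data = nc, listw = W, Durbin = TRUE)   # spatial Durbin model

# Lower AIC indicates a better trade-off between fit and complexity
aic_table <- data.frame(
  model = c("OLS", "SLM", "SEM", "SDM"),
  AIC = c(AIC(ols), AIC(slm), AIC(sem), AIC(sdm))
)
print(aic_table[order(aic_table$AIC), ])
```

If a fit stops with a message about a computationally singular system, rescaling the count variables (for example, births in thousands) is a common workaround.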
Conclusion
Spatial econometrics provides powerful tools for analyzing spatial dependencies in econometric data. Using R, you can easily perform spatial econometric analysis, including testing for spatial autocorrelation, fitting spatial regression models, and visualizing spatial data. This guide covered the fundamental theory and practical examples to get you started with spatial econometrics in R.