Modeling space
Modeling spatial dependence requires a representation of the
spatial arrangement of the units, given by a matrix of
spatial weights (W).
Spatial weights define the interaction of each
territorial unit with its neighbors; the maximum number of
possible interactions is n(n-1)/2.
Since we cannot estimate all these spatial relationships,
we impose a certain structure on them:
only the 'neighbors' interact.
Restricting the interactions to neighbors simplifies
estimation.
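As a minimal sketch of this idea, the block below builds a binary weights matrix from a hypothetical list of neighbors and compares the number of possible interactions, n(n-1)/2, with the number retained by the neighbor rule. The five units A to E and their neighbor lists are invented for illustration, and row-standardization is shown only as one common convention, not as the choice made in this course.

```python
# Hypothetical example: 5 territorial units and their first-order neighbors.
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def binary_weights(neighbors):
    """Return (ids, W) where W[i][j] = 1 if j is a neighbor of i, else 0."""
    ids = sorted(neighbors)
    index = {unit: k for k, unit in enumerate(ids)}
    n = len(ids)
    W = [[0] * n for _ in range(n)]
    for unit, nbrs in neighbors.items():
        for other in nbrs:
            W[index[unit]][index[other]] = 1
    return ids, W

def row_standardize(W):
    """Divide each row by its sum so that the weights of every unit add up to 1."""
    out = []
    for row in W:
        total = sum(row)
        out.append([w / total if total else 0.0 for w in row])
    return out

ids, W = binary_weights(neighbors)
n = len(ids)
max_pairs = n * (n - 1) // 2              # all possible pairwise interactions
neighbor_pairs = sum(map(sum, W)) // 2    # interactions kept by the neighbor rule
Ws = row_standardize(W)
print(max_pairs, neighbor_pairs)
```

With only five units the reduction is small, but the number of possible pairs grows with the square of n, while the number of neighbor pairs grows roughly in proportion to n.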
Spatial regression models
Before running a model with spatial variables, it is necessary
to verify that spatial influence exists, using
Moran's I statistic to test for spatial dependence
(spatial autocorrelation).
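A minimal sketch of the Moran's I statistic, assuming the usual textbook definition I = (n / S0) * (sum of w_ij z_i z_j) / (sum of z_i squared), where z are deviations from the mean and S0 is the sum of all weights. The four-unit "line" of neighbors and the two value vectors are hypothetical; the sketch computes only the statistic, not its significance.

```python
def morans_i(values, W):
    """Moran's I for a list of values and a square weights matrix W."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                      # deviations from the mean
    s0 = sum(sum(row) for row in W)                     # sum of all weights
    cross = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)

# Hypothetical layout: 4 units in a row, each adjacent to the next one.
W_line = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

i_trend = morans_i([1, 2, 3, 4], W_line)       # similar values side by side
i_alternating = morans_i([1, -1, 1, -1], W_line)  # dissimilar values side by side
print(i_trend, i_alternating)
```

A positive value signals that neighbors tend to have similar values, a negative value that they tend to differ.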
Spatial dependence leads to specification errors in
classical linear regression models: the hypothesis of
independence between the observations for
neighboring regions is violated.
The solution: explicit incorporation into the regression model of
spatial terms (a spatial lag of the variable or of the errors).
By multiplying the spatial weights matrix W, of
dimension n x n, by the n x 1 vector of regional values of
the variable y, one obtains the n x 1 vector Wy containing the spatial lag
for each observation i (i=1,...,n).
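The product Wy can be sketched in a few lines. The example assumes a row-standardized W (each row sums to 1), in which case the spatial lag of a unit is simply the average of its neighbors' values; the matrix and the y values are hypothetical.

```python
def spatial_lag(W, y):
    """Return the vector Wy: one weighted sum of neighbor values per observation."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]

# Hypothetical row-standardized weights for 4 units arranged in a row.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
y = [10.0, 20.0, 30.0, 40.0]

Wy = spatial_lag(W, y)
print(Wy)  # [20.0, 20.0, 30.0, 30.0]
```

For the second unit, for instance, the lag is the mean of its two neighbors, (10 + 30) / 2 = 20.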
Matrix form of the spatial autoregressive process:
y = ρWy + e
• Wy - the spatial lag of variable y (influence of neighbors)
• ρ - the parameter expressing the intensity of spatial dependence in
the analyzed sample of observations
• e - errors
Spatial regression - variant I. SPATIAL LAG MODEL
spatial dependence is included through the spatial lag Wy of the
endogenous variable:
y = ρWy + Xβ + e
Lagrange Multiplier test for spatial lag (LM-lag)
H0: ρ = 0 => y = Xβ + e (classical regression is suitable)
H1: ρ ≠ 0
Test statistic: LM-lag
Decision: LM-lag > χ2(1; 1-α) => we reject H0 (the classical regression)
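The formula of the statistic is not reproduced in these notes. As a hedged reminder, the form usually given in the spatial econometrics literature for the LM-lag statistic is sketched below. The notation is an assumption that does not come from the slides: e denotes the OLS residuals, \hat{\beta} the OLS coefficients, \hat{\sigma}^2 = e'e/n, M = I - X(X'X)^{-1}X' and tr the trace operator.

```latex
\[
LM_{lag} = \frac{\left( e' W y / \hat{\sigma}^{2} \right)^{2}}{D},
\qquad
D = \frac{(W X \hat{\beta})' \, M \, (W X \hat{\beta})}{\hat{\sigma}^{2}}
    + \operatorname{tr}\!\left( W'W + W^{2} \right),
\qquad
LM_{lag} \sim \chi^{2}(1) \ \text{under } H_0 .
\]
```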
Spatial regression - variant II. SPATIAL ERROR MODEL
spatial dependence in the regression model is reflected in the
errors, through the spatial lag of the errors:
y = Xβ + e
e = λWe + ν
Lagrange Multiplier test for spatial error (LM-error)
H0: λ = 0 (there is no spatial dependence)
H1: λ ≠ 0
Test statistic: LM-error, which has a
χ2 (chi-square) distribution with 1 degree of freedom.
Decision: LM-error > χ2(1; 1-α) => we reject H0 (the classical regression)
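A minimal sketch of the LM-error test, assuming the form of the statistic commonly given in the literature, LM = (e'We / σ²)² / tr(W'W + WW), with e the OLS residuals and σ² = e'e/n. The formula is not taken from these notes, and the residuals and the weights matrix below are hypothetical. The probability uses the exact chi-square tail with 1 degree of freedom, erfc(sqrt(x/2)).

```python
import math

def lm_error(resid, W):
    """LM-error statistic computed from OLS residuals and a weights matrix."""
    n = len(resid)
    sigma2 = sum(e * e for e in resid) / n                       # ML error variance
    We = [sum(W[i][j] * resid[j] for j in range(n)) for i in range(n)]
    eWe = sum(resid[i] * We[i] for i in range(n))                # e'We
    # tr(W'W + WW) = sum of w_ij^2 + sum of w_ij * w_ji
    T = sum(W[i][j] * W[i][j] + W[i][j] * W[j][i]
            for i in range(n) for j in range(n))
    return (eWe / sigma2) ** 2 / T

def chi2_1_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical residuals for 4 units arranged in a row (binary weights).
W_line = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
resid = [1.0, -1.0, 1.0, -1.0]

lm = lm_error(resid, W_line)
p = chi2_1_pvalue(lm)
print(lm, p)
```

With these invented numbers the probability is above 0.05, so H0 would not be rejected at the 5% level.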
Steps in estimating spatial models
OLS estimation (classic model), OUTPUT:
–a prob. below 0.05 for Moran's I (error) indicates
the rejection of the classical model (OLS);
–a significant LM-Error or LM-Lag statistic
(the one with the smaller probability) indicates the most
appropriate spatial model; we check with different types of
spatial weights matrices.
We run the model for the more suitable variant (spatial
error or spatial lag).
APPLICATION IN GEODA: spatial regression
1. Initiating a new working session
We start GeoDa (by double-clicking the shortcut).
We select the file [Link] from the desktop and drag it into the box
"Drop files here".
We open the table (which for now does not
contain any data).
2. We import the data (the model variables) from the Excel file 'Data
regression': Table–Merge Table Data => a dialog window
opens:
We select the Excel file from which we import the variables.
We include all the variables from the Excel file.
We select the identification key (county name)
on which the tables will be merged.
Its name in the destination .shp file (COUNTY) must
be different from its name in the
source .xls file (county).
A confirmation message appears.
The imported data is now in the table.
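The logic of merging on an identification key can be sketched outside GeoDa as follows. This is only an illustration of a key-based join, not a description of how GeoDa implements Merge: the county names, the column names salary and gdp_pc, and all values are hypothetical placeholders.

```python
# Hypothetical attribute table of the map (key column COUNTY).
shape_table = [{"COUNTY": "Alba"}, {"COUNTY": "Arad"}, {"COUNTY": "Arges"}]

# Hypothetical rows read from the spreadsheet (key column county); placeholder values.
excel_rows = [
    {"county": "Arad", "salary": 2.0, "gdp_pc": 20.0},
    {"county": "Alba", "salary": 1.0, "gdp_pc": 10.0},
]

def merge_on_key(target, source, target_key, source_key):
    """Attach the source columns to each target row whose key matches."""
    lookup = {row[source_key]: row for row in source}
    merged, unmatched = [], []
    for row in target:
        match = lookup.get(row[target_key])
        if match is None:
            unmatched.append(row[target_key])   # no data for this unit
            merged.append(dict(row))
        else:
            extra = {k: v for k, v in match.items() if k != source_key}
            merged.append({**row, **extra})
    return merged, unmatched

merged, unmatched = merge_on_key(shape_table, excel_rows, "COUNTY", "county")
print(merged)
print(unmatched)
```

The list of unmatched keys is worth checking: a spelling difference in a county name leaves that unit without data after the merge.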
The regression model:
- Dependent variable: average annual salary (lei)
- Explanatory variables: GDP per capita (lei), unemployment rate (%), FDI (thousand
euro).
First, we need to define the matrix of spatial weights
(to identify the neighbors).
We click the W (weights) button in the main menu.
We create a binary Queen matrix with adjacent neighbors only (neighbors of
order 1), because we have few territorial units.
We save the matrix under the name '[Link]' in the 'judete' folder.
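To make the Queen criterion concrete, the sketch below lists first-order Queen neighbors on a regular grid, where two cells are neighbors if they share an edge or a corner. The grid is an assumption made for illustration only; real counties are irregular polygons, for which GeoDa derives contiguity from the shapefile geometry.

```python
def queen_neighbors(n_rows, n_cols):
    """First-order Queen neighbors for the cells of an n_rows x n_cols grid.

    Cells are numbered row by row, starting from 0.
    """
    nbrs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            cell = r * n_cols + c
            nbrs[cell] = [
                rr * n_cols + cc
                for rr in range(r - 1, r + 2)
                for cc in range(c - 1, c + 2)
                if (rr, cc) != (r, c) and 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return nbrs

nbrs = queen_neighbors(3, 3)
print(nbrs[0])        # corner cell
print(len(nbrs[4]))   # central cell
```

On this 3 x 3 grid the corner cell has 3 neighbors and the central cell has 8, which shows how quickly higher orders of contiguity would cover a map with few units.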
In the main menu we select Regression.
A window opens in which we choose the salary as the
dependent variable and the rest as explanatory
variables (covariates).
I. We first run the classic regression model.
However, we tick Weights File (activating the spatial
weights matrix [Link]) in order to obtain the spatial
statistics (needed to choose between the classical
regression model and the spatial one).
We choose the type of model (Classic) and the desired
statistics, then Run.
OUTPUT
The variables unemployment rate and FDI are not
significant; we will eliminate them.
Although there is spatial dependence (prob Moran =
0.05), neither type of spatial model is
validated (prob LM > 0.05) => a model
specification problem.
The Reset button initializes the estimation of a new model.
We run the classic model again, without the insignificant variables.
The only explanatory variable is now PIB_loc.
The new OLS model is validated by the standard
statistical tests:
• high R2, prob F almost zero, and we do not
reject the hypotheses of homoscedasticity and
normal distribution of the errors.
However, the classical model is not
valid because there is spatial dependence
(see prob Moran).
The spatial error model is indicated by
the LM test as the more suitable one.
Traditional statistics:
• R2 and adjusted R2
• the sum of squared residuals
• the variance of the residuals and the standard error of the estimate, in two
variants:
- with the adjustment for the loss of degrees of freedom
(Sigma-square and S.E. of regression)
- without adjustment (Sigma-square ML and S.E. of regression
ML)
Statistics for comparability with the spatial regression
models:
• log likelihood (the higher it is, the better the model)
• the Akaike criterion and the Schwarz criterion (the smaller they are, the better
the model fits).
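The statistics listed above can be sketched from the sum of squared residuals alone. The block assumes the usual textbook definitions: Gaussian log likelihood for OLS, AIC = -2 lnL + 2k and SC = -2 lnL + k ln(n), where k counts the estimated coefficients including the intercept. Whether GeoDa uses exactly these conventions is an assumption, and the input numbers are hypothetical.

```python
import math

def ols_fit_statistics(ssr, n, k):
    """Fit statistics from the sum of squared residuals (ssr), n observations, k coefficients."""
    sigma2 = ssr / (n - k)     # with the adjustment for degrees of freedom
    sigma2_ml = ssr / n        # without adjustment (maximum likelihood)
    loglik = -0.5 * n * (math.log(2.0 * math.pi) + math.log(sigma2_ml) + 1.0)
    return {
        "sigma2": sigma2,
        "se_regression": math.sqrt(sigma2),
        "sigma2_ml": sigma2_ml,
        "se_regression_ml": math.sqrt(sigma2_ml),
        "log_likelihood": loglik,
        "aic": -2.0 * loglik + 2.0 * k,
        "sc": -2.0 * loglik + k * math.log(n),
    }

# Hypothetical values: 42 units, intercept plus one regressor.
stats = ols_fit_statistics(ssr=84.0, n=42, k=2)
for name, value in stats.items():
    print(name, round(value, 4))
```

Because the ML variance divides by n rather than n - k, it is always the smaller of the two variance estimates.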
The estimated values (OLS_PREDIC) and the residuals (OLS_RESIDU) of
the classic regression model can be saved in the data table;
they can then be used to build maps => visual inspection of
the model.
The maps of estimated values are adjusted ("smoothed") in the sense that
the random variability, due to factors other than those included in the
model, has been removed.
Maps of residuals
The most useful is the 'standard deviational map':
- wide areas of over-prediction (negative residuals, blue
tones) and under-prediction (positive residuals, brown tones) =>
the presence of spatial autocorrelation (a formal test is also required);
- the magnitude of the residuals, especially of those greater than
two standard deviations, points to significant regressors missing from the model.
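The reading of such a map can be sketched numerically: each residual is expressed in standard deviations from the mean and flagged when it lies beyond two. The unit labels and residual values are hypothetical, and the use of the population standard deviation is an assumption rather than a statement about how GeoDa builds the map.

```python
import math

def classify_residuals(residuals, threshold=2.0):
    """Flag units whose residual lies more than `threshold` standard deviations from the mean."""
    values = list(residuals.values())
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    flagged = {}
    for unit, r in residuals.items():
        z = (r - mean) / sd
        if z > threshold:
            flagged[unit] = "under-prediction"   # actual value above the estimate
        elif z < -threshold:
            flagged[unit] = "over-prediction"    # actual value below the estimate
    return flagged

# Hypothetical residuals for ten units.
residuals = {
    "A": 0.5, "B": -0.5, "C": 0.5, "D": -0.5, "E": 0.5,
    "F": -0.5, "G": 0.5, "H": -0.5, "I": 6.0, "J": -6.0,
}
flagged = classify_residuals(residuals)
print(flagged)
```

If the flagged units turn out to be neighbors on the map, that is a visual hint of spatial autocorrelation, which still has to be confirmed by a formal test.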
II. Spatial error model
(Spatial Error). The statistics show that
it is better than the classic one: higher log
likelihood, lower Akaike and
Schwarz, significant Likelihood Ratio
test (prob < 0.05).
R2 (pseudo-R2) is also
higher, but it is not
directly comparable to that
for OLS.
We save the predicted values and the model residuals (with the Save to
Table button):
ERR_PREDIC = the estimate
of the dependent variable y
ERR_RESIDU = the model
residuals (estimates of
the error term), used
for the standard tests
ERR_PRDERR = the prediction
error (the difference between
the actual value and the estimated value;
the residuals of the spatial estimation)
ERR = the type of model (error)
Moran scatter plot for the residuals: Space–Univariate Moran's I–
select the variable.
The spatial model has eliminated the
spatial autocorrelation of the residuals (by including
it as an explanatory variable =
the lag of the errors).
The prediction errors are
spatially correlated by definition.
Economic interpretation of the results:
The average salary at the county level depends on
GDP per capita (level of development, wealth), but
it also depends on the average salaries paid in the adjacent
counties (most likely due to the mobility of the labor
force, either permanent, through a change of residence, or through
commuting).
Contrary to economic theory, unemployment does not exert significant
pressure on the average salary (owing to the relatively low
level of unemployment in Romania, a result of mass external
emigration).
Contrary to expectations, foreign direct investments have no
influence on the territorial variation of salaries.
How do we choose the right model?
1. The standard tests for the classic regression model (high R2,
prob F < 0.05, prob JB > 0.05, prob White > 0.05,
multicollinearity condition number < 30, significant
regressors etc.)
2. Prob. of Moran's I test for errors < 0.05 => we reject the null hypothesis of a
random spatial distribution => there is spatial autocorrelation
of the errors, therefore we reject OLS (the classical model)
3. The tests based on the Lagrange Multiplier (LM) show which is
the best alternative to OLS:
- the autoregressive process (spatial lag) when prob. LM-lag < 0.05
- the model with spatial errors (spatial error) when prob. LM-error <
0.05
- if prob. LM-lag < 0.05 and prob. LM-error < 0.05, we compare
their robust variants and choose the model with the smallest
Robust LM prob.
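The three steps above can be sketched as a small decision function that takes the probabilities read from the OLS output. The function name, its arguments and the 'respecify' outcome (for the case in which Moran's I is significant but neither LM test is) are illustrative choices; when the two robust probabilities are equal, the sketch arbitrarily returns the error model.

```python
def choose_model(p_moran, p_lm_lag, p_lm_error,
                 p_robust_lag, p_robust_error, alpha=0.05):
    """Suggest a model from the probabilities of the spatial diagnostics."""
    if p_moran >= alpha:
        return "OLS"                       # no spatial autocorrelation of the errors
    lag_ok = p_lm_lag < alpha
    error_ok = p_lm_error < alpha
    if lag_ok and error_ok:
        # Both tests are significant: compare the robust variants.
        return "spatial lag" if p_robust_lag < p_robust_error else "spatial error"
    if lag_ok:
        return "spatial lag"
    if error_ok:
        return "spatial error"
    return "respecify"                     # dependence present, but no spatial model validated

# Hypothetical probabilities, in the order of the function arguments.
print(choose_model(0.20, 0.50, 0.50, 0.50, 0.50))
print(choose_model(0.01, 0.03, 0.20, 0.04, 0.30))
print(choose_model(0.01, 0.01, 0.02, 0.30, 0.04))
print(choose_model(0.01, 0.40, 0.30, 0.50, 0.60))
```

The suggestion should still be confronted with the additional criteria (log likelihood, AIC, SC, Likelihood Ratio Test) after the spatial model has been estimated.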
Additional criteria
A model is better when it has:
- a higher value of the log-likelihood
- a lower value of the Akaike information criterion
(AIC) and the Schwarz criterion (SC)
- a lower prob. for the Likelihood Ratio Test. The Likelihood Ratio Test
compares OLS with the spatial model. The smaller the associated prob.
(necessarily below 0.05), the stronger the indication that the spatial model
is better than the classic one.
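A minimal sketch of the Likelihood Ratio Test, assuming the usual form LR = 2 (lnL_spatial - lnL_OLS) compared with a chi-square distribution with 1 degree of freedom, since the spatial model adds a single parameter (ρ or λ). The two log likelihood values are hypothetical.

```python
import math

def likelihood_ratio_test(loglik_ols, loglik_spatial):
    """Return the LR statistic and its chi-square(1) upper-tail probability."""
    lr = 2.0 * (loglik_spatial - loglik_ols)
    # Exact chi-square(1) tail; a negative lr is treated as 0 (no improvement).
    p = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p

# Hypothetical log likelihoods read from the two regression outputs.
lr, p = likelihood_ratio_test(loglik_ols=-100.0, loglik_spatial=-97.0)
print(lr, p)
```

Here the improvement of 3 log likelihood points gives LR = 6 and a probability below 0.05, so the spatial model would be preferred.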
Spatial gravitational model
The gravitational model is a regression model initially used for
estimating trade flows between countries.
It is inspired by Newton's law of gravitation, which states that the
gravitational attraction between two objects is directly proportional to their masses
and inversely proportional to the distance between them.
$$FC_{ij} = G \cdot \frac{Y_i \, Y_j}{D_{ij}}$$
where FCij is the value of trade flows from the country of origin (i) to the country of
destination (j),
Yi and Yj are the sizes of the economies of the two countries (usually
measured as gross domestic product - GDP, or GDP per capita),
Dij is the geographical distance between the countries,
G is the gravitational constant.
To facilitate econometric estimation, the gravitational equation is written in
logarithms, resulting in a linear relationship:
$$\ln FC_{ij} = \ln G + \alpha \ln Y_i + \beta \ln Y_j - \delta \ln D_{ij} + e_{ij}$$
ln G corresponds to the intercept, while α, β and δ are elasticities.
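The equivalence between the multiplicative form and its logarithmic version can be checked numerically. The sketch assumes the generalized form FC = G * Yi^α * Yj^β / D^δ, of which the basic law is the case α = β = δ = 1; the values of G, Yi, Yj and Dij are hypothetical, and the error term is left out.

```python
import math

def gravity_flow(G, y_i, y_j, d_ij, alpha=1.0, beta=1.0, delta=1.0):
    """Multiplicative gravity equation (no error term)."""
    return G * (y_i ** alpha) * (y_j ** beta) / (d_ij ** delta)

def log_gravity(ln_G, ln_y_i, ln_y_j, ln_d_ij, alpha=1.0, beta=1.0, delta=1.0):
    """Log-linear gravity equation (no error term)."""
    return ln_G + alpha * ln_y_i + beta * ln_y_j - delta * ln_d_ij

# Hypothetical values.
G, y_i, y_j, d_ij = 2.0, 100.0, 400.0, 50.0

flow = gravity_flow(G, y_i, y_j, d_ij)
ln_flow = log_gravity(math.log(G), math.log(y_i), math.log(y_j), math.log(d_ij))
flow_far = gravity_flow(G, y_i, y_j, 2.0 * d_ij)   # same partners, twice the distance

print(flow, math.exp(ln_flow), flow_far)
```

With δ = 1, doubling the distance halves the flow, which is what an elasticity of -1 with respect to distance means.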
Exports from country i to country j depend on three factors:
- the export potential (supply) of country i: a positive function of the income
level of the exporting country;
- the potential import demand of country j: a positive function of the income
of the importing country;
- trade barriers: a negative function of trade costs,
transport costs (proportional to the distance between countries) and
tariffs.
The hypotheses of the model:
- Economic size (GDP) increases bilateral trade
(large countries trade more with each other).
- Trade increases when the partners are closer
from a geographical point of view.
- There is a positive relationship between per capita income differences
and bilateral trade (the more the countries differ from each other,
each having a comparative advantage in factor endowments, the more trade
grows).
Extended gravitational model:
$$\ln FC_{ij} = \ln G + \alpha \ln Y_i + \beta \ln Y_j - \delta \ln D_{ij} + \rho (Y_i/L_i) + \eta (Y_j/L_j) + \varphi A_{ij} + e_{ij}$$
• Yi and Yj are the GDP of country i and of country j, respectively,
• (Yi/Li) and (Yj/Lj) are the GDP per capita of countries i and j, respectively (another
variant: DEij, the difference between the GDP per capita of one country and the GDP
per capita of the other, which reflects the economic distance between the
partners),
• Dij is the geographical distance between the economic centers of the two
partners (proxy for transportation costs),
• Aij stands for other variables: favorable factors (the existence of trade agreements
between the two countries, common language and historical ties) or unfavorable ones
(barriers to trade), preference variables (demand
for luxury goods compared to necessities), endowment
variables etc.
The spatial gravitational model
The model with spatial lag
$$\ln FC_{ij} = \ln G + \rho W \ln FC_{ij} + \alpha \ln Y_i + \beta \ln Y_j - \delta \ln D_{ij} + \omega (Y_i/L_i) + \eta (Y_j/L_j) + \varphi A_{ij} + e_{ij}$$
where W ln FCij is the spatial lag (it reflects the influence of the foreign trade of
neighboring countries/regions).
The model with spatial errors
$$\ln FC_{ij} = \ln G + \alpha \ln Y_i + \beta \ln Y_j - \delta \ln D_{ij} + \omega (Y_i/L_i) + \eta (Y_j/L_j) + \varphi A_{ij} + e_{ij}, \qquad e_{ij} = \lambda W e_{ij} + v_{ij}$$
where W eij is the spatial lag of the errors: it reflects the influence of other factors
(not included in the model) in neighboring countries/regions.
Example 2. Gravitational model for foreign direct investments
FDI is explained by the size of the countries of origin and host and the geographical
distance between them.
Empirical results suggest that the larger a country is, the higher
its ability to invest abroad.
On the other hand, if the size of the host country is large, it
represents a potential market and will attract high levels of FDI.
Example 3. Gravitational model for migration
Migration flows depend on the size of the countries of origin and host,
the economic distance and the geographical distance between them.
Example 4. Gravitational model for tourism
The flows of tourists depend on the size of the countries of origin and host,
future and the geographical distance between them.
Application 1. The gravitational model for Romania's
exports to the EU
We initiate a new work session by double-clicking on the UE27 project file.
Explanatory note
Previously (in another working session) a set of spatial files was created
for the EU without Romania. Since in the gravitational model
we use data on Romania's exports to the other EU countries,
we need only these countries in the model (and therefore in the spatial
files as well).
Procedure for disaggregating geographic areas (as it was
described in an earlier course):
- We load [Link] (which includes all EU countries).
- We select Romania (on the map, with a simple click, or in the table, with a
click on the left margin of the row corresponding to Romania).
- Table-Invert selection (to select the EU countries other than
Romania).
- File-Save selection as: file type (shapefile), name (EU27),
Save.
The application data is located in the Excel file 'S9. Gravitational model data';
the variables are expressed in logarithms and refer to the year 2016.
lnExp (dependent variable) - the value of Romania's exports to the partner country
(million euros); lnGDP - the GDP of the partner country (million euros); lnD - road distance (km)
between Bucharest and the capital of the partner country; lnPOP - population (thousand people).
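Outside GeoDa, the classic version of this regression (lnExp on lnGDP and lnD) can be sketched with ordinary least squares. The data below are synthetic: they are generated without noise from assumed coefficients (1.0, 0.8, -1.2), so the only purpose of the block is to show the log transformation and that OLS recovers the elasticities. None of the numbers comes from the course file.

```python
import math

def solve_linear(A, b):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    Xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    return solve_linear(XtX, Xty)

# Synthetic partner countries: (GDP in million euros, road distance in km).
raw = [(50_000, 500), (200_000, 1200), (1_000_000, 900),
       (30_000, 2500), (400_000, 300), (3_000_000, 1800)]

# Assumed "true" coefficients: intercept, GDP elasticity, distance elasticity.
true_coef = (1.0, 0.8, -1.2)

X = [[1.0, math.log(gdp), math.log(dist)] for gdp, dist in raw]      # 1, lnGDP, lnD
ln_exp = [sum(c * v for c, v in zip(true_coef, row)) for row in X]   # noise-free lnExp

coef = ols(X, ln_exp)
print([round(c, 6) for c in coef])
```

The signs match what the gravitational model predicts: a positive elasticity for the partner's GDP and a negative one for distance.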
Importing data from Excel:
Table–Merge–select the
source file (S9. Gravitational model
data)–select
the key for identifying the countries
(two-letter alphabetic code)–
select the variables
imported from Excel (we include
only the logarithmic variants
of the variables)–click Merge
–click Close.
To run the regression model, we need to load a matrix of
spatial weights using the Weights Manager: in the main menu click
W–Load–we select the matrix [Link] (which was manually corrected
for islands in course 4, Spatial matrices; in addition, for this
application Romania (RO) has been removed from this matrix by
manual deletion in Notepad, see course 4)–OK
Click Histogram–we check that all
countries have neighbors (a condition for
the spatial regression model).
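The two operations described above, removing one unit from the neighbor structure and checking that no unit is left without neighbors, can be sketched on a neighbor list. The dictionary below is a small illustrative subset of land borders, not the course's weights file, and the sketch does not read or write GeoDa's file format.

```python
def drop_unit(neighbors, unit):
    """Remove a unit and every reference to it from the neighbor lists."""
    return {
        u: [v for v in nbrs if v != unit]
        for u, nbrs in neighbors.items()
        if u != unit
    }

def islands(neighbors):
    """Units without any neighbor, which a spatial regression cannot handle."""
    return sorted(u for u, nbrs in neighbors.items() if not nbrs)

# Illustrative subset of EU countries, identified by two-letter codes.
neighbors = {
    "RO": ["BG", "HU"],
    "BG": ["RO", "EL"],
    "HU": ["RO", "AT"],
    "AT": ["HU"],
    "EL": ["BG"],
    "CY": [],
    "MT": [],
}

without_ro = drop_unit(neighbors, "RO")
print(without_ro["BG"], without_ro["HU"])
print(islands(without_ro))
```

The units returned by the check are the ones that need a manual correction before a spatial regression is run.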
We first run the classic regression model, with the Queen1 matrix activated
(to obtain the spatial dependence diagnostics).
Dependent variable: Romania's exports to the
EU countries.
Option 1: explanatory variables: the GDP of the
importing countries and the geographical distance
to them.
The regressors are significant and have the
expected sign (according to the theory
of the gravitational model).
There is no spatial dependence. The
classic model (OLS) is preferred.
Option 2: we run a new model in which
we replace GDP with population (we do not
include both variables in the model
simultaneously because they are
strongly correlated); the
population variable is also significant and
has the expected sign.
In this model, too, there is no
spatial dependence. The
classic model (OLS) is preferred.
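The reason for not including GDP and population together can be checked with a simple correlation coefficient. The sketch below computes Pearson's r for two hypothetical series standing in for lnGDP and lnPOP; the values are invented and only illustrate the check.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Hypothetical values standing in for lnGDP and lnPOP of five countries.
ln_gdp = [10.0, 11.0, 12.0, 13.0, 14.0]
ln_pop = [7.1, 7.9, 9.2, 9.8, 11.0]

r = pearson(ln_gdp, ln_pop)
print(round(r, 4))
```

A coefficient this close to 1 means the two regressors carry almost the same information, so their separate effects could not be estimated reliably in one model.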
Application 2. The gravitational model for the number of
EU tourists coming to Romania
The data is already in the file
[Link].
We first run a classical model
(OLS), with the Queen1 matrix
activated.
Dependent variable:
lnTUR16 - the number of
tourists in 2016.
Option 1: explanatory
variables: distance
and GDP. Both are
significant and have
the expected sign.
The spatial dependence
diagnostics
recommend the model
with spatial lag.
The output of the model
with spatial lag.
This model is
better than the classic one:
higher log likelihood,
smaller Akaike and
Schwarz.
The spatial lag variable
is significant.
The Likelihood Ratio
Test confirms that
the model with spatial
lag is better
than the classic one:
prob. < 0.05.
Option 2: explanatory
variables: distance and
population. All regressors
are significant and have
the expected sign:
the number of tourists who
come to Romania increases
with the population size
of the country of origin
and decreases with the distance
to it.
The spatial error model
could be
suitable for these
data, but
the robust form of the test does not
confirm the choice.
The output of the model
with spatial errors
shows that it is
better than the classic one:
higher log likelihood,
smaller Akaike and
Schwarz,
and a Likelihood Ratio
Test prob. below 0.05.