Spatial Data Science MOOC Section 4 Exercise 1 - Detect Patterns
Exercise
Detect patterns
Section 4 Exercise 1
November 20, 2020
Time to complete
45 minutes
Introduction
Statistical cluster analysis can help you minimize the subjectivity in your maps by identifying
meaningful clusters in your data. The Hot Spot Analysis and Outlier Analysis tools use
statistics to detect spatial patterns in your data, but each provides slightly different
information about these patterns.
Hot Spot Analysis uses the Getis-Ord Gi* statistic to identify statistically significant spatial
clusters of high values (hot spots) and low values (cold spots).
Outlier Analysis uses the Anselin Local Moran's I statistic to identify statistically significant
clusters of high and low values and to detect spatial outliers, or features with values
significantly dissimilar from their neighbors.
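To make the Gi* statistic more concrete, the following minimal Python sketch computes a Gi* z-score for each feature. It assumes a fixed distance band with binary weights (a neighbor either falls inside the band or it does not). The function name gi_star and the toy data are illustrative assumptions; this is not the ArcGIS implementation, which handles additional details not shown here.

```python
import math


def gi_star(values, coords, band):
    """Return a Getis-Ord Gi* z-score for every feature.

    Assumption: binary weights, w_ij = 1 when feature j lies within
    `band` of feature i (feature i counts as its own neighbor), else 0.
    """
    n = len(values)
    mean = sum(values) / n
    # Global standard deviation term used by the Gi* formula.
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for xi, yi in coords:
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= band else 0.0
             for xj, yj in coords]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        weighted = sum(wj * v for wj, v in zip(w, values))
        denom = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        # A zero denominator means no usable contrast (for example,
        # every feature is a neighbor); report 0.0 in that case.
        scores.append((weighted - mean * sum_w) / denom if denom else 0.0)
    return scores


# Toy data: six features on a line, high values on the left.
values = [10, 10, 10, 1, 1, 1]
coords = [(float(x), 0.0) for x in range(6)]
for z in gi_star(values, coords, band=1.0):
    print(round(z, 3))
```

In this toy data, features at the high-value end receive positive z-scores and features at the low-value end receive negative z-scores of the same magnitude.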
ArcGIS provides traditional and optimized statistical cluster analysis tools. The optimized
statistical cluster analysis tools interrogate your data to provide smart default values,
optimizing the analysis workflow. The traditional statistical cluster analysis tools allow you
more flexibility in defining the spatial relationships in your data, providing you with more
control of your analysis. In this exercise, you will use the optimized statistical cluster analysis
tools to explore the spatial patterns in the data.
Exercise scenario
The Supplemental Nutrition Assistance Program (SNAP) is a federal program that helps
families buy nutritious food to maintain their health and well-being. In this exercise, you will
complete a Hot Spot Analysis and Outlier Analysis to find meaningful patterns of high and low
SNAP participation. This information can help decision makers distribute resources more
efficiently and equitably, ensuring that healthy food is accessible to all SNAP recipients.
c Extract the files to a folder on your local computer, saving the files in a location that you
will remember.
c In the bottom-left corner of the ArcGIS Pro Start page, click Open Another Project.
Note: If you have configured ArcGIS Pro to start without a project template or with a default
project, you will not see the Start page. On the Project tab, click Open, and then click Open
Another Project.
d In the Open Project dialog box, browse to the PatternDetection_SpaceTime folder that
you saved on your computer.
Your ArcGIS Pro project includes a map of the counties in the contiguous United States. Each
county is symbolized by the rate of the population that participated in SNAP during 2016.
a In the Geoprocessing pane, under the search field, click the Toolboxes tab.
Note: If you closed the Geoprocessing pane, reopen it: on the Analysis tab, in the
Geoprocessing group, click Tools.
f Click Run.
The result of your analysis is a layer displaying hot spots in three shades of red and cold spots
in three shades of blue. The varying shades correspond to three confidence levels (90, 95, and
99 percent), indicating how confident you can be that these patterns are meaningful and not the
result of random chance. You will review the analysis details to ensure that the parameters were
appropriate for your question.
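The link between a z-score and the three shades of red or blue can be sketched as a simple lookup. The example below assumes the conventional two-sided critical values for 90, 95, and 99 percent confidence (1.65, 1.96, and 2.58) and a bin coding from -3 to 3. The function name confidence_bin is hypothetical, and the sketch ignores the multiple-testing correction that the optimized tool applies, so the tool's own bins can be stricter.

```python
def confidence_bin(z):
    """Map a z-score to a signed confidence bin.

    Assumed coding: +/-3 = 99 percent, +/-2 = 95 percent,
    +/-1 = 90 percent, 0 = not significant. Positive bins are
    hot spots and negative bins are cold spots.
    """
    for threshold, level in ((2.58, 3), (1.96, 2), (1.65, 1)):
        if abs(z) >= threshold:
            return level if z > 0 else -level
    return 0


for z in (3.1, 2.0, -1.7, 0.4):
    print(z, confidence_bin(z))
```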
If you choose the default values for the Optimized Hot Spot Analysis tool, review
the geoprocessing details to identify the default parameter values. Ensure that
these values are appropriate for the scale of your analysis.
b Expand Messages, and then review the geoprocessing details to answer the following
question.
_______________________________________________________________________________
The tool chose a default distance band of approximately 150 kilometers (km) based on the
average distance to the 30 nearest neighbors. This default is a good place to start exploring your
data, but it may not represent the scale at which you want to analyze patterns in your dataset.
In this example, a 150 km distance band is too large because you want to analyze more local
patterns in SNAP participation. You will reduce the distance band to 75 km to detect more
local patterns in this county-level dataset.
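One way to build intuition for a neighbor-based default is to compute, for each feature, the distance to its k-th nearest neighbor and then average those distances. The sketch below does this with plain Python. The function name average_knn_distance is hypothetical, the coordinates are assumed to be in projected units such as kilometers, and the exact procedure the tool follows may differ from this simplified reading.

```python
import math


def average_knn_distance(coords, k):
    """Average, over all features, of the distance to the k-th nearest neighbor.

    Assumption: coords are (x, y) pairs in a projected coordinate system,
    so straight-line distances are meaningful.
    """
    if not 0 < k < len(coords):
        raise ValueError("k must be between 1 and len(coords) - 1")
    total = 0.0
    for i, (xi, yi) in enumerate(coords):
        dists = sorted(math.hypot(xi - xj, yi - yj)
                       for j, (xj, yj) in enumerate(coords) if j != i)
        total += dists[k - 1]
    return total / len(coords)


# Toy example: four features spaced 1 unit apart on a line.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(average_knn_distance(points, k=2))
```

Comparing this kind of estimate for different values of k shows how quickly the band grows as you require more neighbors, which is why a default suited to 30 neighbors can be too coarse for local patterns.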
e Under Distance Band, type 75, and then next to Feet, click the down arrow and choose
Kilometers.
f Click Run.
Reducing the size of the distance band identified more detailed patterns. This scale is more
appropriate for this particular analysis.
_______________________________________________________________________________
_______________________________________________________________________________
_______________________________________________________________________________
The results of this statistical analysis provide a measure of confidence that can help you
identify areas with clusters of high SNAP participation. You can use this information to
investigate these areas and their access to stores that accept SNAP and carry healthy foods.
a In the top-left corner of the Geoprocessing pane, click the Back button, and then
search for outlier.
The Performance Adjustment field defines the number of permutations to create a random
distribution. The tool will then compare your data's spatial distribution with the randomly
generated values. To balance precision and processing time, you will leave the default. For
more information about permutations, see ArcGIS Pro Help: How Cluster and Outlier Analysis
(Anselin Local Moran's I) works.
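The permutation idea can be sketched in a few lines of Python. The example below computes a Local Moran's I value for one feature using binary neighbor weights, then repeatedly draws random values for that feature's neighbors from the rest of the dataset to build a reference distribution. The function name, the count of 499 permutations, the fixed seed, and the one-sided pseudo p-value convention are all assumptions for illustration, not a description of the tool's internals.

```python
import random


def local_moran_pseudo_p(values, neighbors, i, permutations=499, seed=0):
    """Return (Local Moran's I, pseudo p-value) for feature i.

    Assumptions: binary weights over neighbors[i], values that are not
    all identical, and a one-sided comparison in the direction of the
    observed statistic.
    """
    rng = random.Random(seed)
    n = len(values)
    mean = sum(values) / n
    s2 = sum((values[j] - mean) ** 2 for j in range(n) if j != i) / (n - 1)
    # This factor stays constant while the neighbors are permuted.
    scale = (values[i] - mean) / s2
    observed = scale * sum(values[j] - mean for j in neighbors[i])

    others = [values[j] for j in range(n) if j != i]
    k = len(neighbors[i])
    extreme = 0
    for _ in range(permutations):
        # Hold feature i fixed and give its neighbors random values
        # drawn without replacement from the remaining features.
        drawn = rng.sample(others, k)
        stat = scale * sum(v - mean for v in drawn)
        if (observed >= 0 and stat >= observed) or (observed < 0 and stat <= observed):
            extreme += 1
    return observed, (extreme + 1) / (permutations + 1)


# Toy data: ten features on a line; each feature neighbors the adjacent ones.
values = [10, 10, 10, 1, 1, 1, 1, 1, 1, 1]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5],
             [4, 6], [5, 7], [6, 8], [7, 9], [8]]
print(local_moran_pseudo_p(values, neighbors, i=1))
```

More permutations give a finer-grained pseudo p-value at the cost of more processing time, which is the trade-off the Performance Adjustment setting controls. Because the reference distribution is random, changing the seed can change the result slightly.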
When you compare the results of a Hot Spot Analysis and an Outlier Analysis, use
the same distance band for both analyses.
f Click Run.
Note: The permutations in the Optimized Outlier Analysis tool compare your data values to a
set of randomly generated values. Therefore, your results may vary slightly from the preceding
graphic.
The bright red and blue features represent spatial outliers. Features with high values
surrounded by areas with low values are called High-Low outliers and are displayed in red.
Features with low values surrounded by areas with high values are called Low-High outliers
and are displayed in blue. The pink and light blue colors indicate clusters of features with
statistically significantly high values (pink) and statistically significantly low values (light blue).
These clusters typically align with the hot and cold spots from the Optimized Hot Spot
Analysis tool.
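The four categories described above follow from two comparisons: whether a feature's value is above or below the overall mean, and whether its neighbors' average is above or below that mean. The minimal sketch below encodes that logic. The function name classify_feature and the 0.05 significance cutoff are assumptions for illustration.

```python
def classify_feature(value, neighbor_mean, global_mean, p_value, alpha=0.05):
    """Label a feature as a cluster member, an outlier, or not significant.

    Assumption: a feature is significant when p_value <= alpha.
    """
    if p_value > alpha:
        return "Not Significant"
    high = value > global_mean
    neighbors_high = neighbor_mean > global_mean
    if high and neighbors_high:
        return "High-High cluster"
    if not high and not neighbors_high:
        return "Low-Low cluster"
    if high:
        return "High-Low outlier"
    return "Low-High outlier"


# A county with high participation surrounded by low-participation neighbors.
print(classify_feature(value=22.0, neighbor_mean=8.0, global_mean=13.0, p_value=0.01))
```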
e In the map, drag your pointer to the left, to the right, or up and down to compare the Hot
Spot Analysis and Outlier Analysis results.
Using Hot Spot Analysis and Outlier Analysis, you located statistically significant clusters of
high SNAP participation. This information can help decision makers allocate SNAP resources
to areas with greater food insecurity and distribute those resources more efficiently and
equitably.
f At the top of the map view, next to Pattern Detection, click the X to close the map.
2. What statistically significant spatial patterns can you detect from this analysis?
Generally, the southeastern areas of the contiguous United States have statistically
significantly high SNAP participation, and the north-central areas of the contiguous
United States have statistically significantly low SNAP participation.