Exploratory Data Analysis with ModelMap

Elizabeth A. Freeman, Tracey S. Frescino, Gretchen G. Moisen

September 10, 2018

Abstract
The ModelMap package (Freeman, 2009) for R (R Development Core Team, 2008) now includes tools
for exploratory data analysis. The model.explore function offers several graphical tools for
exploring predictor correlation and the relationships between the available training data set and
the raster files of predictor data over which the map will be made. It identifies regions of the map
where the predictor lies outside the range of the training data, and shows the distribution of
training data over the range of each predictor.

1 Introduction

2 Exploratory Data Analysis

The model.explore function offers several graphical tools for exploring predictor correlation, and
relationships between the training data and the area to be mapped. The function identifies regions
of the map where predictors fall outside the range of the training data, and shows the distribution
of training data and the map pixels over the range of each predictor.
The model.explore function can be used for both continuous and factored predictors, and for
binary, categorical, and continuous responses.

2.1 Example dataset

The data set is from a pilot study in Nevada launched in 2004 involving acquisition and
photo-interpretation of large-scale aerial photography, the Nevada Photo-Based Inventory Pilot
(NPIP) (Frescino et al., 2009). The data files for these examples are included in the ModelMap
package installation in the R library directory, in the folder 'external', subfolder 'vignetteexamples'.
The predictor data set consists of 6 predictor variables: 5 continuous variables and 1 categorical
variable (Table 1). The predictor layers are 250-meter resolution, pixel-based raster layers including
Moderate Resolution Imaging Spectroradiometer (MODIS) satellite imagery (Justice et al., 2002),
a Landsat Thematic Mapper-based thematic layer of predicted land cover, National Land Cover
Data (NLCD) (Homer et al., 2004), and a topographic layer of elevation from the National
Elevation Data (Gesch et al., 2002).
The continuous response variables are percent cover of Pinyon and Sage. The binary response
variables are presence of Pinyon and Sage. The categorical response variable is the vegetation
category: TREE, SHRUB, OTHERVEG, and NONVEG.
The MODIS data included 250-meter, 16-day, cloud-free composites of MODIS imagery for April
6, 2005: visible-red (RED) and near-infrared (NIR) bands and 2 vegetation indices, normalized
difference vegetation index (NDVI) and enhanced vegetation index (EVI) (Huete et al., 2002). The
land cover and topographic layers were 30-meter products resampled to 250 meters using majority
and mean summaries, respectively.

Name          Type          Description
ELEV250       Continuous    90m NED elevation (ft), resampled to 250m, average of 49 points
NLCD01_250    Categorical   National Land Cover Dataset 2001, resampled to 250m, min. value of 49 points
EVI2005097    Continuous    MODIS Enhanced vegetation index
NDV2005097    Continuous    MODIS Normalized difference vegetation index
NIR2005097    Continuous    MODIS Band 2 (Near Infrared)
RED2005097    Continuous    MODIS Band 1 (Red)

Table 1: Predictor variables

The rectangular subset of Nevada chosen for these maps was deliberately selected to lie along the
diagonal edge of the study region to illustrate how ModelMap handles unsampled regions of a
rectangle (Figure 2).

2.2 Spatial Raster Layers

ModelMap uses the raster package to read spatial rasters into R. The data for predictive
mapping in ModelMap should be in the form of pixel-based raster layers representing the predictors
in the model. The layers must be a file type recognizable by the raster package, for example ERDAS
Imagine image (single or multi-band) raster data formats, having continuous or categorical data
values. If there is more than one raster layer, the layers must have the same extent, projection,
and pixel size.

2.3 Raster Look Up Table

As with the model.mapmake function, the Raster Look Up Table (rastLUTfn) provides the link
between the spatial rasters for map production and the column names of the Training and Test
data sets. The argument rastLUTfn accepts either an R data frame or the name of a CSV file
containing the table.
The rastLUTfn must include 3 columns: (1) the full path and base names of the raster file or
files; (2) the column headers from the Training and Test data sets for each predictor; (3) the layer
(band) number for each predictor. The names (column 2) must match not only the column headers
in Training and Test data sets (qdata.trainfn and qdata.testfn), but also the predictor names
in the arguments predList and predFactor, and the predictor names in model.obj.
In a Windows environment, the function build.rastLUT() may be used to help build the look-up
table with the aid of GUI prompts.
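Because the names in column 2 must match predList and the training data headers exactly, it can help to check them before calling model.explore. The base-R sketch below builds a look-up table as a data frame and compares names. check_lut_names() is a hypothetical helper, not a ModelMap function; the MODIS band numbers are placeholders (the band order inside the stack is not given here), and the training column list is an assumed example.

```r
# Hypothetical raster look-up table: one row per predictor.
# Column 1: raster file, column 2: predictor name, column 3: band number.
# NOTE: the MODIS band numbers below are placeholders, not the real stack order.
lut <- data.frame(
  file = c("VModelMapData_dem_ELEVM_250.img",
           "VModelMapData_nlcd_NLCD01_250.img",
           rep("VModelMapData_modis_STK2005097.img", 4)),
  name = c("ELEV250", "NLCD01_250", "EVI2005097",
           "NDV2005097", "NIR2005097", "RED2005097"),
  band = c(1, 1, 1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: list predictor names that do not line up between
# the look-up table (column 2), predList, and the training data headers.
check_lut_names <- function(lut, predList, train_cols) {
  lut_names <- lut[, 2]
  list(missing_from_lut   = setdiff(predList, lut_names),
       missing_from_train = setdiff(lut_names, train_cols))
}

predList   <- c("ELEV250", "NLCD01_250", "EVI2005097",
                "NDV2005097", "NIR2005097", "RED2005097")
train_cols <- c("ID", "PINYON", "SAGE", "VEGCAT", predList)  # assumed headers
str(check_lut_names(lut, predList, train_cols))
```

Empty vectors in both list elements mean the names are consistent; any name listed points to a mismatch to fix before mapping.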

2.4 Device Types for Graphical Output

These are complex graphics and can be slow to produce, particularly for large maps. If the
"default" on-screen graphics device is used, it is very important not to move or close the window
until model.explore() has completely finished running. If you attempt to close or move the window,
or use the scroll wheel while the mouse is hovering over the graphics device, before the function
has completed, there is a risk of crashing the entire R session and losing work.
Therefore device.type="default" is disabled unless allow.default.graphics is set to TRUE. If
you decide to enable device.type="default", it is recommended that you save all work first.
Otherwise, it is safer to send the output directly to a file by setting device.type to "jpeg",
"pdf", "png", etc.
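As a general base-R illustration of why file devices are safer: a plot opened with pdf() is written straight to disk and finalized by dev.off(), so no interactive window is ever involved. This sketch shows the standard R pattern only, not model.explore's internals; the file name is arbitrary.

```r
# Send a plot to a PDF file instead of the on-screen device.
out_file <- file.path(tempdir(), "device_demo.pdf")
pdf(out_file, width = 6, height = 4)      # open the file device
plot(1:10, (1:10)^2, type = "b",
     xlab = "x", ylab = "x squared", main = "Written to a file device")
invisible(dev.off())                      # close the device so the file is complete
file.exists(out_file)
```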
3 Examples
3.1 Set up
After installing the ModelMap package, find the sample data sets in the R installation and copy
them to your working directory. The data consist of five files located in the vignette directory of
ModelMap, for example, in C:\R\R-2.15.0\library\ModelMap\vignettes:
VModelMapData.csv
VModelMapData_LUT.csv
VModelMapData_dem_ELEVM_250.img
VModelMapData_modis_STK2005097.img
VModelMapData_nlcd_NLCD01_250.img
Load the ModelMap package.

R> library("ModelMap")

Next define some of the arguments.

Define training and test data file names. Note that the arguments qdata.trainfn and qdata.testfn
will accept either character strings giving the file names of CSV files of data, or the data itself in
the form of a data frame.

R> qdatafn <- "VModelMapData.csv"

R> qdata.trainfn <- "VModelMapData_TRAIN.csv"
R> qdata.testfn <- "VModelMapData_TEST.csv"
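Since qdata.trainfn and qdata.testfn accept either a file name or a data frame, code that consumes such an argument has to branch on its type. The helper read_qdata() below is hypothetical (it is not part of ModelMap) and only sketches that pattern in base R; it assumes a bare file name is looked up in folder when it is not found in the working directory.

```r
# Hypothetical helper: accept either a data frame or a single CSV file name.
read_qdata <- function(qdata, folder = getwd()) {
  if (is.data.frame(qdata)) {
    return(qdata)                                   # already loaded: use as-is
  }
  if (is.character(qdata) && length(qdata) == 1) {
    path <- if (file.exists(qdata)) qdata else file.path(folder, qdata)
    return(read.csv(path, stringsAsFactors = FALSE))
  }
  stop("qdata must be a data frame or a single CSV file name")
}

# Usage with a small made-up data set written to a temporary folder.
demo <- data.frame(ID = 1:3, ELEV250 = c(1600, 2100, 2900), PINYON = c(0, 12, 35))
tmp  <- tempdir()
write.csv(demo, file.path(tmp, "demo.csv"), row.names = FALSE)
str(read_qdata("demo.csv", folder = tmp))
```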

Define the output folder.

R> folder <- getwd()

Split the data into training and test sets. In example 1, an independent test set is used for
model validation diagnostics. The function get.test() randomly divides the original data into
training and test sets. This function writes the training and test sets to the folder specified by
folder, under the file names specified by qdata.trainfn and qdata.testfn. If the arguments
qdata.trainfn and qdata.testfn are not included, file names will be generated by appending
"_train" and "_test" to qdatafn.

R> get.test( proportion.test=0.2,
             qdatafn=qdatafn,
             seed=42,
             folder=folder,
             qdata.trainfn=qdata.trainfn,
             qdata.testfn=qdata.testfn)
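For readers curious what such a split involves, here is a minimal base-R sketch of a random hold-out split. split_train_test() is a hypothetical helper, not ModelMap code: get.test() may sample differently, so the same seed is not expected to reproduce its files, and unlike get.test() this sketch returns data frames rather than writing CSV files.

```r
# Hypothetical helper: randomly hold out a proportion of rows as a test set.
split_train_test <- function(qdata, proportion.test = 0.2, seed = 42) {
  set.seed(seed)                                    # reproducible draw
  n_test  <- round(nrow(qdata) * proportion.test)   # size of the hold-out set
  is_test <- seq_len(nrow(qdata)) %in% sample(nrow(qdata), n_test)
  list(train = qdata[!is_test, , drop = FALSE],
       test  = qdata[is_test, , drop = FALSE])
}

# Usage with made-up data (the real input would be VModelMapData.csv).
qdata <- data.frame(ID = 1:50, PINYON = runif(50, 0, 60))
parts <- split_train_test(qdata, proportion.test = 0.2, seed = 42)
c(train = nrow(parts$train), test = nrow(parts$test))
```

Using a logical index rather than negative row numbers keeps the split correct even when the test set rounds to zero rows.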

Define the predictors and define which predictors are categorical. These examples use all six
predictors: five continuous predictors (the four MODIS layers plus the topographic elevation layer)
and one categorical predictor, the NLCD land cover layer, which is therefore listed in predFactor.

R> predList <- c( "ELEV250",
                  "NLCD01_250",
                  "EVI2005097",
                  "NDV2005097",
                  "NIR2005097",
                  "RED2005097")
R> predFactor <- c("NLCD01_250")

[Figure: "Explore Predictor Correlation", a lower-triangle correlation plot of NIR2005097, RED2005097, ELEV250, NDV2005097 and EVI2005097, with a color scale from -1 to 1]

Figure 1: Correlation of continuous predictor variables.

Define the column that contains unique identifiers for each data point. These identifiers will be
used to label the output file of observed and predicted values when running model validation.

R> unique.rowname <- "ID"

Define raster look up table.

R> rastLUTfn <- "VModelMapData_LUT.csv"

R> rastLUTfn <- read.table( rastLUTfn,
header=FALSE,
sep=",",
stringsAsFactors=FALSE)
R> rastLUTfn[,1] <- paste(folder,rastLUTfn[,1],sep="/")

4 Predictor Correlation
The model.explore function uses the corrplot package (Wei, 2013) to produce a graphical correlation
matrix for all the continuous (non-factored) predictor variables (Figure 1).
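The matrix behind such a plot can be computed with base R's cor() once the factored predictors are dropped, since correlation is only defined for the continuous ones. A minimal sketch on simulated data (the values are random and unrelated to Figure 1):

```r
set.seed(1)
n <- 200
# Simulated predictors; NLCD01_250 is a factor and must be excluded.
preds <- data.frame(
  ELEV250    = runif(n, 1500, 3500),
  NIR2005097 = rnorm(n, 2000, 400),
  NLCD01_250 = factor(sample(c(30, 42, 50, 70, 80), n, replace = TRUE))
)
# Make RED strongly correlated with NIR, as adjacent bands often are.
preds$RED2005097 <- 0.9 * preds$NIR2005097 + rnorm(n, 0, 150)

predFactor <- c("NLCD01_250")
continuous <- setdiff(names(preds), predFactor)   # keep non-factored predictors
cor_mat <- cor(preds[, continuous], use = "pairwise.complete.obs")
round(cor_mat, 2)
# corrplot::corrplot(cor_mat) would draw it, if corrplot is installed.
```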

5 Example 1 - Continuous Response
We will start by looking at Pinyon Percent Cover as the response variable.
To produce exploratory plots for all predictor variables and a given response variable, call
model.explore. The argument col.ramp sets the color ramp used to map continuous predictors,
while the argument col.cat sets the colors used to map categorical predictors. Note that you can
only specify one color ramp and one set of category colors each time you run model.explore.
However, you can run model.explore several times on different subsets of your predictor variables.
For example, we could run it once with col.ramp=terrain.colors(101) and predList="ELEV250",
then run it again with a different color ramp for the remote sensing predictor variables.

R> model.explore( qdata.trainfn=qdata.trainfn,
                  folder=folder,
                  predList=predList,
                  predFactor=predFactor,
                  OUTPUTfn="PinyonCover",
                  response.name="PINYON",
                  response.type="continuous",
                  unique.rowname=unique.rowname,
                  device.type=c("png"),
                  #cex=1.2,
                  # Raster arguments
                  rastLUTfn=rastLUTfn,
                  na.value=-9999,
                  # colors for continuous predictors
                  col.ramp=terrain.colors(101),
                  # colors for categorical predictors
                  col.cat=c("wheat1","springgreen2","darkolivegreen4",
                            "darkolivegreen2","yellow","thistle2",
                            "brown2","brown4")
)

5.1 Example 1 - Continuous Predictor - Continuous Response

As an example of a continuous spatial predictor raster, let's look at elevation. The sub-region of
the study area used in this vignette contains part of a small mountain range surrounded by plains,
and lies along the diagonal southeast edge of Nevada.
It is possible to produce a basic map of elevation (Figure 2) from the raster package alone:

R> library(raster)
R> elevfn <- paste(folder,"/VModelMapData_dem_ELEVM_250.img",sep="")
R> mapgrid <- raster(elevfn)
R> opar <- par(mar=c(4,4,3,6),xpd=NA,mgp=c(3, 2, .3))
R> col.ramp<-terrain.colors(101)
R> zlim <- c(1500,maxValue(mapgrid))
R> # Legend: 500 m steps from 3500 m down to 1500 m,
R> # with colors taken from matching positions along the ramp
R> legend.label<-paste((7:3)*500,"m")
R> legend.colors<-col.ramp[c(100,75,50,25,1)]
R> image( mapgrid,
          col = col.ramp,
          xlab="", ylab="",
          zlim=zlim,
          asp=1, bty="n", main="")
R> legend( x=xmax(mapgrid),y=ymax(mapgrid),
           legend=legend.label,
           fill=legend.colors,
           bty="n",
           cex=1.2)
R> mtext("Elevation of Study Region",side=3,line=1,cex=1.5)
R> par(opar)
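In the elevation map code above, the legend colors are picked by hand at ramp positions 100, 75, 50, 25 and 1. A small sketch of computing the position from the value instead, assuming the ramp is stretched over zlim in equal-width bins (the default binning of graphics::image()) and using 3461 m, the raster maximum reported below, as a stand-in for maxValue(mapgrid). ramp_index() is a hypothetical helper.

```r
# Hypothetical helper: position of each value on an n-color ramp spread
# evenly over zlim. Values outside zlim are clamped to the end colors
# (image() itself would leave such pixels blank).
ramp_index <- function(x, zlim, n = 101) {
  breaks <- seq(zlim[1], zlim[2], length.out = n + 1)  # n equal-width bins
  findInterval(x, breaks, all.inside = TRUE)           # result in 1..n
}

col.ramp      <- terrain.colors(101)
zlim          <- c(1500, 3461)          # assumed stand-in for maxValue(mapgrid)
legend.values <- (7:3) * 500            # 3500, 3000, 2500, 2000, 1500
legend.colors <- col.ramp[ramp_index(legend.values, zlim)]
ramp_index(legend.values, zlim)
```

With these limits the computed positions are 101, 78, 52, 26 and 1, close to the hand-picked ones.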

However, this map doesn’t tell us anything about how this predictor raster relates to the elevations
of the training data.
The model.explore function illustrates graphically the relationships between the training data
and the predictor rasters. The map produced by model.explore masks out all pixels with NA
values in light gray, and predictor values outside the range of the training data in black.
In this vignette, the light gray triangle at the southeast edge of all the maps represents a map region
that lies outside of Nevada, where no data were collected.
In the exploratory plot for Pinyon Percent Cover as a function of Elevation (Figure 3), the black
regions near the center of the map are where the mountain top reaches elevations higher than
any of the training data, so any model predictions for this region will be extrapolations.
The ranges given in the text below the map indicate the range of the training data and that of
the predictor raster. Here you can see that the area covered by the raster ranges from 1560m to
3461m, while the highest training plot was only at 3084m.
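The masking logic can be sketched in a few lines of base R on a vector of pixel values. This illustrates the idea rather than model.explore's actual code: NA marks missing data (light gray on the map), 1 marks a value outside the training range (black), and 0 marks a usable pixel. The plot and pixel values are made up; only the 3084 m training maximum and the 3461 m raster maximum echo the text.

```r
# Hypothetical helper: classify pixels relative to the training data range.
extrapolation_mask <- function(pixels, train_values, na.value = -9999) {
  pixels[which(pixels == na.value)] <- NA          # recode the raster's NA flag
  rng <- range(train_values, na.rm = TRUE)         # training data range
  ifelse(pixels < rng[1] | pixels > rng[2], 1, 0)  # NA pixels stay NA
}

train_elev <- c(1210, 1480, 1950, 2400, 3084)          # made-up training plots
pixels     <- c(1560, 2200, -9999, 3084, 3300, 3461)   # made-up map pixels
extrapolation_mask(pixels, train_elev)
```

Here the -9999 pixel becomes NA, the two pixels above 3084 m are flagged with 1, and the rest are 0.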
The regions of the map that lie outside the range of the training data will be different for each
predictor variable. It is up to the user to determine whether, for their particular model, it is
worth collecting additional training data to improve predictions in these regions. If it is known
that a particular species does not occur above or below a given elevation, then collecting additional
training data may not be worthwhile. For example, if one is interested in mapping Pinyon Pine,
high elevation training data is less important than if one is interested in Bristlecone Pine.
The right half of the exploratory graphic has four smaller figures.
The graphic on the upper left is a scatter plot of the response variable (Pinyon percent cover) as a
function of the predictor variable (elevation). The red line is a basic Generalized Additive Model
(GAM) of the relationship of the response variable to this predictor.
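A smooth of this kind can be fitted with gam() from the mgcv package, which ships with standard R installations as a recommended package. This is a sketch on simulated data; whether model.explore uses mgcv, or these default smoothing settings, is an assumption.

```r
library(mgcv)   # provides gam() and the smooth term s()

set.seed(7)
elev <- runif(150, 1200, 3100)
# Simulated cover that peaks at mid elevations, plus noise, floored at 0.
pinyon <- pmax(0, 40 * exp(-((elev - 2200) / 350)^2) + rnorm(150, 0, 4))

fit  <- gam(pinyon ~ s(elev))                 # one smooth term in elevation
grid <- data.frame(elev = seq(min(elev), max(elev), length.out = 100))
grid$fit <- as.numeric(predict(fit, newdata = grid))

plot(elev, pinyon, xlab = "ELEV250", ylab = "PINYON (% cover)")
lines(grid$elev, grid$fit, col = "red", lwd = 2)   # red GAM curve
```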
The upper right graphic has box plots of the predictor variable over the training data and the
raster pixels. Here you can see that the subsection of the study area used for this vignette lacks
the lower elevations present in the training data. This is because the training data were collected
over the entire state of Nevada, while the raster used for the vignette covers only a tiny section of
the state.
The graphic on the bottom left is a histogram of the number of training plots by elevation, while
the bottom right is a similar histogram of number of map pixels by elevation. The colors in these
histograms correspond to the map colors. Again, you can see that the raster we are using is missing
any low elevation pixels.
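Comparing the two histograms bin by bin requires common breaks. A minimal sketch with base R's hist(plot = FALSE), using made-up elevations loosely shaped like the ranges quoted above and an arbitrary 250 m bin width:

```r
set.seed(3)
train_elev <- runif(300, 1200, 3084)    # made-up training plot elevations
pixel_elev <- runif(5000, 1560, 3461)   # made-up raster pixel elevations

# Shared breaks covering both samples, so the bins line up.
breaks <- seq(1000, 3500, by = 250)
train_counts <- hist(train_elev, breaks = breaks, plot = FALSE)$counts
pixel_counts <- hist(pixel_elev, breaks = breaks, plot = FALSE)$counts

data.frame(bin_start = head(breaks, -1), plots = train_counts, pixels = pixel_counts)
```

Bins with training plots but no pixels (here, those below 1500 m) show the same gap as the vignette's raster.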

[Figure: map titled "Elevation of Study Region", with an elevation legend from 1500 m to 3500 m in 500 m steps and projected x and y coordinates on the axes]

Figure 2: Elevation of study region. Projection: Universal Transverse Mercator (UTM) Zone 11,
Datum: NAD83

5.2 Example 1 - Categorical Predictor - Continuous Response
Next we will look at an example of an exploratory plot for a categorical predictor while keeping
Pinyon Percent Cover as our continuous response. In this case, we will look at the National Land
Cover Data (NLCD) as a predictor (Figure 4).
With categorical predictors, the map masks out categories not found in the training data, and
marks these categories in the map legend with asterisks. In this case, no training data were
collected in NLCD classes 41 (deciduous forest) and 43 (mixed forest). The user can then decide
to collect additional training data in these classes, to collapse classes (for example, combine
deciduous and mixed forests with evergreen forests to create a single "Forest" class), or to leave
the original classes and accept that these pixels will be predicted as "NA".
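Both checks can be sketched in base R: finding map categories absent from the training data, flagging them with an asterisk for a legend, and collapsing classes. The class codes come from the text; the plot and pixel vectors are made up, and merging into code 42 simply follows the "Forest" example above.

```r
train_nlcd <- c(30, 42, 42, 50, 50, 50, 70, 80)       # made-up training plots
map_nlcd   <- c(30, 41, 42, 43, 50, 70, 80, 42, 41)   # made-up map pixels

# Map categories with no training plots.
legend_cats  <- sort(unique(map_nlcd))
missing_cats <- setdiff(legend_cats, unique(train_nlcd))

# Legend labels with an asterisk on unsampled categories.
legend_labels <- ifelse(legend_cats %in% missing_cats,
                        paste0(legend_cats, "*"), as.character(legend_cats))

# Option: collapse deciduous (41) and mixed (43) forest into evergreen (42).
collapsed <- ifelse(map_nlcd %in% c(41, 43), 42, map_nlcd)
legend_labels
```

After collapsing, every map category has training plots, at the cost of a coarser forest class.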
With categorical predictors, the right side of the exploratory plot gives three bar charts. The top
one shows the mean response value for each category in the training data. Here we can see that
the only NLCD class present in the training data with any appreciable Pinyon cover is class 42
(evergreen forest).
The middle graphic is a histogram of the number of training plots per NLCD class, while the
bottom graphic shows the number of map pixels per NLCD class.
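The top two bar charts reduce to a per-category mean of the response and a per-category count of plots. With made-up plot data, tapply() and table() give both:

```r
nlcd   <- factor(c(30, 42, 42, 50, 50, 50, 70, 80))   # made-up training plots
pinyon <- c(0, 35, 20, 2, 0, 1, 0, 0)                 # made-up percent cover

mean_cover <- tapply(pinyon, nlcd, mean)   # top chart: mean response per class
n_plots    <- table(nlcd)                  # middle chart: plots per class
barplot(mean_cover, xlab = "NLCD01_250", ylab = "Mean PINYON cover (%)")
```

The bottom chart is the same kind of count, taken over map pixels instead of plots.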


Figure 3: Exploratory plot for continuous predictor (elevation) and continuous response (Percent Cover of Pinyon). Light gray pixels in the map
represent NA values in the predictors, while black pixels have values outside the range of the training data. The light gray triangle at the southeast
edge represents a map region that lies outside of Nevada, and therefore no data was collected, while the black regions near the center of the map are
where the mountain top reaches elevations higher than any of the training data, therefore any model predictions for this region will be extrapolations.

Figure 4: Exploratory plot for factored (categorical) predictor NLCD and continuous response (Percent Cover of Pinyon). Light gray pixels in the
map represent NA values in the predictors, while black pixels have predictor categories not found in the training data. Categories not found in the
training data are also marked with asterisks in the map legend. Here NLCD classes 41 (deciduous forest) and 43 (mixed forest) were not included
in the training data, and are therefore masked out of the exploratory map.
6 Example 2 - Binary Response
Next we will look at Sage Presence-Absence as a binary response value, and a remote sensing band
as a continuous predictor. We will use col.ramp=heat.colors(101) for the map color ramp.

R> model.explore( qdata.trainfn=qdata.trainfn,
                  folder=folder,
                  predList=predList,
                  predFactor=predFactor,
                  OUTPUTfn="SagePresence",
                  response.name="SAGE",
                  response.type="binary",
                  unique.rowname=unique.rowname,
                  device.type=c("png"),
                  #cex=1.2,
                  # Raster arguments
                  rastLUTfn=rastLUTfn,
                  na.value=-9999,
                  # colors for continuous predictors
                  col.ramp=heat.colors(101),
                  # colors for categorical predictors
                  col.cat=c("wheat1","springgreen2","darkolivegreen4",
                            "darkolivegreen2","yellow","thistle2",
                            "brown2","brown4")
)

6.1 Example 2 - Continuous Predictor - Binary Response

In the exploratory map (Figure 5) you can see a few small areas on the map where the values of
the near-infrared (NIR2005097) remote sensing band are higher or lower than the values found
in the training data.
On the right side, the upper left graphic shows the relative proportions of Sage Presence and
Sage Absence in the training data as a function of NIR2005097. It shows that training plots
with NIR2005097 values between roughly 3250 and 5250 had the highest proportion of sage
presence.
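The upper left panel can be approximated by binning the predictor and taking the share of presences and absences within each bin. A sketch with simulated data: the 3250 to 5250 window from the text only shapes the fake presence probabilities, and the 500-unit bin width is an arbitrary choice, not necessarily what model.explore uses.

```r
set.seed(11)
nir <- runif(400, 1000, 7000)
# Simulated presence/absence: more likely inside the 3250-5250 window.
p    <- ifelse(nir > 3250 & nir < 5250, 0.7, 0.2)
sage <- rbinom(400, 1, p)

bins  <- cut(nir, breaks = seq(1000, 7000, by = 500), include.lowest = TRUE)
props <- prop.table(table(bins, sage), margin = 1)   # each row sums to 1
round(props, 2)
```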
The box plots in the upper right show that the median values of NIR2005097 are quite similar
between the training data and the raster pixels, though the raster pixels have a slightly greater
spread.
Note that the box plot for the training data shows outliers, while the box plot for the raster does
not. This is due to the method used to produce the plots, rather than a lack of outliers in the
raster. The boxplot function in the raster package does not offer the option of showing outliers.
The raster package uses sampling on large rasters when making the box plot, to keep memory
usage reasonable, but this means it does not check every single pixel to locate all outliers.
Again, the two bottom graphics are histograms of plot and pixel frequency as a function
of the predictor variable. Here we see that training data and raster data are roughly similar
across the range of NIR2005097, with most of the training plots and raster pixels lying at values of
NIR2005097 close to 2000, though, as seen in the box plots, the raster pixels have a wider range
than the training data.

6.2 Example 2 - Categorical Predictor - Binary Response

This data set has only one categorical predictor (NLCD), so the exploratory plot is very similar
to that of Example 1 (Figure 6).
The map and the lower two graphics on the right are based purely on the predictor variable,
so changing the response variable from Pinyon Percent Cover to Sage Presence has no effect
on these portions of the graphic.
The change to a binary response variable does, however, change the upper right graphic. For binary
Presence-Absence response variables, such as Sage Presence, this graphic shows the relative
proportion of presences and absences in each predictor category of the training data. Here we
can see that Sage is relatively common in NLCD categories 42 (evergreen forest), 50 (shrubland),
and 70 (herbaceous), very rare in category 30 (barren), and almost non-existent in category 80
(cultivated). Categories 41 (deciduous forest) and 43 (mixed forest) are not found in the training
data, and are therefore not included in this graphic.


Figure 5: Exploratory plot for continuous predictor (Near Infrared - NIR2005097) and binary response (Sage Presence-Absence). Light gray pixels in
the map represent NA values in the predictors and black pixels are predictor values outside the range of the training data. The light gray triangle
at the southeast edge represents a map region that lies outside of Nevada, where no data were collected, while the black regions are where the
map pixels have values higher or lower than those of the training data, so any model predictions for these regions will be extrapolations.

Figure 6: Exploratory plot for factored (categorical) predictor NLCD and binary response (Sage Presence-Absence). Light gray pixels in the map
represent NA values in the predictors and black pixels are predictor categories not found in the training data. Categories not found in
the training data are also marked with asterisks in the map legend. Here NLCD classes 41 (deciduous forest) and 43 (mixed forest) were not included
in the training data, and are therefore masked out of the exploratory map.
7 Example 3 - Categorical Response
Example 3 explores a categorical response, vegetation category. The response variable
consists of four categories: TREE, SHRUB, OTHERVEG, and NONVEG.

R> model.explore( qdata.trainfn=qdata.trainfn,
                  folder=folder,
                  predList=predList,
                  predFactor=predFactor,
                  OUTPUTfn="VegCat",
                  response.name="VEGCAT",
                  response.type="categorical",
                  unique.rowname=unique.rowname,
                  device.type=c("png"),
                  #cex=1.2,
                  # Raster arguments
                  rastLUTfn=rastLUTfn,
                  na.value=-9999,
                  # colors for continuous predictors
                  col.ramp=heat.colors(101),
                  # colors for categorical predictors
                  col.cat=c("wheat1","springgreen2","darkolivegreen4",
                            "darkolivegreen2","yellow","thistle2",
                            "brown2","brown4")
)

7.1 Example 3 - Continuous Predictor - Categorical Response

In the exploratory map (Figure 7) you can see areas of the map where the values of the red
remote sensing band (RED2005097) are higher or lower than the values found in the training data.
On the right side, the upper left graphic shows the relative proportions of the response categories
(TREE, SHRUB, OTHERVEG, and NONVEG) in the training data as a function of RED2005097.
The box plots in the upper right compare the distribution of RED2005097 over the training data
and over the raster pixels.
Note again that the box plot for the training data shows outliers, while the box plot for the raster
does not. This is due to the method used to produce the plots, rather than a lack of outliers in
the raster.
Again, the two bottom graphics are histograms of plot and pixel frequency as a function of the
predictor variable.

7.2 Example 3 - Categorical Predictor - Categorical Response
This data set has only one categorical predictor (NLCD), so the exploratory plot is very similar to
those of Examples 1 and 2 (Figure 8).
The change to a categorical response variable does, however, change the upper right graphic, which
shows the relative proportion of the response categories in each predictor category of the training
data. Here we can see that in the training data Vegcat=TREE is predominantly found in NLCD
category 42 (evergreen forest). Vegcat=SHRUB is mostly found in 50 (shrubland), though there
are also some SHRUB plots in 70 (herbaceous). Vegcat=OTHERVEG dominates 70 (herbaceous)
and 80 (cultivated). Not surprisingly, most NLCD category 30 (barren) training plots are
Vegcat=NONVEG, but there are also substantial proportions of NLCD 42 (evergreen forest)
and 50 (shrubland) classified as Vegcat=NONVEG.
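The proportions in this graphic are a row-normalized cross-tabulation of NLCD class against response category. A sketch with a handful of made-up plots (not the vignette data):

```r
nlcd   <- factor(c(30, 30, 42, 42, 42, 50, 50, 70, 70, 80))
vegcat <- factor(c("NONVEG", "NONVEG", "TREE", "TREE", "NONVEG",
                   "SHRUB", "SHRUB", "OTHERVEG", "SHRUB", "OTHERVEG"),
                 levels = c("TREE", "SHRUB", "OTHERVEG", "NONVEG"))

# Share of each response category within each NLCD class (rows sum to 1).
props <- prop.table(table(nlcd, vegcat), margin = 1)
round(props, 2)
barplot(t(props), legend.text = TRUE, xlab = "NLCD01_250")  # stacked bars
```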


Figure 7: Exploratory plot for continuous predictor (Red band - RED2005097) and categorical response (Vegetation category - VEGCAT). Light
gray pixels in the map represent NA values in the predictors and black pixels are predictor values outside the range of the training data. The light
gray triangle at the southeast edge represents a map region that lies outside of Nevada, while the black regions are where the map pixels have values
higher or lower than those of the training data, so any model predictions for these regions will be extrapolations.

Figure 8: Exploratory plot for factored (categorical) predictor NLCD and categorical response (Vegetation category - VEGCAT). Light gray pixels
in the map represent NA values in the predictors and black pixels are predictor categories not found in the training data. Categories not
found in the training data are also marked with asterisks in the map legend. Here NLCD classes 41 (deciduous forest) and 43 (mixed forest) were
not included in the training data, and are therefore masked out of the exploratory map.
8 Overall Mask for All Predictors
The model.explore function outputs an overall mask raster showing pixels where any of the
predictors had NA values or values lying outside the range of the training data (Figure 9). This
can be used on the final prediction map to mask out regions where the model is extrapolating.
If you wish to save the individual masks for each predictor variable, set the argument
create.extrapolation.masks=TRUE, and a multi-layer raster will be produced with one layer for
each predictor variable in predList. The layers in this file will be in the same order as the
predictors in predList.
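Combining per-predictor masks into one overall mask is an element-wise "any" across layers. A base-R sketch on a small matrix where each column stands for one predictor layer (in predList order) and each row for one pixel, coded 0 = within the training range, 1 = outside it, NA = missing. overall_mask() is hypothetical, and letting NA take priority over "outside range" when both occur is an assumption, not documented model.explore behavior.

```r
# Rows = pixels, columns = per-predictor masks in predList order.
masks <- cbind(ELEV250    = c(0, 1, 0, 0, NA),
               NIR2005097 = c(0, 0, 0, 1, 0),
               RED2005097 = c(0, 0, NA, 1, 1))

# Hypothetical helper: 1 if any predictor is out of range, NA if any is
# missing (assumed to take priority), otherwise 0.
overall_mask <- function(masks) {
  any_na  <- apply(is.na(masks), 1, any)
  any_out <- apply(masks == 1, 1, any, na.rm = TRUE)
  out <- ifelse(any_out, 1, 0)
  out[any_na] <- NA
  out
}
overall_mask(masks)
```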

References
E. Freeman. ModelMap: An R Package for Modeling and Map Production using Random Forest
and Stochastic Gradient Boosting. USDA Forest Service, Rocky Mountain Research Station, 507
25th Street, Ogden, UT, USA, 2009. URL http://CRAN.R-project.org/.
T. S. Frescino, G. G. Moisen, K. A. Megown, V. J. Nelson, E. A. Freeman, P. L. Patterson,
M. Finco, K. Brewer, and J. Menlove. Nevada Photo-Based Inventory Pilot (NPIP) photo sampling
procedures. Gen. Tech. Rep. RMRS-GTR-222, U.S. Department of Agriculture, Forest Service,
Rocky Mountain Research Station, Fort Collins, CO, 2009.
D. Gesch, M. Oimoen, S. Greenlee, C. Nelson, M. Steuck, and D. Tyler. The National Elevation
Dataset. Photogrammetric Engineering and Remote Sensing, 68:5–11, 2002.
C. Homer, C. Huang, L. Yang, B. Wylie, and M. Coan. Development of a 2001 National Land-Cover
Database for the United States. Photogrammetric Engineering and Remote Sensing, 70:829–840,
2004.
A. Huete, K. Didan, T. Miura, E. P. Rodriguez, X. Gao, and L. G. Ferreira. Overview of the
radiometric and biophysical performance of the MODIS vegetation indices. Remote Sensing of
Environment, 83:195–213, 2002.
C. O. Justice, J. R. G. Townshend, E. F. Vermote, E. Masuoka, R. E. Wolfe, N. Saleous, D. P. Roy,
and J. T. Morisette. An overview of MODIS land data processing and product status. Remote
Sensing of Environment, 83:3–15, 2002.
R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation
for Statistical Computing, Vienna, Austria, 2008. URL http://www.R-project.org.
ISBN 3-900051-07-0.
T. Wei. corrplot: Visualization of a correlation matrix, 2013. URL
http://CRAN.R-project.org/package=corrplot. R package version 0.73.

Figure 9: Map showing all pixels where at least one of the predictor variables is NA (light gray)
or is outside the range of the training data (black).

No ratings yet
Exercises PDF
33 pages
Estimating Multinomial Logit Models in R
No ratings yet
Estimating Multinomial Logit Models in R
73 pages
Combining Multiple Imputations: Thomas Lumley April 26, 2019
No ratings yet
Combining Multiple Imputations: Thomas Lumley April 26, 2019
5 pages
Spatial Statistics Math Exercises
No ratings yet
Spatial Statistics Math Exercises
2 pages
Applied Econometrics With R: Package Vignette and Errata: Christian Kleiber Achim Zeileis
No ratings yet
Applied Econometrics With R: Package Vignette and Errata: Christian Kleiber Achim Zeileis
6 pages
Adephylo: Exploratory Analyses For The Phylogenetic Comparative Method
No ratings yet
Adephylo: Exploratory Analyses For The Phylogenetic Comparative Method
20 pages
R Stat
No ratings yet
R Stat
88 pages
Journal of Statistical Software: Spacetime: Spatio-Temporal Data in R
No ratings yet
Journal of Statistical Software: Spacetime: Spatio-Temporal Data in R
30 pages
Spatial Statistics Course Syllabus: Course Scope and Purpose
No ratings yet
Spatial Statistics Course Syllabus: Course Scope and Purpose
4 pages
Ross Undergraduate Sample Business Case Essay
100% (3)
Ross Undergraduate Sample Business Case Essay
9 pages
Sociology of Global Inequality
No ratings yet
Sociology of Global Inequality
14 pages
MSC 4 Sem Mathematics Partial Differential Equations Paper 2 Summer 2018
No ratings yet
MSC 4 Sem Mathematics Partial Differential Equations Paper 2 Summer 2018
2 pages
Phase Equilibrium in Gaseous States
No ratings yet
Phase Equilibrium in Gaseous States
51 pages
3.3.3 Wave Overtopping: - Wave-Overtopping of Breakwater in A Flume Test
No ratings yet
3.3.3 Wave Overtopping: - Wave-Overtopping of Breakwater in A Flume Test
9 pages
Ragnarok X
No ratings yet
Ragnarok X
122 pages
Statistics Project Sleep Study 20pages
No ratings yet
Statistics Project Sleep Study 20pages
20 pages
Business Stats: Dispersion Analysis
No ratings yet
Business Stats: Dispersion Analysis
24 pages
Neckerman and Torche 2007 (不平等：原因与后果)
No ratings yet
Neckerman and Torche 2007 (不平等：原因与后果)
24 pages
BSBRSK 501
100% (1)
BSBRSK 501
12 pages
Bessel Differential Equation
No ratings yet
Bessel Differential Equation
3 pages
Understanding Seasonality & Holt-Winters Forecasting
No ratings yet
Understanding Seasonality & Holt-Winters Forecasting
44 pages
Stream Test
No ratings yet
Stream Test
1 page
Ignatius Loyola's Educational Impact
No ratings yet
Ignatius Loyola's Educational Impact
3 pages
ECOFOR For Cement Mill - Intensification of Dry Grinding Processes by Using The ECOFOR' Device
No ratings yet
ECOFOR For Cement Mill - Intensification of Dry Grinding Processes by Using The ECOFOR' Device
4 pages
IELTS Speaking Topics: Food, Sports, Family
No ratings yet
IELTS Speaking Topics: Food, Sports, Family
51 pages
Methods For Chemical Analysis of Soils
No ratings yet
Methods For Chemical Analysis of Soils
102 pages
Concave Mirror Ray Diagram Cases
No ratings yet
Concave Mirror Ray Diagram Cases
3 pages
DM Plan Sapahar Upazila Noagaon District - English Version-2014
No ratings yet
DM Plan Sapahar Upazila Noagaon District - English Version-2014
107 pages
Syllabus Sem 6 Ocr
No ratings yet
Syllabus Sem 6 Ocr
14 pages
Hicks, J. Posthumanism in The Novels of Kurt Vonnegut
100% (1)
Hicks, J. Posthumanism in The Novels of Kurt Vonnegut
251 pages
Merino Restroom Solution (E-Brochure) - Compressed
No ratings yet
Merino Restroom Solution (E-Brochure) - Compressed
70 pages
Vasquez Alexis 3e Inquiry-Based Lesson Plan
No ratings yet
Vasquez Alexis 3e Inquiry-Based Lesson Plan
3 pages
Architectural Project
No ratings yet
Architectural Project
19 pages
Communication Barriers Guide
No ratings yet
Communication Barriers Guide
44 pages
10m Walking Test
No ratings yet
10m Walking Test
2 pages
Week 1
No ratings yet
Week 1
48 pages
MPPSC 2022 Mathematical Sciences Syllabus
No ratings yet
MPPSC 2022 Mathematical Sciences Syllabus
4 pages
Belz Mail - Provide The Quotation of Lab Instruments For Calibration With Site Visit Charges
No ratings yet
Belz Mail - Provide The Quotation of Lab Instruments For Calibration With Site Visit Charges
2 pages
Laser: Amplification by Stimulated Emission of Radiation"
No ratings yet
Laser: Amplification by Stimulated Emission of Radiation"
28 pages


Exploratory Data Analysis with ModelMap

Elizabeth A. Freeman, Tracey S. Frescino, Gretchen G. Moisen

September 10, 2018

2 Exploratory Data Analysis

2.1 Example dataset

Table 1: Predictor variables

2.2 Spatial Raster Layers

2.3 Raster Look Up Table

2.4 Device Types for Graphical Output

Next define some of the arguments.

R> qdatafn <- "VModelMapData.csv"

Define the output folder.

R> folder <- getwd()

R> get.test( proportion.test=0.2,
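The get.test() call divides the data in qdatafn at random into a training set and a test set, with proportion.test=0.2 holding out about 20% of the rows for testing. The base-R sketch below shows the same idea on an in-memory data frame. It is an illustration under stated assumptions, not ModelMap's code: split_holdout() is a hypothetical helper, and the toy data frame stands in for VModelMapData.csv.

```r
# Hypothetical helper (not part of ModelMap): randomly hold out a share of
# rows as a test set, similar in spirit to get.test(proportion.test = 0.2).
split_holdout <- function(qdata, proportion.test = 0.2, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)                # optional reproducibility
  n <- nrow(qdata)
  n.test <- round(n * proportion.test)              # number of test rows
  is.test <- seq_len(n) %in% sample.int(n, n.test)  # logical mask of test rows
  list(train = qdata[!is.test, , drop = FALSE],
       test  = qdata[is.test, , drop = FALSE])
}

# Toy stand-in for VModelMapData.csv (values are made up).
qdata <- data.frame(ID = 1:50, ELEV250 = seq(1200, 2180, by = 20))
parts <- split_holdout(qdata, proportion.test = 0.2, seed = 42)
nrow(parts$train)  # 40
nrow(parts$test)   # 10
```

A logical mask is used instead of negative indexing so that a zero-size test set still leaves every row in the training set.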

R> predList <- c( "ELEV250",

Figure 1: Correlation of continuous predictor variables, including ELEV250, EVI2005097 and RED2005097.
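Figure 1 summarizes pairwise correlations among the continuous predictors. A numeric version of that kind of summary can be computed with base R's cor(). In the sketch below the predictor values are invented, Pearson correlation is assumed, and the 0.8 cutoff for flagging strongly correlated pairs is an arbitrary choice for illustration.

```r
# Made-up predictor values; in practice these would be the continuous
# predictor columns of the training data.
preds <- data.frame(
  ELEV250    = c(1500, 1620, 1710, 1850, 1990, 2100),
  RED2005097 = c(0.21, 0.19, 0.18, 0.15, 0.12, 0.10),
  EVI2005097 = c(0.10, 0.14, 0.15, 0.22, 0.25, 0.31)
)

# Pairwise Pearson correlations; "pairwise.complete.obs" tolerates
# missing values in individual columns.
cmat <- cor(preds, use = "pairwise.complete.obs")
round(cmat, 2)

# Flag strongly correlated pairs (|r| > 0.8, an arbitrary cutoff),
# looking only at the upper triangle so each pair appears once.
idx <- which(abs(cmat) > 0.8 & upper.tri(cmat), arr.ind = TRUE)
data.frame(var1 = rownames(cmat)[idx[, 1]],
           var2 = colnames(cmat)[idx[, 2]],
           r    = round(cmat[idx], 2))
```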

R> unique.rowname <- "ID"

Define raster look up table.

R> rastLUTfn <- "VModelMapData_LUT.csv"
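The raster look up table links each predictor name used in the training data to the raster file and band that hold that predictor. The sketch below assumes a three-column layout (raster file path, predictor name, band number); that layout should be checked against the ModelMap help pages. The MODIS file name and the band numbers are hypothetical. The sketch builds the table in memory and confirms that every entry of a predictor list has a matching row.

```r
# Hypothetical look up table: raster path, predictor name, band number.
# The MODIS file name and band numbers are made up; the elevation file
# name is taken from the text.
folder <- getwd()
rastLUT <- data.frame(
  file = file.path(folder, c("VModelMapData_dem_ELEVM_250.img",
                             "modis_2005097.img", "modis_2005097.img")),
  name = c("ELEV250", "RED2005097", "EVI2005097"),
  band = c(1L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Every predictor used for modelling needs a row in the table.
predList <- c("ELEV250", "RED2005097", "EVI2005097")
missing.preds <- setdiff(predList, rastLUT$name)
if (length(missing.preds) > 0) {
  stop("No raster entry for: ", paste(missing.preds, collapse = ", "))
}

# Look up the file and band for one predictor.
rastLUT[rastLUT$name == "EVI2005097", c("file", "band")]
```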

R> model.explore( qdata.trainfn=qdata.trainfn,

# colors for continuous predictors

#colors for categorical predictors

5.1 Example 1 - Continuous Predictor - Continuous Response

R> elevfn <- paste(folder,"/VModelMapData_dem_ELEVM_250.img",sep="")

R> opar <- par(mar=c(4,4,3,6),xpd=NA,mgp=c(3, 2, .3))
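The central check behind model.explore() is whether a map cell's predictor value lies inside the range spanned by the training data. The sketch below applies that comparison to plain numeric vectors rather than to the raster named by elevfn. elev.train and elev.cells are made-up values, and treating the range endpoints as inside is an assumption.

```r
# Made-up elevation values (m): training plots and map cells.
# In practice these would come from the training data and the raster in elevfn.
elev.train <- c(1450, 1600, 1725, 1880, 2040)
elev.cells <- c(1300, 1500, 1990, 2200, NA)

# Range covered by the training data (endpoints count as inside).
train.range <- range(elev.train, na.rm = TRUE)

# Classify each cell; NA cells stay NA.
status <- ifelse(elev.cells < train.range[1], "below",
          ifelse(elev.cells > train.range[2], "above", "within"))
print(status)  # "below" "within" "within" "above" NA

# Share of non-missing cells that fall outside the training range.
mean(status != "within", na.rm = TRUE)  # 0.5
```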


R> model.explore( qdata.trainfn=qdata.trainfn,

# colors for continuous predictors

#colors for categorical predictors

6.1 Example 2 - Continuous Predictor - Binary Response
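For a binary response, one simple way to see how training data are distributed over a continuous predictor is to bin the predictor and count presences and absences in each bin. This is a generic illustration, not a reproduction of the model.explore() graphics. The data and the choice of four equal-width bins are assumptions.

```r
# Made-up training data: elevation (m) and presence (1) / absence (0).
elev <- c(1420, 1510, 1580, 1650, 1730, 1800, 1890, 1960, 2050, 2120)
pres <- c(0,    0,    0,    1,    0,    1,    1,    1,    1,    0)

# Four equal-width bins spanning the training range.
bins <- cut(elev, breaks = 4, include.lowest = TRUE)

# Presence/absence counts per bin, and the proportion present per bin.
counts <- table(bins, pres)
counts
prop.present <- tapply(pres, bins, mean)
round(prop.present, 2)
```

Bins with few observations give unstable proportions, so the counts table is worth reading alongside the proportions.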

6.2 Example 2 - Categorical Predictor - Binary Response

R> model.explore( qdata.trainfn=qdata.trainfn,

# colors for continuous predictors

#colors for categorical predictors

7.1 Example 3 - Continuous Predictor - Categorical Response
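For a categorical response, a quick numeric counterpart is the range of the continuous predictor within each response class, which shows where along the predictor each class has training data. The class labels A, B and C and the elevation values below are invented.

```r
# Made-up training data: elevation (m) and a categorical response class.
elev  <- c(1400, 1480, 1550, 1700, 1760, 1820, 1950, 2010, 2100)
cover <- factor(c("A", "A", "A", "B", "B", "B", "C", "C", "C"))

# Minimum and maximum elevation with training data, per response class.
elev.by.class <- sapply(split(elev, cover), range)
rownames(elev.by.class) <- c("min", "max")
elev.by.class

# Number of training points in each class.
table(cover)
```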
