High-Resolution Mapping of Forest Canopy Height Using Machine Learning by
ARTICLE INFO
Keywords: Forest canopy height; ICESat-2; Sentinel-1; Sentinel-2; Landsat-8; Machine-learning; Deep-learning; Random forest
ABSTRACT
Forest canopy height is an important indicator of forest carbon storage, productivity, and biodiversity. The present study is a first attempt to develop a machine-learning workflow to map the spatial pattern of forest canopy height in a mountainous region in northeast China by coupling the recently available canopy height (Hcanopy) footprint product from ICESat-2 with Sentinel-1 and Sentinel-2 satellite data. The ICESat-2 Hcanopy was first validated against high-resolution canopy height from airborne LiDAR data at different spatial scales. Performance comparisons were conducted between two machine-learning models, a deep learning (DL) model and a random forest (RF) model, and between the Sentinel and Landsat-8 satellites. Results showed that the ICESat-2 Hcanopy had the highest correlation with the airborne LiDAR canopy height at a spatial scale of 250 m, with a Pearson’s correlation coefficient (R) of 0.82 and a mean bias of -1.46 m, providing important evidence for the reliability of the ICESat-2 vegetation height product in China’s forests. Both the DL and RF models achieved satisfactory accuracy in upscaling ICESat-2 Hcanopy with Sentinel satellite co-variables, with an R-value between the observed and predicted Hcanopy of 0.78 and 0.68, respectively. Compared to the Sentinel satellites, Landsat-8 showed relatively weaker performance in Hcanopy prediction, suggesting that the addition of the backscattering coefficients from Sentinel-1 and the red-edge related variables from Sentinel-2 could contribute positively to the prediction of forest canopy height. To our knowledge, few studies have demonstrated large-scale vegetation height mapping at a resolution ≤ 250 m based on the newly available satellites (ICESat-2, Sentinel-1 and Sentinel-2) and a DL regression model, particularly in the forest areas of China. Thus, the present work provides a timely and important supplement to the applications of these new earth observation tools.
1. Introduction
As both a product and driver of ecosystem processes, forest canopy height plays a vital role in biomass allocation, carbon storage, forest productivity, and biodiversity (Lefsky et al., 2002; Simard et al., 2011; Zhang et al., 2016), and in predicting tree mortality risk and rate during drought periods (Stovall et al., 2019). Accurate estimates of forest canopy height are key to assessing biophysical parameters that are closely associated with canopy height, such as above-ground biomass, carbon stock density, vertical complexity of the canopy, and habitat quality (Alexander et al., 2018; Asner et al., 2010; Goetz et al., 2010; Hill and Hinsley, 2015; Li et al., 2015). Multi-temporal monitoring of large-scale forest canopy height is critical for assessing forest disturbance regimes and their dynamics as well as the associated deforestation and degradation (Pourshamsi et al., 2018), providing important references for policymakers.
As a powerful earth observation approach, remote sensing has been increasingly used in the large-scale mapping of forest canopy height in
⁎ Corresponding author at: State Key Laboratory of Remote Sensing Science, Aerospace Information Research Institute, Chinese Academy of Sciences, P.O. Box 9718, No. 20 Datun Road, Olympic Science & Technology Park of CAS, Beijing 100101, China.
⁎⁎ Corresponding author at: State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, 11A, Datun Road, Chaoyang District, Beijing, 100101, China.
E-mail addresses: [email protected] (W. Li), [email protected] (R. Shang).
https://doi.org/10.1016/j.jag.2020.102163
Received 6 February 2020; Received in revised form 19 April 2020; Accepted 25 May 2020
Available online 06 June 2020
1569-8432/ © 2020 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/BY-NC-ND/4.0/).
W. Li, et al. Int J Appl Earth Obs Geoinformation 92 (2020) 102163
the past decades in various platforms including synthetic aperture radar (e.g., TanDEM-X and TerraSAR-X) (Garestier et al., 2007; Kugler et al., 2014), oblique photography from satellites and drones (Li et al., 2016; St-Onge et al., 2015), as well as light detection and ranging (LiDAR) or laser scanning (Lefsky et al., 2005). Among these platforms, LiDAR has been recognized as the most powerful tool for mapping large-scale forest canopy height because it directly observes forest canopy structure in the vertical plane. This has been demonstrated by numerous studies that used LiDAR from different platforms, including terrestrial laser scanning (TLS) (Liang et al., 2016), airborne laser scanning (ALS) (Alexander et al., 2018; Wulder et al., 2012), drone LiDAR (Almeida et al., 2019; Sankey et al., 2017, 2018), as well as the first satellite LiDAR, GLAS (the Geoscience Laser Altimeter System) (Lefsky et al., 2005; Pang et al., 2011; Simard et al., 2011). Compared to satellite LiDAR, TLS and ALS provide more detailed measurements of canopies with higher ranging precision, but their small spatial coverage makes them hard to apply on their own to very large forested areas. Satellite LiDAR is more advantageous for regional and global-scale estimates of forest canopy height when it is assisted by other satellite data and associated environmental products (Lefsky, 2010; Simard et al., 2011; Wang et al., 2016b). The recently launched LiDAR satellite ICESat-2 (Ice, Cloud, and land Elevation Satellite-2, launched on September 15, 2018) from the National Aeronautics and Space Administration of the United States and the high-resolution Sentinel satellites (including Sentinel-1 and Sentinel-2) from the European Space Agency further broaden the global-scale monitoring of forest ecosystem dynamics and functioning, including the estimation of forest canopy height (Lang et al., 2019; Liu et al., 2019; Neuenschwander and Pitts, 2019). ICESat-2 extends the LiDAR observation of the planet begun by its predecessor ICESat-1, which ended its mission around 2009, providing critical and continuous earth observation to improve our understanding of global carbon stocks, ecosystem dynamics, and changes in ecosystem functioning (Neuenschwander and Pitts, 2019; Neuenschwander and Magruder, 2019). A previous study showed that the vegetation product (ATL08) from ICESat-2 achieved high accuracy in canopy height retrieval (< 3.2 m) in a vegetated region of Finland (Neuenschwander and Magruder, 2019). However, few studies have explored the performance of the newly available ICESat-2 vegetation products in estimating the canopy height of forests in China, how well they correlate with airborne LiDAR canopy height, and at what spatial scale that correlation is highest.
Upscaling satellite or airborne LiDAR-derived vegetation attributes (including canopy height, canopy cover, and above-ground biomass) from footprint and plot level to regional and global level using spatially contiguous satellite imagery has become the mainstream approach in large-scale mapping of vegetation attributes (Hudak et al., 2002; Lang et al., 2019; Li et al., 2015; Su et al., 2016; Wang et al., 2016a; Zald et al., 2016). The upscaling procedure relies heavily on a satisfactory correlation between the LiDAR-derived vegetation attributes and the satellite-derived co-variables, including spectral reflectance, vegetation indices, texture variables, and other environmental factors. With the growing availability of the new ICESat-2 LiDAR data together with high-resolution satellite data (Sentinel-1 & 2), it is becoming possible to map vegetation canopy height at a higher spatial resolution (< 1 km) than previous studies (Healey et al., 2016; Simard et al., 2011). Compared with the pioneering optical satellite data, e.g., MODIS and Landsat, the Sentinel constellation has two satellites for both the synthetic aperture radar (SAR) sensor (Sentinel-1) and the optical sensor (Sentinel-2), with higher spatio-temporal resolution (10 m ∼ 20 m, revisit cycle within 6 days). In addition, the Sentinel-2 satellites were designed to have more spectral channels in the red-edge bands than the Landsat-8 satellite. A recent study showed that the Sentinel-2 multi-spectral and texture metrics are very promising co-variables in the country-level high-resolution mapping of vegetation height when coupled with airborne LiDAR-derived vegetation height data (Lang et al., 2019). Nevertheless, it remains unknown whether the canopy height observed by the ICESat-2 satellite shows equivalent capability in the large-scale mapping of canopy height when integrated with Sentinel-1 & 2 data, particularly in a different forest region in China. To fill these knowledge gaps, it is important to investigate how these new satellite data (ICESat-2 and Sentinel-1 & 2) can be used to map the spatially contiguous patterns of forest canopy height in China, and so further test the performance of these new earth observation tools.
One important step in mapping spatially contiguous forest canopy height is to integrate the satellite LiDAR-derived height with ancillary, height-relevant co-variables from spatially contiguous satellite imagery, because satellite LiDAR shots are obtained along transects at discrete intervals (Wang et al., 2016b). Machine-learning models have been increasingly used as the main mathematical approach in this integration procedure for upscaling canopy height and other vegetation attributes observed from LiDAR, field measurements, and high-resolution remote sensing imagery (Ahmed et al., 2015; Lary et al., 2016; Suess et al., 2018; Tian et al., 2014; Zhang et al., 2019). One increasingly used machine-learning model is the random forest (RF) model, which is built on randomly selected subsets of independent variables in tree construction and copes well with the multi-collinearity caused by a large number of correlated independent variables (Cutler et al., 2007). In addition to the widely used machine-learning models, e.g., the RF model (Jones et al., 2018; Luo et al., 2019) and the support vector machine model (Pourshamsi et al., 2018), the deep-learning (DL) model has become a hotspot in machine-learning applications to big remote sensing data (Ball et al., 2017; Yuan et al., 2020; Zhang and Du, 2016) and ecology (Christin et al., 2019; Reichstein et al., 2019). Developed from the traditional neural network, the deep-learning model has outperformed traditional machine-learning models (Reichstein et al., 2019; Yuan et al., 2020). Many machine-learning models require domain knowledge to extract features from raw data via data mining techniques and to structure the input data before modeling, while the deep-learning model does not need this data structuring because the nested layers in the neural networks put data through hierarchies of different concepts (Ball et al., 2017; Cheng et al., 2017; El-Amir and Hamdy, 2020). As a branch of machine learning based on deep neural networks, the deep-learning model is usually composed of multiple-layer neural networks (usually at least three hidden layers between the input and output layers), is more “learned” than common artificial neural network models, and can have millions of parameters (Ball et al., 2017). Although most previous DL-based remote sensing studies have focused on image segmentation, object detection and land cover classification (Chen et al., 2014; Handan-Nader and Ho, 2019; Kussul et al., 2017), DL-based regression models have also proved highly capable of mapping spatially contiguous forest attributes, e.g., above-ground biomass and canopy height (Asner et al., 2018; Lang et al., 2019). For instance, a DL regression model has been successfully applied to estimate canopy height for a vegetated region in Switzerland by coupling Sentinel-2 multi-spectral and texture co-variables with height measurements from airborne LiDAR (Lang et al., 2019). Compared to airborne LiDAR, the working principle, data acquisition, and structure of ICESat-2 LiDAR are significantly different, e.g., in laser beam power, vertical resolution, and footprint size. Thus, it is worth investigating whether the DL regression model and other machine-learning models keep their high performance in mapping forest canopy height when they are applied to integrate the newly available canopy height from ICESat-2 LiDAR with co-variables from Sentinel imagery. Further studies are needed to explore whether the ICESat-2 canopy height is correlated with the backscattering attributes from Sentinel-1 imagery and with the spectral and texture proxies from Sentinel-2 imagery. It remains unknown how the addition of
red-edge spectral attributes and backscattering information from Sentinel satellites contributes to the performance of upscaling models, compared to other coarser-resolution satellite imagery, e.g., Landsat-8. Importantly, comparing the prediction ability of different spatially contiguous satellite data during their integration with ICESat-2 canopy height could also extend our understanding of large-scale canopy height mapping in an era of big earth observation data.
In this study, we aim to develop a machine-learning-based workflow (Fig. 1) to map the spatial pattern of forest canopy height in a mountainous region, Da Hinggan Ling Prefecture in northeast China. The canopy height product (Hcanopy) from ICESat-2 was first validated against high-resolution canopy height from airborne LiDAR data at different spatial scales. The satellite imagery from Sentinel-1, Sentinel-2 and Landsat-8 was then used to extrapolate the ICESat-2 canopy height from footprint level to regional level with the help of two machine-learning models, a DL model and an RF model. We compared the performance in Hcanopy prediction between these two machine-learning models, and between the Sentinel and Landsat-8 satellites. To our knowledge, few studies have demonstrated large-scale mapping of forest canopy height based on the new satellites ICESat-2 and Sentinel via DL models, particularly in the forest areas of China. Thus, the present work also aims to provide a timely and important supplement to the applications of these new earth observation tools.
2. Materials and methods
2.1. Study site
The study site is located in the Inner Mongolia Autonomous Region of China, covering an approximate area of 2773 km² (Fig. 2). The topography is mountainous with elevation ranging from 585 m to 1230 m above sea level. The southwestern part is an ecotone of agriculture and forestry while the remaining part is covered by forests. The forests mainly include deciduous broad-leaved and deciduous needle-leaved types, with a large cover of white birch (Betula platyphylla Suk L.), larch (Larix gmelinii, L.) and aspen (Populus davidiana, L.). A mixture of secondary shrubs and grass, and cropland in the southwest, mainly cover the non-forest areas. The climate is cold temperate continental monsoon with an annual mean temperature of −5.4 °C and a minimum temperature of −52.6 °C (Ni et al., 2019). The region receives approximately 500 mm of precipitation per year.
2.2. Data processing
2.2.1. ALS data
The ALS data were collected in two regions within the study site (Fig. 2). One is Yigen, located in the southwestern part, which is an ecotone of agriculture and deciduous broad-leaved forests. The other is Genhe, located in the northeastern part, which is mainly covered by deciduous needle-leaved forests. The airborne flights were conducted in Yigen and Genhe between August and September in 2012 and 2016, respectively (Table 1). Repeated flights were conducted with a Leica ALS60 airborne laser scanning system, producing an average point density of 9.33 pts/m² for Yigen and 8.12 pts/m² for Genhe with a vertical detection accuracy of 0.15 m. The ALS-derived canopy height showed very high consistency with field measurements based on traditional forest plot inventory according to simple linear regression analysis (R² = 0.93, RMSE = 0.88 m) (Tian et al., 2015). The point clouds were geo-referenced to the UTM Zone 51 N/WGS-84 projection system.
2.2.2. ICESat-2 data
The vegetation height product (ATL08, Hcanopy) from the satellite ICESat-2 LiDAR was used as the basic input for the machine-learning-based mapping. The ATL08 vegetation height product is one of the ICESat-2 geophysical products, which is aimed to provide estimates of
Fig. 2. Map of ICESat-2 footprints and airborne laser scanning (ALS) boundaries overlaid on the composited Sentinel-2 imagery for the study area. The Sentinel-2 imagery was composited as the median of all available high-quality, cloud-free Sentinel-2 images. All the imagery was acquired in the growing season (June ∼ September) between 2017 and 2019.
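The median compositing mentioned in the caption can be illustrated with a minimal, standard-library sketch. It is not the Google Earth Engine workflow used in the study: the function name `median_composite`, the nested-list image layout, and the boolean cloud masks are assumptions made only for illustration.

```python
from statistics import median


def median_composite(scenes, cloud_masks):
    """Per-pixel median over a stack of scenes, skipping cloud-flagged values.

    scenes: list of 2-D lists (rows x cols) of reflectance, one per date.
    cloud_masks: same shape as scenes; True marks a cloud-contaminated pixel.
    Returns a 2-D list; a pixel with no clear observation is set to None.
    """
    rows, cols = len(scenes[0]), len(scenes[0][0])
    composite = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Keep only the dates on which this pixel was not flagged as cloud.
            clear = [scene[r][c]
                     for scene, mask in zip(scenes, cloud_masks)
                     if not mask[r][c]]
            if clear:
                composite[r][c] = median(clear)
    return composite


# Hypothetical 1 x 2 image on three dates; the second date is cloudy at pixel 0.
scenes = [[[0.10, 0.30]], [[0.90, 0.32]], [[0.12, 0.34]]]
masks = [[[False, False]], [[True, False]], [[False, False]]]
composite = median_composite(scenes, masks)
```

Taking the median rather than the mean keeps a single bright, unmasked outlier (for example, residual haze) from shifting the composite value.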
Table 2
Key attributes of ICESat-2 vegetation height product (ATL08).
Attribute Description
pixel size was selected according to the spatial resolution of the satellite images commonly used in previous studies (Sentinel: 10 m, Landsat: 30 m, MODIS: 250 m, 500 m, 1000 m) for extrapolating footprint-level LiDAR canopy height. Generalized linear regression (GLR) was conducted on the ICESat-2 Hcanopy and the ALS height metrics to test the reliability of the ICESat-2 vegetation height product. The root mean squared error (RMSE, Eq. (1)) and the relative RMSE (rRMSE, Eq. (2)) were used to evaluate the GLR model accuracy.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(p_i - \hat{p}_i\right)^2} \qquad (1)$$
$$\mathrm{rRMSE} = \frac{\mathrm{RMSE}}{\bar{p}} \qquad (2)$$
where $p_i$ is the observed value, $\hat{p}_i$ is the predicted value, $\bar{p}$ is the mean of the observed values, and $n$ is the population size.
2.2.3. Sentinel data
The Sentinel-1 and Sentinel-2 satellite data obtained in the growing season (June ∼ September) from 2017 to 2019 were used as the input explanatory variables for the machine-learning models to extrapolate the ICESat-2 Hcanopy. All the Sentinel-1 and -2 data were processed on the cloud-based computation platform Google Earth Engine (GEE, Google, 2015, https://code.earthengine.google.com/). The Sentinel-1 satellites carry a dual-polarization C-band SAR, providing a collection of ground range detected scenes that contain either 1 or 2 out of 4 possible polarization bands, depending on the instrument's polarization settings (VV or HH, and dual-band VV + VH and HH + HV). In this study, only the VV and VH backscattering coefficients were available for the study site during the past three years. The Sentinel-1 data were calibrated and ortho-corrected using the Sentinel-1 Toolbox. The Sentinel-2 data are the Level-2A product, which contains surface reflectance atmospherically corrected by the sen2cor model (Louis et al., 2016). A median image composition was applied to each band of the Sentinel-1 and -2 images based on all the high-quality observations from the growing seasons in the past three years. The Sentinel-2 image consists of 12 spectral bands (Table 3) with the highest spatial resolution of 10 m for the visible and near-infrared bands (blue, green, red, and near-infrared), and 20 m for the short-wave infrared (swir1 and swir2) and red-edge bands (redEdge 1∼4). Cloud masking was applied to every single Sentinel-2 image before the composition according to the opaque and cirrus cloud information provided by the quality assessment band (QA60).
In addition to the backscattering coefficients and the surface reflectance of the composited Sentinel-1 and Sentinel-2 imagery, spectral metrics were calculated from the Sentinel-2 imagery including six vegetation indices and three spatial texture variables. The vegetation indices are the commonly used normalized difference vegetation index (NDVI) (Crippen, 1990), the enhanced vegetation index (EVI) (Jiang et al., 2008), and the modified soil-adjusted vegetation index (MSAVI) (Qi et al., 1994), as well as four red-edge NDVIs (NDVIredEdge) corresponding to the four red-edge bands. The texture variables (texture contrast: NDVI_con; texture entropy: NDVI_ent; and texture variance: NDVI_var) were calculated from the NDVI imagery using a gray-tone spatial dependence matrix with a window size of 3 × 3.
2.2.4. Landsat-8 data
The median images from Landsat-8 in the growing seasons from 2017 to 2019 were also composited. The surface reflectance product of Landsat-8 was used in this study, which has been atmospherically corrected using LaSRC (the Land Surface Reflectance Code). It includes a cloud, shadow, water and snow mask produced using CFMASK, as well as a per-pixel saturation mask (Vermote et al., 2016). The surface reflectance of the six spectral bands (blue, green, red, NIR, SWIR1, and SWIR2), the three vegetation indices (NDVI, EVI, and MSAVI), and the three NDVI texture variables (NDVI_con, NDVI_ent and NDVI_var) were also calculated (Table 3).
All the ICESat-2 Hcanopy data, the Sentinel and Landsat images, and the generated metrics were projected to the Universal Transverse Mercator (UTM) Zone 51 N based on the WGS-84 ellipsoid. All the co-variables from the satellite imagery were resampled to different spatial resolutions (250 m, 30 m) according to the different modeling processes (Section 2.3).
2.3. Machine-learning modeling
Two machine-learning models (DL and RF) were used to extrapolate the Hcanopy to the entire study site. The ICESat-2 Hcanopy was used as the dependent variable and the spectral metrics from the satellite images (Sentinel or Landsat) were used as the independent variables for the machine-learning models. The metrics from the satellite imagery were extracted for each ICESat-2 footprint. Before the modeling, spatial autocorrelation of Hcanopy was checked using Moran’s I (Moran, 1950), and no spatial autocorrelation was found in this study. Data normalization was applied to both the dependent and independent variables before regression. The ICESat-2 point clouds were randomly divided into 80% for model training and 20% for independent validation. Given that the correlation between the ICESat-2 Hcanopy and the ALS-derived height metrics was highest at the 250-m scale (Section 3.1), the machine-learning models were built at that scale. Meanwhile, the same modeling was also conducted at a 30-m scale in order to compare the performance of the Sentinel and Landsat-8 satellites. For the 30-m scale modeling, the Sentinel variables were resampled to 30 m before the DL modeling. Unlike in the 250-m scale modeling, the metrics from the Sentinel-1 backscattering coefficients and those related to the red-edge bands from Sentinel-2 were not used in the 30-m scale modeling, to ensure that the model inputs were comparable between the two satellites (Sentinel-2 and Landsat-8).
For the DL modeling, a neural network with five layers of 100 neurons each was built after tuning of the model hyperparameters. The linear function was selected as the activation function. The mean squared error (MSE, the mean of the squared differences between the observed and predicted values) was used as the loss metric for the DL model optimizer since the model used in this study is for
Table 3
The main specifications on spectral bands of Sentinel-2 and Landsat-8 satellites.
Band Sentinel-2 Landsat-8 Wavelength(nm)
pixels with tree percentage cover bigger than 1%, was used as the mask
layer in this study.
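Section 2.3 reports that spatial autocorrelation of Hcanopy was checked with Moran's I before modeling, without stating how the spatial weights were defined. The sketch below shows one common form of the global statistic, I = (n / W) · Σᵢ Σⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², under the assumption of binary distance-threshold weights. The function name `morans_i` and the `max_dist` parameter are hypothetical and are not taken from the study.

```python
from math import hypot


def morans_i(coords, values, max_dist):
    """Global Moran's I with binary weights (assumed): w_ij = 1 if d_ij <= max_dist.

    coords: list of (x, y) footprint positions in projected units (e.g. metres).
    values: list of the variable of interest (e.g. canopy height), same order.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum of w_ij * dev_i * dev_j over neighbouring pairs
    w_sum = 0.0   # sum of all weights W
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            if dist <= max_dist:
                cross += dev[i] * dev[j]
                w_sum += 1.0
    variance_sum = sum(d * d for d in dev)
    if w_sum == 0 or variance_sum == 0:
        raise ValueError("no neighbouring pairs or zero variance")
    return (n / w_sum) * (cross / variance_sum)


# Four hypothetical footprints on a line, 1 unit apart.
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
smooth = morans_i(line, [1, 2, 3, 4], max_dist=1)       # gradual trend
alternating = morans_i(line, [1, 4, 1, 4], max_dist=1)  # checkerboard-like
```

Values near zero indicate no spatial autocorrelation; the smooth trend gives a positive value and the alternating pattern a negative one. The choice of `max_dist` matters in practice, because ICESat-2 footprints are dense along track and sparse across tracks.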
3. Results
Table 4
Statistics comparing the canopy heights obtained from ICESat-2 and ALS at different spatial scales. The statistics include Pearson’s correlation coefficient (R), root mean squared error (RMSE), and relative RMSE (rRMSE) estimated by the generalized linear regression models (GLMs), represented by numbers coloured from red (low) to green (high). The ICESat-2 Hcanopy was used as the dependent variable for the GLMs. All correlations are statistically significant with p-values less than 0.01.
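The three statistics named in the caption can be computed as in the following minimal sketch. It assumes the standard definitions (Pearson's R between two series, RMSE as the root of the mean squared difference, and rRMSE as RMSE divided by the mean observed value) and applies them directly to paired values rather than to the fitted regression models of the study. The sample numbers are made up.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def rrmse(observed, predicted):
    """RMSE relative to the mean of the observed values (assumed definition)."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))


# Hypothetical canopy heights in metres.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]
r_value = pearson_r(observed, predicted)
error = rmse(observed, predicted)
relative_error = rrmse(observed, predicted)
```

Reporting rRMSE alongside RMSE makes errors comparable across spatial scales whose mean canopy heights differ.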
learning models tended to slightly underestimate Hcanopy with a mean error of -0.55 m for the DL model, and -0.80 m for the RF model. Overall, prediction results at both the footprint level and for the entire spatial pattern demonstrated that the DL model outperformed the RF model in estimating Hcanopy.
3.3. Comparison of canopy height prediction between Sentinel and Landsat-8
To compare the prediction performance between the two satellite sensors, Sentinel and Landsat-8, the DL model at 30-m scale was also built to predict the spatial pattern of Hcanopy. Results showed that Sentinel and Landsat-8 were highly consistent in the spatial pattern at a 30-m scale, with gradients similar to those at a 250-m scale (Fig. 9). High correlations were also found between the DL-predicted Hcanopy (Pre_H) and the ALS HP90 (ALS_H) for the two satellite sensors, although the R-value slightly decreased (Fig. 10a&b). Both the Sentinel and Landsat-8 data underestimated Hcanopy, with a mean error of -0.89 m and -1.35 m, respectively (Fig. 10c&d).
4. Discussion
4.1. Accuracy of ICESat-2 Hcanopy
In this study, we tested the accuracy of the ICESat-2 product of vegetation canopy height in a mountainous forest region in northeast China. Our study found a high correlation between the ICESat-2 Hcanopy and ALS canopy height metrics with a slight
Fig. 4. Scatter plot of forest canopy height from ICESat-2 and the ALS at 250-m scale. The dots are coloured by forest cover types. The ALS canopy height represents
the 90-percentile canopy height HP90 at a 250-m scale.
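As a rough illustration of how a 90th-percentile height such as HP90 could be derived on a 250-m grid, the sketch below bins ALS returns into square cells and takes a percentile per cell. The linear-interpolation percentile rule, the function names, and the `(x, y, height)` point layout are assumptions. The study does not describe its exact procedure here.

```python
from collections import defaultdict
from math import floor


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def hp90_by_cell(points, cell_size=250.0):
    """Group (x, y, height) points into square cells and return HP90 per cell.

    Cells are keyed by their integer (column, row) index on the grid.
    """
    cells = defaultdict(list)
    for x, y, height in points:
        cells[(floor(x / cell_size), floor(y / cell_size))].append(height)
    return {key: percentile(heights, 90) for key, heights in cells.items()}


# Hypothetical returns: heights 0..10 m inside the first cell, one return in the next.
points = [(float(i), 10.0, float(i)) for i in range(11)] + [(300.0, 10.0, 5.0)]
hp90 = hp90_by_cell(points)
```

A high percentile is less sensitive to isolated tall returns than the maximum, which is one reason percentile metrics are common in LiDAR canopy studies.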
Fig. 5. Scatter plot of observed and predicted forest canopy height at 250-m scale from the independent validation for (a) deep-learning model and (b) random forest
model. The colour of the dots represents the frequency of the dots.
from the previous study in the sparse boreal forests of Alaska, where the Hcanopy was most correlated with the 95th percentile relative height (Neuenschwander and Pitts, 2019). This suggests that the type of ALS height metric used for comparison with the ICESat-2 Hcanopy need not be fixed, since different study sites may have different height distribution patterns. Despite this, the generally high correlation between many of the ALS height metrics and the ICESat-2 height further suggests a high detection accuracy for the ICESat-2 canopy height product, which provides a reliable basic data source for the subsequent spatially contiguous mapping. This is also supported by the fact that the estimation errors obtained in this study (DL model: RMSE = 2.64 m; RF model: RMSE = 2.93 m) are comparable to those of Neuenschwander et al., 2019 (RMSE = 0.85 ∼ 3.25 m). The satisfactory results of the ICESat-2 canopy height mapping in this study further demonstrate that ICESat-2 could bring more opportunities for vegetation height mapping at regional and even larger scales (Markus et al., 2017; Neuenschwander and Magruder, 2019).
Fig. 7. Canopy height maps (250-m) predicted by (a) deep-learning model and (b) random forest model using Sentinel satellite data.
texture metrics from the spatially contiguous satellites (Sentinel and Landsat-8), since our main goal is to test the capability of these satellite data in estimating Hcanopy. The co-variables were median images composited over the growing seasons between 2017 and 2019 rather than single-date imagery, which not only keeps them consistent with the ICESat-2 data in acquisition date but also reduces the estimation errors caused by intra-annual spectral variations in a short period of observation (Patel et al., 2015; Traganos et al., 2018). Nonetheless, future efforts should add other environmental factors that have been shown to constrain the spatial distribution of forest canopy height, such as topographic roughness, precipitation, and temperature (Simard et al., 2011; Wang et al., 2016b; Xu et al., 2016; Zhang et al., 2016). This is particularly true when national, continental and global mapping is needed, where the gradients of these environmental factors across the entire study site are much larger than in our study (Lefsky et al., 2005; Simard et al., 2011; Wang et al., 2016b). Geodiversity (the variation in Earth's abiotic processes and features) in these newly added environmental factors and their spatial constraints on the ICESat-2 Hcanopy at local and regional scales should also be considered in future studies (Zarnetske et al., 2019). The spatial heterogeneity or variability of the terrain elevation and Hcanopy obtained from the ICESat-2 satellite could be important forms of geodiversity.
The machine-learning modeling in the present study was built at two spatial resolutions, 250 m and 30 m, mainly because the highest correlation between ICESat-2 Hcanopy and the ALS canopy height metric HP90 occurred at the 250-m scale, and because one of our main goals is to compare the performance of Sentinel and Landsat-8.
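Moving co-variables from the native 10-m Sentinel grid to the coarser modeling resolutions involves some form of aggregation. The study does not state which resampling method was used, so the sketch below assumes simple block averaging with an integer factor (for example, 3 for 10 m to 30 m, or 25 for 10 m to 250 m). The function name `block_mean` is hypothetical.

```python
def block_mean(grid, factor):
    """Aggregate a 2-D grid to a coarser one by averaging factor x factor blocks.

    grid: 2-D list (rows x cols) of values on the fine grid.
    factor: integer number of fine pixels per coarse pixel along each axis.
    Trailing rows or columns that do not fill a whole block are dropped.
    """
    rows, cols = len(grid) // factor, len(grid[0]) // factor
    coarse = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [grid[r * factor + i][c * factor + j]
                     for i in range(factor)
                     for j in range(factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


# Hypothetical 6 x 6 fine grid aggregated by a factor of 3 (e.g. 10 m to 30 m).
fine = [[r * 6 + c for c in range(6)] for r in range(6)]
coarse = block_mean(fine, 3)
```

Block averaging smooths fine-scale texture, so texture variables would normally be computed on the native grid before aggregation. Whether the study did so is not stated.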
Fig. 8. (a)∼(b) scatter plots and (c)∼(d) histogram plots depict the predicted Hcanopy (Pre_H) and the ALS HP90 (ALS_H) within the airborne flight areas. Left panel:
deep-learning model. Right panel: random forest model.
Although the MODIS data match the 250-m scale well, they were not used in this study because of their relatively limited spectral information at a resolution ≤ 250 m compared to the higher-resolution satellites (≤ 30 m, Sentinel and Landsat-8). However, this does not diminish the evident advantages of the large amount of MODIS data and its associated environmental products in the global-scale mapping of forest canopy height, as shown by many previous studies (Lefsky, 2010; Wang et al., 2016b). Although the Sentinel satellites provide rich spectral and texture metrics at the highest spatial resolution of 10 m, the ICESat-2 Hcanopy product showed a relatively weaker correlation with the 10-m scale ALS height metrics. This constrained our study to a relatively coarser resolution for the Hcanopy mapping, which differs markedly from the previous study that used the ALS canopy height rather than ICESat-2 Hcanopy as the reference data (Lang et al., 2019). Despite this, the positive contribution of the resampled Sentinel metrics at the 30-m and 250-m scales, e.g., the backscattering coefficients and red-edge NDVIs, to the machine-learning models suggests that the resampled Sentinel metrics still respond strongly to variations in canopy height at a coarser spatial resolution.
The high consistency in the performance of Hcanopy modeling between Sentinel and Landsat-8 suggests that the metrics from both satellites are very promising co-variables for the upscaling of ICESat-2 Hcanopy. This is an important signal of the potential for fusing these satellite data in future monitoring of vegetation height dynamics as ICESat-2 observations accumulate. On the other hand, recent studies on producing harmonized Landsat-8 and Sentinel-2 time series will also greatly facilitate such long-term dynamics monitoring (Claverie et al., 2018; Shang and Zhu, 2019). However, caution should be taken when ICESat-2 data are used to detect multi-temporal changes of forest canopy height, since the data availability of ICESat-2 is not equally distributed across observation dates and regions, and its revisit cycle is long (91 days). Thus, future efforts should also be made to further extend the application of ICESat satellites by coupling the ICESat-2 (2018∼) and the recently released GEDI (Global Ecosystem Dynamics Investigation) data (Qi et al., 2019) with the pioneering ICESat-1 data (2003 ∼ 2009) (Zwally et al., 2002). Correcting and matching the geographic positioning among different sensors are important steps before integrating different satellite LiDAR data. These repeat altimetry measurements of the earth's surface should be fully used by different scientific communities, including the cryosphere, ocean, terrestrial and atmospheric communities (Markus et al., 2017). Machine-learning models will be very powerful approaches for these applications to cope with the increasing volume of big earth observation data (Yuan et al., 2020).
5. Conclusion
In this study, we developed a machine-learning-based workflow to map the spatial pattern of the forest canopy height in a mountainous region in China by coupling the newly available canopy height product
Fig. 9. Canopy height maps (30 m) predicted by (a) Sentinel and (b) Landsat-8 using the deep-learning model.
from ICESat-2 and the high-resolution Sentinel satellite imagery with cross-comparison to Landsat-8 imagery. Our results demonstrated that the ICESat-2 footprint Hcanopy showed the highest correlation with the ALS canopy height at a spatial scale of 250 m, providing important evidence on the reliability of the ICESat-2 vegetation height product from the case in China’s forests. Both DL and RF models obtained satisfactory results on the upscaling of ICESat-2 Hcanopy with the help of Sentinel satellite co-variables. The weaker performance of Landsat-8 relative to Sentinel showed that the addition of the backscattering coefficients from Sentinel-1 and the variables related to red-edge bands from Sentinel-2 could positively contribute to the prediction of forest canopy height. This suggests very promising future studies on the global high-resolution mapping of forest canopy height based on the harmonized Landsat-8 and Sentinel-2 product, which will accelerate and advance studies on global ecosystem dynamics.

CRediT authorship contribution statement

Wang Li: Conceptualization, Data curation, Formal analysis, Funding acquisition, Project administration, Writing - review & editing. Zheng Niu: Conceptualization, Resources, Writing - review & editing, Funding acquisition. Rong Shang: Visualization, Validation, Writing - review & editing. Yuchu Qin: Resources, Validation. Li Wang: Resources, Validation. Hanyue Chen: Resources, Validation.

Declaration of Competing Interest

The authors declare that they have no known competing financial
Fig. 10. (a)∼(b) scatter plots and (c) ∼(d) histogram plots depict the predicted Hcanopy (Pre_H) and the ALS HP90 (ALS_H) within the airborne flight areas. Left panel:
Sentinel. Right panel: Landsat-8.
and state of the art. Proc. IEEE 105, 1865–1883.
Christin, S., Hervet, É., Lecomte, N., 2019. Applications for deep learning in ecology. Methods Ecol. Evol. 10, 1632–1644.
Claverie, M., Ju, J., Masek, J.G., Dungan, J.L., Vermote, E.F., Roger, J.-C., Skakun, S.V., Justice, C., 2018. The Harmonized Landsat and Sentinel-2 surface reflectance data set. Remote Sens. Environ. 219, 145–161.
Crippen, R.E., 1990. Calculating the vegetation index faster. Remote Sens. Environ. 34, 71–73.
Cutler, D.R., Edwards Jr., T.C., Beard, K.H., Cutler, A., Hess, K.T., Gibson, J., Lawler, J.J., 2007. Random forests for classification in ecology. Ecology 88, 2783–2792.
El-Amir, H., Hamdy, M., 2020. Deep Learning Pipeline: Building a Deep Learning Model With TensorFlow.
Garestier, F., Dubois-Fernandez, P.C., Papathanassiou, K.P., 2007. Pine forest height inversion using single-pass X-band PolInSAR data. IEEE Trans. Geosci. Remote Sens. 46, 59–68.
Goetz, S.J., Steinberg, D., Betts, M.G., Holmes, R.T., Doran, P.J., Dubayah, R., Hofton, M., 2010. Lidar remote sensing variables predict breeding habitat of a Neotropical migrant bird. Ecology 91, 1569–1576.
Handan-Nader, C., Ho, D.E., 2019. Deep learning to map concentrated animal feeding operations. Nat. Sustain. 2, 298–306.
Healey, S., Hernandez, M., Edwards, D., Lefsky, M., Freeman, J., Patterson, P., Lindquist, E., Lister, A., 2016. CMS: GLAS LiDAR-Derived Global Estimates of Forest Canopy Height, 2004-2008. ORNL DAAC.
Hill, R.A., Hinsley, S.A., 2015. Airborne Lidar for woodland habitat quality monitoring: exploring the significance of Lidar data characteristics when modelling organism-habitat relationships. Remote Sens. 7, 3446–3466.
Hudak, A.T., Lefsky, M.A., Cohen, W.B., Berterretche, M., 2002. Integration of lidar and Landsat ETM+ data for estimating and mapping forest canopy height. Remote Sens. Environ. 82, 397–416.
Jiang, Z., Huete, A.R., Didan, K., Miura, T., 2008. Development of a two-band enhanced vegetation index without a blue band. Remote Sens. Environ. 112, 3833–3845.
Jones, M.O., Allred, B.W., Naugle, D.E., Maestas, J.D., Donnelly, P., Metz, L.J., Karl, J., Smith, R., Bestelmeyer, B., Boyd, C., Kerby, J.D., McIver, J.D., 2018. Innovation in rangeland monitoring: annual, 30 m, plant functional type percent cover maps for U.S. rangelands, 1984–2017. Ecosphere 9, e02430.
Kugler, F., Schulze, D., Hajnsek, I., Pretzsch, H., Papathanassiou, K.P., 2014. TanDEM-X Pol-InSAR performance for forest height estimation. IEEE Trans. Geosci. Remote Sens. 52, 6404–6422.
Kussul, N., Lavreniuk, M., Skakun, S., Shelestov, A., 2017. Deep learning classification of land cover and crop types using remote sensing data. IEEE Geosci. Remote Sens. Lett. 14, 778–782.
Lang, N., Schindler, K., Wegner, J.D., 2019. Country-wide high-resolution vegetation height mapping with Sentinel-2. Remote Sens. Environ. 233, 111347.
Lary, D.J., Alavi, A.H., Gandomi, A.H., Walker, A.L., 2016. Machine learning in geosciences and remote sensing. Geosci. Front. 7, 3–10.
Lefsky, M.A., 2010. A global forest canopy height map from the moderate resolution imaging spectroradiometer and the geoscience laser altimeter system. Geophys. Res. Lett. 37.
Lefsky, M.A., Cohen, W.B., Harding, D.J., Parker, G.G., Acker, S.A., Gower, S.T., 2002. Lidar remote sensing of above-ground biomass in three biomes. Glob. Ecol. Biogeogr. 11, 393–399.
Lefsky, M.A., Harding, D.J., Keller, M., Cohen, W.B., Carabajal, C.C., Del Bom Espirito-Santo, F., Hunter, M.O., de Oliveira Jr., R., 2005. Estimates of forest canopy height and aboveground biomass using ICESat. Geophys. Res. Lett. 32.
Li, D., Guo, H., Wang, C., Li, W., Chen, H., Zuo, Z., 2016. Individual tree delineation in windbreaks using airborne-laser-scanning data and unmanned aerial vehicle stereo images. IEEE Geosci. Remote Sens. Lett. 13, 1330–1334.
Li, W., Niu, Z., Liang, X., Li, Z., Huang, N., Gao, S., Wang, C., Muhammad, S., 2015. Geostatistical modeling using LiDAR-derived prior knowledge with SPOT-6 data to estimate temperate forest canopy cover and above-ground biomass via stratified random sampling. Int. J. Appl. Earth Obs. Geoinf. 41, 88–98.
Liang, X., Kankare, V., Hyyppä, J., Wang, Y., Kukko, A., Haggrén, H., Yu, X., Kaartinen, H., Jaakkola, A., Guan, F., 2016. Terrestrial laser scanning in forest inventories. ISPRS J. Photogramm. Remote Sens. 115, 63–77.
Liaw, A., Wiener, M., 2002. Classification and regression by randomForest. R news 2, 18–22.
Liu, Y., Gong, W., Xing, Y., Hu, X., Gong, J., 2019. Estimation of the forest stand mean height and aboveground biomass in Northeast China using SAR Sentinel-1B, multispectral Sentinel-2A, and DEM imagery. ISPRS J. Photogramm. Remote Sens. 151, 277–289.
Louis, J., Debaecker, V., Pflug, B., Main-Knorn, M., Bieniarz, J., Mueller-Wilm, U., Cadau, E., Gascon, F., 2016. Sentinel-2 Sen2Cor: L2A processor for users. Proceedings Living Planet Symposium 2016 1–8 Spacebooks Online.
Luo, S.Z., Wang, C., Xi, X.H., Nie, S., Fan, X.Y., Chen, H.Y., Ma, D., Liu, J.F., Zou, J., Lin, Y., Zhou, G.Q., 2019. Estimating forest aboveground biomass using small-footprint full-waveform airborne LiDAR data. Int. J. Appl. Earth Obs. Geoinf. 83.
Markus, T., Neumann, T., Martino, A., Abdalati, W., Brunt, K., Csatho, B., Farrell, S., Fricker, H., Gardner, A., Harding, D., Jasinski, M., Kwok, R., Magruder, L., Lubin, D., Luthcke, S., Morison, J., Nelson, R., Neuenschwander, A., Palm, S., Popescu, S., Shum, C.K., Schutz, B.E., Smith, B., Yang, Y., Zwally, J., 2017. The Ice, Cloud, and land Elevation Satellite-2 (ICESat-2): science requirements, concept, and implementation. Remote Sens. Environ. 190, 260–273.
Moran, P.A., 1950. Notes on continuous stochastic phenomena. Biometrika 37, 17–23.
Neuenschwander, A., Pitts, K., 2019. The ATL08 land and vegetation product for the ICESat-2 Mission. Remote Sens. Environ. 221, 247–259.
Neuenschwander, A.L., 2018. Ice, Cloud, and Land Elevation Satellite-2 Algorithm Theoretical Basis Document for Land - Vegetation Along-track Products.
Neuenschwander, A.L., Magruder, L.A., 2019. Canopy and terrain height retrievals with ICESat-2: a first look. Remote Sens. 11, 1721.
Ni, W., Dong, J., Sun, G., Zhang, Z., Pang, Y., Tian, X., Li, Z., Chen, E., 2019. Synthesis of leaf-on and leaf-off unmanned aerial vehicle (UAV) stereo imagery for the inventory of aboveground biomass of deciduous forests. Remote Sens. 11, 889.
Pang, Y., Lefsky, M., Sun, G., Ranson, J., 2011. Impact of footprint diameter and off-nadir pointing on the precision of canopy height estimates from spaceborne lidar. Remote Sens. Environ. 115, 2798–2809.
Patel, N.N., Angiuli, E., Gamba, P., Gaughan, A., Lisini, G., Stevens, F.R., Tatem, A.J., Trianni, G., 2015. Multitemporal settlement and population mapping from Landsat using Google Earth Engine. Int. J. Appl. Earth Obs. Geoinf. 35, 199–208.
Pourshamsi, M., Garcia, M., Lavalle, M., Balzter, H., 2018. A machine-learning approach to PolInSAR and LiDAR data fusion for improved tropical forest canopy height estimation using NASA AfriSAR campaign data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 11, 3453–3463.
Qi, J., Chehbouni, A., Huete, A., Kerr, Y., Sorooshian, S., 1994. A modified soil adjusted vegetation index. Remote Sens. Environ. 48, 119–126.
Qi, W., Saarela, S., Armston, J., Ståhl, G., Dubayah, R., 2019. Forest biomass estimation over three distinct forest types using TanDEM-X InSAR data and simulated GEDI lidar data. Remote Sens. Environ. 232, 111283.
Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N., 2019. Deep learning and process understanding for data-driven Earth system science. Nature 566, 195–204.
Sankey, T., Donager, J., McVay, J., Sankey, J.B., 2017. UAV lidar and hyperspectral fusion for forest monitoring in the southwestern USA. Remote Sens. Environ. 195, 30–43.
Sankey, T.T., McVay, J., Swetnam, T.L., McClaran, M.P., Heilman, P., Nichols, M., 2018. UAV hyperspectral and lidar data and their fusion for arid and semi-arid land vegetation monitoring. Remote Sens. Ecol. Conserv. 4, 20–33.
Shang, R., Zhu, Z., 2019. Harmonizing Landsat 8 and Sentinel-2: a time-series-based reflectance adjustment approach. Remote Sens. Environ. 235, 111439.
Simard, M., Pinto, N., Fisher, J.B., Baccini, A., 2011. Mapping forest canopy height globally with spaceborne lidar. J. Geophys. Res. Biogeosci. 116.
St-Onge, B., Audet, F.-A., Bégin, J., 2015. Characterizing the height structure and composition of a boreal forest using an individual tree crown approach applied to photogrammetric point clouds. Forests 6, 3899–3922.
Stovall, A.E.L., Shugart, H., Yang, X., 2019. Tree height explains mortality risk during an intense drought. Nat. Commun. 10, 4385.
Su, Y., Guo, Q., Xue, B., Hu, T., Alvarez, O., Tao, S., Fang, J., 2016. Spatial distribution of forest aboveground biomass in China: estimation through combination of spaceborne lidar, optical imagery, and forest inventory data. Remote Sens. Environ. 173, 187–199.
Suess, S., van der Linden, S., Okujeni, A., Griffiths, P., Leitão, P.J., Schwieder, M., Hostert, P., 2018. Characterizing 32 years of shrub cover dynamics in southern Portugal using annual Landsat composites and machine learning regression modeling. Remote Sens. Environ. 219, 353–364.
Team, R.C., 2014. A Language and Environment for Statistical Computing. ISBN 3-900051-07-0. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
Tian, X., Li, Z., Chen, E., Liu, Q., Yan, G., Wang, J., Niu, Z., Zhao, S., Li, X., Pang, Y., 2015. The complicate observations and multi-parameter land information constructions on allied telemetry experiment (COMPLICATE). PLoS One 10.
Tian, X., Li, Z., Su, Z., Chen, E., van der Tol, C., Li, X., Guo, Y., Li, L., Ling, F., 2014. Estimating montane forest above-ground biomass in the upper reaches of the Heihe River Basin using Landsat-TM data. Int. J. Remote Sens. 35, 7339–7362.
Traganos, D., Poursanidis, D., Aggarwal, B., Chrysoulakis, N., Reinartz, P., 2018. Estimating satellite-derived bathymetry (SDB) with the google earth engine and sentinel-2. Remote Sens. 10, 859.
Vermote, E., Justice, C., Claverie, M., Franch, B., 2016. Preliminary analysis of the performance of the Landsat 8/OLI land surface reflectance product. Remote Sens. Environ. 185, 46–56.
Wang, X., Huang, H., Gong, P., Biging, G., Xin, Q., Chen, Y., Yang, J., Liu, C., 2016a. Quantifying multi-decadal change of planted forest cover using airborne LiDAR and Landsat imagery. Remote Sens. 8, 62.
Wang, Y., Li, G., Ding, J., Guo, Z., Tang, S., Wang, C., Huang, Q., Liu, R., Chen, J.M., 2016b. A combined GLAS and MODIS estimation of the global distribution of mean forest canopy height. Remote Sens. Environ. 174, 24–43.
Wulder, M.A., White, J.C., Nelson, R.F., Næsset, E., Ørka, H.O., Coops, N.C., Hilker, T., Bater, C.W., Gobakken, T., 2012. Lidar sampling for large-area forest characterization: a review. Remote Sens. Environ. 121, 196–209.
Xu, C., Hantson, S., Holmgren, M., van Nes, E.H., Staal, A., Scheffer, M., 2016. Remotely sensed canopy height reveals three pantropical ecosystem states. Ecology 97, 2518–2521.
Yuan, Q., Shen, H., Li, T., Li, Z., Li, S., Jiang, Y., Xu, H., Tan, W., Yang, Q., Wang, J., Gao, J., Zhang, L., 2020. Deep learning in environmental remote sensing: achievements and challenges. Remote Sens. Environ. 241, 111716.
Zald, H.S.J., Wulder, M.A., White, J.C., Hilker, T., Hermosilla, T., Hobart, G.W., Coops, N.C., 2016. Integrating Landsat pixel composites and change metrics with lidar plots to predictively map forest structure and aboveground biomass in Saskatchewan, Canada. Remote Sens. Environ. 176, 188–201.
Zarnetske, P.L., Read, Q.D., Record, S., Gaddis, K.D., Pau, S., Hobi, M.L., Malone, S.L., Costanza, J.M., Dahlin, K., Latimer, A.M., Wilson, A.M., Grady, J.M., Ollinger, S.V., Finley, A.O., Gillespie, T., 2019. Towards connecting biodiversity and geodiversity across scales with satellite remote sensing. Glob. Ecol. Biogeogr. 28, 548–556.
Zhang, J., Nielsen, S.E., Mao, L., Chen, S., Svenning, J.-C., 2016. Regional and historical factors supplement current climate in shaping global forest canopy height. J. Ecol. 104, 469–478.
Zhang, L., Du, B., 2016. Deep learning for remote sensing data: a technical tutorial on the state of the art. IEEE Geosci. Remote Sens. Mag. 4, 22–40.
Zhang, L., Shao, Z., Liu, J., Cheng, Q., 2019. Deep learning based retrieval of forest aboveground biomass from combined LiDAR and landsat 8 data. Remote Sens. 11, 1459.
Zwally, H.J., Schutz, B., Abdalati, W., Abshire, J., Bentley, C., Brenner, A., Bufton, J., Dezio, J., Hancock, D., Harding, D., 2002. ICESat’s laser measurements of polar ice, atmosphere, ocean, and land. J. Geodyn. 34, 405–445.