IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 17, 2024

A NeRF-Based Color Consistency Correction Method for Remote Sensing Images

Zongcheng Zuo, Yuanxiang Li, Tongtong Zhang, and Fan Mo

Abstract—Remote sensing images are prone to significant variations in photometry due to changing seasons, illumination, and atmospheric conditions. These variations often result in visible stitching seams at the edges of mosaic images, which can impact the visual quality and interpretation of the data. To address color inconsistencies in remote sensing images, conventional methods rely on absolute radiation correction and relative radiation normalization. However, these approaches may not be effective in handling complex variations and producing visually pleasing outcomes. This article introduces a novel approach based on neural radiance fields (NeRFs) for correcting color inconsistencies in multiview images. Our method leverages implicit expressions and reillumination of the feature space to capture the intrinsic radiance and reflectance properties of the scene. By intricately weaving image features together, we generate a fusion image that seamlessly integrates color information from multiple views, resulting in improved color consistency and reduced stitching seams. To evaluate the effectiveness of our approach, we conducted experiments using satellite and unmanned aerial vehicle images with significant variations in range and time. The experimental results demonstrate that our NeRF-based method produces synthesized images with exceptional visual effects and smooth color transitions at the edges. The fusion images exhibit enhanced color consistency, effectively reducing visible stitching seams and elevating the overall image quality.

Index Terms—Color consistency, feature fusion, image mosaicking, multiple-view images, neural radiance field (NeRF).

Manuscript received 30 November 2023; revised 17 January 2024; accepted 6 March 2024. Date of publication 8 March 2024; date of current version 21 March 2024. This work was supported in part by the National Natural Science Foundation of China under Grant 62371297 and in part by the Special Funds for Creative Research under Grant 2022C61540. (Corresponding author: Yuanxiang Li.)
Zongcheng Zuo, Yuanxiang Li, and Tongtong Zhang are with the School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]; [email protected]; [email protected]).
Fan Mo is with the Land Satellite Remote Sensing Application Center, Ministry of Natural Resources, Beijing 100048, China (e-mail: [email protected]).
Digital Object Identifier 10.1109/JSTARS.2024.3374808
© 2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

I. INTRODUCTION

Digital orthophoto maps (DOMs) have become an essential tool for various fields, including surveying, mapping, and land resource management, owing to their impressive accuracy and versatility [1]. However, the coverage of satellite imagery is limited due to restrictions in sensor design. To overcome this limitation and obtain a more comprehensive range of remote sensing images, mosaic techniques are frequently employed to merge images from different sensors and times. However, variations in radiation due to factors such as sunlight incidence angle, atmospheric conditions, and illumination can cause significant visual inconsistencies and stitching seams in the resulting mosaic satellite images [2]. These issues can severely impact the usability of the images in various applications, highlighting the need to explore solutions that ensure the highest quality output.

Mosaic techniques have become increasingly popular for creating seamless and panoramic images, including DOM. However, two key challenges need to be addressed for optimal results. First, geometrical misalignments caused by image registration errors can lead to visual discontinuities in the resulting image. This issue can be resolved through image-warping techniques [3] and optimal seamline detection methods [4]. Second, multitemporal satellite images often suffer from photometric inconsistencies, caused by factors such as atmospheric illumination variations, varying camera response functions, and different exposure settings. While blending techniques can help to smooth minor color differences between images, these methods are unable to resolve significant color discrepancies [5], [6], [7], [8]. As such, it is important to address color inconsistency prior to mosaicking multiple images, in order to avoid the presence of visible seams and color artifacts in the final mosaic image. While various color transfer approaches and radiometric normalization methods have been proposed to correct color differences in two images, it remains challenging to address these issues in multiple images. Therefore, the main purpose of our work is to devise a method for addressing the color consistency of multiple images.

Achieving smooth and natural tone transitions in satellite images while preserving their texture information and ensuring accurate feature interpretation is the primary objective of color consistency processing. Two commonly used approaches for color consistency correction are absolute radiometric calibration and relative radiation normalization [9]. Absolute radiometric calibration involves converting the digital number value from satellite imagery to surface reflectance accurately [10]. However, this method usually requires atmospheric property parameters during image acquisition, which can be difficult to obtain. On the other hand, the relative radiation normalization method aims to make the radiation information of the target images consistent with reference images, typically selected based on overlapping regions or established color reference databases [11]. While relative radiometric correction is more widely used due to its less strict input conditions, it also has its disadvantages. The determination of the reference image is not standardized, and
obtaining regression calibration parameters can be challenging when there is limited overlap or significant changes in the overlap area.

To address the limitations of traditional methods and achieve improved color consistency in the fusion of satellite images, this article introduces a new and innovative method called neural radiance field color consistency (NeRF-CC), inspired by the recent rendering paradigm neural radiance fields (NeRFs) [12]. NeRF-CC leverages the power of NeRF's global multilayer perceptron (MLP) to regress view-dependent radiance and volume densities at any location, allowing us to synthesize images from new viewpoints and change the scene characteristics captured in different environments in a high-quality and semantically meaningful manner. Our approach provides a promising solution for achieving improved color consistency in remote sensing images, overcoming the limitations of traditional methods. The key contributions of our approach can be summarized as follows.

1) NeRF-CC introduces a groundbreaking method that utilizes NeRFs for relighting remote sensing images, revolutionizing the relighting process by providing explicit control over various intrinsic scene properties such as local shadows and albedo. By incorporating NeRFs, the relighting effects achieved are highly realistic, resulting in visually compelling imagery.

2) Our proposed method involves training a neural network to learn implicit representations of natural scenes. This representation is decomposed into various components such as illumination, shadowing, spatial occupancy, and diffuse albedo reflectance. This decomposition can provide a more accurate and detailed understanding of the characteristics of the scene. To train the neural network, we use a self-supervised learning approach, leveraging a large dataset of remote sensing images captured from diverse viewpoints and lighting conditions. This extensive training ensures that our model is versatile and adaptable, capable of handling various scenarios and generating high-quality results.

3) Our method takes advantage of a novel recurrent neural network (RNN) architecture that enables the efficient processing of input image sequences and facilitates the incremental reconstruction of global-scale radiance fields. By iteratively fusing and reconstructing the local radiance field of each image, our method achieves a seamless and coherent representation of the overall scene radiance, contributing to the preservation of fine details and enhancing the overall quality of the relit remote sensing images.

Our proposed color correction algorithm outperforms existing representative algorithms in terms of both efficiency and effectiveness, as demonstrated by extensive experiments on four remote-sensing image datasets. We present a comprehensive analysis of our algorithm, detailing its advantages over other methods.

The rest of this article is organized as follows. Section II provides an overview of related work, Section III details our proposed approach in color correction, Section IV presents the experimental results, and finally, Section V concludes this article.

II. RELATED WORK

In this section, we provide an overview of the background and progress of both NeRF and scene relighting, highlighting the key contributions and limitations of existing approaches.

A. Neural Radiance Fields

NeRF represents a scene as a continuous function that maps 3-D spatial coordinates to radiance values, which can be learned by training a neural network on a series of input images that capture scenes from different viewpoints. Traditional methods for modeling 3-D scenes relied on discrete representations, for example, voxel grids [13] and textured meshes [14]. However, these approaches have limitations in representing complex scenes with fine details and textures. Unlike traditional methods that rely on discrete representations of scene geometry, NeRF employs an MLP parameterized as a continuous volumetric function to capture complex scene geometry and render photorealistic novel views. NeRF is primarily focused on capturing a scene from multiple viewpoints and representing it as continuous density and color values. It learns the spatial structure and lighting properties of the scene through a neural network and attempts to reconstruct the scene from the given images. However, the training process of NeRF does not pay special attention to the distinction between surface reflection properties, such as incident light and underlying materials. This means that NeRF may mix the two together, making it difficult to accurately separate them when reilluminating. Therefore, NeRF lacks the ability to disentangle surface radiance into incoming radiance and underlying material, limiting its relighting capabilities.

To address this limitation, the problem of inverse rendering or intrinsic image estimation [15] becomes essential. Classical approaches in computer vision have made insightful observations about separating images into reflectance and shading components [16], which can be used to restore the intrinsic properties of scenes, for example, their geometry and material properties [17], [18]. However, these methods are not suitable for generating complete scenes from multiple images.

Compared to traditional methods that rely on single-image analysis, our approach utilizes multiple images captured under different lighting conditions. While recent single-image inverse rendering methods [19], [20], [21], [22] rely on large datasets with labeled illumination to train networks, our approach only requires the pose of input images of scenes with certain illumination conditions. Our work builds upon the previous work of NeRF [23] by extending its capabilities to handle arbitrary lighting and global illumination scenarios, whereas the original method was limited to single-point light and direct illumination.

Our approach draws inspiration from recent work that replaces traditional mesh and voxel expression with MLPs to approximate continuous 3-D functions for shape representation [24]. In addition, we also leverage the use of MLPs for view synthesis
under fixed lighting conditions [25]. Some methods use MLPs to control the appearance of the output on a latent code encoding the lighting of each image, in order to enable relighting. However, this approach is limited to the lighting conditions observed during training and cannot handle new or previously unseen lighting conditions. Our method, however, leverages the physics of light transport to render scenes under new lighting conditions. In addition, our approach is also influenced by graphics research on precomputation [26] and approximation [27] techniques for efficient global lighting computation in physically-based rendering.

B. Scene Relighting

Various methods have been developed for natural illumination editing and scene relighting [28], [29], [30], [31]. The object-centric approach focuses on integrating objects into the image in a manner consistent with lighting, whereas scene-centric methods process the entire scene to generate novel images under new lighting conditions. For example, Duchene et al. [28] have put forward a method to estimate the reflectance of a scene, as well as its shading and visibility, based on multiple views captured under fixed lighting conditions. This method allows for the creation of new and innovative relighting effects, including moving cast shadows. Barron and Malik [32] proposed a statistical inference-based method to formulate inverse rendering, which allows for the estimation of shape normals, shading, reflectance, and illumination from a single RGB image. Philip et al. [30] proposed a method for guiding relighting using a proxy geometry structure that is estimated from multiview images, as well as neural networks that can convert buffers of image space to achieve the desired relighting effects.

Other methods, such as those by Yu and Smith [33] and Yu et al. [34], focus on estimating the albedo, normal, and lighting of a natural scene from a viewpoint. Relighting is allowed by editing the reconstructed illumination, either through spherical harmonics (SHs) or through a neural renderer trained on a wide and diverse dataset of uncontrolled images. While these methods have shown impressive results visually and numerically, they do not offer the capability to edit the camera viewpoint.

Currently, related work has developed some methods for using a NeRF backbone to achieve relighting [35], [36], [37]. However, the existing methods for image relighting often have limitations such as the need for input images with fixed illumination conditions, reliance on known illumination during training, or specialization for specific object classes like faces. This restricts their practical applicability in scenarios where illumination conditions are uncontrolled or unknown, such as in outdoor scenes or remote sensing images. We have developed a novel approach to relighting that is similar in concept to the NeRD method proposed by Boss et al. [35], which operates on images of scenes captured under different lighting conditions. NeRD estimates the spatially varying bidirectional reflectance distribution function (BRDF) using physically-based rendering and converts the learned reflectance volume into a relightable texture mesh. However, NeRD did not explicitly model shadows, which is crucial for high-quality natural scene reillumination. In addition, it assumes that inspected objects are at a similar distance to all views, a condition that is not easily satisfied in uncontrolled photography setups. However, unlike NeRD, our method does not require images from the same scene captured under fixed lighting conditions. Instead, our approach works with a single image and reconstructs the underlying scene geometry and illumination from that input. This makes our method more practical for use in scenarios where multiple images from the same scene are under varying lighting conditions.

III. METHOD

Our proposed method, NeRF-CC, aims to improve the accuracy of color consistency correction in satellite images. To achieve this, our method introduces a novel approach to scene decomposition, which allows for a comprehensive understanding of the intrinsic properties of the scene. By decomposing the scene into key components, such as shadows, lighting, spatial occupancy, and diffuse albedo, we can gain a deeper understanding of the scene's characteristics and generate high-quality images from new viewpoints and under different lighting conditions.

To perform color consistency correction, NeRF-CC takes a set of input images as its initial input. These images may exhibit significant color inconsistencies due to variations in seasons, illumination, and atmospheric conditions. However, our neural scene representation captures the underlying scene characteristics, enabling us to directly access scene lighting information and estimate the scene's intrinsic properties explicitly. This explicit estimation is essential in achieving accurate color consistency correction. By understanding the scene's lighting conditions, NeRF-CC can adjust the color and tone of the input images to achieve a more consistent and visually pleasing result.

Furthermore, NeRF-CC incorporates a dedicated component for fusing volumes, which is a crucial feature for scene fusion. This component ensures that the color consistency correction is seamlessly applied across different parts of the scene, reducing the visibility of stitching seams and improving the overall visual continuity of the mosaic image.

Fig. 1 illustrates the pipeline of NeRF-CC, showcasing the step-by-step process of scene decomposition, color consistency correction, and volume fusion. This pipeline demonstrates the efficiency and effectiveness of our proposed method in achieving high-quality color consistency correction in satellite images.

The camera pose is essential for projecting the scene onto the 2-D image plane, which is necessary for training the NeRF-CC model. Various techniques can be used to estimate the camera pose. In this article, we employed OpenMVG to estimate the camera pose. By leveraging the power of NeRF-CC, our approach leads to a major enhancement in the visual quality and interpretability of mosaic remote-sensing images. The accurate decomposition of scene components and the explicit estimation of intrinsic properties allow for precise color consistency correction, resulting in smooth transitions and natural tone variations.

A. Spherical Harmonics NeRF

NeRF defines its color c(p) and density σ(p) for any point p in 3-D space. Rays are projected from the camera origin o along

Fig. 1. Pipeline of NeRF-CC. NeRF-CC is our proposed method that uses remote sensing images captured in an uncontrolled environment of a scene to reconstruct
an implicit scene model that can be re-illuminated. NeRF-CC is designed to learn the inherent properties of scenes and illuminations, which are expressed through
the SH coefficients. In addition, specialized neural components are utilized to learn and incorporate the effects of shadows in the scene. During testing, NeRF-CC
demonstrates the ability to synthesize novel images at arbitrary scene illuminations and camera viewpoints. To achieve this, the user provides the desired scene
illumination information and camera pose via SH coefficients.

the direction d corresponding to each output pixel to render an image. The final color C(o, d) for a given pixel is rendered by considering values of color and density along the corresponding ray (o, d) in the image space, which can be defined as in the following equation:

$$ C(o, d) = C\left(\{p_i\}_{i=1}^{N_{depth}}\right) = \sum_{i=1}^{N_{depth}} T(t_i)\,\alpha\big(\sigma(p_i)\,\delta_i\big)\,c(p_i) \qquad (1) $$

where $T(t_i) = \exp\big(-\sum_{j=1}^{i-1} \sigma(p_j)\,\delta_j\big)$ is the accumulated transmittance along the ray, $\delta_i = t_{i+1} - t_i$, and $\alpha(y) = 1 - \exp(-y)$. The depths $\{t_i\}_{i=1}^{N_{depth}}$ are chosen from a uniform distribution using stratified sampling, spanning depths along (o, d), which starts from the near and ends at the far camera plane. Both the color c(p) and the density σ(p) are modeled using an MLP, and the final pixel value is generated by training the model to match the per-pixel value of the ground truth image in a self-supervised manner.
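
As an illustration of how (1) is evaluated in practice, the sketch below composites sampled colors and densities along a single ray. This is a minimal PyTorch example written for this description rather than the authors' released implementation; the `radiance_field` query function, the tensor shapes, and the sample count are assumptions.

```python
import torch

def render_ray(radiance_field, origin, direction, t_near, t_far, n_depth=64):
    """Composite per-sample colors and densities into a pixel color, as in (1).

    radiance_field(points) -> (color [N, 3], sigma [N]) is an assumed MLP query.
    origin and direction are 1-D tensors of shape [3].
    """
    # Stratified sampling of depths between the near and far camera planes.
    bins = torch.linspace(t_near, t_far, n_depth + 1)
    t = bins[:-1] + (bins[1:] - bins[:-1]) * torch.rand(n_depth)
    points = origin[None, :] + t[:, None] * direction[None, :]        # [N, 3]

    color, sigma = radiance_field(points)                             # [N, 3], [N]
    delta = torch.cat([t[1:] - t[:-1], torch.tensor([1e10])])         # last interval is open-ended
    alpha = 1.0 - torch.exp(-sigma * delta)                           # alpha(sigma * delta)
    # Transmittance T(t_i) = exp(-sum_{j < i} sigma_j * delta_j)
    trans = torch.exp(-torch.cumsum(
        torch.cat([torch.zeros(1), sigma * delta])[:-1], dim=0))
    weights = trans * alpha                                           # per-sample compositing weights
    return (weights[:, None] * color).sum(dim=0)                      # pixel color [3]
```

The same per-sample weights are later reused to accumulate albedo, normal, and shadow values in screen space.
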
To enhance the representation of fine details, NeRF employs hierarchical volume sampling by sampling points at different depths. Rather than performing a single rendering pass, the method employs stratified sampling of points, and their densities are used for importance sampling. The training procedure of the ultimate model involves supervision of the pixel colors generated by both passes, utilizing the ground-truth colors as the point of reference. This approach enables NeRF to capture intricate details more effectively.

While (1) allows high-quality free-view synthesis, c(p) is only defined by the MLP, which cannot encode illumination. In other words, (1) only represents a simple Lambertian model under a fixed and uniform illumination condition. A more generalized model with view direction dependence acquires slices of an apparent BRDF under fixed illumination [38]. Nevertheless, this representation still does not contain the underlying scene semantic essence, nor directly control lighting.

To enable relighting, the explicit second-order SH lighting model [39] was introduced, and we then modified the rendering

$$ C\left(\{p_i\}_{i=1}^{N_{depth}}, L\right) = A\left(\{p_i\}_{i=1}^{N_{depth}}\right) \odot \left(L\, b\Big(N\big(\{p_i\}_{i=1}^{N_{depth}}\big)\Big)\right) \qquad (2) $$

where $\odot$ represents element-wise multiplication. A(x) is the cumulative albedo color, which is produced in a similar manner to (1) by combining the output of the diffuse albedo extraction. L stands for the deducible SH coefficient for each image, and b(n) is the SH basis. N(x) is the surface normal calculated from the congregated ray density

$$ N\left(\{p_i\}_{i=1}^{N_{depth}}\right) = \frac{\hat{N}\left(\{p_i\}_{i=1}^{N_{depth}}\right)}{\left\|\hat{N}\left(\{p_i\}_{i=1}^{N_{depth}}\right)\right\|_2} \qquad (3) $$

$$ \hat{N}\left(\{p_i\}_{i=1}^{N_{depth}}\right) = \sum_{i=1}^{N_{depth}} \frac{\partial \sigma(x_i)}{\partial x_i}\, T(t_i)\,\alpha\big(\sigma(p_i)\,\delta_i\big) \qquad (4) $$

To extract N, we differentiate the point densities with respect to the raw x, y, and z components of each ray sample, then use the weights $T(t_i)\,\alpha(\sigma(p_i)\,\delta_i)$ to aggregate them over all $N_{depth}$ samples on the ray, and finally normalize the output vector to the unit sphere. Equation (2) represents the rendering process in screen space, which is based on the screen-space albedo and normal information that is obtained from the neural volume. This allows for fast and efficient rendering of the scene

Fig. 2. Framework of volumes fusion for global reconstruction. The proposed method involves the following steps. (1) A sequence of images is fed into the pipeline, and image features (F1, ..., FN) are extracted by a 2-D CNN. (2) A local sparse neural volume 𝒱1, ..., 𝒱N is reconstructed in the canonical world space for each image. This is achieved by using a sparse 3-D CNN to fetch and aggregate 2-D features across neighboring images. (3) Local sparse volumes are fused across images using an RNN, which allows for the sequential construction of global feature volumes 𝒱g1, ..., 𝒱gN. (4) View-dependent radiance and volume density are regressed from the sparse volumes. This allows for the rendering of new images with differentiable ray marching, enabling modifications to the scene such as relighting.

while still retaining the details and complexity of the original scene. This accumulation step helps to reduce the noise in the estimated albedo and surface normal, facilitating faster convergence. Additionally, this approach allows for a single shading calculation instead of performing shading calculations for each sample point and accumulating the shaded colors. This optimization significantly improves the efficiency of the rendering process.

All terms in (2) are learnable, except the normal extraction operator N(x) and the SH basis b(n), which are based on fixed explicit models. The proposed ensemble of illumination models allows explicit reillumination by varying L. Although the method accounts for Lambertian effects, it does not directly generate shadows, which is essential for accurately modeling and relighting natural scenes. Including direct shadow generation would greatly enhance the realism and fidelity of the rendered images, particularly in natural environments where shadows play a significant role.

B. Shadow Generation Network

To enable explicit control over shadows during the relighting process, we introduced a specialized shadow model, denoted as $S(\{p_i\}_{i=1}^{N_{depth}}, L)$, and extended the rendering (2) to

$$ C\left(\{p_i\}_{i=1}^{N_{depth}}, L\right) = S\left(\{p_i\}_{i=1}^{N_{depth}}, L\right)\, A\left(\{p_i\}_{i=1}^{N_{depth}}\right) \odot \left(L\, b\Big(N\big(\{p_i\}_{i=1}^{N_{depth}}\big)\Big)\right) \qquad (5) $$

which is parameterized by the positions $p_i$, the number of depth samples $N_{depth}$, and the lighting conditions L. The proposed method uses a shadow model that is delimited by a scalar value represented by an MLP, denoted as $s(p, L) \in [0, 1]$. To compute the final shadow value, the model accumulates along the ray into $S(\{p_i\}_{i=1}^{N_{depth}}, L) \in [0, 1]$, similar to (1).

It is worth noting that the shadow prediction network uses the SH coefficients as input in their grayscale version, i.e., $L \in \mathbb{R}^{1\times 9}$ instead of $\mathbb{R}^{3\times 9}$. The motivation behind the proposed method is that shadows mainly depend on the spatial distribution of light, rather than color. Unlike the traditional ray tracing method used in [36], our shadow estimator runs much more efficiently, requiring only a single forward pass similar to the geometric and albedo estimation. This efficiency is a significant advantage, as it ensures computational scalability while still allowing for relighting with completely new lighting conditions.
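
To make the screen-space shading of (2) and (5) concrete, the sketch below evaluates a second-order SH lighting model on accumulated albedo, normal, and shadow maps. It is an illustrative PyTorch fragment, not the authors' code; the real SH basis normalization constants and the tensor layout are assumptions.

```python
import torch
import torch.nn.functional as F

def sh_basis(n):
    """Second-order (9-term) real SH basis evaluated at unit normals n: [..., 3] -> [..., 9]."""
    x, y, z = n[..., 0], n[..., 1], n[..., 2]
    return torch.stack([
        0.2820948 * torch.ones_like(x),                     # l = 0
        0.4886025 * y, 0.4886025 * z, 0.4886025 * x,        # l = 1
        1.0925484 * x * y, 1.0925484 * y * z,               # l = 2
        0.3153916 * (3.0 * z * z - 1.0),
        1.0925484 * x * z, 0.5462742 * (x * x - y * y),
    ], dim=-1)

def shade_pixels(albedo, normal, shadow, L):
    """Screen-space shading of (5): C = S * A ⊙ (L b(N)).

    albedo: [..., 3] accumulated diffuse albedo A
    normal: [..., 3] accumulated surface normal N (normalized as in (3))
    shadow: [...]    accumulated scalar shadow S in [0, 1]
    L:      [3, 9]   per-image SH lighting coefficients (the shadow net uses a [1, 9] grayscale L)
    """
    n = F.normalize(normal, dim=-1)
    irradiance = torch.einsum("...k,ck->...c", sh_basis(n), L)   # [..., 3]
    return shadow[..., None] * albedo * irradiance
```

Because shading is applied once per pixel on accumulated quantities, relighting amounts to swapping in a new set of SH coefficients L.
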
C. Volumes Fusion for Global Reconstruction

We propose a framework of volume fusion for global reconstruction that can generate a local neural volume for each input image, utilizing K − 1 adjacent images together with the input image I_t (see Fig. 2). By incorporating multiple adjacent images for each image reconstruction, the framework of volumes fusion can obtain more accurate scene geometry by utilizing multiview correspondence. It is important to note that our method relies on feature alignment and weaving based on overlapping areas. Therefore, in the absence of overlapping regions, our method cannot perform feature weaving to fuse multiple images effectively.

Local Volumes Reconstruction: To ensure the generalization of local reconstruction across different scenes, we utilize deep multiview stereo (MVS) techniques, known for their generalizability. We extract 2-D features from images and use them to construct a cost volume, which is then used to regress a neural feature volume. However, our proposed approach is to build the local volume in a standardized world coordinate system to align it with 𝒱g, the volume output of the global reconstruction, thereby facilitating the subsequent volumes fusion; this is completely different from MVSNeRF [40] and other MVS-based techniques [41] that construct the truncated volume in the perspective coordinate system of the view. First, the proposed method utilizes a deep 2-D CNN to extract high-level features from each image in the sequence. The framework of volume fusion obtains 2-D feature maps F_t through the corresponding
mapping of each input image I_t, which encodes important information about the scene content from each image. These features are then used to construct the local neural volumes for each image and enable the rendering of new views using differentiable ray marching. Next, we consider how to construct a bounding box containing a set of voxels in canonical space, which surrounds the truncated volumes of all K adjacent images in world coordinates. We block voxels that are invisible to all views, resulting in a sparse set of voxels within the bounding box. To address this problem, our method projects image features into the volume to perform local volume fusion.

The framework of volumes fusion constructs a 3-D feature volume 𝒰_i for each adjacent image i and its corresponding feature map F_i. For each visible voxel centered on v, we retrieve the 2-D features at its 2-D projection in each adjacent view of image t. Besides the feature maps, the framework of volume fusion also incorporates the corresponding viewing direction d_i from each image viewpoint to v and then uses an MLP G to compute additional features. These features are then combined to form a 3-D feature volume for each image, which can be used for differentiable ray marching to render new views and modify the scene. Therefore, each per-image 3-D volume 𝒰_i can be represented as in the following equation:

$$ \mathcal{U}_i(v) = \left[F_i(u_i),\, G(d_i)\right] \qquad (6) $$

where 𝒰_i(v) denotes the feature of the voxel centered on v, u_i is the 2-D projection of the voxel center in view i, and [·, ·] denotes feature concatenation. It is worth noting that we encode additional information on the input images during the volume fusion process, which enables our subsequent fusion modules to effectively account for the effect of synthesis across image captures.

To construct a local radiance field, we aggregate features from multiple adjacent images to regress a local volume 𝒱_t from image t. The features are obtained by computing the variance and mean of each voxel feature in 𝒰_i across adjacent images, similar to how cost volumes are built in MVS-based techniques. The proposed method utilizes the mean operation to fuse the appearance information of each image and the variance operation to provide rich clues for geometry reasoning. The above-mentioned operations are invariant to the order and number of input images, allowing them to handle voxels with varying numbers of visible image viewpoints. In order to operate on the mean and variance features of each voxel and perform a regression representation of each image reconstruction, a deep neural network 𝒥 is used to implement this operation. Specifically, we compute the local volume as the following equation:

$$ \mathcal{V}_t = \mathcal{J}\left(\left[\mathrm{Mean}_{i\in\mathcal{N}_t}\,\mathcal{U}_i,\; \mathrm{Var}_{i\in\mathcal{N}_t}\,\mathcal{U}_i\right]\right) \qquad (7) $$

where Mean and Var denote element-wise mean and variance operations, and 𝒩_t denotes all K adjacent images used at image t, respectively.
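
A minimal sketch of the per-voxel aggregation in (6) and (7) is given below (PyTorch, for illustration only; the feature dimensions and the small MLPs standing in for G and 𝒥 are assumptions).

```python
import torch
import torch.nn as nn

class LocalVolumeHead(nn.Module):
    """Aggregate per-view voxel features into a local volume, following (6)-(7)."""

    def __init__(self, feat_dim=32, dir_dim=8, out_dim=32):
        super().__init__()
        self.view_mlp = nn.Sequential(nn.Linear(3, dir_dim), nn.ReLU())          # stand-in for G
        self.fuse_mlp = nn.Sequential(                                            # stand-in for the network J
            nn.Linear(2 * (feat_dim + dir_dim), out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim))

    def forward(self, sampled_feats, view_dirs):
        """sampled_feats: [K, V, C] 2-D features fetched at each voxel's projection u_i.
        view_dirs:     [K, V, 3] viewing directions d_i.
        Returns local volume features V_t of shape [V, out_dim] for the V visible voxels."""
        u = torch.cat([sampled_feats, self.view_mlp(view_dirs)], dim=-1)          # U_i(v), eq. (6)
        mean = u.mean(dim=0)                                                      # appearance cue
        var = u.var(dim=0, unbiased=False)                                        # geometry cue
        return self.fuse_mlp(torch.cat([mean, var], dim=-1))                      # V_t, eq. (7)
```

Because mean and variance are symmetric in their inputs, the same module handles any number of visible views per voxel.
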
Our approach is inspired by MVSNeRF, which regresses a local radiance field using features from neighboring views. However, our method differs from MVSNeRF, in that we use the generated local volumes for global large-scale rendering, rather than focusing solely on local reconstruction. Furthermore, the volume generated for each image serves as input to our fusion module, which is easier to consume since the volume is constructed directly in canonical space.

Fig. 3. GRU fusion step illustrated in 2-D. The proposed method employs an adaptive update mechanism to update the global feature volume 𝒱g_{t−1} using the incoming local feature volume 𝒱_t. This allows the model to capture new information about the scene from each input image and incorporate it into the global feature volume. This process is performed using a GRU that selectively updates the hidden state based on the relevance of the new information. The radiation field of the entire scene is then modeled by using the generated global feature volume 𝒱g_t, which enables the generation of novel views under arbitrary camera viewpoints and scene illumination.

Global Volumes Reconstruction: The proposed method aims to establish a seamless and adaptable scene reconstruction approach by utilizing a global neural volume fusion network. This network is designed to incrementally fuse the local feature volumes {𝒱_t} obtained from each image to create a comprehensive global volume 𝒱g. The fusion process is carried out by using a combination of sparse 3-D CNNs and gated recurrent units (GRUs) in the fusion module. Fig. 3 provides a 2-D schematic representation of the GRU fusion step. The 3-D CNNs are used to extract features from the 2-D images and to generate a 3-D volume representation of the scene. Specifically, the 3-D CNNs take a stack of 2-D images as input and produce a 3-D tensor as output, where each voxel in the tensor represents a 3-D point in the scene. The 3-D CNNs are trained to learn a representation of the scene that is consistent across all the input images. The GRUs are used to refine the 3-D volume representation by incorporating temporal information from the input images. Specifically, the GRUs take the 3-D tensor generated by the 3-D CNNs as input and propagate information through time to refine the representation. This allows the method to model the temporal coherence of the scene and to produce a more accurate 3-D reconstruction.

Through this unique design, the network can learn how to iteratively fuse the local reconstructions of each image and produce a high-quality global radiation field. In addition, the module leverages the global reconstruction 𝒱g_{t−1} composed from the previous image and the local volume of sparse reconstruction 𝒱_t as recurrent inputs at each image t. Overall, this method ensures consistency, efficiency, and extensibility in the scene
reconstruction process:

$$ z_t = M_z\!\left(\mathcal{V}^g_{t-1},\, \mathcal{V}_t\right) \qquad (8) $$

$$ r_t = M_r\!\left(\mathcal{V}^g_{t-1},\, \mathcal{V}_t\right) \qquad (9) $$

$$ \tilde{\mathcal{V}}^g_t = M_t\!\left(r_t * \mathcal{V}^g_{t-1},\, \mathcal{V}_t\right) \qquad (10) $$

$$ \mathcal{V}^g_t = (1 - z_t) * \mathcal{V}^g_{t-1} + z_t * \tilde{\mathcal{V}}^g_t \qquad (11) $$

where $z_t$ and $r_t$ represent the update gate and reset gate, respectively. $M_z$, $M_r$, and $M_t$ are neural networks that utilize sparse 3-D convolution layers, and $*$ denotes element-wise multiplication. The proposed method employs a global volume fusion module based on the GRU, which can incrementally fuse the local feature volume 𝒱_t of each image to generate a consistent and high-quality global radiation field. The fusion process involves element-wise multiplication denoted by $*$, as well as the update and reset gates $z_t$ and $r_t$, respectively, which are controlled by the deep neural networks $M_z$ and $M_r$ designed with sigmoid activation. In addition, the $M_t$ network is also a deep neural network with sparse 3-D convolution layers but is designed with a tanh activation function to update the global reconstruction $\mathcal{V}^g_t$. To improve efficiency, our approach selectively applies the GRU and sparse 3-D CNN networks only to those voxels overlapped by the local volume 𝒱_t, while leaving all other voxels in the global volume unchanged. This selective approach allows us to focus computation on the most informative regions while still capturing the global context.
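
The gated update of (8)-(11) can be sketched as follows. This is an illustrative PyTorch module: dense 3-D convolutions are used here as a stand-in for the sparse 3-D convolution layers, and the channel count and kernel size are assumptions.

```python
import torch
import torch.nn as nn

class VolumeGRUFusion(nn.Module):
    """Fuse the local volume V_t into the running global volume V^g_{t-1}, following (8)-(11)."""

    def __init__(self, channels=32):
        super().__init__()
        # Dense stand-ins for the sparse 3-D CNNs M_z, M_r, M_t.
        self.Mz = nn.Conv3d(2 * channels, channels, kernel_size=3, padding=1)
        self.Mr = nn.Conv3d(2 * channels, channels, kernel_size=3, padding=1)
        self.Mt = nn.Conv3d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, v_global, v_local):
        """v_global, v_local: [B, C, D, H, W] feature volumes; returns the updated global volume."""
        x = torch.cat([v_global, v_local], dim=1)
        z = torch.sigmoid(self.Mz(x))                                              # update gate, eq. (8)
        r = torch.sigmoid(self.Mr(x))                                              # reset gate, eq. (9)
        v_tilde = torch.tanh(self.Mt(torch.cat([r * v_global, v_local], dim=1)))   # candidate volume, eq. (10)
        return (1.0 - z) * v_global + z * v_tilde                                   # gated blend, eq. (11)
```

In the full method this update is evaluated only on the sparse set of voxels covered by 𝒱_t, which keeps the per-image cost roughly constant as the global volume grows.
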
D. Objective Function

After reconstructing the radiance field using a sparse neural volume, the final rendered image is produced through differentiable ray marching. This process involves the use of the regressed view-dependent radiance and volume density at sampled ray points, similar to the radiance field methods [42]. Our entire pipeline is trained entirely on rendering supervision from ground truth images, which does not require any additional geometric supervision. Specifically, we train the radiance field decoder and local reconstruction network using the loss function

$$ \mathcal{L}_{local} = \left\|C_t - \hat{C}\right\|_2^2 \qquad (12) $$

where $C_t$ is the pixel color rendered by a local volume $\mathcal{V}_t$ reconstructed from image t, and $\hat{C}$ is the ground truth pixel color. This training method can generate locally realistic images because the network learns how to predict reliable local volumes. This training process ensures that the fusion module can obtain meaningful volume features provided by the local reconstruction module, thus effectively facilitating the fusion task. Finally, we train the complete pipeline end-to-end, which includes the radiance field decoder network, the fusion network, and the local reconstruction network, using a rendering loss. The final objective function is shown in the following equation:

$$ \mathcal{L}_{global} = \sum_t \left(\left\|C_t - \hat{C}\right\|_2^2 + \left\|C^g_t - \hat{C}\right\|_2^2\right) \qquad (13) $$

where $C_t$ stands for the pixel color rendered by the local reconstruction $\mathcal{V}_t$, and $C^g_t$ stands for the pixel color rendered by the global reconstruction $\mathcal{V}^g_t$ after blending multiple images. Essentially, we use each intermediate local and global volume at each image to render new views and supervise them using ground truth images. Therefore, the fusion module can reasonably combine local volumes from several input images.
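
The two rendering losses can be written compactly as below. This is a minimal sketch; how pixels are batched and how the ground truth is paired per image are assumptions about details not spelled out in the text.

```python
import torch

def local_loss(c_local: torch.Tensor, c_gt: torch.Tensor) -> torch.Tensor:
    """Eq. (12): squared error between locally rendered and ground-truth pixel colors."""
    return ((c_local - c_gt) ** 2).sum(dim=-1).mean()

def global_loss(c_locals, c_globals, c_gts) -> torch.Tensor:
    """Eq. (13): sum over images t of the local and global rendering errors."""
    loss = 0.0
    for c_l, c_g, c_gt in zip(c_locals, c_globals, c_gts):
        loss = loss + ((c_l - c_gt) ** 2).sum(dim=-1).mean() \
                    + ((c_g - c_gt) ** 2).sum(dim=-1).mean()
    return loss
```
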
IV. EXPERIMENTS

The experiments consist of three parts. First, we introduce the metrics used to evaluate performance. Next, we introduce the implementation details and training methods of the network. Finally, we conduct a comparison between our proposed color correction method and several existing methods [43], [44], [45], [46] to demonstrate its effectiveness.

A. Evaluation Metrics

In order to quantitatively evaluate the performance of the color correction method, we use two commonly used indicators, PSNR and SSIM, to measure the similarity in terms of color and structure of the corrected images. To complement the evaluation of color correction methods, we also calculated additional reference metrics known as the color distance (CD) and the gradient loss (GL). The CD metric provides an assessment of color consistency within a dataset by measuring the average differences between the color correspondences of adjacent images after color correction. By calculating the color differences between neighboring images, the CD metric offers insights into how well the color correction method maintains color consistency throughout the dataset. A lower CD value indicates a higher level of color consistency, indicating that the color correction method successfully preserves the color relationships between adjacent images.

The ground truth of remote sensing images is vital for quantitative evaluation experiments. Since it is hard to delimit what the ground truth is for a set of images with inconsistent colors, it is not feasible for this problem. Color consistency between images is a relative measure. Therefore, two metrics proposed in [47] will be used to evaluate the color calibration effects produced by different methods. The metrics include the CD and the GL, which evaluate the color differences between overlap-corrected images and the gradient variations between original images and rectified images, respectively. They can be defined as follows:

$$ CD = \sum_{I_i \cap I_j \neq \emptyset} w_{ij}\, \frac{\Delta H\!\left(\hat{I}_{ij},\, \hat{I}_{ji}\right)}{N_b} \qquad (14) $$

$$ GL = \frac{1}{N}\sum_{i=1}^{N} \frac{\Delta G\!\left(I_i,\, \hat{I}_i\right)}{N_p} \qquad (15) $$

where $I_i$ and $I_j$ are two overlapped images. $\hat{I}_i$ and $\hat{I}_j$ represent the corresponding corrected images. $\hat{I}_{ij}$ stands for the region of $\hat{I}_i$ overlapped with $\hat{I}_j$. $w_{ij}$ is the normalized weight, which is proportional to the area of $\hat{I}_{ij}$. $\Delta H$ stands for the difference between the color histograms distilled over the bins, and $N_b$ is the bin number of the color histogram. $\Delta G$ stands for the difference between the gradient direction maps extracted from $I_i$ and $\hat{I}_i$ by
pixel, and $N_p$ is the number of valid pixels in $\hat{I}_i$. If the CD and GL values are small, it indicates a higher quality color calibration outcome.
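
The per-pair term of the CD metric in (14) can be approximated as in the following sketch (NumPy, for illustration; the histogram bin count, the use of an absolute histogram difference for ΔH, and the exact normalization are assumptions about details not spelled out in the text).

```python
import numpy as np

def color_distance(overlap_i, overlap_j, n_bins=32):
    """One term of (14): mean per-channel histogram difference over an overlapping pair.

    overlap_i, overlap_j: corrected overlap regions of shape (H, W, 3) with values in [0, 255].
    """
    diff = 0.0
    for c in range(3):
        h_i, _ = np.histogram(overlap_i[..., c], bins=n_bins, range=(0, 255), density=True)
        h_j, _ = np.histogram(overlap_j[..., c], bins=n_bins, range=(0, 255), density=True)
        diff += np.abs(h_i - h_j).sum()          # Delta-H for this channel
    return diff / n_bins                         # normalize by the bin count N_b

def dataset_cd(pairs, areas):
    """Aggregate CD over all overlapping pairs with area-proportional weights w_ij."""
    w = np.asarray(areas, dtype=float)
    w /= w.sum()                                 # normalized weights, proportional to overlap area
    return sum(wi * color_distance(a, b) for wi, (a, b) in zip(w, pairs))
```
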
B. Implementation Details

Training datasets: We used a combination of large-scale remote sensing images from MTS-WH [48] and small objects from DTU [49] as our training data. For the MTS-WH dataset, we randomly sampled 100 scenes for training, while for the DTU dataset, which includes 88 training scenes, we used a training method similar to PixelNeRF [50]. By incorporating various scene types and camera setups into our training data, our proposed method is capable of generalizing to different scenarios. The experiments demonstrate that our method is effective in processing large-scale remote sensing image color correction.

Training details: To train our network, we uniformly sample key images from each input image sequence. For the object-centric dataset (i.e., DTU), we sampled 16 images for each scene, and all other remaining images were used for supervision. When reconstructing local volumes, we select K = 3 adjacent images for each input image. For the MTS-WH dataset, we choose the three adjacent images with the closest viewing direction and position. We used TorchSparse [51] to implement the construction of sparse volumes and networks. Our network is trained using the Adam optimizer with an initial learning rate of 0.005. During the training process, we used datasets with varying time spans, color changes, and overlapping areas. This diverse dataset ensured that the model could learn feature matching, alignment, and fusion capabilities under different conditions. The diversity of the training dataset allowed the model to learn how to handle different seasons, geographic regions, and landscapes.

Test datasets: Our experiments are conducted on four datasets acquired from different regions, different seasons, and different sensor platforms, three of which are from satellite platforms and one from an unmanned aerial vehicle (UAV) platform. These images were geographically aligned to the same coordinate system in each dataset. The parameter information for these four datasets is presented in Table I. Note that the included images were down-sampled to reduce computational costs and memory requirements. The land, mountain, and city datasets are composed by sifting from satellite images taken at different times, so there are severe color discrepancies between these raw images. Since the range of drone photography is generally limited, it is difficult for drones to obtain multitemporal images similar to those accumulated by satellite platforms for an area all year round. In order to verify the effectiveness of different color correction methods for drone images, we randomly adjusted the raw colors of the images in the village dataset. In addition, our method can be applied to the OpenDroneMap dataset, which shares similarities with the UAV dataset we tested in terms of acquisition method and image characteristics. We validated our method on a variety of test datasets that were significantly different from the training dataset, which further demonstrates that our method has good generalization ability and can maintain consistent performance on different datasets. The experiments were conducted on a computer equipped with an Intel Core i9-9900K 3.2 GHz CPU, 128 GB of RAM, and two NVIDIA 2080Ti GPUs.

TABLE I
DETAILED INTRODUCTION TO THE FOUR DATASETS USED IN OUR EXPERIMENT

C. Comparative Experiments

In our experiments, we evaluate our method against four typical color correction methods from the literature [43], [44], [45], [46] using the land, mountain, city, and village datasets. For the sake of clarity, we rename these methods in the following experiments as Method 1, Method 2, Method 3, and Method 4, respectively. Method 1 [43] is a global optimization approach that minimizes a cost function defined over multiple images in order to optimize color consistency, and it applies a linear model to amend color differences. Method 2 [44] improves upon Method 1 by combining linear and gamma models to simultaneously solve the correction issues of color and brightness. Method 3 [45] proposes using a linear model to modify the color histograms of multiple images. Method 4 [46] proposes a global-to-local color correction approach aimed at eliminating both global and local color differences, which resolves the linear model parameters of all images through a global optimization scheme.

The first dataset used to evaluate our method is the Land dataset, which consists of 16 multitemporal images selected from the WorldView-3 satellite platform. The input raw images exhibit severe color differences due to varying atmospheric conditions during acquisition at different times, as shown in Fig. 4(a). The results obtained by Method 1 to Method 4 are shown in (b)–(e) of Fig. 4, and the result obtained by our method is shown in Fig. 4(f). In the results obtained by Method 1 and Method 2, the mosaic images have obvious seams between image strips, as shown in Fig. 4(b) and (c). In other words, these two methods failed to perform color correction on the Land dataset, resulting in the mosaic image still retaining obvious color differences. Method 3 obtained a higher contrast result, but due to the linear stretching process in this method, some areas are overexposed; in particular, the originally bright areas become even brighter. There are also still some color differences between adjacent image strips, as shown in Fig. 4(d). Method 4 obtained a mosaic image with relatively consistent color, but there was excessive blurring in local areas and the overall tone

Fig. 4. Results generated by different methods on the land dataset. (a) Raw images. (b)–(f) Results of Method 1 to Method 4 and our proposed method, respectively.

dim, resulting in low distinction between surface features, as shown in Fig. 4(e). In contrast, our method produced the best color correction result, as shown in Fig. 4(f). Color discrepancies between these raw images are dramatically reduced, and the quality of the mosaic image is greatly improved as contrast is balanced without overexposure or over-blurring.

The second dataset used to evaluate our method is the Mountain dataset, which consists of 12 multitemporal images selected from the Chinese ZY-3 satellite platform. There are obvious color differences in these input raw images, as shown in Fig. 5(a). There are still serious color differences in the results obtained by Method 1 and Method 2, especially when there is a very large hue difference between image strips. The correction results generated by these two methods are contaminated by the hue of an image strip, as shown in Fig. 5(b) and (c). We observed that Method 3 produced a better color correction result compared to Methods 1 and 2, as shown in Fig. 5(d). However, there is still a slight color difference in the middle area of the mosaic image, and the resulting mosaic image is blurry. Fig. 5(e) shows the color correction result for Method 4, which produced the highest contrast. However, the mountains in the left area of the resulting mosaic are too bright. Our method effectively eliminates color differences between raw images. The resulting mosaic image has moderate contrast and smooth color transitions, as shown in Fig. 5(f). Therefore, our method is the best performer on the Mountain dataset.

The City dataset is the third dataset used in our experiment, consisting of 16 multitemporal images taken from the Chinese GF-7 satellite, as shown in Fig. 6(a). In the results of Method 1, as well as Method 2, there are still serious color differences where the image strips are spliced, especially in the areas of buildings and vegetation, as shown in Fig. 6(b) and (c). Fig. 6(d) shows the color correction result of Method 3, which gives visually better results than Method 1 and Method 2. However, the resulting mosaic image is dim across the entire area and has low contrast. Our method and Method 4 achieve good results on this dataset and effectively eliminate the color differences between the raw images, as shown in Fig. 6(e) and (f). However, for the mosaic image generated by Method 4, we observe that the left region of the mosaic image is slightly brighter than the right region, and the contrast of the left region is also higher than that of the right side. In contrast, we found that in the result generated by our method, the overall contrast of the mosaic image was more consistent, and the color transitions were more balanced. Although the result generated by our method is slightly darker than that of Method 4, the overall color is more consistent.

The fourth dataset that was used to evaluate our proposed method was captured using a UAV in a village area and consisted of more than 50 images, as shown in Fig. 7(a). Traditional color correction methods produce unsatisfactory results on this dataset. The mosaic image generated by Method 1 still has serious color differences, as shown in Fig. 7(b). Method 2 produces slightly better color correction results than Method 1, but obvious seams can still be seen between the corrected image strips, as shown in Fig. 7(c). Method 3 performs well in terms of color consistency, and the corrected output has only a small color difference, as shown in Fig. 7(d). However, Method 3 suffers from serious lighting inconsistencies, with the right area of the image appearing too dark. Method 4 provides better results than the previous methods, but the resulting mosaic image has

Fig. 5. Results generated by different methods on the mountain dataset. (a) Raw images. (b)–(f) Results of Method 1 to Method 4 and our proposed method,
respectively.

Fig. 6. Results generated by different methods on the city dataset. (a) Raw images. (b)–(f) Results of Method 1 to Method 4 and our proposed method, respectively.

Fig. 7. Results generated by different methods on the village dataset. (a) Raw images. (b)–(f) Results of Method 1 to Method 4 and our proposed method,
respectively.

TABLE II
QUANTITATIVE EVALUATION RESULTS OF THE COLOR CORRECTION PRODUCED BY DIFFERENT METHODS

some blurring. Our proposed method produces the most pleasing output, significantly better than the results of other methods.

D. Quantitative Comparison

In addition to visual comparison, and to demonstrate the superiority of our method more convincingly, we also performed a quantitative evaluation of the results generated on these four remote sensing datasets, including color consistency measurements (see Table II) and corrected image quality metrics (see Table III).

The quantitative evaluation results of the color correction produced by different methods are shown in Table II. As can be seen in Table II, our method provides the best CD scores for these four test datasets, and these CD scores are significantly better than the scores of the other four methods. Our method achieves the best CD and GL scores, especially for the Mountain dataset. Regarding the Land, City, and UAV datasets, however, our method did not provide the best scores for GL. The direct reason is that our method causes the color of the raw images in these datasets to change significantly in order to eliminate severe color differences, and inevitably sacrifices some gradient information. Considering the comprehensive scores

TABLE III
QUANTITATIVE EVALUATION OF THE IMAGE QUALITY RESULTS PRODUCED BY DIFFERENT METHODS

of CD and GL, our method is superior to other comparative advance various applications in remote sensing and computer
methods. vision.
Furthermore, we reported the computation time of all methods
in Table II. We found that the computation times of Method 1
to Method 3 are at the same level. Method 4 is the slowest of
all methods. We also found that, in addition to Method 1 being REFERENCES
faster in the city dataset, our method has a shorter calculation [1] M. Kerschner, “Seamline detection in colour orthoimage mosaicking by
time than the other four methods. use of twin snakes,” ISPRS J. Photogrammetry Remote Sens., vol. 56, no. 1,
pp. 53–64, 2001.
In our research, we found that our method scored highest in [2] X. Li, R. Feng, X. Guan, H. Shen, and L. Zhang, “Remote sensing image
PSNR and SSIM indicators, indicating that our method better mosaicking achievements and challenges,” IEEE Geosci. Remote Sens.
reflects the similarity between corrected images and raw images Mag., vol. 7, no. 4, pp. 8–22, Dec. 2019.
[3] M. J. Canty, A. A. Nielsen, and M. Schmidt, “Automatic radiometric
in test datasets. Although raw images have significant CD and normalization of multitemporal satellite imagery,” Remote Sens. Environ.,
brightness differences and are not suitable as reference images vol. 91, no. 3, pp. 441–451, 2004.
for comparison with corrected images, the reason why our [4] K. Chen, J. Tu, J. Yao, and J. Li, “Generalized content-preserving warp:
Direct photometric alignment beyond color consistency,” IEEE Access,
method can achieve high scores is that these raw images can be vol. 6, pp. 69835–69849, 2018.
implicitly expressed and reilluminated to output the synthesized [5] P. Pérez, M. Gangnet, and A. Blake, “Poisson image editing,” Assoc.
image. This means that the synthesized image can find the best Comput. Machinery Trans. Graph., vol. 22, no. 3, pp. 777–784, 2003.
[6] Z. Su, K. Zeng, L. Liu, B. Li, and X. Luo, “Corruptive artifacts suppression
balance of color tones in terms of the raw images. Our method for example-based color transfer,” IEEE Trans. Multimedia, vol. 16, no. 4,
can find the lighting conditions with the smallest overall color pp. 988–999, Jun. 2014.
tone changes, which means that our method can better approach [7] L. Li, Y. Li, M. Xia, Y. Li, J. Yao, and B. Wang, “Grid model-based global
color correction for multiple image mosaicking,” IEEE Geosci. Remote
real lighting. Compared with other methods, our method can Sens. Lett., vol. 18, no. 11, pp. 2006–2010, Nov. 2021.
minimize color tone changes and obtain images that are closer [8] J. Park, Y. W. Tai, S. N. Sinha, and I. S. Kweon, “Efficient and robust
to real lighting, whereas other methods often have overall color color consistency for community photo collections,” in Proc. IEEE Conf.
Comput. Vis. Pattern Recognit., 2016, pp. 430–438.
tone deviations, and the results obtained can only be close to the [9] Z. Su, D. Deng, X. Yang, and X. Luo, “Color transfer based on multiscale
color tone of a raw image. gradient-aware decomposition and color distribution mapping,” in Proc.
Assoc. Comput. Machinery Int. Conf. Multimedia, 2012, pp. 753–756.
[10] Y. Hwang, J.-Y. Lee, I. S. Kweon, and S. J. Kim, “Color transfer using
probabilistic moving least squares,” in Proc. IEEE Conf. Comput. Vis.
V. CONCLUSION

In this article, we demonstrate a method for recovering relightable neural volume representations from ambient and indirectly illuminated remote sensing images, using visibility MLPs to approximate parts of the volume rendering integral. The method is a novel neural rendering approach that enables fast, high-quality, and large-scale multiview scene reconstruction for photorealistic rendering. In addition, our method uses a new recursive neural network to process the input multiview images and gradually reconstructs the global large-scale radiance field through the reconstruction and fusion of the local radiance field of each image.
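As a rough illustration of the rendering step described above, the sketch below evaluates the usual NeRF quadrature along a view ray but replaces a second ray march toward the light with a learned visibility prediction. The `visibility_mlp` interface and the simple sun-plus-ambient shading are illustrative assumptions, not the article's exact networks.

```python
import numpy as np


def composite_ray(sigmas, albedos, deltas, xs, light_dir, visibility_mlp,
                  sun_rgb, ambient_rgb):
    """NeRF-style quadrature C = sum_i T_i * alpha_i * c_i, with the per-sample
    transmittance toward the light predicted by an MLP instead of ray-marched.

    sigmas: (N,) densities; albedos: (N, 3); deltas: (N,) sample spacings;
    xs: (N, 3) sample positions; light_dir: (3,) unit vector toward the sun.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)                         # per-segment opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]  # T_i along the view ray
    vis = visibility_mlp(xs, np.broadcast_to(light_dir, xs.shape))  # learned light visibility in [0, 1]
    radiance = albedos * (vis[:, None] * sun_rgb + ambient_rgb)     # assumed sun + ambient shading
    return np.sum((trans * alphas)[:, None] * radiance, axis=0)     # composited pixel color
```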
reconstruction and fusion of the local radiation field of each Techn., 2001, pp. 117–128.
image. [16] B. K. P. Horn, “Determining lightness from an image,” Comput. Graph.
Compared with traditional color consistency methods, we Image Process., vol. 3, no. 4, pp. 277–299, 1974.
[17] B. K. P. Horn, “Shape from shading: A method for obtaining the shape of
reconstruct the image strips fused into a mosaic image into a smooth opaque object from one view,” Tech. Rep., Massachusetts Inst.
these volumetric radiation fields and then render them with Technol., Cambridge, MA, USA, 1970.
relighting to obtain a photorealistic view-synthesized image. [18] J. T. Barron and J. Malik, “Shape, illumination, and reflectance from
shading,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 8,
We conduct tests on four remote sensing datasets, and the pp. 1670–1687, Aug. 2015.
experiments show that our method has clear advantages over [19] Z. Li, M. Shafiei, R. Ramamoorthi, K. Sunkavalli, and M. Chandraker,
existing color correction methods. This work opens new possi- “Inverse rendering for complex indoor scenes: Shape, spatially-varying
lighting and SVBRDF from a single image,” in Proc. Conf. Comput. Vis.
bilities for color consistency correction and has the potential to Pattern Recognit., 2020, pp. 2472–2481.
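The gradual global reconstruction mentioned in the conclusion can be pictured as folding per-image local fields into one global field. The sketch below shows one simple fusion rule, confidence-weighted averaging on a shared sampling grid; the article's recursive network learns this step rather than applying a fixed rule, so the code is only an illustrative stand-in.

```python
import numpy as np


def fuse_local_fields(local_grids, local_weights):
    """Fuse per-image local radiance fields, sampled onto a shared grid, into a
    global field by confidence-weighted averaging (illustrative rule only).

    local_grids: list of (X, Y, Z, C) arrays, NaN outside each image footprint.
    local_weights: list of (X, Y, Z) confidence maps (e.g., visibility or overlap).
    """
    acc = np.zeros_like(local_grids[0])
    wsum = np.zeros(local_grids[0].shape[:-1])
    for grid, weight in zip(local_grids, local_weights):
        valid = ~np.isnan(grid[..., 0])                  # voxels covered by this image
        acc[valid] += grid[valid] * weight[valid, None]  # accumulate weighted radiance
        wsum[valid] += weight[valid]
    fused = np.full_like(acc, np.nan)
    covered = wsum > 0
    fused[covered] = acc[covered] / wsum[covered, None]  # normalize where any image contributes
    return fused
```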
Zongcheng Zuo received the M.S. degree in photogrammetry and remote sensing from the School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, China, in 2014. He is currently working toward the Doctorate degree in photogrammetry and remote sensing at the School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai, China.
His research interests include color correction for multiple images, 3-D structure extraction from point clouds, 3-D reconstruction from multiple images, remote sensing image interpretation and processing, and deep learning.

Yuanxiang Li received the Ph.D. degree in communication engineering from the Department of Electronic Engineering, Tsinghua University, Beijing, China, in 2001.
From 2002 to 2004, he was a Research Fellow with the National University of Singapore, Singapore, and from 2015 to 2016, a Visiting Professor with the University of Michigan, Dearborn, MI, USA. He is currently an Associate Professor with the School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai, China. His current research interests include machine learning, default diagnosis and prediction, time series signal analysis, data and knowledge fusion, image recognition, and 3-D reconstruction.

Tongtong Zhang received the B.Sc. degree in mathematics from the School of Mathematics and Statistics, Xidian University, Xi'an, China, in 2019. She is currently working toward the Ph.D. degree in information and control engineering at the School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai, China.
Her research interests include structure from motion, simultaneous localization and mapping, and image processing.

Fan Mo received the Master's degree in surveying and mapping engineering from Information Engineering University, Zhengzhou, China, in 2016.
From 2018 to 2023, he was the Head of the Laser Calibration Group of the Reference Calibration Department, Land Satellite Remote Sensing Application Center, Ministry of Natural Resources, and the Deputy Chief Designer of satellite geometry calibration for ZiYuan3-03. He designed and developed the operational attitude postprocessing software system of the Gaofen-7 satellite. As the main researcher, he developed the Gaofen-7 satellite laser in-orbit operational inspection and calibration system. His main research direction is the research and development of satellite attitude postprocessing and in-orbit geometry calibration technology for satellite laser altitude measurement.