Seismic Inversion
Zvi Koren
© 2017, PARADIGM. ALL RIGHTS RESERVED. | 1
References
» Gerard T. Schuster, Seismic Inversion, SEG, 2017
» William Menke, Geophysical Data Analysis: Discrete Inverse Theory,
Elsevier, 1984
» Albert Tarantola, Inverse Problem Theory: Methods for Data Fitting and
Model Parameter Estimation, Elsevier, 1987
» Keiiti Aki and Paul G. Richards, Quantitative Seismology: Theory and
Methods, Volume II, 1980
» Selected published papers
Structure
Structure of the Course
» Every lecture will include two parts
› Main course topics (two academic hours)
› Complementary topics (one academic hour)
» Assignments:
› Every main topic will be followed by an exercise
› A final exercise
Syllabus
Syllabus (Main Course)
» Introduction to seismic inversion
» Part I: Iterative Optimization Methods
› Newton, Steepest-Descent and Conjugate-gradient methods
» Part II: Numerical Forward Modeling
› Model realizations/parameterizations
› Ray tracing (Eigen Rays)
› Numerical solutions of the wave equation (FD, Pseudo-Spectral, Spectral-Element)
› The visco-elastic/acoustic wave equation
» Part III: Seismic Migration/Imaging
› Forward and Adjoint modeling using Green’s function (GRT)
Born and Kirchhoff approximations
› Ray-based Imaging (Kirchhoff, LAD)
› Reverse Time Migration (RTM)
› Resolution limits
Syllabus (Main Course) (cont.)
» Part IV: Least-Squares Migrations (LSM)
› Iterative LSM
Kirchhoff/LAD and RTM
› Visco-acoustic LSM
» Part V: Raypath Traveltime (Res. TT) Tomography
» Part VI: Waveform Inversion
› Acoustic waveform inversion and its numerical implementation
› Wave-equation inversion of skeletonized data
› Elastic and visco-elastic full-waveform inversion
Accounting for anisotropy
» Part VII: Image Domain Inversion
› Migration velocity analysis workflow
› Generalized differential semblance optimization
› Generalized image-domain inversion
Syllabus (Complementary Topics)
» Review of Linear Algebra
» Basic concepts of Probability Theory
› Conditional probability density functions – Bayes Theorem
» Generalized Inverse problems
› Data / Model / Covariance / Resolution matrices
› The Backus-Gilbert Generalized Inverse for the underdetermined problem
The trade-off between resolution and variance
» Maximum Likelihood methods
» Global optimizations
» Seismic Tomography in real production environment
» Amplitude Inversion – From seismic reflectivity to impedance to rock properties
» Time Reversal Mirroring (TRM)
» Machine/Deep Learning (ML/DL)
Introduction to Seismic Inversion
Seismic Inversion
» Seismic Inversion is the procedure for reconstructing earth
properties from seismic data.
» Applications of seismic inversions include subsurface
characterization for engineering geology, such as road, tunnel,
reservoir or building construction; oil and mineral exploration;
scientific characterization of volcanoes, tectonic plates.
» The inversion of sound-wave measurements also is used for
nondestructive testing of materials, military sonar ranging and
medical imaging.
Inverse Problem
» The discrete inverse problem for geophysics can be defined generally
as inverting for the model vector 𝐦 from the recorded data vector 𝐝,
where 𝐝 is related to 𝐦 by 𝐝 = 𝐋𝐦.
» 𝐋 represents a nonlinear forward-modeling operator that depends
implicitly on 𝐦 and predicts the data 𝐝 from the model 𝐦.
» If the original forward-modeling operator is nonlinear, then it is usually
linearized about some background model.
Notations:
Scalars are denoted by lowercase italic letters,
Vectors by lowercase bold,
Matrices by uppercase bold
Properties of Well-Posed Inverse Problems
» Well-posed mathematical models, such as the linearized version of
equation 𝐝 = 𝐋𝐦 for physical systems, should have the following
characteristics:
1. A solution exists. However, for many geophysical data sets, a
solution does not exist because the data are “noisy” and
“incomplete”, which leads to inconsistent and overdetermined
equations. For example, the equations 𝑥 = 3 and 𝑥 = 4 form an
overdetermined and inconsistent system.
› The remedy is, for example, to seek a solution that minimizes a
weighted sum of squared residuals plus an additive penalty
function.
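As a minimal sketch of this remedy (assuming NumPy; the penalty weight `eta2` is a hypothetical value chosen only for illustration), the inconsistent system 𝑥 = 3, 𝑥 = 4 can be solved in the least-squares sense, with and without a quadratic penalty on 𝑥:

```python
import numpy as np

# Two inconsistent equations for one unknown: x = 3 and x = 4.
A = np.array([[1.0], [1.0]])
b = np.array([3.0, 4.0])

# Least-squares estimate: minimizes (x - 3)^2 + (x - 4)^2.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

# Adding the penalty eta2 * x^2 changes the normal equation to
# (A^T A + eta2) x = A^T b, which pulls the estimate toward zero.
eta2 = 0.1  # hypothetical penalty weight
x_damped = float(A[:, 0] @ b) / (float(A[:, 0] @ A[:, 0]) + eta2)

print(x_ls[0], x_damped)
```

Without the penalty the estimate is the mean of the two right-hand sides; the penalty shifts it slightly toward zero.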
Properties of Well-Posed Inverse Problems (cont.)
» A data residual is the difference between a measured data value, e.g.,
a traveltime for the first arrival at a specific recording station, and the
corresponding predicted value. The predicted data may be generated, for
example, by ray tracing in the predicted velocity model.
» A penalty function is a function whose value becomes large when the
inverted model strays from some preferred characteristic of the
assumed model. For example, smooth velocity models with a small
spatial velocity gradient might be preferred, so the squared magnitude
of the velocity gradient is used as a penalty function.
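The gradient penalty described above can be sketched as follows. This is an illustrative 1D version only; the function name `gradient_penalty`, the cell size `dx` and the two velocity profiles are assumptions made for the example:

```python
import numpy as np

def gradient_penalty(velocity, dx):
    """Sum of squared finite-difference velocity gradients along a 1D profile."""
    grad = np.diff(velocity) / dx
    return float(np.sum(grad ** 2))

dx = 10.0  # hypothetical cell size (m)
smooth = np.linspace(2000.0, 2400.0, 5)                      # gentle linear trend (m/s)
rough = np.array([2000.0, 2400.0, 2000.0, 2400.0, 2000.0])   # oscillating profile (m/s)

penalty_smooth = gradient_penalty(smooth, dx)
penalty_rough = gradient_penalty(rough, dx)
print(penalty_smooth, penalty_rough)
```

The oscillating profile receives a much larger penalty, so an objective function that includes this term favors the smoother model when both fit the data equally well.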
Properties of Well-Posed Inverse Problems (cont.)
2. The solution is unique. However, in a typical geophysical experiment,
the sources and receivers are located only over a small portion of the
body of interest. This often leads to a nontrivial null space for 𝐋 and,
consequently, zero eigenvalues of 𝐋ᵀ𝐋. A common remedy is to use
some form of regularization.
3. The solution depends continuously on the data; otherwise the
solution is unstable. For example, an unstable solution is one in
which small changes in the noise level of the data lead to
discontinuous changes in the model. This is a symptom of an
ill-conditioned matrix 𝐋, for which many eigenvalues of 𝐋ᵀ𝐋 are nearly zero,
so that many different models nearly fit the same data. Again, a
regularization method is used to steer the solution to one with
preferred characteristics.
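A small numerical illustration of non-uniqueness, assuming NumPy and a hypothetical three-cell model in which neither of the two rays crosses the third cell. The matrix entries and the damping value `eta2` are made up for the example:

```python
import numpy as np

# Hypothetical Jacobian: 2 rays, 3 cells; no ray samples cell 3.
L = np.array([[1.0, 1.0, 0.0],
              [2.0, 0.0, 0.0]])

# Eigenvalues of L^T L in ascending order; the smallest one is zero.
eigvals = np.linalg.eigvalsh(L.T @ L)

# A perturbation confined to cell 3 lies in the null space: it does not change the data.
null_vec = np.array([0.0, 0.0, 1.0])
data_change = L @ null_vec

# Damping (one simple form of regularization) shifts every eigenvalue up by eta2.
eta2 = 1e-2
eigvals_reg = np.linalg.eigvalsh(L.T @ L + eta2 * np.eye(3))
print(eigvals, eigvals_reg)
```

After damping, the matrix is positive definite, so the regularized system has a unique solution even though the data alone cannot constrain cell 3.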
Summary of Properties for Well-Posed Inverse Problems
1. Existence of a solution
2. Uniqueness of the solution
3. Stability of the solution
Since, in general, none of these properties holds for real geophysical problems,
seismic inversion amounts to solving an ill-posed problem!
Typical Approach to Geophysical Inversion
» The typical approach to geophysical inversion is to recast the problem
such that we seek the optimal model which minimizes a weighted
combination of data misfit (residuals) and model penalty functions
(constraints), also known as an objective function.
» Typically, the objective function is a nonlinear function of the model
parameters. The solution procedure falls under the class of nonlinear
optimization methods.
» An example of such an approach is traveltime tomography.
» The goal of traveltime tomography is to estimate the subsurface velocity
distribution by inverting the traveltime errors along traced rays.
Example: Traveltime Tomography
Assuming that a background velocity model exists, the following
representative steps are used to implement traveltime tomography
(they are valid for most geophysical inversion algorithms):
1. Discretization of the model 𝐦. The subsurface velocity (or slowness)
model is discretized into a (coarse) grid of unknown velocity (or slowness)
perturbations. For simplicity, we assume the slowness perturbation 𝛿𝑠𝑗
to be constant within the 𝑗-th cell, which gives a model vector of 𝑁
unknown slowness perturbation values (𝑁 = 𝑁𝑥𝑁𝑦𝑁𝑧).
2. Discretization of the data 𝐝. For traveltime reflection tomography,
traveltimes of the reflected events are picked. For 𝑀 rays, the traveltime
measurements (picks) form a data vector of length 𝑀.
Steps of Traveltime Tomography (cont.)
3. Discrete modeling operator 𝐋. The inverse problem is solved by
assuming a starting (background) model and a modeling operator 𝐋
to generate the predicted data. For traveltime tomography, the total
traveltime along the 𝑖-th ray is the sum of the individual traveltimes
𝑙𝑖𝑗 𝑚𝑗 in each cell,
$$ \sum_{j=1}^{N} l_{ij}\, m_j = d_i, \qquad i = 1, 2, \ldots, M $$
where 𝑙𝑖𝑗 is the ray path segment length of the 𝑖-th ray in the 𝑗-th cell, and 𝑚𝑗
is the constant slowness 𝑠𝑗 in the 𝑗-th cell. Here, 𝐋 is an 𝑀 × 𝑁 matrix
(𝑀 rays, 𝑁 cells).
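To make the structure of 𝐋 concrete, here is a minimal sketch for a hypothetical 2 × 2 grid of square cells crossed by four straight rays (two horizontal, two vertical). The cell size `h` and the slowness values are assumptions chosen for illustration; real ray paths would come from ray tracing:

```python
import numpy as np

# Cells are numbered row by row:
#   [0 1]
#   [2 3]
h = 100.0  # hypothetical cell side length (m)
slowness = np.array([0.50, 0.40, 0.45, 0.35]) * 1e-3  # hypothetical values (s/m)

# Row i holds the segment lengths l_ij of ray i in each cell j.
L = h * np.array([[1, 1, 0, 0],    # horizontal ray through the top row
                  [0, 0, 1, 1],    # horizontal ray through the bottom row
                  [1, 0, 1, 0],    # vertical ray through the left column
                  [0, 1, 0, 1]],   # vertical ray through the right column
                 dtype=float)

# Predicted traveltimes: d_i = sum_j l_ij * m_j
d = L @ slowness
print(d)
```

Each ray touches only the cells it crosses, so for realistic grids 𝐋 is very sparse and is normally stored in a sparse format.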
High Frequency Approximation
» A high-frequency approximation (or "high energy approximation") for
scattering or other wave propagation problems, in physics or
engineering, is an approximation whose accuracy increases with the
size of features on the scatterer or medium relative to the wavelength
of the scattered particles.
» Classical mechanics and geometric optics are the most common and
extreme high frequency approximations, where the wave or field
properties of, respectively, quantum mechanics and electromagnetism
are neglected entirely.
High Frequency Approximation (cont.)
» In our case, seismic waves are treated as rays, applying the laws of
geometric optics. As a rule of thumb, the high-frequency
approximation is valid if the maximum wavelength λmax of the wave
field is less than 1/3 of the minimum wavelength of the velocity
variations.
» The value λmax can be evaluated by dividing the maximum model
velocity by the minimum frequency of the source wavelet.
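The rule of thumb can be written as a short check. The function names and the numerical values below are hypothetical and serve only to illustrate the arithmetic:

```python
def max_wavelength(v_max, f_min):
    """Longest wavelength in the wave field: maximum velocity / minimum frequency."""
    return v_max / f_min

def ray_theory_plausible(v_max, f_min, min_feature_wavelength):
    """Rule of thumb: lambda_max must be below 1/3 of the shortest
    wavelength of the velocity variations."""
    return max_wavelength(v_max, f_min) < min_feature_wavelength / 3.0

# Hypothetical numbers: 3000 m/s maximum velocity, 10 Hz minimum frequency.
lam_max = max_wavelength(3000.0, 10.0)                       # 300 m
ok_smooth = ray_theory_plausible(3000.0, 10.0, 1500.0)       # 300 m < 500 m
ok_detailed = ray_theory_plausible(3000.0, 10.0, 600.0)      # 300 m is not < 200 m
print(lam_max, ok_smooth, ok_detailed)
```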
Steps of Traveltime Tomography (cont.)
4. Linearization. The traveltime equation is nonlinear, because the ray
path segment length 𝑙𝑖𝑗 depends on the velocity model.
According to Snell's law, large gradients in the velocity distributions
will lead to large changes in the ray paths (velocity gradient is related
to the curvature of the path). Therefore, 𝐋 implicitly depends on 𝐦,
and so inverting the equation set will not yield an acceptable answer
unless the starting velocity model is close to the true one.
The remedy is to start with a background model 𝐦⁽⁰⁾ = 𝐦o and
linearize the relationship between the data and the model.
Then the data can be inverted for a more accurate model 𝐦⁽¹⁾.
Using 𝐦⁽¹⁾ as a new starting model, this procedure can be repeated
until convergence.
Steps of Traveltime Tomography (cont.)
» Defining the 𝑗-th model parameter as 𝑚𝑗 , the linearization step starts
by expanding the 𝑖-th data measurement 𝑑𝑖 to first order in a Taylor
series about the first-guess model 𝐦o , assumed close to the true model,
$$ d_i(\mathbf{m}) = d_i(\mathbf{m}_o) + \sum_{j=1}^{N} \left. \frac{\partial d_i(\mathbf{m})}{\partial m_j} \right|_{\mathbf{m}=\mathbf{m}_o} \delta m_j + O\!\left( \|\delta\mathbf{m}\|^2 \right) $$
where 𝐦 = 𝐦o + δ𝐦, and the model perturbation δ𝐦 is the
difference between the actual model vector 𝐦 and the initial model
vector 𝐦o.
Steps of Traveltime Tomography (cont.)
» Ignoring the higher-order terms and rearranging the linear terms yields
the linearized equation
$$ \delta d_i(\mathbf{m}) \approx \sum_{j=1}^{N} \left. \frac{\partial d_i(\mathbf{m})}{\partial m_j} \right|_{\mathbf{m}=\mathbf{m}_o} \delta m_j $$
where δ𝐝(𝐦) = 𝐝(𝐦) − 𝐝(𝐦o) is the data residual, i.e., the difference
between the observed data vector 𝐝(𝐦) and the predicted data vector 𝐝(𝐦o).
» In matrix-vector notation, the linearized set reads
δ𝐝(𝐦) = 𝐋(𝐦o) δ𝐦
Steps of Traveltime Tomography (cont.)
» Matrix 𝐋 is the Jacobian matrix, and its elements,
𝐿𝑖𝑗 = 𝜕𝑑𝑖(𝐦o)/𝜕𝑚𝑗 , are also known as Fréchet derivatives.
» These derivatives determine the sensitivity of the 𝑖-th data value to
the model perturbation in the 𝑗-th cell.
» For traveltime tomography and a relatively small cell size, the Fréchet
derivative is the ray path segment length 𝑙𝑖𝑗 of the 𝑖-th ray in the
𝑗-th cell. The tomography equation becomes
$$ \delta t_i = \sum_{j=1}^{N} l_{ij}\, \delta s_j $$
where δ𝑠𝑗 is the perturbation of slowness in the 𝑗-th cell
and δ𝑡𝑖 is the total traveltime change (error) along the 𝑖-th ray.
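The first-order expansion can be checked numerically. The sketch below is an illustration under stated assumptions, not the course's implementation: the rays are kept fixed, and the model is parameterized by velocity rather than slowness so that the forward problem 𝑡𝑖 = Σ 𝑙𝑖𝑗 / 𝑣𝑗 is genuinely nonlinear (in slowness, with fixed rays, the Fréchet derivative is simply 𝑙𝑖𝑗). All names and numbers are hypothetical:

```python
import numpy as np

def forward(v, seg_len):
    """Traveltimes for fixed ray segments: t_i = sum_j l_ij / v_j."""
    return seg_len @ (1.0 / v)

def frechet_fd(v0, seg_len, rel_step=1e-6):
    """Finite-difference Jacobian L_ij = d t_i / d v_j evaluated at v0."""
    d0 = forward(v0, seg_len)
    J = np.empty((d0.size, v0.size))
    for j in range(v0.size):
        dv = np.zeros_like(v0)
        dv[j] = rel_step * v0[j]
        J[:, j] = (forward(v0 + dv, seg_len) - d0) / dv[j]
    return J

seg_len = np.array([[100.0, 50.0],
                    [0.0, 120.0]])       # hypothetical l_ij (m): 2 rays, 2 cells
v0 = np.array([2000.0, 2500.0])          # background model (m/s)

J_fd = frechet_fd(v0, seg_len)
J_exact = -seg_len / v0 ** 2             # analytic derivative of l_ij / v_j

# Compare the exact data change with the linearized prediction L(m_o) dm.
dv = np.array([20.0, -25.0])             # 1 % perturbation
exact_change = forward(v0 + dv, seg_len) - forward(v0, seg_len)
linear_change = J_exact @ dv
print(exact_change, linear_change)
```

For a perturbation of about 1 % the linearized prediction agrees with the exact traveltime change to within a few percent; the mismatch is the neglected second-order term.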
Steps of Traveltime Tomography (cont.)
5. Regularization. The recorded data contain “noise”, and this leads to
an inconsistent set of overdetermined equations: 𝑀 equations with 𝑁
unknown parameters, 𝑀 > 𝑁. The solution can be unstable, and
many models might nearly satisfy the same data.
› To partially remedy these problems, we seek a model that
minimizes the objective function ε, which is the sum of the 𝑝-norm
of the data residual raised to the 𝑝-th power and a penalty term,
$$ \varepsilon = \frac{1}{p} \left\| \mathbf{L}\,\delta\mathbf{m} - \delta\mathbf{d} \right\|_p^p + \eta^2 g(\mathbf{m}) $$
› The subscript 𝑝 denotes the 𝑝-norm; the superscript 𝑝 denotes raising to the power 𝑝.
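A direct transcription of this objective function into code might look as follows. The function names and the example numbers are assumptions; the quadratic penalty passed in is just one possible choice of 𝑔:

```python
import numpy as np

def objective(L, dm, dd, eta2, penalty, p=2):
    """epsilon = (1/p) * ||L dm - dd||_p^p + eta2 * g(dm)."""
    residual = L @ dm - dd
    misfit = np.sum(np.abs(residual) ** p) / p
    return misfit + eta2 * penalty(dm)

def quadratic_penalty(dm):
    """One possible penalty: g = 0.5 * ||dm||_2^2."""
    return 0.5 * np.sum(dm ** 2)

L = np.array([[1.0, 0.0],
              [0.0, 2.0]])      # hypothetical Jacobian
dm = np.array([1.0, 1.0])       # trial model update
dd = np.array([0.0, 0.0])       # data residual to be explained
eta2 = 0.1                      # hypothetical damping weight

eps_l2 = objective(L, dm, dd, eta2, quadratic_penalty, p=2)
eps_l1 = objective(L, dm, dd, eta2, quadratic_penalty, p=1)
print(eps_l2, eps_l1)
```

With 𝑝 = 1 large residuals are weighted less strongly than with 𝑝 = 2, which is why the 1-norm is often considered more tolerant of outliers.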
Steps of Traveltime Tomography (cont.)
» Parameter η² is a small positive scalar, and 𝑔(𝐦) is a penalty function
that becomes smaller as the estimated model approaches an a priori
estimate of the actual model.
» Parameter 𝑝 is a positive integer, and the residual vector 𝐋δ𝐦 − δ𝐝 is
the difference between the predicted and the observed data.
» In most cases 𝑝 = 2, and the squared length of the residual vector is
the misfit function.
» The penalty term is sometimes expressed as
$$ g(\mathbf{m}) = \frac{1}{p} \left\| \mathbf{m} - \mathbf{m}' \right\|_p^p $$
for some a priori model 𝐦′. For 𝑝 = 2 and 𝐦′ = 𝐦o , the objective
function reads
$$ \varepsilon = \frac{1}{2} \left\| \mathbf{L}\,\delta\mathbf{m} - \delta\mathbf{d} \right\|_2^2 + \frac{\eta^2}{2} \left\| \delta\mathbf{m} \right\|_2^2 $$
Steps of Traveltime Tomography (cont.)
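» The objective function is minimized when its gradient vanishes (and the
Hessian matrix, i.e., the matrix of second derivatives, is positive definite). The
vanishing gradient leads to
$$ \left( \mathbf{L}^T \mathbf{L} + \eta^2 \mathbf{I} \right) \delta\mathbf{m} = \mathbf{L}^T \delta\mathbf{d} $$
where 𝐈 is the identity matrix of dimension 𝑁 × 𝑁.
» The solution can be symbolically written as
$$ \delta\mathbf{m} = \left( \mathbf{L}^T \mathbf{L} + \eta^2 \mathbf{I} \right)^{-1} \mathbf{L}^T \delta\mathbf{d} $$
» Note that the transpose operation also implies complex conjugation if the
matrix elements are complex numbers.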
» Derivative penalty terms may be used that reward smoother solutions
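The damped normal equations can be solved and checked in a few lines. This is a sketch with a made-up 3 × 2 system; `damped_least_squares` is a hypothetical helper name, and real-valued data are assumed:

```python
import numpy as np

def damped_least_squares(L, dd, eta2):
    """Solve (L^T L + eta2 I) dm = L^T dd for the model update dm."""
    n_model = L.shape[1]
    A = L.T @ L + eta2 * np.eye(n_model)
    return np.linalg.solve(A, L.T @ dd)

L = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])           # hypothetical Jacobian: 3 rays, 2 cells
dd = np.array([0.1, 0.3, 0.5])       # hypothetical data residual
eta2 = 0.01

dm = damped_least_squares(L, dd, eta2)

# Gradient of 0.5*||L dm - dd||^2 + 0.5*eta2*||dm||^2; it should vanish at the solution.
gradient = L.T @ (L @ dm - dd) + eta2 * dm
print(dm, gradient)
```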
Do not (directly) Invert the Matrix
» We emphasize that the notation with the inverse matrix is symbolic.
» There is no need to invert the matrix to solve a linear equation set.
» Matrix inversion is computationally three times more expensive
than solving a linear set:
› (2/3)𝑁³ floating point operations are needed to solve the linear set
› 2𝑁³ operations are needed for the matrix inversion
» For a symmetric matrix, if algorithms for symmetric matrices
are applied, both numbers are reduced by a factor of 2, i.e.,
› (1/3)𝑁³ for solving the linear set and 𝑁³ for the matrix inversion
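In NumPy terms, the advice amounts to calling `np.linalg.solve` rather than `np.linalg.inv`. The sketch below uses random, purely synthetic matrices (sizes, seed and `eta2` are arbitrary assumptions) to show that both routes give the same update:

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.normal(size=(50, 10))      # synthetic Jacobian: 50 rays, 10 cells
dd = rng.normal(size=50)           # synthetic data residual
eta2 = 0.1

A = L.T @ L + eta2 * np.eye(10)    # symmetric positive definite system matrix
rhs = L.T @ dd

dm_solve = np.linalg.solve(A, rhs)     # preferred: factorize and back-substitute
dm_inv = np.linalg.inv(A) @ rhs        # forms the full inverse first (more work)

print(np.max(np.abs(dm_solve - dm_inv)))
```

Both results agree to rounding error; the difference is in cost, which matters once 𝑁 reaches the sizes typical of tomography grids.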
Steps of Traveltime Tomography (cont.)
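6. Iterative regularized solution. The data are related nonlinearly to the
model, so the solution of the tomography equation can be found by
an iterative updating scheme,
$$ \mathbf{m}^{(k+1)} = \mathbf{m}^{(k)} + \alpha \left( \mathbf{L}^T \mathbf{L} + \eta^2 \mathbf{I} \right)^{-1} \mathbf{L}^T \delta\mathbf{d}^{(k)} $$
» Here 𝑘 and 𝑘 + 1 are iteration numbers, α is the step length, 𝐦⁽ᵏ⁾ is the
model at the 𝑘-th iteration, and δ𝐝⁽ᵏ⁾ is the corresponding data residual
(observed minus predicted data, as defined for the linearized equation).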
» The rays associated with matrix 𝐋 are computed in the 𝑘-th velocity
model, and a sequence of models is generated until the data residual
falls below some acceptable level.
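The updating scheme can be sketched on a toy problem. To keep the example short, the rays are held fixed and the nonlinearity comes from parameterizing the model by velocity (𝑡𝑖 = Σ 𝑙𝑖𝑗 / 𝑣𝑗); in real tomography the rays would be retraced in each new model. The geometry, the starting model and the parameter values are all assumptions:

```python
import numpy as np

def forward(v, seg_len):
    """Predicted traveltimes t_i = sum_j l_ij / v_j (fixed ray geometry)."""
    return seg_len @ (1.0 / v)

def jacobian(v, seg_len):
    """Frechet derivatives d t_i / d v_j = -l_ij / v_j^2."""
    return -seg_len / v ** 2

def iterate(d_obs, v0, seg_len, eta2=1e-6, alpha=0.7, n_iter=50, tol=1e-12):
    """m(k+1) = m(k) + alpha * (L^T L + eta2 I)^-1 L^T dd(k)."""
    v = v0.copy()
    for _ in range(n_iter):
        dd = d_obs - forward(v, seg_len)       # observed minus predicted
        if np.linalg.norm(dd) < tol:
            break
        L = jacobian(v, seg_len)               # re-linearize about the current model
        A = L.T @ L + eta2 * np.eye(v.size)
        v = v + alpha * np.linalg.solve(A, L.T @ dd)
    return v

seg_len = np.array([[1.0, 0.5],
                    [0.0, 1.2],
                    [0.8, 0.8]])               # hypothetical l_ij (km): 3 rays, 2 cells
v_true = np.array([2.0, 3.0])                  # km/s
d_obs = forward(v_true, seg_len)               # noise-free synthetic data
v_est = iterate(d_obs, v0=np.array([2.5, 2.5]), seg_len=seg_len)
print(v_est)
```

Note that the synthetic data here are generated with the same forward operator that is used in the inversion, which is acceptable for a mechanics demonstration but not for validating an inversion method.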
Steps of Traveltime Tomography (cont.)
» The step length α in the iterative solution is theoretically 1, but a positive number
smaller than 1 (say, 0.7) can be used to decrease the difference
between two successive models, in favor of stability of the procedure.
» When gradient methods are used, the step length may be computed rather than prescribed.
» Matrix 𝐋ᵀ𝐋 + η²𝐈 is sometimes too expensive to compute, store and
invert. Even if a linear set is solved instead of inverting the matrix,
the direct approach is still expensive.
» Therefore the matrix is often approximated by its diagonal components,
$$ \left[ \mathbf{L}^T \mathbf{L} + \eta^2 \mathbf{I} \right]_{ij} \approx \left[ \mathbf{L}^T \mathbf{L} + \eta^2 \mathbf{I} \right]_{ii} \delta_{ij} $$
(no summation over repeated indices)
Steps of Traveltime Tomography (cont.)
» The result is the preconditioned steepest descent solution, with
regularization and the normalized (unitless) step-length parameter α,
$$ m_i^{(k+1)} = m_i^{(k)} + \alpha \frac{\left[ \mathbf{L}^T \delta\mathbf{d}^{(k)} \right]_i}{\left[ \mathbf{L}^T \mathbf{L} \right]_{ii} + \eta^2} $$
» Note that 𝐋ᵀδ𝐝⁽ᵏ⁾ in the numerator is a vector of length 𝑁 (an 𝑁 × 𝑀 matrix
multiplied by a vector of length 𝑀), while the denominator is a scalar value
for each component 𝑖.
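A minimal sketch of the preconditioned steepest descent update, assuming a linear straight-ray problem on a hypothetical 2 × 2 grid (four rays, four cells, unit cell size). The function name, the starting model and the values of `eta2` and `alpha` are illustrative choices:

```python
import numpy as np

def preconditioned_sd(L, d_obs, m0, eta2=1e-3, alpha=0.7, n_iter=100):
    """m_i <- m_i + alpha * [L^T dd]_i / ([L^T L]_ii + eta2)."""
    m = m0.copy()
    diag = np.sum(L * L, axis=0)          # [L^T L]_ii without forming L^T L
    for _ in range(n_iter):
        dd = d_obs - L @ m                # observed minus predicted
        m = m + alpha * (L.T @ dd) / (diag + eta2)
    return m

# Two horizontal and two vertical straight rays through a 2 x 2 grid.
L = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
s_true = np.array([0.50, 0.40, 0.45, 0.35])   # hypothetical slownesses (s/km)
d_obs = L @ s_true

m0 = np.full(4, 0.40)
m_est = preconditioned_sd(L, d_obs, m0)

res_start = np.linalg.norm(d_obs - L @ m0)
res_end = np.linalg.norm(d_obs - L @ m_est)
print(res_start, res_end)
```

Only matrix-vector products with 𝐋 and 𝐋ᵀ are needed, so 𝐋ᵀ𝐋 is never formed or stored. In this particular geometry 𝐋 has a null space (a checkerboard pattern of slowness changes leaves all four traveltimes unchanged), so the data residual goes to zero while that component of the model stays at its starting value.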
Multiple Local Minima and Skeletonization
» The main problem with nonlinear seismic inversion is that the
objective function is often plagued by many local minima.
» Thus, the iterative solution can get stuck in a local minimum and never
reach the global minimum, i.e., the actual model.
» To mitigate this problem, the data can be simplified by skeletonizing them
and inverting only the essential features.
» This means solving for longer-wavelength features at the early
iterations. At the later iterations, higher-order details are admitted
into the data and model.
Skeletonization (cont.)
» Some skeletonization methods include multiscale inversion, in which
the traces are low-pass filtered to get a somewhat linearized
objective function
» Higher frequencies are admitted gradually with increasing iteration
number
» Another data reduction method is to invert initially the early arrivals,
near-offset traces and/or phases of selected arrivals
» At later iterations, wider-offset traces, longer listening times, and
more complex physics are admitted into the inversion
» Avoiding local minima is an active area of research
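The data side of a multiscale strategy can be sketched with a simple low-pass filter applied at increasing cutoff frequencies. The brick-wall FFT filter, the synthetic two-frequency trace and the cutoff schedule below are assumptions made for illustration; a production workflow would use a tapered filter:

```python
import numpy as np

def lowpass(trace, dt, f_cut):
    """Zero all Fourier components above f_cut (Hz): a crude brick-wall filter."""
    spectrum = np.fft.rfft(trace)
    freqs = np.fft.rfftfreq(trace.size, d=dt)
    spectrum[freqs > f_cut] = 0.0
    return np.fft.irfft(spectrum, n=trace.size)

dt = 0.004                                   # sample interval (s)
t = np.arange(250) * dt                      # 1 s of data
low = np.sin(2 * np.pi * 5.0 * t)            # 5 Hz component
high = 0.5 * np.sin(2 * np.pi * 40.0 * t)    # 40 Hz component
trace = low + high

cutoffs = [10.0, 20.0, 60.0]                 # hypothetical frequency schedule (Hz)
stages = [lowpass(trace, dt, fc) for fc in cutoffs]

# In a multiscale inversion, each stage would be inverted in turn,
# starting from the model obtained at the previous (lower-frequency) stage.
print([float(np.max(np.abs(s))) for s in stages])
```

The first stages contain only the 5 Hz component; the last stage admits the 40 Hz component and reproduces the full trace.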
Types of Seismic Inversion
» There are several types of seismic data that are inverted:
› Traveltimes
› Phase information
› Waveform information
› Migration images
Phase- and Waveform-Inversion Methods
» Under the high frequency approximation, the phase-inversion
method of traveltime tomography inverts picked traveltimes for the
smoothly varying component of the subsurface velocity distribution.
» The method is computationally efficient, and convergence is robust.
» However, since the method assumes the high-frequency approximation, it
is able to reconstruct only the low to intermediate wavenumbers of
the model.
» As a partial remedy, wave-equation phase and waveform inversion
methods can sometimes achieve higher resolution, but at the expense of
much larger computational cost and reduced robustness.
Convergence Issues
» Convergence to the correct solution often is ruined if the modeling
operator 𝐋 does not take into account significant physics of waveform
propagation.
» To mitigate this problem, the data set can be skeletonized so that only
the essential and accurately modeled parts are inverted initially.
» Sometimes the data are transformed to a domain where the events
are more focused and do not overlap one another excessively.
» One such method is migration velocity analysis, which inverts the
migration image for the velocity distribution.
Incomplete Physics
» In the not-too-distant future, computers will be powerful enough to
accommodate full 3D anelastic and anisotropic effects in wave
propagation, and the incomplete physics problem will be mitigated
significantly.
» However, this will further intensify the non-uniqueness issues because
of the many unknowns that need to be determined from the limited
coverage of data.
Inverse Crimes
» An “inverse crime” is the use of the same modeling operator (e.g., finite
difference) to generate, as well as to invert, synthetic data.
» To avoid the inverse crime of trivial inversion, the synthetic data must be
computed by a forward solver that has no connection to the inverse solver.
» One such inverse felony is declaring an inverse problem successfully
solved without rigorous tests on noisy band-limited
synthetics, realistic models and representative field data.
» Another example is overly fine gridding, so that the model vector is longer
than the data vector (more unknowns than independent equations). In this
case, many models fit the same data.
» “Too close to be true”: the starting model is a smoothed version of
the true model, so that the problem of multiple local minima is bypassed
unrealistically.
Thanks