


TU Delft
Faculty of Electrical Engineering, Mathematics, and Computer Science
Section Circuits and Systems

ARRAY SIGNAL PROCESSING

An algebraic approach

EE 4715
Spring 2022

Alle-Jan van der Veen




Contents

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix

1 Introduction 1
1.1 Applications of array processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

I DATA MODELS 7

2 Wave propagation 9
2.1 The wave equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Spatial Fourier transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Spatial sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4 Correlation processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.5 Application: radio astronomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.6 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3 Narrowband data models 39


3.1 Antenna array receiver model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2 Narrowband correlation models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.3 Application: radio astronomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.4 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62


4 Wideband data models 63


4.1 Physical channel properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.2 Signal modulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.3 Deterministic data models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.4 Frequency-domain data models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.5 Application: radio astronomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.6 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

5 Linear algebra background 91


5.1 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.2 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.3 The QR factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.4 The singular value decomposition (SVD) . . . . . . . . . . . . . . . . . . . . . . . 99
5.5 Pseudo-inverse and the Least Squares problem . . . . . . . . . . . . . . . . . . . 105
5.6 The eigenvalue problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.7 The generalized eigenvalue decomposition . . . . . . . . . . . . . . . . . . . . . . 109
5.8 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

II METHODS AND ALGORITHMS 111

6 Spatial processing techniques 113


6.1 Deterministic approach to Matched and Wiener filters . . . . . . . . . . . . . . . 114
6.2 Stochastic approach to Matched and Wiener filters . . . . . . . . . . . . . . . . . 118
6.3 Other interpretations of Matched Filtering . . . . . . . . . . . . . . . . . . . . . . 122
6.4 Prewhitening filter structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
6.5 Eigenvalue analysis of Rx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
6.6 Beamforming and direction estimation . . . . . . . . . . . . . . . . . . . . . . . . 134
6.7 Applications to temporal matched filtering . . . . . . . . . . . . . . . . . . . . . . 138

7 Weighted Least Squares Beamforming 145


7.1 Maximum Likelihood formulation to direction finding . . . . . . . . . . . . . . . 145
7.2 Covariance Matching; Weighted Subspace Fitting . . . . . . . . . . . . . . . . . . 145


7.3 Gauss-Newton Solver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145


7.4 Application to Radio Astronomy imaging . . . . . . . . . . . . . . . . . . . . . . 145

8 Direction finding: the ESPRIT algorithm 147


8.1 Prelude: Shift-invariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
8.2 Direction estimation using the ESPRIT algorithm . . . . . . . . . . . . . . . . . 148
8.3 Delay estimation using ESPRIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.4 Frequency estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
8.5 System identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
8.6 Real processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
8.7 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

9 Joint diagonalization and Kronecker product structures 169


9.1 Joint azimuth and elevation estimation . . . . . . . . . . . . . . . . . . . . . . . . 169
9.2 Connection to the Khatri-Rao product structure . . . . . . . . . . . . . . . . . . 173
9.3 Joint angle and delay estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
9.4 Joint angle and frequency estimation . . . . . . . . . . . . . . . . . . . . . . . . . 180
9.5 Multiple invariances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
9.6 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

10 Factor Analysis 185


10.1 The Factor Analysis problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
10.2 Computing the Factor Analysis decomposition . . . . . . . . . . . . . . . . . . . 189
10.3 Rank detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
10.4 Extensions of the Classical Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
10.5 Application to interference cancellation . . . . . . . . . . . . . . . . . . . . . . . 201
10.6 Application to array calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
10.7 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

11 Independent Component Analysis 219


11.1 Fourth-order Cumulants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
11.2 Data model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
11.3 JADE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219


11.4 Application: ACMA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219



PREFACE

This reader contains the course material for the MSc-level course on array signal processing at TU Delft, ET4147. Over the past 20 years, this course was presented as “signal processing for
communications”; in 2022 it was combined with another course on speech and audio processing
into one that is more general and focuses on multisensor array processing.
Sensor arrays are present in many applications:

– In wireless communications, multiple antennas at the transmitter and/or the receiver make it possible to increase data rates and to suppress unwanted interference.
– In radio astronomy, collections of the hallmark telescope dishes have been the workhorse for many years. Such an array is called an interferometer. Over the past decade, the dishes have been upgraded with antenna arrays in the focal plane, or have been replaced with massive arrays of “simple” (non-steerable) antennas, typically arranged in some hierarchy. Using the observed data of one night (or even many nights), the aim is to create images of the sky, as a function of frequency.
– In a medical setting, ultrasound transducers are used to create images of organs in the human
body. Such a transducer can consist of a line array of piezo-electric elements, or of a 2D array.
– Still in a medical setting, electrode arrays are used to capture electrical signals from the scalp, i.e., electro-encephalogram (EEG) signals. These are then processed to obtain a crude 3D-localized image of functional regions in the brain.
– Microphone arrays are being used to filter out unwanted interference in noise-cancelling
headphones. In hearing aids, they are used to focus on an intended speaker while suppressing
background noise.
– Other applications are phased array radar, sonar, and seismic exploration.

While these applications are very diverse, the underlying signal processing data models and
mathematical techniques are in fact very similar. The course will focus on these data models,
introduce the appropriate mathematical technique, then derive generic signal processing algo-

EE 4715 (2022): Array Signal Processing


x

rithms, and relate to one of the applications as an example. While you probably have seen
already many models and mathematical techniques in other courses, such as Detection and Es-
timation, or Machine Learning, or Convex Optimization, the present course will angle towards
matrix models and advanced techniques in linear algebra, such as the singular value decompo-
sition, factor analysis, generalized eigenvalues, and some tensor techniques. This is in line with
the origins of array processing.
However, we will hardly discuss adaptive techniques related to array signal processing, such as
LMS or CMA. An in-depth discussion would need an entire course of its own. Thus, the course
is mostly focused on algebraic techniques for array processing.
It is assumed that the participants already have a fair background in linear algebra, although one lecture is spent refreshing this knowledge.

Acknowledgements
The course material is derived in part from previously published papers, stemming from joint
work with many colleagues and (former) PhD students over the course of 30 years. In particular
I would like to acknowledge the collaboration with Amir Leshem, Stefan Wijnholds, Millad
Sardarabadi: text from (overview) papers we wrote together has been used, but edited to fit the
course.

Alle-Jan van der Veen


Spring 2022



Chapter 1

INTRODUCTION

Contents
1.1 Applications of array processing . . . . . . . . . . . . . . . . . . . . . 2
1.2 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

Signal processing is the theory and engineering art of converting acquired sensor measurements
into “information” (or “useful data”). It starts by deriving data models, or concise abstractions
of the physics behind the observations. Next, methods are developed and algorithms are pro-
posed to extract the “information”. Strongly depending on the application, this could consist
of signal parameters, propagation parameters, reconstructed time domain signals, images, etc.
Finally, part of signal processing is concerned with efficient implementations on computational
platforms.
Array signal processing is the branch of signal processing that considers multiple sensors, or an
array of sensors. This could occur in many applications, e.g., an array of antennas in wireless
communication, or a microphone array inside hearing aids or teleconferencing equipment.
The received data from the multiple sensors are stacked into vectors, and simple data models
express each sensor signal as a linear combination of a stack of transmitted signals s(t) to which
noise n(t) is added, i.e.,
x(t) = As(t) + n(t) . (1.1)
The tools relevant to analyze and process this data are then found in linear algebra: in this
course we will be looking at matrix multiplications and inversion, subspace estimation, eigenvalue
decompositions, and more. Since noise is added and needs to be taken into account, we will also
need tools from statistics, as seen e.g., in a course on Estimation and Detection. These tools are
then generalized to the matrix-vector case.
Thirty years ago, an overview article was published with the title “Two decades of array signal
processing research” [1]. We can thus consider that the area of array signal processing is about 50
years old, although its origins are of course much older (going back, e.g., to optics interferometry).


1.1 APPLICATIONS OF ARRAY PROCESSING

Sensor arrays can be used for many things. This section lists some basic applications.

1.1.1 Diversity

A straightforward application of having multiple sensors is SNR improvement. Suppose we have M sensors, each receiving a copy of the desired signal, but with independent additive noise contributions:

x_m(t) = s(t) + n_m(t) ,     m = 1, · · · , M .
If the desired signal has power σ_s² and the noise has power σ_n², then each sensor has SNR

SNR_m = σ_s² / σ_n² .

If we average the M received signals,

x̄(t) = (1/M) Σ_{m=1}^{M} x_m(t) = s(t) + (1/M) Σ_{m=1}^{M} n_m(t) ,

then s(t) is unaffected, but the noise is averaged out: the power of the noise present in x̄(t) is σ_n²/M, and the SNR after averaging is

SNR_out = M σ_s² / σ_n² .

We say that we have an array gain of M. This is easily generalized to the model (1.1), which for a single signal in noise is

x(t) = a s(t) + n(t)

where in the previous example we had a unit-weight vector a = 1, with 1 = [1, 1, · · · , 1]^T. The signal s(t) is recovered by computing the weighted average¹

ŝ(t) = w^H x(t) .

We will see later that the optimal weights are w = a/‖a‖². The vector w is known as a beamformer.
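As a quick numerical illustration (our own sketch, not part of the original text), the following Python fragment simulates M sensors observing a common signal in independent noise and estimates the SNR before and after averaging; the ratio should approach the array gain M.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 8, 100_000                    # sensors, time samples
sigma_s, sigma_n = 1.0, 2.0

s = sigma_s * rng.standard_normal(N)             # desired signal s(t)
n = sigma_n * rng.standard_normal((M, N))        # independent noise per sensor
x = s + n                                        # x_m(t) = s(t) + n_m(t)

snr_in = np.var(s) / np.var(x[0] - s)            # SNR at a single sensor
x_bar = x.mean(axis=0)                           # average over the M sensors
snr_out = np.var(s) / np.var(x_bar - s)          # SNR after averaging

print(f"SNR per sensor: {snr_in:.2f}, after averaging: {snr_out:.2f}")
print(f"estimated array gain: {snr_out / snr_in:.1f} (expected ~{M})")
```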
This is applied in wireless communication, where multiple antennas are used to provide diversity.
In the presence of multipath reflections, it may happen that a reflection cancels the desired signal
at the location of one antenna. If we have a second antenna at a slightly different location that
therefore captures a different linear combination of these signals, we can still receive the signal.
¹ Superscript T denotes a transpose, and superscript H a complex conjugate transpose.



Figure 1.1. (a) Example of a radio telescope: The Very Large Array, New Mexico; (b) a single
element ultrasound transducer next to a 3D ultrasound array; (c) MiG-35 phased
array radar.


1.1.2 Wavefield sampling

An array of sensors is used to sample signals in space. This is useful if the signals have spatial
properties: we consider wavefields, where signals propagate in space. Much of the early research
(1950–1990) is concerned with modeling and estimating the propagation conditions, e.g., direc-
tions of arrival, propagation delays, propagation velocities. If we represent directions of arrival in
two dimensions, then we obtain images, and direction finding is called image formation. Prime
application areas are radar, radio astronomy, ultrasound imaging, underwater acoustics, and
seismic exploration. Fig. 1.1 shows examples of sensor arrays in these applications. In relation
to (1.1), we would say that these applications are interested in estimating parameters of A: a
model for the propagation.

1.1.3 MIMO communication

In other applications, we are interested in the transmitted signals s(t). E.g., if the matrix A
in (1.1) is invertible, we can compute the estimate ŝ(t) = A^{−1} x(t). In this case, the multiple antennas are combined by A^{−1} such that interfering signals are cancelled and the desired signal
is found. A common application is MIMO wireless communication (“multiple input multiple
output”, i.e., multiple antennas at the transmitter and at the receiver), where we increase the
total capacity of the system by spatially separating overlapping signals. Using M antennas, we
can expect to separate M overlapping signals and thus to increase our capacity by a factor of M .
Fig. 1.2 shows an example of a MIMO antenna array that is used for this. In Massive MIMO
designs, we have M > 100, leading to huge capacity gains but also hardware complexities: not
every antenna can be equipped with a transmitter or receiver.
Similarly, in microphone array processing, we are interested in the audio signal (e.g., hearing
aids which nowadays employ multiple microphones to enable noise cancellation).
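A minimal sketch of this separation step (our illustration, with an arbitrary well-conditioned 2 × 2 mixing matrix): mix two signals by A, then recover them with A^{−1}. In practice A is not known exactly and a least-squares or Wiener beamformer is used instead, cf. Chap. 6.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 0.5],                 # mixing matrix: columns are the
              [0.3, 1.0]])                # array responses of the two sources
s = rng.choice([-1.0, 1.0], size=(2, 12)) # two BPSK-like source signals
x = A @ s + 0.01 * rng.standard_normal((2, 12))   # received sensor signals

s_hat = np.linalg.inv(A) @ x              # zero-forcing separation
print(np.round(s_hat[:, :5], 2))          # close to the transmitted symbols
```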

1.2 APPROACH

Signal processing starts with modeling. Given an application, we first construct a forward data
model which shows how the received sensor signals depend on the sources of interest and the
propagation medium. This can take the simple form of (1.1), but very often, more detail is
needed depending on the situation at hand. E.g., antenna gains may be direction dependent,
multipath may be present such that delayed signals s(t − τ ) also enter into the model, etc.
In wireless communication, source signals are often known up to the unknown symbols in the
message that we try to receive: the signals are deterministic with a number of unknown pa-
rameters. In other cases, such as radio astronomy, the source signals are quite random (e.g.,
described by temporally white Gaussian processes) and it may be more appropriate to define a
stochastic data model. We will frequently look at second order correlation models of the form
R_x = A R_s A^H + R_n     (1.2)


Figure 1.2. 5G MIMO communication array

where R_x = E[x(t) x^H(t)] is the correlation matrix of the received signals, and similarly for R_s and R_n.

Several assumptions were already made to arrive at this model, e.g., stationarity, and inde-
pendence of the signals and the noise. In the modeling phase, it is important to specify the
assumptions that were made to arrive at the model. Classical array processing textbooks often
provide a lot of details on translating wave propagation into models [2]. We will cover some of
this in Chap. 2.

Once we have a model for either x(t) or Rx , we can start to look at methods to estimate the
parameters we are interested in. These could be A, or parameters on which A depends, or
the source signals or parameters related to them. Not surprisingly, the methods we consider
are based on linear algebra, and various methods target various structures that may be present
in the model. E.g., in future chapters we will consider the structure that arises if, in (1.2), R_s is diagonal, or if R_n is diagonal or equal to R_n = σ_n² I. This will then result in eigenvalue decomposition problems or, more generally, in factor analysis.
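As a concrete (hypothetical) instance of this structure, the sketch below forms R_x = A R_s A^H + σ_n² I for d = 2 sources on M = 6 sensors and inspects the eigenvalues: the M − d smallest ones equal the noise power, which is the basis of the subspace methods in later chapters.

```python
import numpy as np

rng = np.random.default_rng(2)
M, d, sigma_n = 6, 2, 0.1
A = rng.standard_normal((M, d)) + 1j * rng.standard_normal((M, d))
Rs = np.diag([2.0, 1.0])                 # independent sources: diagonal R_s
Rx = A @ Rs @ A.conj().T + sigma_n**2 * np.eye(M)

eigvals = np.linalg.eigvalsh(Rx)         # Hermitian eigenvalues, ascending
print(np.round(eigvals, 4))              # M - d values equal sigma_n^2 = 0.01
```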

Linear algebra was the main workhorse for array signal processing in the period 1990–2010.
Since then, the attention has shifted to methods arising from compressed sensing, resulting in
formulations of problems as constrained optimization problems. These are then solved using
generic optimization techniques. Nonetheless, the focus of the book is on tools from linear
algebra.


1.3 NOTES

“Classical” array processing textbooks are the books by Johnson and Dudgeon [2], and Van
Trees [3]. The state-of-the-art in 1995 is also quite nicely summarized by the Signal Processing
Magazine article of Krim and Viberg [1]. Since then, blind beamforming techniques have given
a major impetus to the field. A few books giving an overview are found under the headings
of blind source separation and independent component analysis [4, 5], although this material is
probably better studied by consulting some of the original overview papers [].
A nice overview of applications is found in Haykin [6] which has extensive chapters on geophysics
exploration, sonar, radar, radio astronomy, and medical tomographic imaging (e.g., MRI and
CT scans). An early introduction to phased array radar is presented in Skolnik [7].
Linear algebra is used throughout the book, and a standard reference to this is Golub and Van
Loan [8].

Bibliography

[1] H. Krim and M. Viberg, “Two decades of array signal processing research: the parametric
approach,” IEEE Signal Processing Magazine, vol. 13, no. 4, pp. 67–94, 1996.

[2] D.H. Johnson and D.E. Dudgeon, Array signal processing: concepts and techniques. Prentice
Hall, 1993.

[3] H.L. Van Trees, Optimum array processing: Part IV of detection, estimation, and modulation
theory. Wiley, 2004.

[4] J.V. Stone, Independent component analysis: a tutorial introduction. MIT press, 2004.

[5] P. Comon and C. Jutten, Handbook of Blind Source Separation: Independent component
analysis and applications. Academic press, 2010.

[6] S. Haykin, ed., Array signal processing. Prentice Hall, 1985.

[7] M.I. Skolnik, Introduction to radar systems. McGraw-Hill, 1980.

[8] G.H. Golub and C.F. Van Loan, Matrix computations. Johns Hopkins University Press, 1996.



Part I

DATA MODELS



Chapter 2

WAVE PROPAGATION

Contents
2.1 The wave equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Spatial Fourier transforms . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Spatial sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4 Correlation processing . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.5 Application: radio astronomy . . . . . . . . . . . . . . . . . . . . . . . 30
2.6 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

In signal processing, data models are used as an abstraction of the physics in an application.
The model should be based on reality but not be overly detailed. Often, a variety of data models
are suitable, with different assumptions leading to different algorithms.

2.1 THE WAVE EQUATION

In free space, RF signals propagate following the Maxwell equations. These describe the rela-
tions between the vector electric and magnetic field intensities. If we specialize them to scalar
components (most sensors will not measure vector fields), we arrive at the wave equation:

∇² s(x, t) = (1/c²) ∂² s(x, t) / ∂t²     (2.1)

where ∇² is the Laplace operator,

∇² = ∂²/∂x² + ∂²/∂y² + ∂²/∂z² .

Here, x is a position in space; assuming 3D space, x = [x, y, z]^T. The scalar field s(x, t) is a
function of both space x and time t, and we will call it a signal. The coefficient c in (2.1) will


later be interpreted as the speed of propagation. It depends on the properties of the medium
(specifically, the dielectric permittivity and the magnetic permeability).
In acoustics, a similar equation holds for the acoustic pressure of a sound wave in gas or in a
fluid, and for the longitudinal and transverse waves in solids. In this case, c represents the speed
of sound. In a gas (air), it depends on pressure and temperature.
Different media (or materials) will have different propagation speeds. Interesting effects occur
at the interface between materials, or objects in space, such as reflection and diffraction. It is
also possible for c to vary continuously in space, e.g., due to gradients in salinity or temperature
in ocean water, or due to varying electron densities in the ionosphere.

2.1.1 Plane waves

A typical solution (“eigenfunction”) of this equation has the form¹

s(x, t) = e^{j(ωt − k·x)} .     (2.2)

If we insert this function into the wave equation, we find the constraint

k = ω/c ,     (2.3)
where k = ‖k‖ is called the wavenumber (or spatial frequency, in analogy to its role next to ω in (2.2)), with unit radians per meter. The function s(x, t) represents a monochromatic plane wave. Indeed, ω represents the radial frequency (in rad/s), and the vector k is called the wavenumber vector.
To interpret this signal, pick a constant C and look at the function argument,

ωt − k · x = C .

Clearly, for each time t, this describes a plane in 3D space where the function is constant, and
this defines a wavefront. The vector k is the normal to the plane, and indicates the direction of
propagation: in directions x parallel to k, the function argument changes fastest.
For the monochromatic wave, the time period of a cycle is

T = 2π/ω .

Over this time, the wavefront moves over a distance

λ = 2π/k

meters, in the direction of k.
¹ To remain consistent with the literature, we use here a dot, k · x, to represent the inner product between two vectors. Other notations are ⟨k, x⟩ and k^T x.



Figure 2.1. Propagation of a monochromatic wave in 1, 2 and 3 dimensions.

Combining these expressions with (2.3), we see that the distance λ covered over time T has the ratio

λ/T = ω/k = c ,

showing that, indeed, c in (2.1) defines the propagation speed. We can interpret λ as the wavelength in meters, and k/(2π) = λ^{−1} as the number of wave cycles that fit into 1 meter.
Let ζ be a unit-norm vector in the direction of k so that k = kζ, and write

ωt − k · x = ω (t − (1/c) ζ · x) .

This takes care of the constraint (2.3). The factor 1/c represents a delay in the propagation direction: the time it takes the wave to cover 1 meter.
Fig. 2.1 shows the propagation in 1, 2 and 3 dimensions. Propagation in 1 dimension, e.g., on
a rope, is less relevant for the book, but sometimes provides a nice example as (x, t) can be
visualized in a simple plot. The figure shows s(x, t) = cos(ωt − kx) where the positive part of
the wave is shaded, and the plot shows both the period T and the wavelength λ.
For propagation in 2 dimensions, a 2D plot can show (x, y) but not t, so the meaning of the plot is in fact quite different. We can parametrize the propagation direction ζ with a single angle θ, the angle of incidence of the wave:

ζ = − [sin(θ), cos(θ)]^T .     (2.4)

The minus sign in this expression comes from the choice to let the wave propagate towards the origin. For fixed ω and c, this allows us to parametrize k with a single parameter θ:

k = [k_x, k_y]^T = −(ω/c) [sin(θ), cos(θ)]^T = −(2π/λ) [sin(θ), cos(θ)]^T .     (2.5)


Fig. 2.1(b) shows that the wavefronts are orthogonal to k, and that λ is the (shortest) distance
between wavefronts. If we observe the wavefronts only on the x-axis, the apparent wavelength
λx is longer; similarly, the k-vector projected on the x-axis is shorter. As a result, the apparent
propagation velocity is larger and depends on the direction θ. This provides a means to recover
the direction θ even from 1D observations, for cases where we know the true propagation velocity
c. This plays a role later, when we place our sensors on the x-axis.
Likewise, in 3 dimensions, we will need 2 parameters φ and θ, the azimuth and elevation, respectively:

ζ = − [sin(θ) cos(φ), sin(θ) sin(φ), cos(θ)]^T .     (2.6)

This leads to

k = [k_x, k_y, k_z]^T = −(ω/c) [sin(θ) cos(φ), sin(θ) sin(φ), cos(θ)]^T = −(2π/λ) [sin(θ) cos(φ), sin(θ) sin(φ), cos(θ)]^T .     (2.7)
In the plot, the wavefronts are shown as planes orthogonal to k.
More generally, the wave equation supports the addition of multiple monochromatic solutions,
e.g., solutions at different frequencies ω. We can also scale these solutions by some S(ω), and
in the limit we find
s(t − (1/c) ζ · x) = (1/2π) ∫_{−∞}^{∞} S(ω) e^{jω(t − (1/c) ζ·x)} dω ,     (2.8)

provided this inverse Fourier transform integral converges. Thus, functions of the form s(t − (1/c) ζ · x) are also plane waves and a solution of the wave equation. Note that the corresponding
time-domain signal s(t) could be anything; it could be a sinusoid, but also a short pulse traveling
through space. The reason this pulse does not get distorted on its way is that all frequency
components receive the same delay as function of position, as represented by (1/c)ζ · x. The
frequency components remain coherent because in the present formulation c is not a function
of frequency. More in general, c does depend on frequency and this leads to dispersion: a
distortion of the pulse as different frequency components experience different delays (or phase
shifts) during propagation.
The additivity of the wave equation also supports the superposition of signals coming from
different directions. These can be monochromatic signals at the same frequency, at different
frequencies, or general signals of the form (2.8). E.g., the original signal could have been reflected
in an object, resulting in a copy of the signal traveling in a different direction (multipath), or
we can have multiple sources transmitting from different locations.
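As a small sanity check (our own sketch), the snippet below evaluates a 2D monochromatic plane wave (2.2), with k parametrized by θ as in (2.5), and verifies that it satisfies the dispersion relation ‖k‖ = ω/c and is constant along a wavefront.

```python
import numpy as np

c, f = 3e8, 1e9                         # propagation speed, frequency (RF)
omega, lam = 2 * np.pi * f, c / f
theta = np.deg2rad(30)                  # direction of arrival

k = -(2 * np.pi / lam) * np.array([np.sin(theta), np.cos(theta)])  # eq. (2.5)
print(np.allclose(np.linalg.norm(k), omega / c))    # constraint (2.3)

def s(x, t):
    """Monochromatic plane wave exp(j(w t - k.x)), eq. (2.2)."""
    return np.exp(1j * (omega * t - k @ x))

zeta = k / np.linalg.norm(k)
orth = np.array([-zeta[1], zeta[0]])    # direction along the wavefront
x0 = np.array([1.0, 2.0])
print(np.allclose(s(x0, 0.0), s(x0 + 5 * orth, 0.0)))  # equal on a wavefront
```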

2.1.2 Spherical waves

The wave equation (2.1) also admits other solutions than plane waves. If we switch from Carte-
sian coordinates to spherical coordinates (r, φ, θ) centered around the origin, and assume spher-


ically symmetric solutions s(r, t), then these will satisfy the spherical wave equation [1]

∇²(rs) = (1/c²) ∂²(rs)/∂t² .

This is the same equation as before, but now in terms of rs(t) instead of s(t). We thus obtain
similar solutions as before, but now as a function of radius r, and scaled by 1/r. E.g., the
monochromatic spherical wave, propagating from the origin, has the form

s(r, t) = (1/r) e^{j(ωt − kr)} ,

and a more general solution has the form s(r, t) = (1/r) s(t − r/c). The scaling by 1/r is related to the Friis free-space transmission equation used in telecommunication and radar.
Far away from the origin, in the so-called far field, the solution can be approximated by a plane
wave in case we only study a limited part of space.

2.1.3 Dispersive and diffractive effects

The wave equation (2.1) describes propagation in a lossless, homogeneous medium with constant
propagation velocity c. However, the situation may be more general:

– If the medium is not lossless, waves get attenuated as they propagate.


– If the medium is non-homogeneous, it has different propagation speeds in different areas, and
refraction occurs at the interfaces: part of the wave could be reflected, and part could be
transmitted but at a different angle.
Refraction leads to multipath: a propagating signal arrives at a receiver via several paths,
each with its own direction, attenuation, and propagation delay.
– Diffraction is the effect of waves bending around objects. In RF, this happens around sharp
edges, e.g., rooftops in mobile communication, but scattering also occurs on small objects
such as raindrops. The physics of this are complicated!

Other interesting effects occur if the propagation velocity is frequency dependent: in this case
we do not have the linear relation ω = ck, and dispersion occurs. An example is a prism, where,
at the interface with air, different colors of light are bent in slightly different angles. Other ex-
amples are the propagation of RF signals through the atmosphere or ionosphere, or the acoustic
propagation in the ocean, where propagation speed depends on salinity and temperature.
The effect of dispersion is that of a linear filter H(r, ω), which introduces range-dependent, frequency-dependent effects in the transmitted signal. Sensors placed in the far field (r large) will measure the same waveforms, but these are not equal to the transmitted waveform because of the frequency-dependent filtering: pulse distortion has occurred.


2.2 SPATIAL FOURIER TRANSFORMS

The Fourier transform has proven to be an essential tool in the analysis of linear time-invariant systems: such systems are characterized by an impulse response, an input signal is convolved with this impulse response, and in the Fourier domain, this convolution becomes a frequency-wise multiplication with the transfer function of the system. The transfer function is the Fourier transform of the impulse response.

Viewed in another way, the inverse Fourier transform shows that a signal can be represented as a sum of sinusoids, e^{jωt}.

2.2.1 Space-time Fourier transform


For space-time signals s(x, t), we can generalize by defining

S(k, ω) = ∫∫ s(x, t) e^{−j(ωt − k·x)} dx dt ,

where, for convenience, the three-dimensional integral over space is represented by a single integral sign. S(k, ω) is called the wavenumber-frequency representation, and corresponding plots are called F-K plots (with frequency in Hz). Such plots are used in seismic exploration (geophysics) and underwater acoustics [2].
The corresponding inverse Fourier transform is

s(x, t) = (1/(2π)⁴) ∫∫ S(k, ω) e^{j(ωt − k·x)} dk dω .     (2.9)

Equation (2.2) showed that a monochromatic plane wave is given by ej(ωt−k·x) . Thus, (2.9)
shows that any space-time signal can be represented by a weighted sum of monochromatic plane
waves.
If s(x, t) is a single monochromatic plane wave with frequency ω₀ and wavenumber vector k₀,

s(x, t) = e^{j(ω₀ t − k₀·x)} ,     (2.10)

then its spectrum is

S(k, ω) = (2π)⁴ δ(ω − ω₀) δ(k − k₀) ,     (2.11)
which represents a single point in wavenumber-frequency space. This expression is verified by
inserting (2.11) into (2.9).
Let us extend this to a wideband source with spectrum S(ω). The velocity of propagation was previously shown to be c = ω/k with k = ‖k‖. Since the velocity of propagation is given by the medium, k and ω are not completely independent. We can pick the direction of propagation, ζ₀ (a unit-norm vector), and then k = (ω/c) ζ₀. A single wideband plane wave thus traces a line in wavenumber-frequency space (i.e., in an F-K plot), and

S(k, ω) = (2π)³ S(ω) δ(k − k₀) ;     k₀ = (ω/c) ζ₀ .     (2.12)


This generalizes (2.11) to wideband sources. Since they come from a single direction, such
sources will be called point sources (as opposed to spatially extended sources).
Next, we could look at filters. Working by analogy, a space-time filter can be defined by a frequency response H(k, ω), and the output of the filter is

Y(k, ω) = H(k, ω) S(k, ω) .

The corresponding time domain signal is given by a convolution

y(x, t) = h(x, t) ∗ s(x, t) = ∫∫ h(x − p, t − τ) s(p, τ) dp dτ .

Practically speaking, it is not clear how such filters can be realized, as they act over all of space.

2.2.2 Apertures
In the next section we will look at sampling. Obviously, we will not be able to place sensors
anywhere in space: normally they will be placed on a line or within a limited spatial region. In
analogy to optics, the area over which we will sample space is called the aperture, and it acts
as a spatial window w(x):
y(x, t) = w(x)s(x, t) . (2.13)
E.g., for x = [x, y, z]^T, a linear aperture on the x-axis of size D (a “slit”) is defined by

h(x) = 1 for |x| < D/2, 0 otherwise  ⇔  w(x) = h(x) δ(y) δ(z) ,     (2.14)

and a circular aperture on the (x, y)-plane in 3D with diameter D is

h(x, y) = 1 for x² + y² < (D/2)², 0 otherwise  ⇔  w(x) = h(x, y) δ(z) .     (2.15)

For time-domain signals, we know that a product in time domain becomes a convolution in frequency domain. Thus, by analogy, applying the space-time Fourier transform to (2.13) yields

Y(k, ω) = (1/(2π)³) W(k) ∗ S(k, ω) = (1/(2π)³) ∫ W(k − p) S(p, ω) dp     (2.16)

where the aperture smoothing function is

W(k) = ∫ w(x) e^{j k·x} dx .     (2.17)

For the linear aperture (2.14) and with k = [k_x, k_y, k_z]^T, we obtain

W(k) = sin(k_x D/2) / (k_x / 2) ,     (2.18)



Figure 2.2. Aperture function in a 2D scenario. (a) a linear aperture (slit); (b) the corresponding W(k), only the k_x component is shown; (c) W(θ), for D = 2λ.

which is a sinc function in k_x, and constant in k_y, k_z. For the plane wave signal s(x, t) with wavenumber-frequency transform (2.12), the resulting spectrum is

Y(k, ω) = W(k) ∗ S(ω) δ(k − k₀) = S(ω) W(k − k₀) ;     k₀ = (ω/c) ζ₀ .     (2.19)

Thus, the effect of the aperture (window) in spatial domain is a convolution of the signal in wavenumber-frequency domain, which smears out (smooths) the spatial spectrum. The resulting signal does not come from a single direction ζ₀ anymore, but appears to come from a range of directions around ζ₀. Thus, the effect of the aperture is a limitation on resolution.
For the linear aperture, (2.18) shows that we get some smearing in the k_x component, while k_y and k_z are completely dropped: the y and z components of the field are not measured. Thus, by looking through the slit, only a 1D propagation scenario is visible. Signals with the same k_x but different k_y, k_z are indistinguishable.
Fig. 2.2 shows a linear aperture in a 2D scenario, and the corresponding function W (k). Since
it only depends on kx , only this component is shown. It is seen from (2.18) that the peak of
the sinc function has magnitude D. The first zero crossing occurs at kx = 2π/D, hence the
main lobe width is said to be approximately 2π/D (the exact value depends on the definition of
width). Thus, as D → ∞, the sinc function converges to a delta spike, as expected. Consider
now the parametrization of k as in (2.5). Then

k_x = −(2π/λ) sin(θ) .

Clearly, as θ varies from −π/2 to π/2, k_x ranges between ±2π/λ, and this is the part of the plot of W(k_x) that is “visible” for fixed λ and varying direction of arrival θ. In this parametrization, we find (with some abuse of notation)²

W(θ) = D · sin((D/λ) π sin(θ)) / ((D/λ) π sin(θ)) .
² Correct notation would define W_k(k) and W_θ(θ), but we would like to avoid such adorned notation.


Figure 2.3. Aperture function in a 3D scenario. (a) a circular aperture; (b) the corresponding W(k), only the (k_x, k_y) components are shown; (c) W(θ), for D = 2λ.

This function is plotted in the right panel, for D = 2λ. The first zero crossing occurs for (D/λ)π sin(θ) = π, i.e., sin(θ) = λ/D. Thus, we see that the main lobe width in the θ-plot is approximately λ/D. This will later be interpreted as the angular resolution of this aperture. Since the maximum |k_x| that can be reached is 2π/λ, we also see that only part of the plot of W(k_x) is visible, as indicated by the dashed box. For a given D, the visible part depends on λ, and the ratio D/λ determines the number of sidelobes of W(k) that are visible in W(θ).
Note that we defined W(θ), but to compute the response for a source from direction θ₀, we cannot work with W(θ − θ₀). Instead, starting from (2.19), we can write Y(θ, ω) = S(ω) W(θ; θ₀), where

W(θ; θ₀) = D · sin((D/λ) π [sin(θ) − sin(θ₀)]) / ((D/λ) π [sin(θ) − sin(θ₀)]) .

For θ₀ close to π/2, the beamshape will not only center around θ₀, but be distinctively different, with a much broader main lobe.
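A short sketch (ours) that evaluates this beamshape numerically, comparing the −3 dB main-lobe width for a broadside source (θ₀ = 0) with a source close to end-fire (θ₀ = 75°), for D = 2λ:

```python
import numpy as np

def W(theta, theta0, D_over_lam=2.0):
    """W(theta; theta0) for a linear aperture of size D.
    np.sinc(x) = sin(pi x)/(pi x), so the pi factor is absorbed."""
    u = D_over_lam * (np.sin(theta) - np.sin(theta0))
    return D_over_lam * np.sinc(u)      # magnitude in units of lambda

theta = np.linspace(-np.pi / 2, np.pi / 2, 2001)
for theta0_deg in (0.0, 75.0):
    w = np.abs(W(theta, np.deg2rad(theta0_deg)))
    mask = w >= w.max() / np.sqrt(2)    # -3 dB region of the main lobe
    width = np.rad2deg(theta[mask][-1] - theta[mask][0])
    print(f"theta0 = {theta0_deg:4.1f} deg: -3 dB beamwidth = {width:.1f} deg")
```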
In 3D, if we take a square aperture around the origin in the (x, y)-plane, then w(x) = h(x) h(y) δ(z), and

W(k) = [sin(k_x D/2) / (k_x/2)] · [sin(k_y D/2) / (k_y/2)] .     (2.20)

Ideally, we design apertures such that W (k) is as close to a delta spike as possible: the width
of the main lobe determines the spatial resolution in applications such as direction finding. On
the other hand, we don’t necessarily need narrow main lobes in all three dimensions of k: for
direction finding, we are interested in the direction vector ζ, and if c is known, there are only 2
independent dimensions to specify ζ.


Circular aperture In 3D, for a circular aperture with diameter D = 2R, one shows that [1]

W(k) = (2πR / k_xy) J₁(k_xy R) ,     k_xy = √(k_x² + k_y²) ,

where J₁(·) is the first-order Bessel function of the first kind. This smoothing function is known in optics as the Airy disk. It describes the pattern (bright spot and rings around it) that is visible on a screen placed behind a small uniformly illuminated aperture. Fig. 2.3 shows the aperture, W(k) and W(θ). The Bessel function is quite similar to a two-dimensional sinc function, but note that it is circularly symmetric (unlike (2.20)).

The first zero crossing of J₁(x) occurs for x = 3.8317… . Using (2.7), we find

(2π/λ) R sin(θ) = 3.8317  ⇔  sin(θ) = 1.22 λ/D ,     (2.21)

where D = 2R is the diameter of the array. Again, the beamwidth of this aperture is determined by λ/D.
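A quick numerical check of (2.21) (our sketch, using SciPy's Bessel function and a standard root finder): locate the first zero of J₁ and convert it to the corresponding angle.

```python
import numpy as np
from scipy.special import j1           # Bessel function J1
from scipy.optimize import brentq

x_null = brentq(j1, 1.0, 5.0)          # first positive root, approx 3.8317
D_over_lam = 2.0                       # aperture D = 2*lambda, so R = lambda
# k_xy R = (2 pi / lam) sin(theta) (D/2) = x_null  =>  solve for sin(theta)
sin_theta = x_null / (np.pi * D_over_lam)
print(f"first zero of J1: {x_null:.4f}")
print(f"sin(theta) = {sin_theta:.4f} vs 1.22*lam/D = {1.22 / D_over_lam:.4f}")
```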

2.3 SPATIAL SAMPLING

To measure a field, practically we can observe it only over some finite area in space: the aperture.
We could use, e.g., a parabolic dish with diameter D, and then the aperture is the size of the
dish. The dish casts the incoming energy onto (usually) a single sensor, and because of its
directionality, we will have to scan it to cover all directions.
If D is large, then this is not practical. Instead, we can place a number of sensors inside the area
covered by the aperture. This sensor array will spatially sample the wavefield. The sensors could
be simple antennas, or they could be small dishes or arrays themselves, leading to a hierarchy
of arrays. For the moment, we will assume that the sensors are ideal, omnidirectional antennas,
i.e., they simply capture s(x, t) at a specific position x.
At first, the theory of spatial sampling can be presented as a direct extension of the usual tempo-
ral sampling: sampling creates periodicities in the spectrum, leads to aliasing, and bandlimited
signals can be perfectly reconstructed from their samples. Aliasing in this context means that
sources from two different directions will result in the same sampled signal and thus cannot be
distinguished.
Fig. 2.4 shows some of the notation we will be using in this section. We will first sample the
wavefield, and subsequently apply the aperture (= select a finite number of spatial samples)
and also apply weights to the selected samples. We use X(k, t) and Y (k, t) to keep track of the
related spatial spectra.

2.3.1 Infinite number of sensors


To study spatial sampling, let us start with a 1D scenario, where a field s(x, t) is sampled at
regular locations xm = m d, where d in meters is the distance between the sensors. The samples



Figure 2.4. Notation related to spatial sampling and beamforming.

are

x_m(t) = s(md, t) ,     m = · · · , −1, 0, 1, · · · .

An infinite number of sensors is needed, but this will be managed later. The original “continuous” signal s(x, t) has space-time spectrum

S(k, ω) = ∫∫ s(x, t) e^{−j(ωt − kx)} dx dt = ∫ [ ∫ s(x, t) e^{jkx} dx ] e^{−jωt} dt .

Note that the Fourier transform over space and time decouples. For simplicity of notation, we will instead consider here only the spatial spectrum (omitting the transformation in time): let

S(k, t) = ∫ s(x, t) e^{jkx} dx .

This looks like the usual Fourier transform, except for the sign of the exponent, but that is of no consequence. Inversely,

s(x, t) = (1/2π) ∫ S(k, t) e^{−jkx} dk .

In analogy to time-domain sampling, define the spatial sampling frequency as

k_s = 2π/d .


Next, we split the integration over k into a fundamental interval (−k_s/2, k_s/2], plus shifts n k_s of this interval, for n = · · · , −1, 0, 1, · · ·. This leads to

s(x, t) = (1/2π) Σ_n ∫_{n k_s − k_s/2}^{n k_s + k_s/2} S(k, t) e^{−jkx} dk
        = (1/2π) Σ_n ∫_{−k_s/2}^{k_s/2} S(k − n k_s, t) e^{−jkx} e^{−j n k_s x} dk .

Sampling x (but not t, yet), and using n k_s m d = 2πnm,

x_m(t) = s(md, t) = (1/2π) Σ_n ∫_{−k_s/2}^{k_s/2} S(k − n k_s, t) e^{−jkdm} e^{−j2πnm} dk
       = (1/2π) ∫_{−k_s/2}^{k_s/2} [ Σ_n S(k − n k_s, t) ] e^{−jkdm} dk .     (2.22)

Let us compare this to the spectrum that we can define for the sampled signal, in analogy to the DTFT. Various definitions are possible, and we opt for

X(k, t) = Σ_m x_m(t) e^{jkdm}  ⇔  x_m(t) = (d/2π) ∫_{−k_s/2}^{k_s/2} X(k, t) e^{−jkdm} dk .     (2.23)

This spectrum is defined on the fundamental interval −k_s/2 ≤ k ≤ k_s/2, and for larger k it is periodic (since k_s = 2π/d). The usual factor 1/2π in the inverse transform is replaced here by d/(2π) = 1/k_s because we have defined the spectrum using k, instead of a normalized frequency variable kd which would range from −π to π.

Comparing to (2.22), we see that the spectrum of the sampled signal is related to that of the unsampled signal via

X(k, t) = (1/d) Σ_n S(k − n k_s, t) ,     −k_s/2 ≤ k ≤ k_s/2 ,

and periodic elsewhere.


The summation in this expression represents aliasing: the original spectrum is shifted by multiples of the sampling frequency k_s, and these copies are all added. If the original spectrum is zero outside the interval [−k_s/2, k_s/2], then these copies do not overlap, and

X(k, t) = (1/d) S(k, t) ,     −k_s/2 ≤ k ≤ k_s/2 .

This is the equivalent of the Nyquist sampling rate condition, corresponding to a spatially bandlimited signal. For constant c, the relation k = ω/c allows us to immediately translate this to a condition for a temporal-frequency bandlimited signal,

|k| < k_s/2  ⇔  |ω| < πc/d  ⇔  |f| < c/(2d)


where ω = 2πf, with f in Hz. Alternatively, the distance between the sensors has to satisfy

d < c/(2B) ,

where B = f_max is the bandwidth of the signal in Hz. Or, if λ_min = c/B is the smallest wavelength in the signal,

d < λ_min/2 .

In words: the distance between sensors has to be less than half of the shortest wavelength in the signal. If this Nyquist condition holds, then Shannon’s sampling theorem states that the continuous signal can be recovered perfectly by lowpass filtering the periodic spectrum of the sampled signal, which amounts to sinc interpolation. Consequently, no information is lost if the sensors are spaced closer than λ_min/2. Otherwise, aliasing will occur, which will be problematic for direction finding (or imaging) applications.
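To see spatial aliasing concretely, the sketch below (ours) computes the sampled phase signature e^{−j k_x m d} of a narrowband source for two directions whose sines differ by λ/d; for d = λ (violating d < λ_min/2) the two directions produce identical samples and cannot be distinguished.

```python
import numpy as np

lam = 1.0
m = np.arange(8)                        # sensor indices of a uniform array

def signature(theta, d):
    """Sampled plane-wave phases e^{-j k_x m d}, k_x = -(2 pi/lam) sin(theta)."""
    kx = -(2 * np.pi / lam) * np.sin(theta)
    return np.exp(-1j * kx * m * d)

theta1 = 0.0
theta2 = np.arcsin(np.sin(theta1) + lam / 1.0)   # alias of theta1 when d = lam

print(np.allclose(signature(theta1, d=1.0), signature(theta2, d=1.0)))  # True
print(np.allclose(signature(theta1, d=0.5), signature(theta2, d=0.5)))  # False
```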
These results extend to higher dimensions. For a wavefield in 2D, we use uniform sampling in
2 dimensions, etc. Nonetheless, this theory is not entirely satisfying yet, as we would like (i) to
sample using a finite number of sensors, (ii) to sample 3D space using only a 2D array, (iii) to
consider using a random (non-uniformly spaced) array.

2.3.2 Finite number of sensors


A finite number of sensors is effectively obtained if we multiply the infinite samples by an
aperture function that selects the range of sensors to be used. As we saw in (2.16), the effect of
this is a convolution in the spatial spectrum with the aperture smoothing function W (k), which
results in smearing of the spectrum.
To study this, consider in the 1D case a uniform linear array with M sensors at locations x_m = md, m = 0, · · · , M − 1. To select this range, define the aperture weighting function

w_m = 1 for m = 0, · · · , M − 1 ;  w_m = 0 elsewhere.     (2.24)

The spatial spectrum of the sampled signal using M sensors is

Y(k, t) = Σ_{m=0}^{M−1} x_m(t) e^{jkmd} = Σ_{m=−∞}^{∞} w_m s(md, t) e^{jkmd} .

Let X(k, t) be the (periodic) spectrum of the sampled signal using an infinite number of sensors, m = −∞, · · · , ∞. Using (2.23) gives

Y(k, t) = Σ_m w_m e^{jkmd} (d/2π) ∫_{−k_s/2}^{k_s/2} X(p, t) e^{−jpmd} dp
        = (d/2π) ∫_{−k_s/2}^{k_s/2} [ Σ_m w_m e^{j(k−p)md} ] X(p, t) dp .     (2.25)


Figure 2.5. Amplitude of the discrete aperture function W(k) for M = 9 sensors. The plot is periodic with period k_s = 2π/d.

Now define the discrete aperture function

W(k) = Σ_{m=−∞}^{∞} w_m e^{jkmd} = Σ_{m=0}^{M−1} e^{jkmd} ,     (2.26)

which is periodic with period k_s = 2π/d. Then (2.25) can be written as

Y(k, t) = (d/2π) ∫_{−k_s/2}^{k_s/2} W(k − p) X(p, t) dp .     (2.27)

This is recognized as a (circular) convolution of W(k) with the discrete spatial spectrum X(p, t), over one period of the spectrum. This convolution will smooth the spectrum and limit its resolution. It will also introduce sidelobes, as we will now see. From (2.26), we obtain

W(k) = Σ_{m=0}^{M−1} e^{jkmd} = (1 − e^{jkMd}) / (1 − e^{jkd}) = [sin(kMd/2) / sin(kd/2)] e^{jk(M−1)d/2} .     (2.28)

The factor e^{jk(M−1)d/2} is simply a phase factor that determines the phase center of the array, and can be ignored here. The remaining factor (sin over sin) can be viewed as a “periodic sinc function” (it is known as the Dirichlet kernel and occurs in convergence proofs for Fourier series). The amplitude |W(k)| is periodic with period k_s = 2π/d, as determined by the denominator. It has zero crossings at multiples of k = 2π/(Md) = k_s/M.
A plot of this beamshape is shown in Fig. 2.5, for M = 9. The periodicity with k_s is clearly visible. The peak of |W(k)| is equal to M, called the array gain. The width of the main lobe is determined by the first zero crossing, at 2π/(Md) = k_s/M. Ideally, for M → ∞, W(k) converges to a delta spike train, such that the convolution (2.27) does not change the spectrum: Y(k, t) = X(k, t). For finite M, the convolution with the main lobe will smear out the spectrum, and reduce its resolution to 2π/(Md). Indeed, suppose the spectrum S(k, t) contains two point sources (i.e., delta spikes at specific values for k). The convolution with W(k) will replace the delta spikes by the main lobe of W(k). If the delta spikes are closer to each other than approximately 2π/(Md), then the main lobes will highly overlap and appear in the spectrum as a single point source. This is similar to the discussion in Sec. 2.2.2 where we looked at the effect of an aperture. Note that D = Md can be interpreted as the spatial coverage (aperture) of the array.

We also note that, next to the main lobe, there are in total M − 1 side lobes within the fundamental interval, in between the zero crossings of the sine function in the numerator of (2.28). The sidelobes in Fig. 2.5 will cause confusion: if in the spectrum Y(k, t) we observe a small peak, we will not know if it is a weak point source, or a side lobe of another (strong) source. Thus, the sidelobes limit the sensitivity of the array.³

Due to the periodicity, the main lobe is repeated outside the fundamental interval; these repetitions are called grating lobes. They might appear in the spectrum if the visible region is larger than the fundamental interval.
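A short verification of (2.28) (our sketch): compute W(k) both as the direct sum (2.26) and via the closed form, and read off the array gain M at the peak.

```python
import numpy as np

M, d = 9, 0.5                          # sensors, spacing (in meters, say)
ks = 2 * np.pi / d                     # spatial sampling frequency
k = np.linspace(-ks, ks, 4001) + 1e-6  # small offset avoids 0/0 at k = n*ks

W_sum = np.exp(1j * np.outer(k, np.arange(M)) * d).sum(axis=1)   # eq. (2.26)
W_closed = np.sin(k * M * d / 2) / np.sin(k * d / 2)             # eq. (2.28)

print(np.allclose(np.abs(W_sum), np.abs(W_closed)))   # True: closed form holds
print(f"peak |W(k)| = {np.abs(W_sum).max():.2f} (array gain M = {M})")
print(f"first zero crossing at k = 2*pi/(M*d) = {2 * np.pi / (M * d):.3f}")
```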

2.3.3 More general weights

Until now, we selected in (2.24) aperture weights that were either 0 (outside the aperture) or 1 (for the M sensors inside the aperture). However, we are not bound to take the non-zero weights equal to 1: we can select other weights. Doing so will allow us to design other smoothing functions than the Dirichlet kernel which, after all, has quite high sidelobes. Thus we define, generalizing (2.26),

W(k) = Σ_{m=0}^{M−1} w_m e^{jkmd} .     (2.29)

The nonzero weights are called a shading, or tapering, of the array; in general they could
be complex numbers. W (k) can be interpreted as a (discrete-space) Fourier transform of the
sequence [wm ], similar to the DTFT. Thus, similar design techniques as used for digital filters
can be applied here to design weights that result in a desired “transfer function” |W (k)| with
minimal sidelobe heights or other desired features. For example, instead of the rectangular
window (2.24), we can use a triangular window, a Hann or Hamming window, etc., or apply
other window/filter design techniques such as the Parks–McClellan algorithm. An extensive overview is given
in [3, Ch. 3].
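As an illustration of (2.29) (our sketch, using NumPy's built-in Hamming window), compare the peak sidelobe level of the rectangular taper with that of a Hamming taper; the price paid is a wider main lobe.

```python
import numpy as np

M, d = 9, 0.5
k = np.linspace(1e-6, np.pi / d, 8192)           # positive half of one period
E = np.exp(1j * np.outer(k, np.arange(M)) * d)   # steering terms e^{jkmd}

for name, w in (("rectangular", np.ones(M)), ("Hamming", np.hamming(M))):
    W = np.abs(E @ w)                  # |W(k)| per (2.29)
    W_db = 20 * np.log10(W / W.max())
    i0 = np.argmax(np.diff(W) > 0)     # index just past the first null
    print(f"{name:12s}: peak sidelobe = {W_db[i0:].max():6.1f} dB")
```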

2.3.4 1D array in a 2D propagation scenario

In the above, we analyzed the 1D case. An extension to higher dimensions is straightforward, at least if we consider uniform rectangular arrays in 2D, or uniform cubic arrays in 3D.
³ We will later discuss deconvolution techniques that attempt to undo the convolution by W(k).


Figure 2.6. Uniform linear array with M = 9 sensors in a 2D scenario. (a) Configuration; (b) the corresponding W(k_x); (c) W(θ), for d = λ/2.

Let us look now at a 1D array in a 2D propagation scenario, i.e., consider a uniform linear array of M sensors in 2D. Thus, we observe only k_x, and drop k_y. Then, using (2.28),

W(k) = W(k_x) = sin(k_x Md/2) / sin(k_x d/2)

(the phase offset due to the non-zero phase center of the array was dropped for simplicity: the array is centered around the origin), and with k_x = −(2π/λ) sin(θ) we obtain

W(θ) = sin((Md/λ) π sin(θ)) / sin((d/λ) π sin(θ)) .

We saw in Sec. 2.2.2 that the relation between D and λ determined the part of W(k) that is visible in case we fix λ and scan θ: the visible part is the interval [−2π/λ, 2π/λ]. Comparing to Fig. 2.5, we see that if d < λ/2, then the visible part is within one period of W(k_x). Using similar arguments as before, we estimate the angular resolution as λ/(Md). There are M − 1 zero crossings, which corresponds to the number of sidelobes.

Fig. 2.6 shows W(k_x) and W(θ), for M = 9 and d = λ/2. For this choice of d, exactly one period of W(k_x) is visible. Only the visible part is in W(θ), where we see that the horizontal axis is stretched at the edges (around ±π/2) compared to W(k_x). If we take d > λ/2, then we do not satisfy the Nyquist criterion, and the resulting aliasing may result in visible grating lobes: secondary main lobes of sources may appear, especially for sources with angles close to ±π/2.

2.3.5 Irregular sampling

More generally, we can consider an irregular array, where M sensors are placed “randomly”, at locations x_m in 3D. If S(k, t) is the wavefield, then at a location x,

s(x, t) = (1/(2π)³) ∫ S(k, t) e^{−j k·x} dk .


Figure 2.7. Amplitude of the discrete aperture function W(k_x) for M = 9 non-uniformly spaced sensors. The smallest spacing is d. Two designs are shown: a random one with an aperture slightly larger than the uniform design, and a sparse nonredundant array with a much larger aperture. The dotted lines correspond to a uniform array with M sensors spaced at d.

Taking a finite number of samples at locations x_m and weighting them by a taper w_m gives

y(x, t) = w(x) · s(x, t) ,

with

w(x) = Σ_{m=0}^{M−1} w_m δ(x − x_m) .

Let the (continuous) spatial Fourier transform of y(x, t) be Y(k, t). Using a similar derivation as before, we obtain

Y(k, t) = (1/(2π)³) ∫ W(k − p) S(p, t) dp = (1/(2π)³) W(k) ∗ S(k, t) ,

where, inserting w(x) in (2.17),

W(k) = Σ_{m=0}^{M−1} w_m e^{j k·x_m}

is the spatial Fourier transform of w(x). This generalizes the previous definition (2.29) of W(k) to the non-uniform case. Again, the sampled spectrum Y(k, t) is the convolution of the original spectrum with a smoothing function W(k). However, if the sensor locations are not uniformly spaced, then W(k) and Y(k, t) will not be periodic, and it will be hard to analyze theoretically.


Two examples are shown in Fig. 2.7. The first array is mildly irregular, with a sensor at x = 0, one at x = d, and each subsequent one placed at a distance drawn uniformly at random between d and 1.5 d from the previous one. In total M = 9 sensors are used, all with equal weights. Since the array is irregular, the plot of |W(kx)| is non-periodic. The aperture of this array is not much larger than M d, and the main lobe width is not much narrower than that for a uniform linear array of M sensors spaced at d (the red dotted line). Grating lobes are present, but not at the full height of the main lobe, and a bit closer than would be expected for a minimum spacing of d. Moreover, these plots vary greatly if another, similar, random design is selected. It is clear that without a design process such random arrays will not have the desired properties.
The second array in Fig. 2.7 is based on a uniform distance d. It can be viewed as an M = 45
uniform array that is subsequently thinned to M = 9; this is called a sparse linear array. The
spacings between the sensors are [1]

[1, 4, 7, 13, 2, 8, 6, 3] · d .

Because of the underlying uniformity (all sensor spacings are a multiple of d), the spectrum
W (k) is periodic with period ks = 2π/d. The main lobe is much narrower than before, as the
aperture is D = 45d. As a penalty, the side lobes are now much stronger and appear noise-like.
It could be argued that the effective array gain is only a factor 3 rather than 9.
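The two designs of Fig. 2.7 are easy to reproduce numerically. The following sketch (illustrative only, with an arbitrary random seed) evaluates |W(kx)| for both arrays; plotting the result against kx shows the behavior described above.

    import numpy as np

    d = 1.0
    # sparse nonredundant design: spacings [1,4,7,13,2,8,6,3]*d, aperture 44d
    pos_sparse = np.concatenate(([0.0], np.cumsum([1, 4, 7, 13, 2, 8, 6, 3]))) * d
    # mildly irregular design: spacings drawn uniformly from [d, 1.5d]
    rng = np.random.default_rng(0)
    pos_rand = np.concatenate(([0.0], np.cumsum(rng.uniform(d, 1.5*d, 8))))

    kx = np.linspace(-4*np.pi/d, 4*np.pi/d, 4001)
    W = lambda pos: np.abs(np.exp(1j * np.outer(kx, pos)).sum(axis=1))

    print(W(pos_sparse).max(), W(pos_rand).max())  # both peak at M = 9 (at kx = 0)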
To analyze array designs, let

    c(x) = w(x) ∗ w(−x)     (2.30)

be the deterministic "autocorrelation" of w(x). It is a delta sequence with spikes at every baseline between two sensor locations. (A baseline is the distance vector between a pair of sensors.) If multiple baselines are the same (the array is redundant), then the scale of the corresponding spike counts the multiplicities. For lag x = 0, we have c(0) = M. The locations where c(x) has spikes form the co-array. The motivation for (2.30) comes from this: since |W(k)|² = W(k)W*(k), then

    |W(k)|² = ∫ c(x) e^{jk·x} dx

is the Fourier transform of c(x). Thus, the co-array determines the magnitude of the spectrum smoothing function.
Filter design techniques can be used to determine, starting from a desired |W (k)|2 , the corre-
sponding c(x), and subsequently a set of positions and sensor weights that approximate this
c(x). Generally, however, array design comes with many constraints and is not an easy art.
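For a discrete array, the co-array is simply the multiset of pairwise position differences. A small sketch for the sparse array above (illustrative only):

    import numpy as np

    d = 1.0
    pos = np.concatenate(([0.0], np.cumsum([1, 4, 7, 13, 2, 8, 6, 3]))) * d
    M = len(pos)

    # all baselines x_m - x_n; their multiplicities are the spikes of c(x)
    baselines = (pos[:, None] - pos[None, :]).ravel()
    lags, counts = np.unique(np.round(baselines, 6), return_counts=True)

    print(counts[lags == 0])        # c(0) = M = 9
    print(counts[lags != 0].max())  # 1: every nonzero baseline occurs once,
                                    # hence "nonredundant"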

2.4 CORRELATION PROCESSING

In telecommunication, we use array processing to spatially separate sources and receive one
signal of interest. In many other applications, we are not so much interested in the source signal


s(t), but more in the propagation parameters, i.e., k or the unit-norm direction vector ζ. In the
2D case, ζ is specified by the direction of arrival θ.
In these cases, we can do away with the temporal dimension and work with correlation models.
Generally, we look at second-order correlations between the sensor signals. For non-Gaussian
sources, we might also consider higher-order statistics, cf. Chap. 11.
Thus, in this section we will consider signals s(t) as random processes. If we limit ourselves to descriptions by second-order statistics, we look at the mean and the variance,

    E[s(t)] ,    E[ |s(t) − E[s(t)]|² ] ,

and more generally the autocorrelation function, rs(t, t′) = E[s(t)s*(t′)]. (For generality, complex signals are assumed, and the superscript * denotes the complex conjugate.) Usually we immediately make several simplifying assumptions: we consider that the signal is wide sense stationary, so that the mean is constant over time, the autocorrelation function only depends on the time difference τ = t′ − t, and the variance is finite. We can then write

    rs(τ) = E[s(t + τ)s*(t)] .

Moreover, we usually consider the mean to be zero, E[s(t)] = 0, so that rs (τ ) equals the variance.
In any case, rs (0) represents the power of the random signal.
Recall from a course on random processes that the power spectral density is defined as the Fourier transform of the autocorrelation function:

    Rs(ω) = ∫ rs(τ) e^{−jωτ} dτ .

This can be related to the spectrum S(ω) of s(t), but we have to be careful as for a random process, the energy in s(t) is infinite: we have to look at the energy per unit time. Therefore, let sT(t) be equal to s(t) on the interval (−T/2, T/2] and zero otherwise, and let ST(ω) be the Fourier transform of sT(t), then one can show that

    Rs(ω) = lim_{T→∞} (1/T) E[ |ST(ω)|² ] .

For white noise, rs (τ ) is a delta spike, and Rs (ω) is a constant. (Its power, rs (0), is actually
infinite, so truly white noise does not exist.)

2.4.1 Monochromatic plane wave


These concepts can be carried over to the spatial domain. Space-time stochastic processes are called random fields. To start from basics, consider a scenario with a single plane wave, where the propagating source s(t) is a monochromatic plane wave:

    s(x, t) = α e^{j(ω0t − k0·x)} ,

where α is a random (complex) amplitude,

    E[α] = 0 ,    E[ |α|² ] = P .

At a position x, we can look at the temporal autocorrelation function

    rs(τ) = E[s(x, t + τ)s*(x, t)] = P e^{jω0τ}

(the source is stationary in time and the result does not depend on the position x). Now, in analogy, consider a sensor at location x0 and one at location x1. The spatial cross-correlation function is

    rs(x0, x1, τ) = E[s(x1, t + τ)s*(x0, t)] = E[ |α|² ] e^{j(ω0τ − k0·(x1−x0))} .

Note that this depends only on the baseline b = x1 − x0, i.e., the vector pointing from x0 to x1. Such a random field is called homogeneous, and we can write

    rs(b, τ) = E[s(x0 + b, t + τ)s*(x0, t)] .

For the monochromatic plane wave, we have

    rs(b, τ) = P e^{jω0τ} e^{−jk0·b} .

Let ζ be the unit-norm vector pointing in the direction of k0, and recall that k0 = (ω0/c) ζ, then

    rs(b, τ) = P e^{jω0τ} e^{−jω0τg}     (2.31)

where

    τg = ζ·b / c .
See Fig. 2.8. The figure shows that τg is the geometric delay, the delay of the wavefront in propagating from x0 to x1. In the figure, the signal arrives at x1 first, so the delay is actually an advance, and hence negative. If d = |b| and θ is the angle between the source direction and the direction orthogonal to the baseline (broadside), then

    τg = −(d/c) sin(θ) .

(The minus sign is due to the orientation of ζ, and indeed, for positive θ the delay is negative.) Thus, τg is related to the direction of arrival (DOA). Often, DOA estimation algorithms estimate τg, or the phase delay e^{−jω0τg}, and determine the DOA from this.
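As a small numerical check (with illustrative values for f0, d and θ, none of them from the text), the phase delay over a baseline indeed encodes the DOA, provided the phase does not wrap:

    import numpy as np

    c, f0 = 3e8, 100e6            # propagation speed, carrier frequency
    d = 1.5                       # baseline length [m]: half a wavelength here
    theta = np.deg2rad(30)        # direction of arrival

    tau_g = -d/c * np.sin(theta)  # geometric delay (an advance here, so negative)
    phase = np.exp(-1j * 2*np.pi*f0 * tau_g)

    # invert the measured phase (unambiguous if |2*pi*f0*tau_g| <= pi)
    theta_hat = np.arcsin(np.angle(phase) / (2*np.pi*f0*d/c))
    print(np.rad2deg(theta_hat))  # 30.0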
Taking the temporal Fourier transform of (2.31) gives the cross power spectral density,

    Rs(b, ω) = 2π P e^{−jω0τg} δ(ω − ω0) .     (2.32)


Figure 2.8. Propagating monochromatic source. The projection of b on the propagation direction determines the geometric delay τg.

2.4.2 Wideband plane wave


If we now generalize from a monochromatic source to a wideband source with power spectral density Rs(ω), we can similarly define

    rs(b, τ) = 1/(2π) ∫ Rs(ω) e^{jω(τ−τg)} dω ,    τg = ζ·b / c .     (2.33)

Applying the temporal Fourier transform gives the cross power spectral density,

    Rs(b, ω) = Rs(ω) e^{−jωτg} ,    τg = ζ·b / c ,     (2.34)

which generalizes (2.32). The expression shows that we can write (2.33) as

    rs(b, τ) = rs(τ) ∗ δ(τ − τg) ,     (2.35)

i.e., the cross-correlation between two sensors (spaced by b) is the autocorrelation of the source, convolved with a delay. Of course, this result could also have been obtained directly.

2.4.3 Random fields


More generally, we will receive a superposition of multiple sources simultaneously. These can be point sources (with a specific direction k0), or a random field with a certain source distribution as a function of k. The generalization of (2.35) is

    rs(b, τ) = 1/(2π)⁴ ∫∫ S(k, ω) e^{j(ωτ − k·b)} dk dω ,
    S(k, ω) = ∫∫ rs(b, τ) e^{−j(ωτ − k·b)} db dτ ,

where the wavenumber-frequency spectrum is denoted by S(k, ω), to avoid a notation clash with the cross power spectral density Rs(b, ω) in (2.34).

A white noise random field has rs(b, τ) = δ(b)δ(τ): the correlation is zero for any nonzero lag in position or time. An isotropic noise field consists of random waves propagating in all possible directions with equal probability:

    S(k, ω) = S(ω) δ(k − ω/c) ,    k = ‖k‖ .

Here S(ω) is the coloring of the noise in the temporal frequency domain. The argument of the delta function selects all k on a sphere, satisfying the linear dispersion relation.

2.5 APPLICATION: RADIO ASTRONOMY

Astronomical instruments measure cosmic particles or electromagnetic waves impinging on the Earth. Astronomers use the data generated by these instruments to study physical phenomena
outside the Earth’s atmosphere. In recent years, astronomy has transformed into a multi-modal
science in which observations at multiple wavelengths are combined. Fig. 2.9 provides a nice
example showing the lobed structure of the famous radio source Cygnus A as observed at 240
MHz with the Low Frequency Array (LOFAR) overlaid by an X-Ray image observed by the
Chandra satellite, which shows a much more compact source.
Such images are only possible if the instruments used to observe different parts of the electro-
magnetic spectrum provide similar resolution. Since the resolution is determined by the ratio of
observed wavelength and aperture diameter, the aperture of a radio telescope has to be 5 to 6
orders of magnitude larger than that of an optical telescope to provide the same resolution. This
implies that the aperture of a radio telescope should have a diameter of several hundreds of kilo-
meters. Most current and future radio telescopes therefore exploit interferometry to synthesize
a large aperture from a number of relatively small receiving elements.

2.5.1 Interferometry

An interferometer measures the correlation of the signals received by two antennas spaced at a
certain distance. After a number of successful experiments in the 1950s and 1960s, two arrays of
25-m dishes were built in the 1970s: the 3 km Westerbork Synthesis Radio Telescope (WSRT, 14
dishes, see Fig. 2.10) in Westerbork, The Netherlands and the 36 km Very Large Array (VLA,
27 movable dishes) in Socorro, New Mexico, USA (Fig. 1.1). These telescopes use Earth rotation
to obtain a sequence of correlations for varying antenna baselines, resulting in high-resolution
images via synthesis mapping. A more extensive historical overview is presented in [5].
The radio astronomy community has recently commissioned a new generation of radio telescopes
for low frequency observations, including the Murchison Widefield Array (MWA) [6] in Western
Australia and the Low Frequency Array (LOFAR) [7] in Europe. These telescopes exploit phased
array technology to form a large collecting area with ∼1000 to ∼50,000 receiving elements. The
community is also making detailed plans for the Square Kilometre Array (SKA), a future radio


Figure 2.9. Radio image of Cygnus A observed at 240 MHz with the Low Frequency Ar-
ray (showing mostly the lobes left and right), overlaid over an X-Ray image of
the same source observed by the Chandra satellite (the fainter central cloud).
(Courtesy of Michael Wise and John McKean.)


Figure 2.10. Westerbork Synthesis Radio Telescope (14 dishes)


Figure 2.11. Schematic overview of a radio interferometer.

telescope that should be one to two orders of magnitude more sensitive than any radio telescope
built to date [8]. This will require millions of elements to provide the desired collecting area of
order one square kilometer.
The concept of interferometry is illustrated in Fig. 2.11. An interferometer measures the spatial
coherency of the incoming electromagnetic field. This is done by correlating the signals from the
individual receivers with each other. The correlation of each pair of receiver outputs provides
the amplitude and phase of the spatial coherence function for the baseline defined by the vector
pointing from the first to the second receiver in a pair. In radio astronomy, these correlations
are called the visibilities.
Obviously, Fig. 2.11 is directly tied to Fig. 2.8. For a wideband plane wave source s propagating in the direction ζ, the power spectral density Rs(ω) of this source is called the brightness, and denoted by I(ω, ζ). (Actually, −ζ is used: this is the unit vector pointing towards the source.) We also defined the observed cross power spectral density due to this source in (2.34) as Rs(b, ω), and this is the visibility V(ω, b). Thus, for a single source,

    V(ω, b) = I(ω, ζ) e^{−j(ω/c) ζ·b} .

For a superposition of sources, we can generalize this to

    V(ω, b) = ∫ I(ω, ζ) e^{−j(ω/c) ζ·b} dζ .     (2.36)

This relation is called the Van Cittert-Zernike theorem [5, 9]. For each ω, I(ω, ζ) is viewed as an
image (called the map), by parametrizing ζ in two coordinates or two angles, and the objective
in radio astronomy is to obtain this map.


2.5.2 Dirty map


If we could measure V(ω, b) for all possible baselines b, then we can reconstruct I(ω, ζ) from (2.36) by an inverse spatial Fourier transform,

    I(ω, ζ) = 1/(2π)³ ∫ V(ω, b) e^{j(ω/c) ζ·b} db ,     (2.37)

which is computed for each ω separately. However, in practice, we can estimate V(ω, b) for only a discrete set of baselines {bk}: every telescope pair provides one baseline, and as the Earth rotates, this baseline rotates and traces an arc in 3D space. We can obtain samples along this arc.
Thus, we cannot directly implement (2.37). Instead, we can compute an estimate

    ID(ω, ζ) = 1/(2π)³ Σ_k V(ω, bk) e^{j(ω/c) ζ·bk} ,     (2.38)

which is called the dirty map. It is not equal to the desired map I(ω, ζ). Indeed, by substituting (2.36), we find

    ID(ω, ζ) = 1/(2π)³ ∫ I(ω, n) Σ_k e^{j(ω/c) (n−ζ)·bk} dn .     (2.39)

This follows a similar derivation as we saw in Sec. 2.3.5, but now using baselines instead of direct location samples. We can write (2.39) as

    ID(ω, ζ) = 1/(2π)³ ∫ W(n − ζ) I(ω, n) dn = 1/(2π)³ W(ζ) ∗ I(ω, ζ) ,     (2.40)

where

    W(ζ) = Σ_k e^{j(ω/c) ζ·bk} .     (2.41)
Thus, the obtained dirty map is a convolution of the desired “true” map with a smoothing
function W (ζ). This W (ζ) is called the dirty beam. Since, generally, the baseline sampling is
quite irregular, the dirty beam also looks quite random.
An example of a set of antenna coordinates and the corresponding dirty beam is shown in Fig. 2.12. This is for a single low-band LOFAR station and a single 10 second integration interval and frequency bin. The dirty beam has heavy sidelobes, as high as −10 dB. To make this plot, the unit-norm direction vector ζ is parametrized as ζ = [ℓ, m, n]^T, where [ℓ, m] are plotted and n = √(1 − ℓ² − m²) is not shown.
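The dirty beam (2.41) can be mimicked with a small sketch. The antenna layout below is a hypothetical random one, not the actual LOFAR station; it merely illustrates how irregular baselines produce an irregular beam.

    import numpy as np

    rng = np.random.default_rng(1)
    c, f = 3e8, 60e6                        # speed of light, observing frequency
    ant = rng.uniform(-30, 30, (16, 2))     # hypothetical antenna positions [m]
    b = (ant[:, None, :] - ant[None, :, :]).reshape(-1, 2)  # all baselines

    l = np.linspace(-1, 1, 101)             # direction cosines (l, m)
    L, Mm = np.meshgrid(l, l)
    zeta = np.column_stack((L.ravel(), Mm.ravel()))

    # W(zeta) = sum_k exp(j*(omega/c)*zeta.b_k), cf. (2.41), normalized to peak 1
    W = np.exp(1j * (2*np.pi*f/c) * (zeta @ b.T)).sum(axis=1) / len(b)
    print(np.abs(W).reshape(L.shape).max()) # 1.0, reached at (l, m) = (0, 0)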
A resulting dirty image is shown in Fig. 2.13. The image shows the complete sky, in (ℓ, m) coordinates, where the reference direction is pointing towards zenith. The strong visible sources are Cassiopeia A and Cygnus A; also visible are the Milky Way, ending in the North Polar Spur (NPS), and, weaker, Virgo A. In the South, the Sun is visible as well. The image was obtained by averaging 25 integration intervals, each consisting of 10 s data in 25 frequency channels of 156



Figure 2.12. (a) Coordinates of the antennas in a LOFAR station, which defines the spatial
sampling function, and (b) the resulting dirty beam, plotted in dB.


Figure 2.13. Dirty image following (2.40), using LOFAR station data.


kHz wide taken from the band 45–67 MHz, avoiding the locally present radio interference. As
this shows data from a single LOFAR station, with a relatively small maximal baseline (65 m),
the resolution is limited and certainly not representative of the capabilities of the full LOFAR
array.
The dirty beam is essentially a non-ideal point spread function due to finite and non-uniform spatial sampling: we only have a limited set of baselines. The dirty beam has a main lobe centered at ζ = 0, and many side lobes. If we had a large number of telescopes positioned in a uniform rectangular grid, the dirty beam would be a 2D sinc function. The resulting beam size is inversely proportional to the aperture (diameter) of the array. This determines the resolution in the dirty image. The sidelobes of the beam give rise to confusion between sources: it is unclear whether a small peak in the image is caused by the main lobe of a weak source, or the sidelobe of a strong source. Therefore, attempts are made to design the array such that the sidelobes are low. As mentioned in Sec. 2.3.3, it is also possible to introduce weighting coefficients ("tapers") in (2.38) to obtain an acceptable beam shape.
As mentioned, an antenna array generates a set of baselines, and as the Earth rotates, these baselines also rotate and generate many such sets. In the definition of W(ζ), the effect of summing over these sets is that the sidelobes tend to get averaged out, to some extent. Many images are also formed by averaging over a small number of frequency bins (assuming the source powers Rs(ω) are constant over these frequency bins), which enters into the equations in exactly the same way.
Since W (ζ) is data-independent, it can be nearly perfectly predicted after careful calibration
of the instrument. Thus, we can try to estimate I(ω, ζ) from ID (ω, ζ) using deconvolution
techniques.
There are many issues that we ignored in the discussion, such as coordinate systems, approxi-
mation of the 3D integral in (2.40) by a 2D integral (on the assumption that either the field of
interest is small, or that the antenna array sits on a flat plane), and how the V (ω, bk ) can be
estimated from received telescope signals. Also, there are directional disturbances due to non-
isotropic antennas, unequal antenna gains, and disturbances due to atmospheric effects. Some
of these questions are covered in future chapters.

2.6 NOTES

The discussion in Sec. 2.1 summarizes the presentation in [1]. Much more can be said about
these topics, in particular for applications where the propagation speed c is position dependent
(as for geophysics exploration or underwater acoustics). A range of applications and related
wave models is found in [2].
A more extensive introduction to wavefield propagation and its role in image formation is offered
in the course EE4595 Wavefield imaging.
The text on radio astronomy signal processing in Sec. 2.5 is based on Van der Veen et al. [4]. This

paper gives a short introduction. A classical introduction from 1985 is in [2, Ch. 5]. Well-known
reference textbooks are [5, 9].

Bibliography

[1] D.H. Johnson and D.E. Dudgeon, Array signal processing: concepts and techniques. Prentice
Hall, 1993.

[2] S. Haykin, ed., Array signal processing. Prentice Hall, 1985.

[3] H.L. Van Trees, Optimum array processing: Part IV of detection, estimation, and modulation
theory. Wiley, 2004.

[4] A.J. van der Veen, S.J. Wijnholds, and A.M. Sardarabadi, “Signal processing for radio
astronomy,” in Handbook of Signal Processing Systems, 3rd ed., Springer, November 2018.
ISBN 978-3-319-91734-4.

[5] A.R. Thompson, J.M. Moran, and G.W. Swenson, Interferometry and Synthesis in Radio
Astronomy. New York: Wiley, 2nd ed., 2001.

[6] C. Lonsdale et al., “The Murchison Widefield Array: Design overview,” Proceedings of the
IEEE, vol. 97, pp. 1497–1506, Aug. 2009.

[7] M. de Vos, A.W. Gunst, and R. Nijboer, “The LOFAR telescope: System architecture and
signal processing,” Proceedings of the IEEE, vol. 97, pp. 1431–1437, Aug. 2009.

[8] P.E. Dewdney, P.J. Hall, R.T. Schilizzi, and T.J. Lazio, “The square kilometre array,” Pro-
ceedings of the IEEE, vol. 97, pp. 1482–1496, Aug. 2009.

[9] R.A. Perley, F.R. Schwab, and A.H. Bridle, Synthesis Imaging in Radio Astronomy, vol. 6
of Astronomical Society of the Pacific Conference Series. BookCrafters Inc., 1994.


Chapter 3

NARROWBAND DATA MODELS

Contents
3.1 Antenna array receiver model . . . . . . . . . . . . . . . . . . . . . . . 40

3.2 Narrowband correlation models . . . . . . . . . . . . . . . . . . . . . . 53

3.3 Application: radio astronomy . . . . . . . . . . . . . . . . . . . . . . . 58

3.4 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

In the previous chapter, we have seen the basics of wave propagation in relation to array pro-
cessing. Our goal in this book is to give an overview of the basic signal processing algorithms
that are used in this area, and that form the basis of more complicated algorithms in real appli-
cations. An important first step in the derivation of any algorithm should be a good description
of the application scenario and a statement of the basic assumptions that can be made, such
that a clear data model that captures the scenario can be stated. The model will then determine
the type of algorithm that is appropriate. Different assumptions lead to different models and to
different algorithms.

The model should be based on reality but not be overly detailed: if we want to estimate model
parameters, their number should not be too large! The purpose of this chapter and the next is
to present models for a number of prototype scenarios. Depending on the assumptions that are
made, simple models with few parameters result, or more accurate models with more parameters.
It ultimately depends on the requirements of the application which model is preferred.

In this chapter, we start the modeling by focusing on the reception of signals on an antenna
array, under narrowband conditions. If the narrowband condition is satisfied, a delay can be
translated to a phase shift: convolutions simplify to scalar products. This greatly simplifies the
modeling (and subsequent processing), and allows us to develop spatial beamforming without
caring much about time domain aspects.



Figure 3.1. Coherent adding of signals. A parabolic dish physically ensures the correct delays
for coherently adding signals that come from the same look direction. A phased
array has to electronically insert the correct delays.

3.1 ANTENNA ARRAY RECEIVER MODEL

We start by considering the reception of a signal at an antenna array consisting of multiple antennas. We only consider linear receivers at this point.

3.1.1 Introduction
Beamforming  An antenna array may be employed for several reasons. A traditional one is signal enhancement. If the same signal is received at multiple antennas and can be coherently added, then incoherent additive noise is averaged out. For example, suppose we have a signal s(t) received at M antennas x0, · · · , xM−1,

    xm(t) = s(t) + nm(t) ,    m = 0, · · · , M − 1 ,

where s(t) is the desired signal and nm(t) is noise. Let us suppose that the noise variance is E[ |nm(t)|² ] = σ². If the noise is uncorrelated from one antenna to the others, then by averaging we obtain

    y(t) = (1/M) Σ_{m=0}^{M−1} xm(t) = s(t) + (1/M) Σ_{m=0}^{M−1} nm(t) .

The variance of the noise term in y(t) is σ²/M. We thus see that there is an array gain equal to a factor M, the number of antennas.
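A quick simulation confirms the array gain (a minimal sketch; the signal is assumed to be already time-aligned at all antennas):

    import numpy as np

    rng = np.random.default_rng(2)
    M, N = 8, 100000
    s = rng.standard_normal(N)              # common desired signal
    n = rng.standard_normal((M, N))         # independent unit-variance noise
    y = (s + n).mean(axis=0)                # delay-and-sum output

    print(np.var(y - s))                    # ~1/M = 0.125: an array gain of M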
The reason that we could simply average, or add up the received signals xm(t), is that the desired signal entered coherently, with the same delay at each antenna. More generally, the desired signal is received at unequal delays and we have to introduce compensating delays to be able to coherently add them. This requires knowledge of these delays, or the direction at which the signal was received. The operation of delay-and-sum is known as beamforming, since it can be regarded as forming a beam into the direction of the source. The delay-and-sum beamformer acts like an equivalent of a parabolic dish, which physically inserts the correct delays to look in the desired direction. See Fig. 3.1.

Figure 3.2. Nulling out a signal.

Spatial filtering  A second reason to use an antenna array is to introduce a form of spatial filtering. Filtering is very familiar in the frequency domain, but it can similarly be done in the spatial domain. Spatial filtering can just be regarded as taking (often linear) combinations of the antenna outputs, and perhaps delays of them, to reach a desired spatial response.
A prime application of spatial filtering is null steering: the linear combinations are chosen such
that a signal (interferer) is completely cancelled out. Suppose a signal s(t) is received at the
first antenna directly, but at the second antenna with a delay τ , see Fig. 3.2. It is easy to see
how the received signals can be combined to produce a zero output, by inserting a proper delay
and taking the difference. However, even without a delay we can do something. By weighting
and adding the antenna outputs, we obtain a signal y(t) at the output of the beamformer,

y(t) = w0 s(t) + w1 s(t − τ )

In the frequency domain, this is

    Y(ω) = S(ω) (w0 + w1 e^{−jωτ}) .

Thus, we can make sure that the signal is cancelled, Y(ω0) = 0, at a certain frequency ω0, if we select the weights such that

    w1 = −w0 e^{jω0τ} .


Figure 3.3. (a) Narrowband beamformer (spatial filter); (b) broadband beamformer (spatial/temporal filter).

Figure 3.4. Transmitted real signal z(t) and complex baseband signal s(t).

Note that (i) if we do not delay the antenna outputs but only scale them before adding, then
we need complex weights; (ii) with an implementation using weights, we can cancel the signal
only at a specific frequency, but not at all frequencies.
Thus, for signals that consist of a single frequency, or a narrow band around a carrier frequency,
we can do null steering by means of a phased array (i.e., summing after multiplications by
complex weights). In more general situations, with broadband signals, we need a beamformer
structure consisting of weights and delays. How narrow is narrow-band depends on the maximal
delay across the antenna array, as is discussed next.
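A small numeric sketch of the phased-array null (with illustrative values for f0 and τ, not from the text): the complex weight cancels the signal exactly at ω0, but not across the band.

    import numpy as np

    f0, tau = 1e9, 0.2e-9                   # carrier 1 GHz, inter-antenna delay
    w0 = 1.0
    w1 = -w0 * np.exp(1j * 2*np.pi*f0*tau)  # null-steering weight

    f = np.array([0.9e9, 0.95e9, 1.0e9, 1.05e9, 1.1e9])
    H = w0 + w1 * np.exp(-1j * 2*np.pi*f*tau)  # response Y(w)/S(w)
    print(np.abs(H))                        # zero at f0 only, nonzero elsewhere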

3.1.2 The narrowband assumption

Let us recall the following facts. In signal processing, signals are usually represented by their lowpass equivalents, see e.g., [1]. This is a suitable representation for narrowband signals in a digital communication system. A real-valued bandpass signal with center frequency ω0 may be written as

    z(t) = real{s(t) e^{jω0t}} = x(t) cos(ω0t) − y(t) sin(ω0t) ,     (3.1)


where s(t) = x(t) + jy(t) is the complex envelope of the signal z(t), also called the baseband
signal. The real and imaginary parts, x(t) and y(t), are called the in-phase and quadrature
components of the signal z(t). In practice, they are generated by multiplying the received signal
with cos(ω0 t) and sin(ω0 t) followed by low-pass filtering. (An alternative is to apply a Hilbert
transformation.)
Suppose that the bandpass signal z(t) is delayed by a time τ. This can be written as

    zτ(t) := z(t − τ) = real{s(t − τ) e^{jω0(t−τ)}} = real{s(t − τ) e^{−jω0τ} e^{jω0t}} .

The complex envelope of the delayed signal is thus sτ(t) = s(t − τ) e^{−jω0τ}. Let B be the bandwidth of the complex envelope (the baseband signal) and let S(ω) be its Fourier transform. We then have

    s(t − τ) = 1/(2π) ∫_{−B/2}^{B/2} S(ω) e^{−jωτ} e^{jωt} dω .

If |ωτ| ≪ 2π for all frequencies |ω| ≤ B/2 we can approximate e^{−jωτ} ≈ 1 for ω within the band, and get

    s(t − τ) ≈ 1/(2π) ∫_{−B/2}^{B/2} S(ω) e^{jωt} dω = s(t) .

Thus, we have for the complex envelope sτ(t) of the delayed bandpass signal zτ(t) that

    sτ(t) ≈ s(t) e^{−jω0τ}    for Bτ ≪ 2π .

Bτ ≪ 2π is called the narrowband condition. The conclusion is that, for narrowband signals, time delays smaller than the inverse bandwidth may be represented as phase shifts of the complex envelope. This is fundamental in direction estimation using phased antenna arrays.

For propagation across an antenna array, the maximal delay depends on the maximal distance across the antenna array: the aperture. Let us work with frequencies f = ω/(2π) in Hz, and corresponding bandwidths W = B/(2π) Hz. If the wavelength is λ = c/f0 and the aperture is ∆ wavelengths, then the maximal delay is τ = ∆λ/c = ∆/f0. In this context, narrowband means

    Bτ ≪ 2π  ⇔  W ∆/f0 ≪ 1  ⇔  W ≪ f0/∆ .     (3.2)

For mobile communications, the wavelength around f0 = 1 GHz is about 30 cm. For practical purposes, ∆ is small, say ∆ < 5 wavelengths, and then narrowband means W ≪ 30 MHz. This condition is satisfied for most communication systems around 1 GHz. Bluetooth operates with channels that have a bandwidth of 1 MHz at 2.4 GHz, and the narrowband assumption is satisfied. Ultrawideband (UWB) systems in the IEEE 802.15.4 standard operate in 500 MHz bands at 3.1 GHz to 10.6 GHz, and the narrowband assumption does not hold. In low-frequency radio astronomy, we could have a center frequency at 100 MHz (wavelength 3 m), and a telescope array with a diameter of 100 km (33,000 wavelengths), so that the maximal bandwidth is in the order of 3 kHz. This is implemented by splitting the received signals into narrow subbands.
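The bound in (3.2) is easy to evaluate; note that f0/∆ = c/(aperture), so only the physical aperture matters (a minimal sketch with the radio astronomy numbers above):

    c = 3e8
    f0 = 100e6                 # center frequency [Hz]
    aperture = 100e3           # array diameter [m]
    delta = aperture / (c/f0)  # aperture in wavelengths: ~33,000
    print(f0/delta)            # 3000.0 Hz: W must stay well below f0/Delta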

Figure 3.5. A uniform linear array receiving a far field point source.

The above considerations apply to a single plane wave traveling across the array. For outdoor propagation, the situation may be different. If there is a reflection on a distant object, there may be path length differences of a few km, or ∆ = O(1000) wavelengths at 1 GHz. In this context, narrowband means W ≪ 1 MHz, and for many communication signals (e.g., GSM, UMTS), this is not really satisfied. In this case, delays of the signal cannot be represented by mere phase shifts, and we need to do broadband beamforming, i.e., space-time processing.

Usually we sample a signal at Nyquist, i.e., fs = W (assuming complex baseband samples), or Ts = 1/W. The narrowband condition translates to τmax ≪ Ts: the maximal delay for propagation across the array is less than the sampling period. This will give rise to an instantaneous data model, discussed later in Sec. 4.3.1.

3.1.3 Antenna array response


Consider an array consisting of M identical antenna elements placed along a line in space, and assume a point source is present in the far field. See Fig. 3.5. Let s0(t) be the transmitted baseband signal, which is subsequently modulated at ω0. If the distance between the array and the source is large enough in comparison to the extent of the array (i.e., the source is in the far field), the wave incident on the array is approximately planar. The angle θ to the normal is called the direction of arrival (DOA) of the plane wave. Let am(t, θ) be the response of the mth antenna element. The signal received at the mth antenna (after demodulation by ω0) is

    xm(t) = am(t, θ) ∗ s0(t − Tm) e^{−jω0Tm} ,     (3.3)

where ∗ denotes convolution and Tm is the time it takes the signal to travel from the source to the mth antenna.
A uniform array has identical elements, i.e., all antennas have the same response a(t, θ). It is


reasonable to assume separability into a(t, θ) = a0(θ)g(t), where a0(θ) is the antenna gain pattern in the direction θ, and g(t) is its temporal response. If the antennas are omnidirectional and the frequency response is flat over the band of interest, as is often assumed, we have a0(θ) = a0 and g(t) = δ(t).
Define by

    s(t) = g(t) ∗ s0(t − T0) e^{−jω0T0}

the signal received by the first antenna element, save for the array gain, and let τm = Tm − T0 be the time difference of arrivals (the geometric delays). If the τm are small compared to the inverse bandwidth of s(t), we may set sm(t) = s(t) e^{−jω0τm}, which is the signal received at time t at the mth element of the array.

Collecting the signals received by the individual elements into a vector x(t), we obtain from (3.3)

    x(t) = [s(t), sτ1(t), · · · , sτM−1(t)]^T = [1, e^{−jω0τ1}, · · · , e^{−jω0τM−1}]^T a0(θ) s(t) .

For a uniform linear array, we have the same distance d between the antenna elements, so that all delays between two consecutive array elements are the same: τm = mτ. We can also relate the time difference (or phase shift) to the angle of arrival θ:

    ω0τ = −ω0 d sin(θ)/c = −(2π/λ) d sin(θ) = −2π∆ sin(θ) ,

where ∆ = d/λ is the spacing between antenna elements measured in wavelengths (corresponding to the center frequency ω0), so that

    x(t) = [1, e^{j2π∆ sin(θ)}, · · · , e^{j2π(M−1)∆ sin(θ)}]^T a0(θ) s(t) =: a(θ) s(t) ,     (3.4)

where the array response vector a(θ) is the response of the array to a plane wave with DOA θ. The array manifold A is the curve that a(θ) describes in the M-dimensional complex vector space ℂ^M when θ is varied over the domain of interest:

    A = {a(θ) : 0 ≤ θ < 2π} .
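In code, the array response vector of a uniform linear array is one line (a minimal sketch of (3.4), omitting the direction-dependent gain a0(θ)):

    import numpy as np

    def ula_response(theta, M=7, Delta=0.5):
        # a(theta) for a uniform linear array, cf. (3.4); Delta = d/lambda
        return np.exp(1j * 2*np.pi * Delta * np.arange(M) * np.sin(theta))

    a = ula_response(np.deg2rad(30))
    print(a.shape, np.abs(a))   # (7,), all entries unit modulus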

In (3.4), the array response vector a(θ) has a very regular form, due to the uniform linear array structure. More generally, assume that in a 2D scenario we have an irregular array with


elements at positions xm . Further assume x0 = 0: this sets the phase reference at element 0.
Then, following Chap. 2, we find

    a(θ) = [1, e^{jφ1}, · · · , e^{jφM−1}]^T a0(θ) ,     (3.5)

where the phase factors are

    φm = −ω0τm = −(ω0/c) ζ·xm = −(2π/λ) ζ·xm .

Here, ζ denotes the direction vector of the incoming wave. In 2D, we have, viz. (2.4),

    ζ = −[sin(θ), cos(θ)]^T ,

so that the phase factors are

    φm = 2π (xm/λ) sin(θ) + 2π (ym/λ) cos(θ) .
This generalizes (3.4). A similar expression can be derived for propagation in 3D, leading to an
array response vector a(θ, φ) parametrized by two angle parameters.
In many algorithms, the common factor a0 (θ) does not play a role and is often omitted even in the
data model: the array is assumed to have equal response in all directions (it is “omnidirectional”)
although this assumption is typically not valid and not always necessary. Otherwise, this factor
is a direction-dependent gain. The time response g(t) is usually omitted as well, or lumped in
the receiver filter description.

3.1.4 Array manifold and parametric direction finding


Data models of the form
x(t) = a(θ)s(t)
play an important role throughout this book. Note that for varying source samples s(t), the
data vector x(t) is only scaled in length, but its direction a(θ) is constant. Thus, x(t) is confined
to a line in M -dimensional space. If we know the array manifold, i.e., the function a(θ), then we
can determine θ by intersecting the line with the curve traced by a(θ) for varying θ, or “fitting”
the best a(θ) to the direction of the x(t).
For two sources, the data model becomes a superposition,

    x(t) = a(θ1)s1(t) + a(θ2)s2(t) = [a(θ1) a(θ2)] [s1(t), s2(t)]^T ,


Figure 3.6. Direction finding means intersecting the array manifold with the line or plane spanned by the antenna output vectors.

or

    x(t) = A s(t) ,    A = A(θ1, θ2) = [a(θ1) a(θ2)] ,    s(t) = [s1(t), s2(t)]^T .

When s1(t) and s2(t) both vary with t, x(t) is confined to a plane. Direction finding now amounts to intersecting this plane with the array manifold, see Fig. 3.6.
With multipath, we obtain a linear combination of the same source via two different paths. If the relative delay between the two paths is small compared to the inverse bandwidth, it can be represented by a phase shift. Thus, the data model is

    x(t) = a(θ1)s(t) + a(θ2)βs(t) = {a(θ1) + βa(θ2)} s(t) = a s(t) .

In this case, the combined vector a is not on the array manifold and direction finding is more complicated. At any rate, x(t) contains an instantaneous multiple a of s(t). In many applications β is fluctuating relatively quickly, so that a is time varying on short time scales (the coherence time).

3.1.5 Beamforming and source separation


With two narrowband sources and multipath, we receive a linear mixture of these sources,

    x(t) = a1 s1(t) + a2 s2(t) = A s(t) .

The objective of source separation is to estimate beamformers w1 and w2 to separate and recover the individual sources:

    y1(t) = w1^H x(t) = ŝ1(t) ,    y2(t) = w2^H x(t) = ŝ2(t) ,


Figure 3.7. Spatial responses to a beamformer w = [1, · · · , 1]^T as a function of the incoming source direction θ.

or in matrix form, with W = [w1 w2],

    W^H x(t) = s(t)  ⇔  W^H A = I  ⇔  W = A(A^H A)^{−1} .

Thus, we have to obtain an estimate of the mixing matrix A and find a left inverse to separate
the sources. We assumed that A is such that AH A is invertible; this requires A to be tall: at
least as many antennas as sources.
There are several ways to estimate A. One we have seen before: if there is no multipath, then
A = [a(θ1 ) a(θ2 )]. By estimating the directions of the sources, we find estimates of θ1 and θ2 ,
and hence A becomes known and can be inverted.
In other situations, in wireless communications, we may know the values of s1(t) and s2(t) for a short time interval t = [0, T]: the data contains a "training period". We thus have a data model

    X = A S ,    X = [x(0), · · · , x(T)] ,    S = [s(0), · · · , s(T)] .

This allows us to set up a least squares problem

    min_A ‖X − A S‖²_F

with X and S known. The solution is given by

    A = X S^H (S S^H)^{−1} ,

and subsequently W = A^{−H}.
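A minimal simulation of training-based separation (illustrative dimensions and noise level; since A is tall here, we use the left inverse W = A(A^H A)^{−1} instead of A^{−H}):

    import numpy as np

    rng = np.random.default_rng(3)
    M, d, T = 6, 2, 100
    A = rng.standard_normal((M, d)) + 1j*rng.standard_normal((M, d))
    S = np.sign(rng.standard_normal((d, T)))          # known training symbols
    X = A @ S + 0.01*(rng.standard_normal((M, T))
                      + 1j*rng.standard_normal((M, T)))

    A_hat = X @ S.conj().T @ np.linalg.inv(S @ S.conj().T)  # A = X S^H (S S^H)^{-1}
    W = A_hat @ np.linalg.inv(A_hat.conj().T @ A_hat)       # left inverse: W^H A = I
    print(np.max(np.abs(W.conj().T @ X - S)))               # small residual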

3.1.6 Spatial response graphs


Let us now consider in some more detail the properties of the array response vector a(θ). For simplicity, we will look at uniform linear arrays. Suppose we have an antenna spacing of λ/2,



Figure 3.8. Grating lobes.

and select a simple beamformer of the form

    w = [1, · · · , 1]^T ,

i.e., we simply sum the outputs of the antennas. The response of the array to a unit-amplitude signal from direction θ is characterized by

    |y(t)| = |w^H a(θ)| .

Graphs of this response for M = 2, 3, 7 antennas are shown in Fig. 3.7, as a function of θ. Note that the response is maximal for a signal from the direction 0°, or broadside from the array. This is natural since a signal from this direction is summed coherently, as we have seen in the beginning of the section. The gain in this direction is equal to M, the array gain. From all other directions, the signal is not summed coherently. For some directions, the response is even zero, where the delayed signals add destructively. We saw in Chap. 2 that the number of zeros is equal to M − 1. In between the zeros, sidelobes occur. The width of the main beam is also related to the number of antennas, and in Chapter 2 we estimated it at about 180°/M. With more antennas, the beamwidth gets smaller.

Ambiguity and grating lobes  Let us now consider what happens if the antenna spacing increases beyond d = λ/2. As before, let ∆ = d/λ. We have an array response vector

    a(θ) = [1, e^{jφ}, · · · , e^{j(M−1)φ}]^T ,    φ = 2π∆ sin(θ) .


Figure 3.9. Beam steering. (a) response to w = a(30°); (b) response for scanning w = a(θ), in a scenario with two sources, well separated, and (c) separated less than a beam width.

Since sin(θ) ∈ [−1, 1], we have that 2π∆ sin(θ) ∈ [−2π∆, 2π∆]. If ∆ > 0.5, then this interval extends beyond [−π, π]. In that case, there are several values of θ that give rise to the same argument of the exponent, or to the same φ. The effect is two-fold:

• Spatial aliasing occurs: we cannot recover θ from knowledge of φ, and

• In the array response graph, grating lobes occur, see Fig. 3.8. This is because coherent addition is now possible for several values of θ.

Grating lobes prevent a unique estimation of θ. However, we can still estimate A and it does
not prevent the possibility of null steering or source separation. Sometimes, grating lobes can
be suppressed by using directional antennas rather than omnidirectional ones (e.g., parabolic
dishes): the spatial response is then multiplied with the directional response of the antenna
a0 (θ) as in (3.4), and if it is sufficiently narrow, only a single lobe is left.

Beam steering Finally, let us consider what happens when we change the beamforming vector
w. Although we are free to choose anything, let us choose a structured vector, e.g., w = a(30◦ ).
Fig. 3.9 shows the response to this beamformer. Note that now the main peak shifts to 30◦ ,
signals from this direction are coherently added. By scanning w = a(θ), we can place the peak
at any desired θ. This is called classical beamforming.
This also provides a simple way to do direction estimation. Suppose we have a single unit-norm source, arriving from broadside (0°),

    x(t) = a(0)s(t) = [1, · · · , 1]^T s(t) = 1 s(t) .


If we compute y(t) = w^H x(t) and scan w(θ) = a(θ) over all values of θ and monitor the output power of the beamformer,

    Py(θ) = E[ |y(t)|² ] = E[ |w(θ)^H x(t)|² ] = |a(θ)^H 1|² Ps ,    −π ≤ θ ≤ π ,     (3.6)

where Ps is the source power, then (except for the square) we obtain essentially the same array
graph as in Fig. 3.7 before (it is the same functional). Thus, there will be a main peak at 0◦ ,
the direction of arrival, and the beam width is related to the number of antennas. In general, if
the source is coming from direction θ0 , then the graph will have a peak at θ0 .
With two sources, x(t) = a(θ1 )s1 (t) + a(θ2 )s2 (t), the array graph will show two peaks, at θ1 and
θ2 , at least if the two sources are well separated. If the sources are close, then the two peaks
will shift, then merge and at some point we will not recognize that there are in fact two sources.
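The scan is easily simulated (a minimal sketch with two sources at 0° and 30° and an arbitrary noise level); moving the second source to within a beam width of the first, about 180°/M, makes the two peaks merge, as described above.

    import numpy as np

    rng = np.random.default_rng(4)
    M, Delta, N = 7, 0.5, 2000
    a = lambda th: np.exp(1j*2*np.pi*Delta*np.arange(M)*np.sin(th))

    th1, th2 = np.deg2rad(0), np.deg2rad(30)
    S = rng.standard_normal((2, N)) + 1j*rng.standard_normal((2, N))
    X = np.outer(a(th1), S[0]) + np.outer(a(th2), S[1])
    X += 0.1*(rng.standard_normal((M, N)) + 1j*rng.standard_normal((M, N)))

    scan = np.deg2rad(np.linspace(-90, 90, 361))
    P = [np.mean(np.abs(a(th).conj() @ X)**2) for th in scan]
    print(np.rad2deg(scan[np.argmax(P)]))   # near 0 or 30: the two main peaks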
The choice w(θ) = a(θ) is one of the simplest forms of beamforming.¹ It is data independent, and optimal only for a single source in white noise. One can show that for more than one source, the parameter estimates for the directions θi will be biased: the peaks have a tendency to move a little toward each other. Unbiased estimates are obtained only for a single source.
There are other ways of beamforming, in which the beamformer is selected depending on the
data, with higher resolution (sharper peaks) and better statistical properties in the presence
of noise. Alternatively, we may follow a parametric approach in which we pose the model
x(t) = a(θ1 )s1 (t) + a(θ2 )s2 (t) and try to compute the parameters θ1 and θ2 that best fit the
observed data, as we discussed in Section 3.1.4.
For more general arrays, the array response vector a(θ) is given by (3.5), and we would still pick w(θ) = a(θ) to steer towards θ, i.e., set

    w(θ) = [1, e^{jφ1}, · · · , e^{jφM−1}]^T .     (3.7)

Note that when we compute w(θ)H x(t), we apply complex conjugates to the entries of w, so that
φm becomes −φm . It is recognized that the resulting phases −φm are precisely those that are
needed to compensate the phase of the incoming signal at sensor m, so that we sum coherently
for signals coming from direction θ.

Beam shaping In (3.7), the amplitudes of each entry of w were all equal to 1. As a result,
all beams in Fig. 3.9 looked like Dirichlet functions. In particular, they all have quite high
sidelobes.
¹ It is a spatial matched filter, and known as Maximum Ratio Combining in communications.



Figure 3.10. General phased array beamformer (narrowband signals).

We can address this by tapering, i.e., scale each entry e^{jφm} of w(θ) by a weight w0,m. The resulting beamformer can be written as

    w = w0 ⊙ w(θ) = [w0,0, w0,1 e^{jφ1}, · · · , w0,M−1 e^{jφM−1}]^T = diag(w0,0, · · · , w0,M−1) a(θ) ,     (3.8)

where ⊙ denotes the elementwise product.

The latter way of writing (as a matrix multiplying a(θ)) will be recognized in a later chapter
when we consider more general beamformers.
The design of w0 to arrive at a desired beam shape follows the same discussion as in Sec. 2.3.3,
where we discussed weighted spatial Fourier transforms.

Relation to spatial spectra  In Chapter 2, we discussed the analysis of propagating waves using the spatial Fourier transform. In the present section, we derived beamforming quite independently, as a way to coherently sum signals coming from a direction θ. The connection between the two concepts is as follows.
Consider Fig. 3.10, and compare to Fig. 2.4. In the notation of this chapter, we have

    x(t) = a(θ)s(t) ,    am = e^{jφm} ,    φm = −k·xm ,

where s(t) is a baseband signal. In the notation of Fig. 2.4, we have

    xm(t) = s(xm, t) = s(t) e^{jφm} ,    φm = −k·xm ,

where s(t) is a narrowband passband signal with center frequency ω0. However, the modulation e^{jω0t} can be dropped from both s(t) and xm(t). This makes the expressions for xm(t) equivalent.


Next, we considered beamforming,

    y(t) = w^H x(t) ,    w = w0 ⊙ w(θ) ,

with w given in (3.8), so that the beamformer entries are wm = w0,m e^{jφm}. In Chap. 2, we defined the discrete-space Fourier transform of the xm(t) including aperture selection and weighting as

    Y(k, t) = Σ_{m=0}^{M−1} wm xm(t) e^{jk·xm} .     (3.9)

This is, in fact, equal to the beamformer output y(t), if wm in (3.9) is replaced by w0,m (note that the complex conjugate in w^H takes care of the minus sign in front of φm). Thus, the beamformer in Fig. 3.10 is recognized as computing the (weighted) spatial Fourier transform of the selected samples of s(x, t).
This is entirely equivalent to the interpretation of the periodogram in time-domain spectrum
estimation as the output of a subband filter in a filter bank, viz. [2, Ch. 8.2.1].

3.2 NARROWBAND CORRELATION MODELS

3.2.1 Data models

Let us revisit the narrowband data model, after sampling:

    xn = A sn ,    n = 0, · · · , N − 1 .

Instead of a deterministic model for the sources, we can consider a stochastic model. We will restrict ourselves to zero-mean, wide sense stationary sources, and let

    E[sn] = 0 ,    Rs = E[sn sn^H] .

Then the data satisfies

    E[xn] = 0 ,    Rx = E[xn xn^H] = A Rs A^H .

We will sometimes find it useful to vectorize this model, and work with rx = vec(Rx). Using properties of Kronecker products (see Sec. 5.1.6), we obtain

    rx = (Ā ⊗ A) rs .     (3.10)

Generally, Rs is a full d × d matrix. Two important special cases are:

• Independent sources: the source covariance is diagonal,

    Rs = Σs = diag(σ1², · · · , σd²) ,


where the source variances (powers) σi² are possibly unequal. If σs = vecdiag(Σs) is a vector containing the source powers, then (3.10) becomes

    rx = (Ā ◦ A) σs .

• Independent sources with equal variances:

    Rs = σs² I .

This leads to

    Rx = σs² A A^H ,    rx = σs² (Ā ◦ A) 1 ,

where 1 is a vector with all entries equal to 1.

Next, we augment the data model with additive noise:

    xn = A sn + nn .

Similar to the sources, the noise is considered to be zero mean and wide-sense stationary,

    E[nn] = 0 ,    Rn = E[nn nn^H] .

The noise is independent from the signals, so that

    Rx = A Rs A^H + Rn .

After vectoring, this becomes

    rx = (Ā ⊗ A) rs + rn .

Usually the noise on the various sensors is considered to be independent, so that Rn is modeled to be diagonal: Rn = Σn. Moreover, we often model the noise powers on the various antennas to be equal, i.e., spatially white noise, so that

    Rn = σn² I .

With diagonal source and noise covariance models, the vectored data model can be written as

    rx = (Ā ◦ A) σs + (I ◦ I) σn .
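The identity rx = (Ā ◦ A)σs + (I ◦ I)σn can be verified numerically. The sketch below builds the Khatri-Rao product ◦ by hand (illustrative dimensions; vec(·) stacks columns):

    import numpy as np

    rng = np.random.default_rng(5)
    M, d = 5, 2
    A = rng.standard_normal((M, d)) + 1j*rng.standard_normal((M, d))
    sig_s = np.array([2.0, 1.0])                 # source powers
    sig_n = 0.1*np.ones(M)                       # noise powers

    Rx = A @ np.diag(sig_s) @ A.conj().T + np.diag(sig_n)

    # Khatri-Rao product: column-wise Kronecker products
    KR = lambda B, C: np.stack([np.kron(B[:, k], C[:, k])
                                for k in range(B.shape[1])], axis=1)
    rx = KR(A.conj(), A) @ sig_s + KR(np.eye(M), np.eye(M)) @ sig_n
    print(np.allclose(rx, Rx.reshape(-1, order='F')))   # True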

3.2.2 Sample correlation matrices


Generally, we have a finite number of samples N , and do not have access to the true data
covariance matrix. Instead we form the estimate
−1
1 NX H
R̂x = xn xn .
N n=0


which is called the sample correlation matrix. Using a data matrix X = [x0, · · · , xN−1], we can also write it as

    R̂x = (1/N) X X^H .

If we define R̂s in a similar way then, in the noiseless case, R̂x = A R̂s A^H. With additive noise,

    R̂x = A R̂s A^H + R̂n + (cross terms) .

Since the sources and the noise are zero mean, the cross terms are zero mean, and the covariance estimate is unbiased:

    E[R̂x] = Rx = A Rs A^H + Rn .

Thus, we can consider R̂x to be equal to Rx plus a zero mean error term E due to the finite number of samples. For large N, we expect E → 0. What is the (co)variance of R̂x? Or equivalently, of E?
To answer this, we work with the vectored covariance matrices rx and r̂x. For a matrix-valued stochastic variable R̂, its covariance matrix can be defined as the covariance of r̂, i.e.,

    cov[R̂] = cov[r̂] = E[(r̂ − E[r̂])(r̂ − E[r̂])^H] .

Next, insert the definition r̂ = (1/N) Σ_n (x*n ⊗ xn), and use the fact that xi is independent of xj for i ≠ j to derive

    cov[R̂x] = E[ ((1/N) Σ_i (x*i ⊗ xi − E[x*i ⊗ xi])) ((1/N) Σ_j (x*j ⊗ xj − E[x*j ⊗ xj]))^H ]     (3.11)
             = (1/N²) Σ_i Σ_j E[ (x*i ⊗ xi − E[x*i ⊗ xi]) (x*j ⊗ xj − E[x*j ⊗ xj])^H ]
             = (1/N²) Σ_i E[ (x*i ⊗ xi − E[x*i ⊗ xi]) (x*i ⊗ xi − E[x*i ⊗ xi])^H ]
             = (1/N) ( E[(x* ⊗ x)(x* ⊗ x)^H] − E[x* ⊗ x] E[x* ⊗ x]^H )     (3.12)
             = (1/N) Cx ,

where

    Cx = E[(x*k ⊗ xk)(x*k ⊗ xk)^H] − E[x*k ⊗ xk] E[x*k ⊗ xk]^H .     (3.13)
The first term of this expression shows that the covariance of R̂x involves fourth-order correla-
tions. These can often be described in simpler terms using cumulants. A discussion of this is
deferred to Chap. 11.
For the special case where the entries xi of x are zero-mean and jointly Gaussian distributed, it
is known that (for arbitrary indices a, b, c, d = 0, · · · , M − 1)

E[xa x∗b xc x∗d ] = E[xa x∗b ]E[xc x∗d ] + E[xa x∗d ]E[x∗b xc ] + E[xa xc ]E[x∗b x∗d ] . (3.14)


This follows from an expression of the 4th order (joint) cumulant in terms of moments; for
Gaussian random variables this cumulant is zero. “Proper” (or circularly symmetric) complex
variables are such that E[xxT ] = 0. In this case, the last term vanishes.
The LHS of (3.14) represents a 4th order moment. Stacking in a matrix with row-index a + Mb and column-index c + Md, we can write this expression compactly as

    E[(x* ⊗ x)(x* ⊗ x)^H] = E[x* ⊗ x] E[x* ⊗ x]^H + E[x*x^T] ⊗ E[xx^H] + E[(x* ⊗ 1)(1 ⊗ x)^H] ⊙ E[(1 ⊗ x)(x* ⊗ 1)^H] ,

where ⊙ denotes the elementwise (Hadamard) product. For proper complex variables, the last term vanishes. For this case, if we compare to (3.13), we see that

    Cx = R*x ⊗ Rx .
Thus, for zero mean proper complex-valued Gaussian random variables,

    cov[R̂x] = (1/N) R*x ⊗ Rx ,     (3.15)

while for zero mean non-proper complex variables

    cov[R̂x] = (1/N) [ R*x ⊗ Rx + E[(x* ⊗ 1)(1 ⊗ x)^H] ⊙ E[(1 ⊗ x)(x* ⊗ 1)^H] ] ,     (3.16)

and for zero mean real-valued Gaussian random variables

    cov[R̂x] = (1/N) [ Rx ⊗ Rx + E[(x ⊗ 1)(1 ⊗ x)^T] ⊙ E[(1 ⊗ x)(x ⊗ 1)^T] ] .     (3.17)
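Result (3.15) can be checked by Monte Carlo simulation (a minimal sketch; the relative error printed at the end should be at the few-percent level for these illustrative settings):

    import numpy as np

    rng = np.random.default_rng(6)
    M, N, trials = 3, 500, 2000
    A = rng.standard_normal((M, 2)) + 1j*rng.standard_normal((M, 2))
    Rx = A @ A.conj().T + np.eye(M)
    C = np.linalg.cholesky(Rx)          # to draw proper Gaussians with cov Rx

    rhats = []
    for _ in range(trials):
        X = C @ (rng.standard_normal((M, N))
                 + 1j*rng.standard_normal((M, N))) / np.sqrt(2)
        rhats.append(((X @ X.conj().T) / N).reshape(-1, order='F'))
    cov_emp = np.cov(np.array(rhats).T)     # empirical covariance of vec(Rhat)
    cov_th = np.kron(Rx.conj(), Rx) / N     # prediction (3.15)
    print(np.abs(cov_emp - cov_th).max() / np.abs(cov_th).max())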

3.2.3 Spatial power spectrum estimation


By generalizing (3.6), a spatial power spectrum estimate is given by

    P̂(θ) = w(θ)^H R̂ w(θ) .

A classical spectrum (equal to the periodogram in temporal power spectrum estimation) is obtained by using a matched filter: w(θ) = (1/√M) a(θ). For example, for a single source in white noise, we have

    xn = a(θ0) sn + nn .

What is the expected value and the variance of this estimate? The expected value is straightforward:

    E[P̂(θ)] = w(θ)^H R w(θ) .
To derive the variance, for simplicity of notation, we consider a single w:

    var[P̂] = E[ |P̂ − E[P̂]|² ] = E[ |w^H R̂ w − w^H R w|² ]
            = E[ |w^H (R̂ − R) w|² ]
            = E[ |(w* ⊗ w)^H (r̂ − r)|² ]
            = E[ (w* ⊗ w)^H (r̂ − r)(r̂ − r)^H (w* ⊗ w) ]
            = (w* ⊗ w)^H cov[R̂] (w* ⊗ w) .


Assuming zero mean proper complex Gaussian sources, we can insert (3.15):

    var[P̂] = (1/N) (w* ⊗ w)^H (R* ⊗ R)(w* ⊗ w) = (1/N) (w^T R* w*)(w^H R w) = (1/N) |w^H R w|² = (1/N) (E[P̂])² .

In other words, the standard deviation of the spectrum estimate is 1/√N times the expected value of the spectrum estimate itself.
This is the same result as for the periodogram [2, Ch. 8]. The result is valid for any data-
independent beamformer w(θ), i.e., also if we apply tapering.

3.2.4 Variance
The variance of r̂ is a vector consisting of the diagonal entries of cov[r̂]. The variance of R̂ is defined as an unfolding of this vector into a matrix. Each entry of this matrix then shows the variance of the corresponding entry in R̂.

Thus, if D = diag(R) and d = vecdiag(R), then, for zero mean complex proper Gaussian variables,

    var[r̂] = (1/N) vecdiag(R* ⊗ R) = (1/N) d ⊗ d

and

    var[R̂] = vec^{−1}(var[r̂]) = (1/N) d d^T .

Some examples follow.

Independent noise  If xk = nk is zero mean proper symmetric Gaussian noise with covariance Σn = diag(σn) (i.e., the sensors have independent noise with variances σn,i²), then

    R = Σn ,
    cov[R̂] = (1/N) Σn ⊗ Σn ,
    var[R̂] = (1/N) σn σn^T .

Single point source  If xk = a sk, where sk is a zero mean proper complex Gaussian source with unit variance (a non-unit variance can be incorporated in a), then

    R = a a^H ,
    cov[R̂] = (1/N) a*a^T ⊗ a a^H = (1/N) (a* ⊗ a)(a* ⊗ a)^H ,
    var[R̂] = (1/N) a*a^T ⊙ a a^H .

Although ⊙ generally does not preserve the rank of matrices, it can be shown that a*a^T ⊙ a a^H is rank 1 (see Sec. 5.1.6).



Figure 3.11. The processing chain to obtain covariance data.

3.3 APPLICATION: RADIO ASTRONOMY

In Sec. 2.5, we introduced radio astronomy. Starting from basic wave propagation, we arrived at the "Van Cittert-Zernike" measured data model (2.36) of the form

    V(ω, b) = ∫ I(ω, ζ) e^{−j(ω/c) ζ·b} dζ ,     (3.18)

which describes the received cross power spectral density V(ω, b) over a baseline b, in terms of the image I(ω, ζ), i.e., the intensity in the direction ζ, and the phase delays (ω/c) ζ·b in that look direction (the geometric delays).
With the tools in the present chapter, we can rewrite this into a data matrix form. Let us first
consider the receiver model in a bit more detail.

3.3.1 Data acquisition


Mathematically, the correlation process is described as follows. Assume that there are M array
elements (telescopes). The RF signal x̃j (t) from the jth telescope is first moved to baseband
where it is denoted by xj (t), then sampled and split into narrow subbands, e.g., of 100 kHz
each, such that the narrowband condition holds: the maximal geometric delay across the array
should be fairly representable by a phase shift of the complex baseband signal.
For radio astronomy, the maximal geometric delay is related to the diameter of the array (usually
several kilometers or nowadays up to several hundreds of kilometers). The bandwidth W such
that the narrowband assumption is satisfied is then fairly small. In current systems that consist of
many antennas spread over a large area, a hierarchy is made where the first group of antennas (a
“station”) covers only a relatively small area (a few hundred meters) such that the narrowband
condition does not require very small bandwidths. The station antennas are combined via
beamforming, and the beamformed output can be regarded as the output of a steered dish.
Later processing stages then split the beamformed station signals into narrower bandwidths, as
they are combined with antenna signals from farther away.
The resulting signal is called xm(n, k), for the mth telescope (or station), nth time bin, and for
the subband frequency centered at RF frequency fk. The M signals are stacked into an M × 1
vector x(n, k).


A single correlation matrix is formed by “integrating” (summing) the crosscorrelation products
x(n, k)xᴴ(n, k) over N subsequent samples,

  R̂p,k = (1/N) Σ_{n=(p−1)N}^{pN−1} x(n, k) xᴴ(n, k) ,   (3.19)

where p is the index of the corresponding “short-term interval” (STI) over which is correlated.
The processing chain is summarized in Fig. 3.11.
The duration of an STI depends on the stationarity of the data, which is limited by factors
like Earth rotation and the diameter of the array. For the Westerbork array, a typical value for
the STI is 10 to 30 s; the total observation can last for up to 12 hours. The resulting number
of samples N in a snapshot observation is equal to the product of bandwidth and integration
time and typically ranges from 10³ (1 s, 1 kHz) to 10⁶ (10 s, 100 kHz) in radio astronomical
applications.
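
In code, the correlator of (3.19) is a short loop over STIs. The following minimal numpy sketch assumes that the baseband samples of one subband are available as an M × Ntot array x:

  import numpy as np

  def sti_covariances(x, N):
      # x: (M, Ntot) complex baseband samples of one subband; N: samples per STI.
      # Returns the STI sample covariance matrices R̂_p of (3.19), shape (P, M, M).
      M, Ntot = x.shape
      P = Ntot // N
      R = np.empty((P, M, M), dtype=complex)
      for p in range(P):
          Xp = x[:, p * N:(p + 1) * N]      # the N snapshots of the p-th STI
          R[p] = Xp @ Xp.conj().T / N       # averaged outer products
      return R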

3.3.2 Basic covariance data model

For our purposes, it is convenient to model the sky as consisting of a collection of Q spatially
discrete point sources, with sq [n, k] the signal of the qth source at time sample n and frequency
fk .
For a single source, we saw in (3.5) that the received signal at an antenna array can be expressed
as

  x[n, k] = aq[n, k] sq[n, k]

where the array response vector aq[n, k] has entries

  am = e^{jφm} ,   φm = (ω/c) ζq · xm ,

where xm is the position of the mth antenna and ζq the direction vector of the qth source. For
simplicity of notation, let us define normalized telescope position vectors,

  zm[n, k] = (2πfk/c) xm .

As the earth rotates, the antenna positions are actually functions of time. We can collect them
in a 3 × M matrix

  Z[n, k] = [z1[n, k], ⋯, zM[n, k]] .

In this notation,

  aq[n, k] = e^{j Z[n,k]ᵀ ζq} ,   (3.20)


Summing over all sources, we obtain

  x[n, k] = Σ_{q=1}^{Q} aq[n, k] sq[n, k] + n[n, k]   (3.21)

where aq [n, k] is the array response vector for the qth source, consisting of the phase multipli-
cation factors, and n[n, k] is an additive noise vector, due to thermal noise at the receiver. We
will model sq [n, k] and n[n, k] as baseband complex envelope representations of zero mean wide
sense stationary temporally white Gaussian random processes sampled at the Nyquist rate.
For convenience of notation, we will in future usually drop the dependence on the frequency fk
(index k) from the notation.
Previously, in (3.19), we defined correlation estimates R̂p as the output of the data acquisition
process, where the time index p corresponds to the pth short term integration interval (STI),
such that (p − 1)N ≤ n ≤ pN . Due to Earth rotation, the vector aq [n] changes slowly with time,
but we assume that within an STI it can be considered constant and can be represented, with
some abuse of notation, by aq [p]. In that case, x[n] is wide sense stationary over the STI, and
a single STI autocovariance is defined as

  Rp = E[ x[n] xᴴ[n] ] ,   p = ⌈n/N⌉ ,   (3.22)
where Rp has size M × M . Each element of Rp represents the interferometric correlation
along the baseline vector between the two corresponding receiving elements. It is estimated by
STI sample covariance matrices R̂p defined in (3.19), and our stationarity assumptions imply
E[R̂p ] = Rp .
If we generalize now to Q sources and add zero mean noise, uncorrelated from antenna to
antenna, as in the signal model (3.21), we obtain the covariance data model
  Rp = Ap Σs Apᴴ + Σn ,   p = 0, 1, 2, ⋯ ,   (3.23)

where

  Ap = [a1(p), ⋯, aQ(p)]
  Σs = diag[σ²s,1, ⋯, σ²s,Q]
  Σn = E[n(p) nᴴ(p)] = diag[σ²n,1, ⋯, σ²n,M] .
Here, σ²s,q = E[|sq(n, k)|²] is the variance of the qth source, Σs is the corresponding signal
covariance matrix, and Σn is the noise covariance matrix. Noise is assumed to be independent
but not evenly distributed across the array. The noise variances σ²n,j are considered unknown
until they have been calibrated. This measurement equation is actually a matrix version of
(3.18).
Under ideal circumstances, the array response matrix Ap is just a phase matrix: its columns
are given by the vectors aq (p) in (3.20), and its entries express the phase shifts due to the
geometrical delays associated with the array and source geometry. We will later generalize this
and introduce directional disturbances due to non-isotropic antennas, unequal antenna gains,
and disturbances due to atmospheric effects.
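
As an illustration, the covariance model (3.23) is easily simulated; in the sketch below the antenna positions, source directions, and powers are made-up values:

  import numpy as np
  rng = np.random.default_rng(1)
  M, Q = 8, 2
  fk, c = 100e6, 3e8
  xm = rng.uniform(0, 300, (3, M))            # antenna positions [m] (illustrative)
  Z = (2 * np.pi * fk / c) * xm               # normalized positions z_m
  zeta = rng.standard_normal((3, Q))
  zeta /= np.linalg.norm(zeta, axis=0)        # unit direction vectors ζ_q
  A = np.exp(1j * (Z.T @ zeta))               # array response matrix, cf. (3.20)
  Sigma_s = np.diag([2.0, 0.5])               # source powers σ²_s,q
  Sigma_n = np.diag(0.1 * np.ones(M))         # per-antenna noise powers σ²_n,j
  R = A @ Sigma_s @ A.conj().T + Sigma_n      # covariance data model (3.23)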


3.3.3 Image formation for the ideal data model


Ignoring the additive noise and using the ideal array response matrix Ap, the measurement
equation (3.23), in its simplest form, can be written as

  (Rp)i,j = Σ_{q=1}^{Q} I(ζq) e^{j(zi(p)−zj(p))ᵀ ζq}   (3.24)

where (Rp)i,j is the correlation between antennas i and j at STI interval p, I(ζq) = σ²q is the
brightness (power) of the source in direction ζq, zi(p) is the normalized location vector of the
ith antenna at STI p, and ζq is the unit propagation vector from the qth source.
The function I(ζ) is the brightness image (or map) of interest. For our discrete point-source
model, it is

  I(ζ) = Σ_{q=1}^{Q} σ²q δ(ζ − ζq)   (3.25)

where δ(·) is a Kronecker delta, and the direction vector ζ is mapped to the location of “pixels”
in the image (various transformations are possible). Only the pixels ζq are nonzero, and have
value equal to the source variance σ²q.
Equation (3.24) describes the relation between the visibility model and the desired image, and
it has the form of a Fourier transform; as discussed in Chap. 2.5, it is the Van Cittert-Zernike
theorem [4, 5]. Image formation is essentially the inversion of this relation. We discussed this
in Sec. 2.5. In the present setting, we have only a finite set of observations (as indexed by p). If
we apply the inverse “discrete-space” Fourier transformation to the measured correlation data,
we obtain the dirty image

  ÎD(ζ) := Σ_{i,j,p} (R̂p)ij e^{−j(zi(p)−zj(p))ᵀ ζ} .   (3.26)

In terms of the measurement data model (3.24), the “expected value” of the image is obtained
by replacing R̂p by Rp, or

  ID(ζ) := Σ_{i,j,p} (Rp)i,j e^{−j(zi(p)−zj(p))ᵀ ζ}
        = Σ_{i,j,p} Σ_q σ²q e^{−j(zi(p)−zj(p))ᵀ (ζ − ζq)}
        = Σ_q I(ζq) B(ζ − ζq)
        = I(ζ) ∗ B(ζ)   (3.27)

where the dirty beam (or point spread function) is given by

  B(ζ) := Σ_{i,j,p} e^{−j(zi(p)−zj(p))ᵀ ζ} .   (3.28)


This is the same result as in Sec. 2.5, but now for spatially sampled observations. Again, the
dirty image ID (ζ) is the desired image I(ζ) convolved with the dirty beam B(ζ): every point
source excites a beam B(ζ − ζ q ) centered at its location ζ q . Note that B(ζ) is a known function:
it only depends on the locations of the telescopes, or rather the sampled set of telescope baselines
zi (p) − zj (p).
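
The dirty image and dirty beam are easily computed for a small simulated array. The sketch below restricts directions to a single direction cosine ℓ (a one-dimensional “image”) and uses a single STI; all parameter values are illustrative:

  import numpy as np
  rng = np.random.default_rng(2)
  M = 10
  z = 2 * np.pi * rng.uniform(0, 50, M)        # normalized 1-D antenna positions
  l_q = np.array([-0.3, 0.4])                  # source direction cosines
  p_q = np.array([1.0, 0.5])                   # source powers σ²_q
  A = np.exp(1j * np.outer(z, l_q))            # array response vectors
  R = A @ np.diag(p_q) @ A.conj().T            # noise-free covariance, cf. (3.24)
  bl = (z[:, None] - z[None, :]).ravel()       # all baselines z_i - z_j
  l = np.linspace(-1, 1, 501)                  # image grid
  E = np.exp(-1j * np.outer(l, bl))            # Fourier kernel over all baselines
  I_dirty = np.real(E @ R.ravel())             # dirty image, cf. (3.26)
  B_dirty = np.real(E.sum(axis=1))             # dirty beam, cf. (3.28)
  # I_dirty shows peaks at l = -0.3 and l = 0.4, each shaped like B_dirty.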

3.4 NOTES

Section 3.3 is based on Van der Veen et al. [3].

Bibliography

[1] J.G. Proakis and M. Salehi, Communication Systems Engineering. Prentice-Hall, 1994.

[2] M.H. Hayes, Statistical digital signal processing and modeling. Wiley, 1996.

[3] A.J. van der Veen, S.J. Wijnholds, and A.M. Sardarabadi, “Signal processing for radio
astronomy,” in Handbook of Signal Processing Systems, 3rd ed., Springer, November 2018.
ISBN 978-3-319-91734-4.

[4] R.A. Perley, F.R. Schwab, and A.H. Bridle, Synthesis Imaging in Radio Astronomy, vol. 6
of Astronomical Society of the Pacific Conference Series. BookCrafters Inc., 1994.

[5] A.R. Thompson, J.M. Moran, and G.W. Swenson, Interferometry and Synthesis in Radio
Astronomy. New York: Wiley, 2nd ed., 2001.



Chapter 4

WIDEBAND DATA MODELS

Contents
4.1 Physical channel properties . . . . . . . . . . . . . . . . . . . . . . . . 63
4.2 Signal modulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.3 Deterministic data models . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.4 Frequency-domain data models . . . . . . . . . . . . . . . . . . . . . . 84
4.5 Application: radio astronomy . . . . . . . . . . . . . . . . . . . . . . . 84
4.6 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

Having covered narrowband data models, in this chapter we continue and focus on wideband
data models. These are used in the context of wireless (RF) communication systems, where
convolutions by pulse shape functions and channel propagation delays play an important role.
A data model for wireless communication consists of the following parts (see Fig. 4.1):

1. Source model: signal alphabet, data packets, and modulation by a pulse shape function;

2. Physical channel: multipath propagation over the wireless channel, based on the wave
propagation model of Chapter 2;

3. Receiver model: reception of multiple signals at multiple antennas, sampling, beamforming,
   equalization and decision making. This is about algorithms, as covered in the subsequent
   chapters.

We start by looking at models for the physical channel.

4.1 PHYSICAL CHANNEL PROPERTIES

Wide-area multipath propagation model  We consider in this section a wireless communication
setting, i.e., the propagation of a signal from a transmitter (“mobile”) through a medium
(“channel”) to a receiver (“base station”). This allows us to present concepts that are to some
extent also relevant in other contexts, such as microphone arrays, radar, GPS receivers, etc.

[Figure 4.1. Wireless communication scenario: the source symbols [s1]k, ⋯, [sd]k are modulated by a pulse shape g(t), received at the antennas x1(t), ⋯, xM(t), and combined by a space-time equalizer W into the estimates [ŝ1]k, ⋯, [ŝd]k.]
The propagation of signals through the wireless channel is fairly complicated to model. A correct
treatment would require a complete description of the physical environment, and would not be
very useful for the design of signal processing algorithms. To arrive at a more useful parametric
model, we have to make simplifying assumptions regarding the wave propagation. Provided this
model is reasonably valid, we can, in a second stage, try to derive statistical models for the
parameters to obtain reasonable agreement with measurements.
The number of parameters in an accurate model can be quite high, and from a signal processing
point of view, they might not be very well identifiable. For this reason, another model used in
signal processing is a much less sophisticated unparametrized model. The radio channel is simply
modeled as an FIR (finite impulse response) filter, whose main parameters are the impulse response
length (in symbols) and the total attenuation or signal-to-noise ratio (SNR). This model is
described in Section 4.3. The parametrized model is a special case, giving structure to the FIR
coefficients.

Jakes’ model A commonly used parametric model is a multiray scattering model, also known
as Jakes’ model (after Jakes [1], see also [2–6]). In this model, the signal follows on its way from
the source to the receiver a number of distinct paths, referred to as multipaths. These arise from
scattering, reflection, or diffraction of the radiated energy on objects that lie in the environment.
The received signal from each path is much weaker than the transmitted signal due to various
scattering and fading effects. Multipath propagation also results in the spreading of the signal
in various dimensions: delay spread in time, Doppler spread in frequency, and angle spread in
space. Each of them has a significant effect on the signal. The mean path loss, shadowing, fast
fading, delay, Doppler spread and angle spread are the main channel characteristics and form
the parameters of the multiray model.
The scattering of the signal in the environment can be separated into three stages: scattering
local to the source at surrounding objects, reflections on distant objects of the few dominant
rays that emerge out of the local clutter, and scattering local to the receiver. See Fig. 4.2.

[Figure 4.2. Multipath propagation model in a wireless communication setting: the source signal s(t), after pulse shaping by g(t), reaches the antenna outputs x0(t), ⋯, xM−1(t) via paths i with parameters (αi, βi, τi).]

Scatterers local to the mobile Scattering local to the mobile is caused by buildings and other
objects in the direct vicinity of the mobile (at, say, a few tens of meters). Motion of the mobile
and local scattering give rise to Doppler spread which causes “time-selective fading”: the signal
power can have significant fluctuations over time. While local scattering contributes to Doppler
spread, the delay spread will usually be insignificant because of the small scattering radius.
Likewise, the angle spread will also be small.

Remote scatterers Away from the cluster of local scatterers, the emerging wavefronts may then
travel directly to the base or may be scattered toward the base by remote dominant scatterers,
giving rise to specular multipath. These remote scatterers can be either terrain features (distant
hills) or high rise building complexes. Remote scattering can cause significant delay and angle
spreads.

Scatterers local to the base Once these multiple wavefronts reach the base station, they may
be scattered further by local structures such as buildings or other structures that are in the
vicinity of the base. Such scattering will be more pronounced for low elevation and below-roof-
top antennas. The scattering local to the base can cause significant angle spread which can cause
space-selective fading: different antennas at the base station can receive totally different signal
powers. This fading is time invariant, unlike the time varying space-selective fading caused by
remote scattering.

Doppler spread and time selective fading  If mobiles or scatterers are moving, then the phases
of each multipath component are quickly changing relative to each other, hence they add up
differently over time. As mentioned, this causes (fast) fluctuations in the received signal power
over that ray (time-selective fading).
Movement also results in a Doppler spread, i.e., a pure CW tone is spread over a non-zero
spectral bandwidth. If a source moves with a velocity of v m/s towards the receiver, then its
observed frequency is increased by fm = v/λ [Hz], or ωm = 2πv/λ. Likewise, if the source moves
away from the receiver, its observed frequency is reduced by fm. If it moves sideways, there is
no shift in frequency.
If there is a ring of scatterers around the mobile, then seen via some reflectors, the mobile
may seem to move away, while via other reflectors, it seems to approach. Thus, we obtain a
distribution of Doppler shifts. If one assumes uniformly distributed scatterers, then the baseband
power spectrum of the vertical electrical field component of the channel is convolved with [1, ch.1]

  S(ω) = (3/ωm) [1 − (ω/ωm)²]^{−1/2} ,   |ω| < ωm   (4.1)

The Doppler spectrum described by (4.1) is often called the classical spectrum. For a mobile
traveling at 100 kph, the Doppler spread is approximately fm = 175 Hz in the 1900 MHz band.
Because a convolution in the frequency domain translates into a pointwise multiplication in the time
domain, and the function is non-flat in this case, Doppler spread causes time-selective fading. It is usually
characterized by the coherence time of the channel [1], i.e., the time lag over which the Doppler
time function has an autocorrelation larger than 0.5. The larger the Doppler spread, the smaller
the coherence time. The coherence time is in the order of 1/ωm , i.e., approximately 0.9 ms for
fm = 175 Hz. In comparison, the burst length in a single GSM data package is 0.577 ms, so that
the GSM channel can be regarded almost time-invariant during the burst, but not in between
two bursts.

Signal processing model  Let us ignore the local scattering for the moment, and assume that
there are r rays bouncing off remote objects such as hills or tall buildings. As an extension of the
narrowband model (3.4), the received parametric signal model is then usually written as the
convolution

  x(t) = h(t) ∗ s(t) ,   h(t) = Σ_{i=1}^{r} a(θi) βi g(t − τi) ,   (4.2)

where x(t) is a vector consisting of the M antenna outputs, a(θ) is the array response vector,
and the impulse response g(t) collects all temporal aspects, such as pulse shaping and transmit
and receive filtering. The model parameters of each ray are its (mean) angle-of-incidence θi ,
(mean) path delay τi , and path loss βi . The latter parameter lumps the overall attenuation, all
phase shifts, and possibly the antenna response a0 (θ) as well.
Each of the rays is itself composed of a large number of “mini-rays” due to scattering close to the
source: all with roughly equal angles and delays, but arbitrary phases. This can be described by
extending the model with additional parameters such as the standard deviations from the mean


Table 4.1. Typical delay, angle and Doppler spreads in cellular applications.
Environment delay spread angle spread Doppler spread
Flat rural (macro) 0.5 µs 1◦ 190 Hz
Urban (macro) 5 µs 20◦ 120 Hz
Hilly (macro) 20 µs 30◦ 190 Hz
Mall (micro) 0.3 µs 120◦ 10 Hz
Indoors (pico) 0.1 µs 360◦ 5 Hz

angle θi and mean delay τi , which depend on the radius (aspect ratio) of the scattering region
and its distance to the remote scattering object [7, 8]. For macroscopic models, the standard
deviations are generally small (less than a few degrees, and a fraction of τi ) and are usually but
not always ignored.

The local scattering however has a major effect on the statistics and stationarity of βi . For
example, if all local rays have equal amplitude, then βi is the sum of a large number of arbi-
trary complex numbers, each with equal modulus but random phase, which gives βi a complex
Gaussian distribution. Consequently, its amplitude has a Rayleigh distribution (hence the name
Rayleigh fading). More in general, if there is a strong path with some scattering around it that
causes fluctuation, then often a Rice distribution or log-normal distribution is assumed.

A second effect is that βi = βi (t) is really (slowly) time-varying: if the source is in motion, then
the Doppler shifts and/or the varying location change the phase differences among the rays, so
that the sum can be totally different from one time instant to the next. The maximal Doppler
shift fD is given by the speed of the source (in m/s) divided by the wavelength of the carrier. The
coherence time of the channel is inversely proportional to fD , roughly by a factor of 0.2: βi (t)
can be considered approximately constant for time intervals smaller than this time [4, 9, 10].
Angles and delays are generally assumed to be stationary over much longer periods.

A proper discussion should now present statistical models for θi , βi , and τi . Since this is not the
focus of the book, we omit further details.

Typical channel parameters  Angle spread, delay spread, and Doppler spread are important
characterizations of a mobile channel, as they determine the amount of equalization that is re-
quired, but also the amount of diversity that can be obtained. Measurements in macrocells
indicate that up to 6 to 12 dominant paths may be present. Typical channel delay and Doppler
spreads (1800 MHz) are given in Table 4.1 [4, 9] (see also references in [5]). Typical angle spreads
are not well known; the given values are suggested by [6].


[Figure 4.3. Modulation process: the bit sequence sk ∈ {0, 1} is coded into s̃k, convolved with the pulse shape g(t) (or q(t) for phase modulation) to form the baseband signal u(t), and modulated onto the carrier e^{jω0 t} to produce the real RF signal z(t).]

4.2 SIGNAL MODULATION

Before a digital bit sequence can be transmitted over a radio channel, it has to be prepared:
among other things, it has to be transformed into an analog signal in continuous time and
modulated onto a carrier frequency. The various steps are shown in Fig. 4.3. The coding step,
in its simplest form, translates the binary sequence {sk } ∈ {0, 1} into a sequence {s̃k } with
another alphabet, such as {−1, +1}. A digital filter may be part of the coder as well. In linear
modulation schemes, the resulting sequence is then convolved with a pulse shape function g(t),
whereas in phase modulation, it is convolved with some other pulse shape function q(t) to yield
the phase of the modulated signal. The resulting baseband signal u(t) is modulated by the
carrier frequency ω0 to produce the RF signal that will be broadcast.

In this section, a few examples of coding alphabets and pulse shape functions are presented, for
future reference. We do not go into the properties and reasons why certain modulation schemes
are chosen; see e.g. [11] for more details.

4.2.1 Time-domain modulations

Digital alphabets The first step in the modulation process is the coding of the binary sequence
{sk } into some other sequence {s̃k }. The {s̃k } are chosen from an alphabet or constellation, which
might be real or complex. There are many possibilities; common examples are BPSK (binary
phase shift keying), QPSK (quadrature phase shift keying), PAM-m (pulse amplitude modu-
lation), QAM-m (quadrature amplitude modulation), MSK (minimum-shift keying), DQPSK
(differential QPSK), defined as in table 4.2. See also Fig. 4.4. Smaller constellations are more
robust in the presence of noise, because of the larger distance between the symbols. Larger
constellations may lead to higher bitrates, but are harder to detect in noise.

It is possible that the data rate of the output of the coder is different from the input data
rate. E.g., if a binary sequence is coded into QPSK, the data rate halves. (The opposite is also
possible, e.g., in CDMA systems, where each bit is coded into a sequence of 31 or more “chips”.)


Table 4.2. Common digital constellations

                 s̃k chosen from:
  BPSK           {1, −1}
  PAM-m          {−m, ⋯, −1, 1, ⋯, m}
  QPSK (QAM-4)   {1, −1, j, −j}
  MSK            {1, −1} for k even;  {j, −j} for k odd
  DQPSK          {1, −1, j, −j} for k even;  {e^{jπ/4}, e^{j3π/4}, e^{−j3π/4}, e^{−jπ/4}} for k odd

[Figure 4.4. Digital constellations: (a) BPSK, (b) QPSK (QAM-4), (c) MSK, (d) DQPSK.]


[Figure 4.5. (a) Family of raised-cosine pulse shape functions for α = 0, 0.25, 0.5, 0.75, 1, as a function of time [T]; (b) the corresponding spectra as a function of frequency f.]

Pulse shape functions  The coded digital signal s̃(t) can be described as a sequence of Dirac
pulses,

  s̃(t) = Σ_{k=−∞}^{∞} s̃k δ(t − k) ,

where, for convenience, the symbol period is normalized to T = 1. In linear modulation schemes,
the digital Dirac-pulse sequence is convolved by a pulse shape function g(t):

  u(t) = g(t) ∗ s̃(t) = Σ_{k=−∞}^{∞} s̃k g(t − k) .   (4.3)

Again, there are many possibilities. The optimum wave form is one that is both localized in time
(to lie within a pulse period of length T = 1) and in frequency (to satisfy the Nyquist criterion
when sampled at a rate 1/T = 1). This is of course impossible, but good approximations exist.
A pulse with perfect frequency localization is the sinc-pulse, defined by

  g(t) = sin(πt)/(πt) ,   G(f) = { 1, |f| < 1/2 ;  0, otherwise }   (4.4)

However, the pulse has very long tails in the time-domain.

Raised cosine pulseshape  A modification of this pulse leads to the family of raised-cosine
pulseshapes, with better localization properties. They are defined, for α ≤ 1, by [11, ch.6]

  g(t) = (sin πt)/(πt) · (cos απt)/(1 − 4α²t²)

with corresponding spectrum

         ⎧ 1 ,                                  |f| < (1 − α)/2
  G(f) = ⎨ 1/2 − (1/2) sin((π/α)(|f| − 1/2)) ,  (1 − α)/2 < |f| < (1 + α)/2
         ⎩ 0 ,                                  otherwise
The spectrum is limited to |f| ≤ (1 + α)/2, so that α represents the excess bandwidth. For
α = 0, the pulse is identical to the sinc pulse (4.4). For other values of α, the amplitude decays
more smoothly in frequency, so it is also known as the rolloff factor. The shape of the rolloff
is that of a cosine, hence the name. In the time domain, the pulses are still infinite in extent.
However, as α increases, the size of the tails diminishes. A common choice is α = 0.35, and to
truncate g(t) outside the interval [−3, 3].
The raised-cosine pulses are designed such that, when sampled at integer time instants, the
only nonzero sample occurs at t = 0. Thus, u(k) = s̃k , and to recover {s̃k } from u(t) is simple,
provided we are synchronized: any fractional delay 0 < τ < 1 results in intersymbol interference.
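
This zero-ISI property is easy to check numerically. Below is a minimal sketch of the raised-cosine pulse (assuming numpy; note that np.sinc(t) = sin(πt)/(πt)), including the removable singularity at |t| = 1/(2α):

  import numpy as np

  def raised_cosine(t, alpha):
      # g(t) = sinc(t) cos(π α t) / (1 - 4 α² t²), symbol period T = 1
      t = np.asarray(t, dtype=float)
      denom = 1.0 - (2.0 * alpha * t) ** 2
      sing = np.isclose(denom, 0.0)                 # points |t| = 1/(2 alpha)
      g = np.sinc(t) * np.cos(np.pi * alpha * t) / np.where(sing, 1.0, denom)
      return np.where(sing, np.sinc(t) * np.pi / 4, g)   # limit value at singularity

  print(raised_cosine(np.arange(-3, 4), 0.35).round(6))
  # -> [0. 0. 0. 1. 0. 0. 0.]: no intersymbol interference at integer sampling instants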

Phase modulation  Many other modulation formats exist; in particular, phase modulations are
often used. An example is GMSK as used in the GSM system. For signal processing purposes,
these are very often hard to handle. In some cases, these nonlinear modulations can be well
approximated by linear modulations (e.g., GMSK), in other cases, we simply use some general
properties of the resulting signal. E.g., several modulation formats are based on frequency or
phase modulation and satisfy a constant-modulus property (|s(t)| = 1).

4.2.2 Spread spectrum signalling


In Code Division Multiple Access (CDMA) systems, instead of directly modulating a symbol
sequence {sk } with a pulse shape function g(t), the symbols are first spread with a user-specific
code vector c. The code vector consists of G symbols cn called chips. Usually cn ∈ {0, 1} or
{−1, 1}. The coded sequence is
      ⎡ s1 c ⎤
  s̃ = ⎢ s2 c ⎥ = s ⊗ c ,   (4.5)
      ⎣  ⋮   ⎦
which is subsequently modulated by g(t) in the usual way, cf. (4.3). Here, ‘⊗’ denotes a Kronecker
product, which for vectors is defined as indicated. (Properties of Kronecker products are found
in Sec. 5.1.6.) If the original sequence has a symbol duration T = 1, then each code chip has
a duration T /G, and g(t) is scaled accordingly. Thus, in frequency domain the pulse G(f )
occupies G times more bandwidth: this is called spread spectrum modulation. We can also
view the combination of code c and pulse g(t) as a new coded pulse g̃(t) that now has a more
complicated, user-specific form,
  g̃(t) = Σ_{i=0}^{G−1} ci g(t − i/G) .

Typical values of G are 31 to 1024. The codes are used to distinguish individual users. Instead
of giving each user a dedicated time slot or frequency subband, they get a specific user code.
This permits us to separate a superposition of multiple users at a (basestation) receiver. An
advantage is that more than G users can be active simultaneously. A disadvantage is that, due


[Figure 4.6. Spatial beamformer with an I-MIMO channel: the antenna signals x0(t), ⋯, xM−1(t) are combined by a beamformer W to produce the source estimates [ŝ1]k, ⋯, [ŝd]k.]

to the shorter chip duration, the convolution by the channel impulse response has more impact,
and equalization is more complicated.
In practical systems like the 3rd generation (3G) mobile system UMTS, the codes used are
non-periodic: they differ from symbol to symbol. This requires a simple extension of (4.5) to
use symbol-specific codes ck :
      ⎡ s1 c1 ⎤
  s̃ = ⎢ s2 c2 ⎥ .
      ⎣   ⋮   ⎦
The factorization using Kronecker products is now not possible.


Spread spectrum is also used in GPS, with quite long codes that are different for each satellite.
E.g., for the C/A code, G = 1023 · 20 = 20 460, while the symbol rate is at 50 bits/s. Long code
lengths allow this system to operate at very low received powers.
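
A minimal sketch of spreading and despreading with a periodic code (the symbols and the code are arbitrary illustration values):

  import numpy as np
  s = np.array([1, -1, -1, 1])             # symbol sequence
  c = np.array([1, -1, 1, 1, -1])          # user code with G = 5 chips (illustrative)
  s_spread = np.kron(s, c)                 # s̃ = s ⊗ c, cf. (4.5): chip rate is G per symbol
  G = len(c)
  s_hat = s_spread.reshape(-1, G) @ c / (c @ c)   # despread: correlate with the code
  print(s_hat)                             # [ 1. -1. -1.  1.]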

4.3 DETERMINISTIC DATA MODELS

In Sec. 4.1, we have presented a channel model based on physical properties of the radio channel.
Though useful for generating simulated data, it is not always a suitable model for identification
purposes, e.g., if the number of parameters is large, if the angle spreads within a cluster are large
so that parametrization in terms of directions is not possible, or if there is a large and fuzzy
delay spread. In these situations, it is more appropriate to work with an unstructured model,
where the channel impulse responses are posed simply as arbitrary multichannel finite impulse
response (FIR) filters. It is a generalization of the physical channel model considered earlier, in
the sense that at a later stage we can still specify the structure of the coefficients.
In this section, we look at deterministic data models, i.e., no stochastic considerations are used.
In this case, the sampled data is directly placed in a matrix X which is subsequently analyzed.


4.3.1 I-MIMO model


Assume that d source signals s1 (t), · · · , sd (t) are transmitted from d independent sources at
different locations. If the delay spread is small, then what we receive at the antenna array will
be a simple linear combination of these signals:

x(t) = a1 s1 (t) + · · · + ad sd (t)

where as before x(t) is a stack of the outputs of the M antennas. We will usually write this in
matrix form:

  x(t) = A s(t) ,   A = [a1 ⋯ ad] ,   s(t) = [s1(t), ⋯, sd(t)]ᵀ .
Suppose we sample with a period T, normalized to T = 1, and collect a batch of N samples into
a matrix X, then

  X = AS

where X = [x(0), ⋯, x(N − 1)] and S = [s(0), ⋯, s(N − 1)]. The resulting model X = AS
is called an instantaneous multi-input multi-output model, or I-MIMO for short. It is a generic
linear model for source separation, valid when the delay spread of the dominant rays is much
smaller than the inverse bandwidth of the signals, e.g., for narrowband signals, in line-of-sight
situations or in scenarios where there is only local scattering. Even though this appears to limit
its applicability, it is important to study it in its own right, since more complicated convolutive
models can often be reduced (after equalization or separation into sufficiently narrow subbands)
to X = AS.
The objective of beamforming for source separation is to construct a left-inverse Wᴴ of A, such
that WᴴA = I and hence WᴴX = S: see Fig. 4.6. This will recover the source signals from the
observed mixture. It immediately follows that in this scenario it is necessary to have d ≤ M to
ensure interference-free reception, i.e., not more sources than sensors. If we know already (part
of) S, e.g., because of training, then we can estimate W via Wᴴ = SX† = SXᴴ(XXᴴ)⁻¹, where
X† denotes the Moore-Penrose pseudo-inverse of X, here equal to its right inverse (see Chapter
5). With noise, other beamformers may be better.
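
As an illustration, the following noise-free sketch (random mixing matrix, BPSK training symbols, all values arbitrary) estimates the beamformer from known S using the pseudo-inverse:

  import numpy as np
  rng = np.random.default_rng(3)
  M, d, N = 6, 2, 100
  A = rng.standard_normal((M, d)) + 1j * rng.standard_normal((M, d))  # unknown mixing
  S = np.sign(rng.standard_normal((d, N)))     # known BPSK training symbols
  X = A @ S                                    # noise-free I-MIMO data, X = AS
  W_H = S @ np.linalg.pinv(X)                  # W^H = S X† (Moore-Penrose pseudo-inverse)
  print(np.allclose(W_H @ X, S))               # True: the sources are recovered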

Coherent multipath If we adopt the multipath propagation model, then A is endowed with a
parametric structure: every column ai is a sum of direction vectors a(θij ), with different fadings
βij . If the ith source is received through ri rays, then

  ai = Σ_{j=1}^{ri} a(θij) βij = [a(θi1), ⋯, a(θi,ri)] [βi1, ⋯, βi,ri]ᵀ   (i = 1, ⋯, d) .

If each source has only a single ray to the receiver array (a line-of-sight situation), then each ai
is a vector on the array manifold, and identification will be relatively straightforward. The more
general case amounts to decomposing a given a-vector into a sum of vectors on the manifold,
which makes identification much harder.
To summarize the parametric structure in a compact way, we could collect all a(θij )-vectors and
path attenuation coefficients βij of all rays of all sources in single matrices Aθ and B,

Aθ = [a(θ11 ), · · · , a(θd,rd )] , B = diag[β11 , · · · , βd,rd ] .

To sum the rays belonging to each source into the single ai-vector of that source, we define a
selection matrix

      ⎡ 1r1        0  ⎤
  J = ⎢     ⋱         ⎥  :  r × d   (4.6)
      ⎣ 0        1rd  ⎦

where r = Σ_{i=1}^{d} ri and 1m denotes an m × 1 vector consisting of 1’s. Together, this allows
us to write the full (noise-free) I-MIMO data model as

  X = AS ,   A = Aθ B J .   (4.7)

4.3.2 Convolutive model for one antenna and one source

To extend the instantaneous model to a situation with convolutive channels, let h[k] be a finite
impulse response (FIR) filter. The matrix equation corresponding to a convolution

  x[n] = h[n] ∗ s[n] = Σ_{k=0}^{L−1} h[k] s[n − k]

is

             ⎡ x[0]   ⎤   ⎡ h[0]                     ⎤
             ⎢ x[1]   ⎥   ⎢ h[1]    h[0]             ⎥  ⎡ s[0]    ⎤
             ⎢ x[2]   ⎥   ⎢ h[2]    h[1]    ⋱        ⎥  ⎢ s[1]    ⎥
  x = Hs  ⇔  ⎢  ⋮     ⎥ = ⎢  ⋮      h[2]    ⋱  h[0]  ⎥  ⎢  ⋮      ⎥   (4.8)
             ⎢  ⋮     ⎥   ⎢ h[L−1]   ⋮      ⋱  h[1]  ⎥  ⎣ s[Ns−1] ⎦
             ⎣ x[N−1] ⎦   ⎢         h[L−1]  ⋱  h[2]  ⎥
                          ⎢                 ⋱   ⋮    ⎥
                          ⎣ 0              h[L−1]    ⎦
where L is the channel length, Ns is the length
of the input sequence (prior and subsequent symbols are supposed to be zero; this is usually
achieved by a guard interval), and N = Ns + L − 1 is the length of the observation (ignoring
the other samples). Note that H has size (Ns + L − 1) × Ns, so H is always tall. If there is no
guard interval, we have to drop the first L − 1 samples of x since they are “contaminated” by
prior symbols, and the top part of H has to be dropped accordingly. Likewise, we will probably
have to drop the last L − 1 rows of H as well, if we have to assume that subsequent symbols
s[Ns], s[Ns + 1], ⋯ are nonzero and unknown. This will reduce the size of H to (Ns − L + 1) × Ns,
and it is not tall anymore.
H has a Toeplitz structure: it is constant along diagonals. That structure always appears when
we have time-invariant systems.
Suppose we observe x and know the channel matrix H, and it is tall. The input sequence can be
estimated by taking a left inverse H† of H, such that H†H = I. Since H is tall, we can usually
take

  H† = (HᴴH)⁻¹ Hᴴ

where, for now, we assume that HᴴH is invertible. This results in

  ŝ = H† x = (HᴴH)⁻¹ Hᴴ x .   (4.9)

This is a block receiver: all entries of s are estimated simultaneously. If Ns is large, this is not
very efficient.
Due to the commutativity of the convolution, we can also write x[n] = s[n] ∗ h[n], and hence

             ⎡ x[0]      ⎤   ⎡ s[0]                            ⎤
             ⎢ x[1]      ⎥   ⎢ s[1]     s[0]                   ⎥
             ⎢  ⋮        ⎥   ⎢  ⋮        ⋱       ⋱             ⎥  ⎡ h[0]   ⎤
             ⎢ x[L−1]    ⎥   ⎢ s[L−1]   s[L−2]   ⋯   s[0]      ⎥  ⎢ h[1]   ⎥
  x = Sh  ⇔  ⎢  ⋮        ⎥ = ⎢  ⋮        ⋮           ⋮         ⎥  ⎢  ⋮     ⎥   (4.10)
             ⎢ x[Ns−1]   ⎥   ⎢ s[Ns−1]  s[Ns−2]  ⋯   s[Ns−L]   ⎥  ⎣ h[L−1] ⎦
             ⎢ x[Ns]     ⎥   ⎢          s[Ns−1]  ⋯   s[Ns−L+1] ⎥
             ⎢  ⋮        ⎥   ⎢                   ⋱    ⋮        ⎥
             ⎣ x[Ns+L−2] ⎦   ⎣ 0                     s[Ns−1]   ⎦
Now S has a Toeplitz structure; it has size (Ns + L − 1) × L. This expression can be used to
estimate the channel coefficients in case we know the transmitted symbols (e.g., due to a training
period), i.e., ĥ = S†x, where S† = (SᴴS)⁻¹Sᴴ. Note that S is tall: we need Ns ≥ 1.
The “0” blocks in S should be replaced by symbols in case the transmitter is not silent be-
fore/after the transmission of the training symbols (i.e., if there is no guard interval). Often
these are unknown. To estimate h we should omit all rows in S that contain unknown entries of
s[n] (and also drop the corresponding entries in x). This results in the model x0 = S0 h, where x0
and S0 consist of the rows of x and S corresponding to x[L − 1], ⋯, x[Ns − 1], i.e., the rows of S
in which all symbols are known. S0 has size (Ns − L + 1) × L,
and more samples are needed to make it have a left inverse: Ns ≥ 2L − 1.
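
A sketch of this training-based channel estimate, using scipy to build the Toeplitz matrix S of (4.10) (guard intervals are assumed, so the “0” blocks are valid; all parameter values are arbitrary):

  import numpy as np
  from scipy.linalg import toeplitz
  rng = np.random.default_rng(4)
  L, Ns = 4, 20
  h = rng.standard_normal(L) + 1j * rng.standard_normal(L)   # unknown FIR channel
  s = np.sign(rng.standard_normal(Ns))                       # known training symbols
  col = np.r_[s, np.zeros(L - 1)]                            # first column of S
  row = np.r_[s[0], np.zeros(L - 1)]                         # first row of S
  S = toeplitz(col, row).astype(complex)                     # (Ns+L-1) x L, cf. (4.10)
  x = S @ h                                                  # received data, x = S h
  h_hat = np.linalg.lstsq(S, x, rcond=None)[0]               # ĥ = S† x
  print(np.allclose(h_hat, h))                               # True in the noise-free case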


4.3.3 Oversampling
Since in (4.9) we aim to invert H, we would like it to be tall. If it is not tall (e.g., due to lack
of a guard interval), we can sometimes make it taller by considering oversampling. In this
context, oversampling means sampling faster than the symbol rate. Although it does not make
sense to sample much faster than the Nyquist rate, often the Nyquist rate is higher than the
symbol rate. E.g., in Fig. 4.5, we saw examples of the raised cosine pulse shape which is more
compact in time than a sinc pulse, but therefore has excess bandwidth in frequency (controlled
by the parameter α).
In the case of linear modulation, we can define

  s(t) = Σ_{k=−∞}^{∞} sk δ(t − kT)   (4.11)

and define the modulation by a convolution with the pulse shape g(t). For convenience, we
normalize the symbol period to T = 1. Then the modulated signal is

  u(t) = s(t) ∗ g(t) = Σ_k g(t − k) sk .

As before, let x(t) be the baseband received signal. The impulse response of the channel from the
source to the receiver, h(t), is a convolution of the pulse shaping filter g(t) and the actual channel
response from u(t) to x(t). We can include any propagation delays and unknown synchronization
delays in h(t) as well. The data model is written compactly as the convolution x(t) = h(t) ∗ s(t).
Inserting (4.11) gives (with T = 1)

  x(t) = ∫ h(t − t′) Σ_k sk δ(t′ − k) dt′ = Σ_k sk h(t − k) .   (4.12)

This appears as a discrete-time convolution, even if x(t) and h(t) are continuous-time. An im-
mediate consequence of the FIR assumption is that, at any given moment, at most L consecutive
symbols play a role in x(t). Indeed, for t = n + τ, where n ∈ ℤ and 0 ≤ τ < 1, the convolution
(4.12) can be written as

  x(n + τ) = Σ_{k=0}^{L−1} h(k + τ) s_{n−k} .   (4.13)

Suppose that we sample x(t) at a rate of P times the symbol rate.¹ Then (4.13) shows that for
all samples that fall between times n and n + 1, the same L symbols play a role. If we define

         ⎡ x(n)           ⎤          ⎡ h(k)           ⎤
  x[n] = ⎢ x(n + 1/P)     ⎥ , h[k] = ⎢ h(k + 1/P)     ⎥   (4.14)
         ⎢  ⋮             ⎥          ⎢  ⋮             ⎥
         ⎣ x(n + (P−1)/P) ⎦          ⎣ h(k + (P−1)/P) ⎦

¹ For the raised-cosine pulses, we would select P = 2.


then we can write (4.13) as

  x[n] = h[n] ∗ s[n] = Σ_{k=0}^{L−1} h[k] s_{n−k} .   (4.15)

This is the same as we had before, but now using sample vectors consisting of the P samples
that fall within one sample period. Thus, (4.8) becomes

 
             ⎡ x[0]   ⎤   ⎡ h[0]                     ⎤
             ⎢ x[1]   ⎥   ⎢ h[1]    h[0]             ⎥  ⎡ s[0]    ⎤
             ⎢ x[2]   ⎥   ⎢ h[2]    h[1]    ⋱        ⎥  ⎢ s[1]    ⎥
  x = Hs  ⇔  ⎢  ⋮     ⎥ = ⎢  ⋮      h[2]    ⋱  h[0]  ⎥  ⎢  ⋮      ⎥   (4.16)
             ⎢  ⋮     ⎥   ⎢ h[L−1]   ⋮      ⋱  h[1]  ⎥  ⎣ s[Ns−1] ⎦
             ⎣ x[N−1] ⎦   ⎢         h[L−1]  ⋱  h[2]  ⎥
                          ⎢                 ⋱   ⋮    ⎥
                          ⎣ 0              h[L−1]    ⎦

Now, H is a block-Toeplitz matrix, where each block is a P × 1 vector. We can estimate the
symbols by inverting H as before, ŝ = H†x = (HᴴH)⁻¹Hᴴx. Compared to the previous case,
H is a factor P taller, which is usually good for inversion.

In fact, it would seem that if we take P very large, then we can make H as tall as we want.
However, it does not make sense to sample (much) faster than the Nyquist rate. If we sample
faster, H might be tall but at some point its columns will not become more orthogonal to each
other. Thus, the condition number of H (see Sec. 5.4.6), an indicator of the amount of noise
enhancement, converges to a constant. Said differently, by oversampling, we collect more signal
energy, but we also collect more noise. A more detailed analysis is needed here, but we can
expect that sampling faster than Nyquist will not give benefits.
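
As an illustration, the sketch below builds the block-Toeplitz H of (4.16) for P = 2 (the channel samples are an arbitrary smooth function) and verifies the inversion (4.9):

  import numpy as np
  rng = np.random.default_rng(5)
  L, Ns, P = 3, 10, 2
  t = np.arange(L * P) / P                      # sampling instants k + i/P
  hs = np.exp(-t) * np.cos(3 * t)               # some channel h(t) (illustrative)
  H = np.zeros(((Ns + L - 1) * P, Ns))
  for k in range(L):                            # place the P x 1 blocks h[k] of (4.16)
      for n in range(Ns):
          H[(n + k) * P:(n + k + 1) * P, n] = hs[k * P:(k + 1) * P]
  s = np.sign(rng.standard_normal(Ns))          # BPSK symbols
  x = H @ s                                     # received stacked samples
  s_hat = np.linalg.lstsq(H, x, rcond=None)[0]  # ŝ = H† x, cf. (4.9)
  print(np.allclose(s_hat, s), H.shape, round(np.linalg.cond(H), 1))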

If we define the P × 1 vector

         ⎡ sn ⎤
  s[n] = ⎢ ⋮  ⎥ = sn ⊗ 1P
         ⎣ sn ⎦


where 1P is a P × 1 vector of ones, then an extension of (4.10) gives


 
             ⎡ x[0]      ⎤   ⎡ s[0]                            ⎤
             ⎢ x[1]      ⎥   ⎢ s[1]     s[0]                   ⎥
             ⎢  ⋮        ⎥   ⎢  ⋮        ⋱       ⋱             ⎥  ⎡ h[0]   ⎤
             ⎢ x[L−1]    ⎥   ⎢ s[L−1]   s[L−2]   ⋯   s[0]      ⎥  ⎢ h[1]   ⎥
  x = 𝒮h  ⇔  ⎢  ⋮        ⎥ = ⎢  ⋮        ⋮           ⋮         ⎥  ⎢  ⋮     ⎥   (4.17)
             ⎢ x[Ns−1]   ⎥   ⎢ s[Ns−1]  s[Ns−2]  ⋯   s[Ns−L]   ⎥  ⎣ h[L−1] ⎦
             ⎢ x[Ns]     ⎥   ⎢          s[Ns−1]  ⋯   s[Ns−L+1] ⎥
             ⎢  ⋮        ⎥   ⎢                   ⋱    ⋮        ⎥
             ⎣ x[Ns+L−2] ⎦   ⎣ 0                     s[Ns−1]   ⎦
where the symbol matrix 𝒮 has size (Ns + L − 1)P × L. Clearly, 𝒮 has many repeated entries:
we can write 𝒮 = S ⊗ 1P.
Another way to stack the data is
 
x(0) x(1) · · · x(N − 1)

 x( P1 ) x(1 + 1
P ) · 

X = [x[0] ··· x[N − 1]] =  .. .. . (4.18)
. .
 
 
x( PP−1 ) · · · · x(N − 1 + P −1
P )

X has size P × N ; its nth column x[n] contains the P samples taken during the nth symbol
period. Based on the FIR assumption, it follows that X has a factorization

X = HS (4.19)

where
 
h(0) h(1) · · · h(L − 1)
h( 1 ) · · 
 P 
H = [h[0] h[1] ··· h[L − 1]] = 
 .. .. : P ×L (4.20)
 . .


h( PP−1 ) · · · · h(L − P1 )
 
s0 s1 · · · sL−1 · · · sNs −2 sNs −1 0

 s0 · · · · · · ··· ··· sNs −2 sNs −1 

S =  .. .. .. : L×N,

 . s1 ··· ··· ··· . . 

0 s0 ··· ··· sNs −L ··· sNs −2 sNs −1

and N = Ns + L − 1. This factorization is readily derived from a transpose of (4.10). It is


seen in the definition of H that we have “folded” samples of h(t) into a matrix. As a result of


[Figure 4.7. Equalizer: the received signal is sampled at P times the symbol rate, the samples xk, xk−1, ⋯, xk−m+1 spanning m symbol periods are stored in a delay line, and combined with weights w1∗, ⋯, wmP∗ to form the output yk.]

the different organization of the received data into a matrix, not a vector, we have avoided the
Kronecker-repetition of the symbols in 𝒮 = S ⊗ 1P that was present in (4.17).
A problem with using this factorization to estimate S compared to the estimation based on
(4.16) is that the Toeplitz structure of S is not enforced: in the presence of noise, Ŝ = H† X
is not Toeplitz. The redundancy in S is not exploited. Also, usually P is not very large (e.g.,
P = 2), and therefore, usually H is not tall.

4.3.4 Stacking and linear equalization

A linear equalizer in the present context can be written as a vector w which combines the rows
of X to generate an output y = wH X. If we consider the model X = HS, then we would require
wH H = [0, · · · , 0, 1, 0, · · · 0], so that equalization by w results in y equal to one of the rows of S.
In the noise-free model of (4.20), it doesn’t really matter which row of S is reconstructed: we
have L options, and they only differ by a delay. Since we only combine the P samples of x(t) in
one symbol period, the equalizer length is one symbol period.
Often, it is much better to filter over multiple sample periods. For a linear equalizer with a
length of m symbol periods, we have to augment X with m − 1 horizontally shifted copies of
itself:
      ⎡ x[0]      x[1]    ⋯     x[N − m]  ⎤
  𝒳 = ⎢ x[1]      x[2]    ⋯      ⋮        ⎥  :  mP × (N − m + 1) .
      ⎢  ⋮         ⋮      ⋱     x[N − 2]  ⎥
      ⎣ x[m − 1]   ⋯    x[N−2]  x[N − 1]  ⎦

Each column of 𝒳 is a regression vector: the memory of the filter. Using 𝒳, a linear equalizer
over m symbol periods can be written as y = wᴴ𝒳, which combines mP snapshots: see Fig.
4.7.


The augmented data matrix 𝒳 has a factorization

             ⎡ 0          ⋯     H ⎤  ⎡ sm−1    sm      ⋯    sN−1     ⎤
  𝒳 = ℋ𝒮  =  ⎢       ⋰  ⋰        ⎥  ⎢  ⋮       ⋱      ⋱    sN−2     ⎥   (4.21)
             ⎢    H              ⎥  ⎢ s−L+2   s−L+3   ⋱     ⋮       ⎥
             ⎣ H          ⋯     0 ⎦  ⎣ s−L+1   s−L+2   ⋯  sN−L−m+1  ⎦

where ℋ = ℋm has size mP × (L + m − 1). ℋ has a block-Hankel structure: it is constant along
antidiagonals. 𝒮 has the same structure as S in (4.20) but size (L + m − 1) × (N − m + 1).
In this factorization, oversampling is not essential: we can also have P = 1. In that case, ℋ
will not be tall for any m, so that perfect equalization for finite m is not possible. The reason
is that we try to invert an FIR channel by an FIR filter! Generally, without oversampling we
will need an ARMA filter to do this. A common problem is that the ARMA filter may easily
become unstable (e.g., if the FIR filter is non-minimum phase: zeros outside the unit circle).
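
The space-time equalization can be illustrated numerically. The sketch below (a single real-valued source, one antenna, oversampling P = 2, and an arbitrary random channel) builds 𝒳 and solves for a zero-forcing equalizer w from known symbols:

  import numpy as np
  rng = np.random.default_rng(6)
  P, L, m, N = 2, 3, 4, 50                    # oversampling, channel length, equalizer length
  h = rng.standard_normal((P, L))             # H = [h[0], ..., h[L-1]], size P x L
  s = np.sign(rng.standard_normal(N + L - 1)) # symbols s[-(L-1)], ..., s[N-1]
  X = np.zeros((P, N))
  for k in range(L):                          # x[n] = sum_k h[k] s[n-k]
      X += np.outer(h[:, k], s[L - 1 - k:L - 1 - k + N])
  Xa = np.vstack([X[:, j:j + N - m + 1] for j in range(m)])   # augmented 𝒳, mP x (N-m+1)
  target = s[L - 1:L - 1 + N - m + 1]         # desired output: the row of 𝒮 equal to s[n]
  w = np.linalg.lstsq(Xa.T, target, rcond=None)[0]            # space-time equalizer
  print(np.allclose(w @ Xa, target))          # True: perfect (zero-forcing) equalization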

4.3.5 Multiple antennas and multiple sources: FIR-MIMO model


Instead of oversampling, we may also consider the use of multiple antennas. In (4.14), we defined
x[n] as a stack of P samples. We can also define x[n] to be a stack of M antenna outputs at
time n. Likewise, h[k] in (4.14) simply becomes an arbitrary vector, e.g., the sum of array
response vectors for multipath components arriving at a delay of k samples. The convolution
model (4.15) remains unchanged. That means that all the subsequent steps, i.e., the stacking
of x[n] into X and 𝒳, and the resulting factorization models, are unchanged.
We can also consider oversampling together with multiple antennas. In that case, each vector
x[n] will have size MP. The stacking and factorization models are unchanged, except that ℋm
will have size mMP × (L + m − 1). It is now much easier to have ℋm tall.
In the I-MIMO model, we considered multiple sources. In a more general FIR-MIMO model,
we can also do this. This models d sources arriving at an antenna array with M antennas,
with convolutive channels, and oversampling by a factor P: see Fig. 4.8. The extension to
FIR-MIMO is straightforward, although the notation becomes a bit cluttered. As a
simplifying assumption, we could start by assuming that the d sources have the same symbol rate,
so that the oversampling rate P has the same meaning. Assume that the ith source received on
the jth antenna has an FIR channel hij(t) of length Lj symbols. If for the ith source we have
a model 𝒳 = ℋi𝒮i as in (4.21), then with d sources we can write

                                      ⎡ 𝒮1 ⎤
  𝒳 = ℋ𝒮 ,   ℋ = [ℋ1, ⋯, ℋd] ,   𝒮 =  ⎢ ⋮  ⎥ .
                                      ⎣ 𝒮d ⎦

If the d sources have the same channel length L, then we could also rearrange this to arrive at a
model as in (4.21), but now with block matrices H of size MP × dL, and d-dimensional vectors
sk in 𝒮. The m shifts of H to the left in ℋ are then each over d positions.


[Figure 4.8. Multiuser convolutive channel model: input signals s1(t), ⋯, sd(t) are synchronized Dirac-pulse sequences; source i reaches antenna j through the channel hji, and each antenna output xj(t) is sampled at P times the symbol rate.]


In this general case, ℋ has size mMP × d(L + m − 1). A necessary condition for space-time
equalization (the output y is equal to a row of 𝒮) is that ℋ is tall, which gives minimal conditions
on m in terms of M, P, d, L:

  mMP ≥ d(L + m − 1)   ⇔   m(MP − d) ≥ d(L − 1)

which implies

  MP > d ,   m ≥ d(L − 1) / (MP − d) .

4.3.6 Connection to the parametric multipath model


For a single source, recall the multipath propagation model (4.2), valid for specular multipath
with small cluster angle spread:
  h(t) = Σ_{i=1}^{r} a(θi) βi g(t − τi)   (4.22)

where g(t) is the pulse shape function by which the signals are modulated. In this model, there
are r distinct propagation paths, each parameterized by (θi, τi, βi), where θi is the direction-of-
arrival (DOA), τi is the path delay, and βi ∈ ℂ is the complex path attenuation (fading). The
vector-valued function a(θ) is the array response vector for an array of M antenna elements to
a signal from direction θ.
Suppose as before that h(t) has finite duration and is zero outside an interval [0, L). Conse-
quently, g(t − τi ) has the same support for all τi . At this point, we can define a parametric “time
manifold” vector function g(τ), collecting LP samples of g(t − τ):

         ⎡ g(0 − τ)       ⎤
  g(τ) = ⎢ g(1/P − τ)     ⎥ ,   0 ≤ τ ≤ max τi .
         ⎢  ⋮             ⎥
         ⎣ g(L − 1/P − τ) ⎦

If we also construct a vector h with samples of h(t),

      ⎡ h(0)       ⎤
  h = ⎢ h(1/P)     ⎥
      ⎢  ⋮         ⎥
      ⎣ h(L − 1/P) ⎦

then it is straightforward to verify that (4.22) gives

  h = Σ_{i=1}^{r} (gi ⊗ ai) βi = [g1 ⊗ a1, ⋯, gr ⊗ ar] [β1, ⋯, βr]ᵀ ,   gi = g(τi) ,  ai = a(θi) .


Thus, the multiray channel vector is a weighted sum of vectors on the space-time manifold
g(τ ) ⊗ a(θ). Because of the Kronecker product, this is a vector in an LP M -dimensional space,
with more distinctive characteristics than the M -dimensional a(θ)-vector in a scenario without
delay spread. The connection of h with H as in (4.20) is that h = vec(H), i.e., h is a stacking
of all columns of H in a single vector.
We can define, much as before, parametric matrix functions

  Aθ = [a(θ1) ⋯ a(θr)] ,   Gτ = [g(τ1) ⋯ g(τr)] ,   B = diag[β1 ⋯ βr]

  Gτ ∘ Aθ := [g1 ⊗ a1, ⋯, gr ⊗ ar] .

(Gτ ∘ Aθ) is a columnwise Kronecker product known as the Khatri-Rao product; its properties
are discussed in Sec. 5.1.6. This gives h = (Gτ ∘ Aθ) B 1r.
Extending now to d sources, we get that the MP × dL-sized matrix H in (4.20) can be rearranged
into an MPL × d matrix

  H0 = [h1, ⋯, hd] = (Gτ ∘ Aθ) B J ,   (4.23)

where J is the selection matrix defined in (4.6) that sums the rays into channel vectors. (Gτ ∘ Aθ)
now plays the same role as Aθ in Sec. 4.3.1. Each of its columns is a vector on the space-time
manifold.
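
The Khatri-Rao structure is straightforward to form in code. In the sketch below, the manifold samples Gτ and Aθ are replaced by made-up matrices:

  import numpy as np

  def khatri_rao(G, A):
      # columnwise Kronecker product: [g_1 ⊗ a_1, ..., g_r ⊗ a_r]
      return np.vstack([np.kron(G[:, i], A[:, i]) for i in range(G.shape[1])]).T

  rng = np.random.default_rng(7)
  LP, M, r = 8, 4, 3
  G = rng.standard_normal((LP, r))              # samples g(τ_i) (illustrative)
  A = np.exp(1j * rng.standard_normal((M, r)))  # samples a(θ_i) (illustrative)
  beta = rng.standard_normal(r)                 # path attenuations
  h = khatri_rao(G, A) @ beta                   # h = (G_τ ∘ A_θ) B 1_r, length LP·M
  print(h.shape)                                # (32,)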

4.3.7 Summary

A summary of the noise-free data models developed so far is

  I-MIMO:      X = AS ,    A = Aθ B J
                                                      (4.24)
  FIR-MIMO:    𝒳 = ℋ𝒮 ,   ℋ ↔ H0 = (Gτ ∘ Aθ) B J

The first part of these model equations is generally valid for linear time-invariant channels,
whereas the second part is a consequence of the adopted multiray model in the form of a
parametric channel model.
Based on this model, the received data matrix X or 𝒳 has several structural properties. In
several combinations, these are often strong enough to allow us to find the factors A (or ℋ) and S
(or 𝒮), even from knowledge of X or 𝒳 alone. Very often, this will be in the form of a collection
of beamformers (or space-time equalizers) {wi}, i = 1, ⋯, d, such that each beamformed output
wiᴴX = si is equal to one of the source signals, so that it must have the properties of that signal.
One of the most powerful “structures”, on which most systems today rely to a large extent, is
knowledge of part of the transmitted message (a training sequence), so that several columns
of S are known. Along with the received signal 𝒳, this allows us to estimate ℋ. Very often, an
unparameterized FIR model is assumed here. Such algorithms use a temporal reference.
Algorithms that do not use this are called blind. Examples of this will be discussed in the coming
chapters.


4.4 FREQUENCY-DOMAIN DATA MODELS

TBD: for wideband data, consider STFT, split into narrow subbands; this translates a wideband
model into a set of narrowband models. These models can be processed independently, or
(better) jointly.
Diagonalization of Hankel matrix (if circulant).

4.4.1 OFDM

TBD

4.5 APPLICATION: RADIO ASTRONOMY

4.5.1 Instrument design

New instruments are designed to achieve higher performance:

– Higher resolution implies longer baselines. We will see that this results in shorter integration
time (due to earth rotation), and more (narrower) subbands.
– Higher sensitivity implies a larger number of antennas (typically grouped in stations), longer
observing times, and better calibration (direction dependent).
– Higher survey speed requires a larger total bandwidth, a larger field-of-view, multiple beams,
and direction dependent calibration.

This results in larger data sets, higher computational demands, and the need for better calibra-
tion and imaging algorithms.
The performance of a radio telescope depends on many parameters: the spatial resolution de-
pends on the diameter of the instrument, the number of spatial samples on the number of STI’s,
the finite sample noise in a single STI depends on the number of samples N that we average in
that STI, etc. How should these parameters be designed?
As it turns out, a lot depends on the non-stationarity introduced by the rotation of the earth,
and constraints resulting from a requirement to satisfy the narrowband assumption. Essentially,
we can average samples that differ by phase factors φ = exp(−j2πf τ ) if they satisfy:

– Narrowband condition: f τ should be approximately constant for f ∈ (fmin , fmax ) and all
geometric delays τ . If W = fmax − fmin is the bandwidth, this translates to

  W τ ≪ 1 .

For omni-directional antennas, the maximal geometric delay is τmax = D/c, for source signals
arriving in the same direction as the baseline. For directional antennas, D is scaled by sin(θ),


[Figure 4.9. Due to earth rotation, the stars appear to move relative to the array: between times t and t + Δt the baseline of length D rotates over an angle α, which changes the geometric delay τ.]

where the maximal θ depends on the field of view.² Thus,

  W ≪ c / (D sin θ) .   (4.25)

This condition determines the maximum processing bandwidth, as a function of the array diam-
eter D. A signal with a larger total bandwidth should first be split into sufficiently narrow
subbands (“channels”).
– Stationarity condition: f τ should be approximately constant while the baselines move (due
to earth rotation).
This determines the maximum processing time (STI), also a function of the array diameter,
because for longer baselines, τ changes faster.

The latter condition is worked out as follows. The earth rotation rate is ωe = 2π/(1 day) = 7.27 · 10⁻⁵
rad/s. The sky appears to move with this angular speed. “A day” is taken here to be a sidereal
day, i.e., taking into account that the earth makes an extra revolution over the course of a year;
it is about 4 minutes shorter than 24 hours.
As shown in Fig. 4.9, over a small time period Δt, the earth rotates over a small angle α = ωe Δt.
If initially we had τ = 0, we now have

  τ = (D/c) sin(α) ≈ (D/c) α .

Thus, the rate of change of τ due to earth rotation is

  dτ/dt = (D/c) dα/dt = (D/c) ωe .
² This requires some elaboration.


[Figure 4.10. Station data processing: each antenna signal x1(t), ⋯, xM(t) is digitized and split into subbands by a filter bank, giving xk[n]; the calibrated station beamformer wk forms the output wkᴴ xk[n].]

Integrating phases φ = e^{−j2πf τ} coherently over a period T requires, at the largest frequency
fmax,

  fmax · (dτ/dt) · T ≪ 1   ⇒   T ≪ c / (D fmax ωe) = λmin / (D ωe) .   (4.26)

This condition limits the STI. It depends on the observing frequency and the array diameter.
As example, we can look at the design of a first phase of the Square Kilometre Array (SKA), i.e.,
SKA1-Low: a low-frequency aperture array planned for around 2021-2023. Initial SKA1 design
objectives were specified in 2013 [12], but several of these numbers were scaled down later. The
architecture of the instrument is similar to that of LOFAR (commissioned in 2007), but at a
larger scale.
Generally, for the lower frequencies, the idea is that simple non-steerable antennas are grouped
into stations. The beamformed output of a station mimics that of a steerable dish. Next, the
station output signals are combined (correlated) at a central location.
The initial SKA1 design called for 131,072 antennas, divided over 512 stations each with 256
dual-polarized antennas. For SKA1-Low, the frequency range is 50–300 MHz, sampled using
log-periodic (somewhat directional) antennas. The maximal baseline was set at 100 km (later
scaled down to 65 km), and each station has a diameter of 35 m.
The objective of station data processing is to produce beamformed outputs. An important
consideration is that this will reduce the raw datarate by a factor equal to the number of
antennas in a station. If budget permits, a station can produce multiple beamformed outputs,
increasing the survey speed.
The number of coarse frequency channels that are needed is determined by the maximal band-
width that satisfies the narrowband condition. This depends on the station diameter. For this


[Figure 4.11. Central signal processing: the station outputs xk[n] are delay-tracked (τ), split into subbands by filter banks, and cross-correlated over all P stations to form Rij,k.]

example, (4.25) gives, for a station with omnidirectional antennas,

  D = 35 m   ⇒   Wchan ≪ 8.6 MHz.

In actuality, narrower subbands are proposed, e.g., to facilitate station calibration.


Next, at a central location, the beamformed data from the stations are correlated, and averaged
over short time intervals. With P stations, the correlation matrices have size P × P . For the
SKA1 example, P = 512.
The number of frequency channels is again determined by the narrowband condition, but this
time depends on the instrument diameter. For example:

  B = 100 km   ⇒   Wchan ≪ 4.2 kHz.

For the selected SKA1 design parameters, we could choose 250,000 channels with Wchan = 1
kHz. However, this result is valid if the array aperture B is filled with omnidirectional antennas.
In reality, it is filled with stations that have beamformed outputs, and the beams limit the field
of view to
  sin(θ) = λ/D .

Using (4.25) gives

  Wchan ≪ cD/(Bλ) = (D/B) f ,

which is most restrictive at the lowest frequency: Wchan ≪ (D/B) fmin.
Eq. (4.26), adjusted for the reduced field of view (elaboration TBD), similarly gives

  T < 1200 · D/B   [s] .

With B = 100 km (the longest baseline) and D = 35 m (station diameter), the maximal
integration time is T < 0.4 s.
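
These design numbers follow from one-line evaluations of the bounds above (a sketch using the SKA1-Low values; the margin factor 0.1 and the 1200 s constant are taken from the text):

  c = 3e8
  D, B = 35.0, 100e3             # station diameter and longest baseline [m]
  f_min = 50e6                   # lowest observing frequency [Hz]
  print(c / D / 1e6)             # station-level narrowband bound: ~8.6 MHz
  print(0.1 * f_min * D / B)     # subband width with beam-limited FOV: ~1.75 kHz
  print(1200 * D / B)            # maximal STI: ~0.42 s
  print(B / D)                   # image pixels per dimension: ~2857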


Thus, the outputs of the central signal processing step are complex correlation matrices of size 512 × 512, for each of the 250,000 channels and 4 polarizations, once every 0.4 sec.
Depending on P, it can happen that the correlator produces several times more output “data” than flows in: the input vectors of size P are transformed into matrices of size P × P, and then averaged. If the STI T is too short, it is more efficient to work with the original data (X-matrices) rather than correlated data (R-matrices). The correlation matrices will then be rank-deficient, indicating redundancy.
“Baseline dependent averaging” (analyzed in [13]) exploits the fact that up to 90% of the base-
lines are short and can be integrated over longer times and larger bandwidths. This may
significantly reduce the datarates.
The hardware bottleneck is perhaps not the required computational complexity (flops) but the
required bandwidth to transport all the data. In particular, a seemingly trivial operation is
this: data comes in from the various stations, each with a large number of subbands, and has
to be rearranged such that for each subband the data for all stations is together. This does not
involve computations but is a massive communication operation nonetheless.

Resolution So far, we have seen that B and D determine the number of subband channels and
the integration length. They also determine the resolution. To see this, consider first a single
station. If the antennas are placed sufficiently dense in a rectangular or circular aperture with
diameter D, then we have seen in (2.21) that the beamwidth is θs ∼ λ/D. This determines the
instantaneous field of view (FOV) of a station, which acts as a dish element in the entire array.
If the array has a diameter B, then its beamwidth is θ ∼ λ/B. This beam partitions the FOV.
Let
N := θs/θ = B/D
The field of view is covered by N beams in each dimension: we have a resolution of N “pixels” in
each dimension. Thus, without superresolution techniques, we can expect to create an image of
size N × N pixels. For the SKA1-Low example: D = 35 m, B = 100 km, resulting in N ∼ 3000.
Interestingly, N is independent of frequency, but the FOV is frequency dependent. Thus, al-
though the resulting image is size N ×N , the area on the sky which the image covers is frequency
dependent. Lower frequencies (larger λ) cover a larger area.

Number of subband channels For the complete instrument, the longest baseline B determines
the narrowband constraint. The maximum subband bandwidth is (taking into account the
reduced field of view)
Wchan ≪ (D/B) fmin .
For example, we can set
Wchan = 0.1 · (D/B) fmin .


For a total bandwidth Wtot, the number of subband channels is proportional to

Nchan = Wtot/Wchan = 10 · (B/D) · (Wtot/fmin)

Thus, in first approximation, N ∼ B/D determines each dimension of the image, and the number of frequency bins, i.e., the entire “image cube”. The main drivers for complexity are thus P² and the ratios (B/D)³ (the instrument to station diameter, cubed) and Wtot/fmin (the fractional bandwidth).
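These design relations are easy to check numerically. The following Python sketch (not part of the original design documents; the constants are the SKA1-Low values quoted in this section, and the “≪” margins are implemented as a factor 0.1 as in the text) reproduces the orders of magnitude:

```python
# Sanity check of the SKA1-Low design numbers quoted in this section.
# All outputs are order-of-magnitude estimates.
c, D, B = 3e8, 35.0, 100e3     # speed of light [m/s], station diameter [m], max baseline [m]
f_min, W_tot = 50e6, 250e6     # lowest observing frequency, total bandwidth [Hz]

print(c / D / 1e6)             # station narrowband limit:        ~ 8.6 MHz
print(D / B * f_min / 1e3)     # central limit with reduced FOV:  ~ 17.5 kHz (choose ~1 kHz)
print(1200 * D / B)            # maximal integration time:        ~ 0.42 s
print(B / D)                   # pixels per image dimension:      ~ 2857, i.e., N ~ 3000
print(W_tot / 1e3)             # number of channels at Wchan = 1 kHz: 250,000
```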

4.6 NOTES

Section 4.5 is based on Van der Veen et al. [14].

Bibliography

[1] W.C. Jakes, ed., Microwave Mobile Communications. New York: John Wiley, 1974.

[2] F. Adachi et al., “Cross correlation between the envelopes of 900 MHz signals received
at a mobile radio base station site,” IEE Proceedings, vol. 133, pp. 506–512, Oct. 1986.

[3] W.C.Y. Lee, “Effects on correlation between two mobile radio basestation antennas,” IEEE
Tr. Comm., vol. 21, pp. 1214–1224, Nov. 1973.

[4] W.C.Y. Lee, Mobile Communications Design Fundamentals. New York: John Wiley, 1993.

[5] B. Sklar, “Rayleigh fading channels in mobile digital communication systems, part I: Char-
acterization,” IEEE Communications Magazine, vol. 35, pp. 90–100, July 1997.

[6] A.J. Paulraj and C.B. Papadias, “Space-time processing for wireless communications,”
IEEE Signal Proc. Mag., vol. 14, pp. 49–83, November 1997.

[7] T. Trump and B. Ottersten, “Estimation of nominal direction of arrival and angular spread
using an array of sensors,” Signal Processing, vol. 50, pp. 57–69, April 1996.

[8] B. Ottersten, “Array processing for wireless communications,” in Proc. IEEE workshop on
Stat. Signal Array Proc., (Corfu), pp. 466–473, June 1996.

[9] T.S. Rappaport, Wireless Communications: Principles and Practice. Upper Saddle River,
NJ: Prentice Hall, 1996.

[10] K. Pahlavan and A.H. Levesque, “Wireless data communications,” Proc. IEEE, vol. 82,
pp. 1398–1430, September 1994.


[11] E.A. Lee and D.G. Messerschmitt, Digital Communication. Boston: Kluwer Publishers,
1988.

[12] P.E. Dewdney, W. Turner, R. Millenaar, R. McCool, J. Lazio, and T.J. Cornwell, “SKA1
system baseline design,” Tech. Rep. SKA-TEL-SKO-DD-001, SKA Office, 2013.

[13] S.J. Wijnholds, A.G. Willis, and S. Salvini, “Baseline-dependent averaging in radio inter-
ferometry,” Monthly Notices of the Royal Astronomical Society, vol. 476, pp. 2029–2039,
Feb. 2018.

[14] A.J. van der Veen, S.J. Wijnholds, and A.M. Sardarabadi, “Signal processing for radio
astronomy,” in Handbook of Signal Processing Systems, 3rd ed., Springer, November 2018.
ISBN 978-3-319-91734-4.



Chapter 5

LINEAR ALGEBRA BACKGROUND

Contents
5.1 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.2 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.3 The QR factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.4 The singular value decomposition (SVD) . . . . . . . . . . . . . . . . 99
5.5 Pseudo-inverse and the Least Squares problem . . . . . . . . . . . . 105
5.6 The eigenvalue problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.7 The generalized eigenvalue decomposition . . . . . . . . . . . . . . . 109
5.8 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

TBD: notation: N vs n vs d; check duplicate material (pseudo-inverse)


Throughout the book, several linear algebra concepts such as subspaces, QR factorizations, singular value decompositions and eigenvalue decompositions play a central role. This chapter gives a brief review of the most important properties as needed here.¹
An extensive tutorial to linear algebra in relation to signal processing can be found in Moon and
Stirling [1]. Suitable reference books on advanced matrix algebra are Golub and Van Loan [2],
and Horn and Johnson [3].

5.1 BASICS

5.1.1 Notation

A bold-face letter, such as x, denotes a vector (usually a column vector, but occasionally a row
vector). Matrices are written with capital bold letters. A matrix A has entries aij , and columns
¹On a first reading, the more advanced topics should probably be skipped until needed in a future chapter, as indicated there.


aj , and we can write


A = [aij ] = [a1 a2 · · · aN ] .
The M × M identity matrix is denoted by IM , or I for short. A matrix or vector with only zero
entries is denoted by 0M ×N , or 0 for short.
Complex conjugate is denoted by an overbar; the transpose of a matrix is denoted by AT = [aji]. For complex matrices, the complex conjugate (= hermitian) transpose is AH := (Ā)T.

5.1.2 Matrix products


A matrix A and a vector x can be multiplied if their sizes match: the number of columns of A should equal the number of entries in x. In that case

y = Ax = [a1 a2 · · · aN ] [x1 , x2 , · · · , xN ]T = a1 x1 + a2 x2 + · · · + aN xN

In general,

yi = ∑j Aij xj ,   i = 1, · · · , M

Likewise, two matrices A and B can be multiplied if their “inner dimensions” match (i.e., the number of columns of A equals the number of rows of B). In that case

C = AB ⇔ Cij = ∑k Aik Bkj

5.1.3 Inner product and norms


The inner product of two vectors a and b of equal size is
⟨a, b⟩ = bH a .

The inner product is used to define a vector norm kak by


‖a‖2 = aH a = ∑i |ai|2

This satisfies the required properties of a norm, i.e.,

‖a‖ ≥ 0 ,  ‖a‖ = 0 ⇔ a = 0 .

It also satisfies the triangle inequality:

c = a + b ⇒ ‖c‖ ≤ ‖a‖ + ‖b‖


with equality only if a is parallel to b.


The inner product satisfies the inequality

|bH a| ≤ ‖a‖ ‖b‖

and this allows us to define the angle θ between two vectors via

bH a = ‖a‖ ‖b‖ cos(θ) .

5.1.4 Matrix norms

The induced matrix 2-norm of a matrix A (also called the spectral norm, or the operator norm)
is
‖A‖ := max_x ‖Ax‖/‖x‖

It represents the largest magnification of a vector x that can be obtained by applying A to it.
Another expression for this is
‖A‖2 = max_x (xH AH Ax)/(xH x)

The Frobenius norm of A represents the energy contained in its entries:

‖A‖F = ( ∑ij |Aij|2 )1/2

5.1.5 Trace

For a square matrix A, the trace of A is the sum of its diagonal entries:
tr[A] = ∑i Aii

It has several properties. One is


tr[AB] = tr[BA]

The Frobenius norm of A is

‖A‖F2 = ∑ij |Aij|2 = tr[AH A]

since the ith diagonal entry of AH A is given by ∑j A∗ji Aji .


5.1.6 Kronecker products and the vec operator


For a matrix, vec(·) denotes the stacking of the columns of a matrix into a vector. Conversely,
vec−1 (·) is the inverse of vec(·): the construction of a matrix out of a vector. This is not
unambiguous: the matrix dimensions must be made clear from the context.
For a vector, diag(v) is a diagonal matrix with the entries of v on the diagonal. For a matrix,
vecdiag(A) is a vector consisting of the diagonal entries of A.
For two matrices A and B, the Kronecker product is defined as

A ⊗ B = [ a11 B  · · ·  a1N B
            ⋮              ⋮
          aM1 B  · · ·  aMN B ] ,

and the Schur-Hadamard (elementwise) product, here denoted by ⊙, as

A ⊙ B = [ a11 b11  · · ·  a1N b1N
             ⋮                ⋮
          aM1 bM1  · · ·  aMN bMN ] ,

provided A and B have the same size.
A rank-one matrix has the form abT. It can be written using the Kronecker product as

vec(abT ) = b ⊗ a ,

and similarly, for complex vectors,

vec(abH ) = b̄ ⊗ a . (5.1)
This can be readily shown by writing the products in full:

abH = [ a1 b̄1   a1 b̄2   · · ·   a1 b̄N
        a2 b̄1   a2 b̄2   · · ·   a2 b̄N
          ⋮                        ⋮
        aM b̄1   aM b̄2   · · ·   aM b̄N ] ,

b̄ ⊗ a = [ b̄1 a1 , · · · , b̄1 aM , b̄2 a1 , · · · , b̄2 aM , · · · , b̄N a1 , · · · , b̄N aM ]T ,

i.e., b̄ ⊗ a stacks the scaled columns b̄j a of abH on top of each other.


We will also often use the more general matrix identity


vec(ABC) = (CT ⊗ A) vec(B) .

This can be proven using (5.1), by writing ABC as a sum of rank-one components of the form bij (ai cjT ), where ai is the ith column of A, and cjT the jth row of C.
◦ denotes the Khatri-Rao product, i.e., a column-wise Kronecker product:

A ◦ B := [a1 ⊗ b1   a2 ⊗ b2   · · · ] .

This forms a submatrix of A ⊗ B.


Notable properties of Kronecker products are (for matrices and vectors of compatible sizes):

vec(abH ) = b∗ ⊗ a (5.2)
(A ⊗ B)(C ⊗ D) = AC ⊗ BD (5.3)
(A ⊗ B)(C ◦ D) = AC ◦ BD (5.4)
(A ◦ B)H (C ◦ D) = AH C ⊙ BH D (5.5)
(a ⊗ B)C = a ⊗ BC (5.6)
vec(ABC) = (CT ⊗ A)vec(B) (5.7)
vec(A diag(b) C) = (CT ◦ A)b (5.8)
[a ⊗ b][c ⊗ d]H = acH ⊗ bdH (5.9)
= a ⊗ bcH ⊗ dH (5.10)
= cH ⊗ adH ⊗ b
tr(AB) = vecT (AT )vec(B) (5.11)
= vecH (AH )vec(B) (5.12)
tr(ABCD) = vecT (AT )(DT ⊗ B)vec(C) = vecH (AH )(DT ⊗ B)vec(C) (5.13)
tr(A ⊗ B) = tr(A)tr(B) (5.14)

Let A be a P ×Q matrix. Clearly, vec(A) and vec(AT ) contain the same elements, but organized
differently: they are related by a permutation matrix. This matrix is called the commutation
matrix, and denoted by KP,Q :
vec(AT ) = KP,Q vec(A) .
For any P × Q matrix A and M × N matrix B we have

(A ⊗ B)KQ,N = KP,M (B ⊗ A) (5.15)


(A ◦ B) = KP,M (B ◦ A), (5.16)

where Q = N for (5.16).


Pointwise multiplication by ⊙ generally does not preserve rank: the rank of A ⊙ B can be higher than that of A or B. For example, if B = 11T , then B has rank 1, while A ⊙ B = A has rank equal to that of A, possibly larger than 1. If A = aaH has rank 1 and B = I, then A ⊙ B retains only the diagonal of A and can have full rank while A has rank 1. An exception is that abH ⊙ cdH does have rank 1. This is shown by considering

abH ⊗ cdH = (a ⊗ c)(b ⊗ d)H ,

(which is rank 1) and noting that abH ⊙ cdH is a submatrix of abH ⊗ cdH .

5.2 SUBSPACES

The space H spanned by a collection of vectors {xk },

H := {α1 x1 + · · · + αn xn | αi ∈ ℂ, ∀i} ,

is called a linear subspace.


Important examples of subspaces are

Range (column span) of A: ran(A) = {Ax : x ∈ ℂN }
Kernel (nullspace) of A: ker(A) = {x ∈ ℂN : Ax = 0}

One routinely shows that

ran(A) ⊕ ker(AH ) = ℂM
ran(AH ) ⊕ ker(A) = ℂN

Here, H1 ⊕ H2 denotes the direct sum of two linearly independent subspaces, namely {x1 + x2 | x1 ∈ H1 , x2 ∈ H2 }.

5.2.1 Linear independence

A collection of vectors {xi } is called linearly independent if

α1 x1 + · · · + αn xn = 0 ⇔ α1 = · · · = αn = 0 .

5.2.2 Basis

An independent collection of vectors {xi } that together span a subspace is called a basis for that
subspace.
If the vectors are orthogonal (xiH xj = 0, i ≠ j), it is an orthogonal basis.
If moreover the vectors have norm 1 (‖xi‖ = 1), the basis is called orthonormal.
The basis for a subspace is not unique.
Often, we stack the basis vectors xi into a matrix X and, with abuse of terminology, call that
matrix a basis.


5.2.3 Rank
The rank of a matrix X is the number of independent columns (or rows) of X.
A prototype rank-1 matrix is X = abH , a prototype rank-2 matrix is X = a1 b1H + a2 b2H , etcetera: a rank-d matrix is

X = a1 b1H + a2 b2H + · · · + ad bdH = ABH

where A = [a1 , a2 , · · · , ad ] and B = [b1 , b2 , · · · , bd ]. Thus, a matrix factorization X = ABH where the “inner dimension” is d shows that the rank of X is (at most) d. This decomposition is called a dyadic decomposition; it is not unique.
The rank cannot be larger than the smallest size of the matrix (when it is equal, the matrix is
full rank, otherwise it is rank deficient). A tall matrix is said to have full column rank if the
rank is equal to the number of columns: the columns are independent. Similarly, a wide matrix
has full row rank if its rank equals the number of rows.

5.2.4 Unitary matrix and isometry


A real (square) matrix U is called an orthogonal matrix if UT U = I, and UUT = I.
Likewise, a complex matrix U is unitary if UH U = I, UUH = I.
A unitary matrix looks like a rotation and/or a reflection. Its norm is ‖U‖ = 1, and its columns are orthonormal.
A tall matrix Û is called an isometry if ÛH Û = I. Its columns are an orthonormal basis of a subspace (not the complete space), and its norm is ‖Û‖ = 1. There is an orthogonal complement Û⊥ of Û such that U = [Û Û⊥ ] is square and unitary.

5.2.5 Projection
A square matrix P is a projection if PP = P. It is an orthogonal projection if also PH = P. The norm of an orthogonal projection is ‖P‖ = 1. For an isometry Û, the matrix P = ÛÛH is an orthogonal projection onto the space spanned by the columns of Û. This is the general form of an orthogonal projection.
Suppose U = [Û Û⊥ ] is unitary, where Û has d columns and Û⊥ has M − d columns. Then,

1. from UH U = IM :

ÛH Û = Id ,  ÛH Û⊥ = 0 ,  (Û⊥ )H Û⊥ = IM−d ;

2. from UUH = IM :

ÛÛH + Û⊥ (Û⊥ )H = IM ,  ÛÛH = Pc ,  Û⊥ (Û⊥ )H = P⊥c = I − Pc .


This shows that any vector x ∈ ℂM can be decomposed into x = x̂ + x̂⊥ , where x̂ ⊥ x̂⊥ ,

x̂ = Pc x ∈ ran(Û) ,  x̂⊥ = P⊥c x ∈ ran(Û⊥ )

The matrices ÛÛH = Pc and Û⊥ (Û⊥ )H = P⊥c are the orthogonal projectors onto the column span of X and its orthogonal complement in ℂM , respectively.
Similarly, we can find a matrix V̂H whose rows span the row span of X, and augment it with a matrix V̂⊥ to a unitary matrix V = [V̂ V̂⊥ ], where V is N × N , V̂ has d columns, and V̂⊥ has N − d columns.
The matrices V̂V̂H = Pr and V̂⊥ (V̂⊥ )H = P⊥r are orthogonal projectors onto the subspaces in ℂN spanned by the columns of V̂ and V̂⊥ , respectively. The columns of V̂⊥ span the kernel (or nullspace) of X, i.e., the space of vectors a for which Xa = 0.
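As a small illustration (a numpy sketch, with hypothetical dimensions M = 6, d = 2), the projectors Pc and P⊥c can be built from an orthonormal basis and the decomposition verified directly:

```python
# Orthogonal projections onto ran(X) and its complement.
import numpy as np

rng = np.random.default_rng(1)
M, d = 6, 2
X = rng.standard_normal((M, d)) + 1j * rng.standard_normal((M, d))
Uhat, _ = np.linalg.qr(X)                 # orthonormal basis Uhat for ran(X)
Pc = Uhat @ Uhat.conj().T                 # projector onto ran(X)
Pperp = np.eye(M) - Pc                    # projector onto the complement

x = rng.standard_normal(M) + 1j * rng.standard_normal(M)
xhat, xperp = Pc @ x, Pperp @ x
print(np.allclose(Pc @ Pc, Pc), np.allclose(Pc.conj().T, Pc))   # PP = P, P^H = P
print(np.allclose(x, xhat + xperp), abs(xhat.conj() @ xperp))   # orthogonal split
```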

5.3 THE QR FACTORIZATION

Let X : N × N be a square matrix of full rank. Then there is a decomposition X = QR,

[x1 x2 · · · xN ] = [q1 q2 · · · qN ] · [ r11 r12 · · · r1N
                                          0  r22 · · · r2N
                                          ⋮       ⋱    ⋮
                                          0   0  · · · rN N ]

The interpretation is that q1 is a normalized vector with the same direction as x1 , similarly
[q1 q2 ] is an isometry spanning the same space as [x1 x2 ], etcetera.
If X : M × N is a tall matrix (M ≥ N ), then there is a decomposition

X = QR = [Q̂ Q̂⊥ ] [ R̂ ; 0 ] = Q̂R̂

Here, Q is a unitary matrix, and R̂ is upper triangular and square. R is upper triangular with M − N zero rows added. X = Q̂R̂ is called an “economy-size” QR.
If R̂ is nonsingular (all entries on its main diagonal are nonzero), then d = N , the columns of Q̂ form a basis of the column span of X, and Pc = Q̂Q̂H . If R̂ is rank-deficient, then this is not true: the column span of Q̂ is too large. However, the QR factorization can be used as a start in the estimation of an orthogonal basis for the column span of X. Although this has sometimes been attempted, it is numerically not very robust to use the QR directly to estimate the rank of a matrix. (Modifications such as the “rank-revealing QR” do exist.)


Likewise, for a “wide” matrix (M ≤ N ) we can define an RQ factorization

X = RQ = [R̂ 0] [ Q̂ ; Q̂⊥ ] = R̂Q̂

(for different Q and R). Now, X and R̂ have the same singular values and left singular vectors.

5.4 THE SINGULAR VALUE DECOMPOSITION (SVD)

For a given (complex) matrix X of size m × n, where we assume m > n, the SVD is defined by

X = UΣVH ,  U = [u1 · · · um ] ,  Σ = [ diag(σ1 , · · · , σn ) ; 0(m−n)×n ] ,  V = [v1 · · · vn ] (5.17)

where U : m × m and V : n × n are unitary matrices, and Σ is a diagonal matrix of size m × n containing the singular values σ1 ≥ · · · ≥ σn in descending order. These are non-negative (real) numbers. Note that Σ has a block of m − n “zero” rows at the bottom.
Any matrix X has this decomposition [2]. If X is real-valued, then the factors are also real-
valued. Algorithms to compute the SVD are iterative and of complexity O(m2 n), but with a
large scale factor: think about 20m2 n. This is much more complex than a QR factorization. In
fact, a QR factorization is usually applied as a preprocessing step to compute the SVD.
Since Σ has m − n rows with zeros, and often m can be very large, it is inefficient to keep so many columns of U that are anyway not used (they are multiplied by the zeros). Thus, we can also define the “economy-size” SVD, where

X = UΣVH ,  U = [u1 · · · un ] ,  Σ = diag(σ1 , · · · , σn ) ,  V = [v1 · · · vn ] (5.18)

where U : m × n is a tall matrix of the same size as X, and V and Σ are n × n. Note that UH U = I but UUH ≠ I because the latter is an m × m matrix of rank n.
By using these properties, we can readily show:

Σ = UH XV ,  UΣ = XV ,  ΣVH = UH X .


The singular values give important information on the dominant directions in the column span and row span of X. This is seen by writing out the matrix equations, which gives the dyadic decomposition

X = UΣVH = σ1 u1 v1H + σ2 u2 v2H + · · · + σn un vnH . (5.19)

Each term of the form σk uk vkH is a rank-1 matrix. If σ1 is large, then the corresponding component u1 v1H is dominantly present in X, and u1 is the dominant direction in the column span of X. In fact, σ1 u1 v1H is the best rank-1 approximation of X (in the Least Squares sense). If σn is zero, then one dimension is missing in the matrix: it is rank deficient by order 1. In general, if X is of rank d, then only d singular values σ1 , · · · , σd are nonzero. This is also seen from (5.19) because it will then consist of the sum of d rank-1 components. The best rank-d approximation of a matrix X is obtained by setting σd+1 = · · · = σn = 0.
Suppose X has rank d, with d < n. Similar to the economy-size SVD, we can write

X = ÛΣ̂V̂H ,  Û = [u1 · · · ud ] ,  Σ̂ = diag(σ1 , · · · , σd ) ,  V̂ = [v1 · · · vd ] (5.20)

where Σ̂ now has size d × d, and only the nonzero singular values are kept.
Since Û and V̂ are “tall” isometric matrices, we can complement them with orthonormal columns
ud+1 , · · · , um and vd+1 , · · · , vn , respectively, to square unitary matrices,

U = [Û , U⊥ ] , V = [V̂ , V⊥ ] .

We can augment Σ̂ accordingly with zero entries along the diagonal and elsewhere, to arrive at
the original decomposition X = UΣVH in (5.17). Thus, in Σ, the number of nonzero singular
values shows the rank of X.
The columns of Û provide an orthonormal basis for the column span of X. Likewise, the columns
of V̂ are an orthonormal basis for the column span of XH . The complementary matrices also
have a meaning: since V̂H V⊥ = 0, and hence XV⊥ = 0, it is seen that the columns of V⊥ span
the null space of X. Likewise, the columns of U⊥ span the left null space: (U⊥ )H X = 0.
The SVD of X in (5.20) reveals the behavior of the map b = Xa: a is projected onto the column span of V̂ and rotated in n-space (by V̂H ), then scaled (by the entries of Σ̂), and finally rotated in m-space (by Û) to give b.
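In numpy this looks as follows (a sketch on a synthetic rank-2 matrix; np.linalg.svd returns the singular values in descending order):

```python
# Economy-size SVD of a rank-deficient matrix; the singular values reveal the rank.
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # rank 2 by construction

U, s, Vh = np.linalg.svd(X, full_matrices=False)
print(np.round(s, 10))                       # two nonzero singular values, then ~0
d = int(np.sum(s > 1e-10))                   # numerical rank
Uhat, Shat, Vhat = U[:, :d], np.diag(s[:d]), Vh[:d].conj().T
print(np.allclose(X, Uhat @ Shat @ Vhat.conj().T))   # X = Uhat Shat Vhat^H
```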

5.4.1 Norms and the SVD


Recall the Frobenius norm of X:
X
kXk2F = |Xij |2 = tr[X X]
H

ij


The latter expression shows that multiplication of X by a unitary matrix does not change the Frobenius norm. Thus, since Σ = UH XV, we find that

‖X‖F2 = ∑i σi2 .

Recall the “induced 2-norm” or matrix 2-norm ‖X‖2 , which measures how much a matrix can increase the 2-norm of a vector v:

‖X‖ = max_v ‖Xv‖/‖v‖ (5.21)

Without loss of generality, we may normalize the vectors v such that ‖v‖ = 1. We can also insert the SVD. We then obtain

‖X‖2 = max_{‖v‖=1} ‖Xv‖2 = max_{‖v‖=1} vH (XH X)v = max_{‖v‖=1} vH (VΣ2 VH )v .

From this we can deduce that the vector v that maximizes the norm is given by v = v1 , the dominant right singular vector. The matrix 2-norm of X is then seen to be equal to σ1 . The fact that the 2-norm is attained on the dominant singular vector v1 gives a recursive way to prove the existence of the SVD: find v1 on which the norm is attained, and the corresponding σ1 and u1 using

Xv1 = σ1 u1 ,

then subtract (or project out) this rank-1 component and consider the residual X′ , and repeat [2].
An important property that follows from the definition of the norm (5.21) is

‖Xv‖ ≤ ‖X‖ ‖v‖  ∀v ,

with equality only for v = αv1 .


For an arbitrary matrix X, perhaps of full rank, the best rank-d approximant X̂ is obtained by computing X = UΣVH , and then setting all but the first d singular values in Σ equal to zero:

X̂ = ÛΣ̂V̂H .

The approximation error in Frobenius norm and operator norm is given by

‖X − X̂‖F2 = σd+12 + · · · + σn2 ,  ‖X − X̂‖2 = σd+12



Figure 5.1. Construction of the left singular vectors and values of the matrix X = [x1 x2 ], where x1 and x2 have equal length. (Sketch: the arrow u1 σ1 √2 points along x1 + x2 , and u2 σ2 √2 along x2 − x1 .)

5.4.2 QR and the SVD


The QR factorization can be used as a start in the computation of the SVD of a tall matrix X.
We first compute
X = Q̂R̂ .
The next step is to continue with an SVD of R̂:
H
R̂ = ÛR Σ̂R V̂R ,

so that the SVD of X is


H
X = (Q̂ÛR )Σ̂R V̂R ,
The preprocessing by QR in computing the SVD is useful because it reduces the size from X to
that of R̂, and obviously, X and R̂ have the same singular values and right singular vectors. A
QR decomposition is much easier to compute than the SVD (which requires an iterative process).
As mentioned before, the QR decomposition gives only a poor indication of the rank (although
rank-revealing QR decompositions have been proposed). The SVD, on the other hand, is the
standard tool to determine the rank.

Example 5.1. Figure 5.1 shows the construction of the left singular vectors of a matrix X = [x1 x2 ], whose columns x1 and x2 are of equal length. The largest singular vector u1 is in the direction of the sum of x1 and x2 , i.e., the “common” direction of the two vectors, and the corresponding singular value is σ1 = ‖x1 + x2‖/√2. On the other hand, the smallest singular vector u2 is dependent on the difference x2 − x1 , as is its corresponding singular value: σ2 = ‖x2 − x1‖/√2. If x1 and x2 become more aligned, then σ2 will be smaller and X will be closer to a singular matrix. Clearly, u2 is the most sensitive direction for perturbations on x1 and x2 .
An example of such a matrix could be A = [a(φ1 ) a(φ2 )], where a(φ) = [1 φ φ2 · · · φM−1 ]T , and where φ is for example related to the direction at which a signal hits an antenna array, or to the time difference to a reference signal. If

Table 5.1. Example 5.4.2: Singular values of XM,N .

M = 3, N = 3:  σ1 = 3.44 , σ2 = 0.44
M = 3, N = 6:  σ1 = 4.86 , σ2 = 0.63
M = 6, N = 3:  σ1 = 4.73 , σ2 = 1.29

two directions are close together, then φ1 ≈ φ2 and a(φ1 ) points in about the same
direction as a(φ2 ), which will be the direction of u1 . The smallest singular value, σ2 ,
is dependent on the difference of the directions of a(φ1 ) and a(φ2 ).
For further illustration, consider the following small numerical experiment. Let φ1 = 1, φ2 = exp(jπ · 0.1), and construct M × N matrices X = [a(φ1 ) a(φ2 )]S, where SSH = N I. Since (1/√N )S is co-isometric, the singular values of X are those of A = [a(φ1 ) a(φ2 )] times √N . The two non-zero singular values of X for some values of M, N are given in Table 5.1. It is seen that doubling M almost triples the smallest singular value, whereas doubling N only increases the singular values by a factor √2, which is because the matrices have larger size.
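The experiment is easily repeated in a few lines of Python. Since the exact S used for Table 5.1 is not specified, the sketch below draws S as a random ±1 matrix (one way to approximately satisfy SSH = N I); the printed singular values will therefore differ somewhat from the table, but show the same trends:

```python
# Recreating the numerical experiment of Example 5.1 (values are realization dependent).
import numpy as np

def a(phi, M):                       # a(phi) = [1, phi, ..., phi^(M-1)]^T
    return phi ** np.arange(M)

rng = np.random.default_rng(3)
for M, N in [(3, 3), (3, 6), (6, 3)]:
    A = np.stack([a(1.0, M), a(np.exp(1j * np.pi * 0.1), M)], axis=1)
    S = rng.choice([-1.0, 1.0], size=(2, N))
    X = A @ S
    print(M, N, np.round(np.linalg.svd(X, compute_uv=False), 2))
```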

5.4.3 Matrix inversion using the SVD


If X is full column rank, then the left inverse of X is X† = (XH X)−1 XH . Inserting X = UΣVH , we obtain that this can also be written as

X† = VΣ−1 UH (5.22)

which is a slightly more general expression. Essentially, we are inverting the singular values here. We can easily verify that X† X = I and

XX† = UUH .

If we define P = UUH then we see that P is an orthogonal projection, because PP = P and PH = P. It is a projection onto the column span of X.
The largest singular value of the pseudo-inverse is 1/σn . It follows that the matrix 2-norm of X† is given by ‖X†‖ = 1/σn . If σn is very small, then Σ−1 and X† have a very large norm and should not be used: in applications, this leads to noise enhancement.

5.4.4 Connection to the eigenvalue decomposition


If we take the SVD of X and “square” it to XH X, we obtain

XH X = VΣ2 VH


Matrix XH X is a hermitian matrix; the decomposition is recognized as the eigenvalue decomposition of XH X, where the eigenvalues are given by the entries of Σ2 , and the eigenvectors by the columns of V. (The eigenvalue problem is discussed later in Sec. 5.6.)
Similarly, XXH has eigenvalue decomposition

XXH = UΣ2 UH

The point of the SVD is that it gives similar information as we obtain from an eigenvalue
decomposition, but (i) it is applicable to any matrix (e.g., non-square matrices), and (ii) it
always exists, whereas the eigenvalue decomposition only exists for “regular” matrices. Also,
there are numerically very robust algorithms to compute the decomposition.

5.4.5 Rank reduction using the SVD; Moore-Penrose pseudo-inverse


Equation (5.22) shows that, when we invert a matrix X, the smallest singular values of X become the largest singular values of X† . Sometimes, if a matrix is almost rank deficient (σn is very small), that small component will dominate the inverse, which can give rise to numerical problems, as we will see later. In that case, we propose to first approximate X by a matrix of lower rank d, by setting all singular values below a certain threshold ε equal to zero:

X̂ = σ1 u1 v1H + σ2 u2 v2H + · · · + σd ud vdH

which we can write as (economy-size SVD notation)

X̂ = ÛΣ̂V̂H ,  Û = [u1 · · · ud ] ,  Σ̂ = diag(σ1 , · · · , σd ) ,  V̂ = [v1 · · · vd ] .

This is called the Truncated SVD.


The corresponding approximate inverse is

X̂† = V̂Σ̂−1 ÛH .

This is called the Moore-Penrose pseudo-inverse of X̂. It satisfies the projection properties:

X̂X̂† = ÛÛH = Pc ,  X̂† X̂ = V̂V̂H = Pr

where Pc is a projection onto the dominant column span of X, and Pr a projection onto the dominant row span.
The largest singular value of the Moore-Penrose (truncated) pseudo-inverse is 1/σd , whereas without truncation it was 1/σn . This gives a way to control the norm of the inverse, by inverting only dominant directions in X, and projecting away the other dimensions.


This pseudo-inverse (matlab: pinv) is commonly used if we are not sure whether a matrix is full rank. Typically, we compare the singular values of X to a threshold (ε) and replace them by 0 if they are below the threshold, leading to X̂. Next, we compute X̂† by inverting the non-zero singular values.

5.4.6 The condition number


The condition number of X is defined by

c(X) := σ1 /σn

Thus, we always have c(X) ≥ 1. If it is large, then X is hard to invert (and X† is sensitive to small changes). The smallest condition number for a matrix is c = 1, which is achieved for an orthogonal matrix.
When we compute the inverse of a matrix, its condition number is very important. Indeed, the condition number gives the relative sensitivity of the solution of a linear system of equations. Let us suppose that we wish to solve a system of equations Ax = b, where we take A : n × n square. We have

Ax = b ⇒ x = A−1 b

Now, if we perturb the data vector b by a noise vector e, we obtain

b′ = b + e ⇒ x′ = x + A−1 e

Define σ1 = ‖A‖ and 1/σn = ‖A−1 ‖, and use ‖Ax‖ ≤ ‖A‖ ‖x‖. Then

‖A−1 e‖ ≤ (1/σn ) ‖e‖ ,  ‖b‖ ≤ σ1 ‖x‖ ,

so that

‖x′ − x‖/‖x‖ ≤ (1/σn ) ‖e‖/‖x‖ ≤ (σ1 /σn ) ‖e‖/‖b‖

This measures the relative change in the solution vector x, and shows that any error in b is potentially magnified by a factor equal to the condition number.
If a matrix has a poor condition number, the usual strategy is not to invert it directly, but to
do a rank reduction to rank d, and compute the pseudo-inverse as shown before. This will avoid
noise enhancement due to the inversion of non-important components.

5.5 PSEUDO-INVERSE AND THE LEAST SQUARES PROBLEM

5.5.1 The pseudo-inverse


Consider a rank-d M × N matrix X. In general, since X may be rank-deficient or non-square,
the inverse of X does not exist; i.e., for a given vector b, we cannot always find a vector a such
that b = Xa.


If X is tall but of full rank, the pseudo-inverse of X is X† = (XH X)−1 XH . It satisfies

X† X = IN ,  XX† = Pc .

Thus, X† is an inverse on the “short space”, and XX† is a projection onto the column span of X. It is easy to verify that the solution to b = Xa is given by a = X† b.
If X is rank deficient, then XH X is not invertible, and there is no exact solution to b = Xa. In this case, we can resort to the Moore-Penrose pseudo-inverse of X, also denoted by X† . It can be defined in terms of the “economy size” SVD X = ÛΣ̂V̂H (equation (5.18)) as

X† = V̂Σ̂−1 ÛH .

This pseudo-inverse satisfies the properties

1. XX† X = X    3. XX† = Pc
2. X† XX† = X†   4. X† X = Pr

which constitute the Moore-Penrose inverse in the traditional way.


These equations show that, in order to make the problem b = Xa solvable, a solution can be forced to an approximate problem by projecting b onto the column space of X:

b′ = Pc b ,

after which b′ = Xa has solution

a = X† b′ .

The projection is in fact implicitly done by just taking a = X† b: from properties 1 and 3 of the list above, we have that

a = X† b′ = X† XX† b = X† b
It can be shown that this solution a is the solution of the (Least Squares) minimization problem

min_a ‖b − Xa‖2 ,

where a is chosen to have minimal norm if there is more than one solution (the latter requirement
translates to a = Pr a).
Some other properties of the pseudo-inverse are

• The norm of X† is ‖X†‖ = 1/σd .
• The condition number of X is c(X) := σ1 /σd . If it is large, then X is hard to invert (X† is sensitive to small changes).


5.5.2 Total Least Squares


Now, suppose that instead of a single vector b we are given an (M × N )-dimensional matrix
Y, the columns of which are not all in the column space of the matrix X. We want to force
solutions to XA = Y. Clearly, we can use a least squares approximation Ŷ = PX Y to force
the columns of Ŷ to be in the d-dimensional column space of X. This is reminiscent of the LS
application above, but just one way to arrive at X and Y having a common column space, in
this case by only modifying Y. There is another way, called Total Least Squares (TLS) which
is effectively described as projecting both X and Y onto some subspace that lies between them,
and that is “closest” to the column spaces of the two matrices. To implement this method, we
compute the SVD

[X Y] = [Û Û⊥ ] Σ [V̂ V̂⊥ ]H = ÛΣ̂ [V̂1H V̂2H ] + Û⊥ Σ̂⊥ (V̂⊥ )H


and define the projection Pc = ÛÛH . We now take the TLS (column space) approximations
to be X̂ = Pc X = ÛΣ̂V̂1H and Ŷ = Pc Y = ÛΣ̂V̂2H , where V̂1 and V̂2 are the partitions of V̂
corresponding to X and Y respectively. X̂ and Ŷ have the same column span defined by Û,
and are in fact solutions to
min_{[X̂ Ŷ] of rank N} ‖[X Y] − [X̂ Ŷ]‖F2

and A satisfying X̂A = Ŷ is obtained as A = X̂† Ŷ. This A is the TLS solution of XA ≈ Y.
Instead of asking for rank N , we might even insist on a lower rank d.
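A numerical sketch of this procedure, for the simple case of a single right-hand side y (so the rank constraint equals the number of columns n of X; the noise level 0.01 is an arbitrary choice for illustration):

```python
# Total Least Squares via the SVD of the stacked matrix [X y].
import numpy as np

rng = np.random.default_rng(5)
m, n = 20, 3
X0 = rng.standard_normal((m, n))
a0 = rng.standard_normal(n)
X = X0 + 0.01 * rng.standard_normal((m, n))      # noise on X as well as on y
y = X0 @ a0 + 0.01 * rng.standard_normal(m)

U, s, Vh = np.linalg.svd(np.c_[X, y], full_matrices=False)
Pc = U[:, :n] @ U[:, :n].conj().T                # projector onto the dominant column space
Xh, yh = Pc @ X, Pc @ y                          # TLS approximations of X and y
a_tls = np.linalg.pinv(Xh) @ yh
print(a_tls - a0)                                # small: close to the true coefficients
```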

5.6 THE EIGENVALUE PROBLEM

The eigenvalue problem for a matrix A is


Ax = λx ⇔ (A − λI)x = 0 (5.23)
Any λ that makes A − λI singular is called an eigenvalue, the corresponding x is the eigenvector
(invariant vector). Its norm is arbitrary, and is usually set equal to 1.
We can collect the eigenvectors in a matrix:

A[x1 x2 · · ·] = [x1 x2 · · ·] diag(λ1 , λ2 , · · ·) ⇔ AT = TΛ

It is common to normalize the eigenvectors (columns of T) to have unit norm.
A regular matrix A has an eigenvalue decomposition:

A = TΛT−1 ,  T invertible ,  Λ diagonal (5.24)


This decomposition might not exist if eigenvalues are repeated. A classical example of a matrix that does not have an eigenvalue decomposition is

A = [ 0 1
      0 0 ] .

5.6.1 Schur decomposition

Suppose T has a QR factorization T = QRT , so that T−1 = RT−1 QH . Then

A = QRT ΛRT−1 QH = QRQH .

The factorization

A = QRQH ,

with Q unitary and R upper triangular, is called a Schur decomposition. One can show that this
decomposition always exists (although it is not unique); if A is hermitian, then R = RH , and an upper triangular hermitian matrix must be diagonal, so in this case the Schur decomposition coincides with the eigenvalue decomposition (and the SVD). For non-hermitian A, the Schur decomposition avoids
the inversion of the eigenvalue matrix T, which might be ill-conditioned (or even non-invertible
in some cases).
R has the eigenvalues of A on the diagonal. Q gives information about “eigen-subspaces”
(invariant subspaces), but doesn’t contain eigenvectors.

5.6.2 Connection to the SVD

Suppose we compute the SVD of a matrix X, and then consider XXH :

X = UΣVH ⇒ XXH = UΣVH VΣUH = UΣ2 UH = UΛUH

This shows that the eigenvalues of XXH are the singular values of X, squared (hence real). The
eigenvectors of XXH are equal to the left singular vectors of X (hence U is unitary). Since the
SVD always exists, the eigenvalue decomposition of XXH always exists (in fact it exists for any
Hermitian matrix C = CH ).
Historically, the SVD was derived out of frustration that the eigenvalue decomposition does
not always exist. By generalizing the eigenvector matrix T to two unitary matrices U, V, a
decomposition was found that does always exist. Despite this connection, the decompositions
are generally different and have different applications.


5.7 THE GENERALIZED EIGENVALUE DECOMPOSITION

For two matrices A and B, the generalized eigenvalue problem is

Ax = λBx ⇔ (A − λB)x = 0

This is a generalization of (5.23) where we had B = I. The solutions λi are called the generalized
eigenvalues. The set of matrices A − λB (for any λ) or the pair (A, B) is called a matrix pencil.
In the above formulation, A and B have the same size but could be rectangular. If A is square
and B is invertible, then we can immediately return to the usual eigenvalue decomposition, by
considering
B−1 Ax = λx
Thus, the generalized eigenvalues of a pair (A, B) are the eigenvalues of B−1 A. Generally, we
try to avoid inverting B, for numerical reasons, or because A and B might have structure that
is otherwise lost. E.g., if B is banded, its inverse is in general not banded.
As in (5.24), we can collect the eigenvectors in a matrix T, such that

AT = BTΛ

where Λ is diagonal. We can also write the solution as a joint matrix decomposition

A = FΛA T−1 ,  B = FΛB T−1 (5.25)

where the diagonal entries of ΛB−1 ΛA are the generalized eigenvalues, and F is an (invertible?) matrix with unit-norm columns. Indeed, from AT = BTΛ, after having found T we can set W = AT, and then normalize the columns of W to find W = FΛA . This decomposition is called the Generalized Eigenvalue Decomposition (GEV).
The form (5.25) shows an application of the GEV, namely a joint diagonalization of two matrices
(A, B). This application is studied in more detail in Chap. 9.
The existence of this decomposition is similar to that of the eigenvalue decomposition: in many
cases, it does not exist, and/or F and T are not invertible. However, if A and B are hermitian
and one of them (typically B) is positive definite, the decomposition exists. For more general
cases, numerical algorithms are more complicated and might run into problems.
Just as the SVD is connected to the eigenvalue decomposition, the generalized eigenvalue de-
composition leads to a Generalized SVD (GSVD):
A = FΣA UH ,  B = FΣB VH (5.26)

where U, V are unitary, ΣA , ΣB are diagonal and nonnegative real, and F is an invertible matrix
with unit-norm columns. This decomposition always exists; the matrices A and B do not need


to be square, they can even have a different number of columns. Note that if B is square and
invertible, then B−1 A = VΣB−1 ΣA UH constitutes an SVD of B−1 A.

Actually, various definitions of the GSVD exist. In the usual formulation [2],

A = UΣA X−1 ,  B = VΣB X−1

where U, V are unitary, X is invertible, and ΣA2 + ΣB2 = I, which gives a connection to the CS decomposition (“cosine-sine”). However, this formulation has problems in case there is a vector x in the nullspace of both A and B. In our applications, we often have that case. Therefore, we will use (5.26).
The Generalized Schur Decomposition (GSD), also called the QZ decomposition, is

A = QRA ZH ,  B = QRB ZH (5.27)

where Q, Z are unitary, and RA , RB are upper triangular. This decomposition always exists. It follows from the GEV (5.25) by inserting a QR decomposition for F and another one for T−1 . The generalized eigenvalues of (A, B) are found as the ratios of the diagonal entries of RA and RB . The advantage of this decomposition is that it is more stable to compute as it involves only unitary matrices. This facilitates its computation using 2 × 2 Givens rotations. Generally, the QZ algorithm is used: the core of this consists of an iteration where the QR decomposition of B−1 A is implicitly computed, without forming the product.
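In practice one rarely forms B−1A explicitly; library routines solve the pencil directly. A sketch using scipy: scipy.linalg.eig accepts a pair (A, B), and scipy.linalg.qz computes the QZ form (output='complex' requests triangular rather than quasi-triangular factors, so that the generalized eigenvalues can be read off the diagonals):

```python
# Generalized eigenvalues and the QZ (generalized Schur) decomposition.
import numpy as np
from scipy.linalg import eig, qz

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))
C = rng.standard_normal((4, 4))
B = C @ C.T + 4 * np.eye(4)                   # B hermitian positive definite

lam, T = eig(A, B)                            # solves A t = lambda B t
print(np.allclose(A @ T, B @ (T * lam)))      # check the eigenpairs

AA, BB, Q, Z = qz(A, B, output='complex')     # A = Q AA Z^H, B = Q BB Z^H
print(np.allclose(A, Q @ AA @ Z.conj().T))
print(np.sort(np.abs(lam)))                   # same as |diag(AA)/diag(BB)|, sorted:
print(np.sort(np.abs(np.diag(AA) / np.diag(BB))))
```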

5.8 NOTES

A widely-used reference book on linear algebra is Golub & Van Loan [2]. More advanced
properties are found in Horn and Johnson [3]. An extensive tutorial to linear algebra in relation
to signal processing can be found in Moon and Stirling [1].

Bibliography

[1] T. K. Moon and W. C. Stirling, Mathematical methods and algorithms for signal processing.
Prentice Hall, 2000.

[2] G. Golub and C. Van Loan, Matrix Computations. The Johns Hopkins University Press,
1989.

[3] R. Horn and C. Johnson, Matrix Analysis. Cambridge, NY: Cambridge Univ. Press, 1985.



Part II

METHODS AND ALGORITHMS



Chapter 6

SPATIAL PROCESSING TECHNIQUES

Contents
6.1 Deterministic approach to Matched and Wiener filters . . . . . . . . 114

6.2 Stochastic approach to Matched and Wiener filters . . . . . . . . . . 118

6.3 Other interpretations of Matched Filtering . . . . . . . . . . . . . . . 122

6.4 Prewhitening filter structure . . . . . . . . . . . . . . . . . . . . . . . 128

6.5 Eigenvalue analysis of Rx . . . . . . . . . . . . . . . . . . . . . . . . . . 131

6.6 Beamforming and direction estimation . . . . . . . . . . . . . . . . . 134

6.7 Applications to temporal matched filtering . . . . . . . . . . . . . . . 138

In this chapter, we look at elementary receiver schemes: the matched filter and Wiener filter in
their non-adaptive forms. They are suitable if we have a good estimate of the channel, or if we
know a segment of the transmitted data, e.g., because of a training sequence. These receivers
are most simple in the context of narrowband antenna array processing, and hence we place the
discussion first in this scenario. The matched filter is shown to maximize the output signal-to-noise ratio (in the case of a single signal in noise), whereas the Wiener receiver maximizes the output signal-to-interference-plus-noise ratio (in the case of several sources in noise). We also
look at the application of these receivers as non-parametric beamformers for direction-of-arrival
estimation. Improved accuracy is possible using parametric data models and subspace-based
techniques: a prime example is the MUSIC algorithm.

General references to this chapter are [1–7].


6.1 DETERMINISTIC APPROACH TO MATCHED AND WIENER FILTERS

6.1.1 Data model and assumptions


In this chapter, we consider a simple array signal processing model of the form
xk = ∑_{i=1}^{d} ai si,k + nk = Ask + nk . (6.1)

We assume that signals are received by M antennas, and that the antenna outputs (after de-
modulation, sampling, A/D conversion) are stacked into vectors xk . According to the model,
xk is a linear combination of d narrowband source signals si,k and noise nk . Initially, we will
consider an even simpler case where there is only one signal in noise. In all cases, we assume
that the noise covariance matrix
Rn := E[nnH ]
is known, up to a scalar which represents the noise power. The most simple situation is spatially
white noise, for which
Rn = σ 2 I .

Starting from the data model (6.1), let us assume that we have collected N sample vectors. If
we store the samples in an M × N matrix X = [x1 , · · · , xN ], then we obtain that X has a
decomposition
X = AS + N (6.2)
where the rows of S ∈ ℂd×N contain the samples of the source signals. Note that we can choose
to put the source powers in either A or S, or even in a separate factor B. Here we will assume
they are absorbed in A, thus the sources have unit powers. Sources may be considered either
stochastic (with probability distributions) or deterministic. If they are stochastic, we assume
they are zero mean, independent and hence uncorrelated,
E[sk skH ] = I .

If they are considered deterministic (E[sk ] = sk ), we will assume similarly that

limN→∞ (1/N) SSH = I .

The objective of beamforming is to construct a receiver weight vector w such that the output

yk = wH xk = ŝk (6.3)

is an estimate of one of the original sources. Which beamformer is “the best” depends on the optimality criterion, of which there are many. It also makes a difference if we wish to receive only a single signal, as in (6.3), or all d signals jointly,

yk = WH xk = ŝk (6.4)


where W = [w1 , · · · , wd ].
We will first look at purely deterministic techniques to estimate the beamformers: here no
explicit statistical assumptions are made on the data. The noise is viewed as a perturbation on
the noise-free data AS, and the perturbations are assumed to be small and equally important
on all entries. Then we will look at more statistically oriented techniques. The noise will be
modeled as a stochastic sequence with a joint Gaussian distribution. Still, we have a choice
whether we consider sk to be a deterministic sequence (known or unknown), or if we associate a
probabilistic distribution to it, for example Gaussian or belonging to a certain alphabet such as
{+1, −1}. In the latter case, we can often improve on the linear receiver (6.3) or (6.4) by taking
into account that the output of the beamformer should belong to this alphabet (or should have
a certain distribution). The resulting receivers will then contain some non-linear components.
In this chapter, we only consider the most simple cases, resulting in the classical linear beam-
formers.

6.1.2 Algebraic (purely deterministic) approach


Noiseless case Let us first consider the noiseless case, and a situation where we have collected
N samples. Our data model thus is
X = AS .
Our objective will be to construct a linear beamforming matrix W such that
WH X = S .

We consider two cases:

1. A is known, for example we know the directions of the sources and have set A =
[a(θ1 ) · · · a(θd )],
2. S is known, for example we have selected a segment of the data which contains a training
sequence for all sources. Alternatively, for discrete alphabet sources (e.g., Sij ∈ {±1}) we
can be in this situation via decision feedback.

In both cases, the problem is easily solved. If A is known, then we set

WH = A† ,  S = WH X .

Here, A† is the Moore-Penrose pseudo-inverse of A. If M ≥ d and the columns of A are linearly


independent, then A† is equal to the left inverse

A† = (AH A)−1 AH .

Note that, indeed, under these assumptions AH A is invertible and A† A = I. If M < d then we
cannot recover the sources exactly: AH A is not invertible (it is a d × d matrix with maximal
rank M ), so that A† A ≠ I.


If S is known, then we take


WH = SX† ,  A = (WH )† .

where X† is a right inverse of X. If N ≥ d and the rows of X are linearly independent,¹ then

X† = XH (XXH )−1 .

This is verified by XX† = I. In both cases, we obtain a beamformer which exactly cancels all
interference, i.e., WH A = I.
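Both noiseless cases are easily tried out in a simulation (a sketch with hypothetical dimensions M = 5, d = 3; note that np.linalg.pinv handles the rank-deficient noiseless X in the second case, cf. the footnote on the row rank of X):

```python
# Noiseless beamforming: exact source recovery with A known or S known.
import numpy as np

rng = np.random.default_rng(7)
M, d, N = 5, 3, 100
A = rng.standard_normal((M, d)) + 1j * rng.standard_normal((M, d))
S = rng.choice([-1.0, 1.0], size=(d, N))        # e.g. BPSK-like unit-power sources
X = A @ S

WH = np.linalg.pinv(A)                          # case 1: A known, W^H = A-dagger
print(np.allclose(WH @ X, S))

WH2 = S @ np.linalg.pinv(X)                     # case 2: S known, W^H = S X-dagger
print(np.allclose(WH2 @ X, S))
```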

Noisy case In the presence of additive noise, we have X = AS + N. Two types of linear least-
squares (LS) minimization problems can now be considered. The first is based on minimizing
the model fitting error,

minS ‖X − AS‖F2 , or minA ‖X − AS‖F2 (6.5)

with A or S known, respectively. The second type of minimization problem is based on mini-
mizing the output error,

minS ‖WH X − S‖F2 , or minW ‖WH X − S‖F2 , (6.6)

also with A or S known, respectively. The minimization problems are straightforward to solve,
and in the same way as before.

Deterministic model matching For (6.5) with A known we obtain

Ŝ = arg minS ‖X − AS‖F2 ⇒ Ŝ = A† X , (6.7)

so that again WH = A† . This is known as the Zero-Forcing solution, because WH A = I: all


interfering sources are canceled. As will be shown later, the ZF beamformer maximizes the
Signal-to-Interference power Ratio (SIR) at the output: in this case (known A) it is infinity.
Note however that
WH X = S + A† N .

The noise contribution at the output is A† N, and if A† is large, the output noise will be large.
To get a better insight for this, introduce the “economy-size” singular value decomposition of
A,
A = UA ΣA VAH
where we take UA : m×d with orthonormal columns, ΣA : d×d diagonal containing the nonzero
singular values of A, and VA : d × d unitary. Since

A = UA ΣA VAH ⇒ A† = VA ΣA−1 UAH ,
¹In the present noiseless case, note that there are only d linearly independent rows in S and X, so for linear independence of the rows of X we need M = d. With noise, X will have full row rank M .


A† is large if ΣA−1 is large, i.e., if A is ill-conditioned.
Similarly, for (6.5) with S known we obtain

Â = arg minA ‖X − AS‖F2 ⇒ Â = XS† = XSH (SSH )−1 . (6.8)

This does not specify the beamformer, but staying in the same context of minimizing ‖X − AS‖F2 , it is natural to take again a Zero-Forcing beamformer, so that WH = Â† . Asymptotically, for zero-mean noise independent of the sources, this gives Â → A: we converge to the true A-matrix.

Example 6.1. The ZF beamformer satisfies WH A = I. Let w1 be the first column of


W, it is the beamformer to receive the first signal. Then
WH A = I ⇒ w1H [a2 , · · · , ad ] = [0 , · · · , 0]

so that
w1 ⊥ {a2 , · · · , ad } .
Thus, w1 projects out all other sources, except source 1,
w1H x(t) = ∑_{i=1}^{d} w1H ai si (t) + w1H n(t) = s1 (t) + w1H n(t) .

The effect on the noise is not considered. In ill-conditioned cases (A is ill-conditioned so that its inverse W may have large entries), w1 might give a large amplification of the noise.

Deterministic output error minimization The second optimization problem (6.6) minimizes
the difference of the output signals to S. For known S, we obtain

WH = arg minW ‖WH X − S‖F2 = SX† . (6.9)

Note that X† = XH (XXH )−1 , so that

WH = (1/N) SXH ( (1/N) XXH )−1 = R̂xsH R̂x−1 ,  i.e.,  W = R̂x−1 R̂xs .

R̂x := (1/N) XXH is the sample data covariance matrix, and R̂xs := (1/N) XSH is the sample correlation between the sources and the received data.
With known A, note that we cannot solve the minimization problem (6.6) since we can fit any S.
We have to put certain assumptions on S, for example the fact that the rows of S (the signals)
are statistically independent from each other and the noise, and hence for large N
(1/N) SSH → I ,  (1/N) SNH → 0


(we assumed that the source powers are incorporated in A), so that

R̂xs = (1/N) XSH = (1/N) ASSH + (1/N) NSH → A .

Asymptotically,
W → Rx−1 A ,

where Rx = E[xxH ] is the true data covariance matrix.² With finite samples, we would set

W = R̂x−1 A .

This is known as the Linear Minimum Mean Square Error (LMMSE) or Wiener receiver. This
beamformer maximizes the Signal-to-Interference-plus-Noise Ratio (SINR) at the output. Since
it does not cancel all interference, WH A ≠ I, the output source estimates are not unbiased.
However, it produces estimates of S with minimal deviation, which is often more relevant.
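A small simulation comparing the two beamformers (a sketch; unit-power ±1 sources in white noise, with arbitrary illustrative dimensions). The Wiener beamformer W = R̂x−1 A should give the smaller output error, while the ZF beamformer cancels the interference exactly:

```python
# Sample-based Zero-Forcing versus Wiener beamforming.
import numpy as np

rng = np.random.default_rng(8)
M, d, N, sigma = 6, 2, 2000, 0.5
A = rng.standard_normal((M, d)) + 1j * rng.standard_normal((M, d))
S = rng.choice([-1.0, 1.0], size=(d, N))
Noise = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
X = A @ S + sigma * Noise

W_zf = np.linalg.pinv(A).conj().T            # W^H = A-dagger
Rx = X @ X.conj().T / N
W_wiener = np.linalg.solve(Rx, A)            # W = Rx^{-1} A

for name, W in [("ZF", W_zf), ("Wiener", W_wiener)]:
    err = W.conj().T @ X - S
    print(name, np.mean(np.abs(err) ** 2))   # Wiener has the smaller MSE
```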

6.2 STOCHASTIC APPROACH TO MATCHED AND WIENER FILTERS

6.2.1 Performance criteria

Let us now define some performance criteria, based on elementary stochastic assumptions on
the data. For the case of a single signal in noise,
xk = ask + nk ,  yk = wH xk = (wH a)sk + (wH nk ) .

We make the assumptions

E[|sk |2 ] = 1 ,  E[sk nkH ] = 0 ,  E[nk nkH ] = Rn ,

so that
E[|yk |2 ] = (wH a)(aH w) + wH Rn w .

The Signal to Noise Ratio (SNR) at the output can then be defined as

SNRout (w) = E[|(wH a)sk |2 ] / E[|wH nk |2 ] = (wH aaH w)/(wH Rn w) .

With d signals (signal 1 of interest, the others considered interferers), we can write

xk = Ask + nk = a1 s1,k + A′ s′k + nk ,  yk = wH xk = (wH a1 )s1,k + wH A′ s′k + (wH nk ) ,

²We thus see that even if we adopt a deterministic framework, we cannot avoid making certain stochastic assumptions on the data.


where A′ contains the columns of A except for the first one, and similarly for s′k . Now we can define two criteria: the Signal to Interference Ratio (SIR), and the Signal to Interference plus Noise Ratio (SINR):

sir1 (w) := wH (a1 a1H )w / (wH A′ A′H w) = wH (a1 a1H )w / (wH (AAH − a1 a1H )w)
sinr1 (w) := wH (a1 a1H )w / (wH (A′ A′H + Rn )w) = wH (a1 a1H )w / (wH (AAH − a1 a1H + Rn )w) (6.10)
For the Zero-Forcing receiver, we have by definition (for known A)

WH A = I ⇒ w1H A = [1, 0, · · · , 0] ⇒ w1H a1 = 1 ,  w1H A′ = [0, · · · , 0] ,

and it follows that sir1 (w1 ) = ∞. When W is estimated from a known S, the ZF receiver still
maximizes the SIR, but it is not infinity anymore.
Note that (6.10) defines only the performance with respect to the first signal. If we want to
receive all signals, we need to define a performance vector, with entries for each signal,
SIR(W) := [sir1 (w1 ) · · · sird (wd )]
SINR(W) := [sinr1 (w1 ) · · · sinrd (wd )] .
In graphs, we would usually plot only the worst performance of each vector, or the average of
each vector.
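The definitions (6.10) translate directly into code. A sketch for evaluating sinr1(w) for a given beamformer (the helper name sinr1 and the test dimensions are arbitrary):

```python
# Output SINR of a beamformer w for source 1, following (6.10).
import numpy as np

def sinr1(w, A, Rn):
    a1 = A[:, [0]]
    Rs = a1 @ a1.conj().T                    # desired-signal covariance
    Ri = A @ A.conj().T - Rs + Rn            # interference-plus-noise covariance
    return ((w.conj().T @ Rs @ w) / (w.conj().T @ Ri @ w)).real.item()

rng = np.random.default_rng(9)
M, d, sigma2 = 6, 3, 0.1
A = rng.standard_normal((M, d)) + 1j * rng.standard_normal((M, d))
Rn = sigma2 * np.eye(M)

w_wiener = np.linalg.solve(A @ A.conj().T + Rn, A[:, [0]])   # Rx^{-1} a1
w_zf = np.linalg.pinv(A).conj().T[:, [0]]                    # first ZF beamformer
print(sinr1(w_wiener, A, Rn), sinr1(w_zf, A, Rn))            # Wiener SINR >= ZF SINR
```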

6.2.2 Stochastic derivations (white noise)


We now show how the same ZF and Wiener receivers can be derived when starting from a
stochastic formulation, but considering the signals deterministic.

Stochastic model matching Assume a model with d sources,

xk = Ask + nk (k = 1, · · · , N ) ⇔ X = AS + N .

Suppose that sk is deterministic, and that the noise samples are independent and identically distributed in time (temporally white), spatially white (Rn = σ 2 I), and jointly complex Gaussian distributed, so that nk has a probability density

nk ∼ CN (0, σ 2 I) ⇔ p(nk ) = (1/(√π σ)) e−‖nk ‖²/σ² .
Because of temporal independence, the probability distribution of N samples is the product of
the individual probability distributions,
p(N) = ∏_{k=1}^{N} (1/(√π σ)) e−‖nk ‖²/σ² .


Since nk = xk −Ask , the probability to receive a certain vector xk (with a known or deterministic
sk ) is thus
p(xk |sk ) = (1/(√π σ)) e−‖xk −Ask ‖²/σ²

and hence

p(X|S) = ∏_{k=1}^{N} (1/(√π σ)) e−‖xk −Ask ‖²/σ² = (1/(√π σ))N e−∑k ‖xk −Ask ‖²/σ² = (1/(√π σ))N e−‖X−AS‖F²/σ² .

p(X|S) is called the likelihood of receiving a certain data matrix X, for a certain transmitted data
matrix S. It is of course a probability density function, but in the likelihood interpretation we
regard it as a function of S, for an actual received data matrix X. The Deterministic Maximum
Likelihood technique estimates S as that matrix that maximizes the likelihood of having received
the actual received X, thus
Ŝ = arg maxS (1/(√π σ))N e−‖X−AS‖F²/σ² . (6.11)

If we take the negative logarithm of p(X|S), we obtain what is called the negative log-likelihood function. Since the logarithm is a monotonically increasing function, taking it does not change the location of the maximum. The maximization problem then becomes a minimization of const + ‖X − AS‖F2 /σ 2 , or

Ŝ = arg minS ‖X − AS‖F2 . (6.12)

This is the same model fitting problem as we had before in (6.5). Thus, the deterministic ML
problem is equivalent to the LS model fitting problem in the case of white Gaussian noise.

Stochastic output error minimization In a statistical framework, the output error problem
(6.6) becomes
minw E[|wH xk − sk |2 ] .

The cost function is known as the Linear Minimum Mean Square Error. It can be worked out
as follows:
J(w) = E[|wH xk − sk |2 ]
= wH E[xxH ]w − wH E[xs̄k ] − E[sk xH ]w + E[|sk |2 ] .
At this point, note that there is a question whether we regard sk as a stochastic variable or
deterministic. If sk is stochastic with E[|sk |2 ] = 1, then
J(w) = wH Rx w − wH a − aH w + 1 .

If sk is deterministic, then J = Jk depends on sk , and we need to work with an average over N


samples, J̄ = (1/N) ∑k Jk . For large N and i.i.d. assumptions on sk , the result will be the same.


Now differentiate with respect to w. This is a bit tricky since w is complex and functions
of complex variables may not be differentiable (a simple example of a non-analytic function is
f (z) = z̄). There are various approaches (e.g. [1, 8]). A consistent approach is to regard w
and w∗ as independent variables. Let w = u + jv with u and v real-valued, then the complex
gradients to w and w∗ are defined as [8]
∇w J = (1/2)(∇u J − j ∇v J) ,  ∇w∗ J = (1/2)(∇u J + j ∇v J) ,

where ∇u J = [∂J/∂u1 , · · · , ∂J/∂ud ]T and ∇v J = [∂J/∂v1 , · · · , ∂J/∂vd ]T ,
with properties
∇w a w = a∗ , ∇w w Rw = R w∗
H H H T
∇w w a = 0 ,
∇∗w w a = a , ∇∗w a w = 0 , ∇∗w w Rw = Rw
H H H

It can further be shown that at a stationary point, it is necessary and sufficient that either ∇_w J = 0 or ∇_{w*} J = 0: the two conditions are equivalent. Since the latter expression is simpler, and because it specifies the direction of maximal rate of change, we keep from now on the definition of the gradient

    ∇J(w) ≡ ∇_{w*} J(w) ,   (6.13)

and we obtain

    ∇J(w) = Rx w − a .

The minimum of J(w) is attained for

    ∇J(w) = 0   ⇒   w = Rx^{−1} a .

We thus obtain the Wiener receiver.


The LMMSE cost function is also called Minimum Variance. This is in fact a misnomer: the expression is not really a variance, because the error has E[wᴴ xk − sk] ≠ 0. In fact, for the Wiener receiver, a single signal in noise, and sk considered deterministic (E[sk] = sk),

    E[yk] = E[wᴴ xk]
          = E[aᴴ Rx^{−1} (a sk + nk)]
          = aᴴ Rx^{−1} a sk
          = aᴴ (aaᴴ + σ²I)^{−1} a sk
          = aᴴa (aᴴa + σ²)^{−1} sk
          = (aᴴa / (aᴴa + σ²)) sk .

Thus, the expected value of the output is not sk, but a scaled-down version of it.
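A small Python/NumPy sketch of the Wiener receiver (the steering vector, noise level, and sample size are illustrative assumptions), showing the output scaling aᴴa/(aᴴa + σ²) derived above:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, sigma = 5, 100000, 1.0
a = np.exp(2j * np.pi * 0.3 * np.arange(M))       # example array response, |a_i| = 1
s = np.sign(rng.standard_normal(N)) + 0j          # unit-power source
n = sigma * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
X = np.outer(a, s) + n

Rx = X @ X.conj().T / N                           # sample covariance
w = np.linalg.solve(Rx, a)                        # Wiener receiver w = Rx^{-1} a
y = w.conj() @ X

# E[y s*] ~ a^H a / (a^H a + sigma^2): the output is a scaled-down copy of s
print(np.mean(y * s.conj()).real, M / (M + sigma**2))
```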


6.2.3 Colored noise


Let us now see what changes in the above when the noise is not white, but has covariance matrix

    E[nk nkᴴ] = Rn .

We assume that Rn is known. In that case, we can prewhiten the data with a square-root factor Rn^{−1/2}:

    xk = A sk + nk   ⇒   Rn^{−1/2} xk = (Rn^{−1/2} A) sk + Rn^{−1/2} nk ,

or x̲k = A̲ sk + n̲k, where the underscore denotes prewhitened quantities. Note that now

    Rn̲ = E[n̲k n̲kᴴ] = Rn^{−1/2} Rn Rn^{−1/2} = I

so that the noise n̲k is white. At this point, we are back on familiar ground. The ZF equalizer becomes

    ŝk = A̲† x̲k = (A̲ᴴ A̲)^{−1} A̲ᴴ x̲k = (Aᴴ Rn^{−1} A)^{−1} Aᴴ Rn^{−1} xk
    ⇒   W = Rn^{−1} A (Aᴴ Rn^{−1} A)^{−1} .   (6.14)

The Wiener receiver, on the other hand, will be the same, since Rn is not used at all in its derivation. This can also be checked:

    W̲ = Rx̲^{−1} A̲ = (Rn^{−1/2} Rx Rn^{−1/2})^{−1} Rn^{−1/2} A = Rn^{1/2} Rx^{−1} A
    ⇒   W = Rn^{−1/2} W̲ = Rx^{−1} A .
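The closed form (6.14) can be checked numerically against the prewhitening route. The sketch below (a Python/NumPy sketch; A and Rn are arbitrary synthetic matrices) builds Rn^{−1/2} from an eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(2)
M, d = 6, 2
A = rng.standard_normal((M, d)) + 1j * rng.standard_normal((M, d))
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Rn = B @ B.conj().T + np.eye(M)                   # an arbitrary noise covariance

# inverse square-root factor Rn^{-1/2} via the eigendecomposition of Rn
lam, U = np.linalg.eigh(Rn)
Rn_isqrt = U @ np.diag(lam**-0.5) @ U.conj().T

# ZF computed on prewhitened data vs. the closed form (6.14)
A_w = Rn_isqrt @ A
W_white = Rn_isqrt @ (A_w @ np.linalg.inv(A_w.conj().T @ A_w))
W_direct = np.linalg.inv(Rn) @ A @ np.linalg.inv(A.conj().T @ np.linalg.inv(Rn) @ A)
print(np.allclose(W_white, W_direct))             # True
```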

6.3 OTHER INTERPRETATIONS OF MATCHED FILTERING

6.3.1 Maximum Ratio Combining


Consider a special case of the previous: a single signal in white noise,

    xk = a sk + nk ,   E[nk nkᴴ] = σ² I .

As we showed before, the ZF beamformer is given by

    w = a (aᴴ a)^{−1} = γ₁ a

where γ₁ is a scalar. Since a scalar multiplication does not change the output SNR, the optimal beamformer for s in this case is given by

    w_MF = a

which is known as a matched filter or a classical beamformer. It is also known as Maximum Ratio Combining (MRC).


With non-white noise,

    xk = a sk + nk ,   E[nk nkᴴ] = Rn ,

we have seen in (6.14) that

    w = Rn^{−1} a (aᴴ Rn^{−1} a)^{−1} = γ₂ Rn^{−1} a .

Thus, the matched filter in non-white noise is

    w_MF = Rn^{−1} a .

We can proceed similarly with the Wiener receiver. In white noise,

    w = Rx^{−1} a
      = (aaᴴ + σ²I)^{−1} a
      = a (aᴴa + σ²)^{−1} ∼ a .

It is equal to a multiple of the matched filter. In colored noise, we whiten to apply the white-noise result (write a̲ = Rn^{−1/2} a):

    w = Rx^{−1} a
      = (aaᴴ + Rn)^{−1} a
      = Rn^{−1/2} (a̲a̲ᴴ + I)^{−1} a̲
      = Rn^{−1/2} a̲ (a̲ᴴa̲ + 1)^{−1}
      ∼ Rn^{−1} a .

This is equal to a multiple of the matched filter for colored noise.


The colored noise case is relevant also for the following reason: with more than one signal, we can write the model as

    xk = A sk + nk = a₁ s_{1,k} + (A′ s′k + nk) .

This is of the form

    xk = a sk + nk ,   Rn = A′A′ᴴ + σ²I ,

where the noise is now not white, but colored due to the contribution of the interfering sources. The conclusion is quite interesting:

    For the reception of a single source out of interfering sources plus noise, the Zero-Forcing receiver, Matched Filter or MRC, w = Rn^{−1} a, and the Wiener receiver, w = Rx^{−1} a, are asymptotically all equal to a scalar multiple of each other, and hence will asymptotically give the same performance.

It should be stressed that this equivalence is only an asymptotic result (large N), because the interfering sources are regarded not as deterministic, but as stochastic. For finite samples, the corresponding receivers

    w_ZF = Rn^{−1} a ,   w_Wiener = R̂x^{−1} a


will be different. Note that Rn is assumed to be known, whereas R̂x is estimated from the
received data.
The above are examples of non-joint receivers: the interference is lumped together with the noise, and there may well be more interferers than antennas. Improved performance may be possible by a joint estimation of the collection of receivers for all sources.

Example 6.2. Consider a single source in white noise:

    x(t) = a(θ) s(t) + n(t) ,   Rn = σ² I .

Suppose the signal has power E[|s|²] = σs². Then

    SNR_in = σs² / σ² .

This is the SNR at each element of the array. Suppose all entries of a(θ) have unit modulus, |a_i(θ)| = 1. With M antennas, a(θ)ᴴ a(θ) = M. If we choose the matched filter, or MRC, i.e., w = a(θ), then

    y(t) = wᴴ x(t) = aᴴa s(t) + aᴴ n(t) = M s(t) + aᴴ n(t)

so that

    SNR_out = M² σs² / (aᴴ σ²I a) = M² σs² / (M σ²) = M · SNR_in .
The factor M is the array gain.
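A quick simulation of Example 6.2 (a Python/NumPy sketch; the direction, powers, and sample count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 8, 200000
theta = 0.2
a = np.exp(1j * np.pi * np.arange(M) * np.sin(theta))   # ULA response, |a_i| = 1
sigma_s, sigma = 1.0, 1.0
s = sigma_s * (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
n = sigma * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
x = np.outer(a, s) + n

y = a.conj() @ x                                  # matched filter / MRC: w = a
snr_in = sigma_s**2 / sigma**2
snr_out = np.var(M * s) / np.var(y - M * s)       # signal part of y is M s(t)
print(snr_out / snr_in)                           # ~ M: the array gain
```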

6.3.2 Maximizing the output SNR


For a single signal in noise, the matched filter w = Rn^{−1} a maximizes the output SNR. This is derived as follows. As in the preceding example, we have

    x(t) = a s(t) + n(t) .

Define E[|s|²] = σs²; then Rx = Rs + Rn, with

    Rs = σs² aaᴴ ,   Rn = E[nnᴴ] .

The output SNR after beamforming is equal to

    SNR_out(w) = (wᴴ Rs w) / (wᴴ Rn w) .

We would like to find the beamformer that maximizes SNR_out, i.e.,

    w = arg max_w (wᴴ Rs w) / (wᴴ Rn w) .


The expression is known as a Rayleigh quotient, and the solution is given by the eigenvalue equation

    Rn^{−1} Rs w = λ_max w .   (6.15)

This can be seen as follows. Suppose that Rn = I; since the quotient is invariant to a scaling of w, we may normalize ‖w‖ = 1, and the problem becomes

    max_{‖w‖=1} wᴴ Rs w .

Introduce an eigenvalue decomposition Rs = UΛUᴴ; then we maximize

    (wᴴU) Λ (Uᴴw) .

Let λ₁ be the largest eigenvalue (the (1,1) entry of Λ); then the maximum is attained by choosing wᴴU = [1 0 ··· 0]. Thus, the optimal w is the eigenvector corresponding to the largest eigenvalue, and satisfies the eigenvalue equation Rs w = λ₁ w. If Rn ≠ I, then we can first whiten the noise to obtain the result in (6.15).

The solution of (6.15) can be found in closed form by inserting Rs = σs² aaᴴ. We obtain

    Rn^{−1} Rs w = λ_max w
    ⇔ σs² Rn^{−1} a aᴴ w = λ_max w
    ⇔ σs² (Rn^{−1/2} a)(aᴴ Rn^{−1/2})(Rn^{1/2} w) = λ_max (Rn^{1/2} w)
    ⇔ σs² a̲ a̲ᴴ w̲ = λ_max w̲
    ⇔ w̲ = a̲ ,   λ_max = σs² a̲ᴴ a̲

(where a̲ = Rn^{−1/2} a and w̲ = Rn^{1/2} w), and it follows that

    w = Rn^{−1/2} w̲ = Rn^{−1} a

which is, as promised, the matched filter in colored noise.
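The following Python/NumPy sketch (synthetic a and Rn, unit source power assumed) checks that w = Rn^{−1}a attains the maximal output SNR, equal to λ_max = aᴴRn^{−1}a:

```python
import numpy as np

rng = np.random.default_rng(4)
M = 6
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Rn = B @ B.conj().T + np.eye(M)
Rs = np.outer(a, a.conj())                        # sigma_s^2 = 1 assumed

def out_snr(w):
    # Rayleigh quotient w^H Rs w / w^H Rn w
    return (w.conj() @ Rs @ w).real / (w.conj() @ Rn @ w).real

w_mf = np.linalg.solve(Rn, a)                     # matched filter in colored noise
trials = [out_snr(rng.standard_normal(M) + 1j * rng.standard_normal(M))
          for _ in range(1000)]
print(out_snr(w_mf) >= max(trials))               # True: no random w does better
print(np.isclose(out_snr(w_mf), (a.conj() @ np.linalg.solve(Rn, a)).real))
```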

6.3.3 LCMV – MVDR – GSC – Capon


A related technique for beamforming is the so-called Linearly constrained Minimum Variance
(LCMV), also known as Minimum Variance Distortionless Response (MVDR), Generalized Side-
lobe Canceling (GSC), and Capon beamforming (in the French literature). In this technique, it is
again assumed that we have a single source in colored noise (this might contain other interferers
as well),
xk = ask + nk .
If a is known, then the idea is to constrain the beamformer w to

    wᴴ a = 1 ,

i.e., we have a fixed response towards the source. The remaining freedom is used to minimize the total output power (“response” or “variance”) after beamforming,

    min_w wᴴ Rx w   such that   wᴴ a = 1 .


Figure 6.1. The Generalized Sidelobe Canceler.

The solution can be found in closed form using Lagrange multipliers and is given by

    w = Rx^{−1} a (aᴴ Rx^{−1} a)^{−1} .

Thus, w is a scalar multiple of the Wiener receiver.


This case may be generalized by introducing a constraint matrix C : M × L (M > L) and an L-dimensional vector f, and asking for Cᴴw = f. The solution to

    min_w wᴴ Rx w   such that   Cᴴ w = f

is given by

    w = Rx^{−1} C (Cᴴ Rx^{−1} C)^{−1} f .
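A minimal Python/NumPy sketch of the MVDR special case (C = a, f = 1); the array responses and source powers are hypothetical illustration values:

```python
import numpy as np

M = 8
ula = lambda th: np.exp(1j * np.pi * np.arange(M) * np.sin(th))
a, a_int = ula(0.1), ula(0.6)                     # desired source and one interferer
Rx = np.outer(a, a.conj()) + 10 * np.outer(a_int, a_int.conj()) + 0.1 * np.eye(M)

# MVDR: w = Rx^{-1} a / (a^H Rx^{-1} a)
Ria = np.linalg.solve(Rx, a)
w = Ria / (a.conj() @ Ria)
print(np.abs(w.conj() @ a))                       # 1: distortionless response
print(np.abs(w.conj() @ a_int))                   # small: interferer suppressed
```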

Generalized Sidelobe Canceler The generalized sidelobe canceler (GSC) represents an alternative formulation of the LCMV problem, which provides insight, is useful for analysis, and can simplify the implementation of the LCMV beamformer. Essentially, it is a technique to convert a constrained minimization problem into an unconstrained form. Suppose we decompose the weight vector w into two orthogonal components, w0 and −v (w = w0 − v), that lie in the range of C and in the null space of Cᴴ, respectively. These subspaces together span the entire space, so this decomposition can represent any w. Since Cᴴv = 0, we must have

    w0 = C (CᴴC)^{−1} f   (6.16)

if w is to satisfy the constraints. (6.16) is the minimum-norm solution to the under-determined system Cᴴw0 = f. The vector v is a linear combination of the columns of an M × (M − L) matrix Cn, v = Cn wn, provided the columns of Cn form a basis for the null space of Cᴴ. The matrix Cn can be obtained from C using any of several orthogonalization procedures, for example the QR factorization or the SVD.

Figure 6.2. The Multiple Sidelobe Canceller: interference is estimated from a reference antenna array and subtracted from the primary antenna x0.

The structure of the beamformer using the weight vector w = w0 − Cn wn is depicted in Fig. 6.1. The choice of w0 and Cn implies that w satisfies the constraints independently of wn, and reduces the LCMV problem to the unconstrained problem

    min_{wn} [w0 − Cn wn]ᴴ Rx [w0 − Cn wn] .

The solution is

    wn = (Cnᴴ Rx Cn)^{−1} Cnᴴ Rx w0 .

The primary advantage of this implementation stems from the fact that the weights wn are
unconstrained and a data independent beamformer w0 is implemented as an integral part of the
adaptive beamformer. The unconstrained nature of the adaptive weights permits much simpler
adaptive algorithms to be employed and the data independent beamformer is useful in situations
where adaptive signal cancellation occurs.
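The GSC construction can be verified against the direct LCMV solution. In the sketch below (a Python/NumPy sketch with synthetic C, f, and Rx), the blocking matrix Cn is taken from the SVD of C:

```python
import numpy as np

rng = np.random.default_rng(6)
M, L = 8, 2
C = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))
f = np.array([1.0 + 0j, 0.5 + 0j])
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Rx = B @ B.conj().T + np.eye(M)

# direct LCMV solution: w = Rx^{-1} C (C^H Rx^{-1} C)^{-1} f
RiC = np.linalg.solve(Rx, C)
w_lcmv = RiC @ np.linalg.solve(C.conj().T @ RiC, f)

# GSC: w0 = C (C^H C)^{-1} f, Cn spans the null space of C^H (from the SVD of C)
w0 = C @ np.linalg.solve(C.conj().T @ C, f)
U = np.linalg.svd(C)[0]
Cn = U[:, L:]                                     # M x (M-L) blocking matrix
wn = np.linalg.solve(Cn.conj().T @ Rx @ Cn, Cn.conj().T @ Rx @ w0)
w_gsc = w0 - Cn @ wn
print(np.allclose(w_lcmv, w_gsc))                 # True
```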

Example 6.3. Reference channels – Multiple sidelobe canceler


A special case of the LCMV is that where there is a primary channel x0 (t), receiving
a signal of interest plus interferers and noise, and a collection of reference antennas
x(t), receiving only interference and noise. For example, in hands-free telephony in a
car, we may have a microphone close to the speaker, and other microphones further
away from the speaker and closer to the engine and other noise sources. Or we may
have a directional antenna (parabolic dish) and an array of omnidirectional antennas.
The objective is to subtract from the primary channel a linear combination of the
reference antennas such that the output power is minimized. If indeed the signal
of interest is not present on the reference antennas, the SINR of this signal will be


improved. (If the signal is present at the reference antennas, then of course it will
be canceled as well!)
Call the primary sensor signal x0 and the reference signal vector x. Then the objective is

    min_w E[ |x0 − wᴴ x|² ] .

The solution of this problem is given by

    w = Rx^{−1} a ,   a := E[x x̄0] .

This technique is called the Multiple Sidelobe Canceler (Applebaum, 1976). It is a special case of the LCMV beamformer, which becomes clear if we construct a joint data vector and weight vector

    x′ = [ x0 ; x ] ,   w′ = [ 1 ; −w ] ,   c = [ 1 ; 0 ] .

The constraint is (w′)ᴴ c = 1.

6.4 PREWHITENING FILTER STRUCTURE

Subspace-based prefiltering In the noise-free case with fewer sources than sensors, X = AS is rank deficient: its rank is d (the number of signals) rather than M (the number of sensors). As a consequence, once we have found a beamformer w such that wᴴX = s, one of the source signals, we can add to w any vector w′ such that w′ᴴX = 0 and obtain the same output. The beamforming solutions are not unique.

The desired beamforming solutions are all in the column span of A. Indeed, any component orthogonal to this span does not contribute to the output. The easiest way to ensure that our solutions are in this span is to perform a dimension-reducing prefiltering. Let F be any M × d matrix such that span(F) = span(A). Then all beamforming matrices W in the column span of A are given by

    W = F W̲

where W̲ is a d × d matrix, nonsingular if the beamformers are linearly independent. We will use the underscore to denote prefiltered variables. Thus, the prefiltered noisy data matrix is

    X̲ := Fᴴ X

with structure

    X̲ = A̲ S + N̲ ,   where A̲ := Fᴴ A ,   N̲ := Fᴴ N .

X̲ has only d channels, and is such that W̲ᴴ X̲ = Wᴴ X. Thus, the columns of W̲ are d-dimensional beamformers on the prefiltered data X̲, and for any choice of W̲ the columns of the effective beamformer W are all in the column span of A, as desired.


Figure 6.3. Beamforming prefiltering structure: the data xk passes through a noise-whitening stage Rn^{−1/2} and a subspace filter Σ̂s^{−1} Ûsᴴ (reducing M channels to d, with the subspace estimated from the data), followed by the beamformer w producing ŝk.

To describe the column span of A, introduce the “economy-size” singular value decomposition of A,

    A = UA ΣA VAᴴ ,

where UA : M × d has orthonormal columns, ΣA : d × d is diagonal and contains the nonzero singular values of A, and VA : d × d is unitary. Also let UA^⊥ be the orthonormal complement of UA. The columns of UA are an orthonormal basis of the column span of A. The point is that even if A is unknown, UA can be estimated from the data, as described below (and in more detail in Section 6.5).
We assume that the noise is spatially white, with covariance matrix σ²I. Let R̂x = (1/N) XXᴴ be the noisy sample data covariance matrix, with eigenvalue decomposition

    R̂x = Û Λ̂ Ûᴴ = Û Σ̂² Ûᴴ .   (6.17)

Here, Û is M × M unitary, and Σ̂ is M × M diagonal. Equivalently, these factors follow from an SVD of the data matrix X directly:

    (1/√N) X = Û Σ̂ V̂ᴴ .

We collect the d largest singular values into a d × d diagonal matrix Σ̂s, and collect the corresponding d eigenvectors into Ûs. Asymptotically, Rx satisfies Rx = AAᴴ + σ²I, with eigenvalue decomposition

    Rx = UA ΣA² UAᴴ + σ²I = UA (ΣA² + σ²I) UAᴴ + σ² UA^⊥ UA^{⊥H} .   (6.18)

Since R̂x → Rx as the number of samples N grows, we have that Ûs Σ̂s² Ûsᴴ → UA (ΣA² + σ²I) UAᴴ, so that Ûs is an asymptotically unbiased estimate of UA. Thus UA, and also Σ and Λ, can be estimated consistently from the data by taking sufficiently many samples. In contrast, VA cannot be estimated like this: this factor is on the “inside” of the factorization AS = UA ΣA (VAᴴ S), and as long as S is unknown, any unitary factor can be exchanged between VA and S.
Even if we choose F to have the column span of Ûs, there is freedom left. As we will show, a natural choice is to combine the dimension reduction with a whitening of the data covariance matrix, i.e., such that Rx̲ := (1/N) X̲X̲ᴴ becomes the identity: Rx̲ = I. This is achieved if we define F as

    F = Ûs Σ̂s^{−1} .   (6.19)
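A brief Python/NumPy check (on synthetic data with illustrative dimensions) that the choice (6.19) indeed whitens the prefiltered sample covariance:

```python
import numpy as np

rng = np.random.default_rng(7)
M, d, N = 6, 2, 500
A = rng.standard_normal((M, d)) + 1j * rng.standard_normal((M, d))
S = rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))
X = A @ S + 0.1 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))

U, s, Vh = np.linalg.svd(X / np.sqrt(N), full_matrices=False)
F = U[:, :d] / s[:d]                              # F = Us * diag(1/sigma_s)
X_pre = F.conj().T @ X                            # prefiltered data, d channels
print(np.round(X_pre @ X_pre.conj().T / N, 6))    # ~ identity (d x d)
```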


Without dimension reduction, F = Û Σ̂^{−1} is a square-root factor³ of R̂x^{−1}, i.e., R̂x^{−1} = FFᴴ.

If the noise is colored with covariance matrix σ²Rn, where we know Rn but perhaps not the noise power σ², then we first whiten the noise by computing Rn^{−1/2} X, and continue as in the white noise case, by computing an SVD of Rn^{−1/2} X. The resulting prewhitening/dimension-reducing filter is then

    F = Rn^{−1/2} Ûs Σ̂s^{−1} .

The structure of the resulting beamformer is shown in Fig. 6.3.


After this preprocessing, the Wiener filter is simply given by

W=A

at least asymptotically. Indeed,


H
W = FW = FF A

and asymptotically FFH = R−1 x PA . Since PA A = A, the result follows. For finite samples, the
dimension reduction gives a slight difference.

Direct matched filtering Another choice for F that reduces the dimension, and that is often taken if (an estimate of) A is known, is to simply set

    F = A .

The output after this filter becomes

    X̲ = Aᴴ X = (Aᴴ A) S + Aᴴ N .

The noise is now non-white: it has covariance σ²AᴴA.


We can whiten it by multiplying by a factor (AᴴA)^{−1/2}. It is more convenient to introduce the SVD A = UA ΣA VAᴴ, and use a non-symmetric factor ΣA^{−1} VAᴴ. Note that ΣA^{−1} VAᴴ Aᴴ = UAᴴ. This gives

    X̲ = UAᴴ X = (UAᴴ A) S + UAᴴ N = (ΣA VAᴴ) S + UAᴴ N .

The noise is white again, and A̲ = ΣA VAᴴ. If we subsequently want to apply a Wiener receiver in this prefiltered domain, it is given by

    W̲ = (A̲ A̲ᴴ + σ²I)^{−1} A̲ = (ΣA² + σ²I)^{−1} ΣA VAᴴ .

³ Square-root factors are usually taken symmetric, i.e., R̂x^{1/2} R̂x^{1/2} = R̂x and (R̂x^{1/2})ᴴ = R̂x^{1/2}, but this is not necessary. F is a non-symmetric factor.


Conclusion

We can do the following forms of prefiltering:

• F = A. After this, the noise is nonwhite.

• F = A(AᴴA)^{−1/2} (equal to UA = Ûs up to a right unitary factor). After this, the noise is white; the Wiener receiver is obtained by setting W̲ = Rx̲^{−1} A̲.

• F = Ûs Σ̂s^{−1}. The noise becomes nonwhite, but the data is whitened, R̂x̲ = I. The Wiener receiver is obtained by W̲ = A̲.

6.5 EIGENVALUE ANALYSIS OF Rx

So far, we have looked at the receiver problem from a rather restricted viewpoint: the beam-
formers were based on the situation where there is a single source in noise. In the next section
we will also consider beamforming algorithms that can handle more sources. These are based
on an eigenvalue analysis of the data covariance matrix, which is introduced in this section.
Let us first consider the covariance matrix due to d sources and no noise,
    Rx = A Rs Aᴴ

where Rx has size M × M , A has size M × d and Rs has size d × d. If d < M , then the rank of
Rx is d since A has only d columns. Thus, we can estimate the number of narrow-band sources
from a rank analysis. This is also seen from an eigenvalue analysis: let
    Rx = U Λ Uᴴ

be an eigenvalue decomposition of Rx, where the M × M matrix U is unitary (UUᴴ = I, UᴴU = I) and contains the eigenvectors, and the M × M diagonal matrix Λ contains the corresponding eigenvalues in non-increasing order (λ₁ ≥ λ₂ ≥ ··· ≥ λ_M ≥ 0). Since the rank is d, there are only d nonzero eigenvalues. We can collect these in a d × d diagonal matrix Λs, and the corresponding eigenvectors in an M × d matrix Us, so that

    Rx = Us Λs Usᴴ .   (6.20)

The remaining M − d eigenvectors from U can be collected in a matrix Un, and they are orthogonal to Us since U = [Us Un] is unitary. The subspace spanned by the columns of Us is called the signal subspace; the orthogonal complement spanned by the columns of Un is known as the noise subspace (although this is a misnomer: here there is no noise yet, and later the noise will be everywhere, not confined to this subspace). Thus, in the noise-free case,

    Rx = U Λ Uᴴ = [Us Un] [ Λs 0 ; 0 0 ] [ Usᴴ ; Unᴴ ] .


In the presence of white noise,

    Rx = A Rs Aᴴ + σ² I_M .

In this case, Rx is full rank: its rank is always M. However, we can still detect the number of sources by looking at the eigenvalues of Rx. Indeed, the eigenvalue decomposition is derived as follows (expressed in terms of the previous decomposition (6.20), and using the fact that U = [Us Un] is unitary: Us Usᴴ + Un Unᴴ = I_M):

    Rx = A Rs Aᴴ + σ² I_M
       = Us Λs Usᴴ + σ² (Us Usᴴ + Un Unᴴ)
       = Us (Λs + σ² I_d) Usᴴ + Un (σ² I_{M−d}) Unᴴ                    (6.21)
       = [Us Un] [ Λs + σ² I_d , 0 ; 0 , σ² I_{M−d} ] [ Usᴴ ; Unᴴ ]
       =: U Λ Uᴴ ,

hence Rx has M − d eigenvalues equal to σ², and d eigenvalues that are larger than σ². Thus, we can detect the number of signals d by comparing the eigenvalues of Rx to a threshold defined by σ².
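A sketch of this detection step (a Python/NumPy sketch with a synthetic scenario; the threshold value is an assumption, chosen above the known noise power σ²):

```python
import numpy as np

rng = np.random.default_rng(8)
M, d, N, sigma = 6, 2, 2000, 0.5
A = rng.standard_normal((M, d)) + 1j * rng.standard_normal((M, d))
S = (rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))) / np.sqrt(2)
Noise = sigma * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
X = A @ S + Noise

Rx = X @ X.conj().T / N
lam = np.linalg.eigvalsh(Rx)[::-1]                # eigenvalues, non-increasing
print(lam)                                        # d large values, M-d values ~ sigma^2
print(np.sum(lam > 2 * sigma**2))                 # estimated number of sources d
```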
A physical interpretation of the eigenvalue decomposition can be as follows. The eigenvectors
give an orthogonal set of “directions” (spatial signatures) present in the covariance matrix,
sorted in decreasing order of dominance. The eigenvalues give the power of the signal coming
from the corresponding directions, or the power of the output of a beamformer matched to that
direction. Indeed, let the i’th eigenvector be ui , then this output power will be
    uiᴴ Rx ui = λi .

The first eigenvector, u1 , is always pointing in the direction from which most energy is coming.
The second one, u2 , points in a direction orthogonal to u1 from which most of the remaining
energy is coming, etcetera.
If only (spatially white) noise is present but no sources, then there is no dominant direction, and all eigenvalues are equal to the noise power. If there is a single source with power σs² and spatial signature a, normalized to ‖a‖² = M, then the covariance matrix is Rx = σs² aaᴴ + σ²I. It follows from the previous that there is only one eigenvalue larger than σ². The corresponding eigenvector is u₁ = a/‖a‖, and is in the direction of a. The power coming from that direction is

    λ₁ = u₁ᴴ Rx u₁ = M σs² + σ² .

Since there is only one source, the power coming from any other direction orthogonal to u₁ is σ², the noise power. Since u₁ = a/‖a‖,

    (aᴴ Rx a)/(aᴴ a) = (u₁ᴴ Rx u₁)/(u₁ᴴ u₁) = λ₁ .


Figure 6.4. Behavior of the singular values (value vs. index) in three cases: well separated sources (SNR = 20 dB, separation 60°), closely spaced sources (SNR = 20 dB, separation 5°), and increased noise (SNR = 0 dB, separation 60°). In the first case there is a clear gap between the signal and noise singular values.

Thus, the result of using the largest eigenvector as a beamformer is the same as the output
power of a matched filter where the a-vector of the source is known.
With more than one source, this generalizes. Suppose there are two sources with powers σ₁² and σ₂², and spatial signatures a₁ and a₂. If the spatial signatures are orthogonal, a₁ᴴa₂ = 0, then u₁ will be in the direction of the strongest source, number 1 say, and λ₁ will be the corresponding power, λ₁ = Mσ₁² + σ². Similarly, λ₂ = Mσ₂² + σ².
In general, the spatial signatures are not orthogonal to each other. In that case, u1 will point
into the direction that is common to both a1 and a2 , and u2 will point in the remaining direction
orthogonal to u1 . The power λ1 coming from direction u1 will be larger than before because it
combines power from both sources, whereas λ2 will be smaller.

Example 6.4. Instead of the eigenvalue decomposition of R̂x, we may also compute the singular value decomposition of X:

    X = U Σ Vᴴ .

Here, U : M × M and V : N × N are unitary, and Σ : M × N is diagonal. Since

    R̂x = (1/N) X Xᴴ = (1/N) U Σ² Uᴴ ,

it is seen that U contains the eigenvectors of R̂x, whereas (1/N) Σ² = Λ̂ contains the eigenvalues. Thus, the two decompositions give the same information (numerically, it is often better to compute the SVD).
Figure 6.4 shows the singular values of X for d = 2 sources, a uniform linear array with M = 5 antennas, and N = 10 samples, for

1. well separated angles: large gap between signal and noise singular values,
2. signals from close directions, resulting in a small signal singular value,
3. increased noise level, increasing noise singular values.


Figure 6.5. Eigenstructure as a function of time: eigenvalues of the covariance matrix (900 MHz) vs. time [sec], with a CW and a GSM signal present.

Example 6.5. The covariance matrix eigenvalue structure can be nicely illustrated on
data collected at the Westerbork telescope array. We selected a narrow band slice
(52 kHz) of a GSM uplink data file, around 900 MHz. In this subband we have two
sources: a continuous narrow band (sine wave) signal which leaked in from a local
oscillator, and a weak GSM signal. From this data we computed a sequence of short-term data covariance matrices R̂x, based on 0.5 ms averages. Figure 6.5 shows
the time evolution of the eigenvalues of these matrices. The largest eigenvalue is due
to the CW signal and is always present. The GSM source is intermittent: at time
intervals where it is present the number of large eigenvalues increases to two. The
remaining eigenvalues are at the noise floor, σ 2 . The small step in the noise floor
after 0.2 s is due to a periodically switched calibration noise source at the input of
the telescope front ends.

6.6 BEAMFORMING AND DIRECTION ESTIMATION

In the previous sections, we have assumed that the source matrix S or the array matrix A is
known. We can now generalize the situation and only assume that the array response is known
as a function of the direction parameter θ. Then the directions of arrival (DOA’s) of the signals
are estimated and used to generate the beamformer weights. The beamformers are in fact the
same as we derived in the previous section, except that we specify them in terms of a(θ) and
subsequently scan θ to find directions where there is “maximal response” (e.g., in the sense of
maximal output SNR).


6.6.1 The classical beamformer


The weights in a data independent beamformer are designed so the beamformer response ap-
proximates a desired response independent of the array data or data statistics. The design
objective—approximating a desired response—is the same as that for classical FIR filter design.
In spatial filtering one is often interested in receiving a signal arriving from a known direction θ₀. Assuming the signal is narrowband, a common choice for the beamformer weight
vector is the array response vector a(θ0 ). This is called the classical beamformer, or the Bartlett
beamformer; it is precisely the same as the matched filter assuming spatially white noise.
In direction finding using classical beamforming, we estimate the directions of the sources as
those that maximize the output power of the beamformer when pointing in a scanning direction
θ (and normalizing the output by the array gain):
    θ̂ = arg max_θ  (a(θ)ᴴ Rx a(θ)) / (a(θ)ᴴ a(θ)) .
The expression is a spatial spectrum estimator. An example of the spectrum obtained this way
is shown in Fig. 6.6, see also Chap. 3. With only N samples available, we replace Rx by the
sample covariance matrix, R̂x . For multiple signals we choose the d largest local maxima.
This technique is equivalent to maximizing the output SNR in case there is only 1 signal in white
noise. If the noise is colored, the denominator should actually be replaced by a(θ)H Rn a(θ). If the
noise is white but there are interfering sources, our strategy before was to lump the interferers
with the noise. However, in the present situation we do not know the interfering directions
or a(θ2 ), · · · , a(θd ), so this is impossible. This shows that with multiple sources, the classical
beamforming technique gives a bias to the direction estimate.

6.6.2 The MVDR


As discussed before, in the MVDR technique we try to minimize the output power, while constraining the response towards the scanning direction θ:

    min_w wᴴ R̂x w   subject to   wᴴ a(θ) = 1 .

This yields

    w = R̂x^{−1} a(θ) / (a(θ)ᴴ R̂x^{−1} a(θ)) ,

and the corresponding minimal output power is

    wᴴ R̂x w = 1 / (a(θ)ᴴ R̂x^{−1} a(θ)) .

This power is large when the constraint forces the beamformer to pass a source from direction θ, and small otherwise. Plotting it as a function of θ gives a spectral graph as for the classical beamformer, and the direction estimate is obtained from its maxima:

    θ̂ = arg max_θ  1 / (a(θ)ᴴ R̂x^{−1} a(θ)) .


Figure 6.6. Spatial spectra (power [dB] vs. angle [deg]) corresponding to the classical beamformer, MVDR, and MUSIC. The DOAs are estimated as the maxima of the spectra.

For multiple signals choose again the d largest local maxima. The MVDR is also illustrated in
Fig. 6.6.

6.6.3 The AAR

TBD: make notation uniform


A problem with the MVDR and other adaptive beamformers is that the output noise power is not spatially uniform. Consider the data model R = A Σs Aᴴ + Σn, where Σn = σn² I is the noise covariance matrix; then at the output of the beamformer the noise power is

    σy²(p) = w(p)ᴴ Σn w(p)
           = a(p)ᴴ R^{−1} (σn² I) R^{−1} a(p) / [a(p)ᴴ R^{−1} a(p)]²
           = σn² (a(p)ᴴ R^{−2} a(p)) / [a(p)ᴴ R^{−1} a(p)]² .

Thus, the output noise power is direction dependent.


As a remedy to this, a related beamformer which satisfies the constraint w(p)ᴴ w(p) = 1 (and therefore has spatially uniform output noise) is obtained by using a different scaling of the MVDR beamformer:

    w(p) = µ R^{−1} a(p) ,   µ = 1 / √(a(p)ᴴ R^{−2} a(p)) .

This beamformer is known as the “Adapted Angular Response” (AAR) [9]. The resulting image is

    I_AAR(p) = w(p)ᴴ R w(p) = (a(p)ᴴ R^{−1} a(p)) / (a(p)ᴴ R^{−2} a(p)) .

It has a high resolution and suppresses sidelobe interference under the white noise constraint. It was proposed for use in radio astronomy image formation in [10]; the resulting image was called LS-MVI.

6.6.4 The CLEAN algorithm


TBD (here? or separate section on deconvolution in context of RA)

6.6.5 The MUSIC algorithm


The classical beamformer and the MVDR have poor performance when there are several closely spaced sources. We now consider more advanced techniques based on the eigenvalue decomposition of the covariance matrix, viz. equation (6.21),

    Rx = A Rs Aᴴ + σ² I_M
       = Us (Λs + σ² I_d) Usᴴ + Un (σ² I_{M−d}) Unᴴ .

As discussed before, the eigenvalues give information on the number of sources (by counting how many eigenvalues are larger than σ²). However, the decomposition shows more than just the number of sources. Indeed, the columns of Us span the same subspace as the columns of A. This is clear in the noise-free case (6.20), but the decomposition (6.21) shows that the eigenvectors contained in Us and Un respectively are the same as in the noise-free case. Thus,

    span(Us) = span(A) ,   Unᴴ A = 0 .   (6.22)

Given a correlation matrix R̂x estimated from the data, we compute its eigenvalue decomposi-
tion. From this we can detect the rank d from the number of eigenvalues larger than σ 2 , and
we can estimate Us and hence the subspace spanned by the columns of A. Although we cannot
directly identify each individual column of A, its subspace estimate can nonetheless be used to
determine the directions, since we know that

A = [a(θ1 ) , · · · , a(θd )]

If a(θ) is known as a function of θ, then we can select the unknown parameters [θ1 , · · · , θd ] to
make the estimate of A fit the subspace Us . Several algorithms are based on this idea. Below we


discuss an effective algorithm that is widely used, the MUSIC (Multiple SIgnal Classification)
algorithm.
Note that it is crucial that the noise is spatially white. For colored noise, an extension (whitening)
is possible but we have to know the coloring.
Assume that d < M. Since span(Us) = span{a(θ₁), ···, a(θ_d)}, we have

    Unᴴ a(θi) = 0 ,   (1 ≤ i ≤ d) .   (6.23)

The MUSIC algorithm estimates the directions of arrival by choosing the d deepest local minima of the cost function

    J_MUSIC(θ) = ‖Ûnᴴ a(θ)‖² / ‖a(θ)‖² = (a(θ)ᴴ Ûn Ûnᴴ a(θ)) / (a(θ)ᴴ a(θ)) ,   (6.24)

where Ûn is the sample estimate of the noise subspace, obtained from an eigenvalue decomposition of R̂x. To obtain a ‘spectral-like’ graph as before (called a pseudo-spectrum), we plot the inverse of J_MUSIC(θ); see Fig. 6.6. Note that this eigenvalue technique gives a higher resolution than the classical spectrum, also because its sidelobes are much flatter.
Very importantly, as long as the number of sources is smaller than the number of sensors (d < M), the eigenvalue decomposition of the true Rx allows the DOAs to be estimated exactly. This means that if the number of samples N is large enough, we can obtain estimates with arbitrary precision. Thus, in contrast to the beamforming techniques, the MUSIC algorithm provides statistically consistent estimates.
An important limitation remains the failure to resolve closely spaced signals with few samples and at low SNR. This loss of resolution is more pronounced for highly correlated signals. In the limiting case of coherent signals, property (6.23) is violated because the rank of Rx becomes smaller than the number of sources (the dimension of Un is too large), and the method fails to yield consistent estimates. To remedy this problem, techniques such as “spatial smoothing” as well as extensions of the MUSIC algorithm have been derived.
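The three spatial spectra of Fig. 6.6 can be computed along the following lines (a Python/NumPy sketch with synthetic half-wavelength ULA data; the directions, SNR, and grid are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(9)
M, N, sigma = 8, 500, 0.3
doas = np.deg2rad([70, 80])                       # two closely spaced sources
ula = lambda th: np.exp(1j * np.pi * np.outer(np.arange(M), np.sin(th)))
A = ula(doas)
S = (rng.standard_normal((2, N)) + 1j * rng.standard_normal((2, N))) / np.sqrt(2)
X = A @ S + sigma * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

Rx = X @ X.conj().T / N
Ri = np.linalg.inv(Rx)
lam, U = np.linalg.eigh(Rx)                       # ascending eigenvalues
Un = U[:, :M - 2]                                 # noise subspace (d = 2 assumed known)

grid = np.deg2rad(np.arange(40, 131))
Ag = ula(grid)                                    # steering vectors on the scan grid
P_cbf = np.real(np.sum(Ag.conj() * (Rx @ Ag), axis=0)) / M       # classical
P_mvdr = 1 / np.real(np.sum(Ag.conj() * (Ri @ Ag), axis=0))      # MVDR
P_music = M / np.sum(np.abs(Ag.conj().T @ Un)**2, axis=1)        # 1/J_MUSIC

# crude peak picking on the MUSIC pseudo-spectrum
pk = [i for i in range(1, len(grid) - 1)
      if P_music[i] > P_music[i - 1] and P_music[i] > P_music[i + 1]]
top2 = sorted(pk, key=lambda i: -P_music[i])[:2]
print(np.rad2deg(grid[top2]))                     # approximately 70 and 80 degrees
```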

6.7 APPLICATIONS TO TEMPORAL MATCHED FILTERING

In the previous sections, we have looked at matched filtering in the context of array signal
processing. Let us now look at how this applies to temporal filtering.

No intersymbol interference We start with a fairly simple case, namely the reception of a symbol sequence s(t) convolved with a pulse shape function g(t):

    x(t) = g(t) ∗ s(t) .

The symbol sequence is modeled as a sequence of delta pulses, s(t) = ∑_k sk δ(t − kT). The symbol period T will be assumed to be normalized to T = 1. We will first assume that the pulse shape


Figure 6.7. No intersymbol interference.

function has a duration of less than T, so that g(t) has support only on the interval [0, 1). We sample x(t) at a rate P, where P is the (integer) oversampling rate. The samples of x(t) are stacked in vectors

    xk = [ x(k) ; x(k + 1/P) ; ··· ; x(k + (P−1)/P) ] .

See also Fig. 6.7. If we are sufficiently synchronized, this means that

    xk = g sk   ⇔   [ x(k) ; x(k + 1/P) ; ··· ; x(k + (P−1)/P) ] = [ g(0) ; g(1/P) ; ··· ; g((P−1)/P) ] sk   (6.25)

or

    X = g s ,   X = [x1 x2 ··· xN] ,   s = [s1 s2 ··· sN] .
The matched filter in this context is simply gᴴ. It has a standard interpretation as a convolution or integrate-and-dump filter. Indeed, yk = gᴴ xk = ∑_{i=0}^{P−1} g(i/P) x(k + i/P). This can be viewed as a convolution by the reverse filter gr(t) := g(T − t):

    yk = gᴴ xk = ∑_{i=1}^{P} gr(i/P) x(k + 1 − i/P) .

If P is very large, the summation becomes (up to a scaling) an integral:

    yk = ∫₀ᵀ g(t) x(kT + t) dt = ∫₀ᵀ gr(t) x((k + 1)T − t) dt .

With intersymbol interference In practice, pulse shape functions are often a bit longer than one symbol period. Also, we might not be able to achieve perfect synchronization. Thus, let us define a shift of g over some delay τ, and assume for simplicity that the result has support on

Figure 6.8. With intersymbol interference.

[0, 2T) (although with pulse shapes longer than a symbol period, it would in fact be more correct to take a support of [0, 3T)):

    gτ := [ g(0 − τ) ; g(1/P − τ) ; ··· ; g(2 − 1/P − τ) ] .

Now gτ is spread over two symbol periods, and we can partition

    gτ = [ g1 ; g2 ] .

After convolution of g(t − τ) with the symbol sequence s(t), sampling at rate P, and stacking, we obtain that the resulting sample vectors xk are the sum of two symbol contributions (see Fig. 6.8):

    xk = g1 sk + g2 s_{k−1}   ⇔   [ x(k) ; x(k + 1/P) ; ··· ; x(k + (P−1)/P) ] = [ g(0−τ) ; g(1/P−τ) ; ··· ; g(1−1/P−τ) ] sk + [ g(1−τ) ; g(1+1/P−τ) ; ··· ; g(2−1/P−τ) ] s_{k−1} ,

or in matrix form

    X = Gτ S   ⇔   [ x1 x2 ··· xN ] = [ g1 g2 ] [ s1 s2 ··· sN ; s0 s1 ··· s_{N−1} ] .

In this case, there is intersymbol interference: a sample vector xk contains the contributions of more than a single symbol.
A matched filter in this context would be Gτᴴ, at least if Gτ is tall: P ≥ 2. In the current situation (impulse response length including fractional delay shorter than 2 symbols), this is the case as soon as we do any amount of oversampling. After matched filtering, the output yk has two entries, each containing a mixture of the symbol sequence and one shift of this sequence. The mixture is given by

    Gτᴴ Gτ = [ g1ᴴg1 , g1ᴴg2 ; g2ᴴg1 , g2ᴴg2 ] .


Thus, if g1 is not orthogonal to g2, the two sequences will be mixed and further equalization (a ‘beamformer’ on yk) will be necessary. The matched filter in this case only serves to make the output more compact (2 entries) in case P is large.

More generally, we can stack the sample vectors to obtain

    X = Gτ S   ⇔   [ x1 x2 ··· x_{N−1} ; x2 x3 ··· xN ] = [ 0 g1 g2 ; g1 g2 0 ] [ s2 s3 ··· sN ; s1 s2 ··· s_{N−1} ; s0 s1 ··· s_{N−2} ] .

Gτ is tall if 2P ≥ 3. It is clear that for any amount of oversampling (P > 1) this is satisfied. We can imagine several forms of filtering based on this model.

1. Matched filtering by Gτ. The result after matched filtering is yk = Gτᴴ [ xk ; x_{k+1} ], a vector with 3 entries, containing the contributions of 3 symbols, mixed via Gτᴴ Gτ (a 3 × 3 matrix).

2. Matched filtering by gτ. This is a more common operation, equal to performing integrate-and-dump filtering after a synchronization delay by τ. The data model is regarded as a signal of interest (the center row of S, premultiplied by the center column of Gτ, which is gτ), plus a residual:

       X = [ g1 ; g2 ] [ s1 s2 ··· s_{N−1} ] + [ 0 g2 ; g1 0 ] [ s2 s3 ··· sN ; s0 s1 ··· s_{N−2} ] .

   The second term is regarded as part of the noise. As such, it has covariance matrix

       Rn = [ g2 g2ᴴ , 0 ; 0 , g1 g1ᴴ ] .

   The result after matched filtering is a 1-dimensional sequence {yk},

       yk = (gτᴴ gτ) sk + gτᴴ nk ,

   where the noise at the output due to ISI has variance

       gτᴴ Rn gτ = [ g1ᴴ g2ᴴ ] [ g2 g2ᴴ , 0 ; 0 , g1 g1ᴴ ] [ g1 ; g2 ] = 2 |g1ᴴ g2|² .

   If g1 is not orthogonal to g2, then the noise due to ISI is not zero. Since these vectors depend on τ, this will generally be the case. With temporally white noise added to the samples, there will also be a contribution σ²(g1ᴴg1 + g2ᴴg2) to the output noise variance.⁴

⁴ In actuality, the noise will not be white but shaped by the receiver filter.


3. Zero-forcing filtering and selection of one output. This solution can be regarded as the matched filter of item 1, followed by a de-mixing step (multiplication by (Gτᴴ Gτ)^{−1}), and selection of one of the outputs. The resulting filter is

       w = Gτ (Gτᴴ Gτ)^{−1} [ 0 ; 1 ; 0 ] = (Gτ Gτᴴ)† gτ

   and the output will be [s1 s2 ···]. Note that in principle we could also select one of the other outputs; this would give only a shift in the output sequence (starting with [s0 s1 ···] or [s2 s3 ···]). With noise, however, reconstructing the center sequence is likely to give the best performance since it carries the most energy.
4. Wiener filtering. This is

       w = R̂_X^{−1} gτ .

   Under noise-free conditions, this is asymptotically equal to w = (Gτ Gτᴴ)† gτ, i.e., the zero-forcing filter. In the presence of noise, however, it is more simply implemented by direct inversion of the data covariance matrix. Among the linear filtering schemes considered here, the Wiener filter is probably the preferred one, since it maximizes the output SINR. As we have seen before, the Wiener filter is asymptotically also equal to a scaling of Rn^{−1} gτ, i.e., the result of item 2, taking the correlated ISI-noise into account. (This equivalence can however only be shown if there is some amount of additive noise as well, or else Rn and R_X are not invertible.) A numerical sketch of this model and the Wiener filter follows below.
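A Python/NumPy sketch of the stacked data model and the Wiener filter of item 4. The windowed-sinc pulse, the delay, and the noise level are illustrative assumptions; real-valued BPSK symbols are used for simplicity:

```python
import numpy as np

rng = np.random.default_rng(12)
P, N, tau, sigma = 4, 2000, 0.3, 0.05
t = np.arange(2 * P) / P                              # support [0, 2T), T = 1
g_tau = np.sinc(t - tau - 0.5) * np.hamming(2 * P)    # an example pulse, delayed by tau
g1, g2 = g_tau[:P], g_tau[P:]

s = np.sign(rng.standard_normal(N + 1))               # BPSK symbols s_0 ... s_N
Xk = np.outer(g1, s[1:]) + np.outer(g2, s[:-1])       # x_k = g1 s_k + g2 s_{k-1}
Xk += sigma * rng.standard_normal(Xk.shape)
X = np.vstack([Xk[:, :-1], Xk[:, 1:]])                # stacked data, 2P x (N-1)

gt = np.concatenate([g1, g2])                         # center column of G_tau
Rx = X @ X.T / X.shape[1]
w = np.linalg.solve(Rx, gt)                           # Wiener filter w = Rx^{-1} g_tau
y = w @ X
print(np.mean(np.sign(y) == s[1:-1]))                 # ~1: center symbols recovered
```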

Delay estimation In general, the delay τ by which the data is received is unknown and has to be estimated from the data as well. This question is closely related to the DOA estimation considered in the previous section. Indeed, in an ISI-free model xk = gτ sk, the problem is similar to xk = a(θ) sk, but for a different functional. The traditional technique in communications is to use the “classical beamformer”: scan the matched filter over a range of τ, and take the τ that gives the peak response. As we have seen in the previous sections, this is optimal if there is only a single component in noise, i.e., no ISI. With ISI, the technique relies on sufficient orthogonality of the columns of Gτ. This is however not guaranteed, and the resolution may be poor.
We may however also use the MUSIC algorithm. This is implemented here as follows: compute the SVD of X, or the eigenvalue decomposition of R_X. In either case, we obtain a basis Us for the column span of X. In noise-free conditions, or asymptotically for a large number of samples, we know that the rank of X is 3, so that Us has 3 columns, and that

    span{Us} = span{Gτ} = span{ [ 0 g1 g2 ; g1 g2 0 ] } .

Thus, gτ is in the span of Us. Therefore,

    gτ ⊥ Un = (Us)^⊥ .


Figure 6.9. Delay estimation: spectra ([dB] vs. delay [T]) corresponding to the matched filter and MUSIC. The true delay is 0.2T.

Thus, if we look at the MUSIC cost function (viz. (6.24))

    J_MUSIC(τ) = (g(τ)ᴴ Ûn Ûnᴴ g(τ)) / (g(τ)ᴴ g(τ)) ,

it will be exactly zero when τ matches the true delay. Figure 6.9 shows the inverse of J_MUSIC(τ), compared to scanning the matched filter. It is obvious that MUSIC provides a much higher resolution.

Bibliography

[1] D.H. Johnson and D.E. Dudgeon, Array Signal Processing: Concepts and Techniques.
Prentice-Hall, 1993.
[2] S.M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice-Hall,
1993.
[3] S. Haykin, Adaptive Filter Theory. Englewood Cliffs (NJ): Prentice-Hall, 1992.
[4] R. A. Monzingo and T. W. Miller, Introduction to Adaptive Arrays. New-York: Wiley-
Interscience, 1980.
[5] H. Krim and M. Viberg, “Two decades of array signal processing research: The parametric
approach,” IEEE Signal Processing Magazine, vol. 13, pp. 67–94, July 1996.


[6] B.D. van Veen and K.M. Buckley, “Beamforming: A versatile approach to spatial filtering,”
IEEE ASSP Magazine, vol. 5, pp. 4–24, Apr. 1988.

[7] L.L. Scharf, Statistical Signal Processing. Reading, MA: Addison-Wesley, 1991.

[8] D.H. Brandwood, “A complex gradient operator and its application in adaptive array the-
ory,” IEE Proc., parts F and H, vol. 130, pp. 11–16, Feb. 1983.

[9] G. B. Borgiotti and L. J. Kaplan, “Superresolution of uncorrelated interference sources by using adaptive array techniques,” IEEE Trans. Antennas Propagat., vol. 27, pp. 842–845, 1979.

[10] C. Ben-David and A. Leshem, “Parametric high resolution techniques for radio astronomical
imaging,” IEEE J. Sel. Topics in Signal Processing, vol. 2, pp. 670–684, Oct. 2008.



Chapter 7

WEIGHTED LEAST SQUARES BEAMFORMING

Contents
7.1 Maximum Likelihood formulation to direction finding . . . . . . . . 145
7.2 Covariance Matching; Weighted Subspace Fitting . . . . . . . . . . . 145
7.3 Gauss-Newton Solver . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
7.4 Application to Radio Astronomy imaging . . . . . . . . . . . . . . . . 145

7.1 MAXIMUM LIKELIHOOD FORMULATION TO DIRECTION FINDING

7.2 COVARIANCE MATCHING; WEIGHTED SUBSPACE FITTING

7.3 GAUSS-NEWTON SOLVER

7.4 APPLICATION TO RADIO ASTRONOMY IMAGING



Chapter 8

DIRECTION FINDING: THE ESPRIT ALGORITHM

Contents
8.1 Prelude: Shift-invariance . . . . . . . . . . . . . . . . . . . . . . . . . . 147
8.2 Direction estimation using the ESPRIT algorithm . . . . . . . . . . 148
8.3 Delay estimation using ESPRIT . . . . . . . . . . . . . . . . . . . . . 157
8.4 Frequency estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
8.5 System identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
8.6 Real processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
8.7 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

In Chapter 6, we have looked at the MVDR and MUSIC algorithms for direction finding. It was seen that MUSIC provides high-resolution estimates for the directions-of-arrival (DOAs). However, these algorithms need a search over the parameter α, and extensive calibration data (i.e., the function a(α) for a finely sampled range of α). In this chapter, we look at the ESPRIT algorithm for direction estimation. This algorithm does not require a search or calibration data, but assumes a special array configuration that allows the DOAs to be solved for algebraically, via an eigenvalue problem. The same algorithm applies to delay estimation and to frequency estimation.

8.1 PRELUDE: SHIFT-INVARIANCE

In this chapter, we will be involved in estimating the parameter θ of vectors with the following polynomial or “Vandermonde” structure:

    a(θ) = [ 1 ; θ ; θ² ; ··· ; θᴺ ] ,   |θ| = 1 .

The phase of θ provides either the angle-of-arrival of a signal, its frequency, or its relative delay, depending on the interpretation of a phase shift in the application. The structure of a(θ) is rich, and there are several ways to estimate θ from it after it has been perturbed by noise. A simple (but statistically suboptimal) method is to look at the ratios of the entries of a with their neighbors: each ratio a_{i+1}/a_i is equal to θ. To obtain a good estimate in the presence of noise, we would rather take the average over these ratios:

    θ̂ = (1/N) ∑_{i=1}^{N} a_{i+1}/a_i .   (8.1)
This usually provides a very reasonable estimate of θ. The property which has been used here
is that of shift-invariance of the vector: if we shift a(θ) over one position, we obtain the same
vector, but multiplied by θ. Indeed, define the subvectors

    x = [ 1 ; θ ; ··· ; θ^{N−1} ] ,   y = [ θ ; θ² ; ··· ; θᴺ ] .

Then the shift-invariance is expressed as

    y = x θ   ⇒   θ = (xᴴ x)^{−1} xᴴ y = x† y .   (8.2)

If the entries of x are on the unit circle, then (xᴴx)^{−1} = 1/N and (xᴴ)_i = 1/a_i, and the two estimates of θ are the same.¹ The “algorithm” to compute θ in (8.2) is readily extended to
superpositions of multiple vectors a(θi ) of the same form, and this is the principle underlying
many subspace-based algorithms for harmonic retrieval, direction finding, and rational system
identification. The prototype algorithm for this is the ESPRIT algorithm, which was originally
proposed for direction finding.
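A tiny Python/NumPy sketch of (8.1) and (8.2) (the phase, vector length, and noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(11)
Nv = 50
theta = np.exp(1j * 0.7)                          # true parameter, |theta| = 1
a = theta ** np.arange(Nv + 1)
a_noisy = a + 0.05 * (rng.standard_normal(Nv + 1) + 1j * rng.standard_normal(Nv + 1))

x, y = a_noisy[:-1], a_noisy[1:]
theta_ratio = np.mean(y / x)                      # (8.1): average of neighbor ratios
theta_ls = (x.conj() @ y) / (x.conj() @ x)        # (8.2): theta = pinv(x) y
print(np.angle(theta_ratio), np.angle(theta_ls), 0.7)   # both close to 0.7
```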

8.2 DIRECTION ESTIMATION USING THE ESPRIT ALGORITHM

As in previous chapters, we assume that all signals are narrowband with respect to the propaga-
tion delay across the array, so that this delay translates to a phase shift. We consider a simple
propagation scenario, in which there is no multipath and sources have only one ray towards the
receiving antenna array. Since no delays are involved, all measurements are simply instanta-
neous linear combinations of the source signals. Each source has only one ray, so that the data
model is
X = AS .
¹ Otherwise, the estimates are slightly different: the ratios in (8.1) should be weighted by |a_i|² to obtain the same result. This deemphasizes ratios with a poor SNR.


Figure 8.1. Array geometry: sensor doublets.

A = [a(α1 ), · · · , a(αd )] contains the array response vectors. The rows of S contain the signals,
multiplied by the fading parameters (amplitude scalings and phase rotations).

Computationally attractive ways to compute {αi}, and hence A, are possible for certain regular antenna array configurations for which a(α) has a shift-invariant or similarly recursive structure. This is the basis for the ESPRIT algorithm (Roy, Kailath and Paulraj 1987 [1]).

8.2.1 Array geometry

The constraint on the array geometry imposed by ESPRIT is that of sensor doublets: the array consists of two subarrays, denoted by

    x(t) = [ x1(t) ; ··· ; xM(t) ] ,   y(t) = [ y1(t) ; ··· ; yM(t) ] ,

where each sensor yi has a response identical to that of xi, and is spaced at a constant displacement vector ∆ (in wavelengths) from xi. It is important that the displacement vector is the same for all sensor pairs (both in length and in direction). The antenna response ai(α) of the pair (xi, yi) is arbitrary and may be different for other pairs.


8.2.2 Data model

For the pair (xi(t), yi(t)), we have the model

    xi(t) = ∑_{k=1}^{d} ai(αk) sk(t)
    yi(t) = ∑_{k=1}^{d} ai(αk) e^{j2π∆ sin(αk)} sk(t) = ∑_{k=1}^{d} ai(αk) θk sk(t)

where θk = e^{j2π∆ sin(αk)} is the phase rotation due to the propagation of the signal from the x-antenna to the corresponding y-antenna. In terms of the vectors x and y, we have

    x(t) = ∑_{k=1}^{d} a(αk) sk(t)
    y(t) = ∑_{k=1}^{d} a(αk) θk sk(t) .   (8.3)

The ESPRIT algorithm does not assume any structure on a(α); it will instead use the phase relation between x(t) and y(t).
If we collect N samples in matrices X and Y, we obtain the data model

    X = A S
    Y = A Θ S   (8.4)

where

    A = [ a(α1) ··· a(αd) ] ,   Θ = diag[ θ1 , ··· , θd ] ,   θi = e^{j2π∆ sin(αi)} .

One special case in which the shift-invariant structure occurs is that of a uniform linear array
(ULA) with M + 1 antennas. For such an array, with interelement spacing ∆ wavelengths, we
have seen that  
    a(θ) = [ 1 ; θ ; ··· ; θᴹ ] ,   θ = e^{j2π∆ sin(α)} .   (8.5)

If we now split the array into two overlapping subarrays, the first (x) containing antennas 1 to


M, and the second (y) antennas 2 to M + 1, we obtain

    a_x = [ 1 ; θ ; ··· ; θ^{M−1} ] ,   a_y = [ θ ; θ² ; ··· ; θᴹ ] = a_x θ ,

which gives precisely the model (8.3), where a in (8.3) is one entry shorter than in (8.5).

8.2.3 Algorithm
Given the data X and Y, we first stack all data in a single matrix Z of size 2M × N with model

    Z = [ X ; Y ] = A_z S ,   A_z = [ A ; AΘ ] .

(In the case of a ULA with M + 1 antennas, we stack the available antenna outputs vertically but do not duplicate the antennas; Z will then have size (M + 1) × N.) Since Z has rank d, we compute an (economy-size) SVD

    Z = Ûz Σ̂z V̂zᴴ ,   (8.6)

where Ûz : 2M × d has d columns which together span the column space of Z. The same space is spanned by the columns of A_z, so that there must exist a d × d invertible matrix T that maps one basis into the other, i.e., such that

    Ûz = A_z T = [ A T ; A Θ T ] .   (8.7)

If we now split Ûz into two M × d matrices in the same way as Z,

    Ûz = [ Ûx ; Ûy ] ,

then we obtain that

    Ûx = A T ,   Ûy = A Θ T .

For M ≥ d, Ûx is “tall”, and if we assume that A has full column rank, then Ûx has a left inverse

    Ûx† := (Ûxᴴ Ûx)^{−1} Ûxᴴ .

It is straightforward to verify that

    Ûx† = (Tᴴ Aᴴ A T)^{−1} Tᴴ Aᴴ = T^{−1} A†

so that

    Ûx† Ûy = T^{−1} Θ T .

The matrix on the left-hand side is known from the data. Since Θ is a diagonal matrix, the expression on the right-hand side is recognized as an eigenvalue decomposition: the columns of T^{−1} are the eigenvectors of Ûx† Ûy (scaled arbitrarily to unit norm), and the diagonal entries of Θ are the eigenvalues. Hence we can simply compute the eigenvalue decomposition of Ûx† Ûy, take the eigenvalues {θi} (they should be on the unit circle), and compute the DOAs αi from each of them. This comprises the ESPRIT algorithm.

Note that the SVD of Z in (8.6), along with the definition of T in (8.7) as Ûz = A_z T, implies that

    Z = Ûz Σ̂z V̂zᴴ ,   Z = A_z S = (A_z T)(T^{−1} S)
    ⇒ T^{−1} S = Σ̂z V̂zᴴ = Ûzᴴ Z
    ⇒ S = T Ûzᴴ Z .

Hence, after having obtained T from the eigenvectors, a zero-forcing beamformer W on Z such that S = Wᴴ Z is given by

    W = Ûz Tᴴ .
Thus, source separation is straightforward in this case and essentially reduced to an SVD and
an eigenvalue problem.
If the two subarrays are spaced by at most half a wavelength, then the DOAs are directly
recovered from the diagonal entries of Θ, otherwise they are ambiguous (two different values of
α give the same θ). Such an ambiguity does not prevent the construction of the beamformer W
from T, and source separation is possible nonetheless. Because the rows of T are determined
only up to a scaling, the correct scaling of the rows of S cannot be recovered unless we know the
average power of each signal or the array manifold A. This is of course inherent in the problem
definition.
With noise, essentially the same algorithm is used. If we assume that the number of sources d is known, then we compute the SVD of the noisy Z, and set Ûz equal to the principal d left singular vectors. This is the best estimate of the subspace spanned by the columns of A_z, and
asymptotically (infinite samples) identical to it. Thus, for infinitely many samples we obtain the
correct directions: the algorithm is asymptotically unbiased (consistent). For finite samples, an
estimated eigenvalue θ̂ will not be on the unit circle, but we can easily map it to the unit circle
by dividing by |θ̂|.
Compared to the beamforming algorithms in Chap. 6, which locate sources via the peaks in a spatial power spectrum, the ESPRIT algorithm finds the exact directions of arrival under noise-free conditions, or asymptotically as the number of samples grows large. This is due to the parametric assumptions: an exact number of point sources, fewer than the number of antennas. Under these conditions, the sources may be arbitrarily close. On the other hand, the algorithm will fail if some of the sources are diffuse: such sources must be modeled as part of the noise. If the noise is not white, the noise covariance must be estimated and whitened.
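A compact Python/NumPy sketch of the ESPRIT steps for a ULA (synthetic data; λ/2 spacing, the directions, SNR, and known d are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(10)
M, d, N, sigma = 8, 2, 500, 0.1                   # 8 ULA antennas; subarrays share 7
alphas = np.deg2rad([-10, 10])
A = np.exp(1j * np.pi * np.outer(np.arange(M), np.sin(alphas)))
S = (rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))) / np.sqrt(2)
Z = A @ S + sigma * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

# signal subspace: principal d left singular vectors of the data
Uz = np.linalg.svd(Z, full_matrices=False)[0][:, :d]
Ux, Uy = Uz[:-1, :], Uz[1:, :]                    # two overlapping subarrays

# eigenvalues of pinv(Ux) Uy are the phase factors theta_i
theta = np.linalg.eigvals(np.linalg.pinv(Ux) @ Uy)
theta /= np.abs(theta)                            # map estimates to the unit circle
alpha_hat = np.arcsin(np.angle(theta) / np.pi)    # theta = exp(j 2 pi (1/2) sin(alpha))
print(np.rad2deg(np.sort(alpha_hat)))             # ~ [-10, 10] degrees
```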


8.2.4 Extension for a ULA


There are many important refinements of and extensions to this algorithm. If we have a uniform linear array, we can use the fact that the solutions θ should be on the unit circle, i.e.,

    θ̄ = θ^{−1} ,

along with the structure of a(θ) in (8.5):

    a(θ) = [ 1 ; θ ; ··· ; θᴹ ]   ⇒   Π ā(θ) = [ θ̄ᴹ ; ··· ; θ̄ ; 1 ] = a(θ) θ^{−M} ,

where Π is the exchange matrix which reverses the order of the entries of a vector, and ā denotes the complex conjugate. Thus, if we construct an extended data matrix

    Z_e = [ Z , Π Z̄ ] ,

then this will double the number of observations but will not increase the rank, since

    Z_e = A_z [ S , Θ^{−M} S̄ ] .

Using this structure, it is also possible to transform Z_e into a real-valued matrix, by simple linear operations on its rows and columns [2, 3]. As we saw in Chapter 6, there are many other direction finding algorithms that are applicable. For the case of a ULA, in fact, a better algorithm is known: MODE [4]. Although ESPRIT is statistically suboptimal, its performance is usually quite adequate. Its interest lies also in its straightforward generalization to more complicated estimation problems in which a shift-invariance structure is present.

8.2.5 Noise whitening


TBD:
The SVD estimates the column span of A, which is the essential first step. If there is white noise, this does not affect this estimate (asymptotically). Alternatively, we can work with an EVD of Rx.
If the noise covariance Rn is not white, it must be estimated and taken into account: whiten X by working with Rn^{−1/2} X, or use a generalized eigenvalue decomposition of (Rx, Rn).
(This is common in all subspace-based algorithms and should be mentioned earlier and more prominently.)
(This is common in all subspace-based algorithms and should be mentioned earlier and more
prominently.)

8.2.6 Performance
Figure 8.2 shows the results of a simulation with 2 sources at directions −10°, 10°, a ULA with λ/2 spacing and 6 antennas, and N = 40 samples. The first graph shows the mean value, the second the

Figure 8.2. Mean and standard deviations of ESPRIT and MUSIC direction estimates as a function of SNR (DOAs = [−10°, 10°], M = 6, N = 40).

Figure 8.3. Mean and standard deviations of ESPRIT and MUSIC direction estimates as a function of DOA separation (SNR = 10 dB, M = 6, N = 40).


standard deviation (averaged over the two sources), which indicates the accuracy of an individual
estimate. For sufficient SNR, the performance of both algorithms is approximately the same.
Figure 8.3 shows the same for varying separation of the two sources, at an SNR of 10 dB.
For small separations, the performance of ESPRIT drops because the matrix A drops in rank:
it appears to have only 1 independent column rather than 2. If we select two singular vectors,
then this subspace will not be shift-invariant, and the algorithm produces bad estimates: both
the mean value and the standard deviation explode. MUSIC, on the other hand, selects the null
space and scans for vectors orthogonal to it. If we ask for 2 vectors, it will in this case produce
the same vector twice, since there is only a single maximum in the MUSIC spectrum. It
is seen that the estimates become biased towards a direction centered between the two sources
(= 0°), but that the standard deviation gets smaller since the algorithm consistently picks this
center.
The performance of both ESPRIT and MUSIC is noise limited: without noise, the correct DOAs
are obtained. With noise and asymptotically many samples, N → ∞, the correct DOAs are
obtained as well, since the subspace spanned by Ûz is asymptotically identical to that obtained
in the noise-free case, the span of the columns of A.

8.2.7 Extension to coherent multipath

In the above, we assumed that there was no multipath: each source had only one path to the
antenna array. However, the X = AS model is also valid if sources have multiple rays towards
the array, as long as the delay differences are small compared to the inverse of the signal
bandwidth, so that they can be represented by phase shifts. This is known as coherent multipath
(see also Sec. 4.3.1).
Let d be the number of sources, ri the number of rays belonging to source i, and r = Σ_{i=1}^{d} ri the
total number of rays (assumed to be distinct). In that case, a more detailed model is

X = (Aθ B J) S    (8.8)

where Aθ : M × r is the Vandermonde matrix associated to the DOAs of the rays, and J : r × d
is a selection matrix which adds groups of rays to source signals, e.g.,

J = [1 0 ; 1 0 ; 0 1 ; 0 1]

in case of two sources, each with two rays. B is a diagonal scaling matrix representing the
different amplitudes (fadings) of each ray, including phase offsets. Because the rank of X is still
d, the SVD of X can retrieve only a d-dimensional subspace Û, so that
Û = (Aθ BJ) T ,    S = T Û^H X .


It is clear that blind beamforming is more challenging now: we try to find T such that each
column of Û is represented by a sum of r Vandermonde vectors, rather than only d vectors, and
r is not known.
To solve this problem algebraically using ESPRIT-type techniques,² we first try to restore the
rank to r. This is possible if the number of antennas M is sufficiently large, in fact M ≥
r + max(ri). In that case, we can form a block-Hankel matrix out of Û by taking vertical shifts
of it:

Um := [Û^{(1)}  Û^{(2)}  · · ·  Û^{(m)}] : (M − m + 1) × md .    (8.9)

Here, Û^{(i)} is a submatrix of Û consisting of its i-th till (M − m + i)-th rows, and m is known as the
spatial smoothing factor [5, 6]. With the above model, we have that Um satisfies the factorization

Um = A′θ B [JT  ΘJT  · · ·  Θ^{m−1}JT] =: A′θ B 𝒯 ,    (8.10)

where A′θ consists of the top M − m + 1 rows of Aθ, and 𝒯 := [JT  ΘJT  · · ·  Θ^{m−1}JT] has size
r × md. If M − m + 1 ≥ r and m ≥ max(ri), the factors in the above factorization can be shown
to have full rank r, so that Um has rank r.
At this point, the structure of Um in (8.10) shows that we have reduced the problem to an
[X = AS]-type problem without multipath, which can be solved using the ESPRIT algorithm.
Thus we compute an SVD of Um,

Um = Ûu Σ̂u V̂u^H ,

where Ûu contains the dominant r singular vectors of Um. From (8.10) it follows that there is
an invertible r × r matrix R such that

Ûu = A′θ B R ,    𝒯 = (R Ûu^H) Um .

We continue in the same way as before to compute R: with

Ûx = Jx Ûu ,    Ûy = Jy Ûu ,

the data model satisfies the eigenvalue equation

Ûx† Ûy = R^{−1} Θ R    (8.11)

which gives both Θ and R, up to scaling of its rows. At this point, we have recovered 𝒯 =
(R Ûu^H) Um, up to multiplication at the left by an arbitrary diagonal matrix. The next objective
is to estimate T from the structure of 𝒯 in (8.10). This is now a much simpler task: we have
available m matrices of size r × d which, after correction by suitable powers of Θ^{−1}, are all equal to JT.
The structure of J ensures that this matrix has only d distinct rows, which are the d rows of T.
Hence, it suffices to estimate these d unique rows, which is a simple clustering problem if the
rows of T are sufficiently different. This determines both T and J, i.e., the assignment of rays
to sources. With T in hand, we have our blind beamformer as before: W^H = T Û^H.

²Other techniques such as MODE are directly applicable to the coherent case without modifications.


8.3 DELAY ESTIMATION USING ESPRIT

A channel matrix H can be estimated from training sequences, or sometimes “blindly” (without
training). Very often, we do not need to know the details of H if our only purpose is to recover
the signal matrix S. But there are also several situations where it is interesting to pose a
multipath propagation model and try to resolve the individual propagation paths. This gives
information on the available delay and angle spread, for the purpose of diversity. It is often
assumed that the directions and delays of the paths do not change quickly, only their powers
(fading parameters), so that it makes sense to estimate these parameters. If the channel is
well characterized by this parametrized model, then fitting the channel estimate to this model
will lead to a more accurate receiver. Another application is wireless localization.

8.3.1 Principle
Let us consider first the simple case already introduced in Sec. 6.7. Assume we have a vector g0
corresponding to N samples of an FIR pulse shape function g(t), sampled with period T above
the Nyquist rate,

g(t) ↔ g0 = [g(0), g(T ), · · · , g((N − 1)T )]^T .

Similarly, we can consider a delayed version of g(t):

g(t − τ ) ↔ gτ = [g(0 − τ ), g(T − τ ), · · · , g((N − 1)T − τ )]^T .

The number of samples N is chosen such that, at the maximal possible delay, g(t − τ ) has support
only on the interval [0, N T ).
Given gτ and knowing g0 , how do we estimate τ ? Note here that τ does not have to be a multiple
of T , so that gτ is not exactly a shift of the samples in g0 . A simple “pattern matching” with
entry-wise shifts of g0 will not give an exact result.
We can however make use of the fact that a Fourier transformation maps a delay to a certain
phase progression. Let

g̃(ωi) = Σ_{k=0}^{N−1} e^{−jωi k} g(kT ) ,    ωi = (2π/N) i ,    i = 0, 1, · · · , N − 1 .

In matrix-vector form, this can be written as

g̃0 = F g0 , g̃τ = F gτ


where F denotes the DFT matrix of size N × N, defined by

F := [1  1  · · ·  1 ; 1  φ  · · ·  φ^{N−1} ; ⋮  ⋮  ⋮ ; 1  φ^{N−1}  · · ·  φ^{(N−1)²}] ,    φ = e^{−j 2π/N} .    (8.12)

If τ is an integer multiple of T, then it is straightforward to see that the Fourier transform g̃τ
of the sampled version of g(t − τ ) is given by

g̃τ = g̃0 ⊙ [1, φ^{τ/T}, (φ^{τ/T})², · · · , (φ^{τ/T})^{N−1}]^T = diag(g̃0) · [1, φ^{τ/T}, (φ^{τ/T})², · · · , (φ^{τ/T})^{N−1}]^T    (8.13)

where ⊙ represents entrywise multiplication of the two vectors. The same holds true for any τ
if g(t) is bandlimited and sampled at or above the Nyquist rate.
Thus, we will assume that g(t) is bandlimited and sampled at such a rate that (8.13) is valid even
if τ is not an integer multiple of T . The next step is to do a deconvolution of g(t) in frequency
domain, by entrywise dividing g̃τ by g̃0 . Obviously, this can be done only on intervals where g̃0
is nonzero. Pulse shapes are bandlimited, and if we sample above Nyquist, some entries of g̃0
will be close to zero. If necessary, a selection matrix has to be applied to select only the nonzero
interval.
Next, we factor diag(g̃0) out of g̃τ and obtain

z := {diag(g̃0)}^{−1} g̃τ ,    (N × 1)    (8.14)

which satisfies the model

z = f (φ) ,    f (φ) := [1, φ, φ², · · · , φ^{N−1}]^T ,    φ := e^{−j (2π/N)(τ/T )} .    (8.15)

Note that f (φ) has the same structure as a(θ) for a ULA. Hence, we can apply the ESPRIT
algorithm in the same way as before to estimate φ from z, and subsequently τ . In the present
case, we simply split z into two subvectors x and y, one a shift of the other, and from the model
y = xφ we can obtain φ = x† y, from which we can compute τ .
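A minimal numpy sketch of this single-delay estimator (the function name is ours; it assumes (8.13) holds exactly, i.e., g(t) is bandlimited and g̃0 has no entries close to zero):

```python
import numpy as np

def estimate_delay(g0, gtau, T=1.0):
    """Estimate a (possibly fractional) delay tau from a pulse g0 and its
    delayed version gtau, both sampled with period T (N samples each)."""
    N = len(g0)
    z = np.fft.fft(gtau) / np.fft.fft(g0)   # frequency-domain deconvolution
    x, y = z[:-1], z[1:]                     # two shifted subvectors
    phi = np.vdot(x, y) / np.vdot(x, x)      # LS solution of y = x * phi
    tau = -np.angle(phi) * N * T / (2 * np.pi)
    return tau % (N * T)                     # delay modulo the window length
```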


[Figure: (a) a raised-cosine pulse g(t) in the time domain (axis: time [T]; LP samples, with parameters P = 2, L = 9, Lg = 6, rolloff = 0.3); (b) its frequency-domain magnitude on the fundamental interval −P/2 < F < P/2, of which only L samples are significant.]

Figure 8.4. Definition of parameters: (a) time domain, (b) frequency domain.

Oversampled pulse shapes There are some details that we skipped in the preceding discussion.
First of all, we assumed g(t) has a representation as an FIR filter. Because of the truncation
to length N , the spectrum of g(t) widens and sampling at a rate 1/T introduces some aliasing
due to spectral folding. This will eventually lead to a small bias in the delay estimate. To avoid
this, we can oversample the channel.
To give a specific example, assume that g(t) is a raised cosine pulse, as in Fig. 8.4. For conve-
nience of notation we normalize the time axis and set T = 1/P , where P is the oversampling
factor (in the figure, P = 2), so that in the DFT frequency domain we have N samples within
the fundamental interval −P/2 < F < P/2. Clearly, g(t) is bandlimited, and only L = N/P
frequency domain samples are significant. In the deconvolution step, we cannot divide out g̃0 ,
because we will be dividing by small numbers.
Let Jg̃ : L × N be a selection matrix for g̃, such that Jg̃ g̃ has the desired entries. For later use,
we require that the selected frequencies appear in increasing order, which with the definition of
the DFT in (8.12) means that the final ⌈L/2⌉ samples of g̃0 should be moved up front: Jg̃ has
the form

Jg̃ = [0  0  I_{⌈L/2⌉} ; I_{⌊L/2⌋}  0  0] : L × N .

Next, we can factor diag(Jg̃ g̃0) out of Jg̃ g̃τ and obtain

z := {diag(Jg̃ g̃0)}^{−1} Jg̃ g̃τ ,    (L × 1)    (8.16)


[Figure: a source signal sk passes through a pulse shape g(t) and r propagation paths with parameters (τi, βi) to produce the received signal x(t).]

Figure 8.5. Multiray propagation channel

which satisfies the model

z = f (φ) φ^{−⌈L/2⌉} ,    f (φ) := [1, φ, φ², · · · , φ^{L−1}]^T ,    φ := e^{−j (2π/N)(τ/T )} .    (8.17)

We are essentially back to (8.15), although the vector is a bit shorter.

8.3.2 Multipath channel model estimation


We will now build on the above principle. Consider a multipath channel which consists of r
delayed copies of g(t), as in Fig. 8.5,³ so that the impulse response is

h(t) = Σ_{i=1}^{r} βi g(t − τi)   ⇔   h = Σ_{i=1}^{r} gτi βi = [gτ1 , · · · , gτr ] [β1 , · · · , βr ]^T =: Gτ b .
We assume that we know h (e.g., from a channel identification using a training sequence). Also
the pulse shape g(t) is known. The unknowns are the parameters {τi } and {βi }. Our objective
is to estimate these parameters.
As before, we can introduce the DFT transformation and the deconvolution by the known pulse
shape,

z := {diag(g̃0)}^{−1} F h ,    (N × 1) .

The vector z has model

z = F b ,    F = [f (φ1) , · · · , f (φr)] ,    f (φ) := [1, φ, φ², · · · , φ^{N−1}]^T .
³As in Sec. 4.3.6, but for a single receiver antenna.


Since there are now multiple components in F and only a single vector z, we cannot simply
estimate the parameters from this single vector by splitting it in x and y: this would allow only
to estimate a model with a single component. However, we can use the shift-invariance of the
vectors f (·) to construct a matrix out of z as

Z = [z^{(0)} , z^{(1)} , · · · , z^{(m−1)}] ,    ((N − m + 1) × m)    (8.18)

where

z^{(i)} := [z_{i+1} , z_{i+2} , · · · , z_{N−m+i}]^T

is a subvector of z containing the (i + 1)-st till the (N − m + i)-th entry. If we define f (φ)^{(i)} similarly,
then

f (φ)^{(i)} = [φ^i , φ^{i+1} , φ^{i+2} , · · ·]^T = [1 , φ , φ² , · · ·]^T φ^i =: f ′(φ) φ^i .
Thus, Z has the model

Z = F′ B ,    F′ = [f ′(φ1) , · · · , f ′(φr)] ,

B = [b  Φb  Φ²b  · · ·  Φ^{m−1}b] ,    Φ = diag(φ1 , · · · , φr) ,

where F′ is a submatrix of F of size (N − m + 1) × r, and B has size r × m. Since each column
of F′ has the required shift-invariant structure, this is a model of the form that can be used by
ESPRIT: split Z into X and Y, where X contains all but the last row of Z, and Y contains
all but the first row. Subsequently compute the eigenvalue decomposition

X† Y = T^{−1} Φ T .
This determines Φ as the eigenvalues of X† Y, from which the delays {τi} can be estimated.
This algorithm produces high-resolution estimates of the delays, in case the parametrized model
holds with good accuracy for h. There is however one condition to check. ESPRIT requires that
the factorization Z = F′ B is a low-rank factorization, i.e., F′ must be strictly tall (N − m + 1 > r)
and B square or wide (r ≤ m). These conditions imply

r ≤ N/2 .


Thus, there is a limit on the number of rays that can be estimated: not more than half the
number of samples in frequency domain. If this condition cannot be satisfied, we need to use
multiple antennas. This is discussed in Sec. 9.3.
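A minimal numpy sketch of this multi-delay estimator (the function name is ours; a direct implementation without the usual SVD-based rank truncation, assuming the deconvolved vector z of (8.14) is given and r ≤ N/2):

```python
import numpy as np
from scipy.linalg import hankel

def estimate_delays(z, r, m, T=1.0):
    """Estimate r delays from the deconvolved frequency-domain vector z
    (length N) using ESPRIT on a Hankel matrix with m columns."""
    N = len(z)
    Z = hankel(z[:N - m + 1], z[N - m:])   # (N-m+1) x m, Z[i, j] = z[i+j]
    X, Y = Z[:-1, :], Z[1:, :]             # all but the last / first row
    w = np.linalg.eigvals(np.linalg.pinv(X) @ Y)
    w = w[np.argsort(-np.abs(w))][:r]      # keep the r eigenvalues near |w| = 1
    taus = -np.angle(w) * N * T / (2 * np.pi)
    return np.sort(taus % (N * T))
```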

8.4 FREQUENCY ESTIMATION

The ESPRIT algorithm can also be used to estimate frequencies. Consider a signal x(t) which
is the sum of d harmonic components,

x(t) = Σ_{i=1}^{d} βi e^{jωi t} .    (8.19)

Suppose that we uniformly sample this signal with period T (satisfying the Nyquist criterion,
here −π ≤ ωi T < π), and have available x(T ), x(2T ), · · · , x(N T ). We can then collect the
samples in a data matrix Z with m rows,

Z = [x1  x2  x3  · · · ; x2  x3  x4  · · · ; ⋮ ; xm  xm+1  · · ·  xN ] ,    xk = x(kT ) .
From (8.19), we see that this matrix satisfies the model

Z = AS := [1  · · ·  1 ; φ1  · · ·  φd ; φ1²  · · ·  φd² ; ⋮  ⋮ ; φ1^{m−1}  · · ·  φd^{m−1}] [β1φ1  β1φ1²  · · · ; ⋮ ; βdφd  βdφd²  · · ·]

where φi = e^{jωi T}. Since the model is the same as before, we can estimate the phase factors {φi}
as before using ESPRIT, and from these the frequencies {ωi} follow uniquely, since the Nyquist
condition was assumed to hold.
The parameter m has to be chosen larger than d. A larger m will give more accurate estimates,
but if N is fixed, then the number of columns of Z (= N − m + 1) will get smaller, so there
is a tradeoff. For a single sinusoid in noise, one can show that the most accurate estimate is
obtained by making Z rectangular with 2 times more columns than rows, m = N/3.
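A small self-contained numpy demonstration of this frequency estimator (the function name is ours), reusing the Hankel/ESPRIT pattern of the previous section with an SVD-based subspace estimate:

```python
import numpy as np
from scipy.linalg import hankel

def estimate_frequencies(x, d, m, T=1.0):
    """Estimate d frequencies from uniform samples x[k] = x((k+1)T)."""
    N = len(x)
    Z = hankel(x[:m], x[m - 1:])            # m x (N-m+1), Z[i, j] = x[i+j]
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    Us = U[:, :d]                           # dominant (signal) subspace
    phi = np.linalg.eigvals(np.linalg.pinv(Us[:-1]) @ Us[1:])
    return np.sort(np.angle(phi) / T)       # omega_i in (-pi/T, pi/T]

# Example: two tones, recovered from 100 noiseless samples
t = np.arange(1, 101)
x = 1.0 * np.exp(1j * 0.3 * t) + 0.5 * np.exp(1j * 1.1 * t)
print(estimate_frequencies(x, d=2, m=33))   # approx [0.3, 1.1]
```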

8.5 SYSTEM IDENTIFICATION

Linear time-invariant (LTI) systems can be represented using state-space models. This is
particularly convenient in the case of systems with multiple inputs and multiple outputs (MIMO).
The time-invariance gives rise to a shift-invariance property, which allows us to identify the state-
space matrices.


[Figure: (a) a ladder signal-flow diagram mapping inputs u−1, u0, u1, u2, . . . to outputs y−1, y0, y1, y2, . . . through states x−1, x0, x1, x2, . . . and delay operators z; (b) the four-block map (A, B, C, D) from (uk, xk) to (yk, xk+1).]

Figure 8.6. LTI state space model. (a) Mapping of an input sequence {ui} to an output
sequence {yi} using an intermediate state sequence {xi}. The state dimension is
d = 2. Due to causality, the signal flow is from top to bottom. The delay operator
z^{−1} denotes a time shift here. (b) The operation at a particular time instant k
is a linear map from input uk and current state xk to output yk and next state
xk+1.

8.5.1 State space model

The familiar state space model used to describe causal LTI systems is (for a system with a scalar
input uk and a scalar output yk)

xk+1 = A xk + B uk
yk  = C xk + D uk .    (8.20)

Here, xk is the state vector (assumed to have d entries), A is a d × d state transition matrix,
B and C T are d × 1 vectors, and D is a scalar (see Fig. 8.6). The integer d is called the state
dimension or system order. All finite dimensional linear systems can be described in this way.
The representation (8.20) is not at all unique. An equivalent system representation (yielding the
same input-output relationship) is obtained by applying a state transformation R (an invertible


d × d matrix) to define a new state vector x′k = R xk. The equivalent system is

x′k+1 = A′ x′k + B′ uk
yk  = C′ x′k + D uk

where the new state space quantities are given by

[A′  B′ ; C′  D] = [R^{−1}  0 ; 0  1] [A  B ; C  D] [R  0 ; 0  1] .

The eigenvalues of A remain invariant under this transformation since R^{−1}AR is a similarity
transformation. The eigenvalues of A are directly related to the poles of the system; for stability,
their magnitudes are required to be less than 1.
The impulse response of this system is

h = [· · ·  0  D  CB  CAB  CA²B  · · ·] .    (8.21)

The realization problem is to find a state space representation that matches a given impulse
response. As pointed out above, this representation is not unique.

8.5.3 Hankel operator

The solution to the realization problem in a subspace context calls for the Hankel matrix, defined
from the impulse response as

H = [h1  h2  h3  · · · ; h2  h3  ⋰ ; h3  ⋰ ; ⋮] .    (8.22)

The Hankel structure is recognized: H is constant along the anti-diagonals.
Let us define the controllability operator C and observability operator O as

O = [C ; CA ; CA² ; ⋮] ;    C = [B  AB  A²B  · · ·] .    (8.23)

Then, using (8.21) and comparing to (8.22) shows that H has a factorization as

H = OC .

For a minimal realization, C and O have by definition full rank d. Since H is a product of two
rank-d matrices, it must be of rank d itself. Even for minimal realizations, there is of course


an ambiguity in this factorization. With R an invertible d × d matrix, we can also factor H
as H = O′C′ = OR · R^{−1}C, corresponding to a state space model that has undergone a state
transformation by R as described above. Factorizations modulo R lead to equivalent systems.
C and O have a shift-invariance structure. E.g., if we let O↑ denote O with its top row removed
(thus, shifted upwards), then (8.23) shows that

O↑ = OA .

Likewise, if C→ denotes C with its first column removed (thus, shifted to the left), then

C→ = AC .

This shift-invariance carries over to H:

H↑ = O↑ C = OA · C
H← = O C→ = O · AC .

Thus it is seen that shifting H upwards or to the left is equivalent to a multiplication by A in
the center of the factorization.

8.5.4 Realization scheme


Using the above two properties of the Hankel operator H — i.e., that it is of finite rank with
some minimal factorization H = OC, and that it is shift-invariant — we will show how to obtain
a state space realization as in equation (8.20) from a given impulse response.

1. Given the impulse response, construct the Hankel matrix H as in (8.22). Determine the
rank d, and any factorization H = OC, where O and C are of full rank d. The SVD is a
robust tool for doing this.
2. At this point, we know that C and O have the shift-invariant structure of equation (8.23).
Use this property to derive

OA = O↑  ⇒  A = O† O↑ .

Because O is of full column rank d, we have O† = (O^H O)^{−1} O^H. This determines A. The
matrices B, C and D follow simply as

B = C(:,1) ,    C = O(1,:) ,    D = h0 ,

where the subscript (:, 1) denotes the first column of the associated matrix, and (1, :) the
first row.

In practice, H should have finite size. This issue can be dealt with relatively easily. Further,
in practice we may not have the impulse response, but only a single input signal u plus its
corresponding output signal y. With some effort, we can adapt the algorithm to this situation.
See Verhaegen [7, 8] for details.
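A minimal numpy sketch of this realization scheme on a finite Hankel matrix (the function name is ours; a finite-size, SVD-based variant of the two steps above):

```python
import numpy as np
from scipy.linalg import hankel

def realize(h, d, m):
    """State-space realization (A, B, C, D) of order d from impulse
    response samples h[0], ..., h[L-1], using an m-row Hankel matrix."""
    L = len(h)
    H = hankel(h[1:m + 1], h[m:L])             # finite block of (8.22)
    U, s, Vh = np.linalg.svd(H, full_matrices=False)
    O = U[:, :d] * np.sqrt(s[:d])              # observability factor
    Con = np.sqrt(s[:d])[:, None] * Vh[:d, :]  # controllability factor
    A = np.linalg.pinv(O[:-1]) @ O[1:]         # shift invariance: O_up = O A
    B = Con[:, [0]]                            # first column of C
    C = O[[0], :]                              # first row of O
    D = h[0]
    return A, B, C, D
```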


8.6 REAL PROCESSING

As mentioned in Sec. 8.2.4, the extended ULA data matrix can be transformed to a real-valued
matrix by simple linear operations on its rows and columns; carrying the entire ESPRIT algorithm
out in real arithmetic leads to the unitary ESPRIT algorithm [2, 3].

8.7 NOTES

The ESPRIT algorithm was originally proposed by Roy and Kailath in [9, 10]. See [11, 12] for
overviews.
Delay estimation using ESPRIT was proposed in [13].

Bibliography

[1] R. Roy and T. Kailath, “ESPRIT – Estimation of Signal Parameters via Rotational Invari-
ance Techniques,” IEEE Trans. Acoust., Speech, Signal Proc., vol. 37, pp. 984–995, July
1989.

[2] M. Haardt and J. Nossek, “Unitary ESPRIT: how to obtain increased estimation accuracy
with a reduced computational burden,” IEEE Trans. Signal Proc., vol. 43, pp. 1232–1242,
May 1995.

[3] M. Zoltowski, M. Haardt, and C. Mathews, “Closed-form 2-D angle estimation with rect-
angular arrays in element space or beamspace via Unitary ESPRIT,” IEEE Trans. Signal
Proc., vol. 44, pp. 316–328, February 1996.

[4] P. Stoica and K. Sharman, “Maximum Likelihood methods for direction-of-arrival estima-
tion,” IEEE Trans. Acoust., Speech, Signal Proc., vol. 38, pp. 1132–1143, July 1990.

[5] T. Shan, M. Wax, and T. Kailath, “On spatial smoothing for direction-of-arrival estimation
of coherent signals,” IEEE Trans. Acoust. Speech Signal Proc., vol. 33, pp. 806–811, April
1985.

[6] U. Pillai and B. Kwon, “Forward/backward spatial smoothing techniques for coherent signal
identification,” IEEE Trans. Acoust., Speech, Signal Proc., vol. 37, pp. 8–15, January 1989.

[7] M. Verhaegen and P. Dewilde, “Subspace model identification. Part 1: The Output Error
state space model identification class of algorithms,” Int. J. Control, vol. 56, no. 5, pp. 1187–
1210, 1992.

[8] M. Verhaegen and P. Dewilde, “Subspace model identification. Part 2: Analysis of the ele-
mentary Output-Error state-space model identification algorithm,” Int. J. Control, vol. 56,
no. 5, pp. 1211–1241, 1992.

EE 4715 (2022): Array Signal Processing


Bibliography 167

[9] R. Roy, A. Paulraj, and T. Kailath, “ESPRIT—a subspace rotation approach to estimation
of parameters of cisoids in noise,” IEEE Trans. Acoust., Speech, Signal Proc., vol. 34,
pp. 1340–1342, Oct. 1986.

[10] R. Roy, ESPRIT. PhD thesis, Stanford Univ., Stanford, CA, 1987.

[11] F. Li and R. Vaccaro, “Analytical performance prediction of subspace-based algorithms for
DOA estimation,” in SVD and Signal Processing, II: Algorithms, Analysis and Applications
(R. Vaccaro, ed.), pp. 243–260, Elsevier, 1991.

[12] A. van der Veen, E. Deprettere, and A. Swindlehurst, “Subspace based signal analysis using
singular value decomposition,” Proceedings of the IEEE, vol. 81, pp. 1277–1308, Sept. 1993.

[13] A. van der Veen, M. Vanderveen, and A. Paulraj, “Joint angle and delay estimation using
shift-invariance properties,” subm. IEEE Signal Processing Letters, Aug. 1996.


Chapter 9

JOINT DIAGONALIZATION AND KRONECKER PRODUCT STRUCTURES

Contents
9.1 Joint azimuth and elevation estimation . . . . . . . . . . . . . . . . . 169
9.2 Connection to the Khatri-Rao product structure . . . . . . . . . . . 173
9.3 Joint angle and delay estimation . . . . . . . . . . . . . . . . . . . . . 175
9.4 Joint angle and frequency estimation . . . . . . . . . . . . . . . . . . 180
9.5 Multiple invariances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
9.6 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

In the previous chapter, we have seen how direction finding of narrowband sources using a ULA,
delay estimation, and frequency estimation, all lead to a similar data model that shows shift-
invariance, and can be solved using the ESPRIT algorithm. In this chapter, we extend this to
a number of joint estimation techniques: determining the two-dimensional directions of arrival,
joint angle-delay estimation, and joint angle-frequency estimation. The data models related to
these applications have the same Khatri-Rao product structure, as well as the shift-invariance
structure. Therefore, the estimation algorithms can be based on an extension of the ESPRIT
algorithm to two dimensions. In many cases, the shift invariance is not even needed to enable
source separation: the Khatri-Rao product structure suffices.

9.1 JOINT AZIMUTH AND ELEVATION ESTIMATION

9.1.1 The data model


As an extension of the ESPRIT-related doublet array, consider M sensor triplets, each composed
of three identical sensors with unknown gain and phase patterns, which may vary from triplet
to triplet. For every triplet, the displacement vectors dxy and dxz between its components are
required to be the same. See Fig. 9.1.
This way, collecting N time-domain samples and assigning the three sensors of each triplet to
each of the data matrices X, Y, Z, respectively, three identical although displaced arrays are obtained.

[Figure: M sensor triplets (xi, yi, zi); within each triplet, yi is displaced from xi by dxy and zi by dxz.]
Figure 9.1. Sensor array consisting of triplets

Figure 9.2. Possible array configurations: a uniform rectangular array (URA), an L-shaped
array, and a +-shaped array.

Impinging on every array are d narrowband non-coherent signals sk(t). In a direct
extension of the data models in Chap. 8, we obtain the data model (ignoring the noise for the
moment)

X = Ax S = A S
Y = Ay S = A Φ S      ⇔      [X ; Y ; Z] = [A ; AΦ ; AΘ] S .    (9.1)
Z = Az S = A Θ S

We write A = Ax for brevity of notation. We will not assume a more detailed structure of A:
all structure in the problem is obtained by the assumption of shift invariance in the relation of
Ax to Ay and Az .
Fig. 9.2 shows some other antenna configurations that lead to the required shift invariance: a
Uniform Rectangular Array (URA), an L-shaped array, and a +-shaped array. The latter two
have a larger aperture for the same number of antennas, but it is sparsely filled, and the number
of baselines that are used is less: A has fewer rows. Hence, it is hard to say a priori which
geometry is preferred.
Due to the shift-invariance of the array, Ay = Ax Φ and Az = Ax Θ, where Φ and Θ are diagonal
matrices with entries

φk = e^{−j (ω0/c) dxy · ζk} ,
θk = e^{−j (ω0/c) dxz · ζk} ,    k = 1, · · · , d ,    (9.2)

in which ζk is the propagation direction vector of the kth signal, ω0 is the carrier frequency of
the d signals, and c is the propagation velocity.
S is the signal matrix (d × N). The matrices A and S are unknown, and are not rank-deficient
by assumption. The matrices Φ and Θ are diagonal and contain the phase shifts (9.2) for each signal:

Φ = diag(φ1 , φ2 , · · · , φd)
Θ = diag(θ1 , θ2 , · · · , θd) .

The DOA problem is to estimate Φ and Θ from (X, Y, Z). From these matrices, the 2D angles
of arrival can directly be computed.
At this point, of course, the ESPRIT algorithm can be applied separately to (X, Y) and (X, Z).
This will produce two sets of angles. However, the angles are listed in random order. How can
the correct pairs be found?

9.1.2 Preprocessing: estimating the column span

Since there are d sources, we would like to reduce the problem to matrices of size d × d. As
before, this is done by computing an SVD. First, construct the combined data matrix

K = [X ; Y ; Z] .

In view of the model (9.1), we know that without noise, K has rank d. Therefore, we can
compute the ‘economy-size’ SVD,

K = U Σ V^H ,

where U has d columns. In the case of noise, we need to actually compute the truncated SVD
where we take the dominant d singular components. Also, if we assume array configurations as
shown in Fig. 9.2, where X, Y, Z share some of the same elements, then the SVD is computed
on a data matrix K that contains only the unique elements.
Partitioning U in the same way as K gives

[X ; Y ; Z] = [Ux ; Uy ; Uz] Σ V^H .


The d columns of U span the signal subspace. Comparing to the model (9.1), we find that there
must be a d × d invertible matrix T that maps one basis of the subspace to the other:

[Ux ; Uy ; Uz] = [Ax ; Ay ; Az] T = [A ; AΦ ; AΘ] T .    (9.3)

This implies

U^H K = Σ V^H = T^{−1} S ,

so that the separating beamformer (such that W^H K = S) is

W^H = T U^H .

The source separation problem thus reduces to finding T.

9.1.3 Joint diagonalization

Equation (9.3) shows that A = Ux T^{−1}, so that

Uy = Ux T^{−1} Φ T
Uz = Ux T^{−1} Θ T .    (9.4)

Assuming Ux has a left inverse (this requires at least M ≥ d), compute My = Ux† Uy and
Mz = Ux† Uz, both of size d × d. Then these have model

My = T^{−1} Φ T
Mz = T^{−1} Θ T .    (9.5)

This shows that both My and Mz are jointly diagonalized by the same T. (Here, we mean
diagonalization by a similarity transform; we will see another form of diagonalization later on.)
These two equations are redundant: already one of the two will allow us to compute T. If the
eigenvalues Φ are distinct, then we can compute T from an eigenvalue decomposition of My:
its eigenvector matrix equals T^{−1}, unique up to a permutation and a scaling of its columns; this will
translate to an unknown scaling and permutation of the rows of S. The scaling can be fixed by
prior knowledge on the powers of the sources, or on the norms of the columns of A. In this case,
we can compute T from My and apply it to Mz to find Θ = T Mz T^{−1}.
Similarly, if the eigenvalues Θ are distinct, we can compute T from an eigenvalue decomposition
of Mz , and then use T to compute Φ.
In either case, we find the correct correspondence between the entries of Φ and those of Θ, i.e.,
one pair for each source. This correct pairing does not happen if we compute the two eigenvalue
decompositions separately, as generally the eigenvalues will appear in random order.
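A minimal numpy sketch of this coupled estimation (the function name is ours; it assumes Ux, Uy, Uz are the blocks of the truncated SVD basis, and that the eigenvalues in Φ are distinct):

```python
import numpy as np

def joint_azimuth_elevation(Ux, Uy, Uz):
    """Estimate the paired diagonals of Phi and Theta following (9.4)-(9.5)."""
    My = np.linalg.pinv(Ux) @ Uy            # = T^{-1} Phi T
    Mz = np.linalg.pinv(Ux) @ Uz            # = T^{-1} Theta T
    phi, V = np.linalg.eig(My)              # columns of V equal T^{-1}, up to
                                            # permutation and column scaling
    theta = np.diag(np.linalg.inv(V) @ Mz @ V)   # automatically paired with phi
    return phi, theta
```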


With noise, we use the SVD to compute the dominant subspace U and proceed as above to
find My and Mz. However, now there is not a single T that exactly diagonalizes both matrices.
We would then aim to find a single T that diagonalizes both matrices as well as possible. This Joint
Approximate Diagonalization problem has several formulations, several algorithms have been
proposed, and a decent treatment warrants a separate chapter.
In one formulation, we can propose a QR decomposition of T^{−1} = QR (where Q is unitary and
R is upper triangular), so that

My = Q Ry Q^H
Mz = Q Rz Q^H    (9.6)

where Ry and Rz are upper triangular. This translates the problem into a joint Schur decom-
position. Since Q is unitary, it can be composed of 2 × 2 rotations (called Jacobi rotations),
which leads to numerically stable algorithms. The main diagonals of Ry and Rz give us Φ and
Θ, respectively.

9.2 CONNECTION TO THE KHATRI-RAO PRODUCT STRUCTURE

Recall the model (9.1),

[X ; Y ; Z] = [A ; AΦ ; AΘ] S .

If we define a matrix F from the diagonals of Φ and Θ as

F = [1  1  · · ·  1 ; φ1  φ2  · · ·  φd ; θ1  θ2  · · ·  θd]

then we can write this compactly as

[X ; Y ; Z] = (F ◦ A) S

where ◦ denotes the Khatri-Rao product (column-wise Kronecker product); see Sec. 5.1.6 for its
definition and some properties.
Likewise, we can write (9.3) compactly as

[Ux ; Uy ; Uz] = (F ◦ A) T .


Note that this Khatri-Rao product structure is the only property that was needed to derive
the joint diagonalization model (9.5), via (9.4) and subsequently removing one matrix (Ux) by
inversion. Thus, whenever we have this structure, we can transform it into joint diagonalization.
Further, note that here we expanded F into its rows, each row leading to a matrix of the form
shown in (9.4). But the form U = (F ◦ A)T is multilinear: we can also expand along T or
A, and arrive at a joint diagonalization model. E.g., expanding T = [t1 , · · · , td] and likewise
U = [u1 , · · · , ud] gives

uk = (F ◦ A) tk   ⇔   Uk = A Dk F^T ,    k = 1, · · · , d,

where Uk is an M × 3 matrix such that vec(Uk) = uk, and Dk is a diagonal matrix such that
diag(Dk) = tk. We used (5.8) to derive this. If F^T is square and invertible (but here it is not;
this requires some preprocessing), then premultiplying the set of matrices with a left inverse of
U1, where U1† = F^{−T} D1^{−1} A†, gives

U1† Uk = F^{−T} (D1^{−1} Dk) F^T ,    k = 2, · · · , d .

This is indeed another joint diagonalization model, now involving d matrices. Likewise, we can
expand along A and obtain a joint diagonalization model with M matrices.
We have seen here that inverting one term (Ux or U1 ) and applying it to the other components
leads to the desired result. That makes the first component “special” in some sense. If it
is unreliable due to noise, or if it is poorly conditioned, then that carries over to all other
components.
It is possible to avoid the inversion and replace it by correlation, as follows. Consider again
(9.3), viz.

[Ux ; Uy ; Uz] = [A ; AΦ ; AΘ] T .

This time, premultiply by¹ Ux^H:

[Ux^H Ux ; Ux^H Uy ; Ux^H Uz] = [T^H A^H A ; T^H A^H AΦ ; T^H A^H AΘ] T   ⇔   [Mx ; My ; Mz] = [B ; BΦ ; BΘ] T

where B = T^H A^H A is a square invertible matrix (assuming A is tall and of full rank). In other
words, we have three d × d matrices of the form

Mx = B T
My = B Φ T
Mz = B Θ T .
¹This still singles out one data matrix and transfers its noise over to the other matrices. It would be better
to compute the column span of A from an SVD of [Ux , Uy , Uz], i.e., stacked in a block row, and use this joint
estimate to reduce the dimensions to size d × d.


This is also a joint diagonalization problem, but now “by congruence” and not by similarity.
Also this problem has been well studied, and several algorithms have been proposed. One tech-
nique to proceed is to insert QR factorizations B = QR and T = R′Z (where R, R′ are upper
triangular and Q, Z are unitary matrices). Then the problem has the form

Mx = Q Rx Z
My = Q Ry Z    (9.7)
Mz = Q Rz Z

where Rx, Ry, Rz are upper triangular. We thus need to find unitary matrices Q, Z that make
Mx, My, Mz upper triangular. From the main diagonals of Rx, Ry and Rz, we can recover Φ
and Θ.
This problem is a “joint” generalized Schur decomposition, see Sec. 5.7. The matrices Q, Z can
be found using a generalization of the QZ algorithm. Note that a good starting point for the
iteration is available by first computing the solution to a single generalized eigenvalue problem.
Comparing (9.7) to (9.6), we see that two unitary matrices Q, Z are used instead of only one
Q. At the same time, three matrices Mx , My , Mz are available, rather than two. The number
of degrees of freedom in two unitary matrices is about equal to that of a single general matrix.
Thus, in the present case, the number of equations and number of unknowns has about the same
balance.
We have seen that the Khatri-Rao structure of the form X = (F ◦ A)T is at the root of the
joint diagonalization model. This structure is an instance of a more general canonical polyadic
decomposition (CPD) of the data matrix, in this case of a third order tensor. A CPD aims to
find a low multi-linear rank approximation of a given tensor. The model in the present context
is similar to parallel factor analysis (PARAFAC). A CPD is more general because it allows more
than 3 dimensions, gives exact conditions on the dimensions in relation to the rank such that
there is a ‘unique’ decomposition, and allows for sparse representations. The tensor framework
also gives access to other decompositions such as a block term decomposition (BTD).

9.3 JOINT ANGLE AND DELAY ESTIMATION

A second application that leads to a joint diagonalization problem is the following. In Sec. 8.3,
we studied the multipath estimation problem. Starting from a channel estimate h(t), we want
to estimate the individual path delays, directions of arrival, and path gains of each ray, as shown
in Fig. 9.3. With multiple antennas, the channel model is

h(t) = Σ_{i=1}^{r} a(αi) βi g(t − τi) .


[Figure: a source sk passes through a pulse shape g(t) and r rays with parameters (αi, τi, βi) to an M-element array (x1, . . . , xM), followed by a space-time equalizer producing ŝk.]

Figure 9.3. Multiray propagation channel

Here, the pulse shape g(t) is known, and the antenna response vector a(α) is known as a function
of α. We assume a ULA with interelement spacing ∆ wavelengths, so that

a(θ) = [1, θ, · · · , θ^{M−1}]^T ,    θ = e^{j2π∆ sin(α)} .
Assume h(t) is sampled above the Nyquist rate and that we collect N samples. We stack the
samples of h(t) into a vector as before,

h = [h0 ; h1 ; ⋮ ; hN−1] = [h(0) ; h(T ) ; ⋮ ; h((N − 1)T )] = Σ_{i=1}^{r} [gτi ⊗ a(αi)] βi = [G ◦ A] b .    (9.8)

The N samples of g(t − τi) are stacked in the vector gτi, and we assume that the entire
support of each g(t − τi) is contained in these samples.
The equation shows the Khatri-Rao structure, which in the previous section we established to
be the root of the joint diagonalization model. How does this work out here?

• We can rearrange h into an M × N matrix H:

h = [h0 ; h1 ; ⋮ ; hN−1]   ⇔   H = [h0  h1  · · ·  hN−1] .

Since h = vec(H), we find (with property (5.8))

H = A diag(b) G^T .    (9.9)

EE 4715 (2022): Array Signal Processing


9.3 Joint angle and delay estimation 177

With just a single matrix H, we do not have sufficient information to uniquely determine
its factorization: there is no “joint” diagonalization.

• Alternatively, expand G into its rows gi^T. This gives

hi = Adiag(gi )b , i = 0, · · · , N − 1

Here, joint diagonalization also doesn’t work because we just have single vectors hi , not
matrices.

• Expanding on the rows of A, we reach the same conclusion.

Thus, before we can proceed, we need to find a way to expand a single vector into a matrix. We
have seen in Chap. 8 that if A corresponds to a ULA (or a doublet structure is sufficient), we
can apply spatial smoothing to do this. Alternatively, after a DFT and deconvolution, we can
use the similar structure resulting in G to do a similar smoothing.
Indeed, this is what we did in Sec. 8.3 on single-antenna data. There, we applied a DFT to the
time domain samples in h, resulting in a vector z, and then in (8.18) constructed a matrix Z
from m shifts of z so that

Z = [z^{(0)} , z^{(1)} , · · · , z^{(m−1)}] ,    (M (N − m + 1) × m) .    (9.10)

Extending the results in Sec. 8.3 to multiple antennas, we see that Z has a model

Z = [F ◦ A] B ,    F = [f (φ1) , · · · , f (φr)]    (9.11)

where

f (φ) = [1, φ, φ², · · · , φ^{N−1}]^T ,    φ := e^{−j (2π/N)(τ/T )} ,

B = [b  Φb  Φ²b  · · ·  Φ^{m−1}b] ,    Φ = diag(φ1 , · · · , φr) .
Therefore, the shift invariance in the time domain (after the DFT) allows us to expand a single
vector b to a matrix B.
In Sec. 8.3, we applied ESPRIT to only a shift along the time domain. With a ULA, we can also
expand using the shift invariance of A. This leads to a joint diagonalization with two matrices,
and allows us to jointly estimate both delays and angles of arrival.


Thus consider Z in (9.10), and assume m is large enough such that Z has rank r (in the noise-free
case). As usual, we proceed by computing the (truncated) SVD of Z,

Z = U Σ V^H ,

where we truncate to rank r, i.e., U has r columns. Comparing to the model (9.11), we see

U = (F ◦ A) T ,    B = (T U^H) Z .

To estimate T, we form two types of selection matrices: a pair to select submatrices of F, and
a pair to select from A,

Jxφ := [IN−1  0] ⊗ IM ,    Jxθ := IN ⊗ [IM−1  0] ,
Jyφ := [0  IN−1] ⊗ IM ,    Jyθ := IN ⊗ [0  IM−1] .

To estimate Φ, we take submatrices consisting of the first and respectively last M (N − 1) rows
of U, i.e.,

Uxφ = Jxφ U ,    Uyφ = Jyφ U ,

whereas to estimate Θ we stack, for all N blocks, its first and respectively last M − 1 rows:

Uxθ = Jxθ U ,    Uyθ = Jyθ U .

These new data matrices have the structure

Uxφ = A′ T ,    Uxθ = A″ T ,
Uyφ = A′ Φ T ,    Uyθ = A″ Θ T .    (9.12)

If dimensions are such that these are low-rank factorizations, then

Uxφ† Uyφ = T^{−1} Φ T ,
Uxθ† Uyθ = T^{−1} Θ T .    (9.13)

This is again a joint diagonalization problem, where a single matrix T can diagonalize two data
matrices. Having found the eigenvalue matrices Φ and Θ, we can retrieve the delays and angles
of each ray. The correct pairing of angles to delays follows simply from the fact that they share
the same eigenvectors.
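A minimal numpy sketch of forming these selection matrices and the two matrix pencils of (9.13) (the function name is ours; U is an (MN × r) subspace basis, with N the number of temporal blocks after any smoothing):

```python
import numpy as np

def jade_pencils(U, M, N):
    """Form the two jointly diagonalized matrices of (9.13) from the
    subspace U (MN x r), using Kronecker-structured selection matrices."""
    I, Z = np.eye, np.zeros
    Jx_phi = np.kron(np.hstack([I(N - 1), Z((N - 1, 1))]), I(M))
    Jy_phi = np.kron(np.hstack([Z((N - 1, 1)), I(N - 1)]), I(M))
    Jx_th  = np.kron(I(N), np.hstack([I(M - 1), Z((M - 1, 1))]))
    Jy_th  = np.kron(I(N), np.hstack([Z((M - 1, 1)), I(M - 1)]))
    M_phi = np.linalg.pinv(Jx_phi @ U) @ (Jy_phi @ U)   # = T^{-1} Phi T
    M_th  = np.linalg.pinv(Jx_th  @ U) @ (Jy_th  @ U)   # = T^{-1} Theta T
    return M_phi, M_th
```

These two matrices can then be handed to a joint diagonalization routine, or, as in Sec. 9.1.3, T can be computed from one of them and applied to the other.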

More multipath With a straightforward extension of this approach, we can estimate the mul-
tipath parameters of d sources, where each source is received via a superposition of rays, each
with its own angle θi , delay τi , and fading βi . The corresponding data model is

X = (G ◦ A)BJS , (9.14)

where B is the diagonal matrix containing all fading parameters, and J is an r × d selection
matrix which assigns each ray to one of the sources. A similar model was derived in (4.23).


The presence of J and S requires some additional processing steps: we try to estimate r > d
components from a rank-d matrix. This is the same problem as we encountered in the “coherent
multipath” problem in Sec. 8.2.7, and we can proceed in the same way.
In summary, an important property of the joint processing is that it allows us to simultaneously
identify parameters of many more rays than we have antennas: by combining with the time
domain, we extend A to G ◦ A.
By combining with Sec. 9.1, the algorithm has an elegant extension to the estimation of delays
and both azimuth and elevation angles. This results in a joint diagonalization problem of three
matrices. Similar generalizations occur if we have a non-uniform array with multiple baselines.

Exploiting fading diversity At the start of the section, we mentioned the multipath model
(9.8),

h = [G ◦ A] b .
Since only a single channel vector is available, we needed to exploit shift invariance of A or
(after the DFT) G to expand this to a matrix that admits joint diagonalization.
However, in mobile communication we often experience fast fading. In this case, angles and
delays of h remain more or less constant over time (in the order of microseconds), but b fluctu-
ates. If we obtain multiple channel estimates hk with constant angles and delays, but each with
different fading amplitudes bk , then

[h1 , h2 , · · ·] = [G ◦ A]B , B = [b1 , b2 , · · ·] .

By unvectoring the hk, we immediately obtain a joint diagonalization model of the form

Hk = A diag(bk) G^T ,    k = 1, 2, · · · .

Joint diagonalization algorithms will allow us to estimate the columns of A and the rows of
G, without making any further assumptions on the structure of A or G: we do not need shift
invariance.
Fading diversity can also be exploited in other applications. As a simple example, consider d
independent narrowband unit-power sources impinging on an antenna array. In the kth trans-
mission block, the received data is

xk [n] = Adiag(bk )sk [n]

(ignoring the noise), where bk are the complex amplitudes, including the source powers. Subject
to fading, we assume that these are different for each k. Then the correlation matrix of xk [n] is

Rk = A diag(bk)² A^H .

This again leads to a joint diagonalization problem. We do not need to make assumptions on
the structure of A to be able to separate the sources.
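For just two such covariance matrices, the beamformer can already be recovered from a generalized eigenvalue decomposition; a minimal sketch under this noise-free model (our own function name), assuming the matrices have been reduced to size d × d (so that A is square and invertible) and the power ratios are distinct:

```python
import numpy as np
from scipy.linalg import eigh

def separate_from_two_covariances(R1, R2):
    """With R1 = A D1 A^H and R2 = A D2 A^H (D1, D2 positive diagonal),
    each generalized eigenvector v of the pencil (R1, R2) satisfies
    A^H v = c e_i, so the columns of V form a separating beamformer
    (V^H A is a permuted, scaled diagonal)."""
    lam, V = eigh(R1, R2)      # generalized EVD of the pencil (R1, R2)
    return V
```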


9.4 JOINT ANGLE AND FREQUENCY ESTIMATION

A somewhat different scenario than what we considered before, which however leads to the same
type of data models (and thus the same beamforming algorithms), is the following. Suppose
that we observe a frequency band of interest, and want to separate all sources that are present.
Assume that the sources are narrowband, typically with different carrier frequencies, but that
the spectra might be partly overlapping. The objective is to construct a beamformer to separate
the sources based on differences in angles or carrier frequencies. This is a problem of joint angle-
frequency estimation [1, 2]. We will assume that the sample rates in this application are much
higher than the data rates of each source, and that there is only coherent multipath, although
generalizations are possible.
Suppose that the narrowband signals have a bandwidth of less than 1/T, so that they can be
sampled with a period T to satisfy the Nyquist rate. We normalize to T = 1. Also assume that
the bandwidth of the band to be scanned is P times larger: after demodulation to IF we have to
sample at rate P. Without multipath, the data model of the modulated sources at the receiver
is

x(t) = Σ_{i=1}^{d} a(θi) βi e^{j (2π/P) fi t} si(t)

where fi is the residual modulation frequency of the i-th source (−P/2 ≤ fi < P/2). In matrix form
this is written as

x(t) = Aθ B Φ^t s(t)    (9.15)
where

Φ = diag(φ1 , · · · , φd) ,    φi = e^{j (2π/P) fi} .
Since P can be quite large (order 100, say), it would be very expensive to construct a full data
matrix of all samples. In fact, it is sufficient to subsample: collect m subsequent samples at rate
P, then wait till the next period before sampling again, resulting in a data matrix X of size
mM × N,

X = [x(0)  x(1)  · · ·  x(N − 1) ;
     x(1/P)  x(1 + 1/P)  · · ·  x(N − 1 + 1/P) ;
     ⋮ ;
     x((m−1)/P)  x(1 + (m−1)/P)  · · ·  x(N − 1 + (m−1)/P)] .

With the model of x(t) in (9.15), we find that X has a factorization

X = [Aθ B s(0)  Aθ B Φ^P s(1)  · · · ;
     Aθ B Φ s(1/P)  Aθ B Φ^{P+1} s(1 + 1/P)  · · · ;
     ⋮ ;
     Aθ B Φ^{m−1} s((m−1)/P)  Aθ B Φ^{P+m−1} s(1 + (m−1)/P)  · · ·] .

Let us assume at this point that P ≫ m. In that case, s(t) is relatively bandlimited with respect
to the observed band, which allows us to make the crucial assumption that

s(t) ≈ s(t + 1/P) ≈ · · · ≈ s(t + (m−1)/P) ,

so that the model of X simplifies to

X ≈ [Aθ ; Aθ Φ ; ⋮ ; Aθ Φ^{m−1}] B [s0  Φ^P s1  · · ·  Φ^{(N−1)P} sN−1]
  = (Fφ ◦ Aθ) B (FP ⊙ S) .

Fφ is as in (9.11), and only has a different interpretation: φ is now related to the carrier
frequency. FP is similar to Fφ except for a transpose and different powers, and the pointwise
multiplication represents the modulation on the signals. Obviously, beamforming will not remove
this modulation, but after estimating Φ, we can easily correct for it.
If we do consider coherent multipath, the data model becomes

X = (Fφ ◦ Aθ) B J (FP ⊙ S) .    (9.16)

The column span of this model has precisely the same structure as X in (9.14) before, and hence
we can use the same algorithm to find the beamformer.
If sources are assumed not to have equal carrier frequencies and m > d, we can separate them
based on the structure of Fφ only. In this case we do not need the array structure and an
arbitrary array can be used, but we do not recover the DOAs. If frequencies can be close,
however, we will have to separate the signals based on differences in angles as well. It is then
also necessary to restore the rank of X to r by spatial smoothing.

9.5 MULTIPLE INVARIANCES

The shift-invariance structure considered in this chapter can be extended to arrays with multiple
invariances, e.g., using both short and long baselines to improve resolution; this leads to the
multiple-invariance (MI-)ESPRIT algorithm of Swindlehurst. A detailed treatment, e.g., in the
context of JAFE, is omitted here.

9.6 NOTES

Section 9.1 discussed the use of antenna triplets to derive the 2D ESPRIT algorithm. Instead of
triplets, we can also consider two ULAs oriented in two different directions, e.g., in an L-shape


or a +-shape. Extensions to more general 2-D arrays on which the ESPRIT algorithm works are
straightforward to derive, see e.g., [3]. The main issues are the preservation of shift-invariance
properties, and the correct pairing of the estimated path parameters using a coupled eigenvalue
method.
Joint angle-delay estimation is covered in [4–8].
The IQML-2D method of [9] was originally developed for estimating the two-dimensional modes
of sinusoids in Gaussian noise. As it is based on ML, it is expected to show high performance
and convergence to the CRB for large number of samples. It can be used to determine angles
and delays if both manifolds have Vandermonde structure.
Joint diagonalization problems such as encountered in this chapter have received wide interest
in the 1990s. If eigenvalues are distinct, then already a single matrix allows to compute the
separating beamformer. To achieve this situation, one line of approaches was to form linear
combinations of the two matrices to ensure that the combination has distinct eigenvalues: see
e.g., [3]. Several Jacobi-type algorithms have been proposed as well, although some of these
assume that T is a unitary matrix [10–23].
Although these algorithms usually give good performance, the problem of joint diagonalization
with non-Hermitian matrices has not yet been optimally solved. It is very relevant to study such
overdetermined eigenvalue problems. Indeed, a third matrix arises if we use a two-dimensional
uniform antenna array, by which we can measure both azimuth and elevation, or any other
array with multiple independent baselines. We will see several other examples of joint eigenvalue
problems later in this book.

Bibliography

[1] M.D. Zoltowski and C.P. Mathews, “Real-time frequency and 2-D angle estimation with
sub-Nyquist spatio-temporal sampling,” IEEE Trans. Signal Proc., vol. 42, pp. 2781–2794,
October 1994.

[2] K.-B. Yu, “Recursive super-resolution algorithm for low-elevation target angle tracking in
multipath,” IEE Proceedings - Radar, Sonar and Navigation, vol. 141, pp. 223–229, August
1994.

[3] M.D. Zoltowski, M. Haardt, and C.P. Mathews, “Closed-form 2-D angle estimation with
rectangular arrays in element space or beamspace via Unitary ESPRIT,” IEEE Trans.
Signal Proc., vol. 44, pp. 316–328, February 1996.

[4] Y. Ogawa, N. Hamaguchi, K. Ohshima, and K. Itoh, “High-resolution analysis of indoor
multipath propagation structure,” IEICE Trans. Communications, vol. E78-B, pp. 1450–
1457, November 1995.


[5] J. Gunther and A.L. Swindlehurst, “Algorithms for blind equalization with multiple an-
tennas based on frequency domain subspaces,” in Proc. IEEE ICASSP, (Atlanta, GA),
pp. 2421–2424, 1996.

[6] M. Wax and A. Leshem, “Joint estimation of directions-of-arrival and time-delays of multi-
ple reflections of known signal,” IEEE Trans. Signal Proc., vol. 45, pp. 2477–2484, October
1997.

[7] M.C. Vanderveen, C.B. Papadias, and A. Paulraj, “Joint angle and delay estimation
(JADE) for multipath signals arriving at an antenna array,” IEEE Communications Letters,
vol. 1, pp. 12–14, January 1997.

[8] A.J. van der Veen, M.C. Vanderveen, and A. Paulraj, “Joint angle and delay estimation
using shift-invariance techniques,” IEEE Trans. Signal Proc., vol. 46, pp. 405–418, February
1998.

[9] M.P. Clark and L.L. Scharf, “Two-dimensional modal analysis based on maximum likeli-
hood,” IEEE Trans. Signal Processing, vol. 42, pp. 1443–52, June 1994.

[10] A.J. van der Veen, P.B. Ober, and E.F. Deprettere, “Azimuth and elevation computation in
high resolution DOA estimation,” IEEE Trans. Signal Proc., vol. 40, pp. 1828–1832, July
1992.

[11] M. Haardt, Efficient One-, Two-, and Multidimensional High-Resolution Array Signal Pro-
cessing. PhD thesis, TU München, Munich, Germany, 1997.

[12] Y. Hua, “Estimating two-dimensional frequencies by matrix enhancement and matrix pen-
cil,” IEEE Trans. Signal Proc., vol. 40, pp. 2267–2280, September 1992.

[13] J.F. Cardoso and A. Souloumiac, “Blind beamforming for non-Gaussian signals,” IEE Proc.
F (Radar and Signal Processing), vol. 140, pp. 362–370, December 1993.

[14] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and E. Moulines, “A blind source separa-
tion technique using second-order statistics,” IEEE Trans. Signal Proc., vol. 45, pp. 434–
444, February 1997.

[15] L. De Lathauwer, B. De Moor, and J. Vandewalle, “Independent component analysis based
on higher-order statistics only,” in Proc. IEEE SP Workshop on Stat. Signal Array Proc.,
(Corfu, Greece), pp. 356–359, 1996.

[16] L. De Lathauwer, Signal Processing Based on Multilinear Algebra. PhD thesis, KU Leuven,
Leuven, Belgium, 1997.

[17] A.J. van der Veen and A. Paulraj, “An analytical constant modulus algorithm,” IEEE
Trans. Signal Processing, vol. 44, pp. 1136–1155, May 1996.


[18] P. Binding, “Simultaneous diagonalization of several Hermitian matrices,” SIAM J. Matrix
Anal. Appl., vol. 4, no. 11, pp. 531–536, 1990.

[19] M.T. Chu, “A continuous Jacobi-like approach to the simultaneous reduction of real ma-
trices,” Lin. Alg. Appl., vol. 147, pp. 75–96, 1991.

[20] A. Bunse-Gerstner, R. Byers, and V. Mehrmann, “Numerical methods for simultaneous
diagonalization,” SIAM J. Matrix Anal. Appl., vol. 4, pp. 927–949, 1993.

[21] B.D. Flury and B.E. Neuenschwander, “Simultaneous diagonalization algorithms with ap-
plications in multivariate statistics,” in Approximation and Computation (R.V.M. Zahar,
ed.), pp. 179–205, Basel: Birkhäuser, 1995.

[22] J.-F. Cardoso and A. Souloumiac, “Jacobi angles for simultaneous diagonalization,” SIAM
J. Matrix Anal. Appl., vol. 17, no. 1, pp. 161–164, 1996.

[23] M. Wax and J. Sheinvald, “A least-squares approach to joint diagonalization,” IEEE Signal
Proc. Letters, vol. 4, pp. 52–53, February 1997.



Chapter 10

FACTOR ANALYSIS

Contents
10.1 The Factor Analysis problem . . . . . . . . . . . . . . . . . . . . . . . 186
10.2 Computing the Factor Analysis decomposition . . . . . . . . . . . . . 189
10.3 Rank detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
10.4 Extensions of the Classical Model . . . . . . . . . . . . . . . . . . . . 199
10.5 Application to interference cancellation . . . . . . . . . . . . . . . . . 201
10.6 Application to array calibration . . . . . . . . . . . . . . . . . . . . . . 207
10.7 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

Many array signal processing algorithms are at some point based on the eigenvalue decompo-
sition, which is used e.g., to make a distinction between the “signal subspace” and the “noise
subspace”. By using orthogonal projections, part of the noise is projected out and only the signal
subspace remains. This can then be used for applications such as high-resolution direction-of-
arrival estimation, blind source separation, etc. In these applications, it is commonly assumed
that the noise is spatially white. However, this is valid only after suitable calibration.
Factor analysis considers covariance data models where the noise is uncorrelated but has un-
known powers at each sensor, i.e., the noise covariance matrix is an arbitrary diagonal with
positive real entries. In these cases the familiar eigenvalue decomposition (EVD) has to be
replaced by a more general “Factor Analysis” decomposition (FAD), which then reveals all rel-
evant information. It is a very relevant model for the early stages of data processing in radio
astronomy, because at that point the instrument is not yet calibrated and the noise powers on
the various antennas may be quite different.
As it turns out, this problem has been studied in the psychometrics, biometrics and statistics
literature since the 1930s (but usually for real-valued matrices) [1, 2]. The problem has received
much less attention in the signal processing literature. In this chapter, we describe the FAD,
some applications, and some algorithms for computing it.


10.1 THE FACTOR ANALYSIS PROBLEM

10.1.1 Problem formulation


Assume as before that we have a set of Q narrow-band Gaussian signals impinging on an array
of P sensors. The received signal can be described in complex envelope (baseband) form by

x[n] = Σ_{q=1}^{Q} aq sq[n] + n[n] = A s[n] + n[n]    (10.1)

where A = [a1 , · · · , aQ] contains the array response vectors. In this model, A is unknown, and
the array response vectors are unstructured, i.e., we do not consider a directional model for
them. The source vector s[n] and noise vector n[n] are considered zero mean i.i.d. complex
Gaussian, i.e., the corresponding covariance matrices are diagonal.
The data model leads to a model for the data covariance matrix as
    R = A Σ_s A^H + Σ_n ,

where Σs is the (diagonal) source covariance matrix, and Σn is the (diagonal) noise covariance
matrix.
For given R, can we estimate A, Σs , and Σn ? If A has no special structure (such as imposed by
a parametrically known array response vector), then we cannot distinguish A and A′ = A Σ_s^{1/2}:
without loss of generality, we can scale the source signals such that the source covariance matrix
Σs is identity.
Therefore, in this section we will consider a data covariance matrix of the form
    R = A A^H + D                                                               (10.2)

where D is the (diagonal) noise covariance matrix, and A has full column rank Q. We assume
Q < P so that A A^H is rank deficient. Many signal processing algorithms are based on computing
an eigenvalue decomposition of R as R = U Λ U^H, where U is unitary and Λ is a diagonal matrix
containing the eigenvalues in descending order.

• If D = 0 (no noise), then R has rank Q and the eigenvalue decomposition specializes to

      R = U Λ_0 U^H = [U_s  U_n] [ Λ_s  0 ] [ U_s^H ]
                                 [  0   0 ] [ U_n^H ]

  where Λ_s contains the Q nonzero eigenvalues and U_s the corresponding eigenvectors. The
  range of U_s is called the signal subspace, its orthogonal complement U_n the noise subspace.
  Since without noise R = A A^H, we see that the column span of U_s equals the column span
  of A, i.e., ran(U_s) = ran(A).

EE 4715 (2022): Array Signal Processing


10.1 The Factor Analysis problem 187

• For spatially white noise, D = σ² I, we can write D = σ² U U^H, and the eigenvalue
  decomposition becomes

      R = U Λ U^H = U (Λ_0 + σ² I) U^H = [U_s  U_n] [ Λ_s + σ² I    0  ] [ U_s^H ]      (10.3)
                                                    [     0       σ² I ] [ U_n^H ]

Hence, all eigenvalues are raised by σ 2 , but the eigenvectors are unchanged. Algorithms
based on Us can thus proceed as if there was no noise, thus leading to the use of the EVD
and related subspace estimation algorithms in many array signal processing applications.
See e.g., Chap. 8.
• If the noise is not uniform, then D is an unknown diagonal matrix, and the EVD of R
does not reveal the signal subspace Us .

In practice, we are given a finite number of samples x[n], n = 0, · · · , N − 1, and compute the
sample covariance matrix
    R̂ = (1/N) ∑_{n=0}^{N−1} x[n] x[n]^H .
For large enough N , this estimate is close, but not quite equal, to R.
The objective for factor analysis is, for given R̂, to identify A and D, as well as the factor
dimension Q. This can be seen as an extension of the eigenvalue decomposition, to be used if
the noise covariance is not σ 2 I but an unknown diagonal.
It is clear that for an arbitrary Hermitian matrix R, the factorization R = A A^H + D can exist in
its exact form only for Q ≥ P , in which case we can set D = 0, or any other value, which makes
the factorization useless. Hence, for a noise-perturbed matrix, we wish to detect the smallest
Q which gives a “reasonable fit”, and we will assume that Q < P is sufficiently small so that
unique decompositions exist. What we consider reasonable depends on N, as the accuracy of R̂
(or: its covariance) scales with 1/N .

10.1.2 Identifiability and uniqueness


It is immediately clear from (10.2) that the factors are not uniquely identifiable. E.g., A is not
unique: the columns of A can be permuted, and if A satisfies the model, then A′ = AQ is also
valid, for any unitary matrix Q. However, the column span of A is invariant under these
transformations, and thus these do not harm subspace estimation techniques.
To be accurate, if we denote A A^H = U_s Λ_s U_s^H, we see we can estimate a bit more than
just the column span of A (given by ran(U_s)), because the “signal eigenvalues” Λ_s also tell us
something more about A. This might help to estimate the source covariance matrix Σ_s, which
for the moment we assumed to be the identity.
More important is the uniqueness of D. By counting numbers of observations (or equations)
and numbers of unknowns, we see that the number of columns Q of A cannot be too large; in
fact we need Q < P − √P, as can be determined as follows.

EE 4715 (2022): Array Signal Processing


188 Factor Analysis

The number of available observations is equal to the number of (real) parameters in R̂, which
is P (real) entries on the main diagonal and P(P − 1) (real) parameters for the off-diagonal
(complex) entries, taking into account Hermitian symmetry. In total these are P² observations.
The number of unknowns is 2PQ (real) parameters for A, and P parameters for D, minus
the number of constraints to make A unique. A constraint which is often used is to make the
columns of A′ := D^{−1/2} A orthogonal, or equivalently, A^H D^{−1} A is diagonal (this is motivated
in Sec. 10.1.3 below). This gives Q² − Q constraints on the parameters of A. Further restricting
the first row of A to be real gives another Q constraints. In total we have for the number of
equations minus the number of unknowns

    s = P + P(P − 1) − (2PQ + P − (Q² − Q + Q)) = (P − Q)² − P .                (10.4)

This number is also called the degree of freedom, and plays a role in the asymptotic modeling of
the likelihood. Requiring s > 0 leads to the condition Q < P − √P. This is an upper bound on
the factor rank.
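As a quick numerical check of this bound, the following small Python sketch (NumPy assumed;
the function name q_max is ours) computes the largest admissible factor rank for a few array
sizes.

    import numpy as np

    def q_max(P):
        # largest integer Q with s = (P - Q)^2 - P > 0, i.e., Q < P - sqrt(P)
        return int(np.ceil(P - np.sqrt(P))) - 1

    for P in (8, 30, 100):
        print(P, q_max(P))   # P = 8 gives 5, P = 30 gives 24, P = 100 gives 89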
Even if we satisfy this constraint, D is not always unique, as seen from the following example.
Consider R = A_1 A_1^H + D_1 , where

           [ 1  0 ]
           [ 1  1 ]
    A_1 =  [ ⋮  ⋮ ] .
           [ 1  1 ]

Then we also have R = A_2 A_2^H + D_2 , where

              [ 1/2 ]
              [  1  ]
    A_2 = √2  [  ⋮  ] ,        D_2 = D_1 + (1/2) e_1 e_1^T ,
              [  1  ]

and e_i is the i-th column of the identity matrix. The problem in this case is caused by a
submatrix of A1 being rank-deficient. This can be considered an uncommon technicality that
can be detected after the factors have been estimated. Throughout the rest of the chapter, we
assume that D can be identified uniquely.

10.1.3 Constraints on A
If D is identifiable, then A is unique up to a rotation Q. We can make A unique by adding
additional constraints. This essentially amounts to choosing a non-redundant parametrization.
Not all algorithms require this, but it may be needed to avoid singularities during the compu-
tation of the Cramér-Rao Bound (CRB) or when we use Newton gradient descent techniques.
For complex data, Q² constraint equations are needed. Common constraints are to force the
columns of A to be orthogonal with respect to a certain weight matrix W > 0, i.e., to require
that A^H W A is diagonal.


In more detail, suppose we have estimated D, then we can whiten the noise covariance matrix
in R:

    R̃ := D^{−1/2} R D^{−1/2} = (D^{−1/2} A)(A^H D^{−1/2}) + I .

At this point, we can introduce the usual eigenvalue decomposition of R̃:

    R̃ = Ũ Λ̃ Ũ^H = Ũ (Λ̃_s + I) Ũ^H ,

and identify D^{−1/2} A = Ũ_s Λ̃_s^{1/2} V^H, or A = D^{1/2} Ũ_s Λ̃_s^{1/2} V^H, where V is an arbitrary
unitary factor. If we choose V = I, we obtain that A^H D^{−1} A = Λ̃_s is diagonal. We can use this
as a constraint to obtain a more unique parametrization of A. Note that A is not yet quite
unique, because in the complex case each column of A can be scaled by an arbitrary complex
phase, and the columns may be reordered as well.
If we compute a matrix A without satisfying the constraints, the required transformation Q such
that A′ = AQ satisfies the constraints is easily determined afterwards. Hence, in most algo-
rithms the constraints do not play a role. In the literature, constraints such as setting A^H D^{−1} A
diagonal have been introduced in an attempt to interpret the resulting “latent factors”, but
without prior structural information on A, these attempts are often futile.
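For illustration, a minimal sketch (Python with NumPy; the function name is ours) of how this
rotation is computed afterwards: an eigenvalue decomposition of the small Q × Q matrix
A^H D^{−1} A yields the unitary factor that diagonalizes it, and a column-wise phase correction
then makes the first row real.

    import numpy as np

    def rotate_to_constraint(A, d):
        # Rotate A by a unitary Q such that (AQ)^H D^{-1} (AQ) is diagonal, D = diag(d).
        M = (A.conj().T / d) @ A            # A^H D^{-1} A, a Q x Q Hermitian matrix
        lam, V = np.linalg.eigh(M)          # M = V diag(lam) V^H with V unitary
        Ac = A @ V                          # now Ac^H D^{-1} Ac = diag(lam)
        return Ac * np.exp(-1j * np.angle(Ac[0, :]))   # make the first row real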

10.2 COMPUTING THE FACTOR ANALYSIS DECOMPOSITION

Factor analysis is a classical problem. It was introduced in 1904 [3] and over time, several
algorithms were proposed [4–6], all for real data matrices (although readily extended to the
complex case). In this section we briefly review some of these approaches.
Consider again the model

    R = A A^H + D ,                                                             (10.5)

where A has Q columns, and D is a diagonal matrix with positive diagonal elements. As data,
we are given a sample covariance matrix R̂ based on N samples,

    R̂ = (1/N) ∑_{n=0}^{N−1} x[n] x[n]^H .

In Factor Analysis, there are two problems:

1. Detection: given R̂, estimate Q. The hypothesis that the factor rank is q is denoted by
Hq . We can formulate this as a likelihood ratio test.

2. Identification: given R̂ and Q, estimate D and A, or Λs and Us in the formulation


       R = U_s Λ_s U_s^H + D .

We consider the latter problem first. Detection is studied in Sec. 10.3.


Figure 10.1. Ad hoc method: away from the main diagonal, all submatrices have at most
             rank Q. This can be used to estimate the main diagonal. The figure shows
             how this is done for Q = 1: with R′ = R − D = a a^H and a = α b, the first
             diagonal entry follows from |a_1|² = α r_12.

If we separate detection from identification, then for the latter, the objective is to estimate the
factors A and D from R̂, where the number of columns Q of A is known. We first present
an ad-hoc algorithm, which gives some insight in the problem. Then, we look at Maximum
Likelihood (ML)-type algorithms, and in particular consider a Weighted Least Squares (WLS)
formulation that is minimized using fast-converging Gauss-Newton iterations.

10.2.1 Ad Hoc Method


If the rank Q is relatively small, it is possible to solve the FA problem in closed form. As an
example, let Q = 1, assume we know R exactly, and consider R′ = R − D = a a^H. Clearly,
R′ is rank 1, and this implies that each column in the matrix is a multiple of another column.
Moreover, if D is unknown, then only the diagonal entries of R′ are unknown. Each submatrix
of R′ that does not involve the main diagonal is completely known and will have rank 1. This
can be used to fill in the diagonal entries of R′ such that the entire matrix is of rank 1. Indeed,
as shown in Fig. 10.1, the first column is α times the second column, and we can find α from this
ratio. Then r′_11 = α r_12 is found immediately. Likewise, we can find the entire main diagonal.

This ad hoc estimation algorithm can be extended to higher ranks, but perhaps not to the
maximal rank P − √P. Also, if we only have an estimate R̂, then the algorithm will not be
optimal. However, it could be used to provide a good initial point for an iterative algorithm.
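A minimal sketch of this rank-1 procedure (Python with NumPy; function name ours), assuming
R is known exactly: it fills in the main diagonal using only off-diagonal entries, via the identity
|a_i|² = r_ij r_ki / r_kj for three distinct indices i, j, k, and then reads off a and D.

    import numpy as np

    def adhoc_rank1_fa(R):
        P = R.shape[0]
        d1 = np.empty(P)
        for i in range(P):
            j, k = [m for m in range(P) if m != i][:2]    # two indices distinct from i
            d1[i] = np.real(R[i, j] * R[k, i] / R[k, j])  # |a_i|^2 from off-diagonal data
        D = np.real(np.diag(R)) - d1                      # noise powers
        R1 = R - np.diag(np.diag(R)) + np.diag(d1)        # rank-1 completion of R - D
        lam, U = np.linalg.eigh(R1)
        a = U[:, -1] * np.sqrt(lam[-1])                   # dominant eigenpair gives a
        return a, D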

10.2.2 Alternating Least Squares


The estimation problem can also be approached as a two-stage minimization problem [2]. In
this approach we minimize the LS cost function

    min_{A,D} ‖R̂ − A A^H − D‖²_F                                               (10.6)


by an alternating least-squares (ALS) approach, where ‖·‖_F is the Frobenius norm. First, for
a given A, (10.6) is minimized with respect to D and in the next stage, D is held constant and
a new A is found. Both problems can be optimized in closed form.
Let the subscript (k) denote the iteration count. The iteration steps are

    D_(k+1) := diag(R̂ − A_(k) A_(k)^H)                                          (10.7)
    U_(k+1) Λ_(k+1) U_(k+1)^H := R̂ − D_(k+1)        [EVD]                       (10.8)
    A_(k+1) := U_s,(k+1) Λ_s,(k+1)^{1/2} ,                                       (10.9)

where U(k+1) and Λ(k+1) follow from an eigenvalue decomposition, and Us,(k+1) and Λs,(k+1)
contain the Q dominant eigenvectors and corresponding eigenvalues. A Weighted Least Squares
formulation could be considered instead of (10.6), leading to similar iterations, but involving the
EVD of D−1/2 R̂D−1/2 , if we take D−1 as a weight.
The iteration is usually initialized by taking

D(0) = [diag(R̂−1 )]−1 .

As for most ALS approaches, the rate of convergence is slow (linear). An EVD is required at
each iteration, which makes it prohibitive for large problems.
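A direct transcription of (10.7)–(10.9) in Python/NumPy (a sketch; the fixed iteration count is
an arbitrary choice of ours):

    import numpy as np

    def fa_als(Rhat, Q, n_iter=500):
        d = 1.0 / np.real(np.diag(np.linalg.inv(Rhat)))   # D_(0) = [diag(Rhat^{-1})]^{-1}
        for _ in range(n_iter):
            lam, U = np.linalg.eigh(Rhat - np.diag(d))    # EVD, eigenvalues ascending
            Us, ls = U[:, -Q:], np.maximum(lam[-Q:], 0.0)
            A = Us * np.sqrt(ls)                          # A = U_s Lambda_s^{1/2}
            d = np.real(np.diag(Rhat - A @ A.conj().T))   # D = diag(Rhat - A A^H)
        return A, np.diag(d)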

10.2.3 Maximum Likelihood Estimator


We now aim for more optimal techniques. The standard approach to tackle estimation problems
is to consider the Maximum Likelihood estimator. We assume we know Q. The first step is to
choose a suitable parametrization.

Parametrization    Let us write the model as R(θ) = A A^H + D, where the vector θ represents
the unknown parameters in the model. Since A is complex, a direct representation of its entries
gives complex parameters. We could represent them as independent real and purely imaginary
components, but a popular alternative is to represent them using Wirtinger operators [7, App.2],
[8]: for an unknown complex parameter θ_i we consider its conjugate θ_i* as an independent
parameter, while real parameters are represented only once. Using this method we define the
parameter vector as

         [ θ_A  ]
    θ =  [ θ_A* ]                                                               (10.10)
         [ θ_D  ]

where

    θ_A = vec(A) ,     θ_A* = vec(A*) ,     θ_D = diag(D) = d .

EE 4715 (2022): Array Signal Processing


192 Factor Analysis

This parametrization is redundant: it does not implement the Q² constraints we need to place
on A to make it unique. However, it is more convenient to do this at a later stage.
Using this parameterization and properties of Kronecker products (5.7) and (5.8), we have

    r = vec(R) = (A* ⊗ I_P) vec(A) + (I_P ◦ I_P) d
               = (A* ⊗ I_P) θ_A + (I_P ◦ I_P) θ_D .                             (10.11)

To show how r depends on θ_A*, let K be the exchange matrix defined by vec(A^T) = K vec(A)
(cf. (5.15)). Then we can also write r as

    r = (I_P ⊗ A) vec(A^T) + (I_P ◦ I_P) d
      = (I_P ⊗ A) K θ_A* + (I_P ◦ I_P) θ_D .                                    (10.12)

In the Wirtinger calculus, the derivative of a function with respect to a complex parameter
z = x + jy is defined as [8]

    ∂f/∂z  = (1/2) (∂f/∂x − j ∂f/∂y)
    ∂f/∂z* = (1/2) (∂f/∂x + j ∂f/∂y) .

Moreover, z and z* are treated as independent variables in the differentiation.


Based on the parametrization of R(θ), we can then derive its Jacobian J(θ) as

    J = ∂vec(R)/∂θ^T = [ ∂vec(R)/∂θ_A^T , ∂vec(R)/∂θ_A*^T , ∂vec(R)/∂θ_D^T ]
      = [ J_A , J_A* , J_D ] ,                                                  (10.13)

where

    J_A = A* ⊗ I_P ,     J_A* = (I_P ⊗ A) K ,     J_D = I_P ◦ I_P .             (10.14)

ML cost and Fisher score If we assume that the samples x are generated by zero mean
complex proper Gaussian sources, i.e.,

    p(x; θ) = (1 / (π^P det(R))) exp[ −x^H R^{−1} x ] ,

then the complex log-likelihood function for N independent samples is given by

    l(θ) = −N [ P log(π) + log det(R(θ)) + tr(R(θ)^{−1} R̂) ] .                 (10.15)

EE 4715 (2022): Array Signal Processing


10.2 Computing the Factor Analysis decomposition 193

The maximum likelihood approach aims to find a θ that maximizes this function. To this end,
we find the gradient of the likelihood function (called the Fisher score) and set it equal to zero.
For complex parameters, the Fisher score for a proper Gaussian distributed signal is defined as

    g(θ) = [ g_A ; g_A* ; g_D ] = ∂ log p(X; θ) / ∂θ* ,

a stacked vector with entries [g(θ)]_j = ∂ log p(X; θ) / ∂θ_j*.
Inserting (10.15), the jth entry of g(θ) can be evaluated as

    [g(θ)]_j = −N ∂/∂θ_j* log det(R) − N ∂/∂θ_j* tr(R^{−1} R̂) .
We need some results for matrix differentials [8, p.53]:

    ∂ det(R) = det(R) tr(R^{−1} ∂R) ,     ∂R^{−1} = −R^{−1} (∂R) R^{−1} .

This gives

    [g(θ)]_j = −N tr[ R^{−1} ∂R/∂θ_j* ] + N tr[ R^{−1} (∂R/∂θ_j*) R^{−1} R̂ ] .
Next, we use some properties of Kronecker products (see Sec. 5.1.6):

    tr(AB) = vec(A^H)^H vec(B) ,     tr(ABCD) = vec(A^H)^H (D^T ⊗ B) vec(C) .

This results in

    [g(θ)]_j = −N vec(∂R/∂θ_j*)^H vec(R^{−1}) + N vec(∂R/∂θ_j*)^H (R^{−T} ⊗ R^{−1}) vec(R̂)
             = N (∂vec(R)/∂θ_j*)^H (R^{−T} ⊗ R^{−1}) vec(R̂ − R) .
Finally, stacking for all j and using (10.13), we find a compact expression for g(θ) as

    g(θ) = N J^H (R^{−T} ⊗ R^{−1}) vec(R̂ − R) .                                (10.16)
This is a general expression. Let us now look at our specific parametrization for R: inserting
(10.13) into (10.16), the elements of the Fisher score g(θ) become

    g_A  = N (A^T R^{−T} ⊗ R^{−1}) vec(R̂ − R)
         = N vec[ R^{−1} (R̂ − R) R^{−1} A ]                                    (10.17)
    g_A* = (g_A)*                                                               (10.18)
    g_D  = N vecdiag[ R^{−1} (R̂ − R) R^{−1} ] .                                (10.19)


The ML technique requires us to set (10.17) and (10.19) equal to zero, but unfortunately this
does not produce a closed-form solution. One approach to numerically compute the ML estimate
is to consider Newton-Raphson-like algorithms, as these provide quadratic convergence. Besides
the gradient, we will also need an expression for the Hessian.

The Scoring Method The scoring algorithm is a variant of the Newton-Raphson algorithm
where the gradient is the Fisher score (10.16) and the Hessian is replaced by the Fisher infor-
mation matrix [9]. The Fisher information matrix (FIM) is defined as

    F = −E[ ∂g(θ) / ∂θ^T ] ,

where the expectation is over the data (i.e., R̂). Inserting (10.16), and realizing that after the
expectation only the derivative of vec(R̂ − R) results in a nonzero contribution, gives

    F = N J^H (R^{−T} ⊗ R^{−1}) J ,

where J is given by (10.13). The resulting iterations in the scoring algorithm are

    θ_(k+1) = θ_(k) + µ_(k) δ ,                                                 (10.20)

where θ_(k) is the current estimate of the parameters, µ_(k) is a step size, and

    δ = [ δ_A ; δ_A* ; δ_D ]

is the direction of descent. The latter follows from solving

    F_(k) δ = g_(k) ,                                                           (10.21)

where g_(k) = g(θ_(k)) is the Fisher score and F_(k) = F(θ_(k)) is the FIM. Since without constraints
the parametrization is redundant (see Sec. 10.1.2), the FIM is singular. However, this does not
need to cause complications because (10.16) shows that g_(k) is in the column span of F_(k), so
that the system of equations has a solution, and (taking the minimum-norm solution) standard
convergence results for the scoring method follow.
A problem with the scoring method is that the matrix F quickly becomes large, as its dimension
is equal to the number of unknown parameters. Solving (10.21) then becomes unattractive.
Similarly, we also do not want to work directly with R^{−T} ⊗ R^{−1}, as it is a matrix of size
P² × P². Another problem is that R changes each iteration cycle, so that its inverse has to be
recomputed each time.


Covariance matching techniques We can view Factor Analysis as a special case of covariance
matching, as studied in Chap. 7. In this approach, the ML problem is replaced by a Weighted
Least Squares (WLS) fitting of the sample covariance. The large sample properties of the
estimators are the same. Solving this nonlinear least squares problem using gradient descent
techniques is closely connected to the scoring algorithm.
The corresponding Nonlinear Weighted Least Squares (NLWLS) problem is

    θ̂ = arg min_θ ‖W^{1/2} [r̂ − r(θ)]‖² = arg min_θ [r̂ − r(θ)]^H W [r̂ − r(θ)]     (10.22)

where r = vec(R), r̂ = vec(R̂). The optimal weighting matrix W is the inverse of the covariance
matrix of r̂. We derived in Sec. 3.2.2 that this covariance is equal to C = (1/N)(R* ⊗ R). Because
we only have access to the sample covariance matrix R̂, we use instead

    W = R̂^{−T} ⊗ R̂^{−1} ,                                                      (10.23)

and then θ̂ asymptotically (for large N) converges to the optimal ML solution for a Gaussian
distributed data matrix.
This is precisely the context of [10], and we can use one of the algorithms proposed there: Gauss-
Newton iterations, the scoring algorithm, or sequential estimation algorithms. Here, we derive
the Gauss-Newton iterations.

Gauss-Newton algorithm for solving NLWLS For the Gauss-Newton iteration, the Hessian
is replaced by the Gramian of the Jacobians [11]. The updates are similar to the scoring method
updates (10.20):

    θ_(k+1) = θ_(k) + µ_(k) δ ,                                                 (10.24)

where δ is the direction of descent. To find δ we need to solve

    B(θ_(k)) δ = g(θ_(k)) ,                                                     (10.25)

where

    g(θ) = J^H(θ) W [r̂ − r(θ)]                                                 (10.26)
    B(θ) = J^H(θ) W J(θ) .                                                      (10.27)

The weight W is given by (10.23) and the Jacobian J(θ) by (10.13).


The iterations given by (10.24) are repeated until ‖g(θ_(k))‖² < ε, where ε > 0 depends on the
desired accuracy. Clearly, the equations are very similar to the scoring method (10.20), except
that the sample covariance matrices in W are constant and have to be inverted only once.
The key step in the Gauss-Newton iteration is solving the linear system (10.25). The matrix
dimensions can become large. We propose an algorithm based on a symbolic inversion of B.


Closed-form solution for direction of descent    A complicated derivation [12] that we omit
here shows how we can solve for δ_D inside δ in closed form. Define

    W̃   := R̂^{−1} − R̂^{−1} A (A^H R̂^{−1} A)^{−1} A^H R̂^{−1}
    B̃_D := J_D^H (W̃^T ⊗ W̃) J_D = W̃^T ⊙ W̃        [using J_D = I ◦ I and (5.5)]
    g̃_D := J_D^H (W̃^T ⊗ W̃) vec[R̂ − R(θ)] ,

where ⊙ denotes the entrywise (Hadamard) product. Note that W̃ A = 0. Then the computation
of

    δ = [ vec(∆_A) ; vec(∆_A*) ; δ_D ]

in (10.25) reduces to the computation of δ_D from B̃_D δ_D = g̃_D , while

    ∆_A = (1/2) [I + R̂ W̃] [R̂ − R(θ) − diag(δ_D)] R̂^{−1} A (A^H R̂^{−1} A)^{−1}     (10.28)

and ∆_A* = (∆_A)*. Each of these computations requires us to handle matrices not larger than
size P × P.

Alternating Weighted Least Squares (AWLS) algorithm    If we take step size µ = 1, then
the closed-form result simplifies to

    θ_D^(k+1) = θ_D^(k) + δ_D .

Premultiplying with B̃_D gives

    B̃_D θ_D^(k+1) = B̃_D θ_D^(k) + g̃_D
                  = B̃_D θ_D^(k) + J_D^H (W̃^T ⊗ W̃) vec(R̂ − A A^H − D)
                  = J_D^H (W̃^T ⊗ W̃) vec(R̂ − A A^H) .

Here we used vec(D) = J_D θ_D^(k) and the definition of B̃_D, and in the notation dropped the
dependency on k from B̃_D, W̃, A and D.
Since W̃ A = 0 and J_D = I ◦ I, this reduces to

    B̃_D θ_D^(k+1) = J_D^H (W̃^T ⊗ W̃) vec(R̂)
    ⇔  [W̃^T ⊙ W̃] θ_D^(k+1) = vecdiag(W̃ R̂ W̃) = vecdiag(W̃) .

W̃ acting on R̂ can be interpreted as “projecting out” the contribution of the term A A^H in
R̂, after which the remaining term D can be estimated. The final simplification used that
W̃ R̂ W̃ = W̃.


The result can be formulated as the Alternating Weighted Least Squares (AWLS) algorithm [12].
First, for given D_(k), compute

    U Λ U^H := D_(k)^{−1/2} R̂ D_(k)^{−1/2}
    A_(k+1) := D_(k)^{1/2} U_s (Λ_s − I)^{1/2}

where the first line represents an eigenvalue decomposition, and in the second line, Λ_s contains
the largest Q eigenvalues of Λ, and U_s the corresponding eigenvectors. This step is similar to
the (prewhitened) alternating LS algorithm in Sec. 10.2.2. Alternatively, (10.28) could have
been used.
Next, let W = R̂^{−1}, and update the estimate of D:

    W̃      := W − W A_(k+1) (A_(k+1)^H W A_(k+1))^{−1} A_(k+1)^H W
    d_(k+1) := [W̃^T ⊙ W̃]^{−1} vecdiag(W̃)
    D_(k+1) := diag(d_(k+1)) .

These two steps are alternated until convergence. In this algorithm, all computations are on
matrices of size P × P, which makes the computational complexity of the same order as that of
an EVD: O(P³).
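The two alternating steps translate directly into code. The sketch below (Python with NumPy;
the function name and stopping rule are our choices) keeps all work on P × P matrices and
inverts R̂ only once.

    import numpy as np

    def fa_awls(Rhat, Q, n_iter=50, tol=1e-10):
        W = np.linalg.inv(Rhat)                           # fixed weight
        d = 1.0 / np.real(np.diag(W))                     # usual initialization of D
        for _ in range(n_iter):
            # step 1: prewhitened EVD, A = D^{1/2} U_s (Lambda_s - I)^{1/2}
            s = 1.0 / np.sqrt(d)
            lam, U = np.linalg.eigh(Rhat * np.outer(s, s))
            Us, ls = U[:, -Q:], np.maximum(lam[-Q:] - 1.0, 0.0)
            A = (np.sqrt(d)[:, None] * Us) * np.sqrt(ls)
            # step 2: solve [W~^T (entrywise) W~] d = vecdiag(W~) for the new d
            WA = W @ A
            Wt = W - WA @ np.linalg.solve(A.conj().T @ WA, WA.conj().T)
            d_new = np.real(np.linalg.solve(Wt.conj() * Wt, np.real(np.diag(Wt))))
            if np.linalg.norm(d_new - d) < tol * np.linalg.norm(d):
                return A, np.diag(d_new)
            d = d_new
        return A, np.diag(d)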

10.2.4 Convergence
The following simulation experiment gives an indication of the convergence speed. We use
P = 100 sensors and N = 1000 samples. The matrix A is chosen randomly with a standard
complex Gaussian distribution (i.e., each element is distributed as CN(0, 1)), and D is chosen
randomly with a uniform distribution between 1 and 5.
Convergence is gauged by looking at the norm of the gradient.
AWLS is tested against a range of other algorithms which are described in [12]. In the graph, the
“Ad hoc” method is the ALS, “Joreskog” is an implementation of WLS using Fletcher-Powell
iterations [13] as used in many standard toolboxes, while “CM” is the Constrained Maximization
algorithm [14], which was derived from the EM algorithm, is straightforward to implement, and
shows quadratic convergence. “KLD/EM” is another representative of an EM algorithm with a
straightforward implementation [15].
As seen in Fig. 10.2, the AWLS algorithm converges fastest (in 10-15 iterations), while the ALS
and EM algorithms have slow convergence (over 1000 iterations for large Q).

Figure 10.2. Magnitude of the gradient versus the number of iterations for AWLS, Direct
             NLWLS, Krylov Scoring, CM, Joreskog, Ad hoc, and KLD/EM (P = 100,
             N = 1000): (a) Q = 20 sources; (b) Q = 80 sources.

10.3 RANK DETECTION

The detection problem is to estimate the factor rank Q (i.e., the number of columns of A). In
array processing, this relates to detecting the number of sources that the array is exposed to. An
extensive literature exists on this topic; here we limit the discussion to a generalized likelihood ratio
test (GLRT) [16], which is used to decide whether the FA model fits a given sample covariance
matrix. We can use the GLRT to design a constant false-alarm rate detector. In the special case
where Q = 0, this test indicates whether there are any sources active during the measurement
(we detect whether R is diagonal). The largest permissible value of Q is that for which the
number of equations minus the number of unknown (real) parameters is positive: s = (P − Q)² − P > 0,
or Q_max < P − √P. For larger Q, there is no identifiability of A and D: any sample covariance
matrix R̂ can be fitted.
Let R_q denote the covariance matrix of the FA model with q sources,

    R_q = A A^H + D ,     where A : P × q, D diagonal,

and let CN (0, Rq ) denote the zero-mean complex normal distribution with covariance Rq . To
find Q using the GLRT, we define a collection of hypotheses

    H_q : x(k) ∼ CN(0, R_q) ,     q = 0, 1, 2, · · ·                            (10.29)

which are tested in turn against the null hypothesis

    H_0 : x(k) ∼ CN(0, R_0) .

H_0 corresponds to a default hypothesis of an arbitrary (unstructured) positive definite matrix
R_0.
In the GLRT, we have to insert maximum likelihood estimates for each of the unknown param-
eters, under each of the hypotheses. For Hq , we can use the estimation techniques from the
previous section, resulting in an estimated model Rq . For H0 , the ML estimate R0 is equal to
the sample covariance, R0 = R̂.


Under H_q, respectively H_0, the maximum values of the log-likelihood are (dropping constants)

    log(L_q) = −N log det(R_q) − N tr(R_q^{−1} R̂)
    log(L_0) = −N log det(R̂) − N P .

The log-likelihood ratio is then

    log(λ) := log(L_0 / L_q) = N tr(R_q^{−1} R̂) − N log det(R_q^{−1} R̂) − N P .     (10.30)

Here, λ = L0 /Lq is the test statistic (likelihood ratio), and we will reject Hq and accept H0 if
λ > γ, where γ is a predetermined threshold. Typically, γ is determined such that we obtain an
acceptable “false-alarm” rate (i.e., the probability that we accept H0 instead of Hq , while Hq is
actually true). To establish γ, we need to know the statistics of λ under Hq .
Generalizing the results from the real-valued case [1, 2], we obtain that for moderately large N
(say N > 50), the test statistic 2 log(λ) approximately has a χ²_s distribution, where s is equal
to “the number of free parameters” under H_q (the number of equations minus the number of
unknowns). For the complex case, we saw that this number is s = (P − q)² − P degrees of
freedom.
In view of results of Box and Bartlett, a better fit of the distribution of 2 log(λ) to a χ²_s distri-
bution is obtained by replacing N in (10.30) by [1, 2]

    N′ = N − (2P + 11)/6 − (2/3) Q² .

To detect Q, we start with q = 0, and apply the test for increasing values of q until it is
accepted, or until q > Q_max. In that case, the hypothesis H_0 is accepted, i.e., the given R̂ is an
unstructured covariance matrix. A disadvantage of this process is that the model parameters
for each q have to be estimated, which can become quite cumbersome if P is large.¹
However, note that if the GLRT passes for a given estimate Q_0 it also passes for any Q > Q_0,
and if it fails it also fails for any Q < Q_0. Therefore, instead of a linear search for Q we can use
a binary search. The maximum number of possible sources for FA is given by Q_max < P − √P.
In a binary search, we split the entire interval into two segments, and test on the boundary to
decide in which interval the solution must lie. Proceeding recursively in this way, the number
of needed FA estimates is on average log₂(Q_max) + 1, which is reasonable even for large P.
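A sketch of the resulting detector (Python with NumPy/SciPy; names are ours). Here
fa_solver(R̂, q) is assumed to return the fitted FA model covariance R_q under H_q (for q = 0,
just the diagonal of R̂), and the threshold uses the χ²_s approximation above.

    import numpy as np
    from scipy.stats import chi2

    def detect_rank(Rhat, N, fa_solver, pfa=0.01):
        P = Rhat.shape[0]
        q_hi = int(np.ceil(P - np.sqrt(P))) - 1           # Qmax

        def accepted(q):
            Rq = fa_solver(Rhat, q)                       # fitted model under H_q
            M = np.linalg.solve(Rq, Rhat)                 # R_q^{-1} Rhat
            _, logdet = np.linalg.slogdet(M)
            Np = N - (2 * P + 11) / 6 - (2 / 3) * q**2    # Box/Bartlett correction
            stat = 2 * Np * (np.real(np.trace(M)) - logdet - P)   # 2 log(lambda)
            s = (P - q) ** 2 - P                          # degrees of freedom
            return stat < chi2.ppf(1 - pfa, df=s)

        if not accepted(q_hi):
            return None                                   # even Qmax is rejected
        lo, hi = 0, q_hi                                  # smallest accepted q
        while lo < hi:
            mid = (lo + hi) // 2
            if accepted(mid):
                hi = mid
            else:
                lo = mid + 1
        return lo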

¹ Also, as for any sequential hypothesis test, the actual false alarm rate that is achieved is unknown, because
the tests are not independent.

10.4 EXTENSIONS OF THE CLASSICAL MODEL

We present two extensions of the classical model: joint and extended factor analysis.


10.4.1 Joint Factor Analysis Model


In some applications, the signal subspace (i.e. A) is not stationary, while the noise covariance
is stationary. Consider e.g., DOA estimation of moving sources and an uncalibrated array. An
available dataset is then partitioned into M short subsets or “snapshots”, each containing N
samples. This leads to M sample covariance matrices R̂_m, m = 1, . . . , M, with model

    R_m = A_m A_m^H + D ,     m = 1, . . . , M .                                (10.31)
Am is a low-rank matrix of size P × Qm with Qm < P for all m = 1, . . . , M , and D is a
positive real diagonal matrix common among the M models. We call this model Joint Factor
Analysis (JFA). The objective is to estimate D and {Am } jointly, based on the available sample
covariance matrices {R̂m }. In many applications we are just interested in the column span of
Am .
An example where this model could occur is in wideband processing in the frequency domain,
where m represents a frequency index and each R_m corresponds to a narrowband model. If the
noise powers are frequency-independent, then D is common among the various covariance
matrices. A joint estimate will be more accurate than an algorithm where we first estimate the
FAD for each m, and then average the D_m.

10.4.2 Extended and Joint Extended FA Model


Another extension is to consider the noise covariance matrix to be more general than a diagonal
matrix, say R_n = Ψ, where Ψ has a certain structure, assumed to be known. Here we consider
Ψ of the form

    Ψ = M ⊙ Ψ ,

where M is a symmetric matrix containing only ones and zeros, and ⊙ denotes the Hadamard
or entrywise product. We call M a mask matrix; the main diagonal is assumed to be nonzero.
We can model various types of covariance matrices using this approach (for example: block-
diagonal matrices, band matrices, sparse matrices, etc.). A further generalization of this is to
model Ψ as a linear sum of known matrices, i.e.,

    vec(Ψ) = G θ_ψ ,

where G is a fixed basis. For example, G could contain selected columns of a Fourier matrix to
model spatially lowpass noise [10].
We assume M to be known based on the application. The Extended FA (EFA) model then
becomes

    R = A A^H + Ψ .                                                             (10.32)
Both generalizations can be combined into Joint Extended FA (JEFA), where we have

    R_m = A_m A_m^H + Ψ ,     m = 1, . . . , M .                                (10.33)

Algorithms to find these decompositions are straightforward generalizations of the previously
presented algorithms [12].
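For instance, the mask for the reference-array scenario treated in Sec. 10.5.2 below (a fully
unknown p_0 × p_0 block for the primary array plus independent noise on p_1 reference antennas)
can be built as follows (a sketch in Python/NumPy; function name ours).

    import numpy as np

    def block_diag_mask(block_sizes):
        # symmetric ones/zeros mask selecting a block-diagonal noise covariance
        P = sum(block_sizes)
        M = np.zeros((P, P))
        i = 0
        for b in block_sizes:
            M[i:i + b, i:i + b] = 1.0
            i += b
        return M

    M = block_diag_mask([3] + [1] * 4)   # e.g. p0 = 3 primary, p1 = 4 reference antennas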

Figure 10.3. Residual interference power after projection, for eigenanalysis, factor analysis,
             and whitened eigenanalysis (J = 8, Q = 1, nominal noise power 0 dB): (left)
             versus the maximal deviation of the noise powers from nominal [dB], for
             N = 500; (right) versus the number of samples N, for a maximal deviation
             of 3 dB.

10.5 APPLICATION TO INTERFERENCE CANCELLATION

10.5.1 Interference projection


In the context of radio astronomy, factor analysis shows up in interference cancellation. In
general, this is a large topic with many aspects. Here, we consider a simple case where we take
short integration intervals and an uncalibrated array. Since astronomical sources are weak and
much below the background noise level, if we integrate only over short intervals, the noise is
dominant. Therefore, in the absence of interference, the data covariance matrix R from a single
short-term integration interval could be modeled as a diagonal D. Assuming Q independent
interfering signals gives us a contribution A A^H. The approach for interference cancellation using
spatial filtering is to estimate ran(A), and to apply to R a projector P_A^⊥ onto the orthogonal
complement of this span, i.e., R′ = P_A^⊥ R P_A^⊥. That should remove the interference. The filtered
covariance matrices are further averaged over multiple integration intervals, and corrections need
to be applied since also the astronomical data has been filtered. Details on this approach can
be found in [17, 18].
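In code, the projection step itself is short once ran(A) has been estimated (a sketch in
Python/NumPy; here A is assumed to come from the FAD of the same short-term covariance).

    import numpy as np

    def project_out(Rhat, A):
        # filtered covariance R' = P_A_perp Rhat P_A_perp
        Pp = np.eye(A.shape[0]) - A @ np.linalg.solve(A.conj().T @ A, A.conj().T)
        return Pp @ Rhat @ Pp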

Simulation    Here, we describe only a limited-scope simulation on synthetic data, where we
estimate a rank-1 subspace (i) using factor analysis, and for comparison (ii) using an eigende-
composition assuming that D = σ² I, or (iii) using an eigendecomposition after whitening by
D^{−1/2}, assuming the true D is known from calibration. The correct rank is Q = 1, and we show
the residual interference power after projection, i.e., ‖P_â^⊥ a‖, as a function of the number of
samples N, the mean noise power, and the deviation in noise power. The noise powers in D are
randomly generated at the beginning of the simulation, uniformly in an interval. Legends in the
graphs indicate the nominal noise power and the maximal deviation. All simulations use P = 8
sensors, and a nominal interference-to-noise ratio per channel of 0 dB.


The results are shown in Fig. 10.3. The left graph shows the residual interference power for
varying maximal deviations, the right graph shows the residual for varying number of samples
N , and a maximal deviation of 3 dB of the noise powers. The figures indicate that already for
small deviations of the noise powers it is essential to take this into account, by using the FAD
instead of the EVD. Furthermore, the estimates from the factor analysis are nearly as good as
can be obtained via whitening with known noise powers.

10.5.2 Reference antenna array

In the previous paragraph, we projected out the interference dimension, and this effectively
reduces the number of antennas (dishes) by the number of detected interferers. An alternative
is to use a reference antenna array, with antennas that receive a good copy of the interfering
signals, but have little gain towards the desired sky sources. So suppose we have a primary array
with p_0 antennas, and a reference array with p_1 antennas. The received signal model is

    x_0(t) = v_0(t) + A_0(t) s(t) + n_0(t)
    x_1(t) = A_1(t) s(t) + n_1(t)

where the subscripts 0 and 1 refer to the primary and reference array, respectively, v_0(t) contains
the desired sky source signals, s(t) the q interfering signals, and n_i(t) the noise on each array.
Collecting all antenna signals into a single vector x(t), we can write

    x(t) = v(t) + A(t) s(t) + n(t)

where A : p × q has q columns corresponding to the q interferers. The covariance matrix of x(t)
can be partitioned as

        [ R_00  R_01 ]
    R = [ R_10  R_11 ] .

According to the assumptions, R has model

    R = A A^H + Ψ
      = [ R_v,0 + A_0 A_0^H + Σ_0    A_0 A_1^H       ]
        [ A_1 A_0^H                  A_1 A_1^H + Σ_1 ]                          (10.34)

where Ψ := R_v + Σ is the interference-free covariance matrix, R_v := bdiag[R_v,0 , 0] contains the
astronomical visibilities, and Σ := bdiag[Σ_0 , Σ_1] is the diagonal noise covariance matrix. The
objective is to estimate the interference-free covariance submatrix Ψ_00 := R_v,0 + Σ_0.
The data model (10.34) satisfies the Extended Factor Analysis (EFA) model. The covariance
model (10.34) is

    R = A A^H + Ψ = A A^H + [ Ψ_00   0  ]
                            [  0    Σ_1 ] .                                     (10.35)


where we are interested in estimating the unknown square matrix Ψ_00 and, for an uncalibrated
array, Σ_1 is unknown. Thus, the appropriate masking matrix M such that Ψ = M ⊙ Ψ is

    M = [ 1 1^T   0 ]
        [  0      I ] .

This is an EFA model and we can apply the corresponding algorithms for estimating A and Ψ.
Each R̂ will give us an estimate Ψ̂, and Ψ̂00 is simply the upper left sub-block of this matrix.
A necessary condition for identification is that the degree of freedom s > 0. Compared to FA,
we see that the p parameters of Σ are now replaced by the p_0² + p_1 (real) parameters in Ψ.
Thus, we require

    s = p² + q² − 2pq − (p_0² + p_1)

to be larger than 0. With p = p_0 + p_1 and solving for the number of reference antennas p_1
we find

    p_1 > q − (p_0 − 1/2) + √(q + (p_0 − 1/2)²) .                               (10.36)

Thus, if p_0 is small, we need p_1 > q + √q, and if p_0 is large, we need p_1 > q.
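A small helper (Python/NumPy sketch; name ours) that evaluates the bound (10.36) and
returns the smallest admissible integer number of reference antennas:

    import numpy as np

    def min_ref_antennas(p0, q):
        bound = q - (p0 - 0.5) + np.sqrt(q + (p0 - 0.5) ** 2)
        return int(np.floor(bound)) + 1     # smallest integer strictly above the bound

    print(min_ref_antennas(3, 2))           # e.g. p0 = 3 dishes, q = 2 interferers -> 3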
If R̂ is based on a short-term interval (a snapshot estimate of the covariance) and we have
multiple snapshots, then we can apply JEFA to estimate the varying interfering subspaces while
exploiting that the sky covariance is constant and common among all snapshots.
We show two examples with experimental data, taken from [19].

Example 10.1. To test the algorithm on actual data, we have made a short observation
of the strong astronomical source 3C48 contaminated by Afristar satellite signals.
The primary array consists of p0 = 3 of the 14 telescope dishes of the Westerbork
Synthesis Radio Telescope (WSRT), located in The Netherlands. As reference signals
we use p_1 = 27 of 52 elements of a focal-plane array that is mounted on another dish
of WSRT which is set off-target (see Fig. 10.4), such that it has no dish gain towards
the astronomical source nor towards the interferer.
We recorded 13.4 seconds of data with 80 MS/s, and processed these offline. Using
short-term windowed Fourier transforms, the data was first split into 8192 frequency
bins (from which we used 1537), and subsequently correlated and averaged over
M = 4048 samples to obtain N = 64 short-term covariance matrices.
Fig. 10.5(a) shows the autocorrelations and crosscorrelations on the primary antennas
and Fig. 10.5(b) shows the autocorrelation of 6 reference antennas. The interference
is clearly seen in the spectrum. The interference consists of a lower and higher
frequency part. The low frequency part is stronger on the reference antenna and
the higher part is stronger on the primary antenna. However, because of a relatively
large number of reference antennas the total INR, as we will see, is high enough for
the algorithms to be effective.


Figure 10.4. Reference focal-plane array mounted on a dish.

Because no calibration step has been performed we use a generalized likelihood ratio
test (GLRT) [20] to detect if each frequency bin is contaminated with RFI and then
we use EFA to estimate the noise powers and the signal spatial signature. The result
of whitening the spectrum with the estimated result of EFA is shown in Fig. 10.6(a).
The resulting auto- and crosscorrelation spectra after filtering are shown in Fig.
10.6(b). The autocorrelation spectra are almost flat, and close to 1 (the whitened
noise power). The cross-correlation spectra show that the spatial filtering with the
reference antenna has removed the RFI within the sensitivity of the telescope. Also
it shows the power of using EFA at this stage in the processing chain, as it is not
required for the array to be calibrated.

Example 10.2. In a second experiment, we use raw data from LOFAR station RS409
(100-200 MHz). Data from 46 (out of 48) x-polarization receiving elements are
sampled with a frequency of 200 MHz and correlated. Samples are then divided into
1024 subbands with the help of tapering and an FFT. From these samples we form
N = 4 covariance matrices with an integration time of 19 ms (M = 1862) for each
subband. No calibration was done on the resulting covariance matrices.
The LOFAR HBA has a hierarchy of antennas, where a single receiving element
output is the result of analog beamforming on 16 antennas (4 × 4) in a tile.

Figure 10.5. Observed spectrum (1480–1495 MHz): (a) auto- and crosscorrelations |R_ij| on
             the primary telescopes (uncalibrated), (b) autocorrelations R_ii of 6 of the
             reference antennas.
Figure 10.6. After EFA, the covariance matrices can be whitened: (a) spectrum of the
             primary antennas after whitening, (b) averaged normalized correlation co-
             efficients |R_ij|² / (|R_ii| |R_jj|) after filtering (p = 30, p_0 = 3, N_f = 1537,
             N = 64, M = 4048).

Figure 10.7. Autocorrelation spectrum (100–200 MHz) received at a LOFAR HBA station
             (mode 5).

During the measurements, the analog beamformers were tracking the strong astronomical
source Cygnus A.
The received spectrum is shown in Fig. 10.7. Above 174 MHz, the spectrum is
heavily contaminated by wideband Digital Audio Broadcast (DAB) transmissions.
We have used 6 of the 46 receiving elements as reference array for our filtering
techniques and the rest as primary array. Because we do not have dedicated reference
antennas, and because the data is already beamformed, the assumption that the source
is too weak at each short integration time (19 ms) is not completely valid. Also the
assumption that the sky sources are much weaker on the reference antennas is not
valid in this case, because the reference array elements are also following Cygnus A.
Finally, we have the same exposure to the RFI on the secondary array as we have
on the primary, so there is no additional RFI gain for the secondary array.
To illustrate the performance of the filtering technique we produce snapshot images
of the sky (i.e., images based on a single covariance matrix). For an uncontaminated
image, we have chosen subband 250 at 175.59 MHz, see Fig. 10.8(a), while for RFI-
contaminated data we take subband 247 at 175.88 MHz, see Fig. 10.8(b). These two
subbands have been chosen because they are close to each other (in frequency) and
we expect that the astronomical images for these bands would be similar. Subband
247 is heavily contaminated and has a 10 dB flux increase on the auto-correlations
and a 20 dB increase on the cross-correlations.
The repeated source visible in Fig. 10.8(a) is Cygnus A; the repetition is due to the
spatial aliasing which occurs at these frequencies (the tiles are separated by more
than half a wavelength). The contaminated image in Fig. 10.8(b) shows no trace of
Cygnus A; note the different amplitude scale, which has been increased by a factor
of 100.
Fig. 10.9 shows the image after using EFA. The image is very similar to the clean
image in Fig. 10.8(a).

10.6 APPLICATION TO ARRAY CALIBRATION

Before we can do any beamforming, we need to calibrate the array. Indeed, in the previous
chapters, we assumed we fully knew the array response function, and in many cases, we even
assumed omnidirectional antennas (i.e., the individual antennas have the same unit response in
all directions). Before we are in this situation, we need to estimate these responses. Generally,
this involves a single test source that we scan across the array, but this is not always practical
once the array is out of the factory and deployed in the field. A particular example is radio
astronomy, where the “antennas” are large dishes or beamformed stations, and the calibrator
sources are strong celestial objects. Obviously we have no control over them and cannot switch
them off, but on the other hand their positions and source powers are accurately known from
tables. In this section, it is shown how factor analysis can be used to solve the problem of
calibration.
The calibration problem does not only involve the antenna response functions, it also involves
the receiver noise present on each antenna. In previous chapters, we usually assumed the noise
was spatially white: independent and of equal power on each antenna. However, before calibra-
tion the receiver noise generally has different powers on each antenna. These also need to be
estimated.

10.6.1 Non-ideal measurements

So far we ignored the beam shape of the individual elements (antennas or dishes) of the array.
In fact, any antenna has its own directional response b(ζ), where ζ denotes a unit-length source
direction vector (see (2.6)). This function is called the primary beam. For simplicity, it is
generally assumed that the primary beam is equal for all elements in the array, although this is
also subject to calibration. With Q point sources, we will collect the resulting samples of the
primary beam into a vector b = [b(ζ 1 ), · · · , b(ζ Q )]T . These coefficients are seen as gains that
(squared) will multiply the source powers σq2 . The general shape of the primary beam b(ζ) is
known from electromagnetic modeling during the design of the antenna. If this is not sufficiently
accurate, then it has to be calibrated.
We also have direction-independent differences in gains and phases among the antennas, e.g.,
due to differences in the receiver chains of each element in the array. Initially these are also
unknown and have to be estimated. We thus have an unknown vector g (size P × 1) with
complex entries that each multiply the output signal of each antenna.
Also the noise powers of each element are unknown and generally unequal to each other. We
will still assume that the noise is independent from element to element. We can thus model the

noise covariance matrix by an (unknown) diagonal Σ_n.

Figure 10.8. Snapshot images (in (l, m) coordinates): (a) clean subband 250, (b) contami-
             nated subband 247. Note the different amplitude scales.

Figure 10.9. Dirty subband 247 after filtering with EFA.


For a calibrated array, we used until now the covariance data model
    R = A(θ) Σ_s A(θ)^H + Σ_n .                                                 (10.37)

Here, the array response matrix A(θ) is a known function of the source direction vectors
{ζ 1 , · · · , ζ Q }, suitably parametrized by the vector θ (with typically two direction cosines per
source).
The modified data model that captures the unknown gain/phase/noise effects, and replaces
(10.37), is then

    R = [Γ A(θ) B] Σ_s [B^H A(θ)^H Γ^H] + Σ_n                                   (10.38)
where Γ = diag(g) is a diagonal with unknown receiver complex gains, and B = diag(b) contains
the samples of the primary beam (the directional response of each antenna). Usually, Γ and B
are considered to vary only slowly with time and frequency, so that we can combine multiple
covariance matrices Rm,k with the same Γ and B.
In some cases, the source directions are disturbed as well, e.g., due to atmospheric effects (or due
to ionospheric delays in radio astronomy). In first order, we can replace A(θ) by A(θ′), where
θ′ differs from θ due to the shift in apparent direction of each source. The modified data model
that captures the above effects is thus

    R = [Γ A(θ′) B] Σ_s [B^H A(θ′)^H Γ^H] + Σ_n .                               (10.39)

If we wish to be very general, we can write this as

    R = [G ⊙ A(θ)] Σ_s [G ⊙ A(θ)]^H + Σ_n                                       (10.40)

where ⊙ indicates an entrywise multiplication of two matrices (Schur-Hadamard product). Here,
G is a full matrix that captures all non-linear measurement effects. Equation (10.38) is recovered
if we write G = g b^H (i.e., a rank-1 matrix), and equation (10.39) if we write G = g b^H ⊙ A₀,
where A₀ is a matrix consisting of phase corrections such that A(θ′) = A(θ) ⊙ A₀.
Calibration is the process of identifying the unknown parameters in G, and subsequently cor-
recting for G during the imaging step. The model (10.40) in its generality is not identifiable
unless we make assumptions on the structure of G (in the form of a suitable parametrization)
and describe how it varies with time and frequency, e.g., in the form of (stochastic) models for
these variations.
In the next subsection, we will first describe how models of the form (10.38) or (10.39) can be
identified. This step will serve as a stepping stone in the identification of a more general G.

10.6.2 Calibration algorithms


Let us assume a model of the form (10.38), where there are Q dominant calibration sources
within the field of view. For these sources, we assume that their positions and source powers are


known with sufficient accuracy, i.e., we assume that A and Σ_s are known. We can then write
(10.38) as

    R = Γ A Σ A^H Γ^H + Σ_n                                                     (10.41)

where Σ = B Σ_s B^H is a diagonal with apparent source powers. With B unknown, Σ is unknown,
but estimating Σ is precisely the problem of estimating source powers in given directions: a
problem we studied before. Thus, once we have estimated Σ and know Σ_s, we can easily
estimate the directional gains B. The problem thus reduces to estimating the diagonal matrices
Γ, Σ and Σ_n from a model of the form (10.41).

Single calibrator source    For some cases, e.g., radio telescope arrays where the elements are
traditional telescope dishes, the field of view is quite narrow (degrees) and we may assume that
there is only a single calibrator source in the observation. Then Σ = σ² is a scalar and the
problem reduces to

    R = g σ² g^H + Σ_n

and since g is unknown, we could even absorb the unknown σ into g (it is not separately identifi-
able). The structure of R is a rank-1 matrix g σ² g^H plus a diagonal Σ_n. This is recognized as a
“rank-1 factor analysis” model.

Example 10.3. In a radio astronomy experiment reported in [21], we observe a strong
point source in the sky with the Westerbork Synthesis Radio Telescope (WSRT).
The point source requirement is that the source angular size is much smaller than
the telescope main beam power.
In the experiments, p = 8 of the 14 WSRT telescopes were used, in single linear
polarisation mode, with a maximum distance (baseline) of 1 km. The telescopes
tracked the strong astronomical point source “3C48” at a sky frequency of 1420.4
MHz with a receiver bandwidth of 1.25 MHz. The earth-rotation related phase
drift was compensated for, which means that during the experiment the telescope–
interferometer phase was constant. We split the data into 32 frequency bins, each
with a bandwidth of 39 kHz, which fits the narrow band assumption reasonably well.
Each bin had N = 131 072 samples. The data was subsequently spatially cross-
correlated resulting in complex covariance matrices for each of the frequency bins,
to which the factor analysis algorithms are applied.
The data model in the experiment is R = g σ_s² g^H + Σ_n, where σ_s is the source flux
(known from tables). The FA algorithm gives an estimate of g and Σ_n. Figure
10.10 shows the resulting entries of these vectors as a function of frequency. The
figure confirms that the received SNR is about −13 dB for each antenna, which is
expected from calibration tables for this source. A bump in the noise power curves
at 1420.4 MHz corresponds to the spectral line of neutral hydrogen, and is caused
by the galactic emission of our Milky Way. As the Milky Way is a spatially wide
source of radiowaves, it is not resolved by the WSRT interferometers, and is therefore
visible only in the noise estimates.

Figure 10.10. Gain magnitude (σ_s g_i)² and noise power d_i estimates (i = 1, . . . , 8, in dB),
              as a function of frequency, for an observation of the astronomical source
              3C48 (N = 131072).


In general, there are more calibrator sources (Q) in the field of view, and we have to solve
(10.41). Using Factor Analysis, we can first solve for A′ and Σ_n in

    R = A′ A′^H + Σ_n .

Then, since A′ is not unique, we can identify

    A′ = Γ A Σ^{1/2} Q

where Q is an unknown unitary matrix. Equivalently, define R′ = A′ A′^H = R − Σ_n, then

    R′ = Γ A Σ A^H Γ^H

where both Γ and Σ are diagonal and unknown. We may resort to an Alternating Least Squares
approach. If Γ is considered known, then we can correct R′ for it, so that we have precisely
the same problem as we considered before, and we can solve for Σ using the techniques
discussed in Section 7.4. Alternatively, with Σ known, we can say we know a reference model
R_0 = A Σ A^H, and the problem is to identify the element gains Γ = diag(g) from a model of
the form

    R′ = Γ R_0 Γ^H .

After applying the vec(·)-operation, this is

    vec(R′) = diag(vec(R_0)) (g* ⊗ g) .

This leads to the Least Squares problem

    ĝ = arg min_g ‖vec(R̂ − Σ_n) − diag(vec(R_0)) (g* ⊗ g)‖² .

This problem cannot be solved in closed form. Alternatively, we can first solve an unstructured
problem: define x = g* ⊗ g and solve

    x̂ = diag(vec(R_0))^{−1} vec(R̂ − Σ_n)

or equivalently, if we define X = g g^H,

    X̂ = (R̂ − Σ_n) ⊘ R_0 ,

where ⊘ denotes an entrywise matrix division. After estimating the unstructured matrix X̂, we
enforce the rank-1 structure X = g g^H via a rank-1 approximation, and find an estimate for
g. The pointwise division can lead to noise enhancement; this is remediated by only using the
result as an initial estimate for a Gauss-Newton iteration [22] or by formulating a weighted least
squares problem instead [23, 24].
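A sketch of this unstructured initial estimate (Python/NumPy; function name ours): entrywise
division, Hermitian symmetrization, and a rank-1 approximation via the dominant eigenpair.

    import numpy as np

    def estimate_gains(Rhat, Sigma_n, R0):
        X = (Rhat - Sigma_n) / R0               # entrywise division
        X = (X + X.conj().T) / 2                # enforce Hermitian symmetry
        lam, U = np.linalg.eigh(X)
        g = U[:, -1] * np.sqrt(max(lam[-1], 0.0))   # rank-1 factor of X = g g^H
        return g * np.exp(-1j * np.angle(g[0]))     # fix the phase ambiguity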
With g known, we can again estimate Σ and Σn , and make an iteration. Overall we then obtain
an alternating least squares solution.
The more general calibration problem (10.39) follows from (10.38) by writing A = A(θ′), where
θ′ are the apparent source locations. In the alternating least squares framework, this problem
can be solved in quite the same way: we solve for g, θ′, σ_s and σ_n in turn, keeping the other
parameters fixed at their previous estimates. After that, we can relate the apparent source
locations to the (known) locations of the calibrator sources θ.


Estimating the general model    In the more general case (10.40), viz.

    R = (G ⊙ A) Σ_s (G ⊙ A)^H + Σ_n ,

we have an unknown full matrix G. We assume A and Σ_s known. Since A pointwise multiplies
G and G is unknown, we might as well omit A from the equations without loss of generality.
For the same reason also Σ_s can be omitted. This leads to a problem of the form

    R = G G^H + Σ_n ,

where G : P × Q and Σn (diagonal) are unknown. This problem is recognized as a rank-Q factor
analysis problem. For reasonably small Q, as compared to the size P of R, the factor G can be
solved for, again using algorithms for covariance matching such as in [10].
It is important to note that G can be identified only up to a unitary factor V at the right:
G′ = GV would also be a solution. This factor makes the gains unidentifiable unless we
introduce more structure to the problem.

10.7 NOTES

Material presented in this chapter was derived from [12, 25].


FA for real-valued matrices was first introduced by Spearman [3] in 1904 to find a quantitative
measure for intelligence, given a series of test results. Between 1940 and 1970, Lawley, Anderson,
Jöreskog and others developed FA as an established multivariate technique [1, 2, 5, 26, 27]. Cur-
rently, FA is an important and popular tool for latent variable analysis with many applications
in various fields of science [28]. However, its application within the signal processing community
has been surprisingly limited.
In the context of signal processing, the FA problem and several extensions can be regarded
as a specific case of covariance matching, studied in detail in [10]. There, the model (10.2)
is presented more generically in terms of a parametric model A(θ) and a linear parametric
model for the noise covariance (not restricted to diagonal), and maximum likelihood algorithms
are presented to estimate the parameters. This relates to the topic of sensor array parameter
estimation (e.g., direction of arrival) in the presence of colored noise or spatially correlated
noise, under a variety of possible model assumptions such as D being diagonal, block diagonal,
or composed of a linear sum of known matrices [29–32].
Generally, algorithms for finding the model parameters in the FA model can be categorized
into two groups. “Classical” approaches are based on Maximum Likelihood (ML) or related
weighted least squares optimization. This results in large nonlinear optimization problems that
are often implemented using Newton-Raphson or more efficient Fletcher-Powell iterations [13, 26,
33]. These algorithms are still very popular and standard toolboxes (Matlab, SPSS) use them.
Unfortunately, they are relatively hard to implement and computationally rather complex due to
the inversion of a large matrix containing the second-order derivatives, so that approximations


are necessary. Alternatively, the ML solution is found using Expectation-Maximization (EM)


techniques, first proposed in [34], resulting in algorithms that are simpler to implement but
often show slow convergence. The Conditional Maximization (CM) algorithm [14] has quadratic
convergence and currently seems most competitive.
A second class of algorithms is inspired by the work of Ledermann in 1940 [4] and gained renewed
momentum in recent years due to the popularity of convex optimization. The factors are found
using the trace function as a convex relaxation of a minimum-rank constraint [35–37]. Recently,
several new approaches for matrix completion have been proposed that involve low-rank plus
sparse matrices [38, 39]. This leads to similar convex optimization algorithms, although not
specifically designed with covariance matrices in mind.
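As an illustration of this relaxation, the following sketch poses minimum-trace factor analysis as a semidefinite program. It assumes the CVXPY package with an SDP-capable solver such as SCS, and is a generic formulation of the trace-relaxation idea rather than an algorithm taken from [35–37].

```python
import cvxpy as cp

def min_trace_fa(R):
    """Decompose R = L + diag(d), with L PSD and d >= 0, minimizing
    trace(L) as the convex surrogate for rank(L)."""
    P = R.shape[0]
    L = cp.Variable((P, P), hermitian=True)     # low-rank (PSD) part
    d = cp.Variable(P, nonneg=True)             # diagonal noise variances
    prob = cp.Problem(cp.Minimize(cp.real(cp.trace(L))),
                      [L >> 0, L + cp.diag(d) == R])
    prob.solve(solver=cp.SCS)
    return L.value, d.value
```

The rank of the recovered L (its number of significant eigenvalues) then serves as an estimate of the number of factors Q.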
In [12], we proposed new algorithms for FA (and extensions) that are of the ML type, resulting
in particular in the attractive AWLS algorithm that is easy to implement and the fastest in
convergence.
For the radio astronomy application, we applied FA to calibration and interference detec-
tion/filtering in [40–43]. These addressed the case where the noise covariance matrix is diagonal
with unknown elements. For cases where the noise covariance matrix is no longer diagonal but
has a known sparse structure, we later proposed the “extended FA” (EFA) model [12]. We
also considered applications where the desired subspace changes rapidly while the noise remains
stationary. In this case we can compute a series of short-term covariance matrices or “snap-
shots” (each with the classical FA model form (10.2) but with a common matrix D), requiring
an extension toward “joint FA” (JFA). Combined, this led to “joint extended FA” (JEFA) [12].
We recommend FA as an extension of the eigenvalue decomposition (EVD) to cases where
the noise is not white. The simulations in [12] indicated that even if the noise is white, the
performance penalty with respect to the EVD is minor. Therefore, the more general structure
of the extended FA data models enables their use in a wide range of signal processing
applications.
Cramér-Rao Bounds for the presented models appear in [19].
The potential of FA and (J)EFA in practical scenarios was demonstrated for spatial filtering of
RFI signals present in astronomical data in [19]. Calibration of the Westerbork radio telescope
array (P = 14 dishes) using the Ad Hoc approach was shown in [40]. Calibration of one station of
the LOFAR radio telescope array (P = 96 antennas) was reported in [43–45], and this procedure
is used in the daily operation of the array [46]. Using LOFAR data, EFA was demonstrated in [47] to
suppress the Milky Way (broadband emission).

Bibliography

[1] Derrick Norman Lawley and A.E. Maxwell, Factor analysis as a statistical method. 2nd.
ed., New York: Am. Elsevier Publ., 1971.


[2] K.V. Mardia, J.T. Kent, and J.M. Bibby, Multivariate Analysis. Academic Press, 1979.

[3] C. Spearman, “The proof and measurement of association between two things,” The Amer-
ican Journal of Psychology, vol. 15, pp. 72–101, Jan 1904.

[4] Walter Ledermann, “On a problem concerning matrices with variable diagonal elements,”
Proceedings of the Royal Society of Edinburgh, vol. 60, pp. 1–17, 1 1940.

[5] K. G. Jöreskog, "A general approach to confirmatory maximum likelihood factor analysis,"
Psychometrika, vol. 34, no. 2, pp. 183–202, 1969.

[6] Sik Yum Lee, “The Gauss-Newton algorithm for the Weighted Least Squares factor analy-
sis,” Journal of the Royal Statistical Society, vol. 27, June 1978.

[7] Peter J. Schreier, Statistical Signal Processing of Complex-Valued Data. Cambridge Uni-
versity Press, 2010.

[8] Are Hjørungnes, Complex-Valued Matrix Derivatives with Applications in Signal Processing
and Communications. Cambridge University Press, 2011.

[9] Steven M. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory.
Prentice Hall, 1993.

[10] B. Ottersten, P. Stoica, and R. Roy, "Covariance matching estimation techniques for array
signal processing applications," Digital Signal Processing, A Review Journal, vol. 8,
pp. 185–210, July 1998.

[11] P. Gill, W. Murray, and M.H. Wright, Practical optimization. London: Academic Press,
1981.

[12] A.M. Sardarabadi and A.J. van der Veen, “Complex factor analysis and extensions,” IEEE
Tr. Signal Processing, vol. 66, February 2018.

[13] Karl G. Jöreskog and Arthur S. Goldberger, “Factor analysis by generalized least squares,”
Psychometrika, vol. 37, pp. 243–260, Sep 1972.

[14] J.-H. Zhao, Philip Yu, and Qibao Jiang, "ML estimation for factor analysis: EM or non-
EM?," Statistics and Computing, vol. 18, pp. 109–123, 2008. doi: 10.1007/s11222-007-9042-y.

[15] A.-K. Seghouane, “An iterative projections algorithm for ML factor analysis,” in IEEE
Workshop on Machine Learning for Signal Processing, pp. 333–338, Oct. 2008.

[16] S.M. Kay, Fundamentals of Statistical Signal Processing. Volume II: Detection Theory.
Upper Saddle River, NJ: Prentice Hall PTR, 1998.

[17] J. Raza, A.-J. Boonstra, and A.-J. van der Veen, "Spatial filtering of RF interference in
radio astronomy," IEEE Signal Processing Letters, vol. 9, Mar. 2002.


[18] S. van der Tol and A. J. van der Veen, “Performance analysis of spatial filtering of RF
interference in radio astronomy,” IEEE Transactions on Signal Processing, vol. 53, pp. 896–
910, Mar. 2005.

[19] A. Mouri Sardarabadi, A.-J. van der Veen, and A.-J. Boonstra, “Spatial Filtering of RF
Interference in Radio Astronomy Using a Reference Antenna Array,” IEEE Trans. Signal
Process., vol. 64, pp. 432–447, Jan 2016.

[20] A. Leshem and A.-J. van der Veen, “Multichannel detection of Gaussian signals with un-
calibrated receivers,” IEEE Signal Processing Letters, vol. 8, no. 4, pp. 120–122, 2001.

[21] A. J. Boonstra and A. J. van der Veen, “Gain calibration methods for radio telescope
arrays,” IEEE Trans. Signal Processing, vol. 51, pp. 25–38, Jan. 2003.

[22] D. R. Fuhrmann, “Estimation of sensor gain and phase,” IEEE Trans. Signal Processing,
vol. 42, pp. 77–87, Jan. 1994.

[23] S. J. Wijnholds and A. J. Boonstra, “A multisource calibration method for phased ar-
ray telescopes,” in Fourth IEEE Workshop on Sensor Array and Multi-channel Processing
(SAM), (Waltham (Mass.), USA), July 2006.

[24] S. J. Wijnholds and A. J. van der Veen, “Multisource self-calibration for sensor arrays,”
IEEE Tr. Signal Processing, vol. 57, pp. 3512–3522, Sept. 2009.

[25] A.J. van der Veen, S.J. Wijnholds, and A.M. Sardarabadi, “Signal processing for radio
astronomy,” in Handbook of Signal Processing Systems, 3rd ed., Springer, November 2018.
ISBN 978-3-319-91734-4.

[26] Derrick N. Lawley, "The estimation of factor loadings by the method of maximum likeli-
hood," Proceedings of the Royal Society of Edinburgh, vol. 60, no. 1, pp. 64–82, 1940.

[27] T. W. Anderson and H. Rubin, "Statistical inference in factor analysis," in Proceedings of
the Third Berkeley Symposium on Mathematical Statistics and Probability, vol. 5,
pp. 111–150, 1956.

[28] David J. Bartholomew, Martin Knott, and Irini Moustaki, Latent Variable Models and
Factor Analysis: A Unified Approach. John Wiley and Sons, 2011.

[29] M. Viberg, P. Stoica, and B. Ottersten, "Array processing in correlated noise fields based
on instrumental variables and subspace fitting," IEEE Trans. Signal Process., vol. 43,
pp. 1187–1199, Jan. 1995.

[30] V. Nagesha and S. M. Kay, "Maximum likelihood estimation for array processing in colored
noise," IEEE Trans. Signal Process., vol. 44, pp. 169–180, Feb. 1996.


[31] P. Stoica, M. Viberg, K. M. Wong, and Q. Wu, "Maximum-likelihood bearing estimation
with partly calibrated arrays in spatially correlated noise fields," IEEE Trans. Signal Pro-
cess., vol. 44, pp. 888–899, Apr. 1996.

[32] M. Wax, J. Sheinvald, and A. J. Weiss, “Detection and localization in colored noise via
generalized least squares,” IEEE Tr. Signal Process., vol. 44, pp. 1734–1743, July 1996.

[33] K. G. Jöreskog, "Some contributions to maximum likelihood factor analysis," Psychome-
trika, vol. 32, no. 4, pp. 433–482, 1967.

[34] Donald Rubin and Dorothy Thayer, "EM algorithms for ML factor analysis," Psychome-
trika, vol. 47, pp. 69–76, 1982. doi: 10.1007/BF02293851.

[35] Alexander Shapiro, “Weighted minimum trace factor analysis,” Psychometrika, vol. 47,
no. 3, pp. 243–264, 1982.

[36] Alexander Shapiro, “Rank-reducibility of a symmetric matrix and sampling theory of min-
imum trace factor analysis,” Psychometrika, vol. 47, no. 2, pp. 187–199, 1982.

[37] James Saunderson, Venkat Chandrasekaran, Pablo A Parrilo, and Alan S Willsky, “Diagonal
and low-rank matrix decompositions, correlation matrices, and ellipsoid fitting,” SIAM
Journal on Matrix Analysis and Applications, vol. 33, no. 4, pp. 1395–1416, 2012.

[38] E.J. Candes, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?,” arXiv
preprint arXiv:0912.3599, 2009.

[39] Emmanuel J Candès and Benjamin Recht, “Exact matrix completion via convex optimiza-
tion,” Foundations of Computational mathematics, vol. 9, no. 6, pp. 717–772, 2009.

[40] A.-J. Boonstra and A.-J. van der Veen, “Gain calibration methods for radio telescope
arrays,” IEEE Tr. Signal Processing, vol. 51, pp. 25–38, Jan. 2003.

[41] A.-J. van der Veen, A. Leshem, and A.-J. Boonstra, "Array signal processing for radio
astronomy," Experimental Astronomy (EXPA), vol. 17, no. 1-3, pp. 231–249, 2004. ISSN
0922-6435.

[42] A.-J. van der Veen, A. Leshem, and A.-J. Boonstra, "Array signal processing for radio
astronomy," in The Square Kilometre Array: An Engineering Perspective (P.J. Hall, ed.),
pp. 231–249, Dordrecht: Springer, 2005. ISBN 1-4020-3797-x. Reprinted from Experimental
Astronomy, 17(1-3), 2004.

[43] S.J. Wijnholds and A.-J. van der Veen, "Multisource self-calibration for sensor arrays,"
IEEE Trans. Signal Processing, vol. 57, pp. 3512–3522, Sept. 2009.

[44] S.J. Wijnholds, S. van der Tol, R. Nijboer, and A.-J. van der Veen, “Calibration challenges
for future radio telescopes,” IEEE Signal Processing Magazine, vol. 27, pp. 30–42, Jan 2010.


[45] A. Mouri Sardarabadi and A.-J. van der Veen, “Application of Krylov based methods in
calibration for radio astronomy,” in 2014 IEEE 8th Sensor Array and Multichannel Signal
Processing Workshop (SAM), pp. 153–156, June 2014.

[46] M. P. van Haarlem, M. W. Wise, A. W. Gunst, et al., "LOFAR: The LOw-Frequency
ARray," Astronomy & Astrophysics, vol. 556, p. A2, 2013.

[47] A. Mouri Sardarabadi and A.-J. van der Veen, "Subspace estimation using factor analysis,"
in 2012 IEEE 7th Sensor Array and Multichannel Signal Processing Workshop (SAM),
pp. 477–480, June 2012.



Chapter 11

INDEPENDENT COMPONENT ANALYSIS

Contents
11.1 Fourth-order Cumulants . . . . . . . . . . . . . . . . . . . . . . . . . . 219
11.2 Data model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
11.3 JADE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
11.4 Application: ACMA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219

11.1 FOURTH-ORDER CUMULANTS

11.2 DATA MODEL

11.3 JADE

11.4 APPLICATION: ACMA
