Recurrent Neural Networks for Prediction
Authored by Danilo P. Mandic, Jonathon A. Chambers
Copyright © 2001 John Wiley & Sons Ltd
ISBNs: 0-471-49517-4 (Hardback); 0-470-84535-X (Electronic)

RECURRENT NEURAL NETWORKS FOR PREDICTION
WILEY SERIES IN ADAPTIVE AND LEARNING SYSTEMS FOR
SIGNAL PROCESSING, COMMUNICATIONS, AND CONTROL
Editor: Simon Haykin
Beckerman/ADAPTIVE COOPERATIVE SYSTEMS
Chen and Gu/CONTROL-ORIENTED SYSTEM IDENTIFICATION: An H∞ Approach
Cherkassky and Mulier/LEARNING FROM DATA: Concepts, Theory and Methods
Diamantaras and Kung/PRINCIPAL COMPONENT NEURAL NETWORKS:
Theory and Applications
Haykin and Puthusserypady/CHAOTIC DYNAMICS OF SEA CLUTTER
Haykin/NONLINEAR DYNAMICAL SYSTEMS: Feedforward Neural Network
Perspectives
Haykin/UNSUPERVISED ADAPTIVE FILTERING, VOLUME I: Blind Source
Separation
Haykin/UNSUPERVISED ADAPTIVE FILTERING, VOLUME II: Blind
Deconvolution
Hines/FUZZY AND NEURAL APPROACHES IN ENGINEERING
Hrycej/NEUROCONTROL: Towards an Industrial Control Methodology
Krstic, Kanellakopoulos, and Kokotovic/NONLINEAR AND ADAPTIVE
CONTROL DESIGN
Mann/INTELLIGENT IMAGE PROCESSING
Nikias and Shao/SIGNAL PROCESSING WITH ALPHA-STABLE
DISTRIBUTIONS AND APPLICATIONS
Passino and Burgess/STABILITY ANALYSIS OF DISCRETE EVENT SYSTEMS
Sanchez-Peña and Sznaier/ROBUST SYSTEMS THEORY AND APPLICATIONS
Tao and Kokotovic/ADAPTIVE CONTROL OF SYSTEMS WITH ACTUATOR
AND SENSOR NONLINEARITIES
Van Hulle/FAITHFUL REPRESENTATIONS AND TOPOGRAPHIC MAPS:
From Distortion- to Information-Based Self-Organization
Vapnik/STATISTICAL LEARNING THEORY
Werbos/THE ROOTS OF BACKPROPAGATION: From Ordered Derivatives to
Neural Networks and Political Forecasting
Yee and Haykin/REGULARIZED RADIAL-BASIS FUNCTION NETWORKS:
Theory and Applications
RECURRENT NEURAL NETWORKS FOR PREDICTION
LEARNING ALGORITHMS, ARCHITECTURES AND STABILITY

Danilo P. Mandic
School of Information Systems,
University of East Anglia, UK
Jonathon A. Chambers
Department of Electronic and Electrical Engineering,
University of Bath, UK

JOHN WILEY & SONS, LTD


Chichester • New York • Weinheim • Brisbane • Singapore • Toronto
Copyright © 2001 John Wiley & Sons, Ltd
Baffins Lane, Chichester,
West Sussex, PO19 1UD, England
National 01243 779777
International (+44) 1243 779777
e-mail (for orders and customer service enquiries): [email protected]
Visit our Home Page on http://www.wiley.co.uk or http://www.wiley.com
All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted, in any form or by any means, electronic, mechanical, photocopying,
recording, scanning or otherwise, except under the terms of the Copyright Designs and
Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency
Ltd, 90 Tottenham Court Road, London, W1P 0LP, UK, without the permission in writing
of the Publisher, with the exception of any material supplied specifically for the purpose
of being entered and executed on a computer system, for exclusive use by the purchaser of
the publication.
Neither the author(s) nor John Wiley & Sons Ltd accept any responsibility or liability for
loss or damage occasioned to any person or property through using the material, instruc-
tions, methods or ideas contained herein, or acting or refraining from acting as a result of
such use. The author(s) and Publisher expressly disclaim all implied warranties, including
merchantability or fitness for any particular purpose.
Designations used by companies to distinguish their products are often claimed as trade-
marks. In all instances where John Wiley & Sons is aware of a claim, the product names
appear in initial capital or capital letters. Readers, however, should contact the appropriate
companies for more complete information regarding trademarks and registration.
Other Wiley Editorial Offices
John Wiley & Sons, Inc., New York, USA
WILEY-VCH Verlag GmbH, Weinheim, Germany
John Wiley & Sons Australia, Ltd, Queensland
John Wiley & Sons (Canada) Ltd, Ontario
John Wiley & Sons (Asia) Pte Ltd, Singapore

Library of Congress Cataloging-in-Publication Data


Mandic, Danilo P.
Recurrent neural networks for prediction : learning algorithms, architectures, and
stability / Danilo P. Mandic, Jonathon A. Chambers.
p. cm. -- (Wiley series in adaptive and learning systems for signal processing,
communications, and control)
Includes bibliographical references and index.
ISBN 0-471-49517-4 (alk. paper)
1. Machine learning. 2. Neural networks (Computer science) I. Chambers, Jonathon A.
II. Title. III. Adaptive and learning systems for signal processing, communications, and
control.
Q325.5 .M36 2001
006.3'2--dc21 2001033418
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
ISBN 0-471-49517-4
Produced from LaTeX files supplied by the author, typeset by T&T Productions Ltd, London.
Printed and bound in Great Britain by Antony Rowe, Chippenham, Wiltshire.
This book is printed on acid-free paper responsibly manufactured from sustainable forestry, in
which at least two trees are planted for each one used for paper production.
To our students and families
Contents

Preface xiv

1 Introduction 1
1.1 Some Important Dates in the History of Connectionism 2
1.2 The Structure of Neural Networks 2
1.3 Perspective 4
1.4 Neural Networks for Prediction: Perspective 5
1.5 Structure of the Book 6
1.6 Readership 8

2 Fundamentals 9
2.1 Perspective 9
2.1.1 Chapter Summary 9
2.2 Adaptive Systems 9
2.2.1 Configurations of Adaptive Systems Used in Signal
Processing 10
2.2.2 Blind Adaptive Techniques 12
2.3 Gradient-Based Learning Algorithms 12
2.4 A General Class of Learning Algorithms 14
2.4.1 Quasi-Newton Learning Algorithm 15
2.5 A Step-by-Step Derivation of the Least Mean Square (LMS)
Algorithm 15
2.5.1 The Wiener Filter 16
2.5.2 Further Perspective on the Least Mean Square (LMS)
Algorithm 17
2.6 On Gradient Descent for Nonlinear Structures 18
2.6.1 Extension to a General Neural Network 19
2.7 On Some Important Notions From Learning Theory 19
2.7.1 Relationship Between the Error and the Error Function 19
2.7.2 The Objective Function 20
2.7.3 Types of Learning with Respect to the Training Set
and Objective Function 20
2.7.4 Deterministic, Stochastic and Adaptive Learning 21
2.7.5 Constructive Learning 21
2.7.6 Transformation of Input Data, Learning and Dimensionality 22
2.8 Learning Strategies 24
2.9 General Framework for the Training of Recurrent Networks by
Gradient-Descent-Based Algorithms 24
2.9.1 Adaptive Versus Nonadaptive Training 24
2.9.2 Performance Criterion, Cost Function, Training Function 25
2.9.3 Recursive Versus Nonrecursive Algorithms 25
2.9.4 Iterative Versus Noniterative Algorithms 25
2.9.5 Supervised Versus Unsupervised Algorithms 25
2.9.6 Pattern Versus Batch Learning 26
2.10 Modularity Within Neural Networks 26
2.11 Summary 29

3 Network Architectures for Prediction 31


3.1 Perspective 31
3.2 Introduction 31
3.3 Overview 32
3.4 Prediction 33
3.5 Building Blocks 35
3.6 Linear Filters 37
3.7 Nonlinear Predictors 39
3.8 Feedforward Neural Networks: Memory Aspects 41
3.9 Recurrent Neural Networks: Local and Global Feedback 43
3.10 State-Space Representation and Canonical Form 44
3.11 Summary 45

4 Activation Functions Used in Neural Networks 47


4.1 Perspective 47
4.2 Introduction 47
4.3 Overview 51
4.4 Neural Networks and Universal Approximation 51
4.5 Other Activation Functions 54
4.6 Implementation Driven Choice of Activation Functions 57
4.7 MLP versus RBF Networks 60
4.8 Complex Activation Functions 60
4.9 Complex Valued Neural Networks as Modular Groups of
Compositions of Möbius Transformations 65
4.9.1 Möbius Transformation 65
4.9.2 Activation Functions and Möbius Transformations 65
4.9.3 Existence and Uniqueness of Fixed Points in a Complex
Neural Network via Theory of Modular Groups 67
4.10 Summary 68

5 Recurrent Neural Networks Architectures 69


5.1 Perspective 69
5.2 Introduction 69
5.3 Overview 72
5.4 Basic Modes of Modelling 72
5.4.1 Parametric versus Nonparametric Modelling 72
5.4.2 White, Grey and Black Box Modelling 73
5.5 NARMAX Models and Embedding Dimension 74
5.6 How Dynamically Rich are Nonlinear Neural Models? 75
5.6.1 Feedforward versus Recurrent Networks for Nonlinear
Modelling 76
5.7 Wiener and Hammerstein Models and Dynamical Neural Networks 77
5.7.1 Overview of Block-Stochastic Models 77
5.7.2 Connection Between Block-Stochastic Models and
Neural Networks 78
5.8 Recurrent Neural Network Architectures 81
5.9 Hybrid Neural Network Architectures 84
5.10 Nonlinear ARMA Models and Recurrent Networks 86
5.11 Summary 89

6 Neural Networks as Nonlinear Adaptive Filters 91


6.1 Perspective 91
6.2 Introduction 91
6.3 Overview 92
6.4 Neural Networks and Polynomial Filters 92
6.5 Neural Networks and Nonlinear Adaptive Filters 95
6.6 Training Algorithms for Recurrent Neural Networks 101
6.7 Learning Strategies for a Neural Predictor/Identifier 101
6.7.1 Learning Strategies for a Neural Adaptive Recursive Filter 103
6.7.2 Equation Error Formulation 104
6.7.3 Output Error Formulation 104
6.8 Filter Coefficient Adaptation for IIR Filters 105
6.8.1 Equation Error Coefficient Adaptation 107
6.9 Weight Adaptation for Recurrent Neural Networks 107
6.9.1 Teacher Forcing Learning for a Recurrent Perceptron 108
6.9.2 Training Process for a NARMA Neural Predictor 109
6.10 The Problem of Vanishing Gradients in Training of Recurrent
Neural Networks 109
6.11 Learning Strategies in Different Engineering Communities 111
6.12 Learning Algorithms and the Bias/Variance Dilemma 111
6.13 Recursive and Iterative Gradient Estimation Techniques 113
6.14 Exploiting Redundancy in Neural Network Design 113
6.15 Summary 114

7 Stability Issues in RNN Architectures 115


7.1 Perspective 115
7.2 Introduction 115
7.3 Overview 118
7.4 A Fixed Point Interpretation of Convergence in Networks with
a Sigmoid Nonlinearity 118
7.4.1 Some Properties of the Logistic Function 118
7.4.2 Logistic Function, Rate of Convergence and
Fixed Point Theory 121
7.5 Convergence of Nonlinear Relaxation Equations Realised
Through a Recurrent Perceptron 124
7.6 Relaxation in Nonlinear Systems Realised by an RNN 127
7.7 The Iterative Approach and Nesting 130
7.8 Upper Bounds for GAS Relaxation within FCRNNs 133
7.9 Summary 133

8 Data-Reusing Adaptive Learning Algorithms 135


8.1 Perspective 135
8.2 Introduction 135
8.2.1 Towards an A Posteriori Nonlinear Predictor 136
8.2.2 Note on the Computational Complexity 137
8.2.3 Chapter Summary 138
8.3 A Class of Simple A Posteriori Algorithms 138
8.3.1 The Case of a Recurrent Neural Filter 140
8.3.2 The Case of a General Recurrent Neural Network 141
8.3.3 Example for the Logistic Activation Function 141
8.4 An Iterated Data-Reusing Learning Algorithm 142
8.4.1 The Case of a Recurrent Predictor 143
8.5 Convergence of the A Posteriori Approach 143
8.6 A Posteriori Error Gradient Descent Algorithm 144
8.6.1 A Posteriori Error Gradient Algorithm for Recurrent
Neural Networks 146
8.7 Experimental Results 146
8.8 Summary 147

9 A Class of Normalised Algorithms for Online Training of Recurrent Neural Networks 149
9.1 Perspective 149
9.2 Introduction 149
9.3 Overview 150
9.4 Derivation of the Normalised Adaptive Learning Rate for a
Simple Feedforward Nonlinear Filter 151
9.5 A Normalised Algorithm for Online Adaptation of Recurrent
Neural Networks 156
9.6 Summary 160

10 Convergence of Online Learning Algorithms in Neural Networks 161
10.1 Perspective 161
10.2 Introduction 161
10.3 Overview 164
10.4 Convergence Analysis of Online Gradient Descent Algorithms
for Recurrent Neural Adaptive Filters 164
10.5 Mean-Squared and Steady-State Mean-Squared Error Convergence 167
10.5.1 Convergence in the Mean Square 168
10.5.2 Steady-State Mean-Squared Error 169
10.6 Summary 169

11 Some Practical Considerations of Predictability and Learning Algorithms for Various Signals 171
11.1 Perspective 171
11.2 Introduction 171
11.2.1 Detecting Nonlinearity in Signals 173
11.3 Overview 174
11.4 Measuring the Quality of Prediction and Detecting
Nonlinearity within a Signal 174
11.4.1 Deterministic Versus Stochastic Plots 175
11.4.2 Variance Analysis of Delay Vectors 175
11.4.3 Dynamical Properties of NO2 Air Pollutant Time Series 176
11.5 Experiments on Heart Rate Variability 181
11.5.1 Experimental Results 181
11.6 Prediction of the Lorenz Chaotic Series 195
11.7 Bifurcations in Recurrent Neural Networks 197
11.8 Summary 198

12 Exploiting Inherent Relationships Between Parameters in Recurrent Neural Networks 199
12.1 Perspective 199
12.2 Introduction 199
12.3 Overview 204
12.4 Static and Dynamic Equivalence of Two Topologically Identical
RNNs 205
12.4.1 Static Equivalence of Two Isomorphic RNNs 205
12.4.2 Dynamic Equivalence of Two Isomorphic RNNs 206
12.5 Extension to a General RTRL Trained RNN 208
12.6 Extension to Other Commonly Used Activation Functions 209
12.7 Extension to Other Commonly Used Learning Algorithms for
Recurrent Neural Networks 209
12.7.1 Relationships Between β and η for the Backpropaga-
tion Through Time Algorithm 210
12.7.2 Results for the Recurrent Backpropagation Algorithm 211

12.7.3 Results for Algorithms with a Momentum Term 211


12.8 Simulation Results 212
12.9 Summary of Relationships Between β and η for General
Recurrent Neural Networks 213
12.10 Relationship Between η and β for Modular Neural Networks:
Perspective 214
12.11 Static Equivalence Between an Arbitrary and a Referent
Modular Neural Network 214
12.12 Dynamic Equivalence Between an Arbitrary and a Referent
Modular Network 215
12.12.1 Dynamic Equivalence for a GD Learning Algorithm 216
12.12.2 Dynamic Equivalence Between Modular Recurrent
Neural Networks for the ERLS Learning Algorithm 217
12.12.3 Equivalence Between an Arbitrary and the Referent PRNN 218
12.13 Note on the β–η–W Relationships and Contractivity 218
12.14 Summary 219

Appendix A The O Notation and Vector and Matrix Differentiation 221
A.1 The O Notation 221
A.2 Vector and Matrix Differentiation 221

Appendix B Concepts from the Approximation Theory 223

Appendix C Complex Sigmoid Activation Functions, Holomorphic Mappings and Modular Groups 227
C.1 Complex Sigmoid Activation Functions 227
C.1.1 Modular Groups 228

Appendix D Learning Algorithms for RNNs 231


D.1 The RTRL Algorithm 231
D.1.1 Teacher Forcing Modification of the RTRL Algorithm 234
D.2 Gradient Descent Learning Algorithm for the PRNN 234
D.3 The ERLS Algorithm 236

Appendix E Terminology Used in the Field of Neural Networks 239

Appendix F On the A Posteriori Approach in Science and Engineering 241
F.1 History of A Posteriori Techniques 241
F.2 The Usage of A Posteriori 242
F.2.1 A Posteriori Techniques in the RNN Framework 242

F.2.2 The Geometric Interpretation of A Posteriori Error Learning 243

Appendix G Contraction Mapping Theorems 245


G.1 Fixed Points and Contraction Mapping Theorems 245
G.1.1 Contraction Mapping Theorem in R 245
G.1.2 Contraction Mapping Theorem in RN 246
G.2 Lipschitz Continuity and Contraction Mapping 246
G.3 Historical Perspective 247

Appendix H Linear GAS Relaxation 251


H.1 Relaxation in Linear Systems 251
H.1.1 Stability Result for Σ_{i=1}^{m} a_i = 1 253
H.2 Examples 253

Appendix I The Main Notions in Stability Theory 263

Appendix J Deseasonalising Time Series 265

References 267

Index 281
Preface

New technologies in engineering, physics and biomedicine are creating problems in which nonstationarity, nonlinearity, uncertainty and complexity play a major role.
Solutions to many of these problems require the use of nonlinear processors, among
which neural networks are one of the most powerful. Neural networks are appealing
because they learn by example and are strongly supported by statistical and opti-
misation theories. They not only complement conventional signal processing tech-
niques, but also emerge as a convenient alternative to expand signal processing hori-
zons.
The use of recurrent neural networks as identifiers and predictors in nonlinear
dynamical systems has increased significantly. They can exhibit a wide range of
dynamics, due to feedback, and are also tractable nonlinear maps.
In our work, neural network models are considered as massively interconnected
nonlinear adaptive filters. The emphasis is on dynamics, stability and spatio-temporal
behaviour of recurrent architectures and algorithms for prediction. However, wherever
possible the material has been presented starting from feedforward networks and
building up to the recurrent case.
Our objective is to offer an accessible self-contained research monograph which can
also be used as a graduate text. The material presented in the book is of interest to
a wide population of researchers working in engineering, computing, science, finance
and biosciences. So that the topics are self-contained, we assume familiarity with
the basic concepts of analysis and linear algebra. The material presented in Chap-
ters 1–6 can serve as an advanced text for courses on neural adaptive systems. The
book encompasses traditional and advanced learning algorithms and architectures for
recurrent neural networks. Although we emphasise the problem of time series predic-
tion, the results are applicable to a wide range of problems, including other signal
processing configurations such as system identification, noise cancellation and inverse
system modelling. We harmonise the concepts of learning algorithms, embedded sys-
tems, representation of memory, neural network architectures and causal–noncausal
dealing with time. A special emphasis is given to stability of algorithms – a key issue
in real-time applications of adaptive systems.
This book has emerged from the research that D. Mandic has undertaken while
at Imperial College of Science, Technology and Medicine, London, UK. The work
was continued within the vibrant culture of the University of East Anglia, Norwich,
UK.

Acknowledgements

Danilo Mandic acknowledges Dr M. Razaz for providing a home from home in the
Bioinformatics Laboratory, School of Information Systems, University of East Anglia.
Many thanks to the people from the lab for creating a congenial atmosphere at work.
The Dean of the School of Information Systems, Professor V. Rayward-Smith and
his predecessor Dr J. Glauert, deserve thanks for their encouragement and support.
Dr M. Bozic has done a tremendous job on proofreading the mathematics. Dr W. Sher-
liker has contributed greatly to Chapter 10. Dr D. I. Kim has proofread the mathe-
matically involved chapters. I thank Dr G. Cawley, Dr M. Dzamonja, Dr A. James
and Dr G. Smith for proofreading the manuscript in its various phases. Dr R. Harvey
has been of great help throughout. Special thanks to my research associates I. Krcmar
and Dr R. Foxall for their help with some of the experimental results. H. Graham
has always been at hand with regard to computing problems. Many of the results
presented here have been achieved while I was at Imperial College, where I greatly
benefited from the unique research atmosphere in the Signal Processing Section of the
Department of Electrical and Electronic Engineering.
Jonathon Chambers acknowledges the outstanding PhD researchers with whom he
has had the opportunity to interact, they have helped so much towards his orientation
in adaptive signal processing. He also acknowledges Professor P. Watson, Head of
the Department of Electronic and Electrical Engineering, University of Bath, who
has provided the opportunity to work on the book during its later stages. Finally,
he thanks Mr D. M. Brookes and Dr P. A. Naylor, his former colleagues, for their
collaboration in research projects.

Danilo Mandic
Jonathon Chambers
List of Abbreviations
ACF Autocorrelation function
AIC Akaike Information Criterion
ANN Artificial Neural Network
AR Autoregressive
ARIMA Autoregressive Integrated Moving Average
ARMA Autoregressive Moving Average
ART Adaptive Resonance Theory
AS Asymptotic Stability
ATM Asynchronous Transfer Mode
BIC Bayesian Information Criterion
BC Before Christ
BIBO Bounded Input Bounded Output
BP Backpropagation
BPTT Backpropagation Through Time
CM Contraction Mapping
CMT Contraction Mapping Theorem
CNN Cellular Neural Network
DC Direct Current
DR Data Reusing
DSP Digital Signal Processing
DVS Deterministic Versus Stochastic
ECG Electrocardiogram
EKF Extended Kalman Filter
ERLS Extended Recursive Least Squares
ES Exponential Stability
FCRNN Fully Connected Recurrent Neural Network
FFNN Feedforward Neural Network
FIR Finite Impulse Response
FPI Fixed Point Iteration
GAS Global Asymptotic Stability
GD Gradient Descent
HOS Higher-Order Statistics
HRV Heart Rate Variability
i.i.d. Independent Identically Distributed
IIR Infinite Impulse Response
IVT Intermediate Value Theorem
KF Kalman Filter

LMS Least Mean Square


LPC Linear Predictive Coding
LRGF Locally Recurrent Globally Feedforward
LUT Look-Up Table
MA Moving Average
MLP Multi-Layer Perceptron
MMSE Minimum Mean Square Error
MPEG Moving Pictures Experts Group
MSLC Multiple Sidelobe Cancellation
MSE Mean Squared Error
MVT Mean Value Theorem
NARMA Nonlinear Autoregressive Moving Average
NARMAX Nonlinear Autoregressive Moving Average with eXogenous input
NARX Nonlinear Autoregressive with eXogenous input
NGD Nonlinear Gradient Descent
NNGD Normalised Nonlinear Gradient Descent
NLMS Normalised Least Mean Square
NMA Nonlinear Moving Average
NN Neural Network
NO2 Nitrogen dioxide
NRTRL Normalised RTRL algorithm
pdf probability density function
PG Prediction Gain
PRNN Pipelined Recurrent Neural Network
PSD Power Spectral Density
RAM Random Access Memory
RBF Radial Basis Function
RBP Recurrent Backpropagation
RLS Recursive Least Squares
RNN Recurrent Neural Network
ROM Read-Only Memory
R–R Distance between two consecutive R waves in ECG
RTRL Real-Time Recurrent Learning
SG Stochastic Gradient
SP Signal Processing
VLSI Very Large Scale Integration
WGN White Gaussian Noise
WWW World Wide Web
Mathematical Notation

z First derivative of variable z


{·} Set of elements
α Momentum constant
β Slope of the nonlinear activation function Φ
γ Contraction constant, gain of the activation function
Γ Modular group of compositions of Möbius transformations
Γ (k) Adaptation gain vector
δi Gradient at ith neuron
δij Kronecker delta function
(k) Additive noise
η Learning rate
∇x Y Gradient of Y with respect to x
|·| Modulus operator
‖ · ‖p Vector or matrix p-norm operator
∗ Convolution operator
∝ Proportional to
0 Null vector
λ Forgetting factor
λi Eigenvalues of a matrix
µ Step size in the LMS algorithm
µ Mean of a random variable
ν(k) Additive noise
Ωk Set of nearest neighbours vectors
|Ωk | Number of elements in set Ωk
π^j_{n,l} Sensitivity of the jth neuron to the change in w_{n,l}
Φ Nonlinear activation function of a neuron
Φx First derivative of Φ with respect to x
Π Matrix of gradients at the output neuron of an RNN

∂ Partial derivative operator
Σ Summation operator
σ² Variance
σ(x) General sigmoid function
Σ(k) Sample input matrix
τ Delay operator
Θ General nonlinear function
Θ(k) Parameter vector

( · )T Vector or matrix transpose


CL ( · ) Computational load
B Backshift operator
C Set of complex numbers
C n (a, b) The class of n-times continuously differentiable
functions on an open interval (a, b)
d(k) Desired response
deg(·) Degree of a polynomial
diag[ · ] Diagonal elements of a matrix
E(k) Cost function
E[ · ] Expectation operator
E[y | x] Conditional expectation operator
e(k) Instantaneous prediction error
ē(k) A posteriori prediction error
F (·, ·) Nonlinear approximation function
G(·) Basis function
I Identity matrix
inf Infimum
J(k) Cost function
J Non-negative error measure
J Jacobian
k Discrete time index
H Hessian matrix
H(z) Transfer function in z-domain
H∞ The infinity norm quadratic optimisation
L Lipschitz constant
Lp Lp norm
lim Limit
max Maximum
min Minimum
N Set of natural numbers
N (µ, σ 2 ) Normal random process with mean µ and variance σ 2
O( · ) Order of computational complexity
P (f ) Power spectral density
q(k) Measurement noise
R Set of real numbers
R+ {x ∈ R | x > 0}
Rn The n-dimensional Euclidean space
Rp Prediction gain
Rx,y , Rx,y Correlation matrix between vectors x and y
S Class of sigmoidal functions
sgn(·) Signum function
span(·) Span of a vector
sup Supremum
T Contraction operator
tr{·} Trace operator (sum of diagonal elements of a matrix)

u(k) Input vector to an RNN


v Internal activation potential of a neuron
v(k) System noise
w(k) Weight vector
∆w(k) Correction to the weight vector
w̃(k) Optimal weight vector
w̆(k) Weight error vector
ŵ(k) Weight vector estimate
wi,j Weight connecting neuron j to neuron i
W (k) Weight matrix of an NN
∆W (k) Correction to the weight matrix
x(k), X(k) External input vector to an NN
xl (k) ∈ X(k) The lth element of vector X(k)
x̂ Estimated value of x
x∗ Fixed point of the sequence {x}
y Output of an NN
yi,j Output of the jth neuron of the ith module of the PRNN
z −k The kth-order time delay
Z The Z transform
Z −1 The inverse Z transform
Index

Activation functions Autoregressive integrated moving


continual adaptation 202 average (ARIMA) model 171
definition 239 Autoregressive moving average
desirable properties 51 (ARMA) models
examples 53 filter structure 37
properties to map to the complex Averaging methods 163
plane 61
Backpropagation 18
steepness 200 normalised algorithm 153
temperature 200 through time 209
why nonlinear 50 through time dynamical
Activation potential 36, 54 equivalence 210
Adaline 2 Batch learning 20
Adaptability 9 Bias/variance dilemma 112
Adaptation gain Bilinear model 93
behaviour of steepest descent 16 Black box modelling 73
Adaptive algorithms Blind equalizer 12
convergence criteria 163 Block-based estimators 15
momentum 211 Bounded Input Bounded Output
performance criteria 161 (BIBO) stability 38, 118
Adaptive learning 21 Brouwer's Fixed Point Theorem 247
Adaptive resonance theory 2
Adaptive systems Canonical state-space representation
configurations 10 44
generic structure 9 Cauchy–Riemann equations 228
Analytic continuation 61 Channel equalisation 11
A posteriori Clamping functions 240
algorithms 113, 138 Classes of learning algorithm 18
computational complexity 138 Cognitive science 1
error 135 Complex numbers
techniques 241 elementary transformations 227
Asymptotic convergence rate 121 Connectionist models 1
Asymptotic stability 116 dates 2
Attractors 115 Constructive learning 21
Autonomous systems 263 Contraction mapping theorem 245
Autoregressive (AR) models Contractivity of relaxation 252
coefficients 37 Curse of dimensionality 22


David Hilbert 47 Gradient-based learning 12


Data-reusing 114, 135, 142 Grey box modelling 73
stabilising features 137, 145
Delay space embedding 41 Hammerstein model 77
Deseasonalising data 265 Heart rate variability 181
Deterministic learning 21 Heaviside function 55
Deterministic versus stochastic (DVS) Hessian 15, 52
plots 172, 175 Holomorphic function 227
Directed algorithms 111 Hyperbolic attractor 110
Domain of attraction 116
Dynamic multilayer perceptron Incremental learning 21
(DMLP) 79 Independence assumptions 163
Independent Identically Distributed
Efficiency index 135 (IID) 38
Electrocardiogram (ECG) 181 Induced local field 36
Embedding dimension 74, 76, 174, 178 Infinite Impulse Response (IIR)
Embedded memory 111
equation error adaptive filter 107
Equation error 104
Input transformations 23
adaptation 107
Equilibrium point 264 Kalman Filter (KF) algorithm 14
Error criterion 101 Kolmogorov function 223
Error function 20 Kolmogorov’s theorem 6, 93, 223
Exogeneous inputs 40 universal approximation theorem
Exponentially stable 264 47, 48
Extended Kalman filter 109 Kolmogorov–Sprecher Theorem 224
Extended recursive least squares
algorithm 217, 236 Learning rate 13
Feedforward network continual adaptation 202
definition 239 selection 202
Fixed point Learning rate adaptation 200
iteration 143 Least Mean Square (LMS) algorithm
theory 117, 245 14, 18
Forgetting behaviour 110 data-reusing form 136
Forgetting mechanism 101 Linear filters 37
Frobenius matrix 251 Linear prediction
Function definitions foundations 31
conformal 227 Linear regression 14
entire 227 Liouville Theorem 61, 228
meromorphic 227 Lipschitz function 224, 246, 263
Gamma memory 42 Logistic function 36, 53
Gaussian variables a contraction 118
fourth order standard factorisation approximation 58
167 fixed points of biased form 127
Gear shifting 21 Lorenz equations 174, 195
Global asymptotic stability (GAS) Lyapunov stability 116, 143, 162
116, 118, 251, 264 indirect method 162

Mandelbrot and Julia sets 61 with locally distributed dynamics


Markov model (LDNN) 79
first order 164 Neuron
Massive parallelism 6 biological perspective 32
Misalignment 168, 169 definition 3
Möbius transformation 47, 228 structure 32, 36
fixed points 67 Noise cancellation 10
Model reference adaptive system Nonlinear Autoregressive (NAR)
(MRAS) 106 model 40
Modular group 229 Nonlinear Autoregressive Moving
transfer function between neurons Average (NARMA) model 39
66 recurrent perceptron 97
Modular neural networks Nonlinear Finite Impulse Response
dynamic equivalence 215 (FIR) filter
static equivalence 214 learning algorithm 18
normalised gradient descent,
NARMA with eXogeneous inputs optimal step size 153
(NARMAX) model weight update 201
compact representation 71 Nonlinear gradient descent 151
validity 95 Nonlinear parallel model 103
Nearest neighbours 175 Nonlinearity detection 171, 173
Nesting 130 Nonparametric modelling 72
Neural dynamics 115 Non-recursive algorithm 25
Neural network Normalised LMS algorithm
bias term 50 learning rate 150
free parameters 199
general architectures for prediction O notation 221
and system identification 99 Objective function 20
growing and pruning 21 Ontogenic functions 241
hybrid 84 Orthogonal condition 34
in complex plane 60 Output error 104
invertibility 67 adaptive infinite impulse response
modularity 26, 199, 214 (IIR) filter 105
multilayer feedforward 41 learning algorithm 108
nesting 27 Parametric modelling 72
node structure 2 Pattern learning 26
ontogenic 21 Perceptron 2
properties 1 Phase space 174
radial basis function 60 Piecewise-linear model 36
redundancy 113 Pipelining 131
specifications 2 Polynomial equations 48
spline 56 Polynomial time 221
time-delay 42 Prediction
topology 240 basic building blocks 35
universal approximators 49, 54 conditional mean 39, 88
wavelet 57 configuration 11

difficulties 5 Williams–Zipser 83
history 4 Recurrent perceptron
principle 33 GAS relaxation 125
reasons for using neural networks 5 Recursive algorithm 25
Preservation of Recursive Least-Squares (RLS)
contractivity/expansivity 218 algorithm 14
Principal component analysis 23 Referent network 205
Proximity functions 54 Riccati equation 15
Pseudolinear regression algorithm 105 Robust stability 116
Quasi-Newton learning algorithm 15 Sandwich structure 86
Rate of convergence 121 Santa Fe Institute 6
Real time recurrent learning (RTRL) Saturated-modulus function 57
92, 108, 231 Seasonal ARIMA model 266
a posteriori form 141 Seasonal behaviour 172
normalised form 159 Semiparametric modelling 72
teacher forcing 234 Sensitivities 108
weight update for static and Sequential estimators 15
dynamic equivalence 209 Set
Recurrent backpropagation 109, 209 closure 224
static and dynamic equivalence 211 compact 225
Recurrent neural filter dense subset 224
a posteriori form 140 Sigmoid packet 56
fully connected 98 Sign-preserving 162
stability bound for adaptive Spin glass 2
algorithm 166 Spline, cubic 225
Recurrent neural networks (RNNs) Staircase function 55
activation feedback 81 Standardisation 23
dynamic behaviour 69 Stochastic learning 21
dynamic equivalence 205, 207 Stochastic matrix 253
Elman 82 Stone–Weierstrass theorem 62
fully connected, relaxation 133 Supervised learning 25
fully connected, structure 231 definition 239
Jordan 83 Surrogate dataset 173
local or global feedback 43 System identification 10
locally recurrent–globally System linearity 263
feedforward 82 Takens’ theorem 44, 71, 96
nesting 130 tanh activation function
output feedback 81 contraction mapping 124
pipelined (PRNN) 85, 132, 204, 234 Teacher forced adaptation 108
rate of convergence of relaxation
Threshold nonlinearity 36
127
Training set construction 24
relaxation 129
Turing machine 22
RTRL optimal learning rate 159
static equivalence 205, 206 Unidirected algorithms 111
universal approximators 49 Uniform approximation 51

Uniform asymptotic stability 264


Uniform stability 264
Unsupervised learning 25
definition 239
Uryson model 77
Vanishing gradient 109, 166
Vector and matrix differentiation
rules 221
Volterra series
expansion 71
Weierstrass Theorem 6, 92, 224
White box modelling 73
Wiener filter 17
Wiener model 77
represented by NARMA model 80
Wiener–Hammerstein model 78
Wold decomposition 39
Yule–Walker equations 34
Zero-memory nonlinearities 31
examples 35

1 Introduction

Artificial neural network (ANN) models have been extensively studied with the aim
of achieving human-like performance, especially in the field of pattern recognition.
These networks are composed of a number of nonlinear computational elements which
operate in parallel and are arranged in a manner reminiscent of biological neural inter-
connections. ANNs are known by many names such as connectionist models, parallel
distributed processing models and neuromorphic systems (Lippmann 1987). The ori-
gin of connectionist ideas can be traced back to the Greek philosopher, Aristotle, and
his ideas of mental associations. He proposed some of the basic concepts such as that
memory is composed of simple elements connected to each other via a number of
different mechanisms (Medler 1998).
While early work in ANNs used anthropomorphic arguments to introduce the meth-
ods and models used, today neural networks used in engineering are related to algo-
rithms and computation and do not question how brains might work (Hunt et al.
1992). For instance, recurrent neural networks have been attractive to physicists due
to their isomorphism to spin glass systems (Ermentrout 1998). The following proper-
ties of neural networks make them important in signal processing (Hunt et al. 1992):
they are nonlinear systems; they enable parallel distributed processing; they can be
implemented in VLSI technology; they provide learning, adaptation and data fusion
of both qualitative (symbolic data from artificial intelligence) and quantitative (from
engineering) data; they realise multivariable systems.
The area of neural networks is nowadays considered from two main perspectives.
The first perspective is cognitive science, which is an interdisciplinary study of the
mind. The second perspective is connectionism, which is a theory of information pro-
cessing (Medler 1998). The neural networks in this work are approached from an
engineering perspective, i.e. to make networks efficient in terms of topology, learning
algorithms, ability to approximate functions and capture dynamics of time-varying
systems. From the perspective of connection patterns, neural networks can be grouped
into two categories: feedforward networks, in which graphs have no loops, and recur-
rent networks, where loops occur because of feedback connections. Feedforward net-
works are static, that is, a given input can produce only one set of outputs, and hence
carry no memory. In contrast, recurrent network architectures enable the informa-
tion to be temporally memorised in the networks (Kung and Hwang 1998). Based
on training by example, with strong support of statistical and optimisation theories
(Cichocki and Unbehauen 1993; Zhang and Constantinides 1992), neural networks
are becoming one of the most powerful and appealing nonlinear signal processors for
a variety of signal processing applications. As such, neural networks expand signal
processing horizons (Chen 1997; Haykin 1996b), and can be considered as massively
interconnected nonlinear adaptive filters. Our emphasis will be on dynamics of recur-
rent architectures and algorithms for prediction.

1.1 Some Important Dates in the History of Connectionism

In the early 1940s the pioneers of the field, McCulloch and Pitts, studied the potential
of the interconnection of a model of a neuron. They proposed a computational model
based on a simple neuron-like element (McCulloch and Pitts 1943). Others, like Hebb
were concerned with the adaptation laws involved in neural systems. In 1949 Donald
Hebb devised a learning rule for adapting the connections within artificial neurons
(Hebb 1949). A period of early activity extends up to the 1960s with the work of
Rosenblatt (1962) and Widrow and Hoff (1960). In 1958, Rosenblatt coined the name
‘perceptron’. Based upon the perceptron (Rosenblatt 1958), he developed the theory
of statistical separability. The next major development is the new formulation of
learning rules by Widrow and Hoff in their Adaline (Widrow and Hoff 1960). In
1969, Minsky and Papert (1969) provided a rigorous analysis of the perceptron. The
work of Grossberg in 1976 was based on biological and psychological evidence. He
proposed several new architectures of nonlinear dynamical systems (Grossberg 1974)
and introduced adaptive resonance theory (ART), which is a real-time ANN that
performs supervised and unsupervised learning of categories, pattern classification and
prediction. In 1982 Hopfield pointed out that neural networks with certain symmetries
are analogues to spin glasses.
A seminal book on ANNs is by Rumelhart et al. (1986). Fukushima explored com-
petitive learning in his biologically inspired Cognitron and Neocognitron (Fukushima
1975; Widrow and Lehr 1990). In 1971 Werbos developed a backpropagation learn-
ing algorithm which he published in his doctoral thesis (Werbos 1974). Rumelhart
et al. rediscovered this technique in 1986 (Rumelhart et al. 1986). Kohonen (1982)
introduced self-organised maps for pattern recognition (Burr 1993).

1.2 The Structure of Neural Networks

In neural networks, computational models or nodes are connected through weights that are adapted during use to improve performance. The main idea is to achieve
good performance via dense interconnection of simple computational elements. The
simplest node provides a linear combination of N weights w1 , . . . , wN and N inputs
x1 , . . . , xN , and passes the result through a nonlinearity Φ, as shown in Figure 1.1.
Models of neural networks are specified by the net topology, node characteristics
and training or learning rules. From the perspective of connection patterns, neural
networks can be grouped into two categories: feedforward networks, in which graphs
have no loops, and recurrent networks, where loops occur because of feedback con-
nections. Neural networks are specified by (Tsoi and Back 1997)
[Figure 1.1 Connections within a node: the inputs x1, . . . , xN and a constant bias input +1 are weighted by w1, . . . , wN and w0, summed, and passed through the nonlinearity Φ to give y = Φ(Σ_i xi wi + w0)]

• Node: typically a sigmoid function;

• Layer: a set of nodes at the same hierarchical level;

• Connection: constant weights or weights as a linear dynamical system, feedforward or recurrent;

• Architecture: an arrangement of interconnected neurons;

• Mode of operation: analogue or digital.

Massively interconnected neural nets provide a greater degree of robustness or fault tolerance than sequential machines. By robustness we mean that small perturbations
in parameters will also result in small deviations of the values of the signals from their
nominal values.
In our work, hence, the term neuron will refer to an operator which performs the
mapping
Neuron: R^{N+1} → R     (1.1)

as shown in Figure 1.1. The equation



y = Φ( Σ_{i=1}^{N} wi xi + w0 )     (1.2)

represents a mathematical description of a neuron. The input vector is given by x = [x1, . . . , xN, 1]^T, whereas w = [w1, . . . , wN, w0]^T is referred to as the weight vector of
a neuron. The weight w0 is the weight which corresponds to the bias input, which is
typically set to unity. The function Φ : R → (0, 1) is monotone and continuous, most
commonly of a sigmoid shape. A set of interconnected neurons is a neural network
(NN). If there are N input elements to an NN and M output elements of an NN, then
an NN defines a continuous mapping

NN: R^N → R^M.     (1.3)
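
As a concrete illustration of the mappings (1.1)–(1.3), the following Python/NumPy sketch implements a single neuron as in (1.2) and stacks M such neurons to realise NN: R^N → R^M. The logistic choice of Φ, the sizes N and M and the randomly drawn (untrained) weights are purely illustrative assumptions.

```python
import numpy as np

def logistic(v):
    """A sigmoid-shaped activation Phi: R -> (0, 1)."""
    return 1.0 / (1.0 + np.exp(-v))

def neuron(x, w):
    """Single neuron, cf. Equation (1.2): y = Phi(sum_i w_i x_i + w_0).

    x -- input vector of length N
    w -- weight vector [w_1, ..., w_N, w_0], the last entry being the bias weight
    """
    v = np.dot(w[:-1], x) + w[-1]          # activation potential
    return logistic(v)

def network(x, W):
    """M neurons sharing the same input realise the mapping NN: R^N -> R^M.

    W -- array of shape (M, N + 1); each row is one neuron's weight vector
    """
    return np.array([neuron(x, w) for w in W])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, M = 4, 2                            # illustrative sizes
    x = rng.standard_normal(N)             # external input in R^N
    W = rng.standard_normal((M, N + 1))    # illustrative (untrained) weights
    print(network(x, W))                   # output lies in (0, 1)^M
```

How the weight matrix W is adapted from examples is precisely the subject of the learning algorithms developed in the following chapters.
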

1.3 Perspective

Before the 1920s, prediction was undertaken by simply extrapolating the time series
through a global fit procedure. The beginning of modern time series prediction was
in 1927 when Yule introduced the autoregressive model in order to predict the annual
number of sunspots. For the next half century the models considered were linear, typ-
ically driven by white noise. In the 1980s, the state-space representation and machine
learning, typically by neural networks, emerged as new potential models for prediction
of highly complex, nonlinear and nonstationary phenomena. This was the shift from
rule-based models to data-driven methods (Gershenfeld and Weigend 1993).
Time series prediction has traditionally been performed by the use of linear para-
metric autoregressive (AR), moving-average (MA) or autoregressive moving-average
(ARMA) models (Box and Jenkins 1976; Ljung and Soderstrom 1983; Makhoul 1975),
the parameters of which are estimated either in a block or a sequential manner with
the least mean square (LMS) or recursive least-squares (RLS) algorithms (Haykin
1994). An obvious problem is that these processors are linear and are not able to
cope with certain nonstationary signals, and signals whose mathematical model is
not linear. On the other hand, neural networks are powerful when applied to prob-
lems whose solutions require knowledge which is difficult to specify, but for which
there is an abundance of examples (Dillon and Manikopoulos 1991; Gent and Shep-
pard 1992; Townshend 1991). As time series prediction is conventionally performed
entirely by inference of future behaviour from examples of past behaviour, it is a suit-
able application for a neural network predictor. The neural network approach to time
series prediction is non-parametric in the sense that it does not need to know any
information regarding the process that generates the signal. For instance, the order
and parameters of an AR or ARMA process are not needed in order to carry out the
prediction. This task is carried out by a process of learning from examples presented
to the network and changing network weights in response to the output error.
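
To make the conventional linear route concrete, the sketch below adapts the coefficients of an AR(p) one-step predictor sequentially with the LMS algorithm; the synthetic AR(2) test signal, the order p and the step size mu are arbitrary illustrative choices rather than recommended settings.

```python
import numpy as np

def lms_ar_predictor(x, p=4, mu=0.01):
    """One-step AR(p) prediction with LMS coefficient adaptation.

    x  : observed time series
    p  : predictor order (illustrative choice)
    mu : LMS step size (illustrative choice)
    Returns the sequence of a priori prediction errors e(k) = x(k) - x_hat(k).
    """
    w = np.zeros(p)                    # adaptive AR coefficients
    errors = []
    for k in range(p, len(x)):
        u = x[k - p:k][::-1]           # regressor [x(k-1), ..., x(k-p)]
        x_hat = np.dot(w, u)           # prediction
        e = x[k] - x_hat               # instantaneous prediction error
        w = w + mu * e * u             # LMS weight update
        errors.append(e)
    return np.asarray(errors)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = rng.standard_normal(2000)
    x = np.zeros_like(n)
    for k in range(2, len(n)):         # synthetic AR(2) test signal
        x[k] = 1.2 * x[k - 1] - 0.8 * x[k - 2] + n[k]
    e = lms_ar_predictor(x)
    print("prediction error variance:", e[500:].var())
```
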
Li (1992) has shown that the recurrent neural network (RNN) with a sufficiently
large number of neurons is a realisation of the nonlinear ARMA (NARMA) process.
RNNs performing NARMA prediction have traditionally been trained by the real-
time recurrent learning (RTRL) algorithm (Williams and Zipser 1989a) which pro-
vides the training process of the RNN ‘on the run’. However, for a complex physical
process, some difficulties encountered by RNNs such as the high degree of approxi-
mation involved in the RTRL algorithm for a high-order MA part of the underlying
NARMA process, high computational complexity of O(N^4), with N being the number
of neurons in the RNN, insufficient degree of nonlinearity involved, and relatively low
robustness, induced a search for some other, more suitable schemes for RNN-based
predictors.
In addition, in time series prediction of nonlinear and nonstationary signals, there
is a need to learn long-time temporal dependencies. This is rather difficult with con-
ventional RNNs because of the problem of vanishing gradient (Bengio et al. 1994).
A solution to that problem might be NARMA models and nonlinear autoregressive
moving average models with exogenous inputs (NARMAX) (Siegelmann et al. 1997)
realised by recurrent neural networks. However, the quality of performance is highly
dependent on the order of the AR and MA parts in the NARMAX model.
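
The structure referred to here can be sketched as a single recurrent (NARMA-type) perceptron whose regressor gathers p past input samples, q fed-back past outputs and a bias. In the fragment below the weights are fixed random values and the training itself (for instance by the RTRL algorithm) is deliberately omitted, so it illustrates only the form of the predictor.

```python
import numpy as np

def narma_perceptron_predict(x, w, p=3, q=2):
    """One-step prediction with a single recurrent (NARMA-type) perceptron.

    The regressor at time k collects p past inputs and q past outputs
    (feedback), plus a bias, and is passed through a squashing nonlinearity:
        y_hat(k) = Phi([x(k-1..k-p), y_hat(k-1..k-q), 1] . w)

    w : weight vector of length p + q + 1 (here assumed given; in practice
        it would be trained, e.g. with the RTRL algorithm).
    """
    phi = np.tanh                                 # illustrative activation
    y = np.zeros(len(x))
    for k in range(max(p, q), len(x)):
        u = np.concatenate((x[k - p:k][::-1],     # past inputs
                            y[k - q:k][::-1],     # fed-back past outputs
                            [1.0]))               # bias input
        y[k] = phi(np.dot(u, w))
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    x = np.sin(0.05 * np.arange(500)) + 0.1 * rng.standard_normal(500)
    w = 0.1 * rng.standard_normal(3 + 2 + 1)      # untrained, for shape only
    print(narma_perceptron_predict(x, w)[:10])
```
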

The main reasons for using neural networks for prediction rather than classical time
series analysis are (Wu 1995)
• they are computationally at least as fast, if not faster, than most available
statistical techniques;
• they are self-monitoring (i.e. they learn how to make accurate predictions);
• they are as accurate if not more accurate than most of the available statistical
techniques;
• they provide iterative forecasts;
• they are able to cope with nonlinearity and nonstationarity of input processes;
• they offer both parametric and nonparametric prediction.

1.4 Neural Networks for Prediction: Perspective

Many signals are generated from an inherently nonlinear physical mechanism and have
statistically non-stationary properties, a classic example of which is speech. Linear
structure adaptive filters are suitable for the nonstationary characteristics of such
signals, but they do not account for nonlinearity and associated higher-order statistics
(Shynk 1989). Adaptive techniques which recognise the nonlinear nature of the signal
should therefore outperform traditional linear adaptive filtering techniques (Haykin
1996a; Kay 1993). The classic approach to time series prediction is to undertake an
analysis of the time series data, which includes modelling, identification of the model
and model parameter estimation phases (Makhoul 1975). The design may be iterated
by measuring the closeness of the model to the real data. This can be a long process,
often involving the derivation, implementation and refinement of a number of models
before one with appropriate characteristics is found.
In particular, the most difficult systems to predict are
• those with non-stationary dynamics, where the underlying behaviour varies with
time, a typical example of which is speech production;
• those which deal with physical data which are subject to noise and experimen-
tation error, such as biomedical signals;
• those which deal with short time series, providing few data points on which to
conduct the analysis, such as heart rate signals, chaotic signals and meteorolog-
ical signals.
In all these situations, traditional techniques are severely limited and alternative
techniques must be found (Bengio 1995; Haykin and Li 1995; Li and Haykin 1993;
Niranjan and Kadirkamanathan 1991).
On the other hand, neural networks are powerful when applied to problems whose
solutions require knowledge which is difficult to specify, but for which there is an
abundance of examples (Dillon and Manikopoulos 1991; Gent and Sheppard 1992;
Townshend 1991). From a system theoretic point of view, neural networks can be
considered as a conveniently parametrised class of nonlinear maps (Narendra 1996).

There has been a recent resurgence in the field of ANNs caused by new net topolo-
gies, VLSI computational algorithms and the introduction of massive parallelism into
neural networks. As such, they are both universal function approximators (Cybenko
1989; Hornik et al. 1989) and arbitrary pattern classifiers. From the Weierstrass The-
orem, it is known that polynomials, and many other approximation schemes, can
approximate arbitrarily well a continuous function. Kolmogorov’s theorem (a neg-
ative solution of Hilbert’s 13th problem (Lorentz 1976)) states that any continuous
function can be approximated using only linear summations and nonlinear but contin-
uously increasing functions of only one variable. This makes neural networks suitable
for universal approximation, and hence prediction. Although sometimes computation-
ally demanding (Williams and Zipser 1995), neural networks have found their place
in the area of nonlinear autoregressive moving average (NARMA) (Bailer-Jones et
al. 1998; Connor et al. 1992; Lin et al. 1996) prediction applications. Comprehensive
survey papers on the use and role of ANNs can be found in Widrow and Lehr (1990),
Lippmann (1987), Medler (1998), Ermentrout (1998), Hunt et al. (1992) and Billings
(1980).
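
For reference, Kolmogorov's theorem invoked above is commonly stated in the following form (a standard formulation whose notation is chosen here for convenience): every continuous function f on [0, 1]^n admits the representation

```latex
f(x_1,\dots,x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \psi_{q,p}(x_p) \right),
```

where the inner one-variable functions ψ_{q,p} are fixed, continuous and monotonically increasing, independent of f, and only the outer continuous one-variable functions Φ_q depend on f; summation is the only multivariate operation involved.
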
Only recently, neural networks have been considered for prediction. A recent compe-
tition by the Santa Fe Institute for Studies in the Science of Complexity (1991–1993)
(Weigend and Gershenfeld 1994) showed that neural networks can outperform conven-
tional linear predictors in a number of applications (Waibel et al. 1989). In journals,
there has been an ever increasing interest in applying neural networks. A most com-
prehensive issue on recurrent neural networks is the issue of the IEEE Transactions on
Neural Networks, vol. 5, no. 2, March 1994. In the signal processing community, there
has been a recent special issue ‘Neural Networks for Signal Processing’ of the IEEE
Transactions on Signal Processing, vol. 45, no. 11, November 1997, and also the issue
‘Intelligent Signal Processing’ of the Proceedings of IEEE, vol. 86, no. 11, November
1998, both dedicated to the use of neural networks in signal processing applications.
Figure 1.2 shows the frequency of the appearance of articles on recurrent neural net-
works in common citation index databases. Figure 1.2(a) shows number of journal and
conference articles on recurrent neural networks in IEE/IEEE publications between
1988 and 1999. The data were gathered using the IEL Online service, and these publi-
cations are mainly periodicals and conferences in electronics engineering. Figure 1.2(b)
shows the frequency of appearance for the BIDS/ATHENS database between 1988 and 2000 (at the time of writing, only the months up to September 2000 were covered), which also includes non-engineering publications. From Figure 1.2, there is a
clear growing trend in the frequency of appearance of articles on recurrent neural
networks. Therefore, we felt that there was a need for a research monograph that
would cover a part of the area with up to date ideas and results.

1.5 Structure of the Book

The book is divided into 12 chapters and 10 appendices. An introduction to connectionism and the notion of neural networks for prediction is included in Chapter 1. The
fundamentals of adaptive signal processing and learning theory are detailed in Chap-
ter 2. An initial overview of network architectures for prediction is given in Chapter 3.

[Figure 1.2 Appearance of articles on RNNs in major citation databases. (a) Appearance of articles on recurrent neural networks in IEE/IEEE publications (via IEL) in the period 1988–1999. (b) Appearance of articles on recurrent neural networks in the BIDS database in the period 1988–2000.]

Chapter 4 contains a detailed discussion of activation functions and new insights are
provided by the consideration of neural networks within the framework of modu-
lar groups from number theory. The material in Chapter 5 builds upon that within
Chapter 3 and provides more comprehensive coverage of recurrent neural network
architectures together with concepts from nonlinear system modelling. In Chapter 6,
neural networks are considered as nonlinear adaptive filters whereby the necessary
learning strategies for recurrent neural networks are developed. The stability issues
for certain recurrent neural network architectures are considered in Chapter 7 through
the exploitation of fixed point theory and bounds for global asymptotic stability are
derived. A posteriori adaptive learning algorithms are introduced in Chapter 8 and
the synergy with data-reusing algorithms is highlighted. In Chapter 9, a new class
of normalised algorithms for online training of recurrent neural networks is derived.
The convergence of online learning algorithms for neural networks is addressed in
Chapter 10. Experimental results for the prediction of nonlinear and nonstationary
signals with recurrent neural networks are presented in Chapter 11. In Chapter 12,
the exploitation of inherent relationships between parameters within recurrent neural
networks is described. Appendices A to J provide background to the main chapters
and cover key concepts from linear algebra, approximation theory, complex sigmoid
activation functions, a precedent learning algorithm for recurrent neural networks, ter-
minology in neural networks, a posteriori techniques in science and engineering, con-
traction mapping theory, linear relaxation and stability, stability of general nonlinear
systems and deseasonalising of time series. The book concludes with a comprehensive
bibliography.

1.6 Readership

This book is targeted at graduate students and research engineers active in the areas
of communications, neural networks, nonlinear control, signal processing and time
series analysis. It will also be useful for engineers and scientists working in diverse
application areas, such as artificial intelligence, biomedicine, earth sciences, finance
and physics.

2 Fundamentals

2.1 Perspective

Adaptive systems are at the very core of modern digital signal processing. There are many reasons for this; foremost amongst these is that adaptive filtering, prediction and identification do not require explicit a priori statistical knowledge of the input data. Adaptive systems are employed in numerous areas such as biomedicine, communications, control, radar, sonar and video processing (Haykin 1996a).

2.1.1 Chapter Summary

In this chapter the fundamentals of adaptive systems are introduced. Emphasis is first placed upon the various structures available for adaptive signal processing, including the predictor structure which is the focus of this book. Basic learning algorithms and concepts are next detailed in the context of filters and networks with linear and nonlinear structures. Finally, the issue of modularity is discussed.

2.2 Adaptive Systems

Adaptability, in essence, is the ability to react in sympathy with disturbances to the


environment. A system that exhibits adaptability is said to be adaptive. Biological
systems are adaptive systems; animals, for example, can adapt to changes in their
environment through a learning process (Haykin 1999a).
A generic adaptive system employed in engineering is shown in Figure 2.1. It consists
of
• a set of adjustable parameters (weights) within some filter structure;
• an error calculation block (the difference between the desired response and the
output of the filter structure);
• a control (learning) algorithm for the adaptation of the weights.
The type of learning represented in Figure 2.1 is so-called supervised learning,
since the learning is directed by the desired response of the system. Here, the goal
is to adjust iteratively the free parameters (weights) of the adaptive system so as to minimise a prescribed cost function in some predetermined sense. 1 The filter structure within the adaptive system may be linear, such as a finite impulse response (FIR) or infinite impulse response (IIR) filter, or nonlinear, such as a Volterra filter or a neural network.

Figure 2.1 Block diagram of an adaptive system

2.2.1 Configurations of Adaptive Systems Used in Signal Processing

Four typical configurations of adaptive systems used in engineering are shown in


Figure 2.2 (Jenkins et al. 1996). The notions of an adaptive filter and adaptive system
are used here interchangeably.
For the system identification configuration shown in Figure 2.2(a), both the adap-
tive filter and the unknown system are fed with the same input signal x(k). The error
signal e(k) is formed at the output as e(k) = d(k) − y(k), and the parameters of the
adaptive system are adjusted using this error information. An attractive point of this
configuration is that the desired response signal d(k), also known as a teaching or
training signal, is readily available from the unknown system (plant). Applications of
this scheme are in acoustic and electrical echo cancellation, control and regulation of
real-time industrial and other processes (plants). The knowledge about the system is
stored in the set of converged weights of the adaptive system. If the dynamics of the
plant are not time-varying, it is possible to identify the parameters (weights) of the
plant to an arbitrary accuracy.
If we desire to form a system which inter-relates noise components in the input
and desired response signals, the noise cancelling configuration can be implemented
(Figure 2.2(b)). The only requirement is that the noise in the primary input and the
reference noise are correlated. This configuration subtracts an estimate of the noise
from the received signal. Applications of this configuration include noise cancellation

1 The aim is to minimise some function of the error e. If E[e2] is minimised, we consider minimum mean squared error (MSE) adaptation; the statistical expectation operator E[ · ] is used due to the random nature of the inputs to the adaptive system.
in acoustic environments and estimation of the foetal ECG from the mixture of the maternal and foetal ECG (Widrow and Stearns 1985).

Figure 2.2 Configurations for applications of adaptive systems: (a) system identification configuration; (b) noise cancelling configuration; (c) prediction configuration; (d) inverse system configuration
In the adaptive prediction configuration, the desired signal is the input signal
advanced relative to the input of the adaptive filter, as shown in Figure 2.2(c). This
configuration has numerous applications in various areas of engineering, science and
technology and most of the material in this book is dedicated to prediction. In fact,
prediction may be considered as a basis for any adaptation process, since the adaptive
filter is trying to predict the desired response.
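To make the prediction configuration concrete, the short sketch below (our own illustration, not taken from the original text) forms input vectors and desired samples for one-step-ahead prediction with a tapped delay line; the test signal, filter order and prediction delay are arbitrary choices.

```python
import numpy as np

def prediction_pairs(signal, order, delay=1):
    # Desired response d(k) is the input signal advanced by `delay` samples
    # relative to the tap-delay-line input, as in the prediction configuration.
    X, d = [], []
    for k in range(order, len(signal) - delay + 1):
        X.append(signal[k - order:k][::-1])   # [x(k-1), x(k-2), ..., x(k-order)]
        d.append(signal[k + delay - 1])       # the input advanced by `delay`
    return np.array(X), np.array(d)

# Example: one-step-ahead prediction pairs for a noisy sine wave.
t = np.arange(500)
x = np.sin(0.05 * t) + 0.05 * np.random.randn(len(t))
X, d = prediction_pairs(x, order=4, delay=1)
```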
The inverse system configuration, shown in Figure 2.2(d), has an adaptive system
cascaded with the unknown system. A typical application is adaptive channel equal-
isation in telecommunications, whereby an adaptive system tries to compensate for
the possibly time-varying communication channel, so that the transfer function from
the input to the output of Figure 2.2(d) approximates a pure delay.
In most adaptive signal processing applications, parametric methods are applied
which require a priori knowledge (or postulation) of a specific model in the form of
differential or difference equations. Thus, it is necessary to determine the appropriate
model order for successful operation, which will underpin data length requirements.
On the other hand, nonparametric methods employ general model forms of integral
equations or functional expansions valid for a broad class of dynamic nonlinearities. The most widely used nonparametric methods are referred to as the Volterra–Wiener approach and are based on functional expansions.

Figure 2.3 Block diagram of a blind equalisation structure

2.2.2 Blind Adaptive Techniques

The presence of an explicit desired response signal, d(k), in all the structures shown
in Figure 2.2 implies that conventional, supervised, adaptive signal processing tech-
niques may be applied for the purpose of learning. When no such signal is available,
it may still be possible to perform learning by exploiting so-called blind, or unsuper-
vised, methods. These methods exploit certain a priori statistical knowledge of the
input data. For a single signal, this knowledge may be in the form of its constant mod-
ulus property, or, for multiple signals, their mutual statistical independence (Haykin
2000). In Figure 2.3 the structure of a blind equaliser is shown; notice that the desired response is generated from the output of a zero-memory nonlinearity. This nonlinearity is implicitly being used to test the higher-order (i.e. greater than second-order) statistical properties of the output of the adaptive equaliser. When ideal convergence of the adaptive filter is achieved, the zero-memory nonlinearity has no effect upon the signal y(k), and therefore y(k) has statistical properties identical to those of the channel input s(k).
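As a rough sketch of how such a blind scheme can be realised (one possible choice, not a construction given in the text), the constant modulus algorithm (CMA) below penalises deviations of the equaliser output from a fixed modulus; the channel, equaliser length, step size and modulus value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# BPSK source (constant modulus) passed through an assumed FIR channel.
s = rng.choice([-1.0, 1.0], size=5000)
channel = np.array([1.0, 0.4, -0.2])            # illustrative channel
x = np.convolve(s, channel)[:len(s)] + 0.01 * rng.standard_normal(len(s))

N, eta = 11, 1e-3                               # equaliser length and step size
w = np.zeros(N); w[N // 2] = 1.0                # centre-spike initialisation
R2 = 1.0                                        # constant modulus of the source

for k in range(N, len(x)):
    xk = x[k - N:k][::-1]                       # regressor x(k)
    y = w @ xk                                  # equaliser output y(k)
    e = y * (y**2 - R2)                         # error from the zero-memory nonlinearity
    w -= eta * e * xk                           # stochastic gradient update
```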

2.3 Gradient-Based Learning Algorithms

We provide a brief introduction to the notion of gradient-based learning. The aim is


to update iteratively the weight vector w of an adaptive system so that a nonnegative
error measure J ( · ) is reduced at each time step k,

J (w + ∆w) < J (w), (2.1)

where ∆w represents the change in w from one iteration to the next. This will gener-
ally ensure that after training, an adaptive system has captured the relevant properties
of the unknown system that we are trying to model. Using a Taylor series expansion
to approximate the error measure, we obtain 2

J(w) + (∂J(w)/∂w)T ∆w + O(∆w2) < J(w). (2.2)

Figure 2.4 Example of a filter with widely differing weights (w1 = 10, w2 = 1, w3 = 0.1, w4 = 0.01)
This way, with the assumption that the higher-order terms in the left-hand side of
(2.2) can be neglected, (2.1) can be rewritten as

(∂J(w)/∂w)T ∆w < 0. (2.3)
From (2.3), an algorithm that continuously reduces the error measure on the run should change the weights in the direction opposite to that of the gradient ∂J(w)/∂w, i.e.

∆w = −η ∂J/∂w, (2.4)
where η is a small positive scalar called the learning rate, step size or adaptation
parameter.
Examining (2.4), if the gradient of the error measure J (w) is steep, large changes
will be made to the weights, and conversely, if the gradient of the error measure J (w)
is small, namely a flat error surface, a larger step size η may be used. Gradient descent
algorithms cannot, however, provide a sense of importance or hierarchy to the weights
(Agarwal and Mammone 1994). For example, the value of weight w1 in Figure 2.4 is
10 times greater than w2 and 1000 times greater than w4 . Hence, the component of
the output of the filter within the adaptive system due to w1 will, on the average,
be larger than that due to the other weights. For a conventional gradient algorithm,
however, the change in w1 will not depend upon the relative sizes of the coefficients,
but the relative sizes of the input data. This deficiency provides the motivation for
certain partial update gradient-based algorithms (Douglas 1997).
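A minimal sketch of the update (2.4) on a simple quadratic error measure is given below; the cost, its gradient and the learning rate are our own illustrative choices, not values from the text.

```python
import numpy as np

# Quadratic error measure J(w) = 0.5 * w^T A w - b^T w, with known gradient.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])

def grad_J(w):
    return A @ w - b            # dJ/dw for the quadratic cost above

w = np.zeros(2)
eta = 0.1                       # learning rate (step size)
for _ in range(200):
    w = w - eta * grad_J(w)     # Delta w = -eta * dJ/dw, as in Equation (2.4)

# For a small enough eta the iterates approach the minimiser A^{-1} b.
print(w, np.linalg.solve(A, b))
```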
It is important to notice that gradient-descent-based algorithms inherently forget old
data, which leads to a problem called vanishing gradient and has particular importance
for learning in filters with recursive structures. This issue is considered in more detail
in Chapter 6.

2 The explanation of the O notation can be found in Appendix A.



2.4 A General Class of Learning Algorithms

To introduce a general class of learning algorithms and to explain, in very crude terms, the relationships between them, we follow the approach from Guo and Ljung (1995). Let
us start from the linear regression equation,
y(k) = xT (k)w(k) + ν(k), (2.5)
where y(k) is the output signal, x(k) is a vector comprising the input signals, ν(k)
is a disturbance or noise sequence, and w(k) is an unknown time-varying vector of
weights (parameters) of the adaptive system. Variation of the weights at time k is
denoted by n(k), and the weight change equation becomes
w(k) = w(k − 1) + n(k). (2.6)
Adaptive algorithms can track the weights only approximately, hence for the following
analysis we use the symbol ŵ. A general expression for weight update in an adaptive
algorithm is
ŵ(k + 1) = ŵ(k) + ηΓ (k)(y(k) − xT (k)ŵ(k)), (2.7)
where Γ (k) is the adaptation gain vector, and η is the step size. To assess how far an
adaptive algorithm is from the optimal solution we introduce the weight error vector,
w̆(k), and a sample input matrix Σ(k) as
w̆(k) = w(k) − ŵ(k), Σ(k) = Γ (k)xT (k). (2.8)
Equations (2.5)–(2.8) yield the following weight error equation:
w̆(k + 1) = (I − ηΣ(k))w̆(k) − ηΓ (k)ν(k) + n(k + 1). (2.9)
For different gains Γ (k), the following three well-known algorithms can be obtained
from (2.7). 3
1. The least mean square (LMS) algorithm:
Γ (k) = x(k). (2.10)

2. Recursive least-squares (RLS) algorithm:

Γ(k) = P(k)x(k), (2.11)

P(k) = (1/(1 − η)) [ P(k − 1) − η P(k − 1)x(k)xT(k)P(k − 1) / (1 − η + ηxT(k)P(k − 1)x(k)) ]. (2.12)

3. Kalman filter (KF) algorithm (Guo and Ljung 1995; Kay 1993):

Γ(k) = P(k − 1)x(k) / (R + ηxT(k)P(k − 1)x(k)), (2.13)

P(k) = P(k − 1) − η P(k − 1)x(k)xT(k)P(k − 1) / (R + ηxT(k)P(k − 1)x(k)) + ηQ. (2.14)

3 Notice that the role of η in the RLS and KF algorithm is different to that in the LMS algorithm.

For RLS and KF we may put η = 1 and introduce a forgetting factor instead.

The KF algorithm is the optimal algorithm in this setting if the elements of n(k)
and ν(k) in (2.5) and (2.6) are Gaussian noises with a covariance matrix Q > 0 and a
scalar value R > 0, respectively (Kay 1993). All of these adaptive algorithms can be
referred to as sequential estimators, since they refine their estimate as each new sample
arrives. On the other hand, block-based estimators require all the measurements to
be acquired before the estimate is formed.
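The sketch below gathers the update (2.7) and the gains (2.10)–(2.14) into a single routine, to emphasise that LMS, RLS and KF differ only in the choice of Γ(k). It is a rough illustration of our own; the 3-tap system, the noise level and the values of η, R and Q in the usage example are assumptions, not values from the text.

```python
import numpy as np

def general_update(w_hat, P, x, y, eta, mode, R=1.0, Q=None):
    """One step of Equation (2.7), with the gain chosen as in (2.10)-(2.14)."""
    if mode == "lms":
        gain = x                                        # Equation (2.10)
    elif mode == "rls":
        gain = P @ x                                    # Equation (2.11)
        denom = 1 - eta + eta * x @ P @ x
        P = (P - eta * np.outer(P @ x, x @ P) / denom) / (1 - eta)   # (2.12)
    elif mode == "kf":
        if Q is None:
            Q = np.zeros_like(P)
        denom = R + eta * x @ P @ x
        gain = P @ x / denom                            # Equation (2.13)
        P = P - eta * np.outer(P @ x, x @ P) / denom + eta * Q       # (2.14)
    w_hat = w_hat + eta * gain * (y - x @ w_hat)        # Equation (2.7)
    return w_hat, P

# Illustrative usage: identify an assumed 3-tap system with the KF gain
# (following the footnote, eta = 1 is used in the KF/RLS setting).
rng = np.random.default_rng(1)
w_true = np.array([0.5, -0.3, 0.2])
w_hat, P = np.zeros(3), np.eye(3)
for _ in range(500):
    x = rng.standard_normal(3)
    y = x @ w_true + 0.01 * rng.standard_normal()
    w_hat, P = general_update(w_hat, P, x, y, eta=1.0, mode="kf",
                              R=0.01, Q=1e-4 * np.eye(3))
print(w_hat)   # close to w_true
```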
Although the most important measure of quality of an adaptive algorithm is gen-
erally the covariance matrix of the weight tracking error E[w̆(k)w̆T (k)], due to the
statistical dependence between x(k), ν(k) and n(k), precise expressions for this covari-
ance matrix are extremely difficult to obtain.
To undertake statistical analysis of an adaptive learning algorithm, the classical
approach is to assume that x(k), ν(k) and n(k) are statistically independent. Another
assumption is that the homogeneous part of (2.9)

w̆(k + 1) = (I − ηΣ(k))w̆(k) (2.15)

and its averaged version

E[w̆(k + 1)] = (I − ηE[Σ(k)])E[w̆(k)] (2.16)

are exponentially stable in stochastic and deterministic senses (Guo and Ljung 1995).

2.4.1 Quasi-Newton Learning Algorithm

The quasi-Newton learning algorithm utilises the second-order derivative of the objec-
tive function 4 to adapt the weights. If the change in the objective function between
iterations in a learning algorithm is modelled with a Taylor series expansion, we have

∆E(w) = E(w + ∆w) − E(w) ≈ (∇w E(w))T ∆w + (1/2)∆wT H∆w. (2.17)

After setting the differential with respect to ∆w to zero, the weight update equation
becomes
∆w = −H −1 ∇w E(w). (2.18)
The Hessian H in this equation determines not only the direction but also the step
size of the gradient descent.
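As an illustration of (2.18), the sketch below takes a Newton-type step on a toy cost whose Hessian is available analytically; this is an assumption made for brevity, since practical quasi-Newton methods such as BFGS build an approximation to H−1 rather than using the exact Hessian.

```python
import numpy as np

# Toy cost E(w) with analytic gradient and Hessian (illustrative choice).
def E(w):    return (w[0] - 1)**2 + 5 * (w[1] + 2)**2 + w[0] * w[1]
def grad(w): return np.array([2 * (w[0] - 1) + w[1], 10 * (w[1] + 2) + w[0]])
def hess(w): return np.array([[2.0, 1.0], [1.0, 10.0]])

w = np.zeros(2)
for _ in range(5):
    # Newton-type step of Equation (2.18): Delta w = -H^{-1} grad E(w).
    w = w - np.linalg.solve(hess(w), grad(w))
print(w, grad(w))   # gradient ~ 0 at the minimiser
```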
To conclude: adaptive algorithms mainly differ in their form of adaptation gains.
The gains can be roughly divided into two classes: gradient-based gains (e.g. LMS,
quasi-Newton) and Riccati equation-based gains (e.g. KF and RLS).

2.5 A Step-by-Step Derivation of the Least Mean Square (LMS)


Algorithm

Consider a set of input–output pairs of data described by a mapping function f :

d(k) = f (x(k)), k = 1, 2, . . . , N. (2.19)


4 The term objective function will be discussed in more detail later in this chapter.

Figure 2.5 Structure of a finite impulse response filter

The function f ( · ) is assumed to be unknown. Using the concept of adaptive systems


explained above, the aim is to approximate the unknown function f ( · ) by a function
F ( · , w) with adjustable parameters w, in some prescribed sense. The function F is
defined on a system with a known architecture or structure. It is convenient to define
an instantaneous performance index,

J(w(k)) = [d(k) − F (x(k), w(k))]2 , (2.20)

which represents an energy measure. In that case, function F is most often just the
inner product F = xT (k)w(k) and corresponds to the operation of a linear FIR filter
structure. As before, the goal is to find an optimisation algorithm that minimises the
cost function J(w). The common choice of the algorithm is motivated by the method
of steepest descent, and generates a sequence of weight vectors w(1), w(2), . . . , as

w(k + 1) = w(k) − ηg(k), k = 0, 1, 2, . . . , (2.21)

where g(k) is the gradient vector of the cost function J(w) at the point w(k)

g(k) = ∂J(w)/∂w |w=w(k) . (2.22)

The parameter η in (2.21) determines the behaviour of the algorithm:


• for η small, algorithm (2.21) converges towards the global minimum of the error
performance surface;
• if the value of η approaches some critical value ηc , the trajectory of convergence
on the error performance surface is either oscillatory or overdamped;
• if the value of η is greater than ηc , the system is unstable and does not converge.
These observations can only be visualised in two dimensions, i.e. for only two param-
eter values w1 (k) and w2 (k), and can be found in Widrow and Stearns (1985). If the
approximation function F in the gradient descent algorithm (2.21) is linear we call
such an adaptive system a linear adaptive system. Otherwise, we describe it as a
nonlinear adaptive system. Neural networks belong to this latter class.

2.5.1 The Wiener Filter

Suppose the system shown in Figure 2.1 is modelled as a linear FIR filter (shown in Figure 2.5); then we have F(x, w) = xT w, dropping the k index for convenience. Consequently, the instantaneous cost function J(w(k)) is a quadratic function of the
weight vector. The Wiener filter is based upon minimising the ensemble average of
this instantaneous cost function, i.e.

JWiener (w(k)) = E[[d(k) − xT (k)w(k)]2 ] (2.23)

and assuming d(k) and x(k) are zero mean and jointly wide sense stationary. To find
the minimum of the cost function, we differentiate with respect to w and obtain
∂JWiener
= −2E[e(k)x(k)], (2.24)
∂w
where e(k) = d(k) − xT (k)w(k).
At the Wiener solution, this gradient equals the null vector 0. Solving (2.24) for
this condition yields the Wiener solution,

w = R−1
x,x r x,d , (2.25)

where Rx,x = E[x(k)xT (k)] is the autocorrelation matrix of the zero mean input
data x(k) and r x,d = E[x(k)d(k)] is the crosscorrelation between the input vector
and the desired signal d(k). The Wiener formula has the same general form as the
block least-squares (LS) solution, when the exact statistics are replaced by temporal
averages.
The RLS algorithm, as in (2.12), with the assumption that the input and desired response signals are jointly ergodic, approximates the Wiener solution and matches it asymptotically.
More details about the derivation of the Wiener filter can be found in Haykin
(1996a, 1999a).
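A small sketch of (2.25), with the exact statistics replaced by temporal averages as mentioned above, is given below; the data model, filter length and noise level are arbitrary assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n_taps = 5000, 4
w_true = np.array([0.6, -0.4, 0.2, 0.1])

# Zero-mean input and desired response d(k) = x^T(k) w_true + noise.
x = rng.standard_normal(N)
X = np.column_stack([np.roll(x, i) for i in range(n_taps)])[n_taps:]
d = X @ w_true + 0.05 * rng.standard_normal(len(X))

# Replace E[x x^T] and E[x d] by temporal averages (block LS form of (2.25)).
R_xx = X.T @ X / len(X)
r_xd = X.T @ d / len(X)
w_wiener = np.linalg.solve(R_xx, r_xd)
print(w_wiener)     # close to w_true for sufficiently long data records
```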

2.5.2 Further Perspective on the Least Mean Square (LMS) Algorithm

To reduce the computational complexity of the Wiener solution, which is a block


solution, we can use the method of steepest descent for a recursive, or sequential,
computation of the weight vector w. Let us derive the LMS algorithm for an adaptive
FIR filter, the structure of which is shown in Figure 2.5. In view of a general adaptive
system, this FIR filter becomes the filter structure within Figure 2.1. The output of
this filter is
y(k) = xT (k)w(k). (2.26)
Widrow and Hoff (1960) utilised this structure for adaptive processing and proposed
instantaneous values of the autocorrelation and crosscorrelation matrices to calcu-
late the gradient term within the steepest descent algorithm. The cost function they
proposed was
J(k) = (1/2)e2(k), (2.27)
which is again based upon the instantaneous output error e(k) = d(k) − y(k). In order
to derive the weight update equation we start from the instantaneous gradient
∂J(k)/∂w(k) = e(k) ∂e(k)/∂w(k). (2.28)

Figure 2.6 The structure of a nonlinear adaptive filter

Following the same procedure as for the general gradient descent algorithm, we obtain
∂e(k)/∂w(k) = −x(k) (2.29)

and finally

∂J(k)/∂w(k) = −e(k)x(k). (2.30)
The set of equations that describes the LMS algorithm is given by

N


y(k) = xi (k)wi (k) = xT (k)w(k),


i=1 (2.31)
e(k) = d(k) − y(k), 




w(k + 1) = w(k) + ηe(k)x(k).
The LMS algorithm is a very simple yet extremely popular algorithm for adaptive
filtering. It is also optimal in the H∞ sense, which justifies its practical utility (Hassibi et al. 1996).
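The three equations in (2.31) translate almost literally into code. The sketch below (our own illustration) applies them to one-step-ahead prediction of an assumed AR(2) signal; the filter length, step size and signal parameters are not taken from the text.

```python
import numpy as np

def lms(x, d, n_taps, eta):
    """LMS adaptive filter following the three equations in (2.31)."""
    w = np.zeros(n_taps)
    y = np.zeros(len(d))
    e = np.zeros(len(d))
    for k in range(n_taps, len(d)):
        xk = x[k - n_taps:k][::-1]      # tap-delay-line input x(k)
        y[k] = xk @ w                   # filter output y(k) = x^T(k) w(k)
        e[k] = d[k] - y[k]              # instantaneous error e(k)
        w = w + eta * e[k] * xk         # weight update w(k+1)
    return y, e, w

# One-step-ahead prediction of an AR(2) signal (illustrative parameters).
rng = np.random.default_rng(3)
x = np.zeros(4000)
for k in range(2, len(x)):
    x[k] = 1.2 * x[k - 1] - 0.8 * x[k - 2] + 0.1 * rng.standard_normal()
y, e, w = lms(x, d=x, n_taps=2, eta=0.01)
```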

2.6 On Gradient Descent for Nonlinear Structures

Adaptive filters and neural networks are formally equivalent; in fact, the structures of neural networks are generalisations of linear filters (Maass and Sontag 2000; Nerrand et al. 1991). Depending on the architecture of a neural network and whether it is used online or offline, two broad classes of learning algorithms are available:
• techniques that use a direct computation of the gradient, which is typical for
linear and nonlinear adaptive filters;
• techniques that involve backpropagation, which is commonplace for most offline
applications of neural networks.
Backpropagation is a computational procedure to obtain gradients necessary for
adaptation of the weights of a neural network contained within its hidden layers and
is not radically different from a general gradient algorithm.
As we are interested in neural networks for real-time signal processing, we will
analyse online algorithms that involve direct gradient computation. In this section we
introduce a learning algorithm for a nonlinear FIR filter, whereas learning algorithms
for online training of recurrent neural networks will be introduced later. Let us start
from a simple nonlinear FIR filter, which consists of the standard FIR filter cascaded
with a memoryless nonlinearity Φ as shown in Figure 2.6. This structure can be seen
as a single neuron with a dynamical FIR synapse. This FIR synapse provides memory
to the neuron. The output of this filter is given by

y(k) = Φ(xT (k)w(k)). (2.32)

The nonlinearity Φ( · ) after the tap-delay line is typically a sigmoid. Using the ideas
from the LMS algorithm, if the cost function is given by

J(k) = (1/2)e2(k) (2.33)

we have

e(k) = d(k) − Φ(xT (k)w(k)), (2.34)


w(k + 1) = w(k) − η∇w J(k), (2.35)

where e(k) is the instantaneous error at the output neuron, d(k) is some teach-
ing (desired) signal, w(k) = [w1 (k), . . . , wN (k)]T is the weight vector and x(k) =
[x1 (k), . . . , xN (k)]T is the input vector.
The gradient ∇w J(k) can be calculated as
∂J(k)/∂w(k) = e(k) ∂e(k)/∂w(k) = −e(k)Φ′(xT(k)w(k))x(k), (2.36)

where Φ′( · ) represents the first derivative of the nonlinearity Φ( · ) and the weight update Equation (2.35) can be rewritten as

w(k + 1) = w(k) + ηΦ′(xT(k)w(k))e(k)x(k). (2.37)

This is the weight update equation for a direct gradient algorithm for a nonlinear
FIR filter.
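A sketch of the update (2.37) is given below, assuming a tanh nonlinearity so that Φ′(v) = 1 − tanh2(v); the choice of nonlinearity, step size and the toy data model in the usage example are ours, for illustration only.

```python
import numpy as np

def nonlinear_lms(x, d, n_taps, eta):
    """Direct gradient update (2.37) for a nonlinear FIR filter with
    a tanh activation: w(k+1) = w(k) + eta * Phi'(x^T w) * e(k) * x(k)."""
    w = np.zeros(n_taps)
    e = np.zeros(len(d))
    for k in range(n_taps, len(d)):
        xk = x[k - n_taps:k][::-1]
        net = xk @ w
        y = np.tanh(net)                   # y(k) = Phi(x^T(k) w(k))
        e[k] = d[k] - y
        phi_prime = 1.0 - np.tanh(net)**2  # Phi'( . ) for the tanh nonlinearity
        w = w + eta * phi_prime * e[k] * xk
    return w, e

# Illustrative usage on a toy nonlinear mapping d(k) = tanh(0.5 x(k-1) - 0.25 x(k-2)).
rng = np.random.default_rng(6)
x = rng.standard_normal(5000)
d = np.tanh(0.5 * np.roll(x, 1) - 0.25 * np.roll(x, 2))
w, e = nonlinear_lms(x, d, n_taps=2, eta=0.1)
```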

2.6.1 Extension to a General Neural Network

When deriving a direct gradient algorithm for a general neural network, the network
architecture should be taken into account. For large networks for offline processing,
classical backpropagation is the most convenient algorithm. However, for online learn-
ing, extensions of the previous algorithm should be considered.

2.7 On Some Important Notions From Learning Theory

In this section we discuss in more detail the inter-relations between the error, error
function and objective function in learning theory.

2.7.1 Relationship Between the Error and the Error Function

The error at the output of an adaptive system is defined as the difference between
the output value of the network and the target (desired output) value. For instance,
the instantaneous error e(k) is defined as


e(k) = d(k) − y(k). (2.38)
The instantaneous error can be positive, negative or zero, and is therefore not a good
candidate for the criterion function for training adaptive systems. Hence we look for
another function, called the error function, that is a function of the instantaneous
error, but is suitable as a criterion function for learning. Error functions are also called
loss functions. They are defined so that an increase in the error function corresponds
to a reduction in the quality of learning, and they are nonnegative. An error function
can be defined as
E(N) = Σ_{i=0}^{N} e2(i) (2.39)

or as an average value

Ē(N) = (1/(N + 1)) Σ_{i=0}^{N} e2(i). (2.40)

2.7.2 The Objective Function

The objective function is a function that we want to minimise during training. It can
be equal to an error function, but often it may include other terms to introduce con-
straints. For instance in generalisation, too large a network might lead to overfitting.
Hence the objective function can consist of two parts, one for the error minimisa-
tion and the other which is either a penalty for a large network or a penalty term for
excessive increase in the weights of the adaptive system or some other chosen function
(Tikhonov et al. 1998). An example of such an objective function for online learning
is
J(k) = (1/N) Σ_{i=1}^{N} (e2(k − i + 1) + G(‖w(k − i + 1)‖22)), (2.41)
where G is some linear or nonlinear function. We often use symbols E and J inter-
changeably to denote the cost function.
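The sketch below evaluates the error functions (2.39) and (2.40) and an objective of the form (2.41), in which G is chosen, purely for illustration, as a weight-decay penalty G(u) = λu; the data and the value of λ are arbitrary assumptions.

```python
import numpy as np

def error_functions(e):
    """Cumulative and average squared error, Equations (2.39) and (2.40)."""
    E = np.sum(e**2)
    E_bar = E / len(e)
    return E, E_bar

def objective(e_window, w_window, lam=0.01):
    """Sliding-window objective in the spirit of (2.41); here G is chosen
    as a simple weight-decay penalty G(u) = lam * u (an assumption)."""
    penalties = np.array([lam * np.dot(w, w) for w in w_window])
    return np.mean(e_window**2 + penalties)

e = np.array([0.3, -0.1, 0.2, 0.05])
w_hist = [np.array([0.5, -0.2])] * len(e)
print(error_functions(e), objective(e, w_hist))
```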

2.7.3 Types of Learning with Respect to the Training Set and Objective Function

Batch learning
Batch learning is also known as epochwise, or offline learning, and is a common
strategy for offline training. The idea is to adapt the weights once the whole training
set has been presented to an adaptive system. It can be described by the following
steps.
1. Initialise the weights
2. Repeat
• Pass all the training data through the network
• Sum the errors after each particular pattern


• Update the weights based upon the total error
• Stop if some prescribed error performance is reached

The counterpart of batch learning is so-called incremental learning, online, or pat-


tern learning. The procedure for this type of learning is as follows.
1. Initialise the weights
2. Repeat

• Pass one pattern through the network


• Update the weights based upon the instantaneous error
• Stop if some prescribed error performance is reached

The choice of the type of learning is very much dependent upon application. Quite
often, for networks that need initialisation, we perform one type of learning in the
initialisation procedure, which is by its nature an offline procedure, and then use some
other learning strategy while the network is running. Such is the case with recurrent
neural networks for online signal processing (Mandic and Chambers 1999f).
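The two procedures above differ only in where the weight update sits relative to the loop over the training data, as the sketch below illustrates for a linear filter trained by gradient descent on squared error; the data, step sizes and numbers of epochs are arbitrary assumptions.

```python
import numpy as np

def pattern_learning(X, d, eta, epochs):
    """Incremental (pattern/online) learning: update after every pattern."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xk, dk in zip(X, d):
            e = dk - xk @ w
            w = w + eta * e * xk
    return w

def batch_learning(X, d, eta, epochs):
    """Batch (epochwise) learning: accumulate the error over the whole
    training set, then make a single update per epoch."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        e = d - X @ w
        w = w + eta * X.T @ e / len(d)   # average gradient over the batch
    return w

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 3))
d = X @ np.array([0.3, -0.5, 0.2]) + 0.01 * rng.standard_normal(200)
print(pattern_learning(X, d, 0.01, 50), batch_learning(X, d, 0.1, 500))
```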

2.7.4 Deterministic, Stochastic and Adaptive Learning

Deterministic learning is an optimisation technique based on an objective function


which always produces the same result, no matter how many times we recompute it.
Deterministic learning is always offline.
Stochastic learning is useful when the objective function is affected by noise and
local minima. It can be employed within the context of a gradient descent learning
algorithm. The idea is that the learning rate gradually decreases during training and
hence the steps on the error performance surface in the beginning of training are
large which speeds up training when far from the optimal solution. The learning rate
is small when approaching the optimal solution, hence reducing misadjustment. This
gradual reduction of the learning rate can be achieved by e.g. annealing (Kirkpatrick
et al. 1983; Rose 1998; Szu and Hartley 1987).
The idea behind the concept of adaptive learning is to forget the past when it is
no longer relevant and adapt to the changes in the environment. The terms ‘adaptive
learning’ or ‘gear-shifting’ are sometimes used for gradient methods in which the
learning rate is changed during training.

2.7.5 Constructive Learning

Constructive learning deals with the change of architecture or interconnections in


the network during training. Neural networks for which topology can change over
time are called ontogenic neural networks (Fiesler and Beale 1997). Two basic classes
of constructive learning are network growing and network pruning. In the network
growing approach, learning begins with a network with no hidden units, and if the
error is too big, new hidden units are added to the network, training resumes, and so
on. The most used algorithm based upon network growing is the so-called cascade-
correlation algorithm (Hoehfeld and Fahlman 1992). Network pruning starts from a
large network and if the error in learning is smaller than allowed, the network size is
reduced until the desired ratio between accuracy and network size is reached (Reed
1993; Sum et al. 1999).

2.7.6 Transformation of Input Data, Learning and Dimensionality

A natural question is whether to linearly/nonlinearly transform the data before feed-


ing them to an adaptive processor. This is particularly important for neural networks,
which are nonlinear processors. If we consider each neuron as a basic component of a
neural network, then we can refer to a general neural network as a system with compo-
nentwise nonlinearities. To express this formally, consider a scalar function σ : R → R
and systems of the form,
y(k) = σ(Ax(k)), (2.42)

where the matrix A is an N × N matrix and σ is applied componentwise

σ(x1 (k), . . . , xN (k)) = (σ(x1 (k)), . . . , σ(xN (k))). (2.43)

Systems of this type arise in a wide variety of situations. For a linear σ, we have a
linear system. If the range of σ is finite, the state vector of (2.42) takes values from
a finite set, and dynamical properties can be analysed in time which is polynomial in
the number of possible states. Throughout this book we are interested in functions, σ,
and combination matrices, A, which would guarantee a fixed point of this mapping.
Neural networks are commonly of the form (2.42). In such a context we call σ the
activation function. Results of Siegelmann and Sontag (1995) show that saturated
linear systems (piecewise linear) can represent Turing machines, which is achieved by
encoding the transition rules of the Turing machine in the matrix A.

The curse of dimensionality


The curse of dimensionality (Bellman 1961) refers to the exponential growth of com-
putation needed for a specific task as a function of the dimensionality of the input
space. In neural networks, a network quite often has to deal with many irrelevant
inputs which, in turn, increase the dimensionality of the input space. In such a case,
the network uses much of its resources to represent and compute irrelevant informa-
tion, which hampers processing of the desired information. A remedy for this prob-
lem is preprocessing of input data, such as feature extraction, and to introduce some
importance function to the input samples. The curse of dimensionality is particularly
prominent in unsupervised learning algorithms. Radial basis functions are also prone
to this problem. Selection of a neural network model must therefore be suited for a
particular task. Some a priori information about the data and scaling of the inputs
can help to reduce the severity of the problem.

Transformations on the input data


Activation functions used in neural networks are centred around a certain value in
their output space. For instance, the mean of the logistic function is 0.5, whereas
the tanh function is centred around zero. Therefore, in order to perform efficient
prediction, we should match the range of the input data, their mean and variance,
with the range of the chosen activation function. There are several operations that
we could perform on the input data, such as the following.
1. Normalisation, which in this context means dividing each element of the input vector x(k) by its squared norm, i.e. xi(k) ∈ x(k) → xi(k)/‖x(k)‖22.

2. Rescaling, which means transforming the input data in the manner that we
multiply/divide them by a constant and also add/subtract a constant from the
data. 5

3. Standardisation, which is borrowed from statistics, where, for instance, a ran-


dom Gaussian vector is standardised if its mean is subtracted from it, and the
vector is then divided by its standard deviation. The resulting random vari-
able is called a ‘standard normal’ random variable with zero mean and unity
standard deviation. Some examples of data standardisation are

• Standardisation to zero mean and unity standard deviation can be performed as

mean = (Σi Xi)/N,    std = √(Σi (Xi − mean)2/(N − 1)).

The standardised quantity becomes Si = (Xi − mean)/std.
• Standardise X to midrange 0 and range 2. This can be achieved by

midrange = (1/2)(maxi Xi + mini Xi),    range = maxi Xi − mini Xi,

Si = (Xi − midrange)/(range/2).

4. Principal component analysis (PCA) represents the data by a set of unit norm
vectors called normalised eigenvectors. The eigenvectors are positioned along
the directions of greatest data variance. The eigenvectors are found from the
covariance matrix R of the input dataset. An eigenvalue λi , i = 1, . . . , N , is
associated with each eigenvector. Every input data vector is then represented
by a linear combination of eigenvectors.
As pointed out earlier, standardising input variables has an effect on training, since
steepest descent algorithms are sensitive to scaling due to the change in the weights
being proportional to the value of the gradient and the input data.
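The operations listed above can be sketched as follows; this is our own illustration, with synthetic data, and the eigendecomposition of the covariance matrix is one common route to PCA rather than the only one.

```python
import numpy as np

def normalise(x):
    """Divide each element of x by the squared Euclidean norm of x."""
    return x / np.dot(x, x)

def standardise(X):
    """Zero mean, unity standard deviation (per input variable)."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def standardise_midrange(X):
    """Midrange 0 and range 2, as in the second standardisation example."""
    midrange = 0.5 * (X.max(axis=0) + X.min(axis=0))
    rng_ = X.max(axis=0) - X.min(axis=0)
    return (X - midrange) / (rng_ / 2)

def pca(X, n_components):
    """Project zero-mean data onto the leading eigenvectors of its
    covariance matrix (the directions of greatest variance)."""
    Xc = X - X.mean(axis=0)
    R = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:n_components]
    return Xc @ eigvecs[:, order]

X = np.random.default_rng(5).standard_normal((100, 4)) * [1.0, 3.0, 0.5, 2.0]
print(standardise(X).std(axis=0, ddof=1), pca(X, 2).shape)
```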

5 In real life a typical rescaling is transforming the temperature from Celsius into Fahrenheit scale.

Nonlinear transformations of the data


This method of transforming the data can help when the dynamic range of the data is too high. In that case, for instance, we typically apply the log function to the input
data. The log function is often applied in the error and objective functions for the
same purposes.

2.8 Learning Strategies

To construct an optimal neural approximating model we have to determine an appro-


priate training set containing all the relevant information of the process and define
a suitable topology that matches the complexity and performance requirements. The
training set construction issue requires four entities to be considered (Alippi and Piuri
1996; Bengio 1995; Haykin and Li 1995; Shadafan and Niranjan 1993):
• the number of training data samples ND ;
• the number of patterns NP constituting a batch;
• the number of batches NB to be extracted from the training set;
• the number of times the generic batch is presented to the network during learn-
ing.
The assumption is that the training set is sufficiently rich so that it contains all the
relevant information necessary for learning.
The requirement coincides with the hypothesis that the training data have been
generated by a fully exciting input signal, such as white noise, which is able to excite
all the process dynamics. White noise is a persistently exciting input signal and is
used for the driving component of moving average (MA), autoregressive (AR) and
autoregressive moving average (ARMA) models.

2.9 General Framework for the Training of Recurrent Networks by


Gradient-Descent-Based Algorithms

In this section we summarise some of the important concepts mentioned earlier.

2.9.1 Adaptive Versus Nonadaptive Training

The training of a network makes use of two sequences, the sequence of inputs and the
sequence of corresponding desired outputs. If the network is first trained (with a train-
ing sequence of finite length) and subsequently used (with the fixed weights obtained
from training), this mode of operation is referred to as non-adaptive (Nerrand et al.
1994). Conversely, the term adaptive refers to the mode of operation whereby the net-
work is trained permanently throughout its application (with a training sequence of
infinite length). Therefore, the adaptive network is suitable for input processes which
exhibit statistically non-stationary behaviour, a situation which is normal in the fields
of adaptive control and signal processing (Bengio 1995; Haykin 1996a; Haykin and
Li 1995; Khotanzad and Lu 1990; Narendra and Parthasarathy 1990; Nerrand et al.
1994).

2.9.2 Performance Criterion, Cost Function, Training Function

The computation of the coefficients during training aims at finding a system whose
operation is optimal with respect to some performance criterion which may be either
qualitative, e.g. (subjective) quality of speech reconstruction, or quantitative, e.g.
maximising signal to noise ratio for spatial filtering. The goal is to define a positive
training function which is such that a decrease of this function through modifications
of the coefficients of the network leads to an improvement of the performance of the
system (Bengio 1995; Haykin and Li 1995; Nerrand et al. 1994; Qin et al. 1992). In the
case of non-adaptive training, the training function is defined as a function of all the
data of the training set (in such a case, it is usually termed as a cost function). The
minimum of the cost function corresponds to the optimal performance of the system.
Training is an optimisation procedure, conventionally using gradient-based methods.
In the case of adaptive training, it is impossible, in most instances, to define a
time-independent cost function whose minimisation leads to a system that is optimal
with respect to the performance criterion. Therefore, the training function is time
dependent. The modification of the coefficients is computed continually from the
gradient of the training function. The latter involves the data pertaining to a time
window of finite length, which shifts in time (sliding window) and the coefficients are
updated at each sampling time.

2.9.3 Recursive Versus Nonrecursive Algorithms

A nonrecursive algorithm employs a cost function (i.e. a training function defined on a


fixed window), whereas a recursive algorithm makes use of a training function defined
on a sliding window of data. An adaptive system must be trained by a recursive
algorithm, whereas a non-adaptive system may be trained either by a nonrecursive or
by a recursive algorithm (Nerrand et al. 1994).

2.9.4 Iterative Versus Noniterative Algorithms

An iterative algorithm performs coefficient modifications several times from a set of data pertaining to a given data window, whereas a non-iterative algorithm makes only one such modification (Nerrand et al. 1994). For instance, the conventional LMS algorithm (2.31) is thus a
recursive, non-iterative algorithm operating on a sliding window.

2.9.5 Supervised Versus Unsupervised Algorithms

A supervised learning algorithm performs learning by using a teaching signal, i.e. the
desired output signal, while an unsupervised learning algorithm, as in blind signal
processing, has no reference signal as a teaching input signal. An example of a super-
vised learning algorithm is the delta rule, while unsupervised learning algorithms are,
for example, the reinforcement learning algorithm and the competitive rule (‘winner takes all’) algorithm, whereby there is some sense of concurrency between the elements of the network structure (Bengio 1995; Haykin and Li 1995).

Figure 2.7 A cascaded realisation of a general system

2.9.6 Pattern Versus Batch Learning

Updating the network weights by pattern learning means that the weights of the
network are updated immediately after each pattern is fed in. The other approach is
to take all the data as a whole batch, and the network is not updated until the entire
batch of data is processed. This approach is referred to as batch learning (Haykin and
Li 1995; Qin et al. 1992).
It can be shown (Qin et al. 1992) that while considering feedforward networks
(FFN), after one training sweep through all the data, the pattern learning is a first-
order approximation of the batch learning with respect to the learning rate η. There-
fore, the FFN pattern learning approximately implements the FFN batch learning
after one batch interval. After multiple sweeps through the training data, the dif-
ference between the FFN pattern learning and FFN batch learning is of the order 6 O(η2). Therefore, for small training rates, the FFN pattern learning approximately
implements FFN batch learning after multiple sweeps through the training data. For
recurrent networks, the weight updating slopes for pattern learning and batch learn-
ing are different 7 (Qin et al. 1992). However, the difference could also be controlled
by the learning rate η. The difference will converge to zero as quickly as η goes to
zero 8 (Qin et al. 1992).

2.10 Modularity Within Neural Networks

The hierarchical levels in neural network architectures are synapses, neurons, layers
and neural networks, and will be discussed in Chapter 5. The next step would be
combinations of neural networks. In this case we consider modular neural networks.
Modular neural networks are composed of a set of smaller subnetworks (modules),
each performing a subtask of the complete problem. To depict this problem, let us have recourse to the case of linear adaptive filters described by a transfer function in the
6 In fact, if the data being processed exhibit highly stationary behaviour, then the average error

calculated after FFN batch learning is very close to the instantaneous error calculated after FFN
pattern learning, e.g. the speech data can be considered as being stationary within an observed frame.
That forms the basis for use of various real-time and recursive learning algorithms, e.g. RTRL.
7 It can be shown (Qin et al. 1992) that for feedforward networks, the updated weights for both

pattern learning and batch learning adapt at the same slope (derivative dw/dη) with respect to the
learning rate η. For recurrent networks, this is not the case.
8 In which case we have a very slow learning process.
Another Random Document on
Scribd Without Any Related Topics
to state the reasons which led me to think no fight would take place, for
doing so would have been to betray confidence. And so we parted company
—they to feast their eyes on a bombardment—and if they only are near
enough to see it they will heartily regret their curiosity, or I am mistaken—
and we to return to Mobile.
It was dark before the Diana was well down off Fort Pickens again, and,
as she passed out to sea between it and Fort M’Rae, it was certainly to have
been expected that one side or other would bring her to. Certainly our friend
Mr. Brown in his clipper Oriental would overhaul us outside, and there lay a
friendly bottle in a nest of ice waiting for the gallant sailor who was to take
farewell of us according to promise. Out we glided into night and into the
cool sea breeze, which blew fresh and strong from the north. In the distance
the black form of the Powhatan could be just distinguished; the rest of the
squadron could not be made out by either eye or glass, nor was the schooner
in sight. A lantern was hoisted by my orders, and was kept aft for some time
after the schooner was clear of the forts. Still no schooner. The wind was
not very favorable for running toward the Powhatan, and it was too late to
approach her with perfect confidence from the enemy’s side. Besides, it was
late; time pressed. The Oriental was surely lying off somewhere to the
westward, and the word was given to make sail, and soon the Diana was
bowling along shore, where the sea melted away in a fiery line of foam so
close to us that a man could, in nautical phrase, “shy a biscuit” on the sand.
The wind was abeam, and the Diana seemed to breathe it through her sails,
and flew along at an astonishing rate through the phosphorescent waters
with a prow of flame and a bubbling wake of dancing meteor-like streams
flowing from her helm, as though it were a furnace whence boiled a stream
of liquid metal. “No sign of the Oriental on our lee-bow?” “Nothin’ at all in
sight, sir.” The sharks and huge rays flew off from the shore as we passed
and darted out seaward, marking their runs in brilliant trails of light. On
sped the Diana, but no Oriental came in sight.
I was tired. The sun had been very hot; the ride through the batteries, the
visits to quarters, the excursion to Pickens, had found out my weak places,
and my head was aching and legs fatigued, and so I thought I would turn in
for a short time, and I dived into the shades below, where my comrades
were already sleeping, and kicking off my boots, lapsed into a state which
rendered me indifferent to the attentions no doubt lavished upon me by the
numerous little familiars who recreate in the well-peopled timbers. It never
entered into my head, even in my dreams, that the captain would break the
blockade if he could—particularly as his papers had not been indorsed, and
the penalties would be sharp and sure if he were caught. But the confidence
of coasting captains in the extraordinary capabilities of their craft is a
madness—a hallucination so strong that no danger or risk will prevent their
acting upon it whenever they can. I was assured once by the “captain” of a
Billyboy, that he could run to windward of any frigate in Her Majesty’s
service, and there is not a skipper from Hartlepool to Whitstable who does
not believe his own Mary Ann or Three Grandmothers is, on certain “pints,”
able to bump her fat bows and scuttle-shaped stern faster through the seas
than any clipper which ever flew a pendant. I had been some two hours and
a half asleep, when I was awakened by a whispering in the little cabin.
Charley, the negro cook, ague-stricken with terror, was leaning over the
bed, and in broken French was chattering through his teeth: “Monsieur,
Monsieur, nous sommes perdus! Le batement de guerre nous poursuit. Il n’a
pas encore tiré. Il va tirer bientot! Oh, mon Dieu! mon Dieu!” Through the
hatchway I could see the skipper was at the helm, glancing anxiously from
the compass to the quivering reef-points of his mainsail. “What’s all this we
hear, captain?” “Well, sir, there’s been somethin’ a runnin’ after us these
two hours” (very slowly). “But I don’t think he’ll keech us up no how this
time.” “But, good heavens! you know it may be the Oriental, with Mr.
Brown on board.” “Ah, wall—may bee. But he kept quite close up on me in
the dark—it gave me quite a stark when I seen him. May be, says I, he’s a
privateerin’ chap, and so I draws in on shore close as I cud,—gets mee
centre-board in, and, says I, I’ll see what yer med of, mee boy. He an’t a
gaining much on us.” I looked, and sure enough, about half or three-
quarters of a mile astern, and somewhat to leeward of us, a vessel, with sails
and hull all blended into a black lump, was standing on in pursuit. I strained
my eyes and furbished up the glasses, but could make out nothing definite.
The skipper held grimly on. The shore was so close we could have almost
leaped into the surf, for the Diana, when her centre-board is up, does not
draw much over four feet. “Captain, I think you had better shake your wind,
and see who he is. It may be Mr. Brown.” “Meester Brown or no I can’t
help carrine on now. I’d be on the bank outside in a minit if I didn’t hold my
course.” The captain had his own way; he argued that if it was the Oriental
she would have fired a blank gun long ago to bring us to; and as to not
calling us when the sail was discovered he took up the general line of the
cruelty of disturbing people when they’re asleep. Ah! captain, you knew
well it was Mr. Brown, as you let out when we were off Fort Morgan. By
keeping so close in shore in shoal water the Diana was enabled to creep
along to windward of the stranger, who evidently was deeper than
ourselves. See there! Her sails shiver! so one of the crew says; she’s struck!
But she’s off again, and is after us. We are just within range, and one’s eyes
become quite blinky, watching for the flash from the bow, but, whether
privateer or United States schooner she was too magnanimous to fire. A
stern chase is a long chase. It must now be somewhere about two in the
morning. Nearer and nearer to shore creeps the Diana. “I’ll lead him into a
pretty mess, whoever he is, if he tries to follow me through the Swash,”
grins the skipper. The Swash is a very shallow, narrow, and dangerous
passage into Mobile Bay, between the sand-banks on the east of the main
channel and the shore. The Diana is now only some nine or ten miles from
Fort Morgan, guarding the entrance to Mobile. Soon an uneasy dancing
motion welcomes her approach to the Swash. “Take a cast of the lead,
John!” “Nine feet.” “Good! Again!” “Seven feet.” “Good—Charley, bring
the lantern.” (Oh, Charley, why did that lantern go out just as it was wanted,
and not only expose us to the most remarkable amount of “cussin’,”
imprecation, and strange oaths our ears ever heard, but expose our lives and
your head to more imminent danger?) But so it was, just at the critical
juncture when a turn of the helm port or starboard made the difference,
perhaps, between life and death, light after light went out, and the captain
went dancing mad after intervals of deadly calmness, as the mate sang out,
“Five feet and a half! seven feet—six feet—eight feet—five feet—four feet
and a half—(Oh, Lord!)—six feet,” and so on, through a measurement of
death by inches, not at all agreeable. And where was Mr. Brown all this
time? Really, we were so much interested in the state of the lead-line, and in
the very peculiar behavior of the lanterns which would not burn, that we
scarcely cared much when we heard from the odd hand and Charley that she
had put about, after running aground once or twice, they thought, as soon as
we entered the Swash, and had vanished rapidly in the darkness. It was little
short of a miracle that we got past the elbow, for just at the critical moment,
in a channel not more than a hundred yards broad, with only six feet of
water, the binnacle light, which had burned steadily for a minute, sank with
a sputter into black night. When the passage was accomplished, the captain
relieved his mind by chasing Charley into a corner, and with a shark, which
he held by the tail, as the first weapon that came to hand, inflicting on him
condign punishment, and then returning to the helm. Charley, however,
knew his master, for he slyly seized the shark and flung his defunct corpse
overboard before another fit of passion came on, and by the morning the
skipper was good friends with him, after he had relieved himself, by a series
of castigations of the negligent lamplighter with every variety of
Rhadamanthine implement.
The Diana had thus distinguished her dirty little person by breaking a
blockade, and giving an excellent friend of ours a great deal of trouble (if it
was, indeed Mr. Brown), as well as giving us a very unenviable character
for want of hospitality and courtesy; and, for both, I beg to apologize with
this account of the transaction. But she had a still greater triumph. As she
approached Fort Morgan, all was silence. The morning was just showing a
gray streak in the east. “Why, they’re all asleep at the fort,” observed the
indomitable captain, and, regardless of guns or sentries, down went his
helm, and away the Diana thumped into Mobile Bay, and stole off in the
darkness toward the opposite shore. There was, however, a miserable day
before us. When the light fairly broke we had got only a few miles inside, a
stiff northerly wind blew right in our teeth, and the whole of the blessed day
we spent in tacking backward and forward between one low shore and
another low shore, in water the color of pea-soup, so that temper and
patience were exhausted, and we were reduced to such a state that we took
intense pleasure in meeting with a drowning alligator. He was a nice-
looking young fellow about ten feet long, and had evidently lost his way,
and was going out to sea bodily, but it would have been the height of
cruelty to take him on board our ship miserable as he was, though he passed
within two yards of us. There was to be sure the pleasure of seeing Mobile
in every possible view, far and near, east and west, and in a lump and run
out, but it was not relished any more than our dinner, which consisted of a
very gamy Bologna sausage, pig who had not decided whether he would be
pork or bacon, and onions fried in a terrible preparation of Charley the
cook. At five in the evening, however, having been nearly fourteen hours
beating about twenty-seven miles, we were landed at an outlying wharf, and
I started off for the Battle House and rest. The streets are filled with the
usual rub-a-dub-dubbing bands, and parades of companies of the citizens in
grotesque garments and armament, all looking full of fight and secession. I
write my name in the hotel book at the bar as usual. Instantly young
Vigilance Committee, who has been resting his heels high in air, with one
eye on the staircase and the other on the end of his cigar, stalks forth and
reads my style and title, and I have the satisfaction of slapping the door in
his face as he saunters after me to my room, and looks curiously in to see
how a man takes off his boots. They are all very anxious in the evening to
know what I think about Pickens and Pensacola, and I am pleased to tell the
citizens I think it be a very tough affair on both whenever it comes. I
proceed to New Orleans on Monday.

NEW-ORLEANS, May 25, 1861.


There are doubts arising in my mind respecting the number of armed men
actually in the field in the South, and the amount of arms in the possession
of the Federal forces. The constant advertisements and appeals for “a few
more men to complete” such and such companies furnish some sort of
evidence that men are still wanting. But a painful and startling insight into
the manner in which “volunteers” have been sometimes obtained has been
afforded to me at New Orleans. In no country in the world have outrages on
British subjects been so frequent and so wanton as in the States of America.
They have been frequent, perhaps, because they have generally been
attended with impunity. Englishmen, however, will be still a little surprised
to hear that within a few days British subjects living in New Orleans have
been seized, knocked down, carried off from their labor at the wharf and the
workshop, and forced by violence to serve in the “volunteer” ranks! These
cases are not isolated. They are not in twos and threes, but in tens and
twenties; they have not occurred stealthily or in by-ways; they have taken
place in the open day, and in the streets of New Orleans. These men have
been dragged along like felons, protesting in vain that they were British
subjects. Fortunately, their friends bethought them that there was still a
British consul in the city, who would protect his countrymen—English,
Irish, or Scotch. Mr. Mure, when he heard of the reports and of the
evidence, made energetic representations to the authorities, who, after some
evasion, gave orders that the impressed “volunteers” should be discharged,
and the “Tiger Rifles” and other companies were deprived of the services of
the thirty-five British subjects whom they had taken from their usual
avocations. The mayor promises that it shall not occur again. It is high time
that such acts should be put a stop to, and that the mob of New Orleans
should be taught to pay some regard to the usages of civilized nations.
There are some strange laws here and elsewhere in reference to compulsory
service on the part of foreigners which it would be well to inquire into, and
Lord John Russell may be able to deal with them at a favorable opportunity.
As to any liberty of opinion or real freedom here, the boldest Southerner
would not dare to say a shadow of either exists. It may be as bad in the
North, for all I know; but it must be remembered that in all my
communications I speak of things as they appear to me to be in the place
where I am at the time. The most cruel and atrocious acts are perpetrated by
the rabble who style themselves citizens. The national failing of curiosity
and prying into other people’s affairs is now rampant, and assumes the
name and airs of patriotic vigilance. Every stranger is watched, every word
is noted, espionage commands every keyhole and every letter-box; love of
country takes to eavesdropping, and freedom shaves men’s heads, and packs
men up in boxes for the utterance of “Abolition sentiments.” In this city
there is a terrible substratum of crime and vice, violence, misery, and
murder, over which the wheels of the Cotton King’s chariot rumble
gratingly, and on which rest in dangerous security the feet of his throne.
There are numbers of negroes who are sent out into the streets every day
with orders not to return with less than seventy-five cents—any thing more
they can keep. But if they do not gain that—about 3s. 6d. a day—they are
liable to be punished; they may be put into jail on charges of laziness, and
may be flogged ad libitum, and are sure to be half starved. Can any thing,
then, be more suggestive than this paragraph, which appeared in last night’s
papers? “Only three coroners’ inquests were held yesterday on persons
found drowned in the river, names unknown!” The italics are mine. Over
and over again has the boast been repeated to me, that on the plantations
lock and key are unknown or unused in the planters’ houses. But in the
cities they are much used, though scarcely trusted. It appears, indeed, that
unless a slave has made up his or her mind to incur the dreadful penalties of
flight, there would be no inducement to commit theft, for money or jewels
would be useless; search would be easy, detection nearly certain. That all
the slaves are not indifferent to the issues before them, is certain. At the
house of a planter, the other day, one of them asked my friend, “Will we be
made to work, massa, when ole English come?” An old domestic in the
house of a gentleman in this city said, “There are few whites in this place
who ought not to be killed for their cruelty to us.” Another said, “Oh, just
wait till they attack Pickens!” These little hints are significant enough,
coupled with the notices of runaways, and the lodgments in the police jails,
to show that all is not quiet below the surface. The holders, however, are
firm, and there have been many paragraphs stating that slaves have
contributed to the various funds for state defence, and that they generally
show the very best spirit.
By the proclamation of Governor Magoffin, a copy of which I enclose,
you will see that the governor of the commonwealth of Kentucky and
commander-in-chief of all her military forces on land or water, warns all
states, separately or united, especially the United States and the Confederate
States, that he will fight their troops if they attempt to enter his
commonwealth. Thus Kentucky sets up for herself, while Virginia is on the
eve of destruction, and an actual invasion has taken place on her soil. It is
exceedingly difficult of comprehension that, with the numerous troops,
artillery, and batteries, which the Confederate journals asserted to be in
readiness to repel attack, an invasion which took place in face of the enemy,
and was effected over a broad river, with shores readily defensible, should
have been unresisted. Here it is said there is a mighty plan, in pursuance of
which the United States troops are to be allowed to make their way into
Virginia, that they may at some convenient place be eaten up by their
enemies; and if we hear that the Confederates at Harper’s Ferry retain their
position, one may believe some such plan really exists, although it is rather
doubtful strategy to permit the United States forces to gain possession of
the right bank of the Potomac. Should the position at Harper’s Ferry be
really occupied with a design of using it as a point d’appui for movements
against the North, and any large number of troops be withdrawn from
Annapolis, Washington and Baltimore, so as to leave those places
comparatively undefended, an irruption in force of the Confederates on the
right flank and in rear of General Scott’s army, might cause most serious
inconvenience, and endanger his communications, if not the possession of
the places indicated.
Looking at the map, it is easy to comprehend that a march southward from
Alexandria could be combined with an offensive movement by the forces
said to be concentrated in and around Fortress Monroe, so as to place
Richmond itself in danger, and, if any such measure is contemplated, a
battle must be fought in that vicinity, or the prestige of the South will
receive very great damage. It is impossible for any one to understand the
movement of the troops on both sides. These companies are scattered
broadcast over the enormous expanse of the states, and, where concentrated
in any considerable numbers, seem to have had their position determined
rather by local circumstances than by considerations connected with the
general plan of a large campaign.
In a few days the object of the recent movement will be better understood,
and, it is probable that your correspondent at New York will send, by the
same mail which carries this, exceedingly important information, to which
I, in my present position, can have no access. The influence of the blockade
will be severely felt, combined with the strict interruption of all intercourse
by the Mississippi. Although the South boasts of its resources and of its
amazing richness and abundance of produce, the constant advice in the
journals to increase the breadth of land under corn, and to neglect the cotton
crop in consideration of the paramount importance of the cause, indicates
an apprehension of a scarcity of food if the struggle be prolonged.
Under any circumstances, the patriotic ladies and gentlemen who are so
anxious for the war, must make up their minds to suffer a little in the flesh.
All they can depend on is a supply of home luxuries: Indian corn and wheat,
the flesh of pigs, eked out with a small supply of beef and mutton, will
constitute the staple of their food. Butter there will be none, and wine will
speedily rise to an enormous price. Nor will coffee and tea be had, except at
a rate which will place them out of the reach of the mass of the community.
These are the smallest sacrifices of war. The blockade is not yet enforced
here, and the privateers of the port are extremely active, and have captured
vessels with more energy than wisdom.
The day before yesterday, ships belonging to the United States in that river
were seized by the Confederation authorities, on the ground that war had
broken out, and that the time of grace accorded to the enemy’s traders had
expired. Great was the rush to the consul’s office to transfer the menaced
property from ownership under the stars and stripes to British hands; but
Mr. Mure refused to recognize any transaction of the kind, unless sale bona
fide had been effected before the action of the Confederate marshals.
At Charleston the blockade has been raised, owing, apparently, to some
want of information or of means on the part of the United States
government, and considerable inconvenience may be experienced by them
in consequence. On the 11th, the United States steam-frigate Niagara
appeared outside and warned off several British ships, and on the 13th she
was visited by Mr. Bunch, our consul, who was positively assured by the
officers on board that eight or ten vessels would be down to join in
enforcing the blockade. On the 15th, however, the Niagara departed,
leaving the port open, and several vessels have since run in and obtained
fabulous freights, suggesting to the minds of the owners of the vessels
which were warned off the propriety of making enormous demands for
compensation. The Southerners generally believe not only that their
Confederacy will be acknowledged, but that the blockade will be
disregarded by England. Their affection for her is proportionably
prodigious, and reminds one of the intensity of the gratitude which consists
in lively expectations of favors to come.

NEW ORLEANS, May 21, 1861.


Yesterday morning early I left Mobile in the steamer Florida, which
arrived in the Lake of Pontchartrain, late at night, or early this morning. The
voyage, if it can be called so, would have offered, in less exciting times,
much that was interesting—certainly, to a stranger, a good deal that was
novel—for our course lay inside a chain, almost uninterrupted, of reefs,
covered with sand and pine-trees, exceedingly narrow, so that the surf and
waves of the ocean beyond could be seen rolling in foam through the
foliage of the forest, or on the white beach, while the sea lake on which our
steamer was speeding lay in a broad, smooth sheet, just crisped by the
breeze, between the outward barrier and the wooded shores of the mainland.
Innumerable creeks, or “bayous,” as they are called, pierce the gloom of
these endless pines. Now and then a sail could be made out, stealing
through the mazes of the marshy waters. If the mariner knows his course, he
may find deep water in most of the channels from the outer sea into these
inner waters, on which the people of the South will greatly depend for any
coasting-trade and supplies coastwise they may require, as well as for the
safe retreat of their privateers. A few miles from Mobile, the steamer
turning out of the bay, entered upon the series of these lakes through a
narrow channel called Grant’s Pass, which some enterprising person, not
improbably of Scottish extraction, constructed for his own behoof, by an
ingenious watercut, and for the use of which, and of a little iron lighthouse
that he has built close at hand, on the model of a pepper-castor, he charges
toll on passing vessels. This island is scarcely three feet above the water; it
is not over 20 yards broad and 150 yards long. A number of men were,
however, busily engaged in throwing up the sand, and arms gleamed amid
some tents pitched around the solitary wooden shed in the centre. A
schooner lay at the wharf, laden with two guns and sand-bags, and as we
passed through the narrow channel several men in military uniform, who
were on board, took their places in a boat which pushed off for them, and
were conveyed to their tiny station, of which one shell would make a dust
heap. The Mobilians are fortifying themselves as best they can, and seem,
not unadvisedly, jealous of gun-boats and small war-steamers. On more
than one outlying sand-bank toward New Orleans, are they to be seen at
work on other batteries, and they are busied in repairing, as well as they
can, old Spanish and new United States works which had been abandoned,
or which were never completed. The news has just been reported, indeed,
that the batteries they were preparing on Ship Island have been destroyed
and burnt by a vessel of war of the United States. For the whole day we saw
only a few coasting craft and the return steamers from New Orleans; but in
the evening a large schooner, which sailed like a witch and was crammed
with men, challenged my attention, and on looking at her through the glass I
could make out reasons enough for desiring to avoid her if one was a quiet,
short-handed, well-filled old merchantman. There could be no mistake
about certain black objects on the deck. She lay as low as a yacht, and there
were some fifty or sixty men in the waist and forecastle. On approaching
New Orleans, there are some settlements rather than cities, although they
are called by the latter title, visible on the right hand, embowered in woods
and stretching along the beach. Such are the “Mississippi City,”
Pascagoula, and Pass Christian, &c.—all resorts of the inhabitants of New
Orleans during the summer heats and the epidemics which play such havoc
with life from time to time. Seen from the sea, these huge hamlets look very
picturesque. The detached villas, of every variety of architecture, are
painted brightly, and stand in gardens in the midst of magnolias and
rhododendrons. Very long and slender piers lead far into the sea before the
very door, and at the extremity of each there is a bathing-box for the
inmates. The general effect of one of these settlements, with its light domes
and spires, long lines of whitewashed railings, and houses of every hue set
in the dark green of the pines, is very pretty. The steamer touched at two of
them. There was a motley group of colored people on the jetty, a few
whites, of whom the males were nearly all in uniform; a few bales of goods
were landed or put on board, and that was all one could see of the life of
that place. Our passengers never ceased talking politics all day, except when
they were eating or drinking, for I regret to say they can continue to chew
and to spit while they are engaged in political discussion. Some were rude
provincials in uniform. One was an acquaintance from the far East, who had
been a lieutenant on board of the Minnesota, and had resigned his
commission in order to take service under the Confederate flag. The fiercest
among them all was a thin little lady, who uttered certain energetic
aspirations for the possession of portions of Mr. Lincoln’s person, and who
was kind enough to express intense satisfaction at the intelligence that there
was small-pox among the garrison at Monroe. In the evening a little
difficulty occurred among some of the military gentlemen, during which
one of the logicians drew a revolver, and presented it at the head of the
gentleman who was opposed to his peculiar views, but I am happy to say
that an arrangement, to which I was an unwilling “party,” for the row took
place within a yard of me, was entered into for a fight to come off on shore
in two days after they landed, which led to the postponement of immediate
murder.
The entrance to Pontchartrain lake is infamous for the abundance of its
mosquitos, and it was with no small satisfaction that we experienced a small
tornado, a thunderstorm, and a breeze of wind which saved us from their
fury. It is a dismal canal through a swamp. At daylight, the vessel lay
alongside a wharf surrounded by small boats and bathing stations. A railway
shed receives us on shore, and a train is soon ready to start for the city,
which is six miles distant. For a few hundred yards the line passes between
wooden houses, used as restaurants, or “restaurats,” as they are called
hereaway, kept by people with French names and using the French tongue;
then the rail plunges through a swamp, dense as an Indian jungle, and with
the overflowings of the Mississippi creeping in feeble, shallow currents
over the black mud. Presently the spires of churches are seen rising above
the underwood and rushes. Then we come out on a wide marshy plain, in
which flocks of cattle, up to the belly in mud, are floundering to get at the
rich herbage on the unbroken surface. Next comes a wide-spread suburb of
exceedingly broad lanes, lined with small one-storied houses. The
inhabitants are pale, lean, and sickly; and there is about the men a certain
look, almost peculiar to the fishy-fleshy populations of Levantine towns,
which I cannot describe, but which exists all along the Mediterranean
seaboard, and crops out here again. The drive through badly-paved streets
enables us to see that there is an air of French civilization about New
Orleans. The streets are wisely adapted to the situation; they are not so wide
as to permit the sun to have it all his own way from rising to setting. The
shops are “magasins;” cafés abound. The colored population looks well
dressed, and is going to mass or market in the early morning. The
pavements are crowded with men in uniform, in which the taste of France is
generally followed. The carriage stops at last, and rest comes gratefully
after the stormy night, the mosquitos, “the noise of the captains” (at the
bar), and the shouting.
May 22.—The prevalence of the war spirit here is in every thing
somewhat exaggerated by the fervor of Gallic origin, and the violence of
popular opinion and the tyranny of the mass are as potent as in any place in
the South. The great house of Brown Brothers, of Liverpool and New York,
has closed its business here in consequence of the intimidation of the mob,
or as the phrase is, of the “citizens,” who were “excited” by seeing that the
firm had subscribed to the New York fund, on its sudden resurrection after
Fort Sumter had fallen. Some other houses are about to pursue the same
course; all large business transactions are over for the season, and the
migratory population which comes here to trade, has taken wing much
earlier than usual. But the streets are full of “Turcos,” and “Zouaves,” and
“Chasseurs;” the tailors are busy night and day on uniforms; the walls are
covered with placards for recruits; the seamstresses are sewing flags; the
ladies are carding lint and stitching cartridge-bags. The newspapers are
crowded with advertisements relating to the formation of new companies of
volunteers and the election of officers. There are Pickwick Rifles, Lafayette,
Beauregard, Irish, German, Scotch, Italian, Spanish, Crescent, McMahon—
innumerable—rifle volunteers of all names and nationalities, and the
Meagher Rifles, indignant with “that valiant son of Mars” because he has
drawn his sword for the North, have rebaptized themselves, and are going
to seek glory under a more auspicious nomenclature. About New Orleans, I
shall have more to say when I see more of it. At present it looks very like an
outlying suburb of Chalons when the grand camp is at its highest military
development, although the thermometer is rising gradually, and obliges one
to know occasionally that it can be 95° in the shade already. In the course of
my journeyings southward, I have failed to find much evidence that there is
any apprehension on the part of the planters of a servile insurrection, or that
the slaves are taking much interest in the coming contest, or know what it is
about. But I have my suspicions that all is not right; paragraphs meet the
eye, and odd sentences strike the ear, and little facts here and there come to
the knowledge, which arouse curiosity and doubt. There is one stereotyped
sentence which I am tired of: “Our negroes, sir, are the happiest, the most
contented, and the best off of any people in the world.”
The violence and reiteration of this formula cause one to inquire whether
any thing which demands such insistence is really in the condition
predicated; and for myself I always say: “It may be so, but as yet I do not
see the proof of it. The negroes do not look to be what you say they are.”
For the present that is enough as to one’s own opinions. Externally, the
paragraphs which attract attention, and the acts of the authorities, are
inconsistent with the notion that the negroes are all very good, very happy,
or at all contented, not to speak of their being in the superlative condition of
enjoyment; and as I only see them as yet in the most superficial way, and
under the most favorable circumstances, it may be that when the cotton-
picking season is at its height, and it lasts for several months, when the
labor is continuous from sunrise to sunset, there is less reason to accept the
assertions as so largely and generally true of the vast majority of the slaves.
“There is an excellent gentleman over there,” said a friend to me, “who
gives his overseers a premium of ten dollars on the birth of every child on
his plantation.” “Why so?” “Oh, in order that the overseers may not work
the women in the family-way overmuch.” There is little use in this part of
the world in making use of inferences. But where overseers do not get the
premium, it may be supposed they do work the pregnant women too much.
Here are two paragraphs which do not look very well as they stand.
Those negroes who were taken with a sudden leaving on Sunday night last, will save the
country the expenses of their burial if they keep dark from these parts. They and other of
the “breden” will not be permitted to express themselves quite so freely in regard to their
braggadocio designs upon virtue, in the absence of volunteers.—Wilmington (Clinton
County, Ohio) Watchman (Republican).
Served Him Right. One day last week, some colored individual, living near South
Plymouth, made a threat that, in case a civil war should occur, “he would be one to ravish
the wife of every democrat, and to help murder their offspring, and wash his hands in their
blood.” For this diabolical assertion he was hauled up before a committee of white citizens,
who adjudged him forty stripes on his naked back. He was accordingly stripped, and the
lashes were laid on with such a good will that blood flowed at the end of the castigation.—
Washington (Fayette County, Ohio) Register (Neutral).

It is reported that the patrols are strengthened, and I could not help hearing
a charming young lady say to another, the other evening, that “she would
not be afraid to go back to the plantation, though Mrs. Brown Jones said she
was afraid her negroes were after mischief.”
There is a great scarcity of powder, which is one of the reasons, perhaps,
why it has not yet been expended as largely as might be expected from the
tone and temper on both sides. There is no sulphur in the States; nitre and
charcoal abound. The sea is open to the North. There is no great overplus of
money on either side. In Missouri, the interest on the state debt, due in July,
will be used to procure arms for the state volunteers to carry on the war. The
South is preparing for the struggle by sowing a most unusual quantity of
grain; and in many fields corn and maize have been planted instead of
cotton. “Stay laws,” by which all inconveniences arising from the usual
dull, old-fashioned relations between debtor and creditor are avoided (at
least by the debtor), have been adopted in most of the seceding states. How
is it that the state legislatures seem to be in the hands of the debtors and not
of the creditors?
There are some who cling to the idea that there will be no war after all, but
no one believes that the South will ever go back of its own free will, and the
only reason that can be given by those who hope rather than think in that
way is to be found in the faith that the North will accept some mediation,
and will let the South go in peace. But could there—can there be peace?
The frontier question—the adjustment of various claims—the demands for
indemnity, or for privileges or exemptions, in the present state of feeling,
can have but one result. The task of mediation is sure to be as thankless as
abortive. Assuredly the proffered service of England would, on one side at
least, be received with something like insult. Nothing but adversity can
teach these people its own most useful lessons. Material prosperity has
puffed up the citizens to an unwholesome state. The toils and sacrifices of
the old world have been taken by them as their birthright, and they have
accepted the fruits of all that the science, genius, suffering, and trials of
mankind in time past have wrought out, perfected, and won as their own
peculiar inheritance, while they have ignorantly rejected the advice and
scorned the lessons with which these were accompanied.
May 23.—The Congress at Montgomery, having sat with closed doors
almost since it met, has now adjourned till July the 20th, when it will
reassemble at Richmond, in Virginia, which is thus designated, for the time,
capital of the Confederate States of America. Richmond, the principal city
of the Old Dominion, is about one hundred miles in a straight line south by
west of Washington. The rival capitals will thus be in very close proximity
by rail and by steam, by land and by water. The movement is significant. It
will tend to hasten a collision between the forces which are collected on the
opposite sides of the Potomac. Hitherto, Mr. Jefferson Davis has not
evinced all the sagacity and energy, in a military sense, which he is said to
possess. It was bad strategy to menace Washington before he could act. His
secretary of war, Mr. Walker, many weeks ago, in a public speech,
announced the intention of marching upon the capital. If it was meant to do
so, the blow should have been struck silently. If it was not intended to seize
upon Washington, the threat had a very disastrous effect on the South, as it
excited the North to immediate action, and caused General Scott to
concentrate his troops on points which present many advantages in the face
of any operations which may be considered necessary along the lines either
of defence or attack. The movement against the Norfolk navy-yard
strengthened Fortress Monroe, and the Potomac and Chesapeake were
secured to the United States. The fortified ports held by the Virginians and
the Confederate States troops, are not of much value as long as the streams
are commanded by the enemy’s steamers; and General Scott has shown that
he has not outlived either his reputation or his vigor by the steps, at once
wise and rapid, he has taken to curb the malcontents in Maryland, and to
open his communications through the city of Baltimore. Although immense
levies of men may be got together, on both sides, for purposes of local
defence or for state operations, it seems to me that it will be very difficult to
move these masses in regular armies. The men are not disposed for regular,
lengthened service, and there is an utter want of field trains, equipment, and
commissariat, which cannot be made good in a day, a week, or a month.
The bill passed by the Montgomery Congress, entitled “An act to raise an
additional military force to serve during the war,” is, in fact, a measure to
put into the hands of the government the control of irregular bodies of men,
and to bind them to regular military service. With all their zeal, the people
of the South will not enlist. They detest the recruiting sergeant, and Mr.
Davis knows enough of war to feel hesitation in trusting himself in the field
to volunteers. The bill authorizes Mr. Davis to accept volunteers who may
offer their services, without regard to the place of enlistment, “to serve
during the war, unless sooner discharged.” They may be accepted in
companies, but Mr. Davis is to organize them into squadrons, battalions, or
regiments, and the appointment of field and staff officers is reserved
especially to him. The company officers are to be elected by the men of the
company, but here again Mr. Davis reserves to himself the right of veto, and
will only commission those officers whose election he approves.
The absence of cavalry and the deficiency of artillery may prevent either
side obtaining any decisive results in one engagement; but, no doubt, there
will be great loss whenever these large masses of men are fairly opposed to
each other in the field. Of the character of the Northern regiments I can say
nothing more from actual observation; nor have I yet seen, in any place,
such a considerable number of the troops of the Confederate States, moving
together, as would justify me in expressing any opinion with regard to their
capacity for organized movements, such as regular troops in Europe are
expected to perform. An intelligent and trustworthy observer, taking one of
the New York state militia regiments as a fair specimen of the battalions
which will fight for the United States, gives an account of them which leads
me to the conclusion that such regiments are much superior, when furnished
by the country districts, to those raised in the towns and cities. It appears, in
this case at least, that the members of the regular militia companies in
general send substitutes to the ranks. Ten of these companies form the
regiment, and, in nearly every instance, they have been doubled in strength
by volunteers. Their drill is exceedingly incomplete, and in forming the
companies there is a tendency for the different nationalities to keep
themselves together. In the regiment in question the rank and file often
consists of quarrymen, mechanics, and canal boatmen, mountaineers from
the Catskill, bark peelers, and timber cutters—ungainly, square-built,
powerful fellows, with a Dutch tenacity of purpose crossed with an English
indifference to danger. There is no drunkenness and no desertion among
them. The officers are almost as ignorant of military training as their men.
The colonel, for instance, is the son of a rich man in his district, well
educated, and a man of travel. Another officer is a shipmaster. A third is an
artist; others are merchants and lawyers, and they are all busy studying
“Hardee’s Tactics,” the best book for infantry drill in the United States. The
men have come out to fight for what they consider the cause of the country,
and are said to have no particular hatred of the South, or of its inhabitants,
though they think they are “a darned deal too high and mighty, and require
to be wiped down considerably.” They have no notion as to the length of
time for which their services will be required, and I am assured that not one
of them has asked what his pay is to be.
Reverting to Montgomery, one may say without offence that its claims to
be the capital of a republic which asserts that it is the richest, and believes
that it will be the strongest in the world, are not by any means evident to a
stranger. Its central position, which has reference rather to a map than to the
hard face of matter, procured for it a distinction to which it had no other
claim. The accommodations which suited the modest wants of a state
legislature vanished or were transmuted into barbarous inconveniences by
the pressure of a central government, with its offices, its departments, and
the vast crowd of applicants which flocked thither to pick up such crumbs
of comfort as could be spared from the executive table. Never shall I forget
the dismay of myself, and of the friends who were travelling with me, on
our arrival at the Exchange Hotel, under circumstances with some of which
you are already acquainted. With us were men of high position, members of
Congress, senators, ex-governors, and General Beauregard himself. But to
no one was greater accommodation extended than could be furnished by a
room held, under a sort of ryot-warree tenure, in common with a
community of strangers. My room was shown to me. It contained four large
four-post beds, a ricketty table, and some chairs of infirm purpose and
fundamental unsoundness. The floor was carpetless, covered with litter of
paper and ends of cigars, and stained with tobacco juice. The broken glass
of the window afforded no ungrateful means of ventilation. One gentleman
sat in his shirt sleeves at the table reading the account of the marshalling of
the Highlanders at Edinburgh in the Abbotsford edition of Sir Walter Scott;
another, who had been wearied, apparently, by writing numerous
applications to the government for some military post, of which rough
copies lay scattered around, came in, after refreshing himself at the bar, and
occupied one of the beds, which by the bye, were ominously provided with
two pillows apiece. Supper there was none for us in the house, but a search
in an outlying street enabled us to discover a restaurant, where roasted
squirrels and baked opossums figured as luxuries in the bill of fare. On our
return we found that due preparation had been made in the apartment by the
addition of three mattresses on the floor. The beds were occupied by
unknown statesmen and warriors, and we all slumbered and snored in
friendly concert till morning. Gentlemen in the South complain that
strangers judge of them by their hotels, but it is a very natural standard for
strangers to adopt, and in respect to Montgomery it is almost the only one
that a gentleman can conveniently use, for if the inhabitants of this city and
its vicinity are not maligned, there is an absence of the hospitable spirit
which the South lays claim to as one of its animating principles, and a little
bird whispered to me that from Mr. Jefferson Davis down to the least
distinguished member of his government there was reason to observe that
the usual attentions and civilities offered by residents to illustrious
stragglers had been “conspicuous for their absence.” The fact is, that the
small planters who constitute the majority of the land-owners are not in a
position to act the Amphytrion, and that the inhabitants of the district can
scarcely aspire to be considered what we would call gentry in England, but
are a frugal, simple, hog-and-hominy living people, fond of hard work and,
occasionally, of hard drinking.

NEW ORLEANS, May 24, 1861.


It is impossible to resist the conviction that the Southern Confederacy can
only be conquered by means as irresistible as those by which Poland was
subjugated. The South will fall, if at all, as a nation prostrate at the feet of a
victorious enemy. There is no doubt of the unanimity of the people. If
words mean any thing, they are animated by only one sentiment, and they
will resist the North as long as they can command a man or a dollar. There
is nothing of a sectional character in this disposition of the South. In every
state there is only one voice audible. Hereafter, indeed, state jealousies may
work their own way. Whatever may be the result, unless the men are the
merest braggarts—and they do not look like it—they will fight to the last
before they give in, and their confidence in their resources is only equalled
by their determination to test them to the utmost. There is a noisy
vociferation about their declarations of implicit trust and reliance on their
slaves which makes one think “they do protest too much,” and it remains to
be seen whether the slaves really will remain faithful to their masters should
the abolition army ever come among them as an armed propaganda. One
thing is obvious here. A large number of men who might be usefully
employed in the ranks are idling about the streets. The military enthusiasm
is in proportion to the property interest of the various classes of the people,
and the very boast that so many rich men are serving in the ranks is a
significant proof, either of the want of a substratum, or of the absence of
great devotion to the cause, of any such layer of white people as may
underlie the great slave-holding, mercantile, and planting oligarchy. The
whole state of Louisiana contains about 50,000 men liable to serve when
called on. Of that number only 15,000 are enrolled and under arms in any
shape whatever, and if one is to judge of the state of affairs by the
advertisements which appear from the adjutant-general’s office, there was
some difficulty in procuring the 3,000 men—merely 3,000 volunteers—“to
serve during the war,” who are required by the Confederate government.
There is “plenty of prave ’ords,” and if fierce writing and talking could do
the work, the armies on both sides would have been killed and eaten long
ago. It is found out that “lives of the citizens” at Pensacola are too valuable
to be destroyed in attacking Pickens. A storm that shall drive away the
ships, a plague, yellow fever, mosquitos, rattlesnakes, small-pox—any of
these agencies, is looked to with confidence to do the work of shot, shell,
and bayonet. Our American “brethren in arms” have yet to learn that great
law in military cookery, that “if they want to make omelets they must break
eggs.” The “moral suasion” of the lasso, of head-shaving, ducking, kicking,
and such processes, is, I suspect, used not unfrequently to stimulate
volunteers; and the extent to which the acts of the recruiting officer are
somewhat aided by the arm of the law, and the force of the policeman and
the magistrate, may be seen from paragraphs in the morning papers now
and then, to the effect that certain gentlemen of Milesian extraction, who
might have been engaged in pugilistic pursuits, were discharged from
custody unpunished on condition that they enlisted for the war. With the
peculiar views entertained of freedom of opinion and action by large classes
of people on this continent, such a mode of obtaining volunteers is very
natural, but resort to it evinces a want of zeal on the part of some of the
50,000 who are on the rolls; and, from all I can hear—and I have asked
numerous persons likely to be acquainted with the subject—there are not
more than those 15,000 men of whom I have spoken in all the state under
arms, or in training, of whom a considerable proportion will be needed for
garrison and coast defence duties. It may be that the Northern states and
Northern sentiments are as violent as those of the South, but I see some
evidences to the contrary. For instance, in New York ladies and gentlemen
from the South are permitted to live at their favorite hotel without
molestation, and one hotel keeper at Saratoga Springs advertises openly for
the custom of his Southern patrons. In no city of the South which I have
visited would a party of Northern people be permitted to remain for an hour
if the “citizens” were aware of their presence. It is laughable to hear men
speaking of the “unanimity” of the South. Just look at the peculiar means by
which unanimity is enforced and secured! This is an extract from a New
Orleans paper:
CHARGES OF ABOLITIONISM.—Mayor Monroe has disposed of some of the cases brought
before him on charges of this kind by sending the accused to the workhouse.
A Mexican named Bernard Cruz, born in Tampico, and living here with an Irish wife,
was brought before the Mayor this morning charged with uttering Abolition sentiments.
After a full investigation, it was found from the utterance of his incendiary language, that
Cruz’s education was not yet perfect in Southern classics, and his Honor therefore directed
that he be sent for six months to the Humane Institution for the Amelioration of the
Condition of Northern Barbarians and Abolition Fanatics, presided over by Professor
Henry Mitchell, keeper of the workhouse, who will put him through a course of study on
Southern ethics and institutions.
The testimony before him Saturday, however, in the case of a man named David
O’Keefe, was such as to induce him to commit the accused for trial before the Criminal
Court. One of the witnesses testified positively that he heard him make his children shout
for Lincoln; another, that the accused said, “I am an abolitionist,” &c. The witnesses, the
neighbors of the accused, gave their evidence reluctantly, saying that they had warned him
of the folly and danger of his conduct. O’Keefe says he has been a United States soldier,
and came here from St. Louis and Kansas.
John White was arraigned before Recorder Emerson on Saturday for uttering incendiary
language while traveling in the baggage car of a train of the New Orleans, Ohio, and Great
Western Railroad, intimating that the decapitator of Jefferson Davis would get $10,000 for
his trouble, and the last man of us would be whipped like dogs by the Lincolnites. He was
held under bonds of $500 to answer the charge on the 8th of June.
Nicholas Gento, charged with declaring himself an Abolitionist, and acting very much
like he was one, by harboring a runaway slave, was sent to prison in default of bail, to
await examination before the recorder.

Such is “freedom of speech” in Louisiana! But in Texas the machinery for
the production of “unanimity” is less complicated, and there are no
insulting legal formalities connected with the working of the simple
appliances which a primitive agricultural people have devised for their own
purposes. Hear the Texan correspondent of one of the journals of this city
on the subject. He says:
It is to us astonishing, that such unmitigated lies as those Northern papers disseminate of
anarchy and disorder here in Texas, dissension among ourselves, and especially from our
German, &c., population, with dangers and anxieties from the fear of insurrection among
the negroes, &c., should be deemed anywhere South worthy of a moment’s thought. It is
surely notorious enough that in no part of the South are Abolitionists, or other disturbers of
the public peace, so very unsafe as in Texas. The lasso is so very convenient!

Here is an excellent method of preventing dissension described by a stroke
of the pen; and, as such an ingenious people are not likely to lose sight of
the uses of a revolution in developing peculiar principles to their own
advantage, repudiation of debts to the North has been proclaimed and acted
on. One gentleman has found it convenient to inform Major Anderson that
he does not intend to meet certain bills which he had given the major for
some slaves. Another declares he won’t pay any one at all, as he has
discovered it is immoral and contrary to the law of nations to do so. A third
feels himself bound to obey the commands of the governor of his state, who
has ordered that debts due to the North shall not be liquidated. As a naïve
specimen of the way in which the whole case is treated, take this article and
the correspondence of “one of the most prominent mercantile houses in
New Orleans:”
SOUTHERN DEBTS TO THE NORTH.
The Cincinnati Gazette copies the following paragraph from The New York Evening Post:
“BAD FAITH.—The bad faith of the Southern merchants in their transactions with their
Northern correspondents is becoming more evident daily. We have heard of several recent
cases where parties in this city, retired from active business, have, nevertheless, stepped