
Theory and Applications of Time Series Analysis: Selected Contributions from ITISE 2018

The document discusses the book 'Theory and Applications of Time Series Analysis', which compiles selected contributions from the ITISE 2018 conference, focusing on advancements in time series analysis and forecasting. It covers theoretical and practical aspects, including statistical methods, econometric models, and applications across various fields such as finance, health, and climate. The book aims to provide a comprehensive overview of recent developments and methodologies in time series research, emphasizing the importance of interdisciplinary collaboration.


Contributions to Statistics

Olga Valenzuela
Fernando Rojas
Héctor Pomares
Ignacio Rojas
Editors

Theory and Applications of Time Series Analysis
Selected Contributions from ITISE 2018
Contributions to Statistics
The series Contributions to Statistics contains publications in theoretical and
applied statistics, including for example applications in medical statistics,
biometrics, econometrics and computational statistics. These publications are
primarily monographs and multiple author works containing new research results,
but conference and congress reports are also considered.
Apart from the contribution to scientific progress presented, it is a notable
characteristic of the series that publishing time is very short, permitting authors and
editors to present their results without delay.

More information about this series at http://www.springer.com/series/2912


Olga Valenzuela · Fernando Rojas · Héctor Pomares · Ignacio Rojas

Editors

Theory and Applications of Time Series Analysis
Selected Contributions from ITISE 2018
Editors

Olga Valenzuela
Faculty of Sciences
University of Granada
Granada, Spain

Fernando Rojas
ETSIIT, CITIC-UGR
University of Granada
Granada, Spain

Héctor Pomares
ETSIIT, CITIC-UGR
University of Granada
Granada, Spain

Ignacio Rojas
ETSIIT, CITIC-UGR
University of Granada
Granada, Spain

ISSN 1431-1968
Contributions to Statistics
ISBN 978-3-030-26035-4 ISBN 978-3-030-26036-1 (eBook)
https://doi.org/10.1007/978-3-030-26036-1
© Springer Nature Switzerland AG 2019
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

The word forecasting is commonly associated by many people with some enigmatic
concepts such as astrology (not to be confounded with astronomy), crystal balls, or
tarot cards. There is, however, a different way to make forecasts based on scientific
analyses from past and present information. This information can be numerical or
categorical. It can even be expressed in linguistic terms by experts in the field. In
this book, we try to provide some very recent contributions toward this scientific
way to make forecasts and which have in common that this past and present
information is arranged as a set of measurements collected in different instants of
time (normally at fixed time intervals). These contributions to what we call in this
book “Time Series Analysis and Forecasting” have been classified into different
parts according to their content. The first three parts of the book contain more
theoretical contributions, some related to pure statistical methods, some that also
make use of state-of-the-art computational intelligence methodologies and finally
some more related to econometrics. On the other hand, in the last parts, we provide
more practical contributions with the intention of showing the readers that this
field, although supported by a very sophisticated and powerful theory, has practical
application as its final aim. There is practically no discipline in this world which
cannot benefit from contributions in the “Time Series Analysis and Forecasting” field.
The origin of this book stems from the International Conference on Time Series
and Forecasting, ITISE 2018, held in Granada (Spain) in September 2018. Our aim
with the organization of ITISE 2018 was to create a friendly discussion forum for
scientists, engineers, educators, and students about the latest ideas and realizations
in the foundations, theory, models, and applications for interdisciplinary and
multidisciplinary research encompassing disciplines of statistics, mathematical
models, econometrics, engineering, and computer science in the field of time series
analysis and forecasting.
The list of topics in the successive Call for Papers has also evolved, resulting in
the following list for the last edition:


1. Time Series Analysis and Forecasting


• Nonparametric and functional methods.
• Vector processes.
• Probabilistic approach to modeling macroeconomic uncertainties.
• Uncertainties in forecasting processes.
• Nonstationarity.
• Forecasting with many models. Model integration.
• Forecasting theory and adjustment.
• Ensemble forecasting.
• Forecasting performance evaluation.
• Interval forecasting.
• Data preprocessing methods: data decomposition, seasonal adjustment, sin-
gular spectrum analysis, and detrending methods.
2. Econometrics and Forecasting
• Econometric models.
• Economic and econometric forecasting.
• Real macroeconomic monitoring and forecasting.
• Advanced econometric methods.
3. Advanced Methods and Online Learning in Time Series
• Adaptivity for stochastic models.
• Online machine learning for forecasting.
• Aggregation of predictors.
• Hierarchical forecasting.
• Forecasting with computational intelligence.
• Time series analysis with computational intelligence.
• Integration of system dynamics and forecasting models.
4. High Dimension and Complex/Big Data
• Local versus global forecast.
• Techniques for dimension reduction.
• Multiscaling.
• Forecasting from complex/big data.
5. Forecasting in Real Problems
• Health forecasting.
• Atmospheric science forecasting.
• Telecommunication forecasting.
• Hydrological forecasting.
• Traffic forecasting.
• Tourism forecasting.
• Marketing forecasting.
• Modeling and forecasting in power markets.
• Energy forecasting.
• Climate forecasting.
• Financial forecasting and risk analysis.
• Forecasting electricity load and prices.
• Forecasting and planning systems.
• Applications in other disciplines.
The authors of high-quality candidate papers from the ITISE 2018 conference (26
contributions) were invited to submit an extended version of their conference paper
to be considered for this special publication in the Springer book series Contributions
to Statistics. For the selection procedure, the information/evaluation of the chairman
of every session, in conjunction with the review comments and the summary of
reviews, was taken into account.
So, now we are pleased to have reached the end of the whole process and present
the readers with these final contributions that we hope will provide a clear overview
of the thematic areas covered by the ITISE 2018 conference, ranging from theo-
retical aspects to real-world applications of Time Series Analysis and Forecasting.
It is important to note that for the sake of consistency and readability of the
book, the presented papers have been classified into the following parts:
• Part: Advanced Statistical Methods for Time Series Analysis and
Forecasting
The main objective of this chapter is to present advanced statistical method-
ologies and theories that could be used with time series. It also aims at bringing
to the fore recent and emerging developments in computational mathematics
that could be used in the field of time series. In particular, six contributions have
been selected for this chapter. The first contribution provides us with a
methodology to identify nonstationary autoregressive processes with
time-varying orders and time-varying degrees of nonstationarity, including its
extension to multivariate autoregressive processes. The second contribution
deals with how to take advantage of the information provided by different
estimators for a given big data problem with inhomogeneities, i.e., data is
neither i.i.d. (exhibiting outliers or not belonging to same distribution) nor
stationary (time-varying effects may be present). Not surprisingly, the three most
accurate forecasting methods of the recently celebrated M4 competition are
precisely hybrid approaches formed by a combination of different estimators,
thus showing that research in this direction should be encouraged. The next
contribution presents a new extension for the general case of linear and non-
linear data of the Granger causality technique to detect causal relationship
between time series based on local approximations of the time delay embedding
reconstruction of the time series’ state space by a linear regression model. The
next two contributions try to shed some light into how complex systems work.
For this purpose, in the first one, the authors have developed a GUI-based
computing environment which allows for building forecasts based on System
Dynamics model, which is the part of the Systems Theory devoted to
understanding the dynamic behavior of complex systems. In the second one, the
authors make use of order patterns recurrence plots to visually tell apart chaotic
systems from other non-chaotic ones. Finally, the last contribution of this part
presents a new freely available Matlab toolbox, called SSpace, that implements
linear, nonlinear, and non-Gaussian State-Space systems. The contribution
demonstrates the toolbox’s potential with several examples.
• Part: Advanced Computational Intelligence Methods for Time Series
Analysis and Forecasting
Although time series analysis can be considered a discipline originated within
the statistical area, in the last decades many computational intelligence methods
or machine learning approaches have been proposed to solve time series-related
problems. In fact, new and further computational intelligence approaches, their
efficiency, and their comparison to statistical methods and other fact-checked
computational intelligence methods are significant topics in academic and
professional projects. Time series forecasting competitions which try to elucidate
which of the two main research streams is better are not uncommon. For instance,
the above-mentioned M4 Competition for the first time made explicit mention of
machine learning forecasting methods. Within this
topic, five contributions have been selected for this book. In line with the comment
made in the previous paragraph, the first of the contributions also deals with an
ensemble of estimators, but this time it is an ensemble of machine
learning models (deep neural networks). In this case, the authors extend the
concept of snapshot ensembles to the field of time series forecasting. The idea of
this concept is to combine the models obtained through the different local
minima that the optimization algorithm finds in its search for the global one. The
next very interesting contribution is about detecting areas in a time series,
stationary or nonstationary, where it can be asserted that the data belong to a
different distribution than before. To solve this problem, the authors propose a
method called Wavelet-Based Least Squares Density-Difference which is based
on a least squares method applied to the distance between two wavelet expanded
densities extracted from the time series. The third contribution of this second
part of the book presents the very computationally efficient virtual leave-one-out
methodology aimed at selecting the best neural network structure for time series
prediction, and shows how to apply this method in the practical case of time
series data extracted from crime-related police reports. Finally, the last contri-
bution is related to how to implement existing algorithms in fast computing
platforms such as Field Programmable Gate Arrays (FPGAs). In this case, the
authors deal with the hardware implementation of Echo State Networks, which
are a special case of Recurrent Neural Networks in which the synaptic weights
between neurons are kept fixed and only the connections from the network to a
measurement output layer are modified by learning.

• Part: Econometric Models, Financial Forecasting, and Risk Analysis


One of the most prominent applications of time series modeling and forecasting
lies within the field of Econometrics. This chapter aims at presenting some
recent developments of time series research applied to financial and futures data
with the original idea of focusing on studies that develop and apply recent
nonlinear econometric models to reproduce financial market dynamics and to
capture financial data properties with the hope of eventually predicting the next
economic bubble. Five contributions have been selected to that end. The first
one introduces a new class of long-memory model for the estimation of
volatility of stock returns, which takes into account long-memory
heteroskedasticity of the financial time series. The second contribution shows
that under appropriate assumptions on the fractional integration orders the
transfer function corresponding to a Vector Autoregressive Fractionally
Integrated Moving Average (VARFIMA) or a Fractionally Integrated Vector
Autoregressive Moving Average (FIVARMA) process can be estimated con-
sistently using Canonical Variate Analysis. The third contribution studies how
deep long short-term memory neural networks can be applied to robustly
forecast interest rates of different maturities and tenors. Since deep networks
need a lot of data to learn from, the authors solve this problem by generating
data based on fitted time series models. They complete their presentation by
applying support vector machines to predict trends in the term structures. The
next contribution studies the default intensities estimated from credit default
swap spreads by the dynamic Nelson–Siegel model with a time-varying decay
parameter. They show that for the German and U.S. credit default swap markets
the decay parameters change over time and the magnitude of the decay
parameter is positively related to the level of default intensities. The last con-
tribution of this part of the book deals with the problem of how to obtain an
accurate measure of the globalization process the world is currently undergoing.
To that end, the authors use the concept of permutation
entropy which essentially measures the entropy of a set of time series based on
the analysis of their permutation patterns. The main difference between this
concept and the Shannon entropy is that the former is a symbolic entropy
focused on patterns rather than on a probability distribution function, which
makes it useful in an analysis of short time series.
The next three parts of the book are dedicated to specific applications of time
series analysis. The contributions provided can be classified into the following
main parts.
• Part: Time Series Analysis in Earth Sciences
This part places particular emphasis on the application of time series analysis
to earth-sciences-related data. For example, the first contribution ana-
lyzes high-resolution time series from a long-term monitoring campaign in the
Guadalquivir River Estuary in the south of Spain in order to predict how
harmful floods can be under certain circumstances. The second contribution uses
the smoothed Lomb–Scargle periodogram to study the precipitation and
temperature data recorded during the last decades from 707 meteorological
stations in Andalusia. They eventually obtain a very interesting picture of the
spatial distribution of the climatic cycles from where many conclusions can be
extracted. The third contribution is about online forecasting of ambient tem-
perature and solar irradiation. The method is based on an adaptive ARX model
which can be tuned to changing weather conditions without relying on external
inputs and it obtains an outstanding performance improvement with respect to
the weather forecasting services’ prediction. Finally, the last paper analyzes data
taken from the storm that occurred at the Spanish coast of the Mediterranean Sea
at the end of January 2017 and which produced severe coastal floods. The
authors manage to accurately model and predict the space-time evolution of sea
wave heights during that event using a combination of spatiotemporal random
field theory and the Bayesian maximum entropy method.
• Part: Energy Time Series Forecasting
This part places particular emphasis on the application of time series analysis,
modeling, and forecasting to energy-related data. By energy, we refer to
any kind of energy, such as electrical, solar, microwave, wind, and so on. The
first contribution presents several adaptive methods for forecasting solar heat
production and heat demand of consumers based on weather forecasts. Apart
from the good results obtained, these methods are explicitly developed so that
they are easy to implement in simple computers such as programmable logic
controllers commonly used in the industry. The second one deals with making
short-term forecasts (48-h horizon) of wind power production so as to know
how the pricing rates of wind-generated electricity are going to evolve. To that
end, the authors propose several direct and indirect methods based on different
machine learning algorithms with very promising results.
• Part: Time Series Analysis and Prediction in Other Real Problems
This last part is dedicated to other real applications of time series analysis,
modeling, and forecasting different from those especially mentioned before. The
idea is to state explicitly that applications of time series analysis reach practi-
cally any scientific discipline imaginable. Four very different contributions were
selected for this last part. The first one uses queuing theory to model the custom
inspection process in the Helsinki–Vantaa Airport so as to predict its capacity
and assess whether it can deal with the estimated increase in the number of
passengers in the following years. The second one studies the use of different
models such as ARIMA, Prophet (launched by Facebook in 2017), Multilayer
Perceptrons, and Long Short-Term Memory Neural Networks to predict Internet
data consumption and mobile phone card recharges for different prediction
horizons. The results are worth reading. The next contribution studies how
different pattern similarity-based forecasting methods perform to estimate up to
1 week ahead, the future demand of a thermal unit in a power plant. The
following and last contribution of this book takes advantage of the fact that the
ECG signal is somewhat modulated by the respiration of a person. So, with the
use of an Empirical Mode Decomposition approach to R-peak detection and an
Independent Component Analysis to separate out the respiration signal in the
frequency domain, the authors manage to robustly estimate the breathing rate of
that person.
Last but not least, we would like to point out that this edition of ITISE was
organized by the University of Granada together with the Spanish Chapter of the
IEEE Computational Intelligence Society. The Guest Editors would also like to
express their gratitude to all the people who supported them in the compilation of
this book, and especially to the contributing authors for their submissions, the
chairmen of the different sessions, and to the anonymous reviewers for their
comments and useful suggestions in order to improve the quality of the papers.
We wish to thank our main sponsors as well: the Department of Computer
Architecture and Computer Technology, the Faculty of Science of the University of
Granada, the Research Centre for Information and Communications Technologies
(CITIC-UGR), and the Ministry of Science and Innovation for their support and
grants. Finally, we wish also to thank Prof. Alfred Hofmann, Vice President
Publishing—Computer Science, Springer-Verlag and Dr. Veronika Rosteck,
Springer, Editor, for their interest in editing a book series of Springer based on the
best papers of ITISE 2018.
We hope the readers of this book can make the most of these selected
contributions.

Granada, Spain
January 2019

Olga Valenzuela
Fernando Rojas
Héctor Pomares
Ignacio Rojas
Contents

Advanced Statistical Methods for Time Series Analysis and Forecasting
Identification of Nonstationary Processes Using Noncausal
Bidirectional Lattice Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Maciej Niedźwiecki and Damian Chojnacki
Normalized Entropy Aggregation for Inhomogeneous
Large-Scale Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Maria da Conceição Costa and Pedro Macedo
Modified Granger Causality in Selected Neighborhoods . . . . . . . . . . . . . 31
Martina Chvosteková
Computing Environment for Forecasting Based on System
Dynamics Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Radosław Pytlak, Damian Suski, Tomasz Tarnawski,
Zbigniew Wawrzyniak, Tomasz Zawadzki and Paweł Cichosz
Novel Order Patterns Recurrence Plot-Based Quantification Measures
to Unveil Deterministic Dynamics from Stochastic Processes . . . . . . . . . 57
Shuixiu Lu, Sebastian Oberst, Guoqiang Zhang and Zongwei Luo
Time Series Modeling with MATLAB: The SSpace Toolbox . . . . . . . . . 71
Diego J. Pedregal, Marco A. Villegas, Diego A. Villegas
and Juan R. Trapero

Advanced Computational Intelligence Methods for Time Series Analysis and Forecasting
Stacked LSTM Snapshot Ensembles for Time Series Forecasting . . . . . 87
Sascha Krstanovic and Heiko Paulheim


Change Detection for Streaming Data Using Wavelet-Based Least Squares Density–Difference . . . . . . . . . . . . . . . 99
Nenad Mijatovic, Rana Haber, Mark Moyou, Anthony O. Smith
and Adrian M. Peter
Selection of Neural Network for Crime Time Series Prediction
by Virtual Leave-One-Out Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Stanisław Jankowski, Zbigniew Szymański, Zbigniew Wawrzyniak,
Paweł Cichosz, Eliza Szczechla and Radosław Pytlak
FPGA-Based Echo-State Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Erik S. Skibinsky-Gitlin, Miquel L. Alomar, Vincent Canals,
Christiam F. Frasser, Eugeni Isern, Fabio Galán-Prado, Alejandro Morán,
Miquel Roca and Josep L. Rosselló

Econometric Models, Financial Forecasting and Risk Analysis


Conditional Heteroskedasticity in Long-Memory Model “FIMACH”
for Return Volatilities in Equity Markets . . . . . . . . . . . . . . . . . . . . . . . . 149
A. M. M. Shahiduzzaman Quoreshi and Sabur Mollah
Using Subspace Methods to Model Long-Memory Processes . . . . . . . . . 171
Dietmar Bauer
Robust Forecasting of Multiple Yield Curves . . . . . . . . . . . . . . . . . . . . . 187
Christoph Gerhart, Eva Lütkebohmert and Marc Weber
The Changing Shape of Sovereign Default Intensities . . . . . . . . . . . . . . 203
Yusho Kagraoka and Zakaria Moussa
Permutation Entropy as the Measure of Globalization Process . . . . . . . 217
Janusz Miśkiewicz

Time Series Analysis in Earth Sciences


Forecasting Subtidal Water Levels and Currents in Estuaries:
Assessment of Management Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . 229
M. Á. Reyes Merlo, R. Siles-Ajamil and M. Díez-Minguito
Spatial Distribution of Climatic Cycles in Andalusia
(Southern Spain) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
J. Sánchez-Morales, E. Pardo-Igúzquiza and F. J. Rodríguez-Tovar
Localized Online Weather Predictions with Overnight Adaption . . . . . . 257
Michael Zauner, Michaela Killian and Martin Kozek
Storm Characterization Using a BME Approach . . . . . . . . . . . . . . . . . . 271
Manuel Cobos, Andrea Lira-Loarca, George Christakos
and Asunción Baquerizo

Energy Time Series Forecasting


Adaptive Methods for Energy Forecasting of Production and Demand
of Solar-Assisted Heating Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
Viktor Unterberger, Thomas Nigitz, Mauro Luzzu, Daniel Muschick
and Markus Gölles
Short-Term Forecast of Wind Turbine Production with Machine
Learning Methods: Direct and Indirect Approach . . . . . . . . . . . . . . . . . 301
Mamadou Dione and Eric Matzner-Løber

Time Series Analysis and Prediction in Other Real Problems


A Simulation of a Custom Inspection in the Airport . . . . . . . . . . . . . . . 319
Kalle Saastamoinen, Petteri Mattila and Antti Rissanen
Comparing Time Series Prediction Approaches
for Telecom Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
André Pinho, Rogério Costa, Helena Silva and Pedro Furtado
Application of Load Forecasting in Thermal Unit Commitment
Problems: A Pattern Similarity Approach . . . . . . . . . . . . . . . . . . . . . . . 347
Guilherme Costa Silva, Adriano C. Lisboa, Douglas A. G. Vieira
and Rodney R. Saldanha
ICA-Derived Respiration Using an Adaptive R-Peak Detector . . . . . . . . 363
Christina Kozia, Randa Herzallah and David Lowe

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379


Advanced Statistical Methods for Time
Series Analysis and Forecasting
Identification of Nonstationary Processes
Using Noncausal Bidirectional Lattice
Filtering

Maciej Niedźwiecki and Damian Chojnacki

Abstract The problem of off-line identification of a nonstationary autoregressive
process with a time-varying order and a time-varying degree of nonstationarity is
considered and solved using the parallel estimation approach. The proposed parallel
estimation scheme is made up of several bidirectional (noncausal) exponentially
weighted lattice algorithms with different estimation memory and order settings. It is
shown that optimization of both settings can be carried out by means of minimization
of the locally evaluated accumulated forward/backward prediction error statistic.

Keywords Identification of nonstationary processes · Selection of model order ·
Selection of estimation memory

Introduction

Autoregressive analysis is a popular modeling tool, used to solve practical problems
in many different areas, such as biomedicine [1–3], geophysics [4–6], telecommuni-
cations [7, 8], etc. When the analyzed processes are nonstationary, identification of
their autoregressive models can be carried out using local estimation techniques, such
as the well-known sliding-window (SWLS) or exponentially weighted (EWLS) least
squares approaches. Local estimation algorithms are often called finite-memory since

This work was partially supported by the National Science Center under the agreement UMO-
2015/17/B/ST7/03772. Calculations were carried out at the Academic Computer Centre in Gdańsk.

M. Niedźwiecki (B) · D. Chojnacki


Faculty of Electronics, Telecommunications and Informatics, Department of Automatic Control,
Gdańsk University of Technology, ul. Narutowicza 11/12, Gdańsk, Poland
e-mail: [email protected]
D. Chojnacki
e-mail: [email protected]

© Springer Nature Switzerland AG 2019


O. Valenzuela et al. (eds.), Theory and Applications of Time Series Analysis,
Contributions to Statistics, https://doi.org/10.1007/978-3-030-26036-1_1

they rely on the limited (or effectively limited) number of signal samples. Owing to
this property they are capable of tracking time-varying signal parameters.
Two important decisions that must be taken when identifying the time-varying
autoregressive model are the choice of the number of estimated autoregressive coeffi-
cients, i.e., the model order, and selection of the size of the local analysis interval, i.e.,
the estimation memory. Both decisions may have important quantitative (estimation
accuracy) and qualitative (estimation adequacy) implications.
In this paper we will focus on noncausal estimation techniques, which can be
applied when the analyzed signal is prerecorded and can be analyzed off-line. Non-
causality means that at any given time instant t the local parameter estimates can be
based on both “past” observations (collected prior to t) and “future” observations
(collected after t). When applied to identification of nonstationary processes, non-
causal estimators can significantly reduce the estimation bias (due to elimination of
the so-called estimation delay, typical of all causal algorithms [9]).
In the proposed approach, which is a modification of the method described in [10],
noncausal estimates are obtained by combining results yielded by the exponentially
weighted least squares lattice/ladder algorithms [11] running forward and backward
in time, respectively. The problem of model order and estimation memory adaptation
is solved using the parallel estimation approach. In this approach several competing
algorithms, with different order and memory settings, are operated simultaneously
and compared according to their locally assessed predictive abilities.
The proposed technique is computationally attractive and yields time-varying
models with guaranteed uniform stability property which is important in such appli-
cations as parametric spectrum estimation or process simulation.

Nonstationary Autoregressive Processes

Suppose that the analyzed discrete time signal {y(t)}, t = . . . , −1, 0, 1, . . ., can be
described or at least approximated by the following time-varying autoregressive (AR)
model


$$
y(t) = \sum_{i=1}^{n} a_{i,n}(t)\, y(t-i) + e_n(t) = \varphi_n^T(t)\,\alpha_n(t) + e_n(t),
\qquad \operatorname{var}[e_n(t)] = \rho_n(t)
\tag{1}
$$

where $\varphi_n(t) = [y(t-1), \ldots, y(t-n)]^T$ denotes the regression vector,
$\alpha_n(t) = [a_{1,n}(t), \ldots, a_{n,n}(t)]^T$ denotes the vector of autoregressive coefficients, and $\{e_n(t)\}$
denotes white noise with a time-dependent variance ρn (t). In the sequel we will
assume that the entire history of the signal {y(t), t = 1, . . . , T0 } is available, along
with the “boundary” conditions {y(1 − i), y(T0 + i), i = 1, . . . , N }, where
N denotes the maximum model order that will be considered.

When the driving noise variance $\rho_n(t)$ is bounded, $\alpha_n(t)$ is a “sampled” version
of a sufficiently smooth continuous time parameter trajectory, and at all time instants
$t$ all zeros of the characteristic polynomial $A[z, \alpha_n(t)] = 1 - \sum_{i=1}^{n} a_{i,n}(t) z^{-i}$ are
uniformly bounded away from the unit circle in the complex plane, the process (1) is
uniformly exponentially stable [12]. According to the theory developed by Dahlhaus
[13], under the conditions specified above $\{y(t)\}$ belongs to the class of locally
stationary processes with uniquely defined instantaneous spectral density function
given by

$$
S_n(\omega, t) = \frac{\rho_n(t)}{\left| A[e^{j\omega}, \alpha_n(t)] \right|^2}
\tag{2}
$$

where $j = \sqrt{-1}$ and $\omega \in (-\pi, \pi]$ denotes the normalized angular frequency.
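To make (2) concrete, the short NumPy sketch below evaluates the spectral density implied by a
given coefficient vector and driving-noise variance. It is our own illustration, not code from the
chapter; the function name and the frequency grid are arbitrary choices.

```python
import numpy as np

def ar_spectrum(a, rho, n_freq=512):
    """Evaluate S(w) = rho / |A(e^{jw})|^2 on a grid of normalized frequencies,
    for an AR model y(t) = sum_i a_i y(t-i) + e(t), i.e. A(z) = 1 - sum_i a_i z^{-i}."""
    a = np.asarray(a, dtype=float)
    omega = np.linspace(-np.pi, np.pi, n_freq, endpoint=False)
    # A(e^{jw}) evaluated for every frequency at once
    A = 1.0 - np.exp(-1j * np.outer(omega, np.arange(1, len(a) + 1))) @ a
    return omega, rho / np.abs(A) ** 2
```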

Equivalent Parametrizations of a Stationary Autoregressive Process

It is known that a zero-mean stationary AR process characterized by the set
$\mathcal{P}_n = \{\rho_n, a_{1,n}, \ldots, a_{n,n}\}$ (further referred to as direct parametrization) can be equivalently
specified in terms of autocorrelation coefficients $\mathcal{R}_n = \{r_0, r_1, \ldots, r_n\}$, where
$r_i = \mathrm{E}[y(t)y(t-i)]$ (autocorrelation parametrization), or in terms of partial autocorrelation
coefficients $\mathcal{Q}_n = \{r_0, q_1, \ldots, q_n\}$, where $q_i$ is the normalized autocorrelation
between $y(t)$ and $y(t-i)$ with the linear dependence on the intermediate variables
$y(s)$, $t-i < s < t$, removed (lattice parametrization).
All three parametrizations are equivalent, i.e., given any of them, one can determine
the remaining two using invertible mappings

$$
\mathcal{P}_n = F[\mathcal{R}_n], \quad \mathcal{R}_n = F^{-1}[\mathcal{P}_n], \qquad
\mathcal{R}_n = G[\mathcal{Q}_n], \quad \mathcal{Q}_n = G^{-1}[\mathcal{R}_n], \qquad
\mathcal{Q}_n = H[\mathcal{P}_n], \quad \mathcal{P}_n = H^{-1}[\mathcal{Q}_n].
$$

Description of these mappings can be found, e.g., in [14].
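As an illustration of the G and H⁻¹ mappings, the sketch below converts a lattice parametrization
{r₀, q₁, …, qₙ} into the corresponding autocorrelations and direct (AR) parametrization using the
standard step-up (inverse Levinson–Durbin) recursion. This is our own minimal implementation under
the convention A(z) = 1 − Σᵢ aᵢ z⁻ⁱ, not the code used by the authors.

```python
import numpy as np

def lattice_to_direct(r0, q):
    """Map {r0, q_1..q_n} to autocorrelations {r_0..r_n} and to {rho_n, a_1..a_n}."""
    a = np.zeros(0)            # AR coefficients of the current order
    r = [float(r0)]            # autocorrelations r_0, r_1, ...
    rho = float(r0)            # prediction-error variance of the current order
    for m, q_m in enumerate(q, start=1):
        # next autocorrelation implied by the order-(m-1) model and q_m
        r_m = q_m * rho + (a @ np.array(r[m-1:0:-1]) if m > 1 else 0.0)
        r.append(r_m)
        # Levinson step-up: order update of the AR coefficients
        a = np.concatenate([a - q_m * a[::-1], [q_m]])
        rho *= 1.0 - q_m ** 2
    return np.array(r), a, rho
```

Because |qᵢ| < 1 keeps every intermediate prediction-error variance positive, the resulting AR
model is guaranteed stable, which is exactly the property exploited by the lattice-based estimators
discussed below.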

Causal Lattice Algorithm

The exponentially weighted least squares normalized lattice/ladder algorithm proposed
by Lee et al. [11], further referred to as EWLMF algorithm, is a time- and
order-recursive estimation procedure known for its low computational cost and
numerical robustness. The EWLMF algorithm is a lattice approximation of the
EWLS algorithm. The EWLS algorithm, equipped with the forgetting constant $\lambda_k$,
$0 < \lambda_k < 1$, provides a direct signal parametrization

$$
\hat{\mathcal{P}}_{n|k}(t) = \{\hat\rho_{n|k}(t),\ \hat a_{1,n|k}(t), \ldots, \hat a_{n,n|k}(t)\}
$$

where

$$
\hat{\alpha}_{n|k}(t) = [\hat a_{1,n|k}(t), \ldots, \hat a_{n,n|k}(t)]^T
= \arg\min_{\alpha_n} \sum_{i=0}^{t-1} \lambda_k^i \left[ y(t-i) - \varphi_n^T(t-i)\,\alpha_n \right]^2
\tag{3}
$$

$$
\hat\rho_{n|k}(t) = \frac{1}{L_k(t)} \sum_{i=0}^{t-1} \lambda_k^i \left[ y(t-i) - \varphi_n^T(t-i)\,\hat{\alpha}_{n|k}(t) \right]^2
\tag{4}
$$

and $L_k(t) = \sum_{i=0}^{t-1} \lambda_k^i$ denotes the effective width of the applied exponential window.
The explicit solution of (3) and (4) can be obtained in the form

$$
\hat{\alpha}_{n|k}(t) = \hat{R}_{n|k}^{-1}(t)\, \hat{r}_{n|k}(t), \qquad
\hat\rho_{n|k}(t) = \hat r_{0|k}(t) - \hat{r}_{n|k}^{T}(t)\, \hat{\alpha}_{n|k}(t)
\tag{5}
$$

where

$$
\hat{R}_{n|k}(t) = \frac{1}{L_k(t)} \sum_{i=0}^{t-1} \lambda_k^i\, \varphi_n(t-i)\varphi_n^T(t-i), \qquad
\hat{r}_{n|k}(t) = \frac{1}{L_k(t)} \sum_{i=0}^{t-1} \lambda_k^i\, y(t-i)\varphi_n(t-i),
$$

$$
\hat r_{0|k}(t) = \frac{1}{L_k(t)} \sum_{i=0}^{t-1} \lambda_k^i\, y^2(t-i) = \breve r_{0|k}(t).
$$

The EWLMF algorithm estimates the normalized partial autocorrelation coefficients
directly from the data, yielding the lattice signal parametrization

$$
\hat{\mathcal{Q}}_{n|k}(t) = \{\hat r_{0|k}(t),\ \hat q_{1|k}(t), \ldots, \hat q_{n|k}(t)\}.
$$

The estimates $\hat q_{1|k}(t), \ldots, \hat q_{n|k}(t)$ are usually called reflection coefficients. Due to
appropriate normalization, the estimates provided by the EWLMF algorithm obey
the condition

$$
|\hat q_{i|k}(t)| < 1, \quad \forall t,\ i = 1, \ldots, n
\tag{6}
$$

which guarantees that the corresponding “frozen” AR models are at all times stable.
Denote by

$$
\breve{\mathcal{P}}_{n|k}(t) = H^{-1}[\hat{\mathcal{Q}}_{n|k}(t)] = \{\breve\rho_{n|k}(t),\ \breve a_{1,n|k}(t), \ldots, \breve a_{n,n|k}(t)\}
$$

the direct parametrization that is an equivalent of the lattice parametrization yielded
by the EWLMF algorithm. Since the EWLS algorithm does not guarantee model
stability, it is clear that $\breve{\mathcal{P}}_{n|k}(t) \neq \hat{\mathcal{P}}_{n|k}(t)$. We note, however, that both parametrizations
become identical if the matrix $\hat{R}_{n|k}(t)$ and the vector $\hat{r}_{n|k}(t)$ appearing in (5)
are replaced with

$$
\breve{R}_{n|k}(t) =
\begin{bmatrix}
\breve r_{0|k}(t) & \cdots & \breve r_{n-1|k}(t) \\
\vdots & \ddots & \vdots \\
\breve r_{n-1|k}(t) & \cdots & \breve r_{0|k}(t)
\end{bmatrix},
\qquad
\breve{r}_{n|k}(t) = \left[\, \breve r_{1|k}(t)\ \ldots\ \breve r_{n|k}(t) \,\right]^T
$$

where

$$
\breve{\mathcal{R}}_{n|k}(t) = \{\breve r_{0|k}(t),\ \breve r_{1|k}(t), \ldots, \breve r_{n|k}(t)\} = G[\hat{\mathcal{Q}}_{n|k}(t)]
$$

denotes an autocorrelation parametrization equivalent to $\hat{\mathcal{Q}}_{n|k}(t)$. Therefore, the
parametrization $\breve{\mathcal{P}}_{n|k}(t)$ can be regarded as a stable approximation of $\hat{\mathcal{P}}_{n|k}(t)$.
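For readers who want to see the weighting in (3)–(5) spelled out, the following sketch computes
the EWLS estimate at a single time instant by direct (non-recursive) evaluation. It is only an
illustration under assumed 0-based array indexing and truncated (in-record) regressors, not the
order-recursive lattice algorithm of [11].

```python
import numpy as np

def ewls_at(y, t, n, lam):
    """Direct evaluation of (3)-(5): EWLS AR estimate of order n at instant t
    (t is 1-based, y is a 0-based array), with forgetting constant lam."""
    R = np.zeros((n, n)); r = np.zeros(n); r0 = 0.0; L = 0.0
    w = 1.0
    for i in range(t - n):                        # instants with a full regressor
        phi = y[t - i - n - 1 : t - i - 1][::-1]  # [y(t-i-1), ..., y(t-i-n)]
        R += w * np.outer(phi, phi)
        r += w * y[t - i - 1] * phi
        r0 += w * y[t - i - 1] ** 2
        L += w
        w *= lam
    R /= L; r /= L; r0 /= L
    alpha = np.linalg.solve(R, r)                 # EWLS coefficient estimate
    rho = r0 - r @ alpha                          # EWLS noise-variance estimate
    return alpha, rho
```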

Noncausal Lattice Algorithm

To obtain a noncausal estimator of $\rho_n(t)$ and $\alpha_n(t)$ we will combine results yielded
by two lattice algorithms—the forward-time (−) EWLMF algorithm equipped with
a forgetting constant $\lambda_{k^-}$, providing the estimates

$$
\hat{\mathcal{Q}}^{-}_{n|k^-}(t) = \{\hat r_{0|k^-}(t),\ \hat q_{1|k^-}(t), \ldots, \hat q_{n|k^-}(t)\}
$$

and the backward-time (+) EWLMF algorithm equipped with a forgetting constant
$\lambda_{k^+}$, providing the estimates

$$
\hat{\mathcal{Q}}^{+}_{n|k^+}(t) = \{\hat r_{0|k^+}(t),\ \hat q_{1|k^+}(t), \ldots, \hat q_{n|k^+}(t)\}.
$$

We will not assume that the forward and backward time EWLMF algorithms are
equipped with the same forgetting constants. Setting $k^- \neq k^+$, one can fuse long
memory forward time estimation results with short memory backward time ones or
vice versa. Such asymmetric variants may be useful in the presence of abrupt parameter
changes. Let $\pi = \{k^-, k^+\}$, $\mathcal{T}_-(t) = \{1, \ldots, t-1\}$ and $\mathcal{T}_+(t) = \{1, \ldots, T_0 - t\}$.
The combined estimate can be obtained using a three-step procedure.

1. First, one can determine the autocorrelation parametrizations corresponding to
   $\hat{\mathcal{Q}}^{-}_{n|k^-}(t-1)$ and $\hat{\mathcal{Q}}^{+}_{n|k^+}(t+1)$

   $$
   \breve{\mathcal{R}}^{\pm}_{n|k^\pm}(t \pm 1) = G[\hat{\mathcal{Q}}^{\pm}_{n|k^\pm}(t \pm 1)]
   = \{\breve r_{0|k^\pm}(t \pm 1),\ \breve r_{1|k^\pm}(t \pm 1), \ldots, \breve r_{n|k^\pm}(t \pm 1)\}.
   $$

   Since parametrizations $\hat{\mathcal{Q}}^{-}_{n|k^-}(t-1)$ and $\hat{\mathcal{Q}}^{+}_{n|k^+}(t+1)$ are stable, the covariance
   matrices made up of the estimates $\{\breve r_{i|k^-}(t-1),\ i = 0, \ldots, n\}$ and
   $\{\breve r_{i|k^+}(t+1),\ i = 0, \ldots, n\}$ must be positive definite [14].
2. Second, the two-sided autocorrelation parametrization

   $$
   \breve{\mathcal{R}}_{n|\pi}(t) = \{\breve r_{0|\pi}(t),\ \breve r_{1|\pi}(t), \ldots, \breve r_{n|\pi}(t)\}
   $$

   can be obtained using the formula

   $$
   \breve r_{i|\pi}(t) = \mu_-(t)\,\breve r_{i|k^-}(t-1) + \mu_+(t)\,\breve r_{i|k^+}(t+1), \quad i = 0, \ldots, n
   \tag{7}
   $$

   where $\mu_\pm(t) = L^{\pm}_{k^\pm}(t \pm 1)/L_\pi(t)$,
   $L^{\pm}_{k^\pm}(t \pm 1) = \sum_{i \in \mathcal{T}_\pm(t)} \lambda_{k^\pm}^{i-1}$ and
   $L_\pi(t) = L^{-}_{k^-}(t-1) + L^{+}_{k^+}(t+1)$. Note that since the sequence
   $\{\breve r_{i|\pi}(t),\ i = 0, \ldots, n\}$ is a convex combination of
   $\{\breve r_{i|k^-}(t-1),\ i = 0, \ldots, n\}$ and $\{\breve r_{i|k^+}(t+1),\ i = 0, \ldots, n\}$,
   the parametrization $\breve{\mathcal{R}}_{n|\pi}(t)$ is at all times stable.
3. Finally, based on $\breve{\mathcal{R}}_{n|\pi}(t)$, one can obtain the direct parametrization

   $$
   \breve{\mathcal{P}}_{n|\pi}(t) = F[\breve{\mathcal{R}}_{n|\pi}(t)]
   = \{\breve\rho_{n|\pi}(t),\ \breve a_{1,n|\pi}(t), \ldots, \breve a_{n,n|\pi}(t)\}.
   $$
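A compact sketch of steps 2 and 3 is given below: the forward and backward autocorrelation
estimates are mixed with the convex weights of (7), and the fused sequence is turned into a direct
parametrization by the Levinson–Durbin recursion (one way of realizing the F mapping). Variable
names and the interface are our own assumptions.

```python
import numpy as np

def fuse_and_solve(r_fwd, r_bwd, L_fwd, L_bwd):
    """Convex fusion (7) of forward/backward autocorrelations r_0..r_n followed
    by Levinson-Durbin, returning the direct parametrization (a_1..a_n, rho_n)."""
    mu_m = L_fwd / (L_fwd + L_bwd)                 # mu^-(t)
    mu_p = L_bwd / (L_fwd + L_bwd)                 # mu^+(t)
    r = mu_m * np.asarray(r_fwd, float) + mu_p * np.asarray(r_bwd, float)

    a = np.zeros(0); rho = r[0]
    for m in range(1, len(r)):
        # reflection coefficient of order m
        k = (r[m] - (a @ r[m-1:0:-1] if m > 1 else 0.0)) / rho
        a = np.concatenate([a - k * a[::-1], [k]])
        rho *= 1.0 - k ** 2
    return a, rho
```

Since (7) is a convex combination of two positive-definite autocorrelation sequences, the recursion
yields reflection coefficients of magnitude smaller than one, so the combined model inherits the
stability guarantee discussed above.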

The doubly exponentially weighted Lee-Morf-Friedlander (E²WLMF) algorithm
described above differs from the one proposed in [10] in one important aspect—unlike
[10] the obtained parameter estimates do not depend (in a deterministic sense) on
the “central” sample y(t).
Similarly as in the case of the EWLMF estimate, one can show that the E²WLMF
estimate $\breve{\alpha}_{n|\pi}(t) = [\breve a_{1,n|\pi}(t), \ldots, \breve a_{n,n|\pi}(t)]^T$ can be regarded as a “stable approximation”
of the estimate obtained using the noncausal doubly exponentially weighted
least squares (E²WLS) algorithm

$$
\hat{\alpha}_{n|\pi}(t) = [\hat a_{1,n|\pi}(t), \ldots, \hat a_{n,n|\pi}(t)]^T
= \arg\min_{\alpha_n} \Bigg\{ \sum_{i=1}^{t-1} \lambda_{k^-}^{i-1} \big\{ y(t-i) - [\varphi_n^-(t-i)]^T \alpha_n \big\}^2
+ \sum_{i=1}^{T_0 - t} \lambda_{k^+}^{i-1} \big\{ y(t+i) - [\varphi_n^+(t+i)]^T \alpha_n \big\}^2 \Bigg\}
$$

where $\varphi_n^\pm(t) = [y(t \pm 1), \ldots, y(t \pm n)]^T$. Actually, note that

$$
\hat{\alpha}_{n|\pi}(t) = \left[ \mu_-(t)\,\hat{R}^{-}_{n|k^-}(t-1) + \mu_+(t)\,\hat{R}^{+}_{n|k^+}(t+1) \right]^{-1}
\left[ \mu_-(t)\,\hat{r}^{-}_{n|k^-}(t-1) + \mu_+(t)\,\hat{r}^{+}_{n|k^+}(t+1) \right]
\tag{8}
$$

where

$$
\hat{R}^{\pm}_{n|k^\pm}(t \pm 1) = \frac{1}{L^{\pm}_{k^\pm}(t \pm 1)} \sum_{i \in \mathcal{T}_\pm(t)} \lambda_{k^\pm}^{i-1}\, \varphi_n^\pm(t \pm i)[\varphi_n^\pm(t \pm i)]^T,
\qquad
\hat{r}^{\pm}_{n|k^\pm}(t \pm 1) = \frac{1}{L^{\pm}_{k^\pm}(t \pm 1)} \sum_{i \in \mathcal{T}_\pm(t)} \lambda_{k^\pm}^{i-1}\, y(t \pm i)\, \varphi_n^\pm(t \pm i).
$$

Similarly, since $\breve{\alpha}_{n|\pi}(t)$ must obey Yule-Walker equations defined in terms of
$\{\breve r_{i|\pi}(t),\ i = 0, \ldots, n\}$ [14], it holds that

$$
\breve{\alpha}_{n|\pi}(t) = \left[ \mu_-(t)\,\breve{R}^{-}_{n|k^-}(t-1) + \mu_+(t)\,\breve{R}^{+}_{n|k^+}(t+1) \right]^{-1}
\left[ \mu_-(t)\,\breve{r}^{-}_{n|k^-}(t-1) + \mu_+(t)\,\breve{r}^{+}_{n|k^+}(t+1) \right]
$$

where

$$
\breve{R}^{\pm}_{n|k^\pm}(t \pm 1) =
\begin{bmatrix}
\breve r_{0|k^\pm}(t \pm 1) & \cdots & \breve r_{n-1|k^\pm}(t \pm 1) \\
\vdots & \ddots & \vdots \\
\breve r_{n-1|k^\pm}(t \pm 1) & \cdots & \breve r_{0|k^\pm}(t \pm 1)
\end{bmatrix},
\qquad
\breve{r}^{\pm}_{n|k^\pm}(t \pm 1) = \left[\, \breve r_{1|k^\pm}(t \pm 1)\ \ldots\ \breve r_{n|k^\pm}(t \pm 1) \,\right]^T.
$$

Hence, the estimates $\breve{\alpha}_{n|\pi}(t)$ and $\hat{\alpha}_{n|\pi}(t)$ coincide if the quantities
$\hat{R}^{\pm}_{n|k^\pm}(t \pm 1)$ and $\hat{r}^{\pm}_{n|k^\pm}(t \pm 1)$ are replaced in (8) with
$\breve{R}^{\pm}_{n|k^\pm}(t \pm 1)$ and $\breve{r}^{\pm}_{n|k^\pm}(t \pm 1)$, respectively.

Model Order and Estimation Memory Adaptation

Based on $\breve{\mathcal{P}}_{n|\pi}(t)$, the parametric estimate of the instantaneous spectral density function
$S_n(\omega, t)$ can be obtained in the form

$$
\breve{S}_{n|\pi}(\omega, t) = \frac{\breve\rho_{n|\pi}(t)}{\left| A[e^{j\omega}, \breve{\alpha}_{n|\pi}(t)] \right|^2}
\tag{9}
$$

where $\breve{\alpha}_{n|\pi}(t) = [\breve a_{1,n|\pi}(t), \ldots, \breve a_{n,n|\pi}(t)]^T$.
Selection of the order n of the autoregressive model and the choice of the forgetting
factors $\lambda_{k^\pm}$ play an important role in parametric spectral analysis. If the order is
underestimated, some important features of the resonant structure of {y(t)} may not
be revealed, while when it is overestimated some nonexistent resonances may be
indicated. In both cases one may arrive at false qualitative conclusions. The optimal
choice of λk − and λk + , i.e., the one that trades off the bias and variance components
of the mean squared parameter estimation error, depends on the rate of parame-
ter variation—forgetting factors should be smaller (which corresponds to shorter
memory) when process parameters are subject to fast changes, and larger (which
corresponds to longer memory) when parameters vary slowly with time.
Our solution to the order/memory optimization problem will be based on parallel
estimation. Consider several E²WLMF algorithms with different order and memory
settings, running in parallel. Denote by $\mathcal{N} = \{1, \ldots, N\}$ the set of all model orders
that will be considered, and by $\Pi$ the set of all considered pairs $\pi = \{k^-, k^+\}$. The
data-adaptive version of (9) can be expressed in the form

$$
\breve{S}_{\hat n(t)|\hat\pi(t)}(\omega, t) = \frac{\breve\rho_{\hat n(t)|\hat\pi(t)}(t)}{\left| A[e^{j\omega}, \breve{\alpha}_{\hat n(t)|\hat\pi(t)}(t)] \right|^2}
\tag{10}
$$

where

$$
\{\hat n(t),\ \hat\pi(t)\} = \{\hat n(t),\ \hat k_-(t),\ \hat k_+(t)\}
= \arg\min_{\substack{n \in \mathcal{N} \\ \pi \in \Pi}} J_{n|\pi}(t)
\tag{11}
$$

and Jn|π (t) denotes the local decision statistic.


The proposed selection criterion takes advantage of the fact that, unlike the estimates
considered in [10], the estimates $\breve{\alpha}_{n|\pi}(t)$ are not functions of y(t) and therefore
they can be used to compute unbiased forward and backward prediction errors

$$
\varepsilon^{\pm}_{n|\pi}(t) = y(t) - [\varphi_n^{\pm}(t)]^T\, \breve{\alpha}_{n|\pi}(t).
$$

Consequently, one can adopt for $J_{n|\pi}(t)$ the following prediction error (PE) statistic

$$
J_{n|\pi}(t) = \sum_{i=-M}^{M} [\varepsilon^{-}_{n|\pi}(t-i)]^2 + \sum_{i=-M}^{M} [\varepsilon^{+}_{n|\pi}(t+i)]^2
\tag{12}
$$

where $M \in [20, 50]$ is the parameter that controls the size of the local decision
window $[t-M, t+M]$ centered around t.
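The decision statistic (12) is easy to evaluate once the noncausal estimates are available at every
instant of the decision window; a minimal sketch (our own, with 0-based arrays and a callable
supplying the per-instant estimate) could look as follows.

```python
import numpy as np

def pe_statistic(y, t, M, alpha_of):
    """Accumulated forward/backward prediction-error statistic (12).
    y is a 0-based array, t is 1-based, [t-M, t+M] must lie inside the record,
    and alpha_of(s) returns the noncausal coefficient estimate [a_1..a_n] at s."""
    J = 0.0
    for s in range(t - M, t + M + 1):
        a = np.asarray(alpha_of(s), float)
        n = len(a)
        phi_m = y[s - n - 1 : s - 1][::-1]   # backward-looking regressor [y(s-1)..y(s-n)]
        phi_p = y[s : s + n]                  # forward-looking regressor  [y(s+1)..y(s+n)]
        J += (y[s - 1] - phi_m @ a) ** 2 + (y[s - 1] - phi_p @ a) ** 2
    return J

# Order/memory selection (11): evaluate the statistic for every candidate (n, pi)
# and keep the minimizer, e.g. (candidates/estimates are hypothetical containers):
# n_best, pi_best = min(candidates, key=lambda c: pe_statistic(y, t, M, estimates[c]))
```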

Computational Complexity

Denote by $K_\pi \leq K(K+1)/2$ the number of forward–backward pairs $\pi = (k^-, k^+)$
included in $\Pi$. For the assumed maximum model order N the per-sample computational
load (the number of multiply–add operations) of the proposed parallel
estimation scheme is pretty low and is approximately equal to

$$
l(N) = 2K\,A(N) + 2K\,B(N) + K_\pi\, C(N)
$$

where $A(N) = 30N$ denotes the load of the EWLMF algorithm (given that the Newton-
Raphson method is used to evaluate square roots), $B(N) = 2N + N^2$ denotes the load
of the G transform (computation of autocorrelation coefficients based on reflection
coefficients), and $C(N) = 2 + 4N + N^2$ is the load of the F transform (computation
of autoregressive coefficients based on autocorrelation coefficients). Note that the first
stage of processing is computationally the cheapest one and that the only quantities
that have to be memorized during the forward/backward sweep of the EWLMF
algorithms are the forward/backward reflection coefficients.
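As a rough illustration (our own arithmetic, plugging in N = 20 together with the K = 3 and
K_π = 4 configuration used in the simulation study below), one gets A(20) = 600, B(20) = 440 and
C(20) = 482, so l(20) = 2·3·600 + 2·3·440 + 4·482 = 8168 multiply–add operations per sample.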

Extension to Multivariate Autoregressive Processes

Unlike the univariate case, every zero-mean stationary multivariate AR process
has two (usually different) direct parametrizations $\mathcal{P}_n^- = \{\rho_n^-, A_{1,n}^-, \ldots, A_{n,n}^-\}$ and
$\mathcal{P}_n^+ = \{\rho_n^+, A_{1,n}^+, \ldots, A_{n,n}^+\}$, corresponding to the forward-time and backward-time
AR models, respectively:

$$
\mathbf{y}(t) = \sum_{i=1}^{n} A_{i,n}^{\pm}\, \mathbf{y}(t \pm i) + \mathbf{e}_n^{\pm}(t),
\qquad \operatorname{cov}[\mathbf{e}_n^{\pm}(t)] = \rho_n^{\pm}
\tag{13}
$$

where $\mathbf{y}(t) = [y_1(t), \ldots, y_m(t)]^T$ is the m-dimensional vector of signal components
and $A_{i,n}^{\pm}$, $i = 1, \ldots, n$, denote the $m \times m$ matrices of forward/backward autoregressive
coefficients. Based on (13), the spectral density (matrix) function of $\{\mathbf{y}(t)\}$ can be
evaluated using the formula

$$
S_n(\omega) = \left\{ A[e^{-j\omega}, \alpha_n^-] \right\}^{-1} \rho_n^- \left\{ A[e^{j\omega}, \alpha_n^-] \right\}^{-T}
= \left\{ A[e^{-j\omega}, \alpha_n^+] \right\}^{-1} \rho_n^+ \left\{ A[e^{j\omega}, \alpha_n^+] \right\}^{-T}
\tag{14}
$$

where $\alpha_n^{\pm} = [A_{1,n}^{\pm}, \ldots, A_{n,n}^{\pm}]$ and

$$
A[z, \alpha_n^{\pm}] = I_m - \sum_{i=1}^{n} A_{i,n}^{\pm}\, z^{-i}.
$$

Similar to the univariate case, the process $\{\mathbf{y}(t)\}$ has a unique autocorrelation
parametrization

$$
\mathcal{R}_n = \{R_0, R_1, \ldots, R_n\}
$$

where $R_i = \mathrm{E}[\mathbf{y}(t)\mathbf{y}^T(t-i)]$, and unique lattice parametrization

$$
\mathcal{Q}_n = \{R_0, Q_1, \ldots, Q_n\}
$$

where $Q_i$, $i = 1, \ldots, n$, denote the matrices of normalized partial autocorrelation
(reflection) coefficients.
Noncausal identification of a multivariate AR process can be carried out in an
analogous way to that described in section “Noncausal Lattice Algorithm”. First, at
each time instant t, one can use the multivariate version of the EWLMF algorithm to
evaluate the matrices of forward-time and backward-time normalized reflection coefficients
$\hat{Q}^{\pm}_{i|k}(t \pm 1)$, $i = 1, \ldots, N$, $k \in \mathcal{K}$. Then, for all selections of n and $\pi$, one can
evaluate the two-sided parametrizations $\breve{\mathcal{R}}_{n|\pi}(t)$, $n \in \mathcal{N}$, $\pi \in \Pi$ and, after solving
the corresponding Yule-Walker equations—the two-sided direct parametrizations

$$
\breve{\mathcal{P}}^{\pm}_{n|\pi}(t) = \{\breve\rho^{\pm}_{n|\pi}(t),\ \breve A^{\pm}_{1,n|\pi}(t), \ldots, \breve A^{\pm}_{n,n|\pi}(t)\}.
$$

Finally, the best local combination of n and $\pi$ can be selected using the decision rule
(11) after adopting

$$
J_{n|\pi}(t) = \det\!\left[ E_{n|\pi}(t) \right]
\tag{15}
$$

$$
E_{n|\pi}(t) = \sum_{i=-M}^{M} \boldsymbol\varepsilon^{-}_{n|\pi}(t-i)\, [\boldsymbol\varepsilon^{-}_{n|\pi}(t-i)]^T
+ \sum_{i=-M}^{M} \boldsymbol\varepsilon^{+}_{n|\pi}(t-i)\, [\boldsymbol\varepsilon^{+}_{n|\pi}(t-i)]^T
$$

where

$$
\boldsymbol\varepsilon^{\pm}_{n|\pi}(t) = \mathbf{y}(t) - \sum_{i=1}^{n} \breve A^{\pm}_{i,n|\pi}(t)\, \mathbf{y}(t \pm i).
$$

Note that (15) is a natural extension of the univariate statistic (12). Another option
(computationally less demanding) is choosing $J_{n|\pi}(t)$ in the form

$$
J_{n|\pi}(t) = \operatorname{tr}\!\left[ E_{n|\pi}(t) \right].
\tag{16}
$$
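In code, accumulating the error outer-product matrix and taking its determinant or trace is
straightforward; the sketch below is our own simplification that keeps the coefficient matrices
fixed over the decision window rather than re-estimating them at every instant, and is meant only
to illustrate (15) and (16).

```python
import numpy as np

def mv_statistic(y, t, M, A_fwd, A_bwd, use_det=True):
    """Multivariate decision statistic (15)/(16). y is a (T, m) array (0-based),
    t is 1-based, A_fwd/A_bwd are lists [A_1, ..., A_n] of m x m coefficient
    matrices of the forward and backward models."""
    m = y.shape[1]
    E = np.zeros((m, m))
    for s in range(t - M, t + M + 1):
        # forward error: y(s) - sum_i A_i^- y(s-i); backward: y(s) - sum_i A_i^+ y(s+i)
        e_f = y[s - 1] - sum(A_fwd[i] @ y[s - 2 - i] for i in range(len(A_fwd)))
        e_b = y[s - 1] - sum(A_bwd[i] @ y[s + i] for i in range(len(A_bwd)))
        E += np.outer(e_f, e_f) + np.outer(e_b, e_b)
    return np.linalg.det(E) if use_det else np.trace(E)
```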

Once the quantities $\hat n(t)$ and $\hat\pi(t)$ are established, the estimate of the instantaneous
spectral density matrix can be obtained in the form

$$
\breve{S}_{\hat n(t)|\hat\pi(t)}(\omega, t)
= \left\{ A[e^{-j\omega}, \breve{\alpha}^{\pm}_{\hat n(t)|\hat\pi(t)}(t)] \right\}^{-1} \breve\rho^{\pm}_{\hat n(t)|\hat\pi(t)}(t)
\left\{ A[e^{j\omega}, \breve{\alpha}^{\pm}_{\hat n(t)|\hat\pi(t)}(t)] \right\}^{-T}
\tag{17}
$$

where

$$
\breve{\alpha}^{\pm}_{n|\pi}(t) = [\breve A^{\pm}_{1,n|\pi}(t), \ldots, \breve A^{\pm}_{n,n|\pi}(t)].
$$

Simulation Results

To verify the proposed order and estimation memory selection rule, a nonstationary
variable-order autoregressive process was generated. Process generation was based
on four time-invariant AR anchor models M1 , M2 , M3 and M4 , of orders 2, 4, 6 and
8, respectively. The characteristic polynomial Ai (z) of the model Mi had i pairs of
complex conjugate zeros, given by z k± = 0.995e± jkπ/5 , k = 1, . . . , i. Two simulation
scenarios were considered, incorporating either smooth parameter changes (scenario
A) or abrupt parameter changes (scenario B).
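For reproducibility, the anchor models can be generated directly from the specified zeros; the
sketch below is our own helper (with an arbitrary noise variance and seed) that builds the
coefficient vector of M_i and simulates a stationary stretch of the signal.

```python
import numpy as np

def anchor_model(i):
    """AR coefficients of anchor model M_i: i complex-conjugate zero pairs of the
    characteristic polynomial at z = 0.995 * exp(+/- j*k*pi/5), k = 1..i."""
    zeros = [0.995 * np.exp(sign * 1j * k * np.pi / 5)
             for k in range(1, i + 1) for sign in (+1, -1)]
    poly = np.real(np.poly(zeros))    # monic polynomial [1, c_1, ..., c_2i]
    return -poly[1:]                  # a_j = -c_j so that A(z) = 1 - sum a_j z^-j

def simulate(a, T, rho=1.0, seed=0):
    """Generate T samples of y(t) = sum_j a_j y(t-j) + e(t), var e = rho."""
    rng = np.random.default_rng(seed)
    n = len(a)
    y = np.zeros(T + n)
    for t in range(n, T + n):
        y[t] = a @ y[t - n:t][::-1] + np.sqrt(rho) * rng.standard_normal()
    return y[n:]
```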
In the first case, depicted in Fig. 1, the generated signal {y(t), t = 1, . . . , T0 } had
stationary periods, during which it was governed by anchor models, and nonstationary
periods, when the generating model was obtained by morphing one anchor model into
another one. Transition from Mi−1 to Mi was realized by moving, with a constant
speed, the i-th pair of complex conjugate zeros from their initial zero positions

towards the unit circle—see Fig. 1. The simulation scenario is symbolically depicted
in Fig. 1. Note that according to this scenario the order of the generating model was
gradually increased from two to eight.

Fig. 1 Trajectories of zeros of the characteristic polynomial (top figure), simulation scenario A
corresponding to smooth parameter variation (middle figure), and the corresponding time-varying
spectral density function (bottom figure)

Fig. 2 Simulation scenario B corresponding to abrupt parameter changes (top figure), and the
corresponding time-varying spectral density function (bottom figure)
In the second case, illustrated in Fig. 2, the model Mi−1 was instantaneously
switched to Mi . In this case the order of the model was gradually decreased from
eight to two.
The adopted value of T0 was equal to 5000 and the breakpoints, marked with
bullets in Figs. 1 and 2, had the following time coordinates: t1 = 1000, t2 = 1500,
t3 = 2500, t4 = 3000, t5 = 4000, t6 = 4500 (for type-A changes), and t7 = 1250,
t8 = 2750, t9 = 4250 (for type-B changes). The parallel estimation scheme was made
up of 4 E2 WLMF algorithms combining results yielded by K = 3 forward/back-
ward EWLMF trackers equipped with forgetting constants λ1 = 0.95, λ2 = 0.99
and λ3 = 0.995. The four combinations of forward/backward forgetting constants
were: (0.99, 0.99), (0.995, 0.995), (0.995, 0.95), and (0.95, 0.995), which corresponds
to π1 = (2, 2), π2 = (3, 3), π3 = (3, 1) and π4 = (1, 3), respectively. The parameter
M, which determines the width of the local decision window, was set to 50.
Two measures of fit were used to evaluate identification results: the mean squared
parameter tracking error and the Itakura-Saito spectral distortion measure, both aver-
aged over t ∈ [1, T0 ] and 100 independent realizations of {y(t)}.
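For reference, the Itakura–Saito distortion between a true and an estimated spectrum is commonly
computed as below; the chapter does not spell out the exact variant it averages, so this is only an
illustrative implementation.

```python
import numpy as np

def itakura_saito(S_true, S_est):
    """Standard Itakura-Saito distortion between two sampled spectra,
    averaged over the frequency grid."""
    ratio = np.asarray(S_true, float) / np.asarray(S_est, float)
    return np.mean(ratio - np.log(ratio) - 1.0)
```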

Table 1 Averaged Itakura-Saito distortion measures (left tables) and mean square parameter estimation errors (right tables) in two cases described in the text.

Smooth parameter variation

n/N λ1 λ2 λ3 π1 π2 π3 π4 Adaptive n/N λ1 λ2 λ3 π1 π2 π3 π4 Adaptive


1 4,600 4,266 4,199 4,185 4,131 4,193 4,170 4,155 1 12,027 11,986 11,956 12,002 11,992 11,959 12,049 12,010
2 3,183 2,751 2,796 2,551 2,552 2,697 2,644 2,603 2 8,673 8,623 8,577 8,679 8,685 8,587 8,789 8,681
3 3,093 2,619 2,660 2,398 2,397 2,559 2,488 2,446 3 6,504 6,386 6,339 6,418 6,444 6,339 6,615 6,484
4 2,092 1,536 1,616 1,318 1,358 1,482 1,483 1,357 4 3,015 2,897 2,839 3,011 3,291 2,829 3,729 2,984
5 2,169 1,536 1,611 1,298 1,333 1,472 1,464 1,338 5 2,628 2,460 2,566 2,315 2,517 2,514 2,720 2,402
6 1,118 0,577 0,711 0,452 0,586 0,563 0,726 0,437 6 1,106 1,026 1,282 0,593 0,863 1,156 1,113 0,815
7 1,180 0,583 0,697 0,416 0,519 0,558 0,629 0,415 7 1,083 0,547 0,675 0,723 2,412 0,596 2,970 0,478
8 0,775 0,144 0,208 0,070 0,163 0,126 0,239 0,067 8 1,102 0,348 0,441 0,369 1,522 0,352 2,125 0,236
9 0,848 0,147 0,187 0,071 0,146 0,117 0,206 0,068 9 1,392 0,425 0,523 0,389 1,284 0,436 1,718 0,252
10 0,925 0,154 0,189 0,072 0,134 0,120 0,192 0,068 10 1,688 0,485 0,552 0,444 1,347 0,464 1,749 0,266
11 1,006 0,160 0,191 0,073 0,125 0,122 0,183 0,069 11 2,014 0,551 0,578 0,497 1,449 0,486 1,852 0,278
12 1,093 0,167 0,194 0,075 0,121 0,124 0,180 0,069 12 2,340 0,628 0,618 0,560 1,680 0,523 2,135 0,294
13 1,187 0,174 0,196 0,077 0,120 0,126 0,182 0,069 13 2,613 0,690 0,646 0,549 1,560 0,546 2,067 0,298
14 1,301 0,181 0,198 0,077 0,116 0,129 0,179 0,069 14 2,947 0,761 0,682 0,557 1,377 0,580 1,910 0,305
15 1,413 0,190 0,204 0,080 0,114 0,132 0,178 0,070 15 3,197 0,827 0,716 0,579 1,275 0,613 1,801 0,312
16 1,547 0,198 0,208 0,082 0,112 0,135 0,178 0,070 16 3,614 0,904 0,756 0,615 1,245 0,646 1,778 0,322
17 1,674 0,206 0,213 0,084 0,111 0,138 0,179 0,071 17 3,887 0,962 0,784 0,653 1,306 0,672 1,855 0,328
18 1,821 0,214 0,218 0,086 0,112 0,142 0,182 0,071 18 4,184 1,026 0,816 0,673 1,302 0,702 1,868 0,333
19 1,952 0,221 0,222 0,088 0,111 0,145 0,183 0,071 19 4,475 1,085 0,847 0,691 1,259 0,731 1,841 0,339
20 2,095 0,231 0,227 0,091 0,111 0,148 0,185 0,072 20 4,805 1,160 0,885 0,723 1,223 0,766 1,820 0,345

Abrupt parameter changes

n/N λ1 λ2 λ3 π1 π2 π3 π4 Adaptive n/N λ1 λ2 λ3 π1 π2 π3 π4 Adaptive


1 4.164 4.116 4.414 4.095 4.053 4.105 4.096 4.082 1 13.808 13.828 13.806 13.764 13.754 13.819 13.718 13.771
2 3.206 3.215 3.680 3.050 3.049 3.185 3.097 3.052 2 10.619 10.684 10.571 10.542 10.546 10.667 10.446 10.554
3 3.078 3.075 3.621 2.915 2.905 3.044 2.966 2.909 3 8.346 8.549 8.246 8.189 8.271 8.512 8.011 8.203
4 1.993 2.071 2.597 1.187 1.872 2.035 1.865 1.762 4 4.934 5.732 4.306 4.589 5.084 5.649 3.959 4.205
5 1.989 2.044 2.708 1.796 1.836 2.004 1.851 1.748 5 3.729 4.371 3.420 3.497 3.969 4.283 3.320 4.205
6 1.116 1.299 1.699 0.946 1.098 1.264 0.939 0.809 6 1.620 2.658 1.082 1.248 1.966 2.537 0.872 0.620
7 1.072 1.179 1.841 0.908 1.004 1.144 0.926 0.818 7 3.175 5.830 1.718 2.739 4.927 5.693 1.110 0.772
8 0.281 0.372 0.870 0.180 0.298 0.359 0.184 0.082 8 2.175 4.048 1.428 1.732 3.247 3.984 0.514 0.274
9 0.218 0.300 0.940 0.154 0.241 0.288 0.148 0.076 9 1.785 3.114 1.677 1.385 2.505 3.013 0.629 0.280
10 0.209 0.270 1.014 0.143 0.212 0.258 0.144 0.074 10 1.829 3.125 2.005 1.411 2.501 3.013 0.713 0.298
11 0.207 0.255 1.103 0.138 0.195 0.242 0.145 0.074 11 1.973 3.285 2.402 1.505 2.646 3.164 0.768 0.313
12 0.210 0.250 1.191 0.137 0.188 0.237 0.148 0.075 12 2.259 3.971 2.779 1.745 3.126 3.692 0.836 0.328
13 0.216 0.253 1.283 0.139 0.188 0.239 0.151 0.075 13 2.327 3.792 3.110 1.770 3.081 3.698 0.866 0.341
14 0.220 0.251 1.401 0.139 0.184 0.236 0.153 0.075 14 2.244 3.453 3.477 1.645 2.720 3.365 0.884 0.352
15 0.225 0.251 1.511 0.140 0.181 0.235 0.156 0.076 15 2.191 3.215 3.716 1.561 2.469 3.099 0.947 0.358
16 0.230 0.251 1.656 0.141 0.179 0.234 0.160 0.076 16 2.219 3.120 4.175 1.547 2.370 2.984 0.937 0.370
17 0.236 0.251 1.785 0.142 0.177 0.235 0.164 0.077 17 2.303 3.218 4.451 1.602 2.456 3.089 0.947 0.380
18 0.243 0.254 1.962 0.144 0.178 0.238 0.167 0.077 18 2.379 3.260 4.781 1.640 2.475 3.121 1.009 0.387
19 0.249 0.256 2.098 0.146 0.178 0.239 0.170 0.077 19 2.419 3.221 5.112 1.644 2.417 3.082 1.039 0.393
20 0.255 0.259 2.242 0.148 0.178 0.241 0.174 0.078 20 2.467 3.185 5.459 1.654 2.359 3.044 1.070 0.399

(π1 , . . . , π4 ) lattice algorithms (for different values of n), with the results yielded
by the proposed adaptive scheme (for different values of N ). Note that when the
model order is not underestimated (n, N ≥ 8) the algorithm with adaptive order and
memory assignment provides results that are uniformly the best, irrespective of the
choice of N .
Our second example shows the result of applying the proposed approach to the analysis of a real signal. Figure 3 shows the plots of five fragments of a speech signal (sampled at a rate of 22.05 kHz) and the corresponding estimates of the time-varying spectrum, obtained using the parallel estimation scheme described above (with the same settings).

Fig. 3 Five fragments of a speech signal (left figures) and the estimated time-varying spectra (right
figures)

Conclusion

A new noncausal (bidirectional) lattice filtering algorithm was designed for off-line identification of nonstationary autoregressive processes, and an adaptive mechanism was proposed for the dynamic selection of the number of estimated coefficients and the most appropriate estimation memory, matching the degree of process nonstationarity. It was shown that the proposed adaptive parallel estimation scheme outperforms the fixed-order, fixed-memory algorithms it is made up of.

References

1. Wada, T., Jinnouchi, M., Matsumura, Y.: Application of autoregressive modelling for the anal-
ysis of clinical and other biological data. Ann. Inst. Statist. Math. 40, 211–227 (1998)
2. Jirsa, V.K., McIntosh, A.R. (Eds.): Handbook of Brain Connectivity. Springer (2007)
3. Takalo, R., Hytti, H., Ihalainen, H., Sohlberg, A.: Adaptive autoregressive model for reduction
of noise in SPECT. Comp. Math. Methods in Medicine 2015(9) (2015)
4. Li, C., Nowack, R.L.: Application of autoregressive extrapolation to seismic tomography. Bull.
Seism. Soc. Amer. 94, 1456–1466 (2004)
5. Lesage, P., Glangeaud, F., Mars, J.: Applications of autoregressive models and time-frequency
analysis to the study of volcanic tremor and long-period events. J. Volc. Geotherm. Res. 114,
391–417 (2002)
6. Brillinger, D., Robinson, E.A., Schoenberg, F.P. (Eds.): Time Series Analysis and Applications
to Geophysical Systems. Springer (2012)
7. Baddour, K.E., Beaulieu, N.C.: Autoregressive models for fading channel simulation. IEEE
Trans. Wirel. Comm. 4, 1650–1662 (2005)
8. Hayes, J.F., Ganesh Babu, T.V.J.: Modeling and Analysis of Telecommunication Networks.
Wiley (2004)
9. Niedźwiecki, M.: Identification of Time-Varying Processes. Wiley (2000)
10. Niedźwiecki, M., Meller, M., Chojnacki, D.: Lattice filter based autoregressive spectrum esti-
mation with joint model order and estimation bandwidth adaptation. In: Proceedings 56th IEEE
Conference on Decision and Control, Melbourne, Australia, pp. 4618–4625 (2017)
11. Lee, D.T.L., Morf, M., Friedlander, B.: Recursive least squares ladder estimation algorithms.
IEEE Trans. Acoust. Speech Signal Process 29, 627–641 (1981)
12. Moulines, E., Priouret, P., Roueff, F.: On recursive estimation for time-varying autoregressive
processes. Ann. Statist. 33, 2610–2654 (2005)
13. Dahlhaus, R.: Locally stationary processes. Handbook Statist. 25, 1–37 (2012)
14. Friedlander, B.: Lattice filters for adaptive processing. Proc. IEEE 70, 829–867 (1982)
Normalized Entropy Aggregation
for Inhomogeneous Large-Scale Data

Maria da Conceição Costa and Pedro Macedo

Abstract It was already in the fifties of the last century that the relationship between
information theory, statistics and maximum entropy was established, following the
works of Kullback, Leibler, Lindley and Jaynes. However, the applications were
restricted to very specific domains and it was not until recently that the convergence
between information processing, data analysis and inference demanded the founda-
tion of a new scientific area, commonly referred to as Info-Metrics [1, 2]. As a huge
amount of information and large-scale data have become available, the term “big
data” has been used to refer to the many kinds of challenges presented in its analy-
sis: many observations, many variables (or both), limited computational resources,
different time regimes or multiple sources. In this work, we consider one particular
aspect of big data analysis which is the presence of inhomogeneities, compromis-
ing the use of the classical framework in regression modelling. A new approach is
proposed, based on the introduction of the concepts of info-metrics to the analysis
of inhomogeneous large-scale data. The framework of information-theoretic estima-
tion methods is presented, along with some information measures. In particular, the
normalized entropy is tested in aggregation procedures and some simulation results
are presented.

Keywords Big data · Info-Metrics · Maximum entropy · Normalized entropy

M. da Conceição Costa (B) · P. Macedo


Department of Mathematics and CIDMA – Center for Research
and Development in Mathematics and Applications, University of Aveiro,
3810-193 Aveiro, Portugal
e-mail: [email protected]
URL: http://www.ua.pt
P. Macedo
e-mail: [email protected]


Introduction

Inference and processing of limited information remain among the most fascinating universal problems. As stated by Golan [2], a very recent publication, "[...]
the available information is most often insufficient to provide a unique answer or
solution for most interesting decisions or inferences we wish to make. In fact, insuf-
ficient information—including limited, incomplete, complex, noisy and uncertain
information—is the norm for most problems across all disciplines.” Also, regardless
of the system or question studied, any researcher observes only a certain amount of
information or evidence and optimal inference must take into account the relationship
between the observable and the unobservable, [3].
Info-Metrics is a constrained optimization framework for information processing,
modelling and inference with finite, noisy or incomplete information. It is at the inter-
section of information theory, statistical methods of inference, applied mathematics,
computer science, econometrics, complexity theory, decision analysis, modelling
and the philosophy of science, [2].
As Info-Metrics generalizes the Maximum Entropy (ME) principle by Jaynes
[4, 5], which in turn relies on the maximization of Shannon’s entropy, the notions
of information, uncertainty and entropy are fundamental to the understanding of the
methodologies involved. Each scientist and discipline have their own interpretation
and definition of information within the context of their research and understanding
but, in the context of Info-Metrics, it refers to the meaningful content of data, its context and interpretation, and how to transfer data from one entity to another. As
for uncertainty, it arises from a proposition or a set of possible outcomes where
none of the choices or outcomes is known with certainty (a proposition is uncertain
if it is consistent with knowledge but not implied by knowledge). Therefore, these
outcomes are represented by a certain probability distribution. The more uniform the
distribution, the higher the uncertainty that is associated with this set of propositions
or outcomes. Finally, the concept of entropy reflects what, on average, we expect to
learn from observations and it depends on how we measure information. Technically,
entropy is a measure of uncertainty of a single random variable. As such, entropy
can be viewed as a measure of uniformity.
For a brief discussion of entropy, let us consider the set A = {a1 , a2 , . . . , a K }
to be a finite set and p a proper probability mass function on A. The amount of
information needed to fully characterize all of the elements of this set consisting of
K discrete elements is given by Hartley's formula, I(A_K) = \log_2 K. Shannon's information content of an outcome a_k is h(a_k) = h(p_k) ≡ \log_2(1/p_k). Shannon's entropy reflects the expected information content of an outcome and is defined as


H(p) \equiv \sum_{k=1}^{K} p_k \log_2 \frac{1}{p_k} = -\sum_{k=1}^{K} p_k \log_2 p_k = E\left[\log_2 \frac{1}{p(X)}\right],    (1)

for the random variable X. This information criterion, expressed in bits, measures the uncertainty of X that is implied by p. The entropy measure H(p) reaches a maximum when p_1 = p_2 = · · · = p_K = 1/K and a minimum with a point mass function. The entropy H(p) is a function of the probability distribution p and not a function of the actual values taken by the random variable.
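For readers who want to experiment with these quantities, the following minimal sketch (Python/NumPy; the function names are ours, added for illustration and not part of the original text) computes Shannon's entropy, the normalized entropy and the information index of a discrete distribution.

```python
import numpy as np

def shannon_entropy(p, base=2.0):
    """Shannon entropy H(p) = -sum_k p_k log(p_k); zero probabilities contribute 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p)) / np.log(base)

def normalized_entropy(p, base=2.0):
    """S(p) = H(p) / log(K): 1 for the uniform distribution, 0 for a point mass."""
    K = len(p)
    return shannon_entropy(p, base) / (np.log(K) / np.log(base))

def information_index(p, base=2.0):
    """II = 1 - S(p): the reduction in uncertainty relative to the uniform case."""
    return 1.0 - normalized_entropy(p, base)

# The uniform distribution gives S(p) = 1; a nearly degenerate one is close to 0.
print(normalized_entropy([0.25, 0.25, 0.25, 0.25]))   # 1.0
print(normalized_entropy([0.97, 0.01, 0.01, 0.01]))   # close to 0
```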
The remainder of the paper is laid out as follows: in section “Generalized
Maximum Entropy Estimator”, maximum entropy and generalized maximum entropy
estimation are briefly discussed. Section “Large-Scale Data and Aggregation” illus-
trates some traditional aggregation procedures and a new proposal based on normal-
ized entropy. Section “Simulation Study” presents simulation results. Some conclu-
sions and topics for future research are given in section “Concluding Remarks”.

Generalized Maximum Entropy Estimator

The ME principle was discussed by Golan et al. [6], in order to develop analytical
and empirical methods for recovering the unobservable parameters of a pure linear
inverse problem. Considering then

y = X p, (2)

where y is the vector (N × 1) of observations, X is a non-invertible matrix (N × K )


with N < K , and p is the vector (K × 1) of unknown probabilities, the ME principle
consists in choosing p that maximizes Shannon’s entropy


H(p) = -\sum_{k=1}^{K} p_k \ln p_k = -p' \ln p,    (3)

subject to the data consistency restriction, y = Xp, and the additivity restriction, p'1 = 1. Formally, the ME estimator is given by

\arg\max_{p} \{ -p' \ln p \},    (4)

subject to the model consistency and additivity constraints,

y = Xp,   1'p = 1.    (5)

There is no closed-form analytical solution, but a numerical approximation can be


obtained using the Lagrange multipliers. It can be said that the Jaynes maximum
entropy formalism has enabled us to solve the pure inverse problem with this opti-
mization (maximization) procedure, regarding it as an inference problem. The ME
principle is the basis for transforming the information in the data into a probabilistic
distribution that reflects our uncertainty about individual outcomes.
To extend the ME estimator to the linear regression model represented by

y = Xβ + e, (6)

where, as usual, y denotes a (N × 1) vector of noisy observations, β is a


(K × 1) vector of unknown parameters, X is a known (N × K ) matrix of explana-
tory variables, and e is the (N × 1) vector of random disturbances (errors), Golan
et al. [6], considered each βk as a discrete random variable with a compact support
and M ≥ 2 possible outcomes and each en as a finite and discrete random variable
with J ≥ 2 possible outcomes. The error vector is considered here as another vector
of unknown parameters to be estimated simultaneously with the vector β. In this
context, the linear regression model is represented as

y = X Z p + V w, (7)

where

\beta = Zp = \begin{bmatrix} z'_1 & 0 & \cdots & 0 \\ 0 & z'_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & z'_K \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_K \end{bmatrix},    (8)

and

e = Vw = \begin{bmatrix} v'_1 & 0 & \cdots & 0 \\ 0 & v'_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & v'_N \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{bmatrix}.    (9)

Matrices Z (K × K M) and V (N × N J ) are the matrices of support values and


vectors p (K M × 1) and w (N J × 1) are the vectors of unknown probabilities to
be estimated. Note that each βk , k = 1, 2, . . . , K , and each en , n = 1, 2, . . . , N ,
are viewed as expected values of discrete random variables z k and v n , respectively,
with M ≥ 2 and J ≥ 2 possible outcomes, within the lower and upper bounds of
the corresponding support spaces. Thus, the generalized maximum entropy (GME)
estimator is given by

\arg\max_{p, w} \{ -p' \ln p - w' \ln w \},    (10)

subject to the consistency (with the model) and additivity (for p and w) constraints,

\begin{cases} y = XZp + Vw, \\ 1_K = (I_K \otimes 1'_M)\, p, \\ 1_N = (I_N \otimes 1'_J)\, w, \end{cases}    (11)

where ⊗ represents the Kronecker product. The optimal probability vectors, p̂ and ŵ, are used to obtain point estimates of the unknown parameters and the unknown errors with β̂ = Z p̂ and ê = V ŵ. Some properties of the GME estimator, such as consistency and asymptotic normality, are discussed in detail, for example, in Mittelhammer et al. [7].
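As an illustration of how the optimization problem (10)–(11) can be solved numerically, the following small-scale sketch (Python with NumPy/SciPy; written for this exposition, not taken from the authors' code, and all variable names are ours) maximizes the joint entropy of p and w subject to the consistency and additivity constraints with a general-purpose constrained optimizer. For realistic problem sizes, dedicated formulations (for instance, the dual unconstrained version discussed in the Info-Metrics literature) are far more efficient.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Tiny illustrative problem: N observations, K parameters.
N, K, M, J = 30, 3, 5, 3
X = rng.normal(size=(N, K))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=N)

z = np.linspace(-10, 10, M)                # common support for every beta_k (assumed bounds)
v = 3 * y.std() * np.linspace(-1, 1, J)    # three-sigma rule for the error supports
Z = np.kron(np.eye(K), z)                  # K x KM block-diagonal support matrix
V = np.kron(np.eye(N), v)                  # N x NJ block-diagonal support matrix

def neg_entropy(q):
    p, w = q[:K*M], q[K*M:]
    p = np.clip(p, 1e-12, 1.0)
    w = np.clip(w, 1e-12, 1.0)
    return np.sum(p * np.log(p)) + np.sum(w * np.log(w))

cons = [
    {"type": "eq", "fun": lambda q: y - X @ Z @ q[:K*M] - V @ q[K*M:]},        # consistency
    {"type": "eq", "fun": lambda q: q[:K*M].reshape(K, M).sum(axis=1) - 1.0},  # additivity for p
    {"type": "eq", "fun": lambda q: q[K*M:].reshape(N, J).sum(axis=1) - 1.0},  # additivity for w
]
q0 = np.concatenate([np.full(K*M, 1.0/M), np.full(N*J, 1.0/J)])
res = minimize(neg_entropy, q0, bounds=[(0.0, 1.0)] * (K*M + N*J),
               constraints=cons, method="SLSQP", options={"maxiter": 500})

p_hat = res.x[:K*M]
beta_gme = Z @ p_hat                        # point estimates of beta
print(beta_gme)
```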

Large-Scale Data and Aggregation

Large-scale data or big data usually refers to datasets that are large in different
ways: many observations, many variables (or both); observations are recorded in
different time regimes or are taken from multiple sources. Some difficult issues arise
in dealing with this kind of data like, for instance, retaining optimal (or, at least,
reasonably good) statistical properties with a computationally efficient analysis; or
dealing with inhomogeneous data that does not fit in the classical framework: data is
neither i.i.d. (exhibiting outliers or not belonging to the same distribution) nor stationary
(time-varying effects may be present).
Standard statistical models (linear or generalized linear models for regression or
classification) fail to capture inhomogeneity structure in data, compromising estima-
tion and interpretation of model parameters, and, of course, prediction. On the other
hand, statistical approaches for dealing with inhomogeneous data (such as varying-
coefficient models, mixed effects models, mixture models or clusterwise regression
models) are typically very computationally cumbersome.
If heterogeneity in the data is ignored, the computational burden can be addressed with the following procedure [8]: firstly, construct G groups from the large-scale data (groups may be overlapping and may not cover all observations in the sample); then, for each group compute an estimator, β̂_g, through standard techniques (e.g., OLS, ridge or LASSO); finally, considering the ensemble of estimators, aggregate them into a single estimator, β̂.

Traditional Aggregation Procedures

Several aggregation procedures have already been proposed in the literature. Three of them are presented next.
1. Bagging: this procedure results in less computational complexity and even allows
for parallel computing. It simply averages the ensemble estimators with equal
weight to obtain the aggregated estimator, [8, 9]:


\hat{\beta} := \sum_{g=1}^{G} w_g \hat{\beta}_g,    (12)

where w_g = 1/G for all g = 1, 2, . . . , G. The estimates β̂_g are obtained from bootstrap samples, where the groups are sampled with replacement from the

whole data. It is a simple procedure and the weights do not depend on the
response y, but it is not suitable for inhomogeneous data.
2. Stacking: instead of assigning a uniform weight to each estimator, [10, 11]
proposed the aggregated estimator


\hat{\beta} := \sum_{g=1}^{G} w_g \hat{\beta}_g,    (13)

where

\hat{w} := \arg\min_{w \in W} \Big\| y - \sum_{g=1}^{G} w_g \hat{y}_g \Big\|_2,    (14)

and, using a ridge constraint, W = {w : ‖w‖_2 ≤ s} for some s > 0, or using a sign constraint, W = {w : min_g w_g ≥ 0}, or using a convex constraint, W = {w : min_g w_g ≥ 0 and \sum_{g=1}^{G} w_g = 1}. The idea is to find the optimal linear or convex combination of all ensemble estimators, but it is also not suitable for inhomogeneous data.
3. Magging: corresponds to maximizing the minimally “explained variance” among
all data groups, [8], such that


\hat{\beta} := \sum_{g=1}^{G} w_g \hat{\beta}_g,    (15)

where

\hat{w} := \arg\min_{w \in W} \Big\| \sum_{g=1}^{G} w_g \hat{y}_g \Big\|_2,    (16)

and W = {w : min_g w_g ≥ 0 and \sum_{g=1}^{G} w_g = 1}. The idea is to choose the weights as a convex combination that minimizes the ‖·‖_2 norm of the fitted values ŷ. If the solution is not unique, the solution with the lowest ‖·‖_2 norm of the weight vector among all solutions is considered. This procedure was the first that we are aware of that was proposed for heterogeneous data. The main idea is that if an effect is common across all groups, then it cannot be "averaged away" by searching for a specific combination of the weights. The common effects will be present in all groups and will be retained even after the minimization of the aggregation scheme (a small computational sketch of these three weighting schemes is given right after this list).
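The weighting schemes above can be written down in a few lines. The sketch below (Python with NumPy/SciPy; our own illustration, with hypothetical variable names) computes Bagging weights, a convex-constrained version of Stacking, and Magging weights from a collection of group fitted values ŷ_g and group estimates β̂_g.

```python
import numpy as np
from scipy.optimize import minimize

def bagging_weights(G):
    """Equal weights 1/G, as in (12)."""
    return np.full(G, 1.0 / G)

def _convex_argmin(target, Yhat):
    """Minimize ||target - Yhat @ w||_2 over the simplex {w >= 0, sum(w) = 1}."""
    G = Yhat.shape[1]
    obj = lambda w: np.linalg.norm(target - Yhat @ w)
    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
    res = minimize(obj, np.full(G, 1.0 / G), bounds=[(0.0, 1.0)] * G,
                   constraints=cons, method="SLSQP")
    return res.x

def stacking_weights(y, Yhat):
    # Convex-constraint version of (14): fit y by a convex combination of the y_g.
    return _convex_argmin(y, Yhat)

def magging_weights(Yhat):
    # (15)-(16): minimize the norm of the combined fitted values themselves.
    return _convex_argmin(np.zeros(Yhat.shape[0]), Yhat)

def aggregate(B_hat, w):
    """B_hat: (G x K) matrix of group estimates (one row per group); returns sum_g w_g beta_g."""
    return B_hat.T @ w
```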
We believe the question as to whether the effects are really common across all
groups may not be answered straightforwardly. If the groups carry information about
the whole dataset and there are inhomogeneities, why should we consider that, with
random sub-sampling, all groups are equally informative?

These considerations led us to the idea of choosing the groups according to their
“information content”.

Proposed Aggregation Procedure

To measure the information content in a system and to measure the importance of the
contribution of each piece of data or constraint in reducing uncertainty, Golan et al.
[6], stated that, in the ME formulation, the maximum level of entropy-uncertainty
results when the information-moment constraints are not enforced and the distribu-
tion of probabilities over the K states is uniform. As each piece of effective data is
added, there is a departure from the uniform distribution, which implies a reduction
of uncertainty. The proportion of the remaining total uncertainty is measured by the
normalized entropy (NE),

S(\hat{p}) = -\frac{\sum_k \hat{p}_k \ln \hat{p}_k}{\ln(K)},    (17)

where S(p̂) ∈ [0, 1] and ln(K) represents maximum uncertainty (the entropy level of the uniform distribution with K outcomes). A value S(p̂) = 0 implies no uncertainty and a value S(p̂) = 1 implies perfect uncertainty. Related to the normalized entropy, the information index (II) is defined as 1 − S(p̂) and measures the reduction in uncertainty.
In this work, we propose a new aggregation scheme that is based on identifying
the information content of a given group through the calculation of the normalized
entropy. The proposed NE aggregated estimator is then


\hat{\beta} := \sum_{g=1}^{G} w_g \hat{\beta}_g,    (18)

where w_g is defined by normalized entropy using GME,

S(\hat{p})_g = \frac{-\hat{p}' \ln \hat{p}}{K \ln M},    (19)

for the signal, Xβ, such that \sum_{g=1}^{G} w_g = 1. This aggregation procedure is a weighted
average of the collection of regression coefficient estimates as in Bagging, Stacking
and Magging. The idea is almost as simple as Bagging and it is expected to provide
similar results if the data is homogeneous. However, since the weights in (18) will
depend on the information content of each group according to (19), or some function
of it, the weights will be, in general, non-uniform (as in Stacking and Magging) if
the data is inhomogeneous.
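A minimal sketch of the NE aggregation (Python/NumPy; our own illustration, under the assumption, made explicit here, that the weights are taken proportional to the information index II_g = 1 − S(p̂)_g and rescaled to sum to one) is:

```python
import numpy as np

def normalized_entropy_signal(p_hat, K, M):
    """S(p)_g = -p' ln p / (K ln M) for the (K*M)-vector of GME signal probabilities."""
    p = np.clip(np.asarray(p_hat, dtype=float), 1e-12, 1.0)
    return -np.sum(p * np.log(p)) / (K * np.log(M))

def ne_aggregate(beta_hats, p_hats, K, M):
    """beta_hats: list of G group GME estimates; p_hats: their probability vectors."""
    S = np.array([normalized_entropy_signal(p, K, M) for p in p_hats])
    II = 1.0 - S                      # information index of each group
    w = II / II.sum()                 # assumed: weights proportional to II, summing to 1
    beta_agg = sum(wg * bg for wg, bg in zip(w, beta_hats))
    return beta_agg, w
```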

The following section reports some simulated situations for which the NE aggregated estimator was calculated and compared to the aggregated estimator based on Bagging.

Simulation Study

A linear regression model was considered, where X is the simulated matrix of


explanatory variables, drawn randomly from normal distributions; β is a vector of
parameters, e is the vector of random disturbances, drawn randomly from normal dis-
tributions and y is the constructed vector of noisy observations. For this simulation,
β was considered as

β = [1.8, 1.2, −1.4, 1.6, −1.8, 2.0, −2.0, 0.2, −0.4, 0.6, 0.8].    (20)
Necessary reparameterizations were done considering M = 5 and J = 3 and dif-
ferent matrices Z containing the supports for the parameters. The support matrix
V containing the supports for the errors was set considering symmetric and zero-
centred supports using the three-sigma rule with the empirical standard deviation of
the noisy observations.
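For concreteness, the reparameterization used here can be set up as follows (Python/NumPy sketch written for this text; the symmetric support bounds and the three-sigma rule follow the description above):

```python
import numpy as np

def build_supports(y, K, M=5, J=3, z_bound=10.0):
    """Return the support matrices Z (K x KM) and V (N x NJ) of model (7)."""
    N = len(y)
    z = np.linspace(-z_bound, z_bound, M)             # symmetric support for each beta_k
    v = 3.0 * np.std(y) * np.linspace(-1.0, 1.0, J)   # three-sigma rule, zero-centred
    Z = np.kron(np.eye(K), z)
    V = np.kron(np.eye(N), v)
    return Z, V
```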
Simulations were done considering X a (20000 × 11) matrix; β a (11 × 1) vec-
tor; e a (20000 × 1) vector and y a (20000 × 1) vector. The error distribution was
considered to be normal, with mean value zero and standard deviation five. Sev-
eral matrices X of explanatory variables were simulated, corresponding to different
condition numbers (c.n.).¹ Random sub-sampling with replacement was done considering different numbers of groups and 50 observations per group. The Euclidean norm of the difference between the aggregated estimator β̂ and the true parameter β, ‖β̂ − β‖₂, is calculated for each simulated case and the results are given in Tables 1, 2, 3, 4 and 5. For each case, three different solutions are presented, namely,
1. NE1: the chosen β̂ corresponds to the GME estimate for the group with the lowest normalized entropy (NE). This solution does not correspond, in fact, to an aggregated estimator; it corresponds to a chosen estimate amongst all groups;
2. NE2: the chosen β̂ corresponds to the weighted average of the GME estimates of all groups, weighted by the information index, II, where II = 1 − NE;
3. Bgg: the β̂ chosen corresponds to Bagging (average of the OLS estimates of all groups).²
The present results are intended to highlight the overall tendencies we encountered
in the simulation study. Many other situations were simulated, with many different
matrices of explanatory variables, X, corresponding to a wide range of variation
regarding the matrix condition number, which, as is well known, is related to the

¹ Ratio of the largest singular value of X to the smallest singular value.
² The case of a single learning set, as in [9], with repeated bootstrap samples taken from it, is not considered here.

Table 1 Euclidean norm of the difference β̂ − β, with z_k = [−10, 10]


n.g. Solution c.n. = 1337
5 NE1 4.26
NE2 4.18
Bgg 181.23

Table 2 Euclidean norm of the difference β̂ − β, with z_k = [−10, 10]


n.g. Solution c.n. = 43030
5 NE1 4.22
NE2 4.25
Bgg 1432.59

Table 3 Euclidean norm of the difference β̂ − β, with z_k = [−10, 10]


n.g. Solution c.n. = 1337
5 NE1 4.26
NE2 4.18
Bgg 181.23
10 NE1 4.47
NE2 4.31
Bgg 171.22
50 NE1 4.45
NE2 4.30
Bgg 49.36
100 NE1 5.48
NE2 4.34
Bgg 38.74

Table 4 Euclidean norm of the difference β̂ − β, with z_k = [−100, 100]


n.g. Solution c.n. = 1337
5 NE1 32.31
NE2 10.17
Bgg 214.56

Table 5 Euclidean norm of the difference β̂ − β, with z_k = [−100, 100]


n.g. Solution c.n. = 1337 c.n. = 43030
5 NE1 32.31 35.54
NE2 10.17 15.59
Bgg 214.56 5440.47

presence of collinearity³ in the explanatory variables. In this paper, only two extreme cases were chosen to be presented, the first one corresponding to a relatively small condition number (c.n. around 1300) and the second one corresponding to a much higher condition number (c.n. around 43000).
It can be concluded that, for both cases, ‖β̂ − β‖₂ is much lower for any of the normalized entropy methodologies, when compared to Bagging, as can be seen from any of the Tables 1, 2, 3, 4 and 5.
In the comparison of Tables 1 and 2, the same number of groups (n.g. = 5) and the same support vectors for the parameters (z_k = [−10, 10]) were considered. The higher condition number in Table 2 results in a much higher ‖β̂ − β‖₂ for the Bagging procedure, whereas the normalized entropy methodologies behave in the same way as with the much lower condition number, revealing that the presence of collinearity does not seem to compromise the results provided by the normalized entropy aggregation procedures. Since the GME estimator is appropriate in the estimation of ill-posed models, including models with ill-conditioned design matrices, these results are not surprising.
Considering Table 3, the analysis was done changing the number of groups in the aggregation. The Bagging procedure tends to provide better results in terms of lower ‖β̂ − β‖₂ as the number of groups rises. This observation does not come as a surprise due to sampling and inferential statistics theory. The normalized entropy methodologies do not seem to follow this behaviour, as the ‖β̂ − β‖₂ remains approximately constant as the number of groups gets higher. This may be considered an advantage of this aggregation procedure, since there is no need for bigger data sets (and consequent higher computational burden) in order to have comparable results in terms of precision.
Finally, Tables 4 and 5 refer to the effect of changing the amplitude of the support vectors, z_k. It can be seen that as the support vector z_k changes from [−10, 10], in Table 1, to [−100, 100], in Table 4, all aggregation procedures provide worse results in terms of ‖β̂ − β‖₂. Widening the amplitude of the support vectors results in a less informative probability distribution for the parameters, which should lead to a smaller departure from total uncertainty as compared to the situation where the support vectors are less wide. It is expected, then, that the normalized entropy methodologies provide better results when the amplitude of the support vectors is smaller. The results of the simulation study are in agreement with this interpretation. Nevertheless, when the same analysis is done considering a matrix of explanatory variables X with a higher condition number, as presented in Table 5, even though the normalized entropy methodologies provide worse results, as already discussed, the Bagging procedure provides even worse results: while ‖β̂ − β‖₂ changes from 4.25 to 15.59 for the information index weighted average of the GME estimates (solution NE2), the corresponding change for the Bagging procedure is from 1432.59 (which is already a very poor value concerning the precision of the estimates) to 5440.47.

³ The concept is not used here in a literal sense. A discussion about similar notions of this concept is available in Belsley et al. [12, pp. 85–98].

Concluding Remarks

The idea of an aggregation procedure based on normalized entropy is promising as


it is clear from the simulation study that this approach provides very satisfactory solutions. The normalized entropy methodologies, in particular the aggregation procedure based on weighting the groups by the information index, always result in a ‖β̂ − β‖₂ much lower than the one obtained with Bagging. This discrepancy tends to worsen in the presence of high collinearity, as is the case when the explanatory variables matrices, X, have high condition numbers. On the other hand, the use of more groups in the aggregation scheme does not seem to improve the overall quality of the estimates obtained through the normalized entropy methodologies, which turns out to be an advantage of this procedure. These observations suggest that a further and thorough simulation analysis with different error structures or severe inhomogeneities may reveal substantial differences between normalized entropy aggregation schemes and Bagging, possibly penalizing the latter. These analyses will be conducted in future work, along with the investigation of other scenarios, such as the detection of zero coefficients, non-normal regressors and other violations of the classical framework. Also, the comparison with Magging is a very important analysis that remains to be explored.

Acknowledgements This research was supported by the Portuguese national funding agency for
science, research and technology (FCT), within the Center for Research and Development in Math-
ematics and Applications (CIDMA), project UID/MAT/04106/2019.

References

1. Golan, A.: On the state of art of Info-Metrics. In: Huynh, V.N., Kreinovich, V., Sriboonchitta, S.,
Suriya, K. (Eds.) Uncertainty Analysis in Econometrics with Applications, pp. 3–15. Springer,
Berlin (2013)
2. Golan, A.: Foundations of Info-Metrics: Modeling, Inference, and Imperfect Information.
Oxford University Press, New York (2018)
3. Golan, A.: On the foundations and philosophy of Info-Metrics. In: Cooper, S.B., Dawar, A.,
Lowe, B.L. (Eds.) CiE2012. LNCS, vol. 7318, pp. 238–245. Springer, Heidelberg (2012)
4. Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106, 620–630 (1957)
5. Jaynes, E.T.: Information theory and statistical mechanics II. Phys. Rev. 108, 171–190 (1957)
6. Golan, A., Judge, G., Miller, D.: Maximum Entropy Econometrics—Robust Estimation with
Limited Data. Wiley, Chichester (1996)
7. Mittelhammer, R., Cardell, N.S., Marsh, T.L.: The Data-constrained generalized maximum
entropy estimator of the GLM: asymptotic theory and inference. Entropy 15, 1756–1775 (2013)
8. Bühlmann, P., Meinshausen, N.: Magging: maximin aggregation for inhomogeneous large-
scale data. In: Proceedings of the IEEE 104 (1): Big Data: Theoretical Aspects, pp. 126–135.
IEEE Press, New York (2016)
9. Breiman, L.: Bagging predictors. Mach. Learn. 24, 123–140 (1996)
10. Wolpert, D.: Stacked generalization. Neural Netw. 5, 241–259 (1992)
11. Breiman, L.: Stacked regressions. Mach. Learn. 24, 49–64 (1996b)
12. Belsley, D.A., Kuh, E., Welsch, R.E.: Regression Diagnostics—Identifying Influential Data
and Sources of Collinearity. Wiley, Hoboken, New Jersey (2004)
Modified Granger Causality in Selected
Neighborhoods

Martina Chvosteková

Abstract Although Granger causality is a widely used technique to detect the causal
relationship between time series, its direct application for nonlinearly modeled data
is not appropriate. There have been proposed several extensions to nonlinear cases,
but there is no method appropriate for detecting relations between time series in
general. We present a new measure for evaluation of a causal effect between two
time series, which is calculated on the selected local approximations of time-delay
embedding reconstruction of state space by a linear regression model. The novel
causal measure, called the modified Granger causality in selected neighborhoods
(MGCiSN), reflects the proportion of the explained variation of the modeled variable
by the past of the second variable only. The proposed procedure for evaluating the
direct causal link between two nonlinearly modeled time series is applied to four
data sets with different known nonlinear causal structures. Our experimental results
support that the MGCiSN correctly detects underlying causal relationship in many
cases and does not detect false causality, regardless of the number of samples.

Keywords Granger causality · Time-delay embedding reconstruction · Linear


regression model · Prediction error

Introduction

Assessing the presence of directional interactions between simultaneously recor-


ded variables is an essential issue in diverse areas including finance, neuroscience,
sociology, and others. The Granger causality [7] has become the most popular tool to identify the relationship between two time series corresponding to the recorded variables
due to its computational simplicity. A variable X is said to Granger-cause another
variable Y if the prediction error of Y in a linear regression model including its own
past values and the past values of X as predictors is less (in some suitable sense)

M. Chvosteková (B)
Institute of Measurement Science of Slovak Academy of Sciences, 9 Dúbravská cesta, 84104
Bratislava, Slovakia
e-mail: [email protected]


than the prediction error of Y in a linear regression model including only its own past
values. X Granger-causes Y means that the variable X was found to be helpful for forecasting the variable Y. So, the notion of Granger causality implies predictability and precedence, but it does not imply true causality. Note that the detection approach can theoretically be used also for time series not actually generated by a linear regression model [1]. The most important issue in using Granger causality analysis is the verification that the time series data can be modeled by a stochastic linear regressive scheme. If the model assumption is not satisfied, the result of the commonly used F-test (or Wald test) for Granger causality is not valid.
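To make the classical construction concrete, the following sketch (Python with NumPy/SciPy; our own illustration, not code from the paper) fits the restricted and full least-squares models for a chosen lag order p and computes the usual F-statistic for "x Granger-causes y".

```python
import numpy as np
from scipy.stats import f as f_dist

def granger_f_test(x, y, p):
    """F-test of 'x Granger-causes y' with lag order p, via restricted vs. full OLS."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    target = y[p:]
    past_y = np.column_stack([y[p - j:n - j] for j in range(1, p + 1)])
    past_x = np.column_stack([x[p - j:n - j] for j in range(1, p + 1)])
    ones = np.ones((n - p, 1))

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return resid @ resid

    rss_r = rss(np.hstack([ones, past_y]))           # restricted model (past of y only)
    rss_f = rss(np.hstack([ones, past_y, past_x]))   # full model (past of y and x)
    k_full = 1 + 2 * p                               # parameters in the full model
    df2 = (n - p) - k_full
    F = ((rss_r - rss_f) / p) / (rss_f / df2)
    p_value = f_dist.sf(F, p, df2)
    return F, p_value
```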
Extensions of the Granger causality to nonlinear cases have been explored, e.g.,
in [2, 12, 13]. Also, the causality detection methods that do not directly follow the
traditional Granger causality methodology have been proposed, e.g., [10, 15, 16].
The results from a recently published rigorous comparison study of various causality detection methods indicate that most methods have extremely low specificity (they produce false detections of causality) and that there is no rule for choosing the appropriate method for particular data. Furthermore, extensive computations and the lack of a straightforward interpretation of the numeric results decrease the usefulness of some of these procedures. It can be concluded that identifying a causal relationship among simultaneously acquired processes is still not a satisfactorily closed issue, even for the bivariate case.
In this work, we present a new method for quantifying the causal structure for non-
linearly modeled bivariate time series. The proposed procedure, called the modified
Granger causality in selected neighborhoods (MGCiSN), is based on a local approx-
imation of the reconstructed dynamics by a linear regression model. The causality
detection methods developed on a similar approach have appeared in [2, 6]. The
presented procedure differs from the mentioned methods in the way the local neighborhoods are selected and, most importantly, in the quantity used for exploring the relationship between the time series. In our procedure, the direct causal influence of X on Y is evaluated by determining the proportion of the variation of the variable Y that is predicted by the past of the variable X only. These proportions are calculated only on those local neighborhoods where the linear regression model fits the reconstructed joint dynamics. The goodness of fit in the neighborhoods is assessed by the coefficient of determination, R-squared (R²). Finally, the MGCiSN is obtained
as the average of these proportions from suitable neighborhoods over the attractor.
We examined the suggested measure on numerical nonlinear time series with known nonlinear asymmetric dependencies. Four artificially generated data sets were analyzed: two unidirectionally coupled nonidentical Hénon maps, two bidirectionally coupled nonidentical Hénon maps, and two systems composed of unidirectionally coupled, different nonlinear chaotic maps.

Methodology

Consider two variables X and Y , represented by the stationary time series x and
y, respectively. The classical Granger causality starts with fitting the time series by a bivariate autoregressive model (VAR(p)), where the Akaike information criterion (AIC) or the Schwarz Bayesian information criterion (BIC) is usually used to determine the order p. In this study, the numbers of predictors included in the modeling are chosen by employing Takens' time-delay embedding [17]. The state space X̃ corresponding to the time series x is reconstructed by the following time-delayed embedding vector

x^{m_x, \tau_x}(t) = (x(t), x(t - \tau_x), \ldots, x(t - (m_x - 1)\tau_x))^T,    (1)

where m_x is the embedding dimension and τ_x is the time delay. Similarly, the time-delayed embedding vector reconstructing the state space Ỹ of the time series y is defined as

y^{m_y, \tau_y}(t) = (y(t), y(t - \tau_y), \ldots, y(t - (m_y - 1)\tau_y))^T.    (2)

The most common practice to determine the reconstruction parameters is to take the
delay as the first minimum of the mutual information between the delayed compo-
nents [5] and the embedding dimension is estimated by the false nearest neighbor
technique [8]. For investigating the Granger causality, the time delays must be equal,
therefore we will use τx = τ y = τ in the following text. Now, the embedding dimen-
sions m x and m y will determine the number of lagged observations of X and Y in a
linear regression model, respectively.
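A delay-embedding matrix of the form (1)–(2) can be built directly; the helper below (Python/NumPy, our own sketch, not from the paper) returns the embedded points together with their time indices.

```python
import numpy as np

def delay_embedding(x, m, tau):
    """Rows are the vectors x^{m,tau}(t) = (x(t), x(t-tau), ..., x(t-(m-1)*tau))."""
    x = np.asarray(x, dtype=float)
    start = (m - 1) * tau
    t_idx = np.arange(start, len(x))
    cols = [x[t_idx - j * tau] for j in range(m)]
    return np.column_stack(cols), t_idx

# For the joint space (3), embed x and y separately and align the rows on a
# common range of time indices before stacking them side by side.
```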
Let us express the delay vector of the joint state space of X̃ and Ỹ as

z^{m_x, m_y, \tau}(t) = (x^{m_x, \tau}(t)^T, y^{m_y, \tau}(t)^T)^T,    (3)

where z^{m_x, m_y, τ}(t) is a point in the (m_x + m_y)-dimensional reconstructed state space Z̃. It is supposed that the joint dynamics can be locally approximated by a linear map, written as z(t + τ) = A z(t) + ε(t), where A is a (m_x + m_y) × (m_x + m_y) coefficient matrix and ε(t) is the error vector. In the next step of our procedure, the neighborhoods in Z̃ suitable for a local linear approximation are selected.

Selection of Local Neighborhoods

Let z^{m_x, m_y, τ}(t_{[0]}) be a point in the joint state space Z̃ and z^{m_x, m_y, τ}(t_{[1]}), z^{m_x, m_y, τ}(t_{[2]}), . . . , z^{m_x, m_y, τ}(t_{[k]}) be its k nearest neighbors. For all k + 1 points, fit the full linear regression models of the following form

x(t_{[i]} + \tau) = c_{xy} + \sum_{j=0}^{m_x - 1} a_{xx}^{(j)} x(t_{[i]} - j\tau) + \sum_{j=0}^{m_y - 1} a_{xy}^{(j)} y(t_{[i]} - j\tau) + \varepsilon_{xy}(t_{[i]}),    (4)

y(t_{[i]} + \tau) = c_{yx} + \sum_{j=0}^{m_x - 1} a_{yx}^{(j)} x(t_{[i]} - j\tau) + \sum_{j=0}^{m_y - 1} a_{yy}^{(j)} y(t_{[i]} - j\tau) + \varepsilon_{yx}(t_{[i]}),    (5)

where ε_{xy}(t), ε_{yx}(t) are the prediction error terms and their magnitudes can be evaluated by their variances, i.e., var(ε_{xy}), var(ε_{yx}). The unknown intercepts c_{xy}, c_{yx}, and the unknown coefficients a_{xx}^{(j)}, a_{xy}^{(j)}, a_{yx}^{(j)}, a_{yy}^{(j)} of the models (4, 5) can be determined by the least squares technique. Then, for the same k + 1 points, perform the fitting process of the intercept-only models of the following form

x(t_{[i]} + \tau) = \bar{x} + \varepsilon_x(t_{[i]}),    (6)

y(t_{[i]} + \tau) = \bar{y} + \varepsilon_y(t_{[i]}),    (7)

where ε_x(t_{[i]}), ε_y(t_{[i]}) are the prediction error terms, and x̄ and ȳ are the means of the x(t_{[i]})'s and y(t_{[i]})'s, respectively. Now, we can define the sums of squares

s_{xy}^2 = \sum_{i=0}^{k} \hat{\varepsilon}_{xy}(t_{[i]})^2, \qquad s_x^2 = \sum_{i=0}^{k} \hat{\varepsilon}_x(t_{[i]})^2,    (8)

s_{yx}^2 = \sum_{i=0}^{k} \hat{\varepsilon}_{yx}(t_{[i]})^2, \qquad s_y^2 = \sum_{i=0}^{k} \hat{\varepsilon}_y(t_{[i]})^2,    (9)

where ε̂_{xy}, ε̂_{yx}, ε̂_x, ε̂_y are the estimates of the errors ε_{xy}, ε_{yx}, ε_x, ε_y determined from the fitted models (4, 5, 6, 7).
In classical linear regression analysis, the coefficient of determination, denoted R² (R-squared), is often used for evaluating the model fit. The quality of the fitted full linear regression model (5) in the neighborhood corresponding to z^{m_x, m_y, τ}(t_{[0]}) is evaluated via

R^2_{y/\{x,y\}} = 1 - s_{yx}^2 / s_y^2.    (10)

If the R^2_{y/\{x,y\}} value is greater than a prescribed value, denoted R*, then the (k + 1)-th nearest point to z^{m_x, m_y, τ}(t_{[0]}), denoted z^{m_x, m_y, τ}(t_{[k+1]}), is added to the neighborhood and R^2_{y/\{x,y\}} is calculated again. If the new value of R^2_{y/\{x,y\}} is not smaller than the one from the previous step, then the next nearest point z^{m_x, m_y, τ}(t_{[k+2]}) is added to the neighborhood and the procedure is repeated. The number of points in a suitable neighborhood for detecting the causal link of X on Y increases as long as R^2_{y/\{x,y\}} does not decrease. Analogously, the appropriateness of a neighborhood in Z̃ for exploring the causal link Y to X is evaluated through the value R^2_{x/\{x,y\}} = 1 - s_{xy}^2 / s_x^2. Here we suggest using R* = 0.95 and k = 30.
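The neighborhood-growing rule described above can be sketched as follows (Python with NumPy/SciPy; an illustrative implementation of the rule as we read it, where Zjoint holds the joint delay vectors z(t) as rows and target the corresponding future values y(t + τ)):

```python
import numpy as np
from scipy.spatial import cKDTree

def r2_full_model(Zjoint, target, idx):
    """R^2 of the full local linear model fitted on the points with row indices idx."""
    D = np.hstack([np.ones((len(idx), 1)), Zjoint[idx]])   # intercept + all lags of x and y
    t = target[idx]
    coef, *_ = np.linalg.lstsq(D, t, rcond=None)
    rss = np.sum((t - D @ coef) ** 2)
    tss = np.sum((t - t.mean()) ** 2)
    return 1.0 - rss / tss

def grow_neighborhood(Zjoint, target, i0, k=30, r_star=0.95):
    """Indices of a suitable neighborhood of point i0, or None if R^2 stays below R*."""
    tree = cKDTree(Zjoint)
    order = tree.query(Zjoint[i0], k=len(Zjoint))[1]        # all points, nearest first
    size = k + 1                                            # the point itself plus k neighbors
    r2 = r2_full_model(Zjoint, target, order[:size])
    if r2 <= r_star:
        return None
    while size < len(order):
        r2_new = r2_full_model(Zjoint, target, order[:size + 1])
        if r2_new < r2:                                     # stop growing when R^2 decreases
            break
        r2, size = r2_new, size + 1
    return order[:size]
```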

Modified Granger Causality Index

The mathematical formulation of the classical Granger causality is based on a linear


regression modeling of stochastic processes. The Granger causality does not imply
true causality, it reflects variable’s prediction ability. The idea behind the Granger
causality is well comprehended. X Granger-causes Y , if the prediction error of y
from a linear regression model including only own past values of y as predictors
(restricted linear regression model) is reduced, in a statistically suitable sense, by
incorporating past values of x as predictors in the linear regression model. The
magnitude of the Granger causality of X on Y can be measured by the log ratio F_{X→Y} = ln(var(ε_{yy})/var(ε_{yx})), see, e.g., [1], or by the Granger causality index δ_{X→Y} = 1 − var(ε_{yx})/var(ε_{yy}), see, e.g., [2], where var(ε_{yy}) denotes the prediction error variance for the restricted model and var(ε_{yx}) denotes the prediction error variance for the full linear regression model. Both measures should be null if the past of x is ineffective for improving the prediction of y. It is important to note that it is meaningless to compare nonzero values of F_{X→Y} from another couple of series, since F_{X→Y} is not scaled to a range. The Granger causality index is a more pragmatic quantity from this point of view: δ_{X→Y} indicates the proportion of the variance of y in the restricted model which is not explained by the past of y itself and can be explained by adding the past of x as predictors to a linear representation of the observed processes.
In general, the Granger causality is focused on improving the prediction error of the modeled variable in a restricted model, but the value of the prediction error alone is not of interest. It means that if (var(ε_{yy}), var(ε_{yx})) = (10, 1) or (var(ε_{yy}), var(ε_{yx})) = (10^{-4}, 10^{-5}), then δ_{X→Y} = 0.9 in both cases, i.e., 90% of the variation of Y left unexplained by its own past is expressed by adding the past of X to a representation of Y, and the causal link X to Y is indicated. Consequently, the methods based on the Granger causality concept fail to detect asymmetric causal dependencies, even linear ones, between bivariate time series, and false causality is often detected, see, e.g., [9]. The suggested novel causality measure is based on specifying the proportion of the variation of the modeled variable actually explained by past values of the second variable in the full linear regression model.
The proportion of the sum of squares of y explained by a fitted full linear regression model is known through the value R^2_{y/\{x,y\}} for a selected suitable neighborhood. In the next step of our procedure, on the same specified neighborhood, the restricted linear regression model of the form

y(t_{[i]} + \tau) = c_{yy} + \sum_{j=0}^{m_y - 1} a_y^{(j)} y(t_{[i]} - j\tau) + \varepsilon_{yy}(t_{[i]})    (11)

is fitted. The unknown parameters of the model, the intercept c_{yy} and the coefficients a_y^{(j)}, can be estimated by the least squares procedure. The estimates ε̂_{yy}(t_{[i]}) of the prediction error ε_{yy}(t_{[i]}) are used to determine the following sum of squares

s_{yy}^2 = \sum_{i=0}^{\tilde{k}} \hat{\varepsilon}_{yy}(t_{[i]})^2,    (12)

where k̃ + 1 denotes the number of points in the suitable neighborhood. Then, the proposed measure for quantifying the causal effect of X on Y in a selected suitable neighborhood, denoted λ_{X→Y}, is defined as

\lambda_{X \to Y} = \frac{\tilde{\lambda}_{X \to Y}}{R^2_{y/\{x,y\}}} = \frac{s_{yy}^2 - s_{yx}^2}{s_y^2 - s_{yx}^2}, \quad \text{where} \quad \tilde{\lambda}_{X \to Y} = \frac{s_{yy}^2 - s_{yx}^2}{s_y^2}.    (13)


A value λ̃_{X→Y} is the ratio of the sum of squares of y explained by the full model (5) beyond the restricted model (11) to the total sum of squares of the variable y. Thus, the value λ_{X→Y} indicates the proportion of the variation of the response variable y explained by the linear regression model including the past of x and y as predictors that is actually predicted by the x values only.
The magnitude of the suggested modified Granger causality of X on Y in selected neighborhoods, denoted Λ_{X→Y}, is the average of the indices λ_{X→Y} from all suitable neighborhoods on the attractor. Here, a Λ_{X→Y} value greater than 0.01 is considered to indicate the significant presence of the causal influence of X on Y. Λ_{X→Y} ≥ 0.01 means that more than 1% of the variation of y explained by the full linear regression model is actually expressed by the past of x in the linear representation. In a similar way, the magnitude of the MGCiSN of Y on X, denoted Λ_{Y→X}, can be defined.

Numerical Experiments

In order to investigate the behavior of the proposed causality measure, we artificially


generated time series from four model systems of nonlinear dynamics with known
nonlinear causal links: three systems composed of unidirectionally coupled chaotic
maps with different types of nonlinear causal effect and one system of bidirectionally
coupled chaotic maps with non-linear causal effect. In all numerical experiments,
the strength of the coupling c varied on a specified range and the MGCiSN was
evaluated on the data sets of N = 10000 and N = 1000 samples for each considered
c value. The time delay for the studied examples was set to τ = 1 and the embedding dimensions in the analyzed systems were set to m_x = m_y = 2. Note that none of the data sets is appropriate for the classical Granger causality analysis.

Unidirectional Nonlinear Coupling I

In the first example, we studied two unidirectionally coupled nonidentical Hénon


maps (see, e.g., [4]):

x(t) = 1.4 - x(t-1)^2 + 0.3\, x(t-2),    (14)

y(t) = 1.4 - [c\, x(t-1)\, y(t-1) + (1-c)\, y(t-1)^2] + 0.1\, y(t-2),    (15)

where the strength of the coupling c was varied from 0 to 0.78 with increments of 0.06. By construction, the variable X represented by the time series x has a causal influence on the variable Y represented by the time series y for c ≠ 0, and the variables X and Y are not causally connected for c = 0.
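The data for this experiment can be generated directly from (14)–(15); a small sketch (Python/NumPy, our own code, with an arbitrary burn-in period and initial conditions assumed to lie in the basin of attraction) is:

```python
import numpy as np

def coupled_henon(c, n, burn=1000, seed=0):
    """Simulate the unidirectionally coupled Henon maps (14)-(15); X drives Y for c != 0."""
    rng = np.random.default_rng(seed)
    total = n + burn
    x = np.zeros(total); y = np.zeros(total)
    x[:2] = rng.uniform(0.0, 0.5, 2); y[:2] = rng.uniform(0.0, 0.5, 2)
    for t in range(2, total):
        x[t] = 1.4 - x[t-1]**2 + 0.3*x[t-2]
        y[t] = 1.4 - (c*x[t-1]*y[t-1] + (1-c)*y[t-1]**2) + 0.1*y[t-2]
    return x[burn:], y[burn:]

x, y = coupled_henon(c=0.3, n=10000)
```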
Figures 1 and 2 show results of our first experiment. We observe that the MGCiSN
successfully indicates the absence of the causal influence of Y on X , ΛY →X is zero
at all values of c for N = 10000 and N = 1000. The MGCiSN correctly detected
the causal relationship of X on Y at c > 0.06 for both sample sizes, but Λ X →Y is
nonzero at almost all c > 0.

Fig. 1 Unidirectional coupling I (true causality X → Y) at coupling strengths c = {0, 0.06, . . . , 0.72, 0.78}: MGCiSN (Λ) for N = 10000 (plot of Λ_{X→Y} and Λ_{Y→X} versus the coupling c, with the significance level Λ = 0.01 marked)

Fig. 2 Unidirectional coupling I (true causality X → Y) at coupling strengths c = {0, 0.06, . . . , 0.72, 0.78}: MGCiSN (Λ) for N = 1000 (plot of Λ_{X→Y} and Λ_{Y→X} versus the coupling c, with the significance level Λ = 0.01 marked)

Unidirectional Nonlinear Coupling II

The next model represents two interacting nonidentical nonlinear time series with a
nonlinear causal effect:

x(t) = 3.4\, x(t-1)\,[1 - x(t-1)^2]\, e^{-x(t-1)^2} + 0.6\, x(t-1),    (16)

y(t) = 3.4\, y(t-1)\,[1 - y(t-1)^2]\, e^{-y(t-1)^2} + 0.3\, y(t-2) + c\, x^2(t-2).    (17)

The coupling strength c varied from 0 to 2 with a step of 0.2. By construction, X has a causal influence on Y for c ≠ 0, and there is no causal connection between X and Y for c = 0.
Figures 3 and 4 show the results of our second experiment. We observe that the MGCiSN successfully indicates the absence of causal influence of Y on X at any c for both sample sizes, but Λ_{Y→X} is close to the chosen significance level 0.01 at c close to 2 for N = 1000. The MGCiSN correctly detected the causal relationship from X to Y at c > 0.04 for N = 10000 and N = 1000.

Fig. 3 Unidirectional coupling II (true causality X → Y) at coupling strengths c = {0, 0.2, . . . , 1.8, 2}: MGCiSN (Λ) for N = 10000 (plot of Λ_{X→Y} and Λ_{Y→X} versus the coupling c, with the significance level Λ = 0.01 marked)

Fig. 4 Unidirectional coupling II (true causality X → Y) at coupling strengths c = {0, 0.2, . . . , 1.8, 2}: MGCiSN (Λ) for N = 1000 (plot of Λ_{X→Y} and Λ_{Y→X} versus the coupling c, with the significance level Λ = 0.01 marked)