Knowledge Discovery from Data Streams by João Gama (ISBN 1439826110)
Knowledge Discovery from Data Streams
© 2010 by Taylor and Francis Group, LLC
Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series
SERIES EDITOR
Vipin Kumar
University of Minnesota
Department of Computer Science and Engineering
Minneapolis, Minnesota, U.S.A.
AIMS AND SCOPE
This series aims to capture new developments and applications in data mining and knowledge
discovery, while summarizing the computational tools and techniques useful in data analysis. This
series encourages the integration of mathematical, statistical, and computational methods and
techniques through the publication of a broad range of textbooks, reference works, and handbooks. The inclusion of concrete examples and applications is highly encouraged. The scope of the
series includes, but is not limited to, titles in the areas of data mining and knowledge discovery
methods and applications, modeling, algorithms, theory and foundations, data and knowledge
visualization, data mining systems and tools, and privacy and security issues.
PUBLISHED TITLES
UNDERSTANDING COMPLEX DATASETS: DATA MINING WITH MATRIX DECOMPOSITIONS
David Skillicorn

COMPUTATIONAL METHODS OF FEATURE SELECTION
Huan Liu and Hiroshi Motoda

CONSTRAINED CLUSTERING: ADVANCES IN ALGORITHMS, THEORY, AND APPLICATIONS
Sugato Basu, Ian Davidson, and Kiri L. Wagstaff

KNOWLEDGE DISCOVERY FOR COUNTERTERRORISM AND LAW ENFORCEMENT
David Skillicorn

MULTIMEDIA DATA MINING: A SYSTEMATIC INTRODUCTION TO CONCEPTS AND THEORY
Zhongfei Zhang and Ruofei Zhang

NEXT GENERATION OF DATA MINING
Hillol Kargupta, Jiawei Han, Philip S. Yu, Rajeev Motwani, and Vipin Kumar

DATA MINING FOR DESIGN AND MARKETING
Yukio Ohsawa and Katsutoshi Yada

THE TOP TEN ALGORITHMS IN DATA MINING
Xindong Wu and Vipin Kumar

GEOGRAPHIC DATA MINING AND KNOWLEDGE DISCOVERY, SECOND EDITION
Harvey J. Miller and Jiawei Han

TEXT MINING: CLASSIFICATION, CLUSTERING, AND APPLICATIONS
Ashok N. Srivastava and Mehran Sahami

BIOLOGICAL DATA MINING
Jake Y. Chen and Stefano Lonardi

INFORMATION DISCOVERY ON ELECTRONIC HEALTH RECORDS
Vagelis Hristidis

TEMPORAL DATA MINING
Theophano Mitsa

RELATIONAL DATA CLUSTERING: MODELS, ALGORITHMS, AND APPLICATIONS
Bo Long, Zhongfei Zhang, and Philip S. Yu

KNOWLEDGE DISCOVERY FROM DATA STREAMS
João Gama
Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series
Knowledge Discovery from Data Streams
João Gama
Chapman & Hall/CRC
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2010 by Taylor and Francis Group, LLC
Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1
International Standard Book Number: 978-1-4398-2611-9 (Hardback)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any copyright material has
not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying, microfilming, and recording, or in any information storage or retrieval system,
without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the CCC,
a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used
only for identification and explanation without intent to infringe.
Library of Congress Cataloging-in-Publication Data
Gama, João.
Knowledge discovery from data streams / João Gama.
p. cm. -- (Chapman & Hall/CRC data mining and knowledge discovery series)
Includes bibliographical references and index.
ISBN 978-1-4398-2611-9 (hardcover : alk. paper)
1. Computer algorithms. 2. Machine learning. 3. Data mining. I. Title. II. Series.
QA76.9.A43G354 2010
006.3'12--dc22 2010014600
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
List of Tables xi
List of Figures xiii
List of Algorithms xv
Foreword xvii
Acknowledgments xix
1 Knowledge Discovery from Data Streams 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 An Illustrative Example . . . . . . . . . . . . . . . . . . . . . 2
1.3 A World in Movement . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Data Mining and Data Streams . . . . . . . . . . . . . . . . 5
2 Introduction to Data Streams 7
2.1 Data Stream Models . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.1 Research Issues in Data Stream Management Systems 8
2.1.2 An Illustrative Problem . . . . . . . . . . . . . . . . . 8
2.2 Basic Streaming Methods . . . . . . . . . . . . . . . . . . . . 9
2.2.1 Illustrative Examples . . . . . . . . . . . . . . . . . . . 10
2.2.1.1 Counting the Number of Occurrences of the
Elements in a Stream . . . . . . . . . . . . . 10
2.2.1.2 Counting the Number of Distinct Values in a
Stream . . . . . . . . . . . . . . . . . . . . . 11
2.2.2 Bounds of Random Variables . . . . . . . . . . . . . . 11
2.2.3 Poisson Processes . . . . . . . . . . . . . . . . . . . . . 13
2.2.4 Maintaining Simple Statistics from Data Streams . . . 14
2.2.5 Sliding Windows . . . . . . . . . . . . . . . . . . . . . 14
2.2.5.1 Computing Statistics over Sliding Windows:
The ADWIN Algorithm . . . . . . . . . . . . . 16
2.2.6 Data Synopsis . . . . . . . . . . . . . . . . . . . . . . 19
2.2.6.1 Sampling . . . . . . . . . . . . . . . . . . . . 19
2.2.6.2 Synopsis and Histograms . . . . . . . . . . . 20
2.2.6.3 Wavelets . . . . . . . . . . . . . . . . . . . . 21
2.2.6.4 Discrete Fourier Transform . . . . . . . . . . 22
2.3 Illustrative Applications . . . . . . . . . . . . . . . . . . . . . 23
2.3.1 A Data Warehouse Problem: Hot-Lists . . . . . . . . . 23
2.3.2 Computing the Entropy in a Stream . . . . . . . . . . 24
2.3.3 Monitoring Correlations Between Data Streams . . . . 27
2.3.4 Monitoring Threshold Functions over Distributed Data
Streams . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.4 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3 Change Detection 33
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2 Tracking Drifting Concepts . . . . . . . . . . . . . . . . . . . 34
3.2.1 The Nature of Change . . . . . . . . . . . . . . . . . . 35
3.2.2 Characterization of Drift Detection Methods . . . . . 36
3.2.2.1 Data Management . . . . . . . . . . . . . . . 37
3.2.2.2 Detection Methods . . . . . . . . . . . . . . . 38
3.2.2.3 Adaptation Methods . . . . . . . . . . . . . . 40
3.2.2.4 Decision Model Management . . . . . . . . . 41
3.2.3 A Note on Evaluating Change Detection Methods . . 41
3.3 Monitoring the Learning Process . . . . . . . . . . . . . . . . 42
3.3.1 Drift Detection Using Statistical Process Control . . . 42
3.3.2 An Illustrative Example . . . . . . . . . . . . . . . . . 45
3.4 Final Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4 Maintaining Histograms from Data Streams 49
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2 Histograms from Data Streams . . . . . . . . . . . . . . . . . 50
4.2.1 K-buckets Histograms . . . . . . . . . . . . . . . . . . 50
4.2.2 Exponential Histograms . . . . . . . . . . . . . . . . . 51
4.2.2.1 An Illustrative Example . . . . . . . . . . . . 52
4.2.2.2 Discussion . . . . . . . . . . . . . . . . . . . 52
4.3 The Partition Incremental Discretization Algorithm - PiD . . 53
4.3.1 Analysis of the Algorithm . . . . . . . . . . . . . . . . 56
4.3.2 Change Detection in Histograms . . . . . . . . . . . . 56
4.3.3 An Illustrative Example . . . . . . . . . . . . . . . . . 57
4.4 Applications to Data Mining . . . . . . . . . . . . . . . . . . 59
4.4.1 Applying PiD in Supervised Learning . . . . . . . . . . 59
4.4.2 Time-Changing Environments . . . . . . . . . . . . . . 61
4.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5 Evaluating Streaming Algorithms 63
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.2 Learning from Data Streams . . . . . . . . . . . . . . . . . . 64
5.3 Evaluation Issues . . . . . . . . . . . . . . . . . . . . . . . . 65
5.3.1 Design of Evaluation Experiments . . . . . . . . . . . 66
5.3.2 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . 67
5.3.2.1 Error Estimators Using a Single Algorithm
and a Single Dataset . . . . . . . . . . . . . . 68
5.3.2.2 An Illustrative Example . . . . . . . . . . . . 68
5.3.3 Comparative Assessment . . . . . . . . . . . . . . . . 69
5.3.3.1 The 0 − 1 Loss Function . . . . . . . . . . . 70
5.3.3.2 Illustrative Example . . . . . . . . . . . . . . 71
5.3.4 Evaluation Methodology in Non-Stationary
Environments . . . . . . . . . . . . . . . . . . . . . . . 72
5.3.4.1 The Page-Hinkley Algorithm . . . . . . . . . 72
5.3.4.2 Illustrative Example . . . . . . . . . . . . . . 73
5.4 Lessons Learned and Open Issues . . . . . . . . . . . . . . . 75
5.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6 Clustering from Data Streams 79
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.2 Clustering Examples . . . . . . . . . . . . . . . . . . . . . . . 80
6.2.1 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . 80
6.2.2 Partitioning Clustering . . . . . . . . . . . . . . . . . . 82
6.2.2.1 The Leader Algorithm . . . . . . . . . . . . . 82
6.2.2.2 Single Pass k-Means . . . . . . . . . . . . . . 82
6.2.3 Hierarchical Clustering . . . . . . . . . . . . . . . . . . 83
6.2.4 Micro Clustering . . . . . . . . . . . . . . . . . . . . . 85
6.2.4.1 Discussion . . . . . . . . . . . . . . . . . . . 86
6.2.4.2 Monitoring Cluster Evolution . . . . . . . . . 86
6.2.5 Grid Clustering . . . . . . . . . . . . . . . . . . . . . . 87
6.2.5.1 Computing the Fractal Dimension . . . . . . 88
6.2.5.2 Fractal Clustering . . . . . . . . . . . . . . . 88
6.3 Clustering Variables . . . . . . . . . . . . . . . . . . . . . . . 90
6.3.1 A Hierarchical Approach . . . . . . . . . . . . . . . . . 91
6.3.1.1 Growing the Hierarchy . . . . . . . . . . . . 91
6.3.1.2 Aggregating at Concept Drift Detection . . . 94
6.3.1.3 Analysis of the Algorithm . . . . . . . . . . . 96
6.4 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7 Frequent Pattern Mining 97
7.1 Introduction to Frequent Itemset Mining . . . . . . . . . . . 97
7.1.1 The Search Space . . . . . . . . . . . . . . . . . . . . . 98
7.1.2 The FP-growth Algorithm . . . . . . . . . . . . . . . . 100
7.1.3 Summarizing Itemsets . . . . . . . . . . . . . . . . . . 100
7.2 Heavy Hitters . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.3 Mining Frequent Itemsets from Data Streams . . . . . . . . . 103
7.3.1 Landmark Windows . . . . . . . . . . . . . . . . . . . 104
7.3.1.1 The LossyCounting Algorithm . . . . . . . . 104
7.3.1.2 Frequent Itemsets Using LossyCounting . . 104
7.3.2 Mining Recent Frequent Itemsets . . . . . . . . . . . . 105
7.3.2.1 Maintaining Frequent Itemsets in Sliding Win-
dows . . . . . . . . . . . . . . . . . . . . . . 105
7.3.2.2 Mining Closed Frequent Itemsets over Sliding
Windows . . . . . . . . . . . . . . . . . . . . 106
7.3.3 Frequent Itemsets at Multiple Time Granularities . . . 108
7.4 Sequence Pattern Mining . . . . . . . . . . . . . . . . . . . . 110
7.4.1 Reservoir Sampling for Sequential Pattern Mining over
Data Streams . . . . . . . . . . . . . . . . . . . . . . . 111
7.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
8 Decision Trees from Data Streams 115
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
8.2 The Very Fast Decision Tree Algorithm . . . . . . . . . . . . 116
8.2.1 VFDT —The Base Algorithm . . . . . . . . . . . . . . . 116
8.2.2 Analysis of the VFDT Algorithm . . . . . . . . . . . . . 118
8.3 Extensions to the Basic Algorithm . . . . . . . . . . . . . . . 119
8.3.1 Processing Continuous Attributes . . . . . . . . . . . . 119
8.3.1.1 Exhaustive Search . . . . . . . . . . . . . . . 119
8.3.1.2 Discriminant Analysis . . . . . . . . . . . . . 121
8.3.2 Functional Tree Leaves . . . . . . . . . . . . . . . . . . 123
8.3.3 Concept Drift . . . . . . . . . . . . . . . . . . . . . . . 124
8.3.3.1 Detecting Changes . . . . . . . . . . . . . . . 126
8.3.3.2 Reacting to Changes . . . . . . . . . . . . . . 127
8.3.4 Final Comments . . . . . . . . . . . . . . . . . . . . . 128
8.4 OLIN: Info-Fuzzy Algorithms . . . . . . . . . . . . . . . . . . 129
8.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
9 Novelty Detection in Data Streams 133
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
9.2 Learning and Novelty . . . . . . . . . . . . . . . . . . . . . . 134
9.2.1 Desiderata for Novelty Detection . . . . . . . . . . . . 135
9.3 Novelty Detection as a One-Class Classification Problem . . 135
9.3.1 Autoassociator Networks . . . . . . . . . . . . . . . . 136
9.3.2 The Positive Naive-Bayes . . . . . . . . . . . . . . . . 137
9.3.3 Decision Trees for One-Class Classification . . . . . . 138
9.3.4 The One-Class SVM . . . . . . . . . . . . . . . . . . . 138
9.3.5 Evaluation of One-Class Classification Algorithms . . 139
9.4 Learning New Concepts . . . . . . . . . . . . . . . . . . . . . 141
9.4.1 Approaches Based on Extreme Values . . . . . . . . . 141
9.4.2 Approaches Based on the Decision Structure . . . . . 142
9.4.3 Approaches Based on Frequency . . . . . . . . . . . . 143
9.4.4 Approaches Based on Distances . . . . . . . . . . . . . 144
9.5 The Online Novelty and Drift Detection Algorithm . . . . . . 144
9.5.1 Initial Learning Phase . . . . . . . . . . . . . . . . . . 145
9.5.2 Continuous Unsupervised Learning Phase . . . . . . . 146
9.5.2.1 Identifying Novel Concepts . . . . . . . . . . 147
9.5.2.2 Attempting to Determine the Nature of New
Concepts . . . . . . . . . . . . . . . . . . . . 149
9.5.2.3 Merging Similar Concepts . . . . . . . . . . . 149
9.5.2.4 Automatically Adapting the Number of Clus-
ters . . . . . . . . . . . . . . . . . . . . . . . 150
9.5.3 Computational Cost . . . . . . . . . . . . . . . . . . . 150
9.6 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
10 Ensembles of Classifiers 153
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
10.2 Linear Combination of Ensembles . . . . . . . . . . . . . . . 155
10.3 Sampling from a Training Set . . . . . . . . . . . . . . . . . 156
10.3.1 Online Bagging . . . . . . . . . . . . . . . . . . . . . . 157
10.3.2 Online Boosting . . . . . . . . . . . . . . . . . . . . . 158
10.4 Ensembles of Trees . . . . . . . . . . . . . . . . . . . . . . . 160
10.4.1 Option Trees . . . . . . . . . . . . . . . . . . . . . . . 160
10.4.2 Forest of Trees . . . . . . . . . . . . . . . . . . . . . . 161
10.4.2.1 Generating Forest of Trees . . . . . . . . . . 162
10.4.2.2 Classifying Test Examples . . . . . . . . . . 162
10.5 Adapting to Drift Using Ensembles of Classifiers . . . . . . . 162
10.6 Mining Skewed Data Streams with Ensembles . . . . . . . . 165
10.7 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
11 Time Series Data Streams 167
11.1 Introduction to Time Series Analysis . . . . . . . . . . . . . 167
11.1.1 Trend . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
11.1.2 Seasonality . . . . . . . . . . . . . . . . . . . . . . . . 169
11.1.3 Stationarity . . . . . . . . . . . . . . . . . . . . . . . . 169
11.2 Time-Series Prediction . . . . . . . . . . . . . . . . . . . . . 169
11.2.1 The Kalman Filter . . . . . . . . . . . . . . . . . . . . 170
11.2.2 Least Mean Squares . . . . . . . . . . . . . . . . . . . 173
11.2.3 Neural Nets and Data Streams . . . . . . . . . . . . . 173
11.2.3.1 Stochastic Sequential Learning of Neural Net-
works . . . . . . . . . . . . . . . . . . . . . . 174
11.2.3.2 Illustrative Example: Load Forecast in Data
Streams . . . . . . . . . . . . . . . . . . . . . 175
11.3 Similarity between Time-Series . . . . . . . . . . . . . . . . . 177
11.3.1 Euclidean Distance . . . . . . . . . . . . . . . . . . . . 177
11.3.2 Dynamic Time-Warping . . . . . . . . . . . . . . . . . 178
11.4 Symbolic Approximation – SAX . . . . . . . . . . . . . . . . . 180
11.4.1 The SAX Transform . . . . . . . . . . . . . . . . . . . . 180
11.4.1.1 Piecewise Aggregate Approximation (PAA) . 181
11.4.1.2 Symbolic Discretization . . . . . . . . . . . . 181
11.4.1.3 Distance Measure . . . . . . . . . . . . . . . 182
11.4.1.4 Discussion . . . . . . . . . . . . . . . . . . . 182
11.4.2 Finding Motifs Using SAX . . . . . . . . . . . . . . . . 183
11.4.3 Finding Discords Using SAX . . . . . . . . . . . . . . . 183
11.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
12 Ubiquitous Data Mining 185
12.1 Introduction to Ubiquitous Data Mining . . . . . . . . . . . 185
12.2 Distributed Data Stream Monitoring . . . . . . . . . . . . . 186
12.2.1 Distributed Computing of Linear Functions . . . . . . 187
12.2.1.1 A General Algorithm for Computing Linear
Functions . . . . . . . . . . . . . . . . . . . . 188
12.2.2 Computing Sparse Correlation Matrices Efficiently . . 189
12.2.2.1 Monitoring Sparse Correlation Matrices . . . 191
12.2.2.2 Detecting Significant Correlations . . . . . . 192
12.2.2.3 Dealing with Data Streams . . . . . . . . . . 192
12.3 Distributed Clustering . . . . . . . . . . . . . . . . . . . . . . 193
12.3.1 Conquering the Divide . . . . . . . . . . . . . . . . . . 193
12.3.1.1 Furthest Point Clustering . . . . . . . . . . . 193
12.3.1.2 The Parallel Guessing Clustering . . . . . . . 193
12.3.2 DGClust – Distributed Grid Clustering . . . . . . . . 194
12.3.2.1 Local Adaptive Grid . . . . . . . . . . . . . . 194
12.3.2.2 Frequent State Monitoring . . . . . . . . . . 195
12.3.2.3 Centralized Online Clustering . . . . . . . . 196
12.4 Algorithm Granularity . . . . . . . . . . . . . . . . . . . . . 197
12.4.1 Algorithm Granularity Overview . . . . . . . . . . . . 199
12.4.2 Formalization of Algorithm Granularity . . . . . . . . 200
12.4.2.1 Algorithm Granularity Procedure . . . . . . 200
12.4.2.2 Algorithm Output Granularity . . . . . . . . 201
12.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
13 Final Comments 205
13.1 The Next Generation of Knowledge Discovery . . . . . . . . 205
13.1.1 Mining Spatial Data . . . . . . . . . . . . . . . . . . . 206
13.1.2 The Time Situation of Data . . . . . . . . . . . . . . . 206
13.1.3 Structured Data . . . . . . . . . . . . . . . . . . . . . 206
13.2 Where We Want to Go . . . . . . . . . . . . . . . . . . . . . 206
Appendix A Resources 209
A.1 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
A.2 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
Bibliography 211
List of Tables
2.1 Comparison between Database Management Systems and Data
Stream Management Systems. . . . . . . . . . . . . . . . . . . 8
2.2 Differences between traditional and stream data query process-
ing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
4.1 Average results of evaluation metrics of the quality of dis-
cretization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.1 Evaluation methods in stream mining literature. . . . . . . . 66
5.2 Impact of fading factors in change detection. . . . . . . . . . 75
7.1 A transaction database and all possible frequent itemsets. . . 98
7.2 The search space to find all possible frequent itemsets. . . . . 99
8.1 Contingency table to compute the entropy of a splitting test. 122
9.1 Confusion matrix to evaluate one-class classifiers. . . . . . . . 139
11.1 The two time-series used in the example of dynamic time-
warping. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
11.2 SAX lookup table. . . . . . . . . . . . . . . . . . . . . . . . . . 181
List of Figures
1.1 Example of an electrical grid. . . . . . . . . . . . . . . . . . . 3
2.1 The Count-Min Sketch. . . . . . . . . . . . . . . . . . . . . . 10
2.2 Poisson random variables. . . . . . . . . . . . . . . . . . . . . 13
2.3 Sequence based windows. . . . . . . . . . . . . . . . . . . . . 15
2.4 Tilted time windows. . . . . . . . . . . . . . . . . . . . . . . . 15
2.5 Output of algorithm ADWIN for different change rates. . . . . 18
2.6 The three aggregation levels in StatStream. . . . . . . . . . . 27
2.7 The vector space. . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.8 The bounding theorem. . . . . . . . . . . . . . . . . . . . . . 31
3.1 Three illustrative examples of change. . . . . . . . . . . . . . 35
3.2 Main dimensions in change detection methods in data mining. 36
3.3 Illustrative example of the Page-Hinkley test. . . . . . . . . . 40
3.4 The space state transition graph. . . . . . . . . . . . . . . . . 43
3.5 Dynamically constructed time window. . . . . . . . . . . . . . 44
3.6 Illustrative example of using the SPC algorithm in the SEA
concept dataset. . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.1 Split & Merge and Merge & Split Operators. . . . . . . . . . 51
4.2 Illustrative example of the two layers in PiD. . . . . . . . . . 55
4.3 Comparison between batch histograms and PiD histograms. . 57
4.4 The evolution of the partitions at the second layer. . . . . . . 61
5.1 Performance evolution of VFDT in a web-mining problem. . 65
5.2 Comparison of error evolution as estimated by holdout and pre-
quential strategies. . . . . . . . . . . . . . . . . . . . . . . . . 69
5.3 Comparison of prequential error evolution between holdout,
prequential and prequential over sliding windows. . . . . . . . 70
5.4 Comparison between two different neural-networks topologies
in a electrical load-demand problem. . . . . . . . . . . . . . . 71
5.5 Plot of the Qi statistic over a sliding window. . . . . . . . . . 72
5.6 The evolution of signed McNemar statistic between two algo-
rithms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.7 The evolution of signed McNemar statistic using sliding win-
dows and fading factors. . . . . . . . . . . . . . . . . . . . . . 74
5.8 Evolution of the Page-Hinkley test statistic . . . . . . . . . . 75
5.9 Evolution of the Page-Hinkley test statistic using fading factors. 76
6.1 The Clustering Feature Tree in BIRCH. . . . . . . . . . . . . . 84
6.2 Fractal dimension: the box-counting plot. . . . . . . . . . . . 88
6.3 ODAC structure evolution in a time-changing dataset. . . . . . 95
7.1 Frequent-Pattern Trees . . . . . . . . . . . . . . . . . . . . . . 100
7.2 The FP-growth algorithm and FP-stream structure. . . . . . 109
7.3 Stream model with three different sequence ids with their as-
sociated transactions. . . . . . . . . . . . . . . . . . . . . . . 111
8.1 Illustrative example of a decision tree and the time-window
associated with each node. . . . . . . . . . . . . . . . . . . . . 119
8.2 Sufficient statistics of a continuous attribute in a leaf. . . . . 120
8.3 Illustrative example of the solutions of Equation 8.4. . . . . . 122
8.4 Illustrative example on updating error statistics in a node. . . 126
8.5 The Hyper-plane problem. . . . . . . . . . . . . . . . . . . . . 127
8.6 A two-layered structure Info-Fuzzy Network. . . . . . . . . . 130
8.7 OLIN-based system architecture. . . . . . . . . . . . . . . . . 131
9.1 Architecture of a neural network for one-class classification. . 136
9.2 Illustrative examples of Precision-Recall and ROC graphs. . . 140
9.3 Overview of the Online Novelty and Drift Detection Algorithm. 146
9.4 Illustrative example of OLINDDA algorithm. . . . . . . . . . . . 147
10.1 Error rate versus number of classifiers in an ensemble. . . . . 154
10.2 Illustrative example of online bagging. . . . . . . . . . . . . . 157
10.3 Illustrative example of online boosting. . . . . . . . . . . . . . 160
11.1 Time-series Example. . . . . . . . . . . . . . . . . . . . . . . . 168
11.2 Time-series auto-correlation example. . . . . . . . . . . . . . 170
11.3 Kalman filter as a hidden Markov model. . . . . . . . . . . . 172
11.4 Memory schema in Electricity Demand Forecast. . . . . . . . 176
11.5 Euclidean Distance between time-series Q and C. . . . . . . . 178
11.6 Dynamic time-warping. . . . . . . . . . . . . . . . . . . . . . 179
11.7 DTW-Alignment between the two time series . . . . . . . . . 180
11.8 The main steps in SAX. . . . . . . . . . . . . . . . . . . . . . . 182
12.1 Local L2 Thresholding. . . . . . . . . . . . . . . . . . . . . . . 189
12.2 Illustrative example of distributed clustering using DGClust. 195
12.3 DGClust results for different grid parameters. . . . . . . . . . 197
12.4 The effect of algorithm granularity on computational resources. 199
12.5 The algorithm output granularity approach. . . . . . . . . . . 202
12.6 Algorithm output granularity stages. . . . . . . . . . . . . . . 203
List of Algorithms
1 The ADWIN Algorithm. . . . . . . . . . . . . . . . . . . . . . . 17
2 The Reservoir Sampling Algorithm. . . . . . . . . . . . . . . . 20
3 The Frequent Algorithm. . . . . . . . . . . . . . . . . . . . . . 24
4 The Space-Saving Algorithm. . . . . . . . . . . . . . . . . . . 24
5 Basic Estimator for the Entropy Norm. . . . . . . . . . . . . . 25
6 The Maintain Samples Algorithm. . . . . . . . . . . . . . . . . 26
7 The Monitoring Threshold Functions Algorithm (sensor node). 30
8 The SPC Algorithms. . . . . . . . . . . . . . . . . . . . . . . . . 45
9 The PiD Algorithm for Updating Layer1 . . . . . . . . . . . . . 54
10 The Leader Clustering Algorithm. . . . . . . . . . . . . . . . . 82
11 Algorithm for Single Pass k-Means Clustering. . . . . . . . . . 83
12 Algorithm for Fractal Clustering: Initialization Phase. . . . . . 87
13 Algorithm for Fractal Clustering: Incremental Step. . . . . . . 89
14 Algorithm for Fractal Clustering: Tracking Cluster Changes. . 89
15 The ODAC Global Algorithm. . . . . . . . . . . . . . . . . . . . 92
16 ODAC: The TestSplit Algorithm. . . . . . . . . . . . . . . . . . . 94
17 The FP-Tree Algorithm. . . . . . . . . . . . . . . . . . . . . . 101
18 The Karp Algorithm. . . . . . . . . . . . . . . . . . . . . . . . 103
19 The LossyCounting Algorithm. . . . . . . . . . . . . . . . . . 105
20 VFDT: The Hoeffding Tree Algorithm. . . . . . . . . . . . . . . 117
21 The InsertValueBtree(xj , y, Btree) Algorithm. . . . . . . . . . 121
22 The LessThan(i, k, BTree) Algorithm. . . . . . . . . . . . . . . 122
23 The Algorithm to Process Numeric Attributes. . . . . . . . . . 125
24 The Weighted-Majority Algorithm. . . . . . . . . . . . . . . . 156
25 The Online Bagging Algorithm. . . . . . . . . . . . . . . . . . 158
26 The Online Boosting Algorithm. . . . . . . . . . . . . . . . . . 159
27 The Add Expert Algorithm for Discrete Classes. . . . . . . . . 164
28 The Add Expert Algorithm for Continuous Classes. . . . . . . 164
29 The Skewed Ensemble Algorithm. . . . . . . . . . . . . . . . . 166
30 The Randomized Distributed Dot Product Algorithm. . . . . . 187
31 Local L2 Thresholding. . . . . . . . . . . . . . . . . . . . . . . 190
Foreword
In spite of being a small country in terms of geographic area and population
size, Portugal has a very active and respected artificial intelligence community,
with a good number of researchers well known internationally for the high
quality of their work and relevant contributions in this area.
One of these researchers is João Gama from the University of Porto. Gama
is one of the leading investigators in one of the hottest current research topics in
machine learning and data mining: data streams.
Although other books have been published covering important aspects of
data streams, these books are either mainly related to database aspects of
data streams or are a collection of chapter contributions for different aspects
of this issue.
This is the first book to didactically cover, in a clear, comprehensive,
and mathematically rigorous way, the main machine-learning-related aspects
of this relevant research field. The book not only presents the main fundamen-
tals important to fully understand data streams, but also describes important
applications. The book also discusses some of the main challenges of future
data mining research, when stream mining will be at the core of many
applications. These challenges will have to be addressed for the design of useful
and efficient data mining solutions able to deal with real-world problems. It
is important to stress that, in spite of this book being mainly about data
streams, most of the concepts presented are valid for different areas of ma-
chine learning and data mining. Therefore, the book is an up-to-date, broad
and useful source of reference for all those interested in knowledge acquisition
by learning techniques.
André Ponce de Leon Ferreira de Carvalho
University of São Paulo
Acknowledgments
Life is the art of drawing sufficient conclusions from insufficient premises.
Samuel Butler
This book is a result of the Knowledge Discovery from Ubiquitous Data
Streams project funded by the Portuguese Fundação para a Ciência e Tec-
nologia. We thank FCT, which has funded research projects on this topic over
the last 5 years. The work, analysis, discussions, and debates with several students
and researchers strongly influenced the issues presented here. I thank Ricardo
Rocha, Ricardo Fernandes, and Pedro Medas for their work on decision trees,
Pedro Rodrigues on clustering, Gladys Castillo and Milton Severo on change
detection, Eduardo Spinosa and Andre Carvalho on novelty detection, and Carlos
Pinto and Raquel Sebastião on histograms. To all of them, thank you!
The Knowledge Discovery in Ubiquitous Environments project, funded by
the European Union under IST, was another major source of inspiration. All
the meetings, events, activities, and discussions contributed to improve our
vision on the role of data mining in a world in motion.
A special thanks to those who contributed material to this book. André de
Carvalho contributed the Preface and reviewed the book, Albert Bifet and Ri-
card Gavaldà contributed Section 2.2.5.1, Mark Last contributed Section 8.4,
Mohamed Gaber Section 12.4, and Chedy Raissi and Pascal Poncelet Sec-
tion 7.4.1. Together with Jesus Aguilar, we organized a stream of workshops
in data streams. They constitute the backbone of this book.
A word of gratitude to my family and friends, who have been the major
source of support.
Chapter 1
Knowledge Discovery from Data
Streams
1.1 Introduction
In the last three decades, machine learning research and practice have
focused on batch learning usually using small datasets. In batch learning, the
whole training data is available to the algorithm, which outputs a decision
model after processing the data, often multiple times. The rationale behind
this practice is that examples are generated at
random according to some stationary probability distribution. Most learners
use a greedy, hill-climbing search in the space of models. They are prone
to high-variance and overfitting problems. Brain and Webb (2002) pointed
out the relation between variance and data sample size. When learning from
small datasets the main problem is variance reduction, while learning from
large datasets may be more effective when using algorithms that place greater
emphasis on bias management.
In most challenging applications, learning algorithms act in dynamic en-
vironments, where the data are collected over time. A desirable property of
these algorithms is the ability to incorporate new data. Some supervised
learning algorithms are naturally incremental, for example k-nearest neighbors
and naive-Bayes. Others, like decision trees, require substantial changes to
perform incremental induction. Moreover, if the process is not strictly
stationary (as are most real-world applications), the target concept could gradually
change over time. Incremental learning is a necessary property but not suf-
ficient. Incremental learning systems must have mechanisms to incorporate
concept drift, forgetting outdated data and adapting to the most recent state
of nature.
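As an illustration of why naive-Bayes is naturally incremental (a minimal sketch, not an algorithm from this book; class and method names are illustrative): the model is fully determined by class and attribute-value counts, so each new example only updates those sufficient statistics and no stored examples are needed.

```python
from collections import defaultdict
import math

class IncrementalNaiveBayes:
    """Naive Bayes over discrete attributes, updated one example at a time."""

    def __init__(self):
        self.class_counts = defaultdict(int)   # n(y)
        self.attr_counts = defaultdict(int)    # n(attribute, value, y)
        self.n = 0                             # examples seen so far

    def learn_one(self, x, y):
        # x: dict mapping attribute -> value; only counts are updated,
        # so learning cost per example is O(number of attributes)
        self.n += 1
        self.class_counts[y] += 1
        for attr, value in x.items():
            self.attr_counts[(attr, value, y)] += 1

    def predict_one(self, x):
        # argmax_y log P(y) + sum_i log P(x_i | y), with Laplace smoothing
        best, best_score = None, float("-inf")
        for y, ny in self.class_counts.items():
            score = math.log(ny / self.n)
            for attr, value in x.items():
                score += math.log(
                    (self.attr_counts[(attr, value, y)] + 1) / (ny + 2))
            if score > best_score:
                best, best_score = y, score
        return best
```

Contrast this with a batch decision-tree learner, where a new example may invalidate split decisions made near the root, which is exactly why incremental induction of trees requires substantial changes.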
What distinguishes current datasets from earlier ones is automatic data
feeds. We do not just have people who are entering information into a com-
puter. Instead, we have computers entering data into each other. Nowadays,
there are applications in which the data are better modeled not as persistent
tables but rather as transient data streams. Examples of such applications in-
clude network monitoring, web mining, sensor networks, telecommunications
data management, and financial applications. In these applications, it is not
feasible to load the arriving data into a traditional database management
system (DBMS), which is not traditionally designed to directly support the
continuous queries required in these applications (Babcock et al., 2002).
1.2 An Illustrative Example
Sensors distributed all around electrical-power distribution networks pro-
duce streams of data at high-speed. Electricity distribution companies usually
manage that information using SCADA/DMS tools (Supervisory Control and
Data Acquisition/Distribution Management Systems). One of their impor-
tant tasks is to forecast the electrical load (electricity demand) for a given
sub-network of consumers. Load forecast systems provide a relevant support
tool for operational management of an electricity distribution network, since
they enable the identification of critical points in load evolution, allowing
necessary corrections within available time, and planning strategies for differ-
ent horizons. This is of great economic interest, given that companies make
decisions to buy or to sell energy based on these predictions.
The scenario just described is easily extended to water and gas distribution
grids. In these applications, data are collected from a huge set of sensors
distributed all around the networks. The number of sensors can increase over
time, and, because they might come from different generations, they send
information at different time scales, speeds, and granularities. Sensors
usually operate in adverse conditions: they are prone to noise, weather effects,
communications failures, battery limits, etc. Data continuously flow, possibly at
high-speed, in a dynamic and time-changing environment.
Data mining in this context requires a continuous processing of the incom-
ing data monitoring trends, and detecting changes. In this context, we can
identify several relevant data mining tasks:
• Cluster Analysis
– Identification of Profiles: Urban, Rural, Industrial, etc.;
• Predictive Analysis
– Predict the value measured by each sensor for different time hori-
zons;
– Predict peaks in the demand;
• Monitoring evolution
– Change Detection
∗ Detect changes in the behavior of sensors;
∗ Detect failures and abnormal activities;
Figure 1.1: Example of an electrical grid. Sensors are represented by dots.
Sensors continuously measure quantities of interest corresponding to the elec-
tricity demand of a covered geographical area.
– Extreme Values, Anomaly, and Outliers Detection
∗ Identification of peaks in the demand;
∗ Identification of critical points in load evolution;
• Exploitation of background information given by the topology and geo-
graphical information of the network.
The usual approach for dealing with these tasks consists of 1) selecting a
finite data sample and 2) generating a static model. Several types of models
have been used for this purpose: different clustering algorithms and structures,
various neural-network-based models, Kalman filters, wavelets, etc. This
strategy can exhibit very good performance for the next few months, but the
performance later starts degrading, requiring the retraining of all decision
models as time goes by. What is the problem? The problem is probably related to the use of
static decision models. Traditional systems that are one-shot, memory-based,
trained from fixed training sets, and that produce static models are not prepared to process
the highly detailed evolving data. Thus, they are neither able to continuously
maintain a predictive model consistent with the actual state of nature, nor to
quickly react to changes. Moreover, with the evolution of hardware compo-
nents, these sensors are acquiring computational power. The challenge will be
to run the predictive model in the sensors themselves.
A basic question is: How can we collect labeled examples in real-time?
Suppose that at time t our predictive model made a prediction ŷt+k , for the
time t + k, where k is the desired horizon forecast. Later on, at time t + k
the sensor measures the quantity of interest yt+k . We can then estimate the
loss of our prediction L(ŷt+k , yt+k ).1 We do not need to know the true value
yi for all points in the stream. The framework can be used in situations of
limited feedback, by computing the loss L only for those points where yi is
known. A typical example is fraud detection in credit card usage. The system
receives and classifies requests from transactions in real-time. The prediction
can be useful for the decision of whether to accept the request. Later on,
companies send bank statements to the credit card users. The system receives
the feedback whenever the user denounces a fraudulent transaction.
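The delayed-feedback scheme above can be sketched in a few lines (an illustrative sketch under assumed names, not code from the book): a prediction made at time t for horizon k is held until the measurement for t + k arrives; points whose true value never arrives simply contribute no loss.

```python
# Prediction emitted at time t for time t + k is kept until (and unless)
# the true measurement for t + k is received.
pending = {}   # target time -> prediction made k steps earlier
losses = []    # losses of all predictions for which feedback arrived

def squared_loss(y_hat, y):
    return (y_hat - y) ** 2

def emit_prediction(t, k, y_hat):
    # At time t the model forecasts the value for time t + k.
    pending[t + k] = y_hat

def receive_measurement(t, y):
    # At time t the sensor delivers y_t. If a prediction for t exists,
    # its loss can now be computed; under limited feedback, y_t may
    # never arrive, and the pending prediction is simply never scored.
    if t in pending:
        losses.append(squared_loss(pending.pop(t), y))
```

In the fraud-detection example, `receive_measurement` would only fire for the transactions a user actually denounces, so the loss estimate is computed over the subset of the stream with known labels.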
Given its practical relevance and strong financial implications, electricity
load forecasting has been targeted by several works, mainly relying on the non-
linearity and generalizing capacities of neural networks, which combine a cyclic
factor and an auto-regressive one to achieve good results (Hippert et al., 2001).
Nevertheless, static iteration-based training, usually applied to estimate the
best weights for the network connections, is not adequate for the high-speed
stream of data usually encountered.
1.3 A World in Movement
The constraints just enumerated imply a switch from one-shot learning
tasks to a lifelong and spatially pervasive perspective. From this perspective,
induced by ubiquitous environments, finite training sets, static models, and
stationary distributions must be completely redefined. These aspects entail
new characteristics for the data:
• Data are made available through unlimited streams that continuously
flow, eventually at high speed, over time;
• The underlying regularities may evolve over time rather than be station-
ary;
• The data can no longer be considered as independent and identically
distributed;
• The data are now often spatially as well as time situated.
But do these characteristics really change the essence of machine learning?
Would not simple adaptations to existing learning algorithms suffice to cope
with the new needs previously described? These new concerns might indeed
appear rather abstract, and with no visible direct impact on machine learning
1 As an alternative, we could make another prediction for time t + k using the current model.