
Knowledge
Discovery from
Data Streams

© 2010 by Taylor and Francis Group, LLC


Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series

SERIES EDITOR
Vipin Kumar
University of Minnesota
Department of Computer Science and Engineering
Minneapolis, Minnesota, U.S.A.

AIMS AND SCOPE


This series aims to capture new developments and applications in data mining and knowledge discovery, while summarizing the computational tools and techniques useful in data analysis. This series encourages the integration of mathematical, statistical, and computational methods and techniques through the publication of a broad range of textbooks, reference works, and handbooks. The inclusion of concrete examples and applications is highly encouraged. The scope of the series includes, but is not limited to, titles in the areas of data mining and knowledge discovery methods and applications, modeling, algorithms, theory and foundations, data and knowledge visualization, data mining systems and tools, and privacy and security issues.

PUBLISHED TITLES

UNDERSTANDING COMPLEX DATASETS: DATA MINING WITH MATRIX DECOMPOSITIONS
David Skillicorn

COMPUTATIONAL METHODS OF FEATURE SELECTION
Huan Liu and Hiroshi Motoda

CONSTRAINED CLUSTERING: ADVANCES IN ALGORITHMS, THEORY, AND APPLICATIONS
Sugato Basu, Ian Davidson, and Kiri L. Wagstaff

KNOWLEDGE DISCOVERY FOR COUNTERTERRORISM AND LAW ENFORCEMENT
David Skillicorn

MULTIMEDIA DATA MINING: A SYSTEMATIC INTRODUCTION TO CONCEPTS AND THEORY
Zhongfei Zhang and Ruofei Zhang

NEXT GENERATION OF DATA MINING
Hillol Kargupta, Jiawei Han, Philip S. Yu, Rajeev Motwani, and Vipin Kumar

DATA MINING FOR DESIGN AND MARKETING
Yukio Ohsawa and Katsutoshi Yada

THE TOP TEN ALGORITHMS IN DATA MINING
Xindong Wu and Vipin Kumar

GEOGRAPHIC DATA MINING AND KNOWLEDGE DISCOVERY, SECOND EDITION
Harvey J. Miller and Jiawei Han

TEXT MINING: CLASSIFICATION, CLUSTERING, AND APPLICATIONS
Ashok N. Srivastava and Mehran Sahami

BIOLOGICAL DATA MINING
Jake Y. Chen and Stefano Lonardi

INFORMATION DISCOVERY ON ELECTRONIC HEALTH RECORDS
Vagelis Hristidis

TEMPORAL DATA MINING
Theophano Mitsa

RELATIONAL DATA CLUSTERING: MODELS, ALGORITHMS, AND APPLICATIONS
Bo Long, Zhongfei Zhang, and Philip S. Yu

KNOWLEDGE DISCOVERY FROM DATA STREAMS
João Gama



Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series

Knowledge
Discovery from
Data Streams

João Gama



Chapman & Hall/CRC
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2010 by Taylor and Francis Group, LLC


Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed in the United States of America on acid-free paper


10 9 8 7 6 5 4 3 2 1

International Standard Book Number: 978-1-4398-2611-9 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any copyright material has
not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used
only for identification and explanation without intent to infringe.

Library of Congress Cataloging‑in‑Publication Data

Gama, João.
Knowledge discovery from data streams / João Gama.
p. cm. -- (Chapman & Hall/CRC data mining and knowledge discovery series)
Includes bibliographical references and index.
ISBN 978-1-4398-2611-9 (hardcover : alk. paper)
1. Computer algorithms. 2. Machine learning. 3. Data mining. I. Title. II. Series.

QA76.9.A43G354 2010
006.3'12--dc22    2010014600

Visit the Taylor & Francis Web site at


http://www.taylorandfrancis.com

and the CRC Press Web site at


http://www.crcpress.com



Contents

List of Tables xi

List of Figures xiii

List of Algorithms xv

Foreword xvii

Acknowledgments xix

1 Knowledge Discovery from Data Streams 1


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 An Illustrative Example . . . . . . . . . . . . . . . . . . . . . 2
1.3 A World in Movement . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Data Mining and Data Streams . . . . . . . . . . . . . . . . 5

2 Introduction to Data Streams 7


2.1 Data Stream Models . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.1 Research Issues in Data Stream Management Systems 8
2.1.2 An Illustrative Problem . . . . . . . . . . . . . . . . . 8
2.2 Basic Streaming Methods . . . . . . . . . . . . . . . . . . . . 9
2.2.1 Illustrative Examples . . . . . . . . . . . . . . . . . . . 10
2.2.1.1 Counting the Number of Occurrences of the
Elements in a Stream . . . . . . . . . . . . . 10
2.2.1.2 Counting the Number of Distinct Values in a
Stream . . . . . . . . . . . . . . . . . . . . . 11
2.2.2 Bounds of Random Variables . . . . . . . . . . . . . . 11
2.2.3 Poisson Processes . . . . . . . . . . . . . . . . . . . . . 13
2.2.4 Maintaining Simple Statistics from Data Streams . . . 14
2.2.5 Sliding Windows . . . . . . . . . . . . . . . . . . . . . 14
2.2.5.1 Computing Statistics over Sliding Windows:
The ADWIN Algorithm . . . . . . . . . . . . . 16
2.2.6 Data Synopsis . . . . . . . . . . . . . . . . . . . . . . 19
2.2.6.1 Sampling . . . . . . . . . . . . . . . . . . . . 19
2.2.6.2 Synopsis and Histograms . . . . . . . . . . . 20
2.2.6.3 Wavelets . . . . . . . . . . . . . . . . . . . . 21
2.2.6.4 Discrete Fourier Transform . . . . . . . . . . 22



vi Knowledge Discovery from Data Streams

2.3 Illustrative Applications . . . . . . . . . . . . . . . . . . . . . 23


2.3.1 A Data Warehouse Problem: Hot-Lists . . . . . . . . . 23
2.3.2 Computing the Entropy in a Stream . . . . . . . . . . 24
2.3.3 Monitoring Correlations Between Data Streams . . . . 27
2.3.4 Monitoring Threshold Functions over Distributed Data
Streams . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.4 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3 Change Detection 33
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2 Tracking Drifting Concepts . . . . . . . . . . . . . . . . . . . 34
3.2.1 The Nature of Change . . . . . . . . . . . . . . . . . . 35
3.2.2 Characterization of Drift Detection Methods . . . . . 36
3.2.2.1 Data Management . . . . . . . . . . . . . . . 37
3.2.2.2 Detection Methods . . . . . . . . . . . . . . . 38
3.2.2.3 Adaptation Methods . . . . . . . . . . . . . . 40
3.2.2.4 Decision Model Management . . . . . . . . . 41
3.2.3 A Note on Evaluating Change Detection Methods . . 41
3.3 Monitoring the Learning Process . . . . . . . . . . . . . . . . 42
3.3.1 Drift Detection Using Statistical Process Control . . . 42
3.3.2 An Illustrative Example . . . . . . . . . . . . . . . . . 45
3.4 Final Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4 Maintaining Histograms from Data Streams 49


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2 Histograms from Data Streams . . . . . . . . . . . . . . . . . 50
4.2.1 K-buckets Histograms . . . . . . . . . . . . . . . . . . 50
4.2.2 Exponential Histograms . . . . . . . . . . . . . . . . . 51
4.2.2.1 An Illustrative Example . . . . . . . . . . . . 52
4.2.2.2 Discussion . . . . . . . . . . . . . . . . . . . 52
4.3 The Partition Incremental Discretization Algorithm - PiD . . 53
4.3.1 Analysis of the Algorithm . . . . . . . . . . . . . . . . 56
4.3.2 Change Detection in Histograms . . . . . . . . . . . . 56
4.3.3 An Illustrative Example . . . . . . . . . . . . . . . . . 57
4.4 Applications to Data Mining . . . . . . . . . . . . . . . . . . 59
4.4.1 Applying PiD in Supervised Learning . . . . . . . . . . 59
4.4.2 Time-Changing Environments . . . . . . . . . . . . . . 61
4.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

5 Evaluating Streaming Algorithms 63


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.2 Learning from Data Streams . . . . . . . . . . . . . . . . . . 64
5.3 Evaluation Issues . . . . . . . . . . . . . . . . . . . . . . . . 65
5.3.1 Design of Evaluation Experiments . . . . . . . . . . . 66


5.3.2 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . 67


5.3.2.1 Error Estimators Using a Single Algorithm
and a Single Dataset . . . . . . . . . . . . . . 68
5.3.2.2 An Illustrative Example . . . . . . . . . . . . 68
5.3.3 Comparative Assessment . . . . . . . . . . . . . . . . 69
5.3.3.1 The 0 − 1 Loss Function . . . . . . . . . . . 70
5.3.3.2 Illustrative Example . . . . . . . . . . . . . . 71
5.3.4 Evaluation Methodology in Non-Stationary
Environments . . . . . . . . . . . . . . . . . . . . . . . 72
5.3.4.1 The Page-Hinkley Algorithm . . . . . . . . . 72
5.3.4.2 Illustrative Example . . . . . . . . . . . . . . 73
5.4 Lessons Learned and Open Issues . . . . . . . . . . . . . . . 75
5.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

6 Clustering from Data Streams 79


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.2 Clustering Examples . . . . . . . . . . . . . . . . . . . . . . . 80
6.2.1 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . 80
6.2.2 Partitioning Clustering . . . . . . . . . . . . . . . . . . 82
6.2.2.1 The Leader Algorithm . . . . . . . . . . . . . 82
6.2.2.2 Single Pass k-Means . . . . . . . . . . . . . . 82
6.2.3 Hierarchical Clustering . . . . . . . . . . . . . . . . . . 83
6.2.4 Micro Clustering . . . . . . . . . . . . . . . . . . . . . 85
6.2.4.1 Discussion . . . . . . . . . . . . . . . . . . . 86
6.2.4.2 Monitoring Cluster Evolution . . . . . . . . . 86
6.2.5 Grid Clustering . . . . . . . . . . . . . . . . . . . . . . 87
6.2.5.1 Computing the Fractal Dimension . . . . . . 88
6.2.5.2 Fractal Clustering . . . . . . . . . . . . . . . 88
6.3 Clustering Variables . . . . . . . . . . . . . . . . . . . . . . . 90
6.3.1 A Hierarchical Approach . . . . . . . . . . . . . . . . . 91
6.3.1.1 Growing the Hierarchy . . . . . . . . . . . . 91
6.3.1.2 Aggregating at Concept Drift Detection . . . 94
6.3.1.3 Analysis of the Algorithm . . . . . . . . . . . 96
6.4 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

7 Frequent Pattern Mining 97


7.1 Introduction to Frequent Itemset Mining . . . . . . . . . . . 97
7.1.1 The Search Space . . . . . . . . . . . . . . . . . . . . . 98
7.1.2 The FP-growth Algorithm . . . . . . . . . . . . . . . . 100
7.1.3 Summarizing Itemsets . . . . . . . . . . . . . . . . . . 100
7.2 Heavy Hitters . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.3 Mining Frequent Itemsets from Data Streams . . . . . . . . . 103
7.3.1 Landmark Windows . . . . . . . . . . . . . . . . . . . 104
7.3.1.1 The LossyCounting Algorithm . . . . . . . . 104
7.3.1.2 Frequent Itemsets Using LossyCounting . . 104


7.3.2 Mining Recent Frequent Itemsets . . . . . . . . . . . . 105


7.3.2.1 Maintaining Frequent Itemsets in Sliding Windows . . . . . . . 105
7.3.2.2 Mining Closed Frequent Itemsets over Sliding Windows . . . . . . . 106
7.3.3 Frequent Itemsets at Multiple Time Granularities . . . 108
7.4 Sequence Pattern Mining . . . . . . . . . . . . . . . . . . . . 110
7.4.1 Reservoir Sampling for Sequential Pattern Mining over
Data Streams . . . . . . . . . . . . . . . . . . . . . . . 111
7.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

8 Decision Trees from Data Streams 115


8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
8.2 The Very Fast Decision Tree Algorithm . . . . . . . . . . . . 116
8.2.1 VFDT —The Base Algorithm . . . . . . . . . . . . . . . 116
8.2.2 Analysis of the VFDT Algorithm . . . . . . . . . . . . . 118
8.3 Extensions to the Basic Algorithm . . . . . . . . . . . . . . . 119
8.3.1 Processing Continuous Attributes . . . . . . . . . . . . 119
8.3.1.1 Exhaustive Search . . . . . . . . . . . . . . . 119
8.3.1.2 Discriminant Analysis . . . . . . . . . . . . . 121
8.3.2 Functional Tree Leaves . . . . . . . . . . . . . . . . . . 123
8.3.3 Concept Drift . . . . . . . . . . . . . . . . . . . . . . . 124
8.3.3.1 Detecting Changes . . . . . . . . . . . . . . . 126
8.3.3.2 Reacting to Changes . . . . . . . . . . . . . . 127
8.3.4 Final Comments . . . . . . . . . . . . . . . . . . . . . 128
8.4 OLIN: Info-Fuzzy Algorithms . . . . . . . . . . . . . . . . . . 129
8.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

9 Novelty Detection in Data Streams 133


9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
9.2 Learning and Novelty . . . . . . . . . . . . . . . . . . . . . . 134
9.2.1 Desiderata for Novelty Detection . . . . . . . . . . . . 135
9.3 Novelty Detection as a One-Class Classification Problem . . 135
9.3.1 Autoassociator Networks . . . . . . . . . . . . . . . . 136
9.3.2 The Positive Naive-Bayes . . . . . . . . . . . . . . . . 137
9.3.3 Decision Trees for One-Class Classification . . . . . . 138
9.3.4 The One-Class SVM . . . . . . . . . . . . . . . . . . . 138
9.3.5 Evaluation of One-Class Classification Algorithms . . 139
9.4 Learning New Concepts . . . . . . . . . . . . . . . . . . . . . 141
9.4.1 Approaches Based on Extreme Values . . . . . . . . . 141
9.4.2 Approaches Based on the Decision Structure . . . . . 142
9.4.3 Approaches Based on Frequency . . . . . . . . . . . . 143
9.4.4 Approaches Based on Distances . . . . . . . . . . . . . 144
9.5 The Online Novelty and Drift Detection Algorithm . . . . . . 144
9.5.1 Initial Learning Phase . . . . . . . . . . . . . . . . . . 145


9.5.2 Continuous Unsupervised Learning Phase . . . . . . . 146


9.5.2.1 Identifying Novel Concepts . . . . . . . . . . 147
9.5.2.2 Attempting to Determine the Nature of New
Concepts . . . . . . . . . . . . . . . . . . . . 149
9.5.2.3 Merging Similar Concepts . . . . . . . . . . . 149
9.5.2.4 Automatically Adapting the Number of Clusters . . . . . . . 150
9.5.3 Computational Cost . . . . . . . . . . . . . . . . . . . 150
9.6 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

10 Ensembles of Classifiers 153


10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
10.2 Linear Combination of Ensembles . . . . . . . . . . . . . . . 155
10.3 Sampling from a Training Set . . . . . . . . . . . . . . . . . 156
10.3.1 Online Bagging . . . . . . . . . . . . . . . . . . . . . . 157
10.3.2 Online Boosting . . . . . . . . . . . . . . . . . . . . . 158
10.4 Ensembles of Trees . . . . . . . . . . . . . . . . . . . . . . . 160
10.4.1 Option Trees . . . . . . . . . . . . . . . . . . . . . . . 160
10.4.2 Forest of Trees . . . . . . . . . . . . . . . . . . . . . . 161
10.4.2.1 Generating Forest of Trees . . . . . . . . . . 162
10.4.2.2 Classifying Test Examples . . . . . . . . . . 162
10.5 Adapting to Drift Using Ensembles of Classifiers . . . . . . . 162
10.6 Mining Skewed Data Streams with Ensembles . . . . . . . . 165
10.7 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

11 Time Series Data Streams 167


11.1 Introduction to Time Series Analysis . . . . . . . . . . . . . 167
11.1.1 Trend . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
11.1.2 Seasonality . . . . . . . . . . . . . . . . . . . . . . . . 169
11.1.3 Stationarity . . . . . . . . . . . . . . . . . . . . . . . . 169
11.2 Time-Series Prediction . . . . . . . . . . . . . . . . . . . . . 169
11.2.1 The Kalman Filter . . . . . . . . . . . . . . . . . . . . 170
11.2.2 Least Mean Squares . . . . . . . . . . . . . . . . . . . 173
11.2.3 Neural Nets and Data Streams . . . . . . . . . . . . . 173
11.2.3.1 Stochastic Sequential Learning of Neural Networks . . . . . . . 174
11.2.3.2 Illustrative Example: Load Forecast in Data
Streams . . . . . . . . . . . . . . . . . . . . . 175
11.3 Similarity between Time-Series . . . . . . . . . . . . . . . . . 177
11.3.1 Euclidean Distance . . . . . . . . . . . . . . . . . . . . 177
11.3.2 Dynamic Time-Warping . . . . . . . . . . . . . . . . . 178
11.4 Symbolic Approximation – SAX . . . . . . . . . . . . . . . . . 180
11.4.1 The SAX Transform . . . . . . . . . . . . . . . . . . . . 180
11.4.1.1 Piecewise Aggregate Approximation (PAA) . 181
11.4.1.2 Symbolic Discretization . . . . . . . . . . . . 181


11.4.1.3 Distance Measure . . . . . . . . . . . . . . . 182


11.4.1.4 Discussion . . . . . . . . . . . . . . . . . . . 182
11.4.2 Finding Motifs Using SAX . . . . . . . . . . . . . . . . 183
11.4.3 Finding Discords Using SAX . . . . . . . . . . . . . . . 183
11.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184

12 Ubiquitous Data Mining 185


12.1 Introduction to Ubiquitous Data Mining . . . . . . . . . . . 185
12.2 Distributed Data Stream Monitoring . . . . . . . . . . . . . 186
12.2.1 Distributed Computing of Linear Functions . . . . . . 187
12.2.1.1 A General Algorithm for Computing Linear
Functions . . . . . . . . . . . . . . . . . . . . 188
12.2.2 Computing Sparse Correlation Matrices Efficiently . . 189
12.2.2.1 Monitoring Sparse Correlation Matrices . . . 191
12.2.2.2 Detecting Significant Correlations . . . . . . 192
12.2.2.3 Dealing with Data Streams . . . . . . . . . . 192
12.3 Distributed Clustering . . . . . . . . . . . . . . . . . . . . . . 193
12.3.1 Conquering the Divide . . . . . . . . . . . . . . . . . . 193
12.3.1.1 Furthest Point Clustering . . . . . . . . . . . 193
12.3.1.2 The Parallel Guessing Clustering . . . . . . . 193
12.3.2 DGClust – Distributed Grid Clustering . . . . . . . . 194
12.3.2.1 Local Adaptive Grid . . . . . . . . . . . . . . 194
12.3.2.2 Frequent State Monitoring . . . . . . . . . . 195
12.3.2.3 Centralized Online Clustering . . . . . . . . 196
12.4 Algorithm Granularity . . . . . . . . . . . . . . . . . . . . . 197
12.4.1 Algorithm Granularity Overview . . . . . . . . . . . . 199
12.4.2 Formalization of Algorithm Granularity . . . . . . . . 200
12.4.2.1 Algorithm Granularity Procedure . . . . . . 200
12.4.2.2 Algorithm Output Granularity . . . . . . . . 201
12.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

13 Final Comments 205


13.1 The Next Generation of Knowledge Discovery . . . . . . . . 205
13.1.1 Mining Spatial Data . . . . . . . . . . . . . . . . . . . 206
13.1.2 The Time Situation of Data . . . . . . . . . . . . . . . 206
13.1.3 Structured Data . . . . . . . . . . . . . . . . . . . . . 206
13.2 Where We Want to Go . . . . . . . . . . . . . . . . . . . . . 206

Appendix A Resources 209


A.1 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
A.2 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

Bibliography 211



List of Tables

2.1 Comparison between Database Management Systems and Data Stream Management Systems. . . . . . . 8
2.2 Differences between traditional and stream data query processing. . . . . . . 9

4.1 Average results of evaluation metrics of the quality of discretization. . . . . . . 58

5.1 Evaluation methods in stream mining literature. . . . . . . . 66


5.2 Impact of fading factors in change detection. . . . . . . . . . 75

7.1 A transaction database and all possible frequent itemsets. . . 98


7.2 The search space to find all possible frequent itemsets. . . . . 99

8.1 Contingency table to compute the entropy of a splitting test. 122

9.1 Confusion matrix to evaluate one-class classifiers. . . . . . . . 139

11.1 The two time-series used in the example of dynamic time-warping. . . . . . . 178
11.2 SAX lookup table. . . . . . . . . . . . . . . . . . . . . . . . . . 181



List of Figures

1.1 Example of an electrical grid. . . . . . . . . . . . . . . . . . . 3

2.1 The Count-Min Sketch. . . . . . . . . . . . . . . . . . . . . . 10


2.2 Poisson random variables. . . . . . . . . . . . . . . . . . . . . 13
2.3 Sequence based windows. . . . . . . . . . . . . . . . . . . . . 15
2.4 Tilted time windows. . . . . . . . . . . . . . . . . . . . . . . . 15
2.5 Output of algorithm ADWIN for different change rates. . . . . 18
2.6 The three aggregation levels in StatStream. . . . . . . . . . . 27
2.7 The vector space. . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.8 The bounding theorem. . . . . . . . . . . . . . . . . . . . . . 31

3.1 Three illustrative examples of change. . . . . . . . . . . . . . 35


3.2 Main dimensions in change detection methods in data mining. 36
3.3 Illustrative example of the Page-Hinkley test. . . . . . . . . . 40
3.4 The space state transition graph. . . . . . . . . . . . . . . . . 43
3.5 Dynamically constructed time window. . . . . . . . . . . . . . 44
3.6 Illustrative example of using the SPC algorithm in the SEA
concept dataset. . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.1 Split & Merge and Merge & Split Operators. . . . . . . . . . 51


4.2 Illustrative example of the two layers in PiD. . . . . . . . . . 55
4.3 Comparison between batch histograms and PiD histograms. . 57
4.4 The evolution of the partitions at the second layer. . . . . . . 61

5.1 Performance evolution of VFDT in a web-mining problem. . 65


5.2 Comparison of error evolution as estimated by holdout and prequential strategies. . . . . . . 69
5.3 Comparison of prequential error evolution between holdout,
prequential and prequential over sliding windows. . . . . . . . 70
5.4 Comparison between two different neural-network topologies in an electrical load-demand problem. . . . . . . 71
5.5 Plot of the Qi statistic over a sliding window. . . . . . . . . . 72
5.6 The evolution of the signed McNemar statistic between two algorithms. . . . . . . 73
5.7 The evolution of the signed McNemar statistic using sliding windows and fading factors. . . . . . . 74


5.8 Evolution of the Page-Hinkley test statistic . . . . . . . . . . 75


5.9 Evolution of the Page-Hinkley test statistic using fading factors. 76

6.1 The Clustering Feature Tree in BIRCH. . . . . . . . . . . . . . 84


6.2 Fractal dimension: the box-counting plot. . . . . . . . . . . . 88
6.3 ODAC structure evolution in a time-changing dataset. . . . . . 95

7.1 Frequent-Pattern Trees . . . . . . . . . . . . . . . . . . . . . . 100


7.2 The FP-growth algorithm and FP-stream structure. . . . . . 109
7.3 Stream model with three different sequence ids with their associated transactions. . . . . . . 111

8.1 Illustrative example of a decision tree and the time-window associated with each node. . . . . . . 119
8.2 Sufficient statistics of a continuous attribute in a leaf. . . . . 120
8.3 Illustrative example of the solutions of Equation 8.4. . . . . . 122
8.4 Illustrative example on updating error statistics in a node. . . 126
8.5 The Hyper-plane problem. . . . . . . . . . . . . . . . . . . . . 127
8.6 A two-layered structure Info-Fuzzy Network. . . . . . . . . . 130
8.7 OLIN-based system architecture. . . . . . . . . . . . . . . . . 131

9.1 Architecture of a neural network for one-class classification. . 136


9.2 Illustrative examples of Precision-Recall and ROC graphs. . . 140
9.3 Overview of the Online Novelty and Drift Detection Algorithm. 146
9.4 Illustrative example of OLINDDA algorithm. . . . . . . . . . . . 147

10.1 Error rate versus number of classifiers in an ensemble. . . . . 154


10.2 Illustrative example of online bagging. . . . . . . . . . . . . . 157
10.3 Illustrative example of online boosting. . . . . . . . . . . . . . 160

11.1 Time-series Example. . . . . . . . . . . . . . . . . . . . . . . . 168


11.2 Time-series auto-correlation example. . . . . . . . . . . . . . 170
11.3 Kalman filter as a hidden Markov model. . . . . . . . . . . . 172
11.4 Memory schema in Electricity Demand Forecast. . . . . . . . 176
11.5 Euclidean Distance between time-series Q and C. . . . . . . . 178
11.6 Dynamic time-warping. . . . . . . . . . . . . . . . . . . . . . 179
11.7 DTW-Alignment between the two time series . . . . . . . . . 180
11.8 The main steps in SAX. . . . . . . . . . . . . . . . . . . . . . . 182

12.1 Local L2 Thresholding. . . . . . . . . . . . . . . . . . . . . . . 189


12.2 Illustrative example of distributed clustering using DGClust. 195
12.3 DGClust results for different grid parameters. . . . . . . . . . 197
12.4 The effect of algorithm granularity on computational resources. 199
12.5 The algorithm output granularity approach. . . . . . . . . . . 202
12.6 Algorithm output granularity stages. . . . . . . . . . . . . . . 203



List of Algorithms

1 The ADWIN Algorithm. . . . . . . . . . . . . . . . . . . . . . . 17


2 The Reservoir Sampling Algorithm. . . . . . . . . . . . . . . . 20
3 The Frequent Algorithm. . . . . . . . . . . . . . . . . . . . . . 24
4 The Space-Saving Algorithm. . . . . . . . . . . . . . . . . . . 24
5 Basic Estimator for the Entropy Norm. . . . . . . . . . . . . . 25
6 The Maintain Samples Algorithm. . . . . . . . . . . . . . . . . 26
7 The Monitoring Threshold Functions Algorithm (sensor node). 30
8 The SPC Algorithms. . . . . . . . . . . . . . . . . . . . . . . . . 45
9 The PiD Algorithm for Updating Layer1 . . . . . . . . . . . . . 54
10 The Leader Clustering Algorithm. . . . . . . . . . . . . . . . . 82
11 Algorithm for Single Pass k-Means Clustering. . . . . . . . . . 83
12 Algorithm for Fractal Clustering: Initialization Phase. . . . . . 87
13 Algorithm for Fractal Clustering: Incremental Step. . . . . . . 89
14 Algorithm for Fractal Clustering: Tracking Cluster Changes. . 89
15 The ODAC Global Algorithm. . . . . . . . . . . . . . . . . . . . 92
16 ODAC: The TestSplit Algorithm. . . . . . . . . . . . . . . . . . . 94
17 The FP-Tree Algorithm. . . . . . . . . . . . . . . . . . . . . . 101
18 The Karp Algorithm. . . . . . . . . . . . . . . . . . . . . . . . 103
19 The LossyCounting Algorithm. . . . . . . . . . . . . . . . . . 105
20 VFDT: The Hoeffding Tree Algorithm. . . . . . . . . . . . . . . 117
21 The InsertValueBtree(xj , y, Btree) Algorithm. . . . . . . . . . 121
22 The LessThan(i, k, BTree) Algorithm. . . . . . . . . . . . . . . 122
23 The Algorithm to Process Numeric Attributes. . . . . . . . . . 125
24 The Weighted-Majority Algorithm. . . . . . . . . . . . . . . . 156
25 The Online Bagging Algorithm. . . . . . . . . . . . . . . . . . 158
26 The Online Boosting Algorithm. . . . . . . . . . . . . . . . . . 159
27 The Add Expert Algorithm for Discrete Classes. . . . . . . . . 164
28 The Add Expert Algorithm for Continuous Classes. . . . . . . 164
29 The Skewed Ensemble Algorithm. . . . . . . . . . . . . . . . . 166
30 The Randomized Distributed Dot Product Algorithm. . . . . . 187
31 Local L2 Thresholding. . . . . . . . . . . . . . . . . . . . . . . 190



Foreword

In spite of being a small country in terms of geographic area and population
size, Portugal has a very active and respected artificial intelligence community,
with a good number of researchers well known internationally for the high
quality of their work and relevant contributions in this area.
One of these researchers is João Gama from the University of Porto. Gama
is one of the leading investigators in one of the current hottest research topics
in machine learning and data mining: data streams.
Although other books have been published covering important aspects of
data streams, these books are either mainly related to database aspects of
data streams or are collections of chapter contributions on different aspects
of the topic.
This is the first book to didactically cover, in a clear, comprehensive,
and mathematically rigorous way, the main machine learning aspects
of this research field. The book not only presents the fundamentals needed
to fully understand data streams, but also describes important
applications. The book also discusses some of the main challenges of future
data mining research, when stream mining will be at the core of many
applications. These challenges will have to be addressed in the design of useful
and efficient data mining solutions able to deal with real-world problems. It
is important to stress that, although this book is mainly about data
streams, most of the concepts presented are valid for other areas of
machine learning and data mining. The book is therefore an up-to-date, broad,
and useful source of reference for all those interested in knowledge acquisition
by learning techniques.

André Ponce de Leon Ferreira de Carvalho


University of São Paulo



Acknowledgments

Life is the art of drawing sufficient conclusions from insufficient premises.


Samuel Butler

This book is a result of the Knowledge Discovery from Ubiquitous Data
Streams project funded by the Portuguese Fundação para a Ciência e
Tecnologia. We thank FCT, which has funded, over the last 5 years, research
projects on this topic. The work, analysis, discussions, and debates with several
students and researchers strongly influenced the issues presented here. I thank
Ricardo Rocha, Ricardo Fernandes, and Pedro Medas for their work on decision
trees, Pedro Rodrigues on clustering, Gladys Castillo and Milton Severo on
change detection, Eduardo Spinosa and André Carvalho on novelty detection,
and Carlos Pinto and Raquel Sebastião on histograms. To all of them, thank you!
The Knowledge Discovery in Ubiquitous Environments project, funded by
the European Union under IST, was another major source of inspiration. All
the meetings, events, activities, and discussions contributed to improve our
vision of the role of data mining in a world in motion.
A special thanks to those who contributed material to this book. André de
Carvalho contributed the Foreword and reviewed the book, Albert Bifet and
Ricard Gavaldà contributed Section 2.2.5.1, Mark Last contributed Section 8.4,
Mohamed Gaber Section 12.4, and Chedy Raissi and Pascal Poncelet
Section 7.4.1. Together with Jesus Aguilar, we organized a stream of workshops
on data streams. They constitute the backbone of this book.
A word of gratitude to my family and friends, who have been the major
source of support.



Chapter 1
Knowledge Discovery from Data
Streams

1.1 Introduction
In the last three decades, machine learning research and practice have
focused on batch learning, usually using small datasets. In batch learning, the
whole training set is available to the algorithm, which outputs a decision
model after processing the data, possibly (and most of the time) multiple
times. The rationale behind this practice is that examples are generated at
random according to some stationary probability distribution. Most learners
use a greedy, hill-climbing search in the space of models, and are prone
to high variance and overfitting. Brain and Webb (2002) pointed
out the relation between variance and sample size: when learning from
small datasets the main problem is variance reduction, while learning from
large datasets may be more effective with algorithms that place greater
emphasis on bias management.
In most challenging applications, learning algorithms act in dynamic
environments, where data are collected over time. A desirable property of
these algorithms is the ability to incorporate new data. Some supervised
learning algorithms are naturally incremental, for example k-nearest
neighbors and naive Bayes. Others, like decision trees, require substantial changes
to support incremental induction. Moreover, if the process is not strictly
stationary (as in most real-world applications), the target concept can gradually
change over time. Incremental learning is thus a necessary property, but not
a sufficient one: incremental learning systems must have mechanisms to handle
concept drift, forgetting outdated data and adapting to the most recent state
of nature.
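As an illustration of how naturally incremental such learners are, consider a minimal naive Bayes sketch for categorical attributes: the model is nothing more than a set of counts, so each new labeled example is absorbed with a constant-time update and the raw data never need to be stored. This is a hypothetical sketch, not the book's implementation; the attribute and class names in the usage example below are illustrative.

```python
from collections import defaultdict
import math


class IncrementalNaiveBayes:
    """Naive Bayes over categorical attributes.

    Each labeled example updates a handful of counters, so learning is
    O(number of attributes) per example and never revisits past data.
    """

    def __init__(self):
        self.class_counts = defaultdict(int)                       # N(y)
        self.attr_counts = defaultdict(lambda: defaultdict(int))   # N(attr=value, y)
        self.n = 0                                                 # examples seen

    def learn_one(self, x, y):
        """Incorporate a single labeled example (x: dict attr -> value)."""
        self.n += 1
        self.class_counts[y] += 1
        for attr, value in x.items():
            self.attr_counts[(attr, value)][y] += 1

    def predict_one(self, x):
        """Return the class maximizing log P(y) + sum_i log P(x_i | y)."""
        best, best_score = None, -math.inf
        for y, cy in self.class_counts.items():
            score = math.log(cy / self.n)
            for attr, value in x.items():
                # Laplace smoothing over the classes seen so far
                score += math.log((self.attr_counts[(attr, value)][y] + 1)
                                  / (cy + len(self.class_counts)))
            if score > best_score:
                best, best_score = y, score
        return best
```

A k-nearest-neighbors learner is incremental for the symmetric reason: "training" amounts to appending the new example to the instance store.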
What distinguishes current datasets from earlier ones is automatic data
feeds. We do not just have people entering information into a computer;
instead, we have computers entering data into each other. Nowadays,
there are applications in which the data are better modeled not as persistent
tables but rather as transient data streams. Examples of such applications
include network monitoring, web mining, sensor networks, telecommunications
data management, and financial applications. In these applications, it is not
feasible to load the arriving data into a traditional database management
system (DBMS), which is not traditionally designed to directly support the
continuous queries required in these applications (Babcock et al., 2002).

1.2 An Illustrative Example


Sensors distributed all around electrical-power distribution networks
produce streams of data at high speed. Electricity distribution companies usually
manage that information using SCADA/DMS tools (Supervisory Control and
Data Acquisition / Distribution Management Systems). One of their important
tasks is to forecast the electrical load (electricity demand) for a given
sub-network of consumers. Load forecast systems provide a relevant support
tool for the operational management of an electricity distribution network, since
they enable the identification of critical points in load evolution, allowing
necessary corrections within the available time, and planning strategies for
different horizons. This is of great economic interest, given that companies make
decisions to buy or to sell energy based on these predictions.
The scenario just described is easily extended to water and gas distribution
grids. In these applications, data are collected from a huge set of sensors
distributed all around the networks. The number of sensors can increase over
time, and, because they may come from different generations, they send
information at different time scales, speeds, and granularities. Sensors
usually act in adverse conditions, being prone to noise, weather conditions,
communication failures, battery limits, etc. Data flow continuously, possibly at
high speed, in a dynamic and time-changing environment.
Data mining in this context requires continuous processing of the incoming
data, monitoring trends, and detecting changes. In this setting, we can
identify several relevant data mining tasks:
• Cluster Analysis
– Identification of Profiles: Urban, Rural, Industrial, etc.;
• Predictive Analysis
– Predict the value measured by each sensor for different time hori-
zons;
– Predict peaks in the demand;
• Monitoring evolution
– Change Detection
∗ Detect changes in the behavior of sensors;
∗ Detect failures and abnormal activities;




Figure 1.1: Example of an electrical grid. Sensors are represented by dots.
Sensors continuously measure quantities of interest corresponding to the
electricity demand of a covered geographical area.

– Extreme Values, Anomaly, and Outliers Detection


∗ Identification of peaks in the demand;
∗ Identification of critical points in load evolution;
• Exploitation of background information given by the topology and
geographical information of the network.
The usual approach for dealing with these tasks consists of: 1) selecting a
finite data sample; and 2) generating a static model. Several types of models
have been used for this: different clustering algorithms and structures,
various neural-network based models, Kalman filters, wavelets, etc. This strategy
can exhibit very good performance over the next few months but, later, the
performance starts degrading, requiring all decision models to be retrained as
time goes by. What is the problem? The problem is probably related to the use of
static decision models. Traditional systems that are one-shot, memory-based,
and trained from fixed training sets produce static models that are not prepared
to process the highly detailed evolving data. Thus, they are neither able to
continuously maintain a predictive model consistent with the actual state of
nature, nor to react quickly to changes. Moreover, with the evolution of hardware
components, these sensors are acquiring computational power. The challenge will
be to run the predictive model in the sensors themselves.
A basic question is: how can we collect labeled examples in real time?
Suppose that at time t our predictive model made a prediction ŷt+k for
time t + k, where k is the desired forecast horizon. Later on, at time t + k,
the sensor measures the quantity of interest yt+k. We can then estimate the
loss of our prediction, L(ŷt+k, yt+k).¹ We do not need to know the true value
yi for all points in the stream: the framework can be used in situations of
limited feedback, by computing the loss L only for points where yi is
known. A typical example is fraud detection in credit card usage. The system
receives and classifies transaction requests in real time, and the prediction
can be used to decide whether to accept the request. Later on,
companies send bank statements to the credit card users, and the system receives
feedback whenever a user reports a fraudulent transaction.
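The limited-feedback evaluation loop just described can be sketched as follows: each prediction made at time t for horizon k is stored under the key t + k, and the loss is accumulated only for those time points whose true value eventually arrives. This is a minimal sketch under the stated assumptions; the squared loss and the specific numbers in the usage example are illustrative, not taken from the book.

```python
class DelayedFeedbackEvaluator:
    """Streaming evaluation under limited feedback.

    At time t the model predicts y_{t+k}; the prediction is stored until the
    true value for that time point arrives (if it ever does), at which point
    the stored prediction is scored and discarded.
    """

    def __init__(self, loss=lambda p, y: (p - y) ** 2):
        self.loss = loss
        self.pending = {}          # time point -> stored prediction
        self.total = 0.0           # accumulated loss
        self.count = 0             # number of scored predictions

    def predict(self, t, k, y_hat):
        """Record a prediction made at time t for time t + k."""
        self.pending[t + k] = y_hat

    def feedback(self, t, y_true):
        """Score the stored prediction for time t; silently skip if none exists
        (feedback may simply never arrive for some points in the stream)."""
        if t in self.pending:
            self.total += self.loss(self.pending.pop(t), y_true)
            self.count += 1

    def mean_loss(self):
        """Average loss over the predictions that received feedback."""
        return self.total / self.count if self.count else None
```

For the fraud-detection case, `feedback` would be called only when a customer reports a fraudulent transaction; all other stored predictions simply remain unscored.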
Given its relevant applications and strong financial implications, electricity
load forecast has been targeted by several works, mainly relying on the
non-linearity and generalization capacities of neural networks, which combine a
cyclic factor and an auto-regressive one to achieve good results (Hippert et al.,
2001). Nevertheless, static iteration-based training, usually applied to estimate
the best weights for the network connections, is not adequate for the high-speed
streams of data usually encountered.

1.3 A World in Movement


The constraints just enumerated imply a switch from one-shot learning
tasks to a lifelong and spatially pervasive perspective. From this perspective,
induced by ubiquitous environments, finite training sets, static models, and
stationary distributions must be completely redefined. These aspects entail
new characteristics for the data:
• Data are made available through unlimited streams that continuously
flow, possibly at high speed, over time;
• The underlying regularities may evolve over time rather than be
stationary;
• The data can no longer be considered independent and identically
distributed;
• The data are now often spatially as well as temporally situated.
But do these characteristics really change the essence of machine learning?
Would simple adaptations to existing learning algorithms not suffice to cope
with the new needs described above? These new concerns might indeed
appear rather abstract, with no visible direct impact on machine learning
¹ As an alternative, we could make another prediction, using the current model, for
time t + k.

entirely