Data Mining with Python Theory Application and Case Studies 1st Edition Di Wu download

The document provides an overview of 'Data Mining with Python: Theory, Application, and Case Studies' by Di Wu, which focuses on hands-on learning of data mining techniques using Python. It covers the data mining pipeline, including data collection, integration, analysis, and visualization, with practical tutorials and case studies. The book is designed for students, data scientists, and business analysts to gain applicable skills in data mining.


Data Mining with Python

Data is everywhere and it’s growing at an unprecedented rate. But making sense of all that data
is a challenge. Data Mining is the process of discovering patterns and knowledge from large data
sets, and Data Mining with Python focuses on the hands-on approach to learning Data Mining.
It showcases how to use Python Packages to fulfil the Data Mining pipeline, which is to collect,
integrate, manipulate, clean, process, organize, and analyze data for knowledge.

The contents are organized based on the Data Mining pipeline, so readers can naturally progress
step by step through the process. Topics, methods, and tools are explained in three aspects:
“What it is” as a theoretical background, “why we need it” as an application orientation, and
“how we do it” as a case study.

This book is designed to give students, data scientists, and business analysts an understanding of
Data Mining concepts in an applicable way. Through interactive tutorials that can be run, modified,
and used for a more comprehensive learning experience, this book will help its readers gain
practical skills to implement Data Mining techniques in their work.

Dr. Di Wu is an Assistant Professor in the Finance, Information Systems, and Economics department
of the Business School, Lehman College. He obtained a Ph.D. in Computer Science from the Graduate
Center, CUNY. Dr. Wu's research interests include temporal extensions to RDF and the semantic
web, Applied Data Science, and Experiential Learning and Pedagogy in Business Education.
Dr. Wu developed and taught courses including Strategic Management, Databases, Business
Statistics, Management Decision Making, Programming Languages (C++, Java, and Python),
Data Structures and Algorithms, Data Mining, Big Data, and Machine Learning.
Chapman & Hall/CRC

The Python Series


About the Series

Python has been ranked as the most popular programming language, and it is widely used in
education and industry. This book series will offer a wide range of books on Python for students
and professionals. Titles in the series will help users learn the language at an introductory and
advanced level, and explore its many applications in data science, AI, and machine learning.
Series titles can also be supplemented with Jupyter notebooks.

Image Processing and Acquisition using Python, Second Edition
Ravishankar Chityala, Sridevi Pudipeddi

Python Packages
Tomas Beuzen and Tiffany-Anne Timbers

Statistics and Data Visualisation with Python
Jesús Rogel-Salazar

Introduction to Python for Humanists
William J.B. Mattingly

Python for Scientific Computation and Artificial Intelligence
Stephen Lynch

Learning Professional Python Volume 1: The Basics
Usharani Bhimavarapu and Jude D. Hemanth

Learning Professional Python Volume 2: Advanced
Usharani Bhimavarapu and Jude D. Hemanth

Learning Advanced Python from Open Source Projects
Rongpeng Li

Foundations of Data Science with Python
John Mark Shea

Data Mining with Python: Theory, Application, and Case Studies
Di Wu

For more information about this series please visit: https://www.crcpress.com/Chapman--Hall-CRC/book-series/PYTH
Data Mining with Python
Theory, Application, and Case Studies

Di Wu
First edition published 2024
by CRC Press
2385 Executive Center Drive, Suite 320, Boca Raton, FL 33431

and by CRC Press


4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2024 Di Wu

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers have
attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders
if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please
write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or
utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying,
microfilming, and recording, or in any information storage or retrieval system, without written permission
from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the
Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not
available on CCC please contact [email protected]

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification
and explanation without intent to infringe.

ISBN: 978-1-032-61264-5 (hbk)


ISBN: 978-1-032-59890-1 (pbk)
ISBN: 978-1-003-46278-1 (ebk)

DOI: 10.1201/9781003462781

Typeset in Latin Modern font by KnowledgeWorks Global Ltd.

Publisher’s note: This book has been prepared from camera-ready copy provided by the authors.

Access the Support Material: https://www.routledge.com/9781032598901


Dedication
Students, Staff, and Colleagues
at University of Colorado Boulder and Lehman College
Contents

List of Figures xi

Foreword xix

Preface xxi

Author Bios xxiii

Section I Data Wrangling

Chapter 1 ■ Data Collection 3


1.1 COLLECT DATA FROM FILES 4
1.1.1 Tutorial – Collect Data from Files 5
1.1.2 Documentation 13
1.2 COLLECT DATA FROM THE WEB 14
1.2.1 Tutorial – Collect Data from Web 15
1.2.2 Case Study – Collect Weather Data from Web 20
1.3 COLLECT DATA FROM SQL DATABASES 23
1.3.1 Tutorial – Collect Data from SQLite 24
1.3.2 Case Study – Collect Shopping Data from SQLite 28
1.4 COLLECT DATA THROUGH APIS 31
1.4.1 Tutorial – Collect Data from Yahoo 32

Chapter 2 ■ Data Integration 37


2.1 DATA INTEGRATION 37
2.1.1 Tutorial – Data Integration 38
2.1.2 Case Study – Data Science Salary 44


Chapter 3 ■ Data Statistics 53


3.1 DESCRIPTIVE DATA ANALYSIS 53
3.1.1 Tutorial – Statistical Understanding 54
3.1.2 Case Study – Statistical Understanding of YouTube and Spotify 59

Chapter 4 ■ Data Visualization 66


4.1 DATA VISUALIZATION WITH PANDAS 67
4.1.1 Tutorial – Data Visualization with Pandas 67
4.2 DATA VISUALIZATION WITH MATPLOTLIB 76
4.2.1 Tutorial – Data Visualization with Matplotlib 77
4.3 DATA VISUALIZATION WITH SEABORN 106
4.3.1 Tutorial – Data Visualization with Seaborn 106

Chapter 5 ■ Data Preprocessing 131


5.1 DEALING WITH MISSING VALUES 131
5.1.1 Tutorial – Handling Missing Values 132
5.2 DEALING WITH OUTLIERS 139
5.2.1 Tutorial – Detect Outliers Using IQR 139
5.2.2 Tutorial – Detect Outliers Using Statistics 144
5.3 DATA REDUCTION 146
5.3.1 Tutorial – Dimension Elimination 146
5.3.2 Tutorial – Sampling 148
5.4 DATA DISCRETIZATION AND SCALING 150
5.4.1 Tutorial – Data Discretization 151
5.4.2 Tutorial – Data Scaling 154
5.5 DATA WAREHOUSE 157
5.5.1 Tutorial – Data Cube 158
5.5.2 Tutorial – Pivot Table 162

Section II Data Analysis

Chapter 6 ■ Classification 171


6.1 NEAREST NEIGHBOR CLASSIFIERS 172
6.1.1 Tutorial – Iris Binary Classification Using KNN 172
6.1.2 Tutorial – Iris Multiclass Classification Using KNN 177
6.1.3 Tutorial – Iris Binary Classification Using RNN 182
6.1.4 Tutorial – Iris Multiclass Classification Using RNN 188
6.1.5 Case Study – Breast Cancer Classification Using Nearest
Neighbor Classifiers 193

6.2 DECISION TREE CLASSIFIERS 196


6.2.1 Tutorial – Iris Binary Classification Using Decision Tree 197
6.2.2 Tutorial – Iris Multiclass Classification Using Decision Tree 204
6.2.3 Case Study – Breast Cancer Classification Using Decision Tree 212
6.3 SUPPORT VECTOR MACHINE CLASSIFIERS 215
6.3.1 Tutorial – Iris Binary Classification Using SVM 215
6.3.2 Tutorial – Iris Multiclass Classification Using SVM 218
6.3.3 Case Study – Breast Cancer Classification Using SVM 220
6.4 NAIVE BAYES CLASSIFIERS 222
6.4.1 Tutorial – Iris Binary Classification Using Naive Bayes 222
6.4.2 Tutorial – Iris Multiclass Classification Using Naive Bayes 225
6.4.3 Case Study – Breast Cancer Classification Using Naive Bayes 227
6.5 LOGISTIC REGRESSION CLASSIFIERS 229
6.5.1 Tutorial – Iris Binary Classification Using Logistic Regression 229
6.5.2 Tutorial – Iris Multiclass Classification Using Logistic
Regression 231
6.5.3 Case Study – Breast Cancer Classification Using Logistic
Regression 234
6.6 CLASSIFICATION METHODS’ COMPARISON 236
6.6.1 Case Study – Wine Classification Using Multiple Classifiers 236

Chapter 7 ■ Regression 242


7.1 SIMPLE REGRESSION 242
7.1.1 Tutorial – California Housing Price 243
7.1.2 Tutorial – California Housing Price 249
7.2 MULTIPLE REGRESSION 254
7.2.1 Tutorial – California Housing Price 255
7.3 REGULARIZATION 259
7.3.1 Tutorial – Regularization 259
7.3.2 Case Study – California Housing Price 263
7.4 CROSS-VALIDATION 270
7.4.1 Tutorial – Cross-Validation 270
7.4.2 Case Study – California Housing Price 273
7.5 ENSEMBLE METHODS 275
7.5.1 Tutorial – Iris Binary Classification Using Random Forests 276
7.5.2 Tutorial – Iris Multi Classification Using Random Forests 278

7.5.3 Case Study – California Housing Price 280


7.6 REGRESSION METHODS’ COMPARISON 288
7.6.1 Case Study – Diabetes 288

Chapter 8 ■ Clustering 298


8.1 PARTITION CLUSTERING 298
8.1.1 Tutorial 299
8.1.2 Case Study 309
8.2 HIERARCHICAL CLUSTERING 313
8.2.1 Tutorial 313
8.2.2 Case Study 316
8.3 DENSITY-BASED CLUSTERING 318
8.3.1 Tutorial 318
8.3.2 Case Study 321
8.4 GRID-BASED CLUSTERING 324
8.4.1 Tutorial 324
8.4.2 Case Study 327
8.5 PRINCIPAL COMPONENT ANALYSIS 331
8.5.1 Tutorial 332
8.5.2 Case Study 344
8.6 CLUSTERING METHODS’ COMPARISON 351
8.6.1 Case Study 351

Chapter 9 ■ Frequent Patterns 356


9.1 FREQUENT ITEMSET AND ASSOCIATION RULES 356
9.1.1 Tutorial – Finding Frequent Itemset 357
9.1.2 Tutorial – Detecting Association Rules 358
9.2 APRIORI AND FP-GROWTH ALGORITHMS 361
9.2.1 Tutorial – Apriori Algorithm 361
9.2.2 Tutorial – FP-Growth Algorithm 364
9.2.3 Case Study – Online Retail 366

Chapter 10 ■ Outlier Detection 370


10.1 OUTLIER DETECTION 371
10.1.1 Tutorial 371
10.1.2 Case Study 379

Index 389
List of Figures

4.1 A Scatter Plot 69


4.2 A Line Plot 69
4.3 Another Line Plot 70
4.4 An Area Plot 70
4.5 Another Area Plot 71
4.6 A Bar Plot 71
4.7 A Horizontal Bar Plot 72
4.8 A Histogram 72
4.9 Another Histogram Plot 73
4.10 Another Histogram Plot 73
4.11 Another Histogram Plot with Density 74
4.12 A Box Plot 74
4.13 Another Box Plot 75
4.14 A Pie Plot 75
4.15 A Color Map 76
4.16 A Simple Plot 78
4.17 A Scatter Plot with Marker o 78
4.18 A Scatter Plot with Marker * 79
4.19 A Scatter Plot with Marker . 79
4.20 A Scatter Plot with Marker , 80
4.21 A Scatter Plot with Marker x 80
4.22 A Scatter Plot with Marker X 81
4.23 A Scatter Plot with Marker + 81
4.24 A Scatter Plot with Marker P 82
4.25 A Scatter Plot with Marker s 82
4.26 A Scatter Plot with Marker D 83
4.27 A Scatter Plot with Marker d 83
4.28 A Scatter Plot with Marker p 84


4.29 A Scatter Plot with Marker H 84


4.30 A Scatter Plot with Marker h 85
4.31 A Scatter Plot with Marker o 85
4.32 A Scatter Plot with Marker ^ 86
4.33 A Scatter Plot with Marker < 86
4.34 A Scatter Plot with Marker > 87
4.35 A Scatter Plot with Marker 1 87
4.36 A Scatter Plot with Marker 2 88
4.37 A Scatter Plot with Marker 3 88
4.38 A Scatter Plot with Marker 4 89
4.39 A Scatter Plot with Marker | 89
4.40 A Scatter Plot with Marker - 90
4.41 A Line Plot 90
4.42 A Line Plot 91
4.43 A Line Plot 91
4.44 A Line Plot 92
4.45 A Line Plot 92
4.46 A Line Plot 93
4.47 A Line Plot 93
4.48 A Line Plot 94
4.49 A Line Plot 94
4.50 A Line Plot 95
4.51 A Line Plot 95
4.52 A Line Plot 96
4.53 A Line Plot 96
4.54 A Line Plot 97
4.55 A Scatter Plot 97
4.56 A Colorbar Plot 98
4.57 A Scatter Plot with Different Dot-Sizes 98
4.58 A Scatter Plot with Colorbar and Different Dot-Sizes 99
4.59 A Bar Plot 100
4.60 A Histogram Plot 100
4.61 Another Histogram Plot 101
4.62 A Pie Plot 101
4.63 An Explode Pie Plot 102
4.64 A Box Plot 102

4.65 A Violin Plot 103


4.66 A Multi-Plot 104
4.67 A Multi-Plot with Legend and Grid 104
4.68 A Multi-Plot as Stacks 105
4.69 A Multi-Plot as Columns 106
4.70 A Default Relational Plot 107
4.71 A Default Relational Plot with Gender Differentiation 107
4.72 A Default Relational Plot with Day Differentiation 108
4.73 A Default Relational Plot with Time Differentiation 108
4.74 A Default Relational Plot with Time Differentiation in Multicolumns 109
4.75 A Default Relational Plot with Size Differentiation 109
4.76 A Default Relational Plot with Size Differentiation and Different
Dot-Sizes 110
4.77 A Default Relational Plot with Large Size Differentiation 110
4.78 A Default Relational Plot with Large Size Differentiation
and Transparency 111
4.79 A Default Relational Plot with Categorical Xs 111
4.80 A Line Relational Plot 112
4.81 A Line Relational Plot with Gender Differentiation 112
4.82 A Line Relational Plot with Gender Differentiation in Multicolumns 113
4.83 A Default Distribution Plot 113
4.84 A Default Distribution Plot in Multicolumns 114
4.85 A Default Distribution Plot with Gender Differentiation 114
4.86 A Default Distribution Plot with Gender Differentiation
in Multicolumns 115
4.87 A KDE Distribution Plot with Gender Differentiation 115
4.88 A KDE Distribution Plot with Gender Differentiation and Stacking 116
4.89 A KDE Distribution Plot with Gender Differentiation, Stacking in
Multicolumns 116
4.90 A KDE Distribution Plot with Two Attributes 117
4.91 A KDE Distribution Plot with Two Attributes and Gender
Differentiation 117
4.92 A KDE Distribution Plot with Two Attributes and Rug 118
4.93 An ECDF Distribution Plot 118
4.94 An ECDF Distribution Plot with Gender Differentiation 119
4.95 An ECDF Distribution Plot with Gender Differentiation
in Multicolumns 119

4.96 A Default Categorical Plot 120


4.97 A Default Categorical Plot with Gender Differentiation 120
4.98 A Box Categorical Plot 121
4.99 A Box Categorical Plot with Gender Differentiation 121
4.100 A Violin Categorical Plot 122
4.101 A Violin Categorical Plot with Gender Differentiation 122
4.102 Another Violin Categorical Plot with Gender Differentiation 123
4.103 A Violin Plot with Gender Differentiation and Quartile 123
4.104 A Bar Categorical Plot 124
4.105 A Bar Categorical Plot with Gender Differentiation 124
4.106 A Joint Plot 125
4.107 Another Joint Plot 125
4.108 Another Joint Plot 126
4.109 Another Joint Plot 126
4.110 Another Joint Plot 127
4.111 Another Joint Plot 127
4.112 Another Joint Plot 128
4.113 Another Joint Plot 128
4.114 A Pair Plot 129
4.115 A Pair Plot with Gender Differentiation 130

5.1 The Distribution Plot before Removing Outliers 141


5.2 The Box Plot before Removing Outliers 141
5.3 The Distribution Plot after Removing Outliers 143
5.4 The Box Plot after Removing Outliers 143
5.5 The Box Plot before Removing Outliers 145
5.6 The Box Plot after Removing Outliers 145

6.1 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 173
6.2 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 178
6.3 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 184
6.4 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 189
6.5 Accuracy of KNN Models 196
6.6 Accuracy of RNN Models 196

6.7 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 198
6.8 A Default Decision Tree 200
6.9 A Decision Tree Trained with Entropy 201
6.10 A Decision Tree Trained with Max Depth as 1 202
6.11 A Decision Tree Trained with Max Depth as 3 203
6.12 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 206
6.13 A Default Decision Tree 208
6.14 A Decision Tree Trained with Entropy 209
6.15 A Decision Tree Trained with Max Depth as 1 210
6.16 A Decision Tree Trained with Max Depth as 3 211
6.17 Accuracy VS Max Depth for Different Splitting Criteria 215
6.18 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 217
6.19 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 219
6.20 Accuracy VS Regularization Parameter (C) for SVM 222
6.21 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 224
6.22 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 226
6.23 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 230
6.24 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 233
6.25 Accuracy VS Regularization Parameter (C) 236
6.26 Accuracy Comparison Among Classification Methods 241

7.1 A Scatter Plot of Total Rooms VS Total Bedrooms 244


7.2 A Comparison with Predicted and True Values 245
7.3 A Scatter Plot of Median Income VS Median House Value 246
7.4 A Comparison with Predicted and True Values 247
7.5 A Scatter Plot of Households VS Population 248
7.6 A Comparison with Predicted and True Values 249
7.7 A Scatter Plot of X VS Y 250
7.8 A Scatter Plot of X VS Y 251
7.9 A Comparison with Predicted and True Values 253

7.10 A Comparison with Predicted and True Values 254


7.11 A Comparison with Predicted and True Values 257
7.12 A Comparison with Predicted and True Values 258
7.13 A Scatter Plot of X VS Y 260
7.14 Performance Comparison between Polynomial Regression
and Regularization 270
7.15 A Scatter Plot of X VS Y 271
7.16 Mean Squared Error for Different Cross-Validation Techniques 273
7.17 Mean Squared Error for Different Cross-Validation Techniques 275
7.18 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 277
7.19 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 279
7.20 R-Squared Scores for Different Models 287
7.21 Comparison of Regression Methods on Diabetes Dataset 297

8.1 A Scatter Plot of X VS Y 300


8.2 K-Means Result with Cluster Differentiation 300
8.3 K-Medoids Result with Cluster Differentiation 301
8.4 A Scatter Plot of X VS Y 302
8.5 K-Means Result with Cluster Differentiation 302
8.6 K-Medoids Result with Cluster Differentiation 303
8.7 A Scatter Plot of X VS Y 304
8.8 K-Means Result with Cluster Differentiation 304
8.9 K-Medoids Result with Cluster Differentiation 305
8.10 A Scatter Plot of X VS Y 306
8.11 K-Means Result with Cluster Differentiation 306
8.12 K-Medoids Result with Cluster Differentiation 307
8.13 A Scatter Plot of X VS Y 308
8.14 K-Means Result with Cluster Differentiation 308
8.15 K-Medoids Result with Cluster Differentiation 309
8.16 A Comparison with K-Means and K-Medoids Clustering 313
8.17 A Scatter Plot of Feature1 VS Feature2 315
8.18 DBSCAN Result with Three Clusters 320
8.19 Comparison among DBSCAN Results 323
8.20 A Scatter Plot of Feature1 VS Feature2 325
8.21 STING Clustering Result 326

8.22 CLIQUE Clustering Result 327


8.23 CLIQUE Clustering Result 327
8.24 A Scatter Plot of Feature1 VS Feature2 328
8.25 STING Clustering Result 329
8.26 OPTICS Clustering Result 330
8.27 DBSCAN Clustering Result 331
8.28 Digit 0 333
8.29 Digit 0 333
8.30 Digit 1 334
8.31 Digit 2 334
8.32 Digit 3 335
8.33 Digit 4 335
8.34 Digit 5 336
8.35 Digit 6 336
8.36 Digit 7 337
8.37 Digit 8 337
8.38 Digit 9 338
8.39 K-Means Clustering 353
8.40 Agglomerative Clustering 354
8.41 DBSCAN Clustering 355

10.1 A Scatter Plot of Feature1 VS Feature2 with Colorbar 372


10.2 Outlier Detection by Z-Score 373
10.3 Outlier Detection by IQR 374
10.4 Outlier Detection by One-Class SVM 375
10.5 Outlier Detection by Isolation Forest 377
10.6 Outlier Detection by DBSCAN 378
10.7 Outlier Detection by LOF 379
10.8 Outlier Detection by Z-Score 383
10.9 Outlier Detection by IQR 384
10.10 Outlier Detection by One-Class SVM 385
10.11 Outlier Detection by Isolation Forest 386
10.12 Outlier Detection by DBSCAN 387
10.13 Outlier Detection by LOF 388
Foreword

WHY WE NEED THIS BOOK


Data is everywhere and it’s growing at an unprecedented rate. But making sense
of all that data is a challenge. Data Mining is the process of discovering patterns
and knowledge from large data sets. This book takes a hands-on approach to
learning Data Mining. It is designed to give you an understanding of Data
Mining concepts in an applicable way. The tutorials in this book will help you gain
practical skills to implement Data Mining techniques in your work. Whether you are
a student, a data scientist, or a business analyst, this book is a must-read for you.

Preface

HOW TO USE THIS BOOK


This book serves as a complement to a theoretical Data Mining course. We intend
to keep the introductions brief and simple and concentrate on detailed tutorials. The
book is divided into two parts: Part 1 covers the preparation of data, or Data Wrangling.
Part 2 covers the analysis of data, or Data Analysis. For readers’ convenience, besides
including all tutorials in these pages, we also provide the .ipynb files with the associated
data sets through links. When you run the .ipynb files, please make sure the data
path is updated for your local or cloud environment.

WHY THIS BOOK IS DIFFERENT


While there are many books, websites, and online courses on the topic, we differentiate
our book in multiple ways:
• We organized the contents based on the Data Mining pipeline, so readers can
naturally follow the formal process from raw data to knowledge step by step.
Readers get one consistent, end-to-end learning path rather than piecing it
together from multiple sources.
• For the topics, methods, and tools we cover in the book, we explain them in
three aspects: “What it is” as a theoretical background, “Why we need it” as
an application orientation, and “How we do it” as a case study.
• Our book is “LIVE”. All tutorials are runnable interactive Python notebooks in
.ipynb format. Students can run them, modify them, and use them.

Author Bios

Dr. Di Wu is an Assistant Professor in the Finance, Information Systems, and Economics
department of the Business School, Lehman College. He obtained a Ph.D. in Computer
Science from the Graduate Center, CUNY. Dr. Wu’s research interests are 1) temporal
extensions to RDF and the semantic web, 2) Applied Data Science, and 3) Experiential
Learning and Pedagogy in business education. Dr. Wu developed and taught courses
including Strategic Management, Databases, Business Statistics, Management Decision
Making, Programming Languages (C++, Java, and Python), Data Structures and
Algorithms, Data Mining, Big Data, and Machine Learning.

Section I
Data Wrangling

CHAPTER 1

Data Collection

Data collection is a crucial step in the process of obtaining valuable insights
and making informed decisions. In today’s interconnected world, data can be
found in a multitude of sources, ranging from traditional files such as .csv, .txt,
.xlsx, .html, and .json, to databases powered by SQL, websites hosting relevant
information, and APIs (Application Programming Interfaces) offered by companies.
To efficiently gather data from these diverse sources, various tools can be employed.
These tools encompass an array of technologies, including web scraping frameworks,
database connectors, data extraction libraries, and specialized APIs, all designed to
facilitate the collection and extraction of data from different sources. By leveraging
these tools, organizations can harness the power of data and gain valuable insights to
drive their decision-making processes.
Python offers a rich ecosystem of packages for data collection. Some commonly used
Python packages for data collection include:
• Pandas: Pandas is a powerful library for data manipulation and analysis. It
provides data structures and functions to efficiently work with structured data,
making it suitable for data collection from CSV files, Excel spreadsheets, and
SQL databases.
• BeautifulSoup: Beautiful Soup is a Python library for web scraping. It helps
parse HTML and XML documents, making it useful for extracting data from
websites.
• Requests: Requests is a versatile library for making HTTP requests. It simplifies
the process of interacting with web services and APIs, allowing data retrieval
from various sources.
• mysql-connector-python, psycopg2, and sqlite3: These libraries are Python
connectors for MySQL, PostgreSQL, and SQLite databases, respectively (sqlite3
ships with the Python standard library). They enable data collection by
establishing connections to these databases, executing queries, and retrieving data.

• Yahoo Finance: The Yahoo Finance library provides an interface to access
financial data from Yahoo Finance. It allows you to fetch historical stock prices,
company information, and other financial data.

DOI: 10.1201/9781003462781-1

These are just a few examples of Python packages commonly used for data collection.
We will cover them in detail with tutorials and case studies. Depending on the specific
data sources and requirements, there are many more packages available to facilitate
data collection in Python.
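
As a minimal sketch of the connect, query, and retrieve pattern described for the database connectors, the block below uses the standard-library sqlite3 module with an in-memory database and hands the query result to pandas. The orders table, its columns, and its values are hypothetical and exist only for illustration; they are not taken from the book's data sets.

```python
import sqlite3

import pandas as pd

# Build a throwaway in-memory SQLite database (hypothetical table and values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "book", 12.5), (2, "pen", 1.2), (3, "lamp", 30.0)],
)
conn.commit()

# Run a parameterized query and collect the result set as a DataFrame.
orders = pd.read_sql_query(
    "SELECT item, amount FROM orders WHERE amount > ? ORDER BY amount",
    conn,
    params=(5,),
)
conn.close()

print(orders)
```

The same three steps should carry over to other connectors, although connection arguments and placeholder syntax differ between database drivers.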

1.1 COLLECT DATA FROM FILES

Storing data in different file formats allows for versatility and compatibility with
various applications and tools.
• CSV (Comma-Separated Values): CSV files store tabular data in plain text
format, where each line represents a row, and values are separated by commas (or
other delimiters). CSV files are simple, human-readable, and widely supported.
They can be easily opened and edited using spreadsheet software or text editors.
However, CSV files may not support complex data structures, and there is no
standardized format for metadata or data types. Pandas provides the read_csv()
function, allowing you to read CSV files into a DataFrame object effortlessly.
By default it expects a comma as the delimiter (other delimiters can be set with
the sep parameter), recognizes common missing-value markers, and returns a
DataFrame with convenient methods for data manipulation and analysis.
• TXT (Plain Text): TXT files contain unformatted text with no specific structure
or metadata. TXT files are lightweight, widely supported, and can be easily
opened with any text editor. However, TXT files lack a standardized structure or
format, making it challenging to handle data that requires specific organization
or metadata. Pandas offers the read_csv() function with customizable delimiters
to read text files with structured data. By specifying the appropriate delimiter,
you can read text files into a DataFrame for further analysis.
• XLSX (Microsoft Excel): XLSX is a file format used by Microsoft Excel to
store spreadsheet data with multiple sheets, formatting, formulas, and metadata.
XLSX files support complex spreadsheets with multiple tabs, cell formatting,
and formulas. They are widely used in business and data analysis scenarios.
However, XLSX files can be large, and manipulating them directly can be
memory-intensive. Additionally, XLSX files require spreadsheet software such as
Microsoft Excel to view and edit. Pandas provides the read_excel() function,
enabling the reading of XLSX files into DataFrames. It allows you to specify the
sheet name, the columns and rows to read, and other parameters to extract data easily.
• JSON (JavaScript Object Notation): JSON is a lightweight, human-readable
data interchange format that represents structured data as key-value pairs, lists,
and nested objects. JSON is easy to read and write, supports complex nested
structures, and is widely used for data interchange between systems. However,
JSON files can be larger than their equivalent CSV representations, and handling
complex nested structures may require additional processing. Pandas provides
the read_json() function to read JSON data directly into a DataFrame. It
handles record- and column-oriented JSON directly; deeply nested structures
can be flattened with pd.json_normalize() for convenient data exploration
and analysis.
• XML (eXtensible Markup Language): XML files store structured data using
tags that define elements and their relationships. XML is designed to be self-
descriptive and human-readable. XML files provide a flexible and extensible
format for storing structured data. They are widely used for data interchange
and can represent complex hierarchical structures. However, XML files can be
verbose and have larger file sizes compared to other formats. Parsing XML files
can be more complex due to the nested structure and the need for specialized
parsing libraries. Pandas provides the read_xml() function to directly read XML
files into a DataFrame. It provides several options for handling different XML
structures, such as extracting data from specific tags, handling attributes, and
parsing nested elements.
• HTML (Hypertext Markup Language): HTML files are primarily used for
structuring and presenting content on the web. They consist of tags that define
the structure and formatting of the data. HTML files provide a rich structure for
representing web content and can include images, links, and other multimedia
elements. However, HTML files are designed for web display, so extracting
structured data from them can be more complex due to the presence of
non-tabular content and formatting tags. Pandas provides the read_html()
function, which extracts every HTML table it finds and returns a list of
DataFrames, one per table.
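The customizable delimiter mentioned for TXT files can be sketched as follows. This is a minimal illustration, not the book's own code: the in-memory string stands in for a file on disk, and the sample rows and the pipe delimiter are assumptions chosen for the example.

```python
import io

import pandas as pd

# Hypothetical pipe-delimited text standing in for a .txt file on disk.
raw_text = (
    "job_title|salary|salary_currency\n"
    "Data Scientist|70000|EUR\n"
    "Big Data Engineer|85000|GBP\n"
)

# sep tells read_csv which character separates the fields.
df_pipe = pd.read_csv(io.StringIO(raw_text), sep="|")
print(df_pipe)
```

With a real file, the io.StringIO(...) argument would simply be replaced by the file path.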

1.1.1 Tutorial – Collect Data from Files


We may have stored data in multiple types of files, such as text, CSV, Excel, XML,
and HTML. We can load each of them into a DataFrame.
import pandas as pd

1.1.1.1 CSV
We have done this when we learned pandas. Get the path of your CSV file and
pass it to the read_csv function.

Default setting In many cases, the default settings will do the job.


df = pd.read_csv('/content/ds_salaries.csv')

df.head()

Unnamed: 0 work_year experience_level employment_type \


0 0 2020 MI FT
1 1 2020 SE FT
2 2 2020 SE FT
3 3 2020 MI FT
4 4 2020 SE FT

job_title salary salary_currency salary_in_usd \


0 Data Scientist 70000 EUR 79833
1 Machine Learning Scientist 260000 USD 260000
2 Big Data Engineer 85000 GBP 109024
3 Product Data Analyst 20000 USD 20000
4 Machine Learning Engineer 150000 USD 150000

employee_residence remote_ratio company_location company_size


0 DE 0 DE L
1 JP 0 JP S
2 GB 50 GB M
3 HN 0 HN S
4 US 50 US L

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 607 entries, 0 to 606
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unnamed: 0 607 non-null int64
1 work_year 607 non-null int64
2 experience_level 607 non-null object
3 employment_type 607 non-null object
4 job_title 607 non-null object
5 salary 607 non-null int64
6 salary_currency 607 non-null object
7 salary_in_usd 607 non-null int64
8 employee_residence 607 non-null object
9 remote_ratio 607 non-null int64
10 company_location 607 non-null object
11 company_size 607 non-null object
dtypes: int64(5), object(7)
memory usage: 57.0+ KB

Customize setting You can adjust the arguments to suit your specific CSV file. With
header=None, pandas treats the first line as data and numbers the columns instead.
df = pd.read_csv('/content/ds_salaries.csv', header = None)
df.head()

0 1 2 3 \
0 NaN work_year experience_level employment_type
1 0.0 2020 MI FT
2 1.0 2020 SE FT
3 2.0 2020 SE FT
4 3.0 2020 MI FT

4 5 6 7 \
0 job_title salary salary_currency salary_in_usd


1 Data Scientist 70000 EUR 79833
2 Machine Learning Scientist 260000 USD 260000
3 Big Data Engineer 85000 GBP 109024
4 Product Data Analyst 20000 USD 20000

8 9 10 11
0 employee_residence remote_ratio company_location company_size
1 DE 0 DE L
2 JP 0 JP S
3 GB 50 GB M
4 HN 0 HN S

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 608 entries, 0 to 607
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 607 non-null float64
1 1 608 non-null object
2 2 608 non-null object
3 3 608 non-null object
4 4 608 non-null object
5 5 608 non-null object
6 6 608 non-null object
7 7 608 non-null object
8 8 608 non-null object
9 9 608 non-null object
10 10 608 non-null object
11 11 608 non-null object
dtypes: float64(1), object(11)
memory usage: 57.1+ KB

df = pd.read_csv('/content/ds_salaries.csv', header = None, skiprows=1)


df.head()

0 1 2 3 4 5 6 7 8 9 \
0 0 2020 MI FT Data Scientist 70000 EUR 79833 DE 0
1 1 2020 SE FT Machine Learning Scientist 260000 USD 260000 JP 0
2 2 2020 SE FT Big Data Engineer 85000 GBP 109024 GB 50
3 3 2020 MI FT Product Data Analyst 20000 USD 20000 HN 0
4 4 2020 SE FT Machine Learning Engineer 150000 USD 150000 US 50

10 11
0 DE L
1 JP S
2 GB M
3 HN S
4 US L
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 607 entries, 0 to 606
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 607 non-null int64
1 1 607 non-null int64
2 2 607 non-null object
3 3 607 non-null object
4 4 607 non-null object
5 5 607 non-null int64
6 6 607 non-null object
7 7 607 non-null int64
8 8 607 non-null object
9 9 607 non-null int64
10 10 607 non-null object
11 11 607 non-null object
dtypes: int64(5), object(7)
memory usage: 57.0+ KB

df = pd.read_csv('/content/ds_salaries.csv', header = None,
                 skiprows=1, skipfooter=300, engine='python')
df.head()

0 1 2 3 4 5 6 7 8 9 \
0 0 2020 MI FT Data Scientist 70000 EUR 79833 DE 0
1 1 2020 SE FT Machine Learning Scientist 260000 USD 260000 JP 0
2 2 2020 SE FT Big Data Engineer 85000 GBP 109024 GB 50
3 3 2020 MI FT Product Data Analyst 20000 USD 20000 HN 0
4 4 2020 SE FT Machine Learning Engineer 150000 USD 150000 US 50

10 11
0 DE L
1 JP S
2 GB M
3 HN S
4 US L

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 307 entries, 0 to 306
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 307 non-null int64
1 1 307 non-null int64
2 2 307 non-null object
3 3 307 non-null object
4 4 307 non-null object
5 5 307 non-null int64
6 6 307 non-null object
7 7 307 non-null int64


8 8 307 non-null object
9 9 307 non-null int64
10 10 307 non-null object
11 11 307 non-null object
dtypes: int64(5), object(7)
memory usage: 28.9+ KB
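Two further read_csv arguments pair naturally with the ones above. The sketch below uses a small in-memory sample rather than ds_salaries.csv, and the replacement column names are hypothetical: names supplies column labels when the header row is skipped, and index_col=0 uses the first column as the index, which avoids an "Unnamed: 0" column.

```python
import io

import pandas as pd

# Small in-memory sample shaped like the first columns of the salary file.
raw_csv = ",work_year,salary\n0,2020,70000\n1,2020,260000\n2,2021,85000\n"

# Hypothetical column names chosen for this sketch.
column_names = ["row_id", "year", "salary"]

# Skip the original header row and supply our own labels instead.
df_named = pd.read_csv(io.StringIO(raw_csv), header=None, skiprows=1,
                       names=column_names)

# Keep the header, use the first column as the index, and read only two rows.
df_first_two = pd.read_csv(io.StringIO(raw_csv), index_col=0, nrows=2)

print(df_named)
print(df_first_two)
```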

1.1.1.2 TXT
If the TXT file follows the CSV format, it can be read the same way as a CSV file:
df = pd.read_csv('/content/ds_salaries.txt')
df

Unnamed: 0 work_year experience_level employment_type \


0 0 2020 MI FT
1 1 2020 SE FT
2 2 2020 SE FT
3 3 2020 MI FT
4 4 2020 SE FT
.. ... ... ... ...
602 602 2022 SE FT
603 603 2022 SE FT
604 604 2022 SE FT
605 605 2022 SE FT
606 606 2022 MI FT

job_title salary salary_currency salary_in_usd \


0 Data Scientist 70000 EUR 79833
1 Machine Learning Scientist 260000 USD 260000
2 Big Data Engineer 85000 GBP 109024
3 Product Data Analyst 20000 USD 20000
4 Machine Learning Engineer 150000 USD 150000
.. ... ... ... ...
602 Data Engineer 154000 USD 154000
603 Data Engineer 126000 USD 126000
604 Data Analyst 129000 USD 129000
605 Data Analyst 150000 USD 150000
606 AI Scientist 200000 USD 200000

employee_residence remote_ratio company_location company_size


0 DE 0 DE L
1 JP 0 JP S
2 GB 50 GB M
3 HN 0 HN S
4 US 50 US L
.. ... ... ... ...
602 US 100 US M
603 US 100 US M
604 US 0 US M
605 US 100 US M
606 IN 100 US L

[607 rows x 12 columns]
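When a text file uses a delimiter other than a comma and we do not know it in advance, one option is to let pandas detect it. The sketch below assumes a tab-separated sample held in memory; sep=None together with engine="python" asks pandas to infer the delimiter using Python's csv.Sniffer.

```python
import io

import pandas as pd

# Hypothetical tab-separated text standing in for a .txt file.
tab_text = (
    "job_title\tsalary\tsalary_currency\n"
    "Data Scientist\t70000\tEUR\n"
    "Big Data Engineer\t85000\tGBP\n"
)

# sep=None requires the Python parsing engine; the delimiter is inferred.
df_sniffed = pd.read_csv(io.StringIO(tab_text), sep=None, engine="python")
print(df_sniffed)
```

Detection is a heuristic, so passing the delimiter explicitly (for example sep="\t") is the safer choice whenever it is known.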


1.1.1.3 Excel
df = pd.read_excel('/content/ds_salaries.xlsx')

df.head()

Unnamed: 0 work_year experience_level employment_type \


0 0 2020 MI FT
1 1 2020 SE FT
2 2 2020 SE FT
3 3 2020 MI FT
4 4 2020 SE FT

job_title salary salary_currency salary_in_usd \


0 Data Scientist 70000 EUR 79833
1 Machine Learning Scientist 260000 USD 260000
2 Big Data Engineer 85000 GBP 109024
3 Product Data Analyst 20000 USD 20000
4 Machine Learning Engineer 150000 USD 150000

employee_residence remote_ratio company_location company_size


0 DE 0 DE L
1 JP 0 JP S
2 GB 50 GB M
3 HN 0 HN S
4 US 50 US L

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 607 entries, 0 to 606
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unnamed: 0 607 non-null int64
1 work_year 607 non-null int64
2 experience_level 607 non-null object
3 employment_type 607 non-null object
4 job_title 607 non-null object
5 salary 607 non-null int64
6 salary_currency 607 non-null object
7 salary_in_usd 607 non-null int64
8 employee_residence 607 non-null object
9 remote_ratio 607 non-null int64
10 company_location 607 non-null object
11 company_size 607 non-null object
dtypes: int64(5), object(7)
memory usage: 57.0+ KB
1.1.1.4 JSON

df = pd.read_json('/content/ds_salaries.json')
df.head()

FIELD1 work_year experience_level employment_type \


0 0 2020 MI FT
1 1 2020 SE FT
2 2 2020 SE FT
3 3 2020 MI FT
4 4 2020 SE FT

job_title salary salary_currency salary_in_usd \


0 Data Scientist 70000 EUR 79833
1 Machine Learning Scientist 260000 USD 260000
2 Big Data Engineer 85000 GBP 109024
3 Product Data Analyst 20000 USD 20000
4 Machine Learning Engineer 150000 USD 150000

employee_residence remote_ratio company_location company_size


0 DE 0 DE L
1 JP 0 JP S
2 GB 50 GB M
3 HN 0 HN S
4 US 50 US L

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 607 entries, 0 to 606
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 FIELD1 607 non-null int64
1 work_year 607 non-null int64
2 experience_level 607 non-null object
3 employment_type 607 non-null object
4 job_title 607 non-null object
5 salary 607 non-null int64
6 salary_currency 607 non-null object
7 salary_in_usd 607 non-null int64
8 employee_residence 607 non-null object
9 remote_ratio 607 non-null int64
10 company_location 607 non-null object
11 company_size 607 non-null object
dtypes: int64(5), object(7)
memory usage: 57.0+ KB
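The file above is flat, so read_json is enough. For nested JSON, a common approach is to flatten the records first. The following is a minimal sketch with made-up nested records (the "salary" sub-object is an assumption for illustration), using pd.json_normalize:

```python
import json

import pandas as pd

# Hypothetical nested records: each salary is an object, not a plain number.
nested_json = """
[
  {"job_title": "Data Scientist",
   "salary": {"amount": 70000, "currency": "EUR"}},
  {"job_title": "Machine Learning Scientist",
   "salary": {"amount": 260000, "currency": "USD"}}
]
"""

records = json.loads(nested_json)

# Nested keys become dotted column names such as "salary.amount".
df_flat = pd.json_normalize(records)
print(df_flat)
```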
1.1.1.5 XML
df = pd.read_xml('/content/ds_salaries.xml')
df.head()

FIELD1 work_year experience_level employment_type \


0 0 2020 MI FT
1 1 2020 SE FT
2 2 2020 SE FT
3 3 2020 MI FT
4 4 2020 SE FT

job_title salary salary_currency salary_in_usd \


0 Data Scientist 70000 EUR 79833
1 Machine Learning Scientist 260000 USD 260000
2 Big Data Engineer 85000 GBP 109024
3 Product Data Analyst 20000 USD 20000
4 Machine Learning Engineer 150000 USD 150000

employee_residence remote_ratio company_location company_size


0 DE 0 DE L
1 JP 0 JP S
2 GB 50 GB M
3 HN 0 HN S
4 US 50 US L

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 607 entries, 0 to 606
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 FIELD1 607 non-null int64
1 work_year 607 non-null int64
2 experience_level 607 non-null object
3 employment_type 607 non-null object
4 job_title 607 non-null object
5 salary 607 non-null int64
6 salary_currency 607 non-null object
7 salary_in_usd 607 non-null int64
8 employee_residence 607 non-null object
9 remote_ratio 607 non-null int64
10 company_location 607 non-null object
11 company_size 607 non-null object
dtypes: int64(5), object(7)
memory usage: 57.0+ KB
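To select specific elements from an XML document, read_xml accepts an xpath argument. The sketch below is an illustration under assumptions: the element names (data, row) are invented for the sample, and parser="etree" is chosen so that only the standard library parser is needed instead of lxml.

```python
import io

import pandas as pd

# Hypothetical XML with one <row> element per record.
xml_text = """<data>
  <row><work_year>2020</work_year><job_title>Data Scientist</job_title><salary>70000</salary></row>
  <row><work_year>2020</work_year><job_title>Big Data Engineer</job_title><salary>85000</salary></row>
</data>"""

# xpath selects the repeating elements; each child element becomes a column.
df_xml = pd.read_xml(io.StringIO(xml_text), xpath=".//row", parser="etree")
print(df_xml)
```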
1.1.1.6 HTML
df = pd.read_html('/content/ds_salaries.htm')[0]
df.head()

FIELD1 work_year experience_level employment_type \


0 0 2020 MI FT
1 1 2020 SE FT
2 2 2020 SE FT
3 3 2020 MI FT
4 4 2020 SE FT

job_title salary salary_currency salary_in_usd \


0 Data Scientist 70000 EUR 79833
1 Machine Learning Scientist 260000 USD 260000
2 Big Data Engineer 85000 GBP 109024
3 Product Data Analyst 20000 USD 20000
4 Machine Learning Engineer 150000 USD 150000

employee_residence remote_ratio company_location company_size


0 DE 0 DE L
1 JP 0 JP S
2 GB 50 GB M
3 HN 0 HN S
4 US 50 US L

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 607 entries, 0 to 606
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 FIELD1 607 non-null int64
1 work_year 607 non-null int64
2 experience_level 607 non-null object
3 employment_type 607 non-null object
4 job_title 607 non-null object
5 salary 607 non-null int64
6 salary_currency 607 non-null object
7 salary_in_usd 607 non-null int64
8 employee_residence 607 non-null object
9 remote_ratio 607 non-null int64
10 company_location 607 non-null object
11 company_size 607 non-null object
dtypes: int64(5), object(7)
memory usage: 57.0+ KB

1.1.2 Documentation
It is always good to have a reference for the file-reading functions in pandas. You can
find it at https://pandas.pydata.org/docs/reference/io.html
1.2 COLLECT DATA FROM THE WEB

Collecting data from the web is essential for various reasons:


• Access to vast amounts of information: The web contains an immense amount
of data on diverse topics. By collecting data from the web, you can tap into
this vast information pool and gain insights that can inform decision-making,
research, analysis, and more.
• Real-time and up-to-date data: The web provides a platform for the dissemina-
tion of real-time and up-to-date information. By collecting data from the web,
you can stay informed about the latest news, trends, market updates, social
media activity, and other dynamic sources of information.
• Competitive intelligence: Collecting data from the web allows you to monitor
your competitors, track their activities, analyze their strategies, and gain insights
into the market landscape. This can help you make informed decisions and stay
ahead in a competitive environment.
• Research and analysis: Web data collection is crucial for research, analysis,
and data-driven insights. By collecting data from diverse sources, you can
validate hypotheses, perform statistical analysis, conduct sentiment analysis,
and uncover patterns or trends that can enhance understanding and drive
informed decision-making.
The web has many websites, including structured websites, semi-structured websites,
and unstructured websites, that differ in terms of their organization and consistency.
• Structured Websites: Structured websites have a well-defined and organized
format, making it easy to locate specific information. They often follow a
consistent layout and have clearly defined sections. Structured websites generally
pose fewer challenges for data collection as the information is neatly organized.
However, occasional variations in page layouts or changes in website structure
can introduce some level of complexity. To collect data from structured websites,
you can utilize libraries like Beautiful Soup or lxml in Python. These libraries
enable you to parse the HTML structure of the web pages and extract desired
data using specific tags or CSS selectors.
• Semi-Structured Websites: Semi-structured websites contain a mixture of struc-
tured and unstructured data. While certain sections might be organized, others
may have varying formats or lack consistent organization. The main challenge
with semi-structured websites is the inconsistency in data presentation. The lack
of uniformity in structure and formatting requires additional effort to identify
and extract the relevant data. Similar to structured websites, libraries like
Beautiful Soup or lxml can help parse and extract data from semi-structured
websites. However, you may need to employ additional techniques such as
regular expressions or data cleaning procedures to handle variations in data
presentation.
• Unstructured Websites: Unstructured websites lack a clear organization or
predefined structure. They may have free-form text, multimedia content, and
unorganized data scattered across multiple pages. Unstructured websites pose
the most significant challenges for data collection due to the absence of consis-
tent structure. The data may be embedded within paragraphs, images, or other
non-tabular formats, requiring sophisticated techniques for extraction. For un-
structured websites, natural language processing (NLP) techniques and machine
learning algorithms can be employed to extract relevant information. These
methods involve parsing the web content, identifying patterns, and applying
text processing algorithms to extract structured data.
In summary, structured websites provide a clear structure, making data collection rel-
atively straightforward. Semi-structured websites introduce some variability, requiring
careful handling of inconsistencies. Unstructured websites present the most significant
challenges, necessitating advanced techniques such as NLP and machine learning to
extract structured information. Python libraries like Beautiful Soup, lxml, and NLP
frameworks can assist in parsing and extracting data from these different types of
websites, adapting to their specific characteristics and complexities.

1.2.1 Tutorial – Collect Data from Web


import pandas as pd

1.2.1.1 Wiki
Some websites maintain structured data, which is easy to read.
table = pd.read_html('https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)#Table')

for i in table:
    print(type(i))

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>

for i in table:
    print(i.columns)

Int64Index([0], dtype='int64')
Int64Index([0, 1, 2], dtype='int64')
MultiIndex([( 'Country/Territory', 'Country/Territory'),
( 'UN Region', 'UN Region'),
( 'IMF[1][13]', 'Estimate'),
( 'IMF[1][13]', 'Year'),
( 'World Bank[14]', 'Estimate'),
( 'World Bank[14]', 'Year'),
('United Nations[15]', 'Estimate'),
('United Nations[15]', 'Year')],
)
...
Int64Index([0, 1], dtype='int64')

df = table[2]
df.head()

Country/Territory UN Region IMF[1][13] World Bank[14] \


Country/Territory UN Region Estimate Year Estimate Year
0 World — 101560901 2022 96513077 2021
1 United States Americas 25035164 2022 22996100 2021
2 China Asia 18321197 [n 1]2022 17734063 [n 3]2021
3 Japan Asia 4300621 2022 4937422 2021
4 Germany Europe 4031149 2022 4223116 2021

United Nations[15]
Estimate Year
0 85328323 2020
1 20893746 2020
2 14722801 [n 1]2020
3 5057759 2020
4 3846414 2020

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 217 entries, 0 to 216
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 (Country/Territory, Country/Territory) 217 non-null object
1 (UN Region, UN Region) 217 non-null object
2 (IMF[1][13], Estimate) 217 non-null object
3 (IMF[1][13], Year) 217 non-null object
4 (World Bank[14], Estimate) 217 non-null object
5 (World Bank[14], Year) 217 non-null object
6 (United Nations[15], Estimate) 217 non-null object
7 (United Nations[15], Year) 217 non-null object
dtypes: object(8)
memory usage: 13.7+ KB
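Every column in this table is of dtype object, and the headers span two levels with footnote markers, so some cleaning is usually needed before analysis. The following is one possible sketch, not the book's method: it rebuilds a three-row excerpt of the table by hand (values taken from the output above) so it runs without network access, and the helper name flatten is hypothetical.

```python
import re

import pandas as pd

# Hand-built excerpt mimicking the two-level header returned by read_html.
columns = pd.MultiIndex.from_tuples([
    ("Country/Territory", "Country/Territory"),
    ("IMF[1][13]", "Estimate"),
    ("IMF[1][13]", "Year"),
])
gdp = pd.DataFrame(
    [["United States", "25035164", "2022"],
     ["China", "18321197", "[n 1]2022"],
     ["Japan", "4300621", "2022"]],
    columns=columns,
)


def flatten(col):
    """Join the two header levels and drop footnote markers like [1]."""
    top, bottom = (re.sub(r"\[.*?\]", "", part).strip() for part in col)
    return top if top == bottom else f"{top}_{bottom}"


gdp.columns = [flatten(col) for col in gdp.columns]

# Strip footnote prefixes such as "[n 1]" and convert the text to numbers.
gdp["IMF_Year"] = pd.to_numeric(
    gdp["IMF_Year"].str.replace(r"\[.*?\]", "", regex=True))
gdp["IMF_Estimate"] = pd.to_numeric(gdp["IMF_Estimate"], errors="coerce")

print(gdp.dtypes)
```

Using errors="coerce" turns placeholders that are not numbers into NaN instead of raising an error.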

1.2.1.2 Web Scraping


Some websites are semi-structured: their pages carry metadata such as labels and
classes, so we can inspect the source code and scrape the data we need.
Note: You need a basic understanding of HTML and XML in order to
understand the source code and collect data from these websites.
Note: Some websites prohibit scraping or limit how rapidly pages may be requested.
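One common way a site states such rules is a robots.txt file. The sketch below shows how the standard library's urllib.robotparser can check them before scraping. The robots.txt content, the example.com URLs, and the "*" user agent are all assumptions made for illustration; a real scraper would load the file from the target site.

```python
import urllib.robotparser

# Hypothetical robots.txt content; real sites publish their own rules.
robots_txt = """User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# Ask whether a given user agent may fetch a given URL.
allowed = parser.can_fetch("*", "https://example.com/public/page.html")
blocked = parser.can_fetch("*", "https://example.com/private/page.html")

# Seconds the site asks crawlers to wait between requests (None if unset).
delay = parser.crawl_delay("*")

print(allowed, blocked, delay)
```

When a delay is given, pausing between requests (for example with time.sleep(delay)) keeps the scraper within the site's stated limits.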


The first thing we’ll need to do to scrape a web page is to download the page. We can
download pages using the Python requests library.
The requests library will make a GET request to a web server, which will download
the HTML contents of a given web page for us. There are several different types of
requests we can make using requests, of which GET is just one. If you want to learn
more, check out our API tutorial.
Let’s try downloading a simple sample website,
https://dataquestio.github.io/web-scraping-pages/simple.html.

Download by requests We’ll need to first import the requests library, and then
download the page using the requests.get method:
import requests

page = requests.get("https://dataquestio.github.io/web-scraping-pages/simple.html")
page

<Response [200]>

After running our request, we get a Response object. This object has a status_code
property, which indicates if the page was downloaded successfully:
page.status_code

200

A status_code of 200 means that the page downloaded successfully. We won’t fully
dive into status codes here, but a status code starting with a 2 generally indicates
success, and a code starting with a 4 or a 5 indicates an error.
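The rule of thumb above can be written as a small helper. This is only a sketch: the function name describe_status and its labels are hypothetical, and it uses the standard library's http.HTTPStatus to look up the official phrase for each code.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a coarse label for an HTTP status code."""
    if 200 <= code < 300:
        return "success"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"


for code in (200, 404, 503):
    # HTTPStatus(code).phrase is the standard reason phrase, e.g. "Not Found".
    print(code, HTTPStatus(code).phrase, describe_status(code))
```

In practice, calling page.raise_for_status() on a requests Response raises an exception for 4xx and 5xx codes, which saves checking the number by hand.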
We can print out the HTML content of the page using the content property:
page.content

b'<!DOCTYPE html>\n<html>\n <head>\n <title>A simple example


page</title>\n </head>\n <body>\n <p>Here is some
simple content for this page.</p>\n </body>\n</html>'

Parsing by BeautifulSoup As you can see above, we now have downloaded an HTML
document.
We can use the BeautifulSoup library to parse this document, and extract the text
from the p tag.
from bs4 import BeautifulSoup


soup = BeautifulSoup(page.content, 'html.parser')

We can now print out the HTML content of the page, formatted nicely, using the
prettify method on the BeautifulSoup object.
print(soup.prettify())

<!DOCTYPE html>
<html>
<head>
<title>
A simple example page
</title>
</head>
<body>
<p>
Here is some simple content for this page.
</p>
</body>
</html>

This step isn’t strictly necessary, and we won’t always bother with it, but it can be
helpful to look at prettified HTML to make the structure of the page and the
nesting of its tags easier to see.

Finding Tags Finding all instances of a tag at once: navigating a page step by
step takes a lot of commands to do something fairly simple. If we want to
extract a particular tag, we can instead use the find_all method, which returns
a list of all the instances of that tag on a page.
If we are looking for the title, we can look for the <title> tag:
soup.find_all('title')

[<title>A simple example page</title>]

for t in soup.find_all('title'):
    print(t.get_text())

A simple example page

If we are looking for text, we can look for the <p> tag:

for t in soup.find_all('p'):
    print(t.get_text())

Here is some simple content for this page.


If you instead only want to find the first instance of a tag, you can use the find method,
which will return a single Tag object (or None if nothing matches):
soup.find('p').get_text()

'Here is some simple content for this page.'

Searching for tags by class and id:


Classes and ids are used by CSS to determine which HTML elements to apply certain
styles to. But when we’re scraping, we can also use them to specify the elements we
want to scrape.
Let’s try another page.
page = requests.get("https://dataquestio.github.io/web-scraping-pages/ids_and_classes.html")
soup = BeautifulSoup(page.content, 'html.parser')
soup

<html>
<head>
<title>A simple example page</title>
</head>
<body>
<div>
<p class="inner-text first-item" id="first">
First paragraph.
</p>
<p class="inner-text">
Second paragraph.
</p>
</div>
<p class="outer-text first-item" id="second">
<b>
First outer paragraph.
</b>
</p>
<p class="outer-text">
<b>
Second outer paragraph.
</b>
</p>
</body>
</html>

Now, we can use the find_all method to search for items by class or by id. In the
below example, we’ll search for any p tag that has the class outer-text:
soup.find_all('p', class_='outer-text')

[<p class="outer-text first-item" id="second">


<b>
First outer paragraph.


</b>
</p>, <p class="outer-text">
<b>
Second outer paragraph.
</b>
</p>]

In the below example, we’ll look for any tag that has the class outer-text:
soup.find_all(class_="outer-text")

[<p class="outer-text first-item" id="second">


<b>
First outer paragraph.
</b>
</p>, <p class="outer-text">
<b>
Second outer paragraph.
</b>
</p>]

We can also search for elements by id:


soup.find_all(id="first")

[<p class="inner-text first-item" id="first">


First paragraph.
</p>]
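Since classes and ids come from CSS, the same searches can also be written as CSS selectors using BeautifulSoup's select and select_one methods. The sketch below is an alternative to find_all, not a replacement the book prescribes; it embeds a compact copy of the page above as a string so it runs without downloading anything.

```python
from bs4 import BeautifulSoup

# Compact copy of the ids_and_classes page, embedded to avoid a download.
html = """<html><body>
<div>
  <p class="inner-text first-item" id="first">First paragraph.</p>
  <p class="inner-text">Second paragraph.</p>
</div>
<p class="outer-text first-item" id="second"><b>First outer paragraph.</b></p>
<p class="outer-text"><b>Second outer paragraph.</b></p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# "div p" matches every <p> nested inside a <div>.
inner_paragraphs = [p.get_text(strip=True) for p in soup.select("div p")]

# "p.first-item" matches <p> tags carrying the class first-item.
first_items = [p["id"] for p in soup.select("p.first-item")]

# "#second" matches the element whose id is second.
second = soup.select_one("#second").get_text(strip=True)

print(inner_paragraphs, first_items, second)
```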

1.2.2 Case Study – Collect Weather Data from Web


1.2.2.1 Downloading Weather Data
We now know enough to proceed with extracting information about the local weather
from the National Weather Service website!
The local weather of Boulder, CO is:
https://forecast.weather.gov/MapClick.php?lat=40.0466&lon=-105.2523#.YwpRBy2B1f0
Time to Start Scraping!
We now know enough to download the page and start parsing it. In the below code,
we will:
• Download the web page containing the forecast.
• Create a BeautifulSoup object to parse the page.
• Find the div with id seven-day-forecast, and assign it to seven_day.
• Inside seven_day, find each individual forecast item. Extract and print the first
forecast item.
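The steps above can be sketched as follows. Several details are assumptions: the function name first_forecast_item is hypothetical, the class names tombstone-container, period-name, and short-desc are guesses about the page markup that should be verified against the live page source, and a small HTML sample replaces the download so the code runs offline.

```python
from bs4 import BeautifulSoup


def first_forecast_item(html):
    """Return the first forecast item inside the seven-day div, or None."""
    soup = BeautifulSoup(html, "html.parser")
    seven_day = soup.find(id="seven-day-forecast")
    if seven_day is None:
        return None
    # Assumed class name for one forecast item; check the real page source.
    items = seven_day.find_all(class_="tombstone-container")
    return items[0] if items else None


# Hypothetical sample shaped like the assumed page structure.
sample_html = """<div id="seven-day-forecast">
  <div class="tombstone-container">
    <p class="period-name">Tonight</p><p class="short-desc">Mostly Clear</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Thursday</p><p class="short-desc">Sunny</p>
  </div>
</div>"""

item = first_forecast_item(sample_html)
print(item.prettify())
```

To run this against the real forecast, sample_html would be replaced by page.content from requests.get on the URL above.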
Other documents randomly have
different content
hammock was hung up in the sick bay, a part of the main deck
appropriated to hospital purposes. Poor Tom, having a constitution
already undermined by former excesses, soon fell under the attack of
disease. He was then sewed up in his hammock, with some shot at
his feet: at sundown the ship’s bell pealed a melancholy note, the
ship was “hove to,” all hands mustered on deck, but myself; and,
amid the most profound silence, the body of the departed sailor was
laid upon the grating and launched into the great deep, the resting-
place of many a bold head. A plunge, a sudden opening in the water,
followed by an equally sudden return of the disparted waves, and
Black Tom was gone forever from his shipmates! In a few moments
the yards were braced round, and our frigate was cutting her way
again through the wide ocean waste. It seemed to me that she was
soon destined to heave to again, that I might also be consigned to an
ocean grave. But in this I was happily disappointed. By the blessing
of a watchful Providence, the aid of a sound constitution, assisted by
the skill of our surgeon and the kindness of my shipmates, I was at
last able to leave my hammock. Shortly after our return to Lisbon, I
was pronounced fit for duty, and the surgeon having obtained
another boy, I was placed on the quarter deck, in the capacity of
messenger, or errand boy for the captain and his officers.
With my return to active life, came my exposure to hardships, and,
what I dreaded still more, to punishment. Some of the boys were to
be punished on the main deck; the rest were ordered forward to
witness it, as usual. Being so far aft that I could not hear the
summons, as a matter of course, I remained at my post. The hawk-
eye of the lieutenant missed me, and in a rage he ordered me to be
sent for to receive a flogging for my absence. Excuse was vain; for,
such was the fiendish temper of this brutal officer, he only wanted the
shadow of a reason for dragging the poor helpless boys of his charge
to the grating. While I stood in trembling expectation of being
degraded by the hated cat, a summons from the captain
providentially called off our brave boy-flogger, and I escaped. The
offence was never mentioned afterwards. The reader can easily
perceive how such a constant exposure to the lash must embitter a
seaman’s life.
Already, since the Macedonian had been in commission, had she
changed captains twice. Why it so happened, it is not in my power to
explain; but while at Lisbon, after the cruise last mentioned, our
present captain was superseded by Captain John S. Carden. His
arrival excited a transitory hope of a brighter lot, as he was an older
man than the others, and, as we vainly trusted, a kinder one. Here,
however, we were mistaken; he was like all the rest, the same
heartless, unfeeling lover of whip discipline. At first the men under
sentence tried their powers at flattery with the grave old man; but he
was too experienced a sea-dog to be cajoled by a long-faced sailor
under sentence: when, therefore, they told him he was a kind-
hearted fatherly gentleman, he only replied by a most provoking
laugh, and by saying they were a set of very undutiful sons.
Captain Carden was mercilessly severe in punishing theft. He would
on no account forgive any man for this crime, but would flog the thief
almost to death. Of this, we soon had a cruel instance. A midshipman
named Gale, a most rascally, unprincipled fellow, found his pocket
handkerchief in possession of one of the crew. He charged the man
with stealing it. It was in vain that the poor wretch asserted that he
found it under his hammock. He was reported as a thief; a court-
martial sat upon him, and returned the shamefully disproportionate
sentence of three hundred lashes through the fleet, and one year’s
imprisonment! Any of my shipmates who are living, will certify to the
truth of this statement, brutal and improbable as it may appear.
Nor was that sentence a dead letter; the unhappy man endured it
to the letter. Fifty were laid on alongside of the Macedonian, in
conformity with a common practice of inflicting the most strokes at
the first ship, in order that the gory back of the criminal may strike
the more terror into the crews of the other ships. This poor tortured
man bore two hundred and twenty, and was pronounced by the
attending surgeon unfit to receive the rest. Galled, bruised, and
agonized as he was, he besought him to suffer the infliction of the
remaining eighty, that he might not be called to pass through the
degrading scene again; but this prayer was denied! He was brought
on board, and when his wounds were healed, the captain, Shylock-
like, determined to have the whole pound of flesh, ordered him to
receive the remainder.
But for my desire to present the reader with a true exhibition of life
on board a British man of war, it would be my choice to suppress
these disgusting details of cruelty and punishment. But this is
impossible; I must either draw a false picture or describe them. I
choose the latter, in the hope that giving publicity to these facts will
exert a favorable influence on the already improving discipline of
ships of war.
The case of our ship’s drummer will illustrate the hopelessness of
our situation under such officers as commanded our ship; it will show
that implicit, uncomplaining submission was our only resource. This
drummer, being seized up for some petty offence, demanded, what
no captain can refuse, to be tried by a court-martial; in the hope,
probably, of escaping altogether. The officers laughed among each
other, and when, a few days afterwards, the poor, affrighted man
offered to withdraw the demand and take six dozen lashes, they
coolly remarked, “The drummer is sick of his bargain.” He would have
been a wiser man had he never made it; for the court-martial
sentenced him to receive two hundred lashes through the fleet:—a
punishment ostensibly for his first offence, but really for his insolence
(?) in demanding a trial by court-martial. Such was the administration
of justice (?) on board the Macedonian.
“Why did not your crew rise in resistance to such cruelty?” is a
question which has often been proposed to me, when relating these
facts to my American friends. To talk of mutiny on shore is an easy
matter; but to excite it on shipboard is to rush on to certain death.
Let it be known that a man has dared to breathe the idea, and he is
sure to swing at the yard-arm. Some of our men once saw six
mutineers hanging at the yard-arm at once, in a ship whose crew
exhibited the incipient beginnings of mutiny. Let mutiny be
successful, the government will employ its whole force, if needful, in
hunting down the mutineers; their blood, to the last drop, is the
terrible retribution it demands for this offence. That demand is sure
to be met, as was the case with the crew of the Hermione[5] frigate,
and with the crew of the ill-fated Bounty, whose history is imprinted
on the memory of the whole civilized world. With such tragedies
flitting before our eyes, who need ask why we did not resist?
Just before we left Lisbon for another cruise, my position was once
more changed by my appointment to the post of servant to the
sailing-master; whose boy, for some offence or other, was flogged
and turned away. Here, too, the captain procured a fine band,
composed of Frenchmen, Italians and Germans, taken by the
Portuguese from a French vessel. These musicians consented to
serve, on condition of being excused from fighting, and on a pledge
of exemption from being flogged. They used to play to the captain
during his dinner hour; the party to be amused usually consisting of
the captain and one or two invited guests from the ward-room;
except on Sundays, when he chose to honor the ward-room with his
august presence. The band then played for the ward-room. They also
played on deck whenever we entered or left a port. On the whole,
their presence was an advantage to the crew, since their spirit-stirring
strains served to spread an occasional cheerful influence over them.
Soon after they came on board, we had orders to proceed to sea
again on another cruise.
CHAPTER IV

A few days after we had fairly got out to sea, the thrilling cry of “A
man overboard!” ran through the ship with electrical effect; it
was followed by another cry of, “Heave out a rope!” then by still
another, of “Cut away the life buoy!” Then came the order, “Lower a
boat!” Notwithstanding the rapidity of these commands, and the
confusion occasioned by the anticipated loss of a man, they were
rapidly obeyed. The ship was then hove to. By that time, however,
the cause of all this excitement was at a considerable distance from
the ship. It was a poor Swede, named Logholm, who, while engaged
in lashing the larboard anchor stock, lost his hold and fell into the
sea. He could not swim; but, somehow, he managed to keep afloat
until the boat reached him, when he began to sink. The man at the
bow ran his boat hook down, and caught the drowning man by his
clothes: his clothes tearing, the man lost his hold, and the Swede
once more sunk. Again the active bowsman ran the hook down,
leaning far over the side; fortunately, he got hold of his shirt collar:
dripping, and apparently lifeless, they drew him into the boat. He was
soon under the surgeon’s care, whose skill restored him to animation
and to life. It was a narrow escape!
Rising one morning, I heard the men talking about having been
called to quarters during the night. They said a strange vessel having
appeared, the drums beat to quarters, the guns were got ready,
those great lanterns, which are placed on the main deck, called battle
lanterns, were got out, and the officers began to muster the men at
each division; when they discovered the supposed vessel of war to be
nothing more than a large merchant ship. Upon this the hands were
sent below. All this was news to me; I had slept through all the noise,
confusion and bustle of the night, utterly ignorant of the whole
matter. It was fortunate for me that the real character of the strange
ship was discovered before my name was called, otherwise the
morning would have found me at the gratings under punishment.
Never was boy happier than myself, when made acquainted with my
hair-breadth escape from the lash.
We had now reached the island of Madeira, occupied by the
Portuguese, and producing fine oranges, grapes and wine. It is some
sixty miles in length, about forty in breadth; the climate is hot, but
salubrious; its harbor, or rather roadstead, is by no means
commodious or safe—so that our stay was short. Here, the
Portuguese lad who had supplied my place as servant to the surgeon,
was sent on shore, for attempting a crime unfit to be mentioned in
these pages, but quite common among the Spaniards and
Portuguese. My old master made an effort to obtain me again, but
did not succeed.
Sailing from Madeira, we next made St. Michael’s. At this place we
had an increase to our crew, in the person of a fine, plump boy—born
to the wife of one of our men. The captain christened the new comer,
Michael, naming him after the island. This birth was followed by
another. Whether the captain did not like the idea of such interesting
episodes in sea life, or whether any other motive inspired him, I
cannot tell; but when, shortly after, we returned to Lisbon, he
ordered all the women home to England, by a ship just returning
thither. Before this, however, one of our little Tritons had died, and
found a grave under the billows, leaving its disconsolate mother in a
state little short of distraction. A man of war is no place for a woman.
Short cruises are very popular with man-of-war’s men. On many
accounts they love being in harbor; on others they prefer being at
sea. In harbor they have to work all day, but in return for this they
have the whole night for sleep. At sea, the whole time is divided into
five watches of four hours each, and two shorter ones, called dog
watches, of two hours each, or from four to six and from six to eight,
P. M. The design of these dog watches is to alternate the time, so
that each watch may have a fair proportion of every night below.
While at our station this time, our old friend, Bob Hammond, met
with some little difficulty, which we will here make matter of record.
He was below, and one of his messmates did something that vexed
him exceedingly. Now Bob was not a man to bear vexations tamely,
where he had the power to resist them; so, lifting his huge fist, he
struck at the offender; missing his real opponent, the blow fell upon
another who stood near him. Bob was too much of a bully to offer
any apology; he merely laughed, and remarked that he had “killed
two birds with one stone.”
Whether the bird, who, in Bob’s figurative language, was killed, did
not like being called a bird, or whether he conceived a strong dislike
to being a mark for Bob to shoot at, is not for me to say; but he
certainly disliked the one or the other, for the next morning he
reported the matter to the officers, which complaint was considered a
most unsailor-like act by the whole crew.
Fighting was a punishable offence, so Bob was called up the next
morning. The captain mentioned what was reported concerning him.
He acknowledged it was all true, and without any signs of contrition
said, “I only killed two birds with one stone.” The angry captain
ordered two dozen lashes to be laid on; it was done without extorting
a sigh or a groan. He was then loosed from the grating, and
questioned; but he merely replied, in a gruff tone, that “the man who
reported him was a blackguard!” For this, he was seized up again and
another dozen lashes inflicted; he bore them with the same dogged
and imperturbable air. Finding it impossible to extort any
acknowledgment from the stubborn tar, the captain ordered him
below.
About the same time one of our crew, named Jack Sadler, a fine,
noble-hearted seaman, growing weary of the service, determined to
desert. Dropping into the water, he began swimming towards the
shore. It was not very dark, and he was discovered; the sentry was
ordered to fire at him, which he did, but missed his prey. A boat was
next lowered, which soon overtook and dragged him on board. The
officer commanding the boat said, “Well, Mr. Sadler, you thought you
had got away, did you?” “You are not so sure that you have me now,”
replied Sadler, as he sprung over the side of the boat. Nor would they
have captured him, had not another ship’s boat arrived to their
assistance.
The next day, he was seized up and received three dozen lashes,
which, considering his offence, was a very light punishment. I
suppose that his noble bearing, his lion-hearted courage, and his
undaunted manner, produced a favorable feeling in the captain’s
mind; especially as he afterwards became his favorite—a fancy man—
as those men are called who win the favor of their superior officer.
One of Sadler’s failings was that too prevalent evil among seamen,
drunkenness. Soon after the above affair, he got drunk. Being seen
by the captain, he was ordered to be put in irons. Sadler was Bob
Hammond’s messmate; this worthy, finding his comrade in trouble,
made himself drunk, and purposely placed himself in the way of the
officers, that he might be put in irons also, to keep his friend Sadler
company. The plan succeeded. Bob had his wish, and the two
fearless tars were soon ironed together. Nothing daunted, they began
to sing, and through the whole night they kept up such a hallooing,
shouting and singing as might have served for a whole company of
idle roysterers. Being near the ward-room, they prevented the
officers from sleeping nearly all night.
As usual, after being in irons, they were brought up for punishment
the next morning. “Well, Mr. Sadler,” said the captain, “you were
drunk, were you, last night?”
“I was, sir,” replied the offender.
Had he been any other man, he would have been ordered to strip:
as it was, the captain proceeded,—
“Do you feel sorry for it, sir?”
“I do, sir.”
“Will you try to keep sober if I forgive you?” continued Captain
Carden.
“I will try, sir.”
“Then, sir, I forgive you:” and no doubt he was glad to witness that
contrition in his favorite which made it consistent to forgive him.
Having dismissed Sadler, he turned to Hammond: assuming a sterner
look and a harsher voice, he said, in a tone of irony, “Well, Mr.
Hammond, you got drunk last night, did you, sir?”
Bob shrugged up his shoulders, and removed his enormous quid
into a convenient position for speaking, and then replied, “I can’t say
but that I had a horn of malt.”
The captain looked thunder at the stalwart man, as he answered,
“A horn of malt, you rascal! what do you call a horn of malt?”
“When I was in Bengal, Madras, and Batavia,” said he, “I used to
get some stuff called arrack—we used to call it a horn of malt; but
this was some good rum.”
Bob’s manner was so exquisitely ridiculous while delivering this
harangue, that both officers and men broke out into an involuntary
laugh. The captain looked confounded, but recovering himself, he
said to Mr. Hope, the first lieutenant, “Put that rascal in irons; it is of
no use to flog him.”
One of the peculiarities of Captain Carden was an ardent desire to
have a crew of picked, first-rate men. The shiftless, slovenly seaman
was his abhorrence. Had he dared, he would gladly have given all
such their discharge; as it was, he never attempted their recovery, by
offering a reward for their detection, if they ran away; while he
spared no pains to catch an able, active, valuable man like Sadler. He
even gave these drones opportunity to escape, by sending them on
shore at Lisbon, to cut stuff to make brooms for sweeping the deck.
The men sent out on these expeditions were nicknamed “broomers.”
Now, although Bob Hammond was as expert a sailor as any man in
the ship, yet his unconquerable audacity made the captain fear his
influence, and wish to get rid of him; hence, a few days after this
drunken spree, Bob was called on deck to go with the broomers. “You
may go, Mr. Hammond,” said the captain, eyeing him in a very
expressive manner, “with these fellows to cut broom.”
Bob understood the hint perfectly, and replied, “Aye, aye, sir, and I
will cut a long handle to it.” I scarcely need remark that the broomers
returned without Bob. Whether he remained on shore to cut the long
handle, or for some other purpose, he never informed us: certain it
is, however, that the presence of Bob Hammond never darkened the
decks of the Macedonian again.
About this time the prevailing topic of conversation among our men
and officers was the probability of a war with America. The prevailing
feeling through the whole fleet was that of confidence in our own
success, and of contempt for the inferior naval force of our
anticipated enemies. Every man, and especially the officers,
predicted, as his eye glanced proudly on the fine fleet which was
anchored off Lisbon, a speedy and successful issue to the
approaching conflict.
We now received orders to sail to Norfolk, Virginia, with
despatches. The voyage was accomplished without any occurrence of
note. We found ourselves on the American coast, with no very
pleasant impressions. It was late in the fall, and the transition from
the mild, soft climate of Spain and Portugal, to the bleak, sharp
atmosphere of the coast of Virginia, was anything but delightful.
The most disagreeable duty in the ship was that of holy-stoning the
decks on cold, frosty mornings. Our movements were never more
elastic than when at this really severe task. As usual, it gave occasion
to a variety of forecastle yarns about cold stations. Among these was
one which was attested by many witnesses, and there can be no
doubt of its truth:
A British frigate was once stationed in a cold climate. The first
lieutenant was a complete tyrant, delighting in everything that caused
the crew to suffer. Among other things, he took especial care to make
the work of holy-stoning as painful as possible, by forcing them to
continue at it much longer than was necessary. Although he had no
watch on deck, he would contrive to be up in season to annoy the
men with his hated presence. One morning, the weather being
unusually severe, the men sprang to their task with unwonted agility,
and contrived to finish it before the appearance of their persecutor.
To their vexation, however, just as they had completed their work, he
bounced on deck, with a peremptory order to wash the decks all over
a second time.
The men dropped on their knees with the holy-stones, and prayed,
as the tyrant went below, that he might never come on deck again
alive. Whether God heard the cry of the oppressed crew, or whether
it was the action of the ordinary natural laws, the reader must
determine for himself; but when the lieutenant again appeared on
deck, he was brought up “feet foremost,” to be buried. He was taken
sick that morning: his disease baffled the skill of the surgeon, and in
a few days he was a corpse. The opinion that he died a monument of
the divine displeasure against cruel, hard-hearted men of power, and
of disregard for the miseries and tears of the oppressed poor, is at
least worthy of serious consideration.
Soon after we had descried land, an American pilot came on board
to pilot us into Hampton Roads. The sound of our own familiar
tongue from a stranger, was very agreeable to men who had been
accustomed to hear the semi-barbarous lingo of the Portuguese, and
a thrill of home remembrances shot through our hearts, as, stepping
on deck, the pilot exclaimed, “It is very cold!”
While at anchor in Hampton Roads, we fared well. Boats were
alongside every day with plenty of beef and pork, which was
declared, by universal consent, to be infinitely superior to what we
obtained from Portugal. Our men said that the Yankee pork would
swell in the pot, which they very sagely accounted for on the
supposition that the pigs were killed at the full of the moon. But I
suppose that Virginia corn had more to do in this matter than lunar
influences; though our men most doggedly maintained the contrary
and more mystical opinion.
The principal draw-back on the enjoyment of our stay at Norfolk,
was the denial of liberty to go on shore. The strictest care was taken
to prevent all communication with the shore, either personally or by
letter. The reason of this prohibition was a fear lest we should desert.
Many of our crew were Americans: some of these were pressed men;
others were much dissatisfied with the severity, not to say cruelty, of
our discipline; so that a multitude of the crew were ready to give “leg
bail,” as they termed it, could they have planted their feet on
American soil. Hence our liberty was restrained.
Our officers never enjoyed better cheer than during our stay at this
port. Besides feasting among themselves on the fine fat beef, geese
and turkeys, which came alongside in abundance, they exchanged
visits with Commodore Decatur and his officers, of the frigate United
States, then lying at Norfolk. These visits were seasons of much
wassail and feasting. I remember overhearing Commodore Decatur
and the captain of the Macedonian joking about taking each other’s
ship, in case of a war; and some of the crew said that a bet of a
beaver hat passed between them on the issue of such a conflict.
They probably little thought that this joking over a wine-cup, would
afterwards be cracked in earnest, in a scene of blood and carnage.
It was at this port that the difficulty between the British ship
Leopard and the American frigate Chesapeake took place. Several
American seamen, having escaped from the former, took refuge on
board the latter. The captain of the Leopard demanded their
restoration; the captain of the Chesapeake refused submission to the
demand. The Leopard fired into the frigate, which, being of inferior
force, struck to her opponent. As it was a time of peace, the
Chesapeake was not kept as a prize; the claimed men were taken
from her, and she was restored. This was among the circumstances
which led to the war of 1812.
The despatches delivered, and the object of the voyage
accomplished, we once more put to sea; having first laid in a liberal
store of our favorite beef, together with a quantity of Virginia beans,
called Calavances, which were in high favor with our men. To those of
our crew who were Americans, this was rather an unpleasant event.
Like the fabled Tantalus, they had the cup at their mouths, but it
receded before they could taste its contents. They had been at the
threshold of “home, sweet home,” but had not been permitted to step
within its doors. Some of them felt this very keenly, especially a boy,
who belonged to New York, named Jesse Lloyd. In truth, it was a
hard lot.
A quick winter passage brought us to Lisbon, where the arrival of
the English mail-bag, and orders to proceed to England with a convoy
of merchantmen, put us all into a tolerably good humor.
The arrival of the mail-bag is a season of peculiar interest on board
a man of war. It calls the finer feelings of human nature into exercise.
It awakens conjugal, fraternal, and filial affection in almost every
breast. The men crowd around, as the letters are distributed, and he
was pronounced a happy fellow whose name was read off by the
distributor; while those who had none, to hide their disappointment,
would jocularly offer to buy those belonging to their more fortunate
messmates.
During the two years of our absence I had received several letters
from my mother, which afforded me much satisfaction. To these I had
faithfully replied. I now experienced the advantage of the primary
education I had received when a boy. Many of my shipmates could
neither read nor write, and were, in consequence, either altogether
deprived of the privilege of intercourse with their friends, or were
dependent on the kindness of others, to read and write for them. For
these I acted as a sort of scribe. I also solaced many weary hours by
reading such works as could be obtained from the officers; and
sometimes I perused the Bible and prayer book which my mother so
wisely placed in my chest, on the eve of my departure. The pack of
cards, which so inappropriately accompanied them, I had loaned to
one of the officers, who took the liberty to keep them. This was,
perhaps, more fortunate than otherwise, since their possession might
have led to their use, and their use might have excited a propensity
to gambling, which would have ended in my ruin.
After remaining a very short time at Lisbon, we one morning fired a
gun to give notice to our convoy to get under weigh. Immediately the
harbor was alive with noise and activity. The song of the sailors
weighing anchor, the creaking of pulleys, the flapping of the sails, the
loud, gruff voices of the officers, and the splashing of the waters,
created what was to us, now that we were “homeward bound,” a
sweet harmony of sounds. Amid all this animation, our own stately
frigate spread her bellying sails to a light but favoring breeze; with
colors flying, our band playing lively airs, and the captain with his
speaking trumpet urging the lagging merchant-ships to more activity,
we passed gaily through the large fleet consigned to our care. In this
gallant style we scudded past the straggling ruins of old Lisbon,
which still bore marks of the earthquake that destroyed it. Very soon
the merry fishermen, who abound in the Tagus, were far at our stern.
Next, we glided past the tall granite pinnacles of towering mount
Cintra; the high-lands passed from our vision like the scenes in a
panorama, and in a few hours, instead of the companionship of the
large flocks of gulls, which abound in this river, we were attended by
only here and there one of these restless wanderers of the deep. We
were fairly at sea, and, what was the more inspiring, we were
enjoying the luxury of fond anticipation. Visions of many an old fire-
side, of many a humble hearth-stone, poor, but precious, flitted
across the visions of our crew that night. Hardships, severe discipline,
were for the time forgotten in the dreams of hope. Would that I could
say that everything in every mind was thus absorbed in pleasure!
There were minds that writhed under what is never forgotten. Like
the scar, that time may heal, but not remove, the flogged man
forgets not that he has been degraded; the whip, when it scarred the
flesh, went farther; it wounded the spirit; it struck the man; it begat
a sense of degradation he must carry with him to his grave. We had
many such on board our frigate; their laugh sounded empty, and
sometimes their look became suddenly vacant in the midst of hilarity.
It was the whip entering the soul anew. But the most of our crew were,
for the time, happy. They were homeward bound!
CHAPTER V

After running a few days before a fair wind, the delightful cry of
“Land ho!” was heard from the mast-head; a cry always
pleasant to the inhabitant of a ship, but most especially so
when the distant hills are those of his native land. Soon after the cry
of the man aloft, the land became dimly visible from the deck, and
our eyes glistened, as the bright, emerald fields of old England, in all
the glory of their summer beauty, lay spread out before us. Ascending
the British Channel, we soon made the spacious harbor of Plymouth,
where we came to an anchor. One of our convoy, however, by some
unskilful management, ran ashore at the mouth of the harbor, where
she went to pieces.
We found Plymouth to be a naval station of considerable
importance, well fortified, possessing extensive barracks for the
accommodation of the military, and having a magnificent dock-yard,
abundantly supplied with the means of building and refitting the
wooden walls.
Nothing would have afforded me a higher gratification, than a trip
to the pleasant fields and quiet hearth-sides of dear old Bladen. I
longed to pour out my pent-up griefs into the bosom of my mother,
and to find that sympathy which is sought in vain in the cold,
unfeeling world. This privilege was, however, denied to all. No one
could obtain either leave of absence or money, since a man of war is
never “paid off” until just before she proceeds to sea. But, feeling
heartily tired of the service, I wrote to my mother, requesting her to
endeavor to procure my discharge. This, with the promptitude of
maternal affection, she pledged herself to do at the earliest possible
opportunity. How undying is a mother’s love!
When a man of war is in port, it is usual to grant the crew
occasional liberty to go on shore. These indulgences are almost
invariably abused for purposes of riot, drunkenness and debauchery;
rarely does it happen, but that these shore sprees end in bringing
“poor Jack” into difficulty of some sort; for, once on shore, he is like
an uncaged bird, as gay and quite as thoughtless. He will then follow
out the dictates of passions and appetites, let them lead him whither
they may. Still, there are exceptions; there are a few who spend their
time more rationally. Were the principles of modern temperance fully
triumphant among sailors, they would all do so.
I resolved not to abuse my liberty as I saw others doing; so when,
one fine Sabbath morning, I had obtained leave from our surly first
lieutenant, I chose the company of a brother to a messmate, named
Rowe, who lived at Plymouth. At the request of my messmate, I
called to see him. He received me very kindly, and took me in
company with his children into the fields, where the merry notes of
the numerous birds, the rich perfume of the blooming trees, the tall,
green hedges, and the modest primroses, cowslips and violets, which
adorned the banks on the road-side, filled me with inexpressible
delight. True, this was not the proper manner of spending a Sabbath
day, but it was better than it would have been to follow the example
of my shipmates generally, who were carousing in the tap-rooms of
the public houses.
At sunset I went on board and walked aft to the lieutenant, to
report myself. He appeared surprised to see me on board so early
and so perfectly sober, and jocosely asked me why I did not get
drunk and be like a sailor. Merely smiling, I retired to my berth,
thinking it was very queer for an officer to laugh at a boy for doing
right, and feeling happy within myself because I had escaped
temptation.
By and by, three other boys, who had been ashore, returned, in a
state which a sailor would call “three sheets in the wind.” They
blustered, boasted of the high time they had enjoyed, and roundly
laughed at me for being so unlike a man-of-war’s-man; while they felt
as big as any man on board. The next morning, however, they looked
rather chop-fallen, when the captain, who had accidentally seen their
drunken follies on shore, ordered them to be flogged, and forbade
their masters to send them ashore while we remained at Plymouth.
Now, then, it was pretty evident who had the best cruise; the joke
was on the other side; for while their drunken behavior cost them a
terrible whipping and a loss of liberty, my temperance gained me the
real approbation of my officers, and more liberty than ever, since
after that day I had to go on shore to do errands for their masters, as
well as for my own. The young sailor may learn from this fact the
benefit of temperance, and the folly of getting drunk, for the sake of
being called a fine fellow.
My frequent visits to the shore gave me many opportunities to run
away; while my dislike of everything about the Macedonian inspired
me with the disposition to improve them. Against this measure my
judgment wisely remonstrated, and, happily for my well being,
succeeded. Such an attempt would inevitably have been followed by
my recovery, since a handsome bounty was paid for the delivery of
every runaway. There are always a sufficient number to be found who
will engage in pursuit for the sake of money—such men as the
Canadian landlord, described by Rev. Wm. Lighton, in his interesting
narrative,[6] a work with which, no doubt, most of my readers are
acquainted, since it has enjoyed an immense circulation. Endurance,
therefore, was the only rational purpose I could form.
Perhaps the hope of a speedy discharge, through my mother’s
efforts, tended somewhat to this result in my case; besides, my
situation had become somewhat more tolerable from the fact, that by
dint of perseverance in a civil and respectful behavior, I had gained
the good will both of the officers and crew. Yet, with this advantage,
it was a miserable situation.
There are few worse places than a man of war, for the favorable
development of the moral character in a boy. Profanity, in its most
revolting aspect; licentiousness, in its most shameful and beastly
garb; vice, in the worst of its Proteus-like shapes, abound there.
While scarcely a moral restraint is thrown round the victim, the
meshes of temptation are spread about his path in every direction.
Bad as things are at sea, they are worse in port. There, boat-loads of
defiled and defiling women are permitted to come alongside; the
men, looking over the side, select whoever best pleases his lustful
fancy, and by paying her fare, he is allowed to take and keep her on
board as his paramour, until the ship is once more ordered to sea.
Many of these lost, unfortunate creatures are in the springtime of life,
some of them are not without pretensions to beauty. The ports of
Plymouth and Portsmouth are crowded with these fallen beings. How
can a boy be expected to escape pollution, surrounded by such works
of darkness? Yet, some parents send their children to sea because
they are ungovernable ashore! Better send them to the house of
correction.
There is one aspect in which life at sea and life in port materially
differ. At sea, a sense of danger, an idea of insecurity, is ever present
to the mind; in harbor, a sense of security lulls the sailor into
indulgence. He feels perfectly safe. Yet, even in harbor, danger
sometimes visits the fated ship, stealing upon her like the spirit of
evil. This remark was fearfully illustrated in the loss of the Royal
George, which sunk at Spithead, near Portsmouth, on the 29th of
August, 1782.
This splendid line of battle ship of one hundred and eight guns, had
arrived at Spithead. Needing some repairs, she was “heeled down,” or
inclined on one side, to allow the workmen to work on her sides.
Finding more needed to be done to the copper sheathing than was
expected, the sailors were induced to heel her too much. While in this
state, she was struck by a slight squall; the cannon rolled over to the
depressed side; her ports were open, she filled with water, and sunk
to the bottom!
This dreadful catastrophe occurred about ten o’clock in the
morning. The brave Admiral Kempenfeldt was writing in his cabin;
most of the crew, together with some three hundred women, were
between decks: these nearly all perished. Captain Waghorn, her
commander, was saved; his son, one of her lieutenants, was lost.
Those who were on the upper deck were picked up by the boats of
the fleet, but nearly one thousand souls met with a sudden and
untimely end. The poet Cowper has celebrated this melancholy event
in the following beautiful lines:
Toll for the brave!
The brave that are no more!
All sunk beneath the wave,
Fast by their native shore.

Eight hundred of the brave,
Whose courage well was tried,
Had made the vessel heel,
And laid her on her side.

A land breeze shook the shrouds,
And she was overset;
Down went the Royal George,
With all her crew complete.

Toll for the brave—
Brave Kempenfeldt is gone,
His last sea fight is fought—
His work of glory done.

It was not in the battle;
No tempest gave the shock;
She sprang no fatal leak;
She ran upon no rock.

His sword was in its sheath;
His fingers held the pen,
When Kempenfeldt went down,
With twice four hundred men.

Weigh the vessel up,
Once dreaded by our foes!
And mingle with our cup
The tear that England owes.

Her timbers yet are sound,
And she may float again,
Full charged with England’s thunder,
And plough the distant main.

But Kempenfeldt is gone,
His victories are o’er;
And he, and his eight hundred,
Shall plough the wave no more.

To return to my narrative: Our ship, having been at sea two years,
needed overhauling. She was therefore taken into one of the splendid
dry docks in the Plymouth dock-yard, while the crew were placed, for
the time being, on board an old hulk. A week or two sufficed for this
task, when we returned to our old quarters. She looked like a new
ship, having been gaily painted within and without. We, too, soon got
newly rigged; for orders had reached us from the Admiralty office to
prepare for sea, and we were paid off. Most of the men laid out part
of their money in getting new clothing; some of it went to buy
pictures, looking-glasses, crockery ware, &c., to ornament our berths,
so that they bore some resemblance to a cabin. The women were
ordered ashore, and we were once more ready for sea.
The practice of paying seamen at long intervals, is the source of
many evils. Among these, is the opportunity given to pursers to
practise extortion on the men—an opportunity they are not slow in
improving. The spendthrift habits of most sailors leave them with a
barely sufficient quantity of clothing, for present purposes, when they
ship. If the cruise is long, they are, consequently, obliged to draw
from the purser. This gentleman is ever ready to supply them, but at
ruinous prices. Poor articles with high prices are to be found in his
hands; these poor Jack must take of necessity, because he cannot
get his wages until he is paid off. Hence, what with poor articles, high
charges and false charges, the purser almost always has a claim
which makes Jack’s actual receipts for two or three years’ service,
wofully small. Were he paid at stated periods, he could make his own
purchases as he needed them. The sailor is aware of this evil, but he
only shows his apprehension of it in his usually good-humored
manner. If he sees a poor, ill-cut garment, he will laugh, and say it
“looks like a purser’s shirt on a handspike.” These are small matters,
but they go to make up the sum total of a seaman’s life, and should
therefore be remedied as far as possible.
Our preparations all completed, the hoarse voice of the boatswain
rang through the ship, crying, “All hands up anchor, ahoy!” In a trice,
the capstan bars were shipped, the fifer was at his station playing a
lively tune, the boys were on the main deck holding on to the
“nippers,” ready to pass them to the men, who put them round the
“messenger” and cable; then, amid the cries of “Walk round! heave
away, my lads!” accompanied by the shrill music of the fife, the
anchor rose from its bed, and was soon dangling under our bows.
The sails were then shaken out, the ship brought before the wind,
and we were once more on our way to sea. We were directed to
cruise off the coast of France this time; where, as we were then at
war with the French, we were likely to find active service.
We first made the French port of Rochelle; from thence, we sailed
to Brest, which was closely blockaded by a large British fleet,
consisting of one three-decker, with several seventy-fours, besides
frigates and small craft. We joined this fleet, and came to an anchor
in Basque Roads, to assist in the blockade. Our first object was to
bring a large French fleet, greatly superior to us in size and numbers,
to an engagement. With all our manœuvring, we could not succeed in
enticing them from their snug berth in the harbor of Brest, where
they were safely moored, defended by a heavy fort, and by a chain
crossing the harbor, to prevent the ingress of any force that might be
bold enough to attempt to cut them out. Sometimes we sent a frigate
or two as near their fort as they dared to venture, in order to entice
them out; at other times, the whole fleet would get under weigh and
stand out to sea; but without success. The Frenchmen were either
afraid we had a larger armament than was visible to them, or they
had not forgotten the splendid victories of Nelson at the Nile and
Trafalgar. Whatever they thought, they kept their ships beyond the
reach of our guns. Sometimes, however, their frigates would creep
outside the forts, when we gave them chase, but seldom went
beyond the exchange of a few harmless shots. This was what our
men called “boy’s play;” and they were heartily glad when we were
ordered to return to Plymouth.
After just looking into Plymouth harbor, our orders were
countermanded, and we returned to the coast of France. Having
accomplished about one half the distance, the man at the mast-head
cried out, “Sail ho!”
“Where away?” (what direction?) responded the officer of the deck.
The man having replied, the officer again asked, “What does she look
like?”
“She looks small; I cannot tell, sir.”
In a few minutes the officer hailed again, by shouting, “Mast-head,
there! what does she look like?”
“She looks like a small sail-boat, sir.”
This was rather a novel announcement; for what could a small sail-
boat do out on the wide ocean? But a few minutes convinced us that
it was even so; for, from the deck, we could see a small boat, with
only a man and a boy on board. They proved to be two French
prisoners of war, who had escaped from an English prison, and,
having stolen a small boat, were endeavoring to make this perilous
voyage to their native home. Poor fellows! they looked sadly
disappointed at finding themselves once more in British hands. They
had already been in prison for some time; they were now doomed to
go with us, in sight of their own sunny France, and then be torn away
again, carried to England, and imprisoned until the close of the war.
No wonder they looked sorrowful, when, after having hazarded life
for home and liberty, they found both snatched from them in a
moment, by their unlucky rencontre with our frigate. I am sure we
should all have been glad to have missed them. But this is only one
of the consequences of war.
Having joined the blockading fleet again, we led the same sort of
life as before: now at anchor, then giving chase; now standing in
shore, and anon standing out to sea; firing, and being fired at,
without once coming into action.
Determined to accomplish some exploit or other, our captain
ordered an attempt to be made at cutting out some of the French
small craft that lay in shore. We were accustomed to send out our
barges almost every night, in search of whatever prey they might
capture. But on this occasion the preparations were more formidable
than usual. The oars were muffled; the boat’s crew increased, and
every man was armed to the teeth. The cots were got ready on
board, in case any of the adventurers should return wounded. Cots
are used to sleep in by ward-room officers and captains; midshipmen
and sailors using hammocks. But a number of cots are always kept in
a vessel of war, for the benefit of wounded men; they differ from a
hammock, in being square at the bottom, and consequently more
easy. The service on which the barge was sent being extremely
dangerous, the cots were got ready to receive the wounded, should
there be any; but notwithstanding these expressive preparations, the
brave fellows went off in as fine spirits as if they had been going on
shore for a drunken spree. Such is the contempt of danger that
prevails among sailors.
We had no tidings of this adventure until morning, when I was
startled by hearing three cheers from the watch on deck; these were
answered by three more from a party that seemed approaching us. I
ran on deck just as our men came alongside with their bloodless prize
—a lugger, laden with French brandy, wine and Castile soap. They
had made this capture without difficulty; for the crew of the lugger
made their escape in a boat, on the first intimation of danger. As this
was our first prize, we christened her the Young Macedonian. She
was sent to the admiral; but what became of her, I never heard.
Before sending her away, however, the officers, having a peculiar
itching for some of the brandy, took the liberty of replenishing their
empty bottles from the hold. This, with true aristocratic liberality, they
kept to themselves, without offering the smallest portion to the crew.
Some of them showed, by their conduct afterwards, that this brandy
possessed considerable strength. We had no further opportunity to
signalize either ourselves or our frigate by our heroism at Brest; for
we were soon after ordered back to Plymouth, where, for a short
time, we lay at our old anchorage ground.