The Art of Machine Learning: A Hands-On Guide to Machine Learning with R
Ebook · 586 pages · 4 hours

About this ebook

Learn to expertly apply a range of machine learning methods to real data with this practical guide.

Packed with real datasets and practical examples, The Art of Machine Learning will help you develop an intuitive understanding of how and why ML methods work, without the need for advanced math.

As you work through the book, you’ll learn how to implement a range of powerful ML techniques, starting with the k-Nearest Neighbors (k-NN) method and random forests, and moving on to gradient boosting, support vector machines (SVMs), neural networks, and more.

With the aid of real datasets, you’ll delve into regression models through the use of a bike-sharing dataset, explore decision trees by leveraging New York City taxi data, and dissect parametric methods with baseball player stats. You’ll also find expert tips for avoiding common problems, like handling “dirty” or unbalanced data, and for troubleshooting pitfalls.

You’ll also explore:

  • How to deal with large datasets and techniques for dimension reduction
  • Details on how the Bias-Variance Trade-off plays out in specific ML methods
  • Models based on linear relationships, including ridge and LASSO regression
  • Real-world image and text classification and how to handle time series data

Machine learning is an art that requires careful tuning and tweaking. With The Art of Machine Learning as your guide, you’ll master the underlying principles of ML that will empower you to effectively use these models, rather than simply provide a few stock actions with limited practical use.

Requirements: A basic understanding of graphs and charts and familiarity with the R programming language
Language: English
Publisher: No Starch Press
Release date: Jan 9, 2024
ISBN: 9781718502116

Book preview

The Art of Machine Learning - Norman Matloff

INTRODUCTION

Machine learning! With such a science fiction-ish name, one might expect it to be technology that is strictly reserved for highly erudite specialists. Not true.

Actually, machine learning (ML) can easily be explained in commonsense terms, and anyone with a good grasp of charts, graphs, and the slope of a line should be able to both understand and productively use ML. Of course, as the saying goes, The devil is in the details, and one must work one’s way through those details. But ML is not rocket science, in spite of it being such a powerful tool.

0.1 What Is ML?

ML is all about prediction. Does a patient have a certain disease? Will a customer switch from her current cell phone service to another? What is actually being said in this rather garbled audio recording? Is that bright spot observed by a satellite a forest fire or just a reflection?

We predict an outcome from one or more features. In the disease diagnosis example, the outcome is having the disease or not, and the features may be blood tests, family history, and so on.

All ML methods involve a simple idea: similarity. In the cell phone service example, how do we predict the outcome for a certain customer? We look at past customers and select the ones who are most similar in features (size of bill, lateness record, yearly income, and so on) to our current customer. If most of those similar customers bolted, we predict the same for the current one. Of course, we are not guaranteed that outcome, but it is our best guess.

0.2 The Role of Math in ML Theory and Practice

Many ML methods are based on elegant mathematical theory, with support vector machines (SVMs) being a notable example. However, knowledge of this theory has very little use in terms of being able to apply SVM well in actual applications.

To be sure, a good intuitive understanding of how ML methods work is essential to effective use of ML in practice. This book strives to develop in the reader a keen understanding of the intuition, without using advanced mathematics. Indeed, there are very few equations in this book.

0.3 Why Another ML Book?

There are many great ML books out there, of course, but none really empower the reader to use ML effectively in real-world problems. In many cases, the problem is that the books are too theoretical, but I am equally concerned that the applied books tend to be cookbooks (too recipe-oriented) that treat the subject in a Step 1, Step 2, Step 3 manner. Their focus is on the syntax and semantics of ML software, with the result that while the reader may know the software well, the reader is not positioned to use ML well.

I wrote this book because:

  • There is a need for a book that uses the R language but is not about R. This is a book on ML that happens to use R for examples and not a book about the use of R in ML.
  • There is a need for an ML book that recognizes that ML is an art, not a science. (Hence the title of this book.)
  • There is a need for an ML book that avoids advanced math but addresses the point that, in order to use ML effectively, one does need to understand the concepts well—the why and how of ML methods. Most applied ML books do too little in explaining these things.

All three of these bullets go back to the anti-cookbook theme. My goal is, then, this:

I would like those who use ML to not only know the definition of random forests but also be ready to cogently explain how the various hyperparameters in random forests may affect overfitting. MLers also should be able to give a clear account of the problems of p-hacking in feature engineering.

We will empower the reader with strong, practical, real-world knowledge of ML methods—their strengths and weaknesses, what makes them work and fail, what to watch out for. We will do so without much formal math and will definitely take a hands-on approach, using prominent software packages on real datasets. But we will do so in a savvy manner. We will be informed consumers.

0.4 Recurring Special Sections

There are special recurring themes and sections throughout this book:

Bias vs. Variance

Numerous passages explain in concrete terms—no superstition!—how these two central notions play out for each specific ML method.

Pitfalls

Numerous sections with the Pitfall title warn the reader of potential problems and show how to avoid them.

0.5 Background Needed

What kind of background will the reader need to use this book profitably?

No prior exposure to ML or statistics is assumed.

As to math in general, the book is mostly devoid of formal equations. As long as the reader is comfortable with basic graphs, such as histograms and scatterplots, and simple algebra notions, such as the slope of a line, that is quite sufficient.

The book does assume some prior background in R coding, such as familiarity with vectors, factors, data frames, and functions. The R command line (> prompt, Console in RStudio) is used throughout. Readers without a background in R, or those wishing to have a review, may find my fasteR tutorial useful: https://github.com/matloff/fasteR.

Make sure R and the qeML package are installed on your computer. For the package, the preferred installation source is GitHub, as it will always have the most up-to-date version of the package. You’ll need the devtools package; if you don’t already have it, type:

install.packages('devtools')

Then, to install qeML, load devtools and type:

library(devtools)
install_github('https://github.com/matloff/qeML')

The qeML package will also be on the CRAN R code repository but updated less frequently.
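
For reference, installing the CRAN version instead would use R's standard mechanism:

install.packages('qeML')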

0.6 The qe*-Series Software

Most of the software used here will come from popular R packages:

  • e1071
  • gbm
  • glmnet
  • keras
  • randomForest

Readers can use these packages directly if they wish. But in order to keep things simple and convenient for readers, we usually will be using wrappers for the functions in those packages, which are available in my package, qeML. This is a big help in two ways:

  • The wrappers provide a uniform interface.
  • That uniform interface is also simple.

For instance, consider day1, a bike rental dataset used at various points in this book. We wish to predict tot, total ridership. Here’s how we would do that using random forests, an ML topic covered in this book:

qeRF(day1,'tot')

For support vector machines, another major topic, the call would be

qeSVM(day1,'tot')

and so on. Couldn’t be simpler! No preparatory code, say, to define a model; just call one of the qe functions and go! The prefix qe- stands for quick and easy. One can also specify method-specific parameters, which we will do, but still, it will be quite simple.
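
A fitted qe* object can also be used to predict new cases. Here is a minimal sketch, under the assumption that qeML's fitted objects work with R's generic predict() function; newDay is an illustrative one-row data frame of features, not something defined in the book:

# fit a random forest predicting total ridership from all other columns
rfOut <- qeRF(day1,'tot')
# form a "new day" by reusing an existing row's features (every column except 'tot')
newDay <- day1[1, names(day1) != 'tot']
# predicted total ridership for that day
predict(rfOut, newDay)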

For very advanced usage, this book shows how to use those packages directly.

0.7 The Book’s Grand Plan

Here is the path we’ll take. The first three chapters introduce general concepts that recur throughout the book, as well as specific machine learning methods. The rough description of ML above—predict on the basis of similar cases—is most easily developed using an ML method known as k-nearest neighbors (k-NN). Part I of the book will play two roles. First, it will cover k-NN in detail. Second, it will introduce the reader to general concepts that apply to all ML methods, such as choice of hyperparameters. In k-NN, the number of similar cases, usually denoted k, is the hyperparameter. For k-NN, what is the Goldilocks value of k—not too small and not too large? Again, choice of hyperparameters is key in most ML methods, and it will be introduced via k-NN.

Part II will then present a natural extension of k-NN, tree-based methods, specifically random forests and gradient boosting. These methods work in a flowchart-like manner, asking questions about features one at a time. In the disease diagnosis example given before, the first question might be, “Is the patient over age 50?” The next might be something like, “Is the patient’s body mass index below 20.2?” In the end, this process partitions the patients into small groups in which the members are similar to each other, so it’s like k-NN. But the groups do take different forms from k-NN, and tree methods often outperform k-NN in prediction accuracy and are considered a major ML tool.

Part III discusses methods based on linear relationships. Readers who have some background in linear regression analysis will recognize some of this, though again, no such background is assumed. This part closes with a discussion of the LASSO and ridge regression, which have the tantalizing property of deliberately shrinking down some classical linear regression estimates.

Part IV involves methods based on separating lines and planes. Consider again the cell phone service example. Say we plot the data for the old customers who left the service using the color blue in our graph. Then on the same graph, we plot those who remained loyal in red. Can we find a straight line that separates most of the blue points from most of the red points? If so, we will predict the action of the new customer by checking which side of the line his case falls on. This description not only fits SVM but also fits, in a sense, the most famous ML method, neural networks, which we cover as well.

Finally, Part V introduces several specific types of ML applications, such as image classification.

It’s often said that no one ML method works best in all applications. True, but hopefully this book’s structure will impart a good understanding of the similarities and differences between the methods, and an appreciation of where each fits in the grand scheme of things.

There is a website for the book at http://heather.cs.ucdavis.edu/artofml, which contains code, errata, new examples, and more.

0.8 One More Point

In reading this book, keep in mind that the prose is just as important as the code. Avoid the temptation to focus only on the code and graphs. A page that is all prose—no math, no graphs, and no code—may be one of the most important pages in the book. It is there that you will learn the all-important why of ML, such as why choice of hyperparameters is so vital. The prose is crucial to your goal of becoming adept at ML with the most insight and predictive power!

Keep in mind that those dazzling ML successes you’ve heard about come only after careful, lengthy tuning and thought on the analyst’s part, requiring real insight. This book aims to develop that insight. Formal math is minimized here, but note that this means prose, rather than equations, will carry many of the key issues.

So, let’s get started. Happy ML-ing!

PART I

PROLOGUE, AND NEIGHBORHOOD-BASED METHODS

1

REGRESSION MODELS

In this chapter, we’ll introduce regression functions. Such functions give the mean of one variable in terms of one or more others—for instance, the mean weight of children in terms of their age. All ML methods are regression methods in some form, meaning that they use the data we provide to estimate regression functions.

We’ll present our first ML method, k-nearest neighbors (k-NN), and apply it to real data. We’ll also weave in concepts that will recur throughout the book, such as dummy variables, overfitting, p-hacking, dirty data, and so on. We’ll introduce many of these concepts only briefly for the time being in order to give you a bird’s-eye view of what we’ll return to in detail later: ML is intuitive and coherent but easier to master if taken in stages. Reader, please be prepared for frequent statements like We’ll cover one aspect for now, with further details later.

Before you begin, make sure you have R and the qeML and regtools packages, version 1.7 or newer for the latter, installed on your computer. (Run packageVersion('regtools') to check.) All code displays in this book assume that the user has already made the calls to load the packages:

library(regtools)

library(qeML)

So, let’s look at our first example dataset.

1.1 Example: The Bike Sharing Dataset

Before we introduce k-NN, we’ll need to have some data to work with. Let’s start with this dataset from the UC Irvine Machine Learning Repository, which contains the Capital Bikeshare system’s hourly and daily count of bike rentals between 2011 and 2012, with corresponding information on weather and other quantities. A more detailed description of the data is available at the UC Irvine Machine Learning Repository.¹

The dataset is included as the day dataset in regtools by permission of the data curator. Note, though, that we will use a slightly modified version, day1 (also included in regtools), in which the numeric weather variables are given in their original scale rather than transformed to the interval [0,1].

Our main interest will be in predicting total ridership for a day.

SOME TERMINOLOGY

Say we wish to predict ridership from temperature and humidity. Standard ML parlance refers to the variables used for prediction—in this case, temperature and humidity—as features.

If the variable to be predicted is numeric, say, ridership, there is no standard ML term for it. We’ll just refer to it as the outcome variable. But if the variable to be predicted is an R factor—that is, a categorical variable—it is called a label.

For instance, later in this book we will analyze a dataset on diseases of human vertebrae. There are three possible outcomes or categories: normal (NO), disk hernia (DH), or spondylolisthesis (SL). The column in our dataset showing the class of each patient, NO, DH, or SL, would be the labels column.

Our dataset, say, day1 here, is called the training set. We use it to make predictions in future cases, in which the features are known but the outcome variable is unknown. We are predicting the latter.

1.1.1 Loading the Data

The data comes in hourly and daily forms, with the latter being the one in the regtools package. Load the data:

> data(day1)

With any dataset, it’s always a good idea to first take a look around. What variables are included in this data? What types are they, say, numeric or R factor? What are their typical values? One way to do this is to use R’s head() function to view the top of the data:

> head(day1)
  instant     dteday season yr mnth holiday
1       1 2011-01-01      1  0    1       0
2       2 2011-01-02      1  0    1       0
3       3 2011-01-03      1  0    1       0
4       4 2011-01-04      1  0    1       0
5       5 2011-01-05      1  0    1       0
6       6 2011-01-06      1  0    1       0
  weekday workingday weathersit     temp
1       6          0          2 8.175849
2       0          0          2 9.083466
3       1          1          1 1.229108
4       2          1          1 1.400000
5       3          1          1 2.666979
6       4          1          1 1.604356
      atemp      hum windspeed casual registered
1  7.999250 0.805833 10.749882    331        654
2  7.346774 0.696087 16.652113    131        670
3 -3.499270 0.437273 16.636703    120       1229
4 -1.999948 0.590435 10.739832    108       1454
5 -0.868180 0.436957 12.522300     82       1518
6 -0.608206 0.518261  6.000868     88       1518
   tot
1  985
2  801
3 1349
4 1562
5 1600
6 1606
> nrow(day1)
[1] 731

We see there are 731 rows (that is, 731 different days), with data on the date, nature of the date (such as weekday), and weather conditions (such as the temperature, temp, and humidity, hum). The last three columns measure ridership from casual users, registered users, and the total.

You can find more information on the dataset with the ?day1 command.

1.1.2 A Look Ahead

We will get to actual analysis of this data shortly. For now, here is a preview. Say we wish to predict total ridership for tomorrow, based on specific weather conditions and so on. How will we do that with k-NN?

We will search through our data, looking for data points that match or nearly match those same weather conditions and other variables. We will then average the ridership values among those data points, and that will be our predicted ridership for this new day.
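
To make that concrete, here is a rough base R sketch of the idea (not the book's qeML code), simplified to use temperature as the only feature and to average the k most similar days:

# predict total ridership for a day forecast to be, say, 28 degrees
k <- 10                            # number of similar days to average over
dists <- abs(day1$temp - 28)       # "similarity" measured by temperature alone
nearestDays <- order(dists)[1:k]   # row numbers of the k closest days
mean(day1$tot[nearestDays])        # predicted total ridership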

Too simple to be true? No, not really; the above description is accurate. Of course, the old saying The devil is in the details applies, but the process is indeed simple. But first, let’s address some general issues.

1.2 Machine Learning and Prediction

ML is fundamentally about prediction. Before we get into the details of our first ML method, we should be sure we know what prediction means.

Consider the bike sharing dataset. Early in the morning, the manager of the bike sharing service might want to predict the total number of riders for the day. The manager can do so by analyzing the relations between the features—the various weather conditions, the work status of the day (weekday, holiday), and so on. Of course, predictions are not perfect, but if they are in the ballpark of what turns out to be the actual number, they can be quite helpful. For instance, they can help the manager decide how many bikes to make available, with pumped-up tires and so on. (An advanced version would be to predict the demand for bikes at each station so that bikes could be reallocated accordingly.)

1.2.1 Predicting Past, Present, and Future

The famous baseball player and malapropist Yogi Berra once said, “Prediction is hard, especially about the future.” Amusing as this is, he had a point; in ML, prediction can refer not only to the future but also to the present or even the past. For example, a researcher may wish to estimate the mean wages workers made back in the 1700s. Or a physician may wish to make a diagnosis as to whether a patient has a particular disease, based on blood tests, symptoms, and so on, guessing their condition in the present, not the future. So when we in the ML field talk of prediction, don’t take the “pre-” too literally.

1.2.2 Statistics vs. Machine Learning in Prediction

A common misconception is that ML is concerned with prediction, while statisticians do inference—that is, confidence intervals and testing for quantities of interest—but prediction is definitely an integral part of the field of statistics.

There is sometimes a friendly rivalry between the statistics and ML communities, even down to a separate terminology for each (see Appendix B). Indeed, statisticians sometimes use the term statistical learning to refer to the same methods known in the ML world as machine learning!

As a former statistics professor who has spent most of his career in a computer science department, I have a foot in both camps. I will present ML methods in computational terms, but with some insights informed by statistical principles.

HISTORICAL NOTE

Many of the methods treated in this book, which compose part of the backbone of ML, were originally developed in the statistics community. These include k-NN, decision trees or random forests, logistic regression, and L1/L2 shrinkage. These evolved from the linear models formulated way back in the 19th century, but which later statisticians felt were inadequate for some applications. The latter consideration sparked interest in methods that had less restrictive assumptions, leading to the invention first of k-NN and later of other techniques.

On the other hand, two other prominent ML methods, support vector machines (SVMs) and neural networks, have been developed almost entirely outside of statistics, notably in university computer science departments. (Another method, boosting, began in computer science but has had major contributions from both factions.) Their impetus was not statistical at all. Neural networks, as we often hear in the media, were studied originally as a means to understand the workings of the human brain. SVMs were viewed simply in computer science algorithmic terms—given a set of data points of two classes, how can we compute the best line or plane separating them?

1.3 Introducing the k-Nearest Neighbors Method

Our featured method in this chapter will be k-nearest neighbors, or k-NN. It’s arguably the oldest ML method, going back to the early 1950s, but it is still widely used today, especially in applications in which the number of features is small (for reasons that will become clear later). It’s also simple to explain and easy to implement—the perfect choice for this introductory chapter.

1.3.1 Predicting Bike Ridership with k-NN

Let’s first look at using k-NN to predict bike ridership from a single feature: temperature. Say the day’s temperature is forecast to be 28 degrees centigrade. How should we predict ridership for the day, using the 28 figure and our historical ridership dataset (our training set)? A person without a background in ML might suggest looking at all the days in our data, culling out those of temperature closest to 28 (there may be few or none with a temperature of exactly 28), and then finding the average ridership on those days. We would use that number as our predicted ridership for this day.

Actually, this intuition is correct! This, in fact, is the basis for many common ML methods, as we’ll discuss further in Section 1.6 on the regression function. For now, just know that k-NN takes the form of simply averaging over the similar cases—that is, over the neighboring data points. The quantity k is the number of neighbors we use. We could, say, take
