
Introduction to Machine Learning Concepts

This document outlines a course on machine learning, covering fundamental concepts, algorithms, and applications. Students will learn about supervised, unsupervised, semi-supervised, and reinforcement learning, as well as frameworks for building machine learning systems. The course aims to equip students with the knowledge and skills to implement machine learning solutions for real-world problems.

Uploaded by

caothaiiiop
Copyright
© All Rights Reserved

Machine Learning

(INT_E 14121)
2 Objectives
Knowledge

The goal of this course is to provide an introduction to machine learning algorithms and
applications. The students will:

• Become familiar with fundamental concepts of machine learning and current achievements in this area.

• Gain knowledge of important machine learning algorithms, including supervised learning (generative/discriminative learning, parametric/non-parametric learning, support vector machines); unsupervised learning (clustering, dimensionality reduction); and learning theory (bias/variance tradeoffs).

• Gain an application perspective of machine learning.


3 Objectives (cont.)

Skills
Upon successful completion of this course, students will be able to:
• Demonstrate an understanding of fundamental machine learning concepts, including potential applications.

• Implement solutions for some real-world applications using machine learning software or libraries.
4 Description

• This course provides a broad introduction to algorithms and applications of machine learning. Topics include supervised learning (generative/discriminative learning, parametric/non-parametric learning, support vector machines); unsupervised learning (clustering, dimensionality reduction); and learning theory (bias/variance tradeoffs).

• The course will also discuss applications of machine learning, such as data mining, pattern recognition, and text and web data processing.
5 Textbooks
Lesson 1-Introduction
7 Outline

• What is machine learning?
• Why machine learning?
• When should you use machine learning?
• Types of learning
• Frameworks for building machine learning systems
• Machine Learning Perspective of Data
• Machine Learning Python Packages
8 What is machine learning?

• In 1959, Arthur Samuel, an American pioneer in computer gaming, machine learning, and artificial intelligence, defined machine learning as a “Field of study that gives computers the ability to learn without being explicitly programmed.”

• Machine learning is the practice of programming computers to learn from data.

• Machine learning is a subfield of computer science concerned with building algorithms which, to be useful, rely on a collection of examples of some phenomenon. These examples can come from nature, be handcrafted by humans, or be generated by another algorithm.

• Machine learning can also be defined as the process of solving a practical problem by
  1) gathering a dataset, and
  2) algorithmically building a statistical model based on that dataset. That statistical model is assumed to be used somehow to solve the practical problem.
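
The two steps above can be sketched in a few lines of plain Python. This is only an illustration: the dataset (hours studied versus exam score) is invented, and ordinary least squares is just one of many ways to build a statistical model. The names `dataset`, `fit_line`, and `predict` are hypothetical.

```python
# Step 1: gather a dataset (hypothetical numbers, invented for illustration).
# Each pair is (hours_studied, exam_score).
dataset = [(1.0, 52.0), (2.0, 54.0), (3.0, 56.0), (4.0, 58.0), (5.0, 60.0)]


def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Step 2: algorithmically build a statistical model from the dataset.
slope, intercept = fit_line(dataset)


def predict(x):
    """Use the fitted model to answer the practical question."""
    return slope * x + intercept
```

Calling `predict(6.0)` then estimates the score for a value of x that never appeared in the dataset, which is the sense in which the model is "used to solve the practical problem".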
9 Expert system vs Machine learning
• Intelligent Software 1.0: inputs + rules → results
• Intelligent Software 2.0: inputs + results → rules
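
The contrast can be made concrete with a toy sketch. In the first function a programmer writes the rule by hand; in the second, the rule (here only a threshold) is derived from inputs plus known results. The feature (number of exclamation marks in an e-mail), the data, and all function names are assumptions made for illustration.

```python
# Software 1.0: a person writes the rule.
def is_spam_by_rule(exclamation_marks):
    # Hand-picked threshold (a hypothetical rule chosen by a programmer).
    return exclamation_marks > 3


# Software 2.0: the rule (here, just a threshold) is derived from
# inputs plus known results.
def learn_threshold(inputs, results):
    """Return the threshold t with the fewest mistakes for the rule 'x > t'."""
    best_threshold, best_errors = None, len(inputs) + 1
    for candidate in sorted(set(inputs)):
        errors = sum((x > candidate) != y for x, y in zip(inputs, results))
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


# Hypothetical training data: exclamation marks per e-mail, and whether it was spam.
inputs = [0, 1, 1, 2, 5, 6, 8]
results = [False, False, False, False, True, True, True]

threshold = learn_threshold(inputs, results)


def is_spam_learned(exclamation_marks):
    return exclamation_marks > threshold
```

If the data changes, `learn_threshold` can simply be run again, whereas the hand-written rule has to be edited by a person.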
10 Why machine learning?
• Example: a spam e-mail filter
  • Without ML
    • The program is not simple: it contains a very long list of rules that is difficult to maintain.

[Figure: traditional approach. Study the problem → Write rules → Evaluate → Launch!, with an "Analyze errors" step feeding back into the process.]
11 Why machine learning? (cont.)
  • With ML

[Figure: machine learning approach. Study the problem → Train ML algorithm (fed with data) → Evaluate solution → Launch!, with an "Analyze errors" step feeding back into the process.]
12 Why machine learning? (cont.)
13 When should you use machine learning?

• When you have a problem that requires many long lists of rules to find the solution. In this case, machine-learning techniques can simplify your code and improve performance.

• Very complex problems for which a traditional approach offers no solution.

• Non-stable environments: machine-learning software can adapt to new data.


14 Types of Learning
• Supervised Learning
• Unsupervised Learning
• Semi-Supervised Learning
• Reinforcement Learning
15 Supervised Learning
• In this type of machine-learning system, the data that you feed into the algorithm includes the desired solutions, which are referred to as “labels.”

• The dataset is the collection of labeled examples $\{(x_i, y_i)\}_{i=1}^{N}$.

• Each element $x_i$ among the N examples is called a feature vector. A feature vector is a vector in which each dimension j = 1,...,D contains a value that describes the example somehow. That value is called a feature and is denoted as $x^{(j)}$.
• The label $y_i$ can be either an element belonging to a finite set of classes {1,2,...,C}, or a real number, or a more complex structure, like a vector, a matrix, a tree, or a graph.
16 Supervised Learning (cont.)
• The goal of the algorithm is to learn patterns in the data and build a general set of rules that map an input to a class or event.
• Two types of supervised learning task are commonly used:
  • Regression
  • Classification
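
The difference between the two tasks lies in the label: a class for classification, a real number for regression. The sketch below uses a 1-nearest-neighbour predictor, chosen here only because it is short (the slides do not name a specific algorithm). The data and the name `nearest_label` are hypothetical.

```python
import math


def nearest_label(examples, x):
    """Return the label y_i of the training example whose feature vector x_i is closest to x."""
    closest = min(examples, key=lambda example: math.dist(example[0], x))
    return closest[1]


# Classification: labels come from a finite set of classes.
# Hypothetical data: feature vector = (height_cm, weight_kg).
labeled_animals = [((30.0, 4.0), "cat"), ((32.0, 5.0), "cat"),
                   ((60.0, 25.0), "dog"), ((65.0, 30.0), "dog")]

# Regression: labels are real numbers.
# Hypothetical data: feature vector = (floor_area_m2,), label = price in thousands.
labeled_flats = [((40.0,), 120.0), ((60.0,), 180.0), ((90.0,), 260.0)]

predicted_class = nearest_label(labeled_animals, (58.0, 22.0))
predicted_price = nearest_label(labeled_flats, (62.0,))
```

Each training example is a pair (feature vector, label), which mirrors the labeled-example notation used for supervised learning.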
17 Unsupervised Learning

• In this type of machine-learning system, as you might guess, the data is unlabeled.

• The dataset is a collection of unlabeled examples $\{x_i\}_{i=1}^{N}$. Again, x is a feature vector, and the goal of an unsupervised learning algorithm is to create a model that takes a feature vector x as input and either transforms it into another vector or into a value that can be used to solve a practical problem.
18 Unsupervised Learning (cont.)

• There are situations where the desired output class/event is unknown for historical data. The objective in such cases is to study the patterns in the input dataset to gain a better understanding and to identify similar patterns that can be grouped into specific classes or events.

• Three types of unsupervised learning task are commonly used:

  • Clustering

  • Dimension Reduction

  • Anomaly Detection
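
As an illustration of clustering, here is a minimal sketch of k-means on one-dimensional data. K-means is one common clustering algorithm; the slides do not prescribe it. The data, the starting centers, and the name `k_means_1d` are assumptions.

```python
def k_means_1d(values, centers, iterations=10):
    """Plain k-means on one-dimensional data, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach every value to its nearest center.
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical unlabeled data: customer ages, with no class attached.
ages = [18, 19, 21, 22, 58, 60, 61, 63]
centers, clusters = k_means_1d(ages, centers=[20, 50])
```

No labels are supplied anywhere: the two groups emerge only from the distances between the values.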
19 Semi-Supervised Learning

• The dataset contains both labeled and unlabeled examples. Usually, the quantity of unlabeled examples is much higher than the number of labeled examples.

• The goal of a semi-supervised learning algorithm is the same as the goal of the supervised learning algorithm. The hope here is that using many unlabeled examples can help the learning algorithm to find a better model.
20 Reinforcement Learning
• The machine “lives” in an environment and is capable of perceiving the state of that environment as a vector of features.

• The machine can execute actions in every state. Different actions bring different rewards and could also move the machine to another state of the environment. The goal of a reinforcement learning algorithm is to learn a policy.
21 Reinforcement Learning (cont.)
• The goal of a reinforcement learning algorithm is to learn a policy.

• A policy is a function (similar to the model in supervised learning) that takes the feature vector of a state as input and outputs an optimal action to execute in that state.

• The action is optimal if it maximizes the expected average reward.

• Reinforcement learning solves a particular kind of problem where decision making is sequential and the goal is long-term.

• Examples of reinforcement learning techniques are the following:

  • Markov decision process

  • Q-learning / deep Q-learning

  • Temporal Difference methods

  • Monte-Carlo methods
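
To make the idea of learning a policy concrete, the following is a minimal sketch of tabular Q-learning on a tiny, invented environment: five states on a line, with a reward of 1 for reaching the rightmost state. Several things here are assumptions rather than part of the slides: the environment, the learning rate and discount factor, and the use of a discounted sum of rewards (a common formulation) instead of an average. To keep the run deterministic, the sketch visits every state-action pair in turn instead of letting the machine explore at random.

```python
N_STATES = 5             # states 0..4 on a line; state 4 is the goal
ACTIONS = ("left", "right")
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (assumed values)


def step(state, action):
    """Hypothetical deterministic environment: returns (next_state, reward)."""
    next_state = max(state - 1, 0) if action == "left" else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


# q[state][action] estimates the long-term reward of taking action in state.
q = [{action: 0.0 for action in ACTIONS} for _ in range(N_STATES)]

for _ in range(200):
    for state in range(N_STATES - 1):          # the goal state is terminal
        for action in ACTIONS:
            next_state, reward = step(state, action)
            # The goal state's Q-values stay at 0 because it is never updated.
            target = reward + GAMMA * max(q[next_state].values())
            q[state][action] += ALPHA * (target - q[state][action])

# The learned policy picks the action with the highest Q-value in each state.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

The resulting `policy` is a lookup from state to action, which is the tabular counterpart of the policy function described above.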
22 Basic operations of a machine learning system
23 Frameworks for Building Machine Learning Systems
• Knowledge Discovery in Databases (KDD) process model
• Cross-Industry Standard Process for Data Mining (CRISP-DM)
• Sample, Explore, Modify, Model, and Assess (SEMMA)
24 Knowledge Discovery in Databases (KDD)
• KDD refers to the overall process of discovering useful knowledge from data, as presented in a book by Fayyad et al., 1996.
25 Cross-Industry Standard Process for Data Mining
• It was established by the European Strategic Program on Research in Information Technology initiative with the aim of creating an unbiased methodology that is not domain dependent.
26 SEMMA (Sample, Explore, Modify, Model, Assess)
• SEMMA names the sequential steps for building machine learning models that are incorporated in ‘SAS Enterprise Miner’, a product by SAS Institute Inc.
28 Machine Learning Perspective of Data

• Data are the facts and figures (also referred to as raw data) that we have available with respect to the business context. Data are made up of these two aspects:

  • Objects, such as people, trees, animals, etc.

  • Attributes that were recorded for the objects, such as age, size, weight, cost, etc.

• The things we measure, control, or manipulate for objects are the variables.

• The amount of information that a variable can provide is determined by its type of measurement scale.
29 Scales of Measurement
• Variables can be measured on four different scales.

• Mean, median, and mode are the ways to understand the central tendency, that is, the middle point of the data distribution.

• Standard deviation, variance, and range are the most commonly used dispersion measures for understanding the spread of the data.
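
These measures can be computed with Python's standard `statistics` module. The sample below is invented for illustration, and the population variants (`pvariance`, `pstdev`) are used; the sample variants `variance` and `stdev` would divide by n - 1 instead.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

# Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Dispersion
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)
value_range = max(scores) - min(scores)
```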
30 Nominal Scale of Measurement

• Data are measured at the nominal level when each case is classified into one of a number of discrete categories.
31 Ordinal Scale of Measurement

• Data are measured on an ordinal scale if the categories imply order.

32 Interval Scale of Measurement

• If the differences between values are meaningful, the data are measured on an interval scale.
33 Ratio Scale of Measurement

• Data measured on a ratio scale have differences that are meaningful and relate to a true zero point.
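
Temperature is a standard illustration of the difference between the interval and ratio scales (it is not taken from the slides): Celsius has an arbitrary zero, so only differences are meaningful, while Kelvin has a true zero, so ratios are meaningful too. The readings and variable names below are hypothetical.

```python
def celsius_to_kelvin(celsius):
    return celsius + 273.15


morning_c, noon_c = 10.0, 20.0  # hypothetical readings in degrees Celsius

# Interval scale: the difference is meaningful (noon is 10 degrees warmer).
difference = noon_c - morning_c

# The ratio of Celsius values is NOT meaningful, because 0 °C is an arbitrary zero.
misleading_ratio = noon_c / morning_c  # noon is not "twice as hot"

# Ratio scale: Kelvin has a true zero, so this ratio is meaningful.
true_ratio = celsius_to_kelvin(noon_c) / celsius_to_kelvin(morning_c)
```

The two ratios differ widely, which shows why the measurement scale limits the operations that make sense on a variable.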
34 Machine Learning Python Packages
• A rich set of open source libraries is available to facilitate practical machine learning.

• These are mainly known as scientific Python libraries and are generally put to use when performing elementary machine learning tasks.

• At a high level, we can divide these libraries into data analysis and core machine learning libraries based on their usage/purpose:

  • Data analysis packages

  • Core machine learning packages


35 Data Analysis Packages

• Four key packages are most widely used for data analysis:
  • NumPy
  • SciPy
  • Matplotlib
  • Pandas
36 Machine Learning Core Libraries
• Python has a plethora of open source machine learning libraries.
37 The End
38 Homework

1. Give 2 real-world examples that fit the application of machine learning.


2. Find 2 real-world examples that do not fit the application of machine learning.
