Unit-1

What is Machine Learning?

Machine learning is a subfield of artificial intelligence (AI) that focuses on developing algorithms
and techniques that allow computers to learn from and make predictions or decisions based on
data without being explicitly programmed to do so. In traditional programming, humans
explicitly write rules for the computer to follow, whereas in machine learning, the computer
learns from patterns and relationships in the data.

1. Supervised Learning: This involves training a model on a labeled dataset, where
each example is paired with a corresponding label or outcome. The model learns
to map input data to the correct output by generalizing from the labeled
examples it has seen.
2. Unsupervised Learning: In this type of learning, the algorithm is given unlabeled
data and tasked with finding patterns or structures within it. Clustering and
dimensionality reduction are common tasks in unsupervised learning.
3. Semi-supervised Learning: This is a combination of supervised and
unsupervised learning, where the algorithm learns from a dataset that contains
both labeled and unlabeled data. The model can use the labeled data to guide its
learning process while also leveraging the unlabeled data to improve its
understanding of the underlying structure.
4. Reinforcement Learning: This type of learning involves an agent interacting with
an environment and learning to make decisions by receiving feedback in the form
of rewards or penalties. The agent learns to take actions that maximize
cumulative rewards over time.
It powers technologies such as virtual assistants, self-driving cars, personalized
recommendations, and medical diagnosis systems.
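As a rough illustration of the first two categories, the sketch below contrasts supervised and unsupervised learning. It assumes scikit-learn is installed and uses its bundled Iris dataset purely as example data; none of these specifics come from the notes above.

```python
# A minimal sketch contrasting supervised and unsupervised learning with
# scikit-learn (illustrative only; the Iris dataset stands in for "any data").
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised learning: labels y guide the model toward an input-to-output mapping.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("supervised accuracy:", clf.score(X_test, y_test))

# Unsupervised learning: the same inputs but no labels; the algorithm
# looks for structure on its own (here, three clusters).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster assignments:", kmeans.labels_[:10])
```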

Well Posed Learning Problem


A well-posed learning problem is one whose formulation is precise enough to be learned from data: the class of tasks (T), the performance measure (P), and the source of training experience (E) are all clearly specified, and the available experience actually carries the information needed to improve performance at the task.

Examples: a checkers-playing program (T: playing checkers, P: percentage of games won, E: games played against itself) and spam filtering (T: classifying e-mails as spam or not spam, P: percentage of e-mails classified correctly, E: a set of labeled e-mails).
Designing a learning system
Machine learning enables a machine to automatically learn from data, improve its performance with experience, and make predictions without being explicitly programmed.
In simple words, when we feed training data to a machine learning algorithm, the algorithm produces a mathematical model, and with the help of that model the machine makes predictions and takes decisions without being explicitly programmed. Also, the more the machine works with the training data, the more experience it gains and the more efficient its results become.
Example: In a driverless car, the training data fed to the algorithm covers how to drive the car on highways and on busy and narrow streets, along with factors such as speed limits, parking, and stopping at signals. A logical and mathematical model is created on the basis of that data, and afterwards the car operates according to that model. Also, the more data that is fed, the more efficient the output produced.

Designing a Learning System in Machine Learning:


According to Tom Mitchell, “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
Example: In spam e-mail detection,
 Task, T: To classify mails as Spam or Not Spam.
 Performance measure, P: The percentage of mails correctly classified as “Spam” or “Not Spam”.
 Experience, E: A set of mails labeled as “Spam” or “Not Spam”.
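A tiny sketch of the performance measure P for this example follows; the label lists are made up solely to show how “percentage of mails correctly classified” would be computed.

```python
# Hypothetical true and predicted labels, used only to illustrate how the
# performance measure P (percent of mails classified correctly) is computed.
true_labels      = ["spam", "not spam", "spam", "not spam", "spam"]
predicted_labels = ["spam", "not spam", "not spam", "not spam", "spam"]

correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
P = 100.0 * correct / len(true_labels)
print(f"P = {P:.1f}% of mails correctly classified")  # P = 80.0%
```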

Steps for Designing Learning System are:


Step 1) Choosing the Training Experience: The first and most important task is to choose the training data or training experience that will be fed to the machine learning algorithm. The data or experience we feed to the algorithm has a significant impact on the success or failure of the model, so the training data or experience should be chosen wisely.
Below are the attributes that affect the success or failure of the model:
 The first attribute is whether the training experience provides direct or indirect feedback about the choices made. For example, while playing chess the training experience can provide feedback such as: if this move is chosen instead of that one, the chances of success increase.
 The second attribute is the degree to which the learner controls the sequence of training examples. For example, when training data is first fed to the machine its accuracy is very low, but as it gains experience by playing again and again against itself or an opponent, the algorithm receives feedback and controls the chess game accordingly.
 The third attribute is how well the training experience represents the distribution of examples over which final performance will be measured. A machine learning algorithm gains experience by working through many different cases and examples; the more examples it passes through, the more experience it gains and the better its performance becomes.
Step 2) Choosing the Target Function: The next important step is choosing the target function. Based on the knowledge fed to the algorithm, the machine learning system chooses a NextMove function that describes which legal move should be taken. For example, while playing chess against an opponent, after the opponent moves, the machine learning algorithm decides which of the possible legal moves gives the best chance of success.
Step 3) Choosing a Representation for the Target Function: Once the algorithm knows all the possible legal moves, the next step is to choose a representation for the target function, e.g. linear equations, a hierarchical graph representation, or a tabular form. Using this representation, the NextMove function selects, out of the available moves, the one that offers the highest success rate. For example, if the machine has four possible moves while playing chess, it chooses the optimized move that leads to success.
Step 4) Choosing a Function Approximation Algorithm: An optimized move cannot be chosen from the training data alone. The system must work through a set of training examples; from these examples it approximates which moves should be chosen, and the feedback from each outcome is used to refine the approximation. For example, when training data for playing chess is fed to the algorithm, the machine does not yet know whether a move will fail or succeed; from each failure or success it learns which move should be chosen next and what its success rate is.
Step 5) Final Design: The final design is created when the system has worked through many examples, failures and successes, and correct and incorrect decisions, and has learned what the next step should be. Example: IBM's Deep Blue, a chess-playing computer, defeated the chess expert Garry Kasparov and became the first computer to beat a reigning human world chess champion.
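To make Steps 2 through 4 a bit more concrete, here is a hedged sketch of one common choice in this kind of design: representing the target function as a linear combination of board features and tuning its weights with the LMS (least mean squares) rule. The feature values, training signal, and learning rate are invented for illustration and are not prescribed by the notes above.

```python
# A sketch of a linear target-function representation and an LMS weight update.
# Board features, the training value, and the learning rate are hypothetical.

def v_hat(weights, features):
    """Approximate target function: V_hat(b) = w0 + w1*x1 + ... + wn*xn."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, v_train, lr=0.01):
    """Nudge each weight in proportion to the prediction error (LMS rule)."""
    error = v_train - v_hat(weights, features)
    weights[0] += lr * error                       # bias term
    for i, x in enumerate(features, start=1):
        weights[i] += lr * error * x
    return weights

# One illustrative training step: a board described by three made-up features,
# with a training value estimated from the position that follows it.
weights = [0.0, 0.0, 0.0, 0.0]
board_features = [3, 1, 0]   # e.g. piece count, threats, mobility (hypothetical)
v_train = 1.0                # training estimate of this board's value
weights = lms_update(weights, board_features, v_train)
print(weights)
```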

Perspectives and Issues in Machine Learning

Perspective: Machine learning involves searching a very large space of possible hypotheses to determine the one that best fits the observed data and the prior knowledge held by the learner.

Issues:

Although machine learning is used in every industry and helps organizations make more informed, data-driven choices that are more effective than classical methodologies, it still has many problems that cannot be ignored. Here are some common issues that professionals face while building ML skills and creating applications from scratch.

1. Inadequate Training Data
2. Poor Quality of Data
3. How Much Training Data Is Sufficient?
4. Monitoring and Maintenance
5. Getting Bad Recommendations
6. Lack of Skilled Resources
7. Data Bias
Data bias is also a big challenge in machine learning. These errors exist when certain elements of the dataset are heavily weighted or given more importance than others. Biased data leads to inaccurate results, skewed outcomes, and other analytical errors. We can address this problem by determining where the data is actually biased and then taking the necessary steps to reduce that bias.
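As a small, hedged illustration of “determining where data is actually biased”, the sketch below compares label frequencies overall and per group; the records, group names, and labels are entirely hypothetical.

```python
# Compare how often each label occurs overall and within each group to spot
# skew in a labeled dataset. All records below are hypothetical.
from collections import Counter

records = [
    {"group": "A", "label": "approved"},
    {"group": "A", "label": "approved"},
    {"group": "A", "label": "rejected"},
    {"group": "B", "label": "rejected"},
    {"group": "B", "label": "rejected"},
]

overall = Counter(r["label"] for r in records)
print("overall label counts:", dict(overall))

for group in sorted({r["group"] for r in records}):
    by_group = Counter(r["label"] for r in records if r["group"] == group)
    print(f"label counts for group {group}:", dict(by_group))
```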

8. Lack of Explainability
This means that a model's outputs cannot be easily understood, because the model is programmed in specific ways to deliver results for certain conditions. This lack of explainability reduces the credibility of machine learning algorithms.

9. Slow Implementations and Results

This issue is also very commonly seen in machine learning models. Machine learning models can be highly effective at producing accurate results, but they are often time-consuming: slow programs, excessive requirements, and overloaded data take more time than expected to produce accurate results. This calls for continuous maintenance and monitoring of the model so that it keeps delivering accurate results.

10. Irrelevant Features

Although machine learning models are intended to give the best possible outcome, if we feed garbage data as input then the result will also be garbage. Hence, we should use relevant features in our training samples. A machine learning model is said to be good when its training data has a good set of features, with few or no irrelevant ones.

Concept Learning
Concept learning is a fundamental task in machine learning where the goal is to learn a
concept or a decision boundary from labeled examples in order to classify or categorize
new, unseen instances correctly.
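One classic, very simple concept-learning algorithm is Find-S, which keeps the most specific hypothesis consistent with the positive examples seen so far. The sketch below assumes conjunctive hypotheses and noise-free data; the attribute values and examples are hypothetical.

```python
# Find-S: start with the most specific hypothesis and generalize it just enough
# to cover each positive example ("?" means "any value is acceptable").
def find_s(examples):
    hypothesis = None
    for attributes, label in examples:
        if label != "yes":            # Find-S ignores negative examples
            continue
        if hypothesis is None:
            hypothesis = list(attributes)          # first positive example
        else:
            hypothesis = [h if h == a else "?"
                          for h, a in zip(hypothesis, attributes)]
    return hypothesis

# Hypothetical examples: (attribute tuple, label)
examples = [
    (("sunny", "warm", "normal"), "yes"),
    (("sunny", "warm", "high"),   "yes"),
    (("rainy", "cold", "high"),   "no"),
]
print(find_s(examples))   # ['sunny', 'warm', '?']
```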

Decision Tree Learning

Decision tree learning is a popular machine learning technique used for both classification and
regression tasks. It builds a tree-like structure where each internal node represents a decision based
on the value of a feature, and each leaf node represents the predicted outcome. Decision trees are
easy to understand and interpret, making them useful for both exploratory analysis and decision-
making in various domains.

Example of a Decision Tree Algorithm


Forecasting Activities Using Weather Information
 Root node: The whole dataset.
 Attribute: “Outlook” (sunny, overcast, rainy).
 Subsets: Sunny, Overcast, and Rainy.
 Recursive splitting: Divide the sunny subset further, for example according to humidity.
 Leaf nodes: Activities such as “swimming,” “hiking,” and “staying inside.”
Beginning with the entire dataset as the root node of the decision tree:
 Determine the best attribute to split the dataset based on information gain, calculated as: Information Gain = Entropy(parent) − Σ (weight of child × Entropy(child)), where entropy is a measure of the impurity or disorder of a set of examples and each child's weight is the fraction of the parent's examples that fall into that child node (a worked calculation is sketched after this list).
 Create a new internal node that corresponds to the best attribute and connects
it to the root node. For example, if the best attribute is “outlook” (which can
have values “sunny”, “overcast”, or “rainy”), we create a new node labeled
“outlook” and connect it to the root node.
 Partition the dataset into subsets based on the values of the best attribute. For
example, we create three subsets: one for instances where the outlook is
“sunny”, one for instances where the outlook is “overcast”, and one for
instances where the outlook is “rainy”.
 Recursively repeat steps 1-4 for each subset until all instances in a given
subset belong to the same class or no further splitting is possible. For
example, if the subset of instances where the outlook is “overcast” contains
only instances where the activity is “hiking”, we assign a leaf node labeled
“hiking” to this subset. If the subset of instances where the outlook is “sunny”
is further split based on the humidity attribute, we repeat steps 2-4 for this
subset.
 Assign a leaf node to each subset that contains instances that belong to the
same class. For example, if the subset of instances where the outlook is
“rainy” contains only instances where the activity is “stay inside”, we assign
a leaf node labeled “stay inside” to this subset.
 Make predictions based on the decision tree by traversing it from the root
node to a leaf node that corresponds to the instance being classified. For
example, if the outlook is “sunny” and the humidity is “high”, we traverse the
decision tree by following the “sunny” branch and then the “high humidity”
branch, and we end up at a leaf node labeled “swimming”, which is our
predicted activity.
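The information-gain formula referred to above can be computed directly. Below is a hedged sketch on a small, made-up outlook/activity table; the counts are invented only to show the arithmetic, not taken from the example above.

```python
# Entropy and information gain for one attribute, on a hypothetical dataset.
import math
from collections import Counter

def entropy(labels):
    """Impurity of a set of class labels, in bits."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, attribute, target):
    """Entropy(parent) minus the weighted average entropy of the children."""
    parent = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += (len(subset) / len(rows)) * entropy(subset)
    return parent - remainder

data = [
    {"outlook": "sunny",    "activity": "swimming"},
    {"outlook": "sunny",    "activity": "staying inside"},
    {"outlook": "overcast", "activity": "hiking"},
    {"outlook": "overcast", "activity": "hiking"},
    {"outlook": "rainy",    "activity": "staying inside"},
    {"outlook": "rainy",    "activity": "staying inside"},
]
print("IG(outlook) =", round(information_gain(data, "outlook", "activity"), 3))
```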
Advantages of Decision Tree
 Easy to understand and interpret, making them accessible to non-experts.
 Handle both numerical and categorical data without requiring extensive
preprocessing.
 Provides insights into feature importance for decision-making.
 Handle missing values and outliers without significant impact.
 Applicable to both classification and regression tasks.
Disadvantages of Decision Tree
 Potential for overfitting.
 Sensitivity to small changes in the data, and limited generalization if the training data is not representative.
 Potential bias in the presence of imbalanced data.
Conclusion
Decision trees, a key tool in machine learning, model and predict outcomes based on
input data through a tree-like structure. They offer interpretability, versatility, and
simple visualization, making them valuable for both categorization and regression tasks.
While decision trees have advantages like ease of understanding, they may face
challenges such as overfitting. Understanding their terminologies and formation process
is essential for effective application in diverse scenarios.

Hypothesis Space Search in Decision Tree Learning

In decision tree learning, hypothesis space search refers to the process of exploring the space of possible decision trees and selecting the one that best fits the training data. Algorithms such as ID3 perform this search greedily: they grow the tree from the root, choosing at each node the attribute that best splits the remaining data, and never backtrack to reconsider earlier choices.

ID3 (Iterative Dichotomiser 3) is a classic decision tree learning algorithm. It is one of the earliest and most widely used algorithms for constructing decision trees from labeled training data. ID3 is primarily used for classification tasks and is particularly effective when dealing with categorical data.
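For comparison, here is a minimal sketch of fitting a decision tree to labeled, categorical training data with scikit-learn. Note that scikit-learn implements an optimized version of CART rather than textbook ID3; criterion="entropy" only makes its splits information-gain-based. The weather data below is hypothetical.

```python
# Fit an entropy-based decision tree to a tiny hypothetical weather dataset.
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

X_raw = [["sunny", "high"], ["sunny", "normal"],
         ["overcast", "high"], ["rainy", "high"], ["rainy", "normal"]]
y = ["stay inside", "swimming", "hiking", "stay inside", "hiking"]

encoder = OrdinalEncoder()                 # map categorical values to codes
X = encoder.fit_transform(X_raw)

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=["outlook", "humidity"]))
print(tree.predict(encoder.transform([["sunny", "normal"]])))  # ['swimming']
```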

Issues in Decision Tree Learning

Decision tree learning, while a powerful and widely used machine learning technique, is
not without its challenges and issues. Here are some common issues associated with
decision tree learning:

1. Overfitting: Decision trees are prone to overfitting, especially when the tree
becomes too deep or complex. Overfitting occurs when the tree captures noise
or outliers in the training data, leading to poor generalization performance on
unseen data.
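As a hedged sketch of how overfitting is commonly mitigated in practice, the example below compares an unconstrained scikit-learn tree with a depth-limited tree and a cost-complexity-pruned tree on synthetic data; the dataset and parameter values are illustrative, not recommendations.

```python
# Compare training vs. test accuracy for an unconstrained, a depth-limited,
# and a cost-complexity-pruned decision tree on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "unconstrained": DecisionTreeClassifier(random_state=0),
    "max_depth=3": DecisionTreeClassifier(max_depth=3, random_state=0),
    "ccp_alpha=0.01": DecisionTreeClassifier(ccp_alpha=0.01, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name,
          "train:", round(model.score(X_train, y_train), 3),
          "test:", round(model.score(X_test, y_test), 3))
```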
