Supervised learning is a type of machine learning that uses labeled data sets — where each input is paired with a known output — to train artificial intelligence (AI) models. By using labeled data, this allows machine learning models to learn the mapping from inputs to outputs and improve prediction accuracy over time.
Types of Supervised Learning Algorithms
Supervised learning typically involves two main task types: regression and classification. These tasks are carried out using algorithms such as naive Bayes, decision trees, random forests and neural networks.
Regression Algorithms
Regression algorithms predict continuous numeric output values, such as prices or temperatures, based on the relationships between input variables.
Classification Algorithms
Classification algorithms assign inputs into predefined categories or classes, such as spam vs. not spam.
Naive Bayes
Naive Bayes is a classification algorithm based on Bayes’ Theorem. It assumes independence between input features and is effective for high-dimensional data sets like text.
Decision Trees
Decision trees are flowchart-like structures where each internal node represents a decision on a feature, and each leaf node represents an output label.
Random Forests
Random forests are ensembles of decision trees that aggregate the predictions of multiple trees to improve accuracy and reduce overfitting.
Neural Networks
Neural networks consist of layers of interconnected nodes that learn to recognize complex patterns in data. They are used in both classification and regression tasks.
Supervised vs. Unsupervised Learning
The biggest difference between supervised and unsupervised learning is the use of labeled data sets.
Supervised Learning
Supervised learning involves training a model on labeled data to predict known outputs from new inputs. By providing labeled data sets, the model iteratively adjusts its internal parameters to minimize the difference between its predictions and the true outputs.
The difference between the prediction and the correct answer is the key to the models producing accurate predictions when using new data. Like most varieties of machine learning, supervised learning is typically used to predict outcomes from data. Supervised learning models are typically implemented using programming languages like Python or R, which provide robust machine learning libraries such as scikit-learn, TensorFlow and caret.
Unsupervised Learning
Unsupervised learning does not make use of labeled data sets, meaning the models work on their own to uncover the inherent structure of the unlabeled data. Human intervention is still required to validate the output variables depending on the intended use of the data and if it makes sense for the data to be utilized.
Unsupervised learning is typically used to uncover insights from massive volumes of data, detect anomalies or make recommendations and may have inaccurate results without human validation.
Supervised Learning Example: Home Price Prediction
Supervised learning can be used to make accurate predictions using data, such as predicting a new home’s price.
In order for predictions to be made, input data must be gathered. To determine a new home’s price, for example, we need to know factors like location, square footage, outdoor space, number of floors, number of rooms and more. In this case, the home’s price is the labeled output, while features like location, square footage and number of rooms are inputs used to predict that price.
Once the corresponding labels are set, data from thousands of other homes can be gathered and compared against the existing input data. By weighing the features against the prices of these other homes, the model can do the same for the home in question to determine an accurate price.
Frequently Asked Questions
What is supervised learning?
Supervised learning is a type of machine learning that uses labeled data to train models to classify data or predict outcomes.
What are the main types of supervised learning?
Supervised learning includes two main task types:
- Regression (predicts continuous values)
- Classification (assigns inputs to predefined categories)
Which algorithms are used in supervised learning?
Common supervised learning algorithms include:
- Decision tree
- Naive Bayes,
- Random forest
- Neural network
- Regression model