Linear Regression for Single Prediction
Last Updated: 17 Jan, 2025
Linear regression is a statistical method and a foundational machine learning technique used to model the relationship between a dependent variable and one or more independent variables. The primary goal is to predict the value of the dependent variable from the values of the independent variables.
Predicting a Single Value Using Linear Regression
Once the model has been trained and evaluated, we can use it to make predictions. Producing a single prediction means feeding one specific set of independent variable values into the fitted linear regression model to obtain one value of the dependent variable.
Example 1: Single Prediction Using Simple Linear Regression
Let’s assume you have a dataset that tracks hours studied (independent variable) and test scores (dependent variable). After training the model you want to predict the test score for a student who studied for 5 hours.
Using the equation of the regression line y = \beta_0 + \beta_1 x, where x is the number of hours studied, \beta_0 is the intercept and \beta_1 is the slope, we can substitute x = 5 into the equation and compute y, the predicted test score.
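As a minimal sketch of this idea (the hours/score pairs below are made up for illustration, not a real dataset), fitting the line and predicting the score for 5 hours of study with scikit-learn could look like this:
Python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: hours studied (x) and test scores (y)
hours = np.array([[1], [2], [3], [4], [6], [7], [8]])
scores = np.array([52, 58, 63, 70, 80, 85, 91])

model = LinearRegression()
model.fit(hours, scores)

# Learned parameters: beta_0 (intercept) and beta_1 (slope)
print("Intercept (beta_0):", model.intercept_)
print("Slope (beta_1):", model.coef_[0])

# Substitute x = 5 to obtain a single predicted test score
predicted_score = model.predict([[5]])[0]
print("Predicted score for 5 hours of study:", predicted_score)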
Example 2: Single Prediction Using Multiple Linear Regression
In multiple linear regression, where more than one independent variable influences the dependent variable, predicting a single value involves feeding the values of all the independent variables into the model.
For example, when predicting house prices, factors such as area, number of bedrooms and location are all considered. Once you input the values of these independent variables for a particular house, the model outputs its predicted price.
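Written out, a multiple linear regression with these three predictors (with location first encoded numerically, e.g. one-hot encoded) has the form y = \beta_0 + \beta_1 \cdot \text{area} + \beta_2 \cdot \text{bedrooms} + \beta_3 \cdot \text{location}, where each \beta_i is a coefficient learned from the training data; substituting one house's values gives its single predicted price.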
Building a Linear Regression Model
1. Loading and Preparing Data
First, import the necessary libraries and load the dataset. For this example we will generate a synthetic dataset with multiple features using numpy. Assume we want to predict a target variable based on two random features.
Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Set random seed for reproducibility
np.random.seed(42)
# Generate random independent variables
n_samples = 1000
X1 = np.random.uniform(1, 100, n_samples) # Random numbers for feature 1
X2 = np.random.uniform(1, 50, n_samples) # Random numbers for feature 2
# Create a relationship between features and the target variable with some noise
noise = np.random.normal(0, 10, n_samples)
y = 2.5 * X1 + 3.8 * X2 + noise # Linear relationship with noise
# Create a DataFrame for easy manipulation
data = pd.DataFrame({'Feature1': X1, 'Feature2': X2, 'Target': y})
data.head()
Output:
Feature1 Feature2 Target
0 38.079472 10.071514 124.690605
1 95.120716 27.553146 334.234944
2 73.467400 43.774346 347.746226
3 60.267190 36.879019 294.481904
4 16.445845 40.521496 204.232146
This synthetic dataset has two features (Feature1 and Feature2) and one target variable (Target) that depends on the features linearly, plus some noise.
2. Exploratory Data Analysis (EDA)
Before moving to modeling let's analyze the data visually to ensure there is a linear relationship between the features and the target variable.
Python
# Pairplot to visualize relationships between variables
sns.pairplot(data)
plt.show()
# Check the correlation between features and target
corr_matrix = data.corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.show()
Output:
[Pairplot of Feature1, Feature2 and Target, and the correlation heatmap]
3. Data Preprocessing
Scaling the features is a common preprocessing step for linear models: it keeps the coefficients on comparable scales and matters for regularized or gradient-based variants, even though plain least-squares predictions are unaffected by it. We'll use StandardScaler from sklearn.
Python
# Split data into independent variables (X) and dependent variable (y)
X = data[['Feature1', 'Feature2']]
y = data['Target']
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Scale the features using StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
4. Train the Linear Regression Model
Now that the data is scaled, we can train the linear regression model. This step provides the model's coefficients and the intercept.
Python
# Instantiate the linear regression model
model = LinearRegression()
# Train the model on the training data
model.fit(X_train_scaled, y_train)
# Check the model's coefficients and intercept
print("Coefficients: ", model.coef_)
print("Intercept: ", model.intercept_)
Output:
Coefficients: [72.55199542 54.26405783]
Intercept: 224.59535042865838
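Because the model was trained on standardized features, these coefficients are expressed on the scaled axes rather than in the original units. As a quick sanity check (reusing the scaler and model objects from above), dividing each coefficient by the corresponding feature's standard deviation recovers slopes close to the true values 2.5 and 3.8 used to generate the data:
Python
# For standardized features, slope on the original scale = coef_i / scale_i
original_slopes = model.coef_ / scaler.scale_
print("Slopes on the original feature scale:", original_slopes)  # expected to be close to [2.5, 3.8]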
5. Make Single Predictions
We can now make predictions on the test data and on new, unseen inputs. Below, new_data represents a new data point with two feature values; the model predicts the target value for this input.
Python
# Predict the target values for the test set
y_pred = model.predict(X_test_scaled)
# Example of predicting a single value using a new data point
new_data = np.array([[45, 30]]) # Example values for Feature1 and Feature2
new_data_scaled = scaler.transform(new_data) # Scale the new data
single_prediction = model.predict(new_data_scaled)
print(f"Predicted value for the new data point {new_data[0]}: {single_prediction[0]}")
Output:
Predicted value for the new data point [45 30]: 226.37747796176552
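As a sanity check, the data-generating rule above was y = 2.5·X1 + 3.8·X2 plus noise, so for Feature1 = 45 and Feature2 = 30 the noise-free value is 2.5 × 45 + 3.8 × 30 = 112.5 + 114 = 226.5, which is very close to the model's prediction of about 226.38.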
6. Model Evaluation
To evaluate the model's performance we’ll compute the Mean Squared Error (MSE) and R-squared value for the test data.
Python
# Calculate Mean Squared Error (MSE) and R-squared (R^2)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")
Output:
Mean Squared Error: 100.35222719050975
R-squared: 0.9876309086737035
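These numbers are consistent with how the data was generated: the noise added earlier has a standard deviation of 10, so an MSE around 100 (RMSE around 10) is roughly the best any model can achieve here. As a small follow-up (reusing mse from above):
Python
# RMSE expresses the error in the same units as the target
rmse = np.sqrt(mse)
print(f"Root Mean Squared Error: {rmse:.2f}")  # roughly the noise standard deviation (10)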
Limitations of Linear Regression for Single Predictions
- Sensitivity to Outliers: Outliers can have a strong negative impact on a linear regression model's predictions; they can skew the best-fit line and lead to inaccurate predictions.
- Overfitting: If the model is overly complex (for example, has too many independent variables), it may fit the training data too closely, leading to overfitting and poor performance on new, unseen data.
- Assumes a Linear Relationship: Linear regression assumes a linear relationship between the independent and dependent variables, but real-world relationships are often non-linear. A residual plot, as sketched below, is a quick way to check this assumption.
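One simple way to probe the linearity assumption (and to spot the influence of outliers) is a residual plot on the test set. This sketch reuses y_test and y_pred from the steps above; the residuals should scatter randomly around zero with no obvious pattern.
Python
# Residuals = actual - predicted; a visible pattern here suggests a non-linear relationship
residuals = y_test - y_pred
plt.scatter(y_pred, residuals, alpha=0.5)
plt.axhline(0, color='red', linestyle='--')
plt.xlabel('Predicted value')
plt.ylabel('Residual')
plt.title('Residuals vs. predicted values')
plt.show()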
Linear regression is a simple model that can be used to predict single values from historical data. Building one involves preparing the data, training the model, and then feeding in a specific set of inputs to obtain a prediction.