Regression Dataset Example

Uploaded by

MOHANA RAO GANGAVARAPU


Simple Linear Regression with Python

In this post we will guide you through the very first step
of Machine Learning: Simple Linear Regression.
What is Linear?
First, let’s say that you are shopping at Walmart.
Whether you buy anything or not, you have to pay $2.00
for a parking ticket. Each apple costs $1.50, and you buy
x apples. Then we can build a price list for any number
of apples.
It’s easy to predict (or calculate) the price from the
number of apples, and vice versa, using the equation
y = 2 + 1.5x for this example, or in general:

y = a + bx

with:
• a = 2
• b = 1.5
A linear function has one independent variable and one
dependent variable. The independent variable is x and
the dependent variable is y.
• a is the constant term or the y intercept. It is the
value of the dependent variable when x = 0.
• b is the coefficient of the independent variable. It is
also known as the slope and gives the rate of change
of the dependent variable.
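
The parking-and-apples example can be written as a small function. This is only a minimal sketch: the function name shopping_cost and the range of quantities are illustrative choices, not part of the original post.

```python
def shopping_cost(x, a=2.0, b=1.5):
    """Total cost y = a + b*x: a is the parking fee, b is the price per apple."""
    return a + b * x

# Build a small price list for 0 to 5 apples
price_list = [(x, shopping_cost(x)) for x in range(6)]
for apples, cost in price_list:
    print(f"{apples} apples -> ${cost:.2f}")
```

With x = 0 the function returns the intercept a, and each extra apple adds the slope b.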
Why do we call it linear? Alright, let’s visualize the
data set we got above!

After plotting all values of the shopping cost (the blue
line), you can see that they all lie on one line; that’s why
we call it linear. In the linear equation y = a + bx, a is
the constant term (the intercept), not a variable. Even if
a = 0 (you do not have to pay for the parking ticket), the
Shopping Cost line only shifts down and the points still
lie on a line (the orange line).
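
One way to check the "still a line" claim numerically is to look at the step between consecutive costs: for a straight line it is constant and equal to the slope b. The sketch below assumes the same prices as the shopping example; the array names are illustrative.

```python
import numpy as np

x = np.arange(0, 6)            # number of apples (illustrative values)
with_parking = 2.0 + 1.5 * x   # a = 2 (blue line)
no_parking = 0.0 + 1.5 * x     # a = 0 (orange line)

# For a straight line, the change in y per unit step in x is constant (the slope b)
steps_with = np.diff(with_parking)
steps_without = np.diff(no_parking)
print(steps_with, steps_without)
```

Changing a moves the whole line up or down by the same amount but leaves the steps untouched.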

But in real life, things are not that simple!

Let’s take another example: at AB Company, there is a
salary table based on years of experience.
“The scenario: you are an HR officer and you have a
candidate with 5 years of experience. What is the
best salary you should offer him?”
Before diving deep into this problem, let’s plot the data
set first:

Please look at this chart carefully. Now we have bad
news: the observations do not all lie on one line. It means
no single equation gives the exact (y) value for every (x).
So what now? Don’t worry, we have good news for you!
Look at the scatter plot again before scrolling down. Do
you see it?
The points are not all on a line, BUT they follow a line
shape! The relationship is approximately linear!
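
To put a number on "line-shaped", one option is the Pearson correlation coefficient together with a least-squares line. The sketch below uses made-up (years, salary) pairs, not the AB Company table, and relies on NumPy's corrcoef and polyfit.

```python
import numpy as np

# Hypothetical (years of experience, salary) pairs, made up for illustration
years = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 10.0])
salary = np.array([39000.0, 43500.0, 56000.0, 57000.0, 83000.0, 101000.0, 122000.0])

# Pearson correlation: values near +1 or -1 indicate a strong linear trend
r = np.corrcoef(years, salary)[0, 1]

# Least-squares line through the cloud of points: salary ~ a + b * years
# (polyfit returns the highest-degree coefficient first, so slope then intercept)
b, a = np.polyfit(years, salary, deg=1)
print(f"r = {r:.3f}, a = {a:.0f}, b = {b:.0f}")
```

A correlation close to 1 supports modelling the data with a straight line even though no line passes through every point.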
Based on our observation, we can guess that the salary
for 5 years of experience should fall in the red range.
Of course, we can offer our candidate any number in
that red range. But how do we pick the best number?
It’s time to use Machine Learning to predict the best
salary for our candidate.
In this section, we will use Python in the Spyder IDE to
find the best salary for our candidate. Okay, let’s do it!
Linear Regression with Python
Before moving on, we summarize the 2 basic steps of
Machine Learning:
1. Training
2. Prediction
We will use 4 libraries: numpy and pandas to work with
the data set, sklearn to implement the machine learning
functions, and matplotlib to visualize our plots:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset

dataset = pd.read_csv('salary_data.csv')
X = dataset.iloc[:, :-1].values  # all columns except the last (Years of Experience), as a 2-D array
y = dataset.iloc[:, 1].values  # the column at index 1 (Salary), as a 1-D array

Code explanation:
• dataset: the table containing all values in our csv file
• X: the first column, which contains the Years of
Experience values
• y: the last column, which contains the Salary values
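
The two slices return arrays of different shapes, which matters later because sklearn expects X to be 2-D. The sketch below replaces salary_data.csv with a small in-memory table; the column names and values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for salary_data.csv (column names and values are assumptions)
dataset = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.5, 5.9],
    "Salary": [39343.0, 43525.0, 54445.0, 61111.0, 81363.0],
})

X = dataset.iloc[:, :-1].values  # all columns except the last -> 2-D array
y = dataset.iloc[:, 1].values    # column at index 1 (Salary) -> 1-D array

print(X.shape)  # (5, 1)
print(y.shape)  # (5,)
```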

Next, we have to split our dataset (30 observations in
total) into 2 sets: a training set used for training and a
test set used for testing:

# Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3,
random_state=0)

Code explanation:
• test_size=1/3: we split our dataset (30 observations)
into 2 parts (training set, test set), and the test set
takes 1/3 of the dataset (10 observations will be put
into the test set). You can pass 1/2 or 0.5 to get 50%;
they are the same. We should not make the test set too
big; if it is too big, we will lack data to train on.
Normally, we should pick around 5% to 30%.
• train_size: if we already set test_size, the rest of the
data is automatically assigned to the training set.
• random_state: this is the seed for the random number
generator. Passing an integer such as 0 gives the same
split on every run. We can pass an instance of
the RandomState class as well. If we leave it as None (the
default), the RandomState instance used by np.random is
used instead, so the split changes from run to run.
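
The effect of test_size and random_state can be checked on a toy array. This is a sketch on 30 synthetic observations rather than the salary data; the names X_test2 and same_split are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 30 synthetic observations standing in for the salary data
X = np.arange(30).reshape(-1, 1)
y = np.arange(30)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
print(len(X_train), len(X_test))

# Same integer seed -> identical split on a second call
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=1/3, random_state=0)
same_split = np.array_equal(X_test, X_test2)
print(same_split)
```

Fixing the seed is what makes results such as the prediction later in this post reproducible.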
We already have the training set and the test set; now
we have to build the regression model:
# Fitting Simple Linear Regression to the Training set

from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(X_train, y_train)

Code explanation:
• regressor = LinearRegression(): our model object, which
implements Linear Regression.
• regressor.fit: in this line, we pass X_train, which
contains the Years of Experience values, and y_train,
which contains the corresponding Salary values, to fit
the model. This is the training process.
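
After fitting, the learned a and b are available as the intercept_ and coef_ attributes of the model. As a sanity check, the sketch below fits noise-free data generated from the shopping equation y = 2 + 1.5x (synthetic data, not the salary file) and reads the two values back.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free data generated from y = 2 + 1.5x (the shopping example)
X = np.arange(0, 10).reshape(-1, 1)
y = 2.0 + 1.5 * X.ravel()

regressor = LinearRegression()
regressor.fit(X, y)

a = regressor.intercept_  # estimated constant term
b = regressor.coef_[0]    # estimated slope
print(f"a = {a:.2f}, b = {b:.2f}")
```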

Let’s visualize the fitted model against the training set and the test set:

# Visualizing the Training set results

plt.scatter(X_train, y_train, color='red')  # observed training points
plt.plot(X_train, regressor.predict(X_train), color='blue')  # fitted regression line
plt.title('Salary VS Experience (Training set)')
plt.xlabel('Year of Experience')
plt.ylabel('Salary')
plt.show()

# Visualizing the Test set results

plt.scatter(X_test, y_test, color='red')  # observed test points
plt.plot(X_train, regressor.predict(X_train), color='blue')  # same fitted line as above
plt.title('Salary VS Experience (Test set)')
plt.xlabel('Year of Experience')
plt.ylabel('Salary')
plt.show()
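After running the above code, you will see 2 plots in the
console window:

Comparing the two plots, we can see that the blue line is
the same in both, because it is the same fitted model;
only the red points change. The test points, which the
model never saw during training, also lie close to the
line, so our model is good to use now.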
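
Besides comparing plots by eye, the fit on the test set can be summarized with the R² score. The sketch below runs on synthetic data (an assumption; the intercept, slope and noise level are made up) and uses sklearn's r2_score.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption, not the original salary_data.csv)
rng = np.random.default_rng(0)
X = np.linspace(1, 10, 30).reshape(-1, 1)
y = 25000 + 9500 * X.ravel() + rng.normal(0, 3000, size=30)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
regressor = LinearRegression().fit(X_train, y_train)

# R^2 on unseen data: 1.0 is a perfect fit, values near 0 mean no better than the mean
r2_test = r2_score(y_test, regressor.predict(X_test))
print(f"Test R^2: {r2_test:.3f}")
```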
Alright! We already have the model, so now we can use it
to predict y (Salary) for any value of X (Years of
Experience). This is how we do it:

# Predicting the result of 5 Years Experience

y_pred = regressor.predict([[5]])  # predict expects a 2-D array: one row per sample, one column per feature
Predict y_pred using single value of X=5
Bingo! The value of y_pred with X = 5 (5 Years
Experience) is 73545.90.
You can offer your candidate a salary of $73,545.90;
this is the model’s best estimate for him!
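
Current versions of sklearn expect the input of predict to be 2-D (rows are samples, columns are features), so a single value is passed as [[5]]. The sketch below uses a small synthetic training set (illustrative values only) and also shows that the prediction is just a + b·x computed from the fitted line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data (illustrative only): salary = 26000 + 9000 * years
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [6.0]])
y_train = np.array([35000.0, 44000.0, 53000.0, 62000.0, 80000.0])
regressor = LinearRegression().fit(X_train, y_train)

# One sample with one feature -> shape (1, 1), hence the double brackets
x_new = np.array([[5.0]])
y_pred = regressor.predict(x_new)

# The same number computed by hand from the fitted line y = a + b*x
manual = regressor.intercept_ + regressor.coef_[0] * 5.0
print(y_pred[0], manual)
```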
We can also pass an array of X values instead of a single
value of X:
# Predicting the Test set results
y_pred = regressor.predict(X_test)

Predict y_pred using array of X_test

We can also estimate X from y by solving the fitted equation y = a + bx for x. Try it yourself!
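
One way to go from a salary back to years of experience is to rearrange the fitted line: x = (y − a) / b. The model itself has no method for this, so the helper years_for_salary below is a hypothetical function written for illustration, and the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, exactly linear data: salary = 26000 + 9000 * years (illustrative)
X = np.array([[1.0], [2.0], [3.0], [4.0], [6.0]])
y = 26000.0 + 9000.0 * X.ravel()
regressor = LinearRegression().fit(X, y)

def years_for_salary(target_salary, model):
    """Solve y = a + b*x for x, using the fitted intercept a and slope b."""
    a, b = model.intercept_, model.coef_[0]
    return (target_salary - a) / b

years = years_for_salary(71000.0, regressor)
print(f"{years:.2f} years")
```

This only works when the slope b is not zero, and on noisy data it gives a rough estimate rather than a regression of X on y.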
In conclusion, Simple Linear Regression takes 5 steps:
1. Import the dataset.
2. Split the dataset into a training set and a test set (X
and y for each set). Normally, the test set should be
5% to 30% of the dataset.
3. Visualize the training set and the test set to double
check (you can skip this step if you want).
4. Initialize the regression model and fit it using the
training set (both X and y).
5. Predict!

Complete code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('/home/student/Desktop/salary_data.csv')
X = dataset.iloc[:, :-1].values  # all columns except the last (Years of Experience), 2-D array
y = dataset.iloc[:, 1].values  # the column at index 1 (Salary), 1-D array

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)

# Scaling (fit the scaler on the training set only, then apply it to the test set)
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

# Fitting Simple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the Test set results
y_pred = regressor.predict(X_test)
print(y_pred)

# Visualizing the Training set results
# (X was standardized above, so the x-axis shows scaled values, not raw years)
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Salary VS Experience (Training set)')
plt.xlabel('Year of Experience')
plt.ylabel('Salary')
plt.show()

# Visualizing the Test set results (the blue line is the same fitted model)
plt.scatter(X_test, y_test, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Salary VS Experience (Test set)')
plt.xlabel('Year of Experience')
plt.ylabel('Salary')
plt.show()

# Predicting the result of 5 Years Experience
# (new inputs must be 2-D and scaled with the same scaler used for training)
y_pred_5 = regressor.predict(sc_X.transform([[5]]))
print(y_pred_5)
Output:

[73545.90445964]
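
The complete code standardizes X with StandardScaler before fitting. When a model is trained on scaled features, a new input has to go through the same scaler before predict; done that way, a plain linear regression gives the same prediction as a model trained on the raw values. The sketch below checks this on synthetic data (an assumption, not the salary file); the names raw_model and scaled_model are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption, not the original salary_data.csv)
rng = np.random.default_rng(1)
X = np.linspace(1, 10, 20).reshape(-1, 1)
y = 25000 + 9500 * X.ravel() + rng.normal(0, 3000, size=20)

# Model 1: fitted on raw years
raw_model = LinearRegression().fit(X, y)

# Model 2: fitted on standardized years
sc_X = StandardScaler()
scaled_model = LinearRegression().fit(sc_X.fit_transform(X), y)

x_new = np.array([[5.0]])
pred_raw = raw_model.predict(x_new)
pred_scaled = scaled_model.predict(sc_X.transform(x_new))  # scale first, then predict
print(pred_raw, pred_scaled)
```

Passing the raw value 5 to the model trained on scaled data would instead be read as "5 standard deviations above the mean" and give a very different number.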
