Linear Regression Assignment

This document provides information and questions about performing linear regression analysis on two datasets to predict employee salary based on years of experience and house prices based on various housing features. It includes details on the datasets, data exploration steps, example code for splitting data into training and test sets, and questions to test understanding of linear regression concepts like R-squared, p-values, and multicollinearity.

Uploaded by

kdeepika2704

+91-7022374614

US: 1-800-216-8930 (Toll-Free)

Problem Statement:
1. You have access to the salary information of several employees along with their years of
experience. Using linear regression analysis in machine learning, create a linear
regression model that can predict the salary of an employee based on their years of
experience.
2. House prices change constantly, but they vary with certain parameters.
You are provided with housing data that has information on various houses
and their prices. Use this data to predict house prices using linear
regression in machine learning.

Dataset Information:
1. Data.csv - This dataset contains two columns with 30 entries each for employee years
of experience and their salary.

Column Name        Description
YearsExperience    The column contains 30 entries of the employee’s years of experience.
Salary             The salary column contains 30 entries of their respective salary for their years of experience.

2. Housing.csv - This dataset is considerably larger, with more than 20,000 entries
covering the houses, their prices and various other parameters. It contains the
following columns.

Column Name        Description
id                 A separate id for each house in the data.
date               The date recorded for each house in the data.
price              The price of the house.
bedrooms           The number of bedrooms in the house.
bathrooms          The number of bathrooms in the house.
sqft_living        The area of the living room.
sqft_lot           The area of the lot.
floors             The number of floors in the house.
waterfront         Whether the house has a waterfront or not.
view               Whether the house has a view or not.
condition          The condition of the house, represented in various categories.
grade              The grade of the house, in various categories.
sqft_above         The area above.
sqft_basement      The basement area.
yr_built           The year in which the house was built.
yr_renovated       The year in which the house was renovated.
zipcode            The zipcode of the house.
lat                The latitude of the house.
long               The longitude of the house.
sqft_lot15         The average square footage of the 15 closest houses.
Sqft_basement_15   The average square footage of the 15 closest houses.

Explore the datasets, and perform EDA on both the datasets before starting the following
exercise.

Use the data.csv for the questions mentioned below

1. How many employees with more than 5 years of experience are earning more than 60000?
a. 41
b. 12
c. 21
d. 14

2. How many employees are earning between 50000 and 80000?

a. 14
b. 12
c. 10
d. 13
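
Counting questions like the two above can be approached with boolean masks in pandas. The sketch below is only an illustration: the five-row DataFrame is a hypothetical stand-in for Data.csv (the real file has 30 rows), and treating both salary bounds as inclusive is an assumption, since the question does not say.

```python
import pandas as pd

# Hypothetical stand-in for Data.csv; values are invented for illustration.
data = pd.DataFrame({
    "YearsExperience": [1.1, 3.2, 5.3, 6.8, 9.0],
    "Salary": [39000, 55000, 61000, 83000, 105000],
})

# Combine two conditions with & (each condition needs its own parentheses).
senior_high = ((data["YearsExperience"] > 5) & (data["Salary"] > 60000)).sum()

# between() is inclusive on both ends by default (an assumption about the question).
mid_band = data["Salary"].between(50000, 80000).sum()

print(senior_high, mid_band)
```

With the real file, the DataFrame would instead come from something like pd.read_csv("Data.csv"), assuming the file sits in the working directory.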

3. The scatter plot in the following image shows the relationship between the
“YearsExperience” and “Salary” columns. What possible inferences can be drawn from the
plot?

a. The plot shows a positive correlation between the “YearsExperience” and “Salary”
columns.
b. The plot shows no significant relationship between the “YearsExperience” and
“Salary” columns.
c. The plot shows a negative correlation between the “YearsExperience” and
“Salary” columns.
d. None of the above.

4. The distribution plot of the column “YearsExperience” is shown in the image below.
What possible inferences can be drawn from the plot?

a. “YearsExperience” data is normally distributed.

b. “YearsExperience” data is positively skewed.
c. “YearsExperience” data is negatively skewed.
d. None of the above.

5. What inferences can be drawn from the table shown below?
a. The range of the “YearsExperience” and “Salary” data is (9.4 , 84660 )
b. The range of the “YearsExperience” and “Salary” data is (4.7 , 65237 )
c. The range of the “YearsExperience” and “Salary” data is (10.5, 122391)
d. The range of the “YearsExperience” and “Salary” data is (7.7 ,100544)
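
The range of a column is its maximum minus its minimum, and both values appear in a pandas summary table. A minimal sketch, assuming the table in the question is the output of describe(); the DataFrame here is a hypothetical stand-in with invented values, not the assignment data.

```python
import pandas as pd

# Hypothetical stand-in for Data.csv; values are invented for illustration.
data = pd.DataFrame({
    "YearsExperience": [1.0, 2.5, 4.0, 6.5, 9.5],
    "Salary": [40000, 48000, 60000, 85000, 110000],
})

summary = data.describe()  # rows include count, mean, std, min, quartiles, max

# Range per column = max - min, read straight from the summary table.
value_range = summary.loc["max"] - summary.loc["min"]
print(value_range)
```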

6. Suppose we split the dataset into training and testing data using the following
code:

    from sklearn.model_selection import train_test_split

    X = data['YearsExperience']
    y = data['Salary']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

What does it mean when we write the test size as 0.2?

a. The testing data will be 2% accurate.

b. The testing data will have 80% samples from the total population.
c. The testing data will have 2% samples from the total population.
d. The training data will consist of 80% of the samples from the total population.

7. In the above example code, we have set the random state to 0. If we change the
random state to 42, what does it mean for our training and testing data?
a. The shape of the training data will become (42,)
b. The shape of the training data will become (42,2)
c. The random state does not have any effect on the shape of the data.
d. The random state will increase the efficiency of the model by 42%.
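
One way to check what test_size and random_state do is to inspect the shapes of the split arrays directly. The sketch below uses synthetic arrays of 30 values as a stand-in for the two columns of Data.csv; only train_test_split and its parameters come from the example code above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the 30 YearsExperience and Salary values.
X = np.arange(30, dtype=float)
y = 2.0 * X + 1.0

# test_size is the fraction of rows held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)

# random_state seeds the shuffle, so a different seed picks different rows.
X_train_42, X_test_42, _, _ = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train_42.shape, X_test_42.shape)
```

Comparing the printed shapes for the two seeds shows whether the seed affects how many rows end up in each set.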

8. Suppose the r2 score calculated in the above example is 0.98. Change the split
between the training and testing sets to 60:40 and build a linear regression model again.
After plotting the best fit line on the test data, calculate the r2_score for the new model.

a. 0.98
b. 0.96
c. 1.0
d. 0.0
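
The workflow in this question can be sketched as follows. The data is synthetic (a noisy straight line generated with an arbitrary seed), so the printed score will not match the assignment's value; test_size=0.4 is the assumed way of expressing a 60:40 split, and the plotting step is left out.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic salary-like data: a straight line plus noise (not the real Data.csv).
rng = np.random.default_rng(0)
X = np.linspace(1, 10, 30).reshape(-1, 1)  # 2-D feature matrix, one column
y = 25000 + 9500 * X.ravel() + rng.normal(0, 3000, size=30)

# 60:40 split: 40% of the rows are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# Fit on the training rows only, then score on the held-out rows.
model = LinearRegression().fit(X_train, y_train)
score = r2_score(y_test, model.predict(X_test))
print(round(score, 2))
```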

9. If, while fitting the model with training and testing data, you get the error
"ValueError: Expected 2D array, got 1D array instead", what could be
the issue with the data, and how can you solve it?
a. Reshape the data to a two dimensional array
b. Reshape the data to two arrays of 1-D each.
c. Both A and B
d. None of the above
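
The error can be reproduced with a small example. scikit-learn estimators expect the feature matrix to have shape (n_samples, n_features), so a single feature passed as a flat array triggers it. The arrays below are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([1.0, 2.0, 3.0, 4.0])  # shape (4,): one feature, but 1-D
y = np.array([3.0, 5.0, 7.0, 9.0])

# Fitting with a 1-D feature array raises the ValueError from the question.
try:
    LinearRegression().fit(X, y)
    raised = False
except ValueError:
    raised = True

# reshape(-1, 1) turns the flat array into a single-column 2-D array.
X_2d = X.reshape(-1, 1)
model = LinearRegression().fit(X_2d, y)
print(raised, X_2d.shape, model.coef_)
```

When the feature is a pandas column, selecting it with double brackets, for example data[['YearsExperience']], yields a 2-D DataFrame and avoids the reshape.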

The exercise after this contains questions that are based on the housing dataset.

10. How many houses have a waterfront?

a. 21000
b. 21450
c. 163
d. 173

11. How many houses have 2 floors?

a. 2692
b. 8241
c. 10680
d. 161

12. How many houses built before 1960 have a waterfront?

a. 80
b. 7309
c. 90
d. 92

13. What is the price of the most expensive house having more than 4
bathrooms?

a. 7700000
b. 187000
c. 290000
d. 399000

14. The image shown below shows the boxplot of the price column from the housing
dataset. What inferences can you make from the plot?
a. The price column is normally distributed.
b. There might be high chances of price data having null values.
c. There is a presence of outliers in the price data.
d. There is no presence of outliers in the price data.

15. If the ‘price’ column contains outliers, how can you clean the data
and remove them?
a. Calculate the IQR range and drop the values outside the range.
b. Calculate the p-value and remove the values less than 0.05.
c. Calculate the correlation coefficient of the price column and remove the values less than
the correlation coefficient.
d. Calculate the Z-score of the price column and remove the values less than the z-score.
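
For reference, an IQR-based filter can be sketched as below. The 1.5 multiplier on the IQR is a common convention rather than something stated in the assignment, and the price values are invented for illustration.

```python
import pandas as pd

# Hypothetical prices with one extreme value; not taken from Housing.csv.
prices = pd.Series(
    [200000, 250000, 300000, 350000, 400000, 7700000], name="price"
)

q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Conventional fences: 1.5 * IQR below Q1 and above Q3 (an assumed choice).
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only the values inside the fences.
cleaned = prices[prices.between(lower, upper)]
print(len(prices), len(cleaned))
```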

16. Which of the following can be used to select the independent variables (features)
in the housing data for predicting the price of the house?

a. Correlation coefficients
b. Z-score
c. IQR Range
d. Range of the Features
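
As an illustration of how a feature's relationship with price can be measured, the sketch below ranks columns by their Pearson correlation with price. The three-column DataFrame is a hypothetical stand-in that borrows column names from the housing table; the values are invented.

```python
import pandas as pd

# Hypothetical stand-in for a few Housing.csv columns.
housing = pd.DataFrame({
    "price": [300000, 450000, 500000, 650000, 800000],
    "sqft_living": [1000, 1500, 1800, 2400, 3000],
    "zipcode": [98103, 98001, 98115, 98004, 98052],
})

# Pearson correlation of every column with price, strongest positive first.
corr_with_price = (
    housing.corr()["price"].drop("price").sort_values(ascending=False)
)
print(corr_with_price)
```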

17. If we get the r2 score as 0.38, what inferences can we make about the model and
its efficiency?
a. The model is 38% accurate, and shows poor efficiency.
b. The model is showing 0.38% discrepancies in the outcomes.
c. Low difference between observed and fitted values.
d. High difference between observed and fitted values.
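
For reference, the r2 score compares the squared differences between observed and fitted values with the total variation of the observed values: r2 = 1 - SS_res / SS_tot. The sketch below computes it from that definition in plain Python; the function name r2 and all numbers are invented for illustration.

```python
def r2(observed, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: gap between observed and fitted values.
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    # Total sum of squares: spread of the observed values around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


observed = [10.0, 20.0, 30.0, 40.0]
close_fit = [11.0, 19.0, 31.0, 39.0]  # small residuals
loose_fit = [20.0, 28.0, 22.0, 30.0]  # large residuals

print(r2(observed, close_fit), r2(observed, loose_fit))
```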

18. If the metrics show that the p-value for the grade column is 0.092, what inferences
can we make about the grade column?
a. Significant in the presence of other variables.
b. Highly significant in the presence of other variables.
c. Insignificant in the presence of other variables.
d. None of the above.

19. If the Variance Inflation Factor (VIF) value for a feature is considerably higher than
that of the other features, what can we say about that column/feature?
a. High multicollinearity
b. Low multicollinearity
c. Both A and B
d. None of the above
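
For reference, the VIF of a feature is 1 / (1 - R²), where R² comes from regressing that feature on all the other features. The sketch below computes it with NumPy least squares. The helper name vif is hypothetical, and the data is synthetic: the column names are borrowed from the housing table, but the values are randomly generated.

```python
import numpy as np


def vif(X, j):
    """VIF of column j: regress it on the other columns, then 1 / (1 - R^2)."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    # Add an intercept column so the auxiliary regression is not forced through 0.
    design = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    r2 = 1 - residuals.var() / target.var()
    return 1.0 / (1.0 - r2)


rng = np.random.default_rng(0)
sqft_above = rng.normal(1800, 400, size=200)
# Built to be almost a copy of sqft_above plus a small offset.
sqft_living = sqft_above + rng.normal(300, 50, size=200)
# Generated independently of the other two columns.
floors = rng.normal(1.5, 0.5, size=200)
X = np.column_stack([sqft_living, sqft_above, floors])

print([round(vif(X, j), 1) for j in range(X.shape[1])])
```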
