Nikitha

This document contains code for simple and multiple linear regression on the Paris housing dataset, predicting price from square meters and a few other features. It loads and explores the data, selects the columns whose absolute correlation with price exceeds 0.01, then fits the model with hand-written gradient descent at learning rates of 0.01 and 0.1 and with scikit-learn's LinearRegression. On the raw squareMeters values, gradient descent diverges at both learning rates (the final cost grows to about 1e148 and 1e166). On standardized features and target, a learning rate of 0.1 moves the coefficients toward their final values faster than 0.01 over the same number of iterations.

Uploaded by

Chakri Chakradhar


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

dataset = pd.read_csv("/content/[Link]")
dataset.head(20)

    squareMeters  numberOfRooms  hasYard  hasPool  floors  cityCode  \
 0         75523              3        0        1      63      9373
 1         80771             39        1        1      98     39381
 2         55712             58        0        1      19     34457
 3         32316             47        0        0       6     27939
 4         70429             19        1        1      90     38045
 5         39223             36        0        1      17     39489
 6         58682             10        1        1      99      6450
 7         86929            100        1        0      11     98155
 8         51522              3        0        0      61      9047
 9         39686             42        0        0      15     71019
10         23563             21        0        1      90     91058
11         96470             74        1        0      21     92029
12         19127             31        1        0       5      7475
13         13087             44        1        0      77     40475
14         79770              3        0        1      69     54812
15         75985             60        1        0      67      6517
16         64169             88        0        1       6     61711
17         99371             31        1        1      16     96297
18         25966             37        1        1      17     22818
19         41792             43        1        1      10     80768

    cityPartRange  numPrevOwners  made  isNewBuilt  hasStormProtector  \
 0              3              8  2005           0                  1
 1              8              6  2015           1                  0
 2              6              8  2021           0                  0
 3             10              4  2012           0                  1
 4              3              7  1990           1                  0
 5              8              6  2012           0                  1
 6             10              9  1995           1                  1
 7              3              4  2003           1                  0
 8              8              3  2012           1                  1
 9              5              8  2021           1                  1
10              6              8  1993           1                  0
11              4              2  2011           1                  1
12              2              9  2008           0                  0
13              8              4  2004           1                  0
14             10              5  2018           0                  1
15              6              9  2009           1                  1
16              3              9  2011           1                  1
17              7              8  2013           1                  1
18              3              1  2016           0                  0
19              9              5  2017           1                  1

    basement  attic  garage  hasStorageRoom  hasGuestRoom      price
 0      4313   9005     956               0             7  7559081.5
 1      3653   2436     128               1             2  8085989.5
 2      2937   8852     135               1             9  5574642.1
 3       659   7141     359               0             3  3232561.2
 4      8435   2429     292               1             4  7055052.0
 5      2009   4552     757               0             1  3926647.2
 6      5930   9453     848               0             5  5876376.5
 7      6326   4748     654               0            10  8696869.3
 8       632   5792     807               1             5  5154055.2
 9      5198   5342     591               1             3  3970892.1
10       703    852     684               1            10  2366397.3
11      5414   1172     716               1             9  9652258.1
12      5387   4430     374               0             4  1914688.8
13      1745    724     582               0             0  1320803.4
14      8871   7117     240               0             7  7986665.8
15      4878    281     384               1             5  7607322.9
16      3054    129     726               0             9  6420823.1
17      3258   6296     354               1             8  9944705.3
18      8257   2557     162               0             6  2604486.6
19      2950   9573     572               1             5  4187667.7

dataset.shape

(10000, 17)

dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 17 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   squareMeters       10000 non-null  int64
 1   numberOfRooms      10000 non-null  int64
 2   hasYard            10000 non-null  int64
 3   hasPool            10000 non-null  int64
 4   floors             10000 non-null  int64
 5   cityCode           10000 non-null  int64
 6   cityPartRange      10000 non-null  int64
 7   numPrevOwners      10000 non-null  int64
 8   made               10000 non-null  int64
 9   isNewBuilt         10000 non-null  int64
 10  hasStormProtector  10000 non-null  int64
 11  basement           10000 non-null  int64
 12  attic              10000 non-null  int64
 13  garage             10000 non-null  int64
 14  hasStorageRoom     10000 non-null  int64
 15  hasGuestRoom       10000 non-null  int64
 16  price              10000 non-null  float64
dtypes: float64(1), int64(16)
memory usage: 1.3 MB

corr = dataset.corr()
plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f",
            linewidths=0.5)
plt.title("Correlation Heatmap")
plt.show()

correlation_with_price = dataset.corr()['price'].abs()

threshold = 0.01

highly_correlated_columns = correlation_with_price[
    correlation_with_price > threshold].index.tolist()

print("Columns highly correlated with 'Price':")
print(highly_correlated_columns)

Columns highly correlated with 'Price':

['squareMeters', 'numPrevOwners', 'isNewBuilt', 'garage', 'price']
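
A threshold of 0.01 is low, so the selection above keeps any column with even a faint linear relationship to price (and `price` itself, which correlates perfectly with itself). Ranking the columns by absolute correlation makes the gap between the strongest predictor and the rest easier to see. The sketch below is one way to do that; the DataFrame `toy` and its values are synthetic and purely illustrative, not taken from the Paris housing data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-in for the housing data (illustrative values only)
square_meters = rng.integers(100, 100_000, size=n)
garage = rng.integers(100, 1_000, size=n)
noise = rng.normal(0, 3_000, size=n)
toy = pd.DataFrame({
    "squareMeters": square_meters,
    "garage": garage,
    "price": 100 * square_meters + noise,
})

# Absolute correlation of every column with the target, strongest first
ranking = (
    toy.corr()["price"]
    .abs()
    .drop("price")  # the target always correlates 1.0 with itself
    .sort_values(ascending=False)
)
print(ranking)
```

On the real data the same three chained calls could be applied to `dataset.corr()['price']`.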

df = dataset[highly_correlated_columns]
df.head()

   squareMeters  numPrevOwners  isNewBuilt  garage      price
0         75523              8           0     956  7559081.5
1         80771              6           1     128  8085989.5
2         55712              8           0     135  5574642.1
3         32316              4           0     359  3232561.2
4         70429              7           1     292  7055052.0

Gradient Descent, Learning Rate, Cost Function

X = df['squareMeters'].values
y = df['price'].values

learning_rate = 0.01
iterations = 10

# Initialize coefficients (slope and intercept)
b0 = 0  # Intercept
b1 = 0  # Slope

# Lists to store the history of coefficients and cost
b0_history = []
b1_history = []
cost_history = []

# Gradient Descent
for iteration in range(iterations):
    # Calculate predictions
    y_pred = b0 + b1 * X

    # Calculate the cost (mean squared error)
    cost = np.mean((y_pred - y) ** 2)

    # Calculate gradients
    gradient_b0 = np.mean(y_pred - y)
    gradient_b1 = np.mean((y_pred - y) * X)

    # Update coefficients using gradients and learning rate
    b0 -= learning_rate * gradient_b0
    b1 -= learning_rate * gradient_b1

    # Append coefficients and cost to history lists for visualization
    b0_history.append(b0)
    b1_history.append(b1)
    cost_history.append(cost)

# Plot the cost history
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(cost_history)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.title('Cost History')

plt.tight_layout()
plt.show()

# Print the final coefficients and cost
print("Final Intercept (b0):", b0)
print("Final Slope (b1):", b1)
print("Final Cost:", cost_history[-1])

Final Intercept (b0): -2.4127252943307258e+72

Final Slope (b1): -1.6037599090064153e+77
Final Cost: 7.759017749802741e+148
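
A final cost of about 7.8e148 means this run diverged rather than converged. For a quadratic cost, gradient descent is stable only when the learning rate is below 2 divided by the largest eigenvalue of the matrix that maps the coefficients to the gradient. The sketch below estimates that bound for the gradient definition used in the loop above. The feature values are a synthetic stand-in (uniform between 0 and 100000 is an assumption, chosen to resemble the `squareMeters` column), so the numbers are indicative only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed stand-in for the feature: values spread over roughly 0..100000
X = rng.uniform(0, 100_000, size=10_000)

# For the loop above, gradient = H @ [b0, b1] - constant, with
#   gradient_b0 = b0 + b1 * mean(X)      - mean(y)
#   gradient_b1 = b0 * mean(X) + b1 * mean(X**2) - mean(y * X)
H = np.array([
    [1.0,      X.mean()],
    [X.mean(), (X ** 2).mean()],
])
largest_eigenvalue = np.linalg.eigvalsh(H).max()

# Gradient descent on a quadratic is stable only for lr < 2 / lambda_max
max_stable_lr = 2.0 / largest_eigenvalue
print("largest eigenvalue:", largest_eigenvalue)
print("largest stable learning rate:", max_stable_lr)
```

Under these assumptions the bound is many orders of magnitude below 0.01, which is consistent with the exploding coefficients printed above. Scaling the feature is the usual remedy.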

X = df['squareMeters'].values
y = df['price'].values

learning_rate = 0.1
iterations = 10

# Initialize coefficients (slope and intercept)
b0 = 0  # Intercept
b1 = 0  # Slope

# Lists to store the history of coefficients and cost
b0_history = []
b1_history = []
cost_history = []

# Gradient Descent
for iteration in range(iterations):
    # Calculate predictions
    y_pred = b0 + b1 * X

    # Calculate the cost (mean squared error)
    cost = np.mean((y_pred - y) ** 2)

    # Calculate gradients
    gradient_b0 = np.mean(y_pred - y)
    gradient_b1 = np.mean((y_pred - y) * X)

    # Update coefficients using gradients and learning rate
    b0 -= learning_rate * gradient_b0
    b1 -= learning_rate * gradient_b1

    # Append coefficients and cost to history lists for visualization
    b0_history.append(b0)
    b1_history.append(b1)
    cost_history.append(cost)

# Plot the cost history
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(cost_history)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.title('Cost History')

plt.tight_layout()
plt.show()

# Print the final coefficients and cost
print("Final Intercept (b0):", b0)
print("Final Slope (b1):", b1)
print("Final Cost:", cost_history[-1])

Final Intercept (b0): -2.4127259493867817e+82
Final Slope (b1): -1.603760344427987e+87
Final Cost: 7.759021541641718e+166
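
With the raw feature, the larger learning rate makes the divergence worse: the final cost rises from about 1e148 to about 1e166. A minimal sketch of the same update rule converging once the feature and target are standardized is given below. The data is synthetic (price is assumed to be roughly 100 times the area, plus noise), and the variable names `X_raw`, `y_raw`, `slope` and `intercept` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data (illustrative): price is roughly 100 * squareMeters
X_raw = rng.uniform(0, 100_000, size=5_000)
y_raw = 100 * X_raw + rng.normal(0, 3_000, size=5_000)

# Standardize so both variables have mean 0 and standard deviation 1
X = (X_raw - X_raw.mean()) / X_raw.std()
y = (y_raw - y_raw.mean()) / y_raw.std()

learning_rate = 0.1
b0, b1 = 0.0, 0.0
cost_history = []

for _ in range(200):
    error = (b0 + b1 * X) - y
    cost_history.append(np.mean(error ** 2))
    b0 -= learning_rate * np.mean(error)
    b1 -= learning_rate * np.mean(error * X)

# Map the standardized coefficients back to the original units
slope = b1 * y_raw.std() / X_raw.std()
intercept = y_raw.mean() + b0 * y_raw.std() - slope * X_raw.mean()
print("slope:", slope, "intercept:", intercept)
print("first cost:", cost_history[0], "last cost:", cost_history[-1])
```

The recovered slope can be checked against a closed-form fit such as `np.polyfit(X_raw, y_raw, 1)`.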

from sklearn.model_selection import train_test_split

X = df['squareMeters'].values
y = df['price'].values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

from sklearn.linear_model import LinearRegression

model = LinearRegression()

X_train = X_train.reshape(-1, 1)
X_test = X_test.reshape(-1, 1)

# Fit the model to the data
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

from sklearn.metrics import r2_score, mean_squared_error

r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print("R-squared:", r2)
print("Mean Squared Error:", mse)

# Plot each independent variable against the dependent variable
for i in range(1):
    plt.figure(figsize=(6, 4))
    plt.scatter(X_train[:, i], y_train, label='Data')
    plt.xlabel(df.columns[i])
    plt.ylabel('Price')
    plt.title(f'Scatter Plot of {df.columns[i]} vs. Price')

    # Plot the regression line
    sorted_indices = np.argsort(X_test[:, i])
    plt.plot(X_test[:, i][sorted_indices], y_pred[sorted_indices],
             color='red', label='Linear Regression')

    plt.legend()
    plt.show()

R-squared: 0.999998793097589
Mean Squared Error: 10440151.787275104
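
The MSE is expressed in squared price units, which makes it hard to read. Taking the square root gives the root mean squared error in the same units as `price`. The short sketch below only reuses the MSE value printed above; the variable name `rmse` is illustrative.

```python
import math

mse = 10440151.787275104  # value printed above
rmse = math.sqrt(mse)     # back in the units of `price`
print("RMSE:", round(rmse, 2))
```

This comes to roughly 3,200, which is small against prices in the millions and agrees with the R-squared close to 1.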

Gradient Descent, Cost Function, Learning Rate for Multiple Regression

X = df[['squareMeters', 'numPrevOwners', 'isNewBuilt', 'garage']]
y = df['price']

def normalize(feature):
    """Standardize the feature using Z-score normalization."""
    return (feature - np.mean(feature)) / np.std(feature)

# Hyperparameters
alpha = 0.01
num_iterations = 10

# Initialization
m = len(X['squareMeters'])
X0 = np.ones(m)
X1 = normalize(np.array(X['isNewBuilt']))
X2 = normalize(np.array(X['numPrevOwners']))
X3 = normalize(np.array(X['squareMeters']))
X4 = normalize(np.array(X['garage']))
y = normalize(np.array(y))

# Column order: intercept, isNewBuilt, numPrevOwners, squareMeters, garage
X = np.array([X0, X1, X2, X3, X4]).T

theta = np.zeros(5)

# Gradient Descent
for _ in range(num_iterations):
    y_pred = np.dot(X, theta)
    gradient = (1/m) * np.dot(X.T, (y_pred - y))
    theta -= alpha * gradient

print("Parameters:", theta)

Parameters: [-8.27782287e-19 -9.70733553e-04  1.51884192e-03  9.56150111e-02
 -1.57511062e-03]

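X = df[['squareMeters', 'numPrevOwners', 'isNewBuilt', 'garage']]
y = df['price']

def normalize(feature):
    """Standardize the feature using Z-score normalization."""
    return (feature - np.mean(feature)) / np.std(feature)

# Hyperparameters
alpha = 0.10
num_iterations = 10

# Initialization (repeated so this cell runs on its own and y is
# standardized again after being reassigned above)
m = len(X['squareMeters'])
X0 = np.ones(m)
X1 = normalize(np.array(X['isNewBuilt']))
X2 = normalize(np.array(X['numPrevOwners']))
X3 = normalize(np.array(X['squareMeters']))
X4 = normalize(np.array(X['garage']))
y = normalize(np.array(y))

X = np.array([X0, X1, X2, X3, X4]).T

theta = np.zeros(5)

# Gradient Descent
for _ in range(num_iterations):
    y_pred = np.dot(X, theta)
    gradient = (1/m) * np.dot(X.T, (y_pred - y))
    theta -= alpha * gradient

print("Parameters:", theta)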

Parameters: [ 2.11741735e-17 -4.05175641e-03  6.47160620e-03  6.51187869e-01
 -6.73095587e-03]
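
The two runs differ only in `alpha`, and the `squareMeters` coefficient (the fourth entry) moves from about 0.096 to about 0.651. For a single standardized feature and a coefficient that starts at zero, each step closes a fraction `alpha` of the remaining gap, so after `k` steps the coefficient is `r * (1 - (1 - alpha)**k)`, where `r` is the feature's correlation with the target. The sketch below evaluates that formula, assuming `r` is close to 1 and ignoring the weak interaction with the other features; `fraction_reached` is a hypothetical helper name.

```python
def fraction_reached(alpha, k):
    """Fraction of the final coefficient reached after k steps from zero."""
    return 1 - (1 - alpha) ** k

for alpha in (0.01, 0.10):
    print(alpha, round(fraction_reached(alpha, 10), 4))
```

Both values land close to the printed `squareMeters` coefficients, which suggests that neither run has finished converging after 10 iterations.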

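X = df[['squareMeters', 'numPrevOwners', 'isNewBuilt', 'garage']]
y = df['price']

def normalize(feature):
    """Standardize the feature using Z-score normalization."""
    return (feature - np.mean(feature)) / np.std(feature)

# Hyperparameters
alpha = 0.01
num_iterations = 20

# Initialization (repeated so this cell runs on its own and y is
# standardized again after being reassigned above)
m = len(X['squareMeters'])
X0 = np.ones(m)
X1 = normalize(np.array(X['isNewBuilt']))
X2 = normalize(np.array(X['numPrevOwners']))
X3 = normalize(np.array(X['squareMeters']))
X4 = normalize(np.array(X['garage']))
y = normalize(np.array(y))

X = np.array([X0, X1, X2, X3, X4]).T

theta = np.zeros(5)

# Cost history to store MSE values for each iteration
cost_history = []

# Gradient Descent
for _ in range(num_iterations):
    y_pred = np.dot(X, theta)
    cost = (1/m) * np.sum((y_pred - y)**2)
    cost_history.append(cost)

    gradient = (1/m) * np.dot(X.T, (y_pred - y))
    theta -= alpha * gradient

plt.plot(cost_history)
plt.title('Cost Function Over Iterations (Multiple Regression)')
plt.xlabel('Iterations')
plt.ylabel('Cost (MSE)')
plt.show()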

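X = df[['squareMeters', 'numPrevOwners', 'isNewBuilt', 'garage']]
y = df['price']

def normalize(feature):
    """Standardize the feature using Z-score normalization."""
    return (feature - np.mean(feature)) / np.std(feature)

# Hyperparameters
alpha = 0.10
num_iterations = 20

# Initialization (repeated so this cell runs on its own and y is
# standardized again after being reassigned above)
m = len(X['squareMeters'])
X0 = np.ones(m)
X1 = normalize(np.array(X['isNewBuilt']))
X2 = normalize(np.array(X['numPrevOwners']))
X3 = normalize(np.array(X['squareMeters']))
X4 = normalize(np.array(X['garage']))
y = normalize(np.array(y))

X = np.array([X0, X1, X2, X3, X4]).T

theta = np.zeros(5)

# Cost history to store MSE values for each iteration
cost_history = []

# Gradient Descent
for _ in range(num_iterations):
    y_pred = np.dot(X, theta)
    cost = (1/m) * np.sum((y_pred - y)**2)
    cost_history.append(cost)

    gradient = (1/m) * np.dot(X.T, (y_pred - y))
    theta -= alpha * gradient

plt.plot(cost_history)
plt.title('Cost Function Over Iterations (Multiple Regression)')
plt.xlabel('Iterations')
plt.ylabel('Cost (MSE)')
plt.show()
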
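
A fixed count of 20 iterations is unlikely to be enough for either learning rate to finish converging on standardized data, so the cost curves above are probably still falling at the last step. One common alternative is to stop when the cost no longer improves by more than a small tolerance. The sketch below shows that idea on synthetic, standardized data; the function name `gradient_descent`, its parameters, and the data are all assumptions made for illustration and are not part of the original notebook.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, max_iterations=10_000, tolerance=1e-10):
    """Batch gradient descent for linear regression with an early stop.

    X must already contain a leading column of ones.
    Stops once the cost improves by less than `tolerance`.
    """
    m, n = X.shape
    theta = np.zeros(n)
    cost_history = []
    for _ in range(max_iterations):
        error = X @ theta - y
        cost = np.mean(error ** 2)
        if cost_history and cost_history[-1] - cost < tolerance:
            break
        cost_history.append(cost)
        theta -= alpha * (X.T @ error) / m
    return theta, cost_history

# Synthetic, standardized data (illustrative only)
rng = np.random.default_rng(0)
features = rng.normal(size=(2_000, 2))
target = 0.9 * features[:, 0] + 0.1 * features[:, 1] + rng.normal(0, 0.05, 2_000)
features = (features - features.mean(axis=0)) / features.std(axis=0)
target = (target - target.mean()) / target.std()

X = np.column_stack([np.ones(len(features)), features])
theta, cost_history = gradient_descent(X, target)
print("iterations used:", len(cost_history))
print("theta:", theta)
```

The tolerance trades accuracy for run time: a looser value stops earlier but leaves the coefficients further from the least-squares solution.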
from sklearn.model_selection import train_test_split

X = df[['squareMeters', 'numPrevOwners', 'isNewBuilt', 'garage']].values
y = df['price'].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

from sklearn.linear_model import LinearRegression

model = LinearRegression()

# Fit the model to the data
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

from sklearn.metrics import r2_score, mean_squared_error

r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

print("R-squared:", r2)
print("Mean Squared Error:", mse)

# Plot each independent variable against the dependent variable
for i in range(1):
    plt.figure(figsize=(6, 4))
    plt.scatter(X_train[:, i], y_train, label='Data')
    plt.xlabel(df.columns[i])
    plt.ylabel('Price')
    plt.title(f'Scatter Plot of {df.columns[i]} vs. Price')

    # Plot the regression line
    sorted_indices = np.argsort(X_test[:, i])
    plt.plot(X_test[:, i][sorted_indices], y_pred[sorted_indices],
             color='red', label='Linear Regression')

    plt.legend()
    plt.show()

R-squared: 0.9999987942704442
Mean Squared Error: 10430006.155390566
