Kakauikkla

nice

Uploaded by

nishithrbd

Assignment – 1

Perform the following tasks on the shared dataset

a) Basic Exploratory Data Analysis

b) Visualization

c) Preprocessing

d) Apply Model

e) Evaluate

f) Tune Hyperparameters

g) Save the model parameters

• Name: Nishith Dubey

• Enrollment Number: 0801CS231087
import numpy as np
import pandas as pd

df = pd.read_csv('/content/01_Student Final Grade Prediction-Multi_lin_reg - 01_Student Final Grade Prediction-Multi_lin_reg.csv')

##EDA

df.head()

{"type":"dataframe","variable_name":"df"}

df.tail()

{"type":"dataframe"}

df.shape

(395, 33)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 395 entries, 0 to 394
Data columns (total 33 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 school 395 non-null object
1 gender 394 non-null object
2 age 392 non-null float64
3 address 392 non-null object
4 famsize 394 non-null object
5 Parrent_status 395 non-null object
6 Mother_edu 394 non-null float64
7 Father_edu 394 non-null float64
8 Mother_job 394 non-null object
9 Father_job 392 non-null object
10 reason_to_chose_school 392 non-null object
11 guardian 393 non-null object
12 traveltime 393 non-null float64
13 weekly_studytime 394 non-null float64
14 failures 393 non-null float64
15 extra_edu_supp 394 non-null object
16 family_edu_supp 395 non-null object
17 extra_paid_class 394 non-null object
18 extra_curr_activities 393 non-null object
19 nursery 394 non-null object
20 Interested_in_higher_edu 394 non-null object
21 internet_access 394 non-null object
22 romantic_relationship 394 non-null object
23 Family_quality_reln 394 non-null float64
24 freetime_after_school 395 non-null int64
25 goout_with_friends 395 non-null int64
26 workday_alcohol_consum 395 non-null int64
27 weekend_alcohol_consum 395 non-null int64
28 health_status 395 non-null int64
29 absences 395 non-null int64
30 G1 395 non-null int64
31 G2 395 non-null int64
32 G3 395 non-null int64
dtypes: float64(7), int64(9), object(17)
memory usage: 102.0+ KB

df.describe()

{"type":"dataframe"}

df.columns

Index(['school', 'gender', 'age', 'address', 'famsize', 'Parrent_status',
       'Mother_edu', 'Father_edu', 'Mother_job', 'Father_job',
       'reason_to_chose_school', 'guardian', 'traveltime', 'weekly_studytime',
       'failures', 'extra_edu_supp', 'family_edu_supp', 'extra_paid_class',
       'extra_curr_activities', 'nursery', 'Interested_in_higher_edu',
       'internet_access', 'romantic_relationship', 'Family_quality_reln',
       'freetime_after_school', 'goout_with_friends', 'workday_alcohol_consum',
       'weekend_alcohol_consum', 'health_status', 'absences', 'G1', 'G2',
       'G3'],
      dtype='object')

df.describe(include='all')

{"type":"dataframe"}

Visualization
import matplotlib.pyplot as plt
import seaborn as sns

plt.hist(df['G1'])

(array([ 2., 31., 37., 72., 51., 74., 63., 24., 30., 11.]),
array([ 3. , 4.6, 6.2, 7.8, 9.4, 11. , 12.6, 14.2, 15.8, 17.4,
19. ]),
<BarContainer object of 10 artists>)

plt.hist(df['G2'])
(array([13., 0., 16., 35., 82., 81., 78., 57., 18., 15.]),
array([ 0. , 1.9, 3.8, 5.7, 7.6, 9.5, 11.4, 13.3, 15.2, 17.1,
19. ]),
<BarContainer object of 10 artists>)

plt.hist(df['G3'])

(array([ 38., 0., 8., 24., 60., 103., 62., 60., 22., 18.]),
array([ 0., 2., 4., 6., 8., 10., 12., 14., 16., 18., 20.]),
<BarContainer object of 10 artists>)
plt.hist(df['absences'])

(array([287., 72., 25., 5., 1., 2., 0., 2., 0., 1.]),
array([ 0. , 7.5, 15. , 22.5, 30. , 37.5, 45. , 52.5, 60. , 67.5,
75. ]),
<BarContainer object of 10 artists>)
plt.hist(df['failures'])

(array([310., 0., 0., 50., 0., 0., 17., 0., 0., 16.]),
array([0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3. ]),
<BarContainer object of 10 artists>)
sns.scatterplot(x=df['G1'], y=df['G2'])

<Axes: xlabel='G1', ylabel='G2'>

sns.scatterplot(x=df['G1'], y=df['G3'])

<Axes: xlabel='G1', ylabel='G3'>

sns.scatterplot(x=df['G2'], y=df['G3'])

<Axes: xlabel='G2', ylabel='G3'>

sns.histplot(data=df, x='age', hue='gender')

<Axes: xlabel='age', ylabel='Count'>

sns.histplot(data=df, x='age', hue='address')

<Axes: xlabel='age', ylabel='Count'>

school_counts = df['school'].value_counts()  # take labels from the same Series so they match the wedges
plt.pie(school_counts, labels=school_counts.index, autopct='%1.1f%%')
plt.title('School Distribution')
plt.show()
plt.pie(df.gender.value_counts().values,
labels = df.gender.value_counts().index, shadow =True,
autopct = "%1.2f%%")
plt.legend()

<matplotlib.legend.Legend at 0x79fe811263f0>
Preprocessing
sns.boxplot(x=df['G1'])

<Axes: xlabel='G1'>
sns.boxplot(x=df['G2'])

<Axes: xlabel='G2'>
numerical_cols = df.select_dtypes(include='number').columns
features_with_outliers = []

for col in numerical_cols:
    Q1 = df[col].quantile(0.25)
    Q3 = df[col].quantile(0.75)
    IQR = Q3 - Q1

    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    outliers = df[(df[col] < lower_bound) | (df[col] > upper_bound)]

    if not outliers.empty:
        features_with_outliers.append(col)

print("Features with outliers:")
print(features_with_outliers)

for cols in features_with_outliers:
    plt.figure(figsize=(6,4))
    sns.boxplot(x=df[cols], data=df, color='red')
    plt.legend()
    plt.title(f'Box plot of {cols}')
    plt.show()

Features with outliers:

['school', 'address', 'Parrent_status', 'Mother_job', 'Father_job',
'extra_edu_supp', 'nursery', 'Interested_in_higher_edu',
'internet_access']

/tmp/ipython-input-3960268259.py:25: UserWarning: No artists with

labels found to put in legend. Note that artists whose label start
with an underscore are ignored when legend() is called with no
argument.
plt.legend()

for col in features_with_outliers:
    Q1 = df[col].quantile(0.25)
    Q3 = df[col].quantile(0.75)
    IQR = Q3 - Q1

    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    df[col] = df[col].clip(lower=lower_bound, upper=upper_bound)

for cols in features_with_outliers:
    plt.figure(figsize=(6,4))
    sns.boxplot(x=df[cols], data=df)
    plt.legend()
    plt.title(f'Box plot of {cols}')
    plt.show()

/tmp/ipython-input-3418483522.py:4: UserWarning: No artists with

labels found to put in legend. Note that artists whose label start
with an underscore are ignored when legend() is called with no
argument.
plt.legend()
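
The quartile-and-fence logic used in the two loops above can be collected into one helper. What follows is a minimal sketch on made-up numbers: `iqr_bounds` and both toy series are hypothetical names introduced here, not part of the notebook.

```python
import pandas as pd


def iqr_bounds(s: pd.Series, k: float = 1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1 = s.quantile(0.25)
    q3 = s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Toy data (an assumption for illustration, not the student dataset)
sample = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100])
lower, upper = iqr_bounds(sample)

# Flag values outside the fences, then cap them at the fences
outlier_mask = (sample < lower) | (sample > upper)
clipped = sample.clip(lower=lower, upper=upper)

# A column that is mostly one value has IQR = 0, so both fences coincide
mostly_zero = pd.Series([0] * 8 + [1, 3])
zero_lower, zero_upper = iqr_bounds(mostly_zero)
flattened = mostly_zero.clip(lower=zero_lower, upper=zero_upper)
```

When a column's first and third quartiles coincide (IQR = 0), both fences collapse onto that value and clipping flattens the column to a constant. A constant column has no defined correlation, which would be consistent with the NaN entries for `failures` in the correlation matrix.
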
corr_matrix= df.corr(numeric_only=True)
print(corr_matrix)

                               age  Mother_edu  Father_edu  traveltime  \
age                       1.000000   -0.159973   -0.164266    0.077411
Mother_edu               -0.159973    1.000000    0.625897   -0.167021
Father_edu               -0.164266    0.625897    1.000000   -0.157558
traveltime                0.077411   -0.167021   -0.157558    1.000000
weekly_studytime          0.014124    0.068783    0.009156   -0.115008
failures                       NaN         NaN         NaN         NaN
Family_quality_reln       0.055209    0.022801    0.013962   -0.017217
freetime_after_school     0.014538    0.025119   -0.018703   -0.026569
goout_with_friends        0.119841    0.061921    0.041183    0.021811
workday_alcohol_consum    0.124073    0.016929    0.001879    0.116286
weekend_alcohol_consum    0.110046   -0.051180   -0.016568    0.120241
health_status            -0.069379   -0.043770    0.013469    0.002497
absences                  0.186068    0.116780    0.018008   -0.023451
G1                       -0.060287    0.206500    0.192346   -0.086274
G2                       -0.153019    0.228634    0.179830   -0.138590
G3                       -0.156116    0.217775    0.154668   -0.114356

                        weekly_studytime    failures  Family_quality_reln  \
age                             0.014124         NaN             0.055209
Mother_edu                      0.068783         NaN             0.022801
Father_edu                      0.009156         NaN             0.013962
traveltime                     -0.115008         NaN            -0.017217
weekly_studytime                1.000000         NaN             0.063992
failures                             NaN         NaN                  NaN
Family_quality_reln             0.063992         NaN             1.000000
freetime_after_school          -0.141654         NaN             0.136637
goout_with_friends             -0.066011         NaN             0.058834
workday_alcohol_consum         -0.219130         NaN            -0.079564
weekend_alcohol_consum         -0.260562         NaN            -0.122639
health_status                  -0.071126         NaN             0.077752
absences                       -0.080753         NaN            -0.080903
G1                              0.163235         NaN             0.027758
G2                              0.134537         NaN             0.007214
G3                              0.099217         NaN             0.058057

                        freetime_after_school  goout_with_friends  \
age                                  0.014538            0.119841
Mother_edu                           0.025119            0.061921
Father_edu                          -0.018703            0.041183
traveltime                          -0.026569            0.021811
weekly_studytime                    -0.141654           -0.066011
failures                                  NaN                 NaN
Family_quality_reln                  0.136637            0.058834
freetime_after_school                1.000000            0.281769
goout_with_friends                   0.281769            1.000000
workday_alcohol_consum               0.205032            0.266818
weekend_alcohol_consum               0.146665            0.420386
health_status                        0.075318           -0.009577
absences                             0.007181            0.105672
G1                                   0.007524           -0.149104
G2                                  -0.011653           -0.157180
G3                                   0.008719           -0.132791

                        workday_alcohol_consum  weekend_alcohol_consum  \
age                                   0.124073                0.110046
Mother_edu                            0.016929               -0.051180
Father_edu                            0.001879               -0.016568
traveltime                            0.116286                0.120241
weekly_studytime                     -0.219130               -0.260562
failures                                   NaN                     NaN
Family_quality_reln                  -0.079564               -0.122639
freetime_after_school                 0.205032                0.146665
goout_with_friends                    0.266818                0.420386
workday_alcohol_consum                1.000000                0.658956
weekend_alcohol_consum                0.658956                1.000000
health_status                         0.080359                0.092476
absences                              0.146541                0.193614
G1                                   -0.101402               -0.126179
G2                                   -0.087085               -0.102462
G3                                   -0.066432               -0.051939

                        health_status    absences          G1          G2          G3
age                         -0.069379    0.186068   -0.060287   -0.153019   -0.156116
Mother_edu                  -0.043770    0.116780    0.206500    0.228634    0.217775
Father_edu                   0.013469    0.018008    0.192346    0.179830    0.154668
traveltime                   0.002497   -0.023451   -0.086274   -0.138590   -0.114356
weekly_studytime            -0.071126   -0.080753    0.163235    0.134537    0.099217
failures                          NaN         NaN         NaN         NaN         NaN
Family_quality_reln          0.077752   -0.080903    0.027758    0.007214    0.058057
freetime_after_school        0.075318    0.007181    0.007524   -0.011653    0.008719
goout_with_friends          -0.009577    0.105672   -0.149104   -0.157180   -0.132791
workday_alcohol_consum       0.080359    0.146541   -0.101402   -0.087085   -0.066432
weekend_alcohol_consum       0.092476    0.193614   -0.126179   -0.102462   -0.051939
health_status                1.000000   -0.052585   -0.073172   -0.089461   -0.061335
absences                    -0.052585    1.000000   -0.020177   -0.050567    0.068030
G1                          -0.073172   -0.020177    1.000000    0.884067    0.801468
G2                          -0.089461   -0.050567    0.884067    1.000000    0.905780
G3                          -0.061335    0.068030    0.801468    0.905780    1.000000

sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')

<Axes: >
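
In the matrix above, G2 (0.906) and G1 (0.801) stand out as the strongest correlates of G3. One way to read a matrix like this quickly is to pull out the target's column and sort it by absolute value. The sketch below does that on a small made-up frame; `toy`, `target_corr`, and `ranked` are hypothetical names, and the numbers are not from the student dataset.

```python
import pandas as pd

# Toy data (an assumption for illustration only)
toy = pd.DataFrame({
    "G2": [6, 10, 14, 19],
    "absences": [4, 0, 6, 2],
    "G3": [5, 10, 15, 20],
})

# Correlation of every feature with the target, excluding the target itself
target_corr = toy.corr(numeric_only=True)["G3"].drop("G3")

# Strongest relationships first, regardless of sign
ranked = target_corr.abs().sort_values(ascending=False)
print(ranked)
```
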
Missing Values
df.isna().sum()

school 0
gender 1
age 3
address 3
famsize 1
Parrent_status 0
Mother_edu 1
Father_edu 1
Mother_job 1
Father_job 3
reason_to_chose_school 3
guardian 2
traveltime 2
weekly_studytime 1
failures 2
extra_edu_supp 1
family_edu_supp 0
extra_paid_class 1
extra_curr_activities 2
nursery 1
Interested_in_higher_edu 1
internet_access 1
romantic_relationship 1
Family_quality_reln 1
freetime_after_school 0
goout_with_friends 0
workday_alcohol_consum 0
weekend_alcohol_consum 0
health_status 0
absences 0
G1 0
G2 0
G3 0
dtype: int64

categorical_cols = df.select_dtypes(include=['object']).columns
for col in categorical_cols:
    df[col] = df[col].fillna(df[col].mode()[0])

numeric_cols = df.select_dtypes(include=['int64','float64']).columns
for col in numeric_cols:
    df[col] = df[col].fillna(df[col].median())

df.isna().sum()

school 0
gender 0
age 0
address 0
famsize 0
Parrent_status 0
Mother_edu 0
Father_edu 0
Mother_job 0
Father_job 0
reason_to_chose_school 0
guardian 0
traveltime 0
weekly_studytime 0
failures 0
extra_edu_supp 0
family_edu_supp 0
extra_paid_class 0
extra_curr_activities 0
nursery 0
Interested_in_higher_edu 0
internet_access 0
romantic_relationship 0
Family_quality_reln 0
freetime_after_school 0
goout_with_friends 0
workday_alcohol_consum 0
weekend_alcohol_consum 0
health_status 0
absences 0
G1 0
G2 0
G3 0
dtype: int64
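
The fill values above are computed on the full dataset before it is split. A common variation is to learn the median and mode from the training rows only and reuse them on the test rows, so nothing from the test set influences preprocessing. The sketch below shows that idea on two tiny made-up frames; `train`, `test`, and `fill_values` are hypothetical names, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Toy splits (hypothetical values, not the student dataset)
train = pd.DataFrame({
    "age": [15.0, 16.0, np.nan, 18.0],
    "address": ["U", "U", np.nan, "R"],
})
test = pd.DataFrame({
    "age": [np.nan, 17.0],
    "address": ["R", np.nan],
})

# Learn the fill values from the training split only
fill_values = {
    "age": train["age"].median(),           # numeric column: median
    "address": train["address"].mode()[0],  # categorical column: most frequent value
}

# Apply the same values to both splits
train_filled = train.fillna(fill_values)
test_filled = test.fillna(fill_values)
```
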

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 395 entries, 0 to 394
Data columns (total 33 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 school 395 non-null object
1 gender 395 non-null object
2 age 395 non-null float64
3 address 395 non-null object
4 famsize 395 non-null object
5 Parrent_status 395 non-null object
6 Mother_edu 395 non-null float64
7 Father_edu 395 non-null float64
8 Mother_job 395 non-null object
9 Father_job 395 non-null object
10 reason_to_chose_school 395 non-null object
11 guardian 395 non-null object
12 traveltime 395 non-null float64
13 weekly_studytime 395 non-null float64
14 failures 395 non-null float64
15 extra_edu_supp 395 non-null object
16 family_edu_supp 395 non-null object
17 extra_paid_class 395 non-null object
18 extra_curr_activities 395 non-null object
19 nursery 395 non-null object
20 Interested_in_higher_edu 395 non-null object
21 internet_access 395 non-null object
22 romantic_relationship 395 non-null object
23 Family_quality_reln 395 non-null float64
24 freetime_after_school 395 non-null float64
25 goout_with_friends 395 non-null int64
26 workday_alcohol_consum 395 non-null float64
27 weekend_alcohol_consum 395 non-null int64
28 health_status 395 non-null int64
29 absences 395 non-null int64
30 G1 395 non-null int64
31 G2 395 non-null int64
32 G3 395 non-null int64
dtypes: float64(9), int64(7), object(17)
memory usage: 102.0+ KB

df.school.value_counts()

school
GP 349
MS 46
Name: count, dtype: int64

for col in df.select_dtypes(include=['object']).columns:
    print(f"\nColumn: {col}")
    print(df[col].value_counts())

Column: school
school
GP 349
MS 46
Name: count, dtype: int64

Column: gender
gender
F 209
M 186
Name: count, dtype: int64

Column: address
address
U 307
R 88
Name: count, dtype: int64

Column: famsize
famsize
GT3 282
LE3 113
Name: count, dtype: int64

Column: Parrent_status
Parrent_status
T 354
A 41
Name: count, dtype: int64
Column: Mother_job
Mother_job
other 142
services 102
at_home 59
teacher 58
health 34
Name: count, dtype: int64

Column: Father_job
Father_job
other 218
services 110
teacher 29
at_home 20
health 18
Name: count, dtype: int64

Column: reason_to_chose_school
reason_to_chose_school
course 148
home 108
reputation 104
other 35
Name: count, dtype: int64

Column: guardian
guardian
mother 274
father 89
other 32
Name: count, dtype: int64

Column: extra_edu_supp
extra_edu_supp
no 345
yes 50
Name: count, dtype: int64

Column: family_edu_supp
family_edu_supp
yes 242
no 153
Name: count, dtype: int64

Column: extra_paid_class
extra_paid_class
no 214
yes 181
Name: count, dtype: int64

Column: extra_curr_activities
extra_curr_activities
yes 203
no 192
Name: count, dtype: int64

Column: nursery
nursery
yes 314
no 81
Name: count, dtype: int64

Column: Interested_in_higher_edu
Interested_in_higher_edu
yes 375
no 20
Name: count, dtype: int64

Column: internet_access
internet_access
yes 329
no 66
Name: count, dtype: int64

Column: romantic_relationship
romantic_relationship
no 264
yes 131
Name: count, dtype: int64

# Encode two-level categorical columns as 0/1
df['gender'] = (df['gender'] == 'M').astype(int)
df['address'] = (df['address'] == 'R').astype(int)
df['famsize'] = (df['famsize'] == 'GT3').astype(int)
df['Parrent_status'] = (df['Parrent_status'] == 'T').astype(int)

# yes/no columns: 'yes' -> 1, anything else -> 0
yes_no_cols = ['extra_edu_supp', 'family_edu_supp', 'extra_paid_class',
               'extra_curr_activities', 'nursery', 'Interested_in_higher_edu',
               'internet_access', 'romantic_relationship']
for col in yes_no_cols:
    df[col] = (df[col] == 'yes').astype(int)

df['school'] = df['school'].map({'GP': 0, 'MS': 1})

# Multi-level columns: map each category to an integer code
job_codes = {'at_home': 0, 'health': 1, 'other': 2, 'services': 3, 'teacher': 4}
df['Mother_job'] = df['Mother_job'].map(job_codes)
df['Father_job'] = df['Father_job'].map(job_codes)
df['reason_to_chose_school'] = df['reason_to_chose_school'].map(
    {'home': 0, 'reputation': 1, 'course': 2, 'other': 3})
df['guardian'] = df['guardian'].map({'mother': 0, 'father': 1, 'other': 2})

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 395 entries, 0 to 394
Data columns (total 33 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 school 395 non-null int64
1 gender 395 non-null int64
2 age 395 non-null float64
3 address 395 non-null int64
4 famsize 395 non-null int64
5 Parrent_status 395 non-null int64
6 Mother_edu 395 non-null float64
7 Father_edu 395 non-null float64
8 Mother_job 395 non-null int64
9 Father_job 395 non-null int64
10 reason_to_chose_school 395 non-null int64
11 guardian 395 non-null int64
12 traveltime 395 non-null float64
13 weekly_studytime 395 non-null float64
14 failures 395 non-null float64
15 extra_edu_supp 395 non-null int64
16 family_edu_supp 395 non-null int64
17 extra_paid_class 395 non-null int64
18 extra_curr_activities 395 non-null int64
19 nursery 395 non-null int64
20 Interested_in_higher_edu 395 non-null int64
21 internet_access 395 non-null int64
22 romantic_relationship 395 non-null int64
23 Family_quality_reln 395 non-null float64
24 freetime_after_school 395 non-null float64
25 goout_with_friends 395 non-null int64
26 workday_alcohol_consum 395 non-null float64
27 weekend_alcohol_consum 395 non-null int64
28 health_status 395 non-null int64
29 absences 395 non-null int64
30 G1 395 non-null int64
31 G2 395 non-null int64
32 G3 395 non-null int64
dtypes: float64(9), int64(24)
memory usage: 102.0 KB
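
Integer codes such as 0 to 4 for the job columns give a linear model an ordering between categories that the data does not necessarily have. One-hot encoding is a common alternative. The sketch below applies `pd.get_dummies` to a small made-up column; the frame `toy` is hypothetical, and this is offered as a variation rather than what the notebook does.

```python
import pandas as pd

# Toy column (hypothetical rows) using the same job categories as the dataset
toy = pd.DataFrame({
    "Mother_job": ["at_home", "health", "other", "services", "teacher", "other"],
})

# One indicator column per category; drop_first removes the first category
# (here 'at_home') so the indicators are not perfectly collinear
encoded = pd.get_dummies(toy, columns=["Mother_job"], drop_first=True, dtype=int)
print(encoded.columns.tolist())
```
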

## Separate target variable

y = df['G3']
X = df.drop('G3', axis =1)

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2, random_state=42)

from sklearn.linear_model import LinearRegression

lin = LinearRegression()
lin = lin.fit(X_train,y_train)
y_pred = lin.predict(X_test)

y_pred

array([ 6.53578216, 11.11320513,  3.61318931,  8.65230501,  9.95418988,
       11.78545531, 18.83808101,  7.75778205,  6.64733596, 12.49644447,
       14.98638193,  4.97546822, 13.67571849, 11.68115081, 14.53775178,
        7.88646458,  4.75536475, 10.74090073, 13.85881117,  7.47913783,
       13.76329359, 16.67784243, 13.54226895,  5.51112857,  8.42623937,
       21.29809965,  9.98419751,  9.0424006 , 16.89730598, 11.11564326,
        9.23491816,  6.48639573, 14.86393416, 13.30759839,  4.67925488,
        4.45676117, -0.52096394, 15.49608868, 11.66803542,  8.91600367,
        4.64897411, 10.3827293 , 14.22767684,  7.90541528, 16.14847446,
        8.55657074, 12.44456524, 14.59436283, 11.47753162, 15.38628234,
       14.12254439, 14.59157218, 10.28654417,  7.71348917,  2.86832812,
       13.26623373,  9.6241547 ,  5.73288258, 15.49127104, 16.05455577,
       13.67138629,  7.85598092,  8.16820299,  3.18568083,  3.66881939,
       16.87922268,  8.24014765,  8.47858617,  9.30042735, 16.84094208,
        8.31127531,  8.31178302, 13.92164121, 20.95349954, 10.30296541,
        5.8690325 ,  8.06471596, 12.5941896 ,  5.3494861 ])

lin.intercept_

np.float64(-3.079114806474344)

X_train.info()

<class 'pandas.core.frame.DataFrame'>
Index: 316 entries, 181 to 102
Data columns (total 32 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 school 316 non-null int64
1 gender 316 non-null int64
2 age 316 non-null float64
3 address 316 non-null int64
4 famsize 316 non-null int64
5 Parrent_status 316 non-null int64
6 Mother_edu 316 non-null float64
7 Father_edu 316 non-null float64
8 Mother_job 316 non-null int64
9 Father_job 316 non-null int64
10 reason_to_chose_school 316 non-null int64
11 guardian 316 non-null int64
12 traveltime 316 non-null float64
13 weekly_studytime 316 non-null float64
14 failures 316 non-null float64
15 extra_edu_supp 316 non-null int64
16 family_edu_supp 316 non-null int64
17 extra_paid_class 316 non-null int64
18 extra_curr_activities 316 non-null int64
19 nursery 316 non-null int64
20 Interested_in_higher_edu 316 non-null int64
21 internet_access 316 non-null int64
22 romantic_relationship 316 non-null int64
23 Family_quality_reln 316 non-null float64
24 freetime_after_school 316 non-null float64
25 goout_with_friends 316 non-null int64
26 workday_alcohol_consum 316 non-null float64
27 weekend_alcohol_consum 316 non-null int64
28 health_status 316 non-null int64
29 absences 316 non-null int64
30 G1 316 non-null int64
31 G2 316 non-null int64
dtypes: float64(9), int64(23)
memory usage: 81.5 KB

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print("Mean Absolute Error (MAE):", mae)

print("Mean Squared Error (MSE):", mse)
print("Root Mean Squared Error (RMSE):", rmse)
print("R² Score:", r2)

Mean Absolute Error (MAE): 1.6264715397250666

Mean Squared Error (MSE): 5.249308259664779
Root Mean Squared Error (RMSE): 2.29113689238875
R² Score: 0.7439992119481771
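
For reference, the three error metrics printed above reduce to a few lines of arithmetic. The sketch below recomputes them with NumPy on four made-up values; `y_true_demo` and `y_hat_demo` are hypothetical arrays, not the notebook's `y_test` and `y_pred`.

```python
import numpy as np

# Toy values (an assumption for illustration only)
y_true_demo = np.array([10.0, 12.0, 8.0, 15.0])
y_hat_demo = np.array([11.0, 10.0, 8.0, 18.0])

errors = y_true_demo - y_hat_demo      # [-1, 2, 0, -3]

mae_demo = np.mean(np.abs(errors))     # average absolute error, in grade points
mse_demo = np.mean(errors ** 2)        # average squared error, in squared grade points
rmse_demo = np.sqrt(mse_demo)          # back in grade points; penalises large misses more than MAE

print(mae_demo, mse_demo, rmse_demo)
```

RMSE is never smaller than MAE on the same predictions, so a wide gap between the two hints at a few large errors rather than many small ones.
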

r2_test = lin.score(X_test, y_test)  # for a regressor, .score() returns R², not accuracy

print("R² Score (lin.score):", r2_test)

R² Score (lin.score): 0.7439992119481771

from sklearn.linear_model import Ridge

from sklearn.model_selection import GridSearchCV

ridge = Ridge()

params = {
'alpha': [0.01, 0.1, 1, 10, 100]
}

grid = GridSearchCV(ridge, param_grid=params, cv=5, scoring='r2')

grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)

print("Best Score:", grid.best_score_)

Best Parameters: {'alpha': 100}

Best Score: 0.8494824902919669
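
`grid.best_score_` is the mean R² over the five validation folds of the training data, so it is not directly comparable with a score measured on the held-out test set. Separately, the Ridge penalty depends on the scale of each feature, so a common variation is to standardize the features inside a pipeline before tuning `alpha`. The sketch below shows that setup on synthetic data; `X_demo`, `y_demo`, `pipe`, and `search` are hypothetical names, and this is not the configuration the notebook ran.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data with features on very different scales (an assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 100.0])
y_demo = X_demo @ np.array([2.0, 0.3, 0.01]) + rng.normal(scale=0.5, size=100)

# The scaler is refit on the training part of every fold, so the
# validation part never contributes to the scaling statistics
pipe = make_pipeline(StandardScaler(), Ridge())

# make_pipeline names each step after its lowercased class, hence "ridge__alpha"
param_grid = {"ridge__alpha": [0.01, 0.1, 1, 10, 100]}

search = GridSearchCV(pipe, param_grid=param_grid, cv=5, scoring="r2")
search.fit(X_demo, y_demo)

best_alpha = search.best_params_["ridge__alpha"]
cv_r2 = search.best_score_  # mean R² across the 5 validation folds
print(best_alpha, cv_r2)
```
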

ridge_pred = grid.predict(X_test)

score = r2_score(y_test, ridge_pred)  # r2_score expects (y_true, y_pred)

score
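
R² is not symmetric in its two arguments: the denominator is the spread of whichever array is passed first, which is why `r2_score` takes the true values first and the predictions second. The sketch below reimplements the formula in NumPy to show the difference on made-up values; the function `r2` and the two lists are hypothetical and are not the notebook's data.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot, with y_true first."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # spread of the true values
    return 1.0 - ss_res / ss_tot


# Toy values (an assumption for illustration only)
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]

correct_order = r2(actual, predicted)   # 1 - 6/20
swapped_order = r2(predicted, actual)   # 1 - 6/45
print(correct_order, swapped_order)
```
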
import pickle

model_pkl_file = "ML-Assignment-1.pkl"
with open(model_pkl_file, 'wb') as file:
    pickle.dump(grid, file)
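
A saved model is only useful if it can be loaded back and gives the same predictions. The sketch below shows that round trip with a small stand-in model on made-up data; `X_toy`, `y_toy`, `toy-model.pkl`, and the temporary directory are hypothetical and are used here in place of the notebook's `grid` object and file name.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import Ridge

# Toy model (hypothetical data) standing in for the fitted search object
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([1.0, 3.0, 5.0, 7.0])
model = Ridge(alpha=1.0).fit(X_toy, y_toy)

# Write to a temporary directory so the example leaves the working directory untouched
path = os.path.join(tempfile.mkdtemp(), "toy-model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load it back and check that the predictions are unchanged
with open(path, "rb") as f:
    restored = pickle.load(f)

same_predictions = np.allclose(model.predict(X_toy), restored.predict(X_toy))
print(same_predictions)
```

Pickle files should only be loaded from sources you trust, and a model is generally expected to be unpickled with the same scikit-learn version that saved it.
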

No ratings yet
Predictive+Modelling+-+Logistic+Regression+-+Student+Version-New2.3.ipynb - Colaboratory
12 pages
Aiml
No ratings yet
Aiml
27 pages
Model2.ipynb - Colab
No ratings yet
Model2.ipynb - Colab
11 pages
Simple Linear Regression for Sales Prediction
No ratings yet
Simple Linear Regression for Sales Prediction
40 pages
Open Lab 2
No ratings yet
Open Lab 2
15 pages
Online Food Orders Data Analysis
No ratings yet
Online Food Orders Data Analysis
12 pages
15 - 11 - 24 - SVM - Jupyter Notebook
No ratings yet
15 - 11 - 24 - SVM - Jupyter Notebook
5 pages
DALab Part-B BCU&BU
No ratings yet
DALab Part-B BCU&BU
12 pages
DAR CompleteFile 1
No ratings yet
DAR CompleteFile 1
41 pages
DSBDA Prac2
No ratings yet
DSBDA Prac2
2 pages
A09Ass02 - Jupyter Notebook
No ratings yet
A09Ass02 - Jupyter Notebook
11 pages
Prahlad Resume
No ratings yet
Prahlad Resume
2 pages
Bhasha Bandhu Sample Presentation 2
No ratings yet
Bhasha Bandhu Sample Presentation 2
10 pages
Prabhsimrandeep Singhs Resume
No ratings yet
Prabhsimrandeep Singhs Resume
1 page
GitHub Contribution Guide
No ratings yet
GitHub Contribution Guide
2 pages
Micro Ass2
No ratings yet
Micro Ass2
6 pages
SVA ACTIVITY Sheet
No ratings yet
SVA ACTIVITY Sheet
3 pages
Monophthongs and Diphthongs Guide
No ratings yet
Monophthongs and Diphthongs Guide
4 pages
Reading Skills
No ratings yet
Reading Skills
2 pages
Marriage Insights from Genesis 2:18-25
No ratings yet
Marriage Insights from Genesis 2:18-25
8 pages
Listenings Sites
No ratings yet
Listenings Sites
2 pages
Understanding CPU and Memory Types
No ratings yet
Understanding CPU and Memory Types
5 pages
Spanish 4 Midyear Exam Study Guide
No ratings yet
Spanish 4 Midyear Exam Study Guide
5 pages
Lecture 4
No ratings yet
Lecture 4
67 pages
CADdy++ ET Command Overview
No ratings yet
CADdy++ ET Command Overview
194 pages
Cisco Multi Service IP-To-IP Gateway Application Guide
No ratings yet
Cisco Multi Service IP-To-IP Gateway Application Guide
178 pages
Akose Ifa To Kill Iyami Aje
91% (23)
Akose Ifa To Kill Iyami Aje
3 pages
MIL Sem 3rd Sem Notice
No ratings yet
MIL Sem 3rd Sem Notice
18 pages
Report For CS4402 - P2 by 220016150
No ratings yet
Report For CS4402 - P2 by 220016150
6 pages
Q3 English 3 Module 5
No ratings yet
Q3 English 3 Module 5
19 pages
Teaching Foreshadowing in Grade 9
No ratings yet
Teaching Foreshadowing in Grade 9
17 pages
Act 2 Scene 3 Mov New
No ratings yet
Act 2 Scene 3 Mov New
16 pages
G Vocabulary Workshop Enriched Edition
100% (4)
G Vocabulary Workshop Enriched Edition
212 pages
Resume Darshana Sawant
No ratings yet
Resume Darshana Sawant
3 pages
TST Syllabus OC
No ratings yet
TST Syllabus OC
26 pages
Odd One Out Exercises for CM1 English
No ratings yet
Odd One Out Exercises for CM1 English
2 pages
Course Outline - IB Spanish Ab Initio
100% (1)
Course Outline - IB Spanish Ab Initio
2 pages
Refunds and Exchanges British English Teacher B1 B2
No ratings yet
Refunds and Exchanges British English Teacher B1 B2
8 pages
Network Emulators
No ratings yet
Network Emulators
1,133 pages
Drills On Learning Disabilities and Friends
100% (1)
Drills On Learning Disabilities and Friends
12 pages
Contiguous Memory Allocation
No ratings yet
Contiguous Memory Allocation
3 pages
Image Editing for Beginners
No ratings yet
Image Editing for Beginners
4 pages
Isir Butyl Db73-00018e
No ratings yet
Isir Butyl Db73-00018e
1 page
Google Meet Guide
No ratings yet
Google Meet Guide
10 pages
Assimil-English at Work PDF Completo
100% (1)
Assimil-English at Work PDF Completo
198 pages
SSC CHSL Exam English Questions PDF
No ratings yet
SSC CHSL Exam English Questions PDF
76 pages