4/3/25, 2:14 AM diabetesLogisticRegression.ipynb - Colab
import pandas as pd
# Assumes the data file is named 'diabetes.csv' and is in the current directory;
# replace the name or path if yours is different.
try:
    df = pd.read_csv('diabetes.csv')
except FileNotFoundError:
    print("Error: 'diabetes.csv' not found. Please upload the file or provide the correct path.")
except pd.errors.ParserError:
    print("Error: Could not parse the CSV file. Please check its format.")
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  DiabetesPedigreeFunction  Age  Outcome
1            1       85             66             29        0  26.6                     0.351   31        0
3            1       89             66             23       94  28.1                     0.167   21        0
6            3       78             50             32       88  31.0                     0.248   26        1
df.tail(10)
https://colab.research.google.com/drive/1sFPGSs1MHNRsQl-nBT-IG-pm5vsLLP1D#scrollTo=W-eyPJRTKema&printMode=true 1/8 14
df.describe()
       Pregnancies     Glucose  BloodPressure  SkinThickness     Insulin         BMI  DiabetesPedigreeFunction         Age     Outcome
count   768.000000  768.000000     768.000000     768.000000  768.000000  768.000000                768.000000  768.000000  768.000000
mean      3.845052  120.894531      69.105469      20.536458   79.799479   31.992578                  0.471876   33.240885    0.348958
std       3.369578   31.972618      19.355807      15.952218  115.244002    7.884160                  0.331329   11.760232    0.476951
min       0.000000    0.000000       0.000000       0.000000    0.000000    0.000000                  0.078000   21.000000    0.000000
25%       1.000000   99.000000      62.000000       0.000000    0.000000   27.300000                  0.243750   24.000000    0.000000
50%       3.000000  117.000000      72.000000      23.000000   30.500000   32.000000                  0.372500   29.000000    0.000000
75%       6.000000  140.250000      80.000000      32.000000  127.250000   36.600000                  0.626250   41.000000    1.000000
max      17.000000  199.000000     122.000000      99.000000  846.000000   67.100000                  2.420000   81.000000    1.000000
df.shape
(768, 9)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Pregnancies               768 non-null    int64
 1   Glucose                   768 non-null    int64
 2   BloodPressure             768 non-null    int64
 3   SkinThickness             768 non-null    int64
 4   Insulin                   768 non-null    int64
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64
 8   Outcome                   768 non-null    int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
df.eq(0).sum()
Pregnancies 111
Glucose 5
BloodPressure 35
SkinThickness 227
Insulin 374
BMI 11
DiabetesPedigreeFunction 0
Age 0
Outcome 500
dtype: int64
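The zero counts above suggest that in Glucose, BloodPressure, SkinThickness, Insulin and BMI a value of 0 is physiologically implausible and more likely stands for a missing measurement (Pregnancies and Outcome can legitimately be 0). The notebook does not treat these zeros; the sketch below shows one common option, assuming you want to mark them as missing and fill them with the column median. The column list, the median strategy, the helper name `impute_zero_as_missing` and the toy frame are all assumptions for illustration.

```python
import pandas as pd

# Assumption: in these columns a 0 means "not measured" rather than a real reading.
ZERO_AS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")


def impute_zero_as_missing(frame: pd.DataFrame, columns=ZERO_AS_MISSING) -> pd.DataFrame:
    """Return a copy in which 0 in `columns` is replaced by the median of the non-zero values."""
    out = frame.copy()
    for col in columns:
        out[col] = out[col].mask(out[col].eq(0))       # 0 -> NaN (column becomes float)
        out[col] = out[col].fillna(out[col].median())  # median() skips NaN by default
    return out


# Toy stand-in for df (values are illustrative, not a real slice of the dataset).
toy = pd.DataFrame({
    "Pregnancies": [0, 1, 3],
    "Glucose": [85, 0, 89],
    "BloodPressure": [66, 64, 0],
    "SkinThickness": [29, 0, 23],
    "Insulin": [0, 0, 94],
    "BMI": [26.6, 0.0, 28.1],
    "Outcome": [0, 1, 0],
})

clean = impute_zero_as_missing(toy)
print(clean.eq(0).sum())  # zeros remain only in Pregnancies and Outcome
```

Applied to the real data this would be `df = impute_zero_as_missing(df)`; whether median imputation is appropriate for a column with as many zeros as Insulin (374 of 768) is a modelling decision worth checking separately.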
df.head(10)
# prompt: get categorical columns of the df and visualise them
Index([], dtype='object')
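The code cell behind this empty result is not shown in the printout. One common way to produce it, sketched here as an assumption, is `DataFrame.select_dtypes`: since `df.info()` reports only `int64` and `float64` columns, selecting object or category columns returns an empty Index. `Outcome` is a 0/1 integer, so it has to be treated as categorical explicitly, for example by counting its values. The toy frame below is illustrative.

```python
import pandas as pd

# Toy all-numeric frame mirroring the dtypes reported by df.info() (values illustrative).
toy = pd.DataFrame({
    "Glucose": [85, 89, 78],
    "BMI": [26.6, 28.1, 31.0],
    "Outcome": [0, 0, 1],
})

# Object/category columns only; an all-numeric frame yields an empty Index.
categorical_cols = toy.select_dtypes(include=["object", "category"]).columns
print(categorical_cols)

# The 0/1 target can still be summarised as a category, e.g. as input to a bar chart.
outcome_counts = toy["Outcome"].value_counts().sort_index()
print(outcome_counts)
```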
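import matplotlib.pyplot as plt
import seaborn as sns

# Correlation heatmap of all (numeric) columns, annotated to two decimals
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(), annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Feature Correlation Heatmap")
plt.show()
print()
print()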
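The heatmap shows every pairwise correlation; when the question is which features move with the target, the `Outcome` column of the same matrix can be read off and ranked directly. A minimal sketch on synthetic data, where the feature names `signal` and `noise` are hypothetical:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
outcome = rng.integers(0, 2, size=200)

toy = pd.DataFrame({
    "signal": outcome + rng.normal(0, 0.1, size=200),  # closely tracks the target
    "noise": rng.normal(0, 1, size=200),               # unrelated to the target
    "Outcome": outcome,
})

# Pearson correlation of each feature with the target, strongest positive first.
target_corr = toy.corr()["Outcome"].drop("Outcome").sort_values(ascending=False)
print(target_corr)
```

On the notebook's data the equivalent one-liner would be `df.corr()["Outcome"].drop("Outcome").sort_values(ascending=False)`.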
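import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LassoCV
# Apply LassoCV (x_train / y_train come from a train/test split cell not shown in this printout)
lassocv = LassoCV(cv=10, max_iter=100000)
lassocv.fit(x_train, y_train)
# Assumption: the cell defining feature_importance is missing; coefficients indexed by feature name fit the plot below
feature_importance = pd.Series(lassocv.coef_, index=x_train.columns).sort_values(ascending=False)
plt.figure(figsize=(10, 5))
# Assigning y to hue with legend=False avoids seaborn's "palette without hue" deprecation warning
sns.barplot(x=feature_importance.values, y=feature_importance.index,
            hue=feature_importance.index, palette="viridis", legend=False)
plt.xlabel("Importance")
plt.ylabel("Feature")
plt.title("Feature Importance from Lasso Regression")
plt.show()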
Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend
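Lasso's L1 penalty drives some coefficients exactly to zero, which is why its coefficients are often read as a rough feature-importance ranking. The sketch below illustrates that idea on synthetic data from `make_regression` (a stand-in for the diabetes features), assuming the features are standardized first so coefficient magnitudes are comparable, and ranking by absolute coefficient. Both the scaling step and the absolute-value ranking are assumptions; the notebook's own `feature_importance` definition is not visible.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 8 features, only 3 of which carry signal.
X, y = make_regression(n_samples=300, n_features=8, n_informative=3,
                       noise=5.0, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])

# Scale, then let 10-fold cross-validation choose the L1 penalty strength (alpha).
model = make_pipeline(StandardScaler(), LassoCV(cv=10, max_iter=100_000))
model.fit(X, y)

lasso = model.named_steps["lassocv"]
coefs = pd.Series(lasso.coef_, index=X.columns)
feature_importance = coefs.abs().sort_values(ascending=False)

print(f"chosen alpha: {lasso.alpha_:.4f}")
print(feature_importance)
```

Keeping the scaler inside the pipeline means cross-validation and later predictions reuse the same scaling, rather than scaling the full dataset up front.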
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Glucose', y='BMI', hue='Outcome', data=df, palette='Set2')
plt.title('Glucose vs. BMI by Outcome')
plt.show()
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
LogisticRegression()
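The warning above most likely comes from the default `lbfgs` solver reaching its iteration cap (`max_iter=100` by default) before converging, and the message itself suggests scaling the data or raising `max_iter`. A minimal sketch doing both inside a pipeline, so the scaler is fit on the training split only. Synthetic data from `make_classification` stands in for the diabetes features, and `max_iter=1000` and the split parameters are arbitrary choices, not the notebook's.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 768 rows, 8 features, roughly 65/35 class balance.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           weights=[0.65, 0.35], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Standardize inside the pipeline so test data is scaled with training statistics.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

with warnings.catch_warnings():
    warnings.simplefilter("error", ConvergenceWarning)  # raise instead of warn if it still fails
    clf.fit(X_train, y_train)

print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```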
Accuracy: 0.75
Confusion Matrix:
[[78 21]
[18 37]]
Classification Report:
precision recall f1-score support
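The body of the classification report did not survive in this printout, but its per-class figures follow from the confusion matrix. Assuming scikit-learn's layout (rows are true labels, columns are predictions, class 0 first), the matrix gives TN = 78, FP = 21, FN = 18 and TP = 37 over 154 test rows, with support 99 for class 0 and 55 for class 1. Accuracy is then (78 + 37) / 154 ≈ 0.747, consistent with the reported 0.75. A small check of the remaining metrics:

```python
import numpy as np

# Confusion matrix as printed above: rows = true class, columns = predicted class.
cm = np.array([[78, 21],
               [18, 37]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision_1 = tp / (tp + fp)  # of patients predicted diabetic, fraction truly diabetic
recall_1 = tp / (tp + fn)     # of truly diabetic patients, fraction the model caught
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)

print(f"accuracy {accuracy:.3f}")
print(f"class 1: precision {precision_1:.3f}, recall {recall_1:.3f}, f1 {f1_1:.3f}")
print(f"class 0: precision {precision_0:.3f}, recall {recall_0:.3f}")
```

The class-1 recall (37 of 55 diabetic patients found) is the number to watch in a screening setting, since each false negative is a missed case.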
Naive Bayes