FRA Extended

The document outlines a project focused on developing a Bankruptcy Prediction Tool using machine learning to assess the bankruptcy risk of US publicly traded corporations. It details the process of exploratory data analysis, data preprocessing, model building, and performance evaluation, ultimately recommending the tuned Random Forest model for its balanced performance metrics. Key predictors of financial distress include market value, total long-term debt, and retained earnings, with actionable insights provided for stakeholders in risk management and strategic planning.


FRA Project

(Extended)
DSBA

By:
E. AuroRajashri

List of Content

1.1 Define the problem and perform Exploratory Data Analysis
1.1.1 Problem definition
1.1.2 Check shape, Data types, statistical summary
1.1.3 Univariate analysis and Bivariate analysis. Key meaningful observations on individual variables and the relationship between variables
1.2 Data Preprocessing
1.2.1 Outlier Treatment
1.2.2 Missing Value Treatment
1.2.3 Data Split
1.2.4 Scaling
1.3 Model Building
1.3.1 Metrics of Choice (Justify the evaluation metrics)
1.3.2 Model Building (KNN, Naive bayes, Bagging, Boosting)
1.4 Model Performance evaluation
1.4.1 Check the confusion matrix and classification metrics for all the models (for both train and test dataset)
1.5 Model Performance improvement
1.5.1 Dealing with multicollinearity using VIF
1.5.2 Identify optimal threshold for Logistic Regression using ROC curve
1.5.3 Model performance check across different metrics
1.6 Model Performance Comparison and Final Model Selection
1.7 Actionable Insights & Recommendations
1.7.1 Key takeaway

List of Figures

Fig 1: Dataset Head rows

Fig 2: Dataset Info
Fig 3: Dataset Statistical Summary
Fig 4: Bankruptcy Analysis
Fig 5: Boxplot of all numeric variables
Fig 6: Distplot of all numeric variables
Fig 7: Heat Map of numeric variables
Fig 8: Count of Outliers
Fig 9: Boxplot – Post outlier treatment
Fig 10: Missing Values
Fig 11: Post Scaling
Fig 12: Logistic Regression
Fig 13: Random Forest Classifier
Fig 14: LR Training set – Confusion Matrix
Fig 15: LR Training set – Report
Fig 16: LR Test set – Confusion Matrix
Fig 17: LR Test set – Report
Fig 18: RF Training set – Confusion Matrix
Fig 19: RF Training set – Report
Fig 20: RF Test set – Confusion Matrix
Fig 21: RF Test set – Report
Fig 22: multicollinearity using VIF
Fig 23: Optimal Threshold using ROC
Fig 24: LR Tuned – Training set
Fig 25: LR Tuned – Training set
Fig 26: LR Tuned – Test set
Fig 27: LR Tuned – Test set
Fig 28: RF Tuned – Training set
Fig 29: RF Tuned – Training set
Fig 30: RF Tuned – Test set
Fig 31: RF Tuned – Test set
Fig 32: Model Performance comparison
Fig 33: Feature Importance Logistic regression coefficients
Fig 34: Feature Importance

Context
Bankruptcy prediction is a crucial component of financial risk management that protects the interests of creditors,
investors, and other stakeholders. Predicting a company's impending bankruptcy can help with timely interventions and
smart decision-making, which can reduce losses and promote stability in the economy. Predictive modeling can benefit
from the abundance of financial data provided by US corporations listed on major exchanges such as the New York Stock
Exchange (NYSE) and NASDAQ, which are subject to regulatory scrutiny and strict financial reporting requirements. A
firm is considered bankrupt, according to the Securities and Exchange Commission (SEC), if it files for bankruptcy under the
Bankruptcy Code's Chapter 11 (reorganization) or Chapter 7 (liquidation) provisions.

Objective
A well-known financial analytics company wants to create a Bankruptcy Prediction Tool to help regulators, investors,
and financial institutions assess the bankruptcy risk of US publicly traded corporations. The program will evaluate past
financial data using cutting-edge machine learning algorithms to find important signs and trends related to bankruptcy.
The following are this tool's main goals:
1. Bankruptcy Risk Assessment: Provide a probabilistic estimate of a company's likelihood of filing for bankruptcy
within a specified time frame (e.g., one year), allowing stakeholders to make informed decisions and take
preventive measures.
2. Early Warning System: Develop an early warning system that flags companies exhibiting financial distress
signals, enabling proactive risk management and strategic planning.
3. Financial Health Analysis: Analyze various financial metrics to offer a comprehensive assessment of a company's
financial health, highlighting areas of concern and potential vulnerabilities.

Data Dictionary
• Company_id: Unique identifier for each company
• Current_assets: Total current assets (in millions)
• Cost_of_goods_sold: Cost of goods sold (in millions)
• Depreciation_and_amortization: Depreciation and amortization expenses (in millions)
• EBITDA: Earnings Before Interest, Taxes, Depreciation, and Amortization (in millions)
• Inventory: Value of inventory (in millions)
• Net_income: Net income (profit or loss) (in millions)
• Total_receivables: Total receivables (in millions)
• Market_value: Market value of the company (in millions)
• Net_sales: Net sales or revenue (in millions)
• Total_assets: Total assets (in millions)
• Total_long_term_debt: Total long-term debt (in millions)
• EBIT: Earnings Before Interest and Taxes (in millions)
• Gross_profit: Gross profit (in millions)
• Total_current_liabilities: Total current liabilities (in millions)
• Retained_earnings: Retained earnings (in millions)
• Total_revenue: Total revenue (in millions)
• Total_liabilities: Total liabilities (in millions)
• Total_operating_expenses: Total operating expenses (in millions)
• Bankrupt: Bankruptcy status (1 = Bankrupt, 0 = Not Bankrupt)

1.1 Define the problem and perform Exploratory
Data Analysis
1.1.1 Problem Definition
• Imported the necessary libraries: NumPy, pandas, Matplotlib, and seaborn.
• Loaded the given dataset into a DataFrame named election.


Fig 1: Dataset Head rows

1.1.2 Check shape, Data types, statistical summary

• The dataset has 1983 rows and 20 columns: 19 columns of integer type and 1 of object type.

Fig 2: Dataset Info

• The statistical summary of the dataset is shown below.

Fig 3: Dataset Statistical Summary

• There are no duplicate rows in the dataset.
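The shape, data type, summary, and duplicate checks described in this subsection can be reproduced with a few pandas calls. The sketch below is illustrative only: the small frame `df` and its values are made up and stand in for the project data.

```python
import pandas as pd

# Toy stand-in for the project data (values are made up for illustration).
df = pd.DataFrame(
    {
        "Company_id": ["C_1", "C_2", "C_3", "C_4"],
        "Net_income": [120, -35, 48, 7],
        "Total_assets": [900, 410, 650, 220],
        "Bankrupt": [0, 1, 0, 0],
    }
)

n_rows, n_cols = df.shape                             # (rows, columns)
dtype_counts = df.dtypes.astype(str).value_counts()   # number of columns per dtype
summary = df.describe()                               # count, mean, std, min, quartiles, max
n_duplicates = int(df.duplicated().sum())             # fully duplicated rows

print(n_rows, n_cols)
print(dtype_counts)
print(n_duplicates)
```

The same four calls applied to the full dataset would yield the figures reported in this section.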

1.1.3 Univariate analysis and Bivariate analysis

• Univariate analysis

Fig 4: Bankruptcy Analysis

Fig 5: Boxplot of all numeric variables

Fig 6: Distplot of all numeric variables
1. The dataset is imbalanced: only about 20.88% of the companies are marked as bankrupt. This imbalance calls for techniques such as oversampling (e.g., SMOTE) or adjusted class weights in the models.
2. Variables such as Current Assets, EBITDA, and Net Income span a wide range of values, including negative numbers, which indicates financial distress in some companies. The boxplots of the numerical features reveal significant outliers, particularly in Net Income, Total Liabilities, and EBIT. These outliers may represent companies under extreme financial distress and are therefore critical for bankruptcy prediction.
3. Observations on individual variables:
Current Assets: Shows a significant spike followed by a decline, indicating fluctuations in liquidity.
Cost of Goods Sold (COGS): Displays a similar pattern, suggesting changes in production costs.
EBITDA: Notable peaks indicate periods of strong operational performance.
Net Income: Reflects profitability trends over time, essential for assessing financial health.
Total Long-Term Debt: Provides insights into the company's leverage and financial obligations.
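To make the class-weight option concrete, the sketch below computes "balanced" weights as n_samples / (n_classes * class_count), which is the heuristic scikit-learn documents for `class_weight="balanced"`. The class counts are an assumption derived from the reported 1983 rows and roughly 20.88% bankrupt share, not figures read from the data.

```python
# Assumed class counts: 1983 rows with about 20.88% bankrupt.
n_total = 1983
n_bankrupt = round(n_total * 0.2088)
n_healthy = n_total - n_bankrupt


def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


weights = balanced_class_weights({0: n_healthy, 1: n_bankrupt})
print(n_bankrupt, weights)
```

Under these assumed counts the minority (bankrupt) class receives the larger weight, so each missed bankruptcy costs more during training.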

• Bivariate analysis

Fig 7: Heat Map of numeric variables

Strong Positive Correlations: Variables like Net Sales and Total Revenue, as well as Gross
Profit and EBITDA, show strong positive correlations, indicating redundancy and potential
for multicollinearity in predictive models.
Weak or Negative Correlations with "Bankrupt": The "Bankrupt" variable has weak or
slightly negative correlations with most financial metrics (e.g., Net Income, EBITDA),
suggesting bankruptcy is influenced by more complex or nonlinear factors.
Cost of Goods Sold (COGS) vs. Net Income: A negative correlation between COGS and Net
Income highlights the expected relationship where higher costs reduce profitability.
Market Value and Net Sales: A strong correlation indicates that a company's sales
performance significantly impacts its market valuation, a key insight for financial analysis.
Multicollinearity Risk: Variables such as Total Revenue, Net Sales, and Gross Profit are
highly correlated, suggesting the need for dimensionality reduction or careful feature
selection in modelling.
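Highly correlated pairs such as the ones named above can also be listed programmatically instead of being read off the heat map. The following is a minimal sketch on synthetic data; the 0.9 cut-off and the generated columns are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
sales = rng.normal(500, 100, size=200)
df = pd.DataFrame(
    {
        "Net_sales": sales,
        "Total_revenue": sales + rng.normal(0, 5, size=200),  # near-duplicate of Net_sales
        "Inventory": rng.normal(80, 20, size=200),             # unrelated column
    }
)

corr = df.corr()   # Pearson correlation matrix
threshold = 0.9    # assumed cut-off for "highly correlated"

# Walk the upper triangle so each pair is reported once.
high_pairs = [
    (a, b, round(float(corr.loc[a, b]), 3))
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
print(high_pairs)
```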

1.2 Data Preprocessing

1.2.1 Outlier treatment
• The count of outliers and the boxplots after outlier treatment are shown below:


Fig 8: Count of Outliers




Fig 9: Boxplot – Post outlier treatment
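The report does not state which outlier treatment was applied. One common choice, assumed here purely for illustration, is to cap values at the boxplot whiskers (Q1 - 1.5*IQR and Q3 + 1.5*IQR). The helper name `cap_outliers_iqr` and the sample values are hypothetical.

```python
import numpy as np


def cap_outliers_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] and report how many fell outside."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    n_outliers = int(((values < lower) | (values > upper)).sum())
    return np.clip(values, lower, upper), n_outliers


net_income = [12, 15, 14, 13, 16, 15, 14, 400, -300]  # made-up values
capped, n_outliers = cap_outliers_iqr(net_income)
print(n_outliers, capped.min(), capped.max())
```

Capping keeps every row in the dataset, which matters here because extreme values may belong to exactly the distressed companies the model needs to learn from.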

1.2.2 Missing Value treatment

The dataset has no missing values, as shown below.

Fig 10: Missing Values

1.2.3 Data Split

• The data was split into training and test sets with a test size of 0.30.
• train_test_split was imported from sklearn.model_selection for this purpose.
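A minimal sketch of such a split is shown below on made-up data. Only the 0.30 test size comes from the report; the `stratify` and `random_state` arguments are assumptions added to keep the class ratio and make the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))        # made-up features
y = np.array([1] * 20 + [0] * 80)    # made-up labels, 20% positive

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.30,    # split size used in the report
    stratify=y,        # assumption: keep the class ratio in both parts
    random_state=42,   # assumption: fixed seed for repeatability
)
print(X_train.shape, X_test.shape, y_train.mean(), y_test.mean())
```

With an imbalanced target, stratifying avoids a test set that happens to contain very few bankrupt companies.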

1.2.4 Scaling
• After applying the standard scaler, the head of the dataset is shown below.

Fig 11: Post Scaling
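Standard scaling subtracts each column's mean and divides by its standard deviation. The sketch below uses made-up numbers and assumes the scaler is fitted on the training set only and then reused on the test set; the report does not say how the scaler was fitted.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[100.0, 1.0], [200.0, 3.0], [300.0, 5.0]])  # made-up values
X_test = np.array([[200.0, 2.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from train only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
print(X_test_scaled)
```

Fitting on the training data alone keeps information about the test set out of the preprocessing step.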

1.3 Model Building
1.3.1 Metrics of choice
1) Logistic regression

Fig 12: Logistic Regression

2) Random Forest Classifier





Fig 13: Random Forest Classifier
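For orientation, fitting the two model types could look like the sketch below. It runs on synthetic data from `make_classification`, and all hyperparameters shown are assumptions; the report's actual settings appear only in the figures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the scaled financial features.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

log_reg = LogisticRegression(max_iter=1000)                             # assumed settings
rand_forest = RandomForestClassifier(n_estimators=100, random_state=0)  # assumed settings

for name, model in [("Logistic Regression", log_reg), ("Random Forest", rand_forest)]:
    model.fit(X_train, y_train)
    print(name, "train:", model.score(X_train, y_train), "test:", model.score(X_test, y_test))
```

An unconstrained random forest typically scores close to 1.0 on its own training data, so the train and test scores need to be compared before drawing conclusions.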







1.4 Model Performance evaluation
Logistic Regression Model - Training Performance

Fig 14: LR Training set – Confusion Matrix

Fig 15: LR Training set – Report

Logistic Regression Model - Test Performance

Fig 16: LR Test set – Confusion Matrix

Fig 17: LR Test set – Report

Random Forest Model - Training Performance



Fig 18: RF Training set – Confusion Matrix



Fig 19: RF Training set – Report


Random Forest Model - Test Performance



Fig 20: RF Test set – Confusion Matrix



Fig 21: RF Test set – Report
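The figures in the classification reports follow directly from the four confusion-matrix counts. The sketch below shows the arithmetic with made-up counts; they are not the results of any model in this report, and the function name is hypothetical.

```python
def classification_metrics(tn, fp, fn, tp):
    """Derive accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for illustration (positive class = bankrupt).
metrics = classification_metrics(tn=450, fp=21, fn=74, tp=50)
print(metrics)
```

In this made-up case accuracy is high while recall for the bankrupt class is low, which shows why accuracy alone can mislead on imbalanced data.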



1.5 Model Performance Improvement
1.5.1 Dealing with multicollinearity using VIF


Fig 22: multicollinearity using VIF
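The variance inflation factor of a feature is 1 / (1 - R^2), where R^2 comes from regressing that feature on all the others. The sketch below implements this definition with NumPy on synthetic columns; it is an illustration of the formula and not the code behind Fig 22. A VIF above roughly 5 to 10 is a common rule of thumb for problematic collinearity.

```python
import numpy as np


def vif(X):
    """Return the VIF of each column: 1 / (1 - R^2) from regressing it on the others."""
    X = np.asarray(X, dtype=float)
    scores = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(len(X)), others])  # add an intercept
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        r2 = 1.0 - residuals.var() / target.var()
        scores.append(1.0 / (1.0 - r2))
    return np.array(scores)


rng = np.random.default_rng(1)
net_sales = rng.normal(500, 100, size=300)
total_revenue = net_sales + rng.normal(0, 10, size=300)  # nearly the same signal
inventory = rng.normal(80, 20, size=300)                 # independent column

scores = vif(np.column_stack([net_sales, total_revenue, inventory]))
print(scores)
```

In this synthetic example the two near-duplicate columns get large VIF values, while the independent column stays close to 1.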




1.5.2 Identifying optimal threshold using ROC curve


Fig 23: Optimal Threshold using ROC
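The report does not name the criterion used to pick the threshold. A frequently used one, assumed here, is Youden's J statistic: choose the threshold that maximizes TPR minus FPR along the ROC curve. The labels and probabilities below are made up.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Made-up labels and predicted bankruptcy probabilities.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 1])
y_prob = np.array([0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.60, 0.80])

fpr, tpr, thresholds = roc_curve(y_true, y_prob)
best = int(np.argmax(tpr - fpr))              # Youden's J = TPR - FPR (assumed criterion)
optimal_threshold = float(thresholds[best])

# Classify with the tuned threshold instead of the default 0.5.
y_pred = (y_prob >= optimal_threshold).astype(int)
print(optimal_threshold, y_pred)
```

A threshold below 0.5 flags more companies as bankrupt, which raises recall at the cost of precision.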


1.5.3 Model performance check across different metrics
Logistic Regression Performance - Training Set



Fig 24: LR Tuned – Training set



Fig 25: LR Tuned – Training set

Logistic Regression Performance - Test Set



Fig 26: LR Tuned – Test set



Fig 27: LR Tuned – Test set




Random Forest Performance - Train Set



Fig 28: RF Tuned – Training set




Fig 29: RF Tuned – Training set


Random Forest Performance - Test Set



Fig 30: RF Tuned – Test set





Fig 31: RF Tuned – Test set


1.6 Model Performance Comparison and Final Model Selection


Fig 32: Model Performance comparison

Key Metrics:
1. Recall:
o High recall is critical because false negatives (missed bankruptcies) can have severe
consequences.
o The tuned logistic regression and tuned random forest models have significantly higher
recall values compared to others.
2. Precision:
o Precision indicates how often predicted bankruptcies are correct. A balance between
precision and recall is essential to avoid unnecessary alarms.
o Tuned logistic regression has a lower precision compared to tuned random forest.
3. F1 Score:
o This metric balances precision and recall and is often a good indicator for imbalanced
datasets.
o Tuned random forest has a better F1 score compared to tuned logistic regression,
especially on testing data.
Observations:
• Random Forest (untuned): Although it achieves perfect accuracy, recall, precision, and F1 on training data, its performance on testing data suggests severe overfitting.
• Tuned Random Forest: Offers a good trade-off between recall, precision, and F1 on both training and testing datasets.
• Tuned Logistic Regression: Achieves high recall but suffers from relatively low precision and F1 scores.
Recommendation:
• Tuned Random Forest is the better choice based on its relatively high recall, balanced precision, and a stronger F1 score on the testing set. It effectively balances the risk of missed bankruptcies and false alarms compared to other models.



Fig 33: Feature Importance Logistic regression coefficients


Fig 34: Feature Importance 

Fig 33 highlights a few features with their respective importance values (logistic regression coefficients):
1. Market_value: Most significant feature with the highest importance.
2. Total_long_term_debt: Second most important feature.
3. Total_receivables, Total_operating_expenses, Retained_earnings, Inventory, and Net_income are ranked progressively lower.
Fig 34 provides a broader overview of feature importance:
1. Market_value, Total_long_term_debt, and Retained_earnings have the highest importance.
2. Other features like Net_income, Total_liabilities, Total_receivables, and more contribute relatively less but are still considered.
3. The order of features suggests a wider, more holistic feature comparison, with normalized relative importance values.
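Normalized importance values of this kind can be read from a fitted random forest through its `feature_importances_` attribute. The sketch below uses synthetic data, and the column names are borrowed from the data dictionary only as labels, so the resulting ranking says nothing about the real features.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Labels borrowed from the data dictionary; the data itself is synthetic.
feature_names = ["Market_value", "Total_long_term_debt", "Retained_earnings", "Inventory"]
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=2, n_redundant=0,
    shuffle=False, random_state=0,
)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = (
    pd.Series(model.feature_importances_, index=feature_names)
    .sort_values(ascending=False)
)
print(importances)
```

Logistic regression coefficients, by contrast, are signed and comparable across features only when the features were scaled beforehand.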

1.7 Actionable Insights and Recommendations
Business Insights
1. Key Predictors of Financial Distress:
o Market Value: Companies with declining market value are at higher bankruptcy risk. Market
value reflects investor confidence and financial health.
o Total Long-Term Debt: High levels of long-term debt indicate a burdened financial
structure, increasing default risk.
o Retained Earnings: Low or negative retained earnings highlight long-term
underperformance, raising concerns about the firm's ability to sustain operations.
2. Sector-Wide Observations:
o The dataset reveals significant outliers in metrics like Net Income and EBIT, suggesting
specific industries or companies may face extreme financial stress.
o Imbalance in bankrupt versus non-bankrupt companies highlights that bankruptcy is
relatively rare but impactful, requiring precise identification.
3. Strategic Indicators:
o Variables like Gross Profit and EBITDA correlate strongly with revenue, emphasizing
operational efficiency as a critical survival factor.
o Financial metrics such as Total Liabilities and Cost of Goods Sold (COGS) significantly
influence profitability and distress signals.
4. Early Warning Signals:
o Companies with low EBITDA, declining profitability, and increasing liabilities are likely to
move toward financial distress. These signals can trigger preventive measures.

Business Recommendations
1. For Financial Institutions and Investors:
• Risk Management: Use the model to monitor high-risk companies and adjust credit exposure or investment strategies proactively.
• Portfolio Diversification: Reduce concentration in sectors or companies showing consistent distress signals (e.g., high debt-to-equity ratios or falling market values).
• Early Interventions: Offer restructuring plans or debt renegotiation options for at-risk clients flagged by the model.
2. For Regulators:
• Strengthen Monitoring Systems: Leverage the early warning system to identify firms requiring closer regulatory scrutiny, reducing systemic risks in financial markets.
• Encourage Transparency: Promote accurate and timely financial disclosures to enhance predictive accuracy and market stability.
3. For Companies:
• Debt Management: Reduce high levels of long-term debt through refinancing or equity funding to improve financial stability.
• Operational Efficiency: Focus on improving EBITDA and reducing COGS to enhance profitability.
• Liquidity Management: Prioritize maintaining healthy liquidity ratios (e.g., current assets to current liabilities) to address short-term obligations effectively.
4. Enhance Predictive Monitoring Tools:
• Develop dashboards using the prediction tool to provide real-time bankruptcy risk insights for stakeholders.
• Offer financial health scorecards to benchmark companies against industry peers, encouraging self-assessment and improvement.
5. Adopt Strategic Partnerships:
• Collaborate with consulting firms to help distressed companies restructure operations, improve cash flow, and regain profitability.
• Build alliances with insurance providers to design bankruptcy protection products for high-risk clients.
6. Crisis Preparedness:
• Establish contingency plans for high-risk scenarios, including workforce management, asset divestment, and creditor negotiations.
• Develop a proactive communication strategy to reassure stakeholders during times of financial distress.
