FRA Extended
DSBA
By:
E. AuroRajashri
Context
Bankruptcy prediction is a crucial component of financial risk management that protects the interests of creditors,
investors, and other stakeholders. Predicting a company's impending bankruptcy can help with timely interventions and
smart decision-making, which can reduce losses and promote stability in the economy. Predictive modeling can benefit
from the abundance of financial data provided by US corporations listed on major exchanges such as the New York Stock
Exchange (NYSE) and NASDAQ, which are subject to regulatory scrutiny and strict financial reporting requirements. A
firm is considered bankrupt, according to the Securities and Exchange Commission (SEC), if it files for bankruptcy under the
Bankruptcy Code's Chapter 11 (reorganization) or Chapter 7 (liquidation) provisions.
Objective
A well-known financial analytics company wants to create a Bankruptcy Prediction Tool to help regulators, investors,
and financial institutions assess the bankruptcy risk of US publicly traded corporations. The program will evaluate past
financial data using cutting-edge machine learning algorithms to find important signs and trends related to bankruptcy.
The following are this tool's main goals:
1. Bankruptcy Risk Assessment: Provide a probabilistic estimate of a company's likelihood of filing for bankruptcy
within a specified time frame (e.g., one year), allowing stakeholders to make informed decisions and take
preventive measures.
2. Early Warning System: Develop an early warning system that flags companies exhibiting financial distress
signals, enabling proactive risk management and strategic planning.
3. Financial Health Analysis: Analyze various financial metrics to offer a comprehensive assessment of a company's
financial health, highlighting areas of concern and potential vulnerabilities.
Data Dictionary
Company_id: Unique identifier for each company
Current_assets: Total current assets (in millions)
Cost_of_goods_sold: Cost of goods sold (in millions)
Depreciation_and_amortization: Depreciation and amortization expenses (in millions)
EBITDA: Earnings Before Interest, Taxes, Depreciation, and Amortization (in millions)
Inventory: Value of inventory (in millions)
Net_income: Net income (profit or loss) (in millions)
Total_receivables: Total receivables (in millions)
Market_value: Market value of the company (in millions)
Net_sales: Net sales or revenue (in millions)
Total_assets: Total assets (in millions)
Total_long_term_debt: Total long-term debt (in millions)
EBIT: Earnings Before Interest and Taxes (in millions)
Gross_profit: Gross profit (in millions)
Total_current_liabilities: Total current liabilities (in millions)
Retained_earnings: Retained earnings (in millions)
Total_revenue: Total revenue (in millions)
Total_liabilities: Total liabilities (in millions)
Total_operating_expenses: Total operating expenses (in millions)
Bankrupt: Bankruptcy status (1 = Bankrupt, 0 = Not Bankrupt)
1.1 Define the problem and perform Exploratory Data Analysis
1.1.1 Problem Definition
Imported the necessary libraries: NumPy, pandas, Matplotlib, and seaborn.
Loaded the given dataset into a DataFrame named election.
Fig 1: Dataset Head rows
There are no duplicates in the dataset.
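A duplicate check of this kind can be sketched as follows. The small DataFrame is made up for illustration (only the column names come from the data dictionary), so it contains one deliberate duplicate; on the actual dataset the count would be expected to be 0.

```python
import pandas as pd

# Hypothetical toy data; column names follow the data dictionary.
df = pd.DataFrame({
    "Company_id": [1, 2, 2, 3],
    "Net_income": [10.5, -3.2, -3.2, 7.0],
    "Bankrupt": [0, 1, 1, 0],
})

# duplicated() marks every row that repeats an earlier, fully identical row.
n_duplicates = int(df.duplicated().sum())
print(f"Duplicate rows: {n_duplicates}")

# If any were found, drop_duplicates() keeps the first occurrence of each.
df_clean = df.drop_duplicates().reset_index(drop=True)
print(df_clean)
```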
Fig 6: Distplot of all numeric variables
1. The dataset is imbalanced, with significantly fewer companies marked as bankrupt
(approximately 20.88%). This imbalance highlights the need for using techniques like
oversampling (e.g., SMOTE) or adjusting class weights in models to handle imbalance
effectively.
2. Variables such as Current Assets, EBITDA, and Net Income show a wide range of values,
including negative numbers, indicating financial distress in some companies.
3. Boxplots for numerical features revealed significant outliers, particularly in financial metrics
like Net Income, Total Liabilities, and EBIT. These outliers may represent companies under
extreme financial distress, critical for bankruptcy prediction.
4. Observations on individual variables:
o Current Assets: Shows a significant spike followed by a decline, indicating fluctuations
in liquidity.
o Cost of Goods Sold (COGS): Displays a similar pattern, suggesting changes in production
costs.
o EBITDA: Notable peaks indicate periods of strong operational performance.
o Net Income: Reflects profitability trends over time, essential for assessing financial health.
o Total Long-Term Debt: Provides insights into the company's leverage and financial
obligations.
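The class imbalance noted above, and the class-weight adjustment it calls for, can be illustrated with a short sketch. The labels are hypothetical (roughly 20% positives, close to the reported share), and the weighting formula is the common "balanced" heuristic; the report does not state which weighting or resampling scheme was actually used.

```python
from collections import Counter

# Hypothetical labels: 1 = Bankrupt, 0 = Not Bankrupt (20% positives).
labels = [1] * 20 + [0] * 80

counts = Counter(labels)
n_samples = len(labels)
n_classes = len(counts)

# Share of each class in the data.
shares = {cls: cnt / n_samples for cls, cnt in counts.items()}

# "Balanced" weights: n_samples / (n_classes * class_count).
# scikit-learn applies the same heuristic for class_weight="balanced".
class_weights = {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

print("Class shares:", shares)
print("Class weights:", class_weights)
```

With these weights, each misclassified bankrupt company counts four times as much as a misclassified non-bankrupt one, which offsets the 1:4 class ratio.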
Bivariate analysis
Strong Positive Correlations: Variables like Net Sales and Total Revenue, as well as Gross
Profit and EBITDA, show strong positive correlations, indicating redundancy and potential
for multicollinearity in predictive models.
Weak or Negative Correlations with "Bankrupt": The "Bankrupt" variable has weak or
slightly negative correlations with most financial metrics (e.g., Net Income, EBITDA),
suggesting bankruptcy is influenced by more complex or nonlinear factors.
Cost of Goods Sold (COGS) vs. Net Income: A negative correlation between COGS and Net
Income highlights the expected relationship where higher costs reduce profitability.
Market Value and Net Sales: A strong correlation indicates that a company's sales
performance significantly impacts its market valuation, a key insight for financial analysis.
Multicollinearity Risk: Variables such as Total Revenue, Net Sales, and Gross Profit are
highly correlated, suggesting the need for dimensionality reduction or careful feature
selection in modelling.
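One way to list the highly correlated pairs behind these observations is sketched below. The data is synthetic and built so that Net_sales and Total_revenue are nearly identical; the 0.9 cut-off is an assumed value, not one taken from the analysis.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
net_sales = rng.normal(100, 20, n)
df = pd.DataFrame({
    "Net_sales": net_sales,
    "Total_revenue": net_sales + rng.normal(0, 1, n),  # near-duplicate by construction
    "Inventory": rng.normal(30, 5, n),                 # unrelated by construction
})

corr = df.corr().abs()
threshold = 0.9  # assumed cut-off for "highly correlated"

# Keep only the upper triangle so each pair is reported once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
high_pairs = [
    (row, col, round(float(upper.loc[row, col]), 3))
    for row in upper.index
    for col in upper.columns
    if pd.notna(upper.loc[row, col]) and upper.loc[row, col] > threshold
]
print(high_pairs)
```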
Fig 8: Count of Outliers
Fig 9: Boxplot – Post outlier treatment
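The report does not say how outliers were counted or treated. A common approach, assumed here purely for illustration, is the 1.5 × IQR rule with capping at the whisker limits. The values below are hypothetical.

```python
import pandas as pd

# Hypothetical values (in millions); the last one is an obvious extreme.
df = pd.DataFrame({"Net_income": [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 250.0]})

def iqr_bounds(series, k=1.5):
    """Return the lower and upper whisker limits for a numeric series."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

lower, upper = iqr_bounds(df["Net_income"])

# Count the values that fall outside the whiskers.
outlier_mask = (df["Net_income"] < lower) | (df["Net_income"] > upper)
n_outliers = int(outlier_mask.sum())

# Cap the extremes instead of dropping rows, so no company is removed.
df["Net_income_capped"] = df["Net_income"].clip(lower=lower, upper=upper)
print(n_outliers, df["Net_income_capped"].max())
```

Capping keeps every company in the sample, which matters here because the extreme values may belong to the distressed firms the model is meant to detect.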
1.2.4 Scaling
After applying the standard scaler, the head of the dataset is shown below.
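A minimal sketch of standard scaling with scikit-learn follows. The arrays are made up, and fitting the scaler on the training split only is an assumption about good practice, not a statement of how the report's pipeline was ordered.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature values (two columns, in millions).
X_train = np.array([[100.0, 5.0], [200.0, 15.0], [300.0, 25.0]])
X_test = np.array([[250.0, 10.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training data
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print(X_train_scaled.mean(axis=0))  # approximately 0 for each column
print(X_train_scaled.std(axis=0))   # approximately 1 for each column
```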
1.3 Model Building
1.3.1 Metrics of choice
1) Logistic regression
Fig 13: Random Forest Classifier
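The two baseline models can be sketched as below. The data is a synthetic stand-in generated with make_classification (about 20% positives), and the split ratio and hyperparameters are assumptions, since the report's settings are not given in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the scaled financial features.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=42,
)

# Stratify so both splits keep the same share of bankrupt companies.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

print("LR test accuracy:", log_reg.score(X_test, y_test))
print("RF test accuracy:", rf.score(X_test, y_test))
```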
1.4 Model Performance evaluation
Logistic Regression Model - Training Performance
Random Forest Model - Training Performance
Fig 18: RF Training set – Confusion Matrix
Fig 19: RF Training set – Report
Random Forest Model - Test Performance
Fig 20: RF Test set – Confusion Matrix
Fig 21: RF Test set – Report
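Confusion matrices and classification reports like those in the figures above can be produced as sketched here. The data and model settings are synthetic assumptions; only the class names come from the data dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=8, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

matrices = {}
for name, X_part, y_part in [("train", X_train, y_train), ("test", X_test, y_test)]:
    y_pred = model.predict(X_part)
    # Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]].
    matrices[name] = confusion_matrix(y_part, y_pred)
    print(name, "confusion matrix:\n", matrices[name])
    print(classification_report(y_part, y_pred, target_names=["Not Bankrupt", "Bankrupt"]))
```

Comparing the training and test reports side by side is what exposes overfitting: a model that is near perfect on the training split but clearly weaker on the test split has memorized the training data.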
1.5 Model Performance Improvement
1.5.1 Dealing with multicollinearity using VIF
Fig 22: multicollinearity using VIF
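The variance inflation factor for a feature is 1 / (1 − R²), where R² comes from regressing that feature on all the others. The sketch below computes it with NumPy alone on synthetic data; the helper name vif is hypothetical. Values above roughly 5 or 10 are often treated as a warning sign, though the cut-off used in this analysis is not stated.

```python
import numpy as np

def vif(X):
    """VIF per column: 1 / (1 - R^2) from regressing it on the other columns."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(X)), others])  # add an intercept term
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - resid.var() / target.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
net_sales = rng.normal(100, 20, 300)
total_revenue = net_sales + rng.normal(0, 1, 300)  # near-duplicate column
inventory = rng.normal(30, 5, 300)                 # independent column

values = vif(np.column_stack([net_sales, total_revenue, inventory]))
print(dict(zip(["Net_sales", "Total_revenue", "Inventory"], values.round(1))))
```

In this toy setup the two near-duplicate columns receive very large VIFs while the independent column stays close to 1, which is the pattern that motivates dropping one of each redundant pair.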
1.5.2 Identifying optimal threshold using ROC curve
Fig 23: Optimal Threshold using ROC
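A common way to pick a threshold from the ROC curve is to maximize Youden's J statistic (TPR − FPR). The report does not name its criterion, so this choice is an assumption, and the data below is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=7
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
train_scores = model.predict_proba(X_train)[:, 1]

# Choose the threshold on the training data only.
fpr, tpr, thresholds = roc_curve(y_train, train_scores)
best = int(np.argmax(tpr - fpr))  # index that maximizes Youden's J
optimal_threshold = float(thresholds[best])
print("Optimal threshold:", round(optimal_threshold, 3))

# Apply the chosen threshold, instead of the default 0.5, to the test set.
y_pred_test = (model.predict_proba(X_test)[:, 1] >= optimal_threshold).astype(int)
```

On imbalanced data the selected threshold is often below 0.5, which raises recall for the bankrupt class at the cost of more false alarms.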
1.5.3 Model performance check across different metrics
Logistic Regression Performance - Training Set
Fig 24: LR Tuned – Training set
Fig 25: LR Tuned – Training set
Logistic Regression Performance - Test Set
Fig 26: LR Tuned – Test set
Fig 27: LR Tuned – Test set
Random Forest Performance - Train Set
Fig 28: RF Tuned – Training set
Fig 29: RF Tuned – Training set
Random Forest Performance - Test Set
Fig 30: RF Tuned – Test set
Fig 31: RF Tuned – Test set
1.6 Model Performance Comparison and Final Model Selection
Fig 32: Model Performance comparison
Key Metrics:
1. Recall:
o High recall is critical because false negatives (missed bankruptcies) can have severe
consequences.
o The tuned logistic regression and tuned random forest models have significantly higher
recall values compared to others.
2. Precision:
o Precision indicates how often predicted bankruptcies are correct. A balance between
precision and recall is essential to avoid unnecessary alarms.
o Tuned logistic regression has a lower precision compared to tuned random forest.
3. F1 Score:
o This metric balances precision and recall and is often a good indicator for imbalanced
datasets.
o Tuned random forest has a better F1 score compared to tuned logistic regression,
especially on testing data.
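The three metrics above follow directly from the confusion-matrix counts. The sketch below uses hypothetical counts, not the report's results, to show how a high-recall model can still end up with a lower F1 score than a more balanced one.

```python
def classification_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical model A: catches most bankruptcies but raises many false alarms.
high_recall = classification_metrics(tp=90, fp=110, fn=10)

# Hypothetical model B: misses more bankruptcies but is right more often when it flags one.
balanced = classification_metrics(tp=70, fp=30, fn=30)

print("High recall model:", high_recall)
print("Balanced model:   ", balanced)
```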
Observations:
Random Forest (untuned): Although it achieves perfect accuracy, recall, precision, and F1
on training data, its performance on testing data suggests severe overfitting.
Tuned Random Forest: Offers a good trade-off between recall, precision, and F1 on both
training and testing datasets.
Tuned Logistic Regression: Achieves high recall but suffers from relatively low precision
and F1 scores.
Recommendation:
Tuned Random Forest is the better choice based on its relatively high recall, balanced
precision, and a stronger F1 score on the testing set. It effectively balances the risk of missed
bankruptcies and false alarms compared to other models.
Fig 33: Feature Importance Logistic regression coefficients
Fig 34: Feature Importance
The first chart (Fig 33) highlights a few features with their respective importance values (logistic regression
coefficients):
1. Market_value: Most significant feature with the highest importance.
2. Total_long_term_debt: Second most important feature.
3. Total_receivables, Total_operating_expenses, Retained_earnings, Inventory, and
Net_income are ranked progressively lower.
The second chart (Fig 34) provides a broader overview of feature importance:
1. Market_value, Total_long_term_debt, and Retained_earnings have the highest
importance.
2. Other features like Net_income, Total_liabilities, Total_receivables, and more
contribute relatively less but are still considered.
3. The order of features suggests a wider, more holistic feature comparison, with normalized
relative importance values.
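Rankings like these can be extracted as sketched below. The first ranking uses absolute logistic regression coefficients, as named in the text; the second assumes a tree-based importance (random forest), which the report does not confirm for Fig 34. The data is synthetic, and the feature names are borrowed from the data dictionary only for labelling.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

feature_names = ["Market_value", "Total_long_term_debt", "Retained_earnings", "Net_income"]
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=3
)

log_reg = LogisticRegression(max_iter=1000).fit(X, y)
rf = RandomForestClassifier(n_estimators=100, random_state=3).fit(X, y)

# Absolute coefficients are comparable only when features share a common scale.
coef_importance = pd.Series(abs(log_reg.coef_[0]), index=feature_names).sort_values(ascending=False)

# Impurity-based importances are normalized to sum to 1.
rf_importance = pd.Series(rf.feature_importances_, index=feature_names).sort_values(ascending=False)

print(coef_importance)
print(rf_importance)
```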
1.7 Actionable Insights and Recommendations
Business Insights
1. Key Predictors of Financial Distress:
o Market Value: Companies with declining market value are at higher bankruptcy risk. Market
value reflects investor confidence and financial health.
o Total Long-Term Debt: High levels of long-term debt indicate a burdened financial
structure, increasing default risk.
o Retained Earnings: Low or negative retained earnings highlight long-term
underperformance, raising concerns about the firm's ability to sustain operations.
2. Sector-Wide Observations:
o The dataset reveals significant outliers in metrics like Net Income and EBIT, suggesting
specific industries or companies may face extreme financial stress.
o Imbalance in bankrupt versus non-bankrupt companies highlights that bankruptcy is
relatively rare but impactful, requiring precise identification.
3. Strategic Indicators:
o Variables like Gross Profit and EBITDA correlate strongly with revenue, emphasizing
operational efficiency as a critical survival factor.
o Financial metrics such as Total Liabilities and Cost of Goods Sold (COGS) significantly
influence profitability and distress signals.
4. Early Warning Signals:
o Companies with low EBITDA, declining profitability, and increasing liabilities are likely to
move toward financial distress. These signals can trigger preventive measures.
Business Recommendations
1. For Financial Institutions and Investors:
Risk Management: Use the model to monitor high-risk companies and adjust credit exposure or
investment strategies proactively.
Portfolio Diversification: Reduce concentration in sectors or companies showing consistent distress
signals (e.g., high debt-to-equity ratios or falling market values).
Early Interventions: Offer restructuring plans or debt renegotiation options for at-risk clients
flagged by the model.
2. For Regulators:
Strengthen Monitoring Systems: Leverage the early warning system to identify firms requiring
closer regulatory scrutiny, reducing systemic risks in financial markets.
Encourage Transparency: Promote accurate and timely financial disclosures to enhance predictive
accuracy and market stability.
3. For Companies:
Debt Management: Reduce high levels of long-term debt through refinancing or equity funding to
improve financial stability.
Operational Efficiency: Focus on improving EBITDA and reducing COGS to enhance profitability.
Liquidity Management: Prioritize maintaining healthy liquidity ratios (e.g., current assets to
current liabilities) to address short-term obligations effectively.
4. Enhance Predictive Monitoring Tools:
Develop dashboards using the prediction tool to provide real-time bankruptcy risk insights for
stakeholders.
Offer financial health scorecards to benchmark companies against industry peers, encouraging
self-assessment and improvement.
5. Adopt Strategic Partnerships:
Collaborate with consulting firms to help distressed companies restructure operations, improve
cash flow, and regain profitability.
Build alliances with insurance providers to design bankruptcy protection products for high-risk
clients.
6. Crisis Preparedness:
Establish contingency plans for high-risk scenarios, including workforce management, asset
divestment, and creditor negotiations.
Develop a proactive communication strategy to reassure stakeholders during times of financial
distress.
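The liquidity and leverage ratios mentioned in the recommendations (current assets to current liabilities, debt-to-equity) can be derived from the data dictionary columns as sketched below. The figures are hypothetical, and equity is approximated as total assets minus total liabilities, an assumption since the dataset has no equity column.

```python
import pandas as pd

# Hypothetical companies; column names follow the data dictionary (values in millions).
df = pd.DataFrame({
    "Company_id": [1, 2],
    "Current_assets": [500.0, 120.0],
    "Total_current_liabilities": [250.0, 200.0],
    "Total_assets": [1000.0, 400.0],
    "Total_liabilities": [600.0, 380.0],
})

# Current ratio: ability to cover short-term obligations.
df["Current_ratio"] = df["Current_assets"] / df["Total_current_liabilities"]

# Equity approximated as assets minus liabilities (assumption, see lead-in).
equity = df["Total_assets"] - df["Total_liabilities"]
df["Debt_to_equity"] = df["Total_liabilities"] / equity

print(df[["Company_id", "Current_ratio", "Debt_to_equity"]])
```

In this toy example the second company has a current ratio below 1 and a very high debt-to-equity ratio, the kind of combination a scorecard or dashboard would surface for closer review.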