Loan Risk Analysis Project
Description:
This repository contains a comprehensive analysis of loan risk factors in the banking
sector using two datasets: application_data.csv and previous_application.csv.
The project aims to provide insights into customer demographics, credit types, risk
assessment, and business strategies.
Project Structure:
Part 1: Understanding the Bank
Total Records: Determine the total number of records in the
'application_data' table.
Credit Types: Analyse the different types of credits offered by the bank.
Gender Distribution: Explore the gender distribution of loan applicants.
Gender-wise Credit Distribution: Analyse the distribution of credits based on
gender.
Ownership of Assets: Investigate the volume of applicants who own cars and
realty in relation to credit type.
Income Distribution: Analyse income distribution and descriptive statistics
concerning credit type.
Income & Credit Distribution: Explore the relationship between income and
credit amounts based on credit type.
Goods Amount Analysis: Analyse the goods amount for which loans are
given in the case of cash loans.
Basic Income Type Distribution: Investigate the distribution of income types
among applicants.
Basic Housing Type Distribution: Explore the distribution of housing types
among applicants.
Basic Occupation Distribution: Analyse the distribution of occupations
among applicants.
Region & City Rating Distribution: Investigate the distribution of region and
city ratings among applicants.
Part 2: Understanding the Client Base & Business Operations
Family Status: Analyse the family status of the bank's clients.
Housing Distribution: Explore the distribution of housing types among
clients.
Age Brackets: Investigate the age brackets of the clients.
Contacts Availability: Analyse the availability of contact information for
clients.
Bank's Contact Reach: Explore the reach of the bank's contacts.
Documents Submission Analysis: Analyse the submission of required
documents by clients.
Loan Application Day Analysis: Investigate the distribution of loan
applications over days.
Part 3: Target Variable & Risk Analysis
Credit Enquiries Analysis: Analyse credit enquiries on clients before the loan
application.
Risk Classification: Classify clients based on risk factors such as default
percentages.
Deeper Risk Analysis: Conduct a deeper analysis of clients with payment
difficulties and low-risk surroundings.
Integration of Previous Application Data: Integrate insights from previous
loan application data.
Part 4: Insights & Recommendations
Part 5: Challenges on the Analysis
Part 6: Challenges on the Bank
SQL Queries & Insights:
After every SQL query, insights and interpretations are provided to facilitate a better
understanding of the data and its implications.
Data:
Source: [Link]
risk/data?select=application_test.csv
application_data.csv: Contains client information at the time of loan
application. Total columns: 122
previous_application.csv: Provides data on clients' previous loan
applications. Total columns: 37
This project aims to provide valuable insights into risk factors influencing loan default
and recommendations for mitigating such risks in the banking sector.
Snapshot of the work
Credit Types
select
name_contract_type,
cast(count(1)*100.0/(select count(1) from application_data) as decimal(4,2)) as
percentage
from application_data
group by NAME_CONTRACT_TYPE;
Around 90% of the loans are cash loans and around 10% are revolving loans; these
are the only two credit types. Cash loans are credits given upfront and repaid in
periodic instalments (a car loan, for example), while revolving loans are
usage-based credits with a credit limit, like credit cards. The bank seems to pitch
cash loans more; these are usually structured and secured loans. Since earnings are
usually higher on revolving loans, one can infer that the bank is conservative in
giving loans. This, however, depends on the bank's risk appetite, competition from
other banks, sales strategy, employee training, legal regulations, the economy and
the creditworthiness of the customer base.
select
CODE_GENDER,
cast(count(1)*100.0/(select count(1) from application_data) as decimal(4,2)) as
percentage
from application_data
group by CODE_GENDER;
65% of the customers are female, 34% are male and the rest fall under other.
The bank has a larger female customer base. A few reasons why this could be the
case:
Demographic Conditions - The region might have more working women, higher
financial literacy and education among women, or a higher risk-taking appetite.
Marketing Strategy - The bank might be targeting more women. One reason
could be that its female customers are better and more loyal; the fraud rate
may also be lower for this gender.
Social Image & Initiatives - The bank could be promoting women's
empowerment.
Government Benefits - The bank might be receiving government benefits for
having a larger female customer base.
Geographical Conditions - The region where the bank operates might have
more women.
Income Distribution & Descriptive Statistics wrt. Credit type
SELECT
distinct name_contract_type AS name_contract_type
,cast(count(1)over(partition by name_contract_type) *100.0/(select count(1) from
application_data) as decimal(4,2)) as percentage
,cast(avg(amt_income_total)over(partition by name_contract_type) as int) as
average_income
,min(amt_income_total) over(partition by name_contract_type) as min_income
,max(amt_income_total) over(partition by name_contract_type) as max_income
,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amt_income_total) OVER (PARTITION BY
name_contract_type) AS Median_Income
FROM application_data;
The average income of clients is nearly equal in both loan segments. One reason could
be that cash loans require security and a higher income eligibility criterion. The
minimum income for both loan types is around 26,000. The maximum income is much
higher for cash loans; with higher credit, banks require higher security. For cash
loans, there is a huge gap between the median income and the maximum income. This gap
can be analysed further by categorizing customers into income_level_flags.
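The income_level_flags idea can be sketched as a simple banding function. This is a minimal illustration, not the bucketing used in the analysis: the cut-offs in INCOME_BANDS, the band labels and the sample incomes are all hypothetical and would need to be tuned to the actual AMT_INCOME_TOTAL distribution.

```python
from collections import Counter

# Hypothetical upper bounds (inclusive) for each band, in the dataset's income units.
INCOME_BANDS = [(100_000, "Low"), (200_000, "Medium"), (500_000, "High")]


def income_level_flag(amt_income_total):
    """Return an income band label for one AMT_INCOME_TOTAL value."""
    if amt_income_total is None:
        return "Unknown"
    for upper_bound, label in INCOME_BANDS:
        if amt_income_total <= upper_bound:
            return label
    return "Very High"  # anything above the last cut-off


# Illustrative incomes only, not real records.
sample_incomes = [26_000, 135_000, 202_500, 450_000, 1_000_000, None]
flag_counts = Counter(income_level_flag(x) for x in sample_incomes)
print(flag_counts)
```

Counting clients per band for each credit type would show whether the gap between median and maximum income is driven by a handful of very high earners.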
SELECT
distinct name_contract_type AS name_contract_type
,cast(count(1)over(partition by name_contract_type) *100.0/(select count(1) from
application_data) as decimal(4,2)) as percentage
,cast(avg(amt_income_total)over(partition by name_contract_type) as int) as
average_income
,cast(avg(AMT_CREDIT)over(partition by name_contract_type) as int) as average_credit
,min(amt_income_total) over(partition by name_contract_type) as min_income
,min(AMT_CREDIT) over(partition by name_contract_type) as min_credit
,max(amt_income_total) over(partition by name_contract_type) as max_income
,max(AMT_CREDIT) over(partition by name_contract_type) as max_credit
,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY amt_income_total) OVER (PARTITION BY
name_contract_type) AS Median_Income
,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY AMT_CREDIT) OVER (PARTITION BY
name_contract_type) AS Median_Credit
FROM application_data;
The average credit for cash loans is twice that of revolving loans, while the
average and minimum incomes are similar. This supports the view that the bank is
conservative in extending credit. The minimum credit, however, is much higher for
revolving loans, even though their median credit is half that of cash loans. Also,
the bank gives the client with the lowest income a revolving loan worth 5 times
their income. This also supports the bank's low-risk approach, since clients with
fewer assets can still avail of loans. The bank pushes for secured loans.
Analysis of Goods Amount for which loan is given in case of Cash Loans
SELECT
distinct name_contract_type AS name_contract_type
,cast(count(1)over(partition by name_contract_type) *100.0/(select count(1) from
application_data) as decimal(4,2)) as percentage
,cast(avg(AMT_GOODS_PRICE)over(partition by name_contract_type) as int) as
average_goods_amt
,cast(avg(AMT_CREDIT)over(partition by name_contract_type) as int) as average_credit
,min(AMT_GOODS_PRICE) over(partition by name_contract_type) as min_goods_amt
,min(AMT_CREDIT) over(partition by name_contract_type) as min_credit
,max(AMT_GOODS_PRICE) over(partition by name_contract_type) as max_goods_amt
,max(AMT_CREDIT) over(partition by name_contract_type) as max_credit
,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY AMT_GOODS_PRICE) OVER (PARTITION BY
name_contract_type) AS Median_goods_amt
,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY AMT_CREDIT) OVER (PARTITION BY
name_contract_type) AS Median_Credit
FROM application_data
where NAME_CONTRACT_TYPE = 'Cash Loans';
Usually, the credit is higher than the price of the goods for which the loan is
taken. Possible reasons:
The loan might cover additional charges
The borrower might have discretion to use the money according to their
needs
The borrower might be paying off previous dues with the new loan
Overall, the bank does not allow a significant gap between the price of the goods
being purchased and the loan amount.
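One way to quantify the gap between credit and goods price is a per-loan ratio. The snippet below is a minimal sketch under that assumption; credit_to_goods_ratio is a hypothetical helper and the sample pairs are made up for illustration, not taken from application_data.

```python
from statistics import median


def credit_to_goods_ratio(amt_credit, amt_goods_price):
    """AMT_CREDIT / AMT_GOODS_PRICE, or None when the ratio is undefined."""
    if amt_credit is None or not amt_goods_price:
        return None  # missing or zero goods price
    return amt_credit / amt_goods_price


# Illustrative (AMT_CREDIT, AMT_GOODS_PRICE) pairs only.
sample_loans = [
    (450_000, 400_000),
    (270_000, 270_000),
    (545_040, 450_000),
    (100_000, None),
]
ratios = [
    ratio
    for ratio in (credit_to_goods_ratio(c, g) for c, g in sample_loans)
    if ratio is not None
]
median_ratio = median(ratios)
print(f"median credit-to-goods ratio: {median_ratio:.3f}")
```

A ratio slightly above 1 is consistent with the loan covering additional charges on top of the goods price.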
Age Brackets of the Clients
with age_application as (
select
-- DAYS_BIRTH is the (negative) number of days from birth to application.
-- datediff(year, ...) only counts calendar-year boundaries, so the age is
-- derived from the day count instead; 365.25 approximates leap years.
case when floor(-DAYS_BIRTH/365.25) <= 25 then '18-25'
when floor(-DAYS_BIRTH/365.25) <= 40 then '26-40'
when floor(-DAYS_BIRTH/365.25) <= 55 then '41-55'
when floor(-DAYS_BIRTH/365.25) <= 65 then '56-65' else '65above' end as age_bracket
from application_data)
select
age_bracket
,count(1) as Frequency
,cast(count(1)*100.0/(select count(1) from application_data)as decimal(4,2)) as
Percentage
from age_application
group by age_bracket
order by Percentage desc;
37% of the clients are between the ages of 26 and 55, and 20% of the clients are
above 55
Only 4% of the clients are aged 25 or below
As noted earlier, the need for credit comes with more responsibilities and
interests
A few people who become very successful early in their careers tend to use credit
options to accelerate their growth
Also, very few clients are students
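The age bracketing can also be sketched outside the database. This is a minimal illustration, assuming DAYS_BIRTH holds the negative number of days between birth and application (which is what the negative day offset in the query implies); age_bracket and the 365.25-day year are illustrative choices, and the labels mirror the ones used in the query.

```python
def age_bracket(days_birth):
    """Map a negative DAYS_BIRTH value to the age brackets used in the query."""
    # Whole years of age; 365.25 is an approximation that accounts for leap years.
    age_years = int(-days_birth / 365.25)
    if age_years <= 25:
        return "18-25"
    if age_years <= 40:
        return "26-40"
    if age_years <= 55:
        return "41-55"
    if age_years <= 65:
        return "56-65"
    return "65above"


# Illustrative values only.
for days in (-9461, -12000, -16765, -24000, -25000):
    print(days, age_bracket(days))
```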
with contact_data as
(select
case when FLAG_MOBIL+FLAG_EMP_PHONE+FLAG_WORK_PHONE =3 then 'All Contacts Available'
when FLAG_MOBIL+FLAG_EMP_PHONE+FLAG_WORK_PHONE =2 then 'Two Contacts Available'
when FLAG_MOBIL+FLAG_EMP_PHONE+FLAG_WORK_PHONE =1 then '1 Contact Available'
else 'No Contact Available' end as contacts_provided
from application_data)
select
contacts_provided,
count(1) as Frequency,
cast(count(1)*100.0/(select count(1) from contact_data) as decimal(4,2)) as
percentage
from contact_data
group by contacts_provided;
Around 62% of the clients have provided 2 contacts, and 19% have provided either
1 or all 3 contacts. No client is without any contact. The documentation seems
to have been carried out properly.
Documents Submission Analysis
with Documents_count as
(select
-- each FLAG_DOCUMENT_n is 1 when that document was submitted, so the sum is the
-- number of documents provided by the client
FLAG_DOCUMENT_2+FLAG_DOCUMENT_3+FLAG_DOCUMENT_4+FLAG_DOCUMENT_5+FLAG_DOCUMENT_6
+FLAG_DOCUMENT_7+FLAG_DOCUMENT_8+FLAG_DOCUMENT_9+FLAG_DOCUMENT_10+FLAG_DOCUMENT_11
+FLAG_DOCUMENT_12+FLAG_DOCUMENT_13+FLAG_DOCUMENT_14+FLAG_DOCUMENT_15+FLAG_DOCUMENT_16
+FLAG_DOCUMENT_17+FLAG_DOCUMENT_18+FLAG_DOCUMENT_19+FLAG_DOCUMENT_20+FLAG_DOCUMENT_21
as docs_submitted
from application_data)
,Documents_data as
(select
case when docs_submitted between 15 and 20 then '15-20 Documents Available'
when docs_submitted between 10 and 14 then '10-14 Documents Available'
when docs_submitted between 5 and 9 then '5-9 Documents Available'
else 'Less than 5 Documents Available' end as Documents_provided
from Documents_count)
select
Documents_provided,
count(1) as Frequency,
cast(count(1)*100.0/(select count(1) from documents_data) as decimal(5,2)) as
percentage
from documents_data
group by Documents_provided;
In terms of documents, every client (100%) submitted fewer than 5, i.e. at most 4
documents. Which documents are submitted varies from loan to loan. This could be a
good sign, in the sense that the bank asks for little documentation before providing
credit. The point to check is that all the necessary information is still collected:
while less paperwork and online documentation are a plus, the bank should ensure that
no information is missed. Occupation details are clearly not part of this check
(again, this depends on the loan type). It would be a plus if most of it is digitised.
Overall Analysis of Credit enquiries on the Clients
select
AMT_REQ_CREDIT_BUREAU_YEAR
,count(1) as Frequency
,cast(count(1)*100.0/(select count(1) from application_data) as decimal(4,2)) as
Percentage
from application_data
group by AMT_REQ_CREDIT_BUREAU_YEAR
order by percentage desc;
43% of loan applications come from clients having 0 or 1 CIBIL checks, and 16%
from clients having 2 CIBIL checks. This is a decent sign, suggesting that
these clients are not risky. It could be analysed further by looking
at their CIBIL reports for 2 years.
20% of clients have more than 2 enquiries in 1 year. This is analysed further
below by looking at their quarterly and monthly enquiries.
13.5% of the values are null, which I assume are clients having no credit
history, i.e. taking credit for the first time. How such clients are treated
depends on multiple factors like the bank's strategy, legal implications, client
relationship (they might be a customer having deposits), etc.
The past behaviour of clients in that geographical location needs to be checked
to know whether this is a risky sign or not. Macro changes in the economy (a fall
in interest rates, an increase in taxes, etc.) could also affect this factor.
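The grouping used in the commentary above (0 or 1 enquiries, 2 enquiries, more than 2, and null) can be sketched as follows. This is an illustration only: enquiry_bucket and bucket_shares are hypothetical helpers, the bucket labels are made up, and the sample values are not real AMT_REQ_CREDIT_BUREAU_YEAR data.

```python
from collections import Counter


def enquiry_bucket(amt_req_credit_bureau_year):
    """Bucket a yearly enquiry count; None stands for a NULL in the table."""
    if amt_req_credit_bureau_year is None:
        return "No record"
    if amt_req_credit_bureau_year <= 1:
        return "0-1 enquiries"
    if amt_req_credit_bureau_year == 2:
        return "2 enquiries"
    return "More than 2 enquiries"


def bucket_shares(values):
    """Percentage of applications per bucket, NULLs included in the total."""
    counts = Counter(enquiry_bucket(v) for v in values)
    total = len(values)
    return {bucket: round(100.0 * n / total, 2) for bucket, n in counts.items()}


# Illustrative values only.
sample_enquiries = [0, 1, 1, 2, 3, None, 0, 5]
shares = bucket_shares(sample_enquiries)
print(shares)
```

Keeping the null rows in the denominator matches the query, which divides by the full row count of application_data.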
Analysis of individual applications based on the credit enquiries
with enquiry_table as
(select
case when AMT_REQ_CREDIT_BUREAU_YEAR is null then 'No Credit History'
when AMT_REQ_CREDIT_BUREAU_YEAR = 0 then 'No Enquiry in the past year'
when AMT_REQ_CREDIT_BUREAU_QRT = 0 then 'Had Enquiries within the year'
when AMT_REQ_CREDIT_BUREAU_MON = 0 then 'Had Enquiries within the quarter'
when AMT_REQ_CREDIT_BUREAU_WEEK = 0 then 'Had Enquiries within the month'
when AMT_REQ_CREDIT_BUREAU_DAY = 0 then 'Had Enquiries within the week'
when AMT_REQ_CREDIT_BUREAU_HOUR = 0 then 'Had Enquiries within the day' end as
Enquiry_Status
from application_data)
select
Enquiry_Status
,count(Enquiry_Status) as Frequency
,cast(count(Enquiry_Status)*100.0/(select count(1) from enquiry_table)as
decimal(4,2)) as Percentage
from enquiry_table
group by Enquiry_Status
order by Percentage desc;
with default_scope as
(select isnull(cast(DEF_60_CNT_SOCIAL_CIRCLE*100.0/NULLIF(OBS_60_CNT_SOCIAL_CIRCLE,0)
as decimal(5,2)),0) as Percentage
from application_data)
,risk_scope as
(select
-- lower bounds only, so decimal values such as 74.50 are not left unclassified
case when Percentage=100 then 'Very High Risk'
when Percentage >= 75 then 'High Risk'
when Percentage >= 50 then 'Moderate Risk'
when Percentage >= 25 then 'Low Risk'
else 'Very Low Risk' end as Risk_category_60_Days
from default_scope)
select
Risk_category_60_Days,
count(1) as Frequency,
cast(count(1)*100.0/(select count(1) from risk_scope) as decimal(5,2)) as Percentage
from risk_scope
group by Risk_category_60_Days
order by Percentage desc;
with default_scope as
(select isnull(cast(DEF_30_CNT_SOCIAL_CIRCLE*100.0/NULLIF(OBS_30_CNT_SOCIAL_CIRCLE,0)
as decimal(5,2)),0) as Percentage
from application_data)
,risk_scope as
(select
-- lower bounds only, so decimal values such as 74.50 are not left unclassified
case when Percentage=100 then 'Very High Risk'
when Percentage >= 75 then 'High Risk'
when Percentage >= 50 then 'Moderate Risk'
when Percentage >= 25 then 'Low Risk'
else 'Very Low Risk' end as Risk_category_30_Days
from default_scope)
select
Risk_category_30_Days,
count(1) as Frequency,
cast(count(1)*100.0/(select count(1) from risk_scope) as decimal(5,2)) as Percentage
from risk_scope
group by Risk_category_30_Days
order by Percentage desc;
92% of the applications look to be of low risk based on the 60 DPD (days past
due) default history of the clients' social surroundings.
This means that the geographical region is good to do business in. The people
from that region have made timely payments, with defaults not exceeding 60 DPD.
Around 3% of clients (9,421 to be precise) tend to be highly risky, and 6,760
clients carry moderate risk.
Overall, individual behaviour needs to be given more weight when approving
applications, even though banks do have specific insights about
regions.
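The risk classification can be sketched as a small function. This is a minimal illustration rather than the exact query logic: risk_category is a hypothetical helper that takes the DEF_*_CNT_SOCIAL_CIRCLE and OBS_*_CNT_SOCIAL_CIRCLE values for one client, treats a missing or zero observation count as 0% (as the ISNULL/NULLIF combination does), and uses lower-bound thresholds so that non-integer percentages such as 74.4 still land in a band.

```python
def risk_category(def_cnt, obs_cnt):
    """Classify a client by the share of defaults observed in their social circle."""
    if def_cnt is None or not obs_cnt:
        pct = 0.0  # no observations: treated as 0%, i.e. Very Low Risk
    else:
        pct = 100.0 * def_cnt / obs_cnt
    if pct >= 100:
        return "Very High Risk"
    if pct >= 75:
        return "High Risk"
    if pct >= 50:
        return "Moderate Risk"
    if pct >= 25:
        return "Low Risk"
    return "Very Low Risk"


# Illustrative (defaults, observations) pairs only.
for pair in [(0, 0), (1, 5), (1, 4), (29, 39), (3, 4), (2, 2)]:
    print(pair, risk_category(*pair))
```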
with default_scope as
(select
target,
isnull(cast(DEF_30_CNT_SOCIAL_CIRCLE*100.0/NULLIF(OBS_30_CNT_SOCIAL_CIRCLE,0) as
decimal(5,2)),0) as Percentage
from application_data)
,risk_scope as
(select
target,
-- lower bounds only, so decimal values such as 74.50 are not left unclassified
case when Percentage=100 then 'Very High Risk'
when Percentage >= 75 then 'High Risk'
when Percentage >= 50 then 'Moderate Risk'
when Percentage >= 25 then 'Low Risk'
else 'Very Low Risk' end as Risk_category_30_Days
from default_scope)
select
case when target = 0 then 'Never had Payment Difficulties'
else 'Had Payment Difficulties' end as Target
,Risk_category_30_Days
,count(1) as Frequency
,cast(count(1)*100.0/(select count(1) from risk_scope) as decimal(5,2)) as
Percentage
from risk_scope
group by case when target = 0 then 'Never had Payment Difficulties'
else 'Had Payment Difficulties' end, Risk_category_30_Days
order by Target;
Around 7% of customers who are Very Low Risk based on their social
surroundings' 30-day payment default history have had Payment Difficulties.
In my view, this is the most important bracket. These are the clients who
need to be studied more. A deeper dive into the client demographics is crucial
to understand this.
Proper meetings with the Debt Managers and other heads of the Collection
team will reveal why the clients defaulted. Maybe they had an
emergency, or maybe the collection method was not appropriate.
It could also be that they changed their address or could not be
contacted via email or cell.
For the clients who never had any Payment Difficulties, proper customer
service, cross-product selling and long-term relationship building are
the key.
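When reading the cross-tabulation of Target against risk category, it matters which denominator a percentage uses. The sketch below is an illustration only: payment_difficulty_shares is a hypothetical helper and the sample rows are made up. It reports each (target, risk category) cell both as a share of all applications, as the query does, and as a share of its own risk category.

```python
from collections import Counter


def payment_difficulty_shares(rows):
    """rows: (target, risk_category) pairs, where target 1 = had payment difficulties."""
    rows = list(rows)
    pair_counts = Counter(rows)
    category_totals = Counter(category for _, category in rows)
    shares = {}
    for (target, category), n in pair_counts.items():
        shares[(target, category)] = {
            "pct_of_all": round(100.0 * n / len(rows), 2),
            "pct_of_category": round(100.0 * n / category_totals[category], 2),
        }
    return shares


# Illustrative rows only: 8 Very Low Risk clients (1 defaulter), 2 High Risk (1 defaulter).
sample_rows = (
    [(0, "Very Low Risk")] * 7
    + [(1, "Very Low Risk")]
    + [(0, "High Risk"), (1, "High Risk")]
)
shares_by_cell = payment_difficulty_shares(sample_rows)
print(shares_by_cell[(1, "Very Low Risk")])
```

In this made-up sample, defaulters from Very Low Risk surroundings are 10% of all applications but 12.5% of their own category, which is why the denominator should be stated alongside any such figure.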
Deeper analysis on the Contact reach for clients who had payment difficulties but
were from the Very Low Risk social surroundings
with default_scope as
(select
target
,case when FLAG_MOBIL+FLAG_EMP_PHONE+FLAG_WORK_PHONE =3 then 'All Contacts Available'
when FLAG_MOBIL+FLAG_EMP_PHONE+FLAG_WORK_PHONE =2 then 'Two Contacts Available'
when FLAG_MOBIL+FLAG_EMP_PHONE+FLAG_WORK_PHONE =1 then '1 Contact Available'
else 'No Contact Available' end as contacts_provided
,isnull(cast(DEF_30_CNT_SOCIAL_CIRCLE*100.0/NULLIF(OBS_30_CNT_SOCIAL_CIRCLE,0) as
decimal(5,2)),0) as Percentage
from application_data)
,risk_scope as
(select
target,
contacts_provided,
-- lower bounds only, so decimal values such as 74.50 are not left unclassified
case when Percentage=100 then 'Very High Risk'
when Percentage >= 75 then 'High Risk'
when Percentage >= 50 then 'Moderate Risk'
when Percentage >= 25 then 'Low Risk'
else 'Very Low Risk' end as Risk_category_30_Days
from default_scope)
,risk_based_on_contact_reach as
(select
case when target = 0 then 'Never had Payment Difficulties'
else 'Had Payment Difficulties' end as Target
,contacts_provided
,Risk_category_30_Days
,count(1) as Frequency
,cast(count(1)*100.0/(select count(1) from risk_scope) as decimal(5,2)) as
Percentage
from risk_scope
group by case when target = 0 then 'Never had Payment Difficulties'
else 'Had Payment Difficulties' end, Risk_category_30_Days,contacts_provided)
select
Target,
contacts_provided,
Risk_category_30_Days,
Frequency,
cast(Frequency*100.0/sum(frequency)over() as decimal(5,2)) as Percentage
from risk_based_on_contact_reach
where Target = 'Had Payment Difficulties' and Risk_category_30_Days = 'Very Low Risk'
order by Percentage desc;
Of the clients who have had payment difficulties and were from Very Low
Risk regions, all contacts were available for around 24%.
64% of these clients have provided 2 contacts and 12% have provided only 1
contact. The team needs to get access to more contact details for these two
classes of clients.
There could be relatives of these clients whom the bank can contact. Of
course, this is done only in extreme cases, usually for clients at
more than 90-120 DPD or in Bucket 3-4.
Further analysis is needed on whether the client lives in the given city or
not. Also, an assessment of the credit collection team needs to be done.
All changes made to the collection strategy should be analysed. Redundant
changes should be rolled back.
with default_scope as
(select
target
,case when REG_REGION_NOT_LIVE_REGION = 1 then 'Address Mismatch'
else 'Address Match' end as Address_city_match
,case when FLAG_MOBIL+FLAG_EMP_PHONE+FLAG_WORK_PHONE =3 then 'All Contacts Available'
when FLAG_MOBIL+FLAG_EMP_PHONE+FLAG_WORK_PHONE =2 then 'Two Contacts Available'
when FLAG_MOBIL+FLAG_EMP_PHONE+FLAG_WORK_PHONE =1 then '1 Contact Available'
else 'No Contact Available' end as contacts_provided
,isnull(cast(DEF_30_CNT_SOCIAL_CIRCLE*100.0/NULLIF(OBS_30_CNT_SOCIAL_CIRCLE,0) as
decimal(5,2)),0) as Percentage
from application_data)
,risk_scope as
(select
target
,contacts_provided,Address_city_match,
-- lower bounds only, so decimal values such as 74.50 are not left unclassified
case when Percentage=100 then 'Very High Risk'
when Percentage >= 75 then 'High Risk'
when Percentage >= 50 then 'Moderate Risk'
when Percentage >= 25 then 'Low Risk'
else 'Very Low Risk' end as Risk_category_30_Days
from default_scope)
,risk_based_on_contact_reach as
(select
case when target = 0 then 'Never had Payment Difficulties'
else 'Had Payment Difficulties' end as Target
,Address_city_match
,contacts_provided
,Risk_category_30_Days
,count(1) as Frequency
,cast(count(1)*100.0/(select count(1) from risk_scope) as decimal(5,2)) as
Percentage
from risk_scope
group by case when target = 0 then 'Never had Payment Difficulties'
else 'Had Payment Difficulties' end,
Risk_category_30_Days,contacts_provided,Address_city_match)
select
Target,
contacts_provided,
Address_city_match,
Risk_category_30_Days,
Frequency,
cast(Frequency*100.0/sum(frequency)over() as decimal(5,2)) as Percentage
from risk_based_on_contact_reach
where Target = 'Had Payment Difficulties' and Risk_category_30_Days = 'Very Low Risk'
order by Percentage desc;
Around 2% of cases had an address mismatch despite having contact details.
Although it is a tiny fraction of the whole, it should still be assessed by the
debt managers.
The underlying reasons for their payment difficulties could be unavailability of
funds, lack of a contingency fund, or the typical scenario of paying in the
beginning and then defaulting.
Integration of previous application data
with credit_data as
(select
-- upper bounds only, so fractional amounts such as 500000.50 fall into a band
case when AMT_APPLICATION <= 500000 then 'Very Low Amount'
when AMT_APPLICATION <= 1000000 then 'Low Amount'
when AMT_APPLICATION <= 1500000 then 'Moderate Amount'
when AMT_APPLICATION <= 2000000 then 'High Amount'
else 'Very High Amount' end as prev_credits
from application_data a
join previous_application p on a.SK_ID_CURR = p.SK_ID_CURR)
select
prev_credits,
count(1) as frequency,
cast(count(1)*100.0/(select count(1) from credit_data) as decimal(5,2)) as
Percentage
from credit_data
group by prev_credits;
Top 15 customers and contact reach
with prev_app_data as
(select
SK_ID_CURR
,count(sk_id_prev) as previous_applications
,cast(sum(case when NAME_CONTRACT_STATUS = 'approved' then 1 else 0
end)*100.0/count(SK_ID_PREV) as decimal(5,2)) as application_approval_rate
from previous_application
group by SK_ID_CURR
having cast(sum(case when NAME_CONTRACT_STATUS = 'approved' then 1 else 0
end)*100.0/count(SK_ID_PREV) as decimal(5,2)) =100.0)
select top 15
p.*
,a.NAME_INCOME_TYPE,a.NAME_EDUCATION_TYPE,OCCUPATION_TYPE
,case when FLAG_MOBIL+FLAG_EMP_PHONE+FLAG_WORK_PHONE =3 then 'All Contacts Available'
when FLAG_MOBIL+FLAG_EMP_PHONE+FLAG_WORK_PHONE =2 then 'Two Contacts Available'
when FLAG_MOBIL+FLAG_EMP_PHONE+FLAG_WORK_PHONE =1 then '1 Contact Available' else
'No Contact Available' end as contacts_provided
from prev_app_data p
join application_data a on p.SK_ID_CURR = a.SK_ID_CURR
order by previous_applications desc;
INSIGHTS & RECOMMENDATIONS
The bank should try to source more Revolving Loans
Provide more loans to Businessmen
Targeting more single people could give the bank more income. These are the
customers with whom the bank can build a long-term relationship and provide
products at every stage of life. Of course this comes with a higher risk, but an
evaluation of current single clients could reveal the credit behaviour of this
class
Reach out to the addresses of the clients whose contact info is unreachable
Occupation details are missing for more than 31.35% of the clients. The bank
should reach out and collect more information about it. This not only ensures
more security; it also gives the bank a chance to pitch more products
according to the client's occupation
Reach out to more Occupations like HR Staff, IT Staff and Realty Agents.
Train employees/agents to reach out to Tier 1 Regions. The bank needs to
investigate why the reach is so low in Tier 1 & 3 Regions and penetrate them. One
of the most effective ways is to hold periodic meetings with the Executives
managing the Sales Channels. They work at ground level and can identify the
actual reasons. Doing so also empowers and motivates them, since it shows that
upper management takes their ideas seriously and makes them feel important and needed
Reach out to Students or the young age group by tying up with Universities,
Colleges & other Online/Offline Education Institutes
Maintain the current volume of Sales Programs/Strategies on regions,
occupations, classes where there is high application rate
More analysis is required on the 4 stages - Pre-Transaction, During
Transaction, Post Transaction & Renewal
Target Low Risk Customers as well. Tailor made solutions for these buckets
could prove fruitful for the business. Cross product targeting to Low & Very
Low Risk classes, tie ups with their organisations (if any) and building long
term relationships is the key for a stable & profitable business
Deeper analysis of High Risk & Moderate Risk clients needs to be done. The
quantum of profit from these customers needs to be taken into consideration.
A Very Low Risk client giving less revenue might be less preferable than a
Moderate Risk client giving more revenue
A lot of the bank's revenue depends on how the Credit Collection team
functions. Proper methodology and action on the ground level ensures timely
payment collection
Periodic training of debt managers, collection agents and third-party vendors
needs to be done to deal with cases where the contact details are available and
the social surroundings are Very Low Risk in terms of payment, but the client
has defaulted. Also, harsh customer service or debt collection methods can
hurt the brand image in the mind of the client and, in the long term, in their
surroundings. Proper checks are needed to ensure that the methods are strict
but not overly harsh
The bank needs to provide the clients with the proper information about the
effects a default can have on the credit score and the future difficulties the
client could be facing. There could be instances where the debt managers are
too rigid with the collection while they should be educating the customers
about the consequences of such behaviour
The bank could enquire about the persons who were accompanying the client
during the application. The employees at the bank should be well trained to
build knowledge about that person. This increases reliability on the client who
is applying for credit as well as gives an opportunity to pitch products to the
companion
There is a need to sit down with the people working at ground level and
provide them with the findings of the analysis. Integrating these minute details
could be really fruitful for any organisation. Banking as a sector is highly
personalised, so it becomes unavoidable to take these intricate details into
account and apply them in day-to-day operations. For example, finding that a
person has incomplete education could mean that they started a venture. Although
the bank could have details about the person's organisation, a 5-minute
conversation between the relationship manager and the client about the journey
from being a dropout to starting a venture could have a really positive effect
Although it is cheaper for a bank to maintain current customers than acquiring
new ones, it should try to target more clients who have completed their
higher education. A large chunk of clients has only completed secondary
education
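The recommendation above on weighing revenue against risk can be illustrated with a simple expected-profit calculation. This is a sketch under stated assumptions, not a model from the analysis: expected_profit is a hypothetical helper, and the revenue, loss and default-probability figures are invented purely to show the arithmetic.

```python
def expected_profit(revenue_if_repaid, loss_if_default, default_probability):
    """Expected profit of one loan: repaid revenue minus expected default loss."""
    repaid_part = (1 - default_probability) * revenue_if_repaid
    default_part = default_probability * loss_if_default
    return repaid_part - default_part


# Hypothetical figures, chosen only for illustration.
very_low_risk_client = expected_profit(
    revenue_if_repaid=1_000, loss_if_default=10_000, default_probability=0.02
)
moderate_risk_client = expected_profit(
    revenue_if_repaid=2_500, loss_if_default=10_000, default_probability=0.06
)
print(round(very_low_risk_client, 2), round(moderate_risk_client, 2))
```

With these made-up numbers, the moderate-risk client has the higher expected profit (about 1,750 against about 780) despite a default probability three times as high; different inputs can reverse the ordering.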
CHALLENGES ON RESEARCH
The Organization type description was not clear. Terms like 'Business Entity
Type 1' and 'Industry Type 1' were vague
Application Date is absent, so no time-based analysis could be performed. We
could not ascertain the increase or decrease in the count of applications or the
revenue over various periods
It is important to know the peak seasons. Usually, the need for credit arises
when there is a shortage of money. Within a month, this is typically the 3rd
week, when credit is needed due to unplanned expenditure or increased
spending. Within a year, people tend to need more credit during the 3rd & 4th
Quarter, which is the peak time for retail shopping
Rural & Urban Segments could not be analysed since it was not clear from the
data
The same customer might have multiple applications
The enquiries made on the client's credit report to the credit bureau do not
highlight which banks enquired about the client. An analysis of that data could
reveal whether it was this bank or multiple banks involved
The Quantum of Revenue is missing in these applications. It is a crucial aspect
of analysis
CHALLENGES ON THE BANK
In general, there is a fall in NPAs in India, which is a good sign. The challenge
for the bank is to take advantage of this while facing competition from other
players
There is minimal control over Interest Rates. It is a question of marketing.
To increase the profitability, on the revenue side, the bank needs to either
increase its number of clients or its revenue charges (annual fees, transaction
fees, etc. – these are normally regulated)
On the cost side, the bank needs to decrease its fixed/variable costs. Fixed
costs like Rent, Maintenance, Employee Salary, etc need to be checked.
Variable costs include interest on deposits, customer handling costs, etc.