UC3BPR20 202425 NUC BPR Project Proposal Thinh Tran
Thinh Tran
Applied Data Science
1 Introduction
In this digital age, data breaches involving authentication credentials are both common and devastating.
Millions of username and password pairs have been compromised in these incidents, with severe
repercussions for individuals and organizations. Despite their well-known flaws as a security measure,
passwords remain the most widely used form of authentication across digital platforms, which makes them
primary targets for cyber attackers.
A puzzling trend persists: despite the advanced security measures available today, such as password
management software, multi-factor authentication (MFA), and password strength indicators, weak and
guessable passwords are still common among users. These passwords are easy prey for brute-force attacks
and dictionary-based cracking methods. Research shows that many people's passwords follow predictable
patterns, are based on personal information, or draw on widely known figures from popular culture.
All of these factors make passwords easier for attackers to guess. Because users repeat these patterns
of behavior, the consequences of data breaches keep growing.
The core objective of this project is to understand why users repeatedly choose weak passwords and to
analyze the trends behind their password choices. Using the AuthInfo-Dataset, which provides anonymized
username and password pairs collected from various breaches, the thesis will attempt to identify the
common behaviors that affect people's selection of passwords. The research will focus on recurring patterns,
external influences (such as popular culture or famous venues), and user habits that weaken password
security. From these insights, the thesis aims to make recommendations that can support future efforts to
make passwords more secure and, at the same time, raise awareness of the importance of creating more
robust and less predictable passwords.
Project Proposal - BRP24/25
their passwords, how breach data can be used to make more accurate policies, and how real-time feedback
can better direct users in password creation procedures (Bhagavatula et al., 2020).
One crucial development is the increased availability of personal data on social media platforms and other
public online spaces, which gives attackers more opportunities to guess passwords based on users' personal
information (Atzori et al., 2024). In this environment, researchers and cybersecurity professionals face both
challenges and unique opportunities in their efforts to raise the overall security level of password-based
systems.
UC3BPR 2024 2
passwords helps users fully understand the weakness of their password choices and encourages them to
create stronger, more secure passwords. Farooq's research also shows that combining machine learning
with visualization can considerably change user behavior in real time and improve password security.
Güven et al. (2022) have pointed out the role that machine learning plays in identifying regular patterns
in password selection. After analyzing a dataset of over ten million userID-password pairs from breached
databases, the researchers demonstrated that their machine learning models could predict which passwords
were weak based on the other passwords that users had chosen. This shows the potential of machine
learning to change the paradigm of password management systems, making them more effective by
recognizing and avoiding vulnerabilities before an attack occurs.
Machine learning has also been used to enhance the security of password entry systems. For instance, Lee
and Yim (2020) examined how machine learning models can protect against keylogging attacks. Keyloggers
are malicious programs that steal passwords by intercepting and recording users' keystrokes as they type.
By training machine learning models to differentiate between genuine user input and the keystroke records
accumulated by keyloggers, the researchers could significantly increase the security of password entry
mechanisms. This reflects the potential of machine learning in defending against more sophisticated forms
of cyber attack aimed specifically at password systems.
Another important use of machine learning in password security is recognizing weak practices among
developers. Lykousas and Patsakis (2024) studied the password practices of developers by comparing
passwords found in GitHub repositories with those in the RockYou2021 leaked password dataset. Developers
generally showed higher password strength than the average user, but a significant security gap remained
in complex systems where controls were not enforced. Machine learning-driven monitoring systems are
therefore needed to spot and flag insecure password practices immediately (Lykousas and Patsakis, 2024).
Conclusion
In conclusion, the evolution of research into password security shows that user behavior is critical in
determining the effectiveness of password-based security systems. Even though password policies, machine
learning, and data visualization are now established mechanisms, some habits, such as using birthdays as
passwords, still persist. People's habits are what place them in danger. Although users are capable of
understanding the risks, they exhibit behaviors that undermine their security. To deal with these
challenges, it is critically important to take a multi-faceted approach that combines extensive data
analysis, machine learning, and an understanding of the psychological factors behind user behavior.
Using machine learning models, real-time feedback, and visualization tools represents a promising
direction for improving people's password habits. As research by Güven et al. (2022) and others such as
Dastane et al. (2020) has demonstrated, these technologies offer immediate, actionable insights that enable
users to produce stronger and more secure passwords. Additionally, by tackling the dangers of credential
loss and password reuse through user education and the adoption of two-factor authentication (2FA),
password security may be further enhanced in an increasingly digital world.
In the end, a comprehensive strategy is required to cope with the risks posed by lax password practices.
As cyber threats become ever more varied and refined, measures for improving password security must
evolve as well. To this end, users must acquire the means and know-how needed to protect their digital
identities.
3 Problem Statement
Passwords are critical in protecting sensitive information, but users continue to create weak or predictable
passwords despite ongoing security threats and repeated public warnings that go unheeded. People often
fall back on vulnerable practices such as simple patterns, easily guessed personal details, or other simple
components, all of which contribute to poor-quality passwords. Breached datasets also reveal how attackers
exploit these weak passwords in successful attacks, while current password policies typically do not address
the underlying problem. The repeated reuse of compromised passwords across millions of users presents
significant risks to individual security and organizational infrastructure.
The problem nevertheless persists, because the security provisions available today do not adequately
address this weakness in password creation. This project uses the AuthInfo-Dataset to visualize and analyze
these patterns thoroughly, highlighting the dangers users face when choosing passwords.
Problem Statement: Users continue to select weak and easily guessable passwords, exposing themselves
and organizations to significant security risks. Current password policies are insufficient to address this
issue, and understanding these behaviors through data analysis is crucial to developing more effective
security practices.
4 Research Objectives
The primary objective of this research is to analyze patterns in user password behaviors using the AuthInfo-
Dataset and to develop actionable recommendations for improving password security policies.
In order to achieve this primary objective, the following sub-objectives have been identified:
• Identify common patterns in password creation by analyzing the dataset, focusing on recurring
password characteristics such as length, complexity, and use of predictable elements.
• Evaluate password strength through brute force simulations and statistical analysis, determining
how easily certain passwords can be compromised.
• Examine the impact of user behavior on password security, particularly how habits like password
reuse or predictable modifications affect overall vulnerability.
• Investigate the influence of external factors (e.g., popular culture, personal information) on user
password choices and how these factors contribute to security risks.
• Develop recommendations for improved password policies that balance security with user con-
venience, based on insights gained from the analysis.
• Propose educational strategies to raise awareness about secure password practices and the im-
portance of using strong, unique passwords.
5 Research Methodology
This research will use the AuthInfo-Dataset to analyze user password behaviors. It will take a data-driven
approach, and its main purpose is to better understand patterns in people's password selection, common
points of vulnerability, and the entropy of these choices. This methodology was selected because it meets
the research objectives: identifying prevailing password patterns, judging how strong passwords are, and
offering practical recommendations based on real breach data.
Data Analysis and Visualization: The research will use statistical methods to thoroughly explore the
dataset and find patterns in password characteristics. This will include visualizing the distributions of
these characteristics and identifying clusters and outlying points. Python's Pandas, Matplotlib, and
Seaborn libraries will be used for this purpose. The goal is to turn raw data into insight that can be used
in further analysis.
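As a minimal sketch of the password characteristics this step could compute, the snippet below derives length, character-class flags, and per-character Shannon entropy for a handful of made-up passwords. It uses only the Python standard library so that it runs on its own. The function names and the sample list are illustrative assumptions, not part of the AuthInfo-Dataset or of the project's final code, and the same features could equally be computed as Pandas columns.

```python
import math
import string
from collections import Counter


def shannon_entropy(password: str) -> float:
    """Shannon entropy (bits per character) of the characters in one password."""
    if not password:
        return 0.0
    counts = Counter(password)
    total = len(password)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def password_features(password: str) -> dict:
    """Return simple structural features of a single password."""
    return {
        "length": len(password),
        "has_lower": any(c in string.ascii_lowercase for c in password),
        "has_upper": any(c in string.ascii_uppercase for c in password),
        "has_digit": any(c in string.digits for c in password),
        "has_symbol": any(c in string.punctuation for c in password),
        "digits_only": password.isdigit(),
        "entropy_bits_per_char": round(shannon_entropy(password), 3),
    }


# Hypothetical sample for illustration; not taken from the AuthInfo-Dataset.
sample = ["123456", "password", "Summer2024!", "qwerty", "123456"]

# How often each password occurs, and the features of each distinct password.
frequency = Counter(sample)
features = {pw: password_features(pw) for pw in frequency}

for pw, feats in features.items():
    print(pw, frequency[pw], feats)
```

A dictionary of this shape can be loaded into a Pandas DataFrame (one row per password) and then plotted with Matplotlib or Seaborn.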
Interactive Dashboard Development: To best communicate the findings, an interactive dashboard will be
created. This dashboard will allow users to explore different views of password behaviors, for example by
filtering on password length, complexity, or frequency. The dashboard will use Tableau and similar
tools to provide visual summaries and let stakeholders understand patterns in the data without requiring
any IT knowledge. This approach is suitable for presenting complex data as a hands-on activity rather than
a one-way presentation, and it helps the audience act on what they have learned.
Machine Learning Models for Pattern Detection: Machine learning models such as clustering and
classification algorithms will be used to discover underlying patterns in the way passwords are chosen.
Clustering models like K-means can group passwords into clusters based on their characteristics, while
classification models can predict whether a password is weak or strong based on these features.
These models will yield a fuller understanding of user behavior and inform recommendations for better
password policies in the future.
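To make the clustering idea concrete, the following is a minimal, standard-library sketch of K-means applied to two assumed password features (length and number of character classes used). The feature values are invented for illustration, the initial centroids are chosen with a simple farthest-first rule so that the result is deterministic, and the features are left unscaled for brevity. In the project itself a library implementation would be the more likely choice.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Farthest-first initialisation: deterministic and well spread out."""
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        # Pick the point whose nearest chosen centroid is farthest away.
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(tuple(nxt))
    return centroids


def kmeans(points, k, iterations=20):
    """Minimal K-means: alternate assignment and centroid update steps."""
    centroids = init_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical feature vectors: (password length, character classes used).
points = [(6, 1), (8, 1), (6, 1), (12, 4), (14, 4), (16, 3), (7, 1)]
labels, centroids = kmeans(points, k=2)

for point, label in zip(points, labels):
    print(point, "-> cluster", label)
```

On this toy data the short single-class passwords and the long multi-class passwords end up in separate clusters. With real data the features would normally be standardised first, since length would otherwise dominate the distance.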
Justification and Relevance: A data-driven approach is justified by the need to understand password
behaviors on a large scale, which cannot be done through qualitative methods alone. An interactive
dashboard makes the findings more accessible, letting security practitioners and policy-makers understand
the results and implement changes with less effort. This methodology is in line with prior work on
real-world password data (Yang et al., 2016), which combines statistical analysis and machine learning
techniques to understand and visualize people's passwords.
Artefact Features
• Interactive visualizations: Graphs, charts, and heat maps displaying password characteristics, such
as complexity, length, and common patterns across user groups.
• Brute force simulation results: Visual representations of how long certain password types would
take to crack under different attack strategies.
• Predictive models: Machine learning-based insights into the likelihood of password compromise
based on observed patterns.
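The brute force simulation results listed above could rest on an estimate like the one sketched here: the size of the search space is the character-set size raised to the password length, and dividing by an attacker's guess rate gives a worst-case time to exhaust it. The guess rate of ten billion guesses per second is an assumed figure for illustration only, the function names are hypothetical, and the estimate is an upper bound for exhaustive search; dictionary attacks on common passwords would succeed far sooner.

```python
import string


def charset_size(password: str) -> int:
    """Size of the smallest standard ASCII character pool covering the password."""
    size = 0
    if any(c in string.ascii_lowercase for c in password):
        size += 26
    if any(c in string.ascii_uppercase for c in password):
        size += 26
    if any(c in string.digits for c in password):
        size += 10
    if any(c in string.punctuation for c in password):
        size += len(string.punctuation)
    # Characters outside these four ASCII classes are ignored in this sketch.
    return size


def worst_case_guesses(password: str) -> int:
    """Number of candidates of the same length over the inferred character pool."""
    return charset_size(password) ** len(password)


def seconds_to_exhaust(password: str, guesses_per_second: float = 1e10) -> float:
    """Worst-case time to try every candidate at an assumed guess rate."""
    return worst_case_guesses(password) / guesses_per_second


# Hypothetical examples; the guess rate is an assumption, not a measured value.
for pw in ["123456", "password", "Summer2024!"]:
    print(pw, worst_case_guesses(pw), f"{seconds_to_exhaust(pw):.3g} s")
```

Plotting these estimates on a logarithmic axis per password type would give the kind of visual comparison the dashboard is meant to provide.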
Evaluation Plan
The artefact will be evaluated through the following methods:
• Usability Testing: End-users (researchers, cybersecurity professionals, and non-experts) will interact
with the Tableau dashboard to assess its ease of use, clarity, and effectiveness in communicating
password behavior insights.
• Performance Testing: The machine learning models will be tested for accuracy and performance in
predicting password vulnerability. Key metrics such as precision, recall, and F1-score will be used to
evaluate the effectiveness of these models.
• Expert Feedback: Feedback will be gathered from cybersecurity experts to evaluate the artefact’s
relevance and applicability to real-world password security challenges.
This approach ensures that the dashboard and machine learning models not only provide valuable insights
but also meet the usability and functional needs of the target audience.
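The performance metrics named in the evaluation plan can be computed directly from true and predicted labels. The sketch below does this with the standard library for a binary weak/strong classifier, treating "weak" (label 1) as the positive class. The label lists are invented for illustration and do not represent results from this project.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Hypothetical labels: 1 = weak password, 0 = strong password.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here precision is the share of passwords flagged as weak that really are weak, and recall is the share of weak passwords the model manages to flag; the F1-score is their harmonic mean.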
References
Atzori, M., Calò, E., Caruccio, L., Cirillo, S., Polese, G., & Solimando, G. (2024). Evaluating password
strength based on information spread on social networks: A combined approach relying on data
reconstruction and generative models. Online Social Networks and Media, 42, 100278.
https://doi.org/10.1016/j.osnem.2024.100278
Bhagavatula, S., Bauer, L., & Kapadia, A. (2020). (How) do people change their passwords after a breach?
https://arxiv.org/abs/2010.09853
Dastane, O., Bakon, K., & Johari, Z. (2020). The effect of bad password habits on personal data breach.
International Journal of Emerging Trends in Engineering Research, 8, 6950–6960.
https://doi.org/10.30534/ijeter/2020/538102020
Farooq, U. (2020). Real time password strength analysis on a web application using multiple machine
learning approaches. International Journal of Engineering Research & Technology (IJERT), 9(12).
Güven, E. Y., Boyaci, A., & Aydin, M. A. (2022). A novel password policy focusing on altering user password
selection habits: A statistical analysis on breached data. Computers & Security, 113, 102560.
https://doi.org/10.1016/j.cose.2021.102560
Lee, K., & Yim, K. (2020). Cybersecurity threats based on machine learning-based offensive technique for
password authentication. Applied Sciences, 10(4). https://doi.org/10.3390/app10041286
Lykousas, N., & Patsakis, C. (2024). Decoding developer password patterns: A comparative analysis of
password extraction and selection practices. Computers & Security, 145, 103974.
https://doi.org/10.1016/j.cose.2024.103974
Shay, R., Komanduri, S., Kelley, P. G., Leon, P. G., Mazurek, M. L., Bauer, L., Christin, N., & Cranor, L. F.
(2010). Encountering stronger password requirements: User attitudes and behaviors. Proceedings
of the Sixth Symposium on Usable Privacy and Security. https://doi.org/10.1145/1837110.1837113
Thomas, K., Li, F., Zand, A., Barrett, J., Ranieri, J., Invernizzi, L., Markov, Y., Comanescu, O., Eranti, V.,
Moscicki, A., Margolis, D., Paxson, V., & Bursztein, E. (2017). Data breaches, phishing, or malware?
Understanding the risks of stolen credentials. Proceedings of the 2017 ACM SIGSAC Conference on
Computer and Communications Security, 1421–1434. https://doi.org/10.1145/3133956.3134067
Yang, W., Li, N., Molloy, I. M., Park, Y., & Chari, S. N. (2016). Comparing password ranking algorithms on
real-world password datasets. In I. Askoxylakis, S. Ioannidis, S. Katsikas & C. Meadows (Eds.),
Computer Security – ESORICS 2016 (pp. 69–90). Springer International Publishing.
Yu, X., & Liao, Q. (2016). User password repetitive patterns analysis and visualization. Information and
Computer Security, 24, 93–115. https://doi.org/10.1108/ICS-06-2015-0026
Zhang-Kennedy, L., Chiasson, S., & Biddle, R. (2013). Password advice shouldn't be boring: Visualizing
password guessing attacks. 2013 APWG eCrime Researchers Summit, 1–11.
https://doi.org/10.1109/eCRS.2013.6805770
Mandatory Declaration
Declarations
The individual student is responsible for familiarising themselves with the rules and regulations regarding
the use of sources, generated text and academic misconduct. Failure to declare does not release the
student from their responsibility, and will result in an automatic failure of the submission.
1. I hereby declare that the submission answer is my own work, and that I have not
used other sources other than as is referenced and cited correctly, or received help
other than what is specifically acknowledged.
Answer: Yes
2. I further declare that this submission:
• Has not been used for another exam in another course at Noroff University
College, at another department/university/college at home or abroad.
• Does not refer to or make use of the work of others without acknowledgement.
• Does not refer to my own previous work unless stated.
• Has all the references given in the bibliography.
• Is not a copy, duplicate or copy of someone else’s work or answer.
• Is not generated using AI generation tools.
Answer: Yes
3. I am aware that a breach of any of the above is to be regarded as cheating and may
result in cancellation of the exam and exclusion from universities and colleges in
Norway, cf. University and College Act §§4-7 and 4-8 and Regulations on examinations
§§ 31.
Answer: Yes
4. I am aware that all components of this assignment may be checked for plagiarism
and other forms of academic misconduct.
Answer: Yes
5. I hereby acknowledge that I have been taught the appropriate ways to use the work
of other researchers. I undertake to paraphrase, cite, and reference according to
the acceptable academic practices, in accordance with the rules and guidelines, as
taught.
Answer: Yes
6. I am aware that Noroff University College will process all cases where cheating is
suspected in accordance with the college’s guidelines.
Answer: Yes