Applied Science & Engineering Journal for Advanced Research Peer Reviewed and Refereed Journal

ISSN (Online): 2583-2468

Volume-3 Issue-3 || May 2024 || PP. 1-5 DOI: 10.5281/zenodo.11180356

Password Complexity Prediction Based on RoBERTa Algorithm

Yuhong Mo¹, Shaojie Li², Yushan Dong³, Ziyi Zhu⁴ and Zhenglin Li⁵
¹College of Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA
²Huacong Qingjiao Information Technology (Beijing) Co., Ltd., Beijing, China
³University of Maryland, MD, USA
⁴New York University, USA
⁵Texas A&M University, USA
¹Corresponding Author: [email protected]

Received: 09-04-2024 Revised: 25-04-2024 Accepted: 10-05-2024

ABSTRACT
In the digital age, password security is a top priority for protecting personal information. Machine learning techniques
provide intelligent and efficient means of enhancing password security. In this paper, we apply the RoBERTa algorithm to a
password complexity text dataset to predict password complexity, and we obtain the confusion matrix and accuracy of the
three-class classification from two training runs. The confusion matrices show that the vast majority of predictions are
correct, with accuracies of 99.741% and 99.11% for the two runs, respectively. This indicates that the model can effectively
predict password complexity, give users accurate feedback, and prompt them to strengthen their passwords in a timely manner.
This study shows how machine learning technology can be used to improve password security and protect personal information
from malicious intrusion. In daily life, we should pay attention to the complexity of the passwords we set and recognise the
importance of password security for protecting personal information. We look forward to more such studies that further
strengthen cybersecurity measures and help build a more secure and reliable digital environment.

Keywords: password complexity, RoBERTa, accuracy

I. INTRODUCTION
Password security is crucial in today's digital society because it bears directly on personal privacy, property security,
and information security. A strong password can effectively protect personal accounts from hacking, malware, and information
leakage [1]. The more complex a password is, the harder it is to crack, so setting and managing passwords appropriately is
crucial for protecting personal information.
Machine learning plays an important role in predicting password complexity. Machine learning algorithms can analyse
large amounts of password data and mine the patterns in it to help users create more secure and complex passwords. For
example, machine learning can identify common weak password patterns (e.g., consecutive numbers or simple dictionary words)
and warn users to avoid such easily guessed or cracked combinations [2,3].
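The weak patterns mentioned above can be made concrete with a few hand-written checks. The sketch below is a rule-based illustration only; it is not a machine learning model and not the method used in this paper. The word list `COMMON_WORDS` and all function names are hypothetical, and a practical checker would rely on a large dictionary of common or leaked passwords.

```python
# Hypothetical stand-in for a dictionary of common words; a real checker
# would use a much larger list (e.g. words from leaked-password corpora).
COMMON_WORDS = {"password", "love", "admin", "qwerty", "dragon", "monkey"}


def has_sequential_digits(password: str, run: int = 3) -> bool:
    """Return True if the password contains `run` or more ascending consecutive digits, e.g. '123'."""
    count = 1
    for prev, cur in zip(password, password[1:]):
        if prev.isdigit() and cur.isdigit() and int(cur) == int(prev) + 1:
            count += 1
            if count >= run:
                return True
        else:
            count = 1  # the ascending run is broken, start counting again
    return False


def contains_common_word(password: str) -> bool:
    """Return True if any word from COMMON_WORDS appears in the lowercased password."""
    lowered = password.lower()
    return any(word in lowered for word in COMMON_WORDS)


def weak_pattern_flags(password: str) -> dict:
    """Collect simple weak-pattern indicators for one password."""
    return {
        "sequential_digits": has_sequential_digits(password),
        "common_word": contains_common_word(password),
        "only_digits": password.isdigit(),
    }
```

For instance, `weak_pattern_flags("amormio123")` flags the ascending digit run `123` but finds no word from the small sample list. Flags like these could serve as hand-crafted features, whereas a model such as RoBERTa learns comparable cues directly from the password text.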
In addition, machine learning can generate personalised password suggestions based on users' personal information
and usage habits. By analysing a user's preferences, birthdays, common vocabulary, and other information, it can recommend
password combinations that suit personal preferences while retaining a certain degree of complexity, helping the user strike a
balance between memorability and security [4]. Machine learning can also be applied to detect password leakage and malicious
attacks. By monitoring network traffic and login behaviour data, a machine learning system can detect abnormal activity in time
and take measures to prevent attackers from using leaked passwords for malicious logins or vandalism.
Password security is the first line of defence for personal information security in the digital era, and machine
learning technology provides more intelligent and efficient means of enhancing it [5]. In our daily life, we should therefore
pay attention to, and strictly follow, recommendations on password setting and management, and use technological means to
continuously raise the security level of our accounts and information.

https://asejar.singhpublication.com 1|Page

II. DATA SOURCES

The Password Complexity dataset is used to train and test password strength prediction models. It contains password
strings together with their complexity labels (0, 1, and 2). Datasets of this kind are commonly used in machine learning
research on password security, where they help developers build algorithms that evaluate password strength automatically. A
model trained on such data can recognise and assess the strength of user-created passwords and provide feedback and
suggestions accordingly, which can help strengthen cybersecurity measures and raise users' awareness of personal information
security.
The dataset used in this paper is an open-source dataset that classifies password complexity into three categories. It
consists of two columns: the first holds the passwords as strings, and the second holds their complexity level, which is 0, 1,
or 2, where 0 denotes an unreliable password, 1 a moderately reliable password, and 2 a very reliable password. A sample of the
dataset is shown in Table 1, and the number of passwords at each complexity level is shown in Figure 1. The length of each
password, measured as its number of characters, is also computed for the three complexity levels, and the summarised length
statistics are shown in Fig. 2.
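The label counts behind Figure 1 and the per-level length statistics behind Figure 2 could be computed roughly as follows. This is a sketch that assumes the dataset is a two-column CSV file with a header row (`password,strength`) and that length is counted in characters, since each password is a single token; the file layout and function names are assumptions, while the sample rows are copied from Table 1.

```python
import csv
import io
from collections import Counter, defaultdict
from statistics import mean

# A few rows from Table 1, written in the assumed CSV layout.
SAMPLE_CSV = """password,strength
yrtzuab476,1
yEdnN9jc1NgzkkBP,2
sarita99,1
9ke0bvq,0
"""


def load_rows(file_obj):
    """Read (password, strength) pairs from a two-column CSV with a header row.

    For a real file, open it with open(path, newline="") before passing it in.
    """
    reader = csv.reader(file_obj)
    next(reader)  # skip the header row
    rows = []
    for record in reader:
        if len(record) != 2:
            continue  # skip blank or malformed lines
        password, strength = record
        rows.append((password, int(strength)))
    return rows


def label_counts(rows):
    """Number of passwords at each complexity level."""
    return Counter(label for _, label in rows)


def length_stats(rows):
    """Minimum, maximum and mean password length (in characters) per complexity level."""
    lengths = defaultdict(list)
    for password, label in rows:
        lengths[label].append(len(password))
    return {
        label: {"min": min(values), "max": max(values), "mean": mean(values)}
        for label, values in sorted(lengths.items())
    }


if __name__ == "__main__":
    rows = load_rows(io.StringIO(SAMPLE_CSV))
    print(label_counts(rows))
    print(length_stats(rows))
```

On the four sample rows, level 1 contains two passwords of 10 and 8 characters, so its mean length is 9; on the full dataset the same functions would produce the data plotted in the two figures.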

Table 1: Partial data.

Password            Strength
yrtzuab476          1
yEdnN9jc1NgzkkBP    2
sarita99            1
Suramerica2015      2
PPRbMvDIxMQ19TMo    2
yuri2011            1
za1njutt            1
g4d96apen           1
amormio123          1
cat700700           1
7ZWdMXTI1MwJaAVo    2
maroccon123         1
9ke0bvq             0
3IJ2RUAB            1
124dscate           1
4DorrxDQ5OAI16PL    2
bentor123           1
kristian1997        1
rlh0u771            1
porseo74            1


Figure 1: Statistical bar chart of password complexity levels

(Photo credit: Original)

Figure 2: Statistical aggregation of password lengths by complexity level

(Photo credit: Original)

III. METHOD
RoBERTa (A Robustly Optimized BERT Pretraining Approach) is a natural language processing model proposed by the
Facebook AI team in 2019 as an improved and optimised version of the BERT (Bidirectional Encoder Representations from
Transformers) model [6,7]. RoBERTa essentially keeps the BERT architecture but makes a series of improvements to the training
and optimisation process, which allows it to achieve better performance than BERT on several natural language processing
tasks.
RoBERTa uses a much larger dataset for pre-training. Compared with the data used for the original BERT, RoBERTa
uses more unlabelled text and trains for more iterations, which helps the model learn better language representations. In
addition, RoBERTa introduces a dynamic masking strategy: instead of fixing the masked positions once during preprocessing, it
randomly re-selects the words to mask each time a sequence is fed to the model, so the same sentence is masked differently
across training batches. This helps improve the model's understanding of contextual information [8].
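To illustrate the difference between static and dynamic masking, the sketch below selects 15% of the token positions of one sequence over several epochs. Static masking fixes the positions once; dynamic masking draws a new set each time the sequence is used. This is a simplified plain-Python illustration under stated assumptions: real implementations operate on tokenised batches and usually select each token independently with probability 0.15, whereas this version masks exactly 15% of positions for clarity. The function names are hypothetical.

```python
import random

MASK_PROB = 0.15  # share of positions selected for prediction, as in BERT and RoBERTa


def sample_mask_positions(num_tokens, rng, mask_prob=MASK_PROB):
    """Choose round(num_tokens * mask_prob) distinct positions (at least one) to mask."""
    k = max(1, round(num_tokens * mask_prob))
    return sorted(rng.sample(range(num_tokens), k))


def static_masks(num_tokens, epochs, seed=0):
    """Static masking: positions are chosen once in preprocessing and reused every epoch."""
    rng = random.Random(seed)
    positions = sample_mask_positions(num_tokens, rng)
    return [positions for _ in range(epochs)]


def dynamic_masks(num_tokens, epochs, seed=0):
    """Dynamic masking: a fresh set of positions is drawn every time the sequence is used."""
    rng = random.Random(seed)
    return [sample_mask_positions(num_tokens, rng) for _ in range(epochs)]


if __name__ == "__main__":
    for epoch, positions in enumerate(dynamic_masks(num_tokens=20, epochs=3)):
        print(f"epoch {epoch}: masked positions {positions}")
```

With static masking the model sees the same corrupted version of a sentence in every epoch, whereas dynamic masking exposes it to many different masked versions of the same text.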
RoBERTa also simplifies the pre-training objective. BERT additionally uses the Next Sentence Prediction (NSP) task
during pre-training to help the model understand relationships between sentences, whereas RoBERTa uses only the Masked
Language Model (MLM) task, so it only needs to predict the original tokens at the masked positions [9]. This simpler objective
lets the model concentrate on learning token representations and avoids the noise that the NSP task may introduce.
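Because RoBERTa drops the NSP objective, its pre-training loss is just the masked-language-model cross-entropy, averaged over the masked positions only. A minimal sketch, assuming per-position scores (logits) are given as plain lists and unmasked positions are labelled -100, borrowing the ignore-index convention of PyTorch's `CrossEntropyLoss`; the function name is hypothetical.

```python
import math

IGNORE_INDEX = -100  # label for unmasked positions (same convention as PyTorch's CrossEntropyLoss)


def mlm_loss(logits, labels):
    """Mean cross-entropy over masked positions only.

    logits: one list of vocabulary scores per position.
    labels: the original token id at masked positions, IGNORE_INDEX elsewhere.
    """
    total, count = 0.0, 0
    for scores, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # unmasked tokens contribute nothing to the loss
        top = max(scores)
        # Numerically stable log-sum-exp over the vocabulary.
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_sum_exp - scores[label]  # equals -log softmax(scores)[label]
        count += 1
    return total / count if count else 0.0
```

With uniform scores over a vocabulary of size V, every masked position contributes log V, and changing the scores at unmasked positions leaves the loss unchanged.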
In addition, RoBERTa keeps BERT's random token replacement within its masking scheme to enhance the model's
generalisation capability. During pre-training, a portion of the tokens selected for masking is replaced with random tokens from
the vocabulary instead of the mask token, and the model is trained to recover the original tokens [10]. This makes the model
less dependent on the mask token, which never appears in real input data, and improves its generalisation performance.
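The random replacement step can be sketched as follows, assuming the positions to corrupt have already been selected. It follows the 80/10/10 rule described in the BERT and RoBERTa papers: 80% of the selected tokens become the mask token, 10% become a random vocabulary token, and 10% are left unchanged, while the label at every selected position is the original token. The token ids, `mask_id` and `vocab_size` are placeholders rather than values from a real tokenizer, and the function name is hypothetical.

```python
import random

IGNORE_INDEX = -100  # label for positions that were not selected


def corrupt_tokens(token_ids, positions, mask_id, vocab_size, rng):
    """Corrupt the selected positions with the 80/10/10 rule.

    80% of selected tokens become mask_id, 10% become a random vocabulary id,
    and 10% keep their original id. The label at every selected position is
    the original token, which the model is trained to recover.
    """
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for pos in positions:
        labels[pos] = token_ids[pos]
        draw = rng.random()
        if draw < 0.8:
            inputs[pos] = mask_id
        elif draw < 0.9:
            inputs[pos] = rng.randrange(vocab_size)
        # otherwise the original token is left in place
    return inputs, labels
```

Because some selected tokens stay unchanged or are swapped for ordinary words, the model cannot assume that only mask tokens need predicting, which is the mismatch between pre-training and real inputs that this step addresses.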
RoBERTa is also pre-trained with larger batches and on longer sequences, and small learning rates are typically used
when it is fine-tuned. In addition, other techniques such as multi-task learning and transfer learning can be combined during
fine-tuning to further improve the model's performance on specific tasks.
Overall, by optimising the pre-training process, improving the mask prediction task, enhancing the generalisation
capability, and optimising the fine-tuning phase, RoBERTa has achieved significant improvements on a variety of natural
language processing tasks, and has become one of the most highly regarded models in the field today.

IV. EXPERIMENTS AND RESULTS

The batch size is set to 32, and the data are split into training and test sets in a ratio of 8:2, with 80% of the data
used for training and 20% for testing. The sequence length is set to 60, which fixes the length of the input text sequences. The
classifier is defined with the from_preset method of the RoBERTa classifier, passing the previously defined preprocessor as a
parameter and setting num_classes to 3, i.e., there are three target classes. Sparse categorical cross-entropy is used as the loss
function, the Adam optimiser is used with a learning rate of 1e-5, and accuracy is used as the evaluation metric.
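The configuration above appears to match the KerasNLP API (`keras_nlp.models.RobertaPreprocessor` and `keras_nlp.models.RobertaClassifier`). The sketch below reconstructs it under that assumption; the preset name `roberta_base_en`, the shuffling seed and the number of epochs are not given in the paper and are illustrative choices, and `train` and `split_train_test` are hypothetical helpers. The KerasNLP imports sit inside `build_classifier` so that the data-splitting helper runs without KerasNLP installed; building the classifier downloads the preset weights.

```python
import random

# Hyperparameters stated in the text.
BATCH_SIZE = 32
SEQUENCE_LENGTH = 60
NUM_CLASSES = 3
LEARNING_RATE = 1e-5
TRAIN_FRACTION = 0.8
PRESET = "roberta_base_en"  # assumption: the paper does not name the checkpoint


def split_train_test(rows, train_fraction=TRAIN_FRACTION, seed=42):
    """Shuffle (password, label) pairs and split them 80/20 into train and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def build_classifier(preset=PRESET):
    """Build and compile a RoBERTa classifier with KerasNLP (requires keras_nlp)."""
    # Imported lazily so split_train_test works without KerasNLP installed.
    import keras
    import keras_nlp

    preprocessor = keras_nlp.models.RobertaPreprocessor.from_preset(
        preset, sequence_length=SEQUENCE_LENGTH
    )
    classifier = keras_nlp.models.RobertaClassifier.from_preset(
        preset, preprocessor=preprocessor, num_classes=NUM_CLASSES
    )
    classifier.compile(
        # The classifier outputs logits by default, hence from_logits=True.
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=keras.optimizers.Adam(learning_rate=LEARNING_RATE),
        metrics=["accuracy"],
    )
    return classifier


def train(rows, epochs=1):
    """Split the data, build the classifier and fit it on raw password strings."""
    train_rows, test_rows = split_train_test(rows)
    x_train, y_train = zip(*train_rows)
    x_test, y_test = zip(*test_rows)
    classifier = build_classifier()
    classifier.fit(
        x=list(x_train),
        y=list(y_train),
        batch_size=BATCH_SIZE,
        epochs=epochs,
        validation_data=(list(x_test), list(y_test)),
    )
    return classifier
```

Calling `train(rows)` on the `(password, label)` pairs would train on 80% of the data and report accuracy on the held-out 20% after each epoch; running it twice corresponds to the two training runs described below.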
The model is trained twice, and for each run the confusion matrix over the three classes and the corresponding accuracy
are obtained. The confusion matrices of the two training runs are shown in Fig. 3 and Fig. 4, respectively.

Figure 3: Confusion matrix (first training run)

(Photo credit: Original)

Figure 4: Confusion matrix (second training run)

(Photo credit: Original)

The confusion matrices show that the vast majority of classifications are correct: the accuracies of the two training runs
are 99.741% and 99.11%, respectively, both above 99%. The model can therefore predict password complexity very well, give
users accurate feedback, and remind them in a timely manner to increase the complexity of their passwords.
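The accuracies reported above are the share of correctly classified passwords, i.e. the sum of the confusion-matrix diagonal divided by the total count. A small sketch, assuming rows hold true labels and columns hold predicted labels (the convention of scikit-learn's `confusion_matrix`); the function names are hypothetical and the matrix in the example is a toy, not the paper's data.

```python
def accuracy_from_confusion(matrix):
    """Overall accuracy: correctly classified samples (the diagonal) over all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """Per-class recall: each diagonal entry divided by its row (true-label) total."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


if __name__ == "__main__":
    toy = [[9, 1, 0], [0, 18, 2], [0, 0, 20]]  # toy counts, not the paper's results
    print(accuracy_from_confusion(toy), per_class_recall(toy))
```

With the toy matrix, 47 of 50 samples lie on the diagonal, so the accuracy is 0.94 and the per-class recalls are 0.9, 0.9 and 1.0. Per-class recall is worth reporting alongside accuracy when the three complexity levels are unevenly represented.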

V. CONCLUSION
The importance of password security as the first line of defence for personal information security in the digital age
cannot be overstated. As cyber-attack techniques continue to evolve and intensify, simple, easy-to-crack passwords can no longer
meet today's information security needs. Using machine learning techniques to enhance password security has therefore become
an innovative approach. The RoBERTa-based password complexity prediction method demonstrated in this paper not only makes
password security checking more intelligent but also gives users a more convenient and efficient means of protection.
In this paper, password complexity prediction based on the RoBERTa algorithm is achieved by training the model twice
on the selected password complexity text dataset. Analysis of the confusion matrices shows that the vast majority of
classifications are correct, with accuracies of 99.741% and 99.11% for the two runs, respectively, both above 99%. This
indicates that the RoBERTa model performs well on the task of password complexity prediction and can give users accurate
feedback and timely reminders to increase the complexity of their passwords.
The experimental results show that password complexity prediction with the RoBERTa algorithm is highly reliable and
accurate on this dataset. An accuracy above 99% means that almost all of the model's predictions are correct and that it can
identify and assess password complexity effectively. This can greatly enhance users' alertness to personal information security
issues and prompt them to take more rigorous and effective protection measures.

REFERENCES
1. Mu, Pengyu, Wenhao Zhang, & Yuhong Mo. (2021). Research on spatio-temporal patterns of traffic operation index hotspots based on big data mining technology. Basic & Clinical Pharmacology & Toxicology, 128. Wiley, 111 River St, Hoboken, NJ 07030-5774, USA.
2. Mo, Y., Qin, H., Dong, Y., Zhu, Z., & Li, Z. (2024). Large language model (LLM) AI text generation detection based on transformer deep learning algorithm. International Journal of Engineering and Management Research, 14(2), 154-159.
3. Liu, B., Yu, L., Che, C., Lin, Q., Hu, H., & Zhao, X. (2023). Integration and performance analysis of artificial intelligence and computer vision based on deep learning algorithms. arXiv preprint arXiv:2312.12872.
4. Zhang, Jingyu, et al. (2024). Research on detection of floating objects in river and lake based on AI intelligent image recognition. arXiv preprint arXiv:2404.0688.
5. Xiang, Ao, et al. (2024). Research on splicing image detection algorithms based on natural image statistical characteristics. arXiv preprint arXiv:2404.16296.
6. Dai, Shuying, et al. (2024). The cloud-based design of unmanned constant temperature food delivery trolley in the context of artificial intelligence. Journal of Computer Technology and Applied Mathematics, 11, 6-12.
7. Li, Zhenglin, et al. (2023). Stock market analysis and prediction using LSTM: A case study on technology stocks. Innovations in Applied Engineering and Technology, 1-6.
8. Li, Shaojie, Yuhong Mo, & Zhenglin Li. (2022). Automated pneumonia detection in chest X-ray images using deep learning model. Innovations in Applied Engineering and Technology, 1-6.
9. Mo, Y., Qin, H., Dong, Y., Zhu, Z., & Li, Z. (2024). Large language model (LLM) AI text generation detection based on transformer deep learning algorithm. International Journal of Engineering and Management Research, 14(2), 154-159.
10. Mansouri, Mohamad, et al. (2023). SoK: Secure aggregation based on cryptographic schemes for federated learning. Proceedings on Privacy Enhancing Technologies.
