
ASEJ ISSN: 2543-9103 ISSN: 2543-411X (online)

Can GPT-4 Chat Pass a Polish Stockbroker Exam?

Tomasz Wyłuda
Wydział Zarządzania, Uniwersytet Warszawski, Warszawa, Poland

Abstract— This research investigates the performance of OpenAI's GPT-4, a sophisticated language model, in passing the Polish Stockbroker Exam conducted by the Polish Financial Supervisory Authority (KNF). The exam, covering a broad range of topics, including legal issues, finance theory, finance mathematics, and setting prices, requires theoretical and practical skills pertinent to the financial markets. The study is set against various evaluations in which GPT-4 and its predecessors have been tested in numerous academic and professional settings, demonstrating strengths and weaknesses in different domains. The study aimed to determine whether GPT-4 can pass the Polish Stockbroker Exam and to analyze its performance across different question types. Results indicated that GPT-4 consistently failed to meet the passing score. However, it performed better when given more time per question, suggesting a trade-off between accuracy and completeness. Analysis by question type revealed higher proficiency in legal and finance theoretical questions but significant struggles with questions specific to the stockbroker job. Notably, GPT-4 showed improvement in finance calculation questions with more response time.

Keywords— Finance, Law, Stockbroker, Artificial Intelligence, Investment.

INTRODUCTION

Since OpenAI released ChatGPT and its subsequent upgrade to GPT-4, these models have been extensively tested in academic settings to evaluate their capabilities. GPT-4, a deep learning model, has shown a notable improvement in understanding and text generation compared to its predecessor, ChatGPT.

SciBench (Wang et al., 2023) tested the GPT-4 chat on a range of college-level scientific problems drawn from mathematics, chemistry, and physics textbooks, as well as problems from undergraduate-level exams in computer science and mathematics. The study reveals that current LLMs fall short of delivering satisfactory performance.

GPT-4's performance in legal examinations has been remarkable. According to the GPT-4 technical report by OpenAI (Achiam et al., 2023), it would pass a bar exam with a score around the top 10% of test takers. This indicates GPT-4's advanced comprehension skills in legal matters. In standardized tests, GPT-4 has shown exceptional results (Achiam et al., 2023). In the SAT Reading & Writing section, it scored 710 out of 800 (93rd percentile); in the math section, it scored 700 (89th percentile). The GPT-4 chat achieved relatively average results (54th percentile) in the writing part of the Graduate Record Examinations (GRE). However, recent papers suggest slightly lower scores (Martínez, 2023). It is worth noting that the GPT-4 chat would also perform decently in law schools (Blair-Stanek et al., 2023).

In medical profession exams, GPT chat also performed well. GPT-4 could receive a passing grade on the Japanese Medical Licensing Examination (Takagi et al., 2023). Similarly, GPT-4 was able to pass the Polish Medical Final Examination (Rosoł et al., 2023), the Indian pre-medical test (Farhat et al., 2023), the German State Examination in Medicine (Jung et al., 2023), the Korean National Licensing Examination for Medicine Doctors (Jang et al., 2023), and the Turkish Medical Specialization Exam (Kilic, 2023).

In the engineering field, GPT-4 performs decently. For instance, the AI model passed the Fundamentals of Engineering (FE) Environmental Exam (Pursnani et al., 2023). GPT-3.5 also achieved excellent results in software engineering exams prepared for students (Loubier, 2023). The AI model has demonstrated impressive capabilities in physics: the GPT chat achieved First-Class grades on an essay writing assessment from a university physics module (Yeadon et al., 2023). In a different

_________________________________________________________________________________________________________________________________
ASEJ - Scientific Journal of Bielsko-Biala School of Finance and Law
Volume 28, No 1 (2024), pages 6
https://doi.org/10.19192/wsfip.sj1.2024.10
Received: January 2024; Accepted: March 2024; Published: March 2024
Copyright: © 2024 by the authors. This is an open access article distributed under the terms and conditions of the Creative Commons Attribution CC-BY-NC 4.0 License (https://creativecommons.org/licenses/by/4.0/)
Publisher's Note: ANSBB stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

research (Yeadon & Douglas, 2023), GPT-4 and GPT-3.5 answered a set of 42 exam papers derived from 10 distinct physics courses (administered at Durham University from 2018 to 2022) and scored an average of 49.4% and 38.6%, respectively. This is not a passing score; however, it suggests that GPT chat is better at writing assessments than at multiple-choice tests.

In finance and business, GPT and other AI solutions have been tested as helpful tools in human capital management (Bashynska et al., 2023), auditing (Karmańska, 2022), accounting (Beerbaum, 2023), banking (Fares et al., 2023), actuarial work (Balona, 2023), and investment (Nametala et al., 2023). GPT was also tested against college examinations in economics, finance, and management. The chat was able to pass the Test of Understanding in College Economics (TUCE) with excellent results: the 91st percentile for Microeconomics and the 99th percentile for Macroeconomics when compared to students who take the TUCE exam at the end of their principles course (Geerling et al., 2023). In a study named "Would Chat GPT3 Get a Wharton MBA?", Christian Terwiesch (2023) stated that even the GPT-3.0 version of the chat would be able to receive a B to B- grade on the graded exams. However, even GPT-4 performed poorly on Quantitative Finance Examinations (Malladi, 2023).

The ChatGPT model can pass major accounting certification exams, including the Certified Public Accountant (CPA), Certified Management Accountant (CMA), Certified Internal Auditor (CIA), and Enrolled Agent (EA) certification exams (Eulerich et al., 2023). However, GPT-4 would probably fail the Chartered Financial Analyst (CFA) Level I and II exams (Callanan et al., 2023). That study, conducted by a collaboration of researchers from Queens University, Virginia Tech, and J.P. Morgan's AI research division, highlighted GPT-4's enhanced understanding of complex financial concepts, although the model demonstrated more difficulty with Level II content.

Overall, GPT chat has been tested against a wide range of college-level tests and standardized certification tests. Research shows that AI solutions can be valuable tools in education. However, performance in financial education and college-level finance examinations still needs improvement: knowledge of financial topics requires a combination of reasoning, logic, and advanced mathematics skills.

Research on applying the tool in business education and Polish financial education remains limited. These diverse assessments of GPT models in academic and professional settings highlight their strengths in processing and generating complex information across various domains. While many papers have examined GPT chat's performance, the subject is new and evolving. Thus, there is a research gap, especially in testing Polish examinations such as the Polish Stockbroker Exam administered by the Polish Financial Supervisory Authority (KNF). The study aims to answer the following questions:
1) Can the GPT-4 chat pass the Polish Stockbroker Exam?
2) How does the model perform with different types of questions in the exam?
3) How does GPT-4 perform in different circumstances?

METHODOLOGY

The stockbroker's exam includes detailed thematic areas covering legal issues in civil law, commercial law, tax, foreign exchange law, and aspects related to securities and other financial instruments. It also addresses public offerings, trading in financial instruments, financial accounting, and ethical standards in the profession. This structured approach ensures that prospective stockbrokers are well-versed in the theoretical knowledge and practical skills essential for their role in the financial markets.

To assess whether GPT chat could pass the securities stockbroker exam, a methodology akin to that employed in the previously referenced studies was used. For this purpose, ten exams from previous years published on the KNF website were selected. The exams took place between March 25, 2018, and October 15, 2023 (KNF, n.d.). The exam consists of 120 test questions, with a total duration of 3 hours. There are four possible answers, but only one is correct. For each correct answer, two points are awarded, and for each incorrect answer, one point is deducted. No points are given for unanswered questions. To pass the exam, one needs to score 160 points.

Two methods were used to test how the GPT-4 chat would perform in the exam. In the first method, the GPT-4 chat received all questions at once; in the second method, the chat received the questions one by one (not the whole set at once).

In the first method, the entire securities stockbroker exam was submitted to the model, preceded by the instructions:

"The securities stockbroker exam is a single-choice test consisting of 120 questions. To pass the exam, a minimum score of 160 points is required. The scoring system for the exam is as follows:
Correct Answer: +2 points
Incorrect Answer: -1 point
No Answer: 0 points
Your task is to answer the questions and receive a minimum of 160 points."

In this approach, GPT chat was presented with the whole test at once and then proceeded to solve the tasks by choosing one of the four possible answers. Each exam was solved in a separate instance of GPT chat to avoid interactions between already completed tests. Before the test was administered, the rules were explained to the GPT chat.

In contrast, the second method involved presenting GPT chat with instructions on how to solve the test, followed by pasting the questions one by one. This method allowed GPT chat more time to respond to each question. However, it is essential to note that the total response time taken by GPT chat was still below the time limit set for the actual test conducted by the Financial Supervision Authority (3 hours).

After the calculation, we conducted a one-sample t-test at the 99% confidence level to determine whether the GPT-4 chat can pass the test:
− Null Hypothesis (H0): The test taker's average score is equal to or greater than the points required to pass.
− Alternative Hypothesis (H1): The test taker's average score is less than the points required to pass.
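The scoring rule and the hypothesis test above can be made concrete with a short Python sketch (a minimal illustration using the standard one-sample t formula; the function names are illustrative, not code from the study, and the check values are the first-method summary statistics reported in the Results section):

```python
import math

def exam_score(correct: int, wrong: int) -> int:
    """KNF scoring rule: +2 per correct answer, -1 per wrong answer,
    0 for unanswered questions."""
    return 2 * correct - wrong

def one_sample_t(mean: float, mu0: float, sd: float, n: int) -> float:
    """t-statistic for H0: mean >= mu0 against H1: mean < mu0."""
    return (mean - mu0) / (sd / math.sqrt(n))

# Check against Table 1: 66 correct and 54 wrong answers on the
# 15 October 2023 exam give 2*66 - 54 = 78 points.
assert exam_score(66, 54) == 78

# First-method summary statistics (mean 77.5, sd 11.4, n = 10 exams)
# against the 138.0-point threshold used in the Results section.
t = one_sample_t(77.5, 138.0, 11.4, 10)
print(round(t, 2))  # -16.78, well below the one-sided 1% critical value of -2.82
```

The same function reproduces the second-method statistic reported later: `one_sample_t(92.6, 138.0, 10.4, 10)` rounds to -13.80.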


The research findings were subsequently verified, and the number of correct answers, incorrect answers, and refusals to respond were tallied. Additionally, the questions in the test were categorized into different sections to ascertain whether GPT chat exhibited similar proficiency across all types of questions. Four categories were distinguished: law, finance theory, finance-mathematical tasks, and specific knowledge (KNF, 2024):
− The legal tasks encompassed a range of topics, including civil law, commercial law, tax and foreign exchange law, issues related to securities and other financial instruments, matters concerning public offerings and public companies, issues related to trading in financial instruments, matters concerning supervision over the financial and capital markets, issues related to the creation and functioning of investment companies and funds as well as management of alternative investment funds, matters concerning the commodity market exchange, issues related to the settlement-depository system, the Accounting Act, and accounting issues.
− The second category, finance theory, included theoretical topics (not requiring calculations). These issues covered financial mathematics, analysis and valuation of debt instruments, financial analysis of enterprises and stock valuation, analysis of derivative instruments, and investment strategies.
− The third category covered the same range of material as the second category but required mathematical calculations.
− The fourth category of questions pertained explicitly to the work of a stockbroker, including stock exchange and over-the-counter trading, setting prices of listed financial instruments, professional ethics, and prevention of crimes in the capital market.

RESULTS AND DISCUSSION

This study aimed to assess the performance of the GPT chat system in the Polish Stockbrokers' examination over multiple iterations spanning from March 25, 2018 to October 15, 2023. The evaluation metrics included the number of correct and incorrect answers, questions not answered, questions canceled by the Polish Financial Supervisory Authority (KNF), total points achieved, and the points required for passing. The results of the first testing method are presented below.

TABLE 1. GPT-4'S RESULTS IN SOLVING THE STOCKBROKER EXAM OVER THE YEARS - FIRST TESTING METHOD

Exam date          | Correct answers | Wrong answers | Questions not answered | Questions canceled by KNF | Total points achieved by GPT chat | Points required | Result
15 October 2023    | 66   | 54   | 0   | 0   | 78   | 160   | Fail
19 March 2023      | 68   | 52   | 0   | 0   | 84   | 160   | Fail
9 October 2022     | 73   | 47   | 0   | 0   | 99   | 160   | Fail
27 March 2022      | 72   | 48   | 0   | 0   | 96   | 160   | Fail
20 June 2021       | 63   | 57   | 0   | 0   | 69   | 160   | Fail
13 September 2020  | 63   | 56   | 0   | 1   | 70   | 158   | Fail
27 October 2019    | 61   | 58   | 0   | 1   | 64   | 158   | Fail
24 March 2019      | 65   | 54   | 0   | 1   | 76   | 158   | Fail
21 October 2018    | 63   | 54   | 0   | 3   | 72   | 154   | Fail
25 March 2018      | 61   | 55   | 0   | 4   | 67   | 152   | Fail
Average            | 65.5 | 53.5 | 0.0 | 1.0 | 77.5 | 158.0 |
Median             | 64.0 | 54.0 | 0.0 | 0.5 | 74.0 | 159.0 |
Standard deviation | 4.1  | 3.4  | 0.0 | 1.3 | 11.4 | 2.7   |
Source: own calculation.

The sample mean is 77.5 points, the required score to pass is 138.0 points, and the sample standard deviation is 11.4 points. The calculated t-statistic is approximately -16.78, and the critical t-value for a 99% confidence level in a one-sided test with 9 degrees of freedom is approximately -2.82. Since the absolute value of the t-statistic is greater than the absolute value of the critical t-value (|16.78| > |2.82|), we reject the null hypothesis. There is thus significant evidence at the 99% confidence level to conclude that the test taker's average score is lower than the score required to pass. This indicates that GPT-4's score would not be sufficient to pass the Stockbroker Exam.

A closer examination of the performance metrics reveals that GPT-4 did not pass any of the 10 exams. Moreover, the strategy taken by the test taker is interesting: there were no instances where questions were left unanswered by GPT chat in any of the exams. However, there were instances of questions being canceled by the Polish Financial Supervisory Authority (KNF), ranging from none in the more recent exams to a maximum of four questions in the March 2018 exam.

Following these results, the second testing method was implemented. The second series of tests, in which GPT chat was asked each question individually, shows a distinct pattern compared to the first series, in which GPT chat was asked to answer the entire test in one go. This second approach, spanning the exams from March 2018 to October 2023, still resulted in GPT chat failing to meet the threshold required to pass the Polish Stockbrokers exam, yet it demonstrates a noteworthy change in performance dynamics. The results of the second testing method are presented below.


TABLE 2. GPT-4'S RESULTS IN SOLVING THE STOCKBROKER EXAM OVER THE YEARS - SECOND TESTING METHOD

Exam date          | Correct answers | Wrong answers | Questions not answered | Questions canceled by KNF | Total points achieved by GPT chat | Points required | Result
15 October 2023    | 69   | 44   | 7   | 0   | 94   | 160   | Fail
19 March 2023      | 70   | 44   | 6   | 0   | 96   | 160   | Fail
9 October 2022     | 76   | 38   | 6   | 0   | 114  | 160   | Fail
27 March 2022      | 73   | 37   | 10  | 0   | 109  | 160   | Fail
20 June 2021       | 66   | 46   | 8   | 0   | 86   | 160   | Fail
13 September 2020  | 67   | 46   | 6   | 1   | 88   | 158   | Fail
27 October 2019    | 64   | 48   | 7   | 1   | 80   | 158   | Fail
24 March 2019      | 66   | 45   | 8   | 1   | 87   | 158   | Fail
21 October 2018    | 64   | 44   | 9   | 3   | 84   | 154   | Fail
25 March 2018      | 64   | 40   | 12  | 4   | 88   | 152   | Fail
Average            | 67.9 | 43.2 | 7.9 | 1.0 | 92.6 | 158.0 |
Median             | 66.5 | 44.0 | 7.5 | 0.5 | 88.0 | 159.0 |
Standard deviation | 3.9  | 3.5  | 1.9 | 1.3 | 10.4 | 2.7   |
Source: own calculation.

The calculated t-statistic with the updated data is approximately -13.80, and the critical t-value for a 99% confidence level in a one-sided test with 9 degrees of freedom is approximately -2.82. As in the previous analysis, since the absolute value of the t-statistic is greater than the absolute value of the critical t-value (|13.80| > |2.82|), we reject the null hypothesis. There is significant evidence at the 99% confidence level to conclude that the test taker's average score is lower than the score required to pass, even with the updated data. GPT-4's score would again not be sufficient to pass the Stockbroker Exam.

The total points achieved by GPT chat in the second testing method are higher than in the first. For instance, in October 2023, GPT chat scored 94 points as opposed to 78 points in the first testing method. This trend of increased scoring is consistent across all test dates, suggesting that providing more time for each question positively impacts GPT chat's performance.

The number of correct answers in the second testing method is higher than in the first. Similarly, there is a noticeable decrease in the number of wrong answers, with the second testing method recording a lower count of incorrect responses than the first.

However, we can observe that GPT chat took a different strategy in answering questions. In the first testing method, the AI model answered all questions; in the second, the number of unanswered questions ranged from 6 to 12 across the exam dates. This could be attributed to the altered testing methodology, in which GPT chat might have taken more time to consider each question, leading to some questions being left unanswered within the given timeframe.

In conclusion, while the altered testing approach in the second series improved GPT chat's total points, it also introduced the occurrence of unanswered questions. Despite these performance improvements, the AI model was still unable to pass the Polish stockbrokers' examination.

TABLE 3. GPT-4'S RESULTS IN SOLVING THE STOCKBROKER EXAM OVER THE YEARS BY QUESTION TYPE - FIRST TESTING METHOD

Question type       | Share of questions by type (%) | Share of correct answers (%) | Share of wrong answers (%) | Share of questions not answered (%)
Legal               | 25 | 68 | 32 | 0
Finance Theoretical | 33 | 69 | 31 | 0
Finance Calculation | 18 | 52 | 48 | 0
Specific Knowledge  | 24 | 27 | 73 | 0
Source: own calculation.

TABLE 4. GPT-4'S RESULTS IN SOLVING THE STOCKBROKER EXAM OVER THE YEARS BY QUESTION TYPE - SECOND TESTING METHOD

Question type       | Share of questions by type (%) | Share of correct answers (%) | Share of wrong answers (%) | Share of questions not answered (%)
Legal               | 25 | 66 | 27 | 8
Finance Theoretical | 33 | 66 | 26 | 7
Finance Calculation | 18 | 72 | 19 | 9
Specific Knowledge  | 24 | 26 | 70 | 4
Source: own calculation.

The analysis of GPT chat's performance on the Polish Stockbrokers exam, categorized by question type, reveals distinct patterns and variations between the two methods of testing.

In the first method, where GPT chat was provided the full test at once, GPT chat demonstrated relatively high proficiency in legal questions (25% of the total), with 68% correct answers and 32% wrong answers. Similarly, the AI model showed proficiency in finance theoretical questions (33% of the total), with 69% correct answers and 31% wrong answers. The performance of the chat was mediocre in finance calculation questions (18% of the total), with only 52% correct answers and 48% wrong answers. Specific knowledge questions (24% of the total) were the most challenging for GPT chat, with only 27% correct answers and a high 73% wrong answers.

In the second method of testing, where GPT chat was given questions one at a time, GPT chat showed similar performance in legal questions (a slight decrease in correct answers to 66%, with wrong answers at 27% and 8% of questions not answered). Similarly, in finance theoretical questions, GPT chat performed well (a slight decrease to 66% correct answers, 26% wrong answers, and 7% not answered). However, GPT chat increased

performance in finance calculation questions (72% correct answers, 19% wrong answers, and 9% not answered). As in the first testing method, GPT chat performed poorly in specific knowledge questions (26% correct answers, 70% wrong answers, and 4% not answered).

In comparison, the second testing method revealed a notable effect on the performance of GPT chat. This method demonstrated an improvement in GPT chat's ability to answer finance calculation questions. During the first testing method, in which GPT chat was presented with the entire set of questions, the AI model committed a significant error: in certain calculation questions, GPT chat attempted to retrieve the answers from its memory instead of performing the calculations. Consequently, GPT chat incorrectly interpreted the task required to respond to the question. Giving GPT chat more time to analyze each question might therefore be particularly beneficial for complex calculation-based questions.

However, the overall performance in the specific knowledge category, particularly concerning Polish laws and regulations, remained consistently low across both testing methods, indicating an ongoing challenge in this area. The questions in this category demanded not only specific knowledge but also the ability to perform complex tasks, such as setting appropriate prices and executing correct orders for buying or selling stocks.

The second testing method also revealed that GPT chat employed a new strategy. Unlike the first method, where GPT chat attempted to answer all questions, the second method left some questions unanswered. This change could be attributed to the AI model giving more thoughtful consideration to each question when time constraints were less pressing.

In summary, although the alteration in testing methodology did result in improvements in specific categories, it also underscored the limitations of GPT chat in consistently and comprehensively responding to questions across the various types of content featured in the Polish Stockbroker exam.

CONCLUSIONS

This study aimed to evaluate the performance of GPT chat, specifically the GPT-4 model, in passing the Polish Stockbrokers' examination. The assessment was conducted through two distinct methodologies over multiple exam iterations held between 25 March 2018 and 15 October 2023. The results offer several key insights into the capabilities and limitations of GPT-4 in this professional context.

The study revealed that GPT-4 performed better when given more time to answer each question individually. This approach led to a higher number of correct answers and a notable decrease in wrong answers compared to the first method, where the entire test was presented at once. However, this improvement in accuracy came at the cost of an increased number of unanswered questions, suggesting a trade-off between accuracy and completeness.

Despite the observed improvements in certain aspects, GPT-4 consistently failed to achieve the passing score in all iterations of the Polish Stockbrokers' examination. This underperformance highlights the model's limitations in fully grasping and applying the specialized knowledge and analytical skills required for this specific professional certification.

The analysis of performance based on question type unveiled distinct patterns. GPT-4 showed relatively higher proficiency in legal and finance theoretical questions but struggled significantly with specific knowledge questions. Interestingly, the model demonstrated a marked improvement in finance calculation questions when given more time, underscoring its potential in handling complex, calculation-based queries.

These findings emphasize both the potential and the constraints of AI applications like GPT-4 in professional and academic fields. While GPT-4 shows promise in understanding and processing complex information, its application in passing professional certifications like the Polish Stockbrokers' examination is currently limited. This suggests that while AI can be a valuable tool for learning and preliminary analysis, it cannot yet replace the nuanced understanding and decision-making skills of human professionals.

The study underscores the need for ongoing research and development in AI. Improvements in AI models, particularly in their ability to handle specialized, context-specific information and in decision-making under time constraints, could enhance their applicability in professional certifications and other complex tasks.

In conclusion, the study of GPT-4's performance in the Polish Stockbrokers' examination provides valuable insights into the current capabilities of AI in complex, professional settings. While there are notable strengths, particularly in processing and analyzing information, the limitations in achieving the required proficiency for professional certification indicate the need for further advancements in AI technology. This exploration serves as a critical step in understanding and shaping the future role of AI in professional and educational domains.

REFERENCES

Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., ... & McGrew, B. (2023). GPT-4 technical report. arXiv preprint arXiv:2303.08774.

Balona, C. (2023). ActuaryGPT: Applications of large language models to insurance and actuarial work. Available at SSRN 4543652.

Bashynska, I., Prokopenko, O., & Sala, D. (2023). Managing human capital with AI: Synergy of talent and technology. Zeszyty Naukowe Wyższej Szkoły Finansów i Prawa w Bielsku-Białej, 27(3), 39-45.

Beerbaum, D. O. (2023). Generative Artificial Intelligence (GAI) with ChatGPT for accounting: A business case. Available at SSRN 4385651.

Blair-Stanek, A., Carstens, A. M., Goldberg, D. S., Graber, M., Gray, D. C., & Stearns, M. L. (2023). GPT-4's law school grades: Con Law C, Crim C-, Law & Econ C, Partnership Tax B, Property B-, Tax B (May 9, 2023).

Callanan, E., Mbakwe, A., Papadimitriou, A., Pei, Y., Sibue, M., Zhu, X., ... & Shah, S. (2023). Can GPT models be financial analysts? An evaluation of ChatGPT and GPT-4 on mock CFA exams. arXiv preprint arXiv:2310.08678.

Eulerich, M., Sanatizadeh, A., Vakilzadeh, H., & Wood, D. A. (2023). Is it all hype? ChatGPT's performance and disruptive potential in the accounting and auditing industries. SSRN Electronic Journal.


Fares, O. H., Butt, I., & Lee, S. H. M. (2023). Utilization of artificial intelligence in the banking sector: A systematic literature review. Journal of Financial Services Marketing, 28(4), 835-852.

Farhat, F., Chaudry, B. M., Nadeem, M., Sohail, S. S., & Madsen, D. O. (2023). Evaluating AI models for the National Pre-Medical Exam in India: A head-to-head analysis of ChatGPT-3.5, GPT-4 and Bard. JMIR Preprints.

Geerling, W., Mateer, G. D., Wooten, J., & Damodaran, N. (2023). ChatGPT has aced the Test of Understanding in College Economics: Now what? The American Economist, 05694345231169654.

Gilson, A., Safranek, C. W., Huang, T., Socrates, V., Chi, L., Taylor, R. A., & Chartash, D. (2023). How does ChatGPT perform on the United States Medical Licensing Examination? The implications of large language models for medical education and knowledge assessment. JMIR Medical Education, 9(1), e45312.

Jang, D., Yun, T. R., Lee, C. Y., Kwon, Y. K., & Kim, C. E. (2023). GPT-4 can pass the Korean National Licensing Examination for Korean Medicine Doctors. PLOS Digital Health, 2(12), e0000416.

Jung, L. B., Gudera, J. A., Wiegand, T. L., Allmendinger, S., Dimitriadis, K., & Koerte, I. K. (2023). ChatGPT passes German state examination in medicine with picture questions omitted. Deutsches Ärzteblatt International, 120(21-22), 373.

Karmańska, A. (2022). Artificial intelligence in audit. Prace Naukowe Uniwersytetu Ekonomicznego we Wrocławiu, 66(4), 87-99.

Kilic, M. E. (2023). AI in medical education: A comparative analysis of GPT-4 and GPT-3.5 on Turkish Medical Specialization Exam performance. medRxiv, 2023-07.

KNF, Examination Commission for Securities Brokers (2023). Communication No. 4 regarding the thematic scope of the exam for securities brokers and the skills test [Announcement No. 4 on the subject scope of the securities broker exam and skills test]. Available at: https://www.knf.gov.pl/knf/pl/komponenty/img/Komunikat_4_2023_87292.pdf (Accessed: January 25, 2024).

Loubier, M. (2023). ChatGPT: A good computer engineering student? An experiment on its ability to answer programming questions from exams.

Malladi, R. K. (2023). Emerging frontiers: Exploring the impact of generative AI platforms on university quantitative finance examinations. arXiv preprint arXiv:2308.07979.

Martínez, E. (2023). Re-evaluating GPT-4's bar exam performance. Available at SSRN 4441311.

Minister of Finance Regulation on examinations for securities brokers and investment advisors and the skills test (2016). Journal of Laws (Dziennik Ustaw), 707, Poz. 707.

Nametala, C. A., Souza, J. V. D., Pimenta, A., & Carrano, E. G. (2023). Use of econometric predictors and artificial neural networks for the construction of stock market investment bots. Computational Economics, 61(2), 743-773.

Polish Financial Supervision Authority (n.d.). Examinations for securities brokers. Available at: https://www.knf.gov.pl/dla_rynku/egzaminy/Maklerzy_papierow_wartosciowych_egzaminy/testy (Accessed: January 28, 2024).

Pursnani, V., Sermet, Y., Kurt, M., & Demir, I. (2023). Performance of ChatGPT on the US Fundamentals of Engineering exam: Comprehensive assessment of proficiency and potential implications for professional environmental engineering practice. Computers and Education: Artificial Intelligence, 5, 100183.

Rosoł, M., Gąsior, J. S., Łaba, J., Korzeniewski, K., & Młyńczak, M. (2023). Evaluation of the performance of GPT-3.5 and GPT-4 on the Polish Medical Final Examination. Scientific Reports, 13(1), 20512.

Takagi, S., Watari, T., Erabi, A., & Sakaguchi, K. (2023). Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison study. JMIR Medical Education, 9(1), e48002.

Terwiesch, C. (2023). Would ChatGPT3 get a Wharton MBA? A prediction based on its performance in the operations management course.

Wang, X., Hu, Z., Lu, P., Zhu, Y., Zhang, J., Subramaniam, S., ... & Wang, W. (2023). SciBench: Evaluating college-level scientific problem-solving abilities of large language models. arXiv preprint arXiv:2307.10635.

Yeadon, W., Inyang, O. O., Mizouri, A., Peach, A., & Testrow, C. P. (2023). The death of the short-form physics essay in the coming AI revolution. Physics Education, 58(3), 035027.

Yeadon, W., & Halliday, D. P. (2023). Exploring Durham University physics exams with large language models. arXiv preprint arXiv:2306.15609.
