
Special Issue on Generative Artificial Intelligence in Education

Can Generative AI Solve Geometry Problems? Strengths and Weaknesses of LLMs for Geometric Reasoning in Spanish

Verónica Parra¹,³, Patricia Sureda¹,³, Ana Corica¹,³, Silvia Schiaffino²,³, Daniela Godoy²,³ *

¹ Universidad Nacional del Centro de la Provincia de Buenos Aires, Facultad de Ciencias Exactas, NIEM, Tandil, Buenos Aires (Argentina)
² Universidad Nacional del Centro de la Provincia de Buenos Aires, Facultad de Ciencias Exactas, ISISTAN, Tandil, Buenos Aires (Argentina)
³ CONICET, Buenos Aires (Argentina)

Received 18 November 2023 | Accepted 15 February 2024 | Published 28 February 2024

Abstract

Generative Artificial Intelligence (AI) has emerged as a disruptive technology that is challenging traditional teaching and learning practices. Question-answering in natural language fosters the use of chatbots, such as ChatGPT, Bard and others, that generate text based on pre-trained Large Language Models (LLMs). The performance of these models in certain areas, like Math problem solving, is receiving increasing attention as it directly affects their potential use in educational settings. Most of these evaluations, however, concentrate on the construction and use of benchmarks comprising diverse Math problems in English. In this work, we discuss the capabilities of the most widely used LLMs within the subfield of Geometry, in view of the relevance of this subject in high-school curricula and the difficulties exhibited by even the most advanced multimodal LLMs in dealing with geometric notions. This work focuses on Spanish, which is additionally a less resourced language. The answers of three major chatbots, based on different LLMs, were analyzed not only to determine their capacity to provide correct solutions, but also to categorize the errors found in the reasoning processes described. Understanding LLMs' strengths and weaknesses in a field like Geometry can be a first step towards the design of more informed methodological proposals to include these technologies in classrooms, as well as the development of more powerful automatic assistance tools based on generative AI.

Keywords: Chatbots, Generative AI, Geometry, LLMs, Math Problem-Solving.

DOI: 10.9781/ijimai.2024.02.009

I. Introduction

The emergence and fast adoption of natural-language chatbots, such as OpenAI ChatGPT¹ or Google Bard², leveraging Large Language Models (LLMs) for question-answering, is a phenomenon having a growing impact on several daily activities. Education is among the areas most heavily impacted by the irruption of these tools, as the interaction of generative AI with both students and teachers allows us to envision promising applications in pedagogical scenarios, but also unveils potential risks.

Mathematics is a valuable testbed for evaluating the problem-solving capabilities of LLMs, as it involves the ability to analyze and comprehend the problem stated, select viable heuristics from a potentially large set of strategies, and combine them into a chain of thought leading to a solution. Each of these high-level abilities poses complex challenges for AI-based technologies in general, and generative AI models in particular.

The incorporation of generative AI in educational settings requires a deep understanding of both the capabilities and limitations of LLMs to provide solutions to Math problems, as well as step-by-step explanations at different levels. Novel AI-based techniques can be built upon this knowledge and exploit LLMs' potential for the development of more powerful tools, including Math teaching assistants that interact with students during their learning process and potentially offer individualized instruction.

Studies oriented to evaluate the performance of LLMs on mathematical reasoning have been mostly concerned with the construction of appropriate benchmarks and the quantitative analysis of a given model's results with respect to them [1]–[5]. Although their findings can provide an overall view of LLM performance in the Math domain, there is still a lack of understanding of their strengths and weaknesses, both in general terms and in specific Math areas, such as Geometry.

¹ https://chat.openai.com/
² https://bard.google.com/

* Corresponding author.
E-mail addresses: [email protected] (V. Parra), [email protected] (P. Sureda), [email protected] (A. Corica), [email protected] (S. Schiaffino), [email protected] (D. Godoy).


Finding solutions for Geometry problems might prove a specially challenging task for generative AI based on multimodal LLMs, as it not only involves knowledge of fundamental concepts (theorems) and their correct application, but especially the use of spatial reasoning skills. At the same time, Geometry has a preeminent place in high-school curricula in many countries. Because of this, better understanding the potential and pitfalls of chatbots in solving Geometry problems becomes an essential step towards the construction of more powerful teaching assistance tools, as well as pedagogical strategies integrating available general-purpose chatbots.

In addition, current studies concentrate on English texts, while the performance of LLMs in less represented languages, such as Spanish, remains to be investigated. The quality of the answers of models for different languages is directly related to the amount of training data available for each language: they perform better for languages with larger representation, like English, and exhibit an inferior performance for languages like Spanish.

This work presents a study intended to shed some light on the abilities of chatbots to provide accurate solutions to Geometry problems in Spanish. We carried out an analysis of the answers provided by three available chatbots, namely OpenAI ChatGPT, Microsoft Bing Chat (BingChat)³, and Google Bard, using a case study of a high-school Geometry problem. The three major chatbots covered, leveraging versions of the GPT-3.5 [6], GPT-4 [7] and PaLM-2 [4] models, were chosen because they are accessible and currently being used by students in everyday activities and schools. The problem analyzed corresponds to an Iberoamerican Math competition⁴ oriented to high school students, and it is targeted to students under 13 years old. As a result of this study, we propose a categorization of errors made by chatbots in Geometry reasoning that can be used as input towards the construction of methodological proposals fostering the use of generative AI for learning and skill acquisition.

The structure of this document is as follows: section II discusses related works in the area, section III introduces the materials and methods used in this study, section IV discusses the results obtained and, finally, section V presents the conclusions and outlines promising avenues for further research.

³ https://www.bing.com/chat
⁴ https://www.oma.org.ar/internacional/may.htm

II. Background & Related Works

In this section we first summarize some aspects regarding the use of LLMs in education (subsection A), then we discuss research on the performance of these models in Math problem-solving (subsection B) and finally we introduce some context and background concepts related to Geometry teaching (subsection C).

A. LLMs in Education

Since the launch of ChatGPT by OpenAI in 2022, there has been an intensive discussion about the integration of generative AI in several fields, particularly in education [8],[9], as well as about the ethical aspects of using artificial intelligence (AI) systems in educational contexts [10],[11]. ChatGPT was trained on a large volume of text data, using the Generative Pre-trained Transformer (GPT) deep learning architecture. Immediately, the friendly, human-like responses in natural language conversations led ChatGPT to become one of the fastest-adopted technologies.

The irruption of generative AI and the widespread adoption of ChatGPT opened the discussion on both the challenges and the concerns regarding its use in educational settings. On one side, there is a pressing need to harness the power of these tools for enhancing teaching and learning practices. Among other benefits, LLMs can be used in the development of personalized learning tutors for students, and can assist teachers in the creation of educational resources (e.g. syllabus and class planning, course material and exercises) as well as in the assessment of students' capabilities (e.g. generating tests and evaluation scenarios), among many other applications. On the other side, the potential uses of LLMs raise concerns in relation to their accuracy and reliability, as well as other threats such as misuse, plagiarism, the presence of biases and hallucinations, and other ethical considerations. In [12] it was even found that the risks also encompass the potential to limit critical thinking and creativity, impede a deep understanding of subject matter, and foster passivity.

General-purpose chatbots, such as ChatGPT or Bard, can deal with question-answering in diverse domains as they are trained with large portions of the Web. However, recent studies have shown that chatbots perform differently in different subject areas, including finance, coding, maths, and general public queries [13]. In [14], for example, it was found that ChatGPT performance varied across subject domains, ranging from outstanding (e.g., economics) and satisfactory (e.g., programming) to unsatisfactory (e.g., mathematics). Fine-tuning LLMs in specific domains to build educational applications upon these trained models can circumvent this issue; examples include ChemBERTa [15] or MathChat [16]. However, training for downstream tasks requires specialized data corpora, and the final product is tied to the language of such data. Understanding the capabilities of the most accessed, general-purpose chatbots is relevant both to introduce them as a pedagogical tool in classrooms and to counteract the inaccuracies students and teachers are exposed to while interacting with generative AI.

B. LLMs in Math and Geometry Problem-Solving

Although the entire school curriculum is affected, the presence of AI impacts differently according to the competences and skills to be acquired by students, depending on whether they involve, for example, language abilities, communication, problem-solving capabilities, researching factual information or critical thinking.

Given its current level of adoption by students, it becomes increasingly important to evaluate LLM performance on specific tasks, such as, in this case, Geometry problem-solving. It is worth noticing that, as pointed out by [17], autoregressive language models are trained for predicting the next word given a previous sequence of words. The mismatch between the problem the model was developed to solve and the task it is being given can have significant consequences. In fact, the authors highlight the importance of viewing LLMs not as a "Math problem solver" but rather as a "statistical next-word prediction system" being used to solve Math problems. Then, failures can be understood directly in terms of a conflict between next-word prediction and the task at hand.

Different LLMs have been tested on multiple mathematical reasoning datasets, showing how these models struggle to solve problems even at the level of a graduate student. In [1] a new natural-language dataset, named GHOSTS⁵, was introduced. This dataset, which covers graduate-level Mathematics and was curated by researchers working in Mathematics, includes a subset, named Olympiad-Problem-Solving, consisting of a selection of exercises often used to prepare for Mathematics competitions. The study over this dataset concluded that ChatGPT cannot get through a university Math class, but that for undergraduate Mathematics, GPT-4 can offer sufficient (but not perfect) performance. In a quantitative comparison of GPT versions on different subsets of GHOSTS, it was shown that Olympiad problem solving was the subset proving to be the most difficult for these models, which obtained lower scores on such problems than even on symbolic integration.

⁵ https://github.com/xyfrieder/science-GHOSTS


GPT-2 and GPT-3 were tested on the Mathematics Aptitude Test of Heuristics (MATH) dataset [2], consisting of problems from high school Math competitions classified into different subjects and levels. GPT-2 accuracy reached an average of 6.9%, being better at problems of Pre-Calculus and Geometry and worse for problems related to Number Theory. GPT-3, in turn, reaches an average accuracy of 5.2%, being better at Pre-Algebra and worse at Geometry. In [3], a study on the performance of ChatGPT on Math word problems (MWPs) from the DRAW-1K⁶ dataset found that performance changes dramatically if the model is asked to provide explanations of the answer instead of simply being asked for the answer without further text. The 540-billion-parameter version of PaLM [4] was reported to solve 58% of the problems in GSM8K⁷, a benchmark of thousands of challenging grade-school-level Math questions, with 8-shot chain-of-thought prompting in combination with an external calculator. In turn, this result outperforms the prior top score of 55%, achieved by fine-tuning the GPT-3 175B model with a training set of 7500 problems and combining it with an external calculator and verifier [5].

⁶ https://paperswithcode.com/dataset/draw-1k
⁷ https://paperswithcode.com/dataset/gsm8k

A few studies can be found comparing the answers of multiple available chatbots to Math problems. In [18] an evaluation of the performance of Google Bard in solving Mathematics problems commonly found in the Vietnamese curricula was presented. The findings indicate that in this regard Google Bard's performance falls behind its counterparts (Bing Chat and ChatGPT). For these experiments, a Vietnamese dataset was translated into English, since Bard lacked support for Vietnamese at the moment the study was carried out. A comparison between three chatbots, ChatGPT-3.5, ChatGPT-4 and Google Bard, was presented in [19], focusing on their ability to give correct answers to Mathematics and Logic problems. For a set of 30 questions, it was found that for straightforward arithmetic, algebraic expressions, or basic logic puzzles, chatbots may provide accurate solutions, although not in every attempt. For more complex Mathematics problems or advanced logic tasks, their answers were unreliable.

Mechanisms to improve the ability of LLMs to perform complex reasoning are based on generating a chain of thought, i.e. a series of intermediate reasoning steps. Chain-of-thought (CoT) prompting [20] leverages intermediate natural language rationales as prompts to enable LLMs to first generate reasoning chains and then predict an answer for an input question. On the GSM8K benchmark of Math word problems, for example, chain-of-thought prompting with PaLM 540B outperforms standard prompting by a large margin and achieves new state-of-the-art performance, surpassing even fine-tuned GPT-3 with a verifier. In the same direction, an evaluation on difficult high school competition problems from the MATH dataset was presented in [16], and MathChat, a conversational problem-solving framework, was proposed. It simulates a mock conversation between an LLM assistant using GPT-4 and a user proxy agent working together to solve the Math problem. On the problems with the highest level of difficulty from MATH, MathChat improves the accuracy from the 28% of GPT-4 to 44%, and it has competitive performance across all the categories of problems.
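To make the prompting scheme concrete, the sketch below shows the shape of a chain-of-thought prompt in the style of [20]: a solved example with its intermediate reasoning precedes the target question, so the model is induced to produce a rationale before its final answer. This is our illustration only; the worked geometric example inside the prompt is invented, not taken from GSM8K, MATH or [20].

```python
# Minimal chain-of-thought (CoT) prompt in the style of [20]: one worked
# example with explicit intermediate reasoning is prepended to the target
# question. The geometric example below is invented for illustration.
COT_PROMPT = """\
Q: A regular hexagon is inscribed in a circle of radius 4. How long is one side?
A: The central angle per side is 360/6 = 60 degrees. The triangle formed by two
radii and one side has two sides of length 4 enclosing a 60-degree angle, so it
is equilateral. The answer is 4.

Q: A regular decagon is inscribed in a circle of radius 5. What is the central
angle subtended by one side?
A:"""

# A model prompted this way is expected to continue with a rationale such as
# "The central angle per side is 360/10 = 36 degrees. The answer is 36."
# before stopping, instead of emitting only a bare answer.
```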
Multimodal LLMs (MLLMs) seem to be the most appropriate option to complement reasoning capabilities with the spatial thinking needed for Geometry problem-solving. However, even the most advanced MLLMs still exhibit limitations in addressing geometric problems due to challenges in accurately comprehending geometric figures [21]. Specifically, models struggle with understanding the relationships between fundamental elements like points and lines, and with accurately interpreting elements such as the degree of an angle. It has been argued [21] that the inaccurate descriptions of geometric shapes produced by models such as GPT-4V (GPT-4 with vision) stem from these same difficulties. Current solutions like G-LLaVA [21], built upon the LLaVA (Large Language and Vision Assistant) model [22], involve enriching the training data and creating augmented datasets (Geo170K) to improve model training. As mentioned before, the resulting models are less accessible than general-purpose ones and are available in a mainstream language such as English.

With large language models rapidly evolving, there is a pressing need to understand their capabilities and limitations in the context of mathematical reasoning and, particularly, in specific fields like Geometry. Current studies have been centered on measuring the performance of LLMs on benchmarks of broad sets of Mathematical problems in English. To the best of our knowledge, this is the first work focusing on understanding the question-answering capabilities of the widely available chatbots regarding Geometry in the Spanish language.

C. Geometry in the Classroom

Geometry is one of the basic subjects of Mathematics. For analyzing Geometry in the context of the Argentine educational system, in which the present study takes place, three edges need to be considered: curricular design, actual work in classrooms, and the Argentine Mathematics Olympiads (OMA⁸). In the first case, one of the four priority learning blocks proposed by the Argentine Ministry of Education is Geometry [23]. Thus, the vast majority of the curricular designs of each jurisdiction prescribe studying Geometry throughout secondary education (both in the basic and higher levels). The curricular relevance of Geometry derives from its close relationship with various fields, including Natural and Social Sciences, as well as everyday life [24]–[26]. However, even though Geometry continues to be present in secondary school curricular designs, various researchers highlight the absence of Geometry in the classroom [24],[27]. The third edge corresponds to a competition that has been taking place in Argentina for more than 30 years: the Argentine Mathematics Olympiads [28]. The fundamental objective of these Olympiads is to stimulate mathematical activity among young people and develop the ability to solve problems (OMA regulations, art. 2). The OMA proposes the resolution of problems, which can be grouped into two large types: arithmetic-algebraic and geometric.

⁸ https://www.oma.org.ar/

In summary, the official curricular guidelines propose studying Geometry in secondary school; however, this guideline is not materialized in the classrooms (or it is, but weakly). Moreover, Geometry is one of the two types of problems used to assess the mathematical skills of the students who participate in the OMA. We highlight the importance given to the OMA because it is not only promoted by educational centers, but also by provincial governments (as can be seen on their official site), motivating students to participate actively. In this work we explore how various resources from generative AI can be used to study geometric problems.

III. Materials and Methods

The goal of the analysis carried out in this work is to explore the performance of chatbots when dealing with a problem involving Geometry notions at the level of the second and third years of the high-school curricular design. The assessment of the chatbots' capacity to provide accurate answers and, in the case of failure, of the common mistakes and deficiencies found in the described solutions, can serve as a basis for the creation of more efficient teaching methodologies involving generative AI.

For the purpose of this study, an Olympiad problem was selected, as described in section A, and the answers of three chatbots, enumerated in section B, to its formulation were collected. The methodology used for analyzing these answers is described in section C.


TABLE I. Summary of Errors Found in the Answers of Chatbots

               ChatGPT 3.5           Bing Chat                            Bard
Error type     #1  #2  #3  Total     Precise  Balanced  Creative  Total   #1  #2  #3  Total
Construction    2   0   2      4        -        0         3        3      3   0   1      4
Conceptual      2   3   0      5        -        3         0        3      0   2   0      2
Contradiction   0   0   1      1        -        0         1        1      0   0   0      0
Total           4   3   3     10       -*        3         4        7      3   2   1      6

* This is a case in which the chatbot did not provide a solution to the problem.

A. Geometry Problem

The problem used in this work belongs to the May Olympiads, an Iberoamerican Mathematics contest. This competition has 2 levels: the first level is for students who, in the year previous to the contest, are under 13 years old on December 31st, and the second level is for students under 15 years old on December 31st. In each level the test is unique, and it consists of 5 problems that students must solve within 3 hours. From these problems, a Geometry problem of level 1 proposed at the May Olympiads in 2018⁹ [29] was considered.

⁹ https://www.oma.org.ar/enunciados/enunciados_Mayo2018.pdf

The problem selected is characterized by not having an immediate and unique solution. In fact, reaching a solution requires knowledge about regular polygons and their properties, the circumference and its properties, similarity between polygons, the Pythagorean theorem, and trigonometric ratios, among other concepts. Therefore, it is necessary to know and understand a variety of geometrical notions to decide which is the most appropriate to reach a solution.

The geometric problem was selected in such a way that both the mathematical concepts involved and the procedures for its resolution correspond to what is indicated in the official curricular design for Argentine secondary schools [23]. In these designs, the Ministry of Education proposes the minimum knowledge that must be taught in each discipline for each year of the Argentine secondary level. In particular, in Mathematics and in the Geometry area, for students aged 12–13 years old, the study of figures is proposed, arguing about the analysis of properties. In correspondence with the selected problem, students are encouraged to: determine points that meet conditions related to distances and construct circumferences, circles, bisectors and perpendicular bisectors as geometric loci; explore different constructions of triangles and argue about necessary and sufficient conditions for their congruence; construct similar figures from different information and identify necessary and sufficient conditions of similarity between triangles; analyze claims about properties of figures and argue about their validity, recognizing the limits of empirical evidence; formulate conjectures about properties of figures (in relation to interior angles, bisectors, diagonals, among others) and produce arguments that allow them to be validated. Therefore, the problem analyzed in this work, although it may not be a typical high-school task, involves concepts that should be addressed at school according to what is prescribed by the Argentinian curricular design.

The problem statement is as follows:

Problem Statement

Sea ABCDEFGHIJ un polígono regular de 10 lados que tiene todos sus vértices en una circunferencia de centro O y radio 5. Las diagonales AD y BE se cortan en P y las diagonales AH y BI se cortan en Q. Calcular la medida del segmento PQ.

English translation: Let ABCDEFGHIJ be a regular 10-sided polygon that has all its vertices on a circumference with center O and radius 5. The diagonals AD and BE intersect at P and the diagonals AH and BI intersect at Q. Calculate the length of segment PQ.

The solution proposed by the OMA [29] is based on the graphic representation of the decagon and the identification of the segment that needs to be calculated (PQ). The suggested strategy for reaching the solution consists in drawing segments that join the vertices of the decagon with its center, as well as diagonals. The analysis of the triangles and trapezoids that result from these constructions allows one to infer that the triangles are isosceles. From this analysis it is concluded that the requested segment has the same length as the radius of the circumference in which the decagon is inscribed. This resolution enables finding the exact value of the length of the segment PQ, which is 5 cm.
B. Chatbots and LLMs

The three major, freely accessible chatbots available at the time of this article were used for collecting answers to the previous problem. Each of these chatbots relies on its own large language model, an AI model designed to understand and generate human-like text based on deep learning techniques, learned on different corpora using also different learning strategies. LLMs have a large number of parameters and are trained over a massive amount of text data from different sources to capture complex language patterns and relationships. Specifically, the chatbots used for this study were:

ChatGPT: ChatGPT (September 25 version), trained over the GPT-3.5 language model, is the original chatbot launched by OpenAI in November 2022.

Bing Chat: the chatbot accessible through the Microsoft Bing search engine and running on GPT-4. This chat offers answers in three modes: (1) More Creative: responses are original and imaginative, creating surprise and entertainment; (2) More Precise: responses are factual and concise, prioritizing accuracy and relevancy; and (3) More Balanced: responses are reasonable and coherent, balancing accuracy and creativity in conversation.

Bard: the chatbot developed by Google AI and powered by the PaLM-2 large language model.

For this analysis, zero-shot learning was employed. That is, the LLMs were asked to answer the question directly, without any prior data or example questions. The prompt was the problem statement in Spanish, exactly as in the original text of the Olympiad competition. For each model, 3 answers were obtained by regenerating the responses in order to account for the randomness in text generation.
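The collection procedure can be sketched programmatically as follows. This is an illustrative approximation only: the study used the chatbots' public web interfaces, not an API, and the client library and model name below (OpenAI's Python client, "gpt-3.5-turbo") are assumptions of ours.

```python
# Illustrative sketch of the zero-shot data collection described above.
# Assumption: API access via the OpenAI Python client; the article used
# the chatbots' web UIs instead.
from openai import OpenAI

PROBLEM_ES = (
    "Sea ABCDEFGHIJ un polígono regular de 10 lados que tiene todos sus "
    "vértices en una circunferencia de centro O y radio 5. Las diagonales "
    "AD y BE se cortan en P y las diagonales AH y BI se cortan en Q. "
    "Calcular la medida del segmento PQ."
)

client = OpenAI()
answers = []
for _ in range(3):  # three regenerations to account for sampling randomness
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed stand-in for the chatbot backend
        # Zero-shot: the prompt is the bare problem statement, no examples.
        messages=[{"role": "user", "content": PROBLEM_ES}],
    )
    answers.append(response.choices[0].message.content)
```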
C. Methodology

Beyond the correctness of the solution itself, the answers provided by the chatbots were scanned for identifying reasoning mistakes and inaccuracies in the generated chain of thought, individual steps and operations. Basically, it was checked whether the appropriate notions were recalled and correctly applied, and whether the chatbot was able to generate a coherent answer with an accurate solution.

In the process of analyzing the answers of the chatbots to the stated Geometry problem, several mistakes of different types were identified. After grouping these mistakes according to their nature, we propose a general categorization of errors. Mistakes made in solving the problem were classified into three main types or categories:


• Construction: in this category we find errors originating in the representation on the plane of the geometric elements indicated in the text answer given by a chatbot. In other words, a construction error is a mismatch between the textual response and the actual geometric figures and their graphical representation. For example, the chatbot asserts that a central angle measures 72° when the actual amplitude, according to the description given of the figure's elements, is necessarily a different one. Construction errors denote a lack of comprehension by the LLMs of the spatial relationships among elements like points, lines and angles. As the description of the geometric problem reasoning advances, it starts to lose correlation with the actual graph that materializes such description. Most likely, these errors stem from the inability of generative AI to understand the semantics behind these geometric notions at the level required for geometric reasoning.

• Conceptual: errors in this category relate to incorrect definitions, the application of properties without guaranteeing the necessary conditions, or mixing measurement units (e.g. units of length with those of amplitude). An example of a conceptual error is applying the Pythagorean theorem to a triangle that is not right-angled. The possible causes of these mistakes can be varied. Language generation tools based on AI are capable of producing text using geometric vocabulary, which allows them, for example, to give a reasonable explanation of the Pythagorean theorem. However, as a consequence of an inadequate knowledge and representation of geometric shapes, they are also likely to offer solutions that apply the theorem incorrectly or make inaccurate calculations. LLMs can also suffer from a deficient context description, which in a next-word mechanism is the previous sequence of words. Then, the omission of relevant information reduces the precision of text prediction. The deficient description of the context includes simply missing some piece of information (e.g. the amplitude of a given angle), but also well-known properties (e.g. that the angles of a triangle must sum to 180 degrees) and common assumptions. Furthermore, LLMs are data-driven models trained on data that might include generalized mistakes and misconceptions. Due to their probabilistic nature, LLMs are then prone to reproduce them.

• Contradiction: in a number of reasoning steps, contradictions arise as an inconsistency between a deduction and either information involved in the following reasoning steps or the representation on the plane. In other words, the chain of thought contains contradictory knowledge, which invalidates the whole reasoning. For example, a contradiction can be inferring that an angle is acute while the graphical representation built starting from this deduction depicts a straight angle.

The mentioned categories group a number of mistakes found in the solutions provided by the chatbots. In a single answer, one or more of these mistakes were identified, leading to a conjunction of errors that ended up in a wrong answer to the problem. This general classification of the mistakes found in the collected answers enables reaching a better comprehension of the failures in geometric reasoning of LLM-generated texts.

IV. Results & Discussion

In order to compare the performance of the chatbots according to the provided responses, which due to space limitations are not detailed here, Table I summarizes the total number of errors found within each category. For ChatGPT 3.5 three responses were generated, Bard also offers three versions of the answer through its interface, and Bing Chat provides three answers in the form of the more precise, the more balanced and the more creative one.
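As a side note, the per-answer coding that Table I aggregates can be tabulated in a few lines; in the sketch below the counts are transcribed from Table I, while the data structure itself is ours, for illustration.

```python
# Minimal sketch of the per-answer error coding behind Table I.
# Counts are taken from Table I; the dict layout is illustrative only.
from collections import Counter

coded_errors = {
    ("ChatGPT 3.5", "#1"): {"Construction": 2, "Conceptual": 2, "Contradiction": 0},
    ("ChatGPT 3.5", "#2"): {"Construction": 0, "Conceptual": 3, "Contradiction": 0},
    ("ChatGPT 3.5", "#3"): {"Construction": 2, "Conceptual": 0, "Contradiction": 1},
    # Bing Chat "More Precise" gave no solution, so it is not coded.
    ("Bing Chat", "Balanced"): {"Construction": 0, "Conceptual": 3, "Contradiction": 0},
    ("Bing Chat", "Creative"): {"Construction": 3, "Conceptual": 0, "Contradiction": 1},
    ("Bard", "#1"): {"Construction": 3, "Conceptual": 0, "Contradiction": 0},
    ("Bard", "#2"): {"Construction": 0, "Conceptual": 2, "Contradiction": 0},
    ("Bard", "#3"): {"Construction": 1, "Conceptual": 0, "Contradiction": 0},
}

totals = Counter()
for (chatbot, _answer), categories in coded_errors.items():
    for category, n in categories.items():
        totals[(chatbot, category)] += n

for (chatbot, category), n in sorted(totals.items()):
    print(f"{chatbot:12s} {category:13s} {n}")  # reproduces the Table I totals
```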
From the 9 answers (3 for each model) extracted from ChatGPT 3.5, Bing Chat and Bard, only one of them indicated the correct value of the PQ segment length, i.e. only one provided the correct solution to the stated problem; this corresponds to Bard response #2. However, the model arrived at the result through a method having conceptual errors, so it cannot be considered a satisfactory solution either. In addition, there was a case in which the chatbot did not provide a solution at all; this is the case of Bing Chat when it is asked for the More Precise answer to the question. The answer pointed out some decagon properties, but ends up saying (translated from Spanish): "However, this calculation can be quite complicated and would require in-depth knowledge of the Geometry of the decagon. I would recommend that you consult a Geometry textbook or online resource for a detailed explanation of how to perform these calculations".

Overall, the general performance of the LLMs in generating a text answering the stated Geometry problem was disappointing: they completely failed at providing an accurate answer to the problem at hand and made a considerable number of mistakes of different types along the reasoning process. This is a concerning finding, considering that the problem presented is a high-school level one, designed for students under 13 years old, who are likely to access chatbots looking for help and would receive not only unreliable answers but also text possibly introducing or reaffirming Geometry misconceptions.

Considering the type of errors made by each chatbot, ChatGPT 3.5 and Bard were the ones exhibiting more errors of the Construction type. Additionally, ChatGPT 3.5 contains a greater number of errors of the Conceptual category. Least frequent in all answers are the errors in the Contradiction category, accounting for one error of ChatGPT 3.5 and one of Bing Chat, but none in Bard.

To illustrate the different types of errors found in the analyzed answers, Tables II, III and IV provide examples of each type of error existing in the actual answers from the models. The tables include a fragment of the response (2nd column) generated by a chatbot (indicated in the 1st column) based on the corresponding LLM when queried with the problem statement, and a description of the mistake made (3rd column). In the last column, observations related to the error detected are commented, accompanied by a graph, made by the authors of this paper, based on the indications provided in the response.

In the first of them, Table II, the errors refer to the construction of angles (ChatGPT 3.5), the construction of right triangles (Bing Chat) and supplementary angles (Bard). Then, in Table III, the errors that are exemplified refer to units of length and amplitude (ChatGPT 3.5), to lengths of diagonals of the decagon (Bing Chat) and to heights of triangles (Bard). Table IV contains prototypes of statements about the equality of segments of different lengths (ChatGPT 3.5), and a mismatch between exterior and interior angles (Bing Chat). In this case, Bard does not contain errors of the Contradiction type.

From a qualitative point of view, the responses of the different chatbots, powered by the corresponding LLMs, exhibit some distinctive characteristics.

In the case of ChatGPT 3.5, some observations can be made about the provided answers. First, the generated text does not justify the mathematical procedures. Justification and reasoning about solutions are important goals in mathematical work in general, and for students who solve math problems in particular. Research shows that this is an issue that needs attention, as there are important differences in the way math problems are approached and how justification supports the proposed solutions [30],[31]. Second, the notions used are not explicitly identified and mentioned as denoted in the field. Third, the response does not offer definitions and/or clarifications that could contribute to the resolution.


TABLE II. Examples of Construction Errors Produced by the Models

Chatbot: ChatGPT 3.5
Fragment of the response (English translation): "Now, consider the triangle OPQ formed by the diagonals AD and BE that intersect at P and the diagonals AH and BI that intersect at Q. This triangle has a central angle O that subtends the arc AB (36 degrees), and the other two angles at O are equal to α (the angles at vertices A and B). We will use the property of the sum of the angles of a triangle, which tells us that the sum of the angles in a triangle is always equal to 180 degrees. Then: 36° + α + α = 180°; 2α + 36° = 180°; 2α = 180° − 36°"
Description: Considering the OPQ triangle, the angle with center O is 108° (3α with α = 36°), as can be observed in Fig. 1. The model states erroneously that the angle with center O is 72° (2α with α = 36°). (Fig. 1. Graphical representation in GeoGebra of the ChatGPT 3.5 response.)

Chatbot: Bing Chat (More Creative)
Fragment of the response (English translation): "3. Applying the exterior angle theorem, I found that angle BPD measures 72° + 36° = 108° and angle AQH measures 72° + 72° = 144°. 4. Using the vertex opposite angles theorem, I deduced that angle APD also measures 108° and angle BQH also measures 144°. 5. Finally, applying the Pythagorean theorem to the right triangle OPQ, I found that the measure of the segment PQ is: PQ² = OP² + OQ² − 2·OP·OQ·cos(PÔQ); PQ² = 5² + 5² − 2·5·5·cos(108° + 144°); PQ² = 50 − 50·cos(252°); PQ² = 50 + 50·0.9511; PQ² = 97.555; PQ = …"
Description: The model states that the OPQ triangle is a right one (the pink triangle in Fig. 2), but it is not. The POQ angle is 108° because it is the sum of the 3 central angles corresponding to the 3 sides of the decagon involved, and each of them is 36°. Thus, each of the other 2 angles is (180° − 108°)/2 = 36°. (Fig. 2. Graphical representation in GeoGebra of the Bing Chat response.)

Chatbot: Bard
Fragment of the response (English translation): "Since the polygon is regular, all interior angles are equal. In a 10-sided polygon, the sum of the interior angles is (10 − 2)·180° = 1440°. Since each interior angle is equal, each one measures 1440°/10 = 144°. The angles ∠APD and ∠BQE are supplementary, so ∠APD + ∠BQE = 180°. Substituting the values of the angles, we obtain 180° = 144° + ∠BQE. From this equation, it follows that ∠BQE = 36°. The angles ∠AOB and ∠BOC are equal, since they are opposite by the vertex O. Therefore, ∠AOB = ∠BOC = 18°."
Description: The model considers that the APD angle is supplementary to BQE, but APD is already 180°, as can be observed in Fig. 3. (Fig. 3. Graphical representation in GeoGebra of the Bard response.)
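For reference, the correct angle computation that the Table II descriptions appeal to can be stated compactly; the following lines merely restate, in LaTeX, facts already given in the descriptions above.

```latex
\theta = \frac{360^\circ}{10} = 36^\circ \ \text{(central angle per side)}, \qquad
\angle POQ = 3\theta = 108^\circ, \qquad
\angle OPQ = \angle OQP = \frac{180^\circ - 108^\circ}{2} = 36^\circ .
```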


TABLE III. Examples of Conceptual Errors Produced by the Models

Chatbot: ChatGPT 3.5
Fragment of the response (English translation): "The length of the segment PQ is equal to the length of the arc PQ, which is (5/4)π units."
Description: A radian is a unit of measurement for angles. The model gives the length of the segment PQ in radians; therefore, it confuses units of length with those of amplitude.

Chatbot: Bing Chat (More Creative)
Fragment of the response (English translation): "This problem can be solved using the formula for the length of the diagonal of a regular polygon. The length of the diagonal of a regular polygon with n sides is equal to: [formula not reproduced here]"
Description: The model proposes a formula to calculate the length of the diagonals of the decagon. It assumes that all diagonals that can be drawn have the same length, which, as shown in Fig. 4, is incorrect. (Fig. 4. Graphical representation in GeoGebra of the Bing Chat response.)

Chatbot: Bard
Fragment of the response (English translation): "Therefore, the segment is the height of the isosceles triangle △POQ. If x is the measure of the segment, then QO = 5 − x. By the Pythagorean theorem in △POQ, we have that: x² = (5 − x)² + 5²; x² = 25 − 10x + x² + 25; 2x² − 10x = 0; x(2x − 10) = 0; x = 0 or x = 5. Since x is the measure of a segment, x > 0. Therefore, the measure of the segment is 5."
Description: The answer considers that the triangle POQ is isosceles. It also considers PQ to be the height of the triangle, but this is not the case. Based on this, using the Pythagorean theorem, it proposes to calculate the segment PQ. It is not noticed that the triangle POQ is not a right-angled one, because the angle POQ measures 108°, as explained above. The correct value of the segment PQ is found in this response, but the procedure is incorrect.

In terms of this general characterization of responses, in its first response Bing Chat explains the characteristics of decagons and the properties of the angles and the sides, but it does not solve the problem at all. Instead, the chatbot limits itself to suggesting consulting a Geometry book or online resources. On the contrary, in the second and third answers, it uses an equation editor (instead of equations written in a textual manner) and suggests resorting to external websites (which are linked) either to reference figures or schemes included in the answer (answer #2) or to refer the reader to more examples of solved Geometry problems (answer #3). Both characteristics, using proper notation to better display equations and linking to external sources, are beneficial for students, as they can resort to some extra help beyond the generated text. Finally, Bard mentions the decagons and the properties of angles and sides, and tries to solve the problem, but the justification of the procedure is incorrect. For answer #2, a justification is developed and, as previously mentioned, the chatbot arrives at the correct answer, but through a procedure containing Conceptual errors.


TABLE IV. Examples of Contradiction Errors Produced by the Models

Chatbot: ChatGPT 3.5
Fragment of the response (English translation): "Since AP = AQ (since both are radii of the circle of radius 5), we can simplify the formula: PQ² = 2(AP²)(1 − cos(∠PQB)); PQ² = 2(5²)(1 − cos(108°)); PQ² = 2(25)(1 − cos(108°)); PQ² = 50(1 − cos(108°)). Now, we can calculate the value of cos(108°), which is approximately equal to −0.5 (you can use a calculator to get a more precise value). Then, we can substitute this value into the equation: PQ² = 50(1 − (−0.5)); PQ² = 50(1 + 0.5)"
Description: It states that AP (5 u) is equal to AQ (3.09 u); both are depicted in Fig. 5. (Fig. 5. Graphical representation in GeoGebra of the ChatGPT 3.5 response.)

Chatbot: Bing Chat (More Creative)
Fragment of the response (English translation): Idem Table II, row 2.
Description: It proposes to calculate the angle BPD by identifying an exterior angle and states that the angle measures 108° and that AQH = 144°; this cannot be right because, as the points are aligned, the angle is straight (180°), as can be observed in Fig. 6. (Fig. 6. Graphical representation in GeoGebra of the Bing Chat response.)

Chatbot: Bard
No Contradiction errors were identified in this model's answers.

V. Conclusions

In this work, we have presented an analysis and comparison of the resolutions formulated by three major chatbots, ChatGPT 3.5, Bing Chat and Bard, to a Geometry problem extracted from the first level of the May Olympiads competition (for students under 13). The three chatbots leverage different LLMs, namely GPT-3.5, GPT-4 and PaLM-2, to generate textual responses to natural language queries. In particular, the problem statement as originally presented to students in Spanish was used as a prompt for the chatbots, and three answers were collected from each in order to account for the random components of content generation.

In terms of the correctness of the obtained solutions, the chatbots had a disappointing performance. Only one answer, provided by Bard, reached the number that was expected (PQ = 5). However, even though it arrives at the right answer, the described reasoning contains conceptual errors. On the other side, the first response given by Bing Chat does not offer a solution; it only refers the user to consult a Geometry book or some online resource.

In a more detailed analysis of the answers, we found that all of the responses given by the different chatbots contained several types of errors. In a further inspection of these errors, we were able to define a classification encompassing three main categories: construction, conceptual and contradiction. Construction errors correspond to a mismatch between the text description and its geometric representation, conceptual errors involve the incorrect use of geometric concepts and misconceptions, while the last type of error refers to contradictions appearing within the textual description or with respect to the graphical representation.

According to the proposed categorization of errors, ChatGPT 3.5 and Bard made most mistakes within the Construction category. This is an issue related specifically to Geometry, as it has to do with the translation of a geometric specification given in text to a graphical representation.


Additionally, ChatGPT 3.5 responses contain a greater number of errors in the Conceptual category, that is, in the application of geometric notions. The Contradiction category is the least frequent one, appearing once in the ChatGPT 3.5 answers and once in the ones from Bing Chat, but never in the Bard answers.

Most failures observed in the answers to the proposed problem are related to two common criticisms of LLMs [32]: the lack of symbolic structure and the lack of grounding. Both call into question their capacity to provide human language representation and understanding in spite of their human-like language abilities. The lack of symbolic structure prevents the model from performing formal reasoning and verifying reasoning steps, whereas the lack of grounding leads to the misinterpretation of geometric notions and their visual representations. In other words, the fact of being language models poses some limitations for solving more formal problems, such as Geometry ones.

The proposed classification contributes to a better understanding of the failures of LLMs in math problem-solving and, more specifically, of those related to the spatial representations involved in Geometry problems (e.g. construction errors refer to the relation between the text and its graphical interpretation). The knowledge and recognition of these issues also represent an opportunity to see errors as a valuable educational tool [33]. This categorization can serve as the basis for the construction of methodologies that include the interaction with chatbots in the classroom, leveraging errors to foster their identification, critical thinking about reasoning steps and operations, and reflection on alternative problem solutions.

Although the disappointing results provided by the chatbots cannot be directly attributed to the language used, training data in Spanish is known to be smaller than in English. Consequently, next-word prediction performed by LLMs can be assumed to be less precise, thereby generating lower-quality content. In fact, the reported evaluations of LLMs on different benchmarks including Geometry problems in English, as discussed in section II, showed a better performance than the one achieved with this particular problem. Even though one example is clearly not sufficient to draw conclusions, the language can be considered a source of additional difficulties for LLMs.

The findings of the analysis carried out in this work are specially concerning considering that the problem presented is a high-school level one, designed for students under 13 years old (although, being an Olympiad problem, it may be beyond the capabilities of a typical student of that age), who have easy access to chatbots and are likely to resort to them looking for help to solve similar problems. In this context, they will not only receive unreliable answers in terms of the correctness of the solution to a stated problem, but, what is even more serious, they will also be exposed to inaccurate applications of mathematical notions, possibly introducing new misconceptions or reaffirming existing ones. This is also a warning sign for teachers using chatbots to generate course material or exam questions, as they can inadvertently introduce some mistakes.

According to the results obtained in solving the stated problem, and taking into account the general characterization of the interface of these tools, it can be concluded that the use of chatbots (and the models behind them) for solving Geometry problems is not appropriate without a critical analysis from teachers as well as students. The inclusion of these technologies in the classroom must follow a careful methodological approach. A potentially valuable application of these models in the classroom could be the critical analysis, supported by teachers, of the responses obtained from chatbots, such as the one presented in this work. This would allow students to discuss and learn Geometry concepts (properties, characteristics, constructions in the plane, etc.) in a practical way. For example, it would be useful to distinguish when it is possible (or not) to apply a theorem (lemma, corollary, etc.).

In view of the current wide adoption of chatbot technologies in the classroom and by students of different ages, future work is envisioned to expand the categorization of errors in Geometry problems through the analysis of more problems at different levels. The analysis of a wider variety of problems would likely allow a finer-grained categorization of errors and the emergence of new, less frequent types of mistakes. Ultimately, systematic evaluations of LLM performance such as the one carried out in this work contribute to the ongoing development of more advanced, capable AI chatbot systems that can be fully integrated in teaching practices to enhance learning processes.

References

[1] S. Frieder, L. Pinchetti, A. Chevalier, R.-R. Griffiths, T. Salvatori, T. Lukasiewicz, P. C. Petersen, J. Berner, "Mathematical capabilities of ChatGPT," 2023.
[2] D. Hendrycks, C. Burns, S. Kadavath, A. Arora, S. Basart, E. Tang, D. Song, J. Steinhardt, "Measuring mathematical problem solving with the MATH dataset," in Proceedings of the 35th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
[3] P. Shakarian, A. Koyyalamudi, N. Ngu, L. Mareedu, "An independent evaluation of ChatGPT on mathematical word problems (MWP)," in Proceedings of the AAAI 2023 Spring Symposium on Challenges Requiring the Combination of Machine Learning and Knowledge Engineering (AAAI-MAKE 2023), 2023.
[4] A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts, P. Barham, H. W. Chung, C. Sutton, S. Gehrmann, P. Schuh, K. Shi, S. Tsvyashchenko, J. Maynez, A. Rao, P. Barnes, Y. Tay, N. Shazeer, V. Prabhakaran, E. Reif, N. Du, B. Hutchinson, R. Pope, J. Bradbury, J. Austin, M. Isard, G. Gur-Ari, P. Yin, T. Duke, A. Levskaya, S. Ghemawat, S. Dev, H. Michalewski, X. Garcia, V. Misra, K. Robinson, L. Fedus, D. Zhou, D. Ippolito, D. Luan, H. Lim, B. Zoph, A. Spiridonov, R. Sepassi, D. Dohan, S. Agrawal, M. Omernick, A. M. Dai, T. S. Pillai, M. Pellat, A. Lewkowycz, E. Moreira, R. Child, O. Polozov, K. Lee, Z. Zhou, X. Wang, B. Saeta, M. Diaz, O. Firat, M. Catasta, J. Wei, K. Meier-Hellstern, D. Eck, J. Dean, S. Petrov, N. Fiedel, "PaLM: Scaling language modeling with pathways," 2022.
[5] K. Cobbe, V. Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, M. Plappert, J. Tworek, J. Hilton, R. Nakano, C. Hesse, J. Schulman, "Training verifiers to solve math word problems," arXiv preprint arXiv:2110.14168, 2021.
[6] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, "Language models are few-shot learners," in Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS'20), Vancouver, BC, Canada, 2020.
[7] OpenAI, "GPT-4 technical report," ArXiv, vol. abs/2303.08774, 2023.
[8] F. J. García-Peñalvo, F. Llorens-Largo, J. Vidal, "La nueva realidad de la educación ante los avances de la inteligencia artificial generativa," RIED-Revista Iberoamericana de Educación a Distancia, vol. 27, pp. 9–39, Jan. 2024, doi: 10.5944/ried.27.1.37716.
[9] B. Memarian, T. Doleck, "ChatGPT in education: Methods, potentials, and limitations," Computers in Human Behavior: Artificial Humans, vol. 1, no. 2, p. 100022, 2023, doi: 10.1016/j.chbah.2023.100022.
[10] B. Han, S. Nawaz, G. Buchanan, D. McKay, "Ethical and pedagogical impacts of AI in education," in Artificial Intelligence in Education, Tokyo, Japan, 2023, pp. 667–673.
[11] J. Flores-Vivar, F. García-Peñalvo, "Reflections on the ethics, potential, and challenges of artificial intelligence in the framework of quality education (SDG4)," Comunicar, 2023, doi: 10.3916/C74-2023-03.
[12] R. Hadi Mogavi, C. Deng, J. Juho Kim, P. Zhou, Y. D. Kwon, A. Hosny Saleh Metwally, A. Tlili, S. Bassanelli, A. Bucchiarone, S. Gujar, L. E. Nacke, P. Hui, "ChatGPT in education: A blessing or a curse? A qualitative study exploring early adopters' utilization and perceptions," Computers in Human Behavior: Artificial Humans, vol. 2, no. 1, p. 100027, 2024, doi: 10.1016/j.chbah.2023.100027.


[13] S. S. Gill, M. Xu, P. Patros, H. Wu, R. Kaur, K. Kaur, S. Fuller, M. Singh, P. Arora, A. K. Parlikad, V. Stankovski, A. Abraham, S. K. Ghosh, H. Lutfiyya, S. S. Kanhere, R. Bahsoon, O. Rana, S. Dustdar, R. Sakellariou, S. Uhlig, R. Buyya, "Transformative effects of ChatGPT on modern education: Emerging era of AI chatbots," Internet of Things and Cyber-Physical Systems, vol. 4, pp. 19–23, 2024, doi: 10.1016/j.iotcps.2023.06.002.
[14] C. K. Lo, "What is the impact of ChatGPT on education? A rapid review of the literature," Education Sciences, vol. 13, no. 4, 2023, doi: 10.3390/educsci13040410.
[15] S. Chithrananda, G. Grand, B. Ramsundar, "ChemBERTa: Large-scale self-supervised pretraining for molecular property prediction," ArXiv, vol. abs/2010.09885, 2020.
[16] Y. Wu, F. Jia, S. Zhang, H. Li, E. Zhu, Y. Wang, Y. T. Lee, R. Peng, Q. Wu, C. Wang, "An empirical study on challenging math problem solving with GPT-4," 2023.
[17] R. T. McCoy, S. Yao, D. Friedman, M. Hardy, T. L. Griffiths, "Embers of autoregression: Understanding large language models through the problem they are trained to solve," 2023.
[18] P. Nguyen, P. Nguyen, Bruneau, L. Cao, Wang, H. Truong, "Evaluation of mathematics performance of Google Bard on the mathematics test of the Vietnamese national high school graduation examination," Jul. 2023, doi: 10.36227/techrxiv.23691876.v1.
[19] V. Plevris, G. Papazafeiropoulos, A. Jiménez Rios, "Chatbots put to the test in math and logic problems: A preliminary comparison and assessment of ChatGPT-3.5, ChatGPT-4, and Google Bard," 2023.
[20] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, D. Zhou, "Chain-of-thought prompting elicits reasoning in large language models," 2023.
[21] J. Gao, R. Pi, J. Zhang, J. Ye, W. Zhong, Y. Wang, L. Hong, J. Han, H. Xu, Z. Li, L. Kong, "G-LLaVA: Solving geometric problem with multi-modal large language model," 2023.
[22] H. Liu, C. Li, Q. Wu, Y. J. Lee, "Visual instruction tuning," in NeurIPS, 2023.
[23] Ministerio de Educación, Argentina, Núcleos de Aprendizajes Prioritarios. Matemática. Ciclo Básico Educación Secundaria 1° y 2° / 2° y 3° Años. 2006.
[24] R. S. Abrate, G. I. Delgado, M. D. Pochulu, "Caracterización de las actividades de geometría que proponen los textos de matemática," Revista Iberoamericana de Educación, vol. 39, pp. 1–9, Jun. 2006, doi: 10.35362/rie3912598.
[25] M. B. López, I. B. Fernández, "Tendencias actuales de la enseñanza-aprendizaje de la geometría en educación secundaria," Revista Internacional de Investigación en Ciencias Sociales, vol. 8, no. 1, pp. 25–42, 2012.
[26] A. M. Bressan, K. Crego, B. Bogisic, Razones para enseñar geometría en la educación básica: mirar, construir, decir y pensar (1a. ed.). Novedades Educativas, 2000.
[27] C. R. Suárez, T. Á. Sierra Delgado, "Spatial problems: An alternative proposal to teach geometry in compulsory secondary education," Educação Matemática Pesquisa, vol. 22, Aug. 2021, doi: 10.23925/1983-3156.2020v22i4p593-602.
[28] L. Santaló, "Olimpíadas matemáticas," Revista de Educación Matemática, vol. 6, Aug. 2021, doi: 10.33044/revem.11101.
[29] P. Fauring, F. Gutierrez, Eds., Olimpiadas de Mayo - XVII a XXIV. Buenos Aires, Argentina: Red Olímpica, 2020.
[30] B. Glass, C. Maher, "Students problem solving and justification," in Proceedings of the 28th Conference of the International Group for the Psychology of Mathematics Education, vol. 2, 2004, pp. 463–470.
[31] Y. S. Eko, S. Prabawanto, A. Jupri, "The role of writing justification in mathematics concept: the case of trigonometry," Journal of Physics: Conference Series, vol. 1097, p. 012146, Sep. 2018, doi: 10.1088/1742-6596/1097/1/012146.
[32] E. Pavlick, "Symbols and grounding in large language models," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 381, no. 2251, p. 20220041, 2023, doi: 10.1098/rsta.2022.0041.
[33] G. M. Zunzarren, "The error as a problem or as teaching strategy," Procedia - Social and Behavioral Sciences, vol. 46, pp. 3209–3214, 2012, doi: 10.1016/j.sbspro.2012.06.038.

Verónica Parra
PhD in Mathematics Education from the Universidad Nacional del Centro de la Provincia de Buenos Aires (UNCPBA), 2012. Associate professor in the Teacher Training Department at UNCPBA, member of the NIEM Research Institute and Associate researcher at CONICET. Her research interests include mathematics teaching and the use of resources for teaching.

Patricia Sureda
PhD in Mathematics Education from the Universidad Nacional del Centro de la Provincia de Buenos Aires (UNCPBA), 2012. Associate professor in the Teacher Training Department at UNCPBA, member of the NIEM Research Institute and Assistant researcher at CONICET. Her research interests include mathematics teaching and the use of resources for teaching.

Ana Corica
PhD in Education Science from the Universidad Nacional de Córdoba (UNC), 2010. Associate professor in the Teacher Training Department at UNCPBA, director of the NIEM Research Institute and Associate researcher at CONICET. Her research interests include mathematics teaching and the use of resources for teaching.

Silvia Schiaffino
PhD in Computer Science from the Universidad Nacional del Centro de la Provincia de Buenos Aires (UNCPBA), 2004. Full-time associate professor in the Computer Science Department at UNCPBA, member of the ISISTAN Research Institute and Principal researcher at CONICET. Her research interests include recommender systems, user profiling and personalization.

Daniela Godoy
PhD in Computer Science from the Universidad Nacional del Centro de la Provincia de Buenos Aires (UNCPBA), 2005. Full-time associate professor in the Computer Science Department at UNCPBA, member of the ISISTAN Research Institute and Principal researcher at CONICET. Her research interests include recommender systems, social networks and text mining.
