AI in Math Word Problem Solving
Keyur Faldu* (Embibe), Amit Sheth (University of South Carolina), Prashant Kikani (Embibe), Manas Gaur (University of South Carolina), Aditi Avasthi (Embibe)
rule-based and pattern matching algorithms would not be scalable to learn general-purpose mathematical reasoning ability.

2.2 Semantic Parsing

A semantic parsing-based system attempts to learn the semantic structure of the input problem and transform it into mathematical expressions or a set of equations. It represents problems in an object-oriented structure, much like text-to-SQL generation, following methods in natural language processing (Liguda and Pfeiffer, 2012) (Koncel-Kedziorski et al., 2015). Underlying tree dependencies of mathematical semantics are captured by context-free grammar rules, which are analogous to dependency parse trees from the perspective of mathematical semantics (Shi et al., 2015).

2.3 Statistical and Machine Learning Approaches

Prior work on statistical and machine learning approaches developed automated methods, both rule-based and semantic parsing, that process training data to derive a set of templates or learn the semantic structure of the input. They range from text classification of MWPs with equation templates, to extracting and mapping quantities with equations, to extracting entities and classifying their relationships (Kushman et al., 2014). These methods use machine learning algorithms like support vector machines, probabilistic models, or margin classifiers for concept prediction, equation prediction, or slot filling. They leverage sentence semantics or verb categorization to map mathematical operations to segments of an MWP (Hosseini et al., 2014) (Amnueypornsakul and Bhat, 2014). The set of candidate hypotheses can be further reduced by handling noun slots and number slots separately (Zhou et al., 2015). A few other techniques process narratives to fill concept-specific slots first and derive equations from them using domain-specific rules (Mitra and Baral, 2016) (Roy and Roth, 2018).
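To make verb categorization concrete, the minimal sketch below maps verbs to state changes over a quantity, loosely in the spirit of (Hosseini et al., 2014); the verb lexicon, the event extraction, and the example problem are illustrative assumptions, not the cited system.

```python
# Toy sketch of verb categorization for addition-subtraction MWPs,
# loosely in the spirit of Hosseini et al. (2014). The verb lexicon
# and the pre-extracted events below are illustrative assumptions.
VERB_CATEGORY = {
    "had": "observation",   # sets the initial quantity
    "gave": "negative",     # transfer out of the owner's set
    "lost": "negative",
    "bought": "positive",   # transfer into the owner's set
    "found": "positive",
}

def solve_transfer_problem(events):
    """Each event is (verb, quantity); fold state changes into a total."""
    total = 0
    for verb, qty in events:
        category = VERB_CATEGORY[verb]
        if category == "observation":
            total = qty
        elif category == "positive":
            total += qty
        elif category == "negative":
            total -= qty
    return total

# "Sam had 8 puppies. He gave 2 to his friends. How many are left?"
print(solve_transfer_problem([("had", 8), ("gave", 2)]))  # -> 6
```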
3 Mathematical Reasoning in Non-Neural Approaches

Mathematical reasoning ability is the key to solving mathematical problems. There were a few attempts made in this direction using reasoning-oriented subtasks like quantity entailment for semantic parsing (Roy et al., 2015), comprehending numerical information, quantity alignment prediction for slot filling, handling quantity slots and noun slots separately, ignoring extraneous information present in narratives, etc. (Bakman, 2007) (Mitra and Baral, 2016) (Roy and Roth, 2018) (Zhou et al., 2015).
Non-neural methods that consider rule-based, pattern matching, semantic parsing, and machine learning-based approaches are limited to specific areas of mathematics like addition-subtraction, arithmetic, linear algebra, calculus, or quadratic equations. The performance of such methods is constrained by the coverage and diversity of MWPs in the training corpus. The performance of such approaches could not scale on out-of-corpus MWPs, mainly because of the relatively small-sized training corpus (Koncel-Kedziorski et al., 2016) (Kushman et al., 2014). (Huang et al., 2016) attempted these techniques on a large-scale dataset, Dolphin18K, having 18,000 annotated linear and non-linear MWPs. Non-neural methods performed much worse on diverse and large-scale datasets than their reported performance on their corresponding small and specific datasets, and their performance improved sub-linearly with more extensive training data (Huang et al., 2016). This made a solid case to build more generic systems with better reasoning ability.

3.1 Domain Knowledge or External Knowledge

Classifying MWPs into subtypes and handling them using mathematical domain knowledge is another way to further push such systems' capabilities. It becomes easier to reason about a problem classified to a subtype, as it reduces the candidate math laws, axioms, and symbolic rules and contextualizes semantics related to the specific area of math (Mitra and Baral, 2016) (Roy and Roth, 2018) (Bakman, 2007) (Amnueypornsakul and Bhat, 2014). For example, addition and subtraction problems can be grouped in three classes of mathematical concepts, (i) change, (ii) part-whole, and (iii) compare, as shown in Table 1 (Mitra and Baral, 2016). Semantic parsing techniques form rules to extract semantic information specific to these subtypes. Table 1 illustrates subtypes of a problem and how each problem subtype would have different semantic slots to apply math rules. The part-whole concept has two slots, one for the whole that accepts a single variable and the other for its parts that accepts a set of variables of size at least two. The change concept has four slots, namely start, end, gains, and losses, which respectively denote the original value of a variable, the final value of that variable, and the sets of increments and decrements that happen to the original value of the variable. The comparison concept has three slots, namely the large quantity, the small quantity, and their difference (Mitra and Baral, 2016). Similarly, other such mutually exclusive groups were identified by researchers to categorize MWPs, like (i) join-separate, (ii) part-whole, and (iii) compare (Amnueypornsakul and Bhat, 2014); (i) change, (ii) combine, and (iii) compare (Bakman, 2007); or (i) transfer, (ii) dimensional analysis, (iii) part-whole relation, and (iv) explicit math (Roy and Roth, 2018).

Commonsense and linguistic knowledge could be useful to derive the semantic structure and represent the meaning of MWPs. (Briars and Larkin, 1984) investigates how commonsense knowledge could be helpful to solve complex problems narrating real-world situations. Relevant knowledge includes whether an object defined in the MWP is a member of both a set and its superset, and whether subsets can be exchanged in the context of deriving an answer for the MWP. For example, the math word problem shown in Figure 2 illustrates objects "orange" and "apple." Both belong to a superset "fruits," and they can be exchanged if the unknown quantity in the question is agnostic to the type of fruit. The Dolphin language representation of MWPs requires extracting nodes and their relationships, where nodes could be constants, classes, or functions. Commonsense knowledge is used to classify entities sharing common semantic properties (Shi et al., 2015). Linguistic knowledge like verb sense and verb entailment could help in efficiently parsing MWP narratives to extract semantic slots and fill their values. Researchers have explored such external knowledge from WordNet (Hosseini et al., 2014) or E-HowNet (Chen et al., 2005) (Lin et al., 2015).
Addition-Subtraction Math Word Problem | Subtype | Semantic Slots
Sam's dog had 8 puppies. He gave 2 to his friends. He now has 6 puppies. How many puppies did he have to start with? | Change | (i) start (ii) end (iii) gains (iv) losses
Tom went to 4 hockey games this year, but missed 7. He went to 9 games last year. How many hockey games did Tom go to in all? | Part-Whole | (i) whole (ii) parts
Bill has 9 marbles. Jim has 7 more marbles than Bill. How many marbles does Jim have? | Compare | (i) large quantity (ii) small quantity (iii) difference

Table 1: Classifying math word problems into subtypes, and the semantic slots required for solving MWPs in each subtype.
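The slot structure of Table 1 can be illustrated with a minimal sketch; the slot names follow the table, while the solving logic and function signatures are illustrative assumptions rather than the cited systems.

```python
# Minimal sketch of the subtype slots in Table 1. Slot names follow the
# table; the solving logic below is an illustrative assumption.
def solve_change(start=None, end=None, gains=(), losses=()):
    """start + sum(gains) - sum(losses) = end; solve for the missing slot."""
    delta = sum(gains) - sum(losses)
    if start is None:
        return end - delta          # unknown original value
    return start + delta            # unknown final value

def solve_part_whole(whole=None, parts=()):
    return sum(parts) if whole is None else whole - sum(parts)

def solve_compare(small=None, large=None, difference=None):
    if large is None:
        return small + difference
    if small is None:
        return large - difference
    return large - small

# The three rows of Table 1 (note the "missed 7" distractor in row two):
print(solve_change(end=6, losses=(2,)))       # puppies to start with -> 8
print(solve_part_whole(parts=(4, 9)))         # games in all -> 13
print(solve_compare(small=9, difference=7))   # Jim's marbles -> 16
```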
4 Neural Approaches

The recent advancement in deep learning approaches has opened up new possibilities. There is a flurry of research that aims to apply neural networks to solve MWPs. Sequence-to-sequence architectures, transformers, graph neural networks, convolutions, and attention mechanisms are a few such architectures and techniques of neural approaches. To empower neural approaches, larger and more diverse datasets have also been created (Amini et al., 2019), (Wang et al., 2017), (Zhang et al., 2020), (Saxton et al., 2019), (Lample and Charton, 2019). Also, the potential of neural approaches for solving complex problems on calculus and integration has raised expectations about their future promise. These approaches could be further categorized based on their problem formulation: (i) predict answers directly, (ii) generate intermediate math expressions, or (iii) retrieve a template. However, it is evident that neural networks are black boxes; it is hard to interpret their functioning and explain their decisions (Gaur et al., 2021). Attempts to interpret their functioning reveal how volatile their reasoning ability is and how they lack generalizability (Patel et al., 2021).

In the following sections, we aim to categorize the current state-of-the-art development. Then, we navigate problem formulations, datasets and analysis, approaches and architectural design choices, data augmentation methods, and interpretability and reasoning ability.

4.1 Problem Formulation

We can categorize the problem formulations to solve MWPs in three different ways: (i) predicting the answer directly, (ii) generating a set of equations or mathematical expressions and inferring answers by executing them, and (iii) retrieving the most suitable template from a pool of templates derived from training data and augmenting it with numerical quantities to compute the answer. An example is shown in Figure 2.

The first approach, predicting the answers directly, could demonstrate the inherent ability of the neural model to learn complex mathematical transformations (Csáji et al., 2001); however, the black-box nature of such models suffers from poor interpretability and explainability (Gaur et al., 2021). It has been shown that sequence-to-sequence (seq-to-seq) neural models could compute free-form answers for complex problems with high accuracy when trained on large datasets (Saxton et al., 2019) (Lample and Charton, 2019). It has been analyzed how models behave when just numerical parameters are changed vs. mathematical concepts in the test set. (Lample and Charton, 2019) achieved a near-perfect accuracy of 99%, mainly attributed to the learning ability of neural models over a huge dataset. (Ran et al., 2019) proposes a numerically aware graph neural network to directly predict the type of answer and the actual answer.

The second approach deals with generating an expression tree and executing it to compute the answer. Such models have relatively better interpretability and explainability, and the generated expressions provide scaffolding for the reasoning ability of the model. Seq-to-seq neural models built from LSTMs, Transformers, GNNs, and tree decoders have proven to be useful (Wang et al., 2017) (Amini et al., 2019) (Xie and Sun, 2019) (Qin et al., 2020) (Liang et al., 2021). The key challenge for methods in this approach is the need for expert-annotated datasets, as they need expression trees as the labels for each problem in addition to its answer (Amini et al., 2019) (Wang et al., 2017) (Koncel-Kedziorski et al., 2016).

The third approach derives template equations from training data, retrieves the most similar template, and substitutes numerical parameters. This approach suffers from limited generalization ability, as the set of templates is limited to the training set. Such methods would fail to solve diverse and out-of-corpus MWPs, as they would not be able to retrieve the template needed to solve them. It was one of the popular statistical machine learning approaches (Kushman et al., 2014) (Koncel-Kedziorski et al., 2015) (Mitra and Baral, 2016), but there are also a few neural approaches, which have studied its value in an ensemble setup or as a standalone model with larger training data (Wang et al., 2017) (Robaidek et al., 2018).

Generally, these neural approaches follow an encoder-decoder architecture to predict expression trees or directly compute the answer, specifically when the answer could be an expression in itself.
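To make the second formulation concrete, the sketch below shows only the execution step: evaluating a decoded expression over number slots against the quantities of a problem. The prefix token format, the toy problem, and the evaluator are illustrative assumptions, not a specific cited solver.

```python
import operator

# Minimal sketch of the "generate then execute" formulation: a decoder is
# assumed to emit a prefix expression over number slots n0, n1, ...; the
# neural decoder itself is omitted, and only the execution step is shown.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def evaluate_prefix(tokens, quantities):
    """Evaluate a prefix expression such as ['*', 'n0', '+', 'n1', 'n2']."""
    def helper(it):
        tok = next(it)
        if tok in OPS:
            return OPS[tok](helper(it), helper(it))
        return quantities[int(tok[1:])]   # 'n1' -> quantities[1]
    return helper(iter(tokens))

# "A pen costs 3 dollars. How much do 5 pens and 2 notebooks at
# 4 dollars each cost?"  Quantities in reading order: [3, 5, 2, 4].
tokens = ["+", "*", "n0", "n1", "*", "n2", "n3"]
print(evaluate_prefix(tokens, [3, 5, 2, 4]))  # -> 23
```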
Dataset | Mathematics Area | Language | Type | Annotation | Size
AI2 (Hosseini et al., 2014) | Arithmetic | English | Curated | Equation / Answer | 395
IL (Roy et al., 2015) | Arithmetic | English | Curated | Equation / Answer | 562
ALGES (Koncel-Kedziorski et al., 2015) | Arithmetic | English | Curated | Equation / Answer | 508
AllArith (Roy and Roth, 2017) | Arithmetic | English | Derived | Equation / Answer | 831
Alg514 (Kushman et al., 2014) | Algebraic (linear) | English | Curated | Equation / Answer | 514
Dolphin1878 (Shi et al., 2015) | Algebraic (linear) | English | Curated | Equation / Answer | 1,878
DRAW (Upadhyay and Chang, 2015) | Algebraic (linear) | English | Curated | Equation / Answer / Template | 1,000
Dolphin18K (Huang et al., 2016) | Algebraic (linear, nonlinear) | English | Curated | Equation / Answer | 18,000
MAWPS (Koncel-Kedziorski et al., 2016) | Arithmetic, Algebraic | English | Curated | Equation / Answer | 3,320
AQuA (Ling et al., 2017) | Arithmetic, Algebraic (linear, nonlinear) | English | Curated | Rationale / MCQ choices / Answer | 100,000
MathQA (Amini et al., 2019) | Arithmetic, Algebraic | English | Derived | Equation / MCQ choices / Answer | 37,000
MATH (Hendrycks et al., 2021) | Algebra, Number Theory, Probability, Geometry, Calculus | English | Curated | Step-by-step solution / Answer | 12,500
ASDiv (Miao et al., 2021) | Arithmetic, Algebraic | English | Derived | Equation / Answer + Grade / Problem-type | 2,305
SVAMP (Patel et al., 2021) | Arithmetic | English | Derived | Equation / Answer | 1,000
Math23K (Wang et al., 2017) | Algebraic (linear) | Chinese | Curated | Equation / Answer | 23,161
HMWP (Qin et al., 2020) | Algebraic (linear + nonlinear) | Chinese | Curated | Equation / Answer | 5,491
Ape210K (Zhao et al., 2020) | Algebraic (linear) | Chinese | Curated | Equation / Answer | 210,488
DeepMind Mathematics (Saxton et al., 2019) | Algebra, Probability, Calculus | English | Synthetic | Answer | 2,000,000
Integration & Differentiation Synthetic Dataset (Lample and Charton, 2019) | Integration, Differentiation | English | Synthetic | Answer | 160,000,000
AMPS (Hendrycks et al., 2021) | Algebra, Calculus, Geometry, Statistics, Number Theory | English | Synthetic | Step-by-step solution / Answer | 5,000,000

Table 2: Categorization of MWP datasets available for training AI models based on (a) Mathematics Area, (b) Language, (c) Type, and (d) Annotation.
In an expression tree, operators would be parent nodes, and operands would be their children nodes, which would recursively expand further. Encoders in such architectures vary from GRU/LSTM-based (Xie and Sun, 2019) to transformer-based (Liang et al., 2021). As tree decoders help the model learn the tree semantics of output expressions, such models perform better than seq-to-seq models on datasets of limited size. The expected output could be represented as a computational tree, which is then traversed and evaluated to compute the answer.

Graph-to-tree architectures further exploit the graph semantics present in MWPs (Ran et al., 2019) (Zhang et al., 2020) (Li et al., 2019). Graph semantics captures the relationships between different entities, which can be thought of as relationships among numerical quantities, their descriptions, and unknown quantities. The goal-driven tree-structured model is designed to generate expressions using computational goals inferred from an MWP; it uses a graph transformer network as the encoder to incorporate a quantity comparison graph and a quantity cell graph (Zhang et al., 2020). It leverages self-attention blocks for incorporating these instances of graphs. The numerical reasoning module attempts to learn comparative relations between numerical quantities by connecting them in a graph. Encoder representations are appended with representations learned by the numerical reasoning module to improve the performance (Ran et al., 2019). An MWP can be segmented into quantity spans and a question span. Specific self-attention head blocks, which could attend to just the quantity span, the question span, or both, could be used to learn graph relationships between the quantities and the question present in an MWP (Li et al., 2019).

4.4 Design Choices

In this section, we aim to highlight several design decisions for the above architectures. First, the seq-to-seq model suffers from generating spurious numbers or predicting numbers at the wrong position. Copy and alignment mechanisms can be used to avoid this problem (Huang et al., 2018a). Second, it has become a common practice to keep the decoder vocabulary limited to numbers, constants, operators, and other helpful information present in the question (Xie and Sun, 2019) (Zhang et al., 2020).
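The number-mapping convention behind this limited decoder vocabulary can be sketched as follows; the placeholder scheme (n0, n1, ...) is common across the cited systems, but the tokenization and vocabulary details here are simplifying assumptions.

```python
import re

# Illustrative sketch of the number-mapping convention: numerals in the
# problem are replaced with slots n0, n1, ..., and the decoder vocabulary
# is restricted to operators, constants, and those slots. Details vary
# across the cited systems; this is an assumption-level example.
def normalize_numbers(text):
    quantities, pieces = [], []
    for token in text.split():
        if re.fullmatch(r"\d+(\.\d+)?", token):
            pieces.append(f"n{len(quantities)}")
            quantities.append(float(token))
        else:
            pieces.append(token)
    return " ".join(pieces), quantities

problem = "Sam had 8 puppies and gave 2 to his friends ."
masked, nums = normalize_numbers(problem)
print(masked)   # Sam had n0 puppies and gave n1 to his friends .
print(nums)     # [8.0, 2.0]

# Decoder vocabulary: operators + constants + number slots of this problem.
decoder_vocab = ["+", "-", "*", "/", "1", "PI"] + [f"n{i}" for i in range(len(nums))]
```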
There are multiple ways of generating output expressions, and maximum likelihood training would compromise learning if the predicted and expected expressions are different but semantically the same. Reinforcement learning would solve this problem by rewarding the learning strategy based on the final answer (Huang et al., 2018a).

Tree-based decoders have received lots of attention because of their ability to invoke tree relationships present in mathematical expressions. Tree decoders attend to both parents and siblings to generate the next token. A bottom-up representation of a sibling's subtree could further help to derive better outcomes (Qin et al., 2020). For an efficient implementation of a tree decoder, the stack data structure can be used to store and retrieve hidden representations of the parent and sibling (Liu et al., 2019).
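A minimal sketch of stack-based pre-order tree decoding is given below; the `predict` function stands in for the neural scorer, and the goal bookkeeping is an illustrative assumption rather than the exact mechanism of (Xie and Sun, 2019) or (Liu et al., 2019).

```python
# Schematic pre-order tree decoding with an explicit stack, following the
# intuition described above (Xie and Sun, 2019; Liu et al., 2019). The
# `predict` callable stands in for the neural scorer and is an assumption.
OPERATORS = {"+", "-", "*", "/"}

def decode_expression(predict, root_goal):
    """Expand goals depth-first; each operator pushes two child goals."""
    output, stack = [], [root_goal]
    while stack:
        goal = stack.pop()
        token = predict(goal)          # neural net chooses operator/operand
        output.append(token)
        if token in OPERATORS:
            left, right = goal + ".l", goal + ".r"   # derived sub-goals
            stack.append(right)        # right child decoded after left
            stack.append(left)
    return output

# A canned predictor that yields the tree (* n0 (+ n1 n2)).
canned = {"root": "*", "root.l": "n0", "root.r": "+",
          "root.r.l": "n1", "root.r.r": "n2"}
print(decode_expression(canned.get, "root"))  # ['*', 'n0', '+', 'n1', 'n2']
```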
Tree regularization transforms encoder and subtree representations and minimizes their L2 distance as a regularization term of the loss function (Qin et al., 2020).

Graph-based encoders aim to learn different types of relationships among the constituents of MWPs. There are several attempts to construct different types of graphs. (Ran et al., 2019) inserts a numerical reasoning module that uses numerically aware graph neural networks, where two types of edges, less-than-equal-to and greater-than, connect numerical quantities present in the question. SMART, a situation model for algebra story problems, aims to build a graph using attributed grammar, which connects nodes with their attributes using relationships extracted from the problem (Hong et al., 2021b). Self-attention blocks of transformers could also be used to model the graph relationships (Zhang et al., 2020) (Li et al., 2019). The goal-driven tree-structured model incorporated a quantity cell graph and a quantity comparison graph (Zhang et al., 2020). The quantity cell graph connects a numerical quantity of the input with its descriptive words or entities, whereas the quantity comparison graph is similar to the concept used by (Ran et al., 2019). Segmenting the question into a quantity span and a question span, and using self-attention blocks to derive global attention, quantity-related attention, quantity pair attention, and question-related attention, is another such approach to derive better semantic representations (Li et al., 2019).
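The quantity comparison graph can be sketched as below; the directed edge labels follow the description above, while the adjacency encoding is an assumption rather than the exact construction of (Ran et al., 2019) or (Zhang et al., 2020).

```python
# Illustrative construction of a quantity comparison graph: directed edges
# between quantity nodes depending on their relative magnitude, as described
# for numerically aware GNNs (Ran et al., 2019) and the quantity comparison
# graph (Zhang et al., 2020). The edge encoding is an assumption.
def quantity_comparison_graph(quantities):
    """Return directed edges labeled 'greater' or 'less_equal'."""
    edges = []
    for i, qi in enumerate(quantities):
        for j, qj in enumerate(quantities):
            if i == j:
                continue
            label = "greater" if qi > qj else "less_equal"
            edges.append((i, j, label))
    return edges

# Quantities extracted from a problem in reading order.
for edge in quantity_comparison_graph([8, 2, 6]):
    print(edge)   # e.g. (0, 1, 'greater'), (1, 0, 'less_equal'), ...
```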
Multitasking is a prevalent paradigm to train the same model for multiple tasks. It enriches the semantic representations of models and avoids overfitting. Auxiliary tasks could also be part of such a setup. Auxiliary tasks like commonsense prediction, number quantity prediction, and number location prediction could be helpful (Qin et al., 2021). The commonsense prediction task aims to predict the commonsense knowledge required to solve an MWP, like the number of legs of a horse or the number of days in a week. Similarly, masked language modeling pretraining on the mathematical corpus Ape210K in a self-supervised way helps its downstream application to math word problems (Liang et al., 2021).
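A multitask objective of this kind can be sketched as a weighted sum of per-task losses from a shared encoder; the task names follow (Qin et al., 2021), but the weights and the scalar-loss interface are illustrative assumptions.

```python
# Schematic multitask objective: the main expression-generation loss is
# combined with auxiliary losses such as number quantity and number
# location prediction, as discussed above (Qin et al., 2021). The task
# names, loss values, and weights are illustrative assumptions.
def multitask_loss(losses, weights=None):
    """losses: dict of task name -> scalar loss from a shared encoder."""
    weights = weights or {task: 1.0 for task in losses}
    return sum(weights[task] * value for task, value in losses.items())

batch_losses = {
    "expression_generation": 1.32,   # main task
    "number_quantity": 0.41,         # auxiliary: how many numbers occur
    "number_location": 0.27,         # auxiliary: where they occur
    "commonsense": 0.55,             # auxiliary: required commonsense facts
}
print(multitask_loss(batch_losses, {"expression_generation": 1.0,
                                    "number_quantity": 0.2,
                                    "number_location": 0.2,
                                    "commonsense": 0.2}))
```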
4.5 Data Augmentation and Weak Supervision

Large datasets have empowered neural models to learn complex mathematical concepts like integration and differentiation (Lample and Charton, 2019) (Saxton et al., 2019), but such models often overfit on smaller datasets (Miao et al., 2021) (Patel et al., 2021). Data augmentation is a popular preprocessing technique to increase the size of training data. Generally, data augmentation creates variants of training records by applying domain-specific augmentation techniques. Several techniques were proposed for the data augmentation of MWP datasets. Reverse operation-based augmentation techniques swap an unknown quantity with a known quantity in an MWP, and the expression trees are modified to reflect the change (Liu et al., 2020). Different traversal orders of expression trees and other preprocessing techniques like lemmatization, POS tagging, sentence reordering, and stop word removal, and their impact on the performance of models, have been studied by (Griffith and Kalita, 2021). Data processing and augmentation could also help to generate adversarial datasets, which could test the reasoning ability of models (Patel et al., 2021) (Miao et al., 2021).
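A toy version of reverse operation-based augmentation is sketched below; the question templates and the single subtraction pattern are illustrative assumptions, while the swap-and-invert idea follows (Liu et al., 2020).

```python
# Toy sketch of reverse operation-based augmentation (Liu et al., 2020):
# a known quantity and the unknown are swapped, and the equation is
# algebraically inverted to match. The question templates are assumptions.
def reverse_augment(quantities, answer):
    """Original: x = n0 - n1. Augmented variant asks for n1: n1 = n0 - x."""
    n0, n1 = quantities
    original = {
        "question": f"Sam had {n0} puppies and gave {n1} away. How many are left?",
        "equation": "x = n0 - n1",
        "answer": n0 - n1,
    }
    augmented = {
        "question": f"Sam had {n0} puppies and now has {answer}. How many did he give away?",
        "equation": "x = n0 - ans",
        "answer": n0 - answer,
    }
    return original, augmented

orig, aug = reverse_augment((8, 2), answer=6)
print(orig["question"], "->", orig["answer"])   # ... -> 6
print(aug["question"], "->", aug["answer"])     # ... -> 2
```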
Weak supervision is another popular technique that generates large datasets with noisy labels. The model training process could cancel the noise and learn actual patterns. For example, generating expression trees for an MWP dataset that has only questions and answers could be done using weak supervision, where the expression trees may not always be accurate. Still, the answer they evaluate to would match the desired answer. Learning by fixing is a technique where expression trees are generated iteratively by fixing operators and operands till the tree evaluates to the desired answer (Hong et al., 2021a). Such a technique helps the model learn more diverse ways of solving mathematical problems.
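The weak-supervision idea can be sketched as a search over operator assignments until the expression evaluates to the labeled answer; brute-force enumeration here stands in for the guided, model-driven fixing of (Hong et al., 2021a).

```python
from itertools import product

# Toy version of weak supervision over expression trees: only the final
# answer is labeled, and operator choices are "fixed" until the expression
# evaluates to it (cf. learning by fixing, Hong et al., 2021a). Brute-force
# enumeration here stands in for their guided, model-driven search.
def find_consistent_expression(quantities, answer):
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b}
    n0, n1, n2 = quantities
    for op1, op2 in product(ops, repeat=2):
        value = ops[op2](ops[op1](n0, n1), n2)
        if value == answer:
            return f"(n0 {op1} n1) {op2} n2"
    return None

# Quantities [3, 5, 2] with labeled answer 13: a consistent tree is found.
print(find_consistent_expression([3, 5, 2], 13))  # -> (n0 * n1) - n2
```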
5 Mathematical Reasoning in Neural Approaches

Mathematical reasoning is core to human intelligence. As explained before, it is a complex phenomenon interfacing natural language understanding and visual understanding to invoke mathematical transformations. Over the past few decades, there has been a flurry of research to develop models with mathematical reasoning ability; however, the progress is limited. Larger datasets definitely help models learn to solve complex and niche problems (Lample and Charton, 2019) (Saxton et al., 2019), but a common-purpose model to solve a diverse variety of math problems is still a distant reality (Patel et al., 2021) (Miao et al., 2021). Smaller datasets overfit the models, and hence their reasoning ability is questionable. Non-neural models could not improve performance with larger datasets (Huang et al., 2016). On the other hand, neural models achieve much better performance on smaller datasets and attain near-perfect performance over very large datasets; however, their black-box structure hinders interpreting their reasoning ability. There is a decent consensus among researchers to solve problems by first generating intermediate expression trees and then evaluating them using solvers to predict the answer (Wang et al., 2017) (Amini et al., 2019) (Xie and Sun, 2019). As opposed to directly predicting the final answer, generating intermediate expression trees helps understand models' reasoning ability better. Currently, the efforts are mainly to solve MWPs of primary or secondary schools, as shown in Table 2. The current state of the art is still very far from building mathematical intelligence that could also infer visual narratives along with natural language and use domain-specific knowledge.

Models with the desired reasoning ability would not only be more generalizable; such ability would also help make systems interpretable and explainable. On the other hand, interpretable and explainable systems are easier to comprehend, which helps to progress further by expanding the scope of reasoning to more complex mathematical concepts. The field of mathematics is full of axioms, rules, and formulas, and it would also be important to plug explicit and definite knowledge into such systems.
5.1 Interpretability and Explainability

Deep learning models directly predicting answers from an input problem lack interpretability and explainability, as models often end up learning shallow heuristic patterns to arrive at the answer (Huang et al., 2016) (Patel et al., 2021). That is where it becomes important to generate intermediate representations or explanations and then infer the answer using them. It not only improves interpretability but also provides scaffolding to streamline reasoning ability and hence generalizability. Such intermediate representations could be of different forms, and they could be interspersed with natural language explanations to aid the explainability of the model further.

5.1.1 Intermediate Representations

The semantic parsing approach attempts to derive semantic structure from the problem narratives. Still, such approaches are not generalizable because of the inherent limitations of statistical and classical machine learning techniques in representing meaning. However, deriving intermediate representations and inferring the solution from them would remain one of the niche areas to explore in the context of neural approaches. Transforming problem narratives to expressions or equations and then evaluating them to get an answer is one of the most important approaches, suitable for elementary-level MWPs (Wang et al., 2017). There were other attempts at deriving a representation language (Shi et al., 2015), using logic forms (Liang et al., 2018), and using intermediate meaning representations (Huang et al., 2018b). It would require much more effort to derive intermediate representations for comprehensive and complex mathematical forms and expressions.
5.1.2 Interspersed Natural Language Explanations

Natural language explanations that describe the mathematical rationale would reflect the system's reasoning ability and also make the system easier to comprehend and explain. For example, (Ling et al., 2017) uses an LSTM to focus on generating an answer rationale, which is a natural language explanation interspersed with algebraic expressions to arrive at a decision. The AQuA dataset contains 100,000 such algebraic problems with answer rationales. Such answer rationales not only improve interpretability but also provide scaffolding which helps the model learn mathematical reasoning better. Language models like WT5, which produce explanations along with predictions, could be an inspiration (Narang et al., 2020). It would be essential to extend such approaches to more complex mathematical concepts beyond algebra. Natural language explanations interspersed with mathematical expressions would be a critical area for researchers to focus on for building more comprehensive systems. Such interspersed explanations would open up the possibility of helping students understand how to solve the problem more effectively, and would foster users' trust in the system.
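An AQuA-style record with an interspersed rationale might look like the following; the field names and the problem itself are illustrative assumptions, not an actual dataset entry (Ling et al., 2017).

```python
# An AQuA-style training record (cf. Ling et al., 2017): the rationale
# interleaves natural language with algebraic steps. Field names and the
# problem are illustrative assumptions, not an actual dataset entry.
example = {
    "question": "A train travels 60 km in 1.5 hours. What is its average speed?",
    "options": ["A) 30 km/h", "B) 40 km/h", "C) 45 km/h", "D) 90 km/h"],
    "rationale": (
        "Average speed is distance divided by time. "
        "speed = 60 / 1.5 = 40. "
        "So the answer is B."
    ),
    "correct": "B",
}
print(example["rationale"])
```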
5.2 Infusing Explicit and Definitive Knowledge

Mathematical problem solving requires identifying relevant external knowledge and applying it together with mathematical rules to arrive at an answer. External knowledge could be commonsense world knowledge, particular domain knowledge, or complex mathematical formulas. There is some progress on infusing commonsense world knowledge (Liang et al., 2021) or predicting it using auxiliary tasks (Qin et al., 2021); however, the remaining two dimensions are yet to be addressed. Such external knowledge is incorporated using rule-based, pattern-matching approaches in non-neural models; on the other hand, neural models leverage data augmentation and auxiliary task training. (Zhang et al., 2020) (Xie and Sun, 2019) developed novel neural architectures to represent semantics captured in the problems using graph and tree relations. These examples are of the shallow knowledge infusion category. It would be intriguing to explore deep knowledge infusion, wherein the model architecture would transform and leverage external knowledge vector spaces during its training (Wang et al., 2020) (Faldu et al., 2021). Furthermore, it would be promising to build an evaluation benchmark for such knowledge-intensive mathematical problems to encourage and streamline efforts to solve math problems (Sheth et al., 2021).
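Shallow infusion of commonsense quantities can be sketched as a lookup that resolves implicit quantities before equation generation; the fact table and resolution interface are illustrative assumptions, not a cited system's API.

```python
# Shallow knowledge infusion, sketched: implicit quantities are resolved
# from an external commonsense table before equation generation. The fact
# table and resolution step are illustrative assumptions.
COMMONSENSE_QUANTITIES = {
    ("week", "days"): 7,
    ("horse", "legs"): 4,
    ("dozen", "items"): 12,
}

def resolve_implicit_quantity(entity, unit):
    """Look up a quantity the problem narrative never states explicitly."""
    return COMMONSENSE_QUANTITIES.get((entity, unit))

# "How many legs do 3 horses have?" -> explicit 3, implicit legs-per-horse.
legs_per_horse = resolve_implicit_quantity("horse", "legs")
print(3 * legs_per_horse)  # -> 12
```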
5.3 Reinforcement Learning

Mathematical reasoning ability comprises applying a series of mathematical transformations. Rather than learning mathematical transformations, the system could focus on synthesizing the computational graph. Learning algorithms like gradient descent work on iterative reduction of the error. The choices of these mathematical transformations are not differentiable, and hence it is unclear whether gradient descent is the optimal strategy. Learning a policy to construct such a computation graph using reinforcement learning could be helpful (Palermo et al., 2021) (Huang et al., 2018a). The exponential search space of mathematical concepts is another key problem where reinforcement learning could be useful (Wang et al., 2018b). Reinforcement learning reduces a problem to a state transition problem with a reward function. It initializes the state from a given problem and, based on the semantic information extracted from the problem, derives actions for updating the state with a reward or penalty. It learns a policy to maximize the reward as it traverses through several intermediate states. For an MWP, the computation graph of expression trees denotes the state, and an action denotes the choice of mathematical transformation and its mapping with operands from the input. The evaluation of the computation graph would give an answer, and a penalty or reward would be given based on the comparison between the computed answer and the expected ground truth. Researchers have also leveraged reinforcement learning on difficult math domains like automated theorem proving (Crouse et al., 2021) (Kaliszyk et al., 2018). It is still an early stage of the application of reinforcement learning for solving math word problems. It could bring the next set of opportunities to build systems capable of mathematical reasoning ability.
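The state-action-reward framing described above can be sketched as a toy environment: the state is a partial prefix expression, actions append operators or number slots, and the terminal reward compares the evaluated answer with the gold answer. All environment details are illustrative assumptions (cf. Huang et al., 2018a; Wang et al., 2018b).

```python
# Toy environment for the reinforcement learning framing described above.
# State: partial prefix expression; action: append an operator or number
# slot; terminal reward: match with the gold answer. All details here are
# illustrative assumptions, not a cited system.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b}

def needed(tokens):
    """Operands still required to complete a prefix expression."""
    need = 1
    for tok in tokens:
        need += 1 if tok in OPS else -1
    return need

def step(state, action, quantities, gold_answer):
    state = state + [action]
    if needed(state) == 0:                      # expression complete
        it = iter(state)
        def ev():
            tok = next(it)
            return OPS[tok](ev(), ev()) if tok in OPS else quantities[int(tok[1:])]
        reward = 1.0 if ev() == gold_answer else -1.0
        return state, reward, True
    return state, 0.0, False

# Episode: build "- n0 n1" for quantities [8, 2] with gold answer 6.
s, done = [], False
for a in ["-", "n0", "n1"]:
    s, r, done = step(s, a, [8, 2], 6)
print(s, r, done)   # ['-', 'n0', 'n1'] 1.0 True
```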
6 Outlook

The mathematical reasoning process involves understanding natural language and visual narratives, extracting entities with explicit and implicit quantities, building a self-consistent symbolic representation, and invoking mathematical laws, axioms, and symbolic rules.

The ability to solve MWPs in an automated fashion could lead to many applications in the education domain for content creation and delivering learning outcomes (Donda et al., 2020) (Faldu et al., 2020a).

We surveyed sets of different approaches and their claims to have reasonably solved MWPs on specific datasets. However, we have also highlighted studies with "concrete evidence" that existing MWP solvers tend to rely on shallow heuristics to achieve their high performance, which questions these models' capabilities to solve even the simplest of MWPs robustly (Huang et al., 2016) (Patel et al., 2021) (Miao et al., 2021).

Non-neural approaches do not improve linearly with larger training data (Huang et al., 2016); on the other hand, neural approaches have shown promise to solve even complex MWPs when trained with a very large corpus (Lample and Charton, 2019) (Saxton et al., 2019). However, the availability of extensive training data with diverse MWPs is a crucial challenge. The inherent limitation of neural models in terms of interpretability and explainability could be addressed partly by predicting expression trees instead of directly predicting answers (Gaur et al., 2021). Such expression trees could be converted into equations, which are then evaluated to infer the final answer. Further, expression trees also provide scaffolding to assist mathematical reasoning ability and open up design choices like graph-based encoders and tree-based decoders, which help models learn from inadequately sized training corpora. Interspersed natural language explanations and infusing explicit knowledge could be of significant value, as they not only improve the explainability of the system but also guide the model to traverse through intermediate states, anchoring mathematical reasoning (Ling et al., 2017) (Gaur et al., 2021). Interspersed explanations not only solve the math word problems but also engage users on how to solve them.

We highlight that solving MWPs would require the inherent ability of mathematical reasoning, which comprises natural language understanding, image understanding, and the ability to invoke domain knowledge of mathematical laws, axioms, and theorems. There is a massive scope for complementing neural approaches with external expertise/knowledge and developing design choices as we extend the problem to complex mathematical areas.

References
Aida Amini, Saadia Gabriel, Peter Lin, Rik Koncel-Kedziorski, Yejin Choi, and Hannaneh Hajishirzi. 2019. MathQA: Towards interpretable math word problem solving with operation-based formalisms. arXiv preprint arXiv:1905.13319.

Bussaba Amnueypornsakul and Suma Bhat. 2014. Machine-guided solution to mathematical word problems. In Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing, pages 111–119.

Yefim Bakman. 2007. Robust understanding of word problems with extraneous information. arXiv preprint math/0701393.

Albert Bandura. 2008. Observational learning. The International Encyclopedia of Communication.

Daniel G. Bobrow. 1964. Natural language input for a computer problem solving system.

Diane J. Briars and Jill H. Larkin. 1984. An integrated model of skill in solving elementary word problems. Cognition and Instruction, 1(3):245–296.

Eugene Charniak. 1968. Calculus word problems. Ph.D. thesis, Massachusetts Institute of Technology.

Keh-Jiann Chen, Shu-Ling Huang, Yueh-Yin Shih, and Yi-Jun Chen. 2005. Extended-HowNet: A representational framework for concepts. In Proceedings of OntoLex 2005 – Ontologies and Lexical Resources.

Maxwell Crouse, Ibrahim Abdelaziz, Bassem Makni, Spencer Whitehead, Cristina Cornelio, Pavan Kapanipathi, Kavitha Srinivas, Veronika Thost, Michael Witbrock, and Achille Fokoue. 2021. A deep reinforcement learning approach to first-order logic theorem proving. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 6279–6287.

Balázs Csanád Csáji et al. 2001. Approximation with artificial neural networks. Faculty of Sciences, Eötvös Loránd University, Hungary, 24(48):7.

Denise Dellarosa. 1986. A computer simulation of children's arithmetic word-problem solving. Behavior Research Methods, Instruments, & Computers, 18(2):147–154.

Soma Dhavala, Chirag Bhatia, Joy Bose, Keyur Faldu, and Aditi Avasthi. 2020. Auto generation of diagnostic assessments and their quality evaluation. International Educational Data Mining Society.

Chintan Donda, Sayan Dasgupta, Soma S. Dhavala, Keyur Faldu, and Aditi Avasthi. 2020. A framework for predicting, interpreting, and improving learning outcomes. arXiv preprint arXiv:2010.02629.

Keyur Faldu, Aditi Avasthi, and Achint Thomas. 2020a. Adaptive learning machine for score improvement and parts thereof. US Patent 10,854,099.

Keyur Faldu, Amit Sheth, Prashant Kikani, and Hemang Akabari. 2021. KI-BERT: Infusing knowledge context for better language and domain understanding. arXiv preprint arXiv:2104.08145.

Keyur Faldu, Achint Thomas, and Aditi Avasthi. 2020b. System and method for recommending personalized content using contextualized knowledge base. US Patent App. 16/586,512.

Edward A. Feigenbaum, Julian Feldman, et al. 1963. Computers and Thought. New York: McGraw-Hill.

Charles R. Fletcher. 1985. Understanding and solving arithmetic word problems: A computer simulation. Behavior Research Methods, Instruments, & Computers, 17(5):565–571.

Manas Gaur, Keyur Faldu, and Amit Sheth. 2021. Semantics of the black-box: Can knowledge graphs help make deep learning systems more interpretable and explainable? IEEE Internet Computing, 25(1):51–59.

Kaden Griffith and Jugal Kalita. 2021. Solving arithmetic word problems with transformers and preprocessing of problem text. arXiv preprint arXiv:2106.00893.

Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. 2021. Measuring mathematical problem solving with the MATH dataset. arXiv preprint arXiv:2103.03874.

Yining Hong, Qing Li, Daniel Ciao, Siyuan Huang, and Song-Chun Zhu. 2021a. Learning by fixing: Solving math word problems with weak supervision. In AAAI Conference on Artificial Intelligence.

Yining Hong, Qing Li, Ran Gong, Daniel Ciao, Siyuan Huang, and Song-Chun Zhu. 2021b. SMART: A situation model for algebra story problems via attributed grammar. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 13009–13017.

Mohammad Javad Hosseini, Hannaneh Hajishirzi, Oren Etzioni, and Nate Kushman. 2014. Learning to solve arithmetic word problems with verb categorization. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 523–533.

Danqing Huang, Jing Liu, Chin-Yew Lin, and Jian Yin. 2018a. Neural math word problem solver with reinforcement learning. In Proceedings of the 27th International Conference on Computational Linguistics, pages 213–223.

Danqing Huang, Shuming Shi, Chin-Yew Lin, Jian Yin, and Wei-Ying Ma. 2016. How well do computers solve math word problems? Large-scale dataset construction and evaluation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 887–896.

Danqing Huang, Jin-Ge Yao, Chin-Yew Lin, Qingyu Zhou, and Jian Yin. 2018b. Using intermediate representations to solve math word problems. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 419–428.

Daniel Kahneman. 2011. Thinking, Fast and Slow. Macmillan.

Cezary Kaliszyk, Josef Urban, Henryk Michalewski, and Mirek Olšák. 2018. Reinforcement learning of theorem proving. arXiv preprint arXiv:1805.07563.

Daniel Khashabi, Erfan Sadeqi Azer, Tushar Khot, Ashish Sabharwal, and Dan Roth. 2019. On the capabilities and limitations of reasoning for natural language understanding. arXiv preprint arXiv:1901.02522.

Rik Koncel-Kedziorski, Hannaneh Hajishirzi, Ashish Sabharwal, Oren Etzioni, and Siena Dumas Ang. 2015. Parsing algebraic word problems into equations. Transactions of the Association for Computational Linguistics, 3:585–597.

Rik Koncel-Kedziorski, Subhro Roy, Aida Amini, Nate Kushman, and Hannaneh Hajishirzi. 2016. MAWPS: A math word problem repository. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1152–1157.

Nate Kushman, Yoav Artzi, Luke Zettlemoyer, and Regina Barzilay. 2014. Learning to automatically solve algebra word problems. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 271–281.

Guillaume Lample and François Charton. 2019. Deep learning for symbolic mathematics. arXiv preprint arXiv:1912.01412.

Jierui Li, Lei Wang, Jipeng Zhang, Yan Wang, Bing Tian Dai, and Dongxiang Zhang. 2019. Modeling intra-relation in math word problems with different functional multi-head attentions. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6162–6167.

Chao-Chun Liang, Yu-Shiang Wong, Yi-Chung Lin, and Keh-Yih Su. 2018. A meaning-based statistical English math word problem solver. arXiv preprint arXiv:1803.06064.

Zhenwen Liang, Jipeng Zhang, Jie Shao, and Xiangliang Zhang. 2021. MWP-BERT: A strong baseline for math word problems. arXiv preprint arXiv:2107.13435.

Christian Liguda and Thies Pfeiffer. 2012. Modeling math word problems with augmented semantic networks. In International Conference on Application of Natural Language to Information Systems, pages 247–252. Springer.

Yi-Chung Lin, Chao-Chun Liang, Kuang-Yi Hsu, Chien-Tsung Huang, Shen-Yun Miao, Wei-Yun Ma, Lun-Wei Ku, Churn-Jung Liau, and Keh-Yih Su. 2015. Designing a tag-based statistical math word problem solver with reasoning and explanation. International Journal of Computational Linguistics & Chinese Language Processing, 20(2), December 2015 – Special Issue on Selected Papers from ROCLING XXVII.

Wang Ling, Dani Yogatama, Chris Dyer, and Phil Blunsom. 2017. Program induction by rationale generation: Learning to solve and explain algebraic word problems. arXiv preprint arXiv:1705.04146.

Johan Lithner. 2000. Mathematical reasoning in task solving. Educational Studies in Mathematics, pages 165–190.

Qianying Liu, Wenyu Guan, Sujian Li, Fei Cheng, Daisuke Kawahara, and Sadao Kurohashi. 2020. Reverse operation based data augmentation for solving math word problems. arXiv preprint arXiv:2010.01556.

Qianying Liu, Wenyv Guan, Sujian Li, and Daisuke Kawahara. 2019. Tree-structured decoding for solving math word problems. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2370–2379.

Shen-Yun Miao, Chao-Chun Liang, and Keh-Yih Su. 2021. A diverse corpus for evaluating and developing English math word problem solvers. arXiv preprint arXiv:2106.15772.

Arindam Mitra and Chitta Baral. 2016. Learning to use formulas to solve simple arithmetic problems. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2144–2153.

Sharan Narang, Colin Raffel, Katherine Lee, Adam Roberts, Noah Fiedel, and Karishma Malkan. 2020. WT5?! Training text-to-text models to explain their predictions. arXiv preprint arXiv:2004.14546.

Joseph Palermo, Johnny Ye, and Alok Singh. 2021. A reinforcement learning environment for mathematical reasoning via program synthesis. arXiv preprint arXiv:2107.07373.

Arkil Patel, Satwik Bhattamishra, and Navin Goyal. 2021. Are NLP models really able to solve simple math word problems? arXiv preprint arXiv:2103.07191.

Jinghui Qin, Xiaodan Liang, Yining Hong, Jianheng Tang, and Liang Lin. 2021. Neural-symbolic solver for math word problems with auxiliary tasks. arXiv preprint arXiv:2107.01431.

Jinghui Qin, Lihui Lin, Xiaodan Liang, Rumin Zhang, and Liang Lin. 2020. Semantically-aligned universal tree-structured solver for math word problems. arXiv preprint arXiv:2010.06823.

Qiu Ran, Yankai Lin, Peng Li, Jie Zhou, and Zhiyuan Liu. 2019. NumNet: Machine reading comprehension with numerical reasoning. arXiv preprint arXiv:1910.06701.

Benjamin Robaidek, Rik Koncel-Kedziorski, and Hannaneh Hajishirzi. 2018. Data-driven methods for solving algebra word problems. arXiv preprint arXiv:1804.10718.

Subhro Roy and Dan Roth. 2017. Unit dependency graph and its application to arithmetic word problem solving. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31.

Subhro Roy and Dan Roth. 2018. Mapping to declarative knowledge for word problem solving. Transactions of the Association for Computational Linguistics, 6:159–172.

Subhro Roy, Tim Vieira, and Dan Roth. 2015. Reasoning about quantities in natural language. Transactions of the Association for Computational Linguistics, 3:1–13.

Maarten Sap, Vered Shwartz, Antoine Bosselut, Yejin Choi, and Dan Roth. 2020. Commonsense reasoning for natural language processing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, pages 27–33.

David Saxton, Edward Grefenstette, Felix Hill, and Pushmeet Kohli. 2019. Analysing mathematical reasoning abilities of neural models. arXiv preprint arXiv:1904.01557.

Amit Sheth, Manas Gaur, Kaushik Roy, and Keyur Faldu. 2021. Knowledge-intensive language understanding for explainable AI. IEEE Internet Computing, (01):1–1.

Shuming Shi, Yuehui Wang, Chin-Yew Lin, Xiaojiang Liu, and Yong Rui. 2015. Automatically solving number word problems by semantic parsing and reasoning. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1132–1142.

Shane Storks, Qiaozi Gao, and Joyce Y. Chai. 2019. Commonsense reasoning for natural language understanding: A survey of benchmarks, resources, and approaches. arXiv preprint arXiv:1904.01172, pages 1–60.

Shyam Upadhyay and Ming-Wei Chang. 2015. DRAW: A challenging and diverse algebra word problem set. Technical report, Citeseer.

Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. SuperGLUE: A stickier benchmark for general-purpose language understanding systems. arXiv preprint arXiv:1905.00537.

Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2018a. GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461.

Lei Wang, Dongxiang Zhang, Lianli Gao, Jingkuan Song, Long Guo, and Heng Tao Shen. 2018b. MathDQN: Solving arithmetic word problems via deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.

Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Guihong Cao, Daxin Jiang, Ming Zhou, et al. 2020. K-Adapter: Infusing knowledge into pre-trained models with adapters. arXiv preprint arXiv:2002.01808.

Yan Wang, Xiaojiang Liu, and Shuming Shi. 2017. Deep neural solver for math word problems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 845–854.

Qinzhuo Wu, Qi Zhang, Zhongyu Wei, and Xuan-Jing Huang. 2021. Math word problem solving with explicit numerical values. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5859–5869.

Zhipeng Xie and Shichao Sun. 2019. A goal-driven tree-structured neural model for math word problems. In IJCAI, pages 5299–5305.

Ma Yuhui, Zhou Ying, Cui Guangzuo, Ren Yun, and Huang Ronghuai. 2010. Frame-based calculus of solving arithmetic multi-step addition and subtraction word problems. In 2010 Second International Workshop on Education Technology and Computer Science, volume 2, pages 476–479. IEEE.

Jipeng Zhang, Lei Wang, Roy Ka-Wei Lee, Yi Bin, Yan Wang, Jie Shao, and Ee-Peng Lim. 2020. Graph-to-tree learning for solving math word problems. Association for Computational Linguistics.

Wei Zhao, Mingyue Shang, Yang Liu, Liang Wang, and Jingming Liu. 2020. Ape210K: A large-scale and template-rich dataset of math word problems. arXiv preprint arXiv:2009.11506.

Lipu Zhou, Shuaixiang Dai, and Liwei Chen. 2015. Learn to solve algebra word problems using quadratic programming. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 817–822.