Reasoning about Quantities
in Natural Language
Subhro Roy, Tim Vieira, Dan Roth.
TACL 2015
  About six and a half hours later,
  Mr. Armstrong opened the landing craft’s hatch.
  [About six and a half hours later],
  Mr. Armstrong opened the landing craft’s hatch.
  About six and a half hours later
Lower bound   Upper bound   None
Lower bound   Upper bound   None
  The number of member nations was 80 in 2000,
and then it increased to 95.
  The number of adults and children with
HIV/AIDS reached 39.4 million in 2004.
  CERN has now grown to include 20 member
states and enjoys the active participation of many
other countries world-wide.
  CERN has 20 member states.
  CERN has now grown to include 20 member
states and enjoys the active participation of many
other countries world-wide.
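Whether one of these sentences entails the other comes down to the bounds each quantity licenses: a lower bound, an upper bound, or an exact value. A minimal interval-based sketch of that check, using examples from the paper; the class and function names are my own illustration, not the authors' implementation:

```python
from dataclasses import dataclass

@dataclass
class Quantity:
    """A numeric mention as a closed interval [lo, hi].

    An exact value has lo == hi; a lower bound ("at least 20")
    has hi == inf; an upper bound ("at most 20") has lo == -inf.
    """
    lo: float
    hi: float

def entails(t: Quantity, h: Quantity) -> bool:
    # T entails H when every value T allows is also allowed by H.
    return h.lo <= t.lo and t.hi <= h.hi

def contradicts(t: Quantity, h: Quantity) -> bool:
    # Disjoint intervals cannot both hold of the same fact.
    return t.hi < h.lo or h.hi < t.lo

INF = float("inf")
# "Ten students passed the exam"  ->  exactly 10
# "At least eight students"       ->  [8, inf)
print(entails(Quantity(10, 10), Quantity(8, INF)))        # True
# "exactly 100 dollars" vs. "50 dollars" (read as exactly 50)
print(contradicts(Quantity(100, 100), Quantity(50, 50)))  # True
```

Representing all three bound types as one interval keeps the entailment test to a single containment check; note that a lower bound never entails an exact value, which is the distinction the CERN pair turns on.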
Details on the next slide
…shut it down”, we would like to segment together “nearly two years after”. We consider a quantity to be correctly detected only when we have the exact phrase that we want; otherwise we consider the segment to be undetected.

Model          P%    R%    F%    Train Time   Test Time
Semi-CRF (SC)  75.6  77.7  76.6     15.8         1.5
C+I (PR)       80.3  79.3  79.8      1.0         1.0

Table 2: 10-fold cross-validation results for segmentation accuracy and segmentation time; the runtime columns are normalized and expressed as ratios.

Table 2 reports the segmentation accuracy, as well as the ratio between the time taken by the two approaches. The bank-of-classifiers approach (C+I) gives slightly better accuracy than the semi-CRF model, and is also significantly faster.
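Under the exact-match criterion described above, segmentation precision and recall reduce to set intersection between predicted and gold phrase spans. A small sketch (the function and variable names are mine):

```python
def prf(gold: set, pred: set):
    """Precision, recall, F1 under exact-match scoring: a predicted
    segment earns credit only if it matches a gold phrase exactly."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gold = {"nearly two years after", "increased 10%"}
pred = {"nearly two years after", "two years"}  # partial span: no credit
print(prf(gold, pred))  # (0.5, 0.5, 0.5)
```

The strictness is the point: “two years” overlaps a gold phrase but counts as a miss, which is exactly why partial-span outputs hurt both precision and recall here.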
 
…increased 10%”, we would like to segment together “increased 10%”, since this quantity denotes a rise in value. In the sentence “Apple restores push email in […], nearly two years after Motorola shut it down”, we would like to segment together “nearly two years after”.
Exact match only supports 43.3% of the entailment decisions. It is also evident that the deeper semantic analysis using SRL and coreference improves the quantitative inference.
Task           System    P%     R%    F%
Entailment     Baseline  100.0  43.3  60.5
               GOLDSEG    98.5  88.0  92.9
               +SEM       97.8  88.6  93.0
               PREDSEG    94.9  76.2  84.5
               +SEM       95.4  78.3  86.0
Contradiction  Baseline   16.6  48.5  24.8
               GOLDSEG    61.6  92.9  74.2
               +SEM       64.3  91.5  75.5
               PREDSEG    51.9  79.7  62.8
               +SEM       52.8  81.1  64.0
No Relation    Baseline   41.8  71.9  52.9
               GOLDSEG    81.1  76.7  78.8
               +SEM       80.0  78.5  79.3
               PREDSEG    54.0  75.4  62.9
               +SEM       56.3  72.7  63.5

Table 3: Results of QE. Adding semantics (+SEM) consistently improves performance; only 43.3% of entailing quantities can be recovered by simple string matching.
 
5.2 Scope of QE Inference
Our current QE procedure is limited in
several ways. In all cases, we attribute these
limitations to subtle and deeper language
understanding, which we delegate to the application
module that will use our QE procedure as a
subroutine. Consider the following examples:
T : Adam has exactly 100 dollars in the bank.
H1 : Adam has 50 dollars in the bank.
H2 : Adam’s bank balance is 50 dollars.
Here, T implies H1 but not H2. However, for both H1 and H2, QE will infer that “50 dollars” is a contradiction to sentence T, since it cannot make the subtle distinction required here.
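The surviving fragments of Algorithm 3 suggest a loop that compares each quantity in H against the quantities in T and labels it as entailed, contradicting, or unrelated. A hedged, self-contained sketch under interval semantics; the control flow and names are my own reconstruction, not the paper's algorithm:

```python
import math

def entails(t, h):
    # t and h are (lo, hi) intervals; T entails H if T's range lies within H's.
    return h[0] <= t[0] and t[1] <= h[1]

def contradicts(t, h):
    # Disjoint ranges contradict.
    return t[1] < h[0] or h[1] < t[0]

def qe(t_quants, h_quants):
    """Label each quantity in H against all quantities in T."""
    labels = []
    for h in h_quants:
        label = "no relation"
        for t in t_quants:
            if entails(t, h):
                label = "entails"
                break
            if contradicts(t, h):
                label = "contradicts"
        labels.append(label)
    return labels

# T: "Ten students passed the exam, but six students failed it."
# H: "At least eight students failed the exam."
print(qe([(10, 10), (6, 6)], [(8, math.inf)]))  # ['entails']
```

Breaking on the first entailing match means an entailing quantity wins over a later contradicting one; the sketch also shows why such a loop cannot tell which quantity attaches to which predicate, which is the limitation discussed next.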
T : Ten students passed the exam, but six students
failed it.
H : At least eight students failed the exam.
Here again, QE will only output that T implies “At least eight students”, despite the second part of T. QE reasons about the quantities, and there needs to be an application-specific module that understands which quantity is related to the predicate “failed”.
There also exist limitations regarding inferences with respect to events that could occur over a period of time. In “It was raining from 5 pm to 7 pm” one […]
 
 
 
