Neural Network-Based
Abstract Generation for
Opinions and Arguments
Lu Wang and Wang Ling
NAACL 2016
Paper introduction
Presentation: Tomonori Kodaira
1
Abstract
• Abstractive document-to-sentence summarization.
• When encoding the documents, sampling is used to address the computational cost.
  - The top-K text units, ranked by their importance scores, are used.
2
Introduction
• The authors present an attention-based neural network (NN) model for
generating abstractive summaries of opinionated text.
• Their system takes as input a set of text units, and
then outputs a one-sentence abstractive summary.
• Two types of opinionated text:
  - Movie reviews
  - Arguments on controversial topics
3
Introduction
• Systems
  - Attention-based model (Bahdanau et al., 2014)
  - An importance-based sampling method
• The importance score of a text unit is estimated from a regression model
with pairwise preference-based sampling.
4
Data Collection
Rotten Tomatoes (www.rottentomatoes.com)
• The site contains professional critic reviews and user-generated reviews.
• Each movie has a one-sentence critic consensus.
Data
• 246,164 critic reviews and the corresponding opinion consensus for 3,731 movies
• train: 2,458, validation: 536, test: 737 movies
Movie Reviews
5
Data Collection
idebate.org
• This site is a Wikipedia-style website for gathering pro
and con arguments on controversial issues.
• Each point contains a one-sentence central claim.
Data
• 676 debates with 2,259 claims.
• train: 450, validation: 67, test: 150 debates
Arguments on controversial topics
6
Data Collection
Text units
7
The Neural Network-Based Abstract
Generation Model
• A summary y is composed of a sequence of words y1, …, y|y|.
• The input consists of an arbitrary number of reviews or
arguments -> text units
  - x = {x1, …, xM}
• Each text unit xk is composed of a sequence of words xk,1, …, xk,|xk|.
Problem Formulation
8
The Neural Network-Based Abstract
Generation Model
• The summary is generated as a sequence of word-level predictions:
  - log P(y|x) = ∑j=1..|y| log P(yj | y1, …, yj-1, x)
  - P(yj | y1, …, yj-1, x) = softmax(hj)
• hj is the RNN's state variable:
  - hj = g(yj-1, hj-1, s)
• g is an LSTM network (Hochreiter and Schmidhuber, 1997).
Decoder
9
The Neural Network-Based Abstract
Generation Model
• g is implemented as an LSTM.
• The model concatenates the representation of the previous output word yj-1
and the input representation s into uj, which feeds the LSTM
(a minimal sketch follows this slide).
Decoder
10
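As a reading aid, here is a minimal sketch of one decoder step in the slides' notation. The use of PyTorch, the layer sizes, and the exact way uj enters the LSTM cell are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Sketch of one decoder step: u_j = [E(y_{j-1}); s], h_j = g(y_{j-1}, h_{j-1}, s),
# and a softmax over the vocabulary for P(y_j | y_1..y_{j-1}, x).
class DecoderStep(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, input_dim=150, hidden_dim=150):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.cell = nn.LSTMCell(emb_dim + input_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_word, s, state):
        # concatenate the previous output word's embedding with the input representation s
        u = torch.cat([self.embed(prev_word), s], dim=-1)
        h, c = self.cell(u, state)                            # LSTM state update
        log_probs = torch.log_softmax(self.out(h), dim=-1)    # word-level prediction
        return log_probs, (h, c)
```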
The Neural Network-Based Abstract
Generation Model
• The representation of the input text units, s, is computed
using an attention model (Bahdanau et al., 2014):
  - s = ∑i ai·bi
• The authors construct bi with a bidirectional LSTM over the input words.
  - They reuse the LSTM formulation above, setting uj = xj.
• ai = softmax(v(bi, hj-1))
  - v(bi, hj-1) = Ws·tanh(Wcg·bi + Whg·hj-1)  (a sketch follows this slide)
Encoder
11
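A sketch of the attention computation on this slide; the matrix names Ws, Wcg, Whg come from the slide, while the tensor shapes and the use of PyTorch are assumptions.

```python
import torch
import torch.nn as nn

# a_i = softmax(v(b_i, h_{j-1})),  v = Ws . tanh(Wcg b_i + Whg h_{j-1}),  s = sum_i a_i b_i
class InputAttention(nn.Module):
    def __init__(self, enc_dim=150, dec_dim=150, att_dim=100):
        super().__init__()
        self.Wcg = nn.Linear(enc_dim, att_dim, bias=False)
        self.Whg = nn.Linear(dec_dim, att_dim, bias=False)
        self.Ws = nn.Linear(att_dim, 1, bias=False)

    def forward(self, b, h_prev):
        # b: (num_input_words, enc_dim) bidirectional LSTM states
        # h_prev: (dec_dim,) previous decoder state h_{j-1}
        scores = self.Ws(torch.tanh(self.Wcg(b) + self.Whg(h_prev))).squeeze(-1)
        a = torch.softmax(scores, dim=0)        # attention weights over input words
        s = (a.unsqueeze(-1) * b).sum(dim=0)    # weighted sum of encoder states
        return s, a
```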
The Neural Network-Based Abstract
Generation Model
Their input consists of multiple separate text units.
• A straightforward approach is to concatenate them into one sequence z.
There are two problems with this:
• The model is sensitive to the order of the text units.
• z may contain thousands of words.
Attention Over Multiple Inputs
12
The Neural Network-Based Abstract
Generation Model
Sub-sampling from the input
• They define an importance score f(xk) ∈ [0, 1] for
each text unit xk.
• K candidates are sampled based on these scores (a minimal sketch follows this slide).
Attention Over Multiple Inputs
13
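A minimal sketch of the sub-sampling step, following the deck's earlier statement that the top-K text units by importance are used; K = 5 is the rate given later in the experimental setup, and the greedy top-K selection (rather than probabilistic sampling) is an assumption here.

```python
import numpy as np

def select_top_k(text_units, importance_scores, k=5):
    """Keep the K text units with the highest importance scores f(x_k)."""
    order = np.argsort(importance_scores)[::-1]   # indices by descending f(x_k)
    return [text_units[i] for i in order[:k]]
```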
The Neural Network-Based Abstract
Generation Model
A ridge regression model with an additional regularizer.
• They learn f(xk) = rk·w
  - by minimizing ||Rw − L||₂² + λ·||R′w − L′||₂² + β·||w||₂²
  - (R′ and L′ are built from the pairwise preference-based samples; see the sketch after this slide)
• Each text unit xk is represented as a d-dimensional
feature vector rk ∈ R^d.
Importance Estimation
14
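The objective above has a closed-form solution; the sketch below solves it directly. The construction of R′ and L′ from the pairwise preference-based samples is assumed, only the loss itself comes from the slide.

```python
import numpy as np

def fit_importance_weights(R, L, R_pair, L_pair, lam=1.0, beta=1.0):
    """Minimize ||Rw - L||^2 + lam*||R'w - L'||^2 + beta*||w||^2 in closed form.

    R: (n, d) feature vectors r_k; L: (n,) importance labels;
    R_pair, L_pair: pairwise preference-based design matrix and targets (assumed form).
    """
    d = R.shape[1]
    A = R.T @ R + lam * (R_pair.T @ R_pair) + beta * np.eye(d)
    b = R.T @ L + lam * (R_pair.T @ L_pair)
    return np.linalg.solve(A, b)

def importance_score(r_k, w):
    # f(x_k) = r_k . w, clipped to the stated [0, 1] range
    return float(np.clip(r_k @ w, 0.0, 1.0))
```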
The Neural Network-Based Abstract
Generation Model
• At test time, they re-rank the n-best summaries
according to their cosine similarity with the input text units.
• The candidate with the highest similarity is included in the
final summary (a sketch follows this slide).
Post-processing
15
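A sketch of the post-processing step; representing texts as bag-of-words count vectors for the cosine similarity is an assumption, the slide only specifies cosine similarity between each n-best candidate and the input text units.

```python
import numpy as np
from collections import Counter

def _bow(tokens, vocab):
    v = np.zeros(len(vocab))
    for tok, cnt in Counter(tokens).items():
        v[vocab[tok]] = cnt
    return v

def rerank(nbest_summaries, text_units):
    """Return the candidate summary with the highest cosine similarity to the input."""
    input_tokens = [tok for unit in text_units for tok in unit]
    all_tokens = set(input_tokens) | {tok for cand in nbest_summaries for tok in cand}
    vocab = {tok: i for i, tok in enumerate(all_tokens)}
    ref = _bow(input_tokens, vocab)

    def cosine(cand):
        v = _bow(cand, vocab)
        denom = np.linalg.norm(v) * np.linalg.norm(ref)
        return (v @ ref) / denom if denom else 0.0

    return max(nbest_summaries, key=cosine)
```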
Experimental Setup
• Data Preprocessing
  - Stanford CoreNLP (Manning et al., 2014)
• Pre-trained Embeddings and Features
  - Word embeddings: 300 dimensions
  - They extend their model with additional features.
16
• Hyperparameters
  - The LSTMs use states and cells of 150 dimensions.
  - The attention layer: 100 dimensions.
  - Training is performed with AdaGrad (Duchi et al., 2011).
• Evaluation: BLEU
• The importance-based sampling rate K is set to 5.
• Decoding: beam search with beam size 20
Experimental Setup
17
Results
• MRR (Mean Reciprocal Rank)
• NDCG (Normalized Discounted Cumulative Gain) (sketches of both metrics follow this slide)
Importance Estimation Evaluation
18
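For reference, a minimal sketch of the two ranking metrics named above; binary relevance for MRR and graded relevance with a linear-gain DCG are assumptions, since the slides do not state the exact relevance definition used.

```python
import numpy as np

def mrr(ranked_relevance_lists):
    """Mean reciprocal rank over queries; each list holds binary relevance in ranked order."""
    recip = []
    for rels in ranked_relevance_lists:
        rank = next((i + 1 for i, r in enumerate(rels) if r), None)
        recip.append(1.0 / rank if rank else 0.0)
    return float(np.mean(recip))

def ndcg(relevance, k=None):
    """Normalized discounted cumulative gain for one ranked list of graded relevance."""
    rels = np.asarray(relevance, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, len(rels) + 2))
    ideal = float((np.sort(rels)[::-1] * discounts).sum())
    return float((rels * discounts).sum()) / ideal if ideal > 0 else 0.0
```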
Results
Importance Estimation Evaluation
19
Results
Human Evaluation on Summary Quality
20
Results
Sampling Effect
21
Conclusion
• The authors presented a neural approach for generating
abstractive summaries of opinionated text.
• They employed an attention-based method that
finds salient information across the different input text
units.
• They deployed an importance-based sampling
mechanism for model training.
• Their system obtained state-of-the-art results.
22
