Shusen Zhou, Qingcai Chen, Xiaolong Wang
Neurocomputing
journal homepage: www.elsevier.com/locate/neucom
article info

Article history:
Received 9 May 2012
Received in revised form 9 April 2013
Accepted 15 April 2013
Available online 23 May 2013
Communicated by M. Wang

Keywords:
Neural networks
Deep learning
Active learning
Sentiment classification

abstract

In the natural language processing community, sentiment classification based on insufficient labeled data is a well-known challenging problem. In this paper, a novel semi-supervised learning algorithm called active deep network (ADN) is proposed to address this problem. First, we propose the semi-supervised learning framework of ADN. ADN is constructed by restricted Boltzmann machines (RBM) with unsupervised learning based on labeled reviews and abundant unlabeled reviews. Then the constructed structure is fine-tuned by gradient-descent-based supervised learning with an exponential loss function. Second, in the semi-supervised learning framework, we apply active learning to identify reviews that should be labeled as training data, and then use the selected labeled reviews and all unlabeled reviews to train the ADN architecture. Moreover, we combine information density with ADN and propose the information ADN (IADN) method, which exploits the information density of all unlabeled reviews when choosing the reviews to be labeled manually. Experiments on five sentiment classification datasets show that ADN and IADN outperform classical semi-supervised learning algorithms and deep learning techniques applied to sentiment classification.

© 2013 Elsevier B.V. All rights reserved.
document representations, which is based on semi-supervised auto-encoders that are combined to form a deep network.

Active learning is another way to minimize the amount of required labeled data while getting a competitive result. Rather than choosing the training set randomly, active learning chooses the training data actively, which reduces the need for labeled data [16]. It has been widely explored in the multimedia research community for its capability of reducing human annotation effort [17]. Zha et al. propose a novel active learning approach based on the optimum experimental design criteria in statistics for interactive video indexing [18]. Active learning is well suited to many NLP problems, where unlabeled data may be abundant but annotation is slow and expensive [19]. Druck et al. propose an active learning approach in which the machine solicits labels on features rather than instances [20]. Zhu et al. combine active and semi-supervised learning under a Gaussian random field model; the active learning scheme requires a much smaller number of queries to achieve high accuracy compared with random query selection [21]. Recently, active learning has been applied in sentiment classification [3].

Inspired by the study of semi-supervised learning, active learning and deep learning, this paper proposes a semi-supervised sentiment classification method called active deep network (ADN). It is based on a representative deep learning method, DBN [14], and an active learning method [16]. First, we introduce the semi-supervised learning procedure of the ADN method, which constructs the deep architecture with all unlabeled and labeled reviews, and fine-tunes the deep architecture with few labeled reviews. To maximize the separability of the classifier, an exponential loss function is suggested. Second, we introduce the active learning procedure of the ADN method. It first identifies a small number of unlabeled reviews for manual labeling by an active learner, and then trains the deep architecture with the labeled reviews and all other unlabeled reviews. Moreover, we propose the information ADN (IADN) method, which combines information density with ADN and puts the information density of all unlabeled reviews into consideration while choosing the unlabeled reviews for further labeling.

The main contributions of this paper include: First, this paper introduces a new deep architecture that integrates the abstraction ability of deep belief networks and the classification ability of the backpropagation strategy. It improves the generalization capability by using the abundant unlabeled reviews, and directly optimizes the classification results on the training dataset via the backpropagation strategy, which makes it possible to achieve attractive classification performance with few labeled reviews. Second, this paper proposes two effective active learning methods that integrate the review selection ability of active learning and the classification ability of the deep architecture. Both the labeled review selector and the classifier are based on the same architecture, which provides a unified framework for the semi-supervised classification task. Third, this paper applies semi-supervised learning and active learning to sentiment classification successfully and gets competitive performance. Our experimental results on five sentiment classification datasets show that both ADN and IADN outperform previous sentiment classification methods and deep learning methods.

This paper is an expanded version of Zhou et al. [22]. Many new contents are incorporated here: First, the related work on sentiment classification has been extended, and a more detailed introduction to sentiment classification methods has been given. Second, an active learning method called IADN is proposed, which combines information density with ADN and achieves competitive performance for sentiment classification. Third, more experiments have been conducted to evaluate the performance of the deep architecture, the information density incorporation, and various loss functions. Moreover, we evaluate the proposed active learning methods with different numbers of labeled and unlabeled reviews.

The rest of the paper is organized as follows. Section 2 gives an overview of sentiment classification. The proposed semi-supervised learning method ADN is described in Section 3. Section 4 combines ADN and information density into the IADN method. Section 5 evaluates ADN and IADN by comparing their classification performance with existing sentiment classification methods and deep learning methods on sentiment datasets. The paper is closed with a conclusion.

2. Sentiment classification

Sentiment classification can be performed on words, sentences or documents, and is generally categorized into lexicon-based [23] and corpus-based classification methods [24]. A detailed survey of techniques and approaches for sentiment classification can be found in the book [25]. In this paper, we focus on corpus-based classification methods.

Corpus-based methods use a labeled corpus to train a sentiment classifier [24]. Pang et al. [1] are the first to apply a machine learning approach to corpus-based sentiment classification. They found that standard machine learning techniques outperform human-produced baselines. They also carried out important experiments on selecting the best features and concluded that unigrams performed better than bigrams or unigrams and bigrams together. Dave et al. [26] draw attention to information retrieval techniques for feature extraction and scoring in the sentiment classification task. Pang and Lee [27] apply text-categorization techniques to the subjective portions of the sentiment documents. These portions are extracted by efficient techniques for finding minimum cuts in graphs. Gamon [5] demonstrates that high accuracy can be achieved by using large feature vectors in combination with feature reduction in the very noisy domain of customer feedback data. Mullen and Collier [28] use support vector machines to bring together diverse sources of potentially pertinent information for sentiment classification, including several favorability measures for phrases and adjectives and, where available, knowledge of the topic of the text. Ng et al. [29] demonstrate that sentiment classification can be performed with high accuracy using only unigrams as features. Mcdonald et al. [30] investigate a structured model for jointly classifying the sentiment of text at various levels of granularity, which is based on standard sequence classification techniques using constrained Viterbi to ensure consistent solutions. Xia et al. [31] introduce the sentiment vector space model to represent song lyric documents and assign sentiment labels such as light-hearted and heavy-hearted. Li et al. [32] propose a machine learning approach to incorporate polarity shifting information into a document-level sentiment classification system. Liu et al. [33] present an adaptive sentiment analysis model that aims to capture the hidden sentiment factors in reviews through the capability of being incrementally updated as more data become available. Wei et al. [34] propose a novel approach to label attributes of a product and their associated sentiments in product reviews by a hierarchical learning process with a defined sentiment ontology tree.

Supervised sentiment classification systems are domain-specific, and annotating a large-scale corpus for each domain is very expensive [3]. There exist several solutions to this issue.

The first solution is cross-domain sentiment classification. Aue and Gamon [35] survey four different approaches to customize a sentiment classification system for a new target domain in the absence of large amounts of labeled data. Blitzer et al. [2] investigate domain adaptation for sentiment classifiers, which reduces the relative error due to adaptation between domains by an average of 46% over a supervised baseline, and identify a measure of domain similarity that correlates well with the potential for adaptation of a classifier from one domain to another. Tan et al. [36] combine old-domain labeled examples with new-domain unlabeled ones, and retrain the base classifier over all
these examples. Li and Zong [37] study multi-domain sentiment classification, which aims to improve performance through fusing training data from multiple domains. Pan et al. [38] propose a cross-domain sentiment classification method that aligns domain-specific words extracted from different domains into unified clusters, with the help of domain-independent words as a bridge. Bollegala et al. [39] automatically create a sentiment-sensitive thesaurus using both labeled and unlabeled data from multiple source domains to find the association between words that express similar sentiments in different domains. He et al. [40] modify the joint sentiment-topic model by incorporating word polarity priors through modifying the topic-word Dirichlet priors, study the polarity-bearing topics extracted by the joint sentiment-topic model, and show that by augmenting the original feature space with polarity-bearing topics they achieve the state-of-the-art performance of 95% on the movie review data and an average of 90% on the multi-domain sentiment dataset.

The second solution is semi-supervised sentiment classification. Goldberg and Zhu [41] present a graph-based semi-supervised learning algorithm to address the sentiment analysis task of rating inference, inferring numerical ratings based on the perceived sentiment. Sindhwani and Melville [42] propose a semi-supervised sentiment classification algorithm that utilizes lexical prior knowledge in conjunction with unlabeled data. Dasgupta and Ng [3] first mine the unambiguous reviews using spectral techniques, and then exploit them to classify the ambiguous reviews via a novel combination of active learning, transductive learning, and ensemble learning. Li et al. [4] adopt two views, personal and impersonal views, and employ them in both supervised and semi-supervised sentiment classification.

The third solution is unsupervised sentiment classification. Zagibalov and Carroll [43] describe an automatic seed word selection method for unsupervised sentiment classification of product reviews in Chinese.

There are also several other methods to solve this issue. Read [44] demonstrates that training data automatically labeled with emoticons has the potential of being independent of domain, topic and time. Wan [24] studies cross-lingual sentiment classification, which leverages an available English corpus for Chinese sentiment classification by using the English corpus as training data. Machine translation services are used for eliminating the language gap between the training set and the test set, and English features and Chinese features are considered as two independent views of the classification problem. Lu et al. [45] present a novel approach for joint bilingual sentiment classification at the sentence level that augments available labeled data in each language with unlabeled parallel data.

However, unsupervised learning of sentiment is difficult, partially because of the prevalence of sentimentally ambiguous reviews [3].

Section 3.1 formulates the problem. Section 3.2 proposes the semi-supervised learning method of ADN. Section 3.3 proposes the active learning method of ADN. Section 3.4 gives the ADN procedure.

3.1. Problem formulation

The dataset is composed of a substantial number of product reviews. We preprocess the reviews to be classified; the experimental setting is the same as in [3]. Each review is represented as a vector of unigrams, using a binary weight equal to 1 for terms present in a vector. Moreover, punctuation, numbers, and words of length one are removed from the vector. Finally, we sort the vocabulary by document frequency and remove the top 1.5%. This is because many of these high document frequency words are stopwords or domain-specific general-purpose words (e.g., "book" in the book domain); these noise words would not be helpful for sentiment classification. Such words typically comprise 1-2% of a vocabulary, and the decision of exactly how many terms to remove is subjective: a large corpus typically requires more removals than a small corpus. To be consistent, we simply remove the top 1.5% high frequency words.

After preprocessing, each review is represented by a vector. Then the dataset is represented as a matrix

X = (x^1, x^2, \ldots, x^{R+T}) =
\begin{bmatrix}
x_1^1 & x_1^2 & \cdots & x_1^{R+T} \\
x_2^1 & x_2^2 & \cdots & x_2^{R+T} \\
\vdots & \vdots & \ddots & \vdots \\
x_D^1 & x_D^2 & \cdots & x_D^{R+T}
\end{bmatrix}    (1)

where R is the number of training reviews, T is the number of test reviews, and D is the number of feature words in the dataset. Each column of X corresponds to a sample x of a review. A sample that has all features is viewed as a vector in R^D, where the jth coordinate corresponds to the jth feature.

The L labeled reviews are chosen randomly from the R training reviews, or chosen actively by active learning, which can be seen as

X_L = X_R(S), \quad S = [s_1, \ldots, s_L], \quad 1 \le s_i \le R    (2)

where S is the index set of selected training reviews to be labeled manually.

Let Y be a set of labels corresponding to the L labeled training reviews, denoted as

Y^L = (y^1, y^2, \ldots, y^L) =
\begin{bmatrix}
y_1^1 & y_1^2 & \cdots & y_1^L \\
y_2^1 & y_2^2 & \cdots & y_2^L \\
\vdots & \vdots & \ddots & \vdots \\
y_C^1 & y_C^2 & \cdots & y_C^L
\end{bmatrix}    (3)
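To make the preprocessing of Section 3.1 concrete, the sketch below builds the binary unigram matrix X under the stated rules. It is an illustrative reimplementation under our own names (e.g., build_feature_matrix); it is not the authors' original code, and the tokenization details are assumptions.

```python
import re
from collections import Counter

import numpy as np

def build_feature_matrix(reviews, drop_top_frac=0.015):
    """Represent each review as a binary unigram vector (Section 3.1).

    Punctuation, numbers, and words of length one are removed, and the top
    `drop_top_frac` of the vocabulary by document frequency is dropped.
    Returns X with shape (D, R+T): one column per review, as in Eq. (1).
    """
    # Tokenize: keep alphabetic unigrams of length > 1.
    docs = [[w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 1]
            for text in reviews]

    # Document frequency of each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    # Remove the top 1.5% most frequent terms (by document frequency).
    terms = sorted(df, key=df.get, reverse=True)
    n_drop = int(len(terms) * drop_top_frac)
    vocab = {t: j for j, t in enumerate(terms[n_drop:])}

    # Binary weights: 1 if the term occurs in the review.
    X = np.zeros((len(vocab), len(docs)), dtype=np.int8)
    for i, doc in enumerate(docs):
        for w in set(doc):
            if w in vocab:
                X[vocab[w], i] = 1
    return X, vocab
```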
The conditional distributions over h^k and h^{k+1} are given as

p(h^k \mid h^{k+1}) = \prod_t p(h_t^k \mid h^{k+1})    (8)

p(h^{k+1} \mid h^k) = \prod_s p(h_s^{k+1} \mid h^k)    (9)
where

f(h^N(X_L), Y^L) = \sum_{i=1}^{L} \sum_{j=1}^{C} T\bigl(h_j^N(x^i)\, y_j^i\bigr)    (19)

and the loss function is defined as

T(r) = \exp(-r)    (20)

In the supervised learning stage, the stochastic activities are replaced by deterministic, real-valued probabilities. The greedy layer-wise unsupervised learning is just used to initialize the parameters of the deep architecture; the parameters of the deep architecture are then updated based on Eq. (15). After initialization, real values are used in all the nodes of the deep architecture. We use gradient descent through the whole deep architecture to retrain the weights for optimal classification.
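As a concrete illustration of the exponential loss in Eqs. (19) and (20), the sketch below computes the fine-tuning objective and its gradient with respect to the network outputs. It is a minimal NumPy example under our own assumptions (function name exp_loss, targets encoded in {-1, +1}); the authors' implementation fine-tunes the whole deep architecture with gradient descent rather than exposing the loss in isolation.

```python
import numpy as np

def exp_loss(outputs, targets):
    """Exponential loss of Eqs. (19)-(20).

    outputs: (L, C) array, h^N(x^i) for the L labeled reviews.
    targets: (L, C) array with entries y^i_j assumed to be in {-1, +1}.
    Returns the scalar loss and its gradient w.r.t. `outputs`, which
    backpropagation would push down through the deep architecture.
    """
    margins = outputs * targets            # h^N_j(x^i) * y^i_j
    losses = np.exp(-margins)              # T(r) = exp(-r)
    loss = losses.sum()                    # sum over i = 1..L and j = 1..C
    grad = -targets * losses               # d loss / d outputs
    return loss, grad

# Minimal usage: two labeled reviews, two classes (positive / negative).
outputs = np.array([[0.8, -0.6], [-0.2, 0.3]])
targets = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(exp_loss(outputs, targets))
```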
As shown by Tong and Koller [16], the balance random method, which randomly samples an equal number of positive and negative instances from the pool, has much better performance than the regular random method. So we incorporate this balance idea into the ADN method. However, it is not possible to choose an equal number of positive and negative instances without labeling the entire pool of instances in advance. So we present a simple way to approximate the balance of positive and negative reviews. For each iteration, we first count the number of positive and negative labeled reviews, respectively. Second, we classify the unlabeled reviews in the pool with the deep architecture trained in the previous iteration. Third, we choose an appropriate number of the positive and negative reviews labeled in the second step and add them into the labeled dataset, so that the numbers of positive and negative reviews in the labeled dataset are kept equal. Fourth, we relabel all these newly added reviews manually to ensure the correctness of all the review labels in the labeled dataset.
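A compact sketch of this balancing heuristic is given below. It is our own illustration of the four steps just described (hypothetical function name balance_queries, and a classifier assumed to expose a predict method returning +1/-1); it is not the authors' implementation, and shortfalls when one class runs out are not handled here.

```python
import numpy as np

def balance_queries(clf, pool_X, pool_idx, labeled_labels, n_new):
    """Pick n_new pool reviews so the labeled set stays class-balanced.

    Step 1: count positive/negative labels already in the labeled set.
    Step 2: classify the pool with the classifier from the previous iteration.
    Step 3: take predicted positives/negatives so the counts become equal.
    Step 4 (outside this function): relabel the chosen reviews manually.
    """
    n_pos = sum(1 for y in labeled_labels if y > 0)
    preds = clf.predict(pool_X[pool_idx])                 # step 2

    # Step 3: choose take_pos so that n_pos + take_pos == n_neg + take_neg.
    take_pos = min(n_new, max(0, (len(labeled_labels) + n_new) // 2 - n_pos))
    take_neg = n_new - take_pos
    pos_pool = [i for i, p in zip(pool_idx, preds) if p > 0]
    neg_pool = [i for i, p in zip(pool_idx, preds) if p <= 0]
    return (pos_pool[:take_pos] + neg_pool[:take_neg])[:n_new]
```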
for i = 1; i ≤ I do
    Step 1. Greedy layer-wise unsupervised learning
    for n = 1; n ≤ N - 1 do
        for q = 1; q ≤ Q do
            for k = 1; k ≤ R + T do
                Calculate the non-linear positive and negative phases:
                    p(h_t^k = 1 \mid h^{k+1}) = sigm(c_t^k + \sum_s w_{st}^k h_s^{k+1})
                    p(h_s^{k+1} = 1 \mid h^k) = sigm(b_s^{k+1} + \sum_t w_{st}^k h_t^k)
                Update the weights and biases:
                    w_{st}^k = w_{st}^k + \eta(\langle h_t^k h_s^{k+1} \rangle_{P_0} - \langle h_t^k h_s^{k+1} \rangle_{P_1})
            end
        end
    end
    Step 2. Supervised learning with gradient descent
        Minimize f(h^N(X), Y) on the labeled dataset X_L, and update the parameter space W according to:
            \arg\min_{h^N} f(h^N(X_L), Y^L)
    Step 3. Choose instances for the labeled dataset
        Choose G instances near the separating line by:
            s = \{ j : d^j = \min d \}
        Add the G instances into the labeled dataset X_L
end
Train ADN with Steps 1 and 2.
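For readers who want to see the core of Step 1 in executable form, the following sketch performs one contrastive-divergence (CD-1) update for the RBM between layers k and k+1, following the update rule shown above. It is a minimal NumPy illustration under our own names (cd1_update, learning rate eta), not the authors' released code.

```python
import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(h_k, W, b_up, c_down, eta=0.1, rng=np.random):
    """One CD-1 step for the RBM between layer k and layer k+1.

    h_k:    (D_k,) activation of layer k for one sample (the data).
    W:      (D_k, D_k1) weights w_{st} between the two layers.
    b_up:   (D_k1,) biases of layer k+1; c_down: (D_k,) biases of layer k.
    Returns the updated parameters.
    """
    # Positive phase: propagate up, p(h^{k+1}_s = 1 | h^k).
    p_up = sigm(b_up + h_k @ W)
    h_k1 = (rng.random_sample(p_up.shape) < p_up).astype(float)

    # Negative phase: reconstruct layer k, p(h^k_t = 1 | h^{k+1}),
    # then propagate the reconstruction up again.
    p_down = sigm(c_down + W @ h_k1)
    p_up_recon = sigm(b_up + p_down @ W)

    # <h^k h^{k+1}>_{P0} - <h^k h^{k+1}>_{P1}, applied to weights and biases.
    W += eta * (np.outer(h_k, p_up) - np.outer(p_down, p_up_recon))
    b_up += eta * (p_up - p_up_recon)
    c_down += eta * (h_k - p_down)
    return W, b_up, c_down
```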
4. Information ADN

In this part, we combine the information density idea [19] with ADN and propose a novel information ADN (IADN) method for semi-supervised sentiment classification.

The proposed ADN method can actively choose the reviews that are near the separating hyperplane as the training data to be labeled manually. However, ADN does not consider the information density of these review candidates. For example, in Fig. 2, the samples A and B are two labeled examples, and the other circles are unlabeled data. Since C is the nearest sample to the decision boundary, it should be chosen by the ADN method. However, C is far from the centers of the two classes, i.e., it is not a representative sample of the distribution. In this case, querying D is likely to contain more information about the dataset. The IADN method is proposed to put this observation into consideration.

When the deep architecture is trained by the L labeled data and all unlabeled data, the parameters are adapted, and h^N(x) is used to represent the sample x. Given an unlabeled pool X_R, the next unlabeled instance to be queried is chosen according to the location of h^N(X_R). The informativeness of h^N(x) is weighted by its average similarity to other samples which are on the same side of the separation line as h^N(x). It is formalized as

ID^i = d^i + \lambda \frac{1}{U-1} \sum_{j=1, j \ne i}^{U} dis\bigl(h^N(x^i), h^N(x^j)\bigr)    (24)

where

X_U = \{ j : x^j \in X_R \wedge (h_1^N(x) - h_2^N(x))(h_1^N(x^j) - h_2^N(x^j)) > 0 \}    (25)

indicates the unlabeled instances that belong to the same class as x based on the classification result of the currently trained classifier,

dis\bigl(h^N(x^i), h^N(x^j)\bigr) = |h_1^N(x^i) - h_1^N(x^j)| + |h_2^N(x^i) - h_2^N(x^j)|    (26)

denotes the distance between h^N(x^i) and h^N(x^j), and d^i denotes the distance between a point h^N(x^i) and the separation line, defined by Eq. (21). \lambda controls the relative importance of the density term. The training reviews that should be labeled manually are given by

s = \{ x^i : ID^i = \min ID \}    (27)

The balance selection procedure in ADN does not consider the cases in which there are not enough positive (or negative) reviews for selection; it selects the reviews randomly in such cases. For the IADN method, we instead select all remaining positive (or negative) reviews and fill the gap with remaining negative (or positive) reviews. The density calculation relies on the labels predicted by the classifier, which is necessary because we do not know the labels of the reviews in the pool, although the predictions might be misleading. The classifier can recognize most of the reviews correctly. Even though some reviews near the separation line are recognized wrongly, this has little effect on the density calculation, because these wrongly recognized reviews are close to both classes of reviews at the same time.

The ADN and IADN architectures have different numbers of hidden units for each hidden layer. The width of the deep architecture used for different datasets is set based on the scale of the dataset and the dimension of the input data. If the scale of the dataset and the dimension of the input data increase, the number of hidden units in each hidden layer increases too, because the larger number of parameters in a large deep architecture needs higher-dimensional training data to train the architecture effectively. Moreover, the number of units in the last hidden layer is larger than in the other hidden layers, because the last hidden layer is linear and the other hidden layers are non-linear; more units are needed in the linear layer to represent a model.
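To make the IADN selection rule of Eqs. (24)-(27) concrete, the sketch below scores a pool of two-dimensional outputs h^N(x) and returns the indices of the G reviews to query. It is an illustrative NumPy rendering under our assumptions (function name iadn_select, and the distance to the separation line of Eq. (21), which is not reproduced in this excerpt, is taken as |h^N_1 - h^N_2|); it is not the authors' code.

```python
import numpy as np

def iadn_select(H, G=20, lam=1.0):
    """Pick G pool reviews to label, following Eqs. (24)-(27).

    H: (U, 2) array of deep representations h^N(x) for the unlabeled pool,
       columns 0 and 1 being the two class outputs h^N_1 and h^N_2.
    lam: weight of the density term (the lambda of Eq. (24)).
    Assumption: d^i (Eq. (21)) is taken as |h^N_1(x^i) - h^N_2(x^i)|.
    """
    margin = H[:, 0] - H[:, 1]
    d = np.abs(margin)                      # distance to the separation line
    scores = np.empty(len(H))
    for i in range(len(H)):
        # Eq. (25): pool samples predicted to be on the same side as x^i.
        same = np.where(margin * margin[i] > 0)[0]
        same = same[same != i]
        if len(same) == 0:
            scores[i] = d[i]
            continue
        # Eq. (26): L1 distance in the output space, averaged as in Eq. (24).
        dis = np.abs(H[same] - H[i]).sum(axis=1)
        scores[i] = d[i] + lam * dis.mean()
    # Eq. (27): query the reviews with the smallest ID scores.
    return np.argsort(scores)[:G]
```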
5. Experiments
We conduct several experiments to compare the performance of ADN and IADN with that of existing methods. We have the following questions in mind while designing and conducting the experiments:
6. How does varying the number of unlabeled reviews affect the performance of ADN and IADN?

These questions are answered in the following subsections: question 1 in Section 5.2, question 2 in Section 5.3, question 3 in Section 5.4, question 4 in Section 5.5, question 5 in Section 5.6, and question 6 in Section 5.7.

5.1. Experimental setup

The performance of the proposed ADN and IADN methods is evaluated on five sentiment classification datasets. The first dataset is MOV [1], which is a widely used movie review dataset. The other four datasets contain reviews of four types of products, namely books (BOO), DVDs (DVD), electronics (ELE), and kitchen appliances (KIT), respectively [2,3]. Each dataset contains 1000 positive and 1000 negative reviews.

For the MOV dataset, the ADN and IADN structures used in this experiment are 100-100-200-2, which means the number of units in the output layer is 2 and the numbers of units in the 3 hidden layers are 100, 100, and 200, respectively. For the other four datasets, the ADN and IADN structures used in this experiment are 50-50-200-2. The number of units in the input layer is the same as the dimension of each dataset. The number of units in the third hidden layer is larger than in the previous two hidden layers, because the units of the third hidden layer are linear, and more units can improve the representation ability of the third hidden layer. As the size of the vocabulary in the MOV dataset is larger than in the other four datasets, the number of units in the first two hidden layers for the MOV dataset is larger than for the other four datasets. The architecture of ADN and IADN is similar to DBN, but with a different loss function introduced for the supervised learning stage. The parameters of the deep architecture are fixed to the default parameter settings of Hinton's DBN package [14]. For greedy layer-wise unsupervised learning, we train the weights of each layer independently with 30 epochs, and the learning rate is set to 0.1. The initial momentum is 0.5, and after five epochs the momentum is set to 0.9. For supervised learning, we run 10 epochs, and three line searches are performed in each epoch.
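The architecture and training settings above can be summarized in a small configuration sketch. The dictionary below restates the reported values (layer widths, epochs, learning rate, momentum schedule) under field names of our own choosing; it is an illustration, not the format actually used by Hinton's DBN package or the authors' scripts.

```python
# Reported ADN/IADN settings from Section 5.1, gathered into one place.
# Field names are ours; only the numeric values come from the paper.
ADN_CONFIG = {
    "hidden_layers": {           # units per hidden layer; the output layer has 2 units
        "MOV": [100, 100, 200],  # structure 100-100-200-2
        "KIT": [50, 50, 200],    # structure 50-50-200-2
        "ELE": [50, 50, 200],
        "BOO": [50, 50, 200],
        "DVD": [50, 50, 200],
    },
    "output_units": 2,
    "pretraining": {             # greedy layer-wise unsupervised learning
        "epochs": 30,
        "learning_rate": 0.1,
        "initial_momentum": 0.5,
        "momentum_after_5_epochs": 0.9,
    },
    "supervised": {              # gradient-descent fine-tuning
        "epochs": 10,
        "line_searches_per_epoch": 3,
    },
}
```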
We compare the classification performance of ADN and IADN with six representative classifiers, i.e., semi-supervised spectral learning (Spectral) [49], transductive SVM (TSVM), active learning (Active) [16], mine the easy classify the hard (MECH) [3], deep belief networks (DBN) [14], and recursive autoencoders (RAE) [23]. Spectral learning, TSVM, and active learning are three baseline methods for sentiment classification. Spectral learning incorporates labeled data into the clustering framework in the form of must-link and cannot-link constraints. TSVM is the semi-supervised learning version of SVM. Active learning is implemented based on SVM: it trains an inductive SVM on one labeled review from each class, then iteratively labels the most uncertain unlabeled reviews and re-trains the SVM until 100 reviews are labeled. MECH is a new semi-supervised method for sentiment classification [3], which first mines the unambiguous reviews using spectral techniques, and then exploits them to classify the ambiguous reviews via a novel combination of active learning, transductive learning, and ensemble learning. The implementation details of the Spectral, TSVM, Active, and MECH methods are introduced by Dasgupta and Ng [3]. DBN is a classical deep learning method proposed recently [14]. The parameters of the DBN in our experiment are fixed to the default parameter settings of Hinton's DBN package [14], and the minimum mean squared error loss function is used in the DBN architecture. RAE learns vector space representations for multi-word phrases based on recursive autoencoders. The implementation details of RAE are introduced by Socher et al. [23].

5.2. ADN performance

To compare the performance of ADN and IADN with previous works, similar to Dasgupta and Ng [3], we randomly divide the 2000 reviews into 10 folds and test all the algorithms using cross-validation. All reviews are used as unlabeled data, where 1000 are used for training and the other 1000 for test. In each fold, 100 reviews are randomly selected as training data and the remaining 100 reviews are used for test. Owing to the randomness involved in the choice of labeled data, all the results of the Spectral, TSVM, and DBN methods are acquired by repeating 10 times for each fold and then averaging the results. For the Active, MECH, ADN, and IADN methods, one positive and one negative review are selected for the initialization of active learning, and 100 labeled reviews are chosen from the training dataset by active learning and used for training the classifier. For the Active, MECH, ADN, and IADN methods, active learning is performed for five iterations. In each iteration, 20 of the most uncertain points are selected and labeled, and then the classifier is retrained on all of the unlabeled reviews and the labeled reviews annotated so far. After five iterations, 100 labeled reviews are used for training. For these active learning methods, the initial two labeled reviews are selected randomly, so we repeat 30 times for each method, and the results are averaged. For the Spectral, TSVM, DBN, and RAE methods, 100 labeled reviews are selected randomly. For the Active, MECH, ADN, and IADN methods, 100 labeled reviews are selected by active learning, and just the first two labeled reviews are selected randomly.

The classification accuracies on test data in cross-validation for five datasets and eight methods are shown in Table 1. The results of the first four methods are those reported by Dasgupta and Ng [3]. The structure and parameters used for DBN are the same as for ADN and IADN in this experiment. The experiment with RAE is done based on the default parameters of the source code provided by Socher et al. [23].

Table 1
Test accuracy with 100 labeled reviews for five datasets and eight methods.

Type      MOV   KIT   ELE   BOO   DVD
Spectral  67.3  63.7  57.7  55.8  56.2
TSVM      68.7  65.5  62.9  58.7  57.3
Active    68.9  68.1  63.3  58.6  58.0
MECH      76.2  74.1  70.6  62.1  62.7
DBN       71.3  72.6  73.6  64.3  66.7
RAE       66.3  69.4  68.2  61.3  63.1
ADN       76.3  77.5  76.8  69.0  71.6
IADN      76.4  78.2  77.9  69.7  72.2

From Table 1, we can see that the performance of DBN is competitive with MECH. Since MECH is a combination of spectral clustering, TSVM and active learning, while DBN is just a classification method based on a deep neural network, this result shows the good learning ability of the deep architecture. ADN is a combination of semi-supervised learning and active learning based on the deep architecture, and the performance of ADN is better than the previous six methods on all five datasets. This can be attributed to the following: First, ADN uses a deep architecture to guide the output vectors of samples belonging to different regions of the new Euclidean space, which can abstract useful information that is not accessible to other learners. Second, ADN uses an exponential loss function to maximize the separability of labeled reviews in the global refinement for better discriminability. Third, ADN fully exploits the embedding information from the large amount of unlabeled reviews to improve the robustness of the classifier. Fourth, ADN chooses the useful training reviews actively, which also improves the classification performance. The performance of IADN is better than the previous seven methods. This is because the semi-supervised learning method used in IADN is the same as in ADN, and the active learning method used in IADN improves the classification performance.
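The active-learning evaluation protocol of Section 5.2 (two random seed labels, then five iterations of 20 queries, retraining on all unlabeled reviews plus the labeled ones) can be sketched as a driver loop. The code below is our own framing with hypothetical helper interfaces (train_fn, query_fn, label_fn, and a classifier with a predict method); it outlines the described protocol rather than reproducing the released experiments.

```python
import numpy as np

def active_learning_protocol(pool_X, test_X, test_y, train_fn, query_fn, label_fn,
                             n_iterations=5, queries_per_iteration=20, seed=0):
    """Sketch of the Section 5.2 evaluation loop (our own framing).

    train_fn(labeled_idx, labels, pool_X) -> classifier with a predict method
    query_fn(classifier, pool_X, labeled_idx) -> pool indices, most uncertain first
    label_fn(idx) -> manual label for review idx (the human annotator)
    """
    rng = np.random.RandomState(seed)
    labeled_idx = list(rng.choice(len(pool_X), size=2, replace=False))  # seed labels
    labels = [label_fn(i) for i in labeled_idx]

    accuracies = []
    for _ in range(n_iterations):
        clf = train_fn(labeled_idx, labels, pool_X)          # semi-supervised training
        new_idx = query_fn(clf, pool_X, labeled_idx)[:queries_per_iteration]
        labeled_idx.extend(new_idx)
        labels.extend(label_fn(i) for i in new_idx)           # manual labeling
        accuracies.append(np.mean(clf.predict(test_X) == test_y))
    return accuracies
```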
Fig. 3. Test accuracy of DBN and ADN with different experiment settings on five datasets.

Fig. 4. Performance curve of ADN and IADN with iterations of active learning (test accuracy (%) over 0-5 iterations).
The test accuracies of deep architectures with different loss functions on the five datasets are shown in Fig. 6. The results show that the exponential loss function reaches the best performance on all sentiment datasets, and the differences against the second best methods are statistically significant (p < 0.05) under the paired t-test for all five datasets. The performance of the squared error loss function is competitive with the hinge loss function. This proves the effectiveness of the exponential loss function used in the ADN and IADN architecture.

5.6. Semi-supervised learning with variance of labeled data

To verify the performance of ADN and IADN with different numbers of labeled data, we conduct another series of experiments on the five datasets and show the results in Fig. 7. The architectures for ADN and IADN used in this experiment are the same as in Section 5.2. For both ADN and IADN methods, we repeat 30 times for each experimental setting, and the results are averaged.

Fig. 7 shows that ADN and IADN can reach a relatively high accuracy by using just 20 labeled reviews for training. For most of the five sentiment datasets, the test accuracies increase slowly as the number of labeled reviews grows. It also shows that the performance of ADN is competitive with IADN. On the MOV and ELE datasets, the performance of ADN is even better than IADN, because there are few abnormal reviews in these two datasets, so the information density restriction does not take effect in these two experiments. On the other three datasets, the performance of IADN is better than ADN.
Fig. 7. Test accuracy of ADN and IADN with different numbers of labeled reviews on five datasets.

Fig. 8. Test accuracy of ADN and IADN with different numbers of unlabeled reviews on five datasets.
just using 800 unlabeled reviews. When the number of unlabeled reviews is reduced from 2000 to 800, the performance of ADN and IADN is not worse. For the DVD dataset, the performance of ADN and IADN using 800 unlabeled reviews is even better than when using 2000 unlabeled reviews. When the number of unlabeled reviews is reduced from 800 to 100, the performance of ADN and IADN gets worse quickly. This proves that ADN and IADN can get competitive performance with just a few labeled reviews and an appropriate number of unlabeled reviews. The inclusion of unlabeled data does improve the performance; however, once there are enough unlabeled data, adding more unlabeled data only adds to the time needed for training, and the performance will not improve significantly. Considering the extra training time required by more unlabeled reviews and the small accuracy improvement for the ADN and IADN methods, we suggest using an appropriate number of unlabeled reviews in real applications. In this experiment, the performance of ADN is competitive with IADN too.
6. Conclusions

This paper proposes a novel semi-supervised learning algorithm, ADN, to address the sentiment classification problem with a small number of labeled reviews. ADN can choose the proper training reviews to be labeled manually, and fully exploits the embedding information from the large amount of unlabeled reviews to improve the robustness of the classifier. We propose a new architecture to guide the output vectors of samples into different regions of the new Euclidean space, and use an exponential loss function to maximize the separability of labeled reviews in the global refinement for better discriminability. Moreover, we also propose the IADN method, which puts the information density of different reviews into consideration when choosing reviews to be labeled manually.

The performance of ADN and IADN is compared with existing semi-supervised learning methods and deep learning techniques. Experimental results show that both ADN and IADN reach better performance than the compared methods. We also conduct experiments to verify the effectiveness of ADN and IADN with different numbers of labeled reviews and unlabeled reviews separately, and demonstrate that ADN and IADN can get competitive classification performance just by using a few labeled reviews and an appropriate number of unlabeled reviews.

Acknowledgments

This work is supported in part by the Scientific Research Fund of Ludong University (LY2013004) and the National Natural Science Foundation of China (Nos. 61173075 and 60973076).
References

[1] B. Pang, L. Lee, S. Vaithyanathan, Thumbs up? Sentiment classification using machine learning techniques, in: Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Stroudsburg, PA, USA, 2002, pp. 79-86.
[2] J. Blitzer, M. Dredze, F. Pereira, Biographies, bollywood, boom-boxes and blenders: domain adaptation for sentiment classification, in: Annual Meeting of the Association of Computational Linguistics, Association for Computational Linguistics, Prague, Czech Republic, 2007, pp. 440-447.
[3] S. Dasgupta, V. Ng, Mine the easy, classify the hard: a semi-supervised approach to automatic sentiment classification, in: Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Association for Computational Linguistics, Stroudsburg, PA, USA, 2009, pp. 701-709.
[4] S. Li, C.-R. Huang, G. Zhou, S.Y.M. Lee, Employing personal/impersonal views in supervised and semi-supervised sentiment classification, in: Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Uppsala, Sweden, 2010, pp. 414-423.
[5] M. Gamon, Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis, in: International Conference on Computational Linguistics, Association for Computational Linguistics, Switzerland, 2004, pp. 841-847.
[6] T. Li, Y. Zhang, V. Sindhwani, A non-negative matrix tri-factorization approach to sentiment classification with lexical prior knowledge, in: Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Association for Computational Linguistics, Singapore, 2009, pp. 244-252.
[7] R. Raina, A. Battle, H. Lee, B. Packer, A.Y. Ng, Self-taught learning: transfer learning from unlabeled data, in: International Conference on Machine Learning, ACM, Corvallis, Oregon, USA, 2007, pp. 759-766.
[8] X. Zhu, Semi-supervised Learning Literature Survey, Technical Report, University of Wisconsin Madison, Madison, WI, USA, 2007.
[9] M. Wang, X.-S. Hua, T. Mei, R. Hong, G. Qi, Y. Song, L.-R. Dai, Semi-supervised kernel density estimation for video annotation, Comput. Vis. Image Understanding 113 (2009) 384-396.
[10] M. Wang, X.-S. Hua, R. Hong, J. Tang, G.-J. Qi, Y. Song, Unified video annotation via multigraph learning, IEEE Trans. Circuits Syst. Video Technol. 19 (2009) 733-746.
[11] Z. Zha, T. Mei, J. Wang, Z. Wang, X. Hua, Graph-based semi-supervised learning with multiple labels, J. Visual Commun. Image Representation 20 (2009) 97-103.
[12] Y. Bengio, Learning Deep Architectures for AI, Technical Report, IRO, Universite de Montreal, 2007.
[13] R. Salakhutdinov, G.E. Hinton, Learning a nonlinear embedding by preserving class neighbourhood structure, J. Mach. Learn. Res. 2 (2007) 412-419.
[14] G.E. Hinton, R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (2006) 504-507.
[15] M. Ranzato, M. Szummer, Semi-supervised learning of compact document representations with deep networks, in: International Conference on Machine Learning, ACM, Helsinki, Finland, 2008, pp. 792-799.
[16] S. Tong, D. Koller, Support vector machine active learning with applications to text classification, J. Mach. Learn. Res. 2 (2002) 45-66.
[17] M. Wang, X.-S. Hua, Active learning in multimedia annotation and retrieval: a survey, ACM Trans. Intell. Syst. Technol. 2 (2011) 1-21.
[18] Z. Zheng-Jun, W. Meng, Z. Yan-Tao, Y. Yi, H. Richang, C. Tat-Seng, Interactive video indexing with statistical active learning, IEEE Trans. Multimedia 14 (2012) 17-27.
[19] B. Settles, M. Craven, An analysis of active learning strategies for sequence labeling tasks, in: Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Stroudsburg, PA, USA, 2008, pp. 1070-1079.
[20] G. Druck, B. Settles, A. McCallum, Active learning by labeling features, in: Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Singapore, 2009, pp. 81-90.
[21] X. Zhu, J. Lafferty, Z. Ghahramani, Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions, in: ICML 2003 Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, AAAI, Washington DC, USA, 2003, pp. 58-65.
[22] S. Zhou, Q. Chen, X. Wang, Active deep networks for semi-supervised sentiment classification, in: International Conference on Computational Linguistics, Coling 2010 Organizing Committee, Beijing, China, 2010, pp. 1515-1523.
[23] R. Socher, J. Pennington, E.H. Huang, A.Y. Ng, C.D. Manning, Semi-supervised recursive autoencoders for predicting sentiment distributions, in: Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Edinburgh, UK, 2011, pp. 151-161.
[24] X. Wan, Co-training for cross-lingual sentiment classification, in: Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Association for Computational Linguistics, Stroudsburg, PA, USA, 2009, pp. 235-243.
[25] B. Pang, L. Lee, Opinion mining and sentiment analysis, Found. Trends Inf. Retr. 2, 2008.
[26] K. Dave, S. Lawrence, D.M. Pennock, Mining the peanut gallery: opinion extraction and semantic classification of product reviews, in: International Conference on World Wide Web, ACM, New York, NY, USA, 2003, pp. 519-528.
[27] B. Pang, L. Lee, A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts, in: 42nd Annual Meeting of the Association of Computational Linguistics, Association for Computational Linguistics, Barcelona, Spain, 2004, pp. 271-278.
[28] T. Mullen, N. Collier, Sentiment analysis using support vector machines with diverse information sources, in: Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Barcelona, Spain, 2004, pp. 412-418.
[29] V. Ng, S. Dasgupta, S.M.N. Arifin, Examining the role of linguistic knowledge sources in the automatic identification and classification of reviews, in: 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Sydney, Australia, 2006, pp. 611-618.
[30] R. Mcdonald, K. Hannan, T. Neylon, M. Wells, J. Reynar, Structured models for fine-to-coarse sentiment analysis, in: Annual Meeting of the Association of Computational Linguistics, Association for Computational Linguistics, Prague, Czech Republic, 2007, pp. 432-439.
[31] Y. Xia, L. Wang, K.-F. Wong, M. Xu, Lyric-based song sentiment classification with sentiment vector space model, in: Annual Meeting of the Association of Computational Linguistics, Association for Computational Linguistics, Columbus, Ohio, 2008, pp. 133-136.
[32] S. Li, S.Y.M. Lee, Y. Chen, C.-R. Huang, G. Zhou, Sentiment classification and polarity shifting, in: International Conference on Computational Linguistics, Association for Computational Linguistics, Stroudsburg, PA, USA, 2010, pp. 635-643.
[33] Y. Liu, X. Yu, X. Huang, A. An, S-PLASA+: adaptive sentiment analysis with application to sales performance prediction, in: International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, New York, NY, USA, 2010, pp. 873-874.
[34] W. Wei, J.A. Gulla, Sentiment learning on product reviews via sentiment ontology tree, in: Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Stroudsburg, PA, USA, 2010, pp. 404-413.
[35] A. Aue, M. Gamon, Customizing sentiment classifiers to new domains: a case study, in: International Conference on Recent Advances in Natural Language Processing, RANLP 2005 Organising Committee, Borovets, Bulgaria, 2005.
[36] S. Tan, G. Wu, H. Tang, X. Cheng, A novel scheme for domain-transfer problem in the context of sentiment analysis, in: Conference on Information and Knowledge Management, ACM, New York, NY, USA, 2007, pp. 979-982.
[37] S. Li, C. Zong, Multi-domain sentiment classification, in: 46th Annual Meeting of the Association of Computational Linguistics, Association for Computational Linguistics, Columbus, Ohio, 2008, pp. 257-260.
[38] S.J. Pan, X. Ni, J.-T. Sun, Q. Yang, Z. Chen, Cross-domain sentiment classification via spectral feature alignment, in: International World Wide Web Conference, ACM, New York, NY, USA, 2010, pp. 751-760.
[39] D. Bollegala, D. Weir, J. Carroll, Using multiple sources to construct a sentiment sensitive thesaurus for cross-domain sentiment classification, in: Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Portland, Oregon, USA, 2011, pp. 132-141.
[40] Y. He, C. Lin, H. Alani, Automatically extracting polarity-bearing topics for cross-domain sentiment classification, in: Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Portland, Oregon, USA, 2011, pp. 123-131.
[41] A.B. Goldberg, X. Zhu, Seeing stars when there aren't many stars: graph-based semi-supervised learning for sentiment categorization, in: Proceedings of TextGraphs: The First Workshop on Graph Based Methods for Natural Language Processing, Association for Computational Linguistics, Stroudsburg, PA, USA, 2006, pp. 45-52.
[42] V. Sindhwani, P. Melville, Document-word co-regularization for semi-supervised sentiment analysis, in: International Conference on Data Mining, IEEE, Pisa, Italy, 2008, pp. 1025-1030.
[43] T. Zagibalov, J. Carroll, Automatic seed word selection for unsupervised sentiment classification of Chinese text, in: International Conference on Computational Linguistics, Association for Computational Linguistics, Stroudsburg, PA, USA, 2008, pp. 1073-1080.
[44] J. Read, Using emoticons to reduce dependency in machine learning techniques for sentiment classification, in: the Association of Computational Linguistics Student Research Workshop, Association for Computational Linguistics, Ann Arbor, Michigan, 2005, pp. 43-48.
[45] B. Lu, C. Tan, C. Cardie, B.K. Tsou, Joint bilingual sentiment classification with unlabeled parallel corpora, in: Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Portland, Oregon, USA, 2011, pp. 320-330.
[46] R. Salakhutdinov, I. Murray, On the quantitative analysis of deep belief networks, in: International Conference on Machine Learning, ACM, Helsinki, Finland, 2008, pp. 872-879.
[47] G.E. Hinton, Training products of experts by minimizing contrastive divergence, Neural Comput. 14 (2002) 1771-1800.
[48] T. Joachims, Transductive inference for text classification using support vector machines, in: International Conference on Machine Learning, Morgan Kaufmann Publishers, Bled, Slovenia, 1999, pp. 200-209.
[49] S. Kamvar, D. Klein, C. Manning, Spectral learning, in: International Joint Conferences on Artificial Intelligence, AAAI Press, Catalonia, Spain, 2003, pp. 561-566.
[50] Y. Liu, S. Zhou, Q. Chen, Discriminative deep belief networks for visual data classification, Pattern Recognition 44 (2011) 2287-2296.

Shusen Zhou received the Ph.D. degree in computer application technology from the Harbin Institute of Technology in 2012. He is currently an assistant professor at Ludong University. His main research interests include machine learning, artificial intelligence, multimedia content analysis and computational linguistics.

Qingcai Chen received the Ph.D. degree in computer science from the Computer Science and Engineering Department, Harbin Institute of Technology. From September 2003 to August 2004, he worked for Intel (China) Ltd. as a senior software engineer. Since September 2004, he has been with the Computer Science and Technology Department of Harbin Institute of Technology Shenzhen Graduate School as an associate professor. His research interests include machine learning, pattern recognition, speech signal processing, and natural language processing.

Xiaolong Wang received the B.E. degree in computer science from the Harbin Institute of Electrical Technology, Harbin, China, in 1982, the M.E. degree in computer architecture from Tianjin University, Tianjin, China, in 1984, and the Ph.D. degree in computer science and engineering from the Harbin Institute of Technology in 1989. He was an Assistant Lecturer in 1984 and an Associate Professor in 1990 with the Harbin Institute of Technology. From 1998 to 2000, he was a Senior Research Fellow with the Department of Computing, Hong Kong Polytechnic University, Kowloon. He is currently a Professor of computer science with the Harbin Institute of Technology Shenzhen Graduate School. His research interests include artificial intelligence, machine learning, computational linguistics, and Chinese information processing.