Building Digital Ink Recognizers using Data Mining: Distinguishing Between Text and Shapes in Hand Drawn Diagrams

Rachel Blagojevic, Beryl Plimmer, John Grundy, Yong Wang
Abstract. The low accuracy rates of text-shape dividers for digital ink diagrams
are hindering their use in real world applications. While recognition of
handwriting is well advanced and there have been many recognition approaches
proposed for hand drawn sketches, there has been less attention on the division
of text and drawing. The choice of features and algorithms is critical to the
success of the recognition, yet heuristics currently form the basis of selection.
We propose the use of data mining techniques to automate the process of
building text-shape recognizers. This systematic approach identifies the
algorithms best suited to the specific problem and generates the trained
recognizer. We have generated dividers using data mining and training with
diagrams from three domains. An evaluation of our new recognizer against two other
recognizers, on realistic diagrams from two further domains, shows it to be more
successful at dividing shapes and text, with 95.2% of strokes correctly classified
compared with 86.9% and 83.3% for the other two.
1 Introduction
Hand drawn pen and paper sketches are commonplace for capturing early phase
designs and diagrams. Pen and paper offers an unconstrained space suitable for quick
construction and allows for ambiguity. With recent advances in hardware such as
Tablet PCs, computer based sketch tools offer a similar pen-based interaction
experience. In addition, these computer based tools can benefit from the ease of
digital storage, transmission and archiving. Recognition of sketches can add even
greater value to these tools. The ability to automatically identify elements in a sketch
allows us to support tasks such as intelligent editing, execution, conversion and
animation of the sketches.
2 Background
Two particular applications of dividers are freehand note-taking and hand drawn
diagrams. The research on sketched diagram recognition includes dividers but has
also addressed recognition of basic shapes and spatial relationships between diagram
components. This project has drawn on the work from both applications of dividers.
The majority of recognizers rely on information provided by various measurements
of the digital ink strokes (digital ink is represented as a vector of x, y points, each
point carrying a time stamp and, where available, a pressure attribute) [1, 2, 10], as
well as specific algorithms to combine and select the appropriate features.
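For illustration, a minimal sketch of such a stroke representation is given below; the class and field names are our own and not taken from any particular toolkit, and the path length method shows the kind of measurement that forms a single feature.

import java.util.ArrayList;
import java.util.List;

// Minimal digital ink representation: a stroke is a vector of sampled points,
// each carrying x, y coordinates, a time stamp and (optionally) pen pressure.
// Feature extractors operate on this structure.
public class Stroke {
    public static class Point {
        public final double x, y;
        public final long time;       // ms since stroke start
        public final double pressure; // 0 if the hardware reports none

        public Point(double x, double y, long time, double pressure) {
            this.x = x; this.y = y; this.time = time; this.pressure = pressure;
        }
    }

    public final List<Point> points = new ArrayList<>();

    // Example of a simple measurement used as a feature: total path length.
    public double pathLength() {
        double length = 0;
        for (int i = 1; i < points.size(); i++) {
            double dx = points.get(i).x - points.get(i - 1).x;
            double dy = points.get(i).y - points.get(i - 1).y;
            length += Math.sqrt(dx * dx + dy * dy);
        }
        return length;
    }
}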
In the area of sketched diagram recognition many systems focus only on shapes [1-
3]. There have been some attempts at incorporating text-shape division in domain
specific recognizers [11, 12] and domain independent diagramming tools [10, 13].
These systems are predominantly rule-based, using stroke features chosen
heuristically to distinguish between text and shapes.
Research in the area of digital ink document analysis for freehand note-taking has
explored text-shape division [14-19]. However, as the content of such documents is
mainly text, these methods carry a bias that makes them unsuitable for sketched
diagrams. In addition, as Bhat and Hammond [7] point out, some of these methods
would have difficulty with text interspersed within a diagram. There has also been
some work separating Japanese characters from shapes in documents [18, 20].
Three reports focus specifically on dividers [7-9]. Bishop et al. [8] use local stroke
features and spatial and temporal context within an HMM to distinguish between text
and shape strokes. They found that using local features and temporal context was
successful. They report classification rates from 86.4% to 97.0% for three classifier
model variations.
In our previous work [9] we developed a domain independent divider for shapes
and text based on statistical analysis of 46 stroke features. A decision tree was built
identifying eight features as significant for distinguishing between shapes and text.
The results on a test set showed an accuracy of 78.6% for text and 57.9% for shapes.
Part of the test set was composed of musical notes which had a significant effect on
this low classification rate. However, when evaluated against the Microsoft and
InkKit dividers, it was able to correctly classify more strokes overall for the test set.
A more recent development in this field is the use of a feature called entropy [7] to
distinguish between shapes and text. First, strokes are grouped into shapes and
words/letters, and then the stroke points are re-sampled for smoothing. The angle at
each point between its adjacent points in the stroke group is calculated, and each
angle is mapped to an alphabet symbol using a dictionary in which each symbol
represents a range of angles. This produces a text string representation of each stroke
group. Using Shannon's entropy formula (as cited by Bhat and Hammond [7]), the
probabilities of the letters in the string are combined to give the entropy of the group.
This entropy value is higher for text than for shapes, as text is more "information
dense". They report that 92.06% of the data for which the divider had training
examples was correctly classified. For data it had not been trained on, accuracy was
96.42%; however, only 71.06% of that data could be classified at all. We have
re-implemented this algorithm for our evaluation. As our evaluation will show, this
divider was trained and tested on limited data under constrained conditions, and it
does not perform at the reported rate of 92.06% on realistic diagrams.
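The sketch below gives our reading of this entropy feature; the size of the angle alphabet, the binning, and the method names are illustrative assumptions rather than the published parameters.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EntropyFeature {

    // Quantize a turning angle (already wrapped to [-pi, pi]) into one of
    // alphabetSize symbolic "letters".
    private static int toSymbol(double angle, int alphabetSize) {
        double normalized = (angle + Math.PI) / (2 * Math.PI);   // 0..1
        return Math.min((int) (normalized * alphabetSize), alphabetSize - 1);
    }

    // points: re-sampled x,y points of all strokes in one group, in drawing order.
    public static double entropy(List<double[]> points, int alphabetSize) {
        Map<Integer, Integer> counts = new HashMap<>();
        int total = 0;
        for (int i = 1; i < points.size() - 1; i++) {
            double[] p = points.get(i - 1), q = points.get(i), r = points.get(i + 1);
            double a1 = Math.atan2(q[1] - p[1], q[0] - p[0]);  // direction p -> q
            double a2 = Math.atan2(r[1] - q[1], r[0] - q[0]);  // direction q -> r
            double turn = Math.atan2(Math.sin(a2 - a1), Math.cos(a2 - a1)); // wrap to [-pi, pi]
            counts.merge(toSymbol(turn, alphabetSize), 1, Integer::sum);
            total++;
        }
        if (total == 0) return 0;
        // Shannon entropy of the resulting symbol string: H = -sum p(x) log2 p(x).
        double h = 0;
        for (int c : counts.values()) {
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;  // tends to be higher for text groups than for shape groups
    }
}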
The choice of features and algorithms is critical to the success of the recognition,
yet heuristics currently form the basis of selection. Given that features provide such
value as input to recognition algorithms, a feature set should be chosen carefully
using statistical or data mining techniques. While others have used some data mining
techniques [8, 15], to the best of our knowledge no one has done a comprehensive
analysis of algorithms. We present below a comprehensive comparative study of
features and algorithms to select the most accurate model. In particular, we look at
the problem of distinguishing between text and shapes as a first step to recognizing
sketched diagrams, a fundamental requirement for preserving a non-modal user
interface similar to pen and paper.
3 Our Approach
In order to use data mining techniques to build classifiers we first compiled a
comprehensive feature library which is used in conjunction with our training set of
diagrams to generate a training dataset. We investigated a wide range of data mining
algorithms before focusing on the seven that produced the most promising results.
These seven algorithms and the training dataset were used to build new dividers.
3.1 Features
Our previous feature set [9] of 46 features has been extended to a more
comprehensive library of 115 stroke features for sketch recognition. It has been
assembled from previous work in sketch recognition and includes some of our own
additions, the Entropy feature [7], and features derived from our previous divider [9].
Our previous divider is used for several features: pre-classification of the current
stroke, pre-classification of nearby strokes (for spatial context), and pre-classification
of successive strokes (for temporal context).
Many researchers have developed features that measure similar attributes. To give
the reader some sense of the types of features, we have categorized the feature
library into ten categories, summarized in Table 1.
This feature library is available with full implementation within DataManager [21]
from www.cs.auckland.ac.nz/research/hci/downloads.
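For illustration, the sketch below computes two representative measurements of this kind directly from a stroke's sampled coordinates; the exact formulations used in the library may differ.

// Two representative stroke features of the kind listed in Table 1, computed
// directly from the sampled x,y coordinates of a single stroke.
public class StrokeFeatures {

    // Bounding box area: text strokes tend to have small, dense boxes.
    public static double boundingBoxArea(double[] xs, double[] ys) {
        double minX = Double.MAX_VALUE, maxX = -Double.MAX_VALUE;
        double minY = Double.MAX_VALUE, maxY = -Double.MAX_VALUE;
        for (int i = 0; i < xs.length; i++) {
            minX = Math.min(minX, xs[i]); maxX = Math.max(maxX, xs[i]);
            minY = Math.min(minY, ys[i]); maxY = Math.max(maxY, ys[i]);
        }
        return (maxX - minX) * (maxY - minY);
    }

    // Total absolute curvature: sum of turning angles along the stroke,
    // typically higher for handwriting than for simple geometric shapes.
    public static double totalAbsoluteCurvature(double[] xs, double[] ys) {
        double total = 0;
        for (int i = 1; i < xs.length - 1; i++) {
            double a1 = Math.atan2(ys[i] - ys[i - 1], xs[i] - xs[i - 1]);
            double a2 = Math.atan2(ys[i + 1] - ys[i], xs[i + 1] - xs[i]);
            double d = Math.atan2(Math.sin(a2 - a1), Math.cos(a2 - a1)); // wrap to [-pi, pi]
            total += Math.abs(d);
        }
        return total;
    }
}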
3.2 Dataset
For the training set we collected and labeled sketched diagrams from 20 participants
using DataManager [21]. Each participant drew three diagrams: a directed graph, an
organization chart and a user interface (e.g. Figure 1). There are a total of 7248
strokes in the training set, with 5616 text strokes and 1632 shape strokes.
Using this collection of diagrams we have generated a dataset of feature vectors for
each stroke using DataManager. DataManager’s dataset generator function is able to
take the diagrams collected and calculate feature vectors based on the implementation
of our feature library.
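A minimal sketch of this step is shown below, assuming a recent Weka 3.x release and its ARFF format as the dataset representation; the attribute names and values are placeholders for the full 115-feature vectors produced by DataManager.

import java.io.File;
import java.util.ArrayList;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.core.converters.ArffSaver;

// Turning per-stroke feature vectors into a Weka dataset (structure only).
public class DatasetBuilder {
    public static void main(String[] args) throws Exception {
        ArrayList<Attribute> attributes = new ArrayList<>();
        attributes.add(new Attribute("pathLength"));      // numeric features...
        attributes.add(new Attribute("totalCurvature"));
        attributes.add(new Attribute("entropy"));
        ArrayList<String> classes = new ArrayList<>();
        classes.add("text");
        classes.add("shape");
        attributes.add(new Attribute("class", classes));  // nominal class label

        Instances data = new Instances("strokes", attributes, 0);
        data.setClassIndex(data.numAttributes() - 1);

        // For each labelled stroke, append one feature vector (dummy values shown).
        double[] values = {120.5, 14.2, 3.1, classes.indexOf("text")};
        data.add(new DenseInstance(1.0, values));

        ArffSaver saver = new ArffSaver();
        saver.setInstances(data);
        saver.setFile(new File("strokes.arff"));
        saver.writeBatch();
    }
}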
3.4 Implementation
In order to run a comparative evaluation of our two new models against other dividers
we integrated our models into DataManager’s Evaluator [6]. We also integrated our
old divider [9] and implemented the Entropy divider [7].
The Entropy divider had to be trained as no thresholds were provided by [7]. We
trained it on the same data as our new dividers using 10-fold cross validation with the
decision stump algorithm from Weka [22] to find an optimal threshold. We chose the
decision stump algorithm as this generates a decision tree with one node, essentially
producing one decision based on the Entropy feature. The 10-fold cross validation
reported that 85.76% of the training data was correctly classified; other algorithms
such as OneR, a rule based method, and a J48 tree (C4.5 decision tree) showed similar
results. Our divider developed from previous work [9] was not re-trained; it was
implemented with the same thresholds as the original decision tree.
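The sketch below illustrates how such a single-threshold model can be trained and evaluated with Weka's DecisionStump; the ARFF file name is a placeholder and the code assumes a recent Weka 3.x release.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class EntropyThresholdTraining {
    public static void main(String[] args) throws Exception {
        // Dataset containing the entropy value and the text/shape class label.
        Instances data = DataSource.read("entropy_training.arff");
        data.setClassIndex(data.numAttributes() - 1);

        DecisionStump stump = new DecisionStump();

        // 10-fold cross validation to estimate the accuracy of the one-node tree.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(stump, data, 10, new Random(1));
        System.out.printf("Correctly classified: %.2f%%%n", eval.pctCorrect());

        // Train on the full set; printing the model shows the chosen threshold.
        stump.buildClassifier(data);
        System.out.println(stump);
    }
}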
4 Evaluation
To test the accuracy of these dividers on data they were not trained on, we used a
new set of diagrams from domains different to those of the training set. The test set
was composed of ER and process diagrams (see Figure 2) collected from 33
participants, who each drew one diagram from each domain. The participants were
asked to construct the diagrams from text descriptions so that the resulting drawings
are realistic and individual.
There are a total of 7062 strokes in our test set which is similar in size to our training
set. There are 4817 text strokes and 2245 shape strokes. Table 3 shows the results for
each divider on the test set of diagrams. LADTree is the most accurate of the four
tested, with 95.2% correctly classified, closely followed by LogitBoost at 95.0%. The
Entropy divider is the least accurate at 83.3%. It is clear that Entropy has a large bias
towards text, as only 50.5% of the shapes in the test set are correctly classified. Our
previous divider is slightly more accurate than Entropy; however, its bias towards
text is not as extreme. In fact, the results show that all dividers classify text much
more accurately than shapes.

Table 3. % Correctly classified for each divider.

Divider        % Correct   % Text   % Shapes
LADTree        95.2        98.3     88.5
LogitBoost     95.0        98.1     88.4
Old Divider    86.9        93.1     73.5
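For reference, per-class accuracies of the kind reported in Table 3 can also be obtained directly with Weka's Evaluation class, as sketched below; our evaluation itself was run through DataManager's Evaluator [6], and the file names and classifier shown here are placeholders.

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.LogitBoost;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TestSetEvaluation {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("training_diagrams.arff");
        Instances test = DataSource.read("test_diagrams.arff");  // ER and process diagrams
        train.setClassIndex(train.numAttributes() - 1);          // class = text/shape label
        test.setClassIndex(test.numAttributes() - 1);

        Classifier divider = new LogitBoost();
        divider.buildClassifier(train);

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(divider, test);
        System.out.println(eval.toSummaryString());       // overall % correct
        System.out.println(eval.toClassDetailsString());  // per-class (text vs. shape) accuracy
    }
}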
5 Discussion
The high accuracy of the results we have obtained by using data mining techniques to
build dividers demonstrates the effectiveness of this approach. We believe that other
recognition problems would also benefit from a similar study of data mining
techniques. However there is still room for improvement in these divider algorithms.
In terms of tuning, for all the algorithms where we varied the number of iterations,
we found that a high number of iterations usually gave significantly better results.
We could tune these further by increasing the number of iterations for some
algorithms; however, we are constrained by time and memory. These constraints
apply only to training: once a classifier is trained, its memory requirements are
minimal and classification of individual strokes is very fast in all cases.
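A tuning loop of the kind sketched below, shown here with LogitBoost and illustrative iteration counts, is one way this trade-off can be explored.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.LogitBoost;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class IterationTuning {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("strokes.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Cross-validated accuracy for an increasing number of boosting iterations.
        for (int iterations : new int[]{10, 50, 100, 200, 500}) {
            LogitBoost booster = new LogitBoost();
            booster.setNumIterations(iterations);
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(booster, data, 10, new Random(1));
            System.out.printf("%d iterations: %.2f%% correct%n",
                              iterations, eval.pctCorrect());
        }
    }
}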
We can also study the common types of failures that occur with recognition, in
particular for shapes as they are the main source of misclassification. Data mining
these misclassified strokes could identify features that may help correctly distinguish
them. Studying error cases may also lead to the identification of new features that
account for these misclassified shapes.
Feature selection strategies may also contribute to recognizer improvement. This
involves using feature selection algorithms to isolate the features that perform well.
When training a classifier, insignificant features can have a negative effect on the
success of classification algorithms [22]; careful feature selection is therefore a very
important step in developing recognition techniques. We were surprised that 100+
features were employed by our top two dividers and speculate that some of them are
redundant or detrimental. Redundant features will only slow execution, whereas our
concern is with features that actively harm accuracy. Further exploration of
feature selection strategies could identify features that should be excluded.
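One possible starting point is sketched below using Weka's attribute selection facilities (correlation-based subset evaluation with a best-first search); this is an illustrative option rather than the procedure used to build the dividers above.

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class FeatureSelectionSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("strokes.arff");
        data.setClassIndex(data.numAttributes() - 1);

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new CfsSubsetEval());  // correlation-based subset evaluation
        selector.setSearch(new BestFirst());         // best-first search over subsets
        selector.SelectAttributes(data);

        // Names of the selected attributes (the class attribute is included).
        for (int index : selector.selectedAttributes()) {
            System.out.println(data.attribute(index).name());
        }
    }
}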
Combining different classifiers into a voting system is also worthy of investigation.
Classifiers' predictions can be weighted according to their performance and combined
to produce one overall classification for an instance [22]. We are yet to investigate
whether the different algorithms have a large number of common failures. If they all
fail on the same cases then voting is not useful. For future work we plan to investigate
the main cause of failures that occur for the original seven algorithms and identify
what proportions are common between them.
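A minimal sketch of such an ensemble, using Weka's Vote meta-classifier with two example members rather than the full set of seven algorithms, is given below.

import weka.classifiers.Classifier;
import weka.classifiers.meta.LogitBoost;
import weka.classifiers.meta.Vote;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class VotingDivider {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("strokes.arff");
        train.setClassIndex(train.numAttributes() - 1);

        Vote vote = new Vote();  // default combination: average of class probabilities
        vote.setClassifiers(new Classifier[]{new LogitBoost(), new J48()});
        vote.buildClassifier(train);

        // The ensemble prediction (text or shape) for the first stroke in the set.
        double predicted = vote.classifyInstance(train.instance(0));
        System.out.println(train.classAttribute().value((int) predicted));
    }
}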
We chose to train and test on diagrams of different domains to create a general
diagram divider. Each diagram domain has its own syntax, semantics and mix of
drawing shapes. Given the difference between the training 10-fold validation values
and the test results (~ 2.3%), it may be worthwhile to data mine and train a divider for
each diagram domain.
6 Conclusion
We have built seven new dividers using data mining techniques to distinguish
between text and shapes in hand drawn diagrams. The two best dividers, LADTree
and LogitBoost, are able to correctly classify 95.2% and 95.0% respectively of a test
set that they received no training on. A comparative evaluation of these dividers
against two others shows that the new dividers clearly outperform the others. The
success of our new dividers demonstrates the effectiveness of using data mining
techniques for sketch recognition development.
7 Acknowledgements
Thanks to Associate Professor Eibe Frank for expert advice on using data mining
techniques. This research is partly funded by Microsoft Research Asia and the Royal
Society of New Zealand Marsden Fund.
8 References
1. Rubine, D.H. Specifying gestures by example. in Proceedings of Siggraph '91.
1991: ACM.
2. Paulson, B. and T. Hammond. PaleoSketch: Accurate Primitive Sketch Recognition
and Beautification. in Intelligent User Interfaces (IUI '08). 2008. New York,
USA: ACM Press.
3. Wobbrock, J.O., A.D. Wilson, and Y. Li, Gestures without libraries, toolkits or
training: a $1 recognizer for user interface prototypes, in User interface software
and technology. 2007, ACM: Newport, Rhode Island, USA.
4. Plimmer, B., Using Shared Displays to Support Group Designs; A Study of the Use
of Informal User Interface Designs when Learning to Program, in Computer
Science. 2004, University of Waikato.
5. Young, M., InkKit: The Back End of the Generic Design Transformation Tool, in
Computer Science. 2005, University of Auckland: Auckland.
6. Schmieder, P., B. Plimmer, and R. Blagojevic. Automatic Evaluation of Sketch
Recognition. in Sketch Based Interfaces and Modelling. 2009. New Orleans, USA.
7. Bhat, A. and T. Hammond. Using Entropy to Distinguish Shape Versus Text in
Hand-Drawn Diagrams. in International Joint Conference on Artificial
Intelligence (IJCAI '09). 2009. Pasadena, California, USA.
8. Bishop, C.M., M. Svensen, and G.E. Hinton, Distinguishing Text from Graphics in
On-Line Handwritten Ink, in Proceedings of the Ninth International Workshop on
Frontiers in Handwriting Recognition. 2004, IEEE Computer Society.
9. Patel, R., B. Plimmer, et al. Ink Features for Diagram Recognition. in 4th
Eurographics Workshop on Sketch-Based Interfaces and Modeling. 2007.
Riverside, California: Eurographics.