Design Patterns Detection Using A DSL Driven Graph Matching Approach
3 authors:
Giuseppe A. Di Lucca
Università degli Studi del Sannio
All content following this page was uploaded by Marta Cimitile on 30 May 2018.
SUMMARY
Knowledge of Design Pattern (DP) instances improves program comprehension and re-engineering
of Object Oriented systems. In particular, it helps to uncover developers' design decisions and trade-offs,
which are often undocumented. This work describes an approach to automatically detect DPs in existing Object
Oriented systems by tracing the system's source code components to the roles they play in the patterns. In
the proposed approach, DPs are modelled on the basis of their high-level structural properties (e.g., inheritance,
dependency, invocation, delegation, type nesting, and membership relationships), which are checked, by source
code parsing, against the system structure and components. Moreover, the approach can also detect pattern
variants, defined by overriding the pattern properties. The paper describes the approach,
briefly presents the supporting tool, and discusses the results of the experiments carried
out to validate it. The approach was validated on seven systems of an open benchmark containing systems
of increasing size. For five additional systems, the results were compared with those of a similar
approach from the literature. The obtained results, the identified DP variants and the effectiveness of the
approach are presented and discussed.
Copyright © 2013 John Wiley & Sons, Ltd.
Received . . .
KEY WORDS: Design Patterns Detection, Object Oriented systems, Graph-Matching, Domain Specific
Languages, Model Driven Development
1. INTRODUCTION
Design Patterns (DPs) were first introduced in [29] as general, repeatable solutions to commonly
occurring problems in software design. Several works [5, 12] show how software quality greatly
improves when DPs are implemented and their adoption is documented. In the last twenty years, as the
number of pattern-based systems and frameworks increased, the (semi-)automatic detection
of pattern instances in Object Oriented (OO) software systems became more critical for improving
program comprehension, maintenance and reuse [44]. When design documentation is not available
(or not up to date), DP detection can help program comprehension by providing useful insights into the software
architecture, the underlying design choices and the role played by each code component in a DP
[15, 23]. This is even more true in the case of bad or incomplete documentation: the lack of
adequate documentation may make it hard to understand which design solutions were adopted
and which code components implement them. Finally, searching a software project for DPs can also
be used to assess the quality of the source code [12, 15].
To address these issues, several methodologies, approaches and tools have been proposed in
the literature in the last twenty years [50]. Most of these approaches take into account a fixed and
Prepared using smrauth.cls [Version: 2010/05/10 v2.00]
2 M.L. BERNARDI - M. CIMITILE - G. DI LUCCA
limited set of properties to specify a pattern. In particular, they do not consider behavioural properties,
which are crucial in the characterization of object DPs. Moreover, most of these approaches are very
sensitive to structural differences in the searched patterns, since the pattern specifications are embedded in
the detection algorithm.
This paper proposes a detection approach that addresses these issues. The detection algorithm
is based on a meta-model representing both the software system and the searched DPs through a
set of high-level properties, wider than that of existing approaches, related to the source code
elements, the static relationships among them, and their behavior. Each system is considered an
instance of this meta-model and is represented as a graph of elements and their properties.
DPs are identified by matching each pattern graph against the overall system graph and
by annotating the elements of the type hierarchy with information on the roles they play in the
pattern. An advantage of the proposed approach over most existing ones is that it also makes it
easy to specify variant forms of the classic DPs (as coded in the literature). This is an important
issue to address, since it is well known that DPs occur in real-world systems in many different
variants [61, 57]. Our detection approach is driven by a set of pattern models written in a Domain
Specific Language (DSL) defined to model the structure of both the software system and the design
patterns. It organizes these DP models as a hierarchy of declarative specifications. In particular, a
DP variant can be expressed as a set of changes to an existing specification, made by adding, removing
or relaxing properties. Hence, it is possible to write a new pattern specification by deriving it from an
existing one (to detect a variant) or from scratch (to detect a new kind of pattern), with no
impact on the mining algorithm.
An Eclipse-based tool, called Design Pattern Finder (DPF), has been developed to provide
automatic support for the approach (in the following, DPF is used to refer concisely to the proposed
approach, not just to the tool).
The approach has been assessed by applying it to seven systems of an open benchmark proposed
in [32] and [9]. For five additional systems, we compared our results with those obtained using the
Design Pattern Detection (DPD) tool proposed in [61]. In this case the results were validated
by experts in order to evaluate precision and recall, taking the true positives of both tools as the
gold standard.
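The evaluation scheme just described can be illustrated with a minimal sketch. It assumes that pattern instances are represented as hashable tuples and that expert validation yields the set of confirmed instances; the function name, the variable names and all sample instances are hypothetical and are not taken from the experiments.

```python
def precision_recall(reported, gold):
    """Return (precision, recall) of a set of reported instances against a gold standard."""
    true_positives = reported & gold
    precision = len(true_positives) / len(reported) if reported else 0.0
    recall = len(true_positives) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical instances: (pattern name, class playing the main role).
dpf_reported = {("Observer", "Figure"), ("Adapter", "Handle"), ("State", "Tool")}
dpd_reported = {("Observer", "Figure"), ("Visitor", "Storable")}

# Instances confirmed by the experts (invented for this example).
expert_confirmed = {("Observer", "Figure"), ("Adapter", "Handle"), ("Visitor", "Storable")}

# Gold standard: union of the validated true positives of both tools.
gold = (dpf_reported & expert_confirmed) | (dpd_reported & expert_confirmed)

dpf_precision, dpf_recall = precision_recall(dpf_reported, gold)
dpd_precision, dpd_recall = precision_recall(dpd_reported, gold)
```

Note that with such a gold standard, recall is relative to what the two tools found together, not to all the instances actually present in a system.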
This paper improves and extends our preliminary investigation reported in [13, 14].
The improvements mainly concern: (i) a wider set of DSL specifications,
allowing the detection of DPs not considered before; (ii) a new version of the DPF prototype
tool, provided with a user interface; (iii) more experiments, with the related discussion of the results
obtained with the DPF tool on a larger set of systems and their comparison with the
results provided by both an open benchmark and the DPD approach.
The paper is structured as
follows. Section 2 discusses relevant related work. Section 3 describes the meta-model and the
DSL defined to represent the structure of the system and of the patterns, together with the detection approach. Section 4
concisely describes the catalog of DP specifications, while the DPF tool is briefly described
in Section 5. Section 6 reports the experimental setup, and the results are discussed in Section 7.
Section 8 contains concluding remarks and briefly discusses future work. The Appendix reports and
describes some DSL specifications of the most relevant DPs and their variants.
2. RELATED WORK
The problem of mining DPs in existing OO systems has been addressed in several works,
and different methods and techniques have been proposed to support it. Reviews of current
techniques and tools for discovering architecture and design patterns in OO systems are provided
in [24] and [55]. In the latter, the authors classify pattern recovery techniques according to the
type of analysis used and the search methodology adopted.
Copyright ⃝
c 2013 John Wiley & Sons, Ltd. J. Softw. Maint. Evol.: Res. Pract. (2013)
Prepared using smrauth.cls DOI: 10.1002/smr
Reference | Detection approach | Tool | Mined code | Recovered DPs | Case studies | Precision
Keller et al. (1999) [41] | Minimum key structure | SPOOL | C++ | Template method, factory method and bridge | 2 industrial systems, ET++ | -
Philippow et al. (2005) [52] | Minimum key structure | - | C++ | GoF | Student projects | 100% for Singleton, Interpreter
Kramer and Prechelt (1996) [43] | Class structure | Pat | C++ | Adapter, bridge, proxy, composite, decorator | NME, LEDA, zApp | 14-50%
Beyer et al. (2003, 2005) [15] | Predicate calculus | Crocopat | Java/C++ | Composite, mediator | 2 Mozilla, JWAM, wxWindows | -
Dong et al. (2009) [25] | Matrix and weights | DP-Miner | Java | Adapter/command, bridge, composite, strategy/state | Java AWT, JEdit, JHotDraw 6.0b1 | 91-100%
Balanyi and Ferenc (2003) [10] | Class structure | Columbus | C++ | Reclassified GoF | 2 Jikes, Leda, StarOffice, StarOffice Writer | < 60%
Heuzeroth et al. (2003) [38] | Predicates on abstract syntax trees | - | Java | Observer, mediator, chain of responsibility, visitor, and decorator | Swing | -
Niere et al. (2002, 2003) [48] | Cliché recognition and graph transformation, with fuzzy logic | FUJABA | Java | GoF | AWT library | -
Antoniol et al. (1998) [7] | Cliché matching with software metrics in the class structure | - | C++ | Adapter, bridge, proxy, composite, decorator (1995) | LEDA, libg++, galib, mec, socket | 30%
Kim and Boldyreff (2000) [42] | Metrics | - | C++ | GoF | 3 systems (no info) | Avg 43%
Olsson and Shi (2006) [56] | Class structure, exploiting inter-class relationships | PINOT | Java | Reclassified GoF | ANT, AWT, JHotDraw, Swing | -
Kaczor et al. (2006) [40] | Bit-vector based on string representation | - | Java | Abstract factory and composite | JHotDraw, QuickUML, Juzzle | -
Smith and Stotts (2003) [57] | Elemental design patterns and rho-calculus | SPQR | Java | Decorator | - | -
Tsantalis et al. (2006a) [61] | Class structure expressed as matrices, exploiting a graph similarity algorithm | - | Java | Composite, adapter/command, decorator, observer, state/strategy, prototype, visitor | JHotDraw, JRefactory, JUnit | 100%
De Lucia et al. (2009) [19] | XPG formalism and LR-based | DPRE | Java | Adapter, bridge, composite, proxy, and decorator | JHotDraw (5.1, 6.0b1), QuickUML, Apache Ant, Swing, and Eclipse JDT | 62-97%
Bergenti (2000) [12] | Class structure | IDEA | UML | Template, proxy, bridge, composite, decorator, adapter | - | -
Arcelli (2011) [8] | Basic elements and metrics | MARPLE | Java | Abstract factory, composite, visitor | JavaReports, Batik | 10-80%
Vokac (2006) [62] | Semi-formal diagrams translated into queries | no name | C++ | GoF | CRM commercial system | -
France et al. (2004) [27] | UML | no name | C++ | Abstract factory, bridge, decorator, singleton, observer, composite, and visitor | - | -
Stencel (2008) [59] | Static analysis techniques and SQL | D3 | Java | GoF | AJP and JHotDraw | -
Gueheneuc (2008) [33] | UML-like multilayered approach | DeMIMA | Java | GoF | 33 industrial systems, 5 open source systems | 34%
Proposed approach | DSL-driven graph matching | DPF | Java | GoF and their variants | 12 open source systems | 95%
Table I. An overview of Design Pattern Detection Approaches
This is mainly due to the relational model, which is not suited to easily expressing and querying complex
graph-based structures. Moreover, they are used for structural and creational DPs and they only
partially support the identification of behavioral DPs. These issues are discussed in [43]. A notable
exception is [54], which proposes a meta-model that takes into account a reasonably complete set of
structural and behavioural properties. With respect to it, our meta-model: (i) includes the possibility
of expressing fields, methods and constraints on them; (ii) takes into account delegation and dependency
properties; and (iii) allows the defined DSL to be used to build higher-level properties starting from the
low-level ones, in order to mine complex structures (requiring no changes to the search engine).
Several tools and frameworks for identifying idioms, macro-patterns, DPs and design defects use
explanation-based constraint programming techniques. For example, in [35], the authors recover
patterns using a multilayered approach that focuses on ensuring an optimal recall rate, but
precision and performance are low.
Metric-based techniques compute program-related metrics (e.g., generalizations, aggregations,
associations, interface hierarchies) from different source code representations and compare their
values with the metrics of source code DPs. These techniques [63, 49, 7, 42] are computationally efficient
because they reduce the search space through filtration [34]. Their limitation is that they have been
evaluated on only a small number of patterns. Moreover, their precision and recall are low.
XPG formalism and parsing techniques use the SVG (Scalable Vector Graphics) format for the
intermediate representation of the source code and represent DPs in a visual language by mapping
the grammar of each pattern onto the graph representation. They give a precise visualization but are
limited to structural DPs. Moreover, to the best of our knowledge, the existing experiments
are limited to a few patterns [18] and do not report any recall rates.
UML structures and matrix techniques [61, 25, 49, 33, 12, 56] make it possible to represent structural and
behavioral information about software systems. They apply different techniques to match DP template
metrics with the matrices generated for the system. In [33] a semi-automatic approach to identifying
micro-architectures in source code is proposed. The approach is based on information organized in
three layers: two layers are used to recover an abstract model of the source code (including binary
class relationships) and a third layer is used to identify DPs in the abstract model.
A DP detection methodology based on similarity scoring between graph vertices is proposed
in [61]. The approach is also able to recognize patterns that are modified from their standard
representation. It exploits the fact that patterns reside in one or more inheritance hierarchies (in
order to reduce the size of the graphs to which the algorithm is applied). These approaches are
computationally efficient and have good precision and recall rates. Their limitation is that they fail to
extract the implementation variants of similar DPs. Furthermore, they are limited to a small number
of patterns.
Finally, there are some well-known techniques that cannot be classified in the above categories
(e.g., fuzzy reasoning, bit-vector compression, the minimum key structure method, predicate and rho
calculus, dynamic analysis using run-time execution traces, formal methods based on semantics,
machine-learning-based approaches and concept analysis) but are a good complement to
the structural methods cited above [48, 38, 40]. For example, in [19], De Lucia et al. present some
case studies of recovering structural design patterns from OO source code, and in [21] they propose a
model checking approach to analyze the behaviour of pattern instances both dynamically and statically.
In [8] a tool for DP detection and software architecture reconstruction is proposed, in which an approach
mixing structural and metric techniques is used to detect pattern instances. More recently, in [4], DP
recovery is obtained using an ontology formalism: DP restrictions are formalized and translated into
rules that are executed on a knowledge base populated with semantic descriptions of library
code.
Some studies have also focused on the formalization of empirical evaluation criteria
[25, 19, 60]. Each applied technique should be evaluated using well-defined criteria, and different
authors have proposed taxonomies and related frameworks to perform such evaluations.
The approach we propose is based on a system meta-model, and a DSL, that can represent
elements down to statements and expressions. This allows reasoning about structural and behavioral
properties that can be used (i) to improve search space reduction and (ii) to distinguish between
patterns that have the same structure but behave (or are used) in different ways [61]. Thus the type
of analysis includes structural and behavioral analysis, with a graph-matching search method.
the DPF approach, MARPLE is less capable of reducing the search space and, according to the literature, it
has been validated on just a few DPs.
The DPRE tool [19] is based on the XPG grammar formalism to express patterns. Once the
grammar is defined, the Visual Programming Environment Generator produces a visual editor and an
XpLR parser from it. XPG grammars allow a set of properties to be represented using the terminals as
building blocks (i.e., Class, Aggregation and Inheritance). Our approach uses a different matching
algorithm, based on a graph model that also takes delegation, object creation, dependency
and containment into account. Moreover, DPRE requires code generation in order to detect new
patterns (or variants), since the grammar must be used to generate the pattern and the related visual
editor. In contrast, our matching algorithm is applied to graphs that are generated by a
run-time translation of the DSL and hence does not require any code generation or
integration step. While this could increase detection times with respect to DPRE, it allows
better precision to be reached, since the set of patterns and their variants can be effectively customized for any
given context. An extension of the DPRE approach has been proposed in [20, 22]. Finally, there are
some additional approaches that are not reported in the table for brevity. In [67, 66] a technique for DP
recognition is provided that complements existing static analysis with a dynamic analysis. In contrast
with our approach (which uses static analysis even to recover behavioral properties),
dynamic analysis is introduced there to transform DP behavioral aspects into finite automata
identifying the relevant method calls. Dynamic analysis is also used in MoDeC [47], to describe
behavioral and creational patterns as collaborations among objects in the form of scenario diagrams,
and in [39], where a test program is executed on the system to produce traces for the behavior
parser. Dynamic analysis should provide better precision in recovering pattern behavior [47]
but requires a complete system execution and the generation of all the pattern-relevant traces.
Moreover, we have selected a wide set of properties that allow pattern behavior to be inferred statically.
The results of our study remain effective when compared with a dynamic-analysis-based approach. Finally, a more
recent tool is DPJF [17], which implements an approach similar to the proposed one. It improves
precision by performing a set of specific behavioural analyses on the source code (e.g., forward to a single
object, field maintenance, state propagation) to filter out false positives. Our approach, while not
based on such specific analyses, allows most of them to be built declaratively using DSL constructs,
improving the overall flexibility of the mining process.
The detection approach defines and exploits a meta-model and a Domain Specific Language (DSL)
to model the structure of both the software system and the DPs to be detected.
This part of the meta-model is also used as the basis for defining the DSL, which represents the relevant
structural and behavioral properties of OO software systems.
Our DSL takes inspiration from existing pattern detection languages, such as
SDF, Crocopat and Grok [37, 15, 31]. In particular, for pattern mining, two requirements that
strongly point towards using a domain-specific language are: (i) the need to express the structure of
the software system (and of the patterns), composing it by recursive rules; and (ii) the need (and difficulty) of
∗ Compound types are treated as separate types, since they must specify the base type of the compound. This class also
includes arrays and generic types.
pattern observer {
  type AS(1) {
    has method A, R;
    has method N;
    has container o of type AO;
  }
  type AO(1) {
    has method U;
  }
  type CO(*) {
    inherits-from AO;
  }
  type CS(*) {
    inherits-from AS;
    has constructor c {
      object-creation o;
    }
    overrides methods [A, R] each {
      delegates to o;
    }
    overrides method N each {
      delegates to o;
      calls U in AO.U;
    }
  }
}
Figure 2. DSL specification of the Observer DP
succinctly representing effective constraints on such structure. Starting from the existing languages,
the DSL was defined to express design pattern specifications with the following goals:
• a specification should be writable by the analyst with reduced effort;
• the DSL should allow constraints on source code structure and behavior to be expressed, in order to model
complex DPs;
• the DSL should support the definition of pattern variants (using inheritance among
specifications) to foster reuse.
As an example of how a pattern is modeled using the DSL, let us consider a classic Observer DP
(supporting only a single kind of event for each notify method) as proposed in the literature [29]. Figure
2 shows the DSL specification for such an Observer DP. Each specification is a sequence of type
blocks: each block specifies the set of properties that must hold for a role in the pattern (including
the constraints on the allowed multiplicity, reported in brackets just after the type name; if no
brackets follow the type name, the default multiplicity is 1).
As shown in Figure 2, the Observer specification requires:
• a single AbstractObserver (AO) and several ConcreteObservers (CO);
• a single AbstractSubject (AS) and several ConcreteSubjects (CS);
• a container of AbstractObservers to be defined in the ConcreteSubject (the field “o”);
• the methods A and R (which play the roles of add and remove) to be defined in the AbstractSubject
and overridden in the ConcreteSubjects;
• a delegation to be defined between the methods A and R of the ConcreteSubject and the add/remove methods
of the container type;
• the notify method (called “N”), which must contain an invocation of the update
method U of the AbstractObserver classifier;
• an object creation (to initialize the container field “o”) in the constructor of the
ConcreteSubject type.
Each specification can be translated into a graph in which elements are nodes and properties
are labelled edges. This graph, as explained in more detail in Section 3.2, is part of the input for a
two-pass graph-matching detection algorithm. Figure 4, on the top right side, shows an excerpt of the
graph representing the Observer as described in the specification of Figure 2. This graph reports the
key elements specified in the DSL together with the relationships among them †. A variant of
this Observer can easily be defined by deriving it from the DSL specification of Figure 2:
structural elements that need to change are overridden. For instance, Figure 7, on the right side, shows
the graph of a common multi-event Observer. This variant redefines the elements “N” and “U” as
sets of methods to take into account different kinds of events and notification handlers.
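The translation of a specification into a labelled graph can be sketched as follows. This is only an illustration under stated assumptions: the dictionary encoding of the Observer specification, the edge labels and the function `spec_to_graph` are hypothetical and do not reproduce the internal representation used by DPF; overrides, delegations and the constructor are left out for brevity.

```python
from collections import namedtuple

Edge = namedtuple("Edge", ["source", "label", "target"])

# Hypothetical in-memory form of the Observer specification: role -> properties.
observer_spec = {
    "AS": {"multiplicity": "1", "methods": ["A", "R", "N"], "containers": [("o", "AO")]},
    "AO": {"multiplicity": "1", "methods": ["U"]},
    "CO": {"multiplicity": "*", "inherits": "AO"},
    "CS": {"multiplicity": "*", "inherits": "AS", "calls": [("N", "AO.U")]},
}


def spec_to_graph(spec):
    """Turn roles and members into nodes, and properties into labelled edges."""
    nodes, edges = set(), set()
    for role, props in spec.items():
        nodes.add(role)
        for method in props.get("methods", []):
            member = f"{role}.{method}"
            nodes.add(member)
            edges.add(Edge(role, "has-method", member))
        for field, element_type in props.get("containers", []):
            member = f"{role}.{field}"
            nodes.add(member)
            edges.add(Edge(role, "has-container", member))
            edges.add(Edge(member, "of-type", element_type))
        if "inherits" in props:
            edges.add(Edge(role, "inherits-from", props["inherits"]))
        for caller, callee in props.get("calls", []):
            caller_node = f"{role}.{caller}"
            nodes.add(caller_node)
            edges.add(Edge(caller_node, "calls", callee))
    return nodes, edges


nodes, edges = spec_to_graph(observer_spec)
```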
In order to describe the detection algorithm, we provide the definitions of the notations and
concepts used in it.
A DPG can be considered as an attributed graph specifying a set of predicates on the attributes
that must hold. It is the building block of the detection process and is used to identify sub-graphs of
interest occurring in the system graph. Formally:
Definition 1
DPG: A design pattern graph is a pair $DPG = (P, AC)$, where $P = (V, E)$ is an attributed graph
whose nodes and edges are $V$ and $E$, respectively. $AC$ is a set of predicates on the
attributes, containing compound expressions made of conditions on the nodes, edges and attributes of
$P$. §
We now introduce the definition of DPG matching, which generalizes sub-graph
isomorphism with the evaluation of the predicates on the attributes.
Definition 2
DPG Matching: A design pattern graph $DPG = (P, AC)$ is matched with a system graph $S$ if there
exists an injective mapping $\phi : V(P) \to V(S)$ such that: (i) $\forall e(u, v) \in E(P)$, $(\phi(u), \phi(v))$ is an
edge in $S$, and (ii) the predicate $AC_\phi(S)$ holds.
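The two conditions of Definition 2 can be checked directly for a given mapping, as in the minimal sketch below. It assumes unlabelled directed edges stored as pairs and a predicate passed as a plain function; the names `is_dpg_match`, `abstract_types` and `subject_is_abstract`, and the sample system, are hypothetical.

```python
def is_dpg_match(pattern_nodes, pattern_edges, system_edges, mapping, predicate):
    """Check the conditions of a DPG match for a mapping from pattern nodes to system nodes."""
    # The mapping must be defined on every pattern node and must be injective.
    if set(mapping) != set(pattern_nodes):
        return False
    if len(set(mapping.values())) != len(mapping):
        return False
    # (i) every pattern edge must be mapped onto an edge of the system graph.
    for u, v in pattern_edges:
        if (mapping[u], mapping[v]) not in system_edges:
            return False
    # (ii) the predicate on the attributes must hold for the bound sub-graph.
    return predicate(mapping)


pattern_nodes = {"AS", "CS"}
pattern_edges = {("CS", "AS")}  # CS inherits from AS
system_edges = {("Rectangle", "Figure"), ("Figure", "Object")}
abstract_types = {"Figure"}  # hypothetical node attribute


def subject_is_abstract(mapping):
    return mapping["AS"] in abstract_types


good = {"AS": "Figure", "CS": "Rectangle"}
bad = {"AS": "Rectangle", "CS": "Figure"}
```

Here `good` satisfies both conditions, while `bad` violates condition (i) because the system graph has no edge from Figure to Rectangle.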
If a DPG is matched to a system graph, the binding between them can be used to access the
matched sub-graph of the system (either its structure or the attributes and properties of its nodes and
edges).
We define a matched DPG to denote the binding between a DPG and the system graph as follows:
† To keep the figure concise, each node/edge is labelled with the initial letter of the corresponding field or method in the
DSL. Moreover, not all properties are represented; for instance, type CS has several overrides and a constructor that are
all omitted.
‡ Note that RTA is used to handle late binding, and hence the computed call graph reports a superset of the real calls that
can be executed at run-time. However, this lowers the precision only in very few cases. The impact on the
detection quality is discussed in the threats to validity section.
§ Compound predicates can be broken down into simple predicates on individual nodes or edges (or sets of them).
 1
 2  List<MatchedGraph> instances = ...;
 3
 4  void start()
 5  begin
 6    forwardNeighborhoodAnalysis(DP)
 7    for i = 1 to k do
 8      Match(i);
 9    end
10  end
11
12  void forwardNeighborhoodAnalysis(DP)
13  begin
14    foreach node u in DP do
15      ϕ(u) = { v in V(S) | AC_u(v) = true }
16      (1) computation ϕ(u)
17      (2) reduce ϕ(u_1) ... ϕ(u_k) using
18          lookahead and properties constraint
19    end
20  end
21
22  void Match(i)
23  begin
24    foreach v in ϕ(u_i) | v is free do
25      if not checkNeighborhoodBindings(u_i, v)
26        then continue;
27      ϕ(u_i) = v;
28      if i < |V(DP)| then Match(i + 1);
29      else
30        if AC_ϕ(S) then
31          instances.add(ϕ());
32    end
33  end
34
35  boolean checkNeighborhoodBindings(u_i, v)
36  begin
37    foreach edge e(u_i, u_j) in E(DP), j < i do
38      if (edge e1(v, ϕ(u_j)) not in E(S))
39         or (not AC_e(e1)) then
40        return false;
41    end
42    return true;
43  end
Figure 3. Outline of the detection algorithm
Definition 3
Matched DPG: Given an injective mapping $\phi$ between a DPG and a graph $S$, a matched graph is
a triple $\langle \phi, DP, S \rangle$ and is denoted by $\phi_{DP}(S)$.
Figure 3 outlines the detection algorithm. The specification, expressed as a design pattern graph
$DP$, is rewritten as a set of predicates on individual nodes ($AC_u$) and edges ($AC_e$). For each
node $u$ in the pattern $DP$, there is a set of candidate matched nodes in $S$ for which the constraints
$AC_u$ hold. These nodes define a (partial) matched design pattern graph, referred to as the candidate
neighborhood of node $u$ and denoted by $\phi(u)$:
Definition 4
Candidate Neighborhood: The candidate neighborhood $\phi(u)$ of node $u$ is the set of nodes in graph
$S$ that satisfy the predicate $AC_u$:
$\phi(u) = \{ v \mid v \in V(S),\ AC_u(v) = \mathrm{true} \}$
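Definition 4 amounts to filtering the system nodes with the node predicate. The sketch below assumes that system nodes carry a dictionary of attributes; the attribute names, the two predicates and the sample nodes are hypothetical and much simpler than the constraints derived from a real specification.

```python
def candidate_neighborhood(system_nodes, node_predicate):
    """Return the system nodes whose attributes satisfy the node predicate AC_u."""
    return {v for v, attrs in system_nodes.items() if node_predicate(attrs)}


# Hypothetical system nodes with a few attributes.
system_nodes = {
    "Figure": {"abstract": True, "methods": {"addListener", "removeListener", "changed"}},
    "Text": {"abstract": False, "methods": {"draw"}},
    "FigureListener": {"abstract": True, "methods": {"figureChanged"}},
}


def ac_abstract_subject(attrs):
    # Illustrative AC_u for role AS: an abstract type with at least three methods (A, R, N).
    return attrs["abstract"] and len(attrs["methods"]) >= 3


def ac_abstract_observer(attrs):
    # Illustrative AC_u for role AO: an abstract type with at least one method (U).
    return attrs["abstract"] and len(attrs["methods"]) >= 1


phi_AS = candidate_neighborhood(system_nodes, ac_abstract_subject)
phi_AO = candidate_neighborhood(system_nodes, ac_abstract_observer)
```

With these sample predicates, Figure is the only candidate for AS, while both Figure and FigureListener remain candidates for AO until edges are taken into account.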
Hence the search space of a DPG matching a pattern DP on the system S is defined by the
candidate neighborhoods of all nodes belonging to the pattern specifications as follows:
Definition 5
Search Space: The search space of a DPG on a system graph $S$ is defined by the candidate
neighborhoods in $S$ of all DPG nodes. It corresponds to the Cartesian product of the candidate
neighborhoods of the DPG nodes: $\phi(u_1) \times \dots \times \phi(u_k)$, where $u_1, \dots, u_k \in DP$ and each $\phi(u_i) \subseteq V(S)$.
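As a small worked example of Definition 5, the size of the search space is the product of the sizes of the candidate neighborhoods, which can be far smaller than the $n^k$ tuples that would have to be considered if every role could bind to every system node. The candidate sets below are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical candidate neighborhoods for three pattern roles.
candidates = {
    "AS": {"Figure"},
    "AO": {"Figure", "FigureListener"},
    "CS": {"Rectangle", "Ellipse", "Text"},
}

roles = sorted(candidates)

# Size of the search space: product of the candidate set sizes.
search_space_size = prod(len(candidates[role]) for role in roles)

# The search space itself: one tuple of system nodes per assignment of the roles.
search_space = list(product(*(sorted(candidates[role]) for role in roles)))

# Without any filtering, each of the k roles could bind to any of the n system nodes.
n = len(set().union(*candidates.values()))
k = len(roles)
unpruned_size = n ** k
```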
The algorithm can be seen as composed of two phases. The first one starts at line 4 by calling
(line 6) the forwardNeighborhoodAnalysis function (lines 12-20), which computes the candidate
neighborhood for each node $u_h$ in the DPG. This function ends after performing a pruning step on
the search space (described in more detail below).
The second phase (lines 22-43) performs a search over the product $\phi(u_1) \times \dots \times \phi(u_k)$, using
a depth-first traversal, to find a sub-graph isomorphism. The Match(i) function iterates over the candidates
of the $i$-th node to find valid bindings for that node. The procedure checkNeighborhoodBindings($u_i$, $v$) examines
whether $u_i$ can be mapped to $v$ by considering their edges and attributes. Line 27 maps the node $u_i$ to
$v$. Lines 28-31 continue the search with the next node or, if $u_i$ is the last node, evaluate the predicate
$AC$ to check the constraints. If it is true, then a valid binding $\phi : V(DP) \to V(S)$ has been found and is
added to the list (line 31). Since the worst-case complexity of the matching algorithm is $O(n^k)$,
where $n = |S|$ and $k = |P|$, a search space reduction technique must be used to make the algorithm
usable on real systems. Our approach uses system and pattern information to reduce the size of the
candidate neighborhoods and exploits a look-ahead requiring, for each node $u_i$ of the DPG, a valid
(partial) binding of the neighborhood sub-graph centered on $u_i$ and extending to a fixed distance $r$ from it.
For each candidate neighborhoods, structural information (e.g., nodes and edges) and predicates
on attributes (types and properties of nodes and edges) are used to prune matches that would not
produce acceptable solutions. This neighborhood knowledge can be exploited to prune unfeasible
sub-graph at an early stage and obtain a reduced set of candidates on which to perform the full
depth first matching (that is resource- and time- expensive). There is a trade-off with respect to
how candidate neighborhood sub-graphs are built. They increase pruning power as the look-ahead
increases, but their construction is of polynomial complexity (with respect to look-ahead). The
current implementation uses a look-ahead equals to 1 (immediate neighborhood). We found no
improvement with a look-ahead equals to 2 since even if for some patterns (those that have a
rich structure or highly constrained) time was greatly reduced, for others the bigger neighborhood
analysis increased the total time (i.e., the average time remained almost the same). Figure 4 reports
a simple running example of the detection process. It shows the candidate neighborhood analysis
and the resulting bindings for the pattern specification of Figure 2, performed on a subset
of the roles (only AS and AO) of the Observer DPG and on a small portion of a system graph
(DP and S, respectively, in Figure 4). For each pair of nodes ui ∈ V(DP) and vj ∈ V(S), the
neighborhood sub-graphs of ui and vj are matched to find candidate neighborhoods. In step 1 the
pair considered is (Text, AS), and hence the corresponding immediate neighborhoods in DP and S are
considered. In this case the match fails since the pattern node neighborhood is not a sub-graph of S.
Conversely, step 2 on the pair (AS, Figure) is a successful match since the AS and Figure neighborhoods
are congruent and all the constraints are satisfied (nodes and edges are of the same types, and
multiplicity constraints are met). Hence several conditioned bindings are established for the matched
candidate neighborhood. Step h considers the pair (Figure, AO). Due to structural differences this
match (correctly) fails, avoiding an infeasible candidate neighborhood (since Figure is not an
AbstractObserver). This is because the structure of the Observer DP has quite good pruning power.
Simpler patterns may generate a higher number of candidate neighborhoods that must be taken
into account in the second phase of the algorithm, in which the full depth-first matching
increases time and space requirements. Hence, to further reduce the search space, the algorithm is
executed on all the DPGs in the specification repository and the previous bindings are taken into
account when performing the subsequent candidate neighborhood analysis.
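The candidate neighborhood analysis with a look-ahead of 1 can be illustrated with a minimal Python sketch. It is not the DPF implementation: the graph encoding (a dictionary of node types plus a set of labelled edges), the function names, and the toy Observer fragment (including the FigureListener node and the edge labels) are assumptions made only for illustration. A pattern node is kept as a candidate for a system node when their types agree and every typed edge required around the pattern node is also available around the system node.

```python
from collections import Counter


def signature(node, node_types, edges):
    """Radius-1 neighborhood as a multiset of (direction, edge label, neighbor type)."""
    sig = Counter()
    for src, label, dst in edges:
        if src == node:
            sig[("out", label, node_types[dst])] += 1
        if dst == node:
            sig[("in", label, node_types[src])] += 1
    return sig


def candidate_pairs(pattern, system):
    """Keep (pattern node, system node) pairs with compatible radius-1 neighborhoods."""
    p_types, p_edges = pattern
    s_types, s_edges = system
    s_sigs = {v: signature(v, s_types, s_edges) for v in s_types}
    kept = []
    for u in p_types:
        required = signature(u, p_types, p_edges)
        for v in s_types:
            if p_types[u] != s_types[v]:
                continue  # node types must agree
            available = s_sigs[v]
            # every edge required around u must also exist around v
            if all(available[key] >= count for key, count in required.items()):
                kept.append((u, v))
    return kept


# Hypothetical toy graphs: two Observer roles and a three-node system fragment.
PATTERN = (
    {"AS": "class", "AO": "interface"},
    {("AS", "aggregates", "AO")},
)
SYSTEM = (
    {"Figure": "class", "Text": "class", "FigureListener": "interface"},
    {("Figure", "aggregates", "FigureListener"), ("Text", "inherits", "Figure")},
)

CANDIDATES = candidate_pairs(PATTERN, SYSTEM)
```

On this toy input only (AS, Figure) and (AO, FigureListener) survive; Text is discarded before any depth-first matching is attempted, which is the purpose of the look-ahead.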
For some pattern models, the same element could, in the general case, be bound to several patterns.
This, however, is not true for all pattern elements. For instance, the binding of the visit() method
of the Visitor pattern should not allow bindings to other patterns (e.g., as the execute() method of a
Command or the notify() method of an Observer). When a pattern model explicitly forbids multiple
bindings for a pattern element, the established bindings of already analyzed patterns are used
as further constraints to improve the search space reduction (pruning infeasible bindings as early as
possible).
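The effect of single-bind roles on the search space can be sketched as follows. This is only an illustration under stated assumptions: the representation of established bindings as a dictionary keyed by (pattern, role), the function name, and the element names are all hypothetical and do not come from the DPF tool.

```python
def prune_bound_elements(system_elements, established, single_bind_roles):
    """Drop elements already bound to a role that forbids further bindings."""
    locked = set()
    for key, elements in established.items():
        if key in single_bind_roles:
            locked.update(elements)
    return [e for e in system_elements if e not in locked]


# Hypothetical bindings found while mining earlier specifications.
ESTABLISHED = {
    ("Visitor", "visit"): {"NodeVisitor.visit"},
    ("Observer", "ConcreteObserver"): {"LogView"},
}
# Only the visit() role is declared as single-bind in this toy setting.
SINGLE_BIND = {("Visitor", "visit")}

REMAINING = prune_bound_elements(
    ["NodeVisitor.visit", "LogView", "Job.execute"], ESTABLISHED, SINGLE_BIND
)
```

Here NodeVisitor.visit is removed from the elements considered by later specifications, while LogView stays available because its role does not require a unique binding.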
Variants are handled in the same way as other specifications, with no special treatment within the
detection process. The only difference concerns how their sub-graphs are built (taking into account
the specification inheritance relationships and using a flattening approach). The resulting variant
graph contains both the properties inherited from its super-specification and the overridden ones.
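A minimal sketch of such a flattening step is given below. It assumes that a specification can be reduced to a dictionary of named properties with an optional parent and that the inheritance chain is acyclic; the property names and the dictionary layout are hypothetical and are not the DSL syntax.

```python
def flatten(spec_name, specs):
    """Resolve a specification to its full property set by walking its parents."""
    chain = []
    current = spec_name
    while current is not None:
        chain.append(specs[current])
        current = specs[current].get("extends")
    properties = {}
    # Apply the root first so that each descendant overrides what it inherits.
    for spec in reversed(chain):
        properties.update(spec["properties"])
    return properties


# Hypothetical catalog fragment: a base Observer and a multi-event variant.
SPECS = {
    "Observer/GoF": {
        "extends": None,
        "properties": {
            "subject.aggregates": "observers",
            "notify.params": "none",
            "update.call": "direct",
        },
    },
    "Observer/MultiEvent": {
        "extends": "Observer/GoF",
        "properties": {
            "notify.params": "one-or-more",
            "update.call": "direct-or-indirect",
        },
    },
}

FLAT = flatten("Observer/MultiEvent", SPECS)
```

The flattened variant keeps the inherited aggregation property and replaces the two overridden ones, so the matcher can treat it like any standalone specification.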
Copyright ⃝
c 2013 John Wiley & Sons, Ltd. J. Softw. Maint. Evol.: Res. Pract. (2013)
Prepared using smrauth.cls DOI: 10.1002/smr
14 M.L. BERNARDI - M. CIMITILE - G. DI LUCCA
The candidate neighborhood analysis is performed for each pattern role and for each system
element ¶ . After the analysis the bindings are merged together and verified as a whole, and hence
the results are not influenced by role ordering. Moreover, after the candidate neighborhood analysis
has been performed, all the candidate bindings are verified to check: (i) whether they cover all
mandatory pattern properties, and (ii) whether all the matched candidate bindings hold. If both
conditions are met, the set of candidate bindings represents a valid pattern instance linking each
found pattern role to the set of system elements implementing it.
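The two final checks can be expressed compactly. The sketch below assumes that candidate bindings are stored per property and that a caller-supplied predicate decides whether a binding holds; both the data layout and the names are illustrative assumptions, not the tool's internal model.

```python
def is_valid_instance(mandatory, bindings, holds):
    """Check (i) coverage of mandatory properties and (ii) that every binding holds."""
    if not set(mandatory) <= set(bindings):
        return False  # a mandatory property has no binding at all
    return all(holds(prop, elements) for prop, elements in bindings.items())


def non_empty(prop, elements):
    """Toy predicate: a binding holds when it links at least one system element."""
    return len(elements) > 0


# Hypothetical property names for a two-role Observer fragment.
MANDATORY = {"AS.aggregates.AO", "AS.notify.calls.update"}
FULL = {
    "AS.aggregates.AO": {("Figure", "FigureListener")},
    "AS.notify.calls.update": {("Figure.changed", "FigureListener.figureChanged")},
}
PARTIAL = {"AS.aggregates.AO": {("Figure", "FigureListener")}}

FULL_OK = is_valid_instance(MANDATORY, FULL, non_empty)
PARTIAL_OK = is_valid_instance(MANDATORY, PARTIAL, non_empty)
```

Because the check runs on the merged set of bindings, the order in which roles were analyzed does not affect its outcome.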
The execution times are influenced by the ordering of the pattern specifications since existing
bindings for single-bind roles are used to prune the list of system elements to consider during
subsequent candidate neighborhood analyses (reducing execution times).
A set of DSL specifications of the most commonly used DPs has been written and stored in the DP
specifications catalog repository.
The currently defined catalog is shown in Figure 5, where each DP is represented as the root
of a hierarchy (i.e., the darker rectangle(s) in each box) while each descendant (i.e., the lighter
rectangles) represents a DP variant. The catalog is composed of 18 patterns detected by 56 variants.
The detection relationship between a variant and the mined pattern is depicted using a dashed arrow.
Inheritance between specifications is depicted using the standard UML generalization.
A description of the DSL code and DPGs for the most relevant DPs specifications and their
variants is provided in the Appendix.
In the remainder of this section, in order to show how DSL statements are used to specify the design
patterns to mine, the Observer multi-event variant is described in more detail.
4.1. Observer
In Section 3 a description of both DSL and DPG for the Observer DP has been provided. Here, we
give the DSL specification and DPG of a variant of this DP, known as multi-event Observer and
often used in real-world software systems.
This variant also takes into account notify methods with one or more parameters and indirect calls
to update methods. The structure of this specification, reported in Figure 6, is quite different from
the one proposed in the literature (shown in Figure 2) and requires the following properties:
¶ More precisely, for each system element that has not already been bound to a pattern role that explicitly
requires unique bindings for it.
[Figure: catalog diagram. Recoverable labels: GoF Observer, Command, Factory Method, Abstract Factory, Template Method, GoF Command, EnumerationSingleton, magic-cookie, external-nested, TwoWay-Adapter, DynamicAdapter, GoF State, uml-statechart.]
they are written. For instance, writing a specification with conflicting rules is legal within our DSL,
but it will result in poor mining performance and will obviously produce no results.
The Design Patterns Finder (DPF) Tool implements all the steps of the identification process. It was
developed as a set of Eclipse plug-ins based upon JDT and the EMF framework. Figure 8
shows the overall architecture of the DPF tool. It is a layered architecture: the bottom layer includes
the Eclipse Foundation Components [1]. Indeed, the tool uses the JDT and the Eclipse Modeling
Platform (Xpand, Xtext and MoDisco) to extract the needed information about the static
structure of the analyzed systems.
The middle layer (DPF Core) includes the main components of the DPs identification process.
Three main sub-systems are included in this layer. The Project Analyzer produces an instance of
the meta-model (i.e., the system graph) of the analyzed system by static analysis. To this end, the
following information is extracted from the analyzed system: type hierarchy, type inner structure
(attributes, their types and scopes, and so on), method and constructor signatures, method calls,
object creations, container support (in order to express containment within types), static member
information, and delegation.
The Pattern Specification Parser and Translator sub-system registers the DSL specifications of
each DP and parses them to produce the corresponding DPGs. Each specification is written using
the defined DSL and is translated into a set of constraints by the Xtext-based DSL translator.
A Pattern Catalog stores the specifications (both DSLs and corresponding DPGs) of the DPs to
be mined (currently the Factory Method, Prototype, Singleton, Adapter, State, Strategy, Composite,
Decorator, Observer, Memento, Template Method, Command, Proxy, Bridge and Visitor DPs are
included in the catalog). Each pattern specification can be standalone or can override a base
specification by changing some of its properties.
The Design Pattern Finder Engine is the heart of the mining process. It executes the detection
algorithm to identify the DP instances by searching the system graph for sub-graphs matching
a defined DPG. Once the system has been parsed and the meta-model instance is built, the user
can select which DPs are to be detected. The execution of the algorithm produces a model of the
system elements annotated with information on the detected patterns (including their internal
members and pattern roles). Each identified pattern instance is traced to the source code elements
implementing it and the tool allows a user to visualize, inspect and analyze such code components.
The results of the identification process are also stored in the central repository.
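The search for sub-graphs matching a DPG can be illustrated with a small depth-first backtracking matcher. This is a generic sketch of labelled sub-graph matching, not the engine's actual algorithm: it assumes an injective role-to-element mapping, takes the per-role candidate lists as given (for instance from a neighborhood analysis), and uses hypothetical node and edge names.

```python
def match_pattern(pattern_nodes, pattern_edges, system_edges, candidates, binding=None):
    """Yield every role -> element assignment that preserves all labelled pattern edges."""
    if binding is None:
        binding = {}
    if len(binding) == len(pattern_nodes):
        yield dict(binding)  # complete assignment: report a copy
        return
    role = next(n for n in pattern_nodes if n not in binding)
    for element in candidates.get(role, []):
        if element in binding.values():
            continue  # assumption: one system element per role
        binding[role] = element
        # Check only the pattern edges whose two endpoints are already bound.
        consistent = all(
            (binding[src], label, binding[dst]) in system_edges
            for src, label, dst in pattern_edges
            if src in binding and dst in binding
        )
        if consistent:
            yield from match_pattern(
                pattern_nodes, pattern_edges, system_edges, candidates, binding
            )
        del binding[role]  # backtrack


# Hypothetical toy input: two Observer roles against a small system fragment.
ROLES = ["AS", "AO"]
PATTERN_EDGES = {("AS", "aggregates", "AO")}
SYSTEM_EDGES = {
    ("Figure", "aggregates", "FigureListener"),
    ("Text", "inherits", "Figure"),
}
ROLE_CANDIDATES = {"AS": ["Figure", "Text"], "AO": ["FigureListener"]}

INSTANCES = list(match_pattern(ROLES, PATTERN_EDGES, SYSTEM_EDGES, ROLE_CANDIDATES))
```

The tentative binding of AS to Text is abandoned as soon as AO is bound, because no aggregation edge links Text to FigureListener; the smaller the candidate lists, the fewer such dead ends are explored.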
The DPF IDE layer allows interactions with users using the tool as shown in Figures 9 and 10.
The user can select which patterns are to be searched and analyzed in a system by means of a tree
viewer or using the Eclipse Visualizer. The Visualizer also shows a summary of patterns found in
the analyzed system at different levels (package or class).
However, a full description of the functionalities provided by the DPF tool and of how a user can
interact with DPF is out of the scope of this paper.
6. CASE STUDY
The effectiveness and efficiency of the proposed approach have been validated by applying it to twelve
OO systems.
A first group contains seven open source Java software systems of increasing sizes from the
publicly available benchmarks proposed in [32] and in [2]. Moreover, we consider a second group of
five open source Java software systems (Log4j, JHotDraw7, Apache Avro, JDT, Voldemort) selected
to perform a direct comparison with a similar design pattern mining approach proposed by Tsantalis
in [61]. All the analyzed systems, along with their main structural characteristics, are listed in
Table II. Systems in both groups were chosen with increasing sizes to evaluate the scalability of the
algorithm and to validate the quality of results on a large code base. The DPs considered in this case
study are the ones reported in Figure 5. The validation was performed using the DPF tool developed
to support the approach. According to [55], design pattern recovery techniques can be evaluated
by computing precision and recall [51], in order to assess their effectiveness and correctness. To
compute recall and precision we assume that a pattern instance can be classified into one of four
categories:
• true-positive (TP : correctly found),
• false-positive (FP : incorrectly found),
• true-negative (TN : correctly missed),
• false-negative (FN : incorrectly missed).
Precision is defined as the ratio of correctly found occurrences to occurrences provided by the tool
and is given by:
Precision = TP / (TP + FP) (1)
Recall is the ratio of correctly found occurrences to all correct occurrences and is given by:
Recall = TP / (TP + FN) (2)
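As a worked instance of Equations (1) and (2), the following sketch recomputes the two measures from the TP, FP and FN counts of one row of Table III (Adapter/spec{GoFObject}: TP = 17, FP = 1, FN = 5). The helper names are illustrative, and returning None when a denominator is zero is a convention assumed here for rows where the tool reported nothing.

```python
def precision(tp, fp):
    """Equation (1): TP / (TP + FP); undefined (None) when nothing was detected."""
    return tp / (tp + fp) if (tp + fp) else None


def recall(tp, fn):
    """Equation (2): TP / (TP + FN); undefined (None) when there is nothing to find."""
    return tp / (tp + fn) if (tp + fn) else None


# Counts taken from the Adapter/spec{GoFObject} row of Table III.
P = round(precision(17, 1), 2)  # 17 / 18
R = round(recall(17, 5), 2)     # 17 / 22
```

The rounded values are 0.94 and 0.77, matching the P and R columns of that row.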
To verify the correctness of the results, in the case of the first group of seven systems, we
considered as Gold Standard (GS) the union of both the benchmarks cited in [32] and in [2]
(assumed to be correct) with the correct results produced by our approach (i.e., instances not in
the benchmarks, mainly due to the detection of DP variants and evaluated by code inspection). ∥ Each DP
instance in the resulting GS was classified as a DP variant according to the defined catalog.
For the remaining five systems, the GS was computed using the correct results produced by both
the DPF and DPD tools. Hence it could lack pattern instances missed by both tools (overestimating
recall), but it allows a direct (and reliable) comparison of precision.
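The reason why a pooled GS can overestimate recall is easy to see on a toy example. The instance identifiers and set sizes below are invented purely for illustration and do not correspond to any analyzed system.

```python
def pooled_gold_standard(correct_a, correct_b):
    """GS built as the union of the correct results of two tools."""
    return correct_a | correct_b


def recall_against(found_correct, gold):
    """Fraction of the gold standard recovered by one tool."""
    return len(found_correct) / len(gold)


# Hypothetical ground truth: five real instances, one missed by both tools.
ACTUAL = {"i1", "i2", "i3", "i4", "i5"}
TOOL_A = {"i1", "i2", "i3"}  # correct results of the first tool
TOOL_B = {"i2", "i4"}        # correct results of the second tool

POOLED = pooled_gold_standard(TOOL_A, TOOL_B)
RECALL_POOLED = recall_against(TOOL_A, POOLED)  # 3 / 4
RECALL_ACTUAL = recall_against(TOOL_A, ACTUAL)  # 3 / 5
```

Instance i5 is absent from the pooled GS, so the recall of the first tool appears as 0.75 instead of 0.6. Precision is unaffected, since it depends only on what each tool reported.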
7. DISCUSSION OF RESULTS
∥ Of course, the different formats of the benchmarks were translated into a unique common format to store the considered
GS.
System→ 01-junit
↓Design Pattern GS D TP FP FN P R
Adapter/spec{InnerClass} 10 10 10 0 0 1 1
Adapter/spec{GoFObject} 22 18 17 1 5 0,94 0,77
Command/spec{ExecutionEngine} 57 59 54 5 3 0,92 0,95
Composite/spec{GoF} 3 3 3 0 0 1 1
Composite/spec{MultipleCompositeRoot} 4 4 3 1 1 0,75 0,75
Composite/spec{MultipleComposite} 4 5 4 1 0 0,8 1
Decorator/spec{GoF} 4 5 4 1 0 0,8 1
Factory Method/spec{SingleCreator} 3 3 3 0 0 1 1
Factory Method/spec{GoF} 8 7 7 0 1 1 0,88
Factory Method/spec{SingleFactory} 11 11 10 1 1 0,91 0,91
Factory Method/spec{Parametrized} 4 4 4 0 0 1 1
Factory Method/spec{DelegationBased} 3 2 2 0 1 1 0,67
Iterator/spec{External} 6 6 6 0 0 1 1
Memento/spec{GoF} 2 2 2 0 0 1 1
Observer/spec{EventsAsParams} 6 6 6 0 0 1 1
Singleton/spec{Enumerative} 2 2 2 0 0 1 1
Singleton/spec{Relaxed} 2 2 2 0 0 1 1
Strategy/spec{GoF} 14 12 12 0 2 1 0,86
Template Method/spec{GoF} 22 24 19 5 3 0,79 0,86
Table III. Results obtained on benchmark 01-junit
System→ 02-Lexi
↓Design Pattern GS D TP FP FN P R
Adapter/spec{InnerClass} 36 35 33 2 3 0,94 0,92
Builder/spec{GoF} 5 5 5 0 0 1 1
Command/spec{GoF} 36 33 32 1 4 0,97 0,89
Command/spec{InnerClasses} 37 35 34 1 3 0,97 0,92
Factory Method/spec{SingleCreator} 4 3 3 0 1 1 0,75
Factory Method/spec{GoF} 4 3 3 0 1 1 0,75
Factory Method/spec{SingleFactory} 2 1 1 0 1 1 0,5
Factory Method/spec{Parametrized} 5 4 4 0 1 1 0,8
Factory Method/spec{RegistryBased} 3 2 2 0 1 1 0,67
Factory Method/spec{DelegationBased} 5 4 4 0 1 1 0,8
Observer/spec{GoF} 5 5 5 0 0 1 1
Observer/spec{EventsAsParams} 6 6 6 0 0 1 1
Singleton/spec{GoF} 3 2 2 0 1 1 0,67
Singleton/spec{Enumerative} 5 4 3 1 2 0,75 0,6
Singleton/spec{Relaxed} 10 9 8 1 2 0,89 0,8
State/spec{GoF} 2 2 2 0 0 1 1
Strategy/spec{GoF} 11 12 11 1 0 0,92 1
Template Method/spec{GoF} 5 4 4 0 1 1 0,8
Table IV. Results obtained on benchmark 02-Lexi
Similar considerations can be made for Factory DPs: the Parametrized variant for both Abstract
Factory and Factory Method made it possible to discover several instances of creator methods (requiring
parameters to select the product) that would otherwise be missed by DPF. Indeed, 18 Factory
Method Parametrized instances were found in addition to the 16 GoF ones. Moreover, inspecting the
System→ 03-JHotDraw
↓Design Pattern GS D TP FP FN P R
Adapter/spec{GoFObject} 32 29 29 0 3 1 0,91
Bridge/spec{GoF} 37 36 35 1 2 0,97 0,95
Builder/spec{GoF} 2 0 0 0 2 - 0
Command/spec{UndoRedo} 15 11 11 0 4 1 0,73
Command/spec{GoF} 65 64 61 3 4 0,95 0,94
Command/spec{UndoRedoEngine} 11 10 10 0 1 1 0,91
Command/spec{ExecutionEngine} 20 20 20 0 0 1 1
Composite/spec{GoF} 16 19 14 5 2 0,74 0,88
Decorator/spec{GoF} 31 29 29 0 2 1 0,94
Factory Method/spec{SingleCreator} 10 9 9 0 1 1 0,9
Factory Method/spec{GoF} 3 2 2 0 1 1 0,67
Factory Method/spec{SingleFactory} 10 9 9 0 1 1 0,9
Factory Method/spec{Parametrized} 15 14 12 2 3 0,86 0,8
Factory Method/spec{RegistryBased} 2 1 1 0 1 1 0,5
Factory Method/spec{DelegationBased} 8 5 5 0 3 1 0,62
Memento/spec{GoF} 2 2 2 0 0 1 1
Observer/spec{EventsAsHierarchy} 4 4 4 0 0 1 1
Observer/spec{GoF} 5 5 5 0 0 1 1
Observer/spec{EventsAsParams} 5 5 5 0 0 1 1
Prototype/spec{GoF} 8 8 8 0 0 1 1
Singleton/spec{GoF} 6 5 5 0 1 1 0,83
Singleton/spec{Enumerative} 3 3 3 0 0 1 1
Singleton/spec{Relaxed} 4 4 4 0 0 1 1
Strategy/spec{GoF} 49 37 36 1 13 0,97 0,73
Template Method/spec{GoF} 67 61 59 2 8 0,97 0,88
Table V. Results obtained on benchmark 03-JHotDraw
results, we noted that in the benchmark, out of 18 GoF instances, 7 were DelegationBased variants
(delegating the creation to other objects), of which 6 instances were correctly mined by DPF.
We found that the number of false negatives was dramatically reduced by adding new variants
inheriting existing specifications and taking into account the structural differences that caused the
tool to miss them. The false negatives (computed as GS − TP in Tables III to IX) were related
to patterns implemented differently from what is assumed in the specification (our catalog is, for the
most part, based on the definitions given in the literature [29] and their known variants).
As an example, Table III shows the variants detected for the JUnit 3.7 system. In this small system
a limited number of patterns is found; however, it is interesting that the five variants of Factory Method
identify several factory methods that were not present in the benchmarks. It is worth observing that
in some cases (not all), variants are not mutually exclusive with respect to the instances. Single
creator factory methods share the 7 GoF instances, and the same happens for the Composite variants,
for which the MultipleComposite or MultipleCompositeRoot and Decorator variants can overlap
on the same instances. DPF found several Command instances using an external execution engine.
These patterns are not actually part of the main code of the framework (and are correctly not included
in the benchmark) but belong to sample tests of the JUnit distribution shipped with the framework. These
tests indeed use the TestRunner as an executor implementing the structure and behaviour required
for the proposed Command-ExecutionEngine variant (and for this reason they were added to the gold
standard).
It is also worth observing that the mining results obtained for specifications focused on the detection
of pattern variants using inner classes are quite good, raising the resulting precision and recall of
the overall pattern family. This is especially true for Command (in the Lexi, QuickUML and Nutch
System→ 04-QuickUML
↓Design Pattern GS D TP FP FN P R
Abstract Factory/spec{GoF} 11 9 9 0 2 1 0,82
Adapter/spec{InnerClass} 19 20 18 2 1 0,9 0,95
Adapter/spec{GoFObject} 48 44 41 3 7 0,93 0,85
Builder/spec{GoF} 12 11 10 1 2 0,91 0,83
Command/spec{GoF} 10 8 7 1 3 0,88 0,7
Command/spec{InnerClasses} 12 12 12 0 0 1 1
Composite/spec{ContainerWithinComposite} 6 4 4 0 2 1 0,67
Composite/spec{GoF} 6 4 4 0 2 1 0,67
Composite/spec{MultipleCompositeRoot} 6 6 6 0 0 1 1
Composite/spec{MultipleComposite} 6 4 4 0 2 1 0,67
Factory Method/spec{Parametrized} 2 2 2 0 0 1 1
Factory Method/spec{RegistryBased} 1 1 1 0 0 1 1
Factory Method/spec{DelegationBased} 2 2 2 0 0 1 1
Observer/spec{GoF} 17 17 16 1 1 0,94 0,94
Prototype/spec{GoF} 10 9 9 0 1 1 0,9
Proxy/spec{Indirection} 4 3 3 0 1 1 0,75
Proxy/spec{GoF} 6 3 3 0 3 1 0,5
Singleton/spec{GoF} 3 3 3 0 0 1 1
Singleton/spec{Enumerative} 2 3 2 1 0 0,67 1
Singleton/spec{Relaxed} 3 3 3 0 0 1 1
Strategy/spec{GoF} 15 18 12 6 3 0,67 0,8
Template Method/spec{GoF} 38 30 29 1 9 0,97 0,76
Table VI. Results obtained on benchmark 04-QuickUML
systems) and Adapter (in JUnit and QuickUML). For instance, in JUnit the overall recall of Adapter
with the InnerClass variant is 0.87 (evaluated considering all the instances of the first two rows of Table
III). Removing the Adapter-InnerClass variant, the recall becomes 0.53 (evaluated considering the
10 Adapter-InnerClass instances as false negatives).
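One possible reading of this computation pools the GS and TP counts of the two Adapter rows of Table III. The sketch below follows that reading only; the pooling rule and the helper name are assumptions, since the aggregation used for the quoted values is not spelled out.

```python
# GS and TP counts copied from the first two rows of Table III.
ROWS = {
    "Adapter/spec{InnerClass}": {"GS": 10, "TP": 10},
    "Adapter/spec{GoFObject}": {"GS": 22, "TP": 17},
}


def pooled_recall(rows, ignored=()):
    """Recall over the whole family; instances of ignored variants count as missed."""
    gold = sum(r["GS"] for r in rows.values())
    found = sum(r["TP"] for name, r in rows.items() if name not in ignored)
    return found / gold


WITH_VARIANT = pooled_recall(ROWS)                                          # 27 / 32
WITHOUT_VARIANT = pooled_recall(ROWS, ignored={"Adapter/spec{InnerClass}"})  # 17 / 32
```

Under this pooled reading the recall without the InnerClass variant is 17/32 ≈ 0.53, as stated above, while the recall with the variant is 27/32 ≈ 0.84, slightly below the quoted 0.87, which may rest on a different aggregation.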
Moreover, the results confirm that patterns mined with a higher number of (non-overlapping)
specifications have better overall results than those with few specifications. This is also
influenced by pattern complexity, since simple patterns (from a structural and behavioural point
of view) require fewer variants than complex ones to be mined efficiently. Two notable examples
are the Singleton and Factory Method pattern families, comprising 5 and 6 variants respectively,
which have the highest overall precision and recall on all the systems.
System→ 05-nutch
↓Design Pattern GS D TP FP FN P R
Abstract Factory/spec{GoF} 1 1 1 0 0 1 1
Adapter/spec{GoFObject} 68 66 64 2 4 0,97 0,94
Bridge/spec{GoF} 25 23 22 1 3 0,96 0,88
Command/spec{GoF} 50 47 46 1 4 0,98 0,92
Command/spec{InnerClasses} 9 7 7 0 2 1 0,78
Command/spec{ExecutionEngine} 12 7 6 1 6 0,86 0,5
Decorator/spec{GoF} 27 26 25 1 2 0,96 0,93
Factory Method/spec{SingleCreator} 1 1 1 0 0 1 1
Factory Method/spec{GoF} 11 7 7 0 4 1 0,64
Factory Method/spec{SingleFactory} 3 3 3 0 0 1 1
Factory Method/spec{Parametrized} 26 30 26 4 0 0,87 1
Factory Method/spec{DelegationBased} 8 5 5 0 3 1 0,62
Iterator/spec{GoF} 4 3 3 0 1 1 0,75
Memento/spec{GoF} 15 14 13 1 2 0,93 0,87
Prototype/spec{GoF} 2 2 2 0 0 1 1
Prototype/spec{GoFManager} 2 2 2 0 0 1 1
Proxy/spec{Indirection} 7 7 7 0 0 1 1
Proxy/spec{GoF} 43 39 38 1 5 0,97 0,88
Singleton/spec{GoF} 11 7 7 0 4 1 0,64
Singleton/spec{Enumerative} 2 2 2 0 0 1 1
Singleton/spec{Pool} 4 4 4 0 0 1 1
Singleton/spec{Relaxed} 46 43 42 1 4 0,98 0,91
Strategy/spec{GoF} 39 36 36 0 3 1 0,92
Template Method/spec{GoF} 38 31 31 0 7 1 0,82
Table VII. Results obtained on benchmark 05-nutch
TP (true positives), FP (false positives) and the values of Precision (P) and Recall (R)
computed on the results provided by the tool and validated by an expert.
The first consideration about the results concerns the presence of false positives and false
negatives. The percentage of false positives is less than 0.4% for DPF and less than 2% for DPD,
which is quite acceptable for both tools.
However, for some patterns, and for both approaches, the number of false positives is
considerably higher than for other patterns.
This happens for DPF on JHotDraw 7 with the Template Method pattern: of 110 detected instances,
10 were not template methods. Inspecting those cases revealed a problem in the structure
of the specification that, although correct, was too relaxed. This caused some internal helper methods to
be considered as template methods.
For the Observer pattern (in JHotDraw 7) the results were similar: the approach detected
105 Observer instances (one for each concrete participant), but 9 of them were not Observers. The
Observer design pattern is also interesting with respect to the detection of pattern
variants. DPF detected 96 true Observer instances on JHotDraw 7 (with 9 false positives) and
48 instances on Apache Avro 1.6 (with no false positives). Our pattern specification repository
comprised 2 variants of the Observer pattern. The first one exploits standard Java
types (Observable class and Listener interface). The second one is the multi-event observer reported
in Figure 7. Inspecting the matched instances we found that, for both JHotDraw 7 and Apache
Avro 1.6, the 96 and 48 instances respectively were all variants of the second type. This
also explains why the DPD tool, which is based on the classic variant, was not able to find observers in
these two systems. The situation is inverted for the JDT and Log4J systems: in both cases a
single-event observer is found by DPD but not by DPF. Inspecting the source code we found that the
System→ 06-PMD
↓Design Pattern GS D TP FP FN P R
Adapter/spec{GoFObject} 10 10 10 0 0 1 1
Builder/spec{GoF} 12 12 11 1 1 0,92 0,92
Command/spec{GoF} 10 6 6 0 4 1 0,6
Composite/spec{GoF} 8 8 7 1 1 0,88 0,88
Decorator/spec{GoF} 5 5 5 0 0 1 1
Factory Method/spec{SingleCreator} 3 2 2 0 1 1 0,67
Factory Method/spec{GoF} 22 18 17 1 5 0,94 0,77
Factory Method/spec{SingleFactory} 15 12 12 0 3 1 0,8
Factory Method/spec{Parametrized} 7 6 5 1 2 0,83 0,71
Factory Method/spec{RegistryBased} 2 3 2 1 0 0,67 1
Factory Method/spec{DelegationBased} 9 7 7 0 2 1 0,78
Iterator/spec{GoF} 4 5 4 1 0 0,8 1
Observer/spec{GoF} 3 3 3 0 0 1 1
Proxy/spec{GoF} 3 4 3 1 0 0,75 1
Singleton/spec{GoF} 9 9 9 0 0 1 1
Singleton/spec{Enumerative} 1 1 1 0 0 1 1
Singleton/spec{Relaxed} 8 8 7 1 1 0,88 0,88
Strategy/spec{GoF} 29 28 28 0 1 1 0,97
Template Method/spec{GoF} 12 12 12 0 0 1 1
Visitor/spec{GoF} 92 80 80 0 12 1 0,87
Table VIII. Results obtained on benchmark 06-PMD
DSL specifications missed observers whose notify method takes one or more arguments or is
called indirectly (before the actual call to the update(...) methods on the listener).
Another interesting case concerns the Adapter and Command patterns, since they have very
similar static structures. The DPF tool is able, if needed, to distinguish between them by adding
behavioural constraints (clarifying how the pattern is used by its context), while DPD (like many other
structural approaches) is not. For JHotDraw and Avro our specification was not able
to distinguish between them, while in the JDT and Log4J cases we added further constraints leading to
a better identification. In particular, for the Command pattern, in both cases, DPF obtained better
results in terms of both precision and recall.
As for precision and recall, we can observe that while precision is quite high for both
tools, recall for DPF is generally higher than for DPD (as shown in Figure 11∗∗ ). These results are
confirmed by the details, which show that both tools are good at keeping the number of FP low, but for
DPF the number of TP is, on average, higher than for DPD.
Finally, DP detection was performed on the Voldemort system. The obtained results are
in Table XII. The Voldemort system was selected since it contains a large portion of automatically
generated code, making it possible to study the impact of code generation on pattern mining. As shown in
Table XII, the values of precision and recall for DPF are quite high (no lower than 0.64
and 0.82, respectively). For DPD, while the average precision (0.91) is similar to that of DPF, the recall
values are not satisfactory when compared with those of DPF. A manual code inspection revealed
that the many missed Prototype, Adapter and Singleton instances are mostly located in
the classes generated from the specifications of the ProtocolBuffer code generator††. The specifications we
used in the catalog helped to obtain better results since we added variants capable of detecting inner
∗∗ The average is performed taking into account all the meaningful precision/recall scores obtained for all searched
patterns on the considered systems.
†† The Google protocol buffer code generator hosted at http://code.google.com/p/protobuf/.
System→ 07-JRefactory
↓Design Pattern GS D TP FP FN P R
Abstract Factory/spec{GoF} 2 2 2 0 0 1 1
Abstract Factory/spec{Parametrized} 1 1 1 0 0 1 1
Adapter/spec{GoFObject} 49 39 38 1 11 0,97 0,78
Adapter/spec{Dynamic} 38 33 29 4 9 0,88 0,76
Builder/spec{GoF} 4 4 4 0 0 1 1
Command/spec{GoF} 84 87 82 5 2 0,94 0,98
Command/spec{ExecutionEngine} 10 6 6 0 4 1 0,6
Factory Method/spec{SingleCreator} 9 7 7 0 2 1 0,78
Factory Method/spec{GoF} 18 16 15 1 3 0,94 0,83
Factory Method/spec{SingleFactory} 14 12 12 0 2 1 0,86
Factory Method/spec{Parametrized} 22 18 18 0 4 1 0,82
Factory Method/spec{RegistryBased} 11 11 11 0 0 1 1
Factory Method/spec{DelegationBased} 7 6 4 2 3 0,67 0,57
Memento/spec{GoF} 11 10 10 0 1 1 0,91
Observer/spec{EventsAsParams} 3 3 3 0 0 1 1
Proxy/spec{Indirection} 31 33 31 2 0 0,94 1
Proxy/spec{GoF} 31 31 30 1 1 0,97 0,97
Singleton/spec{GoF} 12 12 12 0 0 1 1
Singleton/spec{Enumerative} 24 24 24 0 0 1 1
Singleton/spec{Relaxed} 45 43 43 0 2 1 0,96
State/spec{GoF} 8 8 7 1 1 0,88 0,88
Strategy/spec{GoF} 50 49 49 0 1 1 0,98
Template Method/spec{GoF} 131 155 131 24 0 0,85 1
Visitor/spec{GoF} 136 132 131 1 5 0,99 0,96
Table IX. Results obtained on benchmark 07-JRefactory
Table X. Precision and Recall for JHotDraw 7 and Apache Avro 1.6
classes and generic types that are consistently and widely used by the large automatically generated
classes.
An overview of the average precision and recall obtained for each system by both the DPD and
DPF tools is shown in Figure 11. The figure shows that the precision of DPF is always greater than
that of DPD, with the exception of the Log4j system. Finally, the recall values obtained for DPF
are considerably better than the corresponding values for DPD.
System→ Voldemort
Tool→ DPF DPD
↓Design Pattern GS D TP FP P R D TP FP P R
Observer 9 9 9 0 1 1 0 0 0 - -
Singleton 148 150 148 2 0.99 1 91 70 21 0.77 0.47
Factory 89 84 83 1 0.99 0.93 9 7 2 0.78 0.08
Template 22 20 18 2 0.90 0.82 10 10 0 1 0.45
Adapter 245 245 245 0 1 1 64 61 3 0.95 0.25
Command 1 1 1 0 1 1 0 0 0 - -
Decorator 14 16 14 2 0.88 1 13 13 0 1 0.93
Prototype 139 139 139 0 1 1 119 119 0 1 0.86
State-Strategy 112 170 109 61 0.64 0.97 103 96 7 0.93 0.86
Composite 3 3 3 0 1 1 0 0 0 - -
Figure 11. DPF and DPD Precision (left) and Recall (right) for the analyzed systems
shows the total and average detection times of the DPD and DPF tools for each system. The pattern
matching step is the most CPU-intensive. We cannot show detection times for each pattern
since our approach uses the successful identifications across pattern specifications as constraints
to improve performance, and hence the detection times depend on them. However, we
calculated the average time to detect a single pattern and it turned out to be comparable to that of other
structural approaches. The total times in Table XIII show that DPF exhibits better scalability
than the DPD tool. Experimentation performed for tuning the pattern specifications showed that
performance can be considerably improved by identifying structural and behavioral constraints that
are effective at identifying a well-defined variant of a pattern. Hence the approach is more effective
when specifications are structured in a hierarchy and each specification is dedicated to a specific
pattern variant.
Figure 12. DPF and DPD total (left) and average per pattern (right) detection times
Another interesting point is that the algorithm is faster when the searched DPs are actually found
in the analyzed system. Conversely, the worst performance occurs when no DP instances are found
in the system. This is because all matches need to be executed anyway and no existing bindings are
available to reduce the list of remaining matches to be evaluated.
be higher than the actual one since there could exist pattern instances missed by both tools used in
the study to compute the gold standard). This, in future work, can be improved by performing a
full analysis of the searched source code base (or using available benchmarks). Threats to internal
validity concern factors that can influence our observations. In this case the identification of pattern
instances was based on the expert examination of internal/external documentation and source code
and hence could pose a threat to internal validity, affecting the number of false negatives. Threats
to external validity concern the generalization of our findings. Of course, replication on further
systems to confirm or contradict the obtained results is always desirable. Moreover, we cannot
claim that our approach produces the same results on different (and larger) systems. Rather, we
provide quantitative information on the quality of the search for several real world systems and can
affirm that precision and recall remained consistent and independent of the system
size. On the performance side, the overall detection performance depends heavily on
the quality of pattern specifications. When specifications are badly written (that is, with few and
overlapping constraints) the performance of the algorithm degrades rapidly.
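The gold standard threat mentioned above can be made concrete with the standard definitions of precision and recall. The sketch below is only an illustration: the instance identifiers are made up, and the assumption is that one true instance (`i6`) is missing from the gold standard and is also missed by the detector.

```python
def precision_recall(detected, gold):
    """Return (precision, recall) of a detected set against a reference set."""
    true_positives = len(detected & gold)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical instance identifiers, for illustration only.
detected = {"i1", "i2", "i3", "i4"}
gold_standard = {"i1", "i2", "i3", "i5"}   # compiled from tools, misses i6
actual_instances = gold_standard | {"i6"}  # what a full manual analysis would find

measured = precision_recall(detected, gold_standard)
actual = precision_recall(detected, actual_instances)
```

Here the measured recall is 3/4 while the recall against the complete set is 3/5, so a gold standard that omits true instances can only overestimate recall for instances that the evaluated tool also misses.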
DP identification in a software system, together with knowledge of the components involved
in each DP instance, greatly helps system comprehension. The approach presented in this paper
provides efficient support for the detection of DP instances in an OO system. It exploits a
meta-model and a derived DSL defined to represent both the patterns and the system under study. The
detection process is carried out by matching the model of each design pattern against the system
model. The DSL makes it easy to specify the model of a DP structure through the Properties
characterizing it. Moreover, the DSL makes it easy to specify the variants of a DP by overriding an
already defined DP specification. The DPF tool has been developed to provide automatic support
to the approach. The performed experiments confirmed the ease of use of the DSL for specifying a DP
model and its variants, as well as the correctness of the specifications. The experiments showed the
high accuracy of the approach in detecting the DP instances in a system and in distinguishing
among the DP variants.
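The idea of deriving a variant by overriding an existing specification can be sketched in a few lines of Python. This is an illustrative model, not the DPF implementation: specifications are represented as plain dictionaries, the helper `extend` is hypothetical, and it is assumed that an override replaces the properties it names while all other inherited properties remain in force.

```python
def extend(parent, overrides):
    """Return a variant specification: the parent's properties plus overrides."""
    # Copy each role's properties so the parent specification is left untouched.
    variant = {role: dict(props) for role, props in parent.items()}
    for role, props in overrides.items():
        variant.setdefault(role, {}).update(props)
    return variant


# Property strings paraphrase the Adapter specifications of the appendix.
adapter_object = {
    "ADAPTER": {
        "field f": "of-type ADAPTEE",
        "methods-set adapterSet": "each delegates to f in adapteeSet",
    },
    "ADAPTEE": {"methods-set adapteeSet": "declared"},
}

adapter_class = extend(adapter_object, {
    "ADAPTER": {
        "inherits-from": "ADAPTEE",
        "methods-set adapterSet": "each delegates to methods-set adapteeSet in ADAPTEE",
    },
})
```

Under these assumptions the variant only states what differs from its parent, which is what keeps a hierarchy of specifications compact.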
In particular, the approach has been assessed by applying the DPF tool first to seven systems
of an open benchmark proposed in [32] and [2], and then to five additional systems to compare
the results from DPF with those obtained from the DPD approach. It is worth highlighting
the effectiveness of DPF in detecting DP variants that were not identified by the benchmark or the
DPD tool. The average values of precision and recall, evaluated using the GS composed starting
from the considered benchmarks, are good, independently of system size. For the latter group of
five systems, the results show that DPF performs better and is more efficient than DPD. As
future work, the DPF approach will be further improved and an in-depth comparison with other
available approaches and tools will be performed. Future work also involves improving the
meta-model to consider a wider set of properties, so that more complex design and
architectural patterns can be modeled, and experimenting with anti-pattern mining.
In the following, some DSL specifications and DPGs of the most relevant DPs and their variants
are provided. For the sake of brevity, the DSL specifications and DPGs of the remaining DPs
stored in the DP catalog repository are not described here, but they are
available at our repository [3].
pattern adapterObject {
  type ADAPTER(*) {
    has field f of-type ADAPTEE;
    has constructor c {
      has param p of type ADAPTEE;
      set-field f;
    }
    has methods-set adapterSet each {
      delegates to f in adapteeSet;
    }
  }
  type ADAPTEE(1) {
    has methods-set adapteeSet;
  }
}
pattern adapterClass extends adapterObject {
  override ADAPTER(*) {
    inherits-from ADAPTEE;
  }
  override methods-set adapterSet each {
    delegates to methods-set adapteeSet in ADAPTEE;
  }
}
Figure 13. The DSL specifications of adapter Object- and Class- variants.
Figure 14. Object Adapter and its Class variant specification graphs
The Class Adapter, shown in Figures 13 (DSL) and 14 (DPG), is based on inheritance (it delegates
to inherited methods), while the Object Adapter exploits composition. In the DSL specification of
the Object Adapter, two template types are defined: the ADAPTER role, which encapsulates a
reference to the ADAPTEE type, and the ADAPTEE role itself.
The ADAPTER must define a field f of type ADAPTEE and must implement the methods of the
ADAPTER interface in terms of methods of the ADAPTEE (referring to the adapterSet method-set
defined in the specification, each method of the set must delegate to a method of the field “f”).
The adapterClass, instead, is specified as a variant of the adapterObject; in this case ADAPTER is
forced to inherit from ADAPTEE and adapterSet is built considering the methods inherited from
ADAPTEE.
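To make the two specifications concrete, the following Python sketch shows the kind of code each one is meant to match. The class and method names (`Adaptee`, `ObjectAdapter`, `ClassAdapter`, `request`, `specific_request`) are hypothetical and chosen only for illustration; the comments map the code elements to the roles and properties named in the DSL.

```python
class Adaptee:                                   # role ADAPTEE
    def specific_request(self) -> str:           # member of adapteeSet
        return "adaptee result"


class ObjectAdapter:                             # role ADAPTER, object variant
    """Holds the adaptee in a field and delegates to it (composition)."""

    def __init__(self, adaptee: Adaptee) -> None:  # constructor c, param p of type ADAPTEE
        self.f = adaptee                           # set-field f

    def request(self) -> str:                    # member of adapterSet
        return self.f.specific_request()         # delegates to f


class ClassAdapter(Adaptee):                     # role ADAPTER, class variant
    """Inherits from the adaptee and delegates to the inherited methods."""

    def request(self) -> str:                    # member of adapterSet
        return self.specific_request()           # delegates to an inherited method
```

Both adapters expose the same `request` operation; they differ only in whether the adaptee is reached through the field `f` or through inheritance, which is the structural difference the two specifications capture.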
However, in real-world systems the Adapter pattern can be found in several forms that differ
slightly from the models of Figure 13. It is often implemented for generic types, using inner
classes, or both. Our catalog contains three specifications to cover such implementations. Figure 15
reports the DSL of the inner class Adapter variant. It inherits from the classic Object Adapter
specification and introduces the structural elements needed to detect the adapter as an inner generic type.
pattern AdapterInnerGenericClasses extends ObjectAdapter {

  override ADAPTER(*) {
    generic on type T;
  }
  override ADAPTEE(1) {
    generic on type T;
  }

  type CLIENT(*) {

    method client {
      new ObjectAdapter on type T
    }
  }

}

pattern Composite {
  type C(1) {
    has method a {
      has return type void
      has param c of-type C
    }
    has method r {
      has return type void
      has param c of-type C
    }
    has methods-set componentSetC
  }
  type LF(*) {
    inherits-from C
    has-not container co of-type C
  }
  type CM(*) {
    inherits-from C
    has container cm of-type C
    has methods-set componentSetCM each {
      delegates-to cm
    }
  }
}
8.2. Composite
The Composite DSL specification models the Component hierarchy using three roles: Leaf (LF),
Component (C) and Composite (CM). Both Leaf (LF) and Composite (CM) must inherit from the
Component role. While LF has no inner structure (no reference to inner components), CM must own
a reference to a collection (named “cm” in the specification) having C as base type. Each method
of the composite CM (the “componentSetCM” method set) delegates to container methods (to allow
adding, removing and iterating over inner components). For Composite, the DSL specification and
the DPG are reported in Figures 16 and 17, respectively. In Figure 17, the labels cS1 and cS2 stand
for componentSetC and componentSetCM, respectively.
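As an illustration of the structure described by the three roles, here is a minimal Python sketch of a Composite. The names (`Component`, `Leaf`, `Composite`, `add`, `remove`, `operation`) are hypothetical and the comments relate each element to the DSL roles; the sketch does not come from any of the analysed systems.

```python
class Component:                              # role C
    def add(self, c: "Component") -> None:    # method a, param c of-type C
        raise NotImplementedError

    def remove(self, c: "Component") -> None:  # method r, param c of-type C
        raise NotImplementedError

    def operation(self) -> int:               # member of componentSetC
        raise NotImplementedError


class Leaf(Component):                        # role LF: inherits from C, no container of C
    def operation(self) -> int:
        return 1


class Composite(Component):                   # role CM: inherits from C
    def __init__(self) -> None:
        self.cm = []                          # container cm of-type C

    def add(self, c: Component) -> None:      # delegates to the container cm
        self.cm.append(c)

    def remove(self, c: Component) -> None:   # delegates to the container cm
        self.cm.remove(c)

    def operation(self) -> int:               # iterates over the container cm
        return sum(child.operation() for child in self.cm)


root = Composite()
root.add(Leaf())
subtree = Composite()
subtree.add(Leaf())
subtree.add(Leaf())
root.add(subtree)
```

Calling `root.operation()` on this tree counts its three leaves; the absence of a `cm` container in `Leaf` is what the `has-not container` property expresses.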
[Figure: design pattern graph (DPG). Node labels: C, CM, LF, a, r, c, co, cm, m, cS1, cS2. Edge labels: is type of, delegate to, has not.]
pattern Decorator extends Composite {
  ...
  override CM {
    has methods-set componentSetCM each {
      delegates to cm
      delegates to m in C.componentSetC
    }
  }
}
}
}
8.3. Decorator
The Decorator pattern can be represented as a variant of the Composite described above. The added
elements are shown in red in Figure 18, while an excerpt of its DSL is reported in Figure 19.
The most relevant change in this variant concerns the componentSetCM method set, which requires
delegation to both the cm field and the componentSetC (the set of methods defined
by the Component interface).
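A minimal Python sketch of code that would exhibit this double delegation is given below. The names (`Component`, `Plain`, `Decorator`, `render`) are hypothetical; the point is only that the decorator's method forwards through the field `cm` to a method that belongs to the Component interface.

```python
class Component:                      # role C
    def render(self) -> str:          # member of componentSetC
        raise NotImplementedError


class Plain(Component):               # a concrete component being decorated
    def render(self) -> str:
        return "text"


class Decorator(Component):           # role CM in the Decorator variant
    def __init__(self, component: Component) -> None:
        self.cm = component           # field cm of-type C

    def render(self) -> str:          # member of componentSetCM
        # Delegates to the field cm and, through it, to a method m
        # defined by the Component interface (componentSetC).
        return "[" + self.cm.render() + "]"


wrapped = Decorator(Decorator(Plain()))
```

Because a decorator is itself a Component, decorators can be stacked: `wrapped.render()` returns `"[[text]]"`.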
pattern Singleton {
  final type X {
    X has private constructor c;
    X has field f of type X;
    X has public static methods-set creationHooks each {
      depends on f;
    }
  }
}
[Figure 21: DPGs of the Singleton variants. Node labels: X, f, c, -c, cH, RET. Edge labels: is type of, is container of, depends on.]
8.4. Singleton
The Singleton pattern is identified using three variants, whose DPGs are shown in Figure 21. The
DSL of the first one, reported in Figure 20, provides the Singleton definition given in the literature
[29], implemented with a final class, a private constructor and a public static getter method. To mine
multiple instance getters, the variant defines a method set called “creationHooks” (the box labelled
cH in Figure 21). Each method in this set requires a dependency on the static Singleton field “f”.
The second, relaxed specification, called “relaxed-gof”, removes the private constraint from the
constructor (refer to the red block in the right part of Figure 21) and the final class constraint.
Finally, the DPG shown on the right side of Figure 21 models the Pool variant, which
handles a fixed set of resources. In this case the field “f” is overridden to become a container, while
the method RET represents the pool’s resource manager.
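The three variants can be illustrated with the Python sketch below. All names are hypothetical. Python has neither `final` classes nor private constructors, so the first class only approximates those constraints with a guard in `__init__`; the pool size of 2 is an arbitrary assumption.

```python
class GofSingleton:
    """Literature variant: guarded constructor and a static getter."""
    _f = None                               # static field f of type X

    def __init__(self) -> None:
        # Approximation of a private constructor (no such modifier in Python).
        raise RuntimeError("use GofSingleton.instance()")

    @classmethod
    def instance(cls):                      # member of creationHooks, depends on f
        if cls._f is None:
            cls._f = object.__new__(cls)    # bypasses the guarded __init__
        return cls._f


class RelaxedSingleton:
    """Relaxed variant: the constructor is public and the class is not final."""
    _f = None

    @classmethod
    def instance(cls):                      # member of creationHooks, depends on f
        if cls._f is None:
            cls._f = cls()
        return cls._f


class Pool:
    """Pool variant: the field f is a container holding a fixed set of instances."""
    _f = []                                 # field f, now a container of X
    SIZE = 2                                # assumed pool size

    @classmethod
    def ret(cls, index: int):               # plays the role of RET
        while len(cls._f) < cls.SIZE:
            cls._f.append(object.__new__(cls))
        return cls._f[index % cls.SIZE]
```

A detector limited to the first form would miss the other two classes, which is why the catalog keeps a separate specification for each variant.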
Bibliography
[1] http://www.eclipse.org/modeling/.
[2] Comsats institute of information technology. http://research.ciitlahore.edu.pk/Groups/SERC/DesignPatterns.aspx.
[3] https://github.com/UnisannioSoftEng/DPF/wiki/Design-Pattern-Finder-Home.
[4] A. Alnusair, T. Zhao, and G. Yan. Automatic recognition of design motifs using semantic conditions. In
Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC ’13, pages 1062–1067,
New York, NY, USA, 2013. ACM.
[5] A. Ampatzoglou, G. Frantzeskou, and I. Stamelos. A methodology to assess the impact of design
patterns on software quality. Inf. Softw. Technol., 54(4):331–346, Apr. 2012.
[6] G. Antoniol, G. Casazza, M. D. Penta, and R. Fiutem. Object-oriented design patterns recovery.
Journal of Systems and Software, 59(2):181–196, 2001.
[7] G. Antoniol, R. Fiutem, and L. Cristoforetti. Design pattern recovery in object-oriented software. In
Proceedings of the 6th International Workshop on Program Comprehension, IWPC ’98, pages 153–,
Washington, DC, USA, 1998. IEEE Computer Society.
[8] F. Arcelli and M. Zanoni. A tool for design pattern detection and software architecture reconstruction.
Inf. Sci., 181(7):1306–1324, Apr. 2011.
[9] F. Arcelli Fontana, A. Caracciolo, and M. Zanoni. Dpb: A benchmark for design pattern detection
tools. In Software Maintenance and Reengineering (CSMR), 2012 16th European Conference on,
pages 235–244, 2012.
[10] Z. Balanyi and R. Ferenc. Mining design patterns from c++ source code. In Proc. International
Conference on Software Maintenance ICSM 2003, pages 305–314, Sept. 22–26, 2003.
[11] I. Bayley and H. Zhu. On the composition of design patterns. In Quality Software, 2008. QSIC ’08.
The Eighth International Conference on, pages 27–36, Aug. 2008.
[12] F. Bergenti and A. Poggi. Improving uml designs using automatic design pattern detection. In Proc.
12th International Conference on Software Engineering and Knowledge Engineering (SEKE 2000),
pages 336–343, 2000.
[13] M. L. Bernardi, M. Cimitile, and G. A. Di Lucca. A model-driven graph-matching approach for design pattern
detection. In 20th Working Conference on Reverse Engineering (WCRE), pages 172–181, 2013.
[14] M. L. Bernardi and G. A. Di Lucca. Model-driven detection of design patterns. In Proceedings of the
26th IEEE International Conference on Software Maintenance, ICSM ’10, September 12-18, Timișoara,
Romania, 2010.
[15] D. Beyer. Relational programming with crocopat. In Proceedings of the 28th international conference
on Software engineering, ICSE ’06, pages 807–810, New York, NY, USA, 2006. ACM.
[16] D. Beyer and C. Lewerentz. Crocopat: efficient pattern analysis in object-oriented programs. pages
294–295, 2003.
[17] A. Binun and G. Kniesel. Dpjf - design pattern detection with high accuracy. In Software Maintenance
and Reengineering (CSMR), 2012 16th European Conference on, pages 245–254, 2012.
[18] A. De Lucia, V. Deufemia, C. Gravino, and M. Risi. Behavioral pattern identification through visual
language parsing and code instrumentation. In Proceedings of the 2009 European Conference on
Software Maintenance and Reengineering, CSMR ’09, pages 99–108, Washington, DC, USA, 2009.
IEEE Computer Society.
[19] A. De Lucia, V. Deufemia, C. Gravino, and M. Risi. Design pattern recovery through visual language
parsing and source code analysis. Journal of Systems and Software, 82(7):1177–1193, 2009.
[20] A. De Lucia, V. Deufemia, C. Gravino, and M. Risi. An eclipse plug-in for the detection of design
pattern instances through static and dynamic analysis. In Software Maintenance (ICSM), 2010 IEEE
International Conference on, pages 1–6, 2010.
[21] A. De Lucia, V. Deufemia, C. Gravino, and M. Risi. Improving behavioral design pattern detection
through model checking. In Software Maintenance and Reengineering (CSMR), 2010 14th European
Conference on, pages 176–185, 2010.
[22] A. De Lucia, V. Deufemia, C. Gravino, M. Risi, and G. Tortora. An eclipse plug-in for the identification
of design pattern variants. In Sixth Workshop of the Italian Eclipse Community (Eclipse-IT 2011), pages
40–51, 2011.
[23] J. Dong and Y. Zhao. Experiments on design pattern discovery. In Predictor Models in Software
Engineering, 2007. PROMISE’07: ICSE Workshops 2007. International Workshop on, pages 12–12,
May 2007.
[24] J. Dong, Y. Zhao, and T. Peng. Architecture and design pattern discovery techniques - a review. In
H. R. Arabnia and H. Reza, editors, Software Engineering Research and Practice, pages 621–627.
CSREA Press, 2007.
[25] J. Dong, Y. Zhao, and Y. Sun. A matrix-based approach to recovering design patterns. Trans. Sys. Man
Cyber. Part A, 39(6):1271–1282, Nov. 2009.
[26] F. A. Fontana, M. Zanoni, and S. Maggioni. Using design pattern clues to improve the precision of
design pattern detection tools. Journal of Object Technology, 10:4: 1–31, 2011.
[27] R. France, D. Kim, S. Ghosh, and E. Song. A uml-based pattern specification technique. Software
Engineering, IEEE Transactions on, 30(3):193–206, Mar. 2004.
[28] L. J. Fulop, T. Gyovai, and R. Ferenc. Evaluating c++ design pattern miner tools. In Proceedings of
the Sixth IEEE International Workshop on Source Code Analysis and Manipulation, SCAM ’06, pages
127–138, Washington, DC, USA, 2006. IEEE Computer Society.
[29] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design patterns: elements of reusable object-
oriented software. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1995.
[30] H. A. Ghulam Rasool. Discovering variants of design patterns. Journal of Basic and Applied Scientific
Research, pages 139–147, 2013.
[31] M. Goldstein and D. Moshkovich. System grokking: a novel approach for software understanding,
validation, and evolution. In Proceedings of the 7th international conference on Next generation
information technologies and systems, NGITS’09, pages 38–49, Berlin, Heidelberg, 2009. Springer-
Verlag.
[32] Y. G. Guéhéneuc. P-mart: Pattern-like micro architecture repository. In Proceedings of the 1st
EuroPLoP Focus Group on Pattern Repositories. Michael , Aliaksandr Birukou, and Paolo Giorgini,
2007, http://www.ptidej.net/tool/designpatterns/.
[33] Y. G. Gueheneuc and G. Antoniol. Demima: A multilayered approach for design pattern identification.
IEEE Transactions on Software Engineering, 34(5):667–684, 2008.
[35] Y. G. Guéhéneuc, H. A. Sahraoui, and F. Zaidi. Fingerprinting design patterns. In 11th Working
Conference on Reverse Engineering (WCRE 2004), pages 172–181, 2004.
[36] A. L. Guennec, G. Sunye, and J.-M. Jézéquel. Precise modeling of design patterns. In Proceedings
of UML00, pages 482–496. Springer Verlag, 2000.
[37] J. Heering, P. R. H. Hendriks, P. Klint, and J. Rekers. The syntax definition formalism sdf reference
manual. SIGPLAN Not., 24(11):43–75, Nov. 1989.
[38] D. Heuzeroth, T. Holl, G. Högström, and W. Löwe. Automatic design pattern detection. In
Proceedings of the 11th IEEE International Workshop on Program Comprehension, IWPC ’03, pages
94–, Washington, DC, USA, 2003. IEEE Computer Society.
[39] H. Huang, S. Zhang, J. Cao, and Y. Duan. A practical pattern recovery approach based on both
structural and behavioral analysis. Journal of Systems and Software, 75(1–2):69–87, 2005. Software
Engineering Education and Training.
[40] O. Kaczor, Y. Gueheneuc, and S. Hamel. Efficient identification of design patterns with bit-vector
algorithm. In Proc. 10th European Conference on Software Maintenance and Reengineering CSMR
2006, pages 10 pp.–184, 2006.
[41] R. K. Keller, R. Schauer, S. Robitaille, and B. Laguë. Advances in software engineering. chapter
Pattern-based design recovery with SPOOL, pages 113–135. Springer-Verlag New York, Inc., New
York, NY, USA, 2002.
[42] H. Kim and C. Boldyreff. A method to recover design patterns using software product metrics. In
Proceedings of the 6th International Conference on Software Reuse: Advances in Software Reusability,
ICSR-6, pages 318–335, London, UK, 2000. Springer-Verlag.
[43] C. Kramer and L. Prechelt. Design recovery by automated search for structural design patterns in
object-oriented software. In Proceedings of the 3rd Working Conference on Reverse Engineering
(WCRE ’96), WCRE ’96, pages 208–, Washington, DC, USA, 1996. IEEE Computer Society.
[44] L. Prechelt, B. Unger-Lamprecht, M. Philippsen, and W. Tichy. Two controlled experiments assessing the
usefulness of design pattern documentation in program maintenance. IEEE Trans. Softw. Eng.,
28(6):595–606, 2002.
[45] K. N. Loo and S. Lee. Representing design pattern interaction roles and variants. In Computer
Engineering and Technology (ICCET), 2010 2nd International Conference on, volume 6, pages V6–
470–V6–474, 2010.
[46] K. N. Loo, S. P. Lee, and T. K. Chiew. Uml extension for defining the interaction variants of design
patterns. Software, IEEE, 29(5):64–72, 2012.
[47] J. K. Y. Ng, Y. G. Gueheneuc, and G. Antoniol. Identification of behavioural and creational design
motifs through dynamic analysis. J. Softw. Maint. Evol., 22(8):597–627, Dec. 2010.
[48] J. Niere, W. Schäfer, J. P. Wadsack, L. Wendehals, and J. Welsh. Towards pattern-based design
recovery. In Proceedings of the 24th International Conference on Software Engineering, ICSE ’02,
pages 338–348, New York, NY, USA, 2002. ACM.
[49] J. Paakki, A. Karhinen, J. Gustafsson, L. Nenonen, and A. I. Verkamo. Software metrics by
architectural pattern mining. In Proceedings of the International Conference on Software: Theory
and Practice (16th IFIP World Computer Congress), pages 325–332, 2000.
[50] T. Peng, J. Dong, and Y. Zhao. Verifying behavioral correctness of design pattern implementation.
In Proceedings of the Twentieth International Conference on Software Engineering & Knowledge
Engineering (SEKE’2008), pages 454–459, 2008.
[51] N. Pettersson, W. Lowe, and J. Nivre. Evaluation of accuracy in design pattern occurrence detection.
IEEE Trans. Softw. Eng., 36(4):575–590, July 2010.
[52] I. Philippow, D. Streitferdt, M. Riebisch, and S. Naumann. An approach for reverse engineering of
design patterns. Software and System Modeling, 4(1):55–70, 2005.
[53] G. Rasool and P. Mäder. Flexible design pattern detection based on feature types. In 26th IEEE/ACM
International Conference on Automated Software Engineering (ASE 2011), Lawrence, KS, USA,
November 6-10, pages 243–252, 2011.
[54] G. Rasool, I. Philippow, and P. Mäder. Design pattern recovery based on annotations. Adv. Eng. Softw.,
41(4):519–526, Apr. 2010.
[55] G. Rasool and D. Streitferdt. A survey on design pattern recovery techniques. IJCSI International
Journal of Computer Science Issues, 8(2):251–260, 2011.
[56] N. Shi and R. A. Olsson. Reverse engineering of design patterns from java source code. In Proceedings
of the 21st IEEE/ACM International Conference on Automated Software Engineering, ASE ’06, pages
123–134, Washington, DC, USA, 2006. IEEE Computer Society.
[57] J. M. Smith and D. Stotts. Spqr: flexible automated design pattern extraction from source code. In
Proc. 18th IEEE International Conference on Automated Software Engineering, pages 215–224, Oct.
6–10, 2003.
[58] K. Stencel and P. Wegrzynowicz. Detection of diverse design pattern variants. In Software Engineering
Conference, 2008. APSEC ’08. 15th Asia-Pacific, pages 25–32, Dec 2008.
[59] K. Stencel and P. Wegrzynowicz. Implementation variants of the singleton design pattern. In
Proceedings of the OTM Confederated International Workshops and Posters on On the Move to
Meaningful Internet Systems: 2008 Workshops: ADI, AWeSoMe, COMBEK, EI2N, IWSSA, MONET,
OnToContent + QSI, ORM, PerSys, RDDS, SEMELS, and SWWS, OTM ’08, pages 396–406, Berlin,
Heidelberg, 2008. Springer-Verlag.
[60] P. Tonella, M. Torchiano, B. Du Bois, and T. Systä. Empirical studies in reverse engineering: state of
the art and future trends. Empirical Softw. Engg., 12(5):551–571, Oct. 2007.
[61] N. Tsantalis, A. Chatzigeorgiou, G. Stephanides, and S. T. Halkidis. Design pattern detection using
similarity scoring. IEEE Trans. Softw. Eng., 32(11):896–909, Nov. 2006.
[62] M. Vokác. An efficient tool for recovering design patterns from c++ code. Journal of Object
Technology, 5(1):139–157, 2006.
[63] M. von Detten and S. Becker. Combining clustering and pattern detection for the reengineering of
component-based software systems. In Proceedings of the joint ACM SIGSOFT conference – QoSA and
ACM SIGSOFT symposium – ISARCS on Quality of software architectures – QoSA and architecting
critical systems – ISARCS, QoSA-ISARCS ’11, pages 23–32, New York, NY, USA, 2011. ACM.
[64] Y. Wang and J. Huang. Formal modeling and specification of design patterns using rtpa. IJCINI, pages
100–111, 2008.
[65] P. Wegrzynowicz and K. Stencel. Relaxing queries to detect variants of design patterns. In Computer
Science and Information Systems (FedCSIS), 2013 Federated Conference on, pages 1571–1578, 2013.
[66] L. Wendehals. Improving design pattern instance recognition by dynamic analysis. In Proceedings
of the 2003 International Workshop on Dynamic Systems Analysis, WODA ’03, pages 29–32. ACM,
2003.
[67] L. Wendehals and A. Orso. Recognizing behavioral patterns at runtime using finite automata. In
Proceedings of the 2006 International Workshop on Dynamic Systems Analysis, WODA ’06, pages
33–40, New York, NY, USA, 2006. ACM.