Design Patterns Detection Using a DSL-driven Graph Matching Approach

Article in Journal of Software: Evolution and Process · December 2014


DOI: 10.1002/smr.1674

JOURNAL OF SOFTWARE MAINTENANCE AND EVOLUTION: RESEARCH AND PRACTICE
J. Softw. Maint. Evol.: Res. Pract. 2013; 00:1–37
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/smr

Design Patterns Detection Using a DSL-driven Graph Matching Approach

Mario Luca Bernardi (1), Marta Cimitile (2), Giuseppe Di Lucca (3)

(1, 3) Department of Engineering, University of Sannio, Italy
(2) Unitelma Sapienza University, Italy

SUMMARY
Knowledge about Design Pattern (DP) instances improves program comprehension and re-engineering
of Object Oriented systems. In particular, it helps to discover developer design decisions and
trade-offs that are often not documented. This work describes an approach to automatically detect
DPs in existing Object Oriented systems by tracing the system's source code components to the roles
they play in the patterns. In the proposed approach, DPs are modelled on the basis of their
high-level structural properties (e.g., inheritance, dependency, invocation, delegation, type
nesting, and membership relationships), which are checked, by source code parsing, against the
system structure and components. Moreover, the approach can also detect pattern variants, defined
by overriding the pattern properties. The paper describes the approach, briefly presents the
supporting tool, and discusses the results of the experiments carried out to validate it. The
approach was validated on seven systems of an open benchmark containing systems of increasing
size. For five additional systems, the results were compared with those of a similar approach from
the literature. The results obtained, the DP variants identified and the effectiveness of the
approach are presented and discussed.
Copyright © 2013 John Wiley & Sons, Ltd.

Received . . .

KEY WORDS: Design Patterns Detection, Object Oriented systems, Graph-Matching, Domain Specific
Languages, Model Driven Development

1. INTRODUCTION

Design Patterns (DPs) were first introduced in [29] as general, repeatable solutions to commonly
occurring problems in software design. Several works [5, 12] show how software quality greatly
improves when DPs are implemented and their adoption is documented. In the last twenty years, as
the number of pattern-based systems and frameworks has increased, the (semi-)automatic detection
of pattern instances in Object Oriented (OO) software systems has become more critical for
improving program comprehension, maintenance and reuse [44]. When design documentation is not
available (or not up to date), DP detection can help program comprehension by providing useful
insights into the software architecture, the underlying design choices and the role played by each
code component in a DP [15, 23]. This is even more true in the case of poor or incomplete
documentation: the lack of adequate documentation may make it hard to understand which design
solutions were adopted and which code components implement them. Finally, searching a software
project for DPs can also be used to assess the quality of the source code [12, 15].
To address these issues, several methodologies, approaches and tools have been proposed in the
literature over the last twenty years [50]. Most of these approaches take into account a fixed and
limited set of properties to specify a pattern. In particular, they do not consider behavioural
properties, which are crucial in the characterization of object DPs. Moreover, most of these
approaches are very sensitive to structural differences in the searched patterns, since their
specifications are embedded in the detection algorithm.
This paper proposes a detection approach that addresses these issues. The detection algorithm
is based on a meta-model representing both the software system and the searched DPs through a
set of high-level properties, wider than that of existing approaches, related to the source code
elements, the static relationships among them, and their behavior. Each system is considered an
instance of this meta-model and is represented as a graph of elements and their properties.
DP identification is performed by matching each pattern graph against the overall system graph and
by annotating the elements of the type hierarchy with information on the roles they play in the
pattern. An advantage of the proposed approach over most existing ones is that it also makes it
easy to specify variant forms of the classic DPs (as coded in the literature). This is an important
issue to address, since it is well known that DPs are present in real-world systems in many
different variants [61, 57]. Our detection approach is driven by a set of pattern models written in
a Domain Specific Language (DSL) defined to model the structure of both the software system and the
design patterns. It organizes these DP models as a hierarchy of declarative specifications. In
particular, a DP variant can be expressed as a set of changes to an existing specification, by
adding, removing or relaxing properties. Hence, it is possible to write a new pattern specification
by deriving it from an existing one (to detect a variant) or by writing it from scratch (to detect a
new kind of pattern), with no impact on the mining algorithm.
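The matching step described above can be illustrated with a small sketch. The Python fragment below is not the DPF algorithm: it is a naive brute-force matcher written under assumed names (`match_pattern`, the edge labels `inherits` and `delegates`, and the toy type names are all hypothetical). It only shows how the roles of a pattern graph can be bound to the types of a system graph so that every pattern edge is found in the system.

```python
from itertools import permutations

# Hypothetical toy system graph: labelled edges between type names.
SYSTEM_TYPES = {"Figure", "Circle", "BorderFigure"}
SYSTEM_EDGES = {
    ("Circle", "inherits", "Figure"),
    ("BorderFigure", "inherits", "Figure"),
    ("BorderFigure", "delegates", "Figure"),
}

# Hypothetical pattern graph: the nodes are roles, not concrete types.
DECORATOR_LIKE = {
    ("Decorator", "inherits", "Component"),
    ("Decorator", "delegates", "Component"),
}


def match_pattern(pattern_edges, system_types, system_edges):
    """Return every role-to-type binding under which all pattern edges occur in the system."""
    roles = sorted({r for (src, _, dst) in pattern_edges for r in (src, dst)})
    matches = []
    # Brute force: try every injective assignment of system types to roles.
    for candidate in permutations(sorted(system_types), len(roles)):
        binding = dict(zip(roles, candidate))
        if all((binding[src], label, binding[dst]) in system_edges
               for (src, label, dst) in pattern_edges):
            matches.append(binding)
    return matches


if __name__ == "__main__":
    for binding in match_pattern(DECORATOR_LIKE, SYSTEM_TYPES, SYSTEM_EDGES):
        # Each binding annotates system types with the role they play.
        print(binding)
```

Enumerating all assignments is exponential in the number of roles; a practical matcher would prune candidates first, which is why search space reduction matters for approaches of this kind.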
An Eclipse-based tool, called Design Pattern Finder (DPF), has been developed to provide
automatic support for the approach (in the following, DPF is used to refer concisely to the proposed
approach, not just to the tool).
The approach has been assessed by applying it to seven systems of an open benchmark proposed
in [32] and [9]. For five additional systems, we compared our results with those obtained using the
Design Pattern Detection (DPD) tool proposed in [61]. In this case the results have been validated
by experts in order to evaluate precision and recall, considering the true positives of both tools
as a gold standard.
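As a worked illustration of this evaluation scheme, the sketch below computes precision and recall for two tools against a gold standard built from the expert-confirmed instances reported by either tool. The instance identifiers and the helper name `precision_recall` are hypothetical and the numbers are invented for the example; they are not results from the paper.

```python
def precision_recall(reported, gold):
    """Precision and recall of a set of reported instances against a gold standard."""
    true_positives = reported & gold
    precision = len(true_positives) / len(reported) if reported else 0.0
    recall = len(true_positives) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical pattern instances reported by each tool.
dpf_reported = {"A", "B", "C", "D"}
dpd_reported = {"B", "C", "E"}

# Hypothetical expert judgement: instance D is rejected as a false positive.
expert_confirmed = {"A", "B", "C", "E"}

# Gold standard: the confirmed true positives of both tools taken together.
gold = (dpf_reported | dpd_reported) & expert_confirmed

if __name__ == "__main__":
    print("DPF:", precision_recall(dpf_reported, gold))
    print("DPD:", precision_recall(dpd_reported, gold))
```

Note that a gold standard built this way cannot contain instances missed by both tools, so the recall values are relative to the union of the two tools rather than to the true set of instances in the system.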
This paper improves and extends our preliminary investigation reported in [13, 14].
The improvements mainly concern: (i) a wider set of DSL specifications
allowing the detection of new DPs not considered before; (ii) a new version of the DPF prototype
tool, provided with a user interface; (iii) more experiments, and the related discussion of the
results provided by the approach using the DPF tool, on a larger set of systems, comparing them
with the results provided by both an open benchmark and the DPD approach. The paper is structured
as follows. Section 2 discusses relevant related work. Section 3 describes the meta-model and the
DSL defined to represent the structure of systems and patterns, together with the detection
approach. Section 4 concisely describes the catalog of DP specifications, while the DPF tool is
briefly described in Section 5. Section 6 reports the experimental setup, whereas the results are
discussed in Section 7. Section 8 contains concluding remarks and briefly discusses future work.
The Appendix reports and describes the DSL specifications of some of the most relevant DPs and
their variants.

2. RELATED WORK

The problem of mining DPs in existing OO systems has been addressed and discussed in several works,
and different methods and techniques have been proposed to support it. Reviews of current
techniques and tools for discovering architecture and design patterns in OO systems are provided
in [24] and [55]. In the latter, the authors classify pattern recovery techniques on the basis of
the type of analysis used and the searching methodology adopted.


| Reference | Detection Approach | Tool | Mined Code | Recovered DPs | Case studies | Precision |
|---|---|---|---|---|---|---|
| Keller et al. (1999) [41] | Minimum key structure | SPOOL | C++ | Template method, factory method and bridge | 2 industrial systems, ET++ | - |
| Philippow et al. (2005) [52] | Minimum key structure | - | C++ | GoF | Student projects | 100% for Singleton, Interpreter |
| Kramer and Prechelt (1996) [43] | Class structure | Pat | C++ | Adapter, bridge, proxy, composite, decorator | NME, LEDA, zApp | 14-50% |
| Beyer et al. (2003, 2005) [15] | Predicate Calculus | Crocopat | Java/C++ | Composite, mediator | 2 Mozilla, JWAM, wxWindows | - |
| Dong et al. (2009) [25] | Matrix and weights | DP-Miner | Java | Adapter/command, bridge, composite, strategy/state | Java AWT, JEdit, JHotDraw 6.0b1 | 91-100% |
| Balanyi and Ferenc (2003) [10] | Class structure | Columbus | C++ | Reclassified GoF | 2 Jikes, Leda, StarOffice, StarOffice Writer | < 60% |
| Heuzeroth et al. (2003) [38] | Predicates on abstract syntax trees | - | Java | Observer, mediator, chain of responsibility, visitor, and decorator | Swing | - |
| Niere et al. (2002, 2003) [48] | Cliché recognition and graph transformation, with fuzzy logic | FUJABA | Java | GoF | AWT library | - |
| Antoniol et al. (1998) [7] | Cliché matching with software metrics in the class structure | - | C++ | Adapter, bridge, proxy, composite, decorator (1995) | LEDA, libg++, galib, mec, socket | 30% |
| Kim and Boldyreff (2000) [42] | Metrics | - | C++ | GoF | 3 Systems (no info) | Avg 43% |
| Olsson and Shi (2006) [56] | Class structure, exploiting inter-class relationships | PINOT | Java | Reclassified GoF | ANT, AWT, JHotDraw, Swing | - |
| Kaczor et al. (2006) [40] | Bit-vector based on string representation | - | Java | Abstract factory and composite | JHotDraw, QuickUML, Juzzle | - |
| Smith and Stotts (2003) [57] | Elemental design patterns and rho-calculus | SPQR | Java | Decorator | - | - |
| Tsantalis et al. (2006a) [61] | Class structure expressed as matrices, exploiting graph similarity algorithm | - | Java | Composite, adapter/command, decorator, observer, state/strategy, prototype, visitor | JHotDraw, JRefactory, JUnit | 100% |
| De Lucia et al. (2009) [19] | XPG formalism and LR-based | DPRE | Java | Adapter, bridge, composite, proxy, and decorator | JHotDraw (5.1, 6.0b1), QuickUML, Apache Ant, Swing, and Eclipse JDT | 62-97% |
| Bergenti (2000) [12] | Class structure | IDEA | UML | Template, Proxy, Bridge, Composite, Decorator, Adapter | - | - |
| Arcelli (2011) [8] | Basic elements and metrics | MARPLE | Java | Abstract Factory, Composite, Visitor | JavaReports, Batik | 10-80% |
| Vokac (2006) [62] | Semi-formal diagrams translated into queries | no name | C++ | GoF | CRM commercial system | - |
| France et al. (2004) [27] | UML | no name | C++ | Abstract Factory, Bridge, Decorator, Singleton, Observer, Composite, and Visitor | - | |
| Stencel (2008) [59] | Static analysis techniques and SQL | D3 | Java | GoF | AJP and JHotDraw | - |
| Gueheneuc (2008) [33] | UML-like multilayered approach | DeMIMA | Java | GoF | 33 industrial systems, 5 open source systems | 34% |
| Proposed approach | DSL-Driven Graph-Matching | DPF | Java | GoF and their variants | 12 open source systems | 95% |

Table I. An overview of Design Pattern Detection Approaches


2.1. Type of analysis


The type of analysis can be classified as structural analysis, behavioral analysis, semantic
analysis, or formal specification/composition analysis.
Structural analysis approaches consist of recovering the structural relationships from the
available source code artifacts. They focus on recovering structural DPs such as Adapter, Proxy and
Decorator, and consider inter-class relationships to identify the structural properties of patterns
[36]. An example is [6], where a structural analysis of DPs is proposed for C++ systems.
In [62], the source code is parsed using a third-party commercial tool called Understand
for C++. The tool extracts the entities and the references from C++ source code and stores them in
a database. Queries are performed on the database to extract different properties of patterns. In
their experimentation, the authors recovered Singleton, Factory, Template, Observer and Decorator
patterns from a VCS (Version Control System).
Behavioral analysis approaches adopt dynamic analysis, machine learning and static program
analysis techniques to extract the behavioral aspects of patterns. They can be used together with
structural analysis when patterns are structurally identical or have a weak structure (e.g., the
State and Strategy patterns are structurally identical). A limitation of these approaches is that
the number of false positives grows with the number of execution traces [25].
Semantic analysis approaches complement structural and behavioral analysis, reducing the false
positive rate in the recognition of different patterns. They use naming conventions and annotations
that carry role information about classes and methods [25, 54]. This analysis can be
used to recover patterns having similar static and behavioral properties (e.g., the Bridge and
Strategy patterns). Different techniques are used for semantic analysis. In [25], three options are
discussed and the authors conclude that naming conventions are the most appropriate and feasible
option.
Finally, formal specification/composition analysis of DPs includes approaches based on formal
pattern specifications. It is important to supplement detection approaches by formally
specifying the patterns [11, 27, 64, 58]. Moreover, DPs have different implementation variants,
and a formal specification can help to express the possible variations of a pattern
as well as to overcome the challenges of capturing their semantics. These approaches use
formal specification languages to specify DPs, supported by tools that validate the correctness and
completeness of the specifications [27]. In [45, 46] an extension of the UML sequence diagram
is proposed that allows designers to define and visualize the pattern roles and the different types
of interaction groups for a design pattern.
Other approaches [53, 30] propose definitions of pattern variants composed of reusable feature
types. A feature type is detected in the source code with the search technique that best fits
its characteristics, so different technologies are used for detecting different parts of a pattern.
In particular, in [30] four new variants of standard patterns (Abstract Factory, Decorator, Adapter
and Proxy) are defined during the analysis of different applications. This approach is similar to
ours because it explicitly represents the concept of variants. It presents a model based on a set of
properties (called feature types by the authors) that together are able to represent a design
pattern. Our approach defines variants using a domain specific language that models a wide set of
structural and behavioral properties (such as delegation, object creation, calls and dependencies)
and uses inheritance to reuse previous specifications. This makes it easy to express and mine design
patterns with complex behavioral relationships.
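The idea of deriving a variant from an existing specification can be sketched as follows. This is an illustrative Python model, not the paper's DSL: the class names `Property` and `PatternSpec`, the `added`/`removed`/`relaxed` fields and the example variant are all assumptions made for the sketch. A variant keeps a reference to its parent specification and records only the changes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Key = Tuple[str, str, str]  # (kind, source role, target role)


@dataclass(frozen=True)
class Property:
    kind: str              # e.g. "inheritance", "delegation", "creation"
    source: str
    target: str
    optional: bool = False


@dataclass
class PatternSpec:
    name: str
    parent: Optional["PatternSpec"] = None
    added: List[Property] = field(default_factory=list)
    removed: Set[Key] = field(default_factory=set)
    relaxed: Set[Key] = field(default_factory=set)

    def properties(self) -> List[Property]:
        """Resolve the full property list by applying the changes to the parent's list."""
        inherited = self.parent.properties() if self.parent else []
        resolved = []
        for prop in inherited:
            key = (prop.kind, prop.source, prop.target)
            if key in self.removed:
                continue  # the variant drops this property
            if key in self.relaxed:
                # Relaxing keeps the property but makes it non-mandatory.
                prop = Property(prop.kind, prop.source, prop.target, optional=True)
            resolved.append(prop)
        return resolved + list(self.added)


# Hypothetical base specification and a variant relaxing one property.
base = PatternSpec("Decorator", added=[
    Property("inheritance", "Decorator", "Component"),
    Property("delegation", "Decorator", "Component"),
    Property("inheritance", "ConcreteDecorator", "Decorator"),
])
variant = PatternSpec(
    "DecoratorWithoutConcreteSubclass",
    parent=base,
    relaxed={("inheritance", "ConcreteDecorator", "Decorator")},
)

if __name__ == "__main__":
    for prop in variant.properties():
        print(prop)
```

Because the variant stores only a delta, the code that consumes `properties()` does not need to change when a new variant is introduced, which mirrors the claim that variants have no impact on the mining algorithm.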

2.2. Searching Methods


Searching methods can be classified as database queries, constraint resolvers, metrics, XPG
formalism and parsing, UML structures and matrices, and miscellaneous approaches.
Several pattern recovery techniques use database queries [54, 65, 41, 43] for extracting patterns.
They produce an intermediate representation of the source code (i.e., ASG, AST, XMI, meta-data
and UML structures) and then use SQL queries to extract pattern-related information.
Performance in these cases depends on the underlying database and can scale very well, but
the queries are limited to the information available in the intermediate representation. Existing
SQL-based approaches usually store a limited set of properties related to the source code elements.
This is mainly because the relational model is not well suited to expressing and querying complex
graph-based structures. Moreover, they are used for structural and creational DPs and only
partially support the identification of behavioral DPs. These issues are discussed in [43]. A
notable exception is [54], which proposes a meta-model that takes into account a reasonably complete
set of structural and behavioural properties. With respect to it, our meta-model: (i) includes the
possibility to express fields, methods and constraints on them; (ii) takes into account delegation
and dependency properties; and (iii) allows the defined DSL to be used to build higher-level
properties starting from the low-level ones in order to mine complex structures (requiring no
changes to the search engine).
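To make the query-based style concrete, the following sketch stores a few relationships in an in-memory SQLite database and retrieves candidate pairs with a self-join. The table layout (`relation` with `source`, `kind`, `target`) and the toy data are hypothetical and do not reproduce the schema of any of the cited tools.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relation (source TEXT, kind TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO relation VALUES (?, ?, ?)",
    [
        ("BorderFigure", "inherits", "Figure"),
        ("BorderFigure", "delegates", "Figure"),
        ("Circle", "inherits", "Figure"),
    ],
)

# Find types that both inherit from and delegate to the same type.
# With this layout, each further property in the pattern needs another self-join.
query = """
    SELECT i.source AS decorator, i.target AS component
    FROM relation AS i
    JOIN relation AS d
      ON d.source = i.source AND d.target = i.target
    WHERE i.kind = 'inherits' AND d.kind = 'delegates'
"""
rows = conn.execute(query).fetchall()

if __name__ == "__main__":
    print(rows)
```

The query can only use what was stored in the table, which illustrates the limitation noted above: properties that are absent from the intermediate representation cannot be recovered at query time.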
Several tools and frameworks for identifying idioms, macro-patterns, DPs and design defects use
explanation-based constraint programming techniques. For example, in [35], the authors recover
patterns using a multilayered approach which focuses on ensuring an optimal recall rate, but
precision and performance are low.
Metric-based techniques compute program-related metrics (e.g., generalizations, aggregations,
associations, interface hierarchies) from different source code representations and compare their
values with the metrics of source code DPs. These techniques [63, 49, 7, 42] are computationally
efficient because they reduce the search space through filtering [34]. A limitation is that they
have been evaluated on only a few patterns. Moreover, their precision and recall are low.
XPG formalism and parsing techniques use the SVG (Scalable Vector Graphics) format for the
intermediate representation of the source code and represent DPs in a visual language by mapping
the grammar of each pattern onto the graph representation. They give a precise visualization but are
limited to structural DPs. Moreover, to the best of our knowledge, the existing experiments
are limited to a few patterns [18] and do not report recall rates.
UML structures and matrices techniques [61, 25, 49, 33, 12, 56] make it possible to represent
structural and behavioral information about software systems. They apply different techniques to
match DP template metrics with the matrices generated for the system. In [33] a semi-automatic
approach to identify micro-architectures in source code is proposed. The approach is based on
information organized in three layers: two layers are used to recover an abstract model of the
source code (including binary class relationships) and a third layer is used to identify DPs in the
abstract model.
A DP detection methodology based on similarity scoring between graph vertices is proposed
in [61]. The approach is also able to recognize patterns that are modified from their standard
representation. It exploits the fact that patterns reside in one or more inheritance hierarchies (in
order to reduce the size of the graphs to which the algorithm is applied). These approaches are
computationally efficient and have good precision and recall rates. A limitation is that they fail
to extract the implementation variants of similar DPs. Furthermore, they are limited to a small
number of patterns.
Finally, there are some well-known techniques that cannot be classified in the above categories
(e.g., fuzzy reasoning, bit-vector compression, the minimum key structure method, predicate and rho
calculus, dynamic analysis using run-time execution traces, formal methods based on semantics,
machine learning based approaches and concept analysis) but are a good complement to
the structural methods cited above [48, 38, 40]. For example, in [19], De Lucia et al. present some
case studies on recovering structural design patterns from OO source code, and in [21] they propose
a model checking approach to analyze the behaviour of pattern instances both dynamically and
statically. In [8] a tool for DP detection and software architecture reconstruction is proposed, in
which an approach mixing structural and metric techniques is used to detect pattern instances. More
recently, in [4], DP recovery is obtained using an ontology formalism. DP restrictions are
formalized and translated into rules that are executed on a knowledge base populated with semantic
descriptions of library code.
Some studies have also focused on the formalization of empirical evaluation criteria
[25, 19, 60]. Each applied technique should be evaluated using well-defined criteria, and different
authors have proposed taxonomies and related frameworks to perform such evaluations.
The approach we propose is based on a system meta-model, and a DSL, able to represent
elements down to statements and expressions. This makes it possible to reason about structural and
behavioral properties that can be used (i) to improve search space reduction and (ii) to distinguish
between patterns that have the same structure but behave (or are used) in different ways [61]. Thus
the type of analysis includes structural and behavioral analysis, with a graph matching searching
method.

2.3. Comparison with the proposed approach


As discussed above, a large number of DP mining approaches and tools have been
implemented in recent years. Accordingly, several studies compare and describe existing
approaches [26, 19, 52]. In Table I, starting from these works and from an analysis of the
literature, we summarize the most relevant approaches to DP mining. For each approach, the table
reports the authors, a synthesis of the adopted strategy, the supporting tool (if any), the
programming language of the mined source code, and the searched DPs. Moreover, the systems used in
the tool experimentation and the average precision obtained are shown. The main advantage of the
DPF approach over most existing ones is that it makes it possible to identify variant forms of the
classic DPs (as known in the literature). This is particularly important since DPs are present in
real-world systems in many different variants [61, 57]. Compared with existing approaches, which are
usually based on a fixed subset of relationships among source code elements, our approach is based
on a richer meta-model that takes into account both structural and behavioral relationships (such as
delegation, calls, dependency and object creation). Moreover, the “override” capability of the
declarative DSL specifications makes the approach flexible in detecting new patterns, domain
specific patterns and architectural ones.
As shown in the table, there are tools for discovering patterns in Java source code and tools
for mining patterns from C++. Starting with the C++ mining tools, in [41] the authors use Datrix as
an intermediate format to express a wide set of source code properties in order to model DPs. DPF
extends the set of such properties and introduces a DSL to express pattern specifications in
a declarative fashion. Moreover, inheritance among specifications makes it possible to mine variants
without changes to the pattern detection algorithm. In [52], a Rational Rose C++ analyzer enables
the extraction of DP UML diagrams from C++ source code. As the authors observed, the language used
to write the detection cannot express several key behavioral properties, and: (i) the approach is
not able to distinguish among patterns with the same structure but different behaviour; (ii) it is
difficult to extend the approach to consider new kinds of properties that the model does not take
into account.
Another C++ code mining tool is Pat [43], where the fundamental idea for the automated search
is to represent both patterns and designs in Prolog. Unfortunately, some information that would also
be relevant for a precise search for pattern instances is not completely extracted by the structural
analysis proposed there.
DPF overcomes the limitations of the tools in [52] and [43] because it is independent of any
modeling approach or tool, and its meta-model and DSL make it possible to model a very broad variety
of structural and behavioral properties in a declarative way, so that the original properties can
easily be extended without changing the mining algorithm.
Another DP mining tool is Crocopat [16]. This tool works exclusively on the RSF input format. To use
Crocopat on Java projects, a tool is needed that parses Java source code and produces the Crocopat
input format from it. Crocopat is compared to Columbus [10] in [28]. This tool is versatile and
easily extensible (given the high number of available plug-ins) but, unlike our approach,
it requires the systems to be annotated with the Design Pattern Markup Language to describe DPs.
Among the Java mining tools, FUJABA [48] and SPQR [57] were considered. These tools are based on a
very similar decomposition method. With respect to FUJABA, DPF introduces an explicit DSL to
express pattern specifications declaratively. Moreover, in order to improve scalability by reducing
the search space, our approach allows variants to be defined using inheritance. An approach to DP
identification using a high-performance bit-vector algorithm is proposed in [40]. The approach is
more efficient in terms of space and compactness of representation but is based on a restricted set
of properties with respect to our approach. Comparing DPF with DeMIMA [33] is quite complicated
because DeMIMA is a very different, multilayer approach. Based on the precision results shown in the
table, we can suppose that DPF offers a higher precision because it can be configured using a set of
variables. The MARPLE tool [8] is based on an interesting graph matching technique using an
attributed relational graph in which types are nodes and microstructures are associated with edges.
Compared to the DPF approach, MARPLE is less able to reduce the search space and, according to the
literature, it has been validated on just a few DPs.

Figure 1. The meta-model represented as a UML class diagram

The DPRE tool [19] is based on the XPG grammar formalism to express patterns. Once the
grammar is defined, the Visual Programming Environment Generator produces a visual editor and an
XpLR parser from it. XPG grammars make it possible to represent a set of properties using the
terminals as building blocks (i.e., Class, Aggregation and Inheritance). Our approach uses a
different matching algorithm, based on a graph model that also takes delegation, object creation,
dependency and containment into account. Moreover, DPRE requires code generation in order to detect
new patterns (or variants), since the grammar must be used to generate the pattern and the related
visual editor. In contrast, our matching algorithm is applied to graphs that are generated by means
of a run-time translation of the DSL into graphs and hence does not require any code generation or
integration step. While this could increase detection times with respect to DPRE, it makes it
possible to reach better precision, since the set of patterns and their variants can be effectively
customized for any given context. An extension of the DPRE approach has been proposed in [20, 22].
Finally, there are some additional approaches that are not reported in the table for brevity. In
[67, 66] a technique for DP recognition is provided that complements existing static analysis with a
dynamic analysis. In contrast with our approach (which uses static analysis even to recover
behavioral properties), a dynamic analysis is introduced there to transform DP behavioral aspects
into finite automata identifying relevant method calls. Dynamic analysis is also used in MoDeC [47]
to describe behavioral and creational patterns as collaborations among objects in the form of
scenario diagrams, and in [39], where a test program is executed on the system to produce traces for
the behavior parser. Dynamic analysis should provide better precision in recovering pattern behavior
[47] but requires a complete system execution and the generation of all the pattern-relevant traces.
Moreover, we have selected a wide set of properties that make it possible to infer pattern behavior
statically. The results of our study compare well with those of approaches based on dynamic
analysis. Finally, a more recent tool is DPJF [17], which implements an approach similar to the
proposed one. It improves precision by performing a set of specific behavioural analyses on the
source code (e.g., forward to a single object, field maintenance, state propagation) to filter out
false positives. Our approach, while not based on such specific analyses, makes it possible to build
most of them declaratively using DSL constructs, improving the overall flexibility of the mining
process.


3. THE DETECTION APPROACH

The detection approach defines and exploits a meta-model and a Domain Specific Language (DSL)
to model the structure of both the software system and the DPs to be detected.

3.1. The Meta-model and the Domain Specific Language


The meta-model uniformly describes the DPs and systems in terms of relationships among code
elements. Both the structural (e.g., inheritance, implementation, type nesting and visibility) and
behavioral relationships (e.g., delegation, object creation) among the Types are traced down to
the DPs' Properties and Types components. The meta-model is shown in Figure 1, where a
UML class diagram represents the structure of an OO system (i.e., its Types and the structural
relationships among them), the structure of the DPs (i.e., the set of Properties modelling their
structural elements) and the relationships among the DPs' code elements and the Types. Starting
from the upper part of the figure, we can observe that the structure of a system is modeled as a set
of Types (i.e., Container, Value, Reference, and Compound Types ∗) along with their relationships.
Reference Types are composed of Fields and Methods, and a Method can have zero or more
Arguments. A ReferenceType can inherit from another ReferenceType and can also contain another
ReferenceType (e.g., an inner class). The bottom part of the diagram defines a DP as the aggregation
of several Properties characterizing it:

• Classifier: it introduces a Type (Class or Interface) used in a pattern specification or
modifies an already existing Type. Moreover, this property models a role needed
by the pattern with respect to its required internal structure and relationships with other
Classifiers. Finally, it allows constraints, if any, to be defined on its super-type or its implemented
interfaces.
• Data: it defines a field in an existing Classifier or overrides an existing field. This
property can specify an existing Classifier as the field's type or a compound type of an existing
Classifier (like an array or a generic Collection).
• Behavioral: it defines a method in an existing Classifier or overrides a method's
definition. A method definition includes the specification of its return type and its arguments
along with their optionality flag (indicating whether the element is mandatory for the pattern
specification or not). This property can be used to define one or more of the required (or
optional) behaviours of the Classifiers introduced in a pattern specification.
• Dependency: it describes the dependency between a pattern element (like a method) and
another pattern element (such as another method or a field).
• Invocation: it models a call between methods of Classifiers defined in the pattern specification.
• Delegation: it specifies a mapping between a set of methods of a Class and a set of methods
of an existing Classifier in the pattern specification. This makes it possible to take delegation
into account for the patterns that require it.
• Object Creation: it models the creation constraints, specifying the method or the field that
needs the object creation and the Classifier of the created object. This happens for patterns
expressing a mandatory object creation semantics, as in the case of creational patterns but also
for many patterns in the other categories.
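
As a rough illustration of how the Properties listed above could be held in memory, the following Python sketch models a pattern as an aggregation of typed properties. The class and attribute names (PropertyKind, Property, PatternSpec, source, target, optional) and the example elements are assumptions made for this sketch; they are not taken from the meta-model of Figure 1, and the optionality flag is simplified to a single boolean per property.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class PropertyKind(Enum):
    """One member per property kind listed in the meta-model description."""
    CLASSIFIER = "classifier"
    DATA = "data"
    BEHAVIORAL = "behavioral"
    DEPENDENCY = "dependency"
    INVOCATION = "invocation"
    DELEGATION = "delegation"
    OBJECT_CREATION = "object-creation"


@dataclass(frozen=True)
class Property:
    kind: PropertyKind
    source: str                    # pattern element the property is attached to
    target: Optional[str] = None   # second element involved, if any
    optional: bool = False         # simplified optionality flag (False = mandatory)


@dataclass
class PatternSpec:
    name: str
    properties: List[Property] = field(default_factory=list)

    def mandatory(self) -> List[Property]:
        """Properties that every detected instance must satisfy."""
        return [p for p in self.properties if not p.optional]


# Hypothetical fragment of an Observer-like specification.
observer = PatternSpec("observer", [
    Property(PropertyKind.CLASSIFIER, "AO"),
    Property(PropertyKind.CLASSIFIER, "CS"),
    Property(PropertyKind.BEHAVIORAL, "AO.U"),
    Property(PropertyKind.INVOCATION, "CS.N", "AO.U"),
    Property(PropertyKind.OBJECT_CREATION, "CS.c", "CS.o"),
    Property(PropertyKind.DATA, "CS.cache", optional=True),
])
```

Here observer.mandatory() returns the five properties without the optional flag, which are the ones a matcher would have to cover.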

∗ Compound types are treated as separate types since they must specify the base type of the compound. Arrays and
generic types also belong to this class.

This part of the meta-model is also used as the basis for defining the DSL, which represents the relevant
structural and behavioral properties of OO software systems.
Our DSL takes inspiration from existing pattern detection languages, such as
SDF, Crocopat and Grok [37, 15, 31]. In particular, for pattern mining, two requirements
strongly point towards using a domain-specific language: (i) the need to express the structure of a
software system (and of patterns) by composing recursive rules; and (ii) the need (and difficulty) of
succinctly representing effective constraints on such a structure. Starting from the existing languages,
the DSL was defined with the aim of expressing design pattern specifications with the following goals:
• a specification should be writable by the analyst with reduced effort;
• the DSL should allow constraints on source code structure and behavior to be expressed, in order to model
complex DPs;
• the DSL should support the definition of pattern variants (using inheritance among
specifications) to foster reuse.
As an example of how a pattern is modeled using the DSL, let us consider a classic Observer DP
(supporting only a single kind of event for each notify method) as proposed in the literature [29]. Figure
2 shows the DSL specification for such an Observer DP. Each specification is just a sequence of type
blocks: each block specifies the set of properties that must hold for a role in the pattern (including
the constraints on the allowed multiplicity, reported in the brackets just after the type name; if no
brackets follow the type name, the default multiplicity is 1).

pattern observer {
  type AS(1) {
    has method A, R;
    has method N;
    has container o of type AO;
  }
  type AO(1) {
    has method U;
  }
  type CO(*) {
    inherits-from AO;
  }
  type CS(*) {
    inherits-from AS;
    has constructor c {
      object-creation o;
    }
    overrides methods [A, R] each {
      delegates to o;
    }
    overrides method N each {
      delegates to o;
      calls U in AO.U;
    }
  }
}

Figure 2. An Example of DSL instance: the Observer Pattern specification

As shown in Figure 2, the Observer specification requires:
• a single AbstractObserver (AO) and several ConcreteObservers (CO);
• a single AbstractSubject (AS) and several ConcreteSubjects (CS);
• a container of AbstractObservers to be defined in the ConcreteSubject (the field “o”);
• the methods A and R (that play the roles of add and remove) to be defined in the AbstractSubject
and overridden in ConcreteSubjects;
• a Delegation to be defined between A and R of ConcreteSubject and the add/remove methods
of the Container type;
• the notify method (called “N”); the method N must contain an invocation towards the update
method U of the AbstractObserver classifier;
• an object creation (to initialize the container field “o”) in the constructor of the
ConcreteSubject type.
Each specification can be translated into a graph in which elements are nodes and properties
are labelled edges. This graph, as better explained in Section 3.2, is part of the input for a two-
pass graph-matching detection algorithm. Figure 4, on the top right side, shows an excerpt of the
graph representing the Observer as described in the specification of Figure 2. This graph reports the
key elements specified in the DSL together with the relationships among them †. A variant of
this Observer can easily be defined by deriving it from the DSL specification of Figure 2. Structural
elements that need to be changed can be overridden. For instance, Figure 7, on the right side, shows
the graph of a common multi-event Observer. This variant redefines the elements “N” and “U” as
sets of methods to take into account different kinds of events and notification handlers.
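
The translation from specification to graph can be pictured with a small Python sketch. It assumes a hand-written dictionary encoding of part of the Observer specification of Figure 2; the edge labels ("has-method", "calls", and so on), the dotted member names and the function spec_to_graph are illustrative choices, not the output format of the actual DSL translator.

```python
def spec_to_graph(spec):
    """Turn a {element: [(label, target), ...]} mapping into a labelled graph.

    Returns (nodes, edges) where edges is a set of (source, label, target).
    """
    nodes, edges = set(), set()
    for element, properties in spec.items():
        nodes.add(element)
        for label, target in properties:
            nodes.add(target)
            edges.add((element, label, target))
    return nodes, edges


# Assumed encoding of a fragment of the Observer specification (Figure 2).
observer_spec = {
    "AS": [("has-method", "AS.A"), ("has-method", "AS.R"),
           ("has-method", "AS.N"), ("has-container", "o")],
    "AO": [("has-method", "AO.U")],
    "CO": [("inherits-from", "AO")],
    "CS": [("inherits-from", "AS"), ("object-creation", "o")],
    "CS.N": [("calls", "AO.U"), ("delegates-to", "o")],
}

nodes, edges = spec_to_graph(observer_spec)
```

Elements become nodes and each property becomes one labelled edge, so the invocation from the notify method to the update method appears as the edge ("CS.N", "calls", "AO.U").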

3.2. The Pattern Detection process


The pattern detection process comprises the following main steps:

• definition of the pattern specifications repository: each specification is written according
to the proposed DSL and organized in a catalog stored in a repository.
• pattern models instantiation: the DSL specifications are parsed to generate the Design
Pattern Graphs (DPGs) to be detected.
• system source code analysis: the source code of the system under study is parsed and the
complete ASTs of the system are produced.
• generation of an instance of the system model: a traversal of the system ASTs is performed
to generate an instance of the system model, also represented as a graph (called the System
Graph - S). Rapid type analysis (RTA), class flattening and inlining of non-public methods are
exploited in order to build a representation of the system suitable for the matching algorithm ‡.
• design patterns matching: S is traversed and a matching algorithm is executed to
identify the implemented patterns. During the detection each pattern instance is mapped to the
corresponding matching design pattern graph.

In order to describe the detection algorithm, we provide the definitions of the notations and
concepts used in it.
A DPG can be considered as an attributed graph specifying a set of predicates on the attributes
that must hold. It is the building block of the detection process and is used to identify sub-graphs of
interest occurring in the system graph. Formally:
Definition 1
DPG — A design pattern graph is a pair DPG = (P, AC), where P is an attributed graph defined
by P = (V, E), where V and E are respectively its nodes and edges. AC is a set of predicates on the
attributes that contains compound expressions made of conditions on nodes, edges and attributes of
P. §
We introduce at this point the definition of DPG matching which generalizes sub-graph
isomorphism with evaluation of the predicates on the attributes.
Definition 2
DPG Matching — A design pattern graph DPG(P, AC) is matched with a system graph S if there
exists an injective mapping ϕ : V (P ) → V (S) such that: (i) ∀e(u, v) ∈ E(P ), (ϕ(u), ϕ(v)) is an
edge in S, and (ii) predicate ACϕ (S) holds.
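
To make Definition 2 concrete, the following Python sketch checks whether a given mapping is a DPG match. It is a minimal illustration under simplifying assumptions: edges are unlabelled pairs, the predicate AC is passed as an ordinary function of the mapping, and the function name is_dpg_match and the example class names are hypothetical.

```python
def is_dpg_match(pattern_nodes, pattern_edges, system_edges, phi,
                 predicate=lambda mapping: True):
    """Check the two conditions of Definition 2 for a candidate mapping phi."""
    # phi must be defined on every pattern node.
    if set(phi) != set(pattern_nodes):
        return False
    # phi must be injective: no two pattern nodes share a system node.
    if len(set(phi.values())) != len(phi):
        return False
    # (i) every pattern edge must be mapped onto an edge of the system graph.
    for u, v in pattern_edges:
        if (phi[u], phi[v]) not in system_edges:
            return False
    # (ii) the attribute predicate must hold under the mapping.
    return predicate(phi)


# Pattern: CO inherits from AO. System: two inheritance edges.
pattern_nodes = {"AO", "CO"}
pattern_edges = {("CO", "AO")}
system_edges = {("TextObserver", "Observer"), ("Circle", "Shape")}

good = is_dpg_match(pattern_nodes, pattern_edges, system_edges,
                    {"AO": "Observer", "CO": "TextObserver"})
not_injective = is_dpg_match(pattern_nodes, pattern_edges, system_edges,
                             {"AO": "Observer", "CO": "Observer"})
missing_edge = is_dpg_match(pattern_nodes, pattern_edges, system_edges,
                            {"AO": "Shape", "CO": "TextObserver"})
```

Only the first mapping is accepted: the second violates injectivity and the third maps the pattern edge onto a pair that is not an edge of S.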
If a DPG is matched to a system graph, the binding between them can be used to access the
sub-graph on the system (either the sub-graph structure or attributes and properties on nodes and
edges).
We define a matched DPG to denote the binding between a DPG and the system graph as follows:

† To keep the figure concise, each node/edge is labelled with the initial letter of the corresponding field or method in the
DSL. Moreover, not all properties are represented; for instance, type CS has several overrides and a constructor that are
all omitted.
‡ Note that RTA is used to handle late binding and hence the computed call graph reports a super-set of the real calls that
can be executed at run-time. This, however, lowers the precision only in very few cases. A discussion of the impact on the
detection quality is reported in the threats to validity section.
§ Compound predicates can be broken down into simple predicates on individual nodes or edges (or sets of them).


 1
 2  List<MatchedGraph> instances = ...;
 3
 4  void start()
 5  begin
 6    forwardNeighborhoodAnalysis(DP)
 7    for i = 1 to k do
 8      Match(i);
 9    end
10  end
11
12  void forwardNeighborhoodAnalysis(DP)
13  begin
14    foreach node u in DP do
15      ϕ(u) = { v in V(S) | ACu(v) = true }
16      (1) computation ϕ(u)
17      (2) reduce ϕ(u1) ... ϕ(uk) using
18          lookahead and properties constraint
19    end
20  end
21
22  void Match(i)
23  begin
24    foreach v in ϕ(ui) | v is free do
25      if not checkNeighborhoodBindings(ui, v)
26        then continue;
27      ϕ(ui) = v;
28      if i < |V(DP)| then Match(i + 1);
29      else
30        if ACϕ(S) then
31          instances.add(ϕ());
32    end
33  end
34
35  boolean checkNeighborhoodBindings(ui, v)
36  begin
37    foreach edge e(ui, uj) in E(DP), j < i do
38      if (edge e1(v, ϕ(uj)) not in E(S))
39          or (not ACe(e1)) then
40        return false;
41    end
42    return true;
43  end

Figure 3. A sketch of the detection algorithm.

Definition 3
Matched DPG — Given an injective mapping ϕ between a DPG and a graph S, a matched graph is
a triple ⟨ϕ, DP, S⟩ and is denoted by ϕDP (S).
Figure 3 outlines the detection algorithm. The specification expressed as a design pattern graph
DP is rewritten by means of a set of predicates on individual nodes (ACu) and edges (ACe). For each
node u in the pattern DP, there is a set of candidate matched nodes in S for which the constraints
ACu hold. These nodes define a (partial) matched design pattern graph referred to as the candidate
neighborhood of node u and denoted by ϕ(u):
Definition 4
Candidate Neighborhood — The candidate neighborhood ϕ(u) of node u is the set of nodes in graph
S that satisfies the predicate ACu :
ϕ(u) = {v | v ∈ V (S), ACu (v) = true}
Hence the search space of a DPG matching a pattern DP on the system S is defined by the
candidate neighborhoods of all nodes belonging to the pattern specifications as follows:


Figure 4. A running example showing the candidate neighborhood analysis.

Definition 5
Search Space — The search space of a DPG on a system graph S is defined by the candidate
neighborhoods in S of all DPG nodes. It corresponds to the Cartesian product of the candidate
neighborhoods of the DPG nodes: ϕ(u1) × . . . × ϕ(uk), where u1, . . . , uk ∈ V(DP) and each ϕ(ui) ⊆ V(S).
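
Definitions 4 and 5 can be illustrated with a few lines of Python. The sketch below assumes that system nodes carry a small attribute dictionary and that each predicate ACu is an ordinary function over those attributes; the attribute names, the predicates and the class names are invented for the example.

```python
from math import prod


def candidate_neighborhood(system_nodes, ac_u):
    """phi(u): the system nodes whose attributes satisfy the predicate AC_u."""
    return {v for v, attrs in system_nodes.items() if ac_u(attrs)}


# Hypothetical system graph nodes with a few attributes each.
system_nodes = {
    "Figure": {"abstract": True, "methods": 3},
    "FigureListener": {"abstract": True, "methods": 1},
    "Text": {"abstract": False, "methods": 2},
    "Canvas": {"abstract": False, "methods": 4},
}

# Assumed node predicates for two roles of the Observer pattern.
node_predicates = {
    "AS": lambda a: a["abstract"] and a["methods"] >= 3,
    "AO": lambda a: a["abstract"] and a["methods"] >= 1,
}

phi = {u: candidate_neighborhood(system_nodes, ac)
       for u, ac in node_predicates.items()}

# Size of the search space: product of the candidate neighborhood sizes.
search_space_size = prod(len(cands) for cands in phi.values())
unconstrained_size = len(system_nodes) ** len(node_predicates)
```

With these predicates the search space shrinks from 16 pairs (every node for every role) to 2, which is the effect that the first phase of the algorithm relies on.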

The algorithm can be seen as composed of two phases. The first one starts at line 4 by calling
(line 6) the forwardNeighborhoodAnalysis function (lines 12-20), which computes the candidate
neighborhood for each node uh in the DPG. This function ends after performing a pruning step of
the search space (better described in the following).
The second phase (lines 22-43) performs a search over the product ϕ(u1) × . . . × ϕ(uk), using
a depth-first traversal, to find a sub-graph isomorphism. The Match(i) function iterates on the ith
node to find valid bindings for that node. Procedure checkNeighborhoodBindings(ui, v) examines
whether ui can be mapped to v by considering their edges and attributes. Line 27 maps the node ui to
v. Lines 28-32 continue the search with the next node or, if it is the last node, evaluate the predicate
AC to check constraints. If it is true, then a valid binding ϕ : V(DP) → V(S) has been found and is
added to the list (line 31). Since the worst-case complexity of the matching algorithm is O(n^k),
where n = |S| and k = |P|, a search space reduction technique must be used to make the algorithm
usable on real systems. Our approach uses system and pattern information to reduce the size of
candidate neighborhoods and exploits a look-ahead requiring, for each node ui of the DPG, a valid
(partial) binding of the neighborhood sub-graph centered in ui and having a fixed distance r from it.
For each candidate neighborhood, structural information (e.g., nodes and edges) and predicates
on attributes (types and properties of nodes and edges) are used to prune matches that would not
produce acceptable solutions. This neighborhood knowledge can be exploited to prune unfeasible
sub-graphs at an early stage and obtain a reduced set of candidates on which to perform the full
depth-first matching (which is resource- and time-expensive). There is a trade-off with respect to
how candidate neighborhood sub-graphs are built. Their pruning power increases as the look-ahead
increases, but their construction is of polynomial complexity (with respect to the look-ahead). The
current implementation uses a look-ahead equal to 1 (immediate neighborhood). We found no
improvement with a look-ahead equal to 2: even if for some patterns (those that have a
rich or highly constrained structure) time was greatly reduced, for others the bigger neighborhood
analysis increased the total time (i.e., the average time remained almost the same). Figure 4 reports
a simple running example of the detection process. It shows the candidate neighborhood analysis
and the resulting bindings related to the pattern specification of Figure 2 as performed on a subset
of the roles (only AS and AO) of the Observer DPG and on a small portion of a system graph
(respectively DP and S in Figure 4). For each pair of nodes ui ∈ V(DP) and vj ∈ V(S), the
neighborhood sub-graphs of ui and vj are matched to find the candidate neighborhood. In step 1 the
pair considered is (Text, AS) and hence the immediate neighborhoods of respectively DP and S are
considered. In this case the match fails since the pattern node neighborhood is not a sub-graph of S.
Conversely, step 2 on the pair (AS, Figure) is a successful match since the AS and Figure neighborhoods
are congruent and all the constraints are satisfied (nodes and edges are of the same types, and
multiplicity constraints are met). Hence several conditioned bindings are established for the matched
candidate neighborhood. Step h is for the pair (Figure, AO). Due to structural differences this
match fails (correctly), avoiding an unfeasible candidate neighborhood (since Figure is not an
AbstractObserver). This is because the structure of the Observer DP has quite good pruning power.
Simpler patterns may generate a higher number of candidate neighborhoods that must be taken
into account in the second phase of the algorithm, in which the full depth-first matching
increases time and space requirements. Hence, to further reduce the search space, the algorithm is
executed on all the DPGs in the specification repository and the previous bindings are taken into
account when performing the subsequent candidate neighborhood analysis.
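
The second phase can be sketched as a small backtracking search in Python. This is not the implementation of the DPF tool: it is a simplified reading of Figure 3 in which the candidate neighborhoods are passed in already computed, edges are (source, label, target) triples, the edge predicate ACe is reduced to label equality, and the search is started once from the first pattern node. All names, including find_instances, are hypothetical.

```python
def find_instances(pattern_nodes, pattern_edges, system_edges, candidates,
                   predicate=lambda mapping: True):
    """Depth-first search over the candidate neighborhoods.

    pattern_nodes: ordered list u_1..u_k
    pattern_edges, system_edges: sets of (source, label, target)
    candidates: dict mapping each pattern node to its candidate system nodes
    """
    instances = []
    phi = {}  # current partial binding: pattern node -> system node

    def check_neighborhood_bindings(u_i, v):
        # Every pattern edge between u_i and an already bound node must
        # exist, with the same label, between the bound system nodes.
        trial = dict(phi)
        trial[u_i] = v
        for src, label, dst in pattern_edges:
            if u_i in (src, dst) and src in trial and dst in trial:
                if (trial[src], label, trial[dst]) not in system_edges:
                    return False
        return True

    def match(i):
        u_i = pattern_nodes[i]
        for v in sorted(candidates[u_i]):
            if v in phi.values():          # v is not free
                continue
            if not check_neighborhood_bindings(u_i, v):
                continue
            phi[u_i] = v
            if i + 1 < len(pattern_nodes):
                match(i + 1)
            elif predicate(phi):
                instances.append(dict(phi))
            del phi[u_i]                   # backtrack

    if pattern_nodes:
        match(0)
    return instances


pattern_nodes = ["AS", "CS", "AO", "CO"]
pattern_edges = {("CS", "inherits-from", "AS"),
                 ("CO", "inherits-from", "AO"),
                 ("AS", "has-container-of", "AO")}
system_edges = {("StandardFigure", "inherits-from", "Figure"),
                ("View", "inherits-from", "FigureListener"),
                ("Figure", "has-container-of", "FigureListener"),
                ("Text", "inherits-from", "Canvas")}
all_types = {"Figure", "StandardFigure", "FigureListener", "View", "Text", "Canvas"}
candidates = {u: set(all_types) for u in pattern_nodes}

found = find_instances(pattern_nodes, pattern_edges, system_edges, candidates)
```

Even with unpruned candidate sets, the neighborhood check rejects most partial bindings early, and the only binding returned maps AS to Figure, CS to StandardFigure, AO to FigureListener and CO to View.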
For some pattern models, the same element could be bound, in the general case, to several patterns.
This, however, is not true for all pattern elements. For instance, the binding of the visit() method of the Visitor
pattern should not allow bindings to other patterns (e.g., as the execute() method of a
Command or the notify() method of an Observer). When a pattern model explicitly forbids multiple
bindings for a pattern element, existing established bindings of already analyzed patterns are used
as further constraints to improve the search space reduction (pruning unfeasible bindings as early as
possible).
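
One possible reading of this rule is sketched below in Python: system elements already held by a unique-binding role of a previously analyzed pattern are removed from the candidate sets of roles that forbid multiple bindings. The function name prune_single_bind, the role names and the element names are assumptions made for the example.

```python
def prune_single_bind(candidates, single_bind_roles, already_bound):
    """Drop already bound system elements from single-bind roles only."""
    pruned = {}
    for role, elements in candidates.items():
        if role in single_bind_roles:
            pruned[role] = set(elements) - set(already_bound)
        else:
            pruned[role] = set(elements)
    return pruned


# Candidate sets for two roles of a Visitor specification (hypothetical).
visitor_candidates = {
    "visit": {"Node.accept", "Printer.visit", "Job.execute"},
    "Visitor": {"Printer", "Job"},
}
# Job.execute was bound as the execute() method of a Command earlier on.
bound_by_command = {"Job.execute"}

pruned = prune_single_bind(visitor_candidates, {"visit"}, bound_by_command)
```

The visit role loses Job.execute as a candidate, while the Visitor role, which is not declared single-bind here, keeps its full candidate set.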
Variants are handled in the same way as other specifications, with no special treatment within the
detection process. The only difference regards how their sub-graphs are built (taking into account
the specification inheritance relationships and using a flattening approach). The resulting variant
graph contains both the properties inherited from their super-specification and the overridden ones.
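
The flattening of a variant over its super-specification can be pictured as a recursive merge. The Python sketch below assumes that a specification is a dictionary with an optional "extends" entry and a "roles" mapping; this encoding and the function name flatten are illustrative only, while the overridden elements (N and U redefined as sets of methods) follow the multi-event Observer variant described earlier.

```python
def flatten(spec_name, specs):
    """Merge a specification with everything inherited from its ancestors."""
    spec = specs[spec_name]
    parent = spec.get("extends")
    inherited = flatten(parent, specs) if parent else {}
    # Copy so that the parent specification is never modified.
    merged = {role: dict(props) for role, props in inherited.items()}
    for role, props in spec["roles"].items():
        merged.setdefault(role, {}).update(props)  # overrides win
    return merged


specs = {
    "observer": {
        "extends": None,
        "roles": {
            "AS": {"A": "method", "R": "method", "N": "method"},
            "AO": {"U": "method"},
        },
    },
    "observer-multievent": {
        "extends": "observer",
        "roles": {
            "AS": {"N": "methods-set"},
            "AO": {"U": "methods-set"},
            "EH": {"G": "methods-set"},
        },
    },
}

flat = flatten("observer-multievent", specs)
```

The flattened variant keeps A and R as inherited, replaces N and U with their overridden definitions and adds the new role EH; the base specification is left untouched.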


The candidate neighborhood analysis is performed for each pattern role and for each system
element ¶. After the analysis the bindings are merged together and verified as a whole, and hence
results are not influenced by role ordering. Moreover, after the candidate neighborhood analysis has been
performed, all the candidate bindings are verified to see: (i) if they cover all mandatory pattern
properties, and (ii) if all the matched candidate bindings hold. If both conditions are met, the set of
candidate bindings represents a valid pattern instance linking each found pattern role to the set of
system elements implementing it.
The execution times are influenced by the ordering of the pattern specifications, since existing
bindings for single-bind roles are used to prune the list of system elements to consider during
subsequent candidate neighborhood analyses (reducing execution times).

4. THE PATTERN SPECIFICATIONS CATALOG

A set of DSL specifications of the most commonly used DPs has been written and stored in the DP’s
specifications catalog repository.
The currently defined catalog is shown in Figure 5, where each DP is represented as the root
of a hierarchy (i.e., the darker rectangle(s) in each box) while each descendant (i.e., the lighter
rectangles) represents a DP variant. The catalog is composed of 18 patterns detected by 56 variants.
The detection relationship between a variant and the mined pattern is depicted using a dashed arrow.
Inheritance between specifications is depicted as the standard UML generalization.
A description of the DSL code and DPGs for the most relevant DP specifications and their
variants is provided in the Appendix.
In the remainder of this section, in order to show how DSL statements are used to specify the design
patterns to mine, the Observer multi-event variant is described in more detail.

4.1. Observer
In Section 3 a description of both DSL and DPG for the Observer DP has been provided. Here, we
give the DSL specification and DPG of a variant of this DP, known as multi-event Observer and
often used in real-world software systems.
This variant takes into account also notify methods with one or more parameters and indirect calls
to update methods. The structure of this specification, reported in Figure 6, is quite different from
the one proposed in literature (shown in Figure 2) and requires the following properties:

• a single AbstractObserver (AO) and several ConcreteObservers (CO);
• a single AbstractSubject (AS) and several ConcreteSubjects (CS);
• a container of AbstractObservers to be defined in the ConcreteSubject (the field “o”);
• an abstract event class EH modeling the event concept;
• one or more concrete event classes CE modeling the concrete events;
• the methods A and R (that play the roles of add and remove) to be defined in the AbstractSubject
and overridden in ConcreteSubjects;
• a Delegation to be defined between A and R of ConcreteSubject and the add/remove methods
of the Container type;
• a set of notify methods (called “N”); each method of the set must contain an invocation towards
the update method of the AbstractObserver classifier;
• an object creation (to initialize the field “o”) in the constructor of the ConcreteSubject type.

These properties together define the DPG reported in Figure 7.


As a final consideration, the DSL specifications are, of course, critical to the whole detection
process, which is deeply affected by the correctness, completeness and conciseness with which
they are written. For instance, writing a specification with conflicting rules is legal within our DSL,
but it will result in poor mining performance and will obviously produce no results.

¶ Actually, for each system element that has not already been bound to a pattern role that explicitly requires unique bindings
for it.

[Figure 5: diagram of the specification hierarchies, one box per pattern (Prototype, Observer, Proxy, Command,
Factory Method, Abstract Factory, Template Method, Singleton, Composite, Decorator, Memento, Adapter, Bridge,
Iterator, State, Strategy, Builder, Visitor), each containing its root specification(s) and the derived variants.]

Figure 5. The DPs specifications hierarchies


pattern Observer multievent {
  type AS(1) {
    has method A, R
    has methods-set N
    has container o of-type CO
  }
  type AO(1) {
    has methods-set U
  }
  type EH(1) {
    has method-set G
  }
  type CE(*) {
    inherits from EH
  }
  type CO(*) { inherits-from AO }
  type CS(*) {
    inherits from AS
    has constructor c {
      object-creation o
    }
    overrides method A, R {
      delegates to o
    }
    overrides methods-set N
    each {
      delegates to o
      calls u in CO.U
    }
  }
}

Figure 6. DSL of the Multi-event Observer specification.

Figure 7. The Multi-Event Observer Design Pattern Graph

5. THE DESIGN PATTERNS FINDER TOOL

The Design Patterns Finder (DPF) Tool implements all the steps of the identification process. It was
developed as a set of Eclipse plug-ins based upon JDT and the EMF framework. Figure 8
shows the overall architecture of the DPF tool. It is a layered architecture: the bottom layer includes
the Eclipse Foundation Components [1]. Indeed, the tool uses the JDT and the Eclipse Modeling
Platform (Xpand, Xtext and MoDisco) to extract the needed information about the system's static
structure.

Figure 8. The architecture of Design Pattern Finder tool.

Figure 9. DP-Finder Tool: DPF integration in Project Explorer

Figure 10. DP-Finder Tool: The DP Specification Editor

The middle layer (DPF Core) includes the main components of the DP identification process.
Three main sub-systems are included in this layer. The Project Analyzer produces an instance of
the meta-model (i.e., the System Graph) of the analyzed system by static analysis. To this end, the
following information is extracted from the analyzed system: type hierarchy, type inner structure
(attributes, their types and scopes and so on), method and constructor signatures, method calls,
object creations and container support in order to express containment within types, static member
information, delegation.
The Pattern Specification Parser and Translator sub-system registers the DSL specifications of
each DP, and parses them to produce the corresponding DPGs. Each specification is written using
the defined DSL and is translated, by means of the Xtext-based DSL translator, into a set of constraints.
A Pattern Catalog stores the specifications (both DSLs and corresponding DPGs) of the DPs to
be mined (currently the Factory Method, Prototype, Singleton, Adapter, State, Strategy, Composite,
Decorator, Observer, Memento, Template Method, Command, Proxy, Bridge and Visitor DPs are
included in the catalog). Each pattern specification can be standalone or can override a base
specification by changing some of its properties.
The Design Pattern Finder Engine is the heart of the mining process. It executes the detection
algorithm to identify the DP instances by searching the system graph for sub-graphs matching
a defined DPG. Once the system has been parsed and the meta-model instance is built, the user
can select which DPs are to be detected. The execution of the algorithm produces a model of the
system elements annotated with information on the detected patterns (including their internal
members and pattern roles). Each identified pattern instance is traced to the source code elements
implementing it, and the tool allows a user to visualize, inspect and analyze such code components.
The results of the identification process are also stored in the central repository.
The DPF IDE layer allows interactions with users using the tool as shown in figures 9 and 10.
The user can select which patterns are to be searched and analyzed in a system by means of a tree
viewer or using the Eclipse Visualizer. The Visualizer also shows a summary of patterns found in
the analyzed system at different levels (package or class).


System Name    Version    Size (KLOC)    #Types    #Methods
Junit          3.7        4,9K           104       648
Lexi           0.1        7,1K           100       677
JHotDraw       5.1        8,9K           174       1316
QuickUML       2.1        9,2K           230       1082
Nutch          0.4        23,6K          335       1854
PMD            1.8        41,5K          519       3665
JRefactory     2.6.24     98,5K          568       4234
Log4J          1.2.15     43,7K          176       863
JHotDraw       7          78,5K          567       5728
Voldemort      1.3.x      85,9K          382       5312
Apache Avro    1.6        125,2K         1085      8451
JDT            3.6.1      511,5K         1655      24153

Table II. Analyzed systems characteristics

However, a full description of the functionalities provided by the DPF tool and of how a user can
interact with DPF is out of the scope of this paper.

6. CASE STUDY

The effectiveness and efficiency of the proposed approach have been validated by applying it to twelve
OO systems.
A first group contains seven open source Java software systems of increasing sizes from the
publicly available benchmarks proposed in [32] and in [2]. Moreover, we consider a second group of
five open source Java software systems (Log4j, JHotDraw7, Apache Avro, JDT, Voldemort) selected
to perform a direct comparison with a similar design pattern mining approach proposed by Tsantalis
in [61]. All the analyzed systems, along with their main structural characteristics, are listed in
Table II. Systems in both groups were chosen with increasing sizes to evaluate the scalability of the
algorithm and to validate the quality of results on a large code base. The DPs considered in this case
study are the ones reported in Figure 5. The validation was performed using the DPF tool developed
to support the approach. According to [55], design pattern recovery techniques can be evaluated
by computing precision and recall [51], in order to assess their effectiveness and correctness. To
compute recall and precision we assume that a pattern instance can be classified into one of four
categories:
• true-positive (TP : correctly found),
• false-positive (FP : incorrectly found),
• true-negative (TN : correctly missed),
• false-negative (FN : incorrectly missed).
Precision is defined as the ratio of correctly found occurrences to occurrences provided by the tool
and is given by:
Precision = TP / (TP + FP)    (1)
Recall is the ratio of correctly found occurrences to all correct occurrences and is given by:
Recall = TP / (TP + FN)    (2)
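
As a quick check of equations (1) and (2), the two measures can be computed as in the Python sketch below. The counts used in the example are invented for illustration and are not taken from the result tables; returning 0.0 when the denominator is zero is a convention chosen for this sketch.

```python
def precision(tp, fp):
    """Equation (1): fraction of reported instances that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Equation (2): fraction of correct instances that were reported."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts for one pattern on one system.
tp, fp, fn = 18, 2, 6
p = precision(tp, fp)
r = recall(tp, fn)
```

With these counts the tool would report 20 instances, 18 of them correct, out of 24 instances in the gold standard.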
To verify the correctness of the results, in the case of the first group of seven systems, we
considered as Gold Standard (GS) the union of both the benchmarks cited in [32] and in [2]
(assumed to be correct) with the correct results produced by our approach (i.e., instances not in
the benchmarks and mainly due to DP variant detection, evaluated by code inspection). ∥ Each DP
instance in the resulting GS was classified as a DP variant according to the defined catalog.
For the remaining five systems, the GS was computed using the correct results produced by both
the DPF and DPD tools. Hence it could lack pattern instances missed by both tools (overestimating
recall), but it allows a direct (and reliable) comparison on precision.

7. DISCUSSION OF RESULTS

7.1. Results on benchmarks


The study was carried out by applying DPF to find the DPs contained in the first group of seven Java
software systems reported in Table II.
In particular, the analysis of the JRefactory 2.6.24 system (shown in Table IX) allows a direct
comparison with the results provided for the DPD tool in [61], for the tool presented in [2] and for the ePad
Eclipse-based tool presented in [20].
Tables III to IX report, for each of the analyzed systems: the name of the DPs searched in
the code (first column), the number of true positive instances as provided by the benchmark (GS),
the number of instances of each searched pattern detected by the proposed approach (column D), the number of
true positives found by DPF (column Tp), and the numbers of false positives and false negatives found by DPF
(columns Fp and Fn). The last two columns report respectively precision (P) and recall (R) computed
on the results as provided by DPF (using the gold standard). In order to improve readability, only
the variants for which either the benchmarks or DPF produced results are reported. The sets of
instances detected for the variants of a given DP can be non-disjoint, since an overlap among
instances satisfying the specifications of several variants is possible. This means that, with respect to the
values reported in the tables, the sum of the numbers of the instances of all the variants of a DP may
be greater than the number of instances of that DP actually implemented in the code.
As shown in Tables III to IX, the average values of precision and recall are 0.95 and 0.87,
respectively. These values are also consistent for increasing system sizes. Patterns like Command,
Composite, Observer and also Visitor are identified more precisely, since their specifications include
both static and behavioral relationships. This is confirmed by their number of false positives, which is
lower than for patterns with a less constrained structure or with limited or absent behavioral properties.
This, as highlighted in Section 3.2, is due to the higher pruning power of complex DPGs with
respect to simpler ones.
As the results show, the proposed approach is able (depending on how specifications are written) to
distinguish among patterns that have the same static structure but different behaviors. For example,
to distinguish the Command pattern from the Adapter one (the object version), the
specification uses the invocation property, requiring the execute method in the concrete subclass to
invoke a method of a class bound to a Command. The same happens for Composite and Decorator,
where the Decorator is required to specify a delegation towards the decorated object.
Singletons are mined using several variants, from the classic GoF (requiring a private constructor,
a final class declaration and a single static getter method taking no parameters) to the “relaxed” one
(which removes such constraints). To include singletons instantiated using enumerations or static fields
with public access, the EnumerationSingleton variant was defined.
As Table IX shows, according to the DSL specifications several singleton instances were found that
are not compliant with the GoF definition and would be missed without the variant specifications.
In particular, DPF detected 43 of the 45 RelaxedSingleton instances: of these 45, 8 are not
compliant with GoF, two are false negatives, and the remaining 35 satisfy both the GoF and the
RelaxedSingleton specifications. In addition, 24 instances were found that
satisfy the EnumerationSingleton specification.

∥ Of course, the different formats of the benchmarks were translated into a unique common format to store the considered
GS.


System→ 01-junit
↓Design Pattern GS D TP FP FN P R
Adapter/spec{InnerClass} 10 10 10 0 0 1 1
Adapter/spec{GoFObject} 22 18 17 1 5 0,94 0,77
Command/spec{ExecutionEngine} 57 59 54 5 3 0,92 0,95
Composite/spec{GoF} 3 3 3 0 0 1 1
Composite/spec{MultipleCompositeRoot} 4 4 3 1 1 0,75 0,75
Composite/spec{MultipleComposite} 4 5 4 1 0 0,8 1
Decorator/spec{GoF} 4 5 4 1 0 0,8 1
Factory Method/spec{SingleCreator} 3 3 3 0 0 1 1
Factory Method/spec{GoF} 8 7 7 0 1 1 0,88
Factory Method/spec{SingleFactory} 11 11 10 1 1 0,91 0,91
Factory Method/spec{Parametrized} 4 4 4 0 0 1 1
Factory Method/spec{DelegationBased} 3 2 2 0 1 1 0,67
Iterator/spec{External} 6 6 6 0 0 1 1
Memento/spec{GoF} 2 2 2 0 0 1 1
Observer/spec{EventsAsParams} 6 6 6 0 0 1 1
Singleton/spec{Enumerative} 2 2 2 0 0 1 1
Singleton/spec{Relaxed} 2 2 2 0 0 1 1
Strategy/spec{GoF} 14 12 12 0 2 1 0,86
Template Method/spec{GoF} 22 24 19 5 3 0,79 0,86
Table III. Results obtained on benchmark 01-junit

System→ 02-Lexi
↓Design Pattern GS D TP FP FN P R
Adapter/spec{InnerClass} 36 35 33 2 3 0,94 0,92
Builder/spec{GoF} 5 5 5 0 0 1 1
Command/spec{GoF} 36 33 32 1 4 0,97 0,89
Command/spec{InnerClasses} 37 35 34 1 3 0,97 0,92
Factory Method/spec{SingleCreator} 4 3 3 0 1 1 0,75
Factory Method/spec{GoF} 4 3 3 0 1 1 0,75
Factory Method/spec{SingleFactory} 2 1 1 0 1 1 0,5
Factory Method/spec{Parametrized} 5 4 4 0 1 1 0,8
Factory Method/spec{RegistryBased} 3 2 2 0 1 1 0,67
Factory Method/spec{DelegationBased} 5 4 4 0 1 1 0,8
Observer/spec{GoF} 5 5 5 0 0 1 1
Observer/spec{EventsAsParams} 6 6 6 0 0 1 1
Singleton/spec{GoF} 3 2 2 0 1 1 0,67
Singleton/spec{Enumerative} 5 4 3 1 2 0,75 0,6
Singleton/spec{Relaxed} 10 9 8 1 2 0,89 0,8
State/spec{GoF} 2 2 2 0 0 1 1
Strategy/spec{GoF} 11 12 11 1 0 0,92 1
Template Method/spec{GoF} 5 4 4 0 1 1 0,8
Table IV. Results obtained on benchmark 02-Lexi

Similar considerations can be made for Factory DPs: the Parametrized variant for both Abstract
Factory and Factory Method allowed the discovery of several instances of creator methods (requiring
parameters to select the product) that would otherwise be missed by DPF. Indeed, 18 Factory
Method Parametrized instances were found in addition to the 16 GoF ones. Moreover, inspecting the


System→ 03-JHotDraw
↓Design Pattern GS D TP FP FN P R
Adapter/spec{GoFObject} 32 29 29 0 3 1 0,91
Bridge/spec{GoF} 37 36 35 1 2 0,97 0,95
Builder/spec{GoF} 2 0 0 0 2 - 0
Command/spec{UndoRedo} 15 11 11 0 4 1 0,73
Command/spec{GoF} 65 64 61 3 4 0,95 0,94
Command/spec{UndoRedoEngine} 11 10 10 0 1 1 0,91
Command/spec{ExecutionEngine} 20 20 20 0 0 1 1
Composite/spec{GoF} 16 19 14 5 2 0,74 0,88
Decorator/spec{GoF} 31 29 29 0 2 1 0,94
Factory Method/spec{SingleCreator} 10 9 9 0 1 1 0,9
Factory Method/spec{GoF} 3 2 2 0 1 1 0,67
Factory Method/spec{SingleFactory} 10 9 9 0 1 1 0,9
Factory Method/spec{Parametrized} 15 14 12 2 3 0,86 0,8
Factory Method/spec{RegistryBased} 2 1 1 0 1 1 0,5
Factory Method/spec{DelegationBased} 8 5 5 0 3 1 0,62
Memento/spec{GoF} 2 2 2 0 0 1 1
Observer/spec{EventsAsHierarchy} 4 4 4 0 0 1 1
Observer/spec{GoF} 5 5 5 0 0 1 1
Observer/spec{EventsAsParams} 5 5 5 0 0 1 1
Prototype/spec{GoF} 8 8 8 0 0 1 1
Singleton/spec{GoF} 6 5 5 0 1 1 0,83
Singleton/spec{Enumerative} 3 3 3 0 0 1 1
Singleton/spec{Relaxed} 4 4 4 0 0 1 1
Strategy/spec{GoF} 49 37 36 1 13 0,97 0,73
Template Method/spec{GoF} 67 61 59 2 8 0,97 0,88
Table V. Results obtained on benchmark 03-JHotDraw

results, we noted that in the benchmark, out of 18 GoF instances, 7 were DelegationBased variants
(delegating the creation to other objects), of which 6 instances were correctly mined by DPF.
We found that the number of false negatives was dramatically reduced by adding new variants
inheriting existing specifications and taking into account the structural differences that caused the
tool to miss them. The false negatives (computed as GS − Tp in Tables III to IX) were related
to patterns implemented differently from what is assumed in the specification (our catalog is, for the
most part, based on the definitions given in the literature [29] and their known variants).
As an example, Table III shows the variants detected for the JUnit 3.7 system. In this small system
a limited number of patterns is found; however, it is interesting that the five variants of Factory Method
identify several factory methods that were not present in the benchmarks. It is worth observing that
in some cases (not all), variants are not mutually exclusive with respect to the instances. Single
creator factory methods share the 7 GoF instances, and the same happens for the Composite variants,
for which the MultipleComposite or MultipleCompositeRoot and Decorator variants can overlap
on the same instances. DPF found several Command instances using an external execution engine.
These patterns are actually not part of the main code of the framework (and are correctly not included
in the benchmark) but belong to the sample tests shipped with the JUnit distribution. These
tests indeed use the TestRunner as an executor, implementing the structure and behaviour required
by the proposed Command-ExecutionEngine variant (and for this reason they were added to the gold
standard).
It is also worth observing that the mining results obtained for specifications focused on the detection
of pattern variants using inner classes are quite good, raising the resulting precision and recall of
the overall pattern family. This is especially true for Commands (in the Lexi, QuickUML and Nutch


System→ 04-QuickUML
↓Design Pattern GS D TP FP FN P R
Abstract Factory/spec{GoF} 11 9 9 0 2 1 0,82
Adapter/spec{InnerClass} 19 20 18 2 1 0,9 0,95
Adapter/spec{GoFObject} 48 44 41 3 7 0,93 0,85
Builder/spec{GoF} 12 11 10 1 2 0,91 0,83
Command/spec{GoF} 10 8 7 1 3 0,88 0,7
Command/spec{InnerClasses} 12 12 12 0 0 1 1
Composite/spec{ContainerWithinComposite} 6 4 4 0 2 1 0,67
Composite/spec{GoF} 6 4 4 0 2 1 0,67
Composite/spec{MultipleCompositeRoot} 6 6 6 0 0 1 1
Composite/spec{MultipleComposite} 6 4 4 0 2 1 0,67
Factory Method/spec{Parametrized} 2 2 2 0 0 1 1
Factory Method/spec{RegistryBased} 1 1 1 0 0 1 1
Factory Method/spec{DelegationBased} 2 2 2 0 0 1 1
Observer/spec{GoF} 17 17 16 1 1 0,94 0,94
Prototype/spec{GoF} 10 9 9 0 1 1 0,9
Proxy/spec{Indirection} 4 3 3 0 1 1 0,75
Proxy/spec{GoF} 6 3 3 0 3 1 0,5
Singleton/spec{GoF} 3 3 3 0 0 1 1
Singleton/spec{Enumerative} 2 3 2 1 0 0,67 1
Singleton/spec{Relaxed} 3 3 3 0 0 1 1
Strategy/spec{GoF} 15 18 12 6 3 0,67 0,8
Template Method/spec{GoF} 38 30 29 1 9 0,97 0,76
Table VI. Results obtained on benchmark 04-QuickUML

systems) and Adapter (in JUnit and QuickUML). For instance, in JUnit the overall recall of Adapter
with the InnerClass variant is 0.87 (evaluated considering all the instances of the first two rows of Table
III). Removing the Adapter-InnerClass variant, the recall becomes 0.53 (evaluated considering the
10 Adapter-InnerClass instances as false negatives).
Moreover, the results confirm that patterns mined with a higher number of (non-overlapping)
specifications have better overall results than those with few specifications. This is also
influenced by pattern complexity, since simple patterns (from a structural and behavioural point
of view) require fewer variants than complex ones to be mined efficiently. Two notable examples
are the Singleton and Factory Method pattern families, comprising 5 and 6 variants respectively,
which have the highest overall precision and recall on all the systems.

7.2. A comparison with DPD approach


A comparison between the results of our approach and those obtained using the similarity scoring
approach [61] was performed. The DPD approach was chosen mainly because it adopts a similar
technique (exploiting a similar set of information, but used in a different way).
We would like to point out that the comparison is between the two approaches, not between the
two tools. A comparison of the DPF tool with other DP mining tools available in the literature is
left to future work.
The results obtained are summarized in Table X (for the systems JHotDraw 7 and Avro 1.6),
Table XI (systems Log4J 1.2 and JDT 3.6) and Table XII (system Voldemort).
The tables report the name of the detected DPs (first column) and the number of patterns
considered as Gold Standard (GS). The GS used as reference is the set of all
true positive instances. It was computed using the correct results produced by both the DPF and DPD
tools (version 4.5). Hence it could lack pattern instances missed by both tools, but it allows a direct
comparison. The remaining columns in the tables, for each tool, report the number of D (detected),


System→ 05-nutch
↓Design Pattern GS D TP FP FN P R
Abstract Factory/spec{GoF} 1 1 1 0 0 1 1
Adapter/spec{GoFObject} 68 66 64 2 4 0,97 0,94
Bridge/spec{GoF} 25 23 22 1 3 0,96 0,88
Command/spec{GoF} 50 47 46 1 4 0,98 0,92
Command/spec{InnerClasses} 9 7 7 0 2 1 0,78
Command/spec{ExecutionEngine} 12 7 6 1 6 0,86 0,5
Decorator/spec{GoF} 27 26 25 1 2 0,96 0,93
Factory Method/spec{SingleCreator} 1 1 1 0 0 1 1
Factory Method/spec{GoF} 11 7 7 0 4 1 0,64
Factory Method/spec{SingleFactory} 3 3 3 0 0 1 1
Factory Method/spec{Parametrized} 26 30 26 4 0 0,87 1
Factory Method/spec{DelegationBased} 8 5 5 0 3 1 0,62
Iterator/spec{GoF} 4 3 3 0 1 1 0,75
Memento/spec{GoF} 15 14 13 1 2 0,93 0,87
Prototype/spec{GoF} 2 2 2 0 0 1 1
Prototype/spec{GoFManager} 2 2 2 0 0 1 1
Proxy/spec{Indirection} 7 7 7 0 0 1 1
Proxy/spec{GoF} 43 39 38 1 5 0,97 0,88
Singleton/spec{GoF} 11 7 7 0 4 1 0,64
Singleton/spec{Enumerative} 2 2 2 0 0 1 1
Singleton/spec{Pool} 4 4 4 0 0 1 1
Singleton/spec{Relaxed} 46 43 42 1 4 0,98 0,91
Strategy/spec{GoF} 39 36 36 0 3 1 0,92
Template Method/spec{GoF} 38 31 31 0 7 1 0,82
Table VII. Results obtained on benchmark 05-nutch

Tp (true positive) and Fp (false positive) instances, and the values of Precision (P) and Recall (R)
computed on the results as provided by the tool and validated by an expert.
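The way the GS is built for these five systems, and why it can only overestimate recall, can be sketched in a few lines of Python. All instance identifiers below are hypothetical; in particular, the set `actual` (the instances really present in the code) is not available in practice and is included only to illustrate the bias.

```python
# Expert-validated true positives of each tool (hypothetical identifiers).
dpf_tp = {"i1", "i2", "i3", "i4"}
dpd_tp = {"i3", "i4", "i5"}

# Instances really present in the code; "i6" is missed by both tools.
# This set is unknown in practice and is assumed here for illustration.
actual = {"i1", "i2", "i3", "i4", "i5", "i6"}

# Gold standard as the union of the correct results of both tools.
gold = dpf_tp | dpd_tp


def recall(true_positives: set, reference: set) -> float:
    """Share of the reference set covered by the tool's true positives."""
    return len(true_positives & reference) / len(reference)


measured = recall(dpf_tp, gold)       # 4 of 5 gold-standard instances
unbiased = recall(dpf_tp, actual)     # 4 of 6 actual instances
print(measured, unbiased)
```

Since the union can never contain more instances than actually exist, the measured recall is an upper bound on the real one, while precision is unaffected because every detected instance is still inspected by the expert.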
The first consideration about the results concerns the presence of false positives and false
negatives. The percentage of false positives is less than 0.4% for DPF and less than 2% for DPD,
which is quite acceptable for both tools.
However, for some patterns, and for both approaches, the number of false positives is
considerably higher than for other patterns.
This happens, for DPF on JHotDraw 7, for the Template Method pattern, for which 10 of the 110
detected instances were not template methods. Inspecting those cases revealed a problem in the structure
of the specification that, although correct, was too relaxed. This caused some internal helper methods to
be considered as template methods.
For the Observer pattern (in JHotDraw 7) the results were similar, since the approach detected
105 observer instances (one for each concrete participant) but 9 of them were not Observers. The
case of the Observer design pattern is also interesting with regard to the detection of pattern
variants. DPF detected 96 true Observer instances on JHotDraw 7 (with 9 false positives) and
48 instances on Apache Avro 1.6 (no false positives found). Our pattern specification repository
comprised 2 variants of the Observer pattern. The first one exploits standard Java
types (Observable class and Listener interface). The second one is the multi-event observer reported
in Figure 7. Inspecting the matched instances we found that, for both the JHotDraw 7 and Apache
Avro 1.6 systems, the 96 and 48 instances respectively were all variants of the second type. This
also explains why the DPD tool, which is based on the classic variant, was not able to find observers on
these two systems. The situation is inverted in the JDT and Log4J systems, for which in both cases
single-event observers are found by DPD but not by DPF. Inspecting the source code we found that the


System→ 06-PMD
↓Design Pattern GS D TP FP FN P R
Adapter/spec{GoFObject} 10 10 10 0 0 1 1
Builder/spec{GoF} 12 12 11 1 1 0,92 0,92
Command/spec{GoF} 10 6 6 0 4 1 0,6
Composite/spec{GoF} 8 8 7 1 1 0,88 0,88
Decorator/spec{GoF} 5 5 5 0 0 1 1
Factory Method/spec{SingleCreator} 3 2 2 0 1 1 0,67
Factory Method/spec{GoF} 22 18 17 1 5 0,94 0,77
Factory Method/spec{SingleFactory} 15 12 12 0 3 1 0,8
Factory Method/spec{Parametrized} 7 6 5 1 2 0,83 0,71
Factory Method/spec{RegistryBased} 2 3 2 1 0 0,67 1
Factory Method/spec{DelegationBased} 9 7 7 0 2 1 0,78
Iterator/spec{GoF} 4 5 4 1 0 0,8 1
Observer/spec{GoF} 3 3 3 0 0 1 1
Proxy/spec{GoF} 3 4 3 1 0 0,75 1
Singleton/spec{GoF} 9 9 9 0 0 1 1
Singleton/spec{Enumerative} 1 1 1 0 0 1 1
Singleton/spec{Relaxed} 8 8 7 1 1 0,88 0,88
Strategy/spec{GoF} 29 28 28 0 1 1 0,97
Template Method/spec{GoF} 12 12 12 0 0 1 1
Visitor/spec{GoF} 92 80 80 0 12 1 0,87
Table VIII. Results obtained on benchmark 06-PMD

DSL specifications missed observers whose notify method takes one or more arguments or is
called indirectly (before the actual call to the update(...) methods on the listeners).
Another interesting case is related to the Adapter and Command patterns, since they have a very
similar static structure. The DPF tool is able, if needed, to distinguish between them by adding
behavioural constraints (clarifying how the pattern is used by its context), while DPD (like many other
structural approaches) is not able to do this. For JHotDraw and Avro our specification was not able
to distinguish between them, while in the JDT and Log4J cases we added further constraints leading to
a better identification. In particular for the Command pattern, in both cases, DPF obtained better
results in terms of both precision and recall.
Regarding precision and recall, we can observe that while precision is quite high for both
tools, recall for DPF is generally higher than for DPD (as shown in Figure 11∗∗ ). These results are
confirmed by the details, which show that both tools are good at keeping the number of FP low, but for
DPF the number of TP is, on average, higher than for DPD.
Finally, DP detection was performed on the Voldemort system. The obtained results are
in Table XII. The Voldemort system was selected since it contains a large portion of automatically
generated code, allowing us to study the impact of code generation on pattern mining. As shown in
Table XII, the values of precision and recall for DPF are quite high (greater than 0.64
and 0.82, respectively). For DPD, while the average value of precision (0.91) is similar to that of DPF, the values
of recall are not satisfactory compared to those of DPF. A manual code inspection revealed
that the many missed Prototype, Adapter and Singleton instances are mostly located in
the classes generated from the ProtocolBuffer code generator†† specifications. The specifications we
used in the catalog helped to obtain better results since we added variants capable of detecting inner

∗∗ The average is performed taking into account all the meaningful precision/recall scores obtained for all searched
patterns on the considered systems.
†† The Google protocol buffer code generator hosted at http://code.google.com/p/protobuf/.


System→ 07-JRefactory
↓Design Pattern GS D TP FP FN P R
Abstract Factory/spec{GoF} 2 2 2 0 0 1 1
Abstract Factory/spec{Parametrized} 1 1 1 0 0 1 1
Adapter/spec{GoFObject} 49 39 38 1 11 0,97 0,78
Adapter/spec{Dynamic} 38 33 29 4 9 0,88 0,76
Builder/spec{GoF} 4 4 4 0 0 1 1
Command/spec{GoF} 84 87 82 5 2 0,94 0,98
Command/spec{ExecutionEngine} 10 6 6 0 4 1 0,6
Factory Method/spec{SingleCreator} 9 7 7 0 2 1 0,78
Factory Method/spec{GoF} 18 16 15 1 3 0,94 0,83
Factory Method/spec{SingleFactory} 14 12 12 0 2 1 0,86
Factory Method/spec{Parametrized} 22 18 18 0 4 1 0,82
Factory Method/spec{RegistryBased} 11 11 11 0 0 1 1
Factory Method/spec{DelegationBased} 7 6 4 2 3 0,67 0,57
Memento/spec{GoF} 11 10 10 0 1 1 0,91
Observer/spec{EventsAsParams} 3 3 3 0 0 1 1
Proxy/spec{Indirection} 31 33 31 2 0 0,94 1
Proxy/spec{GoF} 31 31 30 1 1 0,97 0,97
Singleton/spec{GoF} 12 12 12 0 0 1 1
Singleton/spec{Enumerative} 24 24 24 0 0 1 1
Singleton/spec{Relaxed} 45 43 43 0 2 1 0,96
State/spec{GoF} 8 8 7 1 1 0,88 0,88
Strategy/spec{GoF} 50 49 49 0 1 1 0,98
Template Method/spec{GoF} 131 155 131 24 0 0,85 1
Visitor/spec{GoF} 136 132 131 1 5 0,99 0,96
Table IX. Results obtained on benchmark 07-JRefactory

System→ JHotDraw 7
Tool→ DPF | DPD
↓Design Pattern GS | D TP FP P R | D TP FP P R
Observer 96 | 105 96 9 0.91 1 | 0 0 0 - -
Singleton 32 | 32 32 0 1 1 | 31 30 1 0.96 0.93
Factory Method 20 | 18 18 0 1 0.9 | 4 2 2 0.5 0.1
Template Method 110 | 110 100 10 0.9 0.9 | 16 10 6 0.63 0.09
Adapter/Command 155 | 123 120 3 0.97 0.77 | 79 71 8 0.89 0.45
Decorator 6 | 6 6 0 1 1 | 4 4 0 1 0.67
Prototype 46 | 34 31 3 0.91 0.67 | 113 40 73 0.35 0.86
State/Strategy 194 | 168 165 3 0.92 0.85 | 213 171 42 0.80 0.88
Composite 6 | 5 5 0 1 0.83 | 6 6 0 1 1

System→ Apache Avro 1.6
Tool→ DPF | DPD
↓Design Pattern GS | D TP FP P R | D TP FP P R
Observer 48 | 48 48 0 1 1 | 0 0 0 - -
Singleton 27 | 27 26 1 0.96 0.96 | 21 18 3 0.85 0.66
Factory Method 12 | 12 12 0 1 1 | 1 1 0 1 0.1
Template Method 32 | 28 28 0 1 0.88 | 13 9 4 0.69 0.28
Adapter/Command 33 | 30 30 0 1 0.90 | 12 9 3 0.75 0.27
Decorator 6 | 5 5 0 1 0.83 | 6 6 0 1 1
Prototype 5 | 7 5 2 0.71 1 | 0 0 0 - -
State/Strategy 116 | 137 114 23 0.82 0.98 | 18 18 0 1 0.15
Composite 0 | - - - - - | - - - - -

Table X. Precision and Recall for JHotDraw 7 and Apache Avro 1.6

classes and generic types that are consistently and widely used by the large automatically generated
classes.
An overview of the average precision and recall obtained for each system by both the DPD and
DPF tools is shown in Figure 11. The figure shows that the precision of DPF is always greater than
that of DPD, with the exception of the Log4j system. Finally, the recall values obtained for DPF
are considerably better than the corresponding values for DPD.

7.3. Performance issues


When running the DPF tool, we measured the execution times of each step of the detection
process. Table XIII reports the measured values for each of the analyzed systems and Figure 12


System→ JDT
Tool→ DPF | DPD
↓Design Pattern GS | D TP FP P R | D TP FP P R
Observer 4 | 0 0 0 - - | 4 4 0 1 1
Singleton 53 | 66 51 15 0.77 0.96 | 31 31 0 1 0.58
Factory Method 288 | 278 276 2 0.99 0.96 | 21 18 3 0.86 0.06
Template Method 564 | 554 554 0 1 0.98 | 94 93 1 0.99 0.16
Adapter 673 | 673 673 0 1 1 | 274 256 18 0.93 0.38
Command 278 | 281 278 3 0.99 1 | 0 0 0 - -
Decorator 25 | 24 24 0 1 0.96 | 10 10 0 1 0.40
Prototype 33 | 35 33 2 0.94 1 | 0 0 0 - -
State/Strategy 181 | 187 174 13 0.96 0.96 | 145 140 5 0.96 0.77
Composite 1 | 0 0 0 - - | 1 1 0 1 1
Proxy 13 | 13 13 0 1 1 | 0 0 0 - -

System→ Log4J
Tool→ DPF | DPD
↓Design Pattern GS | D TP FP P R | D TP FP P R
Observer 1 | 0 0 0 - - | 1 1 0 1 1
Singleton 1 | 33 32 1 0.97 1 | 10 10 0 1 0.31
Factory Method 12 | 12 11 1 0.92 0.92 | 2 2 0 1 0.17
Template Method 48 | 48 48 0 1 1 | 6 6 0 1 0.13
Adapter 50 | 49 49 0 1 0.98 | 15 15 0 1 0.30
Command 4 | 5 4 1 0.80 1 | 0 0 0 - -
Decorator 2 | 3 2 0 0.67 1 | 0 0 0 - -
Prototype 1 | 1 1 0 1 1 | 0 0 0 - -
State/Strategy 17 | 22 17 5 0.77 1 | 17 17 0 1 1
Composite 1 | 1 1 0 1 1 | 0 0 0 - -
Proxy 2 | 2 2 0 1 1 | 0 0 0 - -

Table XI. Precision and Recall for JDT and Log4J

System→ Voldemort
Tool→ DPF DPD
↓Design Pattern GS D TP FP P R D TP FP P R
Observer 9 9 9 0 1 1 0 0 0 - -
Singleton 148 150 148 2 0.99 1 91 70 21 0.77 0.47
Factory 89 84 83 1 0.99 0.93 9 7 2 0.78 0.08
Template 22 20 18 2 0.90 0.82 10 10 0 1 0.45
Adapter 245 245 245 0 1 1 64 61 3 0.95 0.25
Command 1 1 1 0 1 1 0 0 0 - -
Decorator 14 16 14 2 0.88 1 13 13 0 1 0.93
Prototype 139 139 139 0 1 1 119 119 0 1 0.86
State-Strategy 112 170 109 61 0.64 0.97 103 96 7 0.93 0.86
Composite 3 3 3 0 1 1 0 0 0 - -

Table XII. Precision and Recall for Voldemort

Figure 11. DPF and DPD Precision (left) and Recall (right) for the analyzed systems

shows the total and average detection times of the DPD and DPF tools for each system. The pattern
matching step is the most CPU-time-consuming one. We cannot show detection times for each pattern,
since our approach uses the successful identifications across pattern specifications as constraints
to improve performance, and hence the detection times depend on them. However, we
calculated the average time to detect a single pattern, and it turned out to be comparable to that of other
structural approaches. The total times in Table XIII show that DPF exhibits better scalability
than the DPD tool. Experimentation performed for tuning the pattern specifications showed that


Times (s)
Tool | Step | JHotDraw 7 | Avro 1.6 | JDT | Log4J | Voldemort
DPD | Total Time | 1281.503 | 6320.815 | 12619.523 | 701.936 | 2930.562
DPD | Average per pattern | 142.389 | 702.313 | 1402.169 | 77.992 | 325.618
DPF | Parsing & AST extraction | 69.974 | 255.23 | 414.181 | 39.203 | 137.809
DPF | Meta-model generation | 428.147 | 1567.785 | 2782.866 | 239.872 | 781.442
DPF | Pattern repository detection | 627.090 | 2296.271 | 4451.886 | 351.331 | 1454.022
DPF | Total Time | 1125.211 | 4120.287 | 7369.613 | 630.406 | 3034.226
DPF | Average per pattern | 125.023 | 457.810 | 822.631 | 70.044 | 387.224
Table XIII. Execution times of the design patterns detection process

performance can be considerably improved by identifying structural and behavioral constraints that
are effective at identifying a well-defined variant of a pattern. Hence the approach is more effective
when specifications are structured in a hierarchy and each specification is dedicated to specific
pattern variants.

Figure 12. DPF and DPD total (left) and average per pattern (right) detection times

Another interesting point is that the algorithm is faster when the searched DPs are actually found
in the analyzed system. Conversely, the worst performance occurs when no DP instances are found
in the system. This is because all matches need to be executed anyway and no existing bindings are
available to reduce the list of remaining matches to be evaluated.
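The effect of existing bindings on the number of matches to evaluate can be illustrated with a deliberately small Python sketch. This is not the DPF matching algorithm: the system model, the class names and the single "has a field of type" relation are all assumptions made for the illustration.

```python
from itertools import product

# Toy system model (hypothetical): pairs (class, type of one of its fields).
has_field_of_type = {
    ("PrinterAdapter", "LegacyPrinter"),
    ("LogAdapter", "LegacyLogger"),
}
classes = ["PrinterAdapter", "LogAdapter", "LegacyPrinter", "LegacyLogger", "Main"]


def match(bindings=None):
    """Match the roles ADAPTER and ADAPTEE; return (matches, pairs evaluated).

    A role that is already bound contributes one candidate instead of all classes.
    """
    bindings = bindings or {}
    adapters = [bindings["ADAPTER"]] if "ADAPTER" in bindings else classes
    adaptees = [bindings["ADAPTEE"]] if "ADAPTEE" in bindings else classes
    evaluated = 0
    found = []
    for adapter, adaptee in product(adapters, adaptees):
        evaluated += 1
        if (adapter, adaptee) in has_field_of_type:
            found.append((adapter, adaptee))
    return found, evaluated


# No prior binding: every pair of classes has to be checked.
all_matches, cost_unbound = match()
# ADAPTEE already bound by an earlier successful identification.
pruned_matches, cost_bound = match({"ADAPTEE": "LegacyPrinter"})
print(cost_unbound, cost_bound)
```

With no binding available the sketch evaluates every candidate pair, whereas a single bound role reduces the candidates to one per remaining class, which is the kind of saving the paragraph above refers to.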

7.4. Threats to Validity


Construct validity threats concern the relationship between theory and observation. There could
be imprecision and omissions in the measurements made in the proposed validations for several
reasons. One of the most important limitations regards the generation of behavioral properties in
the presence of late binding. In this case, as already stated, we built a call graph using Rapid
Type Analysis (RTA) to reduce the set of possible callers. However, the call graph still contains a
super-set of the actual calls. In the properties extraction algorithm we decided to take into account
the sets of possible targets in order to perform the matches. In this way, we surely do not miss
any possible binding, but we expose the algorithm to false positives (since the set
of successful bindings can be a super-set of the actual ones). Further experimentation (with
stricter policies) should be performed in order to assess whether the behavior of the algorithm improves
with respect to this conservative choice.
Conclusion validity concerns the relationship between the
treatment and the outcome. As explained in Section 6, we performed a comparison with seven
systems of open benchmarks. Hence our results are still exposed to bias and human mistakes or
subject to interpretation but, since the benchmarks are publicly available and have evolved over several years,
these effects should be limited. Conversely, for the second group of analyzed systems, the gold
standard was computed by comparing the results obtained using the DPF and DPD approaches. This means
that, for these systems, we cannot exclude that the computation of recall is imprecise (it could

be higher than the actual one, since there could exist pattern instances missed by both tools used in
the study to compute the gold standard). This can be improved in future work by performing a
full analysis of the searched source code base (or by using available benchmarks).
Threats to internal
validity concern factors that can influence our observations. In this case the identification of pattern
instances was based on the expert examination of internal/external documentation and source code,
and hence could pose a threat to internal validity, affecting the number of false negatives.
Threats
to external validity concern the generalization of our findings. Of course, replication on further
systems to confirm or contradict the obtained results is always desirable. Moreover, we cannot
claim that our approach produces the same results on different (and larger) systems. Rather, we
provide quantitative information on the quality of the search for several real-world systems and can
affirm that precision and recall have remained consistent and independent of system
size. On the performance side, the overall detection performance depends heavily on
the quality of the pattern specifications. When specifications are badly written (that is, with few and
overlapping constraints) the performance of the algorithm degrades rapidly.
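The conservative treatment of late binding discussed under construct validity can be made concrete with a simplified Python sketch of how Rapid Type Analysis narrows the targets of a virtual call. The class hierarchy and method names are hypothetical, and the sketch omits the reachability iteration of full RTA; it only shows that the resulting target set is smaller than the one given by the class hierarchy alone, yet may still be a super-set of the calls that really occur.

```python
# Hypothetical hierarchy: class -> direct superclass.
hierarchy = {"Shape": None, "Circle": "Shape", "Square": "Shape", "Triangle": "Shape"}
# Methods defined (not inherited) by each class.
defines = {"Shape": {"draw"}, "Circle": {"draw"}, "Square": {"draw"}, "Triangle": set()}
# Classes instantiated somewhere in the program (assumed for this example).
instantiated = {"Circle", "Triangle"}


def is_subtype(cls, ancestor):
    """True if cls equals ancestor or inherits from it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = hierarchy[cls]
    return False


def resolve(cls, method):
    """Class whose implementation runs when method is called on a cls instance."""
    while cls is not None:
        if method in defines[cls]:
            return cls
        cls = hierarchy[cls]
    return None


def hierarchy_targets(declared_type, method):
    """Targets considering every subtype of the declared type."""
    found = {resolve(c, method) for c in hierarchy if is_subtype(c, declared_type)}
    return found - {None}


def rta_targets(declared_type, method):
    """Targets considering only instantiated subtypes (simplified RTA)."""
    found = {resolve(c, method) for c in instantiated if is_subtype(c, declared_type)}
    return found - {None}


print(hierarchy_targets("Shape", "draw"), rta_targets("Shape", "draw"))
```

If only Circle objects ever reach a given call site at run time, the RTA set still contains the inherited Shape implementation, which is how a conservative matcher can report bindings that never occur in an execution.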

8. CONCLUSIONS AND FUTURE WORK

DP identification in a software system, together with knowledge about the components involved
in each DP instance, greatly helps system comprehension. The approach presented in this paper
provides efficient support for the detection of DP instances in an OO system. It exploits a
meta-model and a derived DSL defined to represent both the patterns and the system under study. The
detection process is carried out by matching each model of a design pattern with the system
model. The defined DSL makes it easy to specify the model of a DP structure through the Properties
characterizing it. Moreover, the DSL makes it easy to specify the variants of a DP by overriding an
already defined DP specification. The DPF tool has been developed to provide automatic support
for the approach. The performed experiments confirmed the ease of use of the DSL to specify a DP
model and its variants, as well as the correctness of the specifications. The experiments showed the
high accuracy of the approach in detecting the DP instances in a system, also allowing the DP
variants to be distinguished.
In particular, the approach has been assessed by applying the DPF tool first to seven systems
of an open benchmark proposed in [32] and [2], and then to five additional systems to compare
the results from DPF with the ones obtained from the DPD approach. It is worthwhile to highlight
the effectiveness of DPF in detecting DP variants that were not identified by the benchmark or the
DPD tool. The average values of precision and recall, evaluated using the GS composed starting
from the considered benchmarks, are good, independently of system size. For the latter group of
five systems, the results show that DPF performs better and is more efficient than DPD. As
future work, the DPF approach will be further improved and an in-depth comparison with other available
approaches and tools will be performed. Future work also involves the improvement of the
meta-model in order to consider a wider set of properties, allowing the modeling of more complex design and
architectural patterns, as well as experimenting with anti-pattern mining.

APPENDIX - PATTERNS SPECIFICATIONS EXAMPLES

In the following, some DSL specifications and DPGs of the most relevant DPs and their variants
are provided. For the sake of brevity, the DSL specifications and
DPGs of the remaining DPs stored in the DP catalog repository are not described here, but they are
available at our repository [3].

8.1. Object and Class Adapters


The Adapter pattern has two “classic” variants [29], since it can be implemented using inheritance
(the class Adapter) or composition (the object Adapter).


pattern adapterObject {
  type ADAPTER(*) {
    has field f of-type ADAPTEE;
    has constructor c {
      has param p of type ADAPTEE;
      set-field f;
    }
    has methods-set adapterSet each {
      delegates to f in adapteeSet;
    }
  }
  type ADAPTEE(1) {
    has methods-set adapteeSet;
  }
}
pattern adapterClass extends adapterObject {
  override ADAPTER(*) {
    inherits-from ADAPTEE;
  }
  override methods-set adapterSet each {
    delegates to methods-set adapteeSet in ADAPTEE;
  }
}

Figure 13. The DSL specifications of adapter Object- and Class- variants.

Figure 14. Object Adapter and its Class variant specification graphs

The Class Adapter, shown in Figures 13 (DSL) and 14 (DPG), is based on inheritance (it delegates
to inherited methods), while the Object Adapter exploits composition. In the DSL specification of
the Object Adapter, two template types are defined: ADAPTER, the Adapter role, which encapsulates a
reference to the second type, ADAPTEE.
The ADAPTER must define a field f of type ADAPTEE and must implement the methods of the
ADAPTER interface in terms of the methods of the ADAPTEE (referring to the adapterSet method-set
defined in the specification, each method of the set must delegate to a method of the field “f”).
The adapterClass, instead, is specified as a variant of the adapterObject: in this case ADAPTER is
forced to inherit from ADAPTEE, and adapterSet is built from the methods inherited from
ADAPTEE.
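To make the two specifications concrete, the following minimal Python sketch shows one possible code shape for each variant. It is only an illustration: the class names and the methods request and specific_request are hypothetical and do not come from the DPF catalog, and Python is used here purely for brevity.

```python
class Adaptee:
    """Plays the ADAPTEE role: owns the methods of adapteeSet."""

    def specific_request(self) -> str:
        return "adaptee result"


class ObjectAdapter:
    """Object variant: ADAPTER holds a field f of type ADAPTEE (composition)."""

    def __init__(self, p: Adaptee) -> None:
        # The constructor parameter p sets the field f.
        self.f = p

    def request(self) -> str:
        # A method of adapterSet that delegates to a method of the field f.
        return self.f.specific_request()


class ClassAdapter(Adaptee):
    """Class variant: ADAPTER inherits from ADAPTEE (inheritance)."""

    def request(self) -> str:
        # A method of adapterSet that delegates to an inherited method.
        return self.specific_request()
```

In the object variant the delegation goes through the field f, whereas in the class variant it goes through the inherited method, which mirrors the difference between the two DSL specifications.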
However, in real-world systems the Adapter pattern can be found in several forms that differ slightly
from the models of Figure 13. It is often implemented for generic types, using inner classes,
or both. Our catalog contains three specifications to cover such implementations. Figure 15 reports
the DSL of the inner-class Adapter variant. It inherits from the classic Object Adapter specification
and introduces the structural elements needed to detect the adapter as an inner generic type.
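As a rough illustration of the kind of code this variant targets, the sketch below defines a generic adaptee with a generic adapter nested inside it, plus a client that instantiates the adapter for a concrete type. The names Stack, Draining and client are hypothetical, and the example assumes that an adapter exposing the iterator protocol is a fair stand-in for the implementations found in real systems.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")
E = TypeVar("E")


class Stack(Generic[T]):
    """Plays ADAPTEE: a generic type whose methods form adapteeSet."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()

    def is_empty(self) -> bool:
        return not self._items

    class Draining(Generic[E]):
        """Plays ADAPTER: an inner generic class exposing the iterator protocol."""

        def __init__(self, p: "Stack[E]") -> None:
            # The constructor parameter p sets the field f.
            self.f = p

        def __iter__(self) -> "Stack.Draining[E]":
            return self

        def __next__(self) -> E:
            # Each adapter method delegates to methods of the field f.
            if self.f.is_empty():
                raise StopIteration
            return self.f.pop()


def client() -> List[int]:
    """Plays CLIENT: creates the adapter for a concrete element type."""
    stack: Stack[int] = Stack()
    for value in (1, 2, 3):
        stack.push(value)
    return list(Stack.Draining(stack))
```

Iterating the adapter pops elements from the wrapped stack, so the client receives them in last-in, first-out order.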


pattern AdapterInnerGenericClasses extends ObjectAdapter {

  override ADAPTER(*) {
    generic on type T;
  }
  override ADAPTEE(1) {
    generic on type T;
  }

  type CLIENT(*) {

    method client {
      new ObjectAdapter on type T
    }
  }

}

Figure 15. The DSL of Inner-class Generic Object Adapter variant

pattern Composite {
  type C(1) {
    has method a {
      has return type void
      has param c of-type C
    }
    has method r {
      has return type void
      has param c of-type C
    }
    has methods-set componentSetC
  }
  type LF(*) {
    inherits-from C
    has-not container co of-type C
  }
  type CM(*) {
    inherits-from C
    has container cm of-type C
    has methods-set componentSetCM each {
      delegates-to cm
    }
  }
}

Figure 16. The Composite DSL specification

8.2. Composite
The Composite DSL specification models the Component hierarchy using three roles: Leaf (LF),
Component (C) and Composite (CM). Both Leaf (LF) and Composite (CM) must inherit from the
Component role. While LF has no inner structure (no reference to inner components), CM must own
a reference to a collection (named “cm” in the specification) having C as base type. Each method
of the composite CM (the method set referred to as “componentSetCM”) delegates to container methods
(to allow adding/removing/iterating over inner components). For Composite, the DSL specification and
the DPG are reported in Figures 16 and 17, respectively. In Figure 17, the labels cS1 and cS2 stand
for componentSetC and componentSetCM, respectively.
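The three roles can be pictured with the following minimal Python sketch. It is an illustration under assumptions: the class names, the method names add, remove and operation, and the integer payload are hypothetical, chosen only to mirror the methods a and r, the container cm, and the two method sets of the specification.

```python
from typing import List


class Component:
    """Plays role C: declares methods a and r plus componentSetC."""

    def add(self, c: "Component") -> None:  # method a
        raise NotImplementedError

    def remove(self, c: "Component") -> None:  # method r
        raise NotImplementedError

    def operation(self) -> int:  # a member of componentSetC
        raise NotImplementedError


class Leaf(Component):
    """Plays role LF: inherits from C and owns no container of C."""

    def __init__(self, value: int) -> None:
        self.value = value

    def operation(self) -> int:
        return self.value


class Composite(Component):
    """Plays role CM: inherits from C and owns the container cm of C."""

    def __init__(self) -> None:
        self.cm: List[Component] = []

    def add(self, c: Component) -> None:
        self.cm.append(c)  # delegates to the container

    def remove(self, c: Component) -> None:
        self.cm.remove(c)  # delegates to the container

    def operation(self) -> int:
        # Iterates over the container and combines the children's results.
        return sum(child.operation() for child in self.cm)
```

Every method of Composite touches the container cm, which is the structural clue the componentSetCM constraint looks for, while Leaf has no such field.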


Figure 17. The Composite DPG

[Graph omitted. Node labels: C, CM, LF, cm, co, a, r, c, m, cS1, cS2. Edge labels: "is type of", "delegate to", "has not".]

Figure 18. The Decorator DPG

pattern Decorator extends Composite {
  ...
  override CM {
    has methods-set componentSetCM each {
      delegates to cm
      delegates to m in C.componentSetC
    }
  }
}

Figure 19. An excerpt of the Decorator DSL.

8.3. Decorator
The Decorator pattern can be represented as a variant of the Composite described above. The added
elements are shown in red in Figure 18, while an excerpt of its DSL is reported in Figure 19.
The most relevant change in this variant concerns the componentSetCM method set, which requires
delegation to both the cm field and the componentSetC method set (the set of methods defined
by the Component interface).
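The double delegation can be seen in a small Python sketch. It is a simplified illustration: the names Component, Text, Bold and render are hypothetical, and cm is modeled as a single wrapped reference rather than a general container.

```python
class Component:
    """Plays role C: render stands for a method m of componentSetC."""

    def render(self) -> str:
        raise NotImplementedError


class Text(Component):
    """A concrete component with no reference to other components."""

    def __init__(self, content: str) -> None:
        self.content = content

    def render(self) -> str:
        return self.content


class Bold(Component):
    """Plays role CM in the Decorator variant."""

    def __init__(self, wrapped: Component) -> None:
        self.cm = wrapped  # reference to the decorated component

    def render(self) -> str:
        # Delegates to the cm field and, through it, to the same
        # Component method (m in C.componentSetC), then adds behaviour.
        return "<b>" + self.cm.render() + "</b>"
```

Unlike a plain Composite method, render here calls the very method it overrides on the wrapped object, which is the extra constraint the Decorator variant adds.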


pattern Singleton {
  final type X {
    X has private constructor c;
    X has field f of type X;
    X has public static methods-set creationHooks each {
      depends on f;
    }
  }
}

Figure 20. The DSL of the first singleton variant

[Graphs omitted. Panels, in order: GoF Singleton; Relaxed Singleton (overriding constructor); Pool. Node labels: X, f, c, cH, RET. Edge labels: "is type of", "is container of", "depends on".]

Figure 21. GoF and relaxed Singleton graphs

8.4. Singleton
The Singleton pattern is identified using three variants, whose DPGs are shown in Figure 21. The
DSL of the first one, reported in Figure 20, provides the Singleton definition given in the literature
[29], implemented with a final class, a private constructor and a public static getter method. To mine
multiple instance getters, the variant defines a method set called “creationHooks” (the box labelled
cH in Figure 21). Each method in this set requires a dependency on the static Singleton field “f”.
The second, relaxed specification, called “relaxed-gof”, removes both the private constraint on the
constructor (refer to the red block in Figure 21) and the final-class constraint.
Finally, the DPG shown on the right side of Figure 21 models the Pool variant, which handles
a fixed set of resources. In this case the field “f” is overridden to become a container, while
the method RET represents the pool’s resource manager.
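The three variants can be contrasted with the Python sketch below. It rests on several assumptions: all class and method names are hypothetical, Python cannot enforce final classes or private constructors (so the GoF variant only emulates the private constructor by raising an error), and the round-robin policy of the pool is an arbitrary choice made for the example.

```python
class GofSingleton:
    """GoF variant: static field f, blocked constructor, static getter."""

    _f = None  # the static Singleton field f

    def __init__(self) -> None:
        # Stands in for the private constructor c.
        raise TypeError("use GofSingleton.get_instance()")

    @classmethod
    def get_instance(cls) -> "GofSingleton":
        # A creation hook: it depends on the field f.
        if cls._f is None:
            cls._f = object.__new__(cls)  # bypasses __init__
        return cls._f


class RelaxedSingleton:
    """Relaxed variant: the constructor is public and the class is not final."""

    _f = None

    @classmethod
    def get_instance(cls) -> "RelaxedSingleton":
        if cls._f is None:
            cls._f = cls()
        return cls._f


class PooledResource:
    """Pool variant: the field f becomes a container of a fixed size."""

    _f: list = []
    _size = 2
    _next = 0

    @classmethod
    def get_instance(cls) -> "PooledResource":
        # Fill the pool once, then hand out its members in turn.
        if not cls._f:
            cls._f = [object.__new__(cls) for _ in range(cls._size)]
        instance = cls._f[cls._next % cls._size]
        cls._next += 1
        return instance
```

In all three classes the getter depends on the field f, which is the constraint shared by the specifications; only the constructor visibility and the type of f change.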

Bibliography
[1] http://www.eclipse.org/modeling/.
[2] Comsats institute of information technology. http://research.ciitlahore.edu.pk/Groups/SERC/DesignPatterns.aspx.

[3] https://github.com/UnisannioSoftEng/DPF/wiki/Design-Pattern-Finder-Home.
[4] A. Alnusair, T. Zhao, and G. Yan. Automatic recognition of design motifs using semantic conditions. In
Proceedings of the 28th Annual ACM Symposium on Applied Computing, SAC ’13, pages 1062–1067,
New York, NY, USA, 2013. ACM.
[5] A. Ampatzoglou, G. Frantzeskou, and I. Stamelos. A methodology to assess the impact of design
patterns on software quality. Inf. Softw. Technol., 54(4):331–346, Apr. 2012.
[6] G. Antoniol, G. Casazza, M. D. Penta, and R. Fiutem. Object-oriented design patterns recovery.
Journal of Systems and Software, 59(2):181–196, 2001.
[7] G. Antoniol, R. Fiutem, and L. Cristoforetti. Design pattern recovery in object-oriented software. In
Proceedings of the 6th International Workshop on Program Comprehension, IWPC ’98, pages 153–,
Washington, DC, USA, 1998. IEEE Computer Society.


[8] F. Arcelli and M. Zanoni. A tool for design pattern detection and software architecture reconstruction.
Inf. Sci., 181(7):1306–1324, Apr. 2011.

[9] F. Arcelli Fontana, A. Caracciolo, and M. Zanoni. Dpb: A benchmark for design pattern detection
tools. In Software Maintenance and Reengineering (CSMR), 2012 16th European Conference on,
pages 235–244, 2012.

[10] Z. Balanyi and R. Ferenc. Mining design patterns from c++ source code. In Proc. International
Conference on Software Maintenance ICSM 2003, pages 305–314, Sept. 22–26, 2003.

[11] I. Bayley and H. Zhu. On the composition of design patterns. In Quality Software, 2008. QSIC ’08.
The Eighth International Conference on, pages 27 –36, aug. 2008.

[12] F. Bergenti and A. Poggi. Improving uml designs using automatic design pattern detection. In Proc.
12th. International Conference on Software Engineering and Knowledge Engineering (SEKE 2000),
pages 336–343, 2000.

[13] M. L. Bernardi, M. Cimitile, and G. A. Di Lucca. A model-driven graph-matching approach for design pattern
detection. In 20th Working Conference on Reverse Engineering (WCRE), pages 172–181, 2013.

[14] M. L. Bernardi and G. A. Di Lucca. Model-driven detection of design patterns. In Proceedings of the
26th IEEE International Conference on Software Maintenance, ICSM ’10, September 12-18, Timișoara,
Romania, 2010.

[15] D. Beyer. Relational programming with crocopat. In Proceedings of the 28th international conference
on Software engineering, ICSE ’06, pages 807–810, New York, NY, USA, 2006. ACM.

[16] D. Beyer and C. Lewerentz. Crocopat: efficient pattern analysis in object-oriented programs. pages
294–295, 2003.

[17] A. Binun and G. Kniesel. Dpjf - design pattern detection with high accuracy. In Software Maintenance
and Reengineering (CSMR), 2012 16th European Conference on, pages 245–254, 2012.

[18] A. De Lucia, V. Deufemia, C. Gravino, and M. Risi. Behavioral pattern identification through visual
language parsing and code instrumentation. In Proceedings of the 2009 European Conference on
Software Maintenance and Reengineering, CSMR ’09, pages 99–108, Washington, DC, USA, 2009.
IEEE Computer Society.

[19] A. De Lucia, V. Deufemia, C. Gravino, and M. Risi. Design pattern recovery through visual language
parsing and source code analysis. Journal of Systems and Software, 82(7):1177 – 1193, 2009.

[20] A. De Lucia, V. Deufemia, C. Gravino, and M. Risi. An eclipse plug-in for the detection of design
pattern instances through static and dynamic analysis. In Software Maintenance (ICSM), 2010 IEEE
International Conference on, pages 1–6, 2010.

[21] A. De Lucia, V. Deufemia, C. Gravino, and M. Risi. Improving behavioral design pattern detection
through model checking. In Software Maintenance and Reengineering (CSMR), 2010 14th European
Conference on, pages 176–185, 2010.

[22] A. De Lucia, V. Deufemia, C. Gravino, M. Risi, and G. Tortora. An eclipse plug-in for the identification
of design pattern variants. In Sixth Workshop of the Italian Eclipse Community (Eclipse-IT 2011), pages
40–51, 2011.

[23] J. Dong and Y. Zhao. Experiments on design pattern discovery. In Predictor Models in Software
Engineering, 2007. PROMISE’07: ICSE Workshops 2007. International Workshop on, pages 12–12,
May 2007.

[24] J. Dong, Y. Zhao, and T. Peng. Architecture and design pattern discovery techniques - a review. In
H. R. Arabnia and H. Reza, editors, Software Engineering Research and Practice, pages 621–627.
CSREA Press, 2007.

[25] J. Dong, Y. Zhao, and Y. Sun. A matrix-based approach to recovering design patterns. Trans. Sys. Man
Cyber. Part A, 39(6):1271–1282, Nov. 2009.


[26] F. A. Fontana, M. Zanoni, and S. Maggioni. Using design pattern clues to improve the precision of
design pattern detection tools. Journal of Object Technology, 10:4: 1–31, 2011.

[27] R. France, D. Kim, S. Ghosh, and E. Song. A uml-based pattern specification technique. Software
Engineering, IEEE Transactions on, 30(3):193 – 206, march 2004.

[28] L. J. Fulop, T. Gyovai, and R. Ferenc. Evaluating c++ design pattern miner tools. In Proceedings of
the Sixth IEEE International Workshop on Source Code Analysis and Manipulation, SCAM ’06, pages
127–138, Washington, DC, USA, 2006. IEEE Computer Society.

[29] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design patterns: elements of reusable object-
oriented software. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1995.

[30] H. A. Ghulam Rasool. Discovering variants of design patterns. Journal of Basic and Applied Scientific
Research, pages 139–147, 2013.

[31] M. Goldstein and D. Moshkovich. System grokking: a novel approach for software understanding,
validation, and evolution. In Proceedings of the 7th international conference on Next generation
information technologies and systems, NGITS’09, pages 38–49, Berlin, Heidelberg, 2009. Springer-
Verlag.

[32] Y. G. Guéhéneuc. P-mart: Pattern-like micro architecture repository. In Proceedings of the 1st
EuroPLoP Focus Group on Pattern Repositories. Michael , Aliaksandr Birukou, and Paolo Giorgini,
2007, http://www.ptidej.net/tool/designpatterns/.

[33] Y. G. Gueheneuc and G. Antoniol. Demima: A multilayered approach for design pattern identification.
IEEE Transactions on Software Engineering, 34(5):667–684, 2008.

[34] Y. G. Guéhéneuc, J. Y. Guyomarc’H, and H. Sahraoui. Improving design-pattern identification: a new
approach and an exploratory study. Software Quality Control, 18(1):145–174, Mar. 2010.

[35] Y. G. Guéhéneuc, H. A. Sahraoui, and F. Zaidi. Fingerprinting design patterns. In 11th Working
Conference on Reverse Engineering (WCRE 2004), pages 172–181, 2004.

[36] A. L. Guennec, G. Sunye, and J.-M. Jezequel. Precise modeling of design patterns. In Proceedings
of UML00, pages 482–496. Springer Verlag, 2000.

[37] J. Heering, P. R. H. Hendriks, P. Klint, and J. Rekers. The syntax definition formalism sdf reference
manual. SIGPLAN Not., 24(11):43–75, Nov. 1989.

[38] D. Heuzeroth, T. Holl, G. Högström, and W. Löwe. Automatic design pattern detection. In
Proceedings of the 11th IEEE International Workshop on Program Comprehension, IWPC ’03, pages
94–, Washington, DC, USA, 2003. IEEE Computer Society.

[39] H. Huang, S. Zhang, J. Cao, and Y. Duan. A practical pattern recovery approach based on both
structural and behavioral analysis. Journal of Systems and Software, 75(12):69 – 87, 2005. Software
Engineering Education and Training.

[40] O. Kaczor, Y. Gueheneuc, and S. Hamel. Efficient identification of design patterns with bit-vector
algorithm. In Proc. 10th European Conference on Software Maintenance and Reengineering CSMR
2006, pages 10 pp.–184, 2006.

[41] R. K. Keller, R. Schauer, S. Robataille, and B. Laguë. Advances in software engineering. chapter
Pattern-based design recovery with SPOOL, pages 113–135. Springer-Verlag New York, Inc., New
York, NY, USA, 2002.

[42] H. Kim and C. Boldyreff. A method to recover design patterns using software product metrics. In
Proceedings of the 6th International Conference on Software Reuse: Advances in Software Reusability,
ICSR-6, pages 318–335, London, UK, 2000. Springer-Verlag.

[43] C. Kramer and L. Prechelt. Design recovery by automated search for structural design patterns in
object-oriented software. In Proceedings of the 3rd Working Conference on Reverse Engineering
(WCRE ’96), WCRE ’96, pages 208–, Washington, DC, USA, 1996. IEEE Computer Society.


[44] L. Prechelt, B. Unger-Lamprecht, M. Philippsen, and W. F. Tichy. Two controlled experiments assessing the
usefulness of design pattern documentation in program maintenance. IEEE Trans. Softw. Eng.,
28(6):595–606, 2002.
[45] K. N. Loo and S. Lee. Representing design pattern interaction roles and variants. In Computer
Engineering and Technology (ICCET), 2010 2nd International Conference on, volume 6, pages V6–
470–V6–474, 2010.
[46] K. N. Loo, S. P. Lee, and T. K. Chiew. Uml extension for defining the interaction variants of design
patterns. Software, IEEE, 29(5):64–72, 2012.
[47] J. K. Y. Ng, Y. G. Gueheneuc, and G. Antoniol. Identification of behavioural and creational design
motifs through dynamic analysis. J. Softw. Maint. Evol., 22(8):597–627, Dec. 2010.
[48] J. Niere, W. Schäfer, J. P. Wadsack, L. Wendehals, and J. Welsh. Towards pattern-based design
recovery. In Proceedings of the 24th International Conference on Software Engineering, ICSE ’02,
pages 338–348, New York, NY, USA, 2002. ACM.
[49] J. Paakki, A. Karhinen, J. Gustafsson, L. Nenonen, and A. I. Verkamo. Software metrics by
architectural pattern mining. In Proceedings of the International Conference on Software: Theory
and Practice (16th IFIP World Computer Congress), pages 325–332, 2000.
[50] T. Peng, J. Dong, and Y. Zhao. Verifying behavioral correctness of design pattern implementation.
In Proceedings of the Twentieth International Conference on Software Engineering & Knowledge
Engineering (SEKE’2008), pages 454–459, 2008.
[51] N. Pettersson, W. Lowe, and J. Nivre. Evaluation of accuracy in design pattern occurrence detection.
IEEE Trans. Softw. Eng., 36(4):575–590, July 2010.
[52] I. Philippow, D. Streitferdt, M. Riebisch, and S. Naumann. An approach for reverse engineering of
design patterns. Software and System Modeling, 4(1):55–70, 2005.
[53] G. Rasool and P. Mäder. Flexible design pattern detection based on feature types. In 26th IEEE/ACM
International Conference on Automated Software Engineering (ASE 2011), Lawrence, KS, USA,
November 6-10, pages 243–252, 2011.
[54] G. Rasool, I. Philippow, and P. Mäder. Design pattern recovery based on annotations. Adv. Eng. Softw.,
41(4):519–526, Apr. 2010.
[55] G. Rasool and D. Streitferdt. A survey on design pattern recovery techniques. IJCSI International
Journal of Computer Science Issues, 8(2):251–260, 2011.
[56] N. Shi and R. A. Olsson. Reverse engineering of design patterns from java source code. In Proceedings
of the 21st IEEE/ACM International Conference on Automated Software Engineering, ASE ’06, pages
123–134, Washington, DC, USA, 2006. IEEE Computer Society.
[57] J. M. Smith and D. Stotts. Spqr: flexible automated design pattern extraction from source code. In
Proc. 18th IEEE International Conference on Automated Software Engineering, pages 215–224, Oct.
6–10, 2003.
[58] K. Stencel and P. Wegrzynowicz. Detection of diverse design pattern variants. In Software Engineering
Conference, 2008. APSEC ’08. 15th Asia-Pacific, pages 25–32, Dec 2008.
[59] K. Stencel and P. Wegrzynowicz. Implementation variants of the singleton design pattern. In
Proceedings of the OTM Confederated International Workshops and Posters on On the Move to
Meaningful Internet Systems: 2008 Workshops: ADI, AWeSoMe, COMBEK, EI2N, IWSSA, MONET,
OnToContent + QSI, ORM, PerSys, RDDS, SEMELS, and SWWS, OTM ’08, pages 396–406, Berlin,
Heidelberg, 2008. Springer-Verlag.
[60] P. Tonella, M. Torchiano, B. Du Bois, and T. Systä. Empirical studies in reverse engineering: state of
the art and future trends. Empirical Softw. Engg., 12(5):551–571, Oct. 2007.
[61] N. Tsantalis, A. Chatzigeorgiou, G. Stephanides, and S. T. Halkidis. Design pattern detection using
similarity scoring. IEEE Trans. Softw. Eng., 32(11):896–909, Nov. 2006.


[62] M. Vokác. An efficient tool for recovering design patterns from c++ code. Journal of Object
Technology, 5(1):139–157, 2006.
[63] M. von Detten and S. Becker. Combining clustering and pattern detection for the reengineering of
component-based software systems. In Proceedings of the joint ACM SIGSOFT conference – QoSA and
ACM SIGSOFT symposium – ISARCS on Quality of software architectures – QoSA and architecting
critical systems – ISARCS, QoSA-ISARCS ’11, pages 23–32, New York, NY, USA, 2011. ACM.

[64] Y. Wang and J. Huang. Formal modeling and specification of design patterns using rtpa. IJCINI, pages
100–111, 2008.
[65] P. Wegrzynowicz and K. Stencel. Relaxing queries to detect variants of design patterns. In Computer
Science and Information Systems (FedCSIS), 2013 Federated Conference on, pages 1571–1578, 2013.

[66] L. Wendehals. Improving design pattern instance recognition by dynamic analysis. In Proceedings
of the 2003 International Workshop on Dynamic Systems Analysis, WODA ’03, pages 29–32. ACM,
2003.

[67] L. Wendehals and A. Orso. Recognizing behavioral patterns at runtime using finite automata. In
Proceedings of the 2006 International Workshop on Dynamic Systems Analysis, WODA ’06, pages
33–40, New York, NY, USA, 2006. ACM.
