A Comprehensive Framework for Evaluation in Design Science Research
John Venable, Jan Pries-Heje, and Richard Baskerville
1 Introduction
K. Peffers, M. Rothenberger, and B. Kuechler (Eds.): DESRIST 2012, LNCS 7286, pp. 423–438, 2012.
© Springer-Verlag Berlin Heidelberg 2012
(IS) Design Theories [3], [20]. March and Smith [6] identify “build” and “evaluate” as
two DSR activities. Hevner et al [5] identify evaluation as “crucial” (p. 82). In their
third guideline for Design Science in IS Research, they state that “The utility, quality,
and efficacy of a design artifact must be rigorously demonstrated via well-executed
evaluation methods” (p. 85).
Evaluation provides evidence that a new technology developed in DSR “works” or
achieves the purpose for which it was designed. Without evaluation, outcomes of
DSR are unsubstantiated assertions that the designed artifacts, if implemented and
deployed in practice, will achieve their purpose. Rigorous, scientific research requires
evidence. If Design Science Research is to live up to its label as “science”, the
evaluation must be sufficiently rigorous.
But how should rigorous evaluation be designed and conducted? What strategies
and methods should be used for evaluation in a particular DSR project? How can the
evaluation be designed to be both effective (rigorous) and efficient (prudently using
resources, including time)? What would constitute good guidance for answering these
questions?
Unfortunately, there is little guidance in the DSR literature about the choice of
strategies and methods for evaluation in DSR. A notable exception is Pries-Heje et al
[11], who develop a 2-by-2 framework to guide selection of evaluation strategy(ies)
for a DSR project. They identify that evaluation design needs to decide what will be
evaluated, when it will be evaluated, and how it will be evaluated. However, beyond
providing the framework and an idea of what needs to be designed in the DSR
component of research, they provide very little guidance on how a researcher should or
could actually design the DSR evaluation component. This state of affairs in DSR
constitutes what we can call an “evaluation gap”.
The purpose of this paper is to address this evaluation gap by developing a DSR
evaluation framework with clear guidance for how one could design and conduct
evaluation within DSR. Making a strong, published evaluation framework available
to design science researchers, particularly novice ones, can simplify the research
design and reporting. Such guidance would help DSR researchers make decisions
about how they can (and perhaps should) conduct the evaluation activities of DSR.
It is important to clarify that the framework developed here is to aid DSR
researchers in the design of the evaluation component of their DSR. The framework
proposed here is not a framework for evaluating DSR projects as a whole or after the
fact. Conducting DSR involves much more than the evaluation of the resulting DSR
artifacts and IS Design Theories, and such broader evaluation of a whole DSR project
is outside the scope of this paper.
The next section of this paper discusses relevant literature on evaluation in DSR to
elucidate the “evaluation gap” addressed in this paper. Section 3 describes an
extended framework and method developed to address this gap. Section 4 describes
the evaluation of the method in use by novice design science researchers. Finally
section 5 discusses the findings and presents conclusions.
This section considers the DSR literature concerning the purposes for evaluation in
DSR, characteristics or aspects to be evaluated in DSR evaluation, kinds of artifacts
(evaluands) in DSR, design goals to be addressed in the design of a DSR evaluation
method, methods proposed for evaluation in DSR, and guidance for designing the
evaluation component of DSR.
March and Smith [6] define evaluation as “the process of determining how well the
artifact performs.” (p. 254). The central purpose of DSR evaluation then is to
rigorously demonstrate the utility of the artifact being evaluated (known as the
“evaluand” [13]). DSR design artifacts “are assessed against criteria of value or
utility – does it work?” [6]. A key purpose of DSR evaluation then is to determine
whether or how well the developed evaluand achieves its purpose.
Evaluating the design artifact’s utility for purpose is closely related to the concepts of
IS Design Theories (ISDTs) [3], [18], [20], design principles [7], [10], [12], or
technological rules [16], which are formalizations of knowledge about designed
artifacts and their utility. When an artifact is evaluated for its utility in achieving its
purpose, one is also evaluating a design theory that the design artifact has utility to
achieve that purpose. From the point of view of design theory, a second purpose of
evaluation in DSR is to confirm or disprove (or enhance) the design theory.
3. Evaluate a designed artifact or formalized knowledge about it in comparison to
other designed artifacts’ ability to achieve a similar purpose
In addition to the first purpose above, Venable [17] identifies a third purpose –
evaluating the artifact “in comparison to other solution technologies” (p. 4). A new
artifact should provide greater relative utility than existing artifacts that can be used to
achieve the same purpose.
Another purpose that Venable [17] identifies is evaluating an artifact for other
(undesirable) impacts in the long run, i.e. for side effects (particularly dangerous
ones).
Next we consider the different kinds of evaluands. Based on the literature, we can
identify two different classifications of artifacts.
First, we can distinguish product artifacts from process artifacts [3], [18]. Product
artifacts are technologies such as tools, diagrams, software, etc. that people use to
accomplish some task. Process artifacts are methods, procedures, etc. that guide
someone or tell them what to do to accomplish some task.
Second, we can distinguish between technical artifacts and socio-technical
artifacts. Some artifacts are in some sense “purely” (or nearly purely) technical, in
that they do not require human use once instantiated. Socio-technical artifacts are
ones with which humans must interact to provide their utility.
Relating the technical vs socio-technical distinction to the product vs process
distinction, product artifacts may be either (purely) technical or socio-technical, while
process artifacts are always socio-technical, which will have implications for their
evaluation.
Next we consider what goals there are for the design of the evaluation itself. There are
(at least) three possibly competing goals in designing the evaluation component of
DSR.
• Rigor: Research, including DSR, should be rigorous. Rigor in DSR has two
senses. The first is in establishing that it is the artifact (instantiation) that causes an
observed improvement (and only the artifact, not some confounding independent
variable or circumstance), i.e. its efficacy. The second sense of rigor in DSR is in
establishing that the artifact (instantiation) works in a real situation (despite
organisational complications, unanticipated human behavioral responses, etc.), i.e.
its effectiveness.
• Efficiency: A DSR evaluation should work within resource constraints (e.g.
money, equipment, and people’s time) or even minimize their consumption.
• Ethics: Research, including DSR, should not unnecessarily put animals, people,
organizations, or the public at risk during or after evaluation, e.g. for safety critical
systems and technologies. Venable [19] discusses some ethical issues in DSR.
The 5 E’s [1] are also relevant to the design of the evaluation part of a DSR project.
Each of the above goals corresponds to one of the 5 E’s. Only Elegance is missing,
although presumably an elegant evaluation would be preferable to an inelegant one.
Importantly, these goals conflict, and DSR evaluation must balance them.
Artificial and naturalistic evaluation each have their strengths and weaknesses. To
the extent that naturalistic evaluation is affected by confounding variables or
misinterpretation, evaluation results may not be precise or even truthful about an
artifact’s utility or efficacy in real use. On the other hand, artificial evaluation
involves abstraction from the natural setting and is necessarily “unreal” according to
one or more of Sun and Kantor’s [14] three realities (unreal users, unreal systems, or
unreal problems). To the extent that an artificial evaluation setting is unreal,
evaluation results may not be applicable to real use. In contrast, naturalistic evaluation
offers more critical face validity. Evaluation in a naturalistic setting is “the real
‘proof of the pudding’” [17, p. 5].
Further, Venable noted that more than one method could be used, mixing artificial
and naturalistic evaluation as well as positivist and interpretive evaluation methods,
leading to a pluralist view of science, where each has its strengths in contributing to a
robust evaluation depending on the circumstance. Nonetheless, Venable [17] provided
little or no guidance about selecting among methods and designing an evaluation
strategy.
In summary, the DSR literature identifies a fairly large number and variety of
evaluation methods, but gives little advice as to choice among methods, i.e. how to
design an evaluation strategy for a particular DSR project.
While the above suggestions to guide the research design of evaluation in DSR are
useful, we believe they are incomplete and less useful than they might be. There is no
guidance for considering how the different purposes, evaluation design goals,
available resources, etc. can or should be considered when choosing a DSR evaluation
strategy or strategies. Moreover, they provide no guidance about how to select
evaluation methods. These difficulties are addressed in the next section.
Our extensions comprise three parts: (1) a framework that maps contextual factors
setting the criteria for the evaluation design to a potential evaluation strategy or
strategies (see figure 2 and section 3.1), (2) an extended framework to map a chosen
evaluation strategy or strategies to candidate evaluation methods (see figure 3 and
section 3.2), and (3) a process or method for using the two extended frameworks (see
section 3.3).
The first extension relates or maps various aspects of the context of the evaluation in
the DSR project to the framework by Pries-Heje et al [11], as shown in figure 2 (A
DSR Evaluation Strategy Selection Framework). Relevant aspects of the context of
the DSR evaluation serve as the starting point and input to the design of the DSR
evaluation. Relevant contextual aspects include (1) the different purposes of
evaluation in DSR, (2) the characteristics of the evaluand to be evaluated, (3) the type
of evaluand to be evaluated, and (4) the specific goals that must be balanced in the
design of the evaluation part(s) of a DSR project. These four contextual aspects were
discussed in sections 2.1 through 2.4 respectively.
In figure 2, the above four contextual aspects are combined into criteria that should
be considered as input to the DSR evaluation design. These criteria, mapped to ex
ante vs ex post and artificial vs naturalistic evaluation as shown in the white areas of
figure 2, include the following:
• The extent to which cost and time resource limitations constrain the evaluation or
the whole research project
• Whether (or not) early, formative evaluation is desirable and feasible
• The extent to which the artifact being designed has to please heterogeneous groups
of stakeholders or if there is likely to be conflict, which will complicate evaluation
• Whether the system is purely technical in nature or socio-technical in nature, with
the consequent difficulties of the latter (cf. artifact focus as either technical,
organizational, or strategic [2])
• The importance of strong rigor in establishing effectiveness in real working
situations
• The importance of strong rigor in establishing that benefits are due specifically to
the designed artifact, rather than to some other potential cause (or confounding
variable)
• Whether or not access to a site for naturalistic evaluation is available or can be
obtained
• Whether the level of risk for evaluation participants is acceptable or needs to be
reduced
Note that picking a single box in figure 2 may not be the best strategy; rather, a
hybrid strategy (using more than one quadrant) can be used to resolve conflicting
goals.
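To illustrate how these contextual criteria might drive quadrant selection, the following sketch encodes one plausible reading of the figure 2 mapping. The criterion names, the decision rules, and the function itself are our own illustrative assumptions for exposition, not the exact contents of the framework:

```python
# Illustrative sketch only: one plausible encoding of the figure 2
# criteria-to-quadrant mapping. Criterion names and rules are assumptions
# made for illustration, not the authors' exact framework.
from dataclasses import dataclass


@dataclass
class EvaluationContext:
    resources_constrained: bool   # tight cost/time limits on the evaluation
    formative_desirable: bool     # early, formative evaluation wanted and feasible
    socio_technical: bool         # artifact requires human interaction for its utility
    rigor_effectiveness: bool     # must show the artifact works in real settings
    rigor_efficacy: bool          # must rule out confounding causes of benefits
    site_access: bool             # a site for naturalistic evaluation is available
    participant_risk: bool        # real-use evaluation is risky for participants


def suggest_quadrants(ctx: EvaluationContext) -> set[str]:
    """Return a (possibly hybrid) set of evaluation strategy quadrants."""
    quadrants: set[str] = set()
    # Naturalistic, ex post evaluation is the strongest test of effectiveness,
    # but requires site access and acceptable participant risk.
    if ctx.rigor_effectiveness and ctx.site_access and not ctx.participant_risk:
        quadrants.add("ex post naturalistic")
    # Artificial evaluation isolates the artifact from confounding variables
    # (efficacy) and avoids exposing participants to real-use risk.
    if ctx.rigor_efficacy or ctx.participant_risk:
        quadrants.add("ex post artificial")
    # Formative, early evaluation points to ex ante episodes; resource
    # constraints favour the cheaper artificial setting.
    if ctx.formative_desirable or ctx.resources_constrained:
        quadrants.add("ex ante artificial")
    # Socio-technical artifacts need real users; if full deployment is
    # infeasible, evaluate with real users before instantiation (ex ante).
    if ctx.socio_technical and (not ctx.site_access or ctx.formative_desirable):
        quadrants.add("ex ante naturalistic")
    return quadrants
```

For example, a socio-technical artifact with no available naturalistic site steers the sketch toward an ex ante naturalistic episode, which is consistent with the bike-lane example discussed in section 4.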
Having decided the high level strategy to be used for evaluation (i.e. which of the
quadrants in Figure 2 will be used for the evaluation), then the particular evaluation
research method(s) need to be chosen and the evaluation designed in detail. Figure 3
gives a mapping of different possible DSR evaluation research methods into each
quadrant of the framework in Figures 1 and 2. This mapping may omit some potential
evaluation methods, and other evaluation methods may be developed or adopted for
DSR.
Depending on which quadrant(s) were chosen as the DSR evaluation strategy
(using figure 2), figure 3 suggests possible evaluation methods that fit the chosen
evaluation strategy. The specific choice of evaluation method or methods requires
substantial knowledge of the method(s). If the DSR researcher is unfamiliar with the
possible methods, he or she will need to learn about them. Further characteristics of
the evaluation method will need to be assessed against the specific goals and other
contextual issues of the specific DSR project. Detailed advice on which method or
methods to select to fit a particular DSR evaluation strategy is therefore beyond the
scope and available space of this paper.
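Purely to show the shape of such a quadrant-to-methods mapping, the following sketch lists plausible candidate methods per quadrant. Figure 3 itself is not reproduced here, so these lists are drawn from the general DSR evaluation literature rather than from the figure's exact contents; only the focus group entry is confirmed by the example in section 4:

```python
# Illustrative sketch only: candidate methods per strategy quadrant.
# The lists are plausible examples, not the exact contents of figure 3.
CANDIDATE_METHODS = {
    "ex ante artificial": [
        "mathematical or logical proof",
        "criteria-based analysis",
        "computer simulation",
    ],
    "ex ante naturalistic": [
        "focus group",  # used in the Roskilde bike-lane project (section 4)
        "participant interviews",
    ],
    "ex post artificial": [
        "laboratory experiment",
        "benchmarking",
        "computer simulation",
    ],
    "ex post naturalistic": [
        "case study",
        "field experiment",
        "action research",
        "survey",
    ],
}


def candidate_methods(quadrants: set[str]) -> list[str]:
    """Union of candidate methods for a chosen (possibly hybrid) strategy."""
    return sorted({m for q in quadrants for m in CANDIDATE_METHODS[q]})
```

A hybrid strategy simply unions the candidate lists of its quadrants; the researcher's knowledge of the individual methods then drives the final choice, as discussed above.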
When writing about evaluation, it is obvious that the derived framework itself needs
to be evaluated; we should, so to speak, take our own medicine. To some extent we
have. For three years, the authors have taught various versions of the evaluation
framework, as it has evolved, to a variety of students and scholars carrying out design
science research at our own and other universities. They have been taught the four
steps presented above as well as different evaluation methods. In particular, at
Roskilde University, they have been asked to apply the framework in real DSR
projects with an average size of one to two person-years (a typical project is six
people working full time for three months).
One example from Roskilde University was a group that redesigned a bike lane to
make people behave better when biking, i.e. less rudely toward other cyclists and
toward pedestrians. They designed with Ockham’s Razor of simplicity in mind and
used the theory of planned behaviour to inform their design. The group decided that
their redesign should be evaluated with real users (bikers) focusing on real problems,
i.e. naturalistically. However, access was a problem: fully implementing the real
solution would have required a great deal of permissions and red tape from the
Ministry and the Municipality. Based on figure 2 (the DSR Evaluation Strategy
Selection Framework), this suggested an ex ante naturalistic evaluation instead. The
group therefore chose ex ante evaluation and used a focus group for the evaluation,
as suggested by figure 3.
5 Conclusion
Evaluation is a very significant issue in IS Design Science Research, yet there is little
guidance concerning how to choose and design an appropriate evaluation strategy.
To address the above need, we have developed and presented three enhancements
to the existing DSR Evaluation Strategy Framework proposed by Pries-Heje et al
[11], which are based on an analysis and synthesis of works on DSR as presented in
section 2. The first part of the extended framework (figure 2) maps aspects of the
context of a DSR evaluation, such as resources, purpose, goals, and priorities, to the
two dimensions and four quadrants of the Pries-Heje et al [11] DSR Evaluation
Strategy Framework. The second part (figure 3) maps the quadrants (or the selected
relevant DSR evaluation strategy or strategies) to available and relevant research
methods that could be chosen to conduct the evaluation or multiple evaluation
episodes. We have further developed a detailed four-step method for the design of the
evaluation components in a DSR project. This new framework and method should
assist DSR researchers, particularly those new to the field, to improve the efficiency
and effectiveness of their DSR evaluation activities.
The primary aim of the enhanced framework and method is to guide Design
Science researchers who may need assistance in deciding how to design the
evaluation component of their DSR projects. The framework could also be used by
reviewers of DSR publications or research proposals in evaluating research design
choices, but that is not our intent.
We have tried out and evaluated the extended framework and method in numerous
design research projects, including our own and student projects. Nonetheless, further
research is needed to gain more experience using the comprehensive DSR evaluation
framework and the DSR evaluation design method, further evaluate their utility, and
further develop and improve the method, especially as new DSR evaluation methods
are developed.
References
1. Checkland, P., Scholes, J.: Soft Systems Methodology in Action. J. Wiley, Chichester (1990)
2. Cleven, A., Gubler, P., Hüner, K.: Design Alternatives for the Evaluation of Design Science Research Artifacts. In: Proceedings of the 4th International Conference on Design Science Research in Information Systems and Technology (DESRIST 2009). ACM Press, Malvern (2009)
3. Gregor, S., Jones, D.: The Anatomy of a Design Theory. Journal of the Association for Information Systems 8, 312–335 (2007)
4. Gummesson, E.: Qualitative Methods in Management Research. Studentlitteratur, Chartwell-Bratt, Lund, Sweden (1988)
5. Hevner, A.R., March, S.T., Park, J., Ram, S.: Design Science in Information Systems Research. MIS Quarterly 28, 75–105 (2004)
6. March, S.T., Smith, G.F.: Design and natural science research on information technology. Decision Support Systems 15, 251–266 (1995)
7. Markus, M.L., Majchrzak, A., Gasser, L.: A design theory for systems that support emergent knowledge processes. MIS Quarterly 26, 179–212 (2002)
8. Nunamaker, J.F., Chen, M., Purdin, T.D.M.: Systems Development in Information Systems Research. Journal of Management Information Systems 7, 89–106 (1990/1991)
9. Peffers, K., Tuunanen, T., Rothenberger, M.A., Chatterjee, S.: A Design Science Research Methodology for Information Systems Research. Journal of Management Information Systems 24 (2008)
10. Pries-Heje, J., Baskerville, R.: The Design Theory Nexus. MIS Quarterly 32, 731–755 (2008)
11. Pries-Heje, J., Baskerville, R., Venable, J.R.: Strategies for Design Science Research Evaluation. In: Proceedings of the 16th European Conference on Information Systems (ECIS 2008), Galway, Ireland (2008)
12. Sein, M.K., Henfridsson, O., Purao, S., Rossi, M., Lindgren, R.: Action Design Research. MIS Quarterly 35, 37–56 (2011)
13. Stufflebeam, D.L.: The Methodology of Metaevaluation as Reflected in Metaevaluations by the Western Michigan University Evaluation Center, pp. 95–125 (2000)
14. Sun, Y., Kantor, P.B.: Cross-Evaluation: A new model for information system evaluation. Journal of the American Society for Information Science and Technology 57, 614–628 (2006)
15. Vaishnavi, V., Kuechler, W.: Design Research in Information Systems. AISWorld (2004), [Link] (accessed March 3, 2012)
16. van Aken, J.E.: Management Research Based on the Paradigm of the Design Sciences: The Quest for Field-Tested and Grounded Technological Rules. Journal of Management Studies 41, 219–246 (2004)
17. Venable, J.R.: A Framework for Design Science Research Activities. In: Proceedings of the 2006 Information Resource Management Association Conference, Washington, DC, USA (2006)
18. Venable, J.R.: The Role of Theory and Theorising in Design Science Research. In: Hevner, A.R., Chatterjee, S. (eds.) Proceedings of the 1st International Conference on Design Science Research in Information Systems and Technology (DESRIST 2006), Claremont, CA, USA (2006)
19. Venable, J.R.: Identifying and Addressing Stakeholder Interests in Design Science Research: An Analysis Using Critical Systems Heuristics. In: Dhillon, G., Stahl, B.C., Baskerville, R. (eds.) CreativeSME 2009. IFIP AICT, vol. 301, pp. 93–112. Springer, Heidelberg (2009)
20. Walls, J.G., Widmeyer, G.R., El Sawy, O.A.: Building an information system design theory for vigilant EIS. Information Systems Research 3, 36–59 (1992)