To cite this article: Gavin T.L. Brown, Heidi L. Andrade & Fei Chen (2015): Accuracy in student
self-assessment: directions and cautions for research, Assessment in Education: Principles, Policy &
Practice, DOI: 10.1080/0969594X.2014.996523
test scores than for global qualities like cognitive competence or for complex
performances such as writing and projects. Studies of consistency tend to fall into
two categories: student self-assessments relative to teacher ratings and
self-assessments relative to test scores.
The more concrete the reference criteria (Claes & Salame, 1975), the more accurately students
estimated their own performance. Comparison to explicitly stated criteria, goals or
standards as the basis for self-assessment can also improve the veridicality of self-
assessments (Andrade & Valtcheva, 2009). Accuracy was improved when a small
sample of Grades 5 and 6 American students were explicitly taught to use a self-
checking strategy when solving long division problems (Ramdass & Zimmerman,
2008).
Thus, the accuracy of student self-assessment does not appear to be uniform
throughout the student’s life course, nor across the full range of learning activities.
In general, these findings reflect the results of research in the distinct but related
field of calibration, which is the degree of fit between a person’s judgement of
performance and his or her actual performance (Bol & Hacker, 2012). Research on
calibration suggests that rewards can increase accuracy (Miller, Duffy, & Zane, 1993),
as can feedback. For example, Miller and Geraci (2011) examined the relation
between the accuracy of 81 undergraduate students’ predictions of their performance
on four exams and (1) extra-credit incentives and (2) explicit, concrete feedback.
They found that the incentives and feedback were related to improved accuracy for
low-performing students but, interestingly, not to higher scores on exams.
Predictions about performance on exams are only one type of self-assessment, however.
Little is known about whether or not training in self-assessment influences accuracy
in a classroom context where self-assessment is used formatively to guide revision
and improvement.
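The notion of calibration above can be made concrete with a toy computation: the signed and absolute discrepancies between a student's predicted and actual scores. This is an illustrative sketch only; the function name and all numbers are invented for this example, not drawn from the studies cited.

```python
# Illustrative calibration computation: compare students' predicted exam
# scores with their actual scores. All data here are invented.

def calibration_stats(predicted, actual):
    """Return (mean signed bias, mean absolute discrepancy).

    Positive bias indicates predictions exceeding performance
    (overconfidence); a smaller absolute discrepancy indicates
    better calibration accuracy.
    """
    diffs = [p - a for p, a in zip(predicted, actual)]
    bias = sum(diffs) / len(diffs)
    abs_discrepancy = sum(abs(d) for d in diffs) / len(diffs)
    return bias, abs_discrepancy

# Hypothetical percentage scores for five students on one exam.
predicted = [85, 70, 90, 60, 75]
actual = [80, 72, 85, 45, 74]

bias, inaccuracy = calibration_stats(predicted, actual)
print(f"mean bias: {bias:+.1f}, mean |error|: {inaccuracy:.1f}")
```

In the calibration literature, a persistent positive bias, especially among low performers, is commonly interpreted as overconfidence, which is one reason feedback and incentive studies such as Miller and Geraci (2011) focus on that group.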
Clear criteria
Sharing learning targets and criteria is generally considered good assessment
practice (Brookhart, 2013), and Falchikov and Boud’s (1989) meta-analysis found
student familiarity with rating criteria enhanced accuracy and the alignment of
student ratings with those of academics. Indeed, having students involved in the process of creating
criteria for rubrics is especially associated with greater learning outcomes (Andrade,
Du, & Mycek, 2010; Andrade, Du, & Wang, 2008; Ross, Rolheiser, &
Hogaboam-Grey, 1998a; Sadler & Good, 2006).
Models
Samples of target performances, particularly exemplars, are associated with
improved performance (Andrade et al., 2010; Hewitt, 2001), and might enhance
accuracy if the models are used as benchmarks (Dunning et al., 2004).
Feedback on accuracy
Feedback from others as to the accuracy of students’ predictions of performance has
the potential to improve accuracy (Miller & Geraci, 2011). Self-monitoring and
reflection might also encourage students to compare their self-assessment and actual
performance over time, thereby enabling more accurate self-assessment (Lopez &
Kossack, 2007).
Rewards
Getting students to set stringent targets for self-selected rewards combined with self-
monitoring or self-marking has led to improved learning outcomes (Barling, 1980;
Miller et al., 1993; Wall, 1982). However, the effects of incentives on accuracy tend
to be mixed (Hacker, Bol, & Bahbahani, 2008), so this tactic should be employed
with caution.
Keep it formative
Including student self-assessments as part of summative course grades introduces
high-stakes consequences for honest, accurate evaluations (Andrade, 2010; Brown
& Harris, 2013). We recommend studying formative self-assessment in contexts that
do not tempt students to inflate or distort their self-evaluations.
assessors (whether high or low performers) how to produce more valid and realistic
evaluations of their work. On closer inspection, however, such an approach is shown
to be somewhat complicated. With concerns for consequential validity (Messick,
1989), in this section, we consider several methodological pitfalls – reliability,
grading, social response bias, response style and trust/respect – and recommend possible
ways to avoid them.
might selectively tell all the truth to the outside world, she does not avoid
confronting the whole truth inside herself’ (Raider-Roth, 2005, p. 128). If such attitudes and
behaviours are common, shared self-assessments could provide counterfeit data.
Because disclosure is highly individualised, and trust and respect are essential quali-
ties of a classroom in which students are willing to disclose their knowledge and
engage in assessment for learning (Tierney, 2010), researchers interested in
investigating issues of accuracy in self-assessment will have to attend to the classroom
environment and its effects on their data (Brown & Harris, 2013). Here again,
assuring students of anonymity by asking them to hand in assignments and self-
assessments with no identifying information attached might help researchers, if not
teachers. Alternatively, researchers might allow students to restrict access to their
self-assessments to those whom they trust and to whom they give permission,
potentially excluding the teacher (Cowie, 2009).
Conclusion
The pitfalls discussed above are complex and not easily resolved. Until we have a
better understanding of the nature of self-assessment, the best we can do as
researchers is to attempt to create the optimal conditions for accuracy and avoid
known pitfalls, which include issues of reliability, grading, social response bias,
response style
and trust/respect. Reliability should be addressed by maximising the psychometric
quality of any criterion used to judge the accuracy of student self-assessments (e.g.
test, teacher rating, etc.): there is no point evaluating students’ accuracy with an
inaccurate measure. Problems related to grading and trust/respect can be managed
by implementing self-assessment in a context likely to promote accuracy – or at
least not promote inaccuracy – meaning that self-assessments should not count
towards grades, and should be private. Social response bias and response style can
be managed to some degree by encouraging students to be honest and accurate, but
students’ tendencies towards bias could also be measured.
Of course, we also recommend the use of randomised trials in order to establish
causal inferences about the impact of variables on the accuracy of self-assessment.
To date, most studies of self-assessment accuracy have not employed random
assignment, and the experimental studies that do generally involve learners in tasks
that lack authenticity and, thereby, limit the generalisability
of results to classroom contexts. With large enough samples, it would be possible to
statistically control for factors influencing accuracy, including prior ability or
achievement level, task difficulty, assessment purpose and so on.
Given the social and interpersonal nature of student self-assessment in classroom
contexts and the great variation in culturally preferred classroom climates and
practices, more work needs to be done to understand whether the recommendations we
have made are equally applicable in all societies. It is possible in Confucian-heritage
cultures, for example, that the importance placed on high performance and the need
to avoid low ranks, combined with the pressure from teachers and parents to
continually do better (Brown & Wang, 2013), would discourage or even prevent realistic
evaluations. In systems that are highly selective, it is difficult to expect the weakest
students in a highly proficient class to realise that their work is still actually good.
Likewise, students in systems or schools that provide relatively positive and inflated
reports of proficiency (e.g. Hattie & Peddie’s [2003] description of New Zealand
primary school report cards) are unlikely to develop a realistic sense of the quality
of their work.
Notes on contributors
Gavin T.L. Brown is an associate professor of Education and Director of the Quantitative
Data Analysis and Research unit in the Faculty of Education at the University of Auckland,
New Zealand. He has published over 90 research articles in refereed journals and
book chapters, and has written two textbooks on assessment and co-authored two
standardised educational test systems. He is co-author of a recent review of student
self-assessment published in The Sage Handbook of Research on Classroom
Assessment (2013). His major research interest is the cross-cultural study of the
social psychological effects of assessment on prospective and in-service teachers
and upon school and higher education students.
Fei Chen is a PhD student in the educational psychology and methodology programme at the
University at Albany, State University of New York, supervised by Dr Heidi Andrade. Her
research interests include self-regulated learning, learning from instruction and assessment,
and gifted education.
References
Alsaker, F. D. (1989). School achievement, perceived academic competence and global self-
esteem. School Psychology International, 10, 147–158. doi:10.1177/0143034389102009
Andrade, H. (2010). Students as the definitive source of formative assessment: Academic
self-assessment and the self-regulation of learning. In H. Andrade & G. Cizek (Eds.),
Handbook of formative assessment (pp. 90–105). New York, NY: Routledge.
Andrade, H. L., Du, Y., & Mycek, K. (2010). Rubric-referenced self-assessment and middle
school students’ writing. Assessment in Education: Principles, Policy & Practice, 17,
199–214. doi:10.1080/09695941003696172
Andrade, H. L., Du, Y., & Wang, X. (2008). Putting rubrics to the test: The effect of a model,
criteria generation, and rubric-referenced self-assessment on elementary school students’
writing. Educational Measurement: Issues and Practice, 27, 3–13. doi:10.1111/j.1745-
3992.2008.00118.x
Andrade, H., & Valtcheva, A. (2009). Promoting learning and achievement through
self-assessment. Theory into Practice, 48, 12–19.
Barling, J. (1980). A multistage multidependent variable assessment of children’s
self-regulation of academic performance. Child Behavior Therapy, 2, 43–54.
Barnett, J. E., & Hixon, J. E. (1997). Effects of grade level and subject on student test score
predictions. The Journal of Educational Research, 90, 170–174.
Birnbaum, R. (1972). Factors associated with the accuracy of self-reported high-school
grades. Psychology in the Schools, 9, 364–370. doi:10.1002/1520-6807(197210)
9:4<364::AID-PITS2310090404>3.0.CO;2-G
Brooks, V. (2002). Assessment in secondary schools: The new teacher’s guide to monitoring,
assessment, recording, reporting and accountability. Buckingham: Open University
Press.
Brown, G. T. L., & Harris, L. R. (2013). Student self-assessment. In J. H. McMillan (Ed.),
The Sage handbook of research on classroom assessment (pp. 367–393). Thousand Oaks,
CA: Sage.
Brown, G. T. L., & Harris, L. R. (2014). The future of self-assessment in classroom practice:
Reframing self-assessment as a core competency. Frontline Learning Research, 2, 22–30.
doi:10.14786/flr.v2i1.24
Brown, G. T. L., & Wang, Z. (2013). Illustrating assessment: How Hong Kong university
students conceive of the purposes of assessment. Studies in Higher Education, 38,
1037–1057. doi:10.1080/03075079.2011.616955
Butler, R. (1990). The effects of mastery and competitive conditions on self-assessment at
different ages. Child Development, 61, 201–210.
Butler, R. (2011). Are positive illusions about academic competence always adaptive, under
all circumstances: New results and future directions. International Journal of Educational
Research, 50, 251–256. doi:10.1016/j.ijer.2011.08.006
Claes, M., & Salame, R. (1975). Motivation toward accomplishment and the self-evaluation
of performances in relation to school achievement. Canadian Journal of Behavioural
Science/Revue canadienne des sciences du comportement, 7, 397–410. doi:10.1037/
h0081924
Connell, J. P., & Ilardi, B. C. (1987). Self-system concomitants of discrepancies between
children’s and teachers’ evaluations of academic competence. Child Development, 58,
1297–1307. doi:10.2307/1130622
Cowie, B. (2009). My teacher and my friends helped me learn: Student perceptions and
experiences of classroom assessment. In D. M. McInerney, G. T. L. Brown, & G. A. D.
Liem (Eds.), Student perspectives on assessment: What students can tell us about
assessment for learning (pp. 85–105). Charlotte, NC: Information Age.
Dalton, D., & Ortegren, M. (2011). Gender differences in ethics research: The importance of
controlling for the social desirability response bias. Journal of Business Ethics, 103,
73–93. doi:10.1007/s10551-011-0843-8
Dunning, D., Heath, C., & Suls, J. M. (2004). Flawed self-assessment: Implications for
health, education, and the workplace. Psychological Science in the Public Interest, 5,
69–106.
Eccles, J., Wigfield, A., Harold, R. D., & Blumenfeld, P. (1993). Age and gender differences
in children’s self- and task perceptions during elementary school. Child Development, 64,
830–847.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah,
NJ: Lawrence Erlbaum Associates.
Epley, N., & Gilovich, T. (2005). When effortful thinking influences judgmental anchoring:
Differential effects of forewarning and incentives on self-generated and externally
provided anchors. Journal of Behavioral Decision Making, 18, 199–212.
LaVoie, J. C., & Hodapp, A. F. (1987). Children’s subjective ratings of their performance on
a standardized achievement test. Journal of School Psychology, 25, 73–80.
doi:10.1016/0022-4405(87)90062-8
Leahy, S., Lyon, C., Thompson, M., & Wiliam, D. (2005). Classroom assessment minute by
minute, day by day. Educational Leadership, 63(3), 18–24.
Lew, M., Alwis, W., & Schmidt, G. (2010). Accuracy of students’ self-assessment and their
beliefs about its utility. Assessment & Evaluation in Higher Education, 35, 135–156.
Lipnevich, A., & Smith, J. (2008). Response to assessment feedback: The effects of grades,
praise, and source of information (Research Report RR-08-30). Princeton, NJ:
Educational Testing Service.
Lopez, R., & Kossack, S. (2007). Effects of recurring use of self-assessment in university
courses. International Journal of Learning, 14, 203–216.
Luyten, H., & Dolkar, D. (2010). School-based assessments in high-stakes examinations in
Bhutan: A question of trust? Exploring inconsistencies between external exam scores,
school-based assessments, detailed teacher ratings, and student self-ratings. Educational
Wilson, J., & Wright, C. R. (1993). The predictive validity of student self-evaluations,
teachers’ assessments, and grades for performance on the verbal reasoning and numerical
ability scales of the differential aptitude test for a sample of secondary school students
attending rural Appalachia schools. Educational and Psychological Measurement, 53,
259–270. doi:10.1177/0013164493053001029