Faculty vs Trainee EPA Score Comparison
1. Medicine, University of Alberta, Edmonton, CAN
Corresponding author: Steven J. Katz, katz1@[Link]
© Copyright 2022 Katz et al. This is an open access article distributed under the terms of the Creative Commons Attribution License CC-BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract
Introduction
Competence by Design (CBD) began on July 1, 2019, for postgraduate year 1 (PGY1) Canadian Core Internal
Medicine (CIM) residents. Many entrustable professional activity (EPA) observations allow for assessment by
a faculty physician, senior medicine resident (SMR), or subspecialty resident (SSR). However, few
studies exist that compare EPA scores and comments given by faculty vs senior trainees (SMRs and SSRs).
This study aimed to identify differences in EPA scores and comments given to PGY1 residents by faculty
physicians vs senior trainees.
Methods
Scores and comments of EPAs completed between July 1, 2019, and June 30, 2020, for 35 CIM PGY1 residents
were extracted anonymously from the University of Alberta CBD platform. Scores from faculty vs senior
trainees were compared with the Mann-Whitney U test and the Kruskal-Wallis test. Word counts for positive
and constructive comments written by faculty vs senior trainees were compared with the independent t-test
and one-way ANOVA. The most common two-word phrases in comments were identified with QI Macros
software (Denver, CO: KnowWare International, Inc.).
Results
A total of 2226 EPAs were observed. Faculty physicians gave significantly lower EPA scores overall compared
to senior trainees (U = 501706, P <0.001). Constructive comments written by faculty (M = 14.06, SD = 16.84)
had lower word counts compared to senior trainees (M = 15.85, SD = 16.43) for overall EPAs (t(2224) = -2.528,
P = 0.012).
Conclusion
Faculty physicians gave lower EPA scores and had lower word counts on constructive comments, compared
to senior trainees. These results may help the ongoing implementation of Competence by Design.
Introduction
With the increased attention to competency-based medical education (CBME) over the past few years, the
Royal College of Physicians and Surgeons of Canada (RCPSC) recently implemented its version of CBME
called Competence by Design (CBD), first formally launching in July 2017 with Anesthesiology and
Otolaryngology - Head and Neck Surgery, and Core Internal Medicine (CIM) formally adopting CBD later in
July 2019 [1-3]. Faculty physicians, senior medicine residents (SMRs), and subspecialty residents (SSRs - a
resident who has completed their Core Internal Medicine training {postgraduate years 1-3 "PGY1-3"} and is
now completing subspecialty training {PGY4-6}) assess junior residents in performing entrustable
professional activities (EPAs), which are the essential tasks of the specialty the resident is training in [4-6].
With SMRs and SSRs becoming more involved in assessing their junior colleagues with EPAs, it is possible
that differences exist between assessments from senior trainees (SMRs and SSRs) and faculty physicians,
and if so, these differences may have consequences for junior resident assessment.
Individual EPA observations are scored on a 1-5 entrustment scale adapted from the Ottawa Clinic
Assessment Tool; an EPA score of 1 equates to “I had to do”, meaning the supervisor had to completely take
over the task, and a score of 5 equates to “I didn’t need to be there” [7], meaning the learner was able to
perform the task competently and safely without a supervisor needing to be present. The assessor can
also write positive and constructive comments. These EPA assessments form the basis of resident
progression through the four stages of CBD: transition to discipline (TD), foundations of discipline (FD),
core of discipline, and transition to practice [6,8].
As outlined by conceptual models from Kogan et al. and Berendonk et al., the cognitive process of assessing a learner is influenced by multiple assessor-specific factors, which may differ between faculty physicians and senior trainees [9,10].
This study compares the scores and comments for TD and FD EPAs given by faculty physicians vs senior
trainees to PGY1 residents in the CIM residency program at the University of Alberta.
Methods
QI Macros software (Denver, CO: KnowWare International, Inc.) was used to find the top ten most common
two-word phrases for both positive and constructive EPA comments provided by faculty physicians and
senior trainees. The University of Alberta Medical Ethics Board approved this project (#Pro00097054). The
research was conducted in accordance with the Declaration of Helsinki.
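For illustration only (the study itself used QI Macros for this step), a two-word phrase tally of this kind can be reproduced with a short script; the file name ("comments.csv") and column name ("comment_text") below are hypothetical placeholders rather than the actual export format of the CBD platform.

```python
# Illustrative sketch only (the study used QI Macros): count the most common
# two-word phrases (bigrams) in a set of EPA comments.
# "comments.csv" and the "comment_text" column are hypothetical placeholders.
import csv
import re
from collections import Counter

def top_bigrams(comments, n=10):
    """Return the n most common two-word phrases across all comments."""
    counts = Counter()
    for text in comments:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(zip(words, words[1:]))
    return counts.most_common(n)

with open("comments.csv", newline="", encoding="utf-8") as f:
    comments = [row["comment_text"] for row in csv.DictReader(f)]

for (first, second), freq in top_bigrams(comments):
    print(f"{first} {second}: {freq}")
```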
Results
EPA scores: general statistics
A total of 2226 EPAs completed by 35 PGY1 CIM residents were observed for TD1-FD6. Faculty physicians
observed 1174 EPAs, with a mean score of 4.56 ± 0.639. Senior trainees observed 1052 EPAs, with a mean
score of 4.80 ± 0.423. Out of the total 2226 EPAs observed, 1909 EPAs were observed for TD1-FD4b. Faculty
physicians observed 989 EPAs, with a mean score of 4.54 ± 0.652. SMRs observed 496 EPAs, with a mean
score of 4.85 ± 0.38. SSRs observed 424 EPAs, with a mean score of 4.73 ± 0.48 (Table 1).
TABLE 1: EPA scores given by faculty physicians, SMRs, and SSRs, organized by stage (overall TD1-FD6, overall TD1-FD4b, TD1-3, FD1-6, and FD1-4b)
EPA: entrustable professional activity; SMR: senior medicine resident; SSR: subspecialty resident; TD: transition to discipline; FD: foundations of discipline
TABLE 2: Mann-Whitney U test comparison of EPA scores between faculty physicians and senior
trainees.
EPA: entrustable professional activity; TD: transition to discipline; FD: foundations of discipline
The Kruskal-Wallis H test showed that there was a significant difference in TD1-3 EPA scores given by
faculty physicians vs SMRs vs SSRs (H = 31.125, P <0.001). Pairwise comparisons done by the Mann-Whitney
U test showed that faculty physicians gave lower scores compared to SMRs (U = 6093.5, P <0.001) and SSRs (U
= 8070.5, P =0.017). SSRs gave lower scores compared to SMRs (U = 3561.5, P <0.002) (Table 3).
The Kruskal-Wallis H test showed that there was a significant difference in FD1-4b EPA scores given by
faculty physicians vs SMRs vs SSRs (H = 73.905, P <0.001). Pairwise comparisons done by the Mann-Whitney
U test showed that faculty physicians gave lower scores compared to SMRs (U = 124020, P <0.001) and SSRs
(U = 110751, P <0.001). SSRs gave lower scores compared to SMRs (U = 60894, P =0.005) (Table 3).
When looking at each EPA individually, the Kruskal-Wallis H test and pairwise comparisons with the Mann-
Whitney U test showed that faculty physicians gave lower scores compared to SMRs for all EPAs except for
TD3, FD4a, and FD4b, in which there was no significant difference. Faculty physicians gave lower scores
than SSRs for EPAs FD1, FD2a, and FD2b; there was no significant difference between faculty physicians and
SSRs for the other EPAs. SSRs gave lower EPA scores than SMRs for EPAs TD1 and TD2; there was no
significant difference between SSRs and SMRs for the other EPAs.
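As a rough sketch of how comparisons of this kind could be reproduced (this is not the authors' analysis code), the following assumes a hypothetical table of EPA observations with a 1-5 "score" column and an "assessor_role" column:

```python
# Sketch of the score comparisons described above; not the authors' analysis
# code. Assumes a hypothetical CSV with "score" (1-5 entrustment rating) and
# "assessor_role" ("faculty", "SMR", or "SSR") columns.
import pandas as pd
from scipy.stats import kruskal, mannwhitneyu

epa = pd.read_csv("epa_scores.csv")

# Descriptive statistics per assessor group (counts, means, SDs, as in Table 1)
print(epa.groupby("assessor_role")["score"].agg(["count", "mean", "std"]))

faculty = epa.loc[epa["assessor_role"] == "faculty", "score"]
smr = epa.loc[epa["assessor_role"] == "SMR", "score"]
ssr = epa.loc[epa["assessor_role"] == "SSR", "score"]

# Faculty vs all senior trainees combined (two-sided Mann-Whitney U)
u_stat, p = mannwhitneyu(faculty, pd.concat([smr, ssr]), alternative="two-sided")
print(f"Faculty vs senior trainees: U = {u_stat:.1f}, P = {p:.3f}")

# Omnibus comparison across the three assessor groups (Kruskal-Wallis H)
h_stat, p = kruskal(faculty, smr, ssr)
print(f"Faculty vs SMR vs SSR: H = {h_stat:.3f}, P = {p:.3f}")

# Pairwise follow-up comparisons (Mann-Whitney U)
pairs = [("Faculty vs SMR", faculty, smr),
         ("Faculty vs SSR", faculty, ssr),
         ("SSR vs SMR", ssr, smr)]
for label, a, b in pairs:
    u_stat, p = mannwhitneyu(a, b, alternative="two-sided")
    print(f"{label}: U = {u_stat:.1f}, P = {p:.3f}")
```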
Positive comments
Stage | Levene's F | Levene's P | Faculty mean | Faculty SD | Senior trainee mean | Senior trainee SD | t | df | P
Overall (TD1-FD6) | 7.414 | 0.007 | 19.15 | 20.157 | 18.73 | 14.886 | 0.557 | 2147.521 | 0.578
TD (TD1-3) | 3.192 | 0.075 | 18.3 | 23.711 | 15.84 | 14.225 | 1.236 | 387 | 0.217
FD (FD1-6) | 4.729 | 0.03 | 19.32 | 19.361 | 19.37 | 14.96 | -0.62 | 1803.366 | 0.95
Constructive comments
Stage | Levene's F | Levene's P | Faculty mean | Faculty SD | Senior trainee mean | Senior trainee SD | t | df | P
Overall (TD1-FD6) | 0.111 | 0.738 | 14.06 | 16.842 | 15.85 | 16.431 | -2.528 | 2224 | 0.012
TD (TD1-3) | 0.75 | 0.387 | 14.52 | 18.231 | 15.08 | 17.019 | -0.314 | 387 | 0.754
FD (FD1-6) | 0.014 | 0.905 | 13.97 | 16.553 | 16.02 | 16.304 | -2.667 | 1835 | 0.008
TABLE 4: Independent t-test comparison of mean word count for positive and constructive
comments written by faculty physicians and senior trainees
TD: transition to discipline; FD: foundations of discipline
When comparing overall word counts for positive comments for EPAs TD1-FD4b, ANOVA showed that there
was no significant difference between faculty physicians, SMRs, and SSRs (F(2, 1906) = 1.118, P = 0.327)
(Table 5). There was also no significant difference in the word count of positive comments for EPAs TD1-3,
EPAs FD1-4b, and each EPA individually when comparing faculty physicians, SMRs, and SSRs (Table 5).
TABLE 5: Mean word count for positive and constructive comments (EPAs TD1-FD4b, TD1-3, and FD1-4b) written by faculty physicians, SMRs, and SSRs, with Tukey HSD post-hoc analysis for constructive comments on EPAs FD1-4b
EPA: entrustable professional activity; SMR: senior medicine resident; SSR: subspecialty resident; TD: transition to discipline; FD: foundations of discipline; MS: mean squares; SS: sum of squares
For constructive EPA comments, comments from faculty physicians (M = 14.06, SD = 16.84) had lower word
counts than senior trainees (M = 15.85, SD = 16.43) for EPAs TD1-FD6 (t(2224) = -2.528, P = 0.012). Faculty
physicians (M = 13.97, SD = 16.55) also had lower word counts than senior trainees (M = 16.02, SD = 16.30) for
EPAs FD1-6 (t(1835) = -2.667, P = 0.008). There was no significant difference in word counts between faculty
physicians (M = 14.52, SD = 18.23) and senior trainees (M = 15.08, SD = 17.02) for EPAs TD1-3 (t(387) = -0.314,
P = 0.754) (Table 4). ANOVA showed that there was a significant difference between faculty physicians, SMRs,
and SSRs in the word counts of constructive comments, and post-hoc analysis for EPAs FD1-4b was
performed with the Tukey HSD test (Table 5).
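A similar sketch for the word-count comparisons, again illustrative only and using hypothetical column names, pairs a variance check with the independent t-test and a one-way ANOVA across the three assessor groups:

```python
# Illustrative sketch of the comment word-count comparisons; not the authors'
# analysis code. Column names ("constructive_comment", "assessor_role") are
# hypothetical placeholders.
import pandas as pd
from scipy.stats import levene, ttest_ind, f_oneway

epa = pd.read_csv("epa_comments.csv")
epa["word_count"] = epa["constructive_comment"].fillna("").str.split().str.len()

faculty = epa.loc[epa["assessor_role"] == "faculty", "word_count"]
trainees = epa.loc[epa["assessor_role"].isin(["SMR", "SSR"]), "word_count"]

# Variance check (Levene's test) to choose a pooled vs Welch independent t-test
_, levene_p = levene(faculty, trainees)
t_stat, p = ttest_ind(faculty, trainees, equal_var=levene_p >= 0.05)
print(f"Faculty vs senior trainees: t = {t_stat:.3f}, P = {p:.3f}")

# One-way ANOVA across the three assessor groups (faculty, SMR, SSR)
groups = [epa.loc[epa["assessor_role"] == role, "word_count"]
          for role in ("faculty", "SMR", "SSR")]
f_stat, p = f_oneway(*groups)
print(f"Faculty vs SMR vs SSR: F = {f_stat:.3f}, P = {p:.3f}")
```

A Tukey HSD post-hoc comparison, as reported alongside Table 5, could then be run with, for example, pairwise_tukeyhsd from statsmodels.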
Discussion
Overall, faculty physicians gave significantly lower EPA scores compared to senior trainees, and among
senior trainees, SSRs gave significantly lower EPA scores than SMRs. This relationship between faculty
physicians and senior trainees was present for overall EPA scores, and remained when the TD and FD stages
were considered separately. Faculty physicians gave lower scores compared to senior trainees for most
individual EPAs as well.
These results support other studies in which medical students and residents gave higher scores than faculty
on assessments such as OSCEs and workplace-based assessments [11-13]. For example, Hill et al. showed
that faculty consultants rated medical students more strictly than specialist registrars [16]. Other studies
have shown that assessors with greater seniority and rater experience have stricter scoring tendencies
[17,18]. The greater seniority and rater experience of faculty physicians relative to senior trainees may
explain why faculty physicians gave lower EPA scores in our study. However, conflicting literature shows
that an assessor’s rater experience and trainee status do not influence such scores [19,20]. Some studies
show that medical students or residents gave lower ratings than faculty physicians when evaluating their
peers [14,15]. Despite these discordant studies, our study supports the idea that faculty physicians give
stricter ratings than senior trainees and that this effect persists in CBME curricula and EPA scores.
The conceptual frameworks from Kogan et al. and Berendonk et al. describe multiple factors that influence
how assessors make judgments and may help explain the differences in EPA scores given by faculty
physicians vs senior trainees [9,10]. One major factor is the assessor’s frame of reference, which serves as the
standard against which junior residents are graded. Faculty may use their many years of clinical
experience as a frame of reference when grading junior residents, with more senior faculty giving harsher
ratings [8,16]. Senior trainees are still cultivating their clinical expertise as they progress through their
training, and thus may grade junior residents more leniently. Another factor influencing assessors is their
individual characteristics, which include academic rank and prior participation in medical education
workshops. At the time of our study, the CIM program at our institution had already piloted CBD for two
years, and many SMRs had themselves previously participated in CBD as junior residents. This prior
experience with CBD serves as an assessor characteristic for these SMRs - they may better understand the
practical challenges junior residents face when obtaining EPAs, and may be more sympathetic and lenient
with assessments compared to faculty physicians.
The conceptual frameworks also describe the impact of prior relationships between assessor and learner,
which alters the social context in which feedback is given and influences rating tendencies [8,9]. For
example, a prior positive relationship between an assessor and learner may cause the assessor to fall victim
to the “halo effect” and award higher grades. Senior trainees are in an ideal position to develop this kind of
positive relationship with junior residents, as they are more accessible to junior residents compared to
faculty physicians and are closer to junior residents in training [21-25]. Additionally, senior trainees may be
reluctant to provide negative feedback for fear of impairing social relationships with their junior residents
[9,26]. The development of such close working relationships can influence senior trainees to give more
lenient assessments compared to faculty.
Overall, senior trainees had higher word counts for constructive comments for EPAs compared to faculty
physicians. These results are similar to those found by Ringdahl et al., where senior residents were more
likely than senior faculty members to write negative comments when evaluating PGY1 residents [27]. Even
though this difference between senior trainees and faculty physicians in the word counts of constructive
comments is statistically significant, a difference of two words per comment is unlikely to improve the
quality of feedback. This is supported by the fact that the most common phrases for both positive and
constructive comments were similar between faculty and senior trainees, suggesting little difference in
feedback content.
This study does have limitations. We only reviewed EPAs observed for CIM PGY1 residents, and the
difference between faculty physicians and senior trainees may not be as prominent in other disciplines.
Conclusions
Compared to senior trainees, faculty physicians gave significantly lower EPA scores and wrote significantly
shorter constructive comments with their EPAs. The next steps for future research include expanding the
number of residents involved to include multiple programs and disciplines. Residents from multiple sites
should also be studied to obtain more generalizable results. Deeper analysis to determine whether other
factors play a role in these results is also important, including the potential role of assessor and trainee
gender, age, teaching, and clinical experience, as well as the role of EPA burden. If similar results are
identified, educational leaders will need to consider the impact of these differences on the ongoing rollout of
competency-based medical education to ensure residents are being assessed and provided feedback as intended.
Additional Information
Disclosures
Human subjects: Consent was obtained or waived by all participants in this study. University of Alberta
Health Ethics Board issued approval #Pro00097054. Animal subjects: All authors have confirmed that this
study did not involve animal subjects or tissue. Conflicts of interest: In compliance with the ICMJE uniform
disclosure form, all authors declare the following: Payment/services info: All authors have declared that no
financial support was received from any organization for the submitted work. Financial relationships: All
authors have declared that they have no financial relationships at present or within the previous three years
with any organizations that might have an interest in the submitted work. Other relationships: All authors
have declared that there are no other relationships or activities that could appear to have influenced the
submitted work.
References
1. Frank JR, Snell LS, Cate OT, et al.: Competency-based medical education: theory to practice. Med Teach.
2010, 32:638-45. 10.3109/0142159X.2010.501190
2. July 1 2017 - CBD: making medical education history. (2017). [Link]
3. CBME: competency based medical education. (2021). Accessed: June 30, 2021: [Link]
4. CBD_can a resident complete an observation. Accessed: June 30, 2021: [Link]
5. Gofton W, Dudek N, Barton G, Bhanji F: Workplace-Based Assessment Implementation Guide: Formative
Tips For Medical Teaching Practice. First Edition. Royal College of Physicians and Surgeons of Canada,
Ottawa, ON; 2017.
6. Competence by design cheat sheet. (2016). Accessed: June 30, 2021:
[Link]
7. Rekman J, Hamstra SJ, Dudek N, Wood T, Seabrook C, Gofton W: A new instrument for assessing resident
competence in surgical clinic: the Ottawa clinic assessment tool. J Surg Educ. 2016, 73:575-82.
10.1016/[Link].2016.02.003
8. Entrustable professional activities for internal medicine version 2. (2021). Accessed: March 11, 2022:
[Link].
9. Kogan JR, Conforti L, Bernabeo E, Iobst W, Holmboe E: Opening the black box of clinical skills assessment
via observation: a conceptual model. Med Educ. 2011, 45:1048-60. 10.1111/j.1365-2923.2011.04025.x
10. Berendonk C, Stalmeijer RE, Schuwirth LW: Expertise in performance assessment: assessors' perspectives.
Adv Health Sci Educ Theory Pract. 2013, 18:559-71. 10.1007/s10459-012-9392-x
11. Reiter HI, Rosenfeld J, Nandagopal K, Eva KW: Do clinical clerks provide candidates with adequate formative
assessment during Objective Structured Clinical Examinations?. Adv Health Sci Educ Theory Pract. 2004,
9:189-99. 10.1023/B:AHSE.0000038172.97337.d5
12. Chenot JF, Simmenroth-Nayda A, Koch A, et al.: Can student tutors act as examiners in an objective
structured clinical examination?. Med Educ. 2007, 41:1032-8. 10.1111/j.1365-2923.2007.02895.x
13. Burgess A, Clark T, Chapman R, Mellis C: Senior medical students as peer examiners in an OSCE. Med
Teach. 2013, 35:58-62. 10.3109/0142159X.2012.731101
14. Van Rosendaal GM, Jennett PA: Comparing peer and faculty evaluations in an internal medicine residency.
Acad Med. 1994, 69:299-303. 10.1097/00001888-199404000-00014
15. Bucknall V, Sobic EM, Wood HL, Howlett SC, Taylor R, Perkins GD: Peer assessment of resuscitation skills.
Resuscitation. 2008, 77:211-5. 10.1016/[Link].2007.12.003
16. Hill F, Kendall K, Galbraith K, Crossley J: Implementing the undergraduate mini-CEX: a tailored approach at
Southampton University. Med Educ. 2009, 43:326-34. 10.1111/j.1365-2923.2008.03275.x
17. Lee V, Brain K, Martin J: Factors influencing mini-CEX rater judgments and their practical implications: a