Computational Thinking Test for Beginners: Design and Content Validation

María Zapata-Cáceres, Computer Science Department, Universidad Rey Juan Carlos, Madrid, Spain
Estefanía Martín-Barroso, Computer Science Department, Universidad Rey Juan Carlos, Madrid, Spain
Marcos Román-González, Faculty of Education, UNED, Madrid, Spain

2020 IEEE Global Engineering Education Conference (EDUCON), 27–30 April, 2020, Porto, Portugal
978-1-7281-0930-5/20/$31.00 ©2020 IEEE

Abstract: Computational Thinking (CT) is a fundamental skill that is not confined to computer scientists' activities but can be widely applied in daily life and is required in order to adapt to the future; it should therefore be taught at early ages. Within this framework, assessing CT is indispensable in order to introduce CT in school curricula. Nevertheless, efforts involving the formal assessment of computational thinking have primarily focused on middle school grades and above, and are mostly based on the analysis of projects in specific programming environments. A Beginners Computational Thinking Test (BCTt), aimed at early ages and based on the Computational Thinking Test [1], has been designed including several improvements, submitted to a content validation process through an expert judgement procedure, and administered to Primary School students. The BCTt design is considered adequate by experts, and results show high reliability for the assessment of CT in Primary School, particularly in the first educational stages.

Keywords: Beginners Computational Thinking Test, Computational Thinking, Computer Science Education, Primary Education, Assessment, Programming, Early Childhood Education

I. INTRODUCTION

Computational Thinking (CT) is a fundamental skill that can be widely applied in daily life and is required in order to adapt to 21st century society [2]. CT was first defined as a human problem-solving process that uses decomposition and requires thinking at multiple levels of abstraction; it is not only the center of problem solving, but also develops and identifies the problem [3]. Subsequently, many other definitions have arisen, and this has provoked broad debate. CT can be defined as the conceptual foundation required to solve problems effectively and efficiently with solutions that can be used in different contexts [4]. Likewise, it can be defined as the thinking skills that precede coding and programming, and are applied in understanding a problem and formulating a solution like a computer scientist [5].

Although there are potential risks related to this lack of consensus about the definition of CT [6], CT is considered an essential skill that new generations of students must acquire and, therefore, should be taught at schools [7-9]. In addition, there is evidence that programming exposes students to CT and, therefore, to problem solving using computer science concepts such as abstraction and decomposition [10].

Consequently, there are no agreed-upon models or frameworks for developing CT in the classroom [4]. For example, Wing includes five cognitive processes in CT: problem reformulation, recursion, problem decomposition, abstraction and systematic testing [11]; CT could also be divided into five facets: abstraction, generalization, algorithm, modularity, and decomposition [12]; or categorized into the following skills: abstraction, decomposition, algorithms, debugging, iteration and generalization [4]. Brennan and Resnick propose a three-dimensional (3D) framework for CT [13]. This framework, which has attracted many researchers' attention and has been cited frequently in the literature in recent years [14], categorizes CT into three areas: (a) computational concepts (concepts that programmers use, such as sequences or loops), (b) computational practices (problem-solving practices that occur in the process of programming, such as iteration or abstraction), and (c) computational perspectives (the perspectives that designers form about the world around them and about themselves, such as expressing or connecting).

Along with learning CT, assessing CT is indispensable in order to introduce CT in the curriculum, as student evaluation for pedagogical purposes is essential [15]. Unfortunately, while multiple researchers describe experiences in integrating computational thinking into the K-12 curriculum, efforts involving the formal assessment of computational thinking have primarily focused on middle school grades and above [16]. Moreover, the assessment instruments proposed by recent research are mostly based on the analysis of projects performed by students in specific programming environments.

There have been some attempts to measure and assess CT in young students, such as the Fairy assessment in Alice [17], which measures CT aspects in a specific programming learning environment (Alice), or the Computational Thinking Pattern Quiz instrument [18], which assesses whether computational thinking patterns can be recognized in a non-programming context by analyzing CT patterns during the creation of a videogame with the AgentSheets environment. Moreover, the Test for Measuring Basic Programming Abilities [19] and the Commutative Assessment [20] are both instruments validated under a psychometric approach, but they are aimed at middle and high school students.

Similarly, Franklin et al. propose a model for integrating CT assessment into the design of a Scratch-based curriculum [21], and a small pilot test with middle school students shows positive results; Denner, Werner and Ortiz developed a coding scheme to identify the extent to which programs written by middle school girls corresponded with computer science programming concepts [22]; Moreno León and Robles presented Dr.Scratch, which analyses Scratch projects and can be used as a tool for their formative assessment [23]. Seiter and Foreman introduce the Progression of Early Computational Thinking (PECT) model, a framework for understanding and assessing CT in Primary School (grades 1 to 6) by analyzing coding design patterns in student programming projects [16].

Furthermore, Román et al. developed the Computational Thinking Test (CTt), which stands out as a self-contained instrument, independent of any programming environment, for the assessment of CT [1], which is designed under a
psychometric approach and provides evidence about its reliability and content [24], criterion [25], and predictive validity [26]; it is consistent with [19] and [20]; and it is aligned with the international standards for psychological and educational testing [27]. In terms of Brennan and Resnick's 3D framework, the CTt of Román et al. focuses on computational concepts, partially on computational practices, and ignores computational perspectives [1].

Even though the CTt is aimed at students between 10 and 16 years old, it has been a consolidated and firm basis for the design of a test targeted at Primary School: the Beginners Computational Thinking Test (BCTt), which has been developed, validated and administered to Primary School students in this study, as most previous studies were limited to CT assessment in middle school grades and above. As the BCTt target population is younger than that of the CTt, the test must be adapted both in form and content. Moreover, the BCTt design includes several innovations that are intended to be substantial improvements. In this paper we present the guidelines that have been followed for the design of the BCTt as a stand-alone assessment instrument, independent of any programming environment; its content validation process; and some preliminary statistical analyses that show the promising consistency of the test to assess CT in Primary School.

II. METHOD

An initial test version (BCTt v.1) was designed and then submitted to a content validation process through an expert judgement procedure. Next, attending to the results, suggestions and problems encountered, the test was improved, obtaining a second and more robust version (BCTt v.2). Finally, the test was administered to 299 Primary School students from schools in Spain to perform a statistical item analysis.

A. Beginners Computational Thinking Test v.1

As the BCTt target population (5 to 12 years old) is younger than that of the CTt, the test must be adapted both in form and content. Moreover, the BCTt includes several innovations that are intended to be substantial improvements. Pilot tests were carried out on small subsamples (n=3 to 5 subjects, 5 to 10 years old) throughout the initial design.

This initial version of the BCTt is 25 items long, with an estimated time of 40 minutes, which seemed to be adequate in pilot tests. Items are designed with the least possible text, and symbols are intended to be self-explanatory in order to increase the readability of the test at early ages.

Considering that the target population has lower reading, writing, and overall skills, the decisions taken are aimed at making the test easier and accessible for young people. The BCTt graphic aspect is clear and intuitive and, to ease association, the symbols used are intended to connect emotionally with the students, since emotions take center stage among the factors that influence the success of the learning process [28]. Accordingly, the main challenge posed is to lead a chick to its mother (the hen).

BCTt v.1 is a multiple choice test with three response alternatives for each item, which are set out in two different graphic layouts: canvas or maze type. The canvas type is a "follow the dotted line" design that children of these educational stages are used to from their everyday school work. The maze layout consists of a square matrix where the student must figure out a path in order to reach a target or solve a problem, passing from one square to another in a particular order. In this case, visual transitions were added between squares (Fig. 1). This is intended to be a substantial improvement in maze layouts, as our hypothesis is that difficulties with this type of layout at early ages are related to disorientation and hesitation about whether the current and target squares, at each step, should be part of the path sequence or ignored. Besides, adding transitions turns the maze into a state diagram, a central element in algorithms and coding that has been shown to improve the capability to understand problems [29-31].

Fig. 1. Maze A: no transitions; Maze B: transitions are added between squares, turning the maze into a state diagram.

BCTt v.1 response alternatives are laid out as sequences of thick arrows, numbers and colors, depending on the computational concept involved in each question. Each answer has a top-to-bottom vertical layout, rather than a horizontal left-to-right layout as in the CTt. This decision was taken considering that code sequences are read from top to bottom. Besides, the top-to-bottom layout proved to be an adequate arrangement in pilot tests, although problems related to the canvas item layout were encountered: if the dotted line was to be drawn from bottom to top, students tended to read the answers from bottom to top (Fig. 2). This problem was solved by avoiding these drawing directions (Fig. 3).

Fig. 2. Canvas type item. Drawing direction creates confusion about the answers reading direction.

Fig. 3. Corrected canvas type item. Drawing direction is the same as the answers reading direction.
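The idea that adding transitions turns the maze into a state diagram can be illustrated with a small sketch. This is not the authors' tooling: the 3x3 grid, the positions of the chick, hen and cat, and the helper names `build_transitions` and `follows_path` are all hypothetical, and the cat's square is assumed to be simply impassable. Each square is a state, each drawn transition is an edge, and a response alternative (a sequence of arrows) is valid only if every arrow follows a drawn transition and the sequence ends on the hen.

```python
from typing import Dict, List, Set, Tuple

Square = Tuple[int, int]  # (row, column), row 0 at the top

# The four arrow directions used in the response alternatives.
MOVES: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
}


def build_transitions(rows: int, cols: int,
                      blocked: Set[Square]) -> Dict[Square, Dict[str, Square]]:
    """Build the state diagram: for every free square, map each arrow
    to the free neighbouring square it leads to."""
    transitions: Dict[Square, Dict[str, Square]] = {}
    for r in range(rows):
        for c in range(cols):
            if (r, c) in blocked:
                continue
            outgoing: Dict[str, Square] = {}
            for name, (dr, dc) in MOVES.items():
                nxt = (r + dr, c + dc)
                inside = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                if inside and nxt not in blocked:
                    outgoing[name] = nxt
            transitions[(r, c)] = outgoing
    return transitions


def follows_path(transitions: Dict[Square, Dict[str, Square]],
                 start: Square, target: Square, arrows: List[str]) -> bool:
    """Return True if the arrow sequence uses only existing transitions
    and ends exactly on the target square."""
    current = start
    for arrow in arrows:
        if arrow not in transitions[current]:
            return False  # no such transition is drawn from this square
        current = transitions[current][arrow]
    return current == target


# Hypothetical item: chick bottom-left, hen top-right, cat in the centre.
MAZE = build_transitions(3, 3, blocked={(1, 1)})
CHICK, HEN = (2, 0), (0, 2)
print(follows_path(MAZE, CHICK, HEN, ["up", "up", "right", "right"]))  # True
print(follows_path(MAZE, CHICK, HEN, ["right", "up", "up", "right"]))  # False
```

In this representation, diagonal moves are excluded by construction, and the distinction between a movement (an edge) and a place of arrival (a state) is explicit.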
BCTt v.1 covers the basic computational concepts of Brennan and Resnick's 3D framework, ordered in increasing difficulty according to the target educational stages (Table I): sequences (6 items), simple loops (5 items), nested loops (7 items) and conditionals (7 items). In each maze item, the required task is to lead a chick to its mother (the hen) through a maze that can be small or large, allowing challenges of different complexity. There may be obstacles to avoid (a cat) or objects to collect along the way (pick-ups), such as another chick (Figs. 4 to 7).
TABLE I. 3D FRAMEWORK COMPUTATIONAL CONCEPTS CONSIDERED IN EACH BCTT V.1 ITEM
Columns: Item, Interface, Size, Computational concept (1. Sequences; Loops: 2. Simple, 3. Nested; Conditionals: 4. If-Then, 5. If-Then-Else, 6. While), Obstacles, Pick-ups
1 Maze Small x
2 Canvas - - x
3 Maze Small x x
4 Maze Small x x x
5 Maze Large x x x
6 Canvas - - x
7 Maze Small x
8 Maze Small x
9 Maze Small x x
10 Maze Large x
11 Maze Large x x
12 Maze Large x
13 Canvas - - x
14 Maze Large x x
15 Maze Large x x
16 Maze Large x x
17 Canvas - - x
18 Maze Large x x x
19 Maze Small x
20 Maze Large x
21 Maze Large x
22 Maze Large x
23 Maze Small x
24 Maze Large x
25 Maze Large x

Fig. 5. BCTt v.1 item example (item number 18).
Fig. 6. BCTt v.1 item example (item number 21).
Fig. 7. BCTt v.1 item example (item number 24).
B. Expert judgement procedure (BCTt v.1)

A content validation process of BCTt v.1 was completed through an expert judgement procedure, in which 45 experts with different profiles (Table II) validated the instrument by estimating the difficulty level of each item and its relevance for measuring CT, and by contributing other considerations such as test length, graphic interface adequacy and the applicability of the improvements. Data were collected through a 66-item online form (http://bit.ly/38sEc8B), summarized in Table III.

Fig. 4. BCTt v.1 item example (item number 3).
TABLE II. EXPERTS' PROFILES
Category | Value | Number of experts
Professional group / groups (multiple response allowed) | Computer Science Professional | 21
 | Computer Science Teacher | 14
 | Primary School Teacher | 9
 | Preschool Teacher | 1
 | University Teacher | 9
 | No answer | 4
Age | Less than 30 | 3
 | From 31 to 50 | 37
 | More than 51 | 1
 | No answer | 4
Gender | Woman | 13
 | Man | 28
 | No answer | 4
Expertise level in computer science teaching methodologies | Very low | 8
 | Low | 2
 | Average | 9
 | High | 13
 | Very High | 9
 | No answer | 4

TABLE III. EXPERT JUDGEMENT FORM DESCRIPTION
Form items # | BCTt topic addressed | Issue valued by the judges | Form item type
1 to 12 | Sequences | Experts' answers to the corresponding items from the BCTt and their perceived difficulty level | Multiple choice + Likert scale
13 | Sequences | Relevance to measure CT | Likert scale
14 to 23 | Simple loops | Experts' answers to the corresponding items from the BCTt and their perceived difficulty level | Multiple choice + Likert scale
24 | Simple loops | Relevance to measure CT | Likert scale
25 to 38 | Nested loops | Experts' answers to the corresponding items from the BCTt and their perceived difficulty level | Multiple choice + Likert scale
39 | Nested loops | Relevance to measure CT | Likert scale
40 to 43 | If-then | Experts' answers to the corresponding items from the BCTt and their perceived difficulty level | Multiple choice + Likert scale
44 | If-then | Relevance to measure CT | Likert scale
45 to 48 | If-then-else | Experts' answers to the corresponding items from the BCTt and their perceived difficulty level | Multiple choice + Likert scale
49 | If-then-else | Relevance to measure CT | Likert scale
50 to 55 | While | Experts' answers to the corresponding items from the BCTt and their perceived difficulty level | Multiple choice + Likert scale
56 | While | Relevance to measure CT | Likert scale
57 to 60 | Personal data | Expert profile data | Text
61 | Transitions | Preference between maze with and without transitions | Dichotomous
62 | Transitions | Item 61 answer justification | Text
63 | Test length | Valuation on the length of the BCTt (v.1) | Likert scale
64 | Test adequacy | Valuation on the content of the BCTt (v.1): CT in Primary School | Likert scale
65 | Test interface | Valuation on graphic design and UX aspects of the BCTt (v.1) | Likert scale
66 | Test overall | Final comments and suggestions | Text

C. BCTt administration: participants and procedure (v.2)

According to the content validation process results and the experts' suggestions (see section III.A), the BCTt was modified into a refined final version, BCTt v.2 (see section III.B), and administered to Primary School students to perform a statistical item analysis and to assess its design adequacy.

The participants in this study were a sample of n=299 Primary School students (5 to 12 years old) from three Spanish schools. In each school, the research focused on one educational stage, as shown in Table IV. The sampling procedure was intentional and, depending on the reasons that led to sampling the different subjects, these can be divided as shown in Table V.

BCTt v.2, with added transitions between squares in maze layouts, was administered to the A1, B1, C1, D1, E1 and F1 subsamples. A BCTt variation, with no transitions between maze squares, was administered to the B2, D2 and F2 subsamples. Moreover, the BCTt was re-administered to the D1 subsample subjects 5 weeks later.

The research was performed under the same conditions in each school, as an action protocol was followed. The tests were administered concurrently to every subject. In order to ensure that students' skills or previous experience in the use of computer devices did not interfere with the results, the tests were printed and filled in by the students individually on paper. Moreover, tests were printed in greyscale, so that they were accessible to students with color blindness (see section III.B). Before taking the test, an explanatory example of an item from each of the 6 different computational concepts was performed orally in front of the students.

TABLE IV. PRIMARY SCHOOL EDUCATIONAL STAGE CONSIDERED IN EACH SCHOOL
School | Educational stage | Grades | Students' ages
Colegio Público Carlos Ruiz | 1st | 1st and 2nd | 5-8
Colegio Los Escolapios | 2nd | 3rd and 4th | 7-10
CEIP León Felipe | 3rd | 5th and 6th | 9-12

TABLE V. NUMBER OF STUDENTS (N) IN EACH SUBSAMPLE
Educational stage | Grade | Identifier | BCTt | BCTt variation
1st | 1 | A | A1: n=52 |
1st | 2 | B | B1: n=18 | B2: n=18
2nd | 4 | C | C1: n=54 |
2nd | 4 | D | D1: n=28 | D2: n=28
3rd | 5 | E | E1: n=51 |
3rd | 6 | F | F1: n=25 | F2: n=25
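As a quick consistency check on the sample description, the subsample sizes reported in Table V can be added up and split by test version. The sketch below only restates the reported numbers; the dictionary name `SUBSAMPLES` and the convention that identifiers ending in "1" took BCTt v.2 (with transitions) and those ending in "2" took the variation follow the text above, and the variable names are otherwise hypothetical.

```python
# Subsample sizes as reported in Table V.
SUBSAMPLES = {
    "A1": 52, "B1": 18, "B2": 18,
    "C1": 54, "D1": 28, "D2": 28,
    "E1": 51, "F1": 25, "F2": 25,
}

# Identifiers ending in "1" took BCTt v.2 (with transitions);
# identifiers ending in "2" took the variation without transitions.
with_transitions = sum(n for key, n in SUBSAMPLES.items() if key.endswith("1"))
without_transitions = sum(n for key, n in SUBSAMPLES.items() if key.endswith("2"))
total = sum(SUBSAMPLES.values())

print(f"with transitions:    {with_transitions}")
print(f"without transitions: {without_transitions}")
print(f"total participants:  {total}")
```

The total matches the n=299 participants stated in the text.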
III. RESULTS AND DISCUSSION

A. Expert content validation (BCTt v.1)

Regarding the length of BCTt v.1, 24.4% of the experts consulted estimated that the test contains too many questions, 68.3% considered the test length adequate, and just 7.3% would add more questions.

It was concluded that BCTt v.1 has a growing perceived difficulty along its items (Mean=2.8; Std. Deviation=0.83) (Fig. 8), which is consistent with the scores obtained by the experts, since scores decrease throughout the test when BCTt items are split into computational concept sets and the experts who answered the items of each set correctly are counted (Mean=37.6; Std. Deviation=2.57) (Fig. 9). Relevance for measuring CT grows similarly along the test items (Mean=3.96; Std. Deviation=0.19) (Fig. 10), and each computational concept has a medium or high perceived relevance (Likert scale from 1 to 5, 5 being maximum relevance): sequences were perceived as the least relevant computational concept (3.66) and nested loops as the most relevant (4.14).

Fig. 8. BCTt item difficulty perceived by experts (ordinate axis: Likert scale from 1 to 5), per BCTt item (abscissa axis). Trend line: y = 0.1063x + 1.4239, R² = 0.8883.

Fig. 9. Ordinate axis: BCTt score obtained (e.g. 41 means that 41 of 45 experts answered correctly); per computational concept (abscissa axis: 1. Sequences, 2. Simple loop, 3. Nested loop, 4. If-then, 5. If-then-else, 6. While). Trend line: y = -1.3169x + 42.221, R² = 0.9173.

Fig. 10. BCTt computational concept relevance to measure CT, perceived by experts (ordinate axis: Likert scale from 1 to 5), by computational concept (abscissa axis: 1. Sequences, 2. Simple loop, 3. Nested loop, 4. If-then, 5. If-then-else, 6. While). Trend line: y = 0.0786x + 3.6857, R² = 0.5818.

Moreover, in response to the question "What is the BCTt global level of adequacy to evaluate CT in Primary School students" (Likert scale from 1 to 5), 73.1% considered the adequacy good or very good (34.1%: very good, 39%: good, 22%: intermediate, 4.9%: bad, 0%: very bad).

With regard to the question concerning the adequacy of the interface and graphic style for Primary School students (Likert scale from 1 to 5), 75.6% considered the adequacy good or very good (39%: very good, 36.6%: good, 22%: intermediate, 2.4%: bad, 0%: very bad).

Furthermore, 83% of the experts estimated that the addition of transitions to the maze layout is positive and considered it a clear improvement to facilitate the understanding of the problems posed. Some answers collected were: "transitions are easily associated to arrows in the answers", "transitions incorporate edges to the mazes, as a state diagram, with a better understanding of the problem", "the allowed paths are clear, excluding diagonal movements", "by including transitions, a distinction is clearly made between the movement and the place of arrival. In the design without transitions, doubts are generated about when a character reaches another (either when it reaches the previous square or when it reaches the other character square?)".

Finally, experts made many suggestions and comments that were carefully considered to improve the BCTt into a more robust second version. Some relevant recurring comments were:

• "An explanatory oral example of every different type of item is needed".
• "Regarding the number of response alternatives, I suggest 4 alternatives instead of 3, as only 3 response alternatives per item can negatively influence the total reliability of the test".
• "Children could try to jump the cat: an express indication of not touching the cat is needed".
• "It is not clear if two chicks can move together after meeting".
• "Last questions (conditionals) need more explanation".
• "If-else and if-then-else items do not correspond exactly to the computational concept".
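The expert comment about the number of response alternatives can be made concrete with a short calculation. The sketch below is not part of the authors' analysis: it assumes a student who guesses every one of the 25 items independently and uniformly at random, and the function names `expected_chance_score` and `prob_at_least` are hypothetical. Under that assumption, the score follows a binomial distribution with success probability 1/3 (three alternatives) or 1/4 (four alternatives).

```python
from fractions import Fraction
from math import comb


def expected_chance_score(n_items: int, n_alternatives: int) -> Fraction:
    """Expected number of correct answers under pure random guessing."""
    return Fraction(n_items, n_alternatives)


def prob_at_least(k: int, n_items: int, n_alternatives: int) -> Fraction:
    """Exact binomial probability of guessing at least k items correctly."""
    p = Fraction(1, n_alternatives)
    return sum(
        comb(n_items, i) * p**i * (1 - p) ** (n_items - i)
        for i in range(k, n_items + 1)
    )


N_ITEMS = 25  # length of the BCTt
for alternatives in (3, 4):
    mean = expected_chance_score(N_ITEMS, alternatives)
    tail = prob_at_least(13, N_ITEMS, alternatives)
    print(f"{alternatives} alternatives: expected chance score "
          f"{float(mean):.2f}, P(score >= 13) = {float(tail):.4f}")
```

Moving from three to four alternatives lowers the expected chance score from 25/3 to 25/4 correct answers and makes it less likely that a guessing student reaches more than half of the maximum score.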
Six of the nine experts in the Primary School teacher professional group considered BCTt v.1 too difficult for Primary School students.

B. Final BCTt version (BCTt v.2)

According to the validation process results and the experts' suggestions, BCTt v.1 was modified, both in form and content, into a refined final version (BCTt v.2). The following features, among others, were modified or added:

• Before taking the test, an explanatory example of an item from each of the 6 different computational concepts must be performed orally in front of the students.
• Each item contains 4 alternative responses instead of 3 to reduce the probability of responding correctly at random (e.g. Fig. 11).
• In items that contain a cat to avoid, it is specified that the square occupied by the cat must not be crossed (e.g. Fig. 11).
• In items that include another chick, it could be ambiguous whether the two chicks should continue together after they meet or not, so in BCTt v.2 the other chick is replaced by a flower to collect (e.g. Fig. 12).
• The explanatory examples contained in the last items are clearer (e.g. Fig. 13).
• If-else and if-then-else items are reformulated for a better correspondence with the computational concept (e.g. Fig. 13).
• To ensure that students with color blindness can read the symbols on each item, a specific shape is associated with each different color (e.g. triangle and blue). Moreover, this improvement allows the test to be printed in black and white (e.g. Fig. 14).

Fig. 11. BCTt v.2 item example (item number 3).
Fig. 12. BCTt v.2 item example (item number 18).
Fig. 13. BCTt v.2 item example (item number 21).
Fig. 14. BCTt v.2 item example (item number 24).

C. BCTt administration results: statistical analysis

BCTt v.2 (refined final version) was administered to Primary School students to empirically analyze design adequacy, perform a statistical item analysis, and test reliability.

1) Transitions

Considering the BCTt score as the sum of correct answers along the 25 items of each student's test, to evaluate the relevance and effectiveness of transitions, the BCTt scores were compared to the scores of the BCTt without transitions, between same-grade subsamples (B1 and B2; D1 and D2; F1 and F2),
with Student's t-test, assuming equal variances (Levene's Test). As can be seen in Table VI, there are no significant differences in 4th and 6th grade test scores, yet there is a significant difference in test scores (p=0.005 < 0.01) between the 2nd grade subsamples. It can be concluded that only younger students benefit from the addition of transitions in maze layouts.

TABLE VI. SUBSAMPLES STATISTICS AND STUDENT'S T-TEST COMPARING BCTT WITH AND WITHOUT TRANSITIONS
Grade | Subsample | BCTt version | N | Mean | Std. Dev. | t (t-test for equality of means) | Sig.
2 | B1 | with transitions | 18 | 16.778 | 2.487 | 3.042 | 0.005
2 | B2 | without transitions | 18 | 14.278 | 2.445 | |
4 | D1 | with transitions | 28 | 21.357 | 2.438 | 0.122 | 0.904
4 | D2 | without transitions | 28 | 21.286 | 1.922 | |
6 | F1 | with transitions | 25 | 21.720 | 2.622 | 0.499 | 0.620
6 | F2 | without transitions | 25 | 21.280 | 3.542 | |

2) Item analysis

Considering the BCTt score as the sum of correct answers along the 25 items of each student's test, and seeking a balance between number of subjects and educational stages, a statistical analysis of BCTt score results was performed on one subsample of each grade: the A1, B1, C1, E1 and F1 subsamples (N=200). Results by grade are shown in Table VII. A preliminary analysis of the entire sample reveals an overall high mean (19.915), and scores along subsamples suggest that the test might be aimed at the first educational stages of Primary School, as no significant difference is shown between 4th and 5th grade means (Student's t-test: t=0.485; p=0.63 > 0.05) nor between 5th and 6th grade means (Student's t-test: t=0.193; p=0.85 > 0.05).

TABLE VII. BCTT SCORE STATISTICS BY GRADE
Statistic | Entire sample | A1 | B1 | C1 | E1 | F1
Grade | 1-6 | 1 | 2 | 4 | 5 | 6
N | 200 | 52 | 18 | 54 | 51 | 25
Mean | 19.92 | 16.52 | 16.78 | 21.57 | 21.84 | 21.72
Median | 20.00 | 16.00 | 18.00 | 23.00 | 23.00 | 22.00
Std. Deviation | 3.79 | 3.31 | 2.49 | 3.044 | 2.61 | 2.62
Variance | 14.36 | 10.96 | 6.183 | 9.268 | 6.815 | 6.88
Minimum | 8.00 | 8.00 | 11.00 | 14.00 | 13.00 | 15.00
Maximum | 25.00 | 24.00 | 20.00 | 25.00 | 25.00 | 25.00
Percentile 25 | 17.00 | 14.00 | 15.75 | 19.00 | 20.00 | 19.50
Percentile 50 | 20.00 | 16.00 | 18.00 | 23.00 | 23.00 | 22.00
Percentile 75 | 23.00 | 19.00 | 18.00 | 24.00 | 24.00 | 24.00

Nevertheless, a second, deeper analysis was made, this time splitting BCTt items into computational concepts and counting how many students answered the items of each set correctly (i.e., if the subsample is n=23, a score of 21 points in an item means that 21 of the 23 students answered it correctly). This score per item, related to each computational concept, shows interesting results: item scores related to the Nested Loops and Conditionals concepts reveal low success in every grade, as shown in Fig. 15, whereas Sequences and Simple Loops show very high success in 5th and 6th grades. This leads us to conclude that the initial test items could be too easy for students in the 3rd Primary School stage, so it could be necessary to add more difficult items to the test for this stage.

Fig. 15. Abscissa axis: computational concept by grade (1st, 2nd, 4th, 5th, 6th). Ordinate axis: BCTt item score, normalized from 0 to 5 (5 being the maximum score).

This score per item, or difficulty index, empirically confirms the progressive difficulty anticipated by the qualitative analysis of the experts (average difficulty index = 0.81) considering the entire sample (Fig. 16). Likewise, results isolating the youngest students, those in the 1st educational stage (subsamples A1 and B1, N=70), show increasing difficulty along the items (N=25 items; Minimum=0.27; Maximum=0.96; average difficulty index = 0.70; average total score = 16.59) (Fig. 17).

Fig. 16. Item difficulty index (ordinate axis) for each BCTt item (abscissa axis). Trend line: y = -0.0155x + 1.0083, R² = 0.6344.
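The t statistics reported for the transitions comparison (Table VI) and the pooled means quoted in the item analysis can be re-derived from the published summary statistics alone. The following is a minimal sketch under stated assumptions, not the authors' analysis script: it uses the textbook pooled-variance (equal variances) form of Student's t statistic, works only from the rounded means, standard deviations and sample sizes printed in Tables VI and VII, and does not compute p-values. The function names `pooled_t` and `pooled_mean` are hypothetical.

```python
from math import sqrt
from typing import List, Tuple


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> Tuple[float, int]:
    """Student's t statistic for two independent samples, assuming equal
    variances. Returns (t, degrees of freedom)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df


def pooled_mean(groups: List[Tuple[int, float]]) -> float:
    """Mean of the combined sample, from (n, mean) pairs of each subsample."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


# (mean, std. dev., N) with and without transitions, from Table VI.
TABLE_VI = {
    2: ((16.778, 2.487, 18), (14.278, 2.445, 18)),
    4: ((21.357, 2.438, 28), (21.286, 1.922, 28)),
    6: ((21.720, 2.622, 25), (21.280, 3.542, 25)),
}
for grade, (with_tr, without_tr) in TABLE_VI.items():
    t, df = pooled_t(*with_tr, *without_tr)
    print(f"grade {grade}: t = {t:.3f}, df = {df}")

# (N, mean) of A1, B1, C1, E1 and F1, from Table VII.
TABLE_VII = [(52, 16.52), (18, 16.78), (54, 21.57), (51, 21.84), (25, 21.72)]
print(f"entire sample mean:     {pooled_mean(TABLE_VII):.3f}")
print(f"1st stage (A1+B1) mean: {pooled_mean(TABLE_VII[:2]):.2f}")
```

Because the inputs are rounded, the recomputed values agree with the reported ones only up to rounding in the last digit.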
TABLE IX. BCTT RELIABILITY STATISTICS BY GRADE
1,2
y = -0,0173x + 0,9249 Subsamples Reliability Statistics Item Statistics
1 R² = 0,3904 Cr. 's Alpha
Ed. Cronbach's Based on
0,8 stage Grade Id. n Alpha Stand. Items Mean Variance
1st 1 A1 52 0.833 0.838 0.742 0.041
0,6
1st 2 B1 18 0.793 0.801 0.630 0.042
0,4 2nd 4 C1 54 0.771 0.735 0.837 0.022
0,2 3rd 5 E1 51 0.660 0.683 0.863 0.012
3rd 6 F1 25 0.657 0.648 0.844 0.015
0
Fig. 17. Item difficulty index (ordinate axis) for each BCTt item (abscissa axis, items 1 to 25), first educational stage.

The histogram showing the distribution of the BCTt score along the 1st and 2nd grade subsamples (Fig. 18) fits the normal curve and is fairly symmetric, which suggests that the BCTt is balanced in terms of the difficulty of its items for the Primary School 1st educational stage.

Fig. 18. Histogram of the BCTt score (1st educational stage)

3) Reliability
In order to evaluate the internal consistency associated with BCTt scores, a reliability analysis was performed considering all grades (N=200). Cronbach's Alpha is α=0.824 (Table VIII), which can be considered very good reliability [32]. Reliability results by grade show a lower Cronbach's Alpha the higher the grade is (Table IX), which leads us to conclude that the BCTt is mainly aimed at the first Primary School stages (grades 1 to 4), where it shows higher reliability.

TABLE VIII. BCTT RELIABILITY STATISTICS, ENTIRE SAMPLE

Sample   Reliability Statistics                                      Item Statistics
N        N of Items   Cronbach's Alpha   Cronbach's Alpha Based      Mean    Min.    Max.    Variance
                                         on Standardized Items
200      25           0.824              0.829                       0.807   0.576   0.976   0.021

A second reliability analysis was made, this time using the test-retest method with the D1 sample (N=28). The BCTt was administered to D1 subjects (2nd educational stage, 4th grade) and, 5 weeks later, was re-administered under the same conditions to the same subjects. As BCTt scores are not normally distributed in the D1 subsamples (Shapiro-Wilk test of normality: test Sig.=0.03; re-test Sig.=0.01), the non-parametric Spearman's test was used, showing a very strong, positive and significant correlation (rs=0.93; p<0.01). Therefore, excellent reliability as stability was found for the BCTt in this subsample.

IV. CONCLUSIONS
The BCTt design is considered adequate by experts for the assessment of CT in Primary School students, both in form and content, and contains relevant improvements such as transitions between squares. In addition, BCTt items seem to be ordered by increasing difficulty and relevance. These considerations were confirmed by the statistical analysis of the administration of the test to Primary School students.

Transitions between squares proved to be a relevant improvement in maze layouts for younger students (1st educational stage), resulting in very significantly higher scores compared to students whose BCTt had no transitions. However, 4th and 6th grade students neither benefited from the inclusion of transitions nor were negatively affected by them. This leads us to conclude that transitions are an improvement that can be included in this type of problem, since they are a significant scaffold for younger students without negatively affecting older ones.

The overall mean of the entire sample and the BCTt scores across subsamples suggest that the test might be aimed at the first Primary School educational stages, as high means are shown in the 2nd and 3rd educational stage subsamples, and no significant difference is shown between total score means in older students. Thus, the BCTt seems to be aimed at 1st to 4th grades, and especially at 1st and 2nd grades, as it is balanced in terms of the difficulty of its items. However, there are significant differences between grades in the score means of all educational stages' subsamples from question 12 onwards, which leads us to conclude that the test could be aimed at all Primary School stages, but the first part of the test might include items that are too easy for older students, contrary to what was expected from the experts' content validation comments. Therefore, including more items with a higher difficulty for 3rd educational stage students could be a valuable improvement in a future BCTt version.
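As a supplementary illustration of the two reliability coefficients reported in the Reliability subsection (Cronbach's Alpha for internal consistency and Spearman's rs for test-retest stability), the following minimal sketch shows how such coefficients can be computed from raw scores. It is not the authors' procedure: the source does not state which software was used, the function names (cronbach_alpha, average_ranks, spearman_rho) are hypothetical, and the toy data are invented for illustration and are unrelated to the BCTt samples. The sketch does not compute p-values or the Shapiro-Wilk normality test.

```python
from statistics import mean, pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows, one row of item scores per student.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)).
    Population variance is used for both terms; using the sample variance for
    both gives the same ratio, and therefore the same alpha.
    """
    k = len(scores[0])  # number of items
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Invented toy data: 4 students x 3 dichotomous items (1 = correct, 0 = wrong).
toy_items = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
# Invented total scores of the same 4 students on a test and on a re-test.
toy_test = [10, 15, 20, 25]
toy_retest = [14, 12, 22, 24]

print("alpha =", round(cronbach_alpha(toy_items), 3))
print("rs    =", round(spearman_rho(toy_test, toy_retest), 3))
```

With real data, each row passed to cronbach_alpha would hold one student's 25 item scores, and spearman_rho would receive the paired test and re-test totals of the same students in the same order.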
978-1-7281-0930-5/20/$31.00 ©2020 IEEE 27–30 April, 2020, Porto, Portugal
2020 IEEE Global Engineering Education Conference (EDUCON)
The BCTt showed high reliability throughout the entire sample, with Cronbach's Alpha=0.824; however, a higher coefficient was obtained in younger students than in older ones. As expected, the BCTt is more reliable in 1st and 2nd grades than in higher grades, since the difficulty level of the test fits the lower grades better. Thus, the BCTt, like the CTt [1], is a self-contained instrument that has proved to be reliable for the assessment of CT in Primary School and can be administered as a pre-test and post-test in research that requires it. However, as it focuses on the computational concepts of the 3D framework, covers computational practices only partially and ignores computational perspectives, it is recommended to use it in parallel with other assessment tools to cover its limitations [33].

From this research it can be concluded that the BCTt can be used with Primary School students, particularly in the first grades (5 to 10 years old), since the scores of older students (9 to 12 years old) revealed that the BCTt was too easy for students of the highest grades, although it can be used with them by focusing only on the more complex test items. Therefore, the BCTt can be considered a reliable extension of the Román et al. CTt for younger students, since the CTt is aimed at 10 to 16 year old students.

Further research concerns the administration of the test to 3-4 year old students, as the upper age limit has been established, but the lower BCTt age limit remains an open question. Moreover, additional research on 3rd grade students may be necessary to determine the BCTt scope exactly. In addition, it could be enlightening to replicate the study in other countries and populations.

ACKNOWLEDGMENT
Thanks to the participants, teachers and schools involved in the learning experiences. This work has been co-funded by the Madrid Regional Government, through the project e-Madrid-CM (P2018/TCS-4307). The e-Madrid-CM project is also co-financed by the Structural Funds (FSE and FEDER).

REFERENCES
[1] M. Román-González, J. Pérez-González and C. Jiménez-Fernández, "Which cognitive abilities underlie computational thinking? Criterion validity of the Computational Thinking Test," Computers in Human Behavior, vol. 72, pp. 678-691, 2017.
[2] T. Hsu, S. Chang and Y. Hung, "How to learn and how to teach computational thinking: Suggestions based on a review of the literature," Computers & Education, vol. 126, pp. 296-310, 2018.
[3] J.M. Wing, "Computational Thinking," CACM Viewpoint, pp. 33-35, Mar. 2006.
[4] V.J. Shute, C. Sun and J. Asbell-Clarke, "Demystifying computational thinking," Educational Research Review, vol. 22, pp. 142-158, Nov. 2017.
[5] S. Grover, "The 5th ‘C’ of 21st Century Skills? Try Computational Thinking (Not Coding)," Feb 25, 2018.
[6] J.A. Velazquez-Iturbide, "Towards an Analysis of Computational Thinking," pp. 1-6, Sep 2018.
[7] H. Kim, H. Choi, J. Han and H. So, "Enhancing teachers' ICT capacity for the 21st century learning environment: Three cases of teacher education in Korea," Australasian Journal of Educational Technology, vol. 28, 2012.
[8] K. Tang, T. Chou and C. Tsai, "A Content Analysis of Computational Thinking Research: An International Publication Trends and Research Typology," The Asia-Pacific Education Researcher, pp. 1-11, Mar 21, 2019.
[9] S. Grover and A. Korhonen, "Unlocking the Potential of Learning Analytics in Computing Education," ACM Transactions on Computing Education (TOCE), vol. 17, pp. 1-4, Aug 29, 2017.
[10] S.Y. Lye and J.H.L. Koh, "Review on teaching and learning of computational thinking through programming: What is next for K-12?" Computers in Human Behavior, vol. 41, pp. 51-61, 2014.
[11] J.M. Wing, "Computational thinking and thinking about computing," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 366, pp. 3717-3725, Oct 28, 2008.
[12] S. Atmatzidou and S. Demetriadis, "Advancing students’ computational thinking skills through educational robotics: A study on age and gender relevant differences," Robotics and Autonomous Systems, vol. 75, pp. 661-670, Jan. 2016.
[13] K. Brennan, M. Resnick and MIT Media Lab, "New frameworks for studying and assessing the development of computational thinking," American Educational Research Association Meeting, Vancouver, BC, Canada, 2012.
[14] B. Zhong, Q. Wang, J. Chen and Y. Li, "An Exploration of Three-Dimensional Integrated Assessment for Computational Thinking," Journal of Educational Computing Research, vol. 53, pp. 562-590, Jan. 2016.
[15] NRC, "Report of a workshop on the pedagogical aspects of computational thinking," 2011.
[16] L. Seiter and B. Foreman, "Modeling the Learning Progressions of Computational Thinking of Primary Grade Students," in Proceedings of the Ninth Annual International ACM Conference on International Computing Education Research, pp. 59-66, 2013.
[17] L. Werner, J. Denner, S. Campe and D. Kawamoto, "The fairy performance assessment," pp. 215-220, 2012.
[18] A. Basawapatna, K.H. Koh, A. Repenning, D. Webb and K. Marshall, "Recognizing computational thinking patterns," pp. 245-250, Mar 9, 2011.
[19] A. Mühling, A. Ruf and P. Hubwieser, "Design and First Results of a Psychometric Test for Measuring Basic Programming Abilities," pp. 2-10, Nov 9, 2015.
[20] D. Weintrop and U. Wilensky, "Using Commutative Assessments to Compare Conceptual Understanding in Blocks-based and Text-based Programs," pp. 101-110, Aug 9, 2015.
[21] D. Franklin, P. Conrad, B. Boe, K. Nilsen, C. Hill, M. Len, G. Dreschler, G. Aldana, P. Almeida-Tanaka, B. Kiefer, C. Laird, F. Lopez, C. Pham, J. Suarez and R. Waite, "Assessment of computer science learning in a scratch-based outreach program," pp. 371-376, Mar 6, 2013.
[22] J. Denner, L. Werner and E. Ortiz, "Computer games created by middle school girls: Can they be used to measure understanding of computer science concepts?" Computers & Education, vol. 58, pp. 240-249, 2012.
[23] J. Moreno-León and G. Robles, Dr. Scratch: a Web Tool to Automatically Evaluate Scratch Projects, 2015.
[24] M. Román González, "Computational Thinking Test: Design Guidelines and Content Validation," in EDULEARN15 At: Barcelona, 2015.
[25] M. Román-González, J. Pérez-González, J. Moreno-León and G. Robles, "Extending the nomological network of computational thinking with non-cognitive factors," Computers in Human Behavior, vol. 80, pp. 441-459, 2018.
[26] M. Román-González, J. Pérez-González, J. Moreno-León and G. Robles, "Can computational talent be detected? Predictive validity of the Computational Thinking Test," International Journal of Child-Computer Interaction, 2018.
[27] AERA, APA and NCME, Standards for educational and psychological testing, Washington: American Educational Research Association, 2014.
[28] D. Căprioară, "Emotions and the Learning of School Mathematics," Bulletin of the Transilvania University of Brașov, Series VII: Social Sciences and Law, vol. 10, pp. 9-18, 2017.
[29] C. Chen and P. Herbst, "The interplay among gestures, discourse, and diagrams in students’ geometrical reasoning," Educational Studies in Mathematics, vol. 83, pp. 285-307, 2013.
[30] H.Y. Durak and M. Saritepeci, "Analysis of the relation between computational thinking skills and various variables with the structural equation model," Computers & Education, vol. 116, pp. 191-202, 2018.
[31] T. Watanabe, "Visual Reasoning Tools in Action: Double Number Lines, Area Models, and Other Diagrams Power Up Students' Ability to Solve and Make Sense of Various Problems," Mathematics Teaching in the Middle School, vol. 21, pp. 152, 2015.
[32] C.E. Lance, M.M. Butts and L.C. Michels, "The sources of four commonly reported cutoff criteria: What did they really say?" Organizational Research Methods, vol. 9, pp. 202-220, 2006.
[33] M. Román-González, J. Moreno-León and G. Robles, "Combining Assessment Tools for a Comprehensive Evaluation of Computational Thinking Interventions," in Computational Thinking Education, S. Kong and H. Abelson, Eds., Singapore: Springer Singapore, 2019, pp. 79-98.