Chapter 2

Thinking About Research

In: Introduction to Educational Research: A Critical Thinking


Approach

By: W. Newton Suter


Pub. Date: 2014
Access Date: January 20, 2021
Publishing Company: SAGE Publications, Inc.
City: Thousand Oaks
Print ISBN: 9781412995733
Online ISBN: 9781483384443
DOI: [Link]
Print pages: 31-52
© 2012 SAGE Publications, Inc. All Rights Reserved.
This PDF has been generated from SAGE Research Methods. Please note that the pagination of the
online version will vary from the pagination of the print book.
SAGE Research Methods
2012 SAGE Publications, Ltd. All Rights Reserved.

Thinking About Research

Outline

• Overview
• Sharpen Your Thinking: Powerful Ideas
• Are Biases in Research Obvious?
• The Amazing Randi
• Clever Hans
• Benjamin Franklin and the Placebo Control
• Little Emily
• Misinterpretations?
• Paper Sputnik
• Iowa First
• Pygmalion
• Hawthorne
• Other Examples: You Decide
• Control in Experimentation: Compared to What?
• Can You Trust Intuition?
• Relationships: Do We Have Sufficient Information?
• Autism
• Grade Retention
• SAT Preparation
• Contrasting Groups: Are They Meaningful?
• Statistical Logic: How Can Inference Help?
• Muddied Thinking About Important Ideas
• Misunderstood Statistical Significance
• Misunderstood Proof
• Summary
• Key Terms
• Application Exercises
• Student Study Site
• References

Overview

Chapter 1 introduced a thinking-skills approach to educational research, one that views teachers as critical,
reflective practitioners poised to apply findings from research in education. Chapter 1 also revealed that
thinking like a researcher is an acquired skill. This skill can be applied to mining data to enhance practice or
wisely evaluating research to avoid being snookered. This chapter introduces powerful concepts consumers
of educational research can use to understand the research process and apply directly to critical reviews of
research. This chapter also begins to demystify the process and continues the previous chapter's exploration
of clues to answering the puzzling question “Why are research findings so discrepant?” One major clue is
found in the powerful concept of control (or lack of it).

Control: The idea that procedures used in research can minimize bias, neutralize threats
to validity, rule out alternative explanations, and help establish cause-and-effect
relationships. Common control procedures include blinding and random assignment to
conditions.
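The two control procedures named in the definition can be sketched in a few lines. This is an illustrative sketch only; the participant names, group sizes, and coding scheme are invented for the example.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical roster of 20 students (names invented for illustration).
participants = [f"student_{i:02d}" for i in range(1, 21)]

# Random assignment: shuffle, then split into two equal conditions so that
# preexisting differences are spread across groups by chance alone.
random.shuffle(participants)
treatment, control = participants[:10], participants[10:]

# Blinding: whoever scores the outcomes sees only neutral codes,
# with no way to tell which condition a code belongs to.
coded = {name: f"P{idx:02d}" for idx, name in enumerate(participants, start=1)}

print("Treatment group:", sorted(treatment))
print("Control group:  ", sorted(control))
print("A scorer sees only codes such as:", coded[treatment[0]])
```

Random assignment addresses alternative explanations rooted in who ends up in which group; blinding addresses those rooted in what observers and participants expect to see.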

Sharpen Your Thinking: Powerful Ideas

As educators, you probably have your own ideas about research, even though you may not think about
them in a formal way. Your ideas were probably garnered from all types of scientific research in diverse
fields of study, not solely from education. Medical research, it seems, attracts more news media than many
other fields of study, so some of what you already know about the research process may be the result
of widely disseminated medical or health-related research findings. Many principles in research, such as
control, are in fact broadly applicable, as they are embedded in the scientific research process in general and
shared by the fields of education, psychology, nursing, business, communications, sociology, neuroscience,
political science, biology, and many others. As you will see in this chapter, however, education poses unique
challenges for scientific researchers. Educational research requires developing new methods of inquiry and
adjusting our thinking somewhat from the simplistic ideas conveyed by news reports of findings and their
interpretation.

Are Biases in Research Obvious?

The Amazing Randi

Consumers of research in many fields of inquiry mistakenly believe that biases in empirical studies are usually
obvious and can nearly always be detected, even by those without special training. Consider a dramatic
example from medicine, but one with direct implications for research in education. In 1988, the journal
Nature published a celebrated research study (Davenas et al., 1988) with remarkable claims in support of a
discredited branch of medicine known as homeopathy (the use of dilute substances lacking molecules of an
original substance—yet having a “memory” of it—to cure an ailment, which, at full strength, would cause the
ailment in healthy people). Nature agreed to publish these findings if a team assembled by the journal could
observe a replication (or repetition) of the experiments. One member of the observation team was particularly
interesting: James Randi, also known as The Amazing Randi, a professional psychic debunker. A magician by
training, The Amazing Randi successfully uncovered the tricks used by frauds who claimed to have psychic
powers. The homeopathic researchers never claimed to have such powers, but the Nature team believed
the researchers might have been less than careful and, without the researchers' knowledge or awareness,
allowed a source of bias to creep in and affect the findings. The real issue was not fraud but research bias, a contaminating influence so subtle that it was beyond the researchers' level of awareness.

Research bias: Distortion of data collected in a research study that is explained by unwanted influences stemming from observers, research participants, procedures and settings, or researchers themselves.

The homeopathic experiments were repeated under the watchful eyes of The Amazing Randi with the
appropriate controls for experimenter bias such as blinding (being “in the dark”), whereby the researchers
were kept unaware of which conditions were supposed to (according to homeopathic theory) result in higher
measurements. With these controls (and others) in place, the Nature observers found that the homeopathic
effects disappeared and concluded that the original, positive findings were the result of experimenter
bias. The scientific community, including educational researchers, benefited from the reminder that some
contaminating biases are so subtle that their discovery requires perceptiveness of James Randi's caliber.
All consumers of research, it seems, must be aware of the perils of “wishful science.”

Highlight and Learning Check 2.1 Overlooking Bias

The history of science in many fields, including education, reveals numerous examples
of biases that created significant distortions in data, leading to erroneous conclusions.
Explain how biases, subtle or not, can be overlooked by researchers.

Clever Hans

The introduction of subtle influences beneath the awareness of those responsible is not a new discovery.
About 100 years ago in Germany, a horse named Hans bewildered spectators with displays of unusual
intelligence, especially in math (Pfungst, 1911). The horse's owner, von Osten, tested Hans in front of an
audience by holding up flash cards. Hans would, for example, see “4 + 5” and tap his hoof nine times. Hans
would even answer a flash card showing, say, “1/4 + 1/2,” by tapping three times, then four times. Amazing!
said the crowds and reporters. Worldwide fame was bestowed on the animal now known as “Clever Hans.”

This remarkable display lasted several years before the truth was uncovered by Oskar Pfungst, a psychologist
with training in—you guessed it—the scientific method. Pfungst revealed that Clever Hans responded to very
subtle cues from von Osten—cues of which von Osten himself was oblivious. Body posture and facial cues
(such as raised eyebrows, widened eyes, flared nostrils) were the inevitable result of the owner's excitement
as the hoof tapping approached the correct number. When the right number was tapped, the height of
excitement was displayed all over von Osten's face. This, then, became the signal to stop tapping. Once
the research-oriented psychologist put in place the appropriate controls, such as showing the flash cards to
the horse only (not to von Osten, who was therefore “blind”), then the hoof tapping began to look more like
random responses. Clever Hans didn't seem so clever after all. Von Osten himself was never accused of
being a fraud, for the communication was below his awareness (and imperceptible to spectators). Although
the Clever Hans phenomenon was not discovered in a research setting, it is a valuable reminder that we
cannot be too careful when investigating all types of effects, from magic in medicine to genius in horses.

Benjamin Franklin and the Placebo Control

The concept of control in experimentation predates Clever Hans, and in fact was used well over 200 years
ago. Dingfelder (2010) described “the first modern psychology study,” commissioned by King Louis XVI of France in 1784 and led by Benjamin Franklin. The need for control was created by Franz Mesmer (from whom
we get the word mesmerized), who claimed to cure physical ills of all sorts with his “magnetized” water and
therapeutic touch. It apparently worked, evidenced by dramatic emotional reactions of his clients, busy clinics,
and growing fortunes.

One clue to the healing effects came from a test of a woman who was falsely led to believe that magnetism
was being administered, behind a closed door, directly to her. Convulsions followed, presumably due to the
excitement of an apparent cure. This was a clue that the power of the mind can effect changes in symptoms.
Supporting evidence came from a woman who drank “magnetized” water—another deception—and fainted.
Benjamin Franklin's experimenters ordered Mesmer's special water to revive her, yet the water failed to have
any effect, an expected result since the woman had no awareness that the water had been “treated.” Although
Mesmer was uncovered as a fraud, the scientific community learned about the power of belief in medicine.
Dingfelder concluded that the most valuable contribution of Franklin's design was “the first placebo-controlled
blind trial, laying the foundation of modern medicine and science” (p. 31).

Critical Thinker Alert 2.1 Bias

Research results are never 100% free of bias. Culture, prior experiences, beliefs, attitudes,
and other preconceived ways of thinking about the research topic influence how a
research project is designed and how the results are interpreted. No interpretation occurs
on a neutral “blank slate.”


Discussion: Do you believe that research in education and its interpretation can be
influenced by political factors? In what ways might political orientation influence research
in education and its impact?

Little Emily

Good science with careful controls requires clear thinking—not necessarily adulthood and a Ph.D. in rocket
science. Nine-year-old Emily Rosa demonstrated this in a fourth-grade science project published in the
prestigious Journal of the American Medical Association (JAMA; Rosa, Rosa, Sarner, & Barrett, 1998) and
reported in Time. George Lundberg, editor of JAMA, reminded us that age doesn't matter: “It's good science
that matters, and this is good science” (as cited in Lemonick, 1998, p. 67). Emily's newsworthy study
debunked therapeutic touch (TT), a medical practice that claims to heal by manipulating patients' “energy
fields.” Lemonick (1998) reported that many TT-trained practitioners wave their hands above the patient's
body in an attempt to rearrange energy fields into balance in order to heal wounds, relieve pain, and reduce
fever. Emily's study was simple. A sample of TT therapists placed their hands out of sight behind a screen.
Emily then flipped a coin, its outcome to determine which TT therapist's hand (left or right) she would place
her own hand over. She reasoned that the TT practitioners should have the ability to feel her own energy
above one of their hands. Each practitioner was then asked to report which hand was feeling her energy. After
she tallied her results, it was determined that the therapists did no better than chance, suggesting it was all
guesswork.

Lemonick (1998) also reported that Emily, being a young scientist, knew that her test must be generalized,
by being repeated under varying situations and with other subjects, before its results would be considered
definitive. Nevertheless, her findings do cast doubt on TT advocates' claims about how TT works and, by
contrast, do support the power of the placebo effect (i.e., wishful thinking while receiving special medical
attention). One of the most important ingredients of good science is control. Emily's one well-controlled yet
simple study was more valuable than a hundred studies with poor control. In this case, Emily's “special
training” was clear thinking about science.
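Emily's logic is that if TT practitioners are only guessing, each trial is effectively a coin flip, so the number of correct answers should follow a binomial distribution. A minimal sketch of that chance comparison (the trial counts below are invented for illustration, not the study's actual tallies):

```python
from math import comb

def p_at_least(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing k or more
    correct answers if every trial were a fair-coin guess."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical tallies: 150 trials, 70 correct (75 would be expected by chance).
trials, correct = 150, 70
print(f"Expected correct by guessing alone: {trials // 2}")
print(f"P(at least {correct} correct by guessing): {p_at_least(trials, correct):.2f}")
```

A tally near or below the chance expectation, as in this invented example, gives no reason to conclude that anything other than guessing is at work; only a probability of the observed count far below conventional thresholds would.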

Misinterpretations?

Consumers of research might believe that sloppy reasoning and misinterpretations of data occur rarely.
This is not correct. Misinterpretations of data are actually quite common, and instances of flawed reasoning
abound. Let's consider a few examples.

Paper Sputnik

The Paper Sputnik refers to the landmark study A Nation at Risk: The Imperative for Educational Reform
released in 1983. (Sputnik was the first Earth-orbiting satellite launched by the Soviet Union, a sobering
embarrassment for the United States.) The Nation at Risk study led many to believe that education in
the United States was failing. A call to arms, the report itself included alarming statements such as “If an
unfriendly foreign power had attempted to impose on America the mediocre educational performance that
exists today, we might well have viewed it as an act of war” (National Commission on Excellence in Education,
1983, A Nation at Risk section, para. 3).

Gerald Bracey (2006b) suggested that when notable studies such as this make news, it's time to “go back
to the data” (p. 79). He did, and reported it was “a golden treasury of selected, spun, distorted, and even
manufactured statistics” (p. 79). Bracey began to question the major findings of the report when he found
statements such as “Average tested achievement of students graduating from college is also lower” (p. 79).
How could this be? asked Bracey, for the United States has no program that tests college graduates. (Bracey
sought answers to questions that challenged the report from several commissioners of the report, and he
reported “How convenient” when no one could recall where a specific statistic might have come from.) Bracey
found several instances of “selectivity,” meaning a focus on one statistic that does not represent the general
picture. For example, there was a decline in science achievement among 17-year-olds, but no decline at the other tested ages and no declines in reading or math at any of the three ages tested. In other words, only one achievement
trend out of nine supported the “crisis rhetoric.” You will recognize this as cherry picking, first mentioned in
Chapter 1.

Iowa First

In an attempt to show the waste of simply throwing money at education to increase educational productivity
(higher achievement test scores), a nationally known columnist recently cited the “Iowa first phenomenon”
in support of his argument. Iowa, the argument goes, scored highest in the nation on the SAT (according
to the columnist's latest reports), but did not rank high in terms of state per-pupil expenditure. Is this a
meaningful comparison? No, according to Powell (1993), especially when you consider that only about
5% of the high school seniors in Iowa took the SAT. Most took the ACT (the American College Testing
program is headquartered in Iowa). A select few took the SAT in pursuit of universities beyond their
borders—such as Stanford, Yale, and Harvard. This academically talented group inflated the SAT average,
which is meaningless when compared with, for example, the SAT average of students in New Jersey (home
of Educational Testing Service, which produces the SAT). New Jersey at that time ranked high in per-pupil
expenditure but relatively low in SAT scores. It was no surprise, then, that according to research reports at
the time, the majority (76%) of New Jersey high school seniors, including the less academically able, took the
SAT. State-by-state rankings of score averages make little sense when the composition of the populations
taking the test varies so widely (another instance of “apples and oranges” comparisons).
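The distortion Powell described is easy to reproduce with a toy simulation. Here both “states” have identical score distributions; the only difference is who sits the test. All numbers are invented, and treating the test-takers as simply the top scorers is a deliberate oversimplification of self-selection.

```python
import random

random.seed(42)

def senior_scores(n=10_000, mean=1000, sd=200):
    """Hypothetical SAT-scale scores for one state's senior class."""
    return sorted((random.gauss(mean, sd) for _ in range(n)), reverse=True)

def avg(xs):
    return sum(xs) / len(xs)

state_a = senior_scores()  # stands in for "Iowa"
state_b = senior_scores()  # stands in for "New Jersey"; same ability distribution

# Only the most college-bound 5% take the SAT in state A;
# 76% of seniors, including less selective test-takers, take it in state B.
takers_a = state_a[: int(0.05 * len(state_a))]
takers_b = state_b[: int(0.76 * len(state_b))]

print(f"State A average (top 5% tested): {avg(takers_a):.0f}")
print(f"State B average (76% tested):    {avg(takers_b):.0f}")
```

The self-selected 5% posts a far higher average even though the two underlying populations are identical, which is exactly why state-by-state rankings of score averages can mislead.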

Pygmalion

Some of the most widely known and influential studies conducted in the social sciences also illustrate the
problem of data misinterpretation. In education, perhaps the most dramatic example is Robert Rosenthal and
Lenore Jacobson's experiment with teachers' self-fulfilling prophecies. This study, described in their book
Pygmalion in the Classroom: Teacher Expectation and Pupils' Intellectual Development (1968), received a
tremendous amount of media coverage and remains one of the most frequently cited studies ever conducted
in the broad social sciences. The study suggested that children's intelligence can increase merely in response
to teachers' expectation that it will do so.

Unfortunately, the media frenzy over this experiment overshadowed the scientific criticism occurring in less
accessible outlets (Elashoff & Snow, 1971; Wineburg, 1987). Richard Snow (1969), for example, observed
that in Rosenthal and Jacobson's original data, one student whose IQ was expected to increase moved from
17 to 148! Another student's IQ jumped from 18 to 122! Because IQs hover around 100 and rarely fall outside the range of 70 to 130, one can only conclude that the original set of data was flawed and meaningless.
The idea of teachers' self-fulfilling prophecies took hold despite the data errors, however, and continues to the
present day. (There is ample evidence that teachers do have expectations of student performance based on
seemingly irrelevant characteristics and that they may behave in accordance with those expectations. There
is less evidence, however, that students' measured intelligence can spurt in the manner originally suggested
by Rosenthal and Jacobson's interpretation of the data.)

Hawthorne

One of psychology's best known research biases—the Hawthorne effect—is also a case study in the
misinterpretation of data. The Hawthorne effect was “discovered” during a series of experiments at Western Electric's Hawthorne plant from 1924 to 1932. This effect refers to a change in behavior resulting
from simply being studied. It is also referred to as the novelty effect or the guinea pig effect and is generally
believed to stem from the increased attention research subjects receive during the course of a study. The
Hawthorne effect suggests that an increase in workers' production levels attributed to, for example, the
installation of a conveyor belt, could actually stem from the attention they received from being studied in
response to a change (any change). For whatever reason, the Hawthorne experiments are believed to be a
major impetus in the launching of industrial psychology as a discipline.

Highlight and Learning Check 2.2 Different Interpretations

Data collected in educational research do not interpret themselves. Explain how two
reasonable researchers may offer different interpretations of the same data.

The major findings of this study (Roethlisberger & Dickson, 1939) were interpreted impressionistically by the
researchers, and because the Hawthorne effect became so entrenched in the minds of other researchers, it
wasn't until 50 years later that the original data were analyzed objectively and statistically (Franke & Kaul,
1978). Remarkably, Chadwick, Bahr, and Albrecht (1984) reported that “the findings of this first statistical
interpretation of the Hawthorne studies are in direct and dramatic opposition to the findings for which the
study is famous” (p. 273). In other words, an objective analysis revealed (at least in these “data”) that the
Hawthorne effect was a myth. In truth, there may be a Hawthorne effect in other contexts, but we know that
its existence is not supported by the original Hawthorne data. Research “findings” sometimes take on a life of
their own, often having little or no connection to the original data.

Other Examples: You Decide

There is indeed an interesting literature (well suited for cynics) with such titles as “Why Most Published
Research Findings Are False” (Ioannidis, 2005) and “Detecting and Correcting the Lies That Data Tell”
(Schmidt, 2010). This body of literature only calls attention to the need for critical thinking in education (and
many other fields, the two articles just mentioned representing medicine and psychology, respectively).

Could it be that the 1983 study A Nation at Risk and its claim of a “rising tide of mediocrity” (National
Commission on Excellence in Education, 1983, A Nation at Risk section, para. 2) was a “manufactured
crisis” (Berliner & Biddle, 1995)? Could we have averted this dramatic educational scare by the application
of critical thinking? Did lack of attention to alternative explanations of research findings result in our “being
snookered”? With limitations ranging from unchallenged and questionable statistics to a singular focus on
high schools, the report remains a prime example of how research in education can have a huge and lasting
impact—perhaps more so than any other report of research findings to date—without careful attention to
alternative explanations.

Could it be that there is no research base to support the value of homework? Yes, says Kohn (2007), who
concludes that the absence of research supporting homework's link to achievement is “stunning” and its positive effects
on achievement “mythical.” His review of the literature reveals, “for starters, there is absolutely no evidence
of any academic benefit from assigning homework in elementary or middle school” (p. 36). The evidence
for homework's positive effects “isn't just dubious, it's nonexistent” (p. 36). At the high school level, Kohn
concludes that the link between homework and achievement is weak and “tends to disappear when more
sophisticated measures are applied” (p. 36). Kohn does find value in teacher action research. He suggests
that educators experiment:

Ask teachers who are reluctant to rethink their long-standing reliance on traditional homework to see
what happens if, during a given week or curriculum unit, they tried assigning none. Surely anyone
who believes that homework is beneficial should be willing to test that assumption by investigating
the consequences of its absence [emphasis added]. What are the effects of a moratorium on
students' achievement, on their interest in learning, on their moods and the resulting climate of the
classroom? Likewise, the school as a whole can try out a new policy, such as the change in default
that I've proposed, on a tentative basis before committing to it permanently. (p. 38)

What about learning styles and their great intuitive appeal to educators? According to Lilienfeld, Lynn, Ruscio,
and Beyerstein (2010), the claim that “students learn best when teaching styles are matched to their learning
styles” is one of the 50 Great Myths of Popular Psychology. A recent review of the research literature (Pashler,
McDaniel, Rohrer, & Bjork, 2009) concluded that little or no evidence exists that supports customization of
teaching styles to support learning style variation as measured by current assessments. It may be that the
same learning method is indeed better for most students, but that learning style assessments used in prior
research have been faulty or research methods used to test this connection were too limiting.

This misinterpretation of data reminds me of an old story about the psychologist who trained a flea to jump
on command. This psychologist then investigated what effect removing legs from the flea, one leg at a time,
would have on its ability to jump. He found that even with one leg, the flea could jump at the command “Jump!”
Upon removing the flea's last leg, he found that the flea made no attempt to jump. After thinking about this
outcome awhile, he wrote up his findings and concluded, “When a flea has all legs removed, it becomes deaf.”
His finding was indeed consistent with that interpretation, but it is simply not the most reasonable one.

Critical Thinker Alert 2.2 Misinterpretation

Misinterpretation of research results is common—and not just by the popular media. Every
research finding requires cautious and tentative interpretations. Different interpretations of
the same finding are frequent and expected, in part because of our inclination to interpret
ambiguity in ways that align with our experience, assumptions, and values.

Discussion: Presume a researcher finds that a new program in all county high schools is
linked to higher standardized math scores but higher dropout rates and more time spent
teaching to the test. Is this evidence of the program's effectiveness?

Critical Thinker Alert 2.3 Research Impact

Some well-known studies in education are later reevaluated or perhaps judged to be


flawed. Others may alter our thinking or influence policy recommendations despite weak
empirical foundation.

Discussion: What factors might explain why a particular study has an extraordinary
influence, one not justified given a careful evaluation of its scientific rigor?


Highlight and Learning Check 2.3 Comparison Groups

Researchers often use comparison groups to answer the question “Compared to what?”
That is because a change in a “treatment” group by itself is often difficult to interpret.
Explain why a treatment group's scores might change over time without the influence of
any “treatment.” Can no change in a treatment group be evidence of a treatment effect?
Explain.

Control in Experimentation: Compared to What?

One may believe incorrectly that control groups in research are a luxury and not needed to evaluate the
effectiveness of new interventions. Control groups serve a vital function by enabling researchers who test
new methods to answer the question “Compared to what?” Let's consider a dramatic example in medicine
to illustrate this point. Suppose a researcher wanted to test the effectiveness of acupuncture on lower back
pain. She recruited 100 patients with such pain and asked them to rate their pain on a 1 to 10 scale before
undergoing acupuncture three times a week for 10 weeks. At the end of the 10 weeks, the patients rated their
back pain once again, and, as expected by the researcher, the pain was greatly reduced. She concluded that
acupuncture was effective for reducing low back pain.

Are there other explanations for this finding? Certainly, and the researcher should have controlled for these
alternative, rival explanations with appropriate control groups before drawing any conclusions. For starters,
what about the mere passage of time? Isn't time one of the best healers for many conditions? Maybe the
patients would have had greatly reduced back pain 10 weeks later if they had done nothing. (Have you ever
had a backache? Did it go away without any treatment? Undoubtedly, yes.) A good control for this explanation
would be a 10-week “waiting list” control group that simply waited for the acupuncture in the absence of any
treatment.

What about the effect of simply resting three times a week for 10 weeks? Or an effect due to the awareness of
undergoing an alternative treatment with the expectation that something may finally work? Or an effect due to
lying down on a special acupuncture table? Or an effect due to simply piercing the skin, one that jump-starts
the placebo effect? An appropriate control in these instances would be a group treated exactly the same as
the acupuncture group, including having their skin pierced superficially while lying down three times a week on
a special acupuncture platform. In fact, members of this group should not be aware that they are in the control
group. In the jargon of research, this is referred to as blinding the control group to the influence stemming from
the awareness of special treatment. This control group, then, controls for the influence of time, resting prone
during the day, receiving special attention, and many other factors as well, including the simple expectation
that pain will go away. (In this book, the labels control group and comparison group are used interchangeably
since no attempt is made to differentiate between them. The labels experimental group and treatment group
are used interchangeably for the same reason.)

Thinking About Research
Page 11 of 31
SAGE Research Methods
2012 SAGE Publications, Ltd. All Rights Reserved.

The value, or necessity, of placebo groups as a control is most obvious in medical research. New treatments
in medicine must be compared to something; they are often compared to traditional treatments or placebos.
Thompson (1999) reported how medical researchers tested the usefulness of implanting cells harvested from
embryos into the brains of those with Parkinson's disease to replace the cells killed by the disease. Imagine
being in the placebo group for this study: You are prepped for surgery and sedated. A hole is then drilled
through your skull. Without receiving any embryonic cells, you are sewn up and sent home. Controversial
from an ethical perspective? A resounding Yes, but not from the perspective of control in experimentation.
We know that the placebo effect, as described in the research by Benjamin Franklin discussed previously,
is an effect resulting from the mere thought (wishful thinking) that co-occurs with receiving a drug or some
other treatment. It can exert powerful influences on pain, sleep, depression, and so on. (In fact, research
by de Craen et al. [1999] reveals that four placebos can be more effective than two in the treatment of
stomach ulcers, explained easily by the fact that patients swallow expectations along with pills.) The effect was
illustrated dramatically on the World War II battlefields when injured soldiers experienced pain relief after they
mistakenly thought they were getting morphine; in fact, they were simply getting saline solution. But can there
be a placebo effect in Parkinson's disease? Evidently so. Many researchers would argue that fake surgeries
are a necessary means for learning about cause and effect. How else can we learn about a treatment's
effectiveness? There is often tension between research ethics and tight scientific controls, as in the case of
the Parkinson's disease study. The researchers' need to control for patients' wishful thinking was met by the
“sham” (placebo) surgery, but the patients also got a chance for real help. After the study was completed,
the sham-surgery patients received the embryo cells in the same manner as the treatment patients. The
Parkinson's study should convince you that good medical science is not possible without control. The same
is true for educational research, as you'll see throughout this textbook.

Consider another example of the value of a control group (again in medicine). Arthroscopic knee surgery was
supposed to relieve arthritic pain for about 300,000 Americans each year. The only problem, according to
Horowitz (2002), is that it does no good. This conclusion is based on what Horowitz referred to as a “rarely
used but devastatingly effective test: sham surgery” (p. 62). She reported that researchers

randomly assigned some patients to undergo the surgery while other patients were wheeled into the
operating room, sedated, given superficial incisions (accompanied by the commands and operating
room noises they would hear if the real surgery were taking place), patched up and sent home. (p.
62)

The result, Horowitz (2002) reported, was that fake surgery worked as well as the real one, given that two
years later there was no difference between the two groups in terms of pain, mobility, and so on (Moseley,
2002).

One final example may hit home for many more people. Flawed research (bias and poor controls) and the
“Compared to what?” question have been described by Kirsch (2010) in the testing of antidepressants. Both
Kirsch and Begley (2010) ignited a firestorm of controversy by suggesting that antidepressants are no more
effective than a placebo for most people. (Begley referred to them as expensive “Tic Tacs” with unwanted
side effects.) Enter the power of the placebo—the dummy pill—once again. The revelation offered by Kirsch
reminds critical consumers of research in all fields, including education, to ask, “Compared to what?” Begley
notes that the oft-cited claim “Antidepressants work” comes with a “big asterisk”: So do placebos. Fournier
et al. (2010) reported evidence to support the claim that most of the positive effect (excluding cases of
severe depression) is linked to the placebo effect. The main point here is that the answer to “Compared to
what?” may be a “nothing” group, a placebo group, or some other configuration of a comparison. Perhaps
the most informative would be a drug group, a placebo group, and a “wait list” group (participants receiving
no treatment for a length of time, say, eight weeks). Such a wait list group could control influences such
as the passage of time (e.g., personal, seasonal, environmental, and situational changes in people's lives)
as well as otherwise uncontrollable local, regional, national, or international events impacting many people's
affective state. Taking this a step further, one could argue that another meaningful comparison group in an
antidepressant drug study might include daily brisk exercise. Research designed to establish cause and effect
requires control and meaningful comparisons.

Critical Thinker Alert 2.4 Control Groups

Control groups allow researchers to answer the question “Compared to what?” Because
the mere passage of time is a great healer in medicine and patient expectations influence
outcomes, untreated and placebo control groups are needed to assess treatment effects beyond
time and expectations. The same concept applies in educational research, although time
and expectations are combined with hundreds of other extraneous factors.

Discussion: Why is it so difficult in education to utilize a classic control group, the kind
used, for example, in clinical trials to assess a drug's influence? Is the clinical drug trial in
research the most appropriate model to use in education?

Intuition: A belief without an empirical basis. Research findings often contradict intuitive
beliefs.

Can You Trust Intuition?

Most people, including researchers, have poor intuition when it comes to estimating the probability of random
outcomes, and thus there is an obvious need for statistical tests that calculate the probability of chance
events. Intuitive guesses are often wrong, and sometimes incredibly so. Consider these two well-known
puzzles:

1. I bet that in a room with 25 people there are at least two with the same birthday. Want to
bet?
2. There are three closed doors, with a new car behind one and nothing behind the others.
I know which door hides the car. You choose a door, then I open another one that shows
there is nothing. Do you want to stick with your original choice, or do you want to switch
to the other closed door?

Because statistical judgments are often way off, you are better off not betting in the birthday problem. The
odds are about 50/50—even odds—for a birthday match with 23 people in a room. With 25 people, the odds
slightly favor a birthday match. And with 35 people, for example, the odds are overwhelming that there will be
a match. Here's an explanation using 23 people: Person 1 has 22 chances for a birthday match, person 2 has
21 chances for a match, person 3 has 20 chances for a match, person 4 has 19 chances for a match, and so
on. The chances mount up quickly, don't they? These additive chances will equal about 50/50 with 23 people.
Many people make intuitive judgments that lead to a losing bet, thinking erroneously that there must be 365
people (or perhaps half that number) for equal odds. A fraction of that number, such as 50, yields a match
with close to 100% certainty (but you can never be 100% sure!). (Over the years, I've noticed that teachers in
the lower grades, such as kindergarten and first grade, are not at all surprised by the answer to the birthday
matching problem. The reason, I believe, is that teachers at these levels tend to recognize birthdays in the
classroom, and with classes of over 20 students, there are often days when two birthdays are celebrated.)
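The birthday odds can be verified exactly rather than trusted to intuition. A minimal sketch (ignoring leap years and assuming 365 equally likely birthdays) multiplies the chances that each successive person misses every earlier birthday:

```python
# Probability that at least two of n people share a birthday,
# assuming 365 equally likely birthdays and no leap years.
def birthday_match(n: int) -> float:
    p_no_match = 1.0
    for i in range(n):  # person i must miss all i earlier birthdays
        p_no_match *= (365 - i) / 365
    return 1.0 - p_no_match

print(round(birthday_match(23), 3))  # ~0.507: about even odds with 23 people
print(round(birthday_match(35), 3))  # ~0.814: a match is very likely with 35
print(round(birthday_match(50), 3))  # ~0.970: near certainty with 50
```

The exact calculation confirms the chapter's claims: roughly 50/50 at 23 people and close to certainty at 50.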

In the door problem above, you'd better switch. If you stay with your original choice, the chance of winning
is .33. If you switch, the chance is .66. Think of it this way. You either pick one door and stay with it for good
(a 1/3 chance), or you pick the other two doors (as a bundle, a 2/3 chance), because I'll show you which
one of those two doors the car can't be behind before you make your selection. Put that way, the choice is
obvious. I'll take the bundle of two. Rephrasing a problem, without changing the problem itself, often leads to
more intelligent decisions. Much more information about this problem, termed the “Monty Hall Dilemma,” can
be found in vos Savant (2002), who was responsible for bringing this problem to light and generating great
interest among statisticians and lay readers.
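For readers still unconvinced, the Monty Hall Dilemma can be simulated. This sketch exploits the fact that, because the host always opens a losing door you did not pick, switching wins exactly when your first pick was wrong:

```python
import random

# Simulate the Monty Hall Dilemma: the host always opens a losing door
# the player did not pick, then the player either stays or switches.
def win_rate(switch: bool, trials: int = 100_000, seed: int = 1) -> float:
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)   # door hiding the car
        pick = rng.randrange(3)  # player's first choice
        if switch:
            # Switching wins exactly when the first pick was wrong,
            # because the host has eliminated the other losing door.
            wins += (pick != car)
        else:
            wins += (pick == car)
    return wins / trials

print(round(win_rate(switch=False), 3))  # close to 1/3
print(round(win_rate(switch=True), 3))   # close to 2/3
```

Over many trials, staying wins about a third of the time and switching about two thirds, just as the "bundle of two doors" rephrasing predicts.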

Here's another counterintuitive problem: Among women aged 40 to 50 years, the probability that a woman
has breast cancer is .8% (8/10 of 1%). If she has breast cancer, the probability is 90% she will have a
positive mammogram. If she does not have breast cancer, the probability is 7% she will still have a positive
mammogram. If a woman does have a positive mammogram, then the probability she actually has breast
cancer is indeed very high: True or False? (This problem is adapted from Gigerenzer, 2002.) The answer is
False. Think in frequencies, with rounding over the long run. Of 1,000 women, 8 will have breast cancer. Of
those 8, 7 will have a positive mammogram. Of the remaining 992 who don't have breast cancer, some 70 will
still have a positive mammogram. Only 7 of the 77 women who test positive (7 plus 70) have cancer, which
is 1 in 11, or 9%. Many people are way off, guessing probabilities like 90%. Natural frequencies make the
problem so much easier, don't you think?
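The same answer drops out of Bayes' rule directly, using the chapter's numbers (base rate 0.8%, a 90% positive rate among those with cancer, and a 7% positive rate among those without):

```python
# Bayes' rule with the chapter's numbers: base rate 0.8%, 90% positive
# mammograms among women with cancer, 7% among women without.
base_rate = 0.008
p_pos_given_cancer = 0.90
p_pos_given_healthy = 0.07

true_pos = base_rate * p_pos_given_cancer          # ~7 per 1,000 women
false_pos = (1 - base_rate) * p_pos_given_healthy  # ~70 per 1,000 women
p_cancer_given_pos = true_pos / (true_pos + false_pos)

print(round(p_cancer_given_pos, 3))  # ~0.094, roughly 1 in 11
```

False positives from the large cancer-free majority swamp the true positives, which is why the answer lands near 9%, not 90%.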

Highlight and Learning Check 2.4 Data Impressions

Researchers do not analyze statistical data impressionistically or intuitively, fearing a
wrong interpretation and conclusion. How does this common intuitive “disability” explain,
for instance, being impressed with psychic readings or losing money while gambling?

I believe this is also counterintuitive: What is the probability of correctly guessing on five (just five) multiple-
choice questions, each with only four choices? Most people grossly underestimate this probability (especially
students who believe they can correctly guess their way through a test!). The probability is 1 out of 1,024.
Thus, you will need over 1,000 students blindly guessing on a test before you can expect one of them to score
100%.
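The arithmetic is a one-liner: five independent guesses, each with a 1-in-4 chance, multiply together.

```python
# Chance of guessing all five four-option questions correctly:
# (1/4) multiplied by itself five times.
p_all_five = (1 / 4) ** 5
print(p_all_five)             # 0.0009765625, i.e., 1 in 1,024
print(round(1 / p_all_five))  # 1024
```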

While we're at it, try another: A man on a motorcycle travels up a mountain at 30 miles per hour (mph). He
wants to travel down the other side of the mountain at such a speed that he averages 60 mph for the entire
trip. What is the speed that he must travel down the other side to average 60 mph overall? (Assume no
slowing down or speeding up at the top—no trick there). One may guess 90 mph, thinking 30 + 90 = 120
and 120/2 = 60. That's the wrong answer. There is no speed fast enough. The guess involves the mistake
of thinking there is only one kind of mean—the arithmetic mean, whereby 30 + 90 = 120, which divided by
2 = 60. A different type of mean is needed here. (The harmonic mean is typically applied to the average
of rates and is more complicated.) Here's why: The size of the hill is not relevant because the hill could be
any size. Assume it is 30 miles up and 30 miles down for ease of explanation. Going up will take an hour
(30 miles going 30 mph). To travel 60 mph overall, the motorcycle must go 60 miles in an hour (obviously,
60 mph). Well, the motorcycle driver has already spent that hour going up! Racing down a million miles an
hour will take a few seconds; that will still put his speed under 60 mph overall. Research and statistics use
many different types of means. Fortunately, the one used in educational research is almost always the simple
arithmetic mean: that is, the sum of scores divided by the number of scores.
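The motorcycle puzzle can be checked numerically. Over equal distances, average speed is the harmonic mean of the two speeds, and no downhill speed can rescue the average, because the hour budget was already spent on the climb:

```python
from statistics import harmonic_mean

# Average speed over equal distances is the harmonic mean of the speeds,
# not the arithmetic mean. Assume 30 miles up and 30 miles down.
up, down = 30, 90
print((up + down) / 2)            # 60.0 -- the tempting (wrong) arithmetic mean
print(harmonic_mean([up, down]))  # 45.0 -- the actual average speed

# Even an absurdly fast descent cannot push the average to 60 mph.
for down in (90, 1_000, 1_000_000):
    total_time = 30 / 30 + 30 / down        # hours up + hours down
    print(down, round(60 / total_time, 3))  # overall mph, always below 60
```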

Here is one last problem (Campbell, 1974, p. 131), especially relevant to pregnant women. Suppose I
sponsored the following ad in a tabloid newspaper:

Attention expectant mothers: Boy or girl—what will it be? Send $20 along with a side-view photo
of pregnant mother. Money-back guarantee if at least 5 months pregnant. Can tell for sure—this is
research based.

Does spending $20 sound like a smart thing to do? No; the scammer would be right half the time, keeping
$10 on average, even while honoring the guarantee (returning $20 half the time but keeping $20 the other
half of the time, presuming those who received the correct answer didn't ask for a refund).

Flaws and fallacies abound in statistical thinking (and in research designs). Our inability to “think smartly”
about statistical problems explains, in part, how the unscrupulous can get rich.


Critical Thinker Alert 2.5 Intuition

Intuition might work well for personal decisions but not for statistical ones. Researchers
need to know what is, and what isn't, an outcome that could have easily arisen from
chance factors.

Discussion: How might a teacher resolve the conflict between a gut feeling (intuition)
and what science suggests works best in education? Should educational practices be
dominated by scientific findings, or is there a place for other ways of knowing?

Fourfold table: A method of displaying data to reveal a pattern between two variables,
each with two categories of variation.

Relationships: Do We Have Sufficient Information?

Autism

Clear thinking in research involves knowing which group comparisons are relevant and which are not.
Consider this sentence: From a research perspective, when symptoms of autism appear shortly after the
measles/mumps/rubella (MMR) vaccine, we know the evidence for “vaccine-damaged” children is pretty
strong. Is that true? Definitely not, for symptoms of autism typically appear in children at around the same age
the MMR vaccine is given. A fourfold table (see Table 2.1) illustrates the interaction of the variables. So far,
we have information only in the cell marked “X” (representing MMR children with autism). That is insufficient.
We must consider three other groups (marked “?”): (a) MMR children who are not autistic (lower left), (b) non-
MMR children who are autistic (upper right), and (c) non-MMR children who are not autistic (lower right).


Table 2.1 Fourfold Table Relating MMR and Autism

                     MMR Vaccine    No MMR Vaccine
Autistic                  X               ?
Not autistic              ?               ?

A relationship can be established only when all cells in a table are filled in with information similar to the
original finding (X). The assessment of relationships, or associations, requires at least two variables (MMR
and autism, in this case), each with at least two levels of variation (yes versus no for MMR; yes versus no for
autism). (The vaccine-autism link illustrates how one “research-based” idea can take hold and not let go. A
controversy for many years, the case for the relationship appears to be completely unfounded. The original
study, based on 12 subjects, is now discredited and joins the ranks of an “elaborate fraud” based on bogus
data [Deer, 2011].)

Only one cell provides information (X). Information in all four cells is needed to establish a relationship.

Relationship: Any connection between variables, though not necessarily cause and
effect, whereby values of one variable tend to co-occur with values of another variable.

Grade Retention

Here is a similar problem, one made concrete with fictional data. True or False: From a research perspective,
if it is found that most high school dropouts in a study were not retained (i.e., held back), then we know that
retention is not linked to dropping out (disproving the retention-dropout connection).

I hope you chose False, for this problem conveys the same idea as the MMR and autism problem. We need a
fourfold table to establish the relationship among the data. Let's say we found 100 high school dropouts after
sampling 1,100 participants, 70 of whom had not been retained (70 out of 100; that's “most,” right?). Then we
studied 1,000 high school students who had not dropped out and found that 930 had also not been retained.
The remaining 70 had been retained. The fourfold table (two variables, each with two levels of variation) is
shown in Table 2.2.
Table 2.2 Fourfold Table Relating Student Retention and Dropout

                   Dropped Out     Did Not Drop Out
Retained                30                 70
Not retained            70                930

This table reveals a very strong relationship among the data. If students are retained, the chance of their
dropping out is .3 (30 out of 100); without retention, it is .07 (70 out of 1,000). A very common mistake in
thinking is wrongly concluding that there is (or is not) a relationship on the basis of incomplete information
(only one, two, or three cells within a fourfold table).
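The fourfold reasoning can be sketched in a few lines, using the cell counts given in the retention example:

```python
# Fourfold table from the retention example (counts from the text):
retained     = {"dropout": 30, "stay": 70}
not_retained = {"dropout": 70, "stay": 930}

# Conditional probability of dropping out within each row of the table.
def dropout_rate(cell: dict) -> float:
    return cell["dropout"] / (cell["dropout"] + cell["stay"])

print(dropout_rate(retained))      # 0.3  (30 out of 100)
print(dropout_rate(not_retained))  # 0.07 (70 out of 1,000)

# Relative risk: retained students dropped out at over four times the rate.
print(round(dropout_rate(retained) / dropout_rate(not_retained), 1))
```

Notice that the comparison requires both rows; either row alone (or the "most dropouts were not retained" cell by itself) cannot reveal the relationship.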

SAT Preparation

Evaluate this claim: Because most of the students who score high on the SAT don't enroll in any type of
test preparation program, we know there is no connection between test preparation and SAT scores. Is this
conclusion sound and logical? No, it is faulty and improper, representing a type of statistical fallacy. It might
be based on data such as this: Of the 100 high scorers, only 20 enrolled in a test preparation program. Yet
we need two more groups: the low SAT scorers who did and did not enroll in such programs. Perhaps only
five of the 100 low scorers enrolled in a course, leaving 95 low scorers who did not. With four groups, it is
clear there is a relationship between the two variables, for there is a .80 chance (20 out of 25) of scoring high
for those who took the course, but only about a .46 chance (80 out of 175) for those who did not take the
course. (There were 200 scores overall, 100 low scorers and high scorers; a total of 25 took the course and a
total of 175 did not take the course.) Reaching false conclusions based on incomplete data is common. The
four cells, or fourfold table, for this problem are shown in Table 2.3.


Table 2.3 Fourfold Table Relating SAT Scores and Course Preparation

                   Took Prep Course     No Prep Course
High SAT score            20                  80
Low SAT score              5                  95

Let's take this example a step further. Sometimes a relationship found in groups combined will disappear—or
even reverse itself—when the groups are separated into subgroups. This problem is known as Simpson's
Paradox. Presume further data mining of the 200 scores in Table 2.3 leads to separation into subgroups: first-
time test takers and repeat test takers. A “three-way” configuration of the data appears in Table 2.4.
Table 2.4 Three-Way Table Relating SAT Scores, Course Preparation, and First-Time versus Repeat
Test Takers

Given the three-way split with these hypothetical data, it is clear that the original finding of a positive course
preparation effect is moderated by the test takers' experiences. For first-timers, most (80%) scored low if they
completed a course, whereas most repeaters (80%) scored high when they completed a course. For both
groups, about half scored high and half low when they did not complete a course. The positive course effect
for combined groups has reversed itself for first-timers. Simpson's Paradox reminds us that relationships may
disappear or reverse direction depending on how the data are split out (the design of the study). (Data could
have been configured to reveal no course effects in both test experience groups. You might wonder how a test
preparation course could lower your scores. Perhaps much poor advice was offered, advice the experienced
test takers were wise enough to ignore while attending to other good advice.) It becomes clear that some
relationships are revealed best (or only) by three variables considered together.
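Simpson's Paradox can be made concrete with cell counts reconstructed from the percentages given in the passage (80% high for repeaters with the course, 20% for first-timers with the course, about half otherwise); the exact counts below are illustrative assumptions, not the book's table:

```python
# Hypothetical cells consistent with the percentages in the text
# (reconstructed for illustration; exact counts are assumptions):
# (group, course status) -> (high scorers, low scorers)
cells = {
    ("first_time", "course"):    (2, 8),    # 20% high
    ("first_time", "no_course"): (37, 38),  # about half high
    ("repeat", "course"):        (12, 3),   # 80% high
    ("repeat", "no_course"):     (49, 51),  # about half high
}

def p_high(pairs):
    high = sum(h for h, _ in pairs)
    low = sum(l for _, l in pairs)
    return high / (high + low)

# Combined groups: the course looks helpful ...
combined_course = p_high([cells[("first_time", "course")], cells[("repeat", "course")]])
combined_none = p_high([cells[("first_time", "no_course")], cells[("repeat", "no_course")]])
print(round(combined_course, 2), round(combined_none, 2))  # 0.56 vs 0.49

# ... but for first-timers alone, the relationship reverses (Simpson's Paradox).
ft_course = p_high([cells[("first_time", "course")]])
ft_none = p_high([cells[("first_time", "no_course")]])
print(round(ft_course, 2), round(ft_none, 2))  # 0.2 vs 0.49
```

The combined table shows a positive course effect, yet splitting by test experience reverses it for first-timers, exactly the disappearing-or-reversing pattern the paradox describes.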

Highlight and Learning Check 2.5 Fourfold Tables

Assessing relationships in education often involves collecting data using a fourfold table.
What four groups (at least) would be needed to determine a link between late enrollment
in kindergarten and placement in special education later on?

Another curious feature reveals itself in Table 2.4. Does the table reveal evidence for a test preparation effect
or a repeat testing effect? Notice how many more repeaters scored high, whether they took the course or
not (12 + 49 = 61), compared to first-timers (2 + 37 = 39). Is this evidence of a repeat testing effect? Be
careful. Notice that there are indeed more repeaters (115) than first-timers (85), so you would expect more
high scorers on that basis. The most informative statistic in this regard would be the chance of scoring high if
you are a repeater (12 + 49 divided by 115, or about .53) versus a first-timer (2 + 37 divided by 85, or about
.46). These values are similar, suggesting test experience by itself is not a strong factor. This is a reminder
of the value of knowing information such as the base rate, in this case the chance of being a repeater apart
from scoring low or high.

Critical Thinker Alert 2.6 One-Group Study

A one-group study is often impossible to interpret. Comparison groups establish control
and permit one to discover relationships among variables, the first step toward learning
about cause and effect.

Discussion: Can you think of one-group research used in advertising, the kind that is
designed to sound convincing but clearly is not? From a research perspective, what
important piece of information is lacking in the claim?

Highlight and Learning Check 2.6 Interpreting Differences


Research in education often involves contrasting groups, chosen because of a difference
on one dimension and presumed similarity on all others. But sometimes that presumption
is inaccurate. Explain the interpretative problems involved with comparing the
achievement differences of students who were breastfed and those who were not.

Contrasting Groups: Are They Meaningful?

Researchers often make use of contrasting (or paradoxical) groups, such as the French who consume wine
(and cheese) but have low heart disease (French Paradox). Is that conclusive in showing that wine lowers
heart disease? No. The cheese-eating, wine-drinking, but less obese French may also bicycle to the market, eat
smaller portions more slowly, consume a lot of fresh vegetables, snack less, and even floss often, any one of
which may lower heart disease. Similarly, if people who eat fish twice a week live longer than those who don't,
can we conclude that eating fish results in greater longevity? No. Perhaps those who eat fish (expensive) also
eat vegetables, exercise more, or have access to more comprehensive healthcare linked to greater economic
resources.

In education, researchers are faced with the same problem when they compare groups that are not
comparable at the outset. Recall the homeschooling example presented in Chapter 1. If homeschooled
students outperform their public school counterparts on achievement tests, can we conclude that parents
make the best teachers? Hardly, since we are faced with many plausible rival explanations. Perhaps more
able students are homeschooled. And if the homeschooled students tested had entered public education,
might they have scored even higher on achievement tests? Or perhaps only homeschooled students who
excel were tested. Perhaps a different picture would emerge if all homeschooled students were tested.

What if Montessori-schooled children outperformed their public school counterparts? Perhaps the Montessori
students also had private tutors. Or perhaps their families were simply wealthier, wealth itself being linked
with test scores. This is often called the “Volvo effect” (Wesson, 2001), reflecting the finding that the price of
vehicles in a driveway can be linked to students' test scores. Once again, we are faced with an interpretive
dilemma due to the fact that relationships are usually easy to find—but hard to interpret.

One controversial example of questionable contrasting groups in educational research came to light with the
first formal evaluation of the Head Start early intervention program in 1969, which offered “competing data”
(Jacobson, 2007, p. 30). When evaluation results were tabulated, the conclusion led many to believe that
achievement gains in kindergarten disappeared as early as the first or second grade. The study's design,
however, led many to claim that Head Start treatment groups were compared to less meaningful control
groups, ones not as disadvantaged as the Head Start participants (Jacobson, 2007). With a comparison
group already receiving some type of “head start” similar to the formal Head Start program, the treatment
group essentially may have been compared to itself, rendering the comparison invalid.


Critical Thinker Alert 2.7 Contrasting Groups

The use of contrasting groups, those formed without random assignment, poses serious
problems in interpretation. Groups formed on one dimension (e.g., exercise) may also be
different in other ways (e.g., diet). Finding faster mental reaction times among those who
exercise (and who have better diets) does not disentangle the influences of exercise and
diet.

Discussion: Why is it difficult to interpret a research finding that shows, for example, that
students who sleep longer (on average) in high school also earn better grades?

Inferential statistics: Statistical reasoning that permits generalization beyond the sample
to a larger population. Central to this reasoning is the notion of statistical significance,
meaning that a relationship found in the sample is probably not due to the workings of
chance.

Statistical Logic: How Can Inference Help?

Inferential statistics and their underlying logic are very useful to researchers, removing much of the
guesswork about relationships. Very practical indeed, these types of statistics include hundreds of what are
commonly called inferential tests, such as the t and the F, all of which have in common the determination
of p, or probability. The p value allows one to figure out whether a relationship, as opposed to a number of
random factors, is likely to exist in the population represented by the data in the sample. For example, to
determine whether there is a connection between fat in the blood (lipids) and short-term memory (the capacity
to hold items such as those in a grocery list in memory for a brief time), a researcher might collect data on
100 subjects' cholesterol levels, divide the subjects into two groups (lower versus higher cholesterol), and
then compute the average memory span (after its assessment) for both groups. The researcher knows that
chance factors will cause a mean difference between the two groups even if cholesterol and memory are in
no way related.

Let's presume the two groups' average memory spans were 7.1 (for the low cholesterol group) and 6.4 (for
the high cholesterol group). Is this difference greater than what one might expect in a hypothetical situation
that assigns 100 people into two random (hence similar) groups and tests their memory spans? Enter the
p value for final determination. If the p value is low, such as .01, one can conclude that chance influence is
unlikely, and that there probably exists a relationship or connection between cholesterol and memory span
in the population of people similar to those in the sample. This is the essence of reasoning in statistics, and
why the p value is so prominent. Fortunately, we do not calculate the p values by hand; that's what statistical
software does for us. Also notice that if the p value were low (a “statistically significant” difference), one can
conclude only that there is probably a link between cholesterol and memory span in the population represented
by the sample. Causal determination would logically follow a true experiment with randomized groups and
manipulation of cholesterol levels.

Highlight and Learning Check 2.7 Chance Factors

Statistical significance means that chance factors are not likely explanations of given
research results. It does not necessarily suggest that the statistics are important or
substantial. Explain why the use of statistical reasoning in research is vital to evaluating
claims made by those with ambiguous data.

One of the first statistical tests ever developed, and almost certainly one of the most commonly used tests
today, was created in about 1900 in an Irish brewery. Known as the t test and used to determine the statistical
significance of one mean difference (that is, the difference between two means), it was developed by a chemist
who wanted to brew a better-tasting beer. He needed information about comparisons between two recipes, not so much in the
sample but in the population of beer drinkers represented by the sample. His new creation, the t test, provided
information about the population given only data from a sample. Because many research scenarios call for
comparing two means, you can see why the t test is so commonly used.

The history of statistics, then, is connected to the history of beer making, a fact that should convince you of
the practical nature of inferential statistics. Another commonly used statistical test, the F test (named after
Sir Ronald A. Fisher), was developed in the United States on the farm, so to speak, to solve agricultural
problems. Fisher needed a method for comparing complex groupings (based on different methods of irrigation
or fertilizing) in an attempt to maximize crop production. Once again, we see how statistical techniques were
developed to solve practical problems as opposed to settling theoretical debates. Many other statistical tests
were also developed in business and industry.

Critical Thinker Alert 2.8 Inferential Statistics

The use of inferential statistics via the famous p value permits cautious researchers
to reach conclusions about members of a population in general (not just a
sample), conclusions that must be carefully tempered yet still might be wrong. The p refers
to probability, not proof.

Discussion: Presume a researcher reported that students born in the fall were earning
slightly higher achievement scores than students born in other seasons. What information
is missing that might help interpret and evaluate this finding?

Muddied Thinking About Important Ideas

Research in many fields, particularly education, is loaded with terms that are poorly understood. This
concluding section samples a few common misconceptions. Clear ways to think about these ideas are
elaborated in the chapters that follow.

Misunderstood Statistical Significance

Consider once again the term statistically significant, which many may believe means roughly the same as
important or substantial. Probably the single best phrase to capture the meaning of statistically significant
is “probably not due to chance.” It carries no connotation such as “important” or “valuable” or “strong.” Very
small effects (for example, training in test-taking skills that “boost” a group's achievement scores from, say,
54 to 56) might be statistically significant but trivial and of little practical importance.
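To see how a trivial difference like the 2-point "boost" above can still reach statistical significance, consider a rough sketch. The numbers are hypothetical (a 2-point mean difference and a score standard deviation of 15), and 1.96 is used as an approximate large-sample .05 critical value:

```python
# Sketch: the same trivial 2-point difference can be nonsignificant or
# "statistically significant" depending only on sample size. Assumed: a
# hypothetical score standard deviation of 15 and equal group sizes; 1.96
# approximates the large-sample .05 critical value.
from math import sqrt

diff, sd = 2.0, 15.0                   # hypothetical mean difference and SD
for n in (25, 100, 1000):              # students per group
    t = diff / (sd * sqrt(2 / n))      # two-sample t, equal variances assumed
    verdict = "significant" if t > 1.96 else "not significant"
    print(f"n = {n:4d} per group: t = {t:.2f} ({verdict} at roughly .05)")
```

With 25 or even 100 students per group the difference looks like chance; with 1,000 per group it becomes "significant" while remaining just as trivial in practical terms.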

Highlight and Learning Check 2.8 Statistical Thinking

Statistical thinking permits researchers to disentangle what can reasonably be attributed
to chance factors and what can't. Explain why researchers need statistical help with
interpreting a small (or large) difference between two groups in a study.

The term statistically significant does not in any way suggest an explanation of findings either—it suggests
only that an observed relationship is probably not due to chance. For example, let's pretend your friend claims
to have psychic powers—that is, to be able to affect the outcome of a coin toss. As the coin is tossed, your
friend can “will” more heads than would be expected by chance. After 100 tosses, the results are in: The coin
turned up heads 60 times. Is this statistically significant? Yes. Because the claim is specifically about
producing more heads, the relevant question is how often a fair coin lands heads 60 or more times in 100
tosses, and the answer is rarely: a fair coin tossed 100 times will land heads 58 times or fewer about 95%
of the time. (If you were to repeat the 100-toss experiment again and again, roughly 95% of the repetitions
would produce 58 or fewer heads.) Notice that 60 heads is a statistically significant outcome since
it is beyond the limit imposed by chance 95% of the time. But also notice that no explanation is offered by
the term statistically significant.
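For the curious, the chance limits in the coin example can be checked directly. This sketch computes the exact binomial probability of 60 or more heads in 100 tosses of a fair coin (a one-tailed question, since the claim is specifically about producing more heads):

```python
# Sketch: checking the coin example with the exact binomial distribution.
# How often does a fair coin land heads 60 or more times in 100 tosses?
from math import comb

n = 100
# Sum the probabilities of every outcome from 60 heads up to 100 heads.
p_60_or_more = sum(comb(n, k) for k in range(60, n + 1)) / 2 ** n
print(f"P(60 or more heads in 100 tosses) = {p_60_or_more:.3f}")   # roughly 0.028
```

Since this probability falls below the conventional .05 threshold, 60 heads qualifies as statistically significant, even though the calculation says nothing about why the excess heads occurred.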

There are many explanations other than “psychic ability.” Perhaps there was something wrong with the
coin (it did not average 50/50 heads-tails in the long run), or mistakes were made in the tally of heads, or
the “psychic” was a cunning trickster. Also, there always exists the possibility that the outcome of the coin
toss was indeed a chance occurrence, although this explanation is correct less than 5% of the time. (Note:
The concept of statistical significance is undoubtedly the single most difficult in the introductory study of
educational research. Don't worry how those numbers were determined in the coin toss example. This will be
fully explained in Chapter 13, where it is discussed far more conceptually than mathematically.)

The famous expression p < .05 means statistically significant, or the probability is less than 5 out of 100 that
the findings are due to chance. In this sense, the word probability in science refers to a 95% likelihood. As 5
out of 100 is 1 out of 20, there are 19 or more chances out of 20 that a relationship uncovered by statistical
methods is “real” (or not due to chance). It sounds arbitrary; it is. I suppose scientists could have agreed on 18
out of 20. But they didn't. It was 19 out of 20 that stuck as the scientific standard. Keep in mind that for every
20 studies that test a relationship which in truth does not exist, about one will show statistical significance by
chance alone and be "fluky." That's another reason why a single study by itself is suggestive only, and any
definitive conclusion would be premature until replications reveal that flukiness is not a likely explanation.
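A quick back-of-the-envelope calculation illustrates the fluke problem. Assume 20 independent studies, each testing a relationship that truly does not exist, each using the .05 level:

```python
# Sketch: 20 independent studies each test a true null hypothesis at the
# .05 significance level. How often does luck alone produce a "finding"?
expected_flukes = 20 * 0.05        # expected false positives among the 20
p_at_least_one = 1 - 0.95 ** 20    # chance at least one study looks significant
print(f"Expected flukes in 20 null studies: {expected_flukes:.0f}")
print(f"P(at least one fluke in 20): {p_at_least_one:.2f}")   # about 0.64
```

Roughly one fluke is expected per 20 such studies, and the odds are close to two in three that at least one turns up, which is exactly why replication matters.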

Knowing more about statistics enables us to critically evaluate research claims. This is especially important
because meaningless statistics are often presented in support of an argument. Wonderful examples of this
can be found in the book Damned Lies and Statistics: Untangling Numbers From the Media, Politicians,
and Activists (Best, 2001). The author considers well-known, oft-cited statistics and shows that they are
senseless, yet may take on lives of their own. One famous statistical claim is that since 1950, there has been
a doubling every year of the number of American children gunned down. The problem with this claim is its
impossibility, for if one child were gunned down in 1950, by 1970, the number of children gunned down would
have exceeded a million. By 1980, the number would have surpassed a billion, and by 1990, the number
would have topped the recorded population throughout history. Best (2001) shows that by 1995 this absurd
number would have reached 35 trillion, and soon afterward magnitudes encountered only in astronomy.
Clearly, many “well-known” statistics in the social sciences can be wildly wrong. Be on guard!
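Best's arithmetic is easy to verify. This sketch doubles a count of one child per year starting in 1950:

```python
# Sketch: Best's (2001) arithmetic on the "doubles every year" claim,
# starting from a single child in 1950.
counts = {1950: 1}
for year in range(1951, 1996):
    counts[year] = counts[year - 1] * 2

print(counts[1970])                   # 1,048,576: past a million by 1970
print(counts[1980])                   # 1,073,741,824: past a billion by 1980
print(round(counts[1995] / 1e12, 1))  # about 35.2 trillion by 1995
```

Twenty doublings multiply any starting number by more than a million, which is why the claim collapses under its own arithmetic.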

Critical Thinker Alert 2.9 Statistical Significance

Statistical significance is related to probability; it does not carry any implication about
practical value. Nor does it relate to the importance or strength of a connection among the
factors being investigated. Significance in the statistical sense is more closely related to,
for example, likely or unlikely outcomes of chance events.

Discussion: If a researcher reported a statistically significant link between students' college
entrance examination (SAT) scores and the size of the students' high schools (smaller
schools were associated with higher scores on average), does this suggest changes in
policy regarding the construction of new schools? Why or why not?


Highlight and Learning Check 2.9 Research Support

Support is a far better word than proof in the study of educational research. (Proofs
are better suited for geometry and physics.) The term is used when the data gathered
are consistent with (support) a theory that predicted the outcome. Explain how research
support (not proof) is tied to the ever-present concern about alternative explanations for
research findings.

Misunderstood Proof

It is a mistaken belief that educational researchers can prove theories or hypotheses by collecting
data. Prove is a word that is best dropped from your vocabulary, at least during your study
of educational research. Unlike those who prove theorems in geometry, those who conduct educational
research will most likely test theories by finding support for a specific hypothesis born from the theory. For
example, constructivist theory predicts that students who construct meaning by, say, creating a metaphor
to increase their understanding, will learn new material better than students who passively receive new
material prepackaged in a lecture. If in fact a researcher found that the “constructed” group learned the new
material faster than the “lectured” group did, it would prompt the conclusion that the research hypothesis was
supported (not proven), and the theory which spawned the hypothesis would, in turn, become more credible.
Testing, supporting, and adding credibility are all more suitable terms in educational research than proving.
Researchers choose their words carefully to avoid confusion. Bracey (2006a) noted that “language that seeks
to make up your mind for you or to send your mind thinking in a certain direction is not the language of
research” (p. xvii). Prove is one case in point.

There are at least two reasons why educational researchers cannot prove hypotheses or theories. First,
research findings are usually evaluated with regard to their statistical significance, which involves the
computation of a p value as described earlier, referring to the probability (not proof) that a certain finding
was due to chance factors. Although the p values can be extraordinarily low (e.g., .000001, or one chance
out of a million, that the findings were due to chance), they cannot drop to zero. So there is always the
possibility—however small—that research findings are attributable to chance.
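A tiny illustration of why a p value can shrink but never reach zero: the probability that a fair coin lands heads on every one of n tosses gets astronomically small, yet it is always greater than zero.

```python
# Sketch: p values shrink toward zero but never reach it. The chance that a
# fair coin lands heads on every one of n consecutive tosses:
for n in (10, 100, 1000):
    p = 0.5 ** n
    print(f"{n} straight heads: p = {p:.3g}")
```

However long the run of heads, the chance explanation retains some nonzero probability, so the data can support but never prove the "coin is biased" hypothesis.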

Second, no matter how well controlled a study is, there always exists the possibility that the findings could
be the result of some influence other than the one systematically studied by the researcher. For example,
a researcher might compare the relative effectiveness of learning to spell on a computer versus the “old-
fashioned” way of writing words by hand. If the computer group learned to spell better and faster than the
handwriting group, might there be reasons other than the computers for the better performance? Yes, there
may be several. For example, maybe the teachers of the computer group were different, possibly more
enthusiastic or more motivating. It might be that the enthusiasm or motivation by itself resulted in better
spelling performance. If the more enthusiastic and motivating teachers had taught the handwriting method,
then the handwriting group might have outperformed the computer group.

Consider another explanation. The computing group was taught in the morning, the handwriting group in
the afternoon. If the computing group outperformed the handwriting group, how would you know whether
the better performance was a teaching method effect or a time-of-day effect? You wouldn't know. Perhaps
students had chosen the computer group or the handwriting group. If higher-achieving students (hence better
spellers) chose the computer group, the findings would tell us nothing about the relative effectiveness of the
two methods of learning (only that better spellers prefer to work on a computer).

Critical Thinker Alert 2.10 Proof

Theories are supported, not proven. They are developed to enhance our current
understanding and guide research. A good theory may outlive its usefulness and never be
proven; it may be replaced with a more useful one. The words research and prove usually
don't belong in the same sentence.

Discussion: What famous theorists come to mind from your studies of education and/
or psychology? Freud? Skinner? Jung? Dewey? What happened to the theory? Did the
theory evolve into a different one?

I hope that some of these research concepts have stimulated your interest in the process of scientific research
in general and educational research in particular. In the next chapter, we will examine the great diversity of
approaches to educational research.

Summary

A few powerful concepts go far in helping educators unravel the complexities of the research process. A
sampling of these ideas includes the following: how bias can be subtle, why data require interpretation (and
the related notion that misinterpretation of results is common), why control is vital, why intuition is often
unreliable, why associations in research require sufficient information, the meaningfulness of contrasting
groups, the power of statistical inference, and how thinking becomes muddied. Critical thinking about
research is a developed skill that helps clear the muddied waters of research in education.

Key Terms
Control 31
Fourfold table 42
Inferential statistics 46
Intuition 40
Relationship 43
Research bias 32


Application Exercises

1. Locate an author's opinion about an issue or topic in education (perhaps a letter to the
editor or story about education in a magazine, blog, or newspaper). Analyze the
arguments carefully, paying particular attention to sloppy reasoning or fallacious thinking.
Which conclusions seem astute? Which ones seem faulty?
2. Locate an example of teacher action research in your library or on the Internet. (Search
Google for “teacher action research.”) Summarize the research report and describe why
you think it reveals correct interpretation and critical thinking about research.
3. Visit your library and locate a journal that publishes the findings of research studies, such
as the American Educational Research Journal, Journal of Educational Psychology, or
Journal of Educational Research. Alternatively, use the Internet and locate an online
journal that publishes full-text reports of research, such as Education Policy Analysis
Archives. Other online journals that publish educational research can be found at the
website of the Educational Research Global Observatory ([Link]
[Link]). Find a study that uses a control group and explain its function. In other
words, what is it that the control group controls?
4. Using the same resources in your library or on the Internet, locate a study and focus on
one or more ideas introduced in this chapter, such as bias, misinterpretations,
counterintuitive findings, assessing relationships via fourfold tables (or similar group
comparisons), the use of contrasting groups, statistical significance, or the notion of proof.
Are any ideas in this chapter also conveyed in the published research report?
5. One might argue that research in education is too easily influenced by current politics.
Discuss ways in which political orientation might affect, even bias, research in education.
To get started, think about how politics might influence the very definition of science, what
qualifies as rigorous evidence, how federal research funds are awarded, or how research
findings are disseminated. Do you feel that politics can influence the research base in
education? How might this occur?

Student Study Site

Log on to the Web-based student study site at [Link] for additional study tools
including:

• eFlashcards
• Web Quizzes
• Web Resources
• Learning Objectives
• Links to SAGE Journal Articles
• Web Exercises

References

Begley, S. (2010, February 8). The depressing news about antidepressants. Newsweek, 155(6), 34–41.

Berliner, D. C., & Biddle, B. J. (1995). The manufactured crisis: Myths, fraud, and the attack on America's
public schools. Reading, MA: Addison-Wesley.

Best, J. (2001). Damned lies and statistics: Untangling numbers from the media, politicians, and activists.
Berkeley: University of California Press.

Bracey, G. W. (2006a). Reading educational research: How to avoid getting statistically snookered.
Portsmouth, NH: Heinemann.

Bracey, G. W. (2006b). How to avoid statistical traps. Educational Leadership, 63(8), 78–82.

Campbell, S. K. (1974). Flaws and fallacies in statistical thinking. Englewood Cliffs, NJ: Prentice Hall.

Chadwick, B. A., Bahr, H. M., & Albrecht, S. (1984). Social science research methods. Englewood Cliffs,
NJ: Prentice Hall.

Davenas, E., Beauvais, F., Amara, J., Oberbaum, M., Robinson, B., Miasdonna, A.,… Benveniste, J.
(1988). Human basophil degranulation triggered by very dilute antiserum against IgE. Nature, 333, 816–818.
[Link]

de Craen, A. J., Moerman, D. E., Heisterkamp, S. H., Tytgat, G. N., Tijssen, J. G., & Kleijnen, J. (1999).
Placebo effect in the treatment of duodenal ulcer. British Journal of Clinical Pharmacology, 48(6), 853–860.
[Link]

Deer, B. (2011). Secrets of the MMR scare: How the case against the MMR vaccine was fixed. British Medical
Journal, 342. doi: 10.1136/bmj.c5347

Dingfelder, S. F. (2010, July/August). The first modern psychological study. Monitor on Psychology, 41(7),
30–31.

Elashoff, J. D., & Snow, R. E. (1971). Pygmalion reconsidered. Worthington, OH: Jones.

Fournier, J. C., DeRubeis, R. J., Hollon, S. D., Dimidjian, S., Amsterdam, J. D., Shelton, R. C., & Fawcett,
J. (2010). Antidepressant drug effects and depression severity: A patient-level meta-analysis. Journal of the
American Medical Association, 303(1), 47–53. [Link]


Franke, R., & Kaul, J. (1978). The Hawthorne experiments: First statistical interpretation. American
Sociological Review, 43, 623. [Link]

Gigerenzer, G. (2002). Calculated risks: How to know when numbers deceive you. New York, NY: Simon &
Schuster.

Horowitz, J. M. (2002, July 22). What the knees really need. Time, 160(4), 62.

Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124. doi:
10.1371/journal.pmed.0020124

Jacobson, L. (2007, April 25). Researcher offers competing data. Education Week, 26(34), 30.

Kirsch, I. (2010). The emperor's new drugs: Exploding the antidepressant myth. New York, NY: Basic Books.

Kohn, A. (2007, January/February). Rethinking homework. Principal, 86(3), 35–38.

Lemonick, M. (1998, April 13). Emily's little experiment. Time, 151(14), 67.

Lilienfeld, S. O., Lynn, S. J., Ruscio, J., & Beyerstein, B. L. (2010). 50 great myths of popular psychology:
Shattering widespread misconceptions about human behavior. Malden, MA: Wiley-Blackwell.

Moseley, J. B. (2002). A controlled trial of arthroscopic surgery for osteoarthritis of the knee. New England
Journal of Medicine, 347(2), 81–88. [Link]

National Commission on Excellence in Education. (1983). A nation at risk: The imperative for educational
reform. Retrieved from [Link]

Pashler, H., McDaniel, M., Rohrer, D., & Bjork, R. (2009). Learning styles: Concepts and evidence.
Psychological Science in the Public Interest, 9(3), 105–119.

Pfungst, O. (1911). Clever Hans. New York, NY: Holt, Rinehart & Winston.

Powell, B. (1993, December). Sloppy reasoning, misused data. Phi Delta Kappan, 75(4), 283, 352.

Roethlisberger, F. J., & Dickson, W. J. (1939). Management and the worker. Cambridge, MA: Harvard
University Press.

Rosa, L., Rosa, E., Sarner, L., & Barrett, S. (1998). A close look at therapeutic touch. Journal of the
American Medical Association, 279, 1005–1010. [Link]

Rosenthal, R., & Jacobson, L. (1968). Pygmalion in the classroom: Teacher expectation and pupils'
intellectual development. New York, NY: Holt, Rinehart & Winston.

Schmidt, F. (2010). Detecting and correcting the lies that data tell. Perspectives on Psychological Science,
5(3), 233–242. doi: 10.1177/1745691610369339

Snow, R. E. (1969). Unfinished Pygmalion [Review of the use of the book Pygmalion in the Classroom].
Contemporary Psychology, 14, 197–200.

Thompson, D. (1999, February 22). Real knife, fake surgery. Time, 153(7), 66.

vos Savant, M. (2002). The power of logical thinking: Easy lessons in the art of reasoning and hard facts
about its absence in our lives. New York, NY: St. Martin's.

Wesson, K. A. (2001). The “Volvo effect”: Questioning standardized tests. Young Children, 56(2), 16–18.

Wineburg, S. S. (1987). The self-fulfillment of the self-fulfilling prophecy: A critical appraisal. Educational
Researcher, 16, 28–37. [Link]



Thinking About Research 
In: Introduction to Educational Research: A Critical Thinking 
Approach 
By: W. Newton Suter 