8/5/24, 9:34 AM Hypothesis Testing, P Values, Confidence Intervals, and Significance - StatPearls
Hypothesis Testing, P Values, Confidence Intervals, and Significance
Article Editor: Martin R Huecker ([email protected])
Assigned Author: Jacob Shreffler ([email protected])
Updated: 3/13/2023 3:52:01 PM
Definition/Introduction
Medical providers often rely on evidence-based medicine to guide decision-making in practice. Often a
research hypothesis is tested with results provided, typically with p values, confidence intervals, or
both. Additionally, statistical or research significance is estimated or determined by the investigators.
Unfortunately, healthcare providers may have different comfort levels in interpreting these findings,
which may affect the adequate application of the data.
Issues of Concern
A lack of foundational understanding of hypothesis testing, p values, confidence intervals, and the
difference between statistical and clinical significance may limit healthcare providers' ability to
make clinical decisions without relying purely on the level of significance deemed by the research
investigators. Therefore, an overview of these concepts is provided to allow medical professionals to use
their expertise to determine if results are reported sufficiently and if the study outcomes are clinically
appropriate to be applied in healthcare practice.
Hypothesis Testing
Investigators conducting studies need research questions and hypotheses to guide analyses. Starting
with broad research questions (RQs), investigators then identify a gap in current clinical practice or
research. Any research problem or statement is grounded in a better understanding of relationships
between two or more variables. For this article, we will use the following research question example:
Research Question: Is Drug 23 an effective treatment for Disease A?
https://www.statpearls.com/Keywords/UserPdfView?id=95301&userID=c2354948-04fd-4eb8-b529-a1053d7c4f9a 1/6
Research questions do not directly imply specific guesses or predictions; we must formulate research
hypotheses. A hypothesis is a predetermined declaration regarding the research question in which the
investigator(s) makes a precise, educated guess about a study outcome. This is sometimes called the
alternative hypothesis and ultimately allows the researcher to take a stance based on experience or
insight from medical literature. An example of a hypothesis is below.
Research Hypothesis: Drug 23 will significantly reduce symptoms associated with Disease A
compared to Drug 22.
The null hypothesis states that there is no statistical difference between groups based on the stated
research hypothesis. An example of a null hypothesis is below.
Null Hypothesis: There are no significant differences in the reduction of symptoms for Disease A
between Drug 23 and Drug 22.
The null hypothesis is deemed true until a study presents significant data to support
rejecting the null hypothesis. Based on the results, the investigators will either reject the null
hypothesis (if they found significant differences or associations) or fail to reject the null hypothesis
(they could not provide sufficient evidence that there were significant differences or associations).
Regarding p values, as the number of individuals enrolled in a study (the sample size) increases, the
likelihood of finding a statistically significant effect increases. With very large sample sizes, the p-value
can be very low even when the observed difference is small. Researchers should be aware of journal
recommendations when considering how to report p values, and manuscripts should remain internally
consistent.
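The effect of sample size on the p-value noted above can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: the 0.2-point difference in symptom score, the standard deviation of 2.0, and the group sizes are hypothetical numbers, and the helper two_sided_p uses a normal (z) approximation with equal group sizes rather than any method named in this article.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means (normal approximation).

    Assumes two groups of equal size that share the same standard deviation.
    """
    standard_error = sd * sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical scenario: the same small difference observed at three sample sizes.
p_values = {n: two_sided_p(mean_diff=0.2, sd=2.0, n_per_group=n) for n in (50, 500, 5000)}

for n, p in p_values.items():
    print(f"n per group = {n:>4}: p = {p:.6f}")
```

The observed difference is identical in all three rows; only the sample size changes, yet the p-value moves from clearly non-significant to far below 0.05.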
To test a hypothesis, researchers obtain data on a representative sample to determine whether to reject
or fail to reject a null hypothesis. In most research studies, it is not feasible to obtain data for an entire
population. Using a sampling procedure allows for statistical inference, though this involves a certain
possibility of error.[1] When determining whether to reject or fail to reject the null hypothesis, mistakes
can be made: Type I and Type II errors. Though it is impossible to ensure that these errors have not
occurred, researchers should limit the possibilities of these faults.[2]
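A small simulation may help make Type I and Type II errors concrete. This is a sketch under stated assumptions, not a prescribed method: the function names, the known standard deviation of 1.0, the true difference of 0.5, the group size of 50, and the use of a two-sided z-test are all hypothetical choices made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

ALPHA = 0.05
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided critical value


def rejects_null(true_diff, n, sd, rng):
    """Simulate one two-group study and apply a two-sided z-test (SD assumed known)."""
    group_a = [rng.gauss(0.0, sd) for _ in range(n)]
    group_b = [rng.gauss(true_diff, sd) for _ in range(n)]
    z = (mean(group_b) - mean(group_a)) / (sd * sqrt(2.0 / n))
    return abs(z) > Z_CRIT


def rejection_rate(true_diff, n=50, sd=1.0, studies=2000, seed=1):
    """Fraction of simulated studies in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    return sum(rejects_null(true_diff, n, sd, rng) for _ in range(studies)) / studies


# Null hypothesis true: every rejection is a Type I error.
type_i_rate = rejection_rate(true_diff=0.0)
# Null hypothesis false: every failure to reject is a Type II error.
power = rejection_rate(true_diff=0.5)
type_ii_rate = 1.0 - power

print(f"Type I error rate: {type_i_rate:.3f}")
print(f"Type II error rate: {type_ii_rate:.3f} (power {power:.3f})")
```

When no true difference exists, the rejection rate should sit near the chosen threshold of 0.05. When a true difference exists, the share of studies that miss it depends on the sample size and the size of the effect.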
Significance
Significance is a term to describe the substantive importance of medical research. Statistical
significance concerns how likely results at least as extreme as those observed would be if chance alone
were at work.[3] Healthcare providers should always distinguish statistical significance from clinical
significance; conflating the two is a common error when reviewing biomedical research.
[4] When conceptualizing findings reported as either significant or not significant, healthcare providers
should not simply accept researchers' results or conclusions without considering the clinical
significance. Healthcare professionals should consider the clinical importance of findings and
understand both p values and confidence intervals so they do not have to rely on the researchers to
determine the level of significance.[5] One criterion often used to determine statistical significance is
the utilization of p values.
P Values
P values are used in research to determine whether the sample estimate is significantly different from a
hypothesized value. The p-value is the probability that an effect at least as large as the one observed in
the study would have occurred by chance if, in reality, there was no true effect. Conventionally, data
yielding a p<0.05 or p<0.01 is considered statistically significant. While some have debated that the
0.05 level should be lowered, it is still widely practiced.[6] Hypothesis testing allows us to determine
whether an effect is statistically distinguishable from no effect; it does not, by itself, establish the size
of the effect.
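The idea of a p-value as the probability of a result arising by chance when there is no true effect can be illustrated with a permutation test. The sketch below is a hedged illustration only: the symptom scores are invented, the function permutation_p_value is a hypothetical helper, and a permutation test is used here for transparency rather than because the article prescribes it.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value by shuffling group labels.

    If the treatment had no effect, group labels would be arbitrary, so we
    count how often a random relabelling gives a difference in means at least
    as large as the one actually observed.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction: the estimate can be very small but is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical symptom scores (not data from any study).
drug_23 = [1, 2, 1, 0, 2, 1, 3, 1]
drug_22 = [4, 5, 3, 6, 5, 4, 7, 5]

p = permutation_p_value(drug_23, drug_22)
print(f"Estimated p-value: {p:.4f}")
```

Because the two hypothetical groups barely overlap, very few random relabellings reproduce a difference this large, and the estimated p-value is correspondingly small.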
Examples of findings reported with p values are below:
Statement: Drug 23 reduced patients' symptoms compared to Drug 22. Patients who received Drug 23
(n=100) were 2.1 times less likely than patients who received Drug 22 (n = 100) to experience
symptoms of Disease A, p<0.05.
Or
Statement: Individuals who were prescribed Drug 23 experienced fewer symptoms (M = 1.3, SD = 0.7)
compared to individuals who were prescribed Drug 22 (M = 5.3, SD = 1.9). This finding was statistically
significant, p= 0.02.
For either statement, if the threshold had been set at 0.05, the null hypothesis (that there was no
relationship) should be rejected, and we should conclude that there were significant differences. As the
two statements above show, some researchers report findings with < or > and others provide an exact
p-value (e.g., 0.000001), but a p-value should never be reported as zero.[6] When examining research,
readers should understand how p values are reported. The best practice is to report all p values for all
variables within a study design, rather than only providing p values for variables with significant
findings.[7] The inclusion of all p values provides evidence for study validity and limits suspicion of
selective reporting/data mining.
While researchers have historically used p values, experts who find p values problematic encourage the
use of confidence intervals.[8] P-values alone do not allow us to understand the size or the extent of
the differences or associations.[3] In March 2016, the American Statistical Association (ASA) released a
statement on p values, noting that scientific decision-making and conclusions should not be based on a
fixed p-value threshold (e.g., 0.05). They recommend focusing on the significance of results in the
context of study design, quality of measurements, and validity of data. Ultimately, the ASA statement
noted that in isolation, a p-value does not provide strong evidence.[9]
When conceptualizing clinical work, healthcare professionals should consider p values with a
concurrent appraisal of study design validity. For example, a p-value from a double-blinded randomized
clinical trial (designed to minimize bias) should be weighted more heavily than one from a retrospective
observational study.[7] The p-value debate has smoldered since the 1950s,[10] and replacement with
confidence intervals has been suggested since the 1980s.[11]
Confidence Intervals
A confidence interval (CI) provides a range of values that, with a given level of confidence (e.g., 95%),
includes the true value of the statistical parameter within a targeted population.[12] Most research uses
a 95% CI, but investigators can set any level (e.g., 90% CI, 99% CI).[13] A CI provides a range with the
lower bound and upper bound limits of a difference or association that would be plausible for a
population.[14] Therefore, a CI of 95% indicates that if a study were to be carried out 100 times, the
calculated range would be expected to contain the true value in about 95 of them.[15] Confidence
intervals provide more evidence regarding the precision of an estimate compared to p-values.[6]
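The repeated-study interpretation of a 95% CI can be checked with a simulation. The following is a minimal sketch under stated assumptions: the true mean of 10.0, the standard deviation of 2.0, the sample size of 30, and the function name ci_coverage are hypothetical, and the standard deviation is treated as known so that a simple z-based interval applies.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def ci_coverage(true_mean=10.0, sd=2.0, n=30, studies=2000, level=0.95, seed=42):
    """Fraction of simulated studies whose CI contains the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd / sqrt(n)  # SD assumed known, so a z-interval is used
    hits = 0
    for _ in range(studies):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        sample_mean = mean(sample)
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / studies


coverage = ci_coverage()
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

Each simulated study produces a different interval; the confidence level describes how often the procedure captures the true value across many such studies, not the certainty attached to any single interval.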
In consideration of the similar research example provided above, one could make the following
statement with 95% CI:
Statement: Individuals who were prescribed Drug 23 had no symptoms after three days, which was
significantly faster than those prescribed Drug 22; the mean difference in days to recovery between
the two groups was 4.2 days (95% CI: 1.9 – 7.8).
It is important to note that the width of the CI is affected by the standard error and the sample size;
reducing the sample size results in less precision of the CI (a wider interval).[14] A larger
width indicates a smaller sample size or a larger variability.[16] A researcher would want to increase
the precision of the CI. For example, a 95% CI of 1.43 – 1.47 is much more precise than the one
provided in the example above. In research and clinical practice, CIs provide valuable information on
whether the interval includes or excludes any clinically significant values.[14]
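The relationship between sample size, variability, and CI width can be sketched numerically. This is an illustration under assumptions, not a formula taken from the article: the helper ci_width computes the width of a z-based interval for a single mean, and the standard deviations and sample sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(sd, n, level=0.95):
    """Width of a z-based confidence interval for a mean: 2 * z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * sd / sqrt(n)


# Hypothetical standard deviation of 2.0 at three sample sizes.
widths = {n: ci_width(sd=2.0, n=n) for n in (25, 100, 400)}

for n, width in widths.items():
    print(f"n = {n:>3}: CI width = {width:.2f}")

# Larger variability at the same sample size also widens the interval.
wider = ci_width(sd=4.0, n=100)
```

Under this approximation the width shrinks with the square root of the sample size, so quadrupling the sample halves the width.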
CIs are sometimes compared only against the null value (zero for differences and 1 for ratios) to judge
significance. However, CIs provide more information than that.[15] Consider this example: A hospital
implemented a new protocol that reduced wait time for patients in the emergency department by an
average of 25 minutes (95% CI: -2.5 – 41 minutes). Because the range crosses zero, implementing this
protocol in different populations could result in longer wait times; however, the range is much higher
on the positive side. Thus, while the p-value used to detect statistical significance for this may result in
"not significant" findings, individuals should examine this range, consider the study design, and weigh
whether or not it is still worth piloting in their workplace.
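The null-value check described above can be written as a small helper. This is a simple sketch: the function name excludes_null is hypothetical, and the two intervals are the ones quoted in this article's examples.

```python
def excludes_null(lower, upper, null_value=0.0):
    """True when the interval lies entirely on one side of the null value.

    Use null_value=0.0 for differences and null_value=1.0 for ratios.
    """
    return not (lower <= null_value <= upper)


recovery_ci = (1.9, 7.8)     # mean difference in days to recovery
wait_time_ci = (-2.5, 41.0)  # reduction in wait time, in minutes

print("Recovery CI excludes zero:", excludes_null(*recovery_ci))
print("Wait-time CI excludes zero:", excludes_null(*wait_time_ci))
```

The check answers only the yes/no question of statistical significance; where the interval sits and how wide it is still have to be weighed against study design and clinical context.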
Like p-values, 95% CIs cannot control for researchers' errors (e.g., study bias or improper data
analysis).[14] In consideration of whether to report p-values or CIs, researchers should examine journal
preferences. When in doubt, reporting both may be beneficial.[13] An example is below:
Reporting both: Individuals who were prescribed Drug 23 had no symptoms after three days, which
was significantly faster than those prescribed Drug 22, p = 0.009. The mean difference in days to
recovery between the two groups was 4.2 days (95% CI: 1.9 – 7.8).
Clinical Significance
Recall that clinical significance and statistical significance are two different concepts. Healthcare
providers should remember that a study with statistically significant differences and a large sample size
may be of no interest to clinicians, whereas a study with a smaller sample size and statistically
non-significant results could impact clinical practice.[14] Additionally, as previously mentioned, a
non-significant finding may reflect the study design itself rather than relationships between variables.
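The contrast between a large, statistically significant study of little clinical interest and a small, non-significant study that may still matter can be sketched numerically. Everything in this example is an assumption for illustration: the two trials, their means and standard deviations, and the minimal clinically important difference (mcid) of 1.0 point are hypothetical, and the calculations use a normal approximation with equal group sizes.

```python
from math import sqrt
from statistics import NormalDist


def summarize(mean_diff, sd, n_per_group, mcid, alpha=0.05):
    """Report the p-value, CI, and whether the CI reaches a clinically important difference."""
    standard_error = sd * sqrt(2.0 / n_per_group)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower = mean_diff - z_crit * standard_error
    upper = mean_diff + z_crit * standard_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(mean_diff / standard_error)))
    return {
        "p": p,
        "ci": (lower, upper),
        "statistically_significant": p < alpha,
        # Could the true effect plausibly be as large as the hypothetical MCID?
        "ci_reaches_mcid": upper >= mcid,
    }


# Hypothetical large trial: tiny effect, very precise estimate.
large_trial = summarize(mean_diff=0.3, sd=2.0, n_per_group=5000, mcid=1.0)
# Hypothetical small trial: larger effect, imprecise estimate.
small_trial = summarize(mean_diff=1.5, sd=2.0, n_per_group=12, mcid=1.0)

for name, result in (("large", large_trial), ("small", small_trial)):
    low, high = result["ci"]
    print(f"{name} trial: p = {result['p']:.4f}, 95% CI {low:.2f} to {high:.2f}, "
          f"significant = {result['statistically_significant']}, "
          f"CI reaches MCID = {result['ci_reaches_mcid']}")
```

In this sketch the large trial is statistically significant yet its whole interval falls short of the chosen clinical threshold, while the small trial is not significant but cannot rule out a clinically meaningful benefit.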
Healthcare providers using evidence-based medicine to inform practice should use clinical judgment to
determine the practical importance of studies through careful evaluation of the design, sample size,
power, likelihood of type I and type II errors, data analysis, and reporting of statistical findings (p
values, 95% CI, or both).[4] Interestingly, some experts have called for "statistically significant" or "not
significant" to be excluded from work, as statistical significance never has been and never will be
equivalent to clinical significance.[17]
The decision on what is clinically significant can be challenging, depending on the providers' experience
and especially the severity of the disease. Providers should use their knowledge and experiences to
determine the meaningfulness of study results and make inferences based not only on the significant or
non-significant results reported by researchers but also on their understanding of study limitations and
practical implications.
Nursing, Allied Health, and Interprofessional Team Interventions
All physicians, nurses, pharmacists, and other healthcare professionals should strive to understand the
concepts in this chapter. These individuals should maintain the ability to review and incorporate new
literature for evidence-based and safe care.
References
[1] Jones M, Gebski V, Onslow M, Packman A. Statistical power in stuttering research: a tutorial. Journal of speech,
language, and hearing research : JSLHR. 2002 Apr:45(2):243-55 [PubMed PMID: 12003508]
(http://www.ncbi.nlm.nih.gov/pubmed/12003508)
[2] Sedgwick P. Pitfalls of statistical hypothesis testing: type I and type II errors. BMJ (Clinical research ed.). 2014 Jul
3:349:g4287. doi: 10.1136/bmj.g4287. Epub 2014 Jul 3 [PubMed PMID: 24994622]
(http://www.ncbi.nlm.nih.gov/pubmed/24994622)
[3] Fethney J. Statistical and clinical significance, and how to use confidence intervals to help interpret both. Australian
critical care : official journal of the Confederation of Australian Critical Care Nurses. 2010 May:23(2):93-7. doi:
10.1016/j.aucc.2010.03.001. Epub 2010 Mar 29 [PubMed PMID: 20347326]
(http://www.ncbi.nlm.nih.gov/pubmed/20347326)
[4] Hayat MJ. Understanding statistical significance. Nursing research. 2010 May-Jun:59(3):219-23. doi:
10.1097/NNR.0b013e3181dbb2cc. [PubMed PMID: 20445438]
(http://www.ncbi.nlm.nih.gov/pubmed/20445438)
[5] Ferrill MJ, Brown DA, Kyle JA. Clinical versus statistical significance: interpreting P values and confidence intervals
related to measures of association to guide decision making. Journal of pharmacy practice. 2010 Aug:23(4):344-51.
doi: 10.1177/0897190009358774. Epub 2010 Apr 13 [PubMed PMID: 21507834]
(http://www.ncbi.nlm.nih.gov/pubmed/21507834)
[6] Infanger D, Schmidt-Trucksäss A. P value functions: An underused method to present research results and to promote
quantitative reasoning. Statistics in medicine. 2019 Sep 20:38(21):4189-4197. doi: 10.1002/sim.8293. Epub 2019 Jul 3
[PubMed PMID: 31270842] (http://www.ncbi.nlm.nih.gov/pubmed/31270842)
[7] Dorey F. Statistics in brief: Interpretation and use of p values: all p values are not equal. Clinical orthopaedics and
related research. 2011 Nov:469(11):3259-61. doi: 10.1007/s11999-011-2053-1. [PubMed PMID: 21918804]
(http://www.ncbi.nlm.nih.gov/pubmed/21918804)
[8] Liu XS. Implications of statistical power for confidence intervals. The British journal of mathematical and statistical
psychology. 2012 Nov:65(3):427-37. doi: 10.1111/j.2044-8317.2011.02035.x. Epub 2011 Oct 25 [PubMed PMID:
22026811] (http://www.ncbi.nlm.nih.gov/pubmed/22026811)
[9] Tijssen JG, Kolm P. Demystifying the New Statistical Recommendations: The Use and Reporting of p Values. Journal of
the American College of Cardiology. 2016 Jul 12:68(2):231-3. doi: 10.1016/j.jacc.2016.05.026. [PubMed PMID:
27386779] (http://www.ncbi.nlm.nih.gov/pubmed/27386779)
[10] Spanos A. Recurring controversies about P values and confidence intervals revisited. Ecology. 2014 Mar:95(3):645-51
[PubMed PMID: 24804448] (http://www.ncbi.nlm.nih.gov/pubmed/24804448)
[11] Freire APCF, Elkins MR, Ramos EMC, Moseley AM. Use of 95% confidence intervals in the reporting of between-group
differences in randomized controlled trials: analysis of a representative sample of 200 physical therapy trials. Brazilian
journal of physical therapy. 2019 Jul-Aug:23(4):302-310. doi: 10.1016/j.bjpt.2018.10.004. Epub 2018 Oct 16 [PubMed
PMID: 30366845] (http://www.ncbi.nlm.nih.gov/pubmed/30366845)
[12] Dorey FJ. In brief: statistics in brief: Confidence intervals: what is the real result in the target population? Clinical
orthopaedics and related research. 2010 Nov:468(11):3137-8. doi: 10.1007/s11999-010-1407-4. [PubMed
PMID: 20532716] (http://www.ncbi.nlm.nih.gov/pubmed/20532716)
[13] Porcher R. Reporting results of orthopaedic research: confidence intervals and p values. Clinical orthopaedics and
related research. 2009 Oct:467(10):2736-7. doi: 10.1007/s11999-009-0952-1. Epub 2009 Jun 30 [PubMed PMID:
19565303] (http://www.ncbi.nlm.nih.gov/pubmed/19565303)
[14] Gardner MJ, Altman DG. Confidence intervals rather than P values: estimation rather than hypothesis testing. British
medical journal (Clinical research ed.). 1986 Mar 15:292(6522):746-50 [PubMed PMID: 3082422]
(http://www.ncbi.nlm.nih.gov/pubmed/3082422)
[15] Cooper RJ, Wears RL, Schriger DL. Reporting research results: recommendations for improving communication. Annals
of emergency medicine. 2003 Apr:41(4):561-4 [PubMed PMID: 12658257]
(http://www.ncbi.nlm.nih.gov/pubmed/12658257)
[16] Doll H, Carney S. Statistical approaches to uncertainty: P values and confidence intervals unpacked. Equine veterinary
journal. 2007 May:39(3):275-6 [PubMed PMID: 17520981] (http://www.ncbi.nlm.nih.gov/pubmed/17520981)
[17] Colquhoun D. The reproducibility of research and the misinterpretation of p-values. Royal Society open science. 2017
Dec:4(12):171085. doi: 10.1098/rsos.171085. Epub 2017 Dec 6 [PubMed PMID: 29308247]
(http://www.ncbi.nlm.nih.gov/pubmed/29308247)