Glossary

The document is a glossary of key terms related to impact evaluation, covering concepts from causation and correlation to statistical methods and experimental designs. It includes definitions and explanations of terms such as impact evaluation, monitoring, randomized controlled trials, and various statistical concepts like variance and regression analysis. The glossary serves as a resource for understanding the methodologies and terminology used in evaluating development policies and programs.


Glossary

Key Terms by Module


Module 1 | Impact Evaluation as an Instrument of Development Policy
Causation
A relationship between two variables in which a change in one brings about a
change in the other.

Source: Social Research Glossary: Quality Research International

Correlation
Correlation is the degree to which two or more quantities are linearly
associated. In a two-dimensional plot, the degree of correlation between the
values on the two axes is quantified by the so-called correlation coefficient.

Source: Wolfram MathWorld
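As an illustrative sketch (not part of the glossary source), the correlation coefficient for two lists of paired values can be computed directly from its definition:

```python
# Sketch: the Pearson correlation coefficient for paired values.
def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# Perfectly linearly related values have a correlation of 1
# (up to floating-point rounding).
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```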

Impact Evaluation
An impact evaluation is an evaluation that tries to make a causal link between
a program or intervention and a set of outcomes. An impact evaluation tries to
answer the question of whether a program is responsible for changes in the
outcomes of interest.
Intervention
The project, program or policy which is the subject of the impact evaluation.

Source: 3ie Impact Evaluation Glossary. International Initiative for Impact Evaluation: New Delhi, India.

Monitoring
Monitoring is the continuous process of collecting and analyzing information to
assess how well a project, program, or policy is performing. It relies primarily
on administrative data to track performance against expected results, make
comparisons across programs, and analyze trends over time. Monitoring
usually tracks inputs, activities, and outputs, though occasionally it includes
outcomes as well. Monitoring is used to inform day-to-day management and
decisions.

Outcome
Can be intermediate or final. An outcome is a result of interest that comes
about through a combination of supply and demand factors. For example, if an
intervention leads to a greater supply of vaccination services, then actual
vaccination numbers would be an outcome, as they depend not only on the
supply of vaccines but also on the behavior of the intended beneficiaries: do
they show up at the service point to be vaccinated? Final or long-term
outcomes are more distant outcomes. The distance can be interpreted in a
time dimension (it takes a long time to get to the outcome) or a causal
dimension (many causal links are needed to reach the outcome).

Quasi-Experimental Design
Impact evaluation designs used to determine impact in the absence of a
control group from an experimental design. Many quasi-experimental
methods, e.g. propensity score matching and regression discontinuity design,
create a comparison group using statistical procedures. The intention is to
ensure that the characteristics of the treatment and comparison groups are
identical in all respects, other than the intervention, as would be the case in
an experimental design. Other regression-based approaches have an implicit
counterfactual, controlling for selection bias and other confounding factors
through statistical procedures.

Source: 3ie Impact Evaluation Glossary. International Initiative for Impact Evaluation: New Delhi, India.

Module 2 | RCT Basics


Cluster
A cluster is a group of units that are similar in one way or another. For
example, in a sampling of school children, children who attend the same
school would belong to a cluster because they share the same school facilities
and teachers and live in the same neighborhood.

Counterfactual
The counterfactual is an estimate of what the outcome (Y) would have been
for a program participant in the absence of the program (P). By definition, the
counterfactual cannot be observed. Therefore, it must be estimated using
comparison groups.

Control Group
Also known as a “comparison group.” A valid comparison group will have the
same characteristics as the group of beneficiaries of the program (treatment
group), except that the units in the comparison group do not benefit from the
program. Comparison groups are used to estimate the counterfactual.

External Validity
To have external validity means that the causal impact discovered in the
impact evaluation can be generalized to the universe of all eligible units. For
an evaluation to be externally valid, it is necessary that the evaluation sample
be a representative sample of the universe of eligible units.

Internal Validity
To say that an impact evaluation has internal validity means that it uses a
valid comparison group, that is, a comparison group that is a valid estimate of
the counterfactual.

Randomized Assignment or Randomized Controlled Trial (RCT)


Randomized assignment is considered the most robust method for estimating
counterfactuals and is often referred to as the “gold standard” of impact
evaluation. With this method, beneficiaries are randomly selected to receive
an intervention, and each has an equal chance of receiving the program. With
large-enough sample sizes, the process of random assignment ensures
equivalence, in both observed and unobserved characteristics, between the
treatment and control groups, thereby addressing any selection bias.

Spillover Effect
Also known as contamination of the comparison group. A spillover effect
occurs when the comparison group is affected by the treatment administered
to the treatment group, even though the treatment is not administered directly
to the comparison group. If the spillover effect on the comparison group is
negative (that is, if they suffer because of the program), then the straight
difference between outcomes in the treatment and comparison groups will
yield an overestimation of the program impact. By contrast, if the spillover
effect on the comparison group is positive (that is, they benefit), then it will
yield an underestimation of the program impact.

Treatment Group
Also known as the treated group or the intervention group. The treatment
group is the group of units that benefits from an intervention, versus the
comparison group that does not.

Module 3 | Statistical Concepts


Binomial Distribution
A binomial distribution can be thought of as simply the probability of a
SUCCESS or FAILURE outcome in an experiment or survey that is repeated
multiple times. The binomial is a type of distribution that has two possible
outcomes (the prefix “bi” means two, or twice). For example, a coin toss has
only two possible outcomes, heads or tails, and taking a test could have two
possible outcomes: pass or fail.
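The definition above can be made concrete with the binomial probability mass function; the coin-toss numbers below are an illustrative example, not from the glossary source:

```python
import math

# Sketch: probability of exactly k successes in n independent trials,
# each with success probability p (the binomial probability mass function).
def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of exactly 5 heads in 10 fair coin tosses:
print(round(binomial_pmf(5, 10, 0.5), 4))  # -> 0.2461
```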

Central Limit Theorem


In probability theory, the central limit theorem (CLT) establishes that, in some
situations, when independent random variables are added, their properly
normalized sum tends toward a normal distribution (informally a "bell curve")
even if the original variables themselves are not normally distributed. The
theorem is a key concept in probability theory because it implies that
probabilistic and statistical methods that work for normal distributions can be
applicable to many problems involving other types of distributions.
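A small simulation illustrates the theorem; the sample sizes and seed below are arbitrary choices for the sketch:

```python
import random

# Sketch: means of samples drawn from a uniform (non-normal) distribution
# cluster around the population mean, as the central limit theorem predicts.
random.seed(0)
sample_means = [
    sum(random.random() for _ in range(30)) / 30  # mean of 30 uniform draws
    for _ in range(2000)
]
grand_mean = sum(sample_means) / len(sample_means)
print(round(grand_mean, 2))  # close to 0.5, the mean of Uniform(0, 1)
```

Plotting `sample_means` as a histogram would show the familiar bell shape even though the underlying uniform draws are not normally distributed.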

Normal distribution
Normal distribution, also known as the Gaussian distribution, is a probability
distribution that is symmetric about the mean, showing that data near the
mean are more frequent in occurrence than data far from the mean. In graph
form, normal distribution will appear as a bell curve.

Sample
In statistics, a sample is a subset of a population. Typically, the population is
very large, making a census or a complete enumeration of all the values in the
population impractical or impossible. Instead, researchers can select a
representative subset of the population (using a sampling frame) and collect
statistics on the sample; these may be used to make inferences or to
extrapolate to the population. This process is referred to as sampling.

Sampling
Process by which units are drawn from the sampling frame built from the
population of interest (universe). Various alternative sampling procedures can
be used. Probability sampling methods are the most rigorous because they
assign a well-defined probability for each unit to be drawn. Random sampling,
stratified random sampling, and cluster sampling are all probability sampling
methods. Non-probabilistic sampling (such as purposive or convenience
sampling) can introduce selection bias.

Variable
In statistical terminology, a variable is a symbol that stands for a value that
may vary.

Module 4 | Statistical Inference


Alternative Hypothesis
In impact evaluation, the alternative hypothesis is usually the hypothesis that
the null hypothesis is false; in other words, that the intervention has an impact
on outcomes.

Assumed Mean
In statistics the assumed mean is a method for calculating the arithmetic
mean and standard deviation of a data set. It simplifies calculating accurate
values by hand. Its interest today is chiefly historical, but it can still be
used to estimate these statistics quickly.

Confidence interval
A confidence interval is a type of interval estimate, computed from the
statistics of the observed data, that might contain the true value of an
unknown population parameter. The interval has an associated confidence
level, or coverage, that, loosely speaking, quantifies the confidence that the
true value of the parameter is captured by the interval.
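As a sketch of the idea, a 95 percent confidence interval for a sample mean can be computed with the normal approximation (critical value 1.96); the data values here are hypothetical:

```python
# Sketch: a 95% confidence interval for a sample mean, using the normal
# approximation. Data are made-up illustrative values.
data = [12, 15, 14, 10, 13, 14, 16, 12, 13, 15]
n = len(data)
mean = sum(data) / n
var = sum((x - mean) ** 2 for x in data) / (n - 1)  # sample variance
se = (var / n) ** 0.5                               # standard error of the mean
ci = (mean - 1.96 * se, mean + 1.96 * se)
print(mean, ci)
```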

Estimator
In statistics, an estimator is a statistic (a function of the observable sample
data) that is used to estimate an unknown population parameter; an estimate
is the result from the actual application of the function to a particular sample of
data.

Hypothesis
A hypothesis is a proposed explanation for an observable phenomenon. See
also, null hypothesis and alternative hypothesis.

Null hypothesis
A null hypothesis is a hypothesis that might be falsified on the basis of
observed data. The null hypothesis typically proposes a general or default
position. In impact evaluation, the default position is usually that there is no
difference between the treatment and control groups, or in other words, that
the intervention has no impact on outcomes.

Significance Level
The significance level is usually denoted by the Greek symbol, α (alpha).
Popular levels of significance are 5 percent (0.05), 1 percent (0.01), and 0.1
percent (0.001). If a test of significance gives a p value lower than the α level,
the null hypothesis is rejected. Such results are informally referred to as
“statistically significant.” The lower the significance level, the stronger the
evidence required. Choosing the level of significance is an arbitrary task, but
for many applications, a level of 5 percent is chosen for no better reason than
that it is conventional.

T-distribution
In probability and statistics, Student's t-distribution is any member of a family
of continuous probability distributions that arises when estimating the mean of
a normally distributed population in situations where the sample size is small
and population standard deviation is unknown. It was developed by William
Sealy Gosset under the pseudonym Student.

True Mean or Population Mean


The population mean (or true mean) is the average value of a characteristic
across an entire population. In practice it is rarely calculated directly,
because surveying an entire population is usually cost-prohibitive or too
time-consuming; instead, it is estimated from a sample.

Source: Statistics How To

Variance
In probability theory and statistics, variance is the expectation of the squared
deviation of a random variable from its mean. Informally, it measures how far
a set of numbers are spread out from their average value.
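The definition translates directly into code; the values below are an illustrative data set:

```python
# Sketch: population variance as the mean squared deviation from the mean.
def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

print(variance([2, 4, 4, 4, 5, 5, 7, 9]))  # -> 4.0
```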

Z-score
In statistics, the standard score is the signed fractional number of standard
deviations by which the value of an observation or data point is above the
mean value of what is being observed or measured. Observed values above
the mean have positive standard scores, while values below the mean have
negative standard scores.
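As a sketch, the standard score is a one-line calculation; the observation, mean, and standard deviation below are hypothetical:

```python
# Sketch: the standard score (z-score) of an observation, given the mean
# and standard deviation of what is being measured.
def z_score(x, mean, std_dev):
    return (x - mean) / std_dev

# An observation of 85 when the mean is 70 and the standard deviation is 10
# lies 1.5 standard deviations above the mean:
print(z_score(85, 70, 10))  # -> 1.5
```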

Module 5 | Regression Analysis


Baseline
Pre-intervention, ex-ante. The situation prior to an intervention, against which
progress can be assessed or comparisons made. Baseline data are collected
before a program or policy is implemented to assess the “before” state.

Ordinary Least Squares (OLS)


In statistics, ordinary least squares (OLS) is a type of linear least squares
method for estimating the unknown parameters in a linear regression model.
OLS chooses the parameters of a linear function of a set of explanatory
variables by the principle of least squares: minimizing the sum of the squares
of the differences between the observed dependent variable (values of the
variable being observed) in the given dataset and those predicted by the
linear function.
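For the one-explanatory-variable case, the least-squares principle yields closed-form parameters; the data below are made up and chosen so the fit is exact:

```python
# Sketch: OLS with a single explanatory variable, minimizing the sum of
# squared differences between observed and predicted y.
def ols(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope

print(ols([1, 2, 3, 4], [3, 5, 7, 9]))  # -> (1.0, 2.0), i.e. y = 1 + 2x
```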

Regression
In statistics, regression analysis includes any techniques for modeling and
analyzing several variables, when the focus is on the relationship between a
dependent variable and one or more independent variables. In impact
evaluation, regression analysis helps us understand how the typical value of
the outcome indicator Y (dependent variable) changes when the assignment
to treatment or comparison group P (independent variable) is varied, while the
characteristics of the beneficiaries (other independent variables) are held
fixed.

Module 6 | Imperfect Compliance and Attrition


Attrition
Attrition occurs when some units drop from the sample between one data
collection round and another, for example, because migrants are not tracked.
Attrition is a case of unit nonresponse. Attrition can create bias in impact
evaluations if it is correlated with treatment status.

Instrumental variable
An instrumental variable is a variable that helps identify the causal impact of a
program when participation in the program is partly determined by the
potential beneficiaries. A variable must have two characteristics to qualify as a
good instrumental variable: (1) it must be correlated with program
participation, and (2) it may not be correlated with outcomes Y (apart from
through program participation) or with unobserved variables.

Intention-to-Treat (ITT) Estimator


The ITT estimator is the straight difference in the outcome indicator Y for the
group to whom we offered treatment and the same indicator for the group to
whom we did not offer treatment.
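This "straight difference" can be sketched with hypothetical outcome values for the two groups:

```python
# Sketch: the ITT estimator as the difference in mean outcomes between the
# group offered treatment and the group not offered treatment.
# Outcome values are hypothetical.
offered = [12, 14, 15, 13, 16]
not_offered = [10, 11, 12, 11, 11]

itt = sum(offered) / len(offered) - sum(not_offered) / len(not_offered)
print(itt)  # -> 3.0
```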

Local Average Treatment Effect (LATE)


Also known as the complier average causal effect (CACE). It is the treatment
effect for the subset of the sample that takes the treatment if and only if they
were assigned to the treatment, otherwise known as the compliers. The
impact of the program estimated for a specific subset of the population, such
as units that comply with their assignment to the treatment or comparison
group in the presence of imperfect compliance, or around the eligibility cutoff
score when applying a regression discontinuity design. Thus, the LATE
provides only a local estimate of the program impact and should not be
generalized to the entire population.

Sources: Wikipedia, “Local average treatment effect”; Gertler, P.J., Martinez, S., Premand, P., Rawlings, L.B., and Vermeersch, C.M.J. (2016). Impact Evaluation in Practice, 2nd Ed. The World Bank.

Treatment-on-the-Treated (TOT)


Also known as the TOT estimator. The effect of treatment on the treated is the
impact of the treatment on those units that have actually benefited from the
treatment.

Module 7 | Power Calculations


Cost-Benefit Analysis
Ex-ante calculations of total expected costs and benefits, used to appraise or
assess project proposals. Cost-benefit can be calculated ex-post in impact
evaluations if the benefits can be quantified in monetary terms and the cost
information is available.

Encouragement Design
A form of randomized controlled trial in which the treatment group is given an
intervention (e.g. a financial incentive or information) to encourage them to
participate in the intervention being evaluated. The population in both
treatment and control have access to the intervention being evaluated, so the
design is suitable for national-level policies and programs.

Source: 3ie Impact Evaluation Glossary. International Initiative for Impact Evaluation: New Delhi, India.

Intra-Cluster Correlation
Intra-cluster correlation is correlation (or similarity) in outcomes or
characteristics between units that belong to the same cluster. For example,
children that attend the same school would typically be similar or correlated in
terms of their area of residence or socioeconomic background.

Minimum Detectable Effect (MDE)


An input for power calculations; that is, it provides the effect size that
an impact evaluation is designed to estimate for a given level
of significance and power. Evaluation samples need to be large enough to
detect a policy-relevant minimum detectable effect with sufficient power. The
minimum detectable effect is set by considering the change in outcomes that
would justify the investment in an intervention.

Source: Gertler, P.J., Martinez, S., Premand, P., Rawlings, L.B., and
Vermeersch, C.M.J. (2016). Impact Evaluation in Practice, 2nd Ed. The
World Bank.
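A common back-of-the-envelope formula (an assumption of this sketch, not stated in the glossary) approximates the MDE for a two-group comparison as (z_alpha + z_beta) times the standard error of the difference in means:

```python
import math

# Sketch: minimum detectable effect under the usual normal approximation,
# MDE = (z_alpha + z_beta) * SE. The z values below correspond to a 5%
# two-sided significance level and 80% power.
Z_ALPHA = 1.96
Z_BETA = 0.84

def mde(std_dev, n_per_group):
    se = std_dev * math.sqrt(2.0 / n_per_group)  # SE of a difference in means
    return (Z_ALPHA + Z_BETA) * se

# Hypothetical inputs: outcome standard deviation 10, 100 units per group.
print(round(mde(std_dev=10.0, n_per_group=100), 2))  # -> 3.96
```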

Power
The power is the probability of detecting an impact if one has occurred. The
power of a test is equal to 1 minus the probability of a type II error, ranging
from 0 to 1. Popular levels of power are 0.8 and 0.9. High levels of power are
more conservative and decrease the likelihood of a type II error. An impact
evaluation has high power if there is a low risk of not detecting real program
impacts, that is, of committing a type II error.

Stratified Sample
Obtained by dividing the population of interest (sampling frame) into groups
(for example, male and female), and then drawing a random sample within
each group. A stratified sample is a probabilistic sample: every unit in each
group (or stratum) has the same probability of being drawn.
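The procedure can be sketched with made-up strata and unit names; drawing the same fraction within each stratum keeps the selection probabilities equal:

```python
import random

# Sketch: stratified random sampling -- draw a fixed fraction within each
# stratum. Strata and unit names are hypothetical.
random.seed(1)
strata = {
    "female": ["f1", "f2", "f3", "f4", "f5", "f6"],
    "male": ["m1", "m2", "m3", "m4"],
}
sample = {
    group: random.sample(units, k=len(units) // 2)  # 50% within each stratum
    for group, units in strata.items()
}
print(sample)
```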

Type I Error
Error committed when rejecting a null hypothesis even though the null
hypothesis actually holds. In the context of an impact evaluation, a type I error
is made when an evaluation concludes that a program has had an impact
(that is, the null hypothesis of no impact is rejected), even though in reality the
program had no impact (that is, the null hypothesis holds). The significance
level determines the probability of committing a type I error.

Type II Error
Error committed when accepting (not rejecting) the null hypothesis even
though the null hypothesis does not hold. In the context of an impact
evaluation, a type II error is made when concluding that a program has no
impact (that is, the null hypothesis of no impact is not rejected) even though
the program did have an impact (that is, the null hypothesis does not hold).
The probability of committing a type II error is 1 minus the power level.

Module 8 | Difference in Differences


Before-and-After Comparison
Also known as “pre-post comparison” or “reflexive comparison,” a before-and-
after comparison attempts to establish the impact of a program by tracking
changes in outcomes for program beneficiaries over time, using
measurements before and after the program or policy is implemented.

Difference-in-Differences
Also known as “double difference” or “DD.” Difference-in-differences estimates
the counterfactual for the change in outcome for the treatment group by taking
the change in outcome for the comparison group. This method allows us to
take into account any differences between the treatment and comparison
groups that are constant over time. The two differences are thus before and
after, and between the treatment and comparison groups.
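The two differences can be written out in a few lines; the group means below are hypothetical numbers:

```python
# Sketch: the double-difference estimate from four group means
# (hypothetical values).
treat_before, treat_after = 50.0, 70.0
comp_before, comp_after = 48.0, 58.0

dd = (treat_after - treat_before) - (comp_after - comp_before)
print(dd)  # -> 10.0: change for treatment minus change for comparison
```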

Triple Difference
The comparative or differential impact on two groups, calculated as the
difference in the double difference impact estimate for each group compared
to a no-treatment comparison group. A significant triple difference estimate
demonstrates the presence of impact heterogeneity.

Source: 3ie Impact Evaluation Glossary. International Initiative for Impact Evaluation: New Delhi, India.

Module 9 | Regression Discontinuity
Fuzzy RD
Regression discontinuity design in which some subjects do not receive their
assigned treatment or control condition.

Regression Discontinuity Design (RDD)


Regression discontinuity design is a non-experimental evaluation method. It is
adequate for programs that use a continuous index to rank potential
beneficiaries and that have a threshold along the index that determines
whether potential beneficiaries receive the program or not. The cutoff
threshold for program eligibility provides a dividing point between the
treatment and comparison groups.

Source: Gertler, P.J., Martinez, S., Premand, P., Rawlings, L.B., and
Vermeersch, C.M.J. (2016). Impact Evaluation in Practice, 2nd Ed. The
World Bank.
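The assignment rule behind a (sharp) design can be sketched as a single comparison against the cutoff; the index and threshold below are hypothetical:

```python
# Sketch: eligibility in a sharp regression discontinuity design is
# determined entirely by a cutoff on a continuous index.
CUTOFF = 50.0  # hypothetical eligibility threshold

def assigned_to_treatment(index_score):
    # e.g. a poverty index where units scoring below the cutoff get the program
    return index_score < CUTOFF

print(assigned_to_treatment(42.0), assigned_to_treatment(61.0))  # -> True False
```

Units just below and just above the cutoff are assumed to be comparable, which is what lets the cutoff serve as the dividing point between treatment and comparison groups.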

Sharp RD
Regression discontinuity design in which all subjects receive their assigned
treatment or control condition.
