HYPOTHESIS TESTING
A hypothesis
• is a conjecture about a population parameter. This conjecture may or may not be
true.
• is an educated guess based on theory and background information
Hypothesis Testing is a process of using sample data and statistical procedures to
decide whether to reject or not reject a hypothesis (statement) about a population
parameter value.
Examples
a. Whether seat belts will reduce the severity of injuries caused by accidents
b. Whether the public prefers a certain colour in fabric lining
c. Whether adding a chemical will improve water quality
d. Whether the average life expectancy for men in the next decade will be more than 100 years
Two types of statistical hypotheses
i) The Null Hypothesis: symbolised by Ho, states that there is no difference
between a parameter and a specific value, OR that there is no difference
between two parameters. NULL means NO CHANGE; it is a statement of equality.
ii) The Alternative Hypothesis: symbolised by Ha (or H1), states that there is a specific
difference between a parameter and a specific value, OR that there is a difference
between two parameters. Also called the TEST or research hypothesis.
Situation A: A researcher is interested in finding out whether a new medicine will
have any undesirable side effects on the pulse rate of patients. Will the pulse
rate increase, decrease or remain unchanged? Since the researcher knows the pulse
rate of the population under study is 82 beats per minute, the hypotheses will be
Ho : µ = 82 (remains unchanged)
Ha : µ ≠ 82 (will be different)
This is a two-tailed test since the possible effect could be to raise or to lower the
pulse rate.
Situation B: A chemist invents an additive to increase the life of an automobile
battery. The mean lifetime of an ordinary battery is 36 months. The hypotheses will
be:
Ho : µ ≤ 36
Ha : µ > 36
The chemist is interested only in increasing the lifespan of the battery, so the
alternative hypothesis is that the mean is larger than 36 months. The test is
therefore called right-tailed: only an increase is of interest.
Situation C: A contractor wishes to lower heating bills by using a special type of
insulation in houses. If the average monthly bill is RM100, the hypotheses will be
Ho : µ ≥ RM 100
Ha : µ < RM 100
This is a left-tailed test since the contractor is interested only in reducing the bill.
• General procedure for testing a hypothesis (this can be done statistically):
i. Step 1: State the hypotheses
ii. Step 2: Find the critical value for a selected level of significance (α), e.g. 0.1,
0.05 or 0.01, taking into account whether the test is one-tailed or two-tailed
iii. Step 3: Compute the test value using a z-test or t-test
iv. Step 4: Make the decision to reject or not reject Ho: if the test value falls
beyond the critical value (in the rejection region), reject Ho; otherwise do not
reject Ho (refer to the z or t table for the critical value). A worked sketch of
these steps follows below.
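A minimal Python sketch of the four steps, using Situation B (a right-tailed z-test on battery lifetime) as the setting. The sample data, the assumed known population standard deviation and α = 0.05 are hypothetical illustration values, not taken from these notes.

```python
# Sketch of the four-step procedure for Situation B (right-tailed z-test).
# Sample values, sigma and alpha are hypothetical illustration values.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sample = rng.normal(37.0, 1.2, size=40)   # hypothetical battery lifetimes (months)

mu0, sigma, alpha = 36.0, 1.2, 0.05       # Step 1: Ho: mu <= 36, Ha: mu > 36

critical = norm.ppf(1 - alpha)            # Step 2: right-tailed critical value
z = (sample.mean() - mu0) / (sigma / np.sqrt(len(sample)))   # Step 3: test value
p_value = 1 - norm.cdf(z)                 # chance of a result this extreme under Ho

# Step 4: decision
if z > critical:
    print(f"z = {z:.2f} > {critical:.2f} (p = {p_value:.4f}): reject Ho")
else:
    print(f"z = {z:.2f} <= {critical:.2f} (p = {p_value:.4f}): do not reject Ho")
```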
What is a Significant Difference?
A significant difference occurs if the difference between the hypothesized (null)
value and the sample statistic value is too large to be attributed to chance. A
significant difference strongly suggests that the null hypothesis is not true.
A significant difference at p < 0.05 means that, if the null hypothesis were true, a
difference at least this large between the sample statistic and the hypothesised
value would occur by chance less than 5% of the time.
TESTING THE DIFFERENCE BETWEEN MEANS AND VARIANCES
Situations:
i. To compare the average lifetime of two different brands of tires
ii. To compare two different brands of fertilizer, to see whether one is better than
the other for growing plants
iii. To compare two brands of cough syrup, to test whether one brand is more effective
than the other
Commonly used Methods
1. z-test
• For detecting a difference between two means when both samples are large
• Assumptions required
i. The samples must be independent; that is, there is no relationship between the
subjects in the two samples
ii. The samples must come from normally distributed populations
2. F-test
• For the comparison of two variances or standard deviations, e.g. the variation in
cholesterol levels in men and women
• Assumptions
i. The populations from which the samples were obtained must be normally
distributed
ii. The samples must be independent of each other
3. t-test
• To test the difference between two means for small independent samples (n < 30);
a short sketch of all three tests follows this list
• Assumptions
i. The samples must be independent
ii. The populations are normally distributed
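To see the three methods side by side, here is a minimal Python sketch using scipy. The two brand samples are hypothetical illustration data, not real measurements.

```python
# Sketch comparing two independent samples with the z-, F- and t-tests.
# Brand A / Brand B data are hypothetical illustration values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
brand_a = rng.normal(50.0, 5.0, size=40)   # e.g. tire lifetimes, brand A
brand_b = rng.normal(48.0, 5.5, size=40)   # e.g. tire lifetimes, brand B

# z-test for the difference between two means (large samples)
z = (brand_a.mean() - brand_b.mean()) / np.sqrt(
    brand_a.var(ddof=1) / len(brand_a) + brand_b.var(ddof=1) / len(brand_b))

# F-test for two variances: larger sample variance goes in the numerator
var_a, var_b = brand_a.var(ddof=1), brand_b.var(ddof=1)
f_stat = max(var_a, var_b) / min(var_a, var_b)

# t-test for the difference between two means (used for small samples)
t_stat, p_val = stats.ttest_ind(brand_a, brand_b)

print(f"z = {z:.2f}, F = {f_stat:.2f}, t = {t_stat:.2f} (p = {p_val:.3f})")
```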
CORRELATION AND REGRESSION
Correlation is a statistical method used to determine whether a relationship
between variables exists. Regression describes the nature of the relationship
between the variables.
Line of best fit
Best fit means that the sum of the squares of the vertical distances from each point
to the line is a minimum.
Regression equation
y' = a + bx, where a = intercept and b = slope
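As an illustration, the intercept a and slope b can be computed with the usual least-squares formulas; the x and y data below are hypothetical.

```python
# Sketch of fitting the line of best fit y' = a + bx by least squares.
# The x and y values are hypothetical illustration data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
n = len(x)

# Standard least-squares formulas for the slope b and intercept a
b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
a = y.mean() - b * x.mean()

print(f"y' = {a:.3f} + {b:.3f}x")
```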
Assumptions in Regression
i. For any value of the independent variable x, the values of the dependent variable
y must be normally distributed about the regression line
ii. The standard deviation of the dependent variable y must be the same for each
value of the independent variable
The Coefficient of Determination is the ratio of the explained variation to the total
variation and is denoted by r²:
r² = explained variation / total variation
For example, r² = 0.845 means that 84.5% of the total variation is explained by the
regression line using the independent variable.
Definition: the coefficient of determination is a measure of the variation of the
dependent variable that is explained by the regression line and the independent
variable.
Standard Error of Estimate
• Defined as the standard deviation of the observed y values about the predicted
y' values
• y' is predicted for a specific x value, thus it is a point prediction
• A prediction interval about a y value can be constructed using a statistic called
the standard error of estimate (Sest)
• Formula
Sest = √[ Σ(y − y')² / (n − 2) ]
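A small Python sketch, again on hypothetical x and y data, showing how r² and Sest follow from the unexplained and total variation.

```python
# Sketch computing r^2 and the standard error of estimate S_est for a fitted
# line y' = a + bx; the data are hypothetical illustration values.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])
b, a = np.polyfit(x, y, 1)          # slope and intercept of the least-squares line
y_pred = a + b * x

ss_res = np.sum((y - y_pred) ** 2)       # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)     # total variation
r2 = 1 - ss_res / ss_tot                 # explained variation / total variation
s_est = np.sqrt(ss_res / (len(y) - 2))   # standard error of estimate

print(f"r^2 = {r2:.3f}, S_est = {s_est:.3f}")
```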
Testing the Regression Model
• Test of the slope, β
• x can be used to predict y only if β ≠ 0
• Hypotheses: Ho: β = 0 versus Ha: β ≠ 0
• Use a t-test
• Conclude that β ≠ 0 (reject Ho) if the calculated t is greater than the t from the
table at the chosen significance level (α); a short sketch of this test follows this list
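A minimal sketch of the slope test on the same kind of hypothetical data; it computes the standard error of the slope from Sest and compares the calculated t with the table value at α = 0.05.

```python
# Sketch of the t-test for the slope: Ho: beta = 0 vs Ha: beta != 0.
# The data are hypothetical illustration values.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

b, a = np.polyfit(x, y, 1)
y_pred = a + b * x
n = len(x)

s_est = np.sqrt(np.sum((y - y_pred) ** 2) / (n - 2))   # standard error of estimate
se_b = s_est / np.sqrt(np.sum((x - x.mean()) ** 2))    # standard error of the slope
t_calc = b / se_b                                      # calculated t (test value)
t_table = stats.t.ppf(1 - 0.05 / 2, df=n - 2)          # two-tailed table value

print("reject Ho (beta != 0)" if abs(t_calc) > t_table else "do not reject Ho")
```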
Multiple Regression
Several independent variables and one dependent variable
y' = a + b1x1 + b2x2 + … + bkxk
Assumptions for multiple regression
• For any specific value of the independent variables, the values of the y variable
are normally distributed (normality assumption)
• The variances (or standard deviations) of the y variable are the same for each
value of the independent variables (equal-variance assumption)
• There is a linear relationship between the dependent variable and the
independent variables (linearity assumption)
• The independent variables are not correlated with one another
• The values of the y variable are independent
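A short Python sketch of a multiple regression with two independent variables, fitted by least squares; the data are hypothetical illustration values.

```python
# Sketch of a multiple regression y' = a + b1*x1 + b2*x2 fitted by least squares.
# The data are hypothetical illustration values.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([3.1, 3.8, 6.5, 6.9, 9.8, 10.1])

# Design matrix with a column of ones for the intercept a
X = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef

print(f"y' = {a:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")
```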
NON-PARAMETRIC TEST
• The z-, F- and t-tests are parametric tests, used when the data are normally distributed
• When the data are not normally distributed, a non-parametric test is more
appropriate
• Also called distribution-free statistics
Advantages of Non-Parametric Tests
i. Can be used when the variable is not normally distributed
ii. Can be used when the sample size is small
iii. Can be used to test hypotheses
iv. The computations are easier
v. Easier to understand
Disadvantages
i. Less sensitive
ii. Less information
iii. Less efficient
Non-parametric tests typically involve ranking the data,
e.g. the Sign Test
• Each value is assigned + when it is greater than the hypothesised median and – when
it is lower
• The numbers of + and – signs are compared
• If the null hypothesis is true, the number of + signs should be about equal to the
number of – signs, i.e. their ratio is close to 1. Reject the null hypothesis when the
test value falls beyond the critical value (from the table); a short sketch follows below.
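A minimal sketch of a sign test against a hypothesised median, using the binomial distribution for the + and – counts; the data and the median value are illustrative only.

```python
# Sketch of a sign test against a hypothesised median.
# Data and hypothesised median are hypothetical illustration values.
from scipy.stats import binomtest

data = [6.2, 7.1, 5.8, 6.9, 7.4, 6.5, 5.9, 7.0, 6.8, 7.2, 6.1, 7.3]
median0 = 6.0                                  # hypothesised median under Ho

plus = sum(1 for v in data if v > median0)
minus = sum(1 for v in data if v < median0)    # ties (v == median0) are dropped

# Under Ho the + and - signs should be about equally likely (p = 0.5)
result = binomtest(plus, n=plus + minus, p=0.5, alternative='two-sided')
print(f"+ signs: {plus}, - signs: {minus}, p-value = {result.pvalue:.3f}")
```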
USING MODELS
Be clear about the data requirements and the needs of the study.
Modelling consists of four main steps:
i. Model formulation
ii. Model optimization
iii. Model calibration/verification
iv. Model Application
Model Formulation
• Involves empirical and theoretical evidence
• Make assumptions to reduce the problem to a manageable form (simplification of
the process)
Model Optimization
• Regression analysis – an analytical approach
• Subjective optimization – based on the experience of the modellers
Model Calibration
• Adjusting the model coefficients
• To reduce the error between observed and predicted values; a small sketch follows below
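As a small illustration of calibration, the sketch below adjusts the coefficients of a hypothetical linear model to reduce the squared error between observed and predicted values, using scipy's curve_fit. The model form and the data are assumptions for illustration only.

```python
# Sketch of model calibration: adjust coefficients k and c to reduce the error
# between observed and predicted values. Model form and data are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def model(x, k, c):
    # simple hypothetical model: predicted value = k * x + c
    return k * x + c

x_obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_obs = np.array([2.2, 3.9, 6.1, 8.0, 9.8])      # observed values

# curve_fit changes the coefficients to minimise the squared error
(k, c), _ = curve_fit(model, x_obs, y_obs)
print(f"calibrated coefficients: k = {k:.2f}, c = {c:.2f}")
```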
Model Application
• After the model has been calibrated and validated
USING STANDARDS
• Straightforward: strictly follow the procedure
• To check conformity
• Results are compared with specific standards