Biostatistics Sem V

Biostatistics notes for TYBSC

Uploaded by Rohit Sayyed
CHAPTER IV BIOSTATISTICS

Objective: To make the learner familiar with biostatistics as an important tool of analysis and its applications.

Desired outcome: The learner will be able to collect, organize and analyze data using parametric and non-parametric tests. They will also be able to set up a hypothesis and verify it using limits of significance.

4.1 Probability Distributions

Probability distributions are theoretical or expected frequency distributions obtained when the number of observations made is very large. They are useful tools for making inferences and decisions under conditions of uncertainty, based on limited data or theoretical considerations. Probability distributions are either discrete or continuous, depending on whether they are defined for discrete or continuous random variables. Probability distributions are of many types, but the most extensively used are the Binomial distribution, the Poisson distribution and the Normal distribution.

Binomial Distribution

The binomial distribution was discovered by the Swiss mathematician James Bernoulli (1654-1705). It is a discrete probability distribution obtained when the probability 'p' of the happening of an event is the same in all the trials and there are only two possible outcomes in each trial. For example, tossing a coin has two outcomes, head or tail; likewise, the result of taking a test can be a success or a failure.

The probability of x successes in n trials is given by

P(x) = nCx p^x q^(n−x) = [n! / (x!(n−x)!)] p^x q^(n−x), for x = 0, 1, 2, …, n

where n = number of trials; x = number of successes; p = probability of success; q = probability of failure (q = 1 − p).

Characteristics
1. It is a discrete probability distribution.
2. It has two parameters: p (or q, the probability of success or failure) and n (the number of trials). The parameter n is always a positive integer.
3.
In a binomial distribution, mean (μ) = np; variance (σ²) = npq; standard deviation (σ) = √(npq).
4. The binomial distribution is symmetrical if p = q = 0.5, positively skewed if p < 0.5, and negatively skewed if p > 0.5.
5. As n increases, the binomial distribution approaches the normal distribution.

Example
What is the probability of finding 3 females in a sample of 5 fishes drawn one by one, if the probability of finding a female is 0.5?

Solution
Probability of finding a female fish = 0.5 = p
Probability of not finding a female fish (i.e. finding a male fish) = 0.5 = q
Applying the binomial distribution, P(x) = nCx p^x q^(n−x), where n = 5, x = 3, p = q = 1/2:
P(3) = 5C3 (1/2)^3 (1/2)^2 = [5! / (3! 2!)] (1/2)^5 = 10 × 1/32 = 5/16 ≈ 0.3125

Poisson Distribution

The Poisson distribution was derived by the French mathematician Siméon Denis Poisson (1837). It is the probability distribution of a discrete random variable for rare events, whose probability of occurrence (p) is very small while the number of trials (n) is very large, such that np is constant.

Formula
A random variable X is said to have a Poisson distribution if

P(x) = e^(−m) m^x / x!

where P = probability of x occurrences; x = 0, 1, 2, 3, …; e = constant ≈ 2.7183; m = mean of the distribution (m = np).

Characteristics
1. It is a discrete probability distribution.
2. It is a limiting form of the binomial distribution and has a single parameter, the mean of the distribution.
3. Mean and variance are equal.
4. It is positively skewed. However, as m increases, it tends towards the normal distribution.
5. In the Poisson distribution, it is assumed that rare events occur randomly and independently.

Example
A science book with 585 pages contains 43 typographical errors. If these errors are randomly distributed throughout the book, what is the probability that 10 pages selected at random will be free from errors?
(Use e^(−0.735) = 0.4795)

Solution
n = 10; typographical errors = 43 out of a total of 585 pages
p = 43/585 = 0.0735
Mean m = np = 10 × 0.0735 = 0.735
Poisson distribution: P(x) = e^(−m) m^x / x!
Probability of zero errors:
P(0) = e^(−0.735) × (0.735)^0 / 0! = 0.4795 × 1 = 0.4795

Normal Distribution

The normal distribution was first discovered by the English mathematician De Moivre in 1733. Later it was rediscovered and developed by Gauss (1809) and Laplace (1812), and hence it is also called the Gaussian or Laplace distribution. The normal distribution is a continuous probability distribution which is bell-shaped, unimodal and symmetrical. When a normally distributed variable is represented graphically, it takes the shape of a symmetrical curve called the Normal Curve. This curve is asymptotic to the X-axis on either side. The normal curve is given by the equation

y = [1 / (σ√(2π))] e^(−(X−μ)² / (2σ²))

where μ = mean; π = constant ≈ 3.1416; σ = standard deviation; e = constant ≈ 2.7183.

Characteristics
1. The normal distribution is a continuous distribution and can assume any value from −∞ to +∞.
2. The normal distribution is identified by two parameters: the mean (μ) and the standard deviation (σ).
3. Mean ± 1σ includes 68.27% (about 2/3) of all the observations; Mean ± 2σ includes 95.45% of all the observations; Mean ± 3σ includes 99.73% of all the observations (fig. 4.1).
4. The normal curve is bell-shaped and symmetrical about the line X = μ.
5. Both tails extend to infinity (asymptotic).
6. It is unimodal, i.e. it has only one mode.
7. Mean = Median = Mode = μ.
8. The centre of the curve is defined by the mean and the spread of the curve by the standard deviation.
9. The coefficient of skewness is zero and the coefficient of kurtosis is 3.
10. The total area under the normal curve is 1.

Fig. 4.1: Areas between m ± 1σ, m ± 1.96σ and m ± 2.58σ

Applications
1. It is used to approximate the binomial and Poisson distributions.
2. It is used in sampling theory.
3. It is used in statistical quality control.
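The two worked examples above (binomial and Poisson) can be cross-checked numerically. Below is a minimal sketch using only the Python standard library; the input values (n = 5 fish with p = 0.5, and m = 10 × 43/585) come from the examples themselves.

```python
import math

def binomial_pmf(x, n, p):
    """P(x successes in n trials) = nCx * p^x * (1-p)^(n-x)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, m):
    """P(x events) = e^(-m) * m^x / x!."""
    return math.exp(-m) * m**x / math.factorial(x)

# Binomial example: probability of 3 females among 5 fish, p = 0.5
p_females = binomial_pmf(3, 5, 0.5)   # 10/32 = 0.3125

# Poisson example: m = np = 10 * (43/585) ~ 0.735; probability of 0 errors
m = 10 * 43 / 585
p_no_errors = poisson_pmf(0, m)       # ~ 0.4795
```

Both results agree with the hand calculations: 5/16 = 0.3125 and e^(−0.735) ≈ 0.4795.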
4. It is used in statistical hypothesis testing and tests of significance, in which it is assumed that samples are drawn from a normally distributed population.

Z transformation

A random variable Z which has a normal distribution with mean μ = 0 and standard deviation σ = 1 is said to have the standard normal distribution, and its curve is called the standard normal curve. Tables giving the areas under the standard normal curve are available for ready use. The total area bounded by the normal curve and the X-axis is 1. If we have to calculate the area under the curve between two points on the X-axis, say between X = a and X = b, we can use these tables after transforming the normal variable X to a standard normal variable Z using the formula

Z = (X − μ) / σ

where Z = standard normal variable; μ = mean; σ = standard deviation. This is known as the Z transformation.

Example
The weight of Rohu was found to be normally distributed with mean 500 g and standard deviation 50 g. Find the standard normal variate for a Rohu weighing 520 g.

Solution
Weight X = 520 g, μ = 500 g and σ = 50 g
Z = (520 − 500)/50 = 20/50 = 0.4

P-value

When we perform a hypothesis test in statistics, a p-value helps in determining the significance of our results. Hypothesis tests are used to test the validity of a claim made about a population. The claim on trial is called the null hypothesis. The alternative hypothesis is the one we accept if the null hypothesis is concluded to be untrue. The evidence in the trial is the data and the statistics that go along with it. All hypothesis tests use a p-value to weigh the strength of the evidence (what the data tell us about the population). The term significance level (alpha) refers to a pre-chosen probability, while the term 'P value' indicates a probability that you calculate after a given study. If your P value is less than the chosen significance level, then you reject the null hypothesis, i.e.
accept that your sample gives reasonable evidence to support the alternative hypothesis. The choice of the significance level at which you reject H0 is arbitrary. Conventionally, the 5% (less than a 1 in 20 chance of being wrong), 1% and 0.1% levels (P < 0.05, 0.01 and 0.001) have been used.

Probability

Probability may be defined as the relative frequency or probable chance of occurrence of an event on average. For example, the probability of giving birth to a female in the first pregnancy, or of getting a head when an unbiased coin is tossed, is 1/2 or 50%. Probability is denoted by the symbol 'p' and its value ranges from zero to one (0 ≤ p ≤ 1).

If the calculated value of the test statistic exceeds the table value, i.e. |Z| > Z_α (the table value), then we reject H0 and accept H1 at the given level of significance.

4.4 Parametric and Non-Parametric Tests

A parametric statistical test is one that makes assumptions about the parameters of the population distribution(s) from which the data are drawn, while a non-parametric test is one that makes no such assumptions. The Z test and t test are examples of parametric tests, which assume that the underlying source population(s) are normally distributed, whereas the chi-square test is a non-parametric test. Non-parametric tests are also called distribution-free tests.

Table 4.2: Differences between parametric and non-parametric tests

Basis for comparison | Parametric test | Non-parametric test
Definition | A statistical test in which specific assumptions are made about the population parameters. | A statistical test used in the case of non-metric independent variables, making no assumptions about the population parameters.
Distribution | Normal | Arbitrary (distribution-free)
Variables measured | Variables | Variables and attributes
Measure of correlation | Pearson | Spearman

Z test

The Z test is a large-sample test based on the normal distribution. When the Z test is applied to sampling variability, the difference observed between a sample estimate and that of the population is expressed in terms of the standard error (SE) instead of the standard deviation (SD).
Z is the ratio between the observed difference and its SE.

Conditions for applying the Z test
1. The data must be quantitative.
2. The variable follows the normal distribution.
3. Samples are randomly selected.
4. The sample size is more than 30.

Characteristics
1. If the difference in terms of SE (the Z score) falls within ±1.96 SE, the zone of acceptance (95% confidence limit), the null hypothesis H0 is accepted.
2. The distance from the mean beyond which the null hypothesis is rejected is determined by the level of significance. A Z value falling in the zone of rejection for H0 corresponds to a probability denoted by the letter P.
3. The greater the Z value, the smaller the P value.
4. P at the 5% level is written as 0.05 and at the 1% level as 0.01.

Applications of the Z test
1) To test the significance of the difference between a sample mean X̄ and the population mean μ:
Z = (Sample mean X̄ − Population mean μ) / SE of sample mean, i.e. Z = (X̄ − μ) / SE(X̄)
2) To test the significance of the difference between two sample means X̄1 and X̄2:
Z = (X̄1 − X̄2) / SE(X̄1 − X̄2), where SE(X̄1 − X̄2) = √(σ1²/n1 + σ2²/n2)

Working procedure
The null hypothesis is set up that there is no difference between the two means. To determine the significance of the Z value, the probability (P value) is found from Table 3, constructed on the basis of the normal distribution.

Table 3: Critical values of Z at different levels of significance

Level of significance:   1% (0.01)   5% (0.05)   10% (0.10)
Two-tailed test |Z_α|:   2.58        1.96        1.645

If the calculated value of Z, i.e. |Z|, is less than the critical value Z_α at the given level of significance α, the null hypothesis H0 is accepted. If |Z| is more than Z_α, H0 is rejected and the alternative hypothesis H1 is accepted.

Example
A random sample of 100 fishes of a certain species showed a mean length of 28 cm. Can this be considered a sample drawn from a population with mean 30 cm and standard deviation 10?
Solution
μ = 30, σ = 10, n = 100
Z = |X̄ − μ| / (σ/√n) = |28 − 30| / (10/√100) = 2/1 = 2
|Z_0.05| = 1.96 < |Z| = 2
So H0 is rejected. Hence the sample has not been drawn from the population with mean 30 and standard deviation 10.

One-tailed and two-tailed Z tests

If the null hypothesis H0: X̄ = μ is tested against H1: X̄ > μ, then the interest is in extreme values to one side of the mean. The critical region lies on one side of the distribution; this is called a one-tailed test. But if H0: X̄ = μ is tested against H1: X̄ ≠ μ, then the interest is in extreme values of Z on both tails of the distribution. The critical region is on both sides; this is called a two-tailed test.

t test

1. The t test of significance was framed by W. S. Gosset (1908), whose pen name was 'Student', as a test for samples of size n < 30; such samples follow the t distribution instead of the normal distribution.
2. The 't' statistic may be defined as the ratio of the observed difference between two means of small samples to the standard error of the difference between the means:

t = (X̄1 − X̄2) / SE(X̄1 − X̄2)

where X̄1 and X̄2 = means, SE = standard error, S = standard deviation and n = sample size.

Criteria for applying the t test
1. Sample size < 30.
2. Random samples are drawn from a normal population.
3. For testing the equality of two population means, the population variances are considered equal.

Characteristics of the t distribution
1. It is a sampling distribution derived from the parent normal distribution.
2. It is symmetrical about the mean but has a greater spread than the normal distribution.
3. It differs for different sample sizes n, i.e. degrees of freedom n − 1.
4. It resembles the normal distribution more closely as the sample size n (degrees of freedom n − 1) increases.
5. The values of 't' for different degrees of freedom at different levels of significance have been tabulated (Fisher and Yates, 1963).

Applications of the t test
1. To test the significance of a single mean when the population variance is not known.
2. To test the significance of the difference between two sample means when the population variances are equal and unknown.
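The one-sample Z test worked above can be reproduced in a few lines of Python, using `math.erf` to turn the Z score into a two-tailed P value (the input values are those of the fish-length example; 1.96 is the two-tailed 5% critical value from Table 3).

```python
import math

def z_statistic(xbar, mu, sigma, n):
    """Z = (xbar - mu) / (sigma / sqrt(n)) -- Z transformation of a sample mean."""
    return (xbar - mu) / (sigma / math.sqrt(n))

def two_tailed_p(z):
    """P value for a two-tailed Z test, via the standard normal CDF."""
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)

# Example: n = 100 fishes, sample mean 28 cm, population mean 30 cm, sigma = 10
z = z_statistic(28, 30, 10, 100)      # -2.0, so |Z| = 2 > 1.96
p = two_tailed_p(z)                   # ~ 0.0455, less than 0.05: reject H0
```

Note that the P value route (p < 0.05) and the critical-value route (|Z| > 1.96) are the same decision rule expressed two ways.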
3. To test the significance of an observed sample correlation coefficient, or of the difference between the means of two dependent samples (paired observations).

Types of t test

1. Unpaired t test
It is applied to unpaired data, i.e. observations made on individuals of two different groups, where the samples are drawn from two different populations. The t value is calculated as the ratio between the observed difference between the means of the two samples and the standard error of the difference between the means:

t = (X̄1 − X̄2) / SE(X̄1 − X̄2), where SE(X̄1 − X̄2) = SD_p √(1/n1 + 1/n2)

with SD_p the pooled standard deviation and n1 + n2 − 2 degrees of freedom.

The null hypothesis is set up assuming there is no difference between the means. The calculated value is then compared with the table t value at the specified level of significance (usually 0.05) with n1 + n2 − 2 degrees of freedom. If the calculated value |t| is less than the table value, the null hypothesis is accepted; otherwise it is rejected.

Some examples of the unpaired t test, to determine whether there is a significant difference in:
1. Birth weight of babies born in a municipal hospital and a private hospital.
2. Weight gain in a group of children given a normal diet (control group) and a group of children given a normal diet plus vitamin A and D tablets (experimental group).

2. Paired t test
It is applied to paired data of observations from one sample only, where each individual gives a pair of observations.

Some examples of the paired t test, to determine whether there is a significant difference in:
1. Weight loss of a group of people before and after a special diet.
2. IQ level of a group of students before and after coaching.
3. Effect of two drugs given to the same group of individuals on two different occasions.

The t value is calculated using the formula

t = (D̄ − 0) / (SD/√n)

where D̄ = mean of the differences, SD = standard deviation of the differences and n = number of pairs.

The null hypothesis states that there is no difference between the paired observations. The table value of t at a specified level of significance and degrees of freedom n − 1 is compared with the calculated value |t|.
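The paired t statistic above can be sketched directly from its definition. The before/after weights below are hypothetical values invented purely for illustration, not data from the text.

```python
import math

def paired_t(before, after):
    """t = Dbar / (SD / sqrt(n)), where D = after - before; df = n - 1."""
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    dbar = sum(d) / n
    # sample standard deviation of the differences (n - 1 denominator)
    sd = math.sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))
    return dbar / (sd / math.sqrt(n)), n - 1

# hypothetical weights (kg) of 5 people before and after a special diet
before = [82, 90, 76, 88, 95]
after  = [78, 85, 74, 84, 91]
t, df = paired_t(before, after)
# compare |t| with the table value at df = 4 and the chosen significance level
```

Here each person serves as their own control, which is why the statistic is built from the individual differences rather than the two group means.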
If the calculated value is less than the table value, the null hypothesis is accepted; otherwise it is rejected.

Chi-Square Test

Chi-square is a test of significance which is not based on any assumption about the distribution of the variable, and is therefore a non-parametric test. It is mostly used when the data are in frequencies, such as the number of responses in two or more categories. The chi-square statistic follows a specific distribution called the chi-square distribution.

Chi-square (χ²) distribution
If X1, X2, …, Xn are n independent standard normal variates, then the sum of their squares, X1² + X2² + … + Xn², follows the χ² distribution with n degrees of freedom, which is also its mean. For n > 30, the χ² distribution approximates the normal distribution. The values of χ² have been tabulated for different degrees of freedom at different levels of probability (Fisher and Yates, 1963). The chi-square test is commonly used for the analysis of qualitative data or attributes.

Chi-square test formula:

χ² = Σ (O − E)² / E

where O = observed frequency; E = expected frequency.

Conditions for using the chi-square test
1. Observations are random and independent.
2. The total number of observations should be large (n > 50).
3. The χ² test applied to a four-fold (2 × 2) table will not give a reliable result with one degree of freedom if any expected frequency is less than 5.
4. In such cases Yates' correction should be applied:

χ² = Σ (|O − E| − 0.5)² / E

where 0.5 is Yates' correction.

Applications of the chi-square test

1. Chi-square test of goodness of fit
The chi-square test can be applied to determine whether the observed frequencies are in good agreement with the expected or theoretical frequencies, such as a 1:1 sex ratio, a 3:1 Mendelian monohybrid ratio or a 9:3:3:1 Mendelian dihybrid ratio. It is assumed in the null hypothesis that the observed ratio fits the theoretical ratio. The chi-square value is calculated by the formula χ² = Σ (O − E)²/E. Yates' correction is applied if the expected frequency in any observation is less than 5 with one degree of freedom.
The calculated chi-square value is compared with its table value at the given level of significance (usually 5%) and degrees of freedom n − 1. If the calculated value is less than the table value, the null hypothesis is accepted.

Example
In a sample of 300 fish, there were 130 males and 170 females. Does this data fit the expected ratio of 1:1?

Solution
H0: the observed data fits the ratio 1:1.
According to the expected ratio, 150 fish are expected in each of the male and female categories.

χ² = (130 − 150)²/150 + (170 − 150)²/150 = 400/150 + 400/150 = 5.33

df = 2 − 1 = 1

Table value of chi-square at df = 1 and the 5% level of significance equals 3.84. Since the calculated value is more than the table value, the null hypothesis is rejected.

Conclusion: the data does not fit the expected ratio.

2. Test of independence of attributes (contingency chi-square)
The chi-square test is applied to test the association between two events in binomial or multinomial samples. It measures the probability of association between two discrete attributes, such as smoking and lung cancer, vaccination and immunity, or blood pressure and heart disease. There are two possibilities: either the two events are independent of each other, or they are dependent (associated). The sample data are presented in the form of a contingency table. For two attributes, the 2 × 2 contingency table is represented as in Table 4.4.

Table 4.4: Contingency table 2 × 2

The null hypothesis H0 is set up that no association exists between the attributes, i.e. they are independent. The alternative hypothesis is that the attributes are associated with, or dependent on, each other.
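The goodness-of-fit example above can be confirmed numerically; a minimal sketch with the observed counts 130 and 170 against an expected 1:1 split:

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [130, 170]          # males, females
expected = [150, 150]          # 1:1 ratio in a sample of 300
chi2 = chi_square(observed, expected)   # (-20)^2/150 + 20^2/150 = 5.33
df = len(observed) - 1                  # 2 - 1 = 1
# table value at df = 1, 5% level is 3.84, so H0 (1:1 fit) is rejected
```

Since 5.33 > 3.84, the code reaches the same conclusion as the hand calculation.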
The expected frequency is Eij = (Ri × Cj) / N, where Ri = sum total of the row in which Eij is lying; Cj = sum total of the column in which Eij is lying; N = total sample size.

Chi-square is calculated by the formula

χ² = Σ (O − E)²/E, or with Yates' correction χ² = Σ (|O − E| − 0.5)²/E

where O = observed frequency; E = expected frequency; 0.5 is Yates' correction.

Degrees of freedom df = (R − 1)(C − 1), where R = number of rows and C = number of columns.

The table value of χ² for the given level of significance and degrees of freedom df is compared with the calculated value of χ². If the calculated value of χ² is less than its table value, the null hypothesis is accepted. If the calculated value of χ² is more than its table value, the null hypothesis is rejected and the alternative hypothesis is accepted.

In the case of a 2 × 2 contingency table with cell frequencies a, b, c, d, chi-square can also be calculated by the following simple formulae:

Without Yates' correction: χ² = N(ad − bc)² / [(a+b)(c+d)(a+c)(b+d)]

With Yates' correction: χ² = N(|ad − bc| − N/2)² / [(a+b)(c+d)(a+c)(b+d)]

4.5 Correlation

Correlation is the statistical tool for measuring the degree of relationship between two variables. It quantifies both the strength and the direction of the linear relationship between the two variables. The coefficient of correlation measures the extent or degree of relationship between the two variables: a single number that expresses to what extent two things are related, and to what extent variations in one go with variations in the other. It is denoted by 'r'.

Properties of the coefficient of correlation
1. It has no units.
2. Its value lies between −1 and +1, i.e. −1 ≤ r ≤ +1.
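The coefficient of correlation r can be computed directly from its definition. A sketch with hypothetical paired data (the fish lengths and weights below are invented for illustration, not from the text):

```python
import math

def pearson_r(x, y):
    """Pearson coefficient of correlation:
    r = sum((x - mx)(y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# hypothetical fish lengths (cm) and weights (g)
lengths = [20, 22, 25, 28, 30]
weights = [180, 210, 260, 320, 360]
r = pearson_r(lengths, weights)
# r is unit-free and always lies between -1 and +1; here it is close to +1,
# a strong positive linear relationship
```

Because r is built from deviations divided by deviations, the centimetre and gram units cancel, which is exactly why the coefficient has no units.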
