Central Tendency in R

The document discusses descriptive statistics, focusing on measures of central tendency including the median, mode, and mean, along with their properties and applications. It highlights the importance of understanding the distribution characteristics such as central tendency, variation, skewness, and kurtosis, and provides examples and R commands for calculating these measures. Additionally, it covers geometric and harmonic means, particularly in the context of ungrouped and grouped frequency distributions.

Uploaded by

abysly.mystic
Descriptive Statistics

[...] and graphic presentation of the data. However, we often need descriptive measures in the form of single numbers that focus attention more sharply on various properties of the data set being investigated. We are mainly interested in the following four characteristics, which are often sufficient to characterize the frequency distribution of univariate data:

1. The location of the center of the distribution, or the measure of central tendency.
2. The degree of variation of individual values around the central point, that is, the tendency of the individual values to deviate from the measure of central tendency.
3. The degree of skewness, that is, lack of symmetry of the frequency distribution.
4. The degree of peakedness, or kurtosis.

Measures of central tendency: Measures of central tendency are called averages. The most frequently encountered averages are the median, the mode and the arithmetic mean. Two other averages, the geometric mean and the harmonic mean, are useful in some special situations. We introduce these averages for raw data.

Median: The median of a set of measurements is defined as a middle measurement after the measurements are arranged in order of magnitude. If the number of measurements in the set is odd, the median is the middle observation. If the number of measurements is even, there are two middle observations. In this case, any value between these two measurements is a median. However, conventionally the arithmetic mean of the two middle values is taken as the median. The median is a particularly useful measure of central tendency if the distribution is not symmetric. It can be computed exactly even if there are open-end classes.

Mode: The mode of a series of observations is defined as a measurement with maximum frequency. Strictly speaking, any value is called a mode if it appears more often than both of the adjacent values.
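The two definitions above can be tried out on small made-up data sets. The following is only a minimal sketch: the helper all_modes and the data vectors are hypothetical names introduced for illustration, and the helper assumes numeric data.

```r
# Hypothetical helper (not part of base R): every value that attains
# the maximum frequency, so ties are reported rather than hidden.
all_modes <- function(x) {
  counts <- table(x)                            # frequency of each distinct value
  top <- names(counts)[counts == max(counts)]   # labels with maximum frequency
  as.numeric(top)                               # assumes the data are numeric
}

odd_set  <- c(7, 3, 9, 5, 1)   # n = 5: the median is the middle ordered value
even_set <- c(7, 3, 9, 5)      # n = 4: the median is the mean of the two middle values

median(odd_set)                  # 5
median(even_set)                 # 6, the mean of 5 and 7
all_modes(c(2, 3, 3, 4, 4, 5))   # 3 and 4 share the maximum frequency
```

Returning all tied values, rather than only the first, makes it visible when a data set has more than one mode.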
Although the mode is a simple and useful concept, its application presents several difficulties. First, two or more values may repeat an equal number of times, with this frequency larger than all other frequencies (the distribution is then multimodal). Second, we may not find any value that appears more than once, as in the case of the populations of large cities in Asia. Third, the mode is a very unstable value: it can change radically with the method of rounding the data. Finally, the mode could be an extreme value, and then it can hardly be considered a measure of central tendency.

Mean: The arithmetic mean (henceforth we shall use only the term mean) is the sum of the individual values in a series divided by the number of observations in the series. Symbolically, for observations $x_1, x_2, \ldots, x_n$,

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Properties of mean: The mean has a number of interesting properties. It can be thought of as a balance point: if we arrange the data on a horizontal scale, the mean is the point at which the numbers on the left "balance" the numbers on the right. (It is the center of gravity and hence is also called the first moment.) It is a typical value in the sense that it may be substituted for the value of each item in the series without changing the total. Following are mathematical properties of the mean:

(i) The sum of deviations from the mean is zero.
(ii) The sum of squared deviations of all items from the mean is less than the sum of squared deviations from any other value. Because of this property the mean usually serves as a basis for measures of dispersion (these will be discussed subsequently).
(iii) Suppose the sample means of two sets of data having sample sizes $n_1$ and $n_2$ are denoted by $\bar{Y}_1$ and $\bar{Y}_2$, respectively. The overall sample mean for the combined set of $(n_1 + n_2)$ observations is given by

$$\bar{Y} = \frac{n_1\bar{Y}_1 + n_2\bar{Y}_2}{n_1 + n_2}$$

Although the mean is commonly used, it has the following disadvantages:

(i) Mean is useful only for quantitative data.
It is not sensible to compute the mean for observations on a nominal scale, even though the different levels or categories of the qualitative variable are coded as numbers for convenience.
(ii) Mean can be highly influenced by an observation that falls far from the rest of the data, called an outlier. For example, consider the following data on monthly income (in Rs.) of employees in a small business: 10200, 10400, 10700, 11200, 11300, 11500, and 200000. The income 200000 is the salary of the owner's son, who happens to be highly educated. The mean computed for the other six employees is 10883, which is quite different from the mean 37900 that includes the outlier. This example shows that the mean is not always representative of the data.

Comparison of mean with median: The median has certain advantages as compared to the mean. For example, the median is usually more appropriate when the distribution is highly skewed. The mean is greatly affected by outliers, whereas the median is not. The mean requires quantitative data, whereas the median can be found for qualitative data that can be ordered as well as for quantitative data.

R commands for obtaining mean and median (as given in Section 1.7) are:

    mean(x)
    median(x)

where x is a data object. However, it may be noted that mode(x) will NOT give the modal value. It simply displays the data type of the data object x. How to get the mode is illustrated with the help of examples.

Example 2.4.1: Twenty students, graduates and undergraduates, were enrolled in a statistics course. Their ages were:

18 19 19 19 19 20 20 20 20 20 21 21 21 21 22 23 24 27 30 36

a) Find the median age of all students.
b) Find the median age of all students under 25 years.
c) Find the modal age of all students.
d) Find the mean age of all students.
e) Two more students enter the class. The age of both students is 19. What is the mean? What is the median? And what is the mode?

Solution:

    x <- scan()  # Enter data using scan function.
    1: 18 19 19 19 19 20 20 20 20 20 21 21 21 21 22 23 24 27 30 36
    21:
    Read 20 items
    me <- median(x)   # Find median of data set x.
    y <- x[x < 25]    # Vector of students with ages less than 25.
    me1 <- median(y)  # Find median of data set y.
    me1               # Print median of y.
    [1] 20
    xt <- table(x)                # Prepare frequency table of x.
    mode <- which(xt == max(xt))  # Obtain mode of x.
    mode                          # Print modal value.
    20
     3

Observe the output. The mode is 20, which is the third distinct value in the ordered series.

    z <- c(x, 19, 19)     # Obtain augmented data set after the admission of two new students.
    nmean <- mean(z)      # Compute new mean.
    nmean                 # Print new mean.
    [1] 21.72727
    nmedian <- median(z)  # Obtain new median.
    nmedian               # Print new median.
    [1] 20
    zt <- table(z)                 # Prepare frequency table of z.
    nmode <- which(zt == max(zt))  # Compute new modal value.
    nmode                          # Print new modal value.
    19
     2

Measures of central tendency for frequency table: The distinct values of a variable ($x_i$) along with the corresponding frequencies ($f_i$) form the ungrouped frequency distribution. The definitions of median, mode and mean are the same as those for raw data. However, the R commands for the computation need to be modified to suit the different nature of the data. In this case, the formula for mean is

$$\bar{x} = \frac{\sum f_i x_i}{n}$$

where n is the total number of observations.

Example 2.4.2: A survey of 25 faculty members is taken in a college to study their vocational mobility. They were asked the question "In addition to your present position, at how many educational institutes have you served on the faculty?" Following is the frequency distribution of their responses.

Table 2.4.1 Frequency Table of Vocational Mobility

    x    0    1    2    3
    f    8   11    5    1

Compute mean, mode and median of the distribution.

Solution: From the ungrouped frequency distribution we can get the original (ordered) data back by using the R function rep:

    y <- rep(x, f)

where x and f values are stored in objects x and f respectively.
    mean <- sum(y)/length(y)
    mean
    [1] 0.96
    median <- median(y)
    median
    [1] 1

From the given table, note that mode is 1 as its frequency is maximum.

Measures of central tendency for grouped frequency distribution: Following are the formulae for median, mode and mean.

Median: For a grouped frequency distribution we first determine the median class. Suppose $c_i$ denotes the "less than" cumulative frequency of the i-th class. The class interval with serial number m1 is the median class if $c_{m1-1} < n/2$ and $c_{m1} \ge n/2$, where n denotes the total frequency; it is the class to which the median belongs. Then the following formula, based on linear interpolation, gives the median for a grouped frequency distribution:

$$\text{Median} = l + \left[\frac{n/2 - c}{f}\right] h$$

where l is the lower limit of the median class, f is the frequency of the median class, c is the cumulative frequency of the pre-median class and h is the class width.

[...]

    [...] >= n/2))         # The serial number of the median class.
    m1
    [1] 5
    h <- 5
    l <- midx[m1] - h/2    # Lower limit of the median class.
    f <- frequency[m1]     # Frequency of the median class.
    c <- c[m1-1]           # Cumulative frequency of the pre-median class.
    f
    c

The outputs are:

    [1] 64
    [1] 96

    median <- l + (((n/2) - c)/f) * h
    median
    [1] 165.3125
    m <- which(frequency == max(frequency))  # Serial number of the modal class.
    fm <- frequency[m]     # Frequency of the modal class.
    fm
    [1] 64
    f1 <- frequency[m-1]   # Frequency of the pre-modal class.
    f2 <- frequency[m+1]   # Frequency of the post-modal class.
    f1                     # The output is:
    [1] 58
    f2                     # The output is:
    [1] 30
    l <- midx[m] - h/2     # Lower limit of the modal class.
    l
    [1] 165
    mode <- l + ((fm - f1)/(2*fm - f1 - f2)) * h
    mode
    [1] 165.75
    mean <- sum(frequency * midx)/n
    mean
    [1] 165.175

Geometric mean and harmonic mean: In the following we discuss these measures for three types of data.

(i) Series of observations: The logarithm (log) of the geometric mean is the arithmetic mean of the logs of the observations. The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals.
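The two definitions above translate almost word for word into R. The following is a minimal sketch under stated assumptions: geo_mean and har_mean are hypothetical helper names (base R provides neither), the data vector v is made up, and the natural logarithm is used since any base gives the same geometric mean.

```r
# Hypothetical helpers; base R has no built-in geometric or harmonic mean.
geo_mean <- function(x) {
  stopifnot(all(x > 0))   # logarithms require positive observations
  exp(mean(log(x)))       # antilog of the arithmetic mean of the logs
}

har_mean <- function(x) {
  stopifnot(all(x != 0))  # reciprocals require non-zero observations
  1 / mean(1 / x)         # reciprocal of the arithmetic mean of the reciprocals
}

v <- c(2, 8)              # made-up data for illustration
geo_mean(v)               # approximately 4, i.e. sqrt(2 * 8)
har_mean(v)               # 3.2
mean(v)                   # 5
```

For positive data the three averages are ordered as harmonic mean <= geometric mean <= arithmetic mean, which the small data set above illustrates.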
Example 2.4.4: Find the geometric mean and the harmonic mean for the data of Example 2.4.1.

Solution:

    x <- scan()
    1: 18 19 19 19 19 20 20 20 20 20 21 21 21 21 22 23 24 27 30 36
    21:
    Read 20 items
    y <- log(x, 10)    # We have used log function.
    y
     [1] 1.255273 1.278754 1.278754 1.278754 1.278754 1.301030 1.301030 1.301030
     [9] 1.301030 1.301030 1.322219 1.322219 1.322219 1.322219 1.342423 1.361728
    [17] 1.380211 1.431364 1.477121 1.556303
    logg <- mean(y)    # Log of geometric mean.
    logg
    [1] 1.335673
    g <- 10^logg       # Geometric mean.
    g
    [1] 21.66073
    z <- 1/x
    z
     [1] 0.05555556 0.05263158 0.05263158 0.05263158 0.05263158 0.05000000
     [7] 0.05000000 0.05000000 0.05000000 0.05000000 0.04761905 0.04761905
    [13] 0.04761905 0.04761905 0.04545455 0.04347826 0.04166667 0.03703704
    [19] 0.03333333 0.02777778
    inv <- mean(z)     # Reciprocal of harmonic mean.
    h <- 1/inv         # Harmonic mean.
    h
    [1] 21.38338

(ii) Ungrouped frequency distribution: Formula for log of geometric mean is

$$\log G = \frac{\sum f_i \log(x_i)}{n}$$

Formula for harmonic mean is

$$H = \frac{n}{\sum f_i / x_i}$$

Example 2.4.5: Compute the geometric mean and the harmonic mean for the following frequency table:

Table 2.4.3 Frequency Table

    x    2   3   4   5   6   7   8   9  10  11  12
    f    1   2   3   4   5   6   5   4   3   2   1

Solution:

    x <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
    f <- c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)
    fr.dist <- data.frame(x, f)
    fr.dist1 <- transform(fr.dist, y = log(x, 10))
    fr.dist1
        x f         y
    1   2 1 0.3010300
    2   3 2 0.4771213
    3   4 3 0.6020600
    4   5 4 0.6989700
    5   6 5 0.7781513
    6   7 6 0.8450980
    7   8 5 0.9030900
    8   9 4 0.9542425
    9  10 3 1.0000000
    10 11 2 1.0413927
    11 12 1 1.0791812
    attach(fr.dist1)
    logg <- sum(f * y)/sum(f)
    logg
    [1] 0.8142518
    g = 10^logg    # Geometric mean.
    g              # Output is:
    [1] 6.520063
    fr.dist2 <- transform(fr.dist1, z = 1/x)
    fr.dist2
        x f         y          z
    1   2 1 0.3010300 0.50000000
    2   3 2 0.4771213 0.33333333
    3   4 3 0.6020600 0.25000000
    4   5 4 0.6989700 0.20000000
    5   6 5 0.7781513 0.16666667
    6   7 6 0.8450980 0.14285714
    7   8 5 0.9030900 0.12500000
    8   9 4 0.9542425 0.11111111
    9  10 3 1.0000000 0.10000000
    10 11 2 1.0413927 0.09090909
    11 12 1 1.0791812 0.08333333
    h <- with(fr.dist2, sum(f)/sum(f * z))  # Harmonic mean; with() makes column z of fr.dist2 visible.
    h              # Output is:
    [1] 5.95855

(iii) Grouped frequency distribution: Formula for log of geometric mean is

$$\log G = \frac{\sum f_i \log(x_i)}{n}$$

where $x_i$ is the mid point of the i-th class interval. Formula for harmonic mean is

$$H = \frac{n}{\sum f_i / x_i}$$

where $x_i$ is the mid point of the i-th class interval.

Example 2.4.6: Obtain the geometric mean and the harmonic mean for the data of Example 2.4.3.

Solution:

    x <- seq(147.5, 182.5, 5)
    x
    [1] 147.5 152.5 157.5 162.5 167.5 172.5 177.5 182.5
    f <- c(4, 6, 28, 58, 64, 30, 5, 5)
    fr.dist <- data.frame(x, f)
    fr.dist1 <- transform(fr.dist, y = log(x, 10), z = 1/x)
    fr.dist1
          x  f        y           z
    1 147.5  4 2.168792 0.006779661
    2 152.5  6 2.183270 0.006557377
    3 157.5 28 2.197281 0.006349206
    4 162.5 58 2.210853 0.006153846
    5 167.5 64 2.224015 0.005970149
    6 172.5 30 2.236789 0.005797101
    7 177.5  5 2.249198 0.005633803
    8 182.5  5 2.261263 0.005479452
    attach(fr.dist1)
    logg <- sum(f * y)/sum(f)
    logg
    g = 10^logg    # Geometric mean.
    g
    [1] 165.0461
    h <- sum(f)/sum(z * f)
    h
    [1] 164.9168
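The frequency-weighted formulae can also be wrapped in small functions, which avoids attach() and the intermediate data frame. This is only a sketch: wt_geo_mean and wt_har_mean are hypothetical names, not functions from base R or from the text, and the data are the mid points and frequencies used in Example 2.4.6.

```r
# Hypothetical helpers for a frequency distribution with values
# (or class mid points) x and frequencies f.
wt_geo_mean <- function(x, f) 10^(sum(f * log10(x)) / sum(f))  # antilog of weighted mean of logs
wt_har_mean <- function(x, f) sum(f) / sum(f / x)              # n over the sum of f_i / x_i

x <- seq(147.5, 182.5, 5)              # class mid points
f <- c(4, 6, 28, 58, 64, 30, 5, 5)     # class frequencies

a <- sum(f * x) / sum(f)   # arithmetic mean
g <- wt_geo_mean(x, f)     # geometric mean
h <- wt_har_mean(x, f)     # harmonic mean

c(mean = a, geometric = g, harmonic = h)
a >= g && g >= h           # the three averages are ordered for positive data
```

The three values should agree with the results 165.175, 165.0461 and 164.9168 reported in the examples above.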
