Chapter # 02 Sher Muhammad Chaudary


CHAPTER 2
PRESENTATION OF DATA

2.1 INTRODUCTION

The process of gathering data often results in a massive volume of statistical data in the form of individual measurements or counts. It is difficult to learn anything by examining unorganised data, which are more often confusing than clarifying. The mass of data must therefore be organised and condensed into a form that can be understood and interpreted more rapidly and easily. For this purpose, the techniques of classification, tabulation and graphic display are presented in this chapter.

2.2 CLASSIFICATION

The term classification is defined as the process of dividing a set of observations or objects into classes or groups in such a way that (i) observations or objects in the same class or group are similar, and (ii) observations or objects in each class or group are dissimilar to observations or objects in other classes or groups. Classification is thus the sorting of data into homogeneous classes or groups according to whether they are alike or not.

When the data are sorted according to one criterion only, the result is called a simple or one-way classification. Classification is called a two-way classification when the data are sorted according to two criteria. A manifold classification or cross-classification is made according to several criteria. Data may also be classified according to qualitative, temporal and geographical characteristics. An arrangement of data according to the values of a variable characteristic is called a distribution. When the defining variable is expressed in terms of location, we get a spatial or geographical distribution. A temporal arrangement of values is referred to as a time series.

2.2.1 Aims of Classification.
The main aims of classification are:
i) to reduce large sets of data to an easily understood summary;
ii) to display the points of similarity and dissimilarity;
iii) to save mental strain by eliminating unnecessary details;
iv) to reflect the important aspects of the data; and
v) to prepare the ground for comparison and inference.

2.2.2 Basic Principles of Classification.
While classifying large sets of data, the following points should be taken into consideration:
i) The classes or categories into which the data are to be divided should be mutually exclusive, and no overlap should exist between successive classes. In other words, classes should be arranged so that each observation or object can be placed in one and only one class.
ii) The classes or categories should be all-inclusive, that is, together they should include all the data.
iii) As far as possible, the conventional classification procedure should be adopted.
iv) The classification procedure should not be so elaborate as to lead to trivial classes, nor should it be so crude as to concentrate all the data in one or two classes.

2.3 TABULATION

By tabulation, we mean a systematic presentation of data classified under suitable heads and subheads and placed in columns and rows. This sort of logical arrangement makes the data easy to understand, facilitates comparisons and provides an effective way to convey information to a reader. A British statistician, Professor Bowley (1869-1957), refers to tabulation as "the intermediate process between the accumulation of data, in whatever form they are obtained, and the final reasoned account of the results shown by the statistics."

2.3.1 Types of Tables.
Statistical tables, classified according to purpose, are of two types, viz. general purpose (primary) tables and specific purpose (derived or text) tables.
The general purpose tables are large in size and extensive, with vast coverage, and are constructed for reference purposes. The specific purpose tables are simpler in structure and deal with one or two criteria of classification only. Such tables are used to analyse, or to assist in analysing, data.

When the classification corresponds to one, two or many criteria or characteristics, the tabulation is called a single, double or manifold tabulation respectively. Tabulation of a dependent variable (say, number of students) against an independent variable (say, weight) provides an example of a single tabulation. Tables with two criteria of classification, e.g. gender and marital status, or height and weight, are examples of double tabulation. An example of manifold tabulation is the presentation of the population of a country by age, by gender, by residence, by literacy, by livelihood classes, etc.

The main parts of a statistical table are the title, the boxhead, the stub, the body, one or more prefatory notes, footnotes and a source note. They are described in the next section.

2.3.2 Main Parts of a Table and its Construction.
The main parts of a table and the general rules to be observed in constructing any table are described below:

a) Title. A table must have a self-explanatory title, which should usually tell us the "what, where, how classified and when" of the data, in that order. Some other important points are stated below:
i) Titles should be brief, in the form of phrases. Complete sentences are unnecessary.
ii) Abbreviations should not be used.
iii) Main titles should be in capitals throughout. Sub-titles, if any, should be in lower-case letters with major words capitalized, and should indicate clearly what the table describes.
iv) The different parts of a title should be separated by commas, with no full stop at the end.
v) Words in titles should not be hyphenated except when really necessary.
vi) If a title requires two or more lines, an inverted pyramid arrangement of the lines should be used.

b) Column Captions and Boxhead. The heading of each column is called a column caption, while the section of a table that contains the column captions is referred to as the boxhead. Points to note here are given below:
i) The headings should be clear but concise.
ii) They should be arranged in such a way that the most important characteristic is placed in the first column. The column of totals is usually placed at the extreme right, but some people prefer the totals on the left.
iii) Only the first word in each column caption should be capitalized. No full stop should be put at the end.
iv) Abbreviations, when clear, may be used.
v) A main caption should be centred over the columns it is to span.
vi) Extra lines should be used to avoid crowding in the caption box.
vii) Whenever possible, caption width should be made roughly proportional to the size of the numbers to be inserted.

c) Row Captions and Stub. The heading or title for a row is called the row caption, and the section containing the row captions is known as the stub. The necessary points in this respect are given below:
i) The principles for column captions apply to row captions in the stub.
ii) If the stub is long and has several levels of classification, the major classifications should be capitalized to separate the table into parts.
iii) Whenever the figures have more than four or five significant digits, the digits should be grouped in threes or fours. For example, one should write 23 178 327, not 23178327.
iv) In long tables, some space should be left after every five or ten rows.
v) Totals should usually be placed at the bottom, but some prefer to place them at the top.
vi) Items in the stub should be arranged so as to facilitate easy reading.
vii) Every stub should have an appropriate heading describing its contents. This heading should be centred in the upper left box of the table.
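Rule iii) for the stub, grouping long figures in threes or fours, is easy to apply automatically when tables are produced by computer. The following is a minimal Python sketch, assuming a space as the group separator (as in 23 178 327); the function name `group_digits` and its parameters are hypothetical, chosen only for illustration.

```python
def group_digits(n: int, size: int = 3, sep: str = " ") -> str:
    """Write a whole number with its digits grouped from the right.

    size is the group length (3 or 4 in the rule above) and sep is the
    separator; both names are illustrative, not a standard API.
    """
    sign = "-" if n < 0 else ""        # a minus sign precedes the number
    digits = str(abs(n))
    groups = []
    while digits:
        groups.append(digits[-size:])  # take the rightmost `size` digits
        digits = digits[:-size]
    return sign + sep.join(reversed(groups))


print(group_digits(23178327))          # 23 178 327
print(group_digits(23178327, size=4))  # 2317 8327
```

For groups of three only, Python's built-in `format(23178327, ",").replace(",", " ")` gives the same result; the small function is used here because it also handles groups of four.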
d) Prefatory Notes and Footnotes. Explanatory notes incorporated in the table beneath the title and below the body are called prefatory notes and footnotes respectively. Prefatory notes give additional specifications of the data, indicating items included in or excluded from all the data of the table, statements of the box, etc. They are placed between the title and the boxhead, and should be worded in lower-case letters.

Footnotes are used to clarify anything in the table by giving a fuller description, by drawing attention to incompleteness, or by stating any special circumstances affecting the data. The footnotes should be specific in nature. They are placed immediately below the bottom line of the table, above the source. Footnote symbols should be placed as follows:
i) If they refer to an entire column or a set of columns, place them at the end of the appropriate caption.
ii) If they refer to an entire row or a set of rows, place them at the end of the appropriate stub title.
iii) If they refer to a single cell in the table, place them beside the cell entry in the body of the table.
The footnotes should be indicated either by lower-case letters enclosed in parentheses or by symbols such as *, †, ‡, etc.; never by a number.

e) Source Notes. Every table should have a source note, unless the table is an original tabulation and its source is clear from the context. It is placed immediately below the table and below the footnotes, if any. Source notes must include the compiling agency, publication, date of publication and page, as they are used as a means of verification and reference.

f) Body and Arrangement of Data. The body of a table is its most important part; it contains the entire data arranged in columns and rows. A rough sketch gives an idea of the number of columns and rows required. Arrangement of the data is made by taking into consideration the basis of classification and the purpose of the table.
Thus the data may be arranged (i) in alphabetical order, (ii) according to the time of occurrence, (iii) according to location, (iv) according to magnitude or importance, or (v) by a customary classification, e.g. men, women and children. Whatever arrangement is used, the table should be neat, simple and attractive to the eye.

g) Spacing and Rulings. A proper and judicious use of spacing and ruling enhances the effectiveness of a table and helps in separating or emphasizing certain items in it. Thick or double lines (rulings) are used for emphasis and for separating the title, the boxhead, the stub, etc., while parts under captions and related columns are separated by thin or single lines.

h) General. There are some other considerations too, which are enumerated below:
i) A table should be simple. A complex table may, if possible, be broken into relatively simple tables.
ii) Units of measurement and the nature of the data should be specified in parentheses in the title, captions, etc.
iii) Percentages should be clearly indicated as 'per cent of total', etc., and their total should be shown as 100.0.
iv) If the figures entered in the table are rounded off, this should be indicated in the prefatory note or in the stub or caption.
v) Zeros need not be entered.
vi) Minus signs are a part of the table and precede the number.
vii) The relationship of the parts to the whole should be shown by thin or heavy rulings.
viii) The item or items to be emphasized should be placed in the most prominent position in the table.

The general sketch of a table is given below:

[Figure: general sketch of a table, showing the title at the top, the prefatory notes beneath it, the boxhead with column captions and units, the stub on the left, the body, and the footnotes and source note at the bottom.]

Example 2.1 A district is divided into two areas, viz. urban area and rural area. The total population of the district is 271,076, out of which only 46,740 live in the urban area.
The total male population of the district is 139,699 and that of the urban area is 23,083. The total unmarried population of the district is 112,352, out of which 36,864 are rural females. In the urban area, unmarried people number 21,072, out of which 12,149 are males. Prepare a table showing the population of the district by marital status, by residence and by gender.

A rough table, which will probably need amending later, might look as follows:

| AREA | BOTH GENDERS: Married | BOTH GENDERS: Unmarried | MALE: Married | MALE: Unmarried | FEMALE: Married | FEMALE: Unmarried |
|---|---|---|---|---|---|---|
| Urban | | | | | | |
| Rural | | | | | | |

We first compute the relevant figures as below:

Rural population = Total population - Urban population = 271,076 - 46,740 = 224,336
Female population = Total population - Male population = 271,076 - 139,699 = 131,377
Rural male population = District male population - Urban males = 139,699 - 23,083 = 116,616

Similarly,

Urban females = 46,740 - 23,083 = 23,657
Rural females = 224,336 - 116,616 = 107,720
Married population of district = 271,076 - 112,352 = 158,724
Rural unmarried population = 112,352 - 21,072 = 91,280
Rural unmarried males = 91,280 - 36,864 = 54,416
Urban unmarried females = 21,072 - 12,149 = 8,923, etc.

Having computed all these figures, we present them in the final table below:

POPULATION OF DISTRICT "A" BY GENDER, MARITAL STATUS AND RESIDENCE

| Areas | Both genders: Total | Both genders: Married | Both genders: Unmarried | Male: Total | Male: Married | Male: Unmarried | Female: Total | Female: Married | Female: Unmarried |
|---|---|---|---|---|---|---|---|---|---|
| District | 271,076 | 158,724 | 112,352 | 139,699 | 73,134 | 66,565 | 131,377 | 85,590 | 45,787 |
| Urban | 46,740 | 25,668 | 21,072 | 23,083 | 10,934 | 12,149 | 23,657 | 14,734 | 8,923 |
| Rural | 224,336 | 133,056 | 91,280 | 116,616 | 62,200 | 54,416 | 107,720 | 70,856 | 36,864 |

2.4 FREQUENCY DISTRIBUTION

The organization of a set of data in a table showing the distribution of the data into classes or groups, together with the number of observations in each class or group, is called a frequency distribution. The number of observations falling in a particular class is referred to as the class frequency, or simply the frequency, and is denoted by f. Data presented in the form of a frequency distribution are also called grouped data, while the data in their original form are referred to as ungrouped data. The data are said to be arranged in an array when they are arranged in ascending or descending order of magnitude.

The purpose of a frequency distribution is to produce a meaningful pattern for the overall distribution of the data, from which conclusions can be drawn. A fairly common frequency pattern is one that rises to a peak and then declines. In terms of its construction, each class or group has lower and upper limits, lower and upper boundaries, an interval and a middle value.

2.4.1 Class-limits.
The class-limits are defined as the numbers or the values of the variable which describe the classes; the smaller number is the lower class limit and the larger number is the upper class limit. Class-limits should be well defined and there should be no overlapping. In other words, the limits should be inclusive, i.e. the values corresponding exactly to the lower limit or the upper limit should be included in that class. The class-limits are therefore selected in such a way that they have the same number of significant places as the recorded values. Suppose the data are recorded to the nearest integers.
Then an appropriate method for defining the class limits without overlapping may be, for example, 10 - 14, 15 - 19, 20 - 24, etc. The class limits may be defined as 10.0 - 14.9, 15.0 - 19.9, 20.0 - 24.9, etc. when the data are recorded to the nearest tenth.

Sometimes a class has either no lower class limit or no upper class limit. Such a class is called an open-end class. Open-end classes should be avoided if possible, as they are a hindrance in performing certain calculations. A class indicated as 10 - 15 will include 10 but not 15, i.e. $10 \le X < 15$.

2.4.2 Class-boundaries.
The class-boundaries are the precise numbers which separate one class from another. The selection of these numbers removes the difficulty, if any, in knowing the class to which a particular value should be assigned. A class-boundary is located midway between the upper limit of a class and the lower limit of the next higher class, e.g. 9.5 - 14.5, 14.5 - 19.5, 19.5 - 24.5, or 9.95 - 14.95, 14.95 - 19.95, etc. The class-boundaries are thus always defined more precisely than the level of measurement being used, so that the possibility of any observation falling exactly on a boundary is avoided. That is why the class boundaries carry one more decimal place than the class limits or the observed values. The upper class boundary of a class coincides with the lower boundary of the next class.

2.4.3 Class Mark.
A class mark, also called the class midpoint, is the number which divides each class into two equal parts. In practice, it is obtained by dividing either the sum of the lower and upper limits of a class, or the sum of the lower and upper boundaries of the class, by 2; in a few cases, however, this does not hold, particularly in the modern practice of age grouping. For purposes of calculation, all the observations in a particular class are assumed to have the same value as the class mark or midpoint.
This assumption may introduce an error, called the grouping error, but statistical experience has shown that such errors usually tend to counterbalance over the entire distribution. The grouping error may also be minimized by selecting a class (group) in such a way that its midpoint corresponds to the mean of the observed values falling in that class.

2.4.4 Class Width or Interval.
The class-width or interval of a class is equal to the difference between the class boundaries. It may also be obtained by finding the difference either between two successive lower class limits or between two successive class marks. The lower limit of a class should not be subtracted from its upper limit to get the class interval. An equal class interval, usually denoted by h or c, facilitates the calculation of statistical constants such as the mean, the standard deviation, moments, etc. That is why in practice it is desirable to have equal class intervals. But in some types of economic and medical data, it is wise to use unequal class-intervals on account of the greater concentration of values in certain classes. Such class intervals usually become uniform when logarithms of class marks are taken. It should be noted that some people use the terms "class" and "class-interval" interchangeably, and the width of the class is referred to as the size or length of the class-interval.

2.4.5 Constructing a Grouped Frequency Distribution.
The following are some basic rules that should be kept in mind when constructing a grouped frequency distribution:
i) Decide on the number of classes into which the data are to be grouped. There are no hard and fast rules for deciding on the number of classes, which actually depends on the size of the data. Statistical experience tells us that no fewer than 5 and no more than 20 classes are generally used. Use of too many classes will defeat the purpose of condensation, and too few will result in too much loss of information. H.A. Sturges has proposed an empirical rule for determining the number of classes into which a set of observations should be grouped. The rule is $k = 1 + 3.322 \log_{10} N$, where k denotes the number of classes and N is the total number of observations. For example, if there are 100 observations, then by applying Sturges' rule we should have $k = 1 + 3.322(2.0000) = 7.644$, i.e. 8 classes. Thus eight classes are required, but this rule is rarely used in practice.
ii) Determine the range of variation in the data, i.e. the difference between the largest and the smallest values in the data.
iii) Divide the range of variation by the number of classes to determine the approximate width or size of the equal class-interval. In case of fractional results, the next higher whole number is usually taken as the size or width of the class-interval. If equal class-intervals are inconvenient or undesirable, then classes of unequal size are used. In practice, intervals that are multiples of 5 or 10 are commonly used, as people can understand them more readily.
iv) Decide where to locate the class-limit of the lowest class and then the lower class boundary. The lowest class usually starts with the smallest data value or a number less than it; it is better if this number is a multiple of the class-interval. Find the upper class boundary by adding the width of the class-interval to the lower class-boundary, and write down the upper class limit too. Open-end classes, i.e. classes with the lowermost or uppermost class boundary unknown, should be avoided if possible.
v) Determine the remaining class-limits and class boundaries by adding the class-interval repeatedly. The lowest class should be placed at the top and the rest should follow according to size. In some cases, the highest class is placed at the top.
vi) Distribute the data into the appropriate classes. This is best done by using a "tally column", where values are tabulated against the appropriate classes by merely making short bars or tally marks to represent them.
It is customary, for convenience in counting, to place the first four bars vertically and the fifth one diagonally, and then to leave a space. The number of tallies is then written in the frequency column. The tally column is usually omitted in the final presentation of the frequency distribution. In the case of a small number of values, however, the actual values may be shown against each class to reduce the chance of error.
vii) Finally, total the frequency column to see that all the data have been accounted for.

These rules are applied to group raw data which are assumed to be continuous. In the case of discrete data, which take only integral values, the concept of a class boundary is unrealistic, as there can be no points where adjoining classes meet. In spite of this logical difficulty, when the discrete data are sufficiently large, they are treated for convenience in calculations as continuous and hence are grouped in the same way as continuous data.

Example 2.2 Make a grouped frequency distribution of the following data, relating to the weights, recorded to the nearest gram, of 60 apples picked out at random:

[The listing of the 60 recorded weights is illegible in this copy; the individual values appear class by class in the entry table below.]

By scanning the data, we find that the largest weight is 204 grams and the smallest weight is 68 grams, so that the range is 204 - 68 = 136 grams.

Suppose we decide to take 7 classes of equal size. Then the size or width of the equal class interval would be 136/7 = 19.43. But we take h = 20, the next integral value higher than 19.43, to facilitate the numerical work. Let us decide to locate the lower limit of the lowest class at 65.
With this choice, the class limits will be 65 - 84, 85 - 104, 105 - 124, ..., the class boundaries become 64.5 - 84.5, 84.5 - 104.5, 104.5 - 124.5, ..., and the class marks are 74.5, 94.5, 114.5, .... The grouped frequency distribution is then constructed as follows:

i) By listing the actual values:

FREQUENCY DISTRIBUTION OF WEIGHTS OF 60 APPLES

| Weight (grams) | Entries | Frequency |
|---|---|---|
| 65 - 84 | 76, 82, 70, 68, 84, 78, 75, 80, 82 | 9 |
| 85 - 104 | 93, 95, 92, 86, 100, 99, 90, 98, 90, 104 | 10 |
| 105 - 124 | … | 17 |
| 125 - 144 | 125, 126, 130, 129, 139, 128, 141, 136, 140, 131 | 10 |
| 145 - 164 | 162, 152, 146, 158, 148 | 5 |
| 165 - 184 | 178, 173, 181, 184 | 4 |
| 185 - 204 | 187, 186, 204, 185, 194 | 5 |
| Total | | 60 |

This table is sometimes known as an entry table. The values against each class may be arranged in an array.

ii) By using a tally column:

FREQUENCY DISTRIBUTION OF WEIGHTS OF 60 APPLES

| Weight (grams) | Class boundaries | Class marks | Tally | Frequency |
|---|---|---|---|---|
| 65 - 84 | 64.5 - 84.5 | 74.5 | IIII/ IIII | 9 |
| 85 - 104 | 84.5 - 104.5 | 94.5 | IIII/ IIII/ | 10 |
| 105 - 124 | 104.5 - 124.5 | 114.5 | IIII/ IIII/ IIII/ II | 17 |
| 125 - 144 | 124.5 - 144.5 | 134.5 | IIII/ IIII/ | 10 |
| 145 - 164 | 144.5 - 164.5 | 154.5 | IIII/ | 5 |
| 165 - 184 | 164.5 - 184.5 | 174.5 | IIII | 4 |
| 185 - 204 | 184.5 - 204.5 | 194.5 | IIII/ | 5 |
| Total | | | | 60 |

Example 2.3 Given below are the mean annual death rates per 1,000 at ages 20 - 65 in each of 88 occupational groups. Construct a grouped frequency distribution.

7.5 8.2 6.2 8.9 7.8 5.4 9.4 9.9 10.9 10.8
7.4 9.7 11.6 12.6 5.0 10.2 9.2 12.0 9.9 7.3
7.3 8.4 10.3 10.1 10.0 11.1 6.5 12.5 7.8 6.5
8.7 9.3 12.4 10.4 9.1 9.7 9.3 6.2 10.3 6.6
7.4 8.6 7.7 9.4 7.7 12.8 8.7 5.5 8.6 9.6
11.9 10.4 7.8 7.6 12.1 4.6 14.0 8.1 11.4 10.6
11.6 10.4 8.1 4.6 6.6 12.8 6.8 7.1 6.6 8.8
8.8 10.7 10.8 6.0 7.9 7.3 9.3 9.3 8.9 10.1
3.9 6.0 6.9 9.0 8.8 9.4 11.4 10.9
(BLS, Lahore, 1971)

A scan of the data shows that the largest value is 14.0 and the smallest value is 3.9, so that the range is 14.0 - 3.9 = 10.1. As the data are recorded to one decimal place, we may locate the lower limit of the first group at 3.5. Let us choose a class interval of 1.0. Then the class limits are specified as 3.5 - 4.4, 4.5 - 5.4, 5.5 - 6.4, ....
With this choice, the class boundaries are 3.45 - 4.45, 4.45 - 5.45, 5.45 - 6.45, ..., which do not coincide with any of the given values. The following table shows the required frequency distribution:

FREQUENCY DISTRIBUTION OF MEAN DEATH RATES

| Death rates | Class boundaries | Tally | Frequency |
|---|---|---|---|
| 3.5 - 4.4 | 3.45 - 4.45 | I | 1 |
| 4.5 - 5.4 | 4.45 - 5.45 | IIII | 4 |
| 5.5 - 6.4 | 5.45 - 6.45 | IIII/ | 5 |
| 6.5 - 7.4 | 6.45 - 7.45 | IIII/ IIII/ III | 13 |
| 7.5 - 8.4 | 7.45 - 8.45 | IIII/ IIII/ II | 12 |
| 8.5 - 9.4 | 8.45 - 9.45 | IIII/ IIII/ IIII/ IIII | 19 |
| 9.5 - 10.4 | 9.45 - 10.45 | IIII/ IIII/ III | 13 |
| 10.5 - 11.4 | 10.45 - 11.45 | IIII/ IIII/ | 10 |
| 11.5 - 12.4 | 11.45 - 12.45 | IIII/ I | 6 |
| 12.5 - 13.4 | 12.45 - 13.45 | IIII | 4 |
| 13.5 - 14.4 | 13.45 - 14.45 | I | 1 |
| Total | | | 88 |

Example 2.4 Construct a frequency distribution for the data below. Indicate the class boundaries and class limits clearly.

41.78  29.32  31.47  35.35  32.82  39.42
61.65  28.31  44.63  22.78  44.44  48.12
81.71  33.47  50.35  29.19  51.26  50.32
26.84  18.95  48.19  43.72  43.89  47.15
60.20  44.43  41.17  37.50  22.35  29.17

By scanning the data, we find that the largest value is 81.71 and the smallest value is 18.95, so that the range is 81.71 - 18.95 = 62.76. Suppose we decide to take 5 classes of equal size. Then the size or width of the equal class interval would be 62.76/5 = 12.55. But we take h = 13.00, the next integral value higher than 12.55, to facilitate the numerical work. As the data are recorded to two decimal places, we may locate the lower limit of the first group at 18.00. With this choice, the class limits will be 18.00 - 30.99, 31.00 - 43.99, ..., and the class boundaries become 17.995 - 30.995, 30.995 - 43.995, ....
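The steps of Example 2.4 (range, width rounded up to the next whole number, class limits, and boundaries half a recording unit outside the limits) can be checked mechanically. The following is a minimal Python sketch, assuming equal classes and assuming that the chosen lowest limit and number of classes cover every value; the function `grouped_frequency` and its parameter names are hypothetical, chosen for illustration.

```python
import bisect
import math

# The 30 values of Example 2.4, recorded to two decimal places.
data = [
    41.78, 29.32, 31.47, 35.35, 32.82, 39.42,
    61.65, 28.31, 44.63, 22.78, 44.44, 48.12,
    81.71, 33.47, 50.35, 29.19, 51.26, 50.32,
    26.84, 18.95, 48.19, 43.72, 43.89, 47.15,
    60.20, 44.43, 41.17, 37.50, 22.35, 29.17,
]


def grouped_frequency(values, k, first_lower_limit, unit):
    """Group `values` into k equal classes (hypothetical helper).

    unit is the recording precision (0.01 for two decimal places). Each
    class boundary lies half a unit outside the class limits, so no
    recorded value can fall exactly on a boundary.
    """
    h = math.ceil((max(values) - min(values)) / k)  # next whole number above range / k
    rows = []
    for i in range(k):
        lower = first_lower_limit + i * h
        upper = lower + h - unit                     # e.g. 18.00 - 30.99
        rows.append((round(lower, 2), round(upper, 2),
                     round(lower - unit / 2, 3), round(upper + unit / 2, 3)))

    upper_boundaries = [row[3] for row in rows]
    freqs = [0] * k
    for x in values:
        # index of the first class whose upper boundary exceeds x
        i = bisect.bisect_right(upper_boundaries, x)
        if x < rows[0][2] or i == k:
            raise ValueError(f"{x} lies outside the chosen classes")
        freqs[i] += 1
    return h, rows, freqs


h, rows, freqs = grouped_frequency(data, k=5, first_lower_limit=18.00, unit=0.01)
print("h =", h)
for (lo, hi, lb, ub), f in zip(rows, freqs):
    print(f"{lo:.2f} - {hi:.2f}   {lb:.3f} - {ub:.3f}   {f}")
print("Total", sum(freqs))
```

With k = 5 the sketch gives h = 13 and class frequencies 8, 10, 9, 2 and 1, totalling 30. Assigning values by comparing them with the upper class boundaries, rather than the limits, mirrors the point made in Section 2.4.2 that boundaries leave no value in doubt.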
The grouped frequency distribution is then constructed as follows:

| Classes | Class boundaries | Tally | f |
|---|---|---|---|
| 18.00 - 30.99 | 17.995 - 30.995 | IIII/ III | 8 |
| 31.00 - 43.99 | 30.995 - 43.995 | IIII/ IIII/ | 10 |
| 44.00 - 56.99 | 43.995 - 56.995 | IIII/ IIII | 9 |
| 57.00 - 69.99 | 56.995 - 69.995 | II | 2 |
| 70.00 - 82.99 | 69.995 - 82.995 | I | 1 |
| Total | | | 30 |

Example 2.5 A survey of 50 retail establishments gave the following numbers of assistants, excluding proprietors:

2, 3, 9, 0, 4, 4, 1, 5, 4, 8, 5, 3, 6, 6, 0, 2, 7, 6, 4, 8, 4, 3, 3, 1, 0, 8, 3, 4, … [the remaining values are illegible in this copy]

Arrange the values as a frequency distribution.

By scanning the data, we find that the number of assistants is a discrete variable and the range is small, so the data can be conveniently sorted by taking the single values 0, 1, 2, etc. as classes. The frequency distribution is then constructed as shown below:

FREQUENCY DISTRIBUTION OF ASSISTANTS IN 50 RETAIL ESTABLISHMENTS

| Number of assistants | Tally | Frequency (f) |
|---|---|---|
| 0 | III | 3 |
| 1 | IIII | 4 |
| 2 | IIII/ I | 6 |
| 3 | IIII/ II | 7 |
| 4 | IIII/ IIII/ | 10 |
| 5 | IIII/ I | 6 |
| 6 | IIII/ | 5 |
| 7 | IIII/ | 5 |
| 8 | III | 3 |
| 9 | I | 1 |
| Total | | 50 |

Such a frequency distribution, in which each class consists of a single value, is sometimes called a discrete or ungrouped frequency distribution.

2.4.6 Cumulative Frequency Distribution.
The total frequency of a variable from one end of its range to a certain value (usually an upper class boundary in grouped data), called the base, is known as the cumulative frequency "less than" or "more than" the base. A table that shows the cumulative frequencies is called a cumulative frequency distribution. The cumulative frequency of the last class is the sum of all frequencies in the distribution. If the cumulation runs from the lowest value to the highest, the result is referred to as a "less than" type cumulative frequency distribution. For example, let us consider a frequency distribution having k classes, each of width h. Let us denote the midpoint of the i-th class by $x_i$ and its frequency by $f_i$, such that $\sum_{i=1}^{k} f_i = n$.
Now the lower class-boundary of the first group is $x_1 - h/2$ and the upper class boundaries are $x_i + h/2$ $(i = 1, 2, \ldots, k)$. The cumulative frequency distribution is then obtained by adding each successive frequency to the cumulative total of frequencies for the preceding classes, as shown below:

| Class boundary | Cumulative frequency |
|---|---|
| less than $x_1 - h/2$ | $0$ |
| less than $x_1 + h/2$ | $f_1$ |
| less than $x_2 + h/2$ | $f_1 + f_2$ |
| less than $x_3 + h/2$ | $f_1 + f_2 + f_3$ |
| ... | ... |
| less than $x_k + h/2$ | $\sum_{i=1}^{k} f_i = n$ |

It should be noted that a "less than" type cumulative frequency distribution starts with the lower class boundary of the first group, indicating that there is no frequency below $x_1 - h/2$. When the frequencies are cumulated from the highest value to the lowest value, the result is called a "more than" type cumulative frequency distribution.

If the class frequencies are divided by the total frequency, we get the relative frequencies, which always add to one. The class frequencies may also be expressed as percentages, the total of which would be 100. A percentage cumulative distribution is useful for reading off the percentage of values falling between certain specified values.

Example 2.6 Construct (i) a "less than" type cumulative distribution, and (ii) a "more than" type cumulative distribution from the frequency distribution of weights of 60 apples of Example 2.2.

i) A "less than" type cumulative frequency distribution is shown below:

| Weight (grams) | Cumulative frequency (F) |
|---|---|
| Less than 64.5 | 0 |
| Less than 84.5 | 9 |
| Less than 104.5 | 19 |
| Less than 124.5 | 36 |
| Less than 144.5 | 46 |
| Less than 164.5 | 51 |
| Less than 184.5 | 55 |
| Less than 204.5 | 60 |
ii) A "more than" type cumulative frequency distribution is given below:

| Weight (grams) | Cumulative frequency (F) |
|---|---|
| More than 64.5 | 60 |
| More than 84.5 | 51 |
| More than 104.5 | 41 |
| More than 124.5 | 24 |
| More than 144.5 | 14 |
| More than 164.5 | 9 |
| More than 184.5 | 5 |
| More than 204.5 | 0 |

2.5 STEM-AND-LEAF DISPLAY

A clear disadvantage of using a frequency table is that the identity of individual observations is lost in the grouping process. To overcome this drawback, John Tukey (1977) introduced a technique known as the stem-and-leaf display. This technique offers a quick and novel way of simultaneously sorting and displaying data sets, where each number in the data set is divided into two parts, a stem and a leaf. A stem is the leading digit(s) of each number and is used in sorting, while a leaf is the rest of the number, i.e. the trailing digit(s), and is shown in the display. A vertical line separates the leaf (or leaves) from the stem. For example, the number 243 could be split in two ways: 2 | 43 (stem 2, leaf 43) or 24 | 3 (stem 24, leaf 3).

All possible stems are arranged in order from the smallest to the largest and placed on the left-hand side of the line. The stem-and-leaf display is a useful step towards listing the data in an array: the leaves are attached to their stems to recover the numbers. The stem-and-leaf table provides a useful description of the data set and can easily be converted to a frequency table. It is common practice to arrange the trailing digits in each row from smallest to largest.

Example 2.7 The ages of 30 patients admitted to a certain hospital during a particular week were as follows:

48, 31, 54, 37, 18, 64, 61, 43, 40, 71, 51, 12, 52, 65, 53,
42, 39, 62, 74, 48, 29, 67, 30, 49, 68, 35, 57, 26, 27, 58

Construct a stem-and-leaf display from the data and list the data in an array.

A scan of the data indicates that the observations range (in age) from 12 to 74.
We use the first (or leading) digit as the stem and the second (or trailing) digit as the leaf. The first observation is 48, which has a stem of 4 and a leaf of 8; the second has a stem of 3 and a leaf of 1, and so on. Placing the leaves in the order in which they appear in the data, we get the following stem-and-leaf display:

Stem (leading digit) | Leaf (trailing digit)
         1           | 8 2
         2           | 9 6 7
         3           | 1 7 9 0 5
         4           | 8 3 0 2 8 9
         5           | 4 1 2 3 7 8
         6           | 4 1 5 2 7 8
         7           | 1 4

To get the array, we associate the leaves, in order of size, with the stems as shown below:
12, 18, 26, 27, 29, 30, 31, 35, 37, 39, 40, 42, 43, 48, 48, 49, 51, 52, 53, 54, 57, 58, 61, 62, 64, 65, 67, 68, 71, 74

Example 2.8 Construct a stem-and-leaf display for the data of annual death rates given in Example 2.3.
Using the decimal part of each number as the leaf and the rest of the digits as the stem, we get the following stem-and-leaf display (leaves are ordered):
[Display: stem-and-leaf display of the annual death rates of Example 2.3; stems from 3 to 14, leaves in increasing order]

2.6 GRAPHICAL REPRESENTATION
Tabulation, we know, is a good method of condensing and representing statistical data in a readily understandable form, but many people have no taste for figures and would prefer a form of representation in which figures can be avoided. This purpose is achieved by presenting statistical data in a visual form. The visual display of statistical data in the form of points, lines, areas and other geometrical forms and symbols is, in the most general terms, known as Graphical Representation. With this method, statistical data can be studied without going through figures presented in the form of tables. Such visual representation can be divided into two main groups, graphs and diagrams, which are described in the sections that follow.
The basic difference between a graph and a diagram is that a graph is a representation of data by a continuous curve, usually drawn on graph paper, while a diagram is any other one-, two- or three-dimensional form of visual representation.

2.7 DIAGRAMS
Diagrammatic representation is best suited to spatial series and to data split into different categories. Whenever the same type of data is to be compared at different places, diagrams are the best way to do it. Diagrammatic representation has several advantages over the tabular representation of figures. Neatly constructed diagrams are more attractive than bare figures. Diagrams, being a visual display, leave a more effective and long-lasting impression on the mind of a reader. They make unwieldy data intelligible at a glance, and they make comparison easier.
Diagrams have some disadvantages too. They are less accurate than tables, they cost money and time, and the amount of information they convey is limited. Nevertheless, this method of representation is extensively used in business and administration.
Different types of diagrams or charts commonly used for displaying statistical data are described below:
i) Linear or One-Dimensional Diagrams. They consist of Simple Bars, Multiple Bars and Component Bar charts. Here the values are represented by one dimension only, generally the length of the bar.
ii) Areal or Two-Dimensional Diagrams. They consist of Rectangles, Sub-divided Rectangles and Squares, the areas of which are proportional to the values of the given quantities. This device is used to represent data having moderately large variations.
iii) Cubic or Three-Dimensional Diagrams. They are in the form of Cubes and Cylinders, whose volumes are proportional to the values they represent. These diagrams are used when the variation among the values of the data to be portrayed is so large that even the square roots of the values concerned fail to reduce the variation appreciably.
iv) Pie-Diagrams. They are in the form of Circles and Sectors. Here the areas of the circles or sectors are in proportion to the values they represent or compare.
v) Pictograms. They consist of pictures or small symbolic figures representing the statistical data. A pictogram is an effective means of visual comparison. For example, we can compare the armed strength of various countries by drawing pictures of soldiers, where each pictorial soldier may denote, say, 1,000 soldiers. In a similar way, the production of wheat can be compared by means of pictures of wheat bags of a specified size. The pictures are repeated a number of times to represent the differences in magnitude.

While drawing diagrams, the following points should be kept in mind:
i) An appropriate scale, consistent with the size of paper available and the size of the data to be represented, should be chosen and indicated either at the side or at the bottom of the diagram. This scale must start at zero.
ii) A diagram, like a table, must have a title, which should be brief and self-explanatory. A key, footnote or source will also be necessary.
iii) A diagram should be shaded, coloured or cross-hatched to show the different parts, if any.
iv) Lettering should be shown horizontally.

2.7.1 Simple Bar Chart. A simple bar chart consists of horizontal or vertical bars of equal widths and of lengths proportional to the values they represent. As the basis of comparison is linear or one-dimensional, the widths of the bars have no significance and are chosen only to make the chart look attractive. The space separating the bars should not exceed the width of a bar and should not be less than half of its width. The bars should be neither exceedingly long and narrow nor short and broad.
The vertical bar chart is an effective way of presenting a time series and qualitatively classified data, whereas horizontal bars are useful for geographical or spatial distributions. Data that do not relate to time should be arranged in ascending or descending order before charting.

Example 2.9 Draw a simple bar diagram to represent the turnover of a company for 6 years.
Years:               1980     1981     1982     1983     1984     1985
Turnover (Rupees):   38,000   45,000   48,000   52,500   55,000   58,000
The bar chart is drawn below:
[Figure: Bar diagram showing the turnover of a company for 6 years; horizontal axis: Year (1980 to 1985); vertical axis: Turnover (Rupees), 0 to 70,000]

2.7.2 Multiple Bar Chart. A multiple bar chart shows two or more characteristics corresponding to the values of a common variable in the form of grouped bars, whose lengths are proportional to the values of the characteristics and each of which is shaded or coloured differently to aid identification. This is a good device for comparing two or three kinds of information. For example, the imports, exports and production of a country can be compared from year to year by grouping the three bars together.

Example 2.10 Draw multiple bar charts to show the area and production of cotton in the Punjab from the following data:
Year       Area (000 acres)   Production (000 bales)
1965-66    2866               1588
1970-71    3233               2229
1975-76    3420               1937
(Source: Statistical Wing, Agriculture Deptt. Lahore)
The multiple bar charts are drawn below:
[Figure: Area and production of cotton in the Punjab; grouped bars for area (000 acres) and production (000 bales) in each year]

2.7.3 Component Bar Chart. A component bar chart is an effective technique in which each bar is divided into two or more sections, proportional in size to the component parts of the total being displayed by that bar.
The various component parts, shown as sections of the bar, are shaded or coloured differently to increase the overall effectiveness of the diagram. Component bar charts are used to represent the cumulation of the various components of the data and their percentages. They are also known as sub-divided bars.

Example 2.11 Draw a component bar chart for the following data.
(Population in Lakhs)
Division      Both Genders   Male   Female
Peshawar      64             33     31
Rawalpindi    40             21     19
Sargodha      60             32     28
Lahore        65             35     30
The appropriate component bar chart, after arranging the population figures in ascending order, is drawn below:
[Figure: Component bar chart showing the population of 4 divisions, each bar divided into male and female sections]

2.7.4 Rectangles and Sub-divided Rectangles. The area of a rectangle is equal to the product of its length and breadth. To represent a quantity by a rectangle, both the length and the breadth of the rectangle are used. Sub-divided rectangles are drawn for data where the quantities, along with their components, are to be compared. These diagrams are generally drawn to compare the budgets of various families. In the construction of sub-divided rectangles, we are required to
i) change each component into a percentage of the corresponding total,
ii) draw one rectangle for each total, taking equal lengths (100 units) and breadths proportional to the totals,
iii) divide every rectangle so drawn into parts equal in number to the number of components.
Each part, shaded or coloured, will represent the percentage size of one component.

Example 2.12 Compare the budgets of families A and B with a suitable diagram.
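Steps i) and ii) of the construction can be sketched in Python as below. This is an illustrative sketch, assuming the family budgets of Example 2.12 (repeated in the code) and a hypothetical helper `percentages` that rounds to one decimal place, as the worked table does.

```python
# Budgets of families A and B from Example 2.12: item -> (Family A, Family B).
budgets = {
    "Food": (24, 60),
    "Clothing": (4, 14),
    "House Rent": (4, 16),
    "Education": (3, 6),
    "Litigation": (2, 10),
    "Conventional Needs": (1, 6),
    "Miscellaneous": (2, 8),
}


def percentages(amounts):
    """Step i): express each component as a percentage of the total, to one decimal."""
    total = sum(amounts)
    return [round(100 * a / total, 1) for a in amounts]


family_a = [a for a, _ in budgets.values()]
family_b = [b for _, b in budgets.values()]
pct_a = percentages(family_a)
pct_b = percentages(family_b)

# Step ii): equal lengths (100 units), breadths proportional to the totals.
breadth_ratio = sum(family_b) / sum(family_a)

for item, a, b in zip(budgets, pct_a, pct_b):
    print(f"{item:<20}{a:6.1f}{b:6.1f}")
print("Breadth of B relative to A:", breadth_ratio)
```

Because the rectangles share the same length, only the breadths carry the difference in totals (here Rs. 120 against Rs. 40, a ratio of 3), while the sub-divisions carry the percentage make-up of each budget.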
Items of Expenditure    Family A   Family B
Food                    24         60
Clothing                4          14
House Rent              4          16
Education               3          6
Litigation              2          10
Conventional Needs      1          6
Miscellaneous           2          8
Total                   40         120

The computations required for drawing the sub-divided rectangles are given below, followed by the diagram:

                        Family A                  Family B
Items of Expenditure    Actual      Percentage    Actual      Percentage
                        Expenses    Expenses      Expenses    Expenses
Food                    24          60.0          60          50.0
Clothing                4           10.0          14          11.7
House Rent              4           10.0          16          13.3
Education               3           7.5           6           5.0
Litigation              2           5.0           10          8.3
Conventional Needs      1           2.5           6           5.0
Miscellaneous           2           5.0           8           6.7
Total                   40          100.0         120         100.0

[Figure: Sub-divided rectangles comparing the budgets of Family A (Rs. 40) and Family B (Rs. 120); vertical axis: percentage, 0 to 100; sections for Food, Clothing, House Rent, Education, Litigation, Conventional Needs and Miscellaneous]

2.7.5 Pictograms. A pictogram is a popular device for portraying statistical data by means of pictures or small symbols. It is said that a picture is worth ten thousand words. It is customary to represent a unit value of the data by a standard symbol or picture, and the whole quantity by an appropriate number of repetitions of the symbol concerned. This means that larger quantities should be represented by a larger number of symbols and not by larger symbols. A quantity smaller than the unit is represented by a part of the picture or symbol used. The symbols or pictures used must be simple and clear. A pictogram is virtually a bar chart constructed in a pictorial way, as the number of symbols or pictures corresponds to the length of a bar.

Example 2.13 The following table shows the number of employees in a certain textile mill. Represent the data by means of a pictogram.
Year    No. of Employees
1950    2,004
1955    2,990
1960    4,240
1965    5,380
Representing 1,000 employees by one picture, the pictogram is drawn below:
[Figure: Pictogram of the number of employees, 1950 to 1965; one picture = 1,000 employees]

2.7.6 Pie Diagrams.
A pie-diagram, also known as a sector diagram, is a graphic device consisting of a circle divided into sectors or pie-shaped pieces whose areas are proportional to the various parts into which the whole quantity is divided. The sectors are shaded or coloured differently to show the relationship of the parts to the whole. If space permits, the descriptive titles of the component parts should be placed horizontally on each sector; otherwise a key becomes necessary. It is a convenient way of displaying the component parts in proportion to the total and is therefore used as an alternative to the component bar chart. It is an effective way of showing percentage parts when the whole quantity is taken as 100. It is also used when the basic categories are not quantifiable, as with expenditure classified into food, clothing, fuel and light, etc. The arrangement of the sectors must be uniform when comparing pie charts.

To construct a pie chart, draw a circle of any convenient radius. As a circle contains 360°, the whole quantity to be displayed is equated to 360°. The proportion that each component part or category bears to the whole quantity will be the corresponding proportion of 360°. These corresponding proportions, i.e. angles, are calculated by the formula

$$\text{Angle} = \frac{\text{component part}}{\text{whole quantity}} \times 360^\circ$$

Then divide the circle into the different sectors by constructing these angles at the centre by means of a protractor, and draw the corresponding radii.

Example 2.14 Represent the total expenditure and the expenditures on various items of a family by a pie diagram.
Items:                   Food   Clothing   House Rent   Fuel and Light   Misc.
Expenditure (in Rs.):    50     30         20           15               35
The corresponding angles needed to draw the chart are computed below.
Items            Expenditure (in Rs.)   Angle of the Sector (in degrees)
Food             50                     (50/150) x 360 = 120
Clothing         30                     (30/150) x 360 = 72
House Rent       20                     (20/150) x 360 = 48
Fuel and Light   15                     (15/150) x 360 = 36
Miscellaneous    35                     (35/150) x 360 = 84
Total            150                    360
The pie diagram, consisting of a circle divided into five sectors defined by angles of 120°, 72°, 48°, 36° and 84°, is drawn below:
[Figure: Pie diagram of the expenditure of a family]

2.7.7 Profit and Loss Chart.
This is virtually a percentage component bar chart in which profits are shown above the normal base line and losses below it. Since the bars are to be extended below the zero line to show losses, we start from the top. For an illustration, the following data are represented:

COST, PROCEEDS, PROFIT OR LOSS PER CHAIR
Particulars               1960      1970
i) Materials              Rs. 10    Rs. 16
ii) Wages                 6         8
iii) Polishing, etc.      2         4
Total cost                18        28
Proceeds                  20        25
Profit (+) or loss (-)    +2        -3

[Figure: Profit and loss chart per chair, 1960 and 1970; vertical axis: percentage; sections for materials, wages and polishing, with profit shown above and loss below the base line]
A pie chart may also be used for this purpose.

2.8 GRAPHS
As already stated, diagrams are useful for representing spatial series. Diagrams fail when we want to represent in visual form a statistical series spread over a period of time, a frequency distribution, or two related variables. For such representations, graphs are employed. Graphs present the data in a simple, clear and effective manner, facilitate comparison between two or more statistical series, and help us to appreciate their significance readily. Another advantage of graphs is that they provide an overall picture of a statistical series. Graphs are also sometimes used to make predictions and forecasts, and certain partition values can be located graphically. But graphs are less accurate, as they do not give minute details. Moreover, they involve considerable expense and time.

Construction of Graphs. In the construction of a graph, the first step is to take a starting point, called the origin, in the left-hand bottom corner of the graph paper. Two straight lines perpendicular to each other are drawn through the origin. The horizontal line is called the X-axis and the vertical line is labelled the Y-axis or ordinate. The two lines together are known as the axes. Suitable scales are selected along the X-axis and the Y-axis. The independent variable is taken along the X-axis and the dependent variable along the Y-axis. Points are plotted and joined to get the required graph. While constructing a graph, the following points should be kept in mind:
i) A scale and the form of representation should be selected in such a way that the graph gives a true impression of the data to be represented.
ii) Every graph must have a clear and comprehensive title at the top. Where necessary, sub-titles should be added.
iii) The source of the data must be given. A key and footnotes should be provided where necessary.
iv) The independent variable should always be placed on the horizontal axis.
v) The vertical scale should always begin with zero, otherwise the graph will give a false impression. If, however, the first item of the data is quite large, a scale-break should be shown between zero and the next number.
vi) The horizontal axis does not have to begin with zero unless, of course, the independent variable or the lower limit of the first class interval is zero.
vii) The axes of the graph should be properly labelled. Labels should clearly state both the variable and the units, e.g. "Distance" and "Kilometres", "Sales" and "Rupees", etc.
viii) Curves, if more than one, must be clearly distinguished either by different colours or by different line styles (solid, dashed, dot-dashed).
ix) The graph should not be overloaded with too many curves.

Graphs can be divided into two main categories, namely:
a) Graphs of Time Series or Graphs of Historical Data, and
b) Graphs of Frequency Distributions.
The important graphs of frequency distributions are the Histogram, the Frequency Polygon, the Frequency Curve and the Cumulative Frequency Curve or Ogive.

2.8.1 Graph of Time Series: Historigram. A curve showing changes in the value of one or more items from one period of time to the next is known as the graph of a time series. This curve is also called a Historigram. Thus a historigram displays the variations in a time series dealing with prices, production, imports, population, etc.
To construct a historigram, time is taken along the X-axis and the values of the variable along the Y-axis. Points are plotted and then connected by straight line segments to get the historigram.

Example 2.15 The following table gives the number of cars produced in Germany during the years 1929-1936. Draw a suitable graph, i.e. a historigram, of the series.
Years:         1929   1930   1931   1932   1933   1934   1935   1936
No. of Cars:   98     14     68     50     99     172    245    302
The historigram is drawn for the data by taking years on the horizontal axis and the number of cars on the vertical axis, as below:
[Figure: Historigram of the number of cars produced in Germany, 1929 to 1936; horizontal axis: Year; vertical axis: No. of cars]

2.8.2 Histogram. A histogram consists of a set of adjacent rectangles whose bases are marked off by class boundaries (not class limits) on the X-axis and whose heights are proportional to the frequencies associated with the respective classes. The area of each rectangle represents the respective class frequency. This is one of the most important graphical representations of a frequency distribution. When the class-intervals are equal, the rectangles all have the same width and their heights directly represent the class frequencies; that is, they are numerically proportional to the frequencies in the respective classes. The following figure shows the histogram for the frequency distribution of Example 2.3.
[Figure: Histogram for the frequency distribution of annual death rates; horizontal axis: Class boundaries; vertical axis: Frequency]

If the class-intervals are not all equal, the heights of the rectangles have to be adjusted, because it is the area and not the height that measures the frequency. The height of a rectangle must be proportionally decreased if the length of its class-interval is increased. For example, if the length of a class-interval is doubled, then the height of the rectangle is to be halved, so that the area, being the fundamental property of the histogram, remains unchanged.
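The adjustment amounts to height = frequency / class width, so that each rectangle's area equals its class frequency. A minimal Python sketch, assuming the class boundaries and frequencies of Example 2.16 (the telephone operators); `adjusted_heights` is an illustrative name, not a library function.

```python
import math

# Class boundaries and frequencies of Example 2.16 (ages of telephone operators).
boundaries = [17.5, 19.5, 24.5, 29.5, 34.5, 44.5, 59.5]
frequencies = [9, 188, 160, 123, 84, 15]


def adjusted_heights(bounds, freqs):
    """Rectangle heights for unequal classes: frequency divided by class width.

    This keeps area = height x width equal to the class frequency.
    """
    widths = [upper - lower for lower, upper in zip(bounds, bounds[1:])]
    return widths, [f / w for f, w in zip(freqs, widths)]


widths, heights = adjusted_heights(boundaries, frequencies)
for lower, w, f, ht in zip(boundaries, widths, frequencies, heights):
    print(f"{lower:5.1f}-{lower + w:5.1f}  width {w:4.0f}  f = {f:3d}  height = {ht:5.1f}")

# The total area of the histogram equals the total frequency.
print(math.fsum(w * ht for w, ht in zip(widths, heights)))
```

Doubling a class width in this scheme halves its height, which is exactly the rescaling rule stated above.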
Rescaling in this way is necessary so that the correct pattern of the distribution is conveyed.
When the frequencies in a frequency distribution are given against the class-marks $x_i$ of equal class-intervals of width $h$, a histogram is constructed by marking off the class-marks on the X-axis and erecting a series of adjacent rectangles whose heights correspond to the respective class-frequencies and whose bases extend from $x_i - h/2$ to $x_i + h/2$ (i.e. half of the width on either side of $x_i$).
It is important to note that in the construction of a histogram, we assume that within any one class the values of the variable are evenly spread out between the class-boundaries. A histogram, which must not be confused with the historigram (graph of a time series), is useful in forming a rough idea of the overall pattern and shape of the frequency distribution.

Example 2.16 Construct a histogram for the following frequency distribution relating to the ages (to the nearest birthday) of telephone operators.
Age (Years)         18-19   20-24   25-29   30-34   35-44   45-59
No. of Operators    9       188     160     123     84      15
As the class-intervals are unequal, the height of each rectangle cannot be made equal to the frequency. The height of a rectangle is therefore calculated by dividing the frequency (the area) by the corresponding class-interval (the width). The necessary calculations and the histogram follow:
Class-boundaries   Class-interval (h)   Frequency   Proportional Heights
17.5-19.5          2                    9           9 ÷ 2 = 4.5
19.5-24.5          5                    188         188 ÷ 5 = 37.6
24.5-29.5          5                    160         160 ÷ 5 = 32.0
29.5-34.5          5                    123         123 ÷ 5 = 24.6
34.5-44.5          10                   84          84 ÷ 10 = 8.4
44.5-59.5          15                   15          15 ÷ 15 = 1.0
[Figure: Histogram for unequal class intervals]

2.8.3 Frequency Polygon. A frequency polygon is a graphic form of a frequency distribution, which is constructed by plotting the points $(x_i, f_i)$, where $x_i$ is the class-mark of the $i$th class and $f_i$ is the corresponding frequency, and then connecting them by straight line segments, provided the class-intervals are equal.
In the case of unequal class-intervals, the heights of the unequal classes are adjusted by the same technique that was used for the histogram. A frequency polygon can also be obtained by joining the mid-points of the tops of the successive rectangles in the histogram by straight line segments. The graph drawn in this way does not reach the horizontal axis. But a polygon, as we know, is a closed figure having many sides. It is therefore customary to add "extra" class-marks at both ends of the distribution with zero class frequencies, so that the polygon forms a closed figure with the horizontal axis. This should be done even if the curve ends in the negative part of the graph. The frequency polygon for the frequency distribution of weights in Example 2.2 is given below:
[Figure: Frequency polygon for the frequency distribution of weights; horizontal axis: Mid points; vertical axis: Frequency]
A frequency polygon, which can be used for comparing two or more data sets, gives roughly the position of the mode and some idea of the skewness and kurtosis of the curve (these terms are defined later).

2.8.4 Frequency Curve. When a frequency polygon or a histogram, constructed over class intervals made sufficiently small for a large number of observations, is smoothed, it approaches a continuous curve called a frequency curve. The concept of a frequency curve is of great importance in Statistics. Mathematically, the curve is represented by the relation $y = f(x)$ and has an important property concerning its area: the area under the curve represents the total frequency. The following graph represents the histogram and frequency curve for the frequency distribution of the mean annual death rates of Example 2.3.
[Figure: Histogram and frequency curve of the annual death rates; horizontal axis: Mid points]

2.8.5 Cumulative Frequency Polygon or Ogive.
A cumulative frequency polygon, popularly known as an Ogive (rhymes with "alive" and is pronounced o'jive), is a graph obtained by plotting the cumulative frequencies of a distribution against the upper or lower class boundaries, depending upon whether the cumulation is of the "less than" or the "more than" type, and joining the points by straight line segments. Because of its likeness to an architectural moulding called an ogee, a cumulative frequency polygon is called an Ogive.
An Ogive, when the cumulation is of the less-than type, is constructed by plotting the points $(x_i + h/2, F_i)$, where $x_i + h/2$ is the upper class-boundary of the $i$th class and $F_i$ is the cumulative frequency for the $i$th class, and connecting the successive points by straight line segments. The polygon should start from zero at the lower boundary of the first interval, i.e. the point $(x_1 - h/2, 0)$ is plotted and joined, and, to complete the polygon, the last point is also joined with the last upper class-boundary. In the case of unequal classes, we merely join the unequally spaced points.
[Figure: Cumulative frequency polygon (ogive) for the frequency distribution of weights of 60 apples; horizontal axis: Upper class boundaries; vertical axis: Cumulative frequency, 0 to 60]
If relative frequencies are used, the cumulative frequency polygon rises from 0 at the left to 1 at the right. A smoothed Ogive is called an Ogive curve, which is often used to locate partition values such as the median, quartiles, percentiles, etc. of a frequency distribution.
A percentage cumulative frequency polygon or curve may also be drawn by expressing the cumulative frequencies as percentages of the total frequency and then plotting these percentages against the upper class boundaries. This graphic device is useful for comparing two or more frequency distributions, as they are adjusted to a uniform standard.

2.8.6 Ogive for a Discrete Variable.
When a variable X is discrete, its cumulative frequency polygon consists of horizontal line segments between any two successive values and has a jump of height $f_i$ at each value $x_i$. In other words, the cumulative distribution increases only in jumps and is constant between jumps. For the purpose of illustration, the cumulative frequency polygon for the frequency distribution of assistants in Example 2.5 is shown below:
[Figure: Ogive for a discrete variable; horizontal axis: Number of assistants (0 to 10); vertical axis: Cumulative frequency]
This graph shows that the cumulative frequency polygon is stepped. Such a function is called a step-function.

2.8.7 Types of Frequency Curves. The frequency distributions occurring in practice usually belong to one of the following four types:
i) The Symmetrical Distributions. A frequency distribution or curve is said to be symmetrical if values equidistant from a central maximum have the same frequencies, i.e. the curve can be folded along the central maximum in such a way that the two halves of the curve coincide. The Normal curve is an important example of a symmetrical distribution.
[Figure: Symmetrical frequency curve]
ii) The Moderately Skewed or Asymmetrical Distributions. A frequency distribution or curve is said to be skewed when it departs from symmetry. Here the frequencies tend to pile up at one end or the other of the distribution or curve. This is the most common pattern encountered in practice.
[Figure: Moderately skewed frequency curve]
iii) The Extremely Skewed or J-shaped Distributions. Here the frequencies run up to a maximum at one end of the range, giving the shape of the letter J or its reverse. Many distributions in economic or medical statistics are extremely skewed.
[Figure: J-shaped frequency curve]
iv) The U-shaped Distributions. In such frequency distributions or curves, the maximum frequencies occur at both ends of the range and a minimum towards the centre, giving a shape more or less like the letter U. A distribution of this type is rare.
[Figure: U-shaped frequency curve]

2.8.8 Ratio Charts or Semi-logarithmic Graphs. In the ordinary types of graph, the scales used are called natural scales or arithmetic scales. These graphs can only be used to compare absolute changes in values, because ordinary graph paper, also known as arithmetic paper, is ruled so that equal intervals anywhere on the paper represent equal differences or amounts. More often we are interested in studying relative changes or ratios. Relative changes or ratios can be displayed and compared by the slope of a straight line when the logarithms of the values are plotted on arithmetic paper. In practice, the labour of looking up logarithms can be avoided by using another type of graph paper, called semi-logarithmic paper or ratio paper. Semi-logarithmic paper is constructed so that equal intervals on the vertical axis indicate equal ratios or rates of change, while equal intervals on the horizontal axis represent equal differences or amounts of change. Thus the essential feature of a semi-logarithmic chart is that one axis has a logarithmic scale and the other an arithmetic scale.
Graphs obtained by plotting the values on semi-logarithmic paper and joining the successive points by straight line segments are called Semi-logarithmic graphs or Ratio charts. They are generally used when
i) the relative rates of change are to be compared;
ii) visual comparisons are to be made between two or more series which differ widely in magnitude; and
iii) the data are to be examined to see whether they are characterized by a constant rate of change.
A ratio chart possesses the following characteristics:
i) There is no zero line on the logarithmic scale, as the logarithm of zero is not defined (log x tends to minus infinity as x approaches zero).
ii) A geometric progression, when plotted on semi-logarithmic paper, forms a straight line, as the logarithms of a geometric progression form an arithmetic progression.
iii) The slope of the curve on the logarithmic scale indicates the rate at which the variable is changing (i.e. increasing or decreasing).
iv) In the case of two or more curves, the curve having the steepest slope has the largest rate of change.
v) Equal slopes (in the case of parallel curves) indicate equal rates of change.

EXERCISES
OBJECTIVE
a) Answer 'True' or 'False'. If a statement is not true, replace the underlined words with words that make the statement true:
i) The term cross-sectional data refers to data that may change over time.
ii) The frequency distribution represents data in a condensed form.
iii) The data presented in an array does not allow us to locate the largest and smallest values in the data set.
iv) The classes in any frequency distribution are generally not mutually exclusive.
v) For nominally or ordinally scaled data, the frequency distribution cannot be constructed.
vi) Frequency distribution of continuous data may be represented diagrammatically.
vii) Frequency distribution can be presented graphically by using both histogram and historigram.
viii) Time series data can be tabulated using frequency distributions.
ix) Simple bar diagram is used for two-dimensional comparisons.
x) The width of a bar in a histogram represents the frequency rather than the value of a variable.
xi) Class marks are the lower limits of each class.
xii) The lower class limit is the middle possible data value for a class.
xiii) The sum of relative frequencies in a relative frequency distribution should always equal 100.
xiv) A pie chart can be used to display quantitative data.
xv) The shape of the frequency distribution and the relative frequency distribution will always be different.
b) MULTIPLE CHOICE QUESTIONS
i) Which of the following is not an example of condensed data?
a) frequency distribution b) data array c) histogram d) polygon
ii) In the construction of a frequency distribution, the steps are to:
a) decide the number of classes b) arrange the data in ascending / descending order c) locate the smallest and largest values in a data set d) all of the above
iii) The number of classes in a frequency distribution generally should be:
a) less than five b) more than twenty c) between five and twenty d) between ten and twenty
iv) As the number of observations and classes increases, the shape of the frequency polygon:
a) remains the same b) tends to become smooth c) becomes more erratic d) none of them
v) A cumulative frequency distribution is graphically represented by:
a) frequency curve b) frequency polygon c) pie chart d) ogive
vi) A relative frequency distribution presents frequencies in terms of:
a) whole numbers b) percentages c) fractions d) all of the above
vii) A diagram that presents proportions that look like slices of a pizza is known as:
a) a bar diagram b) a component bar diagram c) a histogram d) a pie diagram
viii) Observed data organized into tabular form is called:
a) a bar chart b) a pie chart c) a frequency polygon d) a frequency distribution
ix) The number of occurrences of a data value is called:
a) the frequency b) the cumulative frequency c) the relative frequency d) all of the above
x) In the following stem-and-leaf diagram:
Stem | Leaf
3 | 2 3
4 | 1 2 2 2 3
5 | 1 1 3 5 5 5 5 6
6 | 4 5 6 7
7 | 2 8
8 | 6
the number that occurred the most is:
a) 2 b) 55 c) 42 d) 5

SUBJECTIVE

2.1 Explain what is meant by classification. What are its basic principles?

2.2 Define the terms “Classification” and “Tabulation”. Outline the main steps in tabulation. What do you mean by captions, stubs, title and prefatory notes? (P.U., B.A/B.Sc. 1983)

2.3 What is a statistical table? What are different types of tables? Explain the different parts of a table and the main points to be kept in mind in their construction. (P.U., B.A/B.Sc. 1970)

2.4 Represent the data given in the following paragraph in the form of a table, so as to bring out clearly all the facts, indicating the source and bearing a suitable title:
“According to the census of Manufacturers Report 1945, the John Smith Manufacturing Company employed 400 non-union and 1,250 union employees in 1941. Of these 220 were females of which 140 were non-union. In 1942, the number of union employees increased to 1,475 of which 1,300 were males. Of the 250 non-union employees 200 were males. In 1943, 1,700 employees were union members and 50 were non-union. Of all the employees in 1943, 250 were females of which 240 were union members. In 1944, the total number of employees was 2,000 of which one percent were non-union. Of all the employees in 1944, 300 were females of which only 5 were non-union.”

2.5 a) Write short notes on: Class-Interval, Class limits, Class Marks, Size of Class-Interval and Class-frequency, Sturges' rule.
b) Determine class boundaries, class limits and class marks for the first and last classes in respect of the following:
i) Weights of 300 entering freshmen ranged from 98 to 226 pounds, correct to the nearest pound.
ii) The thickness of 460 washers ranged from 0.421 to 0.563 inches.
c) A sample consists of 34 observations, each recorded as correct to the nearest integer, ranging in value from 201 to 337. If it is decided to use seven classes of width 20 units and to begin the first one at 199.5, find the class boundaries, limits and marks of the seven classes. (LU, M.A. Econ., 1992)

2.6 a) What is meant by a frequency distribution? Describe briefly the main steps in the preparation of a frequency table from raw data. (P.U., B.A/B.Sc. 1974)
b) Prepare a frequency table for the price data given below, taking 5 units as the width of class-interval.
100 96 «992 «88 86 84 82 80 78 91 37 «83797775787 SBS 73 50 57 55 53 S148 469 55 51 49 47 «45 B58 S450 56 44 42 40 38 36465350 a3

2.7 a) Why are frequency distributions constructed? What are the rules to be observed in making a frequency distribution from ungrouped data?
b) A record was made of the number of absences per day from a factory over 35 days with the following results:
Absentees (x): 0 1 2 3 4 5 6
No. of days (f): 5 7 916 4 241
i) On how many days were there fewer than 4 people absent?
ii) On how many days were there at least 4 people absent?
iii) What is the total number of absences over the whole 35 days? (M.A. Econ., II Semester, 1980)

2.8 a) Describe the steps you would take to construct a frequency distribution.
b) Tabulate the following marks in a grouped frequency distribution.
74 49 103 95° 59 62 96 «82 71 104 96 84 66 96 83 57 90 65 62 60 18 85 58 Si. 52 105 66 4 88 101 16 91 96 83 100 80 54 120 121 92 72 56 64 110 99 52 76 84 75 55 99 104 88 64 63 95 97 85 B 3

2.9 The following data give the index numbers of 100 commodities in a certain year. Form a grouped frequency distribution, taking 5 as the class-interval.
a1 99 63 7 81 90 129 100 101 75 124 99 121 177 127 142 87 117 120 100 109 114 122 110 86 105 113 73 112 113 129 100 1 95 153 94 110 134 138 114 141 88 119 96 79 125 144 108 140 109 107 108 7 96 129 96 o1 102 103 116 86 87 127 86 151 99 106 123 96 109 76 118 101 101 61 133 106 119 89 143 138145 10497 106 119 13678 116 104 111 mn 68 79 147 113 us 104 83 128

2.10 Arrange the data given below in an array and construct a frequency distribution, using a class interval of 5.00. Indicate the class boundaries and class limits clearly.
79.4 75.2 88.1 68.3 71.6 B19 778 48.6 on5 68.9 69.4 83.5 73.0 74.2 83.2 70.8 74.2 80.7 82.7 721 81.8 65.7 BB 16 90.6 67.6 64.2 59.4 55.9 82.9 63.9 771.6
(B.I.S.E. Lahore, 1972)

2.11 The following figures give the number of children born to 50 women:
2 6 1 5 4 3 3 8 3 1 4 3°93 0 5 2 1 4 3 3 5 3 3 6 3 3 2 2 7 3 1 4 2 4 4 4 6 8 0 7 7 °5 6 §$ 3 2 3 9 2 2
Construct an ungrouped frequency distribution of these data.

2.12 Count the number of letters in each word of the following passage, hyphenated words, if any, being treated as single words, and make a frequency distribution of word length.
“To forgive an injury is often considered to be a sign of weakness; it is really a sign of strength. It is easy to allow oneself to be carried away by resentment and hate into an act of vengeance; but it takes a strong character to restrain those natural passions. The man who forgives an injury proves himself to be the superior of the man who wronged him, and puts the wrong-doer to shame. Forgiveness may even turn a foe into a friend. So mercy is the noblest form of revenge.”

2.13 The weights of 50 football players are listed below:
193 240 217 283 268 212 251 263 275 208
230 288 259 225 252 236 243 247 280 234
250 236 277 218 245 268 231 269 224 259
258 231 255 228 202 245 246 271 249 255
265 235 243 219 255 245 238 257 254 284
Make a stem-and-leaf display for the data and convert it to a frequency table with 10 classes, beginning with 190.

2.14 Make a stem-and-leaf table for the following data. Using 8.0 as the lower limit of the first class and with a width of 1 unit, convert it to a frequency distribution.
90 102 113 121 107 138 108 11.6 13.6 16.4 11.0 15.8 93 13.7 7 110 80-120 1S 7116 101 141 100 99 «134-157 115 123° 98 130 91 83 129 140 105 132 105 106 125 151 128 104 112 93 117 177 139 169 13.4 118 16.8 14.2 118 9.6 19 87 147 109 179 1151474159 18 106 126 126 157 149 99

2.15 Describe the advantages and disadvantages of diagrammatic representation. Describe one of the important types of diagrams.

2.16 Describe each of the diagrams listed below and give an illustration in each case: Bar diagrams; Multiple bar diagrams; Pie-diagrams; Pictograms; and Profit and Loss charts. (P.U., B.A/B.Sc. 1974)

2.17 Give a description of various graphic and pictorial aids for representing data. Mention particular uses of some methods. (P.U., B.A/B.Sc. 1961)

2.18 Describe briefly the different types of diagrams generally used for presenting statistical data. State advantages and disadvantages of any three of them, giving illustrations where possible.

2.19 Represent the following yield per acre data by a bar diagram.
Years: 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949
Yield per acre: 5 7 9 6 10 12 8 MW 12 10

2.20 The following table gives the birth rates and death rates per thousand of a few countries. Represent them by multiple bar charts.
Country | Birth Rate | Death Rate
India | 33 | 24
Japan | 32 | 19
Germany | 16 | 10
Egypt | 44 | 24
Australia | 20 | 9
New Zealand | 18 | 8
France | 21 | 16
Russia | 38 | 16

2.21 Represent the following data by a rectangular diagram showing the percentage of income spent by two families on different items of expenditure.
Family budgets of two families
Items of Expenditure | Family A (Income Rs. 80): Actual Expenses | Family B (Income Rs. 40): Actual Expenses
Food | Rs. 32 | Rs. 20
Clothing | Rs. 20 | Rs. 8
Shelter | Rs. 8 | Rs. 4
Fuel and Light | Rs. 4 | Rs. 2
Miscellaneous | Rs. 16 | Rs. 6
Total | Rs. 80 | Rs. 40

2.22 The following table gives the details of monthly expenditure of three families. Represent the data by a suitable diagram on a percentage basis.
Items of Expenditure | Family A (Rs.) | Family B (Rs.) | Family C (Rs.)
Food Articles | 43 | 87 | 120
Clothing | 18 | 17 |
Recreation | 3 | 10 | 12
Education | 5 | 9 | 15
Rent | 10 | 21 | 17
Miscellaneous | 6 | 15 |

2.23 Represent the following data by means of a pictogram:
(a)
Industry | No. of Employees ('000)
Marine | 96
Forest | 187
Mineral | 290
Farm | 635
(b)
Year | Production of Vans
1982 | 2040
1983 | 2996
1984 | 4319
1985 | 6324

2.24 a) Draw a pie-diagram and also a component bar-diagram for the following data:
Item | Expenditure in Rs.
Food | 190
Clothing | 64
Rent | 100
Medical care | 46
Other items | 80
b) Graph the following data showing the areas in millions of square miles of the oceans of the world, using (i) a bar chart, (ii) a pie chart.
Ocean | Pacific | Atlantic | Indian | Antarctic | Arctic
Area | 708 | 412 | 285 | 16 | 48

2.25 a) The area sown in Rabi crops is as follows. Prepare a pie-chart.
Wheat | 106 lakh acres
Gram | 30 lakh acres
Barley | 15 lakh acres
Pulses | 10 lakh acres
Fodder | 25 lakh acres
Other crops | 14 lakh acres
b) Calculate the percent contribution of each crop to the total Rabi crops. (P.U., B.A/B.Sc., 1968)
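The stem-and-leaf and grouped-frequency procedures that several of these exercises call for can be sketched in Python. The sketch below uses the weights of the 50 football players given in the exercises, takes the last digit of each weight as the leaf, and builds 10 classes of width 10 beginning at 190. It assumes the weights are recorded to the nearest pound, so each class boundary lies 0.5 beyond the adjacent class limit. The function names `stem_and_leaf` and `frequency_table` are illustrative only and are not taken from the text.

```python
from collections import defaultdict

# Weights (pounds) of the 50 football players listed in the exercises.
weights = [
    193, 240, 217, 283, 268, 212, 251, 263, 275, 208,
    230, 288, 259, 225, 252, 236, 243, 247, 280, 234,
    250, 236, 277, 218, 245, 268, 231, 269, 224, 259,
    258, 231, 255, 228, 202, 245, 246, 271, 249, 255,
    265, 235, 243, 219, 255, 245, 238, 257, 254, 284,
]


def stem_and_leaf(data):
    """Map each stem (all digits but the last) to its sorted leaves (last digit)."""
    stems = defaultdict(list)
    for x in sorted(data):          # sorting first gives ordered stems and leaves
        stems[x // 10].append(x % 10)
    return dict(stems)


def frequency_table(data, start, width, k):
    """Return one row per class:
    (lower limit, upper limit, lower boundary, upper boundary,
     class mark, frequency, relative frequency, cumulative frequency).

    Assumption: whole-number data, so boundaries are limits -/+ 0.5.
    """
    rows = []
    cumulative = 0
    n = len(data)
    for i in range(k):
        lower = start + i * width
        upper = lower + width - 1
        f = sum(lower <= x <= upper for x in data)   # count values in the class
        cumulative += f
        rows.append((lower, upper, lower - 0.5, upper + 0.5,
                     (lower + upper) / 2, f, f / n, cumulative))
    return rows


if __name__ == "__main__":
    for stem, leaves in stem_and_leaf(weights).items():
        print(f"{stem:>3} | {' '.join(map(str, leaves))}")
    print()
    print("Limits   Boundaries     Mark    f  rel.f  cum.f")
    for lo, hi, lb, ub, mark, f, rel, cum in frequency_table(weights, 190, 10, 10):
        print(f"{lo}-{hi}  {lb}-{ub}  {mark:6.1f}  {f:3d}  {rel:.2f}  {cum:3d}")
```

Because every class is 10 units wide and starts at a multiple of 10, each stem of the display corresponds to exactly one class of the table. The class frequencies come out as 1, 2, 4, 3, 8, 9, 11, 5, 3 and 4, which sum to 50, and the relative frequencies sum to 1.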