Frequency Distributions and Graphs
Descriptive Statistics
The goal of descriptive statistics is to summarize a collection of data in a clear and understandable way.
What is the pattern of scores over the range of possible values? Where, on the scale of possible scores, is a point that best represents the set of scores? Do the scores cluster about their central point or do they spread out around it?
Bases of classification
Geographical
Chronological
Qualitative
Quantitative
Classification of Data
Condenses the data
Facilitates comparison
Supports the study of relationships
Aids analysis of the data
Tabulation
Systematic arrangement of the collected data in rows and columns according to certain characteristics.
Example
In a sample study about coffee habits in two towns, the following information is given. Town A: females were 40%, total coffee drinkers were 45%, and male non-coffee drinkers were 20%. Town B: males were 55%, male non-coffee drinkers were 30%, and female coffee drinkers were 15%. Present the data in tabular form.
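One way to work this example is to fill in each town's table by subtraction from the given figures. The sketch below assumes every figure is a percentage of that town's sample (including Town A's females, read as 40%). The function name `complete_table` and its parameters are illustrative only.

```python
def complete_table(males, male_drinkers, female_drinkers):
    """Build a sex-by-habit table; every figure is a percentage of the town's sample."""
    females = 100 - males
    rows = {
        "Males": (male_drinkers, males - male_drinkers),
        "Females": (female_drinkers, females - female_drinkers),
    }
    rows["Total"] = (
        male_drinkers + female_drinkers,
        100 - male_drinkers - female_drinkers,
    )
    return {
        label: {"Drinkers": d, "Non-drinkers": n, "Total": d + n}
        for label, (d, n) in rows.items()
    }

# Town A: females 40 -> males 60; male non-drinkers 20 -> male drinkers 40;
# total drinkers 45 -> female drinkers 5.
town_a = complete_table(males=60, male_drinkers=40, female_drinkers=5)
# Town B: males 55; male non-drinkers 30 -> male drinkers 25; female drinkers 15.
town_b = complete_table(males=55, male_drinkers=25, female_drinkers=15)

for name, table in (("Town A", town_a), ("Town B", town_b)):
    print(name)
    print(f"{'':<10}{'Drinkers':>10}{'Non-drinkers':>14}{'Total':>8}")
    for label, cells in table.items():
        print(f"{label:<10}{cells['Drinkers']:>10}{cells['Non-drinkers']:>14}{cells['Total']:>8}")
```

Each missing cell follows from a row or column total, so three given figures per town are enough to complete the table.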
Example-2
In 2002, out of a total of 4000 workers in a factory, 3300 were members of a trade union. The number of women workers was 500, of which 400 did not belong to the union. In 2001, the number of workers in the union was 3450, of which 3200 were men. The number of workers not belonging to the union was 760, of which 330 were women. Tabulate the data.
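The same subtraction approach can be sketched for this example, with one table per year classified by union membership and sex. The helper `membership_table` is a hypothetical name used only for illustration.

```python
def membership_table(union_men, union_women, non_union_men, non_union_women):
    """Return rows of (label, men, women, total) for one year."""
    rows = [
        ("Members", union_men, union_women),
        ("Non-members", non_union_men, non_union_women),
        ("Total", union_men + non_union_men, union_women + non_union_women),
    ]
    return [(label, men, women, men + women) for label, men, women in rows]

# 2002: 4000 workers, 3300 union members, 500 women, 400 non-union women.
total, union, women, non_union_women = 4000, 3300, 500, 400
union_women = women - non_union_women      # 100
non_union = total - union                  # 700
table_2002 = membership_table(
    union - union_women, union_women, non_union - non_union_women, non_union_women
)

# 2001: 3450 union members (3200 men), 760 non-members (330 women).
table_2001 = membership_table(3200, 3450 - 3200, 760 - 330, 330)

for year, table in ((2002, table_2002), (2001, table_2001)):
    print(year)
    print(f"{'':<12}{'Men':>6}{'Women':>7}{'Total':>7}")
    for label, men, women_count, row_total in table:
        print(f"{label:<12}{men:>6}{women_count:>7}{row_total:>7}")
```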
Frequency Distribution
Frequency distributions organize raw data or observations that have been collected.
Ungrouped Data
Listing all possible scores that occur in a distribution and then indicating how often each score occurs.
Grouped Data
Combining all possible scores into classes and then indicating how often scores occur within each class. This makes patterns in the data easier to see, but information about individual scores is lost.
An Example: Grouped Frequency Distribution
Find the lowest and highest scores (order the scores from lowest to highest). Number of observations: N = 30. Highest score: 2540. Lowest score: 2365.
Choose the number of classes k. Sturges' rule, k = 1 + 3.322 log10 N, gives about 5.9 for N = 30; this example uses k = 5.
Find the range by subtracting the lowest score from the highest score: 2540 - 2365 = 175.
Divide the range by the number of classes (k): 175 / 5 = 35.
Round off to the nearest convenient width: 35.
Take-home salary rates (N = 30):
2482 2446 2482 2460 2390
2392 2540 2394 2425 2460
2499 2394 2450 2500 2422
2412 2365 2444 2390 2500
2440 2412 2440 2414 2470
2444 2458 2494 2365 2428
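The class-width arithmetic for these 30 scores can be checked with a short sketch. It evaluates Sturges' rule in its usual form, k = 1 + 3.322 log10 N, only for comparison; the width itself uses the k = 5 chosen in the example. Variable names are illustrative.

```python
import math

scores = [
    2482, 2446, 2482, 2460, 2390,
    2392, 2540, 2394, 2425, 2460,
    2499, 2394, 2450, 2500, 2422,
    2412, 2365, 2444, 2390, 2500,
    2440, 2412, 2440, 2414, 2470,
    2444, 2458, 2494, 2365, 2428,
]

n = len(scores)
lowest, highest = min(scores), max(scores)

# Sturges' rule suggests a number of classes; it is a guide, not a requirement.
sturges_k = 1 + 3.322 * math.log10(n)

k = 5                            # number of classes used in the example
data_range = highest - lowest    # range = highest score - lowest score
width = data_range / k           # class width before rounding

print(f"N = {n}, lowest = {lowest}, highest = {highest}")
print(f"Sturges' rule suggests k of about {sturges_k:.1f}")
print(f"range = {data_range}, width = {width:g}")
```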
Record the limits of all class intervals, starting with the interval that contains the lowest score. Count up the number of scores in each interval.
Class Interval   Frequency
2365-2400        6
2400-2435        7
2435-2470        10
2470-2505        6
2505-2540        1
Total            30
Frequency Table Guidelines
Intervals should not overlap, so no score can belong to more than one interval. Make all intervals the same width. Make the intervals continuous throughout the distribution (even if an interval is empty). Choose a convenient interval width.
Proportion (Relative Frequency)
Divide frequency of each class by total frequency.
Class Interval   Frequency   Proportion
2365-2400        6           0.20
2400-2435        7           0.23
2435-2470        10          0.33
2470-2505        6           0.20
2505-2540        1           0.03
Total            30          1
Used when you want to compare the frequencies of one distribution with another when the total number of data points is different.
Occupations, 1992 (in hundreds)
          Engineers   Doctors   Lawyers   Other   Total
Males     163         32        37        15      247
Females   24          22        9         6       61
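A small sketch shows why proportions help here: the two groups have very different totals (247 versus 61), so raw frequencies are hard to compare directly. The helper name `proportions` is illustrative.

```python
males = {"Engineers": 163, "Doctors": 32, "Lawyers": 37, "Other": 15}
females = {"Engineers": 24, "Doctors": 22, "Lawyers": 9, "Other": 6}

def proportions(frequencies):
    """Divide each frequency by the total frequency of its own distribution."""
    total = sum(frequencies.values())
    return {category: count / total for category, count in frequencies.items()}

male_prop = proportions(males)
female_prop = proportions(females)

print(f"{'':<12}{'Males':>8}{'Females':>9}")
for category in males:
    print(f"{category:<12}{male_prop[category]:>8.2f}{female_prop[category]:>9.2f}")
```

For example, there are more male than female doctors in absolute terms (32 versus 22), but doctors make up a larger share of the female distribution than of the male one.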
Percentage
Multiply each proportion by 100.
Class Interval   Frequency   Proportion   Percentage
2365-2400        6           0.20         20.00
2400-2435        7           0.23         23.33
2435-2470        10          0.33         33.33
2470-2505        6           0.20         20.00
2505-2540        1           0.03         3.33
Total            30          1            100
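The proportion and percentage columns follow directly from the frequency column. A minimal sketch, taking the class frequencies from the table as given:

```python
classes = ["2365-2400", "2400-2435", "2435-2470", "2470-2505", "2505-2540"]
frequencies = [6, 7, 10, 6, 1]

total = sum(frequencies)
proportions = [f / total for f in frequencies]   # frequency / total frequency
percentages = [100 * p for p in proportions]     # proportion * 100

print(f"{'Class Interval':<16}{'Frequency':>10}{'Proportion':>12}{'Percentage':>12}")
for c, f, p, pct in zip(classes, frequencies, proportions, percentages):
    print(f"{c:<16}{f:>10}{p:>12.2f}{pct:>12.2f}")
print(f"{'Total':<16}{total:>10}{sum(proportions):>12.2f}{sum(percentages):>12.2f}")
```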
Cumulative Frequency
Shows total number of observations in each class and all lower classes.
Class Interval   Frequency   Proportion   Percentage   Cumulative Frequency
2365-2400        6           0.20         20.00        6
2400-2435        7           0.23         23.33        13
2435-2470        10          0.33         33.33        23
2470-2505        6           0.20         20.00        29
2505-2540        1           0.03         3.33         30
Total            30          1            100
Cumulative Proportion (Cumulative Relative Frequency): divide the cumulative frequency by the total frequency.
Percentile Rank: multiply the cumulative proportion by 100.
Class Interval   Frequency   Proportion   Percentage   Cumulative Frequency   Cumulative Proportion   Percentile
2365-2400        6           0.20         20.00        6                      0.20                    20.00
2400-2435        7           0.23         23.33        13                     0.43                    43.33
2435-2470        10          0.33         33.33        23                     0.77                    76.67
2470-2505        6           0.20         20.00        29                     0.97                    96.67
2505-2540        1           0.03         3.33         30                     1.00                    100.00
Total            30          1            100
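The cumulative columns can be sketched as a running total of the frequencies, again taking the class frequencies from the table as given. `itertools.accumulate` is one convenient way to do this in Python; a plain loop would work equally well.

```python
from itertools import accumulate

classes = ["2365-2400", "2400-2435", "2435-2470", "2470-2505", "2505-2540"]
frequencies = [6, 7, 10, 6, 1]
total = sum(frequencies)

# Running total: each class plus all lower classes.
cumulative = list(accumulate(frequencies))
cumulative_proportion = [c / total for c in cumulative]
percentile_rank = [100 * p for p in cumulative_proportion]

print(f"{'Class Interval':<16}{'Cum. Freq.':>12}{'Cum. Prop.':>12}{'Percentile':>12}")
for c, cf, cp, pr in zip(classes, cumulative, cumulative_proportion, percentile_rank):
    print(f"{c:<16}{cf:>12}{cp:>12.2f}{pr:>12.2f}")
```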
Summarize these stock prices in the form of a frequency distribution.
67 34 36 48 49 31 61 34
43 36 45 47 46 43
45 50 49 30 43 60
38 46 48 50 34 39
32 30 41 28 62
27 40 53 35 69
61 32 36 35 50
29 30 37 38 28
47 33 47 36 44
Convert the distribution into percentage frequency and cumulative frequency distributions.
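One possible way to approach this exercise is sketched below. The class width of 10 and the classes 20-29 through 60-69 are assumptions chosen for convenience, not the only valid answer, and the prices are taken exactly as listed above.

```python
from itertools import accumulate

prices = [
    67, 34, 36, 48, 49, 31, 61, 34,
    43, 36, 45, 47, 46, 43,
    45, 50, 49, 30, 43, 60,
    38, 46, 48, 50, 34, 39,
    32, 30, 41, 28, 62,
    27, 40, 53, 35, 69,
    61, 32, 36, 35, 50,
    29, 30, 37, 38, 28,
    47, 33, 47, 36, 44,
]

start, width, n_classes = 20, 10, 5   # assumed classes: 20-29, 30-39, ..., 60-69
labels = [f"{start + i * width}-{start + (i + 1) * width - 1}" for i in range(n_classes)]

# Tally each price into its class.
frequencies = [0] * n_classes
for price in prices:
    frequencies[(price - start) // width] += 1

total = len(prices)
percentages = [100 * f / total for f in frequencies]
cumulative = list(accumulate(frequencies))

print(f"{'Class':<8}{'Frequency':>10}{'Percentage':>12}{'Cum. Freq.':>12}")
for label, f, pct, cf in zip(labels, frequencies, percentages, cumulative):
    print(f"{label:<8}{f:>10}{pct:>12.2f}{cf:>12}")
```

Because the class limits are whole numbers that do not overlap, every price falls into exactly one class.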
What is the pattern of scores?
Graphs often make it easier to see certain characteristics and trends in a set of data.
Graphs for quantitative data: histogram, frequency polygon, stem-and-leaf display.
Graphs for qualitative data: bar chart, pie chart.
[Figure: histogram, "Mumbai Hotel Rates". Axes: Rates (classes 0-99 through 800-899) and Frequency.]
Histogram
Consists of a number of bars placed side by side.
The width of each bar indicates the interval size. The height of each bar indicates the frequency of the interval. There are no gaps between adjacent bars, reflecting the continuous nature of quantitative data.
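As a rough plain-text sketch of these ideas, the grouped frequency table from the earlier example can be drawn with one `#` per observation. The bars run horizontally here, so bar length stands in for bar height, and consecutive rows are printed with no blank lines to mimic adjacent bars. The function name `text_histogram` is hypothetical.

```python
def text_histogram(classes, frequencies, symbol="#"):
    """Return one text line per class: label, then a bar whose length is the frequency."""
    label_width = max(len(c) for c in classes)
    return [
        f"{c:<{label_width}} | {symbol * f} ({f})"
        for c, f in zip(classes, frequencies)
    ]

classes = ["2365-2400", "2400-2435", "2435-2470", "2470-2505", "2505-2540"]
frequencies = [6, 7, 10, 6, 1]

lines = text_histogram(classes, frequencies)
print("Salary rates (frequency per class)")
for line in lines:
    print(line)
```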
Graph Guidelines
Include a descriptive title for the graph. Label each axis.
The independent variable is on the X axis. The dependent variable (or frequency) is on the Y axis.
The numbers along the Y axis indicate the measurement increments.
[Figure: histogram, "Temperature and Aggression".]
Shapes of Histograms
Skewed Distributions
Often occur when what is being measured has some upper or lower limit.
Negatively skewed (skewed to the left): may reflect a ceiling effect (you can't score any higher).
Positively skewed (skewed to the right): may reflect a floor effect (you can't score any lower).
Bar Graph
A graphical representation of qualitative data. Unlike in a histogram, the bars do not touch, reflecting the discontinuous nature of qualitative data.
What makes a good graph?
Complex ideas communicated with clarity, precision, and efficiency. Gives the most information in the shortest time using the least amount of ink and space. Physical differences measured on the graph are proportional to the numerical differences in the data. Clear, detailed, and thorough labeling. The scale is consistent.
Housing Complex
The welfare committee of a large housing complex wants to assess the possibility of appointing private security guards at the entrance gate of the complex for 24-hour duty. There are 810 flats in the housing complex, and the owners were asked to vote for or against the proposal. The following data were collected.
Should the guards be appointed?
Response      Frequency
Yes           194
No            121
Not Sure      73
No Response   422
Convert the data to percentages and construct a bar chart.
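A minimal sketch of the conversion and a plain-text bar chart follows. The scale of one `#` per two percentage points is an arbitrary choice for display, and the blank line printed after each bar is a stand-in for the gaps between bars in a bar graph.

```python
responses = {"Yes": 194, "No": 121, "Not Sure": 73, "No Response": 422}

total = sum(responses.values())   # should equal the 810 flats
percentages = {answer: 100 * count / total for answer, count in responses.items()}

print("Should the guards be appointed? (percentage of flats)")
for answer, pct in percentages.items():
    bar = "#" * round(pct / 2)    # display scale: one '#' per 2 percentage points
    print(f"{answer:<12} {bar} {pct:.1f}%")
    print()                       # gap between bars: the categories are separate
```

Expressed this way, the largest category is the owners who did not respond, which matters when interpreting the Yes and No shares.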