STA2122 Test 1: Statistics Exam Questions
STA2122 Test 1: Statistics Exam Questions
Distinguishing between a statistic and a parameter is critical because they pertain to different scopes. A statistic is a measure derived from a sample, while a parameter refers to the entire population. For instance, the average score computed for thirty 13-year-old girls from a study on TV show influence is a statistic, providing insight into the sample's behavior, rather than the 13-year-old population as a whole (parameter), limiting its generalizability .
To minimize the influence of outliers in salary data, the median would serve as the most appropriate measure of central tendency. Unlike the mean, which can be significantly affected by outliers, the median represents the middle value of a dataset and provides a better central value in skewed distributions, such as salary distributions with superstars .
Variance measures how spread out a set of data is from the mean. A higher variance indicates more dispersion among the data points. When comparing two datasets, differing variances suggest different levels of variability; a dataset with a variance of 4.33 (as in the sample with scores 5, 0, 2, 1) represents greater inconsistency compared to one with a lower variance. Variance is essential for identifying data with significant deviations from the mean, affecting statistical analyses like hypothesis testing and predictive modeling .
To calculate a z-score, subtract the mean of the population from the individual's score and then divide by the standard deviation. For example, with a population mean of 100 and a standard deviation of 10, a score of 60 yields a z-score of (60 - 100) / 10 = -4. This result signifies that the score is 4 standard deviations below the mean, indicating it is significantly lower than the average in that population .
Right-skewed distributions, where most data points cluster at the lower end with a long tail extending to higher values, can skew means higher than medians. This affects statistical inference by potentially misleading perceptions of central tendency if means are relied upon without context. Decision-making, such as policy adjustments, might need to incorporate median values or identify outliers to ensure that decisions truly reflect typical experiences, avoiding distortions from extreme high values .
Standardizing a dataset alters its scale to have a specified mean and standard deviation, often 0 and 1. This transformation allows comparison between datasets by converting individual scores into z-scores. A score of 56 in a test scaled with mean 47 and SD 6 would transform into a higher standardized score when the new mean and SD are 75 and 10. Standardization maintains relative positions within the dataset but changes the numerical representation, such that the recalculated score would be 81 .
The choice of scale affects the types of analyses that can be performed. An ordinal scale, which provides ordered categories like letter grades, allows for ranking comparisons but not precise arithmetic operations. On the other hand, a nominal scale categorizes data without an intrinsic order, suitable for frequency counts and mode determination. Using an ordinal scale in evaluating student essays enables comparative assessment through ranks, while a nominal scale limits the inferential statistics applicable .
The statistical implications of a population having a mean of 10, a median of 15, and a mode of 25 suggest a left-skewed distribution. In such distributions, the mean lies to the left of the median, and the mode is the furthest right, indicating that most data values cluster on the high end, with a long tail on the lower end .
A pie graph is significant for visualizing categorical data because it represents proportions of the whole, giving a clear view of distribution among categories. It is particularly useful when the categories are few, and distinctions in distribution percentages are more insightful. For example, in displaying the distribution of academic majors at a college basketball game, a pie graph can effectively convey each major's representation as a slice of the entirety, enhancing comparability visually .
In a sequence of independent events, past outcomes do not affect the probabilities of future events. For example, in drawing balls from a box, if the first four balls drawn are red, the probability of drawing a fifth red ball remains based on the remaining balls in the box, independent of previous draws. The probability is recalculated based on the updated total number of balls; if the first four were red, the 5th ball probability depends on the new count, as shown by 96/166 if replacing each ball .