
M. A. PSYCHOLOGY (MAPC)

MPC-006 : STATISTICS IN PSYCHOLOGY

MPC 006

June, 2024
OFFICIAL SOLVED QUESTION PAPER

Section—A

Note : Answer any two of the following in about 400 words each :

1. Discuss the assumptions of parametric and nonparametric statistics.

Parametric Statistics:

Parametric statistics refer to statistical techniques that make certain assumptions about the parameters of the population distribution from which the samples are drawn. These methods are generally more powerful when their assumptions are met.

Key Assumptions of Parametric Statistics:

1.​ Normality:​
The most critical assumption in parametric tests is that the data (especially the
residuals or errors) follow a normal distribution. This assumption is especially
important for smaller sample sizes. For large samples (typically n > 30), the
Central Limit Theorem helps mitigate violations of normality.
2.​ Homogeneity of Variance (Homoscedasticity):​
Parametric tests assume that different samples or groups have similar variances.
This is essential in tests like ANOVA and t-tests where comparisons are made
across groups.

3. Interval or Ratio Scale of Measurement:
The variables analyzed should be measured on an interval or ratio scale. Parametric tests are not suitable for ordinal or nominal data.
4.​ Independence of Observations:​
The observations should be independent of one another. The behavior or value
of one observation should not influence or be influenced by another.
5.​ Linearity (in regression analysis):​
There should be a linear relationship between the independent and dependent
variables in models such as linear regression.
6.​ Additivity:​
In some models, especially multiple regression, it is assumed that the effects of
independent variables on the dependent variable are additive.

Common Parametric Tests:

● t-tests (independent and paired)
● ANOVA
● Pearson correlation
● Linear regression

Nonparametric Statistics:

Nonparametric statistics do not rely on assumptions about the shape or parameters of the population distribution. These methods are used when the assumptions of parametric statistics are violated or when working with ordinal or nominal data.

Key Characteristics and Assumptions of Nonparametric Statistics:

1. No Assumption of Normality:
Nonparametric tests do not require the data to be normally distributed, making them suitable for skewed distributions or ordinal data.
2.​ Ordinal or Nominal Data Allowed:​
These tests can handle data on ordinal or nominal scales, where parametric tests
are inappropriate.
3.​ Fewer Assumptions about Variance:​
Nonparametric tests do not assume homogeneity of variance, making them more
robust to heteroscedasticity.

4. Independence of Observations:
Like parametric tests, nonparametric methods also assume that observations are independent unless using repeated measures techniques (e.g., Friedman test).
5.​ Ranks Instead of Raw Scores:​
Many nonparametric tests convert raw data into ranks before analyzing, making
them less sensitive to outliers and non-linear relationships.

Common Nonparametric Tests:

●​ Mann-Whitney U test
●​ Wilcoxon signed-rank test
●​ Kruskal-Wallis H test
●​ Friedman test
●​ Spearman’s rank correlation

Conclusion:

The choice between parametric and nonparametric methods depends on the nature of the data and whether the assumptions of parametric tests are met. While parametric tests are more powerful and efficient with appropriate data, nonparametric tests offer flexibility and robustness when those assumptions are not satisfied. An understanding of their assumptions ensures the validity and reliability of statistical inferences.
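
For illustration, the following minimal Python sketch (hypothetical data; the scipy library is assumed to be available) shows how these assumptions guide test selection: normality and homogeneity of variance are checked first, and a parametric or nonparametric test is chosen accordingly.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    group_a = rng.normal(50, 10, 30)   # hypothetical scores, group A
    group_b = rng.normal(55, 10, 30)   # hypothetical scores, group B

    # Shapiro-Wilk checks normality; Levene checks homogeneity of variance.
    normal = all(stats.shapiro(g).pvalue > 0.05 for g in (group_a, group_b))
    equal_var = stats.levene(group_a, group_b).pvalue > 0.05

    if normal and equal_var:
        result = stats.ttest_ind(group_a, group_b)     # parametric test
    else:
        result = stats.mannwhitneyu(group_a, group_b)  # nonparametric fallback
    print(result)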

2. Explain the concept of normal curve with the help of a diagram. Describe the
characteristics of normal probability curve.

Concept of Normal Curve:

The normal curve, also known as the normal probability curve or Gaussian curve, is a bell-shaped, symmetrical graph that represents the distribution of a continuous random variable. It is widely used in statistics to represent real-world variables like height, intelligence, test scores, and measurement errors, which tend to distribute in a normal pattern.

Diagram of Normal Curve:

Here's a simple representation of a normal curve:

                    .-""-.
                  .'      '.
                 /          \
                /            \
              .'              '.
            .'                  '.
         _.-'                    '-._
     _.-'                            '-._
    -----------------+-----------------
                     μ (mean)

This curve is symmetrical around the mean and shows that most values cluster around the center and taper off toward the tails.

Characteristics of the Normal Probability Curve:

1. Bell-Shaped and Symmetrical:
The curve is perfectly symmetrical around its mean. This means the left half of the curve is a mirror image of the right half.
2. Mean = Median = Mode:
In a normal distribution, the mean, median, and mode all lie at the center of the distribution and are equal.

3. Asymptotic to the X-axis:
The tails of the curve approach the horizontal axis but never touch it. This means that the probability of extreme values is very low but not zero.
4.​ Unimodal:​
The curve has only one peak (mode), which occurs at the mean.
5.​ Empirical Rule (68-95-99.7 Rule):
○​ About 68% of the data falls within 1 standard deviation of the mean.
○​ About 95% falls within 2 standard deviations.
○​ About 99.7% falls within 3 standard deviations.
6.​ Area under the Curve:​
The total area under the normal curve is equal to 1 (or 100%), representing the
total probability. Specific areas under the curve correspond to probabilities of the
variable falling within particular ranges.
7.​ Standard Normal Curve:​
When the mean is 0 and the standard deviation is 1, the curve is called the
standard normal distribution. It is used to calculate Z-scores.
8.​ Determined by Mean and Standard Deviation:​
The shape of the curve is controlled by these two parameters. The mean
determines the location of the center, and the standard deviation determines the
spread.

Conclusion:

The normal probability curve is a fundamental concept in statistics, serving as the basis for many statistical tests and theories. Its predictable properties and widespread occurrence in natural and social phenomena make it an essential tool in data analysis and interpretation.
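
The empirical rule mentioned above can be verified numerically; a brief sketch (assuming scipy is available):

    from scipy.stats import norm

    # P(mu - k*sigma < X < mu + k*sigma) = cdf(k) - cdf(-k) for a standard normal.
    for k in (1, 2, 3):
        p = norm.cdf(k) - norm.cdf(-k)
        print(f"within ±{k} SD: {p:.4f}")
    # Prints ~0.6827, ~0.9545, ~0.9973 (the 68-95-99.7 rule).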

Section—B

Note : Answer any four of the following in about 250 words each :

7. Elucidate part correlation.

Definition:
Part correlation measures the unique contribution of an independent variable (X1) to the dependent variable (Y), after removing the effect of another independent variable (X2) from only X1, but not from Y.

It answers the question:
“How much of the variation in Y is uniquely explained by X1, beyond what is explained by X2?”

Key Features:

● Also called semipartial correlation.
● Unlike partial correlation, where the effect of other variables is removed from both X and Y, in part correlation, it’s removed only from the predictor (X1).
● Shows the incremental predictive value of a variable in multiple regression.

Interpretation:
If the part correlation is:

● High: X1 has a strong unique contribution to explaining Y.
● Low or near zero: X1 adds little to the prediction of Y beyond what X2 already explains.

Example:
Suppose we want to see how much hours studied (X1) predicts exam scores (Y) after removing the effect of prior GPA (X2) from hours studied, but not from exam scores. The part correlation tells us how much unique variance in exam scores is explained by hours studied, above and beyond GPA.

Applications:

● Used in multiple regression analysis to assess unique predictor contributions.
● Helps in model building by identifying redundant variables.

Summary:
Part correlation isolates the unique effect of a predictor on an outcome, controlling for other predictors only in that variable, offering insight into which variables matter most individually.
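
A minimal numerical sketch of this idea (hypothetical data; numpy assumed): remove X2's effect from X1 by residualizing, then correlate those residuals with the untouched Y. The resulting coefficient is the part (semipartial) correlation.

    import numpy as np

    rng = np.random.default_rng(0)
    x2 = rng.normal(3.0, 0.5, 100)               # hypothetical prior GPA
    x1 = 2 * x2 + rng.normal(0, 1, 100)          # hours studied, related to GPA
    y = 3 * x1 + 5 * x2 + rng.normal(0, 2, 100)  # exam scores

    # Residualize X1 on X2 (remove X2's effect from the predictor only).
    coeffs = np.polyfit(x2, x1, 1)
    x1_resid = x1 - np.polyval(coeffs, x2)

    part_r = np.corrcoef(x1_resid, y)[0, 1]      # part (semipartial) correlation
    print(round(part_r, 3))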

8. Explain the concept, merits and demerits of two-way ANOVA.

Concept of Two-Way ANOVA:

Two-Way Analysis of Variance (ANOVA) is a statistical method used to examine the effect of two independent variables (factors) on a single dependent variable. It also analyzes the interaction effect between the two factors.

For example, if we study the effect of teaching method (Factor A) and student gender (Factor B) on exam scores (dependent variable), two-way ANOVA can assess:

1. The effect of teaching method,
2. The effect of gender,
3. The interaction effect between teaching method and gender.

Merits of Two-Way ANOVA:

1.​ Simultaneous Analysis: Examines two factors at once, saving time and
increasing efficiency.
2.​ Interaction Detection: Detects interaction effects that one-way ANOVA cannot.
3.​ More Accurate Results: Controls for variability from multiple sources, leading to
more precise conclusions.
4.​ Improves Generalizability: Incorporating more factors increases the applicability
of results across conditions.

Demerits of Two-Way ANOVA:

1. Complexity: Interpretation becomes difficult when interaction effects are significant.
2.​ Assumptions Required: Assumes normality, homogeneity of variances, and
independent observations.
3.​ Balanced Design Preferred: Unequal group sizes (unbalanced data) complicate
the analysis.
4.​ Not for More Than Two Factors: For more than two factors, factorial ANOVA or
higher-order methods are required.

Conclusion:
Two-way ANOVA is a powerful tool to assess the independent and combined effects of two categorical variables on a continuous outcome, but it requires careful interpretation and adherence to statistical assumptions.
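
A brief sketch of such an analysis (hypothetical data; pandas and statsmodels assumed), fitting exam scores against teaching method, gender, and their interaction:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(1)
    df = pd.DataFrame({
        "method": np.repeat(["A", "B"], 40),
        "gender": np.tile(np.repeat(["M", "F"], 20), 2),
    })
    # Hypothetical scores with a small teaching-method effect built in.
    df["score"] = 60 + (df["method"] == "B") * 5 + rng.normal(0, 8, 80)

    # 'method * gender' expands to both main effects plus their interaction.
    model = smf.ols("score ~ C(method) * C(gender)", data=df).fit()
    print(anova_lm(model, typ=2))  # F and p for each main effect and the interaction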

9. Explain the measures of central tendency and measures of dispersion.

1. Measures of Central Tendency:

These are statistical values that represent the center or average of a dataset. They summarize a large amount of data into a single representative value.

a) Mean (Arithmetic Average):

● Sum of all values divided by the number of values.
● Formula: Mean = Σx / n
● Merits: Simple and widely used.
● Demerits: Affected by extreme values (outliers).

b) Median:

● The middle value when data are arranged in order.
● Merits: Not affected by outliers; good for skewed data.
● Demerits: Not suitable for further mathematical calculations.

c) Mode:

● The value that occurs most frequently.
● Merits: Useful for categorical data.
● Demerits: May not exist or may be more than one.

2. Measures of Dispersion:
These describe the spread or variability in a dataset — how much the data values deviate from the center.

a) Range:

● Difference between the maximum and minimum values.
● Formula: Range = Max − Min
● Merits: Simple to calculate.
● Demerits: Based only on extreme values.

b) Variance:

● The average of the squared deviations from the mean.
● Formula: Variance = Σ(x − x̄)² / n

c) Standard Deviation (SD):

● Square root of variance; shows average deviation from the mean.
● Merits: Most reliable and widely used.
● Demerits: Affected by outliers.

d) Quartile Deviation (Interquartile Range/2):

● Measures spread of the middle 50% of data.
● Merits: Not influenced by extreme values.

Conclusion:

● Central tendency gives the typical value of a dataset.
● Dispersion tells us how much the data varies.
● Both are essential for understanding the nature and structure of a dataset in statistics.
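
To make these measures concrete, a small sketch (hypothetical scores; numpy and the standard library's statistics module assumed):

    import numpy as np
    from statistics import mode

    scores = [12, 15, 15, 18, 20, 22, 25, 30]    # hypothetical data

    print("Mean:  ", np.mean(scores))
    print("Median:", np.median(scores))
    print("Mode:  ", mode(scores))               # most frequent value
    print("Range: ", max(scores) - min(scores))
    print("SD:    ", np.std(scores, ddof=1))     # sample standard deviation
    q1, q3 = np.percentile(scores, [25, 75])
    print("Quartile deviation:", (q3 - q1) / 2)  # IQR / 2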

Section—C

Note : Write short notes on the following in about 100 words each :

10. Level of significance

The level of significance (denoted by α) is the probability of rejecting a true null hypothesis in a statistical test. It represents the risk of committing a Type I error. Common levels of significance are 0.05 (5%), 0.01 (1%), and 0.10 (10%). For example, if α = 0.05, there is a 5% chance of wrongly concluding that an effect exists when it does not. The level of significance is chosen before conducting the test and is used to determine the critical value or p-value threshold for making decisions about the null hypothesis.

11. Kruskal-Wallis ANOVA test

The Kruskal-Wallis ANOVA is a non-parametric statistical test used to compare three or more independent groups when the assumptions of one-way ANOVA (like normality) are not met. It tests whether the median ranks of the groups differ significantly. Data are ranked across all groups, and the test statistic evaluates differences in these ranks. It is an extension of the Mann-Whitney U test for more than two groups. The Kruskal-Wallis test is useful for ordinal data or non-normal interval data and helps identify if at least one group differs from the others, but it doesn’t specify which groups differ.
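
A quick sketch of the test (hypothetical group ratings; scipy assumed):

    from scipy.stats import kruskal

    # Hypothetical ordinal ratings from three independent groups.
    group1 = [7, 8, 6, 9, 7]
    group2 = [5, 6, 4, 6, 5]
    group3 = [8, 9, 9, 7, 8]

    stat, p = kruskal(group1, group2, group3)
    print(f"H = {stat:.2f}, p = {p:.4f}")
    # A small p suggests at least one group differs; pairwise follow-up tests
    # (e.g., Mann-Whitney with a correction) are needed to say which.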

12. Standard error

The Standard Error (SE) measures the variability or precision of a sample statistic (usually the sample mean) as an estimate of the population parameter. It shows how much the sample mean is expected to fluctuate from the true population mean if different samples are taken. The smaller the SE, the more precise the estimate. It is calculated as:

SE = s / √n

where s is the sample standard deviation and n is the sample size. Standard error is important in constructing confidence intervals and hypothesis testing.
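
A tiny sketch of the calculation (hypothetical sample; numpy assumed):

    import numpy as np

    sample = np.array([52, 48, 61, 55, 47, 58, 50, 53])  # hypothetical scores
    s = sample.std(ddof=1)             # sample standard deviation
    se = s / np.sqrt(sample.size)      # SE = s / sqrt(n)
    print(f"mean = {sample.mean():.2f}, SE = {se:.2f}")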

MPC 006

DEC 2023
OFFICIAL SOLVED QUESTION PAPER

Section—A

Note : Answer the following questions in about 450 words each



1. Explain organisation of data

Organisation of Data

Organisation of data is a fundamental step in statistical analysis, involving arranging raw data
in a structured format to make it easier to understand, interpret, and analyze. It transforms
unprocessed data into a systematic order, helping researchers draw meaningful conclusions
efficiently.

Importance of Organising Data

Raw data collected from surveys, experiments, or observations are often vast, unstructured, and
complex. Without organisation, it becomes challenging to identify patterns, trends, or
relationships. Proper organisation allows for easier computation of statistical measures like
mean, median, variance, and simplifies visualization through graphs or charts.

Steps in Data Organisation

1. Collection of Data:
Initially, data is gathered from relevant sources based on research objectives.

2.​ Classification:​
Data is classified by grouping similar items or observations based on shared
characteristics or categories. For example, grouping students by grade levels or
responses by age groups.​

3.​ Tabulation:​
After classification, data is presented in tables for clarity. Tabulation arranges data into
rows and columns, showing frequencies or counts corresponding to different classes or
categories.​

4.​ Coding:​
Sometimes, data is coded into numbers or symbols for ease of entry and analysis,
especially in large datasets.​

Types of Data Organisation

●​ Raw Data: Data in its original form without any arrangement. It’s difficult to interpret in
this form.​

●​ Classification of Data: Grouping data into mutually exclusive classes or categories.​

● Frequency Distribution: Showing the number of occurrences (frequency) of each class/category, often using frequency tables.

●​ Graphical Representation: Using charts (bar graphs, histograms, pie charts) to visually
summarize data.​

●​ Grouped Data: Data organised into classes or intervals, especially useful for large
datasets.​

Example

Suppose a survey collects the ages of 50 people. Instead of listing each age individually (raw
data), the data can be classified into age groups (e.g., 10-19, 20-29, etc.), and a frequency
distribution table can be created showing how many people fall into each group. This organized
data can then be used for further analysis like calculating the average age or visualizing age
distribution.
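
A small sketch of this workflow (hypothetical ages; only the Python standard library assumed), classifying raw values into groups and tabulating frequencies:

    from collections import Counter

    ages = [23, 35, 17, 42, 29, 31, 18, 55, 47, 26, 33, 39]  # hypothetical raw data

    def age_group(age):
        lower = (age // 10) * 10       # e.g., 23 -> 20
        return f"{lower}-{lower + 9}"  # e.g., "20-29"

    freq = Counter(age_group(a) for a in ages)
    for group in sorted(freq):
        print(group, freq[group])      # a simple frequency distribution table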

Benefits of Organised Data

● Simplifies analysis: Clear data arrangement helps apply statistical techniques accurately.

●​ Facilitates comparisons: Easier to compare different groups or categories.​

●​ Highlights patterns: Trends and outliers become more apparent.​

● Supports decision-making: Helps researchers and decision-makers derive valid conclusions quickly.

Conclusion

Organising data is a critical preliminary step that ensures raw data is transformed into a clear,
concise, and meaningful format. It lays the foundation for effective analysis, interpretation, and
presentation of results in any research or statistical study. Without proper organisation, data
loses its utility and becomes overwhelming.

2. Elucidate partial and part correlation (semi-partial correlation) with the help of suitable
examples.

Understanding the relationship between variables is a key focus in statistics, especially when
multiple variables influence a dependent variable. Partial correlation and part correlation
(semipartial correlation) are techniques used to analyze these relationships by controlling for
the effect of other variables.

Partial Correlation

Definition:​
Partial correlation measures the relationship between two variables while removing the effect
of one or more additional variables from both variables under consideration.

Explanation:​
Suppose we have three variables:

●​ X1 = Hours studied​

●​ X2 = IQ​

●​ Y = Exam score​

We want to find the correlation between hours studied (X1) and exam scores (Y) after
removing the effect of IQ (X2) from both hours studied and exam scores. This helps
understand the pure association between X1 and Y without the influence of IQ.

Example:

●​ Raw correlation between hours studied and exam score might be high because smarter
students (high IQ) study more and score higher.​

●​ Partial correlation adjusts for IQ, showing the direct relationship between study hours
and scores beyond IQ’s effect.​

Part Correlation (Semipartial Correlation)

Definition:​
Part correlation measures the relationship between two variables while removing the effect of
the control variable(s) from only one of the variables (usually the predictor), but not the
other (dependent variable).

Explanation:​
Using the same example, part correlation examines how hours studied (X1), after removing
IQ’s effect from hours studied only, relates to exam scores (Y). The difference from partial
correlation is that IQ’s effect remains in the exam scores.

Example:

●​ This helps to understand the unique contribution of study hours to exam scores beyond
IQ's influence on study hours, but without adjusting exam scores for IQ.​

Key Differences

Aspect                        | Partial Correlation                                              | Part (Semipartial) Correlation
Control variable removed from | Both variables (predictor and criterion)                         | Only from the predictor variable
Interpretation                | Pure relationship between two variables, controlling for others | Unique contribution of predictor to outcome
Magnitude                     | Usually larger than or equal to the part correlation            | Usually smaller than or equal to the partial correlation

Practical Importance

●​ Partial correlation is used to isolate the direct relationship between two variables while
eliminating confounding effects in both variables.​

●​ Part correlation is useful in regression analysis to find out how much a predictor
uniquely explains the variation in the dependent variable.​

Conclusion

Both partial and part correlations help clarify complex variable relationships by controlling for
confounders. Partial correlation controls effects from both variables, showing the direct link,
while part correlation isolates the unique effect of one predictor on the outcome, aiding in
understanding predictor importance in multivariate contexts.
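
To contrast the two numerically, a sketch (hypothetical data; numpy assumed) computes both coefficients by residualizing: both variables for the partial correlation, only the predictor for the part correlation.

    import numpy as np

    rng = np.random.default_rng(7)
    iq = rng.normal(100, 15, 200)                         # X2: control variable
    hours = 0.05 * iq + rng.normal(0, 1, 200)             # X1: hours studied
    score = 2 * hours + 0.3 * iq + rng.normal(0, 3, 200)  # Y: exam score

    def residualize(v, control):
        # Return v with the linear effect of control removed.
        return v - np.polyval(np.polyfit(control, v, 1), control)

    hours_r = residualize(hours, iq)
    score_r = residualize(score, iq)

    partial = np.corrcoef(hours_r, score_r)[0, 1]  # IQ removed from both
    part = np.corrcoef(hours_r, score)[0, 1]       # IQ removed from X1 only
    print(f"partial = {partial:.3f}, part = {part:.3f}")  # |partial| >= |part|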

Section—B

Note : Answer questions in about 250 words each (wherever applicable).

5. Describe divergence in normality with the help of suitable diagrams.

Divergence in normality refers to the ways in which a data distribution deviates from the ideal
normal distribution or normal curve. While the normal curve is symmetrical, bell-shaped, and
defined by specific statistical properties, real-world data often diverge from this ideal shape in
various ways. Understanding these divergences is important for choosing appropriate statistical
methods and correctly interpreting data.

Types of Divergence in Normality

1.​ Skewness​
Skewness is the measure of asymmetry in the distribution.​

●​ Positive Skew (Right Skew): The tail on the right side is longer or fatter. Most data
values are concentrated on the left with some extreme high values pulling the tail right.​

●​ Negative Skew (Left Skew): The tail on the left side is longer. Most data values cluster
on the right with some low extreme values.​

2.​ Kurtosis​
Kurtosis measures the "peakedness" or the heaviness of the tails of the distribution
compared to the normal curve.​

●​ Leptokurtic: A distribution with a sharper peak and fatter tails than the normal
distribution, indicating more outliers.​

●​ Platykurtic: A distribution with a flatter peak and thinner tails, showing fewer extreme
values.​

3. Bimodality or Multimodality
This divergence occurs when the data distribution has two or more peaks, suggesting the presence of subgroups within the data rather than a single normal population.

Visual Representation

●​ Normal Distribution: Symmetrical bell shape, mean = median = mode.​

●​ Positive Skew: Right tail longer, peak shifted left.​

●​ Negative Skew: Left tail longer, peak shifted right.​

●​ Leptokurtic: Taller, sharper peak and heavy tails.​

●​ Platykurtic: Shorter, flatter peak and light tails.​

●​ Bimodal: Two distinct peaks.​



Importance

Detecting divergence in normality helps decide whether to apply parametric tests that assume
normality or to use nonparametric alternatives. It also aids in data transformation or modeling
strategies to correct or accommodate such divergences.
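
Skewness and kurtosis can be quantified directly; a sketch (hypothetical right-skewed data; scipy assumed):

    import numpy as np
    from scipy.stats import skew, kurtosis

    rng = np.random.default_rng(3)
    data = rng.exponential(scale=2.0, size=1000)  # right-skewed hypothetical data

    print("skewness:", round(skew(data), 2))             # > 0: positive skew
    print("excess kurtosis:", round(kurtosis(data), 2))  # 0 for a normal curve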

7. Describe the levels of measurement with suitable examples.

Levels of measurement refer to the different ways variables or data can be categorized,
ordered, and measured. Understanding these levels helps in choosing the appropriate statistical
analysis. There are four main levels of measurement:

1. Nominal Level

●​ Definition: Data are categorized into distinct groups or categories with no inherent order
or ranking.​

● Characteristics: Categories are mutually exclusive and exhaustive but cannot be logically ordered.

●​ Examples:​

○​ Gender (Male, Female)​

○​ Blood groups (A, B, AB, O)​

○​ Types of occupation (Teacher, Doctor, Engineer)​

●​ Statistics Used: Mode, frequency counts, Chi-square test.​

2. Ordinal Level

●​ Definition: Data are categorized into groups that can be ordered or ranked, but the
intervals between ranks are not necessarily equal.​

●​ Characteristics: Order matters, but the difference between ranks is not quantifiable.​

●​ Examples:​

○​ Educational level (High school, Bachelor, Master, PhD)​

○ Likert scale responses (Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree)

○​ Class ranking in school​

●​ Statistics Used: Median, percentiles, non-parametric tests like Mann-Whitney U.​

3. Interval Level

●​ Definition: Data have ordered categories with equal intervals between values, but there
is no true zero point.​

● Characteristics: Differences can be measured meaningfully, but ratios are not meaningful because zero is arbitrary.

●​ Examples:​

○​ Temperature in Celsius or Fahrenheit (0°C does not mean no temperature)​

○​ IQ scores​

○​ Calendar years​

●​ Statistics Used: Mean, standard deviation, correlation, regression.​

4. Ratio Level

●​ Definition: Data have all the properties of interval level, plus a meaningful, absolute
zero point.​

●​ Characteristics: Ratios and differences are meaningful.​

●​ Examples:​

○​ Weight (0 means no weight)​

○​ Height​

○​ Age​

○​ Income​

● Statistics Used: All statistical operations including geometric mean, coefficient of variation.

Summary Table:

Level    | Order | Equal Intervals | True Zero | Examples
Nominal  | No    | No              | No        | Gender, Blood Group
Ordinal  | Yes   | No              | No        | Rank, Satisfaction
Interval | Yes   | Yes             | No        | Temperature (°C)
Ratio    | Yes   | Yes             | Yes       | Weight, Age

Understanding these levels guides researchers in selecting the right analytical tools and
interpreting data accurately.

9. Elucidate hypothesis testing.



Hypothesis testing is a statistical method used to make decisions or inferences about population parameters based on sample data. It involves testing an assumption (hypothesis) about a population using data and statistical techniques.

Key Concepts in Hypothesis Testing

1. Null Hypothesis (H₀):
This is the default assumption that there is no effect or no difference. It represents the status quo.
Example: There is no difference in test scores between Group A and Group B.
2. Alternative Hypothesis (H₁ or Ha):
This is the statement we want to test. It suggests that there is an effect or a difference.
Example: There is a difference in test scores between Group A and Group B.
3. Significance Level (α):
The probability of rejecting the null hypothesis when it is actually true. Common values are 0.05 or 0.01.
4. Test Statistic:
A numerical value calculated from sample data used to decide whether to reject H₀. This could be a z, t, F, or chi-square statistic depending on the type of test.
5. P-value:
The probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true.
○ If p-value ≤ α, reject H₀.
○ If p-value > α, fail to reject H₀.

Steps in Hypothesis Testing

1. Formulate Hypotheses: Set H₀ and H₁.
2. Choose Significance Level (α): Typically 0.05 or 0.01.
3. Select Test and Compute Test Statistic: Choose an appropriate test (e.g., t-test, chi-square) and calculate its value.
4. Determine Critical Value or p-value: Use statistical tables or software.
5. Make Decision: Compare the test statistic with the critical value, or the p-value with α.
6. Interpret Results: State the conclusion in the context of the problem.

Example

A researcher wants to test if a new drug is more effective than the old one.

●​ H₀: There is no difference in effectiveness.​

●​ H₁: The new drug is more effective.​

Based on the data, if the p-value is 0.02 and α = 0.05, the researcher rejects H₀ and concludes
the new drug is significantly more effective.

Hypothesis testing is fundamental to scientific research, helping validate findings and guide
decision-making.
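
These steps map directly onto a few lines of code; a sketch of the drug example (hypothetical data; scipy assumed):

    from scipy import stats

    old_drug = [12, 14, 11, 13, 15, 12, 14, 13]  # hypothetical improvement scores
    new_drug = [15, 17, 14, 16, 18, 15, 17, 16]

    alpha = 0.05
    # One-sided test: H1 says the new drug is MORE effective.
    t_stat, p_value = stats.ttest_ind(new_drug, old_drug, alternative="greater")

    if p_value <= alpha:
        print(f"p = {p_value:.3f} <= {alpha}: reject H0")
    else:
        print(f"p = {p_value:.3f} > {alpha}: fail to reject H0")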

Section—C

Note : Write short notes on the following in about 100 words each.

10. Tetrachoric correlation.

Tetrachoric correlation is a statistical technique used to estimate the correlation between two
theoretically continuous variables that have been dichotomized (converted into two
categories). It assumes that each binary variable arises from an underlying normal distribution
and that a certain threshold splits the values into two categories (e.g., pass/fail, yes/no).

This method is appropriate when both variables are artificially divided and not naturally binary.
For example, if test scores are categorized as “pass” or “fail” based on a cut-off, the tetrachoric
correlation helps estimate the original correlation between the unobserved continuous variables.

It is commonly used in psychometrics and social science.

11. Linear regression

Linear regression is a statistical method used to model the relationship between a dependent
variable and one or more independent variables. In simple linear regression, the
relationship is modeled with a straight line:

Y = a + bX + ε

Where:

●​ Y is the dependent variable,​

●​ X is the independent variable,​

●​ a is the intercept,​

●​ b is the slope (regression coefficient),​

●​ ε is the error term.​

It assumes a linear relationship, constant variance, independence of errors, and normally distributed residuals. Linear regression is used for prediction and to assess the strength of relationships between variables.
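
A minimal sketch of fitting such a line by least squares (hypothetical data; numpy assumed):

    import numpy as np

    x = np.array([1, 2, 3, 4, 5, 6])               # hypothetical predictor
    y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.1])  # hypothetical response

    b, a = np.polyfit(x, y, 1)  # slope b and intercept a
    print(f"Y = {a:.2f} + {b:.2f}X")
    print("prediction at X = 7:", a + b * 7)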

12. Scatter plot

A scatter plot is a graphical representation used to display the relationship between two
quantitative variables. Each point on the plot represents an observation with its position
determined by the values of the two variables — one plotted along the x-axis and the other
along the y-axis.

Scatter plots help in:

●​ Visualizing correlation (positive, negative, or none),​

●​ Identifying outliers or unusual patterns,​

●​ Determining the possible linearity of the relationship.​

For example, a scatter plot of study time vs. exam scores can show whether increased study
time is associated with higher scores. It’s a foundational tool in regression and correlation
analysis.
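
A short sketch of the study-time example (hypothetical data; matplotlib assumed):

    import matplotlib.pyplot as plt

    study_hours = [1, 2, 2.5, 3, 4, 4.5, 5, 6]      # hypothetical
    exam_scores = [52, 58, 60, 64, 70, 73, 78, 85]  # hypothetical

    plt.scatter(study_hours, exam_scores)
    plt.xlabel("Study time (hours)")
    plt.ylabel("Exam score")
    plt.title("Study time vs. exam score")
    plt.show()  # an upward-sloping cloud suggests a positive correlation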

MPC 006

JUNE 2023
OFFICIAL SOLVED QUESTION PAPER

Section—A

Note : Answer the following questions in about 450 words each (wherever
applicable).

1. Elucidate parametric and non-parametric statistics. Describe the assumptions of parametric statistics.

Parametric and Non-Parametric Statistics

Statistics can broadly be divided into two main categories: parametric and non-parametric. These two approaches differ primarily in terms of the assumptions they make about the population from which the data are drawn.

Parametric Statistics
Parametric statistics involve statistical techniques that assume the data come from a type of probability distribution and that this distribution is characterized by a set of fixed parameters. The most common assumption is that the data follow a normal (Gaussian) distribution.

Examples of parametric tests include:

● t-test (independent and paired)
● Analysis of Variance (ANOVA)
● Pearson correlation coefficient
● Regression analysis

Assumptions of Parametric Statistics

1.​ Normality: The data (or the residuals in case of regression) should be normally
distributed. This is especially important for small sample sizes.
2.​ Homogeneity of Variance (Homoscedasticity): The variances in different groups
should be approximately equal. This is critical in tests like ANOVA.
3.​ Independence: Observations should be independent of one another. That is, the
value of one observation should not influence another.
4.​ Scale of Measurement: Data should be measured at the interval or ratio scale.
These scales allow for meaningful calculations of means and standard
deviations.
5.​ Linear Relationship (in some cases): In tests like correlation and regression,
there is an assumption of a linear relationship between the variables.

When these assumptions are met, parametric tests are more powerful and reliable, meaning they are more likely to detect a true effect when one exists.

Non-Parametric Statistics
Non-parametric statistics do not require the data to fit any specific distribution. These techniques are often referred to as "distribution-free" methods and are particularly useful when the assumptions of parametric tests cannot be met.


Examples of non-parametric tests include:

●​ Mann–Whitney U test
●​ Wilcoxon signed-rank test
●​ Kruskal–Wallis test
●​ Spearman’s rank correlation
●​ Chi-square test

These tests typically rely on the ranking of data rather than the data's actual values, making them appropriate for ordinal data or for data that are not normally distributed.

Key Differences

Aspect            | Parametric Statistics                | Non-Parametric Statistics
Assumptions       | Strict assumptions (e.g., normality) | Fewer or no assumptions
Data Type         | Interval or ratio scale              | Ordinal, nominal, or skewed interval
Statistical Power | Higher when assumptions are met      | Lower, but robust to violations
Use of Data       | Uses actual values                   | Uses ranks or frequencies

Conclusion
Parametric statistics are preferred when their assumptions are satisfied due to their higher efficiency and power. However, non-parametric statistics provide a valuable alternative when the assumptions are violated or when dealing with non-quantitative data. An informed choice between the two approaches is crucial for accurate and valid statistical inference.

2. Explain point biserial correlation and phi-coefficient with suitable examples.

Point biserial correlation and phi-coefficient are both measures of association between two variables, but they are used in different contexts depending on the nature of the variables involved. Both are special cases of the Pearson product-moment correlation coefficient.

1. Point Biserial Correlation

Definition:
The point biserial correlation coefficient is used when one variable is dichotomous (binary with two categories like Yes/No, Male/Female) and the other is continuous (like height, test scores, etc.).

It measures the strength and direction of the association between a continuous variable and a true dichotomous variable.

Example:
Suppose a researcher wants to see the relationship between gender (coded as Male = 0, Female = 1) and exam scores. If females have a higher mean exam score than males, the point biserial correlation will be positive. A large positive or negative value indicates a strong relationship.



2. Phi-Coefficient (φ)

Definition:
The phi-coefficient is used when both variables are dichotomous. It measures the strength and direction of the association between two binary variables.

Example:
Suppose we want to study the relationship between smoking (Yes/No) and presence of a disease (Yes/No). Both are binary variables. A 2x2 table is made, and φ is calculated. A high φ (near +1 or -1) shows a strong relationship between smoking and disease presence.

Key Differences

Aspect            | Point Biserial Correlation  | Phi-Coefficient
Variable Types    | One continuous, one binary  | Both binary
Measurement Level | Interval/Ratio & Nominal    | Nominal & Nominal
Type of Analysis  | Continuous-outcome analysis | Contingency table analysis

Conclusion

Both coefficients are useful in analyzing relationships involving categorical variables. Point biserial correlation is ideal for continuous-binary relationships, while the phi-coefficient is suitable when analyzing associations between two binary variables. Understanding when to use each ensures correct statistical interpretation and meaningful research conclusions.
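
A compact sketch of both coefficients (hypothetical data; numpy and scipy assumed):

    import numpy as np
    from scipy.stats import pointbiserialr, chi2_contingency

    # Point-biserial: binary gender (0/1) vs. continuous exam score.
    gender = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    scores = np.array([62, 58, 65, 60, 70, 74, 68, 72])
    r_pb, p = pointbiserialr(gender, scores)
    print(f"point-biserial r = {r_pb:.2f}")

    # Phi: two binary variables summarized in a 2x2 table.
    table = np.array([[30, 10],   # smoker:     disease yes / no
                      [15, 45]])  # non-smoker: disease yes / no
    chi2 = chi2_contingency(table, correction=False)[0]
    phi = np.sqrt(chi2 / table.sum())  # phi = sqrt(chi-square / N)
    print(f"phi = {phi:.2f}")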

Section—B

Note : Answer questions in about 250 words each (wherever applicable).

5. Explain the characteristics of normal probability curve.

The normal probability curve (also called the Gaussian distribution or bell curve) is a fundamental concept in statistics and psychological measurement. It represents the distribution of many natural and social phenomena such as intelligence, height, test scores, etc.

1. Bell-Shaped and Symmetrical:
The curve is perfectly symmetrical about the mean. The left and right sides are mirror images. Most data cluster around the central peak and taper off equally on both sides.
2.​ Unimodal:​
It has a single peak, which corresponds to the mean, median, and mode—all
located at the center of the distribution.
3.​ Asymptotic:​
The tails of the curve approach the horizontal axis but never touch it. This implies
that extreme values (very high or very low) are possible but rare.
4.​ Mean = Median = Mode:​
In a normal distribution, these three measures of central tendency are equal and
lie at the center of the curve.
5.​ Empirical Rule (68-95-99.7 Rule):
○​ About 68% of values lie within ±1 standard deviation from the mean.
○​ About 95% lie within ±2 standard deviations.
○​ About 99.7% lie within ±3 standard deviations.
6.​ Area Under the Curve:​
The total area under the curve is 1 (or 100%), which represents the entire
probability distribution.
7.​ Defined by Two Parameters:​
The shape of the curve is determined by its mean (μ) and standard deviation (σ).
The mean determines the center, while the standard deviation controls the
spread.

The normal curve is foundational in inferential statistics, as many statistical tests assume that the data follow a normal distribution.

6. Discuss the importance and application of standard error of means.

The Standard Error of the Mean (SEM) is a key concept in statistics that measures how much the sample mean (average) is expected to vary from the true population mean. It quantifies the precision of the sample mean as an estimate of the population mean.

Importance of SEM

1. Measures Precision of the Sample Mean:
SEM indicates the reliability of the sample mean. A smaller SEM means the sample mean is a more precise estimate of the population mean.
2.​ Basis for Confidence Intervals:​
SEM is used to construct confidence intervals around the sample mean, giving a
range in which the population mean is likely to fall with a certain probability (e.g.,
95%).
3.​ Foundation for Hypothesis Testing:​
Many statistical tests (e.g., t-tests) use SEM to determine if differences between
means are statistically significant, by comparing observed differences relative to
the variability captured by SEM.
4.​ Controls for Sample Size:​
SEM decreases as the sample size increases, reflecting that larger samples
provide more accurate estimates of the population mean.

Applications of SEM

1. In Research and Experimentation:
SEM helps researchers understand how much the sample results might vary if the study were repeated with different samples from the same population.
2.​ In Reporting Results:​
Scientific papers report SEM to show the variability of the mean, providing
readers a sense of the data’s reliability.
3.​ In Quality Control:​
Manufacturers use SEM to assess the consistency of product measurements
across samples, aiding in quality assurance.
4.​ In Decision Making:​
Businesses and policymakers rely on SEM when making inferences about
populations based on survey or experimental data, ensuring decisions are based
on statistically sound estimates.

Summary

The Standard Error of the Mean is crucial because it helps quantify the uncertainty associated with estimating a population mean from a sample. It aids in making valid inferences, interpreting data accurately, and ensures that conclusions drawn from sample data are statistically justified.

7. Explain the concept of interactional effect. Discuss merits and demerits of two-way ANOVA.

In two-way ANOVA, we study the effects of two independent variables (factors) on a dependent variable. Besides examining the main effects of each factor individually, we also investigate whether there is an interaction effect between them.

●​ Interaction effect occurs when the effect of one independent variable on the
dependent variable depends on the level of the other independent variable.
●​ In other words, the influence of one factor varies across the levels of the second
factor, indicating that the factors do not operate independently.

Example:
If a study examines the effect of teaching method (Factor A) and student gender (Factor B) on test scores, an interaction would mean that the effectiveness of a teaching method differs for males and females. For instance, Method 1 might work better for males but Method 2 might work better for females.

Merits of Two-Way ANOVA

1. Simultaneous Analysis of Two Factors:
It allows testing the effect of two independent variables on the dependent variable at the same time, saving time and resources compared to conducting two separate one-way ANOVAs.
2.​ Detection of Interaction Effects:​
Two-way ANOVA reveals whether the factors interact, providing a more complete
understanding of the relationships among variables.
3.​ Greater Statistical Power:​
By accounting for variation from two factors, it often increases the sensitivity to
detect real effects.
4.​ Efficient Use of Data:​
It utilizes data more efficiently by examining combined effects and reduces the
risk of Type I error from multiple testing.

Demerits of Two-Way ANOVA

1.​ Complexity:​
Interpretation becomes more complicated, especially if significant interaction
effects are found, requiring careful analysis.
2.​ Assumptions:​
Like all parametric tests, it requires assumptions of normality, homogeneity of
variances, and independence, which may not always be met.
3.​ Unequal Sample Sizes:​
Two-way ANOVA can be sensitive to unequal group sizes, especially in the
presence of interaction effects, complicating analysis.
4.​ Limited to Two Factors:​
While useful, it only analyzes two factors at a time; more complex designs
require higher-way ANOVAs or other methods.

Summary
The interactional effect is crucial for understanding how two factors jointly influence an outcome, and two-way ANOVA is a powerful tool to examine this along with main effects. While it offers deeper insights and efficient data use, it also demands careful interpretation and adherence to assumptions.



Section—C

Note : Answer the following in about 100 words each.

8. Hypothesis testing.

Hypothesis testing is a statistical method used to make decisions or inferences about a population based on sample data. It involves formulating two hypotheses: the null hypothesis (H₀), which states there is no effect or difference, and the alternative hypothesis (H₁), which suggests there is an effect. Data is analyzed to determine whether to reject H₀ or fail to reject it, using test statistics and significance levels (like 0.05). Hypothesis testing helps in validating research claims and guiding conclusions in scientific studies.

9. Outliers and curvilinearity.

Outliers are data points that differ significantly from other observations in a dataset. They can result from measurement errors, variability, or unusual conditions and may distort statistical analyses if not addressed.

Curvilinearity refers to a relationship between two variables that follows a curved pattern rather than a straight line. Unlike linear relationships, curvilinear associations require nonlinear models for accurate analysis, as linear methods may misrepresent the data’s true pattern. Recognizing curvilinearity helps in selecting appropriate statistical techniques.

12. Wilcoxon matched pair signed rank test.

The Wilcoxon matched pair signed rank test is a non-parametric statistical test used to compare two related samples or repeated measurements on the same subjects. It assesses whether their population mean ranks differ, serving as an alternative to the paired t-test when data do not meet normality assumptions. The test ranks the absolute differences between paired observations, considering the sign of differences, and calculates a test statistic to determine if there is a significant median difference between the pairs. It is commonly used in before-and-after studies.
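
A before-and-after sketch (hypothetical paired scores; scipy assumed):

    from scipy.stats import wilcoxon

    before = [18, 22, 19, 25, 21, 17, 23, 20]  # hypothetical pre-treatment scores
    after = [21, 25, 19, 28, 24, 20, 26, 22]   # same subjects, post-treatment

    stat, p = wilcoxon(before, after)
    print(f"W = {stat}, p = {p:.4f}")  # small p suggests a shift between conditions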

MPC 006

DEC 2022

OFFICIAL SOLVED QUESTION PAPER

SECTION A

Answer the following questions in about 450 words each :

1. Describe the measures of central tendency and measures of dispersion.



Statistics deals with collecting, organizing, analyzing, and interpreting data. Two fundamental
concepts in descriptive statistics are measures of central tendency and measures of
dispersion. These help in summarizing and understanding the distribution of data.

Measures of Central Tendency

Measures of central tendency describe the center or average of a data set. The three main
measures are:

1.​ Mean (Arithmetic Average):​

○​ The sum of all observations divided by the number of observations.​

○ Formula: Mean = ΣX / N
○​ It is sensitive to extreme values (outliers) and is best used with interval or ratio
data.​

2.​ Median:​

○​ The middle value when the data are arranged in ascending or descending order.​

○​ If the number of observations is even, the median is the average of the two
middle values.​

○​ It is not affected by extreme scores and is ideal for ordinal data or skewed
distributions.​

3.​ Mode:​

○​ The value that appears most frequently in the data set.​

○​ A data set may have no mode, one mode (unimodal), or more than one mode
(bimodal/multimodal).​

○​ It is useful for nominal data.​

Application:​
Measures of central tendency are used to find a typical or representative value of a data set,
such as average income, average marks, or most common category in survey responses.

Measures of Dispersion

While measures of central tendency describe the center, measures of dispersion describe the
spread or variability in the data. The main measures are:

1.​ Range:​

○​ The difference between the highest and lowest values.​

○ Formula: Range = Maximum − Minimum
○​ It is simple but affected by outliers.​

2.​ Interquartile Range (IQR):​

○​ The range of the middle 50% of the data, i.e., the difference between the third
quartile (Q3) and the first quartile (Q1).​

○​ IQR = Q3 - Q1​

○​ It is a better measure when data contains outliers.​

3.​ Variance:​

○​ The average of the squared deviations from the mean.​

○ Formula: Variance = Σ(X − X̄)² / N
4.​ Standard Deviation (SD):​

○​ The square root of variance, indicating average distance from the mean.​

○​ It is widely used to measure consistency or variability.​

Application:​
Measures of dispersion are crucial in understanding the reliability and predictability of data. A
smaller SD indicates that data points are closer to the mean, while a larger SD shows more
spread out data.

Conclusion

Together, measures of central tendency and dispersion provide a complete summary of data.
While central tendency gives the “typical” value, dispersion indicates how much the values vary,
both of which are essential for accurate data analysis and interpretation.

SECTION B

Answer the following questions in about 250 words each :

5. Discuss the characteristics of normal probability curve.

The normal probability curve, also known as the normal distribution or Gaussian
distribution, is a fundamental concept in statistics. It represents a theoretical distribution of
continuous data that is symmetrically distributed around the mean. The curve is bell-shaped and
plays a crucial role in inferential statistics.

Key Characteristics:

1.​ Symmetrical Shape:​

○​ The curve is perfectly symmetrical about the mean.​

○​ This means that the left and right sides of the curve are mirror images.​

2.​ Mean, Median, and Mode Are Equal:​

○​ All three measures of central tendency lie at the center of the distribution and
have the same value.​

3.​ Asymptotic to the X-axis:​

○​ The curve approaches, but never touches, the horizontal axis.​

○​ The tails extend infinitely in both directions.​

4.​ Bell-Shaped Curve:​

○​ Most values cluster around the central peak, and the probabilities for values taper
off equally in both directions from the mean.​

5.​ Empirical Rule (68-95-99.7 Rule):​

○​ Approximately 68% of the data lies within 1 standard deviation (σ) from the mean
(μ),​

○​ 95% within 2σ,​

○​ 99.7% within 3σ.​

6.​ Area Under the Curve:​

○​ The total area under the normal curve is equal to 1 (or 100%), representing the
entire probability space.​

7.​ Unimodal Distribution:​

○​ There is only one peak, indicating a single mode.​

Conclusion:

The normal probability curve is essential for hypothesis testing, confidence intervals, and many
statistical methods. Its well-defined properties make it a powerful tool for understanding and
predicting patterns in data.

6. Explain Type I and Type II errors, with suitable examples.



In statistical hypothesis testing, decisions are made about a population based on sample data.
However, these decisions are subject to error. The two common types of errors are Type I and
Type II errors.

Type I Error (False Positive):

●​ Definition: Occurs when the null hypothesis (H₀) is rejected when it is actually true.​

●​ Symbol: Denoted by α (alpha), which is the level of significance (commonly 0.05).​

●​ Example:​
A new drug is tested to see if it is more effective than the existing one.​

○​ H₀: The new drug is no better than the existing one.​

○​ If the test leads to rejecting H₀, even though the new drug is not actually better,
this is a Type I error.​

Type II Error (False Negative):

●​ Definition: Occurs when the null hypothesis (H₀) is not rejected when it is actually
false.​

●​ Symbol: Denoted by β (beta).​

●​ Example:​
In the same drug test, suppose the new drug is actually more effective, but the test fails
to show a significant difference.​

○​ Failing to reject H₀ in this case is a Type II error.​

Comparison:

Error Type | Definition                                 | Symbol | Consequence
Type I     | Rejecting a true null hypothesis           | α      | False claim of an effect
Type II    | Failing to reject a false null hypothesis  | β      | Missing a real effect

Conclusion:

Both errors have serious implications, depending on the context (e.g., medicine, law, quality
control). Researchers aim to minimize these errors through appropriate study design, larger
sample sizes, and correct choice of significance level.
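
The meaning of α can be checked by simulation; a sketch (numpy and scipy assumed) that draws many samples under a true H₀ and counts false positives:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    alpha, false_positives, trials = 0.05, 0, 10_000

    for _ in range(trials):
        # Both groups come from the SAME population, so H0 is true.
        a = rng.normal(100, 15, 30)
        b = rng.normal(100, 15, 30)
        if stats.ttest_ind(a, b).pvalue < alpha:
            false_positives += 1  # a Type I error

    print(false_positives / trials)  # should be close to alpha (about 0.05)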

7. Compute chi-square for the following data :

Responses
Gender  | Yes | No
Males   | 5   | 5
Females | 6   | 4

To compute the Chi-square (χ²) for the given 2x2 contingency table, we follow these steps:

Step 1: Organize the Data

Gender    | Yes | No | Row Total
Males     | 5   | 5  | 10
Females   | 6   | 4  | 10
Col Total | 11  | 9  | 20

Step 2: Calculate Expected Frequencies

Expected frequency (E) = (Row Total × Column Total) / Grand Total

For each cell:

● E(Male, Yes) = (10 × 11) / 20 = 5.5
● E(Male, No) = (10 × 9) / 20 = 4.5
● E(Female, Yes) = (10 × 11) / 20 = 5.5
● E(Female, No) = (10 × 9) / 20 = 4.5

Step 3: Apply Chi-square Formula

χ² = Σ (O − E)² / E

Now compute each term:

● (5 − 5.5)² / 5.5 = 0.25 / 5.5 ≈ 0.045
● (5 − 4.5)² / 4.5 = 0.25 / 4.5 ≈ 0.056
● (6 − 5.5)² / 5.5 = 0.25 / 5.5 ≈ 0.045
● (4 − 4.5)² / 4.5 = 0.25 / 4.5 ≈ 0.056

χ² = 0.045 + 0.056 + 0.045 + 0.056 = 0.202

Step 4: Interpretation

Degrees of Freedom (df) = (rows − 1)(columns − 1) = (2−1)(2−1) = 1

From chi-square table:

●​ Critical value at 0.05 level = 3.84​

Since 0.202 < 3.84, we fail to reject the null hypothesis.

Conclusion:

There is no significant association between gender and response.
Chi-square (χ²) = 0.202, df = 1, p > 0.05.
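
The hand computation can be verified in a single call; a sketch (scipy assumed). Note that correction=False is needed here, because scipy applies Yates' continuity correction to 2x2 tables by default:

    from scipy.stats import chi2_contingency

    table = [[5, 5],   # Males:   Yes / No
             [6, 4]]   # Females: Yes / No

    chi2, p, dof, expected = chi2_contingency(table, correction=False)
    print(round(chi2, 3), round(p, 3), dof)  # ~0.202, ~0.653, 1
    print(expected)                          # [[5.5, 4.5], [5.5, 4.5]]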

8. Describe point-biserial correlation and phi-coefficient.

Both point-biserial correlation and phi-coefficient are measures of association involving categorical variables, particularly dichotomous (binary) ones. However, they differ based on the type of variables involved.

1. Point-Biserial Correlation (r_pb)

●​ Definition:​
A special case of Pearson’s correlation coefficient used when one variable is
dichotomous (e.g., gender: male/female) and the other is continuous (e.g., test score).​

● Usage:
To determine the strength and direction of the relationship between a binary variable and a continuous variable.

●​ Example:​
Suppose you want to examine whether gender (0 = female, 1 = male) is related to exam
scores (a continuous variable). Point-biserial correlation would measure that association.​

●​ Interpretation:​

○ r_pb ranges from -1 to +1.

○​ A positive value indicates that higher scores are associated with the group coded
as "1".​

2. Phi-Coefficient (φ)

●​ Definition:​
A correlation coefficient used when both variables are dichotomous.​

●​ Usage:​
Measures the degree of association between two binary variables.​

●​ Example:​
Gender (male/female) and smoking status (smoker/non-smoker). If both variables are
coded as 0 and 1, phi-coefficient can be used to determine whether there's a
relationship.​

● Formula:
φ = √(χ² / N)
where χ² is the chi-square value and N is the total number of observations.

●​ Interpretation:​

○​ φ ranges from -1 to +1.​

○​ Values closer to ±1 indicate stronger association.​

Conclusion:

● Use point-biserial correlation when one variable is continuous and the other is binary.
● Use phi-coefficient when both variables are binary.

Both are useful tools for analyzing relationships involving categorical data.

9. What is Skewness ? Explain the factors causing divergence in normal distribution.

Skewness is a statistical measure that describes the asymmetry or departure from symmetry
in a distribution of data. In a perfectly symmetrical distribution, such as a normal distribution,
skewness is zero. However, if the data are not evenly distributed around the mean, the
distribution is said to be skewed.

● Positive Skewness (Right-Skewed):
The right tail is longer; most values lie to the left of the mean.
Mean > Median > Mode

● Negative Skewness (Left-Skewed):
The left tail is longer; most values lie to the right of the mean.
Mean < Median < Mode

Factors Causing Divergence in Normal Distribution

1. Extreme Scores (Outliers):
A few very high or low values can pull the tail of the distribution, causing skewness.
2. Sampling Bias:
If the sample is not representative of the population, it may distort the shape of the distribution.
3. Small Sample Size:
In smaller samples, normality may not hold due to random variation.
4. Data Transformation Errors:
Improper scaling or coding of data can distort distribution symmetry.
5. Floor or Ceiling Effects:
When a large number of scores cluster at the lower or upper end of the scale, the distribution becomes skewed.
6. Natural Limits of Variables:
Some variables (e.g., reaction time, income) are naturally bounded at one end, leading to skewness.

Conclusion:

Skewness helps in understanding the shape and nature of a dataset. Recognizing the factors
that cause divergence from normality is essential in choosing the right statistical tests and
interpreting results correctly.

SECTION C

Write short notes on the following in about 100 words each :

10. Point and Interval Estimation

Point Estimation refers to the process of using sample data to calculate a single value (called a
statistic) as an estimate of an unknown population parameter. For example, the sample mean is
a point estimate of the population mean. While simple, point estimates do not provide
information about the estimate’s accuracy or reliability.

Interval Estimation addresses this limitation by providing a range of values, called a confidence interval, within which the population parameter is expected to lie with a certain
level of confidence (e.g., 95%). Interval estimation accounts for sampling variability and gives a
more informative estimate by indicating the precision and uncertainty around the point estimate.

Together, these methods help researchers make inferences about populations based on sample
data.
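
A minimal sketch of both ideas (hypothetical measurements; the 95% level and t-distribution critical value are the conventional choices for a small sample):

```python
# Illustrative sketch (hypothetical data): a point estimate and a 95% confidence
# interval for the population mean, using the t-distribution.
import numpy as np
from scipy import stats

sample = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.7, 11.9])
n = len(sample)

mean = sample.mean()                      # point estimate of the population mean
se = sample.std(ddof=1) / np.sqrt(n)      # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)     # two-tailed 95% critical value

lower, upper = mean - t_crit * se, mean + t_crit * se
print(f"point estimate = {mean:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```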

11. Partial Correlation

Partial correlation measures the strength and direction of the relationship between two
variables while controlling for the effect of one or more additional variables. It helps to
understand the direct association between two variables after removing the influence of other
variables that might confound or mediate their relationship.

For example, if you want to find the correlation between students’ study hours and exam scores
while controlling for IQ, partial correlation allows you to isolate the direct effect of study hours on
exam scores independent of IQ.

Partial correlation values range from -1 to +1, similar to Pearson’s correlation, with 0 indicating
no direct relationship after controlling for other variables. It is widely used in multivariate
statistical analysis to clarify relationships among variables.
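
As an illustrative sketch (hypothetical data), a first-order partial correlation can be computed from the three pairwise Pearson correlations using the standard formula r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)):

```python
# Illustrative sketch (hypothetical data): partial correlation between study hours
# (x) and exam score (y), controlling for IQ (z).
import numpy as np
from scipy import stats

hours = np.array([2, 4, 5, 6, 8, 9, 10, 12])
score = np.array([55, 60, 62, 68, 72, 75, 78, 85])
iq    = np.array([95, 100, 98, 105, 110, 108, 115, 120])

r_xy = stats.pearsonr(hours, score)[0]
r_xz = stats.pearsonr(hours, iq)[0]
r_yz = stats.pearsonr(score, iq)[0]

# Standard first-order partial correlation formula
r_xy_z = (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))
print(f"r(hours, score | IQ) = {r_xy_z:.3f}")
```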

12. Mann-Whitney U-test

The Mann-Whitney U-test is a non-parametric test used to compare differences between two
independent groups when the data are ordinal or not normally distributed. It tests whether one
group tends to have higher values than the other.

Instead of comparing means, it ranks all observations from both groups together and then
examines the sum of ranks for each group. The U statistic measures how much the rank sums
differ from what would be expected under the null hypothesis of no difference.

This test is useful when assumptions of the t-test (like normality) are violated. It is widely applied
in social sciences, medicine, and other fields to analyze group differences on variables
measured at least at the ordinal level.
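
A minimal sketch (hypothetical ordinal ratings) using scipy's implementation of the test:

```python
# Illustrative sketch (hypothetical data): comparing two independent groups with
# the Mann-Whitney U-test when normality is doubtful.
from scipy import stats

group_a = [3, 4, 2, 5, 4, 3, 6, 4]   # e.g., ordinal ratings, condition A
group_b = [5, 6, 7, 5, 6, 8, 7, 6]   # condition B

u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
# A small p suggests the two groups' rank distributions differ.
```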

MPC 006

JUNE 2022

OFFICIAL SOLVED QUESTION PAPER

SECTION A

Answer the following questions in about 450 words each :

1. Explain non-parametric statistics with a focus on its assumptions, advantages and disadvantages.

Non-Parametric Statistics: Explanation, Assumptions, Advantages, and Disadvantages

Non-parametric statistics are a set of statistical methods used when the data do not meet the
assumptions required for parametric tests or when the data are measured at ordinal or nominal
levels. Unlike parametric tests, which rely on specific distributional assumptions (like normality),
non-parametric tests are more flexible and make fewer assumptions about the data’s underlying
population distribution.

Explanation

Non-parametric methods are often called distribution-free tests because they do not assume
that the data follow a particular distribution. They are particularly useful when dealing with small
sample sizes, ordinal data, ranked data, or data with outliers and skewed distributions.

Examples of non-parametric tests include the Mann-Whitney U test, Wilcoxon signed-rank test,
Kruskal-Wallis test, and Chi-square test. These tests typically analyze median differences,
ranks, or frequency counts instead of means.

Assumptions of Non-Parametric Statistics



Although non-parametric tests are more flexible, they still have some assumptions:

1.​ Independence: Observations should be independent within and between groups.​

2.​ Random Sampling: Samples should be randomly selected from the population.​

3.​ Ordinal or Nominal Scale: Data should be at least ordinal (ranked) for many tests,
while some tests work with nominal data.​

4.​ Shape of Distribution: Non-parametric tests do not assume normality, but some require
that distributions have similar shapes (e.g., Kruskal-Wallis test).​

Advantages of Non-Parametric Statistics

1.​ No Distributional Assumptions: Useful when data violate normality or other parametric
assumptions.​

2.​ Handles Ordinal and Nominal Data: Can analyze data that cannot be meaningfully
averaged or measured on interval/ratio scales.​

3.​ Robust to Outliers: Less affected by extreme values that skew parametric tests.​

4.​ Applicable to Small Samples: Performs well with small sample sizes where parametric
tests might be invalid.​

5.​ Simple and Flexible: Easy to compute and interpret, especially for ranked or categorical
data.​

Disadvantages of Non-Parametric Statistics

1.​ Less Powerful: Generally, non-parametric tests have less statistical power than
parametric tests, meaning they are less likely to detect a true effect when it exists.​

2.​ Limited Information: Do not provide estimates of parameters like means and standard
deviations.​

3.​ Difficulty in Complex Designs: Less suited for complex experimental designs or
models involving multiple factors.​

4.​ Interpretation Challenges: Results often focus on medians or ranks, which can be less
intuitive or informative than means.​

5.​ Requires Larger Sample Sizes for Accuracy: For some tests, larger samples may be
needed to achieve reliable results.​

Conclusion

Non-parametric statistics offer a vital alternative when parametric test assumptions are violated
or data are measured on ordinal/nominal scales. While they provide flexibility and robustness,
researchers should consider their lower power and limited interpretive detail when choosing
between parametric and non-parametric methods.
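
As a rough sketch of how this choice plays out in practice (hypothetical paired scores; the 0.05 cut-off is the usual convention), one might test normality of the difference scores first and fall back to a non-parametric test when normality is rejected:

```python
# Illustrative sketch (hypothetical data): check normality, then choose between a
# parametric and a non-parametric paired test.
from scipy import stats

before = [10, 12, 9, 15, 11, 13, 30, 14]
after  = [12, 14, 10, 18, 12, 16, 31, 17]
diffs  = [a - b for a, b in zip(after, before)]

w_stat, p_norm = stats.shapiro(diffs)          # Shapiro-Wilk normality test
if p_norm < 0.05:                              # normality rejected
    stat, p = stats.wilcoxon(before, after)    # non-parametric paired test
    print(f"Wilcoxon: W = {stat}, p = {p:.4f}")
else:
    stat, p = stats.ttest_rel(before, after)   # parametric paired t-test
    print(f"Paired t: t = {stat:.2f}, p = {p:.4f}")
```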

SECTION B

Answer the following questions in about 250 words each :

5. Describe biserial and tetrachoric correlation

Biserial correlation and tetrachoric correlation are both special types of correlation
coefficients used to measure the relationship between variables when one or both are
categorical, but they differ in their assumptions and applications.

Biserial Correlation

●​ Definition:​
Biserial correlation is used when one variable is continuous (interval or ratio scale) and
the other is a dichotomous variable (with two categories) that is artificially
dichotomized from an underlying continuous variable.​

●​ Purpose:​
It estimates the correlation between the continuous variable and the latent continuous
variable underlying the dichotomous one.​

●​ Example:​
Suppose you have students' test scores (continuous) and whether they passed or failed
(dichotomous). Passing/failing is actually based on a continuous score but simplified to
two categories. Biserial correlation helps estimate the association between the
continuous test score and this dichotomous pass/fail variable.​

●​ Interpretation:​
Biserial correlation tends to be higher than point-biserial correlation because it corrects
for the dichotomization of an originally continuous variable.​

Tetrachoric Correlation

●​ Definition:​
Tetrachoric correlation measures the association between two dichotomous variables
when both are assumed to arise from underlying continuous and normally distributed
variables. It estimates the correlation between those latent continuous variables.​

●​ Purpose:​
It's used when both variables are artificially dichotomized, such as “Yes/No” responses
to two test items or symptoms.​

●​ Example:​
Imagine two binary variables: “Smokes (Yes/No)” and “Has Lung Disease (Yes/No)”.
Both are observed as dichotomous but are assumed to reflect underlying continuous
tendencies (e.g., level of nicotine addiction, lung health). Tetrachoric correlation
estimates the correlation between these underlying continuous traits.​

●​ Interpretation:​
Tetrachoric correlation provides a more accurate estimate of association than the simple
phi coefficient when the dichotomies represent cut-offs on continuous variables.​

Summary

| Correlation Type | Variables Involved | Assumption | Use Case |
|---|---|---|---|
| Biserial | One continuous, one dichotomous | Dichotomous variable is artificially dichotomized | Continuous & pass/fail type data |
| Tetrachoric | Two dichotomous variables | Both dichotomies arise from underlying continuous variables | Two binary variables assumed continuous underneath |

Both correlations are valuable in psychological and educational research, especially when
working with categorized data that originate from continuous variables.
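
As an illustrative sketch (hypothetical data), the biserial coefficient can be estimated from the point-biserial one via the standard correction r_b = r_pb · √(pq) / y, where p and q are the proportions in the two categories and y is the normal ordinate at the cut point:

```python
# Illustrative sketch (hypothetical data): estimating the biserial correlation from
# the point-biserial one, assuming the dichotomy is a cut on an underlying normal
# variable (the standard correction formula, not a formula given in this guide).
import numpy as np
from scipy import stats

scores = np.array([45, 52, 58, 61, 64, 67, 70, 74, 79, 85])  # continuous test scores
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])            # pass/fail (artificial dichotomy)

r_pb, _ = stats.pointbiserialr(passed, scores)  # Pearson's r with a 0/1 variable

p = passed.mean()                        # proportion in group "1"
q = 1 - p
y = stats.norm.pdf(stats.norm.ppf(p))    # normal ordinate at the cut point

r_b = r_pb * np.sqrt(p * q) / y          # biserial estimate; exceeds |r_pb|
print(f"point-biserial = {r_pb:.3f}, biserial ≈ {r_b:.3f}")
```

This also makes visible the point noted above: the biserial value comes out larger than the point-biserial one because it corrects for the information lost in dichotomization.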

6. Discuss the meaning, importance and application of standard error.

Meaning of Standard Error

The Standard Error (SE) is a statistical measure that quantifies the amount of variability or
dispersion in the sampling distribution of a statistic, most commonly the sample mean. It
indicates how much the sample mean is expected to vary from the true population mean if you
were to take multiple samples.

Mathematically, for the sample mean, the standard error is calculated as:

SE = s / √n

where

●​ s = sample standard deviation,​

●​ n = sample size.​

Importance of Standard Error



1.​ Measure of Precision:​


The standard error provides insight into the precision of the sample mean as an
estimate of the population mean. A smaller SE indicates more precise estimates.​

2.​ Basis for Confidence Intervals:​


SE is used to construct confidence intervals around the sample mean, giving a range in
which the true population mean likely falls.​

3.​ Foundation for Hypothesis Testing:​


It is fundamental in determining test statistics (like t-scores), which helps in making
decisions about the population parameters.​

4.​ Comparing Estimates:​


SE allows comparison of the reliability of different sample estimates; samples with
smaller SE are more reliable.​

Applications of Standard Error

1.​ Confidence Interval Construction:​


Researchers use SE to calculate confidence intervals, such as a 95% confidence
interval, which estimates the range of the population mean with a certain level of
confidence.​

2.​ Hypothesis Testing:​


SE helps calculate test statistics to decide whether to reject a null hypothesis. For
example, in t-tests, the difference between sample means is divided by the standard
error to compute the t-value.​

3.​ Estimating Sampling Variability:​


SE informs how much sample estimates fluctuate between samples, guiding
researchers in interpreting sample data.​

4.​ Comparing Different Studies:​


In meta-analysis, SE helps weigh studies by their precision; studies with smaller SE
have more influence.​

Summary

Standard error is crucial because it bridges sample statistics and population parameters by
measuring the expected variability of estimates. It helps researchers quantify uncertainty,
assess precision, and make statistically sound inferences, thus playing a vital role in data
analysis and decision-making across various fields.
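
A minimal sketch (hypothetical scores; a hypothesized population mean of 100 is assumed) showing SE computed as s / √n and then used as the divisor in a one-sample t-test:

```python
# Illustrative sketch (hypothetical data): the standard error of the mean and its
# role in a one-sample t-test against a hypothesized population mean of 100.
import numpy as np
from scipy import stats

sample = np.array([104, 98, 110, 102, 95, 108, 101, 99, 106, 103])
n = len(sample)

se = sample.std(ddof=1) / np.sqrt(n)          # SE = s / sqrt(n)
t_value = (sample.mean() - 100) / se          # test statistic divides by SE
p_value = 2 * stats.t.sf(abs(t_value), df=n - 1)
print(f"SE = {se:.2f}, t = {t_value:.2f}, p = {p_value:.3f}")
```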

8. Discuss Kruskal-Wallis analysis of variance.

Kruskal-Wallis ANOVA is a non-parametric statistical test used to determine if there are statistically significant differences between the medians of three or more independent groups. It
is an extension of the Mann-Whitney U test (which compares two groups) to more than two
groups.

When is it used?

●​ When the assumptions of parametric ANOVA (such as normality and homogeneity of variances) are violated.​

●​ When the dependent variable is ordinal or continuous but not normally distributed.​

●​ When sample sizes are small or unequal.​

●​ When data contain outliers or are skewed.​

Procedure Overview:

1.​ Rank all observations: Combine all data points from all groups and assign ranks, with
the smallest value ranked 1, next smallest 2, and so on.​

2.​ Sum ranks within each group: Calculate the sum of ranks for each group.​

3.​ Calculate the Kruskal-Wallis statistic (H):​
H = [12 / (N(N + 1))] × Σ (Rᵢ² / nᵢ) − 3(N + 1),​
where Rᵢ is the sum of ranks for group i, nᵢ is the size of group i, and N is the total number of observations.​

4.​ Determine significance: Compare the computed H value to the critical value from
the chi-square distribution with k − 1 degrees of freedom (k = number of groups).​

Advantages of Kruskal-Wallis Test:

●​ Does not assume normality of data.​

●​ Suitable for ordinal data or non-normally distributed interval data.​

●​ Less affected by outliers.​

●​ Can handle unequal sample sizes.​

Limitations:

●​ Less powerful than parametric ANOVA when data meet parametric assumptions.​

●​ Does not specify which groups differ—requires post-hoc testing.​

●​ Assumes independence of observations and similar shapes of distributions across groups.​

Applications:

●​ Comparing effectiveness of treatments in clinical trials when data are skewed.​

●​ Analyzing survey responses measured on ordinal scales across multiple groups.​

●​ Behavioral and social sciences where data often violate parametric assumptions.​

Summary:

The Kruskal-Wallis test is a valuable tool for comparing multiple groups when parametric
assumptions are unmet. It ranks the data and tests whether the groups come from the same
distribution, offering a robust alternative to the traditional ANOVA.
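
A minimal sketch (hypothetical skewed scores across three groups) using scipy's implementation:

```python
# Illustrative sketch (hypothetical data): Kruskal-Wallis test across three
# independent groups with skewed scores.
from scipy import stats

group1 = [7, 8, 6, 9, 7, 30]      # skewed by an outlier
group2 = [5, 6, 5, 7, 6, 5]
group3 = [9, 10, 11, 9, 12, 10]

h_stat, p_value = stats.kruskal(group1, group2, group3)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
# A significant H says at least one group differs; post-hoc pairwise comparisons
# (e.g., Mann-Whitney tests with a correction) identify which ones.
```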

9. Describe the importance and application of normal distribution.

Importance of Normal Distribution

The normal distribution, often called the Gaussian distribution or bell curve, is fundamental in
statistics because many natural phenomena and measurement errors tend to follow this pattern.
It has several important properties:

1.​ Symmetry: The normal distribution is perfectly symmetric about its mean, meaning
values are equally likely to occur on either side.​

2.​ Describes Natural Phenomena: Heights, weights, IQ scores, blood pressure, and
many other biological, social, and psychological variables approximate normal
distribution.​

3.​ Mathematical Simplicity: Its properties allow for straightforward mathematical calculations and inference, including use of z-scores and standard deviations.​

4.​ Central Limit Theorem (CLT): The CLT states that the sum or average of a large
number of independent, identically distributed variables will tend to be normally
distributed, regardless of the original variable’s distribution. This underpins many
statistical tests.​

5.​ Basis for Statistical Inference: Many parametric tests (t-tests, ANOVA, regression)
assume normality for valid results.​

Applications of Normal Distribution



1.​ Statistical Testing: Most inferential tests rely on the assumption of normality, especially
when sample sizes are small.​

2.​ Quality Control: In manufacturing, control charts use normal distribution to detect
variations and maintain product quality.​

3.​ Probability and Risk Assessment: Normal distribution models probabilities of outcomes, helping in fields like finance, insurance, and project management to assess risk.​

4.​ Measurement and Error Analysis: Measurement errors in scientific experiments often
follow a normal distribution, allowing for error estimation and confidence intervals.​

5.​ Psychometrics and Social Sciences: Normal distribution models test scores, survey
results, and other measurements for meaningful interpretation.​

Summary

Normal distribution is crucial because it accurately models many real-world variables and forms
the foundation of most classical statistical methods. Its mathematical properties enable
researchers to make predictions, estimate probabilities, and perform hypothesis tests effectively,
making it a cornerstone of data analysis across disciplines.
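
As an illustrative sketch (assuming the conventional IQ scaling of mean 100 and SD 15), normal-curve probabilities and z-scores can be computed directly:

```python
# Illustrative sketch: probabilities under a normal curve for IQ scores,
# assuming mean = 100 and SD = 15 (conventional IQ scaling).
from scipy import stats

iq = stats.norm(loc=100, scale=15)

print(f"P(IQ > 130)      = {iq.sf(130):.4f}")                # upper-tail probability
print(f"P(85 < IQ < 115) = {iq.cdf(115) - iq.cdf(85):.4f}")  # ~68% within 1 SD
print(f"z for IQ = 130   = {(130 - 100) / 15:.1f}")          # z = (x - mean) / SD
```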

SECTION C

Write short notes on the following in about 100 words each :

10. Types of Frequency Distribution

Frequency distribution is a way to organize data by showing how often each value or range of
values occurs. The main types include:

1.​ Ungrouped Frequency Distribution: Lists each individual value with its frequency,
suitable for small data sets with distinct values.​

2.​ Grouped Frequency Distribution: Data are grouped into class intervals (ranges),
showing frequency per interval. Useful for large data sets with many values.​

3.​ Cumulative Frequency Distribution: Displays the running total of frequencies up to a


certain point, helpful for understanding data distribution and percentiles.​

4.​ Relative Frequency Distribution: Shows frequencies as proportions or percentages of


the total, facilitating comparison across groups.​

Each type helps in summarizing data and identifying patterns or trends effectively.
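
A minimal sketch (hypothetical scores; class intervals chosen for illustration) building grouped, cumulative, and relative frequency distributions:

```python
# Illustrative sketch (hypothetical scores): grouped, cumulative, and relative
# frequency distributions built with numpy.
import numpy as np

scores = np.array([42, 55, 48, 61, 67, 53, 59, 72, 45, 66, 58, 63, 50, 69, 57])
bins = [40, 50, 60, 70, 80]                    # class intervals

freq, edges = np.histogram(scores, bins=bins)  # grouped frequencies
cum_freq = np.cumsum(freq)                     # cumulative frequencies
rel_freq = freq / freq.sum()                   # relative frequencies (proportions)

for i in range(len(freq)):
    print(f"{edges[i]:.0f}-{edges[i+1]:.0f}: f={freq[i]}, cf={cum_freq[i]}, rf={rel_freq[i]:.2f}")
```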

11. Linear and Non-linear Relationship

Linear Relationship:​
A linear relationship between two variables means that the change in one variable is
proportional to the change in the other, and their graph forms a straight line. It can be expressed
by the equation y = mx + c, where m is the slope and c is the intercept.
Linear relationships are easy to model and interpret. For example, the relationship between
hours studied and exam scores often shows a linear pattern.

Non-linear Relationship:​
A non-linear relationship means the association between variables does not follow a straight
line but a curve or other complex form. This can be quadratic, exponential, logarithmic, etc. For
example, the relationship between stress and performance often follows an inverted U-shaped
curve, indicating a non-linear pattern.

Understanding the type of relationship helps choose the correct statistical model for analysis.
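
As an illustrative sketch (hypothetical stress/performance data shaped like an inverted U), fitting both a straight line and a quadratic curve shows how the wrong functional form leaves large residuals:

```python
# Illustrative sketch (hypothetical data): fitting a straight line vs. a quadratic
# curve to see which form describes the relationship better.
import numpy as np

stress = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
performance = np.array([3, 5, 7, 8, 9, 8, 7, 5, 3])   # inverted-U pattern

linear = np.polyfit(stress, performance, deg=1)        # y = m*x + c
quad = np.polyfit(stress, performance, deg=2)          # y = a*x^2 + b*x + c

for name, coefs in [("linear", linear), ("quadratic", quad)]:
    resid = performance - np.polyval(coefs, stress)
    print(f"{name}: sum of squared residuals = {np.sum(resid**2):.2f}")
# The quadratic fit leaves far smaller residuals for this inverted-U data.
```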

12. Kurtosis

Kurtosis is a statistical measure that describes the shape of a distribution’s tails and the
sharpness of its peak compared to a normal distribution. It indicates whether data have heavier
or lighter tails than a normal curve.

●​ Leptokurtic: Distributions with positive kurtosis have heavy tails and a sharp peak,
meaning more extreme values or outliers.​

●​ Platykurtic: Distributions with negative kurtosis have light tails and a flatter peak,
indicating fewer extreme values.​

●​ Mesokurtic: Distributions with kurtosis close to zero resemble the normal distribution in
tail weight and peak shape.​

Kurtosis helps in understanding data variability and the likelihood of extreme events.
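
A minimal sketch (simulated samples; scipy's kurtosis reports excess kurtosis, so values near 0 match the normal curve) illustrating all three shapes:

```python
# Illustrative sketch: excess kurtosis of simulated leptokurtic, mesokurtic,
# and platykurtic samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
heavy_tailed = rng.standard_t(df=3, size=5000)   # leptokurtic (kurtosis > 0)
normal = rng.standard_normal(size=5000)          # mesokurtic (kurtosis ~ 0)
uniform = rng.uniform(-1, 1, size=5000)          # platykurtic (kurtosis < 0)

for name, sample in [("t(3)", heavy_tailed), ("normal", normal), ("uniform", uniform)]:
    print(f"{name}: excess kurtosis = {stats.kurtosis(sample):.2f}")
```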
