Yule's Q Coefficient Explained
Yule's Q coefficient is used to determine whether two variables, measured on nominal or ordinal scales and arranged in a 2 × 2 contingency table, are independent. The coefficient was developed and published in 1912 by the British statistician George Udny Yule (1871–1951), who named it Q in honor of the Belgian statistician Quetelet (1796–1874).
The symmetric association coefficient Q of Yule is defined, for a 2 × 2 table with cell frequencies a and b in the first row and c and d in the second row, as:
$$Q = \frac{ad - bc}{ad + bc}$$
If Q = 0, there is no association between the modalities or levels of the variables; that is, they are independent.
If Q = +1, there is a perfect positive association.
If Q = −1, there is a perfect negative association.
The interpretation of the results can be difficult and arbitrary with measures of association. A general rule (Knoke and
Bohrnstedt, 1991) is as follows:
DISADVANTAGES
Even when Q equals 1, the relationship can be far from perfect. There were 35 high-income respondents with low participation, which contradicts the hypothesis. Another measure, the phi coefficient, does not share that particular characteristic with Q. For this reason, the phi coefficient is sometimes preferred despite its slightly more complicated calculation.
EXAMPLE
We can ask the question: is there a relationship between sex and the habit of smoking cigarettes? The data from 100 men and 100 women are the following:
                   Men   Women
Smokes              60      25
Does not smoke      40      75
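The necessary and sufficient condition of statistical independence is:
$$\frac{n_{ij}}{N} = \frac{n_{i\cdot}}{N} \times \frac{n_{\cdot j}}{N}, \qquad i = 1, 2, \dots, h; \; j = 1, 2, \dots, k$$
As observed, the condition does not hold for the first cell analyzed (60/200 = 0.3, whereas 85/200 × 100/200 = 0.2125, so 0.3 ≠ 0.2125). Therefore statistical independence is not fulfilled, and the requested coefficients will have non-null values.
Applying Yule's Q formula, we obtain:
$$Q = \frac{60 \times 75 - 25 \times 40}{60 \times 75 + 25 \times 40} = \frac{3500}{5500} = 0.6363\ldots$$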
INTERPRETATION
Yule's Q coefficient, Q = 0.63, indicates a moderate dependency relationship between sex and the habit of smoking cigarettes. Furthermore, the association is positive (Q = 0.63 > 0); that is, men are associated with the habit of smoking (60) and women with the habit of not smoking (75).
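The calculation above can be reproduced with a short script. This is only a minimal sketch: the function name `yules_q` and the cell labels a, b, c, d (first row, then second row) are illustrative choices, not notation taken from the handout.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table with rows (a, b) and (c, d)."""
    # Q compares the product of the main diagonal with the off-diagonal one.
    # Note: the function fails with ZeroDivisionError if ad + bc == 0.
    return (a * d - b * c) / (a * d + b * c)


# Smoking example: rows = smokes / does not smoke, columns = men / women.
q = yules_q(60, 25, 40, 75)
print(round(q, 3))  # 0.636
```

Swapping the two rows (or the two columns) changes only the sign of Q, so the sign has to be read together with the order in which the categories are laid out.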
NATIONAL UNIVERSITY OF SAN AGUSTIN OF AREQUIPA
PROFESSIONAL SCHOOL OF PSYCHOLOGY
COURSE: PSYCHOSTATISTICS PROF.: LIC. LUIS GUERRA JORDAN
FOURTH WORKSHOP, PART II: ASSOCIATION COEFFICIENTS IN 2 × 2 TABLES, YULE'S Q
In a study to find out if tall men tend to marry tall partners, the following information was published
about the wives of 134 tall husbands and 116 short husbands. Find the coefficient of association between the
height of wives and husbands.
a. Complete the contingency table by number of people (simple percentage frequency), considering the independent variable X (Level of education) and the dependent variable Y (Level of attention they pay to news, issues, or events that happen in other countries).
b. Create grouped bar charts by absolute frequencies (number of spouses) and by percentages (percentage of wives), considering the height of the parents on the horizontal axis.
c. Carry out the necessary calculations to prove that the necessary and sufficient condition of statistical independence is not met:
$$\frac{n_{ij}}{N} = \frac{n_{i\cdot}}{N} \times \frac{n_{\cdot j}}{N}, \qquad i = 1, 2, \dots, h; \; j = 1, 2, \dots, k$$
                        Offices   Production line
Higher education          270           30
No higher education        80          120
a. Complete the contingency table by number of people (simple percentage frequency), considering the independent variable X (Level of education) and the dependent variable Y (Level of attention they pay to the news, issues, or events that occur in other countries).
                        Type of work (number of people)        Type of work (percentage)
Studies                 Offices   Production line   Total      Offices   Production line   Total
Higher education                                                                            100%
No higher education                                                                         100%
Total
b. Create grouped bar charts by absolute frequencies and by percentages (type of work), considering the type of studies on the horizontal axis.
c. Perform the necessary calculations to prove that the necessary and sufficient condition of statistical independence is not met:
$$\frac{n_{ij}}{N} = \frac{n_{i\cdot}}{N} \times \frac{n_{\cdot j}}{N}, \qquad i = 1, 2, \dots, h; \; j = 1, 2, \dots, k$$
Interpretation
James Davis offered some expressions that can be used to describe the various ranges of values. Specifically, Davis developed the expressions for use when interpreting Yule's Q, another measure of association.
EXAMPLE
The question "Should priests marry?" was answered by 90 men and 90 women, and their answers are classified in the following table. We can ask whether there is a relationship between the sex of the subjects and their opinion regarding the celibacy of Catholic priests.
          Yes   No
Women      70   20
Men        50   40
$$\phi = \frac{70 \times 40 - 20 \times 50}{\sqrt{90 \times 90 \times 120 \times 60}} = 0.24$$
INTERPRETATION
The phi coefficient, φ = 0.24, indicates a weak dependency relationship between sex and opinion regarding the celibacy of priests. Furthermore, the association is positive (φ = 0.24 > 0); that is, women are associated with answering "Yes" (70) and men with answering "No" (40).
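As a check on the arithmetic of this example, the phi coefficient can be computed directly from the four cell counts. The sketch below assumes the usual 2 × 2 layout with rows (a, b) and (c, d); the function name `phi_coefficient` is an illustrative choice.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table with rows (a, b) and (c, d)."""
    numerator = a * d - b * c
    # Product of the two row totals and the two column totals.
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# Rows = women / men, columns = yes / no.
phi = phi_coefficient(70, 20, 50, 40)
print(round(phi, 3))  # 0.236
```

The result, about 0.236, rounds to the 0.24 reported in the example.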
FIFTH WORKSHOP, PART II: ASSOCIATION COEFFICIENTS IN 2 × 2 TABLES, PHI
In a study to find out if tall men tend to marry tall partners, the following information was published about the wives of 134 tall husbands and 116 short husbands. Find the coefficient of association between the height of wives and husbands.
A company has 500 workers, classified by whether or not they have higher education and by whether they work in offices or on the production line, as shown in the following contingency table. Analyze whether there is an association between the two variables using an appropriate measure of association.
                        Offices   Production line
Higher education          270           30
No higher education        80          120
h: number of rows
k: number of columns
CHARACTERISTICS
It indicates whether there is a relationship between the variables, but not the degree or intensity of the relationship, because its value increases with the size of the sample.
It does not indicate the direction of the relationship.
It is applicable to variables measured on nominal, ordinal, interval, or ratio scales. The last three scales simply need to be recoded as categorical.
It is applicable when the theoretical (expected) frequencies are not less than five.
If we calculate the value of χ² for a contingency table of two rows by two columns, the following equivalence holds:
$$\chi^2 = \frac{N (ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)} = N \phi^2$$
EXAMPLE
The following table shows the results of a study to find out whether drug consumption is related to the antisocial behavior of a sample of randomly selected young people in a rehabilitation center.
Thus we obtain the contingency table of the expected frequencies as shown below:
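$$\begin{aligned}
\chi^2 ={} & \frac{(25 - 24.41)^2}{24.41} + \frac{(20 - 28.03)^2}{28.03} + \frac{(18 - 20.19)^2}{20.19} + \frac{(25 - 15.37)^2}{15.37} + \frac{(38 - 32.73)^2}{32.73} + \frac{(38 - 37.58)^2}{37.58} \\
& + \frac{(22 - 27.08)^2}{27.08} + \frac{(20 - 20.61)^2}{20.61} + \frac{(18 - 23.86)^2}{23.86} + \frac{(35 - 27.39)^2}{27.39} + \frac{(27 - 19.73)^2}{19.73} + \frac{(6 - 15.02)^2}{15.02}
\end{aligned}$$
$$\chi^2 \approx 22.06$$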
INTERPRETATION
The chi-square statistic, χ² ≈ 22.06, is greater than the critical value, which indicates that drug use is related to antisocial behavior.
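The sum above can be checked by recomputing the expected frequencies from the marginal totals. The sketch below makes two assumptions that the handout does not state. First, the twelve observed counts are arranged as a 3 × 4 table; this layout is inferred from the fact that the expected frequencies quoted in the formula add up to whole row totals of 88, 118, and 86. Second, the comparison uses a 5% significance level, for which the critical value with 6 degrees of freedom is 12.592.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected frequency under independence.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected
    return statistic


# Observed counts in the order they appear in the formula (assumed 3 x 4).
observed = [
    [25, 20, 18, 25],
    [38, 38, 22, 20],
    [18, 35, 27, 6],
]
chi2 = chi_square(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_5_PERCENT_6_DOF = 12.592  # assumed significance level of 5%

print(round(chi2, 2), dof, chi2 > CRITICAL_5_PERCENT_6_DOF)
```

Because the script uses unrounded expected frequencies, its result differs from the hand calculation in the second decimal place.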
SIXTH WORKSHOP, PART II: ASSOCIATION COEFFICIENTS IN CONTINGENCY TABLES, PEARSON'S CHI-SQUARE
From a survey conducted among a group of young people about the current situation, the following results have been obtained:
a. Complete the contingency table by number of people (simple percentage frequency), considering the independent variable X (Level of education) and the dependent variable Y (Level of attention they pay to news, issues, or events that occur in other countries).
b. Create grouped bar charts by absolute frequencies (number of young people) and by percentages (percentage of young people), considering the level of education on the horizontal axis.
c. Find out whether the level of education is related to the problems of the current situation, using the chi-square coefficient. Calculate it and interpret it.
CONTINGENCY COEFFICIENT OF PEARSON
$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}, \qquad 0 \le C < 1$$
The upper limit, or maximum value, of the contingency coefficient C depends on the size of the table.
When the table is square, that is, when the number of rows equals the number of columns, the maximum value of C can be calculated using the following equation:
$$C_{\max} = \sqrt{\frac{k - 1}{k}}, \qquad h = k$$
h: number of rows
k: number of columns
For example,
For 2 × 2 contingency tables the maximum value is 0.707, that is: 0 ≤ C ≤ 0.707
For 3 × 3 contingency tables the maximum value is 0.816, that is: 0 ≤ C ≤ 0.816
It is only with contingency tables larger than 5 x 5 that the upper limit exceeds 0.900.
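$$0 \le C \le C_{\max} = \sqrt{\frac{k - 1}{k}} < 1$$
If the table is not square, that is, if the number of rows and columns is not equal (h ≠ k), then the maximum value or upper limit of C can be calculated using:
$$C_{\max} = \sqrt[4]{\frac{h - 1}{h} \times \frac{k - 1}{k}}$$
$$0 \le C \le C_{\max} = \sqrt[4]{\frac{h - 1}{h} \times \frac{k - 1}{k}} < 1$$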
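The upper limits quoted above can be tabulated with a small helper. This sketch assumes the fourth-root form of the general formula, which reduces to the square-table formula when the number of rows equals the number of columns. The function name `c_max` is an illustrative choice.

```python
def c_max(rows, cols):
    """Upper limit of Pearson's contingency coefficient C for a rows x cols table."""
    # Fourth root of the product; equals sqrt((k - 1) / k) when rows == cols == k.
    return ((rows - 1) / rows * (cols - 1) / cols) ** 0.25


for size in (2, 3, 5, 6):
    print(size, round(c_max(size, size), 3))
# 2 0.707
# 3 0.816
# 5 0.894
# 6 0.913
```

The output agrees with the values stated in the text: 0.707 for 2 × 2 tables, 0.816 for 3 × 3 tables, and a limit above 0.900 only for tables larger than 5 × 5.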
STANDARDIZED CONTINGENCY COEFFICIENT C
There is a simple solution to the problem of variation in the upper limits of contingency coefficients: they can be normalized, or standardized, by dividing by their upper limits. This makes all the maximum values equal to 1, regardless of the size or shape of the table. These standardized coefficients can be compared between tables of any size, using the following equation:
$$C^{*} = \frac{C}{C_{\max}}, \qquad 0 \le C^{*} \le 1$$
Classification criteria for the values of C (or of its standardized form) are not commonly found. Most authors state only that values close to 0 represent weak or no association and that values close to 1 represent the strongest association. However, the magnitude of these coefficients is not linear, which interferes with interpretation.
We will follow the following classification ([Link]/Statbook/[Link]):
PROPERTIES
The contingency coefficient meets the first characteristic but not the second; that is, it equals zero if there is no association between the two variables or attributes, but it cannot reach one. Therefore, 0 ≤ C < 1.
Two contingency coefficients can only be compared if they come from tables of the same size, because the upper limit depends on the number of rows and columns.
C is not directly comparable to any other correlation measure.
The data must be suitable for the calculation of χ², since this test can only be used if no more than 20% of the cells have an expected frequency below 5 and no cell has an expected frequency below 1.
Its interpretation is difficult because it does not reach its maximum at unity.
EXAMPLE
A researcher aims to find the association between the amount of stress and socioeconomic condition in 167 first-semester students of an institute.
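$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}} = \sqrt{\frac{\chi^2}{\chi^2 + 167}} = 0.388$$
$$C_{\max} = \sqrt{\frac{3 - 1}{3}} = 0.816$$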
Interpretation:
The association between stress frequency and socioeconomic condition is moderate: as the standard of living rises, the frequency of stress decreases significantly, with a contingency coefficient of 0.388.
The obtained value C = 0.388, compared with C_max = 0.816, is approximately 47.54% of the maximum. The relationship is therefore moderate.
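The comparison between C and its maximum is the standardization step described earlier, and it can be sketched as follows. The function names are illustrative, and the 3 × 3 table size is an assumption inferred from the quoted maximum of 0.816.

```python
import math


def contingency_c(chi2, n):
    """Pearson's contingency coefficient from a chi-square value and sample size."""
    return math.sqrt(chi2 / (chi2 + n))


def standardized_c(c, rows, cols):
    """C divided by its upper limit for a rows x cols table."""
    upper_limit = ((rows - 1) / rows * (cols - 1) / cols) ** 0.25
    return c / upper_limit


# Stress example: C = 0.388 in an (assumed) 3 x 3 table.
ratio = standardized_c(0.388, 3, 3)
print(round(ratio, 3))  # 0.475
```

The script gives about 0.475; the 47.54% in the text comes from dividing by the rounded maximum 0.816 rather than by the unrounded value.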
SEVENTH WORKSHOP, PART II: ASSOCIATION COEFFICIENTS IN CONTINGENCY TABLES, PEARSON'S C
From a survey conducted with a group of young people about the current situation, the following results have been obtained:
A school psychologist is testing three reading methods for dyslexic children. Studying the background of these children, he saw that the experience in families with dyslexic siblings could be a cause of the ineffectiveness of the methods. In order to obtain some evidence, he measured his students on the two variables and obtained the following contingency table:
Conventions for describing the magnitude of association in contingency tables (Rea & Parker, p. 203)
Harald Cramér (Stockholm, September 25, 1893 – October 5, 1985).
PROPERTIES
Cramér's V coefficient has the following properties:
$$V = \sqrt{\frac{\chi^2}{N\,[\min(h, k) - 1]}}$$
Interpretation:
It can be concluded that all the coefficients are above half of the range they can take, without reaching the
maximum. It could be said that it results in a moderate-high association.
EIGHTH WORKSHOP, PART II: ASSOCIATION COEFFICIENTS IN CONTINGENCY TABLES, CRAMÉR'S V
EXAMPLE
The number of households that watch a television program, obtained from a survey of a sample of households in three communities, is shown in the following table.
It is desired to know the degree of association between both factors for these households through Cramér's V coefficient.
a. Create grouped bar charts by absolute frequencies (number of households) and by percentages (percentage of households), considering communities on the horizontal axis.
The chi-squared test assesses whether a significant association exists between two categorical variables by comparing observed and expected frequencies. Pearson's contingency coefficient, C, is derived from the chi-squared statistic and measures the strength of that association. Specifically, C = sqrt(X² / (N + X²)), which rescales the chi-squared statistic to the interval from 0 to just below 1. While the chi-squared test detects significance, Pearson's C assesses the magnitude of the association.
Cramér's V coefficient, which ranges from 0 to 1, measures the strength of association between two variables in a contingency table of any size, including tables larger than 2x2. One limitation is that it does not imply causation or indicate the direction of the association; it conveys only its strength. Unlike Pearson's C, its upper limit is 1 regardless of table size, so no further normalization is needed for comparisons.
Standardizing association measures matters because it allows comparisons across tables of different sizes and shapes by setting the maximum possible value to a constant (usually 1). This removes the variation caused by differing table dimensions. Examples are the standardized contingency coefficient (C divided by its upper limit) and Cramér's V.
The maximum value of Pearson's contingency coefficient depends on the size and shape of the contingency table. For square tables, the maximum increases with the table's size and approaches 1 as the dimensions grow. For non-square tables, the maximum is calculated from both the number of rows and the number of columns. The coefficient therefore reflects strength relative to the table's configuration and needs to be standardized before cross-table comparison.
Statistical independence between two variables in a contingency table can be assessed by checking whether the proportions for one variable are the same across the levels of the other variable. Specifically, each joint proportion should equal the product of the corresponding marginal proportions. If this condition holds for every cell, the variables are statistically independent. In the smoking example, the observed joint proportion (0.3) did not match the product of the marginal proportions (0.2125), so the variables were not independent, indicating some form of association.
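The condition can be checked cell by cell. The following is a minimal sketch using the counts of the smoking example (rows: smokes, does not smoke; columns: men, women); the function name `independence_check` is an illustrative choice.

```python
def independence_check(table):
    """Return (joint proportion, product of marginal proportions) for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [
        [(cell / n, (row_totals[i] / n) * (col_totals[j] / n))
         for j, cell in enumerate(row)]
        for i, row in enumerate(table)
    ]


smoking = [[60, 25], [40, 75]]
pairs = independence_check(smoking)
joint, product = pairs[0][0]
print(round(joint, 4), round(product, 4))  # 0.3 0.2125
```

A single cell where the two numbers differ is enough to show that the independence condition fails for that table.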
Yule's Q coefficient is a measure of association between two binary variables. It ranges between -1 and 1, where values closer to 1 or -1 represent a stronger association and values closer to 0 a weaker one. A positive Q value indicates a positive association, meaning that the first categories of the two variables tend to occur together. In the example provided, Q = 0.63 indicates a moderate positive dependency between sex and smoking habit.
The phi coefficient, also known as the Matthews correlation coefficient, measures the strength of association between two dichotomous variables and is calculated from a 2x2 contingency table. It ranges from -1 to 1, where 1 indicates a perfect positive association, -1 a perfect negative association, and 0 no association. It is particularly useful for assessing binary classifications. In the example described, a phi coefficient of 0.24 suggests a weak positive association between sex and opinion about priestly celibacy: women answered "Yes" more often than men did.
Cramér's V coefficient is symmetric, meaning that it treats both variables equally, without a distinction between independent and dependent. Since it does not assume causality and is bounded between 0 and 1, it provides a clear and interpretable measure of strength, which is useful when the direction of influence is irrelevant or unknown.
The contingency coefficient C measures the strength of association or dependence between two categorical variables. It is calculated from a contingency table and is defined as C = sqrt(X² / (N + X²)), where X² is the chi-square value and N is the total sample size. The coefficient ranges from 0 to a maximum value that depends on the size and shape of the table, so it is sensitive to the number of categories involved. For that reason, C must be standardized before tables of different sizes are compared.
The phi coefficient is equal in absolute value to Cramér's V coefficient for 2x2 contingency tables. In these cases both coefficients quantify the association between dichotomous variables and produce the same magnitude. The equivalence occurs because Cramér's V divides by the smaller table dimension minus one, and for 2x2 tables that divisor is 1, so the adjustment has no effect.
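This equivalence can be verified numerically with the 2 × 2 table from the celibacy example (rows: women, men; columns: yes, no). The sketch below computes chi-square from the observed counts, then Cramér's V, and compares it with phi. The function names are illustrative.

```python
import math


def chi_square(observed):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (count - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(observed)
        for j, count in enumerate(row)
    )


def cramers_v(observed):
    """Cramér's V: sqrt(chi2 / (N * (min(rows, cols) - 1)))."""
    n = sum(sum(row) for row in observed)
    smaller_dim = min(len(observed), len(observed[0]))
    return math.sqrt(chi_square(observed) / (n * (smaller_dim - 1)))


table = [[70, 20], [50, 40]]
v = cramers_v(table)
phi = (70 * 40 - 20 * 50) / math.sqrt(90 * 90 * 120 * 60)
print(round(v, 4), round(abs(phi), 4))  # 0.2357 0.2357
```

For this table chi-square equals 10 with N = 180, so V = sqrt(10 / 180), which matches the phi value of the example.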