
Module 11: Nominal and Ordinal Variable Analysis

Jon Starkweather, PhD
jonathan.starkweather@unt.edu
Consultant
Research and Statistical Support




Module 11: Nominal and Ordinal Variable Analysis

Parametric statistics

  • All of the previous modules dealt with Parametric statistics.
    • Concerned with population values (i.e. parameters).
    • Require Interval and/or ratio scaled variables.
    • Assumptions about population distributions.
  • This module (11) concerns itself with Nonparametric statistics.

Nonparametric statistics

  • Most nonparametric statistics are still concerned with populations, but the hypotheses are not formally stated using population values.
    • Nominal or ordinal scaled variables.
    • Few if any assumptions.
    • Sometimes called distribution-free tests because they do not make assumptions about a population distribution.
  • Unfortunately, nonparametric tests tend to have less power or sensitivity to detect significance than their parametric partners.

1. Chi-square test

1.1. Chi-square test Introduction

Chi-square test introduction

  • The chi-square test has two forms.
    • Chi-square Goodness-of-Fit which tests whether or not the sample data fit the hypothesized population proportions.
    • Chi-square Test of Independence which tests for the presence or absence of a relationship between two variables.
  • Both Chi-square tests use the same formula and are based on the distribution of Chi-square.
    • Symbol: \(\chi^2\)

Chi-square distribution

[Figure: the chi-square distribution]

Core of Chi-square

  • The core idea of any chi-square test is the comparison of Observed versus Expected frequencies.
  • The general formula is:

    \(\chi^2 = \sum{\frac{\left(O - E\right)^2}{E}}\)


  • Where O is the observed frequency, E is the expected frequency.
    • E is the frequency expected if the null hypothesis were true.

1.2. One-way Classification Tables

One-way Classification Table Example

  • The One-way Chi-square test is the Goodness-of-fit test.
  • Say we randomly picked 100 students walking into the University Administration building.
  • We would expect, because they were picked at random, that an equal number of those students would be Freshman, Sophomore, Junior, and Senior levels.
    • We would expect 25 Freshmen, 25 Sophomores, 25 Juniors, and 25 Seniors.
    • The null hypothesis would be: \(H_{0}: E = O\)
    • The alternative hypothesis: \(H_{1}: E \neq O\)
  • Instead, we found: 32 Freshmen, 28 Sophomores, 23 Juniors, and 17 Seniors.
  • This study design constitutes a one-way classification table because there is only one variable (class level) with multiple categories.

The One-way Classification Table

         Freshmen  Sophomore  Junior  Senior
Observed    32         28       23      17
Expected    25         25       25      25

  • Degrees of Freedom (\(df\)) is the number of Categories or Columns minus 1.
  • \(df = C - 1 = 4 - 1 = 3\)
  • A table of chi-square critical values is available at: http://www.medcalc.be/manual/chi-square-table.php

Calculate Chi-square

  • Using the formula from above,

    \(\chi^2 = \sum{\frac{\left(O - E\right)^2}{E}}\)


    \(\chi^2 = \frac{\left(32 - 25\right)^2}{25} + \frac{\left(28 - 25\right)^2}{25} + \frac{\left(23 - 25\right)^2}{25} + \frac{\left(17 - 25\right)^2}{25} = 5.04\)


  • And since \(\chi_{calc}^2 = 5.04 < 7.815 = \chi_{crit}^2\), we fail to reject the null hypothesis and conclude that this sample does not indicate a significant difference between the observed and expected frequencies of class level.
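A quick software check of this result (a sketch using Python's scipy.stats; the observed counts are the ones from the example above):

    # One-way (goodness-of-fit) chi-square for the class-level example.
    # When no expected frequencies are given, chisquare() assumes equal
    # proportions, matching the null hypothesis above.
    from scipy.stats import chisquare

    observed = [32, 28, 23, 17]   # Freshmen, Sophomores, Juniors, Seniors
    result = chisquare(observed)  # expected defaults to [25, 25, 25, 25]
    print(result.statistic)       # 5.04
    print(result.pvalue)          # about 0.169: fail to reject at 0.05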

1.3. Multi-way Contingency Tables

Multi-way Chi-square

  • When we have more than one categorical variable, we call the chi-square test a test of Independence.
    • Are the cells of the table independent of one another, or is there some relationship occurring among them?
  • In the one-way example above, we called the table a classification table because we were classifying frequencies on one variable.
  • In the multi-way situation, we call the table a contingency table because the frequencies of one variable are contingent upon another variable (or more than one).

A Two-way Example

  • Suppose we also wondered about the gender of the students entering the UNT Administration building from the example above.
  • A 2 X 4 design (Gender by Class Level).
Class Level
Gender Freshmen Sophomore Junior Senior Total
Male 32 28 23 17 100
Female 28 29 20 15 92
Total 60 57 43 32 192

Expected Frequencies in a Two-way Design

  • In the one-way design, expected frequencies were simply even proportions; but here, with a more complex design, we must calculate the expected frequencies which are contingent upon two variables.
  • The basic equation for calculating the Expected frequencies is:
    \(E_{ij} = \frac{R_{i}C_{j}}{n_{t}}\)
  • Where \(E_{ij}\) is a particular cell, \(R_{i}\) is the row total, \(C_{j}\) is the column total, and \(n_{t}\) is the total number of individuals (or cases).
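Because each expected cell is just a row total times a column total divided by \(n_{t}\), the entire table of expected frequencies can be computed at once as an outer product. A minimal sketch in Python (numpy), using the counts from the table above:

    # Expected frequencies for a contingency table: E_ij = R_i * C_j / n_t.
    # The outer product of the row totals and column totals fills every cell.
    import numpy as np

    observed = np.array([[32, 28, 23, 17],   # Male
                         [28, 29, 20, 15]])  # Female
    row_totals = observed.sum(axis=1)        # [100, 92]
    col_totals = observed.sum(axis=0)        # [60, 57, 43, 32]
    n_t = observed.sum()                     # 192
    expected = np.outer(row_totals, col_totals) / n_t
    print(expected.round(2))                 # matches the hand calculations below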

Expected Frequencies for the current example

  • For the current example, we have the following Expected frequencies for each cell:
\(E_{11} = \frac{100\times60}{192}\) \(E_{12} = \frac{100\times57}{192}\) \(E_{13} = \frac{100\times43}{192}\) \(E_{14} = \frac{100\times32}{192}\)


\(E_{21} = \frac{92\times60}{192}\) \(E_{22} = \frac{92\times57}{192}\) \(E_{23} = \frac{92\times43}{192}\) \(E_{24} = \frac{92\times32}{192}\)
  • Which leads to:
\(E_{11} = 31.25\) \(E_{12} = 29.69\) \(E_{13} = 22.40\) \(E_{14} = 16.67\)


\(E_{21} = 28.75\) \(E_{22} = 27.31\) \(E_{23} = 20.60\) \(E_{24} = 15.33\)

Table with Expected Frequencies

  • Here we have the Expected Frequencies for each cell, listed in parentheses.
Class Level
Gender Freshmen Sophomore Junior Senior Total
Male 32(31.25) 28(29.69) 23(22.40) 17(16.67) 100
Female 28(28.75) 29(27.31) 20(20.60) 15(15.33) 92
Total 60 57 43 32 192

  • Of course, you cannot have 31.25 persons (frequencies), so you could round to the nearest whole number.

Calculating \(\chi^2\) for the two-way example

  • Recall the formula:

\(\chi^2 = \sum{\frac{\left(O - E\right)^2}{E}} = \frac{\left(32-31.25\right)^2}{31.25}+\frac{\left(28-29.69\right)^2}{29.69}+\frac{\left(23-22.40\right)^2}{22.40}+\frac{\left(17-16.67\right)^2}{16.67}+\)

\(\frac{\left(28-28.75\right)^2}{28.75}+\frac{\left(29-27.31\right)^2}{27.31}+\frac{\left(20-20.60\right)^2}{20.60}+\frac{\left(15-15.33\right)^2}{15.33}\)

\(\chi^2 = 0.286\)
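The same test in software (a sketch using scipy.stats.chi2_contingency, which derives the expected frequencies from the margins exactly as in the hand calculations above):

    # Two-way chi-square (test of independence) for Gender by Class Level.
    # chi2_contingency computes expected frequencies, df, and the p value;
    # no continuity correction is applied because the table is not 2 x 2.
    import numpy as np
    from scipy.stats import chi2_contingency

    observed = np.array([[32, 28, 23, 17],   # Male
                         [28, 29, 20, 15]])  # Female
    chi2, p, df, expected = chi2_contingency(observed)
    print(chi2, df)   # about 0.286 with df = 3
    print(p)          # about 0.96: fail to reject independence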

Degrees of Freedom in Two-way designs

  • In a two-way design, \(df = \left(R - 1\right)\left(C - 1\right)\), where \(R\) is the number of rows and \(C\) is the number of columns.
  • Here, \(df = \left(2 - 1\right)\left(4 - 1\right) = 3\), which at the 0.05 significance level gives \(\chi_{crit}^2 = 7.815\).

Two-way Example Results

  • So, since \(\chi_{calc}^2 = 0.286 < 7.815 = \chi_{crit}^2\), we fail to reject the null hypothesis and conclude that there was not a relationship between Gender and Class Level.
  • Stated another way, the two variables appear to be independent of one another.
  • Stated still another way, the Observed frequencies for each cell did not differ significantly from the Expected frequencies.
  • Like with correlation, chi-square is very sensitive to sample size.
    • If given a large enough sample, any chi-square analysis will be significant.

1.4. Effect Size

Odds as Effect Size

  • We can calculate the odds of cell membership as a measure of effect size which allows us to go beyond the simple hypothesis testing context.
  • In order to calculate the odds for a given cell, we must identify the cell in our question.
    • For example, if you are a Male entering the UNT Administration building, what are the odds you are a Freshman?
    • To answer that question, simply divide the number of Freshmen by the number of not Freshmen for the Male row.
    • Odds of a male also being a Freshman: \(\frac{32}{68} = 0.4706\), or odds of roughly 1 to 2.
  • Stated another way: for every male Freshman entering the building, there are about two males who are not Freshmen. (Note that odds are not probabilities: the probability that a male entering the building is a Freshman is \(32/100 = 32\%\).)

Phi as 2 X 2 Contingency Effect Size

  • When in the 2 X 2 situation, Phi can be used to measure the association of the two variables.
  • Symbol: \(\phi\)
  • Calculation:
    \(\phi = \sqrt{\frac{\chi^2}{n_{t}}}\)
  • The resulting number will be a correlation coefficient and is interpreted as such.
  • Of course, it is limited to the 2 X 2 situation only.

Cramer's \(V\)

  • Cramer's \(V\) is used as an analog to Phi but for contingency tables larger than 2 X 2.
  • The formula is:
    \(V = \sqrt{\frac{\chi^2}{n_{t}\left(k - 1\right)}}\)
  • Where \(k\) is the smaller of: number of rows or number of columns.
  • A note of caution regarding Phi and Cramer's V: interpreting a correlation between two strictly categorical variables is essentially meaningless.
    • What does it mean to say that class standing level and gender are (or are not) correlated at .60?
    • NOT MUCH!
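Both effect sizes are one-liners once \(\chi^2\) is in hand. A minimal sketch (the function names phi and cramers_v are just illustrative, not a standard library API):

    # Phi (2 x 2 tables) and Cramer's V (larger tables) from chi-square.
    import math

    def phi(chi2, n_t):
        return math.sqrt(chi2 / n_t)

    def cramers_v(chi2, n_t, n_rows, n_cols):
        k = min(n_rows, n_cols)           # smaller of rows or columns
        return math.sqrt(chi2 / (n_t * (k - 1)))

    print(cramers_v(0.286, 192, 2, 4))    # about 0.039 for the example above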

1.5. Kappa

Cohen's kappa

  • Cohen's kappa is a measure of agreement.
  • Suppose we have two tenured faculty members rate 28 graduate students' teaching effectiveness.
    • Not Effective, Effective, Highly Effective
  • It would be beneficial to know whether the two faculty members agree on the ratings, and to what extent they agree or disagree.
  • One could simply calculate the percentage of agreement, but that measure does not take into account the random chance of agreement.
  • Cohen's kappa corrects this deficiency.

Agreement Data

NE = Not Effective, E = Effective, HE = Highly Effective.

Faculty 1
Faculty 2 NE E HE Total
NE 4 0 0 4
E 0 5 1 6
HE 0 3 15 18
Total 4 8 16 28

Percentage of Agreement and Random Chance

  • Of the 28 graduate students, 24 were rated the same by both faculty (add along the diagonal).
    • This means, \(24/28 = .8571\) or 85.71% agreement.
  • However, consider the following:
    • The probability of `Effective' for Faculty 1 is 8/28 = .2857.
    • The probability of `Effective' for Faculty 2 is 6/28 = .2143.
    • So, the probability of both faculty agreeing on `Effective' for one student is .2857*.2143 = .0612.
    • Which is not a lot, but across all 28 students, we can expect .0612*28 = 1.71 agreements just by random chance.

Calculate kappa

  • Calculating kappa is similar to calculating the usual \(\chi^2\).
  • The equation for kappa (\(\kappa\)) is:
    \(\kappa = \frac{\sum{f_{o}} - \sum{f_{e}}}{n_{t} - \sum{f_{e}}}\)
  • Where \(f_{o}\) is the observed frequencies on the diagonal and \(f_{e}\) is the expected frequencies on the diagonal.

Calculating the Expected Frequencies

  • Use the same formula from earlier to calculate the Expected Frequencies:
    \(E_{ij} = \frac{R_{i}C_{j}}{n_{t}}\)
  • For Not Effective (NE): (4*4)/28 = .571
  • For Effective (E): (6*8)/28 = 1.714
  • For Highly Effective (HE): (18*16)/28 = 10.286
  • Then, sum them to get \(f_{e}\) = 12.571

Calculating the Observed Frequencies

  • Simply add up the observed frequencies to get \(f_{o}\)
    \(4 + 5 + 15 = 24\)
  • Now we can calculate kappa.

Calculating kappa

  • Recall, kappa (\(\kappa\)) is:
    \(\kappa = \frac{\sum{f_{o}} - \sum{f_{e}}}{n_{t} - \sum{f_{e}}}\)
  • So, for the current example:
    \(\kappa = \frac{\sum{f_{o}} - \sum{f_{e}}}{n_{t} - \sum{f_{e}}} = \frac{24 - 12.571}{28 - 12.571} = .7407\)
  • So, agreement is really lower than the 85.71% from above; after accounting for chance it is 74.07%.
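The same computation in a few lines (a sketch using numpy on the rating table above; if the raw rating vectors are available instead, sklearn.metrics.cohen_kappa_score gives the same result):

    # Cohen's kappa from the 3 x 3 agreement table above.
    # Expected frequencies along the diagonal come from the margins,
    # exactly as in the hand calculations.
    import numpy as np

    table = np.array([[4, 0,  0],
                      [0, 5,  1],
                      [0, 3, 15]])
    n_t = table.sum()                      # 28
    f_o = np.trace(table)                  # 24 observed agreements
    f_e = (table.sum(axis=1) * table.sum(axis=0) / n_t).sum()  # 12.571
    kappa = (f_o - f_e) / (n_t - f_e)
    print(kappa)                           # about 0.7407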

2. Wilcoxon's Ranks tests

2.1. Wilcoxon's Rank-Sum Test

Wilcoxon's Rank-Sum test

  • Wilcoxon's Rank-Sum test is a non-parametric replacement for the Independent Samples \(t\) test.
  • When data do not conform to the assumptions of the \(t\) test, Wilcoxon's Rank-Sum test is an appropriate alternative.
  • However, as mentioned previously, non-parametric tests tend to have less power than their parametric companions.
    • The Rank-Sum test has less power than the Independent Samples \(t\) test.
  • The general idea of the Rank-Sum test is to test whether two samples originated from the same population, similar to the Independent Samples \(t\) test.
    • However, it is not specifically tied to mean differences, but rather to differences in central tendency.

Ranked Sums

  • If we rank the scores of two groups from lowest to highest, then sum the groups' ranked scores...
  • We would expect, if the groups are different, to find the sum of one group to be smaller than the sum of the other group.
  • As a significance test, we take the sum of the ranks for the smaller group and compare it to a tabled value to determine if the groups are significantly different.
    • If the groups are equal size, then use the smaller of the two ranked sums.

Small Example

  • Say we have Driving Anger scores from two groups: police officers and taxi drivers.
  • The police officers' scores are: 8, 15, 12, 10, 13
  • The taxi drivers' scores are: 27, 28, 19, 17, 26, 28
  • We would expect police officers to have a lower level of Driving Anger than the Taxi drivers.
    • One-tailed test: police officers \(<\) taxi drivers.
  • To test this we will first rank all the scores.

Ranked Data

           Raw Scores   Rank
Police          8         1
Officers       15         5
               12         3
               10         2
               13         4
Taxi           27         9
Drivers        28        10.5
               19         7
               17         6
               26         8
               28        10.5


Tied scores get tied ranks half-way between the two whole number
ranks they would occupy if sequential.

Calculate \(W_{s}\)

  • Sum the Ranks for the smaller group, the police officers: \(\sum{R_{s}} = 1+5+3+2+4 = 15\)
  • Look in the table of critical lower-tail values of \(W_{s}\) with a significance level of 0.05 and:
    • \(n_{1}\) = smaller group = 5
    • \(n_{2}\) = larger group = 6
  • Our calculated \(W_{s} = 15\) is at or below the tabled critical value; we reject the null hypothesis and conclude that the two groups are significantly different.

Caution

  • It is important to note that the table of \(W_{s}\) displays Critical Lower-Tail Values of \(W_{s}\), where \(n_{1} \leq n_{2}\).
    • The calculated \(W_{s}\) needs to be less than or equal to the critical value in order to reject the null (i.e. find a significant difference).
  • If we wanted to test if the Upper-Tail was significant (i.e. hypothesize that the taxi drivers tend to score significantly higher than the police officers) we would need to calculate \(W'_{s}\)
    \(W'_{s} = 2\overline{W} - W_{s}\)
  • where \(2\overline{W} = n_{1}\left(n_{1} + n_{2} + 1\right)\)
    • Notice, the table provides \(2\overline{W}\) in the right most column.
  • Then, if \(W'_{s}\) is larger than the critical value, we would reject the null and conclude that the taxi drivers scored significantly higher on the Driving Anger scale.

Normal Approximation

  • Notice the tables of \(W_{s}\) are only useful when \(n_{1}\) and \(n_{2}\) are less than or equal to 25.
  • For larger samples, the distribution of \(W_{s}\) approaches normal; which means we can calculate a \(z\) score for them.
    • The mean of the distribution of \(W_{s}\) is: \(\frac{n_{1}\left(n_{1}+n_{2}+1\right)}{2}\)
    • And the standard deviation of the distribution of \(W_{s}\) is: \(\sqrt{\frac{n_{1}n_{2}\left(n_{1}+n_{2}+1\right)}{12}}\)
  • So, the \(z\) score is calculated using:
    \(z = \frac{statistic - mean}{standard\ deviation} = \frac{W_{s} - \frac{n_{1}\left(n_{1}+n_{2}+1\right)}{2}}{\sqrt{\frac{n_{1}n_{2}\left(n_{1}+n_{2}+1\right)}{12}}}\)

For the current example

  • The mean of \(W_{s}\) is \(\frac{5\left(5+6+1\right)}{2} = 30\) and the standard deviation is \(\sqrt{\frac{5 \times 6 \times 12}{12}} = \sqrt{30} \approx 5.48\).
  • So \(z = \frac{15 - 30}{5.48} \approx -2.74\), beyond the one-tailed 0.05 critical value of \(-1.645\); the normal approximation agrees with the table-based decision (though with samples this small, the tables are preferred).
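In software, the Rank-Sum test is usually run as the equivalent Mann-Whitney \(U\) test, where \(U = W_{s} - \frac{n_{1}\left(n_{1}+1\right)}{2}\), so the decisions agree. A sketch using scipy.stats:

    # Wilcoxon Rank-Sum test for the Driving Anger example, run as the
    # equivalent Mann-Whitney U test.
    from scipy.stats import mannwhitneyu

    police = [8, 15, 12, 10, 13]
    taxi = [27, 28, 19, 17, 26, 28]

    # One-tailed test: police officers < taxi drivers.
    u, p = mannwhitneyu(police, taxi, alternative='less')
    print(u)   # 0, i.e. W_s = 0 + 5*6/2 = 15, matching the hand calculation
    print(p)   # well below 0.05: reject the null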

2.2. Wilcoxon's Matched-Pairs Signed-Ranks Test

Wilcoxon's Matched-Pairs Signed-Ranks test

  • The Wilcoxon's Matched-Pairs Signed-Ranks test is an appropriate alternative to the Dependent Samples \(t\) test.
  • It is used when the assumptions for the Dependent Samples \(t\) test cannot be met.
  • Specifically, it is used to determine whether a significant difference exists between two related sets of scores.
    • e.g., pretest to posttest.
  • Like the previous Wilcoxon test, this one works with ranks and the sum of ranks.

Quick example

  • Suppose we were interested in documenting the effectiveness of Xanax as an anti-anxiety treatment.
  • We gather 10 individuals who meet the diagnostic criteria for generalized anxiety and measure their symptoms with a standard anxiety survey.
  • Then, we administer a protocol of Xanax for two weeks and follow that with another measure of their symptoms on the anxiety survey.
  • We would expect the posttest scores to be lower than the pretest scores.
    • One-tailed test, lower end

Example Data

Pre   Post   Difference   Rank of |Difference|   Signed Rank
15     8         7                2.5                 2.5
18    10         8                4.5                 4.5
17     8         9                6.5                 6.5
19    11         8                4.5                 4.5
20    13         7                2.5                 2.5
22    12        10                8.5                 8.5
16    18        -2                1                  -1
24    12        12               10                  10
23    14         9                6.5                 6.5
21    11        10                8.5                 8.5


\(T+ = \sum{positive\ ranks} = 54\)
\(T- = \sum{negative\ ranks} = -1\)

\(T_{calc}\) and \(T_{crit}\)

  • Once we have the sum of both positive and negative difference ranks, we compute \(T_{calc}\) which is simply the smaller of the two in absolute value.
    • Since \(T- = -1\) is smaller in absolute value than \(T+ = 54\), then \(T_{calc} = 1\) (the absolute value of the smaller rank sum).
  • To find the critical value (\(T_{crit}\)), we use the number of participants or cases (\(n = 10\)) and look in the \(T\) distribution table, specifically the column with a significance level of 0.05 (the table linked below has only one column: the 0.05 values are listed).
    • As before, all the values in the table are for one-tailed tests.

\(T_{calc}\) versus \(T_{crit}\)

  • So, \(T_{crit}\) (labeled S in the table linked above) for \(n = 10\) would be 10 (with an exact significance level of 0.04199), or we could use 11 (with an exact significance level of 0.05273).
    • Remember, because we are dealing with ranks, \(T_{crit}\) must be a discrete number.
  • So, since \(T_{calc} = 1 < 10 = T_{crit}\), we reject the null hypothesis and conclude that the posttest scores were significantly lower than the pretest scores.

From \(T\) to \(z\)

  • When sample sizes are greater than 50, we can conduct a \(z\) test with our ranked sums \(T\).
  • The distribution of \(T\) is approximately normal when \(n > 50\) with a mean of: \(\frac{n\left(n + 1\right)}{4}\)
  • And a standard deviation of: \(\sqrt{\frac{n\left(n+1\right)\left(2n+1\right)}{24}}\)
  • All of which gives us what we need to compute \(z\):
    \(z = \frac{T - \frac{n\left(n+1\right)}{4}}{\sqrt{\frac{n\left(n+1\right)\left(2n+1\right)}{24}}}\)

Current Example applied to \(z\) test

  • Our example has only \(n = 10\) cases, so the \(z\) approximation is not really appropriate; but as an illustration, the mean is \(\frac{10\left(10+1\right)}{4} = 27.5\) and the standard deviation is \(\sqrt{\frac{10\left(11\right)\left(21\right)}{24}} \approx 9.81\).
  • So \(z = \frac{1 - 27.5}{9.81} \approx -2.70\), again beyond the one-tailed 0.05 critical value of \(-1.645\).
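In software (a sketch using scipy.stats.wilcoxon on the pretest and posttest scores from the example data above):

    # Wilcoxon Matched-Pairs Signed-Ranks test for the Xanax example.
    from scipy.stats import wilcoxon

    pre = [15, 18, 17, 19, 20, 22, 16, 24, 23, 21]
    post = [8, 10, 8, 11, 13, 12, 18, 12, 14, 11]

    # The default two-sided statistic is the smaller rank sum in absolute
    # value, matching T_calc above.
    t, p = wilcoxon(pre, post)
    print(t)   # 1.0
    print(p)   # well below 0.05 even two-sided: reject the null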

3. Kruskal-Wallis One-way ANOVA

Kruskal-Wallis One-way ANOVA

  • The Kruskal-Wallis test is a nonparametric replacement for the One-way ANOVA when the assumptions of One-way ANOVA are not met.
  • The Kruskal-Wallis test is a direct extension of the Wilcoxon's Rank-Sum test for independent groups.
    • Both are based on the sums of ranks.
  • As with the Wilcoxon's Rank-Sum test we again rank all of the scores (regardless of group membership) and then sum the ranks for each group.

Omnibus test of differences

  • The Kruskal-Wallis test is used to identify differences in central tendency among more than 2 groups.
  • As with the One-way ANOVA, the Kruskal-Wallis test can only tell us if there is a significant difference among the central tendencies of the groups; it does not tell us where the group differences are located.
    • Secondary analyses, such as the Wilcoxon's Rank-Sum test, would be necessary (much like conducting post-hoc testing in the ANOVA situation).

Compute \(H\)

  • To calculate the Kruskal-Wallis test; compute \(H\)
    \(H = \left[\frac{12}{n_{t}\left(n_{t}+1\right)}\right]*\sum{\frac{R_{i}^2}{n_{i}}} - 3\left(n_{t}+1\right)\)
  • where \(n_{t}\) is the total number of participants, \(R_{i}\) is the sum of the ranks in group i, and \(n_{i}\) is the number of participants in group i.
  • The comparison distribution is the chi-square distribution with \(df = k - 1\) where \(k\) is the number of groups.

Quick Example

  • Suppose we added limousine drivers to our earlier example comparing driving anger among police officers and taxi drivers.

Police           Taxi             Limousine
Score   Rank     Score   Rank     Score   Rank
  8      1        27     13        16      9
 15      7.5      28     14.5      15      7.5
 12      3        19     11        14      6
 10      2        17     10        13      4.5
 13      4.5      26     12
                  28     14.5

Tied scores get tied ranks half-way between the two whole number ranks they would occupy if sequential.

Calculate \(H\)

  • First, we need the sum of each rank (\(R_{i}\)) and the number of participants in each group (\(n_{i}\)).
    \(R_{1} = 1+7.5+3+2+4.5 = 18\) and \(n_{1} = 5\)
    \(R_{2} = 13+14.5+11+10+12+14.5 = 75\) and \(n_{2} = 6\)
    \(R_{3} = 9+7.5+6+4.5 = 27\) and \(n_{3} = 4\)
  • Then we can calculate \(H\)
\(H = \left[\frac{12}{n_{t}\left(n_{t}+1\right)}\right]*\sum{\frac{R_{i}^2}{n_{i}}} - 3\left(n_{t}+1\right) =\)


\(\left[\frac{12}{15\left(15+1\right)}\right]*\left[\frac{18^2}{5}+\frac{75^2}{6}+\frac{27^2}{4}\right] - 3\left(15+1\right) = 11.2275\)

Compare and make a decision

  • Our \(H_{calc}\) (which is really a \(\chi^2\) value) is 11.2275.
  • We have \(df = k - 1 = 3 - 1 = 2\), which, with a 0.05 significance level, yields a critical value of 5.991.
  • So, since \(H_{calc} = 11.2275 > 5.991 = \chi_{crit}^2\) we reject the null hypothesis and conclude there was a significant difference in driving anger among the three groups.
    • Secondary analyses, such as the Wilcoxon's Rank-Sum test, would be necessary to determine where the differences were among the groups (much like conducting post-hoc testing in the ANOVA situation).
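In software (a sketch using scipy.stats.kruskal on the three groups' raw scores; scipy applies a small tie correction, so its \(H\) is slightly larger than the hand value):

    # Kruskal-Wallis test for the three driver groups.
    # kruskal ranks all scores, corrects for ties, and compares H to a
    # chi-square distribution with k - 1 degrees of freedom.
    from scipy.stats import kruskal

    police = [8, 15, 12, 10, 13]
    taxi = [27, 28, 19, 17, 26, 28]
    limo = [16, 15, 14, 13]

    h, p = kruskal(police, taxi, limo)
    print(h)   # about 11.29 (11.2275 before the tie correction)
    print(p)   # well below 0.05: reject the null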

4. Summary of Module 11

Summary of Module 11

Module 11 covered the following topics:

  • Chi-square tests
  • Wilcoxon's Rank-Sum test
  • Wilcoxon's Matched-Pairs Signed-Ranks test
  • Kruskal-Wallis One-Way ANOVA

This concludes Module 11

  • Until next time; have a nice day.



