R can be used to fit a Rasch model in Item Response Theory (IRT). The Rasch model was proposed in the 1960s by the Danish statistician Georg Rasch. The basic Rasch model separates the ability of the test takers from the difficulty of the test items. In this section we go over how R can be used to estimate this model.
Take the following table as an example. It shows the results of 35 students taking an 18-item ability test. Each item is coded as correct (1) or incorrect (0). The data come from the first example of a popular free program called MiniStep, the evaluation version of a larger, more complete program called WinStep. MiniStep can be downloaded from http://www.winstep.org; it runs only under Windows.
At the bottom of the table is a row showing the percentage of students who answered each question correctly. Questions 1, 2, and 3 are the easiest because every student got them right. Question 18 is the hardest because none of the students knew the answer to it. Questions like these are not useful for distinguishing students of high ability from students of low ability, so they are omitted from the calculation.
name      q1  q2  q3  q4  q5  q6  q7  q8  q9 q10 q11 q12 q13 q14 q15 q16 q17 q18
Richard    1   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0
Tracie     1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0
Walter     1   1   1   1   1   1   1   1   1   0   0   1   0   0   0   0   0   0
Blaise     1   1   1   1   0   0   1   0   1   0   0   0   0   0   0   0   0   0
Ron        1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0
William    1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0
Susan      1   1   1   1   1   1   1   1   1   1   1   1   1   0   1   0   0   0
Linda      1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0
Kim        1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0
Carol      1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0
Pete       1   1   1   0   1   1   1   1   1   0   0   0   0   0   0   0   0   0
Brenda     1   1   1   1   1   0   1   0   1   1   0   0   0   0   0   0   0   0
Mike       1   1   1   1   1   0   0   1   1   1   1   1   0   0   0   0   0   0
Zula       1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0
Frank      1   1   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0
Dorothy    1   1   1   1   1   1   1   1   1   0   1   0   0   0   0   0   0   0
Rod        1   1   1   1   0   1   1   1   1   1   0   0   0   0   0   0   0   0
Britton    1   1   1   1   1   1   1   1   1   1   0   0   1   0   0   0   0   0
Janet      1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0
David      1   1   1   1   1   1   1   1   1   1   0   0   1   0   0   0   0   0
Thomas     1   1   1   1   1   1   1   1   1   1   1   0   1   0   0   0   0   0
Betty      1   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0
Bert       1   1   1   1   1   1   1   1   1   1   0   0   1   1   0   0   0   0
Rick       1   1   1   1   1   1   1   1   1   1   1   0   1   0   0   1   1   0
Don        1   1   1   0   1   1   0   0   0   0   0   0   0   0   0   0   0   0
Barbara    1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0
Adam       1   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0
Audrey     1   1   1   1   1   1   1   1   1   0   1   0   0   0   0   0   0   0
Anne       1   1   1   1   1   1   0   0   1   1   1   0   0   1   0   0   0   0
Lisa       1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0
James      1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0   0
Joe        1   1   1   1   1   1   1   1   1   1   1   0   0   0   0   0   0   0
Martha     1   1   1   1   0   0   1   0   0   1   0   0   0   0   0   0   0   0
Elsie      1   1   1   1   1   1   1   1   1   1   0   1   0   1   0   0   0   0
Helen      1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
---------------------------------------------------------------------------------
           1   1   1 .91 .88 .86 .88 .77 .86 .69 .34 .17 .20 .09 .03 .03 .03 0.0
We also need to examine whether there are students who get all or none of the items right. If we omit items 1, 2, 3, and 18 and look at the 0's and 1's for each student, we see that Helen fails all of the remaining items. Helen therefore tells us nothing about which items are harder and which are easier; all of them are beyond her ability. Students who get every item right also give us no useful information, but in this example no student answers all items correctly.
This is how many special-purpose statistical packages prepare raw data for IRT analysis. These packages set aside any items or persons that provide no useful information, fit the IRT model to the remaining data, and then extrapolate from that solution to produce estimates for the items and persons that were set aside.
In R we can use a conditional logit model to perform IRT analyses. The conditional logit model is in the survival analysis library maintained by Thomas Lumley at the University of Washington. A description of the survival analysis library and the additional packages it depends on can be obtained with the R command library(help=survival). The R code offered in this section has been tested with version 2.11-4 of the survival library in R-1.9.0. Rasch's original derivation was different from, but equivalent to, a fixed-effect conditional logit model. Note that the procedures below are suitable only for binary items (e.g., pass/fail on test items).
The Rasch model states that the log odds of a person answering an item correctly is a function of the person's ability (θ) and the item's difficulty (β). Difficult items are hard to get right even for people with high ability. The odds of getting an item right decrease with item difficulty (and thus the minus sign before β). The notation we use here is from Embretson and Reise (2000):

   log[ P_is / (1 - P_is) ] = θ_s - β_i

where P_is is the probability that person s answers item i correctly, θ_s is the ability of person s, and β_i is the difficulty of item i.
The example data, after removing Helen and items 1, 2, 3, and 18, contain 34 students and 14 items (questions 4 to 17). Richard is person 1, Tracie is person 2, and so on. To evaluate Richard's chance of answering an item correctly, we take Richard's ability (say it is 3 in logit units), subtract from it the item's difficulty (say it is 2.2 in logit units), and get 0.8 logits. We can convert the logit scale back to a probability by taking exp(0.8)/(1 + exp(0.8)), which gives 0.69. Richard has a 69% chance of getting that item right.
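As a quick check on this arithmetic, the conversion can be done directly in R. The built-in plogis() function computes exp(x)/(1 + exp(x)), so the two commands below are equivalent; the 3 and 2.2 are just the hypothetical ability and difficulty values used above.

> exp(0.8) / (1 + exp(0.8))   # convert 0.8 logits to a probability
[1] 0.6899745
> plogis(3 - 2.2)             # same conversion with the built-in logistic function
[1] 0.6899745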
The question is, how do we calculate the θ's and the β's in the equation?
We first need to load the survival analysis library by typing library(survival). Then we need to set aside the uninformative items and persons. This is done with the following commands:
> library(survival)
> exam1 <- read.csv(file="exam1.dat", sep="", header=T, row.names=NULL)
> exam1$id <- factor(paste("s", 1:dim(exam1)[1], sep=""))
> #
> # percentage of Ss who can correctly answer the items
> #
> round(apply(exam1[, 3:20], 2, mean), 2)  # items 1, 2, 3 and 18 are no good
  q1   q2   q3   q4   q5   q6   q7   q8   q9  q10  q11  q12  q13  q14  q15  q16
1.00 1.00 1.00 0.91 0.89 0.86 0.89 0.77 0.86 0.69 0.34 0.17 0.20 0.09 0.03 0.03
 q17  q18
0.03 0.00
> round(apply(exam1[, 7:19], 1, mean), 2)  # Helen got every good item wrong
   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
0.23 0.46 0.46 0.15 0.46 0.46 0.77 0.46 0.46 0.54 0.38 0.31 0.46 0.54 0.69 0.46
  17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   32
0.38 0.54 0.38 0.54 0.62 0.62 0.62 0.77 0.15 0.46 0.23 0.46 0.46 0.38 0.46 0.54
  33   34   35
0.15 0.62 0.00
> exam1[35, ]
    name sex q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 q13 q14 q15 q16 q17 q18  id
35 Helen   F  1  1  1  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0 s35
> # remove items 1, 2, 3, and 18 because they provide no useful information
> names(exam1)
 [1] "name" "sex"  "q1"   "q2"   "q3"   "q4"   "q5"   "q6"   "q7"   "q8"
[11] "q9"   "q10"  "q11"  "q12"  "q13"  "q14"  "q15"  "q16"  "q17"  "q18"
[21] "id"
> exam1.1 <- exam1[, -c(3,4,5,20)]
> # remove Helen (the last row), who answered all good items incorrectly
> exam1.1 <- exam1.1[ -dim(exam1.1)[1], ]
To run the conditional logit model clogit(), we need to reshape the data from a wide format to a long format. The reshape() command takes q4:q17 and organizes them into a new variable called resp. Another new variable, item, is created to identify questions 4 to 17.
> exam1.1 <- reshape(exam1.1, varying=list(paste("q", 4:17, sep="")),
+                    timevar="item", v.names="resp", direction="long")
In the equation, the item difficulty parameters (the β's) have a minus sign before them. In clogit() a parameter with a minus sign is represented by a dummy variable coded -1. In the long format the items are identified by the variable item. An excerpt of the full data shows that Martha and Elsie both answered item 1 correctly; their answers to item 1 are followed by the answers of Richard, Tracie, and Walter to the next item (item 2).
> exam1.1[34:38, ]
         name sex  id item resp
s33.1  Martha   F s33    1    1
s34.1   Elsie   F s34    1    1
s1.2  Richard   M  s1    2    1
s2.2   Tracie   F  s2    2    1
s3.2   Walter   M  s3    2    1
Using the above data excerpt as an example, we can create dummy variables for items 1 and 2. The R commands are tt <- factor(exam1.1[34:38, "item"]); dummy <- diag(nlevels(tt))[tt, ], and you have:
> cbind(exam1.1[34:38, ], 0 - dummy)
         name sex  id item resp  1  2
s34.1   Elsie   F s34    1    1 -1  0
s35.1   Helen   F s35    1    1 -1  0
s1.2  Richard   M  s1    2    1  0 -1
s2.2   Tracie   F  s2    2    1  0 -1
s3.2   Walter   M  s3    2    1  0 -1
Note that the last two columns are added with cbind(); they can be used to code the β parameters for items 1 and 2, respectively. When this coding is applied to the 476 rows in the data set, it produces a dummy variable matrix of 476 rows and 14 columns. Each column of the matrix will be used to estimate the item difficulty associated with one of questions 4 to 17.
> ex.i.dummy <- diag(nlevels(factor(exam1.1$item)))[factor(exam1.1$item), ]
> ex.i.dummy <- 0 - ex.i.dummy   # turns (0, 1) into (0, -1)
> ex.i.dummy <- data.frame(ex.i.dummy, row.names=NULL)
> dimnames(ex.i.dummy)[[2]] <- paste("i", 4:17, sep="")
> dim(ex.i.dummy)
[1] 476  14
Finally, a call to clogit() completes the calculations. It is worth noting how the syntax is written. The strata() function tells clogit() to condition on each person, so the item difficulties associated with the dummy variables are estimated while each person's influence is held constant. The syntax also specifies that item difficulty parameters be estimated for all items except i4, which serves as the reference level: item 4 is the case in which all dummy variables are zero. If i4 were added to the model as well, clogit() would have no reference level left to define and could not tell what to do when all dummy variables are zero.
> attach(ex.i.dummy)   # attach the dummy variables so I can call i4:i17
> # item 4 is the reference
> exam2.clog <- clogit(resp ~ i5+i6+i7+i8+i9+i10+i11+i12+i13+i14+
+                      i15+i16+i17+ strata(id), data=exam1.1)
> summary(exam2.clog)
Call:
clogit(resp ~ i5 + i6 + i7 + i8 + i9 + i10 + i11 + i12 + i13 +
    i14 + i15 + i16 + i17 + strata(id), data = exam1.1)

  n= 476

     coef exp(coef) se(coef)     z       p
i5  0.514      1.67    1.025 0.501 6.2e-01
i6  0.923      2.52    0.991 0.931 3.5e-01
i7  0.514      1.67    1.025 0.501 6.2e-01
i8  1.875      6.52    0.955 1.964 5.0e-02
i9  0.923      2.52    0.991 0.931 3.5e-01
i10 2.598     13.44    0.944 2.752 5.9e-03
i11 4.517     91.54    0.954 4.736 2.2e-06
i12 5.792    327.53    1.028 5.632 1.8e-08
i13 5.537    253.95    1.010 5.484 4.1e-08
i14 6.825    920.74    1.136 6.007 1.9e-09
i15 8.095   3278.95    1.403 5.772 7.8e-09
i16 8.095   3278.95    1.403 5.772 7.8e-09
i17 8.095   3278.95    1.403 5.772 7.8e-09

    exp(coef) exp(-coef) lower .95 upper .95
i5       1.67   0.597954     0.224      12.5
i6       2.52   0.397477     0.361      17.6
i7       1.67   0.597954     0.224      12.5
i8       6.52   0.153373     1.004      42.4
i9       2.52   0.397477     0.361      17.6
i10     13.44   0.074425     2.112      85.5
i11     91.54   0.010924    14.120     593.5
i12    327.53   0.003053    43.650    2457.7
i13    253.95   0.003938    35.104    1837.0
i14    920.74   0.001086    99.293    8538.0
i15   3278.95   0.000305   209.826   51240.2
i16   3278.95   0.000305   209.826   51240.2
i17   3278.95   0.000305   209.826   51240.2

Rsquare= 0.525   (max possible= 0.659 )
Likelihood ratio test= 355  on 13 df,   p=0
Wald test            = 108  on 13 df,   p=0
Score (logrank) test = 277  on 13 df,   p=0
With i4 as the reference level, the coefficients associated with the dummy variables should be interpreted as differences in item difficulty between i4 and each of i5 to i17. For example, the 0.514 associated with i5 means that i5 is more difficult (remember, higher β means higher difficulty) than i4 by 0.514 logits. In addition, items 15, 16, and 17 are equally difficult. They are more difficult than i4 by 8.095 logits, which translates to a large reduction in the probability of answering items 15, 16, and 17 correctly. On a probability scale, 8.095 logits is equivalent to a reduction of about 0.999 (try exp(8.095) / (1 + exp(8.095))).
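If you want to work with these numbers directly rather than read them off the printout, the estimated differences can be extracted from the fitted object. This is a small sketch of my own, not part of the original example; coef() and plogis() are standard R functions, and plogis(x) computes exp(x)/(1 + exp(x)).

> b.diff <- coef(exam2.clog)   # difficulty differences relative to i4
> round(b.diff, 3)             # reproduces the coef column in the summary above
> plogis(8.095)                # about 0.999; same as exp(8.095)/(1 + exp(8.095))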
The clogit() function in R is a general-purpose routine designed for fitting a conditional logit model, so it does not automatically print out many of the statistics specific to the Rasch model. For example, the summary() function does not report an estimated ability level for each person, something that packages like MiniStep do for you. But clogit() in R provides estimates of item difficulty, which are the most important pieces of information for developing surveys or tests.
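If person ability estimates are needed, one common follow-up, sketched below as my own illustration rather than anything the survival library provides, is to fix the item difficulties at their estimated values and then solve, for each raw score, for the ability θ at which the expected number of correct answers equals that score. With i4 as the reference its difficulty is set to 0 and the remaining difficulties are the coefficients above; persons who got all items right or all items wrong (such as Helen) have no finite estimate and are left out, as discussed earlier.

> # fix item difficulties: 0 for the reference item i4, coef() for the rest
> beta <- c(0, coef(exam2.clog))
> # ability estimate for a given raw score (number of items correct, 1 to 13)
> ability <- function(raw.score)
+   uniroot(function(theta) sum(plogis(theta - beta)) - raw.score,
+           interval=c(-10, 10))$root
> round(sapply(1:13, ability), 2)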