Item Response Theory - Examples in R

The Basics of R
Loading in Data
Classical Test Theory
Classical Item Analysis
Subsetting Data
Simulating IRT Data
Saving Simulated Data
Splitting an Exam by Items
Difference Between IRFs
Item Information Functions
Dimensionality Procedures
Differential Item Functioning
 


The Basics of R:

R, like the commercial S-Plus, is based on the programming language S. It can be downloaded for free from www.r-project.org. This page also has links to FAQs and other information about R.

One of the nice things about R (besides the fact that it's powerful and free) is that you can save everything you've done at the end of each session. You can manually save the .RData file to your local drive when you finish, and load it again the next time you start the program.
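For example, the following pair of commands (the file name here is just an illustration) saves your workspace and restores it in a later session:

save.image("irtclass.RData")   # save all current objects to a file
load("irtclass.RData")         # bring them back later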

At the heart of R are the various objects that you enter. At any time (when you are in R) you can type the function objects() to see which ones you have entered previously. Presumably there aren't any there right now. R also has many built-in objects, most of which are functions. Because these objects are always there, it will not list them when you type objects().

A list of the various functions for R can be found in the manuals that are installed with the program. The various manuals can be found under the Manuals submenu of the Help menu. In particular, the R Reference Manual lists not only all the functions in the base package of R, but also some of those in the supplementary packages as well.

Once you know the name of a function, you can get help about it simply by typing help(functionname). In R, parentheses indicate either mathematical grouping, like they usually do, or that you are applying a function. Square brackets [ ] indicate that you are selecting an element of a vector or array.

This brief example shows some of R's flexibility.

x<-c(5,4,3,2)

The <- is the command that assigns whatever is on the right to the name that is on the left. The c indicates that an array or vector is going to be entered. Simply typing x now will return those four values. Typing x[3] will return the third value, mean(x) will return the mean, and objects() will show you that you now have an object called x.

R also has a variety of graphical functions and statistical functions built in. hist(x) will construct a histogram, and t.test(x,alternative="greater",mu=5) will test the null hypothesis mu=5 vs. the alternative that mu > 5. (If you happen to mistype a command and get an error, just hit the up arrow to bring it back up for editing.)

Besides just getting the result of a function, we can store the results somewhere. For example x.t<-t.test(x,alternative="greater",mu=5) will store the result of the t-test in x.t. attributes(x.t) will then show you the various properties of x.t. Thus, x.t$p.value will return the p-value.

R can also be used to write your own functions. For example, if we wanted a function that returned sin(x)+x^2 we could write:

examp.func<-function(x){
y<-sin(x)+x^2
return(y)}

Try examp.func(5) and examp.func(x). What is it doing?

Loading in Data: Two common formats for testing data are simply big matrices, either with or without spaces between the responses. To load these into R as an object called your.data we could use:

your.data<-read.fwf("a:/your.data",widths=rep(1,24))

or

your.data<-read.fwf("a:/your.data",widths=rep(2,24))

The widths=rep(x,24) tells R that there will be 24 columns, each of width x. Notice that just typing your.data won't be very interesting. However, if you type

dim(your.data)

you will see that your.data is a matrix with 6000 rows and 24 columns. Using

apply(your.data,2,mean)

will give you the average of each column (the 2 is for columns; 1 would be for rows). Finally, we could get a histogram of the sums of the rows with

hist(apply(your.data,1,sum),breaks=c((0:25)-0.5))
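If you don't have a your.data file available, the following quick simulation (a rough stand-in; the parameter choices are arbitrary, picked only to mimic a 6,000 examinee, 24 item exam) will create one so the examples below can be run:

theta<-rnorm(6000)                    # examinee abilities
bdiff<-rnorm(24)                      # item difficulties
your.data<-matrix(0,nrow=6000,ncol=24)
for (i in 1:24){
  your.data[,i]<-as.numeric(runif(6000) < pnorm(theta-bdiff[i]))
}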


Classical Test Theory Statistics

The three functions below give some of the basic Classical Test Theory statistics that we talked about in class. It is probably best to copy all three of the functions over at one time, since sem uses alpha and xandt uses sem. Copy them over just as they are; you don't need to make any changes.

Once you have copied the three functions over, you can run them by simply using the form function(arguments). Thus, to get the measures of reliability for the your.data data set we could use alpha(your.data). To get the 60% confidence intervals for true score using alpha (instead of lambda2) you would type
xandt(your.data,rtype="alpha",conf.level=0.6)

Brief Descriptions of the Three Functions

alpha - Calculates both the estimated Cronbach's alpha and Guttman's lambda2 statistics.

sem - Calculates the estimated SEM using both alpha and lambda2 to estimate the reliability. sem1 uses alpha, and sem2 uses lambda2.

xandt - Gives the frequency distribution of observed scores, the regression estimate of true score, the SEM-based confidence interval for T, and the regression-based confidence interval for T. It gives you the option of using either alpha or lambda2 (lambda2 is the default) and your choice of confidence level (0.95 is the default). At the top of the output it gives you the sample mean, variance, and standard deviation of the observed scores (X). Below that it gives a table. The columns of the table are:

  • x = observed score,
  • treg = the t-hat estimated using the regression method for that X,
  • count = the number of examinees with that observed score,
  • losem and hisem = the confidence interval for T using the form X ± z*SEM,
  • loreg and hireg = the confidence interval for T using the form T-hat ± z*SEE (see the short example after this list).
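
As a quick numerical illustration of those last two columns (the numbers here are made up for the example; the formulas are exactly the ones xandt uses below):

x<-14                                # an observed score
avgx<-12; rhat<-0.85; sem.est<-1.9   # suppose these came from the data
z<--1*qnorm((1-0.95)/2)              # z for a 95% interval
c(x-z*sem.est,x+z*sem.est)           # losem and hisem
treg<-rhat*x+(1-rhat)*avgx           # regression estimate of T
see.est<-sem.est*sqrt(rhat)
c(treg-z*see.est,treg+z*see.est)     # loreg and hireg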

    The Functions Themselves

    alpha<-function(testdata){
    n<-ncol(testdata)
    nexmn<-nrow(testdata)
    x<-apply(testdata,1,sum)
    s2y<-diag(var(testdata))*(nexmn-1)/nexmn
    s2x<-var(x)*(nexmn-1)/nexmn
    alpha<-(n/(n-1))*(1-sum(s2y)/s2x)
    s2yy<-(((var(testdata)-diag(diag(var(testdata))))*(nexmn-1)/nexmn))^2
    lambda2<-1-sum(s2y)/s2x+sqrt((n/(n-1))*sum(s2yy))/s2x
    return(list(alpha=alpha,lambda2=lambda2))}   # return(alpha,lambda2) no longer works in current R
    
    sem<-function(testdata){
    nexmn<-nrow(testdata)
    x<-apply(testdata,1,sum)
    rhat<-alpha(testdata)
    sx<-sqrt(var(x)*(nexmn-1)/nexmn)
    sem1<-sx*sqrt(1-rhat$alpha)
    sem2<-sx*sqrt(1-rhat$lambda2)
    return(list(sem1=sem1,sem2=sem2))}
    
    xandt<-function(testdata,rtype="lambda2",conf.level=0.95){
    nexmn<-nrow(testdata)
    x<-apply(testdata,1,sum)
    avgx<-mean(x)
    varx<-var(x)*(nexmn-1)/nexmn
    sdx<-sqrt(varx)
    maxx<-max(x)
    xrange<-c(0:maxx)
    count<-rep(0,maxx+1)
    for (i in 0:maxx){
      count[i+1]<-sum(x==i)}
    x<-c(0:maxx)
    z<--1*qnorm((1-conf.level)/2)
    if (rtype=="alpha") {
    rhat<-alpha(testdata)$alpha
    sem.est<-sem(testdata)$sem1
    }
    else {
    rhat<-alpha(testdata)$lambda2
    sem.est<-sem(testdata)$sem2
    }
    losem<-x-z*sem.est
    hisem<-x+z*sem.est
    treg<-rhat*x+(1-rhat)*avgx
    see.est<-sem.est*sqrt(rhat)
    loreg<-treg-z*see.est
    hireg<-treg+z*see.est
    return(list(avgx=avgx,varx=varx,sdx=sdx,table=cbind(x,treg,count,losem,hisem,loreg,hireg)))}
    


    Classical Item Analysis

    The function items will calculate the p-value, the variance, the D based on a user-specified percent of the examinees (default=27%), the point biserial, and the biserial. If correct=T (the default) it will remove the item in question from the sum score; otherwise it will include that item. dpct tells it what percent of the examinees to use at each end for calculating D. digits tells it how many digits to report after the decimal (the default is four).

    Thus, items(your.data) will analyze your.data at the default settings, while
    items(your.data,correct=F,dpct=0.5,digits=3) will analyze it without removing the item in question, using 50% instead of 27%, and using only 3 digits after the decimal place.

     

    items<-function(testdata,correct=T,dpct=0.27,digits=4){
    n<-ncol(testdata)
    nexmn<-nrow(testdata)
    x<-apply(testdata,1,sum)
    medx<-median(x)
    lowx<-sort(x)[floor((nexmn+1)*dpct)]
    highx<-sort(x)[ceiling((nexmn+1)*(1-dpct))]
    pvalue<-as.numeric(apply(testdata,2,mean))
    var<-as.numeric(apply(testdata,2,var)*(nexmn-1)/nexmn)
    D<-rep(0,n)
    PBis<-rep(0,n)
    Bis<-rep(0,n)
    for (i in 1:n){
         if (correct==T){
             xx<-x-testdata[,i]
             lowx<-sort(xx)[floor((nexmn+1)*dpct)]
             highx<-sort(xx)[ceiling((nexmn+1)*(1-dpct))]
             }
         else {xx<-x}
         pu<-mean(testdata[xx >= highx,i])
         pl<-mean(testdata[xx <= lowx,i])
         D[i]<-pu-pl
         PBis[i]<-cor(xx,testdata[,i])
         Bis[i]<-sqrt(pvalue[i]*(1-pvalue[i]))*PBis[i]/dnorm(qnorm(pvalue[i]))
    }
    return(round(cbind(pvalue,var,D,PBis,Bis),digits))}
    


    Subsetting Data

    In order to use the following code, you must already have the data set your.data loaded into R. Simply copying and pasting the following code into R will then divide the data set into three smaller sets of 2,000 examinees each.

    It does this by going through the following steps: 1) generate a list of the numbers 1-6,000 in a random order, 2) put the observations corresponding to those first 2,000 random numbers into arand, 3) put the remaining 4,000 observations into not.arand, 4) calculate the observed score X for these 4,000 observations, 5) order the 4,000 observations by their observed score, 6) put the first 2,000 in alo, and 7) put the rest in ahi.

    random.order<-sample(1:6000,replace=F)
    arand<-your.data[random.order[1:2000],]
    not.arand<-your.data[random.order[2001:6000],]
    x.for.not.arand<-apply(not.arand,1,sum)
    not.arand<-not.arand[order(x.for.not.arand),]
    alo<-not.arand[1:2000,]
    ahi<-not.arand[2001:4000,]
    
    Having entered the above code means you have added three new data sets to R: arand, alo, and ahi. You could now run sem(alo), for example, if you wanted.

    The following code will combine the biserial correlation calculated with alo and all of the item statistics calculated with arand into a single piece of output. The first line makes something called q.1.data that has the fifth column of output from items(alo) and all of the columns of output from items(arand). The second line gives new names to the columns to make it clear what we've calculated.

    q.1.data<-cbind(items(alo)[,5],items(arand)) 
    colnames(q.1.data)<-c("lo.Bis","rand.p","rand.v","rand.D","rand.PBis","rand.Bis")
    

    After you enter the above two lines of code, you may simply type q.1.data to see the data on the screen. You could cut and paste it from there into another application, or analyze it in R if you are comfortable with that. Keep in mind that your numbers will be slightly different from everyone else's, because we each took our own random subsets.
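
    For example, one simple analysis (just a suggestion) is to plot the two biserial estimates against each other and look at their correlation:

    plot(q.1.data[,"lo.Bis"],q.1.data[,"rand.Bis"])
    cor(q.1.data[,"lo.Bis"],q.1.data[,"rand.Bis"])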


    Simulating IRT Data

    The three functions below are:

    irf - given theta, a, b, cc (the guessing parameter, 0 by default), and the type of model ("3pl" or "norm"), this function returns the probability that the examinee will get the item correct. The following would give the values to plot for item 1 in problem 1 on page 28 of the text (answer on page 30).

    irf(theta=c(-3,-2,-1,0,1,2,3),a=1.8,b=1.0,cc=0.0,type="3pl")
    round(1000*irf(theta=c(-3,-2,-1,0,1,2,3),a=1.8,b=1.0,cc=0.0,type="3pl"))/1000
    
    Notice that you are able to enter multiple theta values as a vector if you like.

    irfplot - will plot the specified item response function for abilities ranging from -3.5 to +3.5. It allows you to label the graph, and to make the line dashed if you would like. The following will construct a graph of the same parameters for both the normal ogive (dashed) and logistic (not dashed) models. 3PL and not dashed are both the default settings.

    irfplot(a=1.8,b=1.0,cc=0.0,label="item 1 comparison")
    par(new=T)
    irfplot(a=1.8,b=1.0,cc=0.0,type="norm",dashed="T")
    
    The command par(new=T) allows you to overlay the first and second plot (otherwise it would have erased the first one).

    irtgen - this function will simulate data from an exam that follows either the normal-ogive or logistic model. It requires the number of examinees (default=10); the vectors of a, b, and c parameters; the type of item ("3pl" or "norm"); and the mean and standard deviation of the examinee abilities (which are simulated to follow the normal distribution). The following sample code simulates an exam consisting of 10 items and 100 examinees, where each item has a=1, b=0, and c=0, where the exam follows the normal-ogive model, and where the distribution of examinee abilities is standard normal. It calls the simulated data set sampdat and runs items on it.

    a<-rep(1,10)
    b<-rep(0,10)
    cc<-rep(0,10)
    sampdat<-irtgen(nexmn=100,avec=a,bvec=b,cvec=cc,type="norm",mthet=0,sdthet=1)
    sampdat
    items(sampdat)
    
    The rep(1,10) is telling R to make a vector that repeats 1 ten times. Type a to see that that's what it did. Also, note that we had to use cc instead of c for the name of that vector. This is because R reserves c as the function for making a vector (e.g., c(1,2,3)). Finally, make sure to enter all three functions, because both irfplot and irtgen use irf.

     

    The Functions

    irf<-function(theta=0,a=1,b=0,cc=0,type="3pl"){
    if (type=="3pl"){
      prob<-cc+(1-cc)/(1+exp(-1.7*a*(theta-b)))
    }
    if (type=="norm"){
      prob<-cc+(1-cc)*pnorm(a*(theta-b))
    }
    return(prob)
    }
    
    
    irfplot<-function(a=1,b=0,cc=0,type="3pl",label="",dashed="F"){
    prob<-rep(0,71)
    theta<-rep(0,71)
    for (i in 1:71){
    theta[i]<-(i-36)/10
    prob[i]<-irf(theta[i],a,b,cc,type)
    }
    if (dashed=="F"){
    plot(theta,prob,xlab="theta",ylab="prob",type="l",
    xlim=c(-3.5,3.5),ylim=c(0,1),main=label)
    }
    if (dashed=="T"){
    plot(theta,prob,xlab="theta",ylab="prob",type="l",
    xlim=c(-3.5,3.5),ylim=c(0,1),main=label,lty=2)
    }
    }
    
    
    irtgen<-function(nexmn=10,avec=c(1),bvec=c(0),cvec=c(0),type="3pl",
    mthet=0,sdthet=1){
    n<-length(avec)
    data<-matrix(0,nrow=nexmn,ncol=n)
    ability<-rnorm(nexmn,mean=mthet,sd=sdthet)
    p<-t(apply(as.array(ability),1,irf,avec,bvec,cvec,type))   # nexmn x n matrix of P(correct)
    try<-runif(nexmn*n,0,1)
    data[try < p]<-1    # score 1 wherever the uniform draw falls below the probability
    return(data)
    }
    
    


    Saving Simulated Data

    irtsave - will save an IRT data matrix with several different options. It may have a leading examinee id number, up to 99999 (the default), and it may or may not (the default is not) have spaces between the item responses following the id. The following are four examples of what the output files could look like, each shown after the command used to generate it. They use the sampdat matrix created above.

    ----------------------------------------------
    irtsave(sampdat,"a:/simul.dat")
    
    00001 0000100000
    00002 0010001110
    00003 0011000000
    etc...
    ----------------------------------------------
    irtsave(sampdat,"a:/simul.dat",sep=T)
    
    00001 0 0 0 0 1 0 0 0 0 0
    00002 0 0 1 0 0 0 1 1 1 0
    00003 0 0 1 1 0 0 0 0 0 0
    etc...
    ----------------------------------------------
    irtsave(sampdat,"a:/simul.dat",id=F)
    
    0000100000
    0010001110
    0011000000
    etc...
    ----------------------------------------------
    irtsave(sampdat,"a:/simul.dat",sep=T,id=F)
    
     0 0 0 0 1 0 0 0 0 0
     0 0 1 0 0 0 1 1 1 0
     0 0 1 1 0 0 0 0 0 0
    etc...
    ----------------------------------------------
    

    The Function itself is:

    irtsave<-function(datamatrix,flnm,id=T,sep=F){
    nitem<-length(datamatrix[1,])
    nexmn<-length(datamatrix[,1])
    if (sep==F) {
      s1<-"" 
      s2<-" " 
    }
    else if (sep==T) {
      s1<-" " 
      s2<-""
    }
    ndigits<-5
    rowput<-rep("",nexmn)
    for (j in 1:nexmn){
      for (i in 1:nitem){
        rowput[j]<-paste(rowput[j],as.character(datamatrix[j,i]),sep=s1)
      }
      if (id==T){
        rowput[j]<-paste(
          paste(rep("0",(ndigits-ceiling(log10(j+1)))),sep="",collapse=""),
          as.character(j),s2,rowput[j],sep="")
      } 
    }
    write(rowput,file=flnm)   # one examinee per line
    }
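
    As an aside, the zero-padding of the id number in the loop above could also be written with the built-in formatC:

    formatC(1,width=5,flag="0")   # returns "00001"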
    


    Splitting an Exam by Items

    Say we want to split the exam your.data into two separate exams, one containing items 1, 8-15, and 23, and the other containing items 2-7, 16-22, and 24. This could be done using the following code, which calls the new exams asplit1 and asplit2.

    asplit1<-your.data[,c(1,8:15,23)]
    asplit2<-your.data[,c(2:7,16:22,24)]
    

    If needed, you could save these new data sets to disk using the irtsave function above.
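
    A quick check that the split worked: both halves should still have all 6,000 rows, with 10 and 14 columns respectively.

    dim(asplit1)   # 6000 10
    dim(asplit2)   # 6000 14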


    Difference Between IRFs

    irfdiff - In order to use this function you must have already entered the function irf above. The function approximates the area between two item response functions over the range theta=-10 to theta=10, weighted by an ability distribution with mean=mthet and standard deviation=sdthet. It works for either the normal-ogive model or the 3PL model.

    The code for the function is:

    irfdiff<-function(a1,b1,c1,a2,b2,c2,type="3pl",mthet=0,sdthet=1){
    thet<-(-500:500)/50
    weight<-dnorm(thet,mean=mthet,sd=sdthet)
    diff<-irf(thet,a1,b1,c1,type=type)-irf(thet,a2,b2,c2,type=type)
    probdiff<-(sum(weight*abs(diff))/sum(weight))
    return(probdiff)
    }
    

    For two 3PL items with parameters a=0.914, b=-1.516, c=0.184 and a=0.79, b=-1.822, and c=0, where the population is standard normal, we would use the following code:

    irfdiff(a1=0.914,b1=-1.516,c1=0.184,a2=0.79,b2=-1.822,c2=0,type="3pl",mthet=0,sdthet=1)
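
    As a quick sanity check, comparing an item with itself should return exactly 0:

    irfdiff(a1=0.914,b1=-1.516,c1=0.184,a2=0.914,b2=-1.516,c2=0.184)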


    Item Information Functions

    The following two functions concern the information functions of items and tests. As the second function relies on the first, it is probably best to enter them as a pair. In each a single item can be analyzed by entering its particular a, b, and c parameters. If vectors of parameters are entered then the analysis will be done for a test consisting of those items. For example a=c(1,1.5,1), b=c(0.5,1,1), cc=c(0,0,0.2) would be used to get the information for a three item test with parameters: a1=1, b1=0.5, c1=0, a2=1.5, b2=1, c2=0, a3=1, b3=1, and c3=0.2.

    irfinfo - returns the information for the specified item or test at the theta values requested. It will function for either a logistic or ogive form using the options type="3pl" or type="norm".

    The following commands will generate part of the answers to question 3 on page 97 of the text.

    irfinfo(theta=c(-2,-1,0,1,2),a=c(1.25,1.5,1.25),b=c(-0.5,0,1),
    cc=c(0,0,0),type="3pl")
    

    irfinfoplot - plots the information function, with the same options as irfplot above. The only difference in the options is that the user can specify the maximum value of the y-axis using maxy=. The default is 1.1 times the maximum information for that item or test. Setting maxy is important when overlaying the plots of several information functions.

    The following commands will generate the information curves for the two items (and the two item test) where the parameters are a1=1.5, b1=-0.5, c1=0.2, a2=1, b2=1, c2=0.25.

    par(mfrow=c(2,1))
    
    irfinfoplot(a=1.5,b=-0.5,cc=0.2,type="3pl",label="two items information",
    dashed="T",maxy=1.25)
    par(new=T)
    irfinfoplot(a=1,b=1,cc=0.25,type="3pl",label="",dashed="F",maxy=1.25)
    
    irfinfoplot(a=1.5,b=-0.5,cc=0.2,label="test info for two items",
    dashed="T",maxy=1.25)
    par(new=T)
    irfinfoplot(a=1,b=1,cc=0.25,dashed="T",maxy=1.25)
    par(new=T)
    irfinfoplot(a=c(1.5,1),b=c(-0.5,1),cc=c(0.2,0.25),maxy=1.25)
    

    Recall that the par(mfrow=c(2,1)) tells R to plot the graphics so that there are two rows and one column in the plot window, and that par(new=T) means to plot over the current figure. Also notice that the default settings are: type="3pl", dashed="F", and label="", so we don't need to include those.

    The code for the two functions is:

    #-----------------------------------------------------------
    
    irfinfo<-function(theta=0,a=1,b=0,cc=0,type="3pl"){
    n<-length(a)
    info<-0
    if (type=="3pl"){
      for (i in 1:n){
        info<-info+2.89*a[i]^2*(1-cc[i])/(
          (cc[i]+exp(1.7*a[i]*(theta-b[i])))*
          ((1+exp(-1.7*a[i]*(theta-b[i])))^2)
          )
      }
    }
    else{
      for (i in 1:n){
        info<-info+
           a[i]^2*(1-cc[i])^2*dnorm(a[i]*(theta-b[i]))^2/
          (irf(theta=theta,a=a[i],b=b[i],cc=cc[i],type="norm")* 
          (1-irf(theta=theta,a=a[i],b=b[i],cc=cc[i],type="norm")))
      }
    }
    return(info)
    }
    
    
    irfinfoplot<-function(a=1,b=0,cc=0,type="3pl",label="",dashed="F",maxy=0){
    info<-rep(0,71)
    theta<-rep(0,71)
    for (i in 1:71){
    theta[i]<-(i-36)/10
    info[i]<-irfinfo(theta=theta[i],a=a,b=b,cc=cc,type=type)
    }
    if (maxy==0){maxy<-1.1*max(info)}
    if (dashed=="F"){
    plot(theta,info,xlab="theta",ylab="information",type="l",
    xlim=c(-3.5,3.5),ylim=c(0,maxy),main=label)
    }
    if (dashed=="T"){
    plot(theta,info,xlab="theta",ylab="information",type="l",
    xlim=c(-3.5,3.5),ylim=c(0,maxy),main=label,lty=2)
    }
    }
    
    #-----------------------------------------------------------
    


    Dimensionality Procedures

    The following four functions are related to assessing the dimensionality of dichotomous standardized test data. Two of the functions (stratify and accov) are called by the others and generally wouldn't be called by themselves. The other two functions are described below.

    WARNING! These functions may take quite a while to run, especially for large exams. It took hcaccprox around 45 seconds to analyze the 6,000 examinee, 24 item data set your.data on a 1300 MHz Pentium 4.

    irtmh - calculates Rosenbaum's Mantel-Haenszel statistic for IRT data. It requires the name of the data matrix and the numbers of the two items. It also has an option (include=T) for including the two items in question in the score. The following calculates the statistic between items 1 and 2, and between items 1 and 15, in the your.data data set. It also calculates the statistic between items 1 and 15 with those items included in the score.

    irtmh(your.data,1,2)
    irtmh(your.data,1,15)
    irtmh(your.data,1,15,include=T)
    

    The function returns the z statistic, the p-value for testing that the covariance (conditioned on the chosen observed score) is zero vs. not zero, and the p-value for testing that the covariance is zero vs. less than zero. Notice that in this case, even with the multidimensionality in this data, the second p.less value is still very large! Even when the two items in question are included in the score (which should produce a negative bias) the third p.less is barely significant.

    hcaccprox - conducts Roussos's cluster analysis procedure on the given test data matrix. The following code conducts the analysis on a random sample of 1,000 examinees from your.data.

    randorder<-sample(1:6000,1000,replace=F)
    hcaccprox(your.data[randorder,])
    

    The code for the four functions is:

    #-----------------------------------------------------------
    
    stratify<-function(respmat,it1,it2,include=F){
      nex<-nrow(respmat)
      nit<-ncol(respmat)
      outarray<-array(0,dim=c(2,2,nit+1))
      numatscore<-rep(0,nit+1)
      if (include==T){
        score<-apply(respmat,1,sum)+1
      }
      else{ 
        score<-apply(respmat,1,sum)-respmat[,it1]-respmat[,it2]+3
      }
      for (j in 1:nex){
        outarray[respmat[j,it1]+1,respmat[j,it2]+1,score[j]]<-
            outarray[respmat[j,it1]+1,respmat[j,it2]+1,score[j]]+1
        numatscore[score[j]]<-numatscore[score[j]]+1
      }
      return(outarray[,,(1:(nit+1))[(numatscore>1)]])
    }    
      
    
    irtmh<-function(respmat,it1,it2,include=F){
      temp<-mantelhaen.test(stratify(respmat,it1,it2,include))
      zmh<-as.numeric(sqrt(temp$statistic)*sign(temp$estimate-1))
      return(list(zmh=zmh,p.equal=2*min(pnorm(zmh),1-pnorm(zmh)),p.less=pnorm(zmh)))
    }
    
    accov<-function(respmat,include=F){
     nit<-ncol(respmat)
     nex<-nrow(respmat)
     ccov<-matrix(0,nrow=nit,ncol=nit)
     tscore<-apply(respmat,1,sum)
     for (i in 1:(nit-1)) {
       for (l in (i+1):nit) {
         sil<-tscore-respmat[,i]-respmat[,l]+1
         irfi<-tapply(respmat[,i],sil,sum)
         irfl<-tapply(respmat[,l],sil,sum)
         irfil<-tapply(respmat[,i]*respmat[,l],sil,sum)
         numatscore<-table(sil)
         numatscore2<-numatscore[numatscore>1]
         irfi<-irfi[numatscore>1]/numatscore2
         irfl<-irfl[numatscore>1]/numatscore2
         irfil<-irfil[numatscore>1]/numatscore2
         ccov[i,l]<-sum((irfil-irfi*irfl)*numatscore2)/sum(numatscore2)
         ccov[l,i]<-ccov[i,l]
       }
     }
      return(ccov)
    }
    
    
    hcaccprox<-function(respmat){
      # hclust() and as.dist() are in the base stats package in current R;
      # older versions of R required library(mva) here
      dmat<-as.dist(-1*accov(respmat)+1)
      plot(hclust(dmat,method="average"))
    } 
    
    #-----------------------------------------------------------
    


    Differential Item Functioning

    difmh - calculates the z-statistic, 2-sided p-value, and MH D-DIF statistic for detecting differential item functioning. It calls the function difstrata, so that function must also be included.

    The ETS classification system reported in Petersen (1987) classifies items as A, B, or C based on the p-value for testing that there is no DIF and on the D score (which measures the amount of DIF). The B and C categories also involve testing whether the D score is greater than 1 in absolute value. (A D of 0 indicates no DIF.) In particular, an A classification occurs if either the DIF is not statistically significant or the absolute value of D is less than 1.

    The Mantel Haenszel D-DIF Statistic is defined as -2.35 times the natural logarithm of the pooled odds ratio. A positive value indicates DIF against the focal group, and a negative value indicates DIF against the reference group.

    The reason for the logarithm is that it converts the odds ratio from a scale where 0 is the most DIF in one direction, 1 is no DIF, and infinity is the most DIF in the other direction, to a scale where -infinity and infinity are the extremes and 0 is no DIF. In particular, the A, B, C system considers an absolute value of 1 to be troubling and an absolute value of 1.5 to be very troubling. There isn't a nice translation to a more intuitive scale, though.
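
    To make the classification rule concrete, here is a rough sketch in R (etsclass is just an illustrative name, and this is a simplification: the full ETS rule also tests whether |D| is significantly greater than 1, which is skipped here). The pval and D values would come from difmh below.

    etsclass<-function(pval,D,siglevel=0.05){
      # A: the DIF is not statistically significant OR |D| < 1
      if (pval > siglevel || abs(D) < 1) return("A")
      # C: |D| at or beyond the "very troubling" threshold of 1.5
      if (abs(D) >= 1.5) return("C")
      # B: significant DIF with 1 <= |D| < 1.5
      return("B")
    }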

    Anyway, the following code will simulate the case where the 24 item exam effectively has the same parameters for both the reference and focal groups, except for item 1 (which is harder for the focal group). Both groups of 500 examinees have the same standard normal ability distribution. The code then calculates the estimated DIF for all of the items and stores the results in three vectors: DIFz, DIFpval, and DIFD.

    #simulate
    
    a<-exp(rnorm(24,mean=0,sd=0.35))
    b<-rnorm(24,0,1)
    cc<-rep(.2,24)
    refgroup<-irtgen(nexmn=500,avec=a,bvec=b,cvec=cc,type="norm",mthet=0,sdthet=1)
    b[1]<-b[1]+0.5
    focgroup<-irtgen(nexmn=500,avec=a,bvec=b,cvec=cc,type="norm",mthet=0,sdthet=1)
    
    #calculate DIF
    
    DIFz<-rep(0,24)
    DIFpval<-rep(0,24)
    DIFD<-rep(0,24)
    for (i in 1:24){
      temp<-difmh(refgroup,focgroup,i)
      DIFz[i]<-temp$zmh
      DIFpval[i]<-temp$p.equal
      DIFD[i]<-temp$D
    }
    round(DIFz,3)
    round(DIFpval,3)
    round(DIFD,3)
    
    #done
    

    The above generation of the a parameters is based on distributions used by Donoghue and Allen (1993) to reflect SAT parameters reported by Lord (1968). To change the number of items on the simulated exam, simply change the 24 in each line to the appropriate number.
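
    One small variation on the code above (nitems is just an illustrative name) sets the number of items once, so only a single line needs editing:

    nitems<-24
    a<-exp(rnorm(nitems,mean=0,sd=0.35))
    b<-rnorm(nitems,0,1)
    cc<-rep(.2,nitems)
    # ...and so on, replacing each remaining 24 with nitems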

    The code for the two functions is:

    #-----------------------------------------------------------
    
    difstrata<-function(refmat,focmat,it){
      nref<-nrow(refmat)
      nfoc<-nrow(focmat)
      refscore<-apply(refmat,1,sum)
      focscore<-apply(focmat,1,sum)
      nit<-max(refscore,focscore)
      outarray<-array(0,dim=c(2,2,nit+1))
      numatscore<-rep(0,nit+1)
      refscore<-apply(refmat,1,sum)+1
      focscore<-apply(focmat,1,sum)+1
      for (j in 1:nref){
        outarray[refmat[j,it]+1,1,refscore[j]]<-
            outarray[refmat[j,it]+1,1,refscore[j]]+1
        numatscore[refscore[j]]<-numatscore[refscore[j]]+1
      }
      for (j in 1:nfoc){
        outarray[focmat[j,it]+1,2,focscore[j]]<-
            outarray[focmat[j,it]+1,2,focscore[j]]+1
        numatscore[focscore[j]]<-numatscore[focscore[j]]+1
      }
      return(outarray[,,(1:(nit+1))[(numatscore>1)]])
    }    
      
    
    difmh<-function(refmat,focmat,it){
      temp<-mantelhaen.test(difstrata(refmat,focmat,it))
      zmh<-as.numeric(sqrt(temp$statistic)*sign(temp$estimate-1))
      return(list(zmh=zmh,p.equal=2*min(pnorm(zmh),1-pnorm(zmh)),
       D=as.numeric(-2.35*log(temp$estimate))))
    }
    
    #-----------------------------------------------------------