Bootstrapped confidence interval for the Independent t-test
The following covers how to run a bootstrap resampling procedure to get a
confidence interval for the difference in means tested by an independent-groups t-test.
Use File > Import Data... to import the Example Data 1 file with the Import Wizard,
choosing SPSS File (*.sav) as the source and example1 as the member name, as was done previously.
Let's start by getting a look at the data and
variables of interest.
PROC PRINT DATA = example1;
RUN;
PROC MEANS DATA = example1;
CLASS candy;
VAR recall1;
RUN;
1. Next, we can conduct the t-test.
We use PROC TTEST to examine differences
between two independent groups. Notice that the output gives t-values both
for variances assumed equal (the Pooled method) and for variances not assumed
equal (the Satterthwaite method).
Run the independent groups t-test:
PROC TTEST DATA=example1;
CLASS candy;
VAR recall1;
RUN;
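To make the two rows of the PROC TTEST output concrete, here is a minimal Python sketch (an illustration, not SAS code; the function name `two_sample_t` and the example data are hypothetical) of the pooled and Satterthwaite statistics for mean(x) - mean(y):

```python
import math
from statistics import mean, variance


def two_sample_t(x, y):
    """Return (pooled t, pooled df, Welch t, Satterthwaite df) for mean(x) - mean(y)."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)  # sample variances (n - 1 denominator)
    diff = mean(x) - mean(y)

    # Pooled test: assumes the two population variances are equal.
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    df_pooled = n1 + n2 - 2

    # Satterthwaite (Welch) test: variances not assumed equal.
    a, b = v1 / n1, v2 / n2
    t_welch = diff / math.sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t_pooled, df_pooled, t_welch, df_welch
```

With equal group sizes and equal sample variances the two t-values coincide; they diverge as the group sizes and variances become unequal, which is when the Satterthwaite row matters.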
2. Build the macro (SAS's counterpart to a function) that runs the bootstrap
resampling with 1000 resamples, then use those resamples to compute a
bootstrap-t confidence interval (yes, this takes some time to type!).
References: (1) (2)
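%MACRO bootse (b);
/* Split the original data into one data set per candy group */
DATA orig1 (WHERE = (candy = 1))
orig2 (WHERE = (candy = 2));
SET example1;
RUN;
/* Draw b resamples (with replacement) within each group.
   CEIL() returns row numbers 1 to nobs; ROUND() could return 0,
   which is not a valid POINT= value. A separate NOBS= variable
   per group keeps the two SET statements from sharing one count. */
DATA boot;
%DO t = 1 %TO 2;
DO sample = 1 TO &b;
DO i = 1 TO nobs&t;
pt = CEIL(RANUNI(&t) * nobs&t);
SET orig&t NOBS = nobs&t POINT = pt;
OUTPUT;
END;
END;
%END;
STOP; /* required with POINT=, which never reaches end of file */
DROP i;
RUN;
/* Mean recall1 for every resample within every candy group */
PROC MEANS
DATA = boot
NOPRINT
NWAY;
CLASS sample candy;
VAR recall1;
OUTPUT out = x
MEAN = mean;
RUN;
/* Difference in group means for each resample */
DATA diffmean;
MERGE x (WHERE = (candy = 1) RENAME = (mean = mean1))
x (WHERE = (candy = 2) RENAME = (mean = mean2));
BY sample;
diffmean = mean1 - mean2;
RUN;
/* Bootstrap SE = standard deviation of the resampled differences */
PROC MEANS
DATA = diffmean
STD;
VAR diffmean;
OUTPUT out = bootse
STD = bootse;
RUN;
%MEND;
%bootse (1000);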
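/* Stack the original data (sample = 0) on top of the resamples */
DATA bootorig;
SET example1 (in = a)
boot;
IF a THEN sample = 0;
RUN;
/* Mean, variance, and n for every sample by candy combination */
PROC MEANS
DATA = bootorig
NOPRINT
NWAY;
CLASS sample candy;
VAR recall1;
OUTPUT out = x
mean = mean
var = var
n = n;
RUN;
/* Studentize each difference: (difference - original difference) / its SE */
DATA diff_z;
MERGE x (WHERE = (candy = 1) RENAME = (mean = mean1 var = var1 n = n1))
x (WHERE = (candy = 2) RENAME = (mean = mean2 var = var2 n = n2));
BY sample;
diffmean = mean1 - mean2;
/* Unpooled SE of a difference in means: SQRT(s1**2/n1 + s2**2/n2) */
diffse = SQRT(var1 / n1 + var2 / n2);
RETAIN origdiff;
IF sample = 0 THEN origdiff = diffmean; /* sample 0 sorts first */
diff_z = (diffmean - origdiff) / diffse;
RUN;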
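/* Sort the studentized differences */
PROC SORT
DATA = diff_z;
BY diff_z;
RUN;
/* Take the 97.5th and 2.5th percentiles of the bootstrap values.
   Rows 975 and 25 assume b = 1000 resamples; change them if b changes.
   The original sample (sample = 0) is excluded so only resamples are ranked. */
DATA t_vals;
SET diff_z (WHERE = (sample > 0)) END = eof;
RETAIN t_lo t_hi;
IF _n_ = 975 THEN t_lo = diff_z;
IF _n_ = 25 THEN t_hi = diff_z;
IF eof THEN OUTPUT;
RUN;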
/* Bootstrap-t limits: the upper percentile (t_lo) gives the lower limit and vice versa */
DATA ci_t;
MERGE diff_z (WHERE = (sample = 0))
bootse (KEEP = bootse)
t_vals (KEEP = t_:);
conf_lo = origdiff - (t_lo * bootse);
conf_hi = origdiff - (t_hi * bootse);
KEEP origdiff bootse t_lo t_hi conf_lo conf_hi;
RUN;
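For reference, the textbook form of this bootstrap-t (studentized) interval is sketched below. Here \hat{d} is the observed difference in means (origdiff), d^*_b is the difference in resample b, \widehat{se}^*_b is that resample's unpooled standard error, and \widehat{se}_{boot} is the standard deviation of the d^*_b (bootse):

```latex
\begin{aligned}
z^*_b &= \frac{d^*_b - \hat{d}}{\widehat{se}^*_b},
\qquad
\widehat{se}^*_b = \sqrt{\frac{s^{*2}_{1b}}{n_1} + \frac{s^{*2}_{2b}}{n_2}},
\qquad b = 1, \dots, 1000, \\
\text{95\% CI} &= \left[\, \hat{d} - z^*_{(975)}\,\widehat{se}_{\text{boot}},\;
  \hat{d} - z^*_{(25)}\,\widehat{se}_{\text{boot}} \,\right].
\end{aligned}
```

Here z^*_{(k)} is the k-th smallest of the 1000 sorted z^*_b values, so the upper percentile produces the lower limit.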
3. Finally, print the
confidence interval limits.
PROC PRINT DATA = ci_t;
RUN;
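For readers who want to check the logic outside SAS, here is a minimal Python sketch of the same bootstrap-t procedure: resample each group with replacement, take the standard deviation of the resampled differences as the bootstrap SE, studentize each resampled difference with its own unpooled SE, and read the limits off the 25th and 975th sorted values. It is an illustration under stated assumptions, not a line-by-line translation of the SAS code; the function names, the fixed seed, and skipping resamples whose SE is zero are hypothetical choices of this sketch.

```python
import math
import random
from statistics import mean, stdev, variance


def se_diff(x, y):
    """Unpooled standard error of mean(x) - mean(y): sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(variance(x) / len(x) + variance(y) / len(y))


def bootstrap_t_ci(x, y, b=1000, alpha=0.05, seed=1):
    """Bootstrap-t CI for mean(x) - mean(y), resampling within each group.

    Returns (observed difference, bootstrap SE, lower limit, upper limit).
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    orig_diff = mean(x) - mean(y)
    diffs, z = [], []
    for _ in range(b):
        # Resample each group with replacement, keeping its original size.
        xs = rng.choices(x, k=len(x))
        ys = rng.choices(y, k=len(y))
        d = mean(xs) - mean(ys)
        diffs.append(d)
        se = se_diff(xs, ys)
        if se > 0:  # skip degenerate resamples in which both groups are constant
            z.append((d - orig_diff) / se)  # studentized difference
    boot_se = stdev(diffs)  # bootstrap SE: SD of the resampled differences
    z.sort()
    m = len(z)
    # 1-based ranks of the alpha/2 and 1 - alpha/2 quantiles (25 and 975 when m = 1000).
    k_lo = max(round(m * alpha / 2), 1)
    k_hi = round(m * (1 - alpha / 2))
    # The upper quantile of z sets the lower limit; the lower quantile sets the upper limit.
    conf_lo = orig_diff - z[k_hi - 1] * boot_se
    conf_hi = orig_diff - z[k_lo - 1] * boot_se
    return orig_diff, boot_se, conf_lo, conf_hi
```

Calling `bootstrap_t_ci(group1, group2)` on two lists of recall scores (hypothetical names) returns values that play the roles of the origdiff, bootse, conf_lo, and conf_hi columns printed from ci_t.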
4. With all due respect to the SAS Institute, that's
a ridiculous amount of code compared with what is needed to do
essentially the same thing in R. See the Do It Yourself
Introduction to R course, specifically Module 5, which
covers t and F tests. The
comments and code below were adapted from that module.
### Robust t-test.
# First create an object (called 'x1' here) holding Recall1 split by each group of Candy.
x1 <- split(Recall1, Candy)
# Load the required library (WRS).
library(WRS)
# Robust t-test (Yuen bootstrapped t-test) with 20% trimming and 1000 bootstrap resamples.
# side=T requests the symmetric two-sided bootstrap-t interval (the default, side=F, is equal-tailed).
yuenbt(x1$Skittles, x1$None, tr=.20, alpha=.05, nboot=1000, side=T)
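yuenbt() bootstraps Yuen's statistic for a difference in trimmed means. The minimal Python sketch below computes only that statistic for the observed data (no resampling), using the standard formulas with the same 20% trimming. The function names are hypothetical, and this is not the WRS source code.

```python
import math


def trimmed_mean(x, tr=0.2):
    """Mean after removing g = floor(tr * n) values from each end."""
    xs = sorted(x)
    g = math.floor(tr * len(xs))
    kept = xs[g:len(xs) - g]
    return sum(kept) / len(kept)


def winsorized_var(x, tr=0.2):
    """Sample variance after pulling the g smallest/largest values in to the nearest kept value."""
    xs = sorted(x)
    n = len(xs)
    g = math.floor(tr * n)
    w = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    m = sum(w) / n
    return sum((v - m) ** 2 for v in w) / (n - 1)


def yuen_t(x, y, tr=0.2):
    """Yuen's test statistic for the difference in trimmed means of x and y."""
    def d(v):
        n = len(v)
        h = n - 2 * math.floor(tr * n)  # observations left after trimming
        return (n - 1) * winsorized_var(v, tr) / (h * (h - 1))
    return (trimmed_mean(x, tr) - trimmed_mean(y, tr)) / math.sqrt(d(x) + d(y))
```

Because the extreme 20% in each tail are trimmed (for the mean) or winsorized (for the variance), a single outlier such as a score of 100 among single-digit scores barely moves the statistic. A bootstrap-t version repeats this calculation on resamples, in the same spirit as the SAS steps above.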