VIII. ANOVA and Linear Regression
The following covers some of the common SAS procedures you can use
to run intermediate-level statistical analyses. Use the Import Wizard
to import the Example Data 1 file using the SPSS File (*.sav) source
option, as was done previously.
1. One-way ANOVA
Some sources recommend PROC ANOVA for the one-way or single-factor
analysis; however, PROC ANOVA is designed for balanced cells (i.e.,
each group has an equal number of cases). Although PROC ANOVA can
handle the one-way case with unequal group sizes, PROC GLM works for
balanced and unbalanced designs alike, and because we frequently do
not have balanced cells, PROC GLM is preferred. The current example
compares different stimuli conditions on ability to recall at time 1.
In this example, our factor or independent variable, stimuli, has
three levels (or conditions): spoken, written, and combined. Our
dependent variable is the familiar recall at time 1 (recall1).
PROC GLM DATA=example1;
CLASS stimuli;
MODEL recall1 = stimuli;
MEANS stimuli;
RUN;
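To see whether the cells are in fact balanced, a frequency table of
the factor can be run before fitting the model. This is a minimal
sketch that uses only the example1 data set and the stimuli variable
from the example above.

```sas
/* Count the cases in each stimuli group; unequal counts mean unbalanced cells */
PROC FREQ DATA=example1;
TABLES stimuli;
RUN;
```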
We can run post-hoc tests (here with Tukey's version) by adding
options after a slash on the MEANS statement.
PROC GLM DATA=example1;
CLASS stimuli;
MODEL recall1 = stimuli;
MEANS stimuli / TUKEY;
RUN;
Here we request both Tukey's test and the
Ryan-Einot-Gabriel-Welsch multiple range test (REGWQ) for our post-hoc
comparisons.
PROC GLM DATA=example1;
CLASS stimuli;
MODEL recall1 = stimuli;
MEANS stimuli / TUKEY REGWQ;
RUN;
2. Multi-way or Factorial ANOVA
Here we are looking for mean differences across two factors, with six
total conditions, on ability to recall at time 1 (recall1). The first
factor, stimuli, has three conditions and was described above. The
second factor, candy, has two conditions (Skittles & no candy).
PROC GLM DATA=example1;
CLASS candy stimuli;
MODEL recall1 = candy stimuli;
MEANS stimuli / REGWQ;
MEANS candy stimuli;
RUN;
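The model above contains the two main effects only. If the
interaction between the factors is also of interest, it can be added
as a third term. The following is a sketch under that assumption;
whether the interaction belongs in the model depends on the research
question and is not stated in the example above.

```sas
PROC GLM DATA=example1;
CLASS candy stimuli;
/* candy*stimuli adds the interaction to the two main effects */
MODEL recall1 = candy stimuli candy*stimuli;
/* Least-squares means for the six cells (2 candy x 3 stimuli) */
LSMEANS candy*stimuli;
RUN;
QUIT;
```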
3. One-way MANOVA
Here, we are testing for group differences on two dependent variables
simultaneously, using our familiar three groups of stimuli. First, we
run PROC MEANS to look at the descriptive statistics for each group
across the two dependent variables. Then, we run the actual MANOVA.
PROC MEANS N MEAN STD MIN MAX DATA=example1;
CLASS stimuli;
VAR recall1 recall2;
RUN;
PROC GLM DATA=example1;
CLASS stimuli;
MODEL recall1 recall2 = stimuli / SS3;
CONTRAST 'Printed vs Spoken&Printed and Spoken' stimuli 2 -1 -1;
CONTRAST 'Spoken vs Printed and Spoken' stimuli 0 1 -1;
MANOVA h=_all_;
RUN;
QUIT;
Given that our two dependent variables above are really the same
variable measured at two points in time, it would be more appropriate
to run a repeated measures ANOVA.
PROC GLM DATA=example1;
CLASS stimuli;
MODEL recall1 recall2 = stimuli;
REPEATED TIME 2 (0 1) / SUMMARY;
RUN;
4. Factorial MANOVA
Here, we are looking at differences between stimuli groups, as well
as candy groups, on recall at time 1 and age. To begin, we will look
at some descriptive statistics for our variables, then the
correlation between our two dependent variables (age & recall1), and
then run the GLM procedure.
PROC MEANS N MEAN STD MIN MAX DATA=example1;
CLASS stimuli;
VAR age;
RUN;
PROC MEANS N MEAN STD MIN MAX DATA=example1;
CLASS candy;
VAR age;
RUN;
PROC MEANS N MEAN STD MIN MAX DATA=example1;
CLASS stimuli;
VAR recall1;
RUN;
PROC MEANS N MEAN STD MIN MAX DATA=example1;
CLASS candy;
VAR recall1;
RUN;
PROC MEANS N MEAN STD MIN MAX DATA=example1;
CLASS stimuli candy;
VAR age recall1;
RUN;
PROC CORR DATA=example1;
VAR age recall1;
RUN;
PROC GLM DATA=example1;
CLASS stimuli candy;
MODEL age recall1 = stimuli candy / SS3;
CONTRAST 'Printed vs Spoken&Printed and Spoken' stimuli 2 -1 -1;
CONTRAST 'Spoken vs Printed and Spoken' stimuli 0 1 -1;
MANOVA h=_all_ / SUMMARY PRINTE;
RUN;
5. Linear Regression
Use the Import Wizard to import the 'regression_example_data.sav'
file using the SPSS File (*.sav) source option and
the member name 'red'. Then print the data to confirm the import.
PROC PRINT DATA=red;
RUN;
First, we'll do a simple linear ordinary least squares (OLS)
regression with three predictors (prison, age & peyrs) and apt as our
outcome variable.
PROC REG DATA=red;
MODEL apt = prison age peyrs;
RUN;
SAS produces unstandardized regression coefficients by default. If
you also want SAS to produce the standardized coefficients, you must
add the STB (standardized beta) option after a slash following the
name of the last predictor on the MODEL statement, as in the
following example:
PROC REG DATA=red;
MODEL apt = prison age peyrs / STB;
RUN;
Next, we'll take a second look at the same regression model, but save
the studentized residuals and Cook's distance to a new data set (T)
and have SAS graph the residuals against Cook's distance.
PROC REG DATA=red;
MODEL apt = prison age peyrs;
OUTPUT OUT = T STUDENT = RES COOKD = COOKD;
RUN;
QUIT;
PROC GPLOT DATA = T;
PLOT res*cookd = 1 / vaxis=axis1;
RUN;
QUIT;
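A numeric cutoff can complement the plot. The sketch below keeps only
the observations whose Cook's distance exceeds 4/n. That cutoff is a
common rule of thumb and an assumption here, not part of the original
example, and the data set name flagged is hypothetical.

```sas
DATA flagged;
SET T NOBS=n;    /* n holds the number of observations in T */
IF cookd > 4/n;  /* subsetting IF: keep only potentially influential cases */
RUN;
PROC PRINT DATA=flagged;
RUN;
```

The NOBS= variable is not written to the output data set, so flagged
has the same columns as T.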
Now we'll review the residual values, which is a three-stage process.
First, we generate a new variable, rabs, containing the absolute
value of the studentized residuals (res). Then we sort the data on
rabs in descending order. Finally, we list the first 50 observations.
DATA T2;
SET T;
RABS = abs(res);
RUN;
PROC SORT DATA=T2;
BY DESCENDING rabs;
RUN;
PROC PRINT DATA=T2 (obs=50);
RUN;
6. Robust regression
Robust regression is done by iteratively reweighted least squares
(IWLS). The procedure for running robust regression is PROC ROBUSTREG.
Several weight functions are available for IWLS; we use the Huber
weight function in this example. We can save the final weights created
by the IWLS process, which is useful for seeing which observations
were down-weighted. We will use the data set T2 generated above, which
includes the original data and the diagnostic variables generated from
the OLS regression model.
*Note in the output the presence of the robust versions of the AIC &
BIC (AICR and BICR) for model fit.
PROC ROBUSTREG DATA=T2 METHOD=m (wf=huber);
MODEL apt = prison age peyrs;
OUTPUT OUT = test1 weight=wgt;
RUN;
Next, we'll take a look at the final weights from the robust
regression. Sorting by wgt in ascending order lists the most heavily
down-weighted observations first.
PROC SORT DATA=test1;
by wgt;
RUN;
PROC PRINT DATA=test1 (obs=50);
RUN;
Now let's compare the results of a regular OLS
regression and a robust regression. If the results are very
different, you will most likely want to use the results from the robust
regression.
ODS LISTING CLOSE;
PROC REG DATA=red;
MODEL apt = prison age peyrs;
ODS OUTPUT PARAMETERESTIMATES = a;
RUN;
QUIT;
PROC ROBUSTREG DATA=T2 METHOD=m (wf=huber);
MODEL apt = prison age peyrs;
ODS OUTPUT PARAMETERESTIMATES = b;
RUN;
QUIT;
ODS LISTING;
TITLE "OLS Regression";
PROC PRINT DATA=a;
RUN;
TITLE "Robust Regression";
PROC PRINT DATA=b;
RUN;