Logistic Regression (Multinomial)
Multinomial logistic regression is appropriate when the outcome is a polytomous variable (i.e. categorical with more than two categories) and the predictors are of any type: nominal, ordinal, and/or interval/ratio (numeric). Discriminant Function Analysis (DFA) may be used in the same situation, but DFA requires adherence to more assumptions; therefore, multinomial logistic regression is often preferred when the outcome variable is categorical. Multinomial logistic regression does not require the use of a coding strategy (i.e. dummy coding, effects coding, etc.) for including categorical predictors in the model. Categorical predictor variables can be included directly as factors in the multinomial logistic regression dialog box.
Throughout this tutorial we will be using the MultiNomReg.sav file, which contains one polytomous categorical outcome variable (y), three continuous predictor variables (x1 - x3), and 600 cases.
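If you would like to follow along outside of SPSS, the same file can be opened in Python with pandas. This is an optional sketch, not part of the SPSS steps below; it assumes the pyreadstat package is installed and that MultiNomReg.sav is in your working directory.

    # Load the SPSS data file into a pandas DataFrame (pandas uses the
    # pyreadstat package behind the scenes for .sav files).
    import pandas as pd

    df = pd.read_spss("MultiNomReg.sav")
    print(df.head())                 # expect columns y, x1, x2, x3
    print(df["y"].value_counts())    # number of cases in each outcome category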
Begin by clicking on Analyze, Regression,
Multinomial Logistic...
Next, highlight the outcome variable (y) and use the top arrow button to move it to the Dependent: box. Then click on the Reference Category... button, select First Category, and click the Continue button.
Next, select all three of the predictor variables
(x1, x2, x3) and use the bottom arrow button to move them to the
Covariate(s): box. Notice here, if we had any categorical predictors,
we would move them to the Factor(s): box. Next, click the Statistics...
button and select the following. The cell probabilities will not be displayed, even if selected, because with only continuous predictors in the model too many cells would be produced. Also, the Monotonicity measures table is not needed because our outcome variable has more than two categories. Next, click the Continue button, then click the OK
button to complete the analysis.
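For those who want to cross-check the point-and-click steps above, a roughly equivalent model can be fit with the statsmodels package in Python. This is a hedged sketch that continues from the data-loading sketch earlier (the df object); it assumes y is coded numerically (1, 2, 3), in which case MNLogit uses the lowest value, i.e. the first category, as the reference, matching the Reference Category choice made above.

    # Fit the multinomial logistic regression: y predicted from x1, x2, x3.
    # MNLogit treats the lowest outcome value as the reference category.
    import statsmodels.api as sm

    X = sm.add_constant(df[["x1", "x2", "x3"]])   # intercept plus the three covariates
    fit = sm.MNLogit(df["y"], X).fit()
    print(fit.summary())                          # one set of coefficients per non-reference category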
The output should be similar to what is displayed
below.
The Case Processing Summary table simply shows how
many cases or observations were in each category of the outcome
variable (as well as their percentages). It also shows if there was any
missing data. The Model Fitting Information table (above right) shows various indices for assessing the intercept-only model (sometimes referred to as the null model) and the final model, which includes all the predictors and the intercept (sometimes called the full model). Both the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are information-theory-based model fit statistics. Lower values indicate better model fit, and both can fall below zero (i.e. larger negative values indicate better fit than values closer to zero). The BIC tends to be more conservative. Similarly, the -2 Log Likelihood (-2LL) should be lower for the full model than for the null model; it reflects the variation in the outcome variable left unexplained by the model, so the smaller the value, the better the fit. The Likelihood Ratio chi-square test is an alternative test of goodness-of-fit; as with most chi-square based tests, however, it is prone to inflation as sample size increases. Here, we see the model fit is significant, χ²(6) = 1291.00, p < .001, which indicates
our full model predicts significantly better, or more accurately, than
the null model. To be clear, you want the p-value
to be
less than your established cutoff
(generally 0.05) to indicate good fit.
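Continuing the optional Python sketch (using the fit object created above), the null and full model -2LL values and the likelihood ratio chi-square reported in the Model Fitting Information table can be reproduced as follows.

    # -2LL for the intercept-only (null) and full models, and the LR chi-square:
    # LR chi-square = (-2LL_null) - (-2LL_full), with df equal to the number of
    # estimated slope parameters (2 equations x 3 predictors = 6 here).
    neg2ll_null = -2 * fit.llnull
    neg2ll_full = -2 * fit.llf
    lr_chisq = neg2ll_null - neg2ll_full          # same value as fit.llr
    print(neg2ll_null, neg2ll_full, lr_chisq, fit.llr_pvalue)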
The Goodness-of-Fit table provides further
evidence of good fit for our model. Again, both the Pearson and
Deviance statistics are chi-square based methods and subject to
inflation with large samples. Here, we interpret lack of significance
as indicating good fit.
To be clear, you want the p-value to be greater
than your established cutoff (generally 0.05) to indicate
good fit. The Pseudo R-Square table displays three metrics (Cox and Snell, Nagelkerke, and McFadden) developed to provide a number familiar to those who have used traditional, standard multiple regression. They are treated as measures of effect size, similar to how R² is treated in standard multiple regression; however, these metrics do not represent the amount of variance in the outcome variable accounted for by the predictor variables. Higher values indicate better fit, but they should be interpreted with caution.
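As an illustration of where one of these numbers comes from, McFadden's pseudo R-square is 1 minus the ratio of the full-model log-likelihood to the null-model log-likelihood. The optional sketch below computes it from the statsmodels fit object used earlier; Cox and Snell and Nagelkerke use different formulas.

    # McFadden's pseudo R-square: 1 - (log-likelihood of full model /
    # log-likelihood of intercept-only model). Higher is better, but it is not
    # "variance explained" in the ordinary-regression sense.
    mcfadden = 1 - (fit.llf / fit.llnull)
    print(mcfadden)                  # statsmodels also reports this as fit.prsquared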
The statistics in the Likelihood Ratio Tests table
are the same types as those reported for the null and full models above
in the Model Fitting Information table. Here, however, each effect in the model is evaluated by comparing the full model to a reduced model with that effect removed, which allows the researcher to determine whether each predictor should be included in the full model. In other words, does each predictor contribute meaningfully to the full effect? For instance, we see that the x3 predictor displays a non-significant (p = .110) chi-square, which indicates x3 could be dropped from the model and the overall fit would NOT be significantly reduced. To be clear, if the p-value is less than your established cutoff (generally 0.05) for a predictor, then that predictor contributes significantly to the full (final) model.
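The same comparison can be reproduced by hand in the optional Python sketch: refit the model without x3 and compare the two log-likelihoods (df and fit refer to the objects created in the earlier sketches).

    # Likelihood ratio test for x3: compare the full model to a reduced model
    # with x3 removed; df = number of parameters dropped (2 here, one per
    # non-reference outcome category).
    from scipy import stats
    import statsmodels.api as sm

    X_reduced = sm.add_constant(df[["x1", "x2"]])
    fit_reduced = sm.MNLogit(df["y"], X_reduced).fit()
    lr = -2 * (fit_reduced.llf - fit.llf)        # chi-square statistic
    p = stats.chi2.sf(lr, df=2)
    print(lr, p)                                 # non-significant p => x3 could be dropped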
The Parameter Estimates table (above) shows the logistic coefficient (B) for each predictor variable for each alternative category of the outcome variable, meaning each category other than the reference category. The logistic coefficient is the expected amount of change in the logit for each one unit change in the predictor. The logit is what is being predicted; it is the log of the odds of membership in a given alternative category relative to the reference category (here the first value, 1, was specified as the reference, so the estimates compare categories 2 and 3 to category 1). The closer a logistic coefficient is to zero, the less influence the predictor has in predicting the logit. The table also displays the standard error, Wald statistic, df, Sig. (p-value), as well as the Exp(B) and the confidence interval for the Exp(B). The Wald test (and associated p-value) is used to evaluate whether or not the logistic coefficient is different from zero. The Exp(B) is the odds ratio associated with each predictor. Predictors which increase the logit display Exp(B) values greater than 1.0, predictors which have no effect on the logit display an Exp(B) of 1.0, and predictors which decrease the logit have Exp(B) values less than 1.0. As an example, we can see that a one unit change in x3 does not significantly change the odds of being classified in the second or third category of the outcome variable relative to the first (reference) category, while controlling for the influence of the other predictors.
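The Exp(B) column and its confidence interval can be recreated by exponentiating the logistic coefficients and their confidence limits, as in the optional sketch below, which again uses the statsmodels fit object from earlier.

    # Odds ratios (Exp(B)) and 95% confidence intervals: exponentiate the
    # logistic coefficients and their confidence limits.
    import numpy as np

    odds_ratios = np.exp(fit.params)              # one column per non-reference category
    or_ci = np.exp(np.asarray(fit.conf_int()))    # intervals spanning 1.0 => no clear effect
    print(odds_ratios)
    print(or_ci)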
The Classification Table (above) shows how well
our full model correctly classifies cases. A perfect model would show
only values on the diagonal--correctly classifying all cases. Adding
across the rows represents the number of cases in each category in the
actual data and adding down the columns represents the number of cases
in each category as classified by the full model. The key piece of information is the overall percentage in the lower right corner: our model (with all predictors and the constant) correctly classifies 99.2% of cases, which is excellent.
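A comparable classification table and overall percentage correct can be produced in the optional Python sketch by assigning each case to the category with the highest predicted probability (again using the df, X, and fit objects created earlier, and assuming y is coded 1, 2, 3).

    # Classification table: assign each case to its highest-probability category
    # and cross-tabulate observed vs. predicted outcomes.
    import numpy as np
    import pandas as pd

    categories = np.unique(np.asarray(df["y"]))           # assumes y is coded 1, 2, 3
    probs = np.asarray(fit.predict(X))                    # one probability column per category
    pred = categories[np.argmax(probs, axis=1)]
    print(pd.crosstab(df["y"], pred, rownames=["Observed"], colnames=["Predicted"]))
    print((pred == df["y"]).mean())                       # overall proportion correct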
As with most of the tutorials / pages within this
site, this page should not be considered an exhaustive review of the
topic covered and it should not be considered a substitute for a good
textbook.