Creation date: 11/18/2000
Authored by: Karl Ho
RSS Matters
The All-encompassing SAS 8 (1/2)
Last year, when I wrote an evaluation note on SAS 7 (a transitional release between SAS's first Windows generation, 6.x, and the current version, SAS 8), I fell short of full coverage because SAS is enormous, composed of numerous modules and procedures. Another reason was that SAS 7 was still a developer's release (a "post-beta" beta version). A year later, having set myself the same task for the new version 8, I have to say I am still shy of giving a satisfactory report: I can only split the evaluation into two articles just to introduce the features that are new in version 8 alone.
The new SAS not only demonstrates a higher level of stability under the MS Windows operating system (geared for Windows 2000)*, it also introduces a wave of new functionality and features that give the software a facelift from its previous mainframe-style look. Many Windows users may still hesitate to choose SAS 7 over other GUI-based packages such as SPSS or Statistica, since SAS is known for its syntax-based operation. With the three new add-on modules (SAS/Analyst, SAS/LAB, SAS/INSIGHT) plus the 3-D graphics procedure PROC G3D, I would declare SAS now fully gooey (GUI). For instance, with Analyst (Solutions --> Analysis --> Analyst), users can simply import data in various formats and start analyzing in the spreadsheet-like, explorer-style interface. A wide variety of procedures are ready to use in Analyst, covering bivariate analyses (e.g. t-test, correlations, ANOVA) and multivariate analyses (GLM, regression, power analysis, principal components and survival models). Users can also easily select samples out of an existing data set and create charts by pointing and clicking.
However, the comparative advantage of SAS still lies in its research and development, as exemplified by the new data analysis procedures. In the following I will briefly introduce the procedures new to release 8.1, with some sample outputs.
Survey Sampling
When starting a survey, particularly a large-scale or national survey,
researchers are concerned with how to draw samples from the population and
whether and how weighting should be applied to under-represented groups
(e.g. certain socio-economic status groups in some geographic areas) or
over-represented groups (e.g. the upper-middle class among email recipients).
SAS 8 introduces a new series of procedures that enable survey researchers
to select their survey samples using different designs:
simple random
stratified
clustering
unequal weighting
PROC SURVEYSELECT selects samples via a variety of methods
ranging from simple random to complex multi-stage design sampling.
With two other new procedures, SURVEYMEANS and SURVEYREG,
researchers can easily estimate sample and population means, variances,
confidence limits, and other descriptive statistics, as well as sampling
errors and regression models, taking into account the sampling design and
weighting scheme introduced in the sample selection process. (sample output)
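To make the role of sampling weights concrete, here is a minimal sketch in Python (not SAS syntax) of stratified simple random sampling followed by a weighted mean. The population, the field names `region` and `income`, and the helper functions are all hypothetical; the sketch only illustrates the idea of weighting each sampled unit by N_h / n_h, the stratum size over the stratum sample size, and is not how the SAS procedures are implemented.

```python
import random
from statistics import mean

def stratified_sample(population, strata_key, n_per_stratum, seed=0):
    """Draw a simple random sample of fixed size within each stratum.

    Returns (unit, weight) pairs, where weight = N_h / n_h.
    """
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(strata_key(unit), []).append(unit)
    sample = []
    for _, units in sorted(strata.items()):
        n = min(n_per_stratum, len(units))
        weight = len(units) / n  # each sampled unit stands for this many units
        for unit in rng.sample(units, n):
            sample.append((unit, weight))
    return sample

def weighted_mean(sample, value):
    """Estimate the population mean from (unit, weight) pairs."""
    total_weight = sum(w for _, w in sample)
    return sum(w * value(u) for u, w in sample) / total_weight

# Hypothetical population: 900 urban units and 100 rural units.
population = (
    [{"region": "urban", "income": 50.0} for _ in range(900)]
    + [{"region": "rural", "income": 30.0} for _ in range(100)]
)

sample = stratified_sample(population, lambda u: u["region"], n_per_stratum=50)
estimate = weighted_mean(sample, lambda u: u["income"])
unweighted = mean(u["income"] for u, _ in sample)
print(estimate, unweighted)
```

Because the rural stratum is deliberately over-sampled here, the unweighted sample mean is pulled toward rural incomes, while the weighted estimate recovers the population mean of this toy data set.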
Nonparametric Modeling
In the newest version, 8.1, SAS incorporates one of the latest approaches to
fitting non-linear models: nonparametric regression. It encompasses a
suite of nonparametric techniques including kernel density estimation and
loess smoothing. PROC KDE computes nonparametric density estimates using
kernel density estimation and saves the estimates for subsequent plotting
and analysis. PROC LOESS and PROC TPSPLINE provide various smoothing
methods for conducting exploratory data analysis and fitting nonparametric
or semiparametric models.
Sample output:
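As a rough illustration of what kernel density estimation does, the following Python sketch (not SAS code) builds a one-dimensional Gaussian kernel estimate. The data values and the bandwidth of 0.5 are made up for the example, and PROC KDE's own kernels and bandwidth selection are not reproduced here.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function x -> Gaussian kernel density estimate at x."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, centered on that observation.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density

# Hypothetical observations and an assumed bandwidth.
data = [1.0, 1.2, 1.9, 2.1, 2.4, 3.8, 4.0]
density = gaussian_kde(data, bandwidth=0.5)

# Evaluate on a grid, as one would before plotting, and check that the
# estimate integrates to roughly one.
step = 0.01
grid = [i * step for i in range(-400, 1000)]
area = sum(density(x) for x in grid) * step
print(round(area, 3))
```

A smaller bandwidth gives a bumpier estimate that follows individual observations; a larger one gives a smoother curve that may hide real structure.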
Spatial Prediction: Variogram and 2-dimensional Kriging
(Spatial analyses in geology, petroleum exploration, mining, and water pollution analysis)
PROC VARIOGRAM and PROC KRIGE2D implement spatial prediction at
unsampled locations from two-dimensional data, based on spatial continuity.
Sample plots:
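The notion of spatial continuity can be sketched with an empirical semivariogram: for each distance class, take half the average squared difference between values at pairs of locations that far apart. The Python below is a minimal illustration under assumed inputs (the four points and the lag settings are hypothetical); it covers only the variogram step, not kriging, and does not mirror PROC VARIOGRAM's options.

```python
import math
from itertools import combinations

def empirical_semivariogram(points, lag_width, n_lags):
    """Compute an empirical semivariogram from (x, y, z) points.

    Returns (mean distance, semivariance, pair count) for each
    non-empty lag bin, where bin k holds pairs with distance in
    [k * lag_width, (k + 1) * lag_width).
    """
    sq_diff = [0.0] * n_lags
    dist_sum = [0.0] * n_lags
    counts = [0] * n_lags
    for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2):
        d = math.hypot(x1 - x2, y1 - y2)
        k = int(d // lag_width)
        if k < n_lags:
            sq_diff[k] += (z1 - z2) ** 2
            dist_sum[k] += d
            counts[k] += 1
    return [
        (dist_sum[k] / counts[k], sq_diff[k] / (2 * counts[k]), counts[k])
        for k in range(n_lags)
        if counts[k] > 0
    ]

# Hypothetical measurements along a transect where z rises with x.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (2.0, 0.0, 2.0), (3.0, 0.0, 3.0)]
variogram = empirical_semivariogram(points, lag_width=1.0, n_lags=4)
for distance, gamma, pairs in variogram:
    print(distance, gamma, pairs)
```

A semivariance that grows with distance, as in this toy data, is what spatial continuity looks like: nearby locations are more alike than distant ones, which is the information kriging uses to weight neighbors when predicting at an unsampled location.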
Qualitative and Limited Dependent Variable Models
Researchers are very often faced with dependent variables that are not
continuous. These discrete variables (sometimes called categorical choices)
include the choice of political party or presidential candidate and the
decision to take a bus or a train. One of the most renowned examples
is what the 2000 Nobel laureate, Daniel L. McFadden, has been studying
since 1974: commuters' choice of transportation mode(**). Multinomial
logit and probit models estimate the probability of each outcome of a
limited dependent variable, such as a commuter's choice of whether to take
a bus or drive a car. A new procedure in SAS/ETS is introduced to estimate
the family of discrete choice models. PROC QLIM can analyze not only the
regular binary (two-choice) probit and logit models, but also:
ordinal probit
nested logit
multinomial logit (more than two categories)
tobit
endogenous switching regression
simultaneous equations
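To show what a multinomial logit model delivers once it has been estimated, the Python sketch below (not PROC QLIM syntax) turns utilities into choice probabilities for a commuter choosing among three modes. The coefficients `beta_cost` and `beta_time` and the mode attributes are invented for illustration; in practice they would be estimated from data.

```python
import math

def multinomial_logit_probs(utilities):
    """Map {alternative: utility} to {alternative: choice probability}.

    P(j) = exp(V_j) / sum_k exp(V_k); the maximum utility is subtracted
    first for numerical stability, which does not change the result.
    """
    v_max = max(utilities.values())
    exps = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    total = sum(exps.values())
    return {alt: e / total for alt, e in exps.items()}

# Hypothetical coefficients: higher cost and longer travel time lower utility.
beta_cost, beta_time = -0.1, -0.05

# Hypothetical attributes of each transportation mode.
modes = {
    "bus": {"cost": 2.0, "time": 40.0},
    "car": {"cost": 6.0, "time": 20.0},
    "train": {"cost": 3.0, "time": 30.0},
}

utilities = {
    mode: beta_cost * attrs["cost"] + beta_time * attrs["time"]
    for mode, attrs in modes.items()
}
probs = multinomial_logit_probs(utilities)
for mode, p in sorted(probs.items()):
    print(mode, round(p, 3))
```

The binary logit model is the special case with two alternatives, where the same formula reduces to the familiar logistic function of the utility difference.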
Other new tests/features include:
Exact Logistic Regression (sample output)
Exact tests: generating direct exact p-values, or using Monte Carlo simulation (10000 samples) to estimate exact p-values.
Numerically Precise Regression (PROC ORTHOREG***): The new procedure produces more numerically accurate estimates than other regression procedures (e.g. REG, GLM) when the data are ill-conditioned or badly scaled.
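The difference between computing an exact p-value directly and estimating it by Monte Carlo simulation, as mentioned in the exact tests item above, can be sketched with a simple two-group permutation test. This Python example is only an illustration of the general idea: the data are made up, the test statistic (difference in group means) is an assumption, and the algorithms SAS uses for its exact tests are not reproduced here.

```python
import random
from itertools import combinations

def exact_p_value(a, b):
    """Exact two-sided permutation p-value: enumerate every regrouping."""
    pooled = a + b
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(sum(a) / n_a - sum(b) / n_b)
    hits = count = 0
    for idx in combinations(range(len(pooled)), n_a):
        s = sum(pooled[i] for i in idx)
        diff = abs(s / n_a - (total - s) / n_b)
        hits += diff >= observed - 1e-12  # tolerance for float comparison
        count += 1
    return hits / count

def monte_carlo_p_value(a, b, n_samples=10000, seed=0):
    """Estimate the same p-value from random regroupings."""
    rng = random.Random(seed)
    pooled = a + b
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(sum(a) / n_a - sum(b) / n_b)
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(pooled)
        s = sum(pooled[:n_a])
        diff = abs(s / n_a - (total - s) / n_b)
        hits += diff >= observed - 1e-12
    return hits / n_samples

# Hypothetical measurements for two small groups.
group_a = [12, 14, 15, 17, 19]
group_b = [8, 9, 10, 11, 13]

exact = exact_p_value(group_a, group_b)
estimated = monte_carlo_p_value(group_a, group_b)
print(exact, estimated)
```

With five observations per group there are only 252 regroupings, so full enumeration is cheap; the Monte Carlo route matters when the number of possible arrangements is too large to enumerate.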
In the next article, I will introduce the following new features:
Partial Least Squares
IML workshop
Multiple Imputation for Missing Data
Distribution analysis
Robust regression
* I should mention that SAS for UNIX (version 8) delivers at least as much as its Windows version. Given the limited space, I focus only on the latter.
** McFadden, D. 1974. "The Measurement of Urban Travel Demand." Journal of Public Economics, 3:303-28. Another laureate, the econometrician James Heckman, is known for the selection bias model, also called the Heckman model.
*** Despite the name, PROC ORTHOREG does not perform orthogonal regression, which minimizes the perpendicular distance between the (X, Y) points and the regression line; PROC ORTHOREG uses least squares.
Reference:
An, Anthony and Donna Watts. 1998. "New SAS Procedures for Analysis of Sample Survey Data." SUGI Proceedings.
What's New in Data Analysis on SAS Research and Development communities web (http://www.sas.com/rnd/app/da/danew.html)
Last updated: 01/18/06 by Karl Ho