

SAS Programming Workshop II


Creation date: 10/27/99
Author: Karl Ho

 

 

Objectives: This is the third course in the SAS short course series.  It is designed for intermediate and experienced users who have taken the first two classes and want to focus on the statistical and reporting procedures in SAS.  After this course, you should be able to:

1. Understand the SAS procedure syntax;
2. Perform data analysis in SAS;
3. Perform advanced data management in SAS;
4. Be familiar with new developments in the latest version of SAS.


Topics:

Review
I. Introduction
II. Functional Categories of Base SAS procedures
III. Report Writing
IV. Examples

  1. Descriptive Statistics
  2. Linear Models
  3. Plots and Charts
  4. Utilities

V. New Developments in SAS

Review

Before we get started with the Procedure step, let's refresh what we have learned in the previous class. 

DATA step

A DATA step consists of a group of statements that reads ASCII text data (in a computer file) or existing SAS data sets in order to create a new SAS data set. 

A SAS program usually starts with a DATA step. The DATA step must begin with the DATA statement and should end with a RUN statement. A data set can be created and stored in a permanent library. Otherwise, it stays in a temporary library (by default, WORK), which lasts only as long as the current SAS session; such data sets are erased when you exit SAS. Data manipulation, such as creating or renaming a variable, is normally done in a DATA step, since PROC steps allow only limited changes to the data.

You can embed the data in the DATA step using the CARDS statement, or read an external file using the INFILE statement.   The following illustrates the former method (note: list (free-format) input is used even though the data are arranged in fixed columns):

DATA CLASS;
   LENGTH NAME $ 10;   /* list input defaults to 8 characters, which would truncate Bernadette */
   INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;
CARDS;
Alice      F  13  56.0  84
Barbara    F  14  62.0  102
Bernadette F  13  65.0  98
Jane       F  12  59.0  84
Janet      F  15  62.0  112
Joyce      F  11  51.0  50
Judy       F  14  64.0  90
Louise     F  12  56.0  77
Mary       F  15  66.0  112
Alfred     M  14  69.0  112
Henry      M  14  63.0  102
James      M  12  57.0  83
Jeffery    M  13  62.0  84
John       M  12  59.0  99
Philip     M  16  72.0  150
Robert     M  12  64.0  128
Ronald     M  15  67.0  133
Thomas     M  11  57.0  85
William    M  15  66.0  112
;
RUN;

Note that the data lines following the CARDS statement must be terminated by a ‘;’ on a new line, and that the DATA step ends with a RUN statement. You can always verify the data with PROC PRINT or PROC CONTENTS.
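The INFILE method mentioned above can be sketched as follows. The path C:\data\class.txt and the data set name CLASS2 are hypothetical, and the file is assumed to hold the same five space-separated fields as the CARDS example. PROC PRINT and PROC CONTENTS then verify the result.

```sas
/* Read the five fields from an external text file (hypothetical path) */
DATA CLASS2;
   INFILE 'C:\data\class.txt';
   LENGTH NAME $ 10;                       /* allow names longer than 8 characters */
   INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;
RUN;

/* List the observations to check the values */
PROC PRINT DATA=CLASS2;
RUN;

/* Show variable names, types, and lengths */
PROC CONTENTS DATA=CLASS2;
RUN;
```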

The new version of SAS also features a data import wizard that directly reads in data in various formats.   See the SAS Programming Workshop I notes for details.
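In code, the counterpart of the import wizard is PROC IMPORT, which appears in the list of utility procedures later in these notes. A minimal sketch, assuming a comma-separated file whose first row holds the variable names; the path and the output data set name CLASSCSV are hypothetical:

```sas
/* Import a CSV file into a temporary SAS data set (hypothetical path and name) */
PROC IMPORT DATAFILE='C:\data\class.csv'
            OUT=WORK.CLASSCSV
            DBMS=CSV
            REPLACE;          /* overwrite CLASSCSV if it already exists */
   GETNAMES=YES;              /* take variable names from the first row */
RUN;
```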

Library

A SAS data library is a collection of SAS files that are recognized as a unit by the SAS system.

A SAS library is like a special SAS pointer to a location where your SAS files are stored. Once a library is created, SAS has access to the files in that library. When you delete a library, the files are still on your computer, but SAS no longer has access to them.  By creating a library, you are essentially giving SAS a shortcut name or pointer to a storage location in your operating environment where you store SAS files.

To create a library, use the following statement:

LIBNAME libref 'path:directory';

libref is the library reference name assigned by the programmer.  It follows the conventional SAS naming rule: at most eight characters, beginning with a letter or underscore, and containing only letters, digits, and underscores.
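As an illustration, the sketch below assigns a libref, stores a permanent copy of a data set in it, and then clears the libref. The libref MYLIB and the directory C:\sasdata are hypothetical, and the example assumes the temporary data set CLASS from the Review section already exists.

```sas
/* Assign the libref MYLIB to a directory (hypothetical path) */
LIBNAME MYLIB 'C:\sasdata';

/* A two-level name stores the data set permanently in that library */
DATA MYLIB.CLASS;
   SET WORK.CLASS;
RUN;

/* Remove the libref. The files themselves stay on disk. */
LIBNAME MYLIB CLEAR;
```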

I. Introduction: The Procedure Step

In this workshop, we focus on the SAS procedure step that covers running SAS procedures on SAS data sets. A PROCedure step calls a SAS procedure to analyze or process a SAS dataset. The PROC step begins with a PROC statement and ends with a RUN statement. All of the statistical procedures require the input of a SAS data set. This data set should have already been prepared in a DATA step for processing by the procedure, since SAS procedures allow only limited adjustment of the data set.

The general syntax for a PROC step is:

PROC name [DATA=dataset [dsoptions] ] [options];
[other PROC-specific statements;]
[BY varlist;]
RUN;

where:

name   identifies the procedure you want to use.
dataset identifies the SAS data set to be used by the procedure; if omitted, the last data set to have been created during the session is used.
dsoptions specifies the data set options to be used.
varlist specifies the variables that define the groups to be processed separately. The data set must already be sorted by these same variables.
options specifies the PROC-specific options to be used.

 
The syntax above uses the following conventions for statements:

. SAS keywords are in UPPERCASE;
. User-supplied words (such as file names or variable names) are in lowercase;
. Options are in brackets [ ] . Note that you do not type the brackets.

This is a simplified form of the syntax conventions used in SAS manuals and in documentation for most statistical packages.
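To make the general syntax concrete, here is one possible PROC step with each part of the template filled in. It uses the CLASS data set from the Review section. The WHERE= data set option and the MAXDEC= option are illustrative choices, not requirements.

```sas
/* BY-group processing requires sorted input */
PROC SORT DATA=CLASS;
   BY SEX;
RUN;

/* name = MEANS, dataset = CLASS, dsoptions = WHERE=, options = N MEAN MAXDEC= */
PROC MEANS DATA=CLASS (WHERE=(AGE >= 12)) N MEAN MAXDEC=1;
   VAR HEIGHT WEIGHT;   /* PROC-specific statement */
   BY SEX;              /* varlist = SEX */
RUN;
```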

A SAS program can contain any number of DATA and PROC steps. The SAS statements in each step are executed all together. Once a dataset has been created, it can be processed by any subsequent DATA or PROC step. Note the following rules of the SAS statements:

- Most SAS statements start with a keyword (DATA, INPUT, PROC, etc.); assignment statements such as x = 1; are the main exception.

- All SAS statements end with a semicolon (;) . (The most common problem students encounter is omitting a semicolon -- SAS thinks that two statements are just one.)

- SAS statements can be entered in free format: you can begin in any column, type several statements on one line, or split a single statement over several lines (as long as no word is split).

- Uppercase and lowercase are equivalent, except inside quote marks ( sex = 'm'; is not the same as sex = 'M';).
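The rules above can be seen in a short sketch, assuming the CLASS data set from the Review section. The data set names GIRLS and NOBODY are hypothetical.

```sas
/* Several statements on one line, in lowercase: valid */
data girls; set class; if sex = 'F'; run;

/* One statement split over several lines: also valid */
PROC PRINT
   DATA = girls;
RUN;

/* Case matters inside quotes: 'f' matches no observation in CLASS */
data nobody; set class; if sex = 'f'; run;
```

Because SEX is coded as uppercase F and M, the last step creates a data set with no observations.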

SAS procedures exist to carry out most common forms of statistical analysis. A procedure is invoked in a "PROC step", which starts with the keyword PROC, such as:

PROC MEANS DATA=CLASS;
VAR HEIGHT WEIGHT;
RUN;

The VAR or VARIABLES statement can be used with most procedures to indicate which variables are to be analyzed. If this statement is omitted, the default is to include all variables of the appropriate type (character or numeric) for the given analysis.

Some other statements that can be used with most SAS procedure steps are:

BY variable(s);

Causes the procedure to be repeated automatically for each different value of the named variable(s). The data set must first be sorted by those variables.

ID variable(s);

Gives the name of a variable to be used as an observation IDentifier.

LABEL var='label';

Assigns a descriptive label to a variable.

WHERE (expression);

Selects only those observations for which the expression is true.

For example, the following lines produce separate means for males and females, with the variable SEX labeled 'Gender'. (An ID statement is not appropriate, because PROC MEANS produces only summary output.)

 

PROC SORT DATA=CLASS;
BY SEX;
RUN;
PROC MEANS DATA=CLASS;
VAR HEIGHT WEIGHT;
BY SEX;
LABEL SEX='Gender';
RUN;
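The ID and WHERE statements are easier to see with a procedure that lists individual observations. A minimal sketch with PROC PRINT, assuming the CLASS data set from the Review section:

```sas
PROC PRINT DATA=CLASS LABEL;
   ID NAME;                 /* NAME replaces the OBS column */
   WHERE AGE >= 14;         /* keep only students aged 14 or older */
   VAR SEX AGE HEIGHT;
   LABEL SEX='Gender';      /* displayed because of the LABEL option */
RUN;
```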

If the DATA= option is not used, SAS procedures process the most recently created dataset. In the brief summaries below, the required portions of a PROC step are shown in bold. Only a few representative options are shown.

II. Functional Categories of Base SAS Procedures

Base SAS software provides a variety of procedures that produce reports, compute statistics, and perform utility operations.


Report Writing

These procedures display useful information, such as data listings (detail reports), summary reports, calendars, letters, labels, forms, multipanel reports, and graphical reports:

CALENDAR   MEANS*    SQL*
CHART*     PLOT      SUMMARY*
FORMS      PRINT     TABULATE*
FREQ*      REPORT*   TIMEPLOT
*These procedures produce reports and compute statistics.

Statistics

These procedures compute elementary statistical measures which include descriptive statistics based on moments, quantiles, confidence intervals, frequency counts, cross-tabulations, correlations, and distribution tests. They also rank and standardize data:

CHART      RANK      SUMMARY
CORR       REPORT    TABULATE
FREQ       SQL       UNIVARIATE
MEANS      STANDARD

Utilities

These procedures perform basic utility operations. They create, edit, sort, and transpose data sets, create and restore transport data sets, create user defined formats, and provide basic file maintenance such as to copy, append, and compare data sets:

APPEND     EXPLODE    REGISTRY
BMDP**     EXPORT     RELEASE**
CATALOG    FORMAT     SORT
CIMPORT    FSLIST     SOURCE**
COMPARE    IMPORT     SQL
CONTENTS   OPTIONS    TAPECOPY**
CONVERT**  PDS**      TAPELABEL**
COPY       PDSCOPY**  TRANSPOSE
CPORT      PMENU      TRANTAB
DATASETS   PRINTTO
**See the SAS documentation for the operating environment for a description of these procedures.


IV. Examples

1. Descriptive statistics

PROC CORR

Correlations among a set of variables.

PROC CORR DATA=SASdataset options;
options: NOMISS ALPHA
VAR variable(s);
WITH variable(s);

where the NOMISS option excludes observations that have a missing value for any of the analysis variables, and ALPHA requests Cronbach’s coefficient alpha in addition to the Pearson correlations.

Example

To get the correlation coefficient for HEIGHT and WEIGHT, use the VAR statement:

PROC CORR DATA=CLASS;
VAR HEIGHT WEIGHT;
RUN;

The output lists simple statistics (N, mean, standard deviation, sum, minimum, maximum) for each variable, followed by the matrix of Pearson correlation coefficients with the p-value printed beneath each coefficient.
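The options and the WITH statement from the PROC CORR summary could be used as in the sketch below, assuming the CLASS data set from the Review section. ALPHA is shown here without a WITH statement; check the PROC CORR documentation before combining the two.

```sas
/* Pearson correlations plus Cronbach coefficient alpha, complete observations only */
PROC CORR DATA=CLASS NOMISS ALPHA;
   VAR AGE HEIGHT WEIGHT;
RUN;

/* WITH: correlate AGE with each VAR variable instead of printing the full square matrix */
PROC CORR DATA=CLASS;
   VAR HEIGHT WEIGHT;
   WITH AGE;
RUN;
```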

PROC FREQ

Frequency tables, chi-square tests

PROC FREQ DATA=SASdataset;
TABLES variable(s) / options;
options: NOCOL NOROW NOPERCENT
OUTPUT OUT=SASdataset;

Example

To get the frequency distribution of AGE in the data set CLASS:

PROC FREQ DATA=CLASS;
TABLES AGE;
RUN;

The output should look like:

 

    ---------------------------------------------------------------------
                                          CUMULATIVE    CUMULATIVE
      AGE    FREQUENCY    PERCENT         FREQUENCY     PERCENT
      
      11         2          10.5              2            10.5
      12         5          26.3              7            36.8
      13         3          15.8              10           52.6
      14         4          21.1              14           73.7
      15         4          21.1              18           94.7
      16         1           5.3              19           100.0
    ---------------------------------------------------------------------

You can also get a cross-tabulation of two variables. For example, to examine the relationship between AGE and HEIGHT, use PROC FREQ to produce the crosstab table for them:

PROC FREQ DATA=CLASS;
TABLES AGE*HEIGHT;
RUN;
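The TABLES options from the PROC FREQ summary might be used as in the following sketch. It assumes the CLASS data set from the Review section. CHISQ (which requests the chi-square test) and the OUT= option on the TABLES statement are standard PROC FREQ options that do not appear in the summary above, and AGEFREQ is a hypothetical data set name.

```sas
/* SEX by AGE table showing counts only, with a chi-square test */
PROC FREQ DATA=CLASS;
   TABLES SEX*AGE / NOCOL NOROW NOPERCENT CHISQ;
RUN;

/* Save the one-way counts and percentages to a data set */
PROC FREQ DATA=CLASS;
   TABLES AGE / OUT=AGEFREQ;
RUN;
```

With only 19 observations the expected cell counts are small, so SAS will warn that the chi-square test may not be valid.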

 

PROC MEANS

Means, standard deviations, and a host of other univariate statistics for a set of variables.

 

PROC MEANS DATA=SASdataset options;
options: N MEAN STD MIN MAX SUM VAR CSS USS
VAR variable(s);
BY variable(s);
OUTPUT OUT=SASdataset keyword=variablename ... ;

Statistical options on the PROC MEANS statement determine which statistics are printed. The (optional) OUTPUT statement is used to create a SAS dataset containing the values of these statistics.

Example

You can examine the mean of WEIGHT separately for each value of SEX (the data set must already be sorted by SEX):

PROC MEANS DATA=CLASS;
BY SEX;
VAR WEIGHT;
RUN;

The output should look like:

 

    -----------------------------------------------------------------
    VARIABLE  N      MEAN         STD         MINIMUM    MAXIMUM    

    -------------------------SEX=F-----------------------------------
     WEIGHT   9   89.88888889   19.41934888   50.000000   112.000000
    -------------------------SEX=M-----------------------------------
     WEIGHT  10   108.8000000   22.75863694   83.000000   150.000000
    -----------------------------------------------------------------
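The OUTPUT statement described above can be sketched as follows, assuming the CLASS data set from the Review section. The output data set name STATS and the variable names MEAN_HT, MEAN_WT, SD_HT, and SD_WT are hypothetical.

```sas
PROC SORT DATA=CLASS;
   BY SEX;
RUN;

/* NOPRINT suppresses the listing; the statistics go to the data set STATS */
PROC MEANS DATA=CLASS NOPRINT;
   VAR HEIGHT WEIGHT;
   BY SEX;
   /* names after each keyword pair up with the VAR variables in order */
   OUTPUT OUT=STATS MEAN=MEAN_HT MEAN_WT STD=SD_HT SD_WT;
RUN;

PROC PRINT DATA=STATS;
RUN;
```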

PROC UNIVARIATE

Univariate statistics and displays for a set of variables.

 

PROC UNIVARIATE DATA=SASdataset options;
options: PLOT
VAR variable(s);
BY variable(s);
OUTPUT OUT=SASdataset keyword=variablename ... ;

Example

You can examine univariate statistics such as the median and kurtosis of WEIGHT for each value of SEX.

 

PROC UNIVARIATE DATA=class PLOT;
VAR weight;
BY sex ;
run;

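To keep the median and kurtosis in a data set rather than only in the listing, the OUTPUT statement might be used as in this sketch. It assumes CLASS is already sorted by SEX. UNISTATS, MED_WT, and KURT_WT are hypothetical names.

```sas
/* Save the median and kurtosis of WEIGHT for each SEX group */
PROC UNIVARIATE DATA=CLASS NOPRINT;
   VAR WEIGHT;
   BY SEX;
   OUTPUT OUT=UNISTATS MEDIAN=MED_WT KURTOSIS=KURT_WT;
RUN;

PROC PRINT DATA=UNISTATS;
RUN;
```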

 

2. Linear models

SAS statements and options for regression (PROC REG) are described in more detail in the document PROC REG Summary. SAS statements and options for analysis of variance (PROC ANOVA and PROC GLM) are described in the document PROC ANOVA and PROC GLM.

PROC ANOVA

Analysis of variance (balanced designs)

PROC ANOVA DATA=SASdataset options;
CLASS variable(s);
MODEL dependent(s)= effect(s);
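A minimal sketch of a one-way analysis of variance, assuming the CLASS data set from the Review section. The MEANS statement is an addition not shown in the summary above. The two groups are unequal in size (9 females, 10 males), which PROC ANOVA can handle for a one-way design; for other unbalanced designs, PROC GLM is the usual choice.

```sas
/* One-way ANOVA: compare mean WEIGHT between the sexes */
PROC ANOVA DATA=CLASS;
   CLASS SEX;
   MODEL WEIGHT = SEX;
   MEANS SEX;          /* print the group means */
RUN;
QUIT;
```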

PROC GLM

General linear models, including ANOVA, regression and analysis of covariance models.

PROC GLM DATA=SASdataset options;
CLASS variable(s);
MODEL dependent(s)= effect(s);
OUTPUT OUT=SASdataset keyword=variablename ... ;
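As an illustration, an analysis of covariance on the CLASS data set from the Review section could be written as below. The output data set name GLMOUT and the variable names PRED and RESID are hypothetical.

```sas
/* Analysis of covariance: WEIGHT by SEX, adjusting for HEIGHT */
PROC GLM DATA=CLASS;
   CLASS SEX;
   MODEL WEIGHT = SEX HEIGHT;
   OUTPUT OUT=GLMOUT P=PRED R=RESID;   /* predicted values and residuals */
RUN;
QUIT;
```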


PROC REG

Regression analysis

PROC REG DATA=SASdataset options;
MODEL dependent(s) = regressors / options;
PLOT variable | keyword. * variable | keyword. = symbol ;
OUTPUT OUT=SASdataset P=name R=name ... ;
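A minimal regression sketch, assuming the CLASS data set from the Review section. In the PLOT statement, R. and P. are the keywords for residuals and predicted values (note the trailing periods). REGOUT, PRED, and RESID are hypothetical names.

```sas
/* Regress WEIGHT on HEIGHT and AGE */
PROC REG DATA=CLASS;
   MODEL WEIGHT = HEIGHT AGE;
   PLOT R.*P.;                          /* residuals against predicted values */
   OUTPUT OUT=REGOUT P=PRED R=RESID;    /* save predictions and residuals */
RUN;
QUIT;
```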

3. Plots and charts

PROC CHART

Histograms and bar charts

PROC CHART DATA=SASdataset options;
VBAR variable / options;
HBAR variable / options;
options: MIDPOINTS= GROUP= SUMVAR=
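The chart options above might be combined as in the following sketch, assuming the CLASS data set from the Review section. TYPE=MEAN is a standard PROC CHART option that is not listed in the summary.

```sas
PROC CHART DATA=CLASS;
   /* one vertical bar per age, grouped side by side for each sex */
   VBAR AGE / MIDPOINTS=11 TO 16 BY 1 GROUP=SEX;
   /* horizontal bars showing the mean WEIGHT for each sex */
   HBAR SEX / SUMVAR=WEIGHT TYPE=MEAN;
RUN;
```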

PROC PLOT

Scatter plots

PROC PLOT DATA=SASdataset options;
options: HPERCENT= VPERCENT=
PLOT yvariable * xvariable = symbol / options;
PLOT (yvariables) * (xvariables) = symbol / options ;
PLOT options: BOX OVERLAY VREF= HREF=
BY variable(s) ;

Note that the parenthesized form in the PLOT statement plots each y-variable listed against each x-variable.
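Both forms of the PLOT statement are sketched below, assuming the CLASS data set from the Review section. Naming a variable after the equals sign uses the first character of its value as the plotting symbol.

```sas
PROC PLOT DATA=CLASS;
   /* WEIGHT against HEIGHT, with F or M as the plotting symbol, inside a box */
   PLOT WEIGHT*HEIGHT=SEX / BOX;
   /* parenthesized form: two plots, HEIGHT by AGE and WEIGHT by AGE */
   PLOT (HEIGHT WEIGHT)*AGE='*';
RUN;
QUIT;
```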

4. Utility procedures

PROC PRINT

Print a SAS data set

PROC PRINT DATA= SASdataset options;
options: UNIFORM LABEL SPLIT='char'
VAR variable(s);
BY variable(s);
SUM variable(s);
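A sketch that combines these statements, assuming the CLASS data set from the Review section. The label text is an illustrative choice.

```sas
/* BY-group printing requires sorted input */
PROC SORT DATA=CLASS;
   BY SEX;
RUN;

PROC PRINT DATA=CLASS LABEL;
   VAR NAME AGE WEIGHT;
   BY SEX;
   SUM WEIGHT;                    /* subtotal for each SEX group plus a grand total */
   LABEL WEIGHT='Body weight';    /* used as the column heading because of LABEL */
RUN;
```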

PROC SORT

Sort a SAS data set according to one or more variables.

PROC SORT DATA=SASdataset options;
options: OUT=
BY variable(s);
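Without OUT=, PROC SORT replaces the input data set with the sorted version. A minimal sketch of the OUT= option, assuming the CLASS data set from the Review section. BYAGE is a hypothetical name, and the DESCENDING keyword is a standard BY-statement modifier not shown in the summary above.

```sas
/* Leave CLASS untouched and write the sorted copy to BYAGE */
PROC SORT DATA=CLASS OUT=BYAGE;
   BY DESCENDING AGE NAME;   /* oldest first, then alphabetical by NAME */
RUN;
```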

V. Using SAS Solutions and Tools

SAS provides a set of ready-to-use solutions, applications, and tools in its latest version of the software. You can access many of these tools by using the Solutions menu. They are:

Analysis

Using the ANALYST application for statistics tasks
One-Way ANOVA
Linear Regression
Simple Statistics
Summary Statistics



Applications Development

Developing EIS and OLAP applications
Creating and enhancing customized applications
Using pre-defined Report Templates in an application
Creating a custom desktop environment
Source Control Manager (SCM)

Business Geographics

Address matching and geo-coding
Geographic reporting and map visualization
Using the SAS/AF Map Class in your applications

Connectivity

Remote library services
Compute services
Remote objecting services
Submitting SAS code to remote systems

Data Access

Importing and exporting data (using the Import/Export Wizard)
Using the External File Interface
Accessing databases


Data Management

Editing and browsing your data
Subsetting tables and applying a WHERE clause
Data Management Procedures


Data Presentation

Printing information from the SAS System

Database Marketing

Data visualization

Graphical Reporting

Using pre-defined Report Templates for graphing
Creating 3D Business Graphs
Mapping your data

Online Analytical Processing (OLAP)

Using multidimensional data in reports
Creating a multidimensional database

Report Writing

Report writing procedures

Web Enablement

Year 2000

Year 2000
YEARCUTOFF= option
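The YEARCUTOFF= option sets the first year of the 100-year window used to interpret two-digit years. A minimal sketch; the cutoff value 1950 and the data set name DATES are chosen only for illustration.

```sas
/* Two-digit years are read into the window 1950-2049 */
OPTIONS YEARCUTOFF=1950;

DATA DATES;
   INPUT D MMDDYY8.;
   FORMAT D DATE9.;     /* display with a four-digit year */
CARDS;
01/15/49
01/15/50
;
RUN;

PROC PRINT DATA=DATES;
RUN;
```

With this setting, the year 49 is read as 2049 and the year 50 as 1950.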

 


Last updated: 01/18/06 by Karl Ho