Introduction to SAS 8
Creation date: 10/13/98
Author: Karl Ho
Objectives: This is the SAS 8 debut class and the first in the SAS short course series. It is designed for beginning users who want to get started with the program and for experienced users who want to get to know the new generation of the SAS system. After this course, you should be able to:
1. Understand how SAS processes a program;
2. Be familiar with the simple data handling process;
3. Get accustomed to the SAS reiterative process;
4. Manage data sets in SAS;
5. Perform simple statistical analysis
Topics:
I. What is SAS?
II. Who can use SAS for Windows?
III. Using SAS for Windows
IV. SAS data handling - Data Step
V. SAS Files
VI. SAS Procedures
Appendix: SAS function keys
I. What is SAS?
SAS stands for Statistical Analysis System. It is both a statistical language and a system that performs sophisticated data management and statistical analysis. SAS is available in multiple computing environments. At the University of North Texas, we have obtained licenses for the software on several operating systems, including Windows, Mac, OS/MVS, VM/CMS, and UNIX. In this series, we focus on SAS 8 for Windows, a complete data analysis program with capabilities comparable to, and in some respects surpassing, those of its mainframe counterpart. SAS for Windows performs every task that other editions of SAS do; in addition, it is easy to use, and its graphical user interface offers more for graphical analysis than the mainframe or UNIX versions. The only limitation of SAS for Windows is the hardware it runs on. Although SAS runs on all classes of IBM personal computers and true IBM-compatible machines, faster microprocessors are preferable; a Pentium-class machine is recommended. The latest version of the software, SAS 8, runs on Windows 95, Windows 98 and Windows NT.
II. Who can use SAS for Windows?
SAS software is distributed through the university's site license agreement with SAS Institute. UNT has a site license that allows students to use the software in any general access lab on campus. Full-time faculty and staff can request SAS installation on their machines on campus or at home. For students who want to install the software on their home machines, a student edition is available for sale at the UNT bookstore. Check the UNT bookstore or the RSS office for more details about the student version of SAS 8.
UNT students and faculty are also eligible to use SAS on platforms other than Windows. It is available on UNIX, CMS and MVS. Users who want to use the mainframe/central host version of the software need a UNIX/sol account or a CMS account. Applications for these accounts are available from the Computing Center Helpdesk (565-2324).
III. Using SAS for Windows
1. New Windows Display System
SAS 8 expands on the previous version by adding three new elements on top of its basic three-window display system. The Explorer window and the Results window give the user better control and organization of SAS objects (e.g. SAS libraries, SAS data sets) and system output. The Enhanced Editor is a more sophisticated programming editor equipped with color coding, macro features and better customizability for programmers' needs. For more new features of the new Enhanced Editor, see Appendix I. Although there are more windows to deal with, the new windowing design makes the SAS 8 workspace more versatile and easier to use.
In its original design, the SAS Display Manager System is composed of three smaller windows labelled Program Editor, Log, and Output. The first is for program input, while the latter two show, respectively, the system log information and the output of the program. The log consists primarily of diagnostics of the program syntax: it indicates whether the program was written correctly and, if not, gives hints for debugging. It also reports information about the site license, the user, and the SAS release number. The Output window is where the results produced by SAS procedures are shown. If the program does not run correctly, the Output window will usually be blank or show the previous output.
The new Explorer window provides easy access to SAS objects such as File Shortcuts and Libraries. Using a File Shortcut, you can assign a SAS program, or different versions of the same program, to a clickable shortcut object. Accessing SAS libraries and data sets is much easier in version 8. All objects are clickable, and clicking one activates the appropriate window, such as the Viewtable window for data sets.
The Results window is another Windows Explorer-like utility that allows output to be organized in a hierarchical fashion.
2. Keys Window - A Road Map to SAS windows
The new SAS 8 has even more windows than its predecessor. To name a few, apart from the three default windows, they include the Library window, Filename window, Viewtable window, Keys window, Options window, Graphic windows, etc. It is imperative to use a "Road Map" to surf around the SAS workspace. The Keys window plays such a role that guides and gives you shortcuts to different windows. To activate the Keys window, point and click on the Command box at the upper left corner and type "keys" (case-insensitive).
The Keys window allows you to assign function keys to switch to the most frequently used windows and to perform the most frequently used functions. In the recommended mapping listed in the Appendix, F5, F6 and F7 are assigned to the default windows (Program Editor, Log and Output respectively), as in previous versions and in versions for other operating environments. For the rest of the function keys and hot keys (combinations of function or letter keys with the Control and/or Shift keys), you are free to assign whatever you use most. For instance, assigning F12 to "next" lets you switch to the next window, and assigning F2 to "lib" opens the Library window. Using function keys is much more convenient than clicking through the menus to reach frequently used functions or windows. An alternative is to take advantage of the window tabs at the bottom of the SAS window. Modeled on the Windows 95 taskbar, this SAS taskbar provides shortcuts to the open windows.
3. Entering SAS Statements
To enter SAS statements, simply move the cursor into the Program Editor window and click the left mouse button. You can also press the function key F5 or click the Program Editor tab at the bottom of the SAS window to make that window active.
A SAS program normally starts with a data step. Each statement can go on several lines, but it MUST end with a semi-colon. If you want to go to the next line, simply press ENTER.
Conventions on Windows operations are applicable in SAS for Windows. Cutting and pasting, for example, make program editing much easier for users.
By default, SAS for Windows displays the three windows simultaneously. But, you can select Cascade or Tile under Window option at the menu bar to choose the format of display.
Normally, only one window is active at a time. To activate a window, simply move the cursor to it and click within the window area; the color change on the title bar (the top bar of the window) shows which window is active. When you enter a SAS session, the Program Editor window is active by default. To make the LOG window active, type LOG in the command box at the upper left corner, underneath the menu bar.
4. Submitting SAS Statements
When you have entered your program correctly, you are ready to submit these statements for execution. There are many ways to do so. You can:
1. press F3 or;
2. type SUBMIT at the command box right below the menu bar or;
3. click your right mouse button and select LOCAL, SUBMIT.
Processing of a SAS Program
If something goes wrong in your SAS statements, SAS will issue error messages in the LOG window. To check for error messages, go to the LOG window by typing LOG at the Program Editor command line. Use the Page Up and Page Down keys to scroll through the window. Once you have located the error, you may want to go back to your SAS program and make some changes. Type PGM at the command line in the LOG window to make the Program Editor window active, then type RECALL at the Program Editor command line to get your SAS program back. An easier way to recall the program is to press F4.
5. Saving SAS Statements
If you wish to save your SAS program, click File on the menu and select Save. Give a file name like "A:\mypgm.sas", which saves the SAS program file onto your floppy diskette. Alternatively, you can type FILE 'A:\mypgm.sas' in the command box. This saves everything in the Program Editor window to drive A: under the name MYPGM.SAS. The same applies to the LOG and OUTPUT windows. Note that by convention, the file extension .sas stands for SAS program files, .log stands for SAS log files and .lst stands for listing or output files.
*.sas - SAS program file
*.log - SAS log file
*.lst - SAS output file
6. Bringing SAS Programs into a SAS Session
If you want to bring a file into the Program Editor window during a SAS for Windows session, type INCLUDE 'A:\MYPGM.SAS' at the Program Editor command line. SAS will retrieve the file named 'MYPGM.SAS' from drive A: into the Program Editor window. Alternatively, you can open the file from the File option on the menu bar.
7. Ending a SAS Session
To end a SAS session, double click the uppermost left hand corner button or type ENDSAS at the command line in any window. You can close any window by typing END at the command line when the window is active.
IV. SAS data handling - Data Step
Normally, SAS users do not pay attention to what type of files SAS uses in a SAS session. This section distinguishes several types of files that SAS can handle. Knowing this, you will be able to use each type of file advantageously.
A data file that has been entered in the SAS Data step after the CARDS command needs to
be converted into a SAS data file before SAS can use it. The DATA step takes care of this
conversion.
Example 1:
Data;
input x y z @@;
cards;
1 2 3 4 5 6 7 8 9
;
run;
This sample SAS program creates a temporary data set from in-line data. The DATA statement, given here without a name, defines a temporary data set; the INPUT statement names the variables to read (the trailing @@ holds the input line so that several observations can be read from it); the CARDS statement tells SAS that the data lines follow. The RUN statement marks the end of the step and tells SAS to execute it.
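As a minimal sketch of what the trailing @@ does, the variant below gives the data set a name (ABC is a hypothetical name, not part of the original example) and prints it. With the nine values of Example 1 on a single line, the step should produce three observations of three variables each.

```sas
/* ABC is a hypothetical data set name chosen for illustration. */
data abc;
  input x y z @@;   /* @@ keeps the current line for the next observation */
  cards;
1 2 3 4 5 6 7 8 9
;
run;

/* List the result: observations (1,2,3), (4,5,6) and (7,8,9). */
proc print data=abc;
run;
```

Without the @@, SAS would move to a new data line for each observation, so only the first three values would be read.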
External Files
The most common type of data is sometimes referred to as an External File, Raw File, or even a Text File. These files share the same characteristics: they are made up of numbers and/or characters, and they can be processed by other programming languages as well as SAS. There are two ways to incorporate this kind of file into a SAS program. The first, and the most commonly used, is to put the data after a CARDS statement as in the previous example. The other is to refer to the location of the data file in the SAS program. The latter method is more efficient because it keeps your SAS program at a manageable size, especially when your data set has more than a thousand observations. The following SAS program shows how to use the latter method.
Example 2:
FILENAME DATAIN 'A:\COUNTRY.DAT';
DATA COUNTRY;
INFILE DATAIN;
INPUT DEC 1 ID 2-4 NAME $CHAR26. SSCODE 31-33 CONTIN $ 34-35 DODEV 36
      POPULATE 37-43 AREA 44-49 GNP 50-56 MILEXPED 57-64 .1 PEDEXPED 65-71 .1;
PROC PRINT DATA=COUNTRY;
RUN;
The FILENAME statement tells SAS to use DATAIN as a file reference for the data set named 'COUNTRY.DAT'. The INFILE command tells SAS to get the data file on drive A: under 'COUNTRY.DAT'.
In SAS v7, supported data file formats include dBASE files, Lotus 1-2-3 and Microsoft Excel 97 spreadsheets, and Microsoft Access tables. To import a file in one of these formats, click File on the menu and select Import; the Import Wizard will then guide you through creating a SAS data set from it.
V. SAS Files
SAS uses a special data format during data processing. A file in this format is called a SAS data file or system file. If the data file you tell SAS to use is not a SAS File, SAS converts it to a SAS File before it starts processing the data set. SAS Files have special characteristics that make them more convenient and efficient for SAS to use. There are two types of SAS files: SAS Data Sets (*.sd2 in version 6, *.sas7bdat in version 8) and SAS Catalogs (*.sc2 in version 6, *.sas7bcat in version 8). The most commonly used is the SAS Data Set. In a SAS Data Set, variable names, variable labels, and variable formats are recorded together with the variable values.
A SAS File name is somewhat different from other types of data file names. A complete SAS file name consists of two parts separated by a period, for example PROJECT1.FITNESS. The first part is called the first-level name or libref; it identifies the directory or library where the file is saved. The second part, the second-level name, identifies the specific file in that directory or library. Anyone can create a SAS Data Set from a regular file. Example 3 below shows how to do this.
Example 3:
TITLE 'SAS SAMPLE - COUNTRY DATA';
LIBNAME PROJECT1 'A:\MYDATA';
DATA PROJECT1.COUNTRY;
ARRAY MISSING GNP MILEXPED PEDEXPED;
INFILE 'A:\COUNTRY.DAT';
INPUT DEC 1 ID 2-4 NAME $CHAR26. SSCODE 31-33 CONTIN $ 34-35 DODEV 36
      POPULATE 37-43 AREA 44-49 GNP 50-56 MILEXPED 57-64 .1 PEDEXPED 65-71 .1;
LABEL NAME    = "COUNTRYS' NAME"
      CONTIN  = 'CONTINENT'
      DODEV   = 'DEGREE OF DEVELOPMENT'
      GNP     = 'GNP IN MILLIONS OF DOLLARS'
      MILEXPED= 'MILITARY EXPENDITURE IN MILLIONS OF $'
      PEDEXPED= 'PUB. EDUCATION EXPENDITURE IN MIL. $';
/* Recode the numeric missing-value codes to the SAS missing value (.) */
DO OVER MISSING;
IF MISSING=9999999 OR MISSING=999999.9 OR MISSING=99999.9 THEN MISSING=.;
END;
RUN;
/* Print the permanent data set just created (PROJECT1.COUNTRY, not WORK.COUNTRY) */
PROC PRINT DATA=PROJECT1.COUNTRY;
RUN;
The LIBNAME statement directs SAS to associate PROJECT1 with the directory A:\MYDATA. After this job has been executed, you will have a SAS Data Set named COUNTRY saved in A:\MYDATA (the file COUNTRY.SD2 under version 6; version 8 uses the extension .sas7bdat). Retrieving a SAS Data Set is easy because you do not have to tell SAS the variable names, formats, labels, or locations. The example below shows how this can be done.
Example 4:
LIBNAME PROJECT1 'A:\MYDATA';
DATA;
SET PROJECT1.COUNTRY;
PROC PRINT;
RUN;
In a SAS for Windows session, you can have as many SAS data steps as you want. You can use the LIBNAME command as often as you need to direct SAS for Windows to different SAS data directories. In case you have many SAS data files in a SAS program, SAS for Windows allows you to keep track of your SAS data files and their variables.
SAS for Windows has LIBNAME, DIRECTORY, and VARIABLES windows. The LIBNAME window tells you how many SAS data libraries are in a SAS program. The DIRECTORY window displays how many SAS data files are in a SAS data library or directory. The VARIABLES window lists the SAS variables in each SAS data file. To tell SAS for Windows to go to the LIBNAME window, you type LIB at the command box. A list of libraries or directories will be shown on a new window (LIB Window). You can tab down to the directory that you want to inspect. Then you mark it by typing S in front of the directory name and press <ENTER>. A list of SAS data files will be displayed in a new window (DIR window). If you want to look at the variables in a SAS data file, you tab the cursor to that file and then mark that file and hit <ENTER>. You will see a list of variables in that file in a new window (VAR window).
You can also go to the DIR and VAR windows directly by typing DIR and VAR respectively at the command prompt in any SAS for Windows Display Manager window. When you do this, SAS for Windows displays the current directory, which is the WORK directory. To display a different directory, type its name at the top of the window. You can do the same with the VAR window.
VI. SAS Procedures
The following covers some of the most commonly used SAS procedures with which you can run some basic statistical analyses.
1. PROC PRINT
PROC PRINT is frequently used to check the data being read by SAS. It prints out
the observations in a SAS data set, using all or some of the variables.
The syntax is as follows:
PROC PRINT DATA= SAS-data-set
DOUBLE
NOOBS
UNIFORM
LABEL
SPLIT= 'split-character'
N
ROUND
HEADING= direction
ROWS= page-format
WIDTH= column-width;
VAR variable-list;
ID variable-list;
BY variable-list;
PAGEBY BY-variable;
SUMBY BY-variable;
SUM variable-list;
The most common use is to have the PROC PRINT following the data step to verify the data:
DATA;
INPUT X Y;
CARDS;
1 2
3 4
5 7
;
PROC PRINT;
RUN;
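A few of the options and statements listed in the syntax above can be combined as in the following sketch. The data set SCORES and its variables are hypothetical and serve only as an illustration.

```sas
/* Hypothetical data: one row per student. */
data scores;
  input name $ group $ score;
  cards;
Ann A 90
Bob A 85
Cy  B 70
;
run;

/* NOOBS drops the observation-number column; LABEL prints   */
/* variable labels as column headings; SUM adds a total row. */
proc print data=scores noobs label;
  var name score;
  sum score;
  label score = 'Test score';
run;
```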
2. PROC CONTENTS
PROC CONTENTS prints descriptions of the contents of one or more files from a SAS data
library. It is another common way to verify a data set that has been read into a SAS
library, especially a sizeable one. It is crucial, for example, to check that all
observations and variables were read in correctly. The procedure is also useful for
documenting permanent SAS data sets (library members of DATA type).
The specific information about the physical characteristics of a member depends on
whether the file is a SAS data set or another type of SAS file.
Syntax:
PROC CONTENTS <DATA= <libref.>member>
<DIRECTORY>
<FMTLEN>
<MEMTYPE= (mtype-list)>
<NODS>
<NOPRINT>
<OUT= SAS-data-set>
<POSITION>
<SHORT>
<DETAILS|NODETAILS>;
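A short sketch of typical use follows. The data set DEMO is hypothetical; the second step uses the special member name _ALL_ together with NODS to list the members of a library without describing each one.

```sas
/* Hypothetical temporary data set to describe. */
data demo;
  input id age;
  cards;
1 34
2 45
3 29
;
run;

/* Describe one data set; POSITION also lists the variables */
/* in the order in which they are stored.                   */
proc contents data=work.demo position;
run;

/* List the members of the WORK library only. */
proc contents data=work._all_ nods;
run;
```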
3. PROC UNIVARIATE
This procedure is useful for basic descriptive statistics of the variables. It provides
detail on the distribution of a variable. Features include:
- detail on the extreme values of a variable
- quartiles, such as the median
- several plots to picture the distribution
- frequency tables
- a test to determine whether the data are normally distributed.
If a BY statement is used, descriptive statistics are calculated separately for groups of
observations.
Syntax:
PROC UNIVARIATE DATA= SASdataset
NOPRINT
PLOT
FREQ
NORMAL
PCTLDEF= value
VARDEF= DF|WEIGHT|WGT|N|WDF
ROUND= roundoff unit...;
VAR variables;
BY variables;
FREQ variable;
WEIGHT variable;
ID variables;
OUTPUT OUT= SASdataset keyword= names...;
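The sketch below requests the normality test and the plots for a single variable and saves two statistics to an output data set. The data set HEIGHTS, the output data set STATS and the variable names are assumptions made for illustration.

```sas
/* Hypothetical measurements; @@ reads several observations per line. */
data heights;
  input group $ height @@;
  cards;
A 62 A 65 A 70 A 59 B 68 B 71 B 74 B 66
;
run;

/* NORMAL requests a test of normality; PLOT requests the plots. */
proc univariate data=heights normal plot;
  var height;
  output out=stats mean=avg_height median=med_height;
run;
```

To analyze the two groups separately, the data would first be sorted by GROUP with PROC SORT and a BY GROUP statement added to the procedure.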
4. PROC FREQ
The procedure produces one-way to n-way frequency and crosstabulation tables. It
shows the distribution of variable values and crosstabulation tables with combined
frequency distributions for two or more variables. For one-way tables, PROC FREQ can
compute chi-square tests for equal or specified proportions. For two-way tables, PROC FREQ
computes tests and measures of association. For n-way tables, PROC FREQ does stratified
analysis, computing statistics within as well as across strata.
Syntax:
PROC FREQ options;
OUTPUT <OUT= SAS-data-set><output-statistic-list>;
TABLES requests / options;
WEIGHT variable;
EXACT statistic-keywords;
BY variable-list;
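As an illustration of a two-way table with a test of association, the sketch below reads hypothetical cell counts and uses the WEIGHT statement so that each row counts as that many observations. The data set SURVEY and its values are made up for this example.

```sas
/* Hypothetical cell counts for a 2 x 2 table. */
data survey;
  input gender $ answer $ count;
  cards;
F yes 20
F no  10
M yes 15
M no  25
;
run;

/* gender*answer requests the crosstabulation; CHISQ adds chi-square tests. */
proc freq data=survey;
  tables gender*answer / chisq;
  weight count;
run;
```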
5. PROC TABULATE
PROC TABULATE constructs tables of descriptive statistics using class variables,
analysis variables, and keywords for statistics. Tables can have one to three dimensions:
column; row and column; or page, row, and column.
The statistics that PROC TABULATE computes are many of the same statistics computed by
other descriptive procedures such as MEANS, FREQ, and SUMMARY. In order for PROC TABULATE
to execute, you need either a CLASS or VAR statement, and a TABLE statement. There are no
default variables chosen for the procedure.
Syntax:
PROC TABULATE <option-list>;
CLASS class-variable-list;
VAR analysis-variable-list;
FREQ variable;
WEIGHT variable;
FORMAT variable-list-1 format-1 <...variable-list-n format-n>;
LABEL variable-1='label-1' <...variable-n='label-n'>;
BY <NOTSORTED> <DESCENDING> variable-1
<...<DESCENDING> VARIABLE-N>;
TABLE <<page_expression,> row_expression,> column_expression
</ table-option-list>;
KEYLABEL keyword-1 ='description-1'
<...keyword-n='description-n'>;
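A minimal two-dimensional table might look like the sketch below, with rows defined by one class variable and columns by another. The data set SALES and its variables are hypothetical.

```sas
/* Hypothetical sales records. */
data sales;
  input region $ product $ amount;
  cards;
East A 100
East B 150
West A 200
West B 120
;
run;

/* Rows: region. Columns: sum and mean of amount within each product. */
proc tabulate data=sales;
  class region product;
  var amount;
  table region, product*amount*(sum mean);
run;
```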
6. PROC MEANS
PROC MEANS computes statistics for an entire SAS data set or for groups of observations in the data set. If you use a BY statement, PROC MEANS calculates descriptive statistics separately for groups of observations. Each group is composed of observations having the same values of the variables used in the BY statement. The groups can be further subdivided by the use of the CLASS statement. PROC MEANS can optionally create one or more SAS data sets containing the statistics calculated.
PROC MEANS is the easiest and most direct descriptive procedure for computing univariate statistics. Other SAS procedures which compute univariate statistics and provide additional features are CHART, TABULATE, and UNIVARIATE.
The full syntax for PROC MEANS is as follows:
PROC MEANS <option-list> <statistic-keyword-list>;
VAR variable-list;
BY variable-list;
CLASS variable-list;
FREQ variable;
WEIGHT variable;
ID variable-list;
OUTPUT <OUT= SAS-data-set> <output-statistic-list>
<MINID|MAXID <(var-1<(id-list-1)> <...var-n<(id-list-n)>>)>=name-list>;
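The following sketch computes a few statistics by group with a CLASS statement, which does not require sorted data, and saves the group means. The data set SCORES, the output data set SUMMARY and the variable names are hypothetical.

```sas
/* Hypothetical test scores for two groups. */
data scores;
  input group $ score @@;
  cards;
A 90 A 85 A 78 B 70 B 88 B 92
;
run;

/* Statistic keywords on the PROC statement select what is printed. */
proc means data=scores n mean std min max;
  class group;
  var score;
  output out=summary mean=avg_score;
run;
```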
7. PROC REG
PROC REG is a general-purpose procedure for regression, while other regression procedures in the SAS System implement more specialized applications. PROC REG provides nine model-selection methods, tests linear hypotheses and multivariate hypotheses, generates scatter plots of data and various statistics, computes collinearity diagnostics and influence statistics, produces partial leverage plots, and outputs statistics to a SAS data set, including predicted values, residuals, ridge regression estimates and confidence limits. PROC REG fits linear regression models by least-squares estimation. Subsets of independent variables that "best" predict the dependent or response variable can be determined by various model-selection methods. PROC REG can be used interactively.
The full syntax for PROC REG is as follows:
PROC REG options;
label: MODEL dependents= regressors / <options>;
BY variable-list;
FREQ variable;
ID variable;
VAR variable-list;
ADD variable-list;
DELETE variable-list;
REWEIGHT <condition|ALLOBS> </options> | <STATUS|UNDO>;
WEIGHT variable;
label: MTEST <equation1, ... equationk / options>;
OUTPUT OUT= SAS-data-set keyword= names ...;
PAINT <condition|ALLOBS> </options> | <STATUS|UNDO>;
PLOT <yvariable1*xvariable1> <=symbol1>,... <yvariablek*xvariablek> <=symbolk> </options>;
PRINT <options ANOVA MODELDATA>;
REFIT;
RESTRICT equation1, ... equationk;
label: TEST equation1, ... equationk / option;
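A simple one-regressor model can be sketched as follows. The data set FIT, its values and the output names are assumptions for illustration; PROC REG is part of SAS/STAT, so the example presumes that product is licensed.

```sas
/* Hypothetical (x, y) pairs. */
data fit;
  input x y @@;
  cards;
1 2.1 2 3.9 3 6.2 4 7.8 5 10.1 6 12.2
;
run;

/* Fit y = b0 + b1*x by least squares and save the      */
/* predicted values (P=) and residuals (R=) to PRED.    */
proc reg data=fit;
  model y = x;
  output out=pred p=yhat r=resid;
run;
quit;   /* PROC REG is interactive; QUIT ends it */
```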
8. PROC ARIMA
The ARIMA procedure implements a flexible and powerful method to analyze and forecast time series data. PROC ARIMA's implementation is similar to that of programs 1-7 in Part V of Box and Jenkins (1976). PROC ARIMA can handle time series of moderate size; there should be more than 30 observations and less than 2000. You should consider other procedures, such as FORECAST or AUTOREG, if PROC ARIMA does not meet your needs.
PROC ARIMA models a value in a response time series as a linear combination of its own past values, past errors (shocks, innovations), and past values of other time series.
The full syntax for PROC ARIMA is as follows:
PROC ARIMA options;
IDENTIFY VAR=variable options;
ESTIMATE options;
FORECAST options;
BY variables;
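The three main statements are normally used in sequence, as in this sketch. The series of 36 values is invented for illustration, and the AR(1) specification is only an example choice, not a recommendation for real data; PROC ARIMA is part of SAS/ETS, so the example presumes that product is licensed.

```sas
/* Hypothetical series of 36 observations. */
data series;
  input y @@;
  cards;
50 52 51 49 48 50 53 55 54 52 49 47
48 51 53 54 52 50 49 51 52 54 55 53
50 48 47 49 52 54 53 51 50 52 53 51
;
run;

proc arima data=series;
  identify var=y;             /* autocorrelation diagnostics     */
  estimate p=1;               /* fit a first-order AR model      */
  forecast lead=5 out=fcst;   /* forecast five periods ahead     */
run;
quit;   /* PROC ARIMA is interactive; QUIT ends it */
```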
Appendix: SAS function keys
Recommended mapping of SAS function keys (Windows, Mac, UNIX, CMS)
F1 | help |
F2 | lib |
F3 | end |
F4 | recall |
F5 | pgm |
F6 | log |
F7 | output |
F8 | zoom off |
F9 | keys |
F11 | command bar |
F12 | next |
Ctrl-F12 | viewtable work._last_ |
Ctrl-F1 | output;clear;log;clear;pgm;recall |
Next week:
Data management
Database functionalities
Report writing
Statistical procedures
Last updated: 01/18/06 by Karl Ho