Introduction to SAS II

Creation date: 12/30/2000
Author: Karl Ho

Objectives: This is the second of the SAS short course series. It is designed for intermediate users who have taken the first class and want to advance their SAS programming techniques. It focuses on the two building blocks of SAS programming: the DATA step and the PROCedure step. After this course, you should be able to:

Understand SAS library and data file system;
Import data into SAS and export SAS data sets to other formats;
Perform statistical analysis;
Run queries using SQL (Structured Query Language);
Write macros.

Topics:

The DATA Step
Manipulating and subsetting data set
Subsetting Data Sets
Using Array
The PROCedure Step
Running queries using SQL
Writing Macros
SAS for UNIX

The DATA Step

In this session, we focus on the first of the two major parts in a SAS program -- the DATA step. It covers data entry, reading raw data, data manipulation and management. We will follow up on the materials covered in the Introduction to SAS I class and apply them to more hands-on programming in this workshop. In the appendix, we provide a more detailed explanation of using the Display Manager System in SAS.

Before your data can be analyzed by SAS, it must be in a form that the SAS System can recognize, i.e., a SAS data set. This is the job of the DATA step, which reads in raw data, manipulates variables and stores the data in designated area(s).

A DATA step consists of a group of statements that reads ASCII text data (in a computer file) or existing SAS data sets in order to create a new SAS data set.

The DATA step must begin with the DATA statement and should end with a RUN statement.

A data set can be created and stored in a permanent library. Otherwise, it stays in a temporary library (by default, WORK), which lasts only as long as the current SAS session, i.e., such data sets are erased when you exit SAS. Data manipulation must be done in a DATA step and cannot be done in a PROC step.
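As a minimal sketch of the difference between temporary and permanent data sets, assume a hypothetical folder c:\temp and a hypothetical libref MYLIB (neither is part of the course files). A one-level name goes to WORK, while a two-level name goes to the permanent library:

```sas
/* MYLIB and c:\temp are placeholders chosen for illustration */
libname mylib 'c:\temp';

/* One-level name: stored in WORK and erased when the session ends */
data scratch;
   x = 1;
run;

/* Two-level name: stored permanently in the folder assigned to MYLIB */
data mylib.keepme;
   set scratch;
run;
```

After restarting SAS and reissuing the LIBNAME statement, MYLIB.KEEPME should still be available, while SCRATCH would have to be created again.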

Data set options specify actions that apply only to the SAS data set with which they appear. They let you perform such operations as:

creating new variables out of existing variables or random functions

renaming variables

selecting only the first or last n observations for processing

dropping variables from processing or output

specifying a password for a SAS mainframe data set.

A SAS data library is a collection of SAS files that are recognized as a unit by the SAS system.

The three most common forms of the DATA step are: reading raw data from an external file with INFILE and INPUT, reading an existing SAS data set with SET, and combining existing SAS data sets with MERGE.

The following examples illustrate how to implement these three data methods.

In Example 1, a temporary data set is defined and an external raw data file is read in using INFILE. The INPUT statement specifies the names, column locations and types of the variables.

Example 1

data geol;
  infile 'filename';
  input state $ 1-3 county $ 5-12 sqmile 14-19 region 22-24
        tract 27-29 code $ 32-33 rainfall 37-40 temp 43-46
        temptype $ 49;
proc print data=geol;
run;

Example 2 illustrates the use of pre-stored data sets with the SET command. Note that the data set is a permanent one stored under the library SASWS1.

Example 2

libname sasws1 'd:\temp';
data geolnew;
  set sasws1.geol;
proc print data=sasws1.geol;
run;

The third example shows combining the current data sets geol and district into one data set called combine. Is this data set permanent or temporary?

Example 3

data combine; merge geol district; run;

Manipulating and subsetting data set

The following example applies the if/then statements to create variables based on certain conditions.

If-Then/Else Statements

Example 4

libname sasws1 'c:\temp';
DATA SASWS1.COUNTRY;
  INFILE 'c:\temp\country.dat';
  INPUT DEC 1 ID 2-4 NAME $CHAR26. SSCODE 31-33 CONTIN $ 34-35
        DODEV 36 POPULATE 37-43 AREA 44-49 GNP 50-56
        MILEXPED 57-64 .1 PEDEXPED 65-71 .1;
DATA temp;
  SET sasws1.country;
  IF GNP GE 20000 THEN GNPNEW = 'high';
  ELSE IF 10000 < GNP < 20000 THEN GNPNEW='med';
  ELSE GNPNEW='lo';
RUN;
PROC PRINT;
  VAR GNP GNPNEW;
RUN;

This program demonstrates the use of the IF-THEN/ELSE statements. The third statement of the second DATA step creates a new variable named GNPNEW and assigns it the value "high" if the observation has a value greater than or equal to (GE) 20,000 for the variable named GNP. The next statement uses a compound inequality (i.e., 10000 < GNP < 20000) to assign the value med to GNPNEW if the observation's value for GNP is strictly between 10000 and 20000. Finally, the fifth statement assigns the value lo to GNPNEW for all other observations -- that is, all observations with a value for GNP that is missing or less than or equal to 10000.

Subsetting Data Sets

Consider the following statements:

Example 5

data high medium low;
  SET work.temp;
  if GNPNEW="high" then output high;
  else if GNPNEW="med" then output medium;
  else output low;
run;
PROC PRINT;
RUN;

The IF-THEN statements draw subsets from the data set according to the GNPNEW variable. Using the following statements, we can print the most recently created data set. Which one is it?

Example 6

PROC PRINT; RUN;

If we just want to print the data set HIGH, we apply:

Example 7

PROC PRINT data=high; RUN;

When using list input, SAS scans the input line for values instead of reading from specific columns. Features of list input include:

. Order of the variables in the INPUT statement and their corresponding values in the data must be the same; values cannot be selectively read with list input.

. Values must be separated by at least one blank.

. Missing values must be represented by periods, not blanks.

. Numerical values cannot contain embedded blanks.

. Character values longer than eight characters must use a format-modifier statement.

The syntax for list input is:

Example 8

INPUT variable [$] [&] ... ;

where:

variable is the variable name for the data value to be read.

$ indicates that the variable has character values.

& indicates that a character value may have one or more single embedded blanks.
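The list input rules above can be sketched as follows. The data set name CITIES, the variable names and the data values are all made up for illustration. The LENGTH statement is one common way to let a character variable hold more than eight characters, and the & modifier lets a value contain single embedded blanks (the value ends at two or more consecutive blanks):

```sas
/* Hypothetical data: values are made up for illustration only */
data cities;
   length city $ 20;            /* allow character values longer than 8     */
   input city $ & population;   /* & : CITY ends at two or more blanks      */
   datalines;
New Orleans  500000
Austin  650000
Unknown  .
;
run;

proc print data=cities;
run;
```

Note that the missing POPULATION in the last data line is written as a period, as list input requires.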

An informat can be specified following a variable on the INPUT statement. The informat defines the variable's data type and field width, and how the values are to be read. An informat takes the form [$][name][w].[d], where $ indicates a character informat, w is the number of columns in the input data, and d gives the number of decimal places to be assigned to values without an explicit decimal point.

The syntax for using informats is:

Example 9

INPUT variable informat ... ;

where:

variable is the variable name for the data value to be read

informat gives the informat to use when reading the data value.
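As a hedged sketch of formatted input, the step below reads three fields with informats. The data set SALES, its variables and its values are hypothetical. The informat 7.2 reads seven columns and assumes two decimal places only when the field contains no explicit decimal point:

```sas
/* Hypothetical fixed-column data: values are made up for illustration */
data sales;
   input region   $char8.    /* 8-column character field                  */
         amount   7.2        /* 7 columns; 2 implied decimals if no point */
         saledate date9.;    /* 9-column date such as 15JAN2000           */
   format saledate date9.;
   datalines;
North   012345615JAN2000
South   1234.5603FEB2000
;
run;

proc print data=sales;
run;
```

Both rows should give AMOUNT the value 1234.56: the first through the implied decimals, the second through the explicit decimal point.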

Data set options can be specified whenever a SAS data set is specified. Some options can be specified as statements in a DATA step. These same options can be used following a specified data set in a SET, MERGE, or PROC statement. In this case, the data set options must be enclosed in parentheses and must immediately follow the data set to which they apply. The following are some commonly used data set options:

DROP = variables drops the listed variables from the data set being created.
KEEP = variables keeps the listed variables in the data set being created.
RENAME = (old=new) changes variable name from old to new.

The following example demonstrates dropping two old variables and renaming an old variable:

Example 10

DATA GEOLNEW (DROP=TEMP TEMPC); SET GEOL(RENAME=(TEMPF=TEMPNEW)); RUN;

The MERGE statement joins corresponding observations from two or more SAS data sets into single observations in a new SAS data set. You can merge data sets with or without a BY statement. Without a BY statement, MERGE performs one-to-one merging by joining the first observation in one data set with the first observation in another, the second observation in one data set with the second observation in another, and so on. With a BY statement, MERGE performs match-merging by joining observations from two or more sorted data sets, based on the values of the common BY variables. The syntax for the MERGE statement is:

Example 11

MERGE datasets [(options)] ; [BY variables ;]

where:

datasets are two or more existing SAS data sets.

[(options)] are data-set options, enclosed in parentheses.

[BY variables ;] are the matching variables for the BY statement. Each data set must be sorted by these variables.

SAS functions are routines that return values computed from one or more arguments; they are used to create new variables or modify existing ones. Functions are used in statements that have the syntax:

Example 12

variable = function(arguments) ;

where:

variable is the name of the variable being created or modified.

function is the name of the function you want to use.

arguments are one or more variable names, constants, or expressions.

Commonly Used Functions

MAX returns the largest of the argument values

MIN returns the smallest of the argument values

SQRT calculates square root of the argument value

ROUND rounds value to the nearest indicated round-off unit

LOG gives the natural log of the argument

MEAN returns the mean of the nonmissing argument values

SUM returns the sum of the nonmissing argument values

STD returns the standard deviation of the nonmissing values

DATE gives the current date as a SAS date value
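A short sketch of several of these functions in a DATA step follows. The data set SCORES and its values are hypothetical. The second data line contains a missing value to show that MAX, MEAN and SUM work on the nonmissing arguments:

```sas
/* Hypothetical test scores: values are made up for illustration */
data scores;
   input id test1 test2 test3;
   best    = max(test1, test2, test3);    /* largest nonmissing value    */
   average = mean(test1, test2, test3);   /* mean of nonmissing values   */
   total   = sum(test1, test2, test3);    /* sum of nonmissing values    */
   avg1    = round(average, 0.1);         /* round to the nearest tenth  */
   root    = sqrt(total);                 /* square root of the total    */
   today   = date();                      /* current date as a SAS date  */
   format today date9.;
   datalines;
1 80 90 85
2 70 . 95
;
run;

proc print data=scores;
run;
```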

Conditional IF statements, with a THEN clause, execute SAS statements for those observations that meet the condition defined in the IF clause. An optional ELSE statement executes alternative statements if the THEN clause is not executed. In the syntax of each IF statement:

expression is any valid SAS expression.

statement is any executable statement or DO group.

The expression can use the following comparison operators, as well as arithmetic operators:

EQ	equal to
NE	not equal to
GT	greater than
GE	greater than or equal to
LT	less than
LE	less than or equal to

Use the IF statement when you want to execute a SAS statement for some but not all of the observations in the data set being created. The expression following the IF is evaluated; if it is true, then the statement following the THEN is executed. Syntax:

IF expression THEN statement ;

Use the IF-THEN/ELSE statements when you want to conditionally process all the observations in the data set being created. When the expression following the IF is true, the statement following the THEN is executed and the statement following the ELSE is ignored. When the expression is false, the statement following the ELSE is executed and the statement following the THEN is ignored. Syntax:

IF expression THEN statement ;

ELSE [IF expression THEN] statement ;

Use the subsetting IF statement to select only those observations from the input data set that meet the IF condition. Therefore, the resulting data set contains a subset of the original observations. Syntax:

IF expression ;

In this case, SAS interprets the lack of a then-statement to mean "then include this observation in the data set".
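A minimal sketch of the subsetting IF, using a small hypothetical data set ALL with made-up values:

```sas
/* Hypothetical data: values are made up for illustration */
data all;
   input name $ gnp;
   datalines;
A 25000
B 15000
C 5000
;
run;

data rich;
   set all;
   if gnp ge 20000;   /* subsetting IF: keep the observation only if true */
run;

proc print data=rich;
run;
```

Only the observation for A satisfies the condition, so RICH should contain a single observation.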

Using Array

An array is an alias used to represent a set of variables that are to be processed in a like manner.

General form of the ARRAY statement:

ARRAY array-name{dimension} $ length elements (initial values);

The following example demonstrates the application of an array in defining missing values for multiple variables:

DATA COUNTRY;
  ARRAY MV GNP MILEXPED PEDEXPED;
  infile 'a:\country.dat';
  input dec 1 id 2-4 name $char26. sscode 31-33 contin $ 34-35
        dodev 36 populate 37-43 area 44-49 gnp 50-56
        milexped 57-64 .1 pedexped 65-71 .1;
  DO OVER MV;
    IF MV= 9999999 OR MV = 999999.9 OR MV=99999.9 OR MV=9999999.9 THEN MV=.;
  END;
RUN;
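The same recoding can also be sketched with an explicitly subscripted array, which matches the general form of the ARRAY statement shown above. This is an alternative sketch, not the course's original program. It assumes the COUNTRY data set created by the previous example already exists, and the names COUNTRY2 and I are chosen here for illustration:

```sas
data country2;
   set country;                         /* assumes COUNTRY from the example above */
   array mv{3} gnp milexped pedexped;   /* explicitly subscripted array           */
   do i = 1 to dim(mv);                 /* DIM returns the number of elements     */
      if mv{i} in (9999999, 999999.9, 99999.9, 9999999.9) then mv{i} = .;
   end;
   drop i;                              /* the loop index is not needed in output */
run;
```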

The PROCedure Step

In this session, we focus on the SAS procedure step that covers running SAS procedures on SAS data sets. A PROCedure step calls a SAS procedure to analyze or process a SAS dataset. The PROC step begins with a PROC statement and ends with a RUN statement. All of the statistical procedures require the input of a SAS data set. This data set should have already been prepared in a DATA step for processing by the procedure, since SAS procedures allow only limited adjustment of the data set.

The general syntax for a PROC step is:

PROC name [DATA=dataset [dsoptions] ] [options]; [other PROC-specific statements;] [BY varlist;] RUN;

where:

`name`	identifies the procedure you want to use.
`dataset`	identifies the SAS data set to be used by the procedure; if omitted, the last data set to have been created during the session is used.
`dsoptions`	specifies the data set options to be used.
`varlist`	specifies the variables that define the groups to be processed separately. The data set must already be sorted by these same variables.
`options`	specifies the PROC-specific options to be used.

The syntax above uses the following conventions for statements:

. SAS keywords are in UPPERCASE;
. User-supplied words (such as file names or variable names) are in lowercase;
. Options are in brackets [ ] . Note that you do not type the brackets.

This is a simplified form of the syntax conventions used in SAS manuals and in documentation for most statistical packages.

A SAS program can contain any number of DATA and PROC steps. The SAS statements in each step are executed all together. Once a dataset has been created, it can be processed by any subsequent DATA or PROC step. Note the following rules of the SAS statements:

- All SAS statements start with a keyword (DATA, INPUT, PROC, etc.)

- All SAS statements end with a semicolon (;) . (The most common problem students encounter is omitting a semicolon -- SAS thinks that two statements are just one.)

- SAS statements can be entered in free format: you can begin in any column, type several statements on one line, or split a single statement over several lines (as long as no word is split).

- Uppercase and lowercase are equivalent, except inside quote marks ( sex = 'm'; is not the same as sex = 'M';).

SAS Procedures exist to carry out all the forms of statistical analysis. As the above examples indicate, a procedure is invoked in a "PROC step" which starts with the keyword PROC, such as:

PROC MEANS DATA=CLASS;
VAR HEIGHT WEIGHT;

The VAR or VARIABLES statement can be used with most procedures to indicate which variables are to be analyzed. If this statement is omitted, the default is to include all variables of the appropriate type (character or numeric) for the given analysis.

Some other statements that can be used with most SAS procedure steps are:

BY variable(s);

Causes the procedure to be repeated automatically for each different value of the named variable(s). The data set must first be sorted by those variables.

ID variable(s);

Gives the name of a variable to be used as an observation IDentifier.

LABEL var='label';

Assigns a descriptive label to a variable.

WHERE (expression);

Selects only those observations for which the expression is true.

For example, the following lines produce separate means for males and females, with the variable SEX labeled 'Gender'. (An ID statement is not appropriate, because PROC MEANS produces only summary output.)

PROC SORT DATA=CLASS;
BY SEX;
PROC MEANS DATA=CLASS;
VAR HEIGHT WEIGHT;
BY SEX;
LABEL SEX='Gender';
RUN;
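The WHERE statement can be sketched in the same way. The step below assumes the CLASS data set used throughout this section, and the cutoff of 13 is arbitrary:

```sas
PROC MEANS DATA=CLASS;
WHERE AGE GE 13;       /* use only observations with AGE of 13 or more */
VAR HEIGHT WEIGHT;
RUN;
```

Unlike BY, WHERE does not require the data set to be sorted first.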

If the DATA= option is not used, SAS procedures process the most recently created data set. In the brief summaries below, only a few representative options are shown.

Functional Categories of Base SAS Procedures

Base SAS software provides a variety of procedures that produce reports, compute statistics, and perform utility operations.

Report Writing

These procedures display useful information, such as data listings (detail reports), summary reports, calendars, letters, labels, forms, multipanel reports, and graphical reports:

CALENDAR	MEANS*	SQL*
CHART*	PLOT	SUMMARY*
FORMS	PRINT	TABULATE*
FREQ*	REPORT*	TIMEPLOT
*These procedures produce reports and compute statistics.

Statistics

These procedures compute elementary statistical measures which include descriptive statistics based on moments, quantiles, confidence intervals, frequency counts, cross-tabulations, correlations, and distribution tests. They also rank and standardize data:

CHART	RANK	SUMMARY
CORR	REPORT	TABULATE
FREQ	SQL	UNIVARIATE
MEANS	STANDARD

Utilities

These procedures perform basic utility operations. They create, edit, sort, and transpose data sets, create and restore transport data sets, create user defined formats, and provide basic file maintenance such as to copy, append, and compare data sets:

APPEND	EXPLODE	REGISTRY
BMDP**	EXPORT	RELEASE**
CATALOG	FORMAT	SORT
CIMPORT	FSLIST	SOURCE**
COMPARE	IMPORT	SQL
CONTENTS	OPTIONS	TAPECOPY**
CONVERT**	PDS**	TAPELABEL**
COPY	PDSCOPY**	TRANSPOSE
CPORT	PMENU	TRANTAB
DATASETS	PRINTTO
**See the SAS documentation for the operating environment for a description of these procedures.

Examples

1. Descriptive statistics

PROC CORR

Correlations among a set of variables.

PROC CORR DATA=SASdataset options;
options:NOMISS ALPHA
VAR variable(s);
WITH variable(s);

where the NOMISS option excludes observations with missing values, and the ALPHA option requests Cronbach's coefficient alpha in addition to the Pearson correlations.

Example

To get the correlation coefficient for HEIGHT and WEIGHT, use the VAR statement:

DATA CLASS;
INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;
CARDS;
Alice F 13 56.0 84
Barbara F 14 62.0 102
Bernadette F 13 65.0 98
Jane F 12 59.0 84
Janet F 15 62.0 112
Joyce F 11 51.0 50
Judy F 14 64.0 90
Louise F 12 56.0 77
Mary F 15 66.0 112
Alfred M 14 69.0 112
Henry M 14 63.0 102
James M 12 57.0 83
Jeffery M 13 62.0 84
John M 12 59.0 99
Philip M 16 72.0 150
Robert M 12 64.0 128
Ronald M 15 67.0 133
Thomas M 11 57.0 85
William M 15 66.0 112
;
PROC CORR;
VAR HEIGHT WEIGHT;
RUN;

The output should look like:

                                         The SAS System     11:32 Wednesday, November 8, 2000   1

                                       The CORR Procedure

                                2  Variables:    HEIGHT   WEIGHT


                                       Simple Statistics

   Variable           N          Mean       Std Dev           Sum       Minimum       Maximum

   HEIGHT            19      61.94737       5.19052          1177      51.00000      72.00000
   WEIGHT            19      99.84211      22.81876          1897      50.00000     150.00000


                            Pearson Correlation Coefficients, N = 19
                                   Prob > |r| under H0: Rho=0

                                             HEIGHT        WEIGHT

                               HEIGHT       1.00000       0.87800
                                                           <.0001

                               WEIGHT       0.87800       1.00000
                                             <.0001

PROC FREQ

Frequency tables, chi-square tests

PROC FREQ DATA=SASdataset;
TABLES variable(s) / options;
options:NOCOL NOROW NOPERCENT
OUTPUT OUT=SASdataset;

Example

To get the frequency distribution of AGE in the data set CLASS:

PROC FREQ DATA=CLASS;
TABLES AGE;

The output should look like:

    ---------------------------------------------------------------------
                                          CUMULATIVE    CUMULATIVE
      AGE    FREQUENCY    PERCENT         FREQUENCY     PERCENT
      
      11         2          10.5              2            10.5
      12         5          26.3              7            36.8
      13         3          15.8              10           52.6
      14         4          21.1              14           73.7
      15         4          21.1              18           94.7
      16         1           5.3              19           100.0
    ---------------------------------------------------------------------

You can also get a cross-tabulation of two variables. For example, to examine the relationship between AGE and HEIGHT, use the FREQ procedure to produce their cross-tabulation:

PROC FREQ DATA=CLASS;
TABLES AGE*HEIGHT;
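The chi-square tests mentioned in the PROC FREQ summary can be requested with the CHISQ option on the TABLES statement. The sketch below assumes the CLASS data set created earlier. SEX*AGE is chosen here only because it gives a compact table:

```sas
PROC FREQ DATA=CLASS;
TABLES SEX*AGE / CHISQ NOCOL NOPERCENT;   /* chi-square tests, fewer cell statistics */
RUN;
```

With only 19 observations, many cells have small expected counts, so the chi-square results from this sketch should be treated as an illustration of the syntax rather than a meaningful test.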

PROC MEANS

Means, standard deviations, and a host of other univariate statistics for a set of variables.

PROC MEANS DATA=SASdataset options;
options:N MEAN STD MIN MAX SUM VAR CSS USS
VAR variable(s);
BY variable(s);
OUTPUT OUT=SASdataset keyword=variablename ... ;

Statistical options on the PROC MEANS statement determine which statistics are printed. The (optional) OUTPUT statement is used to create a SAS dataset containing the values of these statistics.

Example

You can examine the mean of WEIGHT separately for each value of SEX.

PROC MEANS;
BY SEX;
VAR WEIGHT;

The output should look like:

    -----------------------------------------------------------------
    VARIABLE  N      MEAN         STD         MINIMUM    MAXIMUM    

    -------------------------SEX=F-----------------------------------
     WEIGHT   9   89.88888889   19.41934888   50.000000   112.000000
    -------------------------SEX=M-----------------------------------
     WEIGHT  10   108.8000000   22.75863694   83.000000   150.000000
    -----------------------------------------------------------------
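The optional OUTPUT statement described above can be sketched as follows. The step assumes the CLASS data set (already sorted by SEX). The output data set name WTSTATS and the variable names AVGWT and SDWT are hypothetical:

```sas
PROC MEANS DATA=CLASS NOPRINT;             /* NOPRINT suppresses the printed table */
BY SEX;
VAR WEIGHT;
OUTPUT OUT=WTSTATS MEAN=AVGWT STD=SDWT;    /* one observation per BY group         */
RUN;

PROC PRINT DATA=WTSTATS;
RUN;
```

WTSTATS should contain one observation for each value of SEX, holding the mean and standard deviation of WEIGHT.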

PROC UNIVARIATE

Univariate statistics and displays for a set of variables.

PROC UNIVARIATE DATA=SASdataset options;
options:PLOT
VAR variable(s);
BY variable(s);
OUTPUT OUT=SASdataset keyword=variablename ... ;

Example

You can examine univariate statistics such as the median and kurtosis of WEIGHT separately for each value of SEX.

PROC UNIVARIATE DATA=class PLOT;
VAR weight;
BY sex ;
run;

2. Linear models

SAS statements and options for regression (PROC REG) are described in more detail in the document PROC REG Summary. SAS statements and options for analysis of variance (PROC ANOVA and PROC GLM) are described in the document PROC ANOVA and PROC GLM.

PROC ANOVA

Analysis of variance (balanced designs)

PROC ANOVA DATA=SASdataset options;
CLASS variable(s);
MODEL dependent(s)= effect(s);

PROC GLM

General linear models, including ANOVA, regression and analysis of covariance models.

PROC GLM DATA=SASdataset options;
CLASS variable(s);
MODEL dependent(s)= effect(s);
OUTPUT OUT=SASdataset keyword=variablename ... ;

PROC REG

Regression analysis

PROC REG DATA=SASdataset options;
MODEL dependent(s) = regressors / options;
PLOT variable | keyword. * variable | keyword. = symbol ;
OUTPUT OUT=SASdataset P=name R=name ... ;
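As a hedged illustration of this syntax, the step below regresses WEIGHT on HEIGHT using the CLASS data set from the PROC CORR example. The output data set REGOUT and the variable names PREDWT and RESIDWT are hypothetical:

```sas
PROC REG DATA=CLASS;
MODEL WEIGHT = HEIGHT;
OUTPUT OUT=REGOUT P=PREDWT R=RESIDWT;   /* save predicted values and residuals */
RUN;
QUIT;                                   /* PROC REG stays active until QUIT    */
```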

3. Plots and charts

PROC CHART

Histograms and bar charts

PROC CHART DATA=SASdataset options;
VBAR variable / options;
HBAR variable / options;
options: MIDPOINTS= GROUP= SUMVAR=

PROC PLOT

Scatter plots

PROC PLOT DATA=SASdataset options;
options: HPERCENT= VPERCENT=
PLOT yvariable * xvariable = symbol / options;
PLOT (yvariables) *(xvariables) = symbol / options;
PLOT options: BOX OVERLAY VREF= HREF=
BY variable(s) ;

Note that the parenthesized form in the PLOT statement plots each y-variable listed against each x-variable.

4. Utility procedures

PROC PRINT

Print a SAS data set

PROC PRINT DATA= SASdataset options;
options: UNIFORM LABEL SPLIT='char'
VAR variable(s);
BY variable(s);
SUM variable(s);

PROC SORT

Sort a SAS data set according to one or more variables.

PROC SORT DATA=SASdataset options;
options: OUT=
BY variable(s);
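A brief sketch of PROC SORT with the OUT= option, assuming the CLASS data set. The output name BYAGE is hypothetical. Without OUT=, the sorted data replace the original data set:

```sas
PROC SORT DATA=CLASS OUT=BYAGE;
BY DESCENDING AGE NAME;    /* oldest first, then alphabetically by name */
RUN;

PROC PRINT DATA=BYAGE;
RUN;
```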

Structured Query Language (SQL)

The SQL procedure implements Structured Query Language (SQL) for the SAS System. SQL is a standardized, widely used language that retrieves and updates data in tables and views based on those tables.

The SAS System's SQL procedure enables you to

retrieve and manipulate data that are stored in tables or views.
create tables, views, and indexes on columns in tables.
create SAS macro variables that contain values from rows in a query's result.
add or modify the data values in a table's columns or insert and delete rows. You can also modify the table itself by adding, modifying, or dropping columns.
send DBMS-specific SQL statements to a database management system (DBMS) and retrieve DBMS data.

PROC SQL performs database management operations in a much simpler syntax, and it is especially powerful when users are primarily concerned with querying a database. A simplified version of the PROC SQL syntax is as follows:

PROC SQL <option(s)>;  
 CREATE TABLE table-name (column-definition <,column-definition>...);  (column-specification , ...<constraint-specification > ,...) ;  
 CREATE TABLE table-name LIKE table-name;  
 CREATE TABLE table-name AS query-expression  <ORDER BY order-by-item <,order-by-item>...>;  
 DELETE  FROM table-name|proc-sql-view |sas/access-view <AS alias>  <WHERE sql-expression>;  
 DROP INDEX index-name <,index-name>... FROM table-name;  
 DROP TABLE table-name <,table-name>...;  
 INSERT INTO table-name|sas/access-view|proc-sql-view<(column<,column>...)>  VALUES (value<,value>...) 
<VALUES (value <,value>...)>...;  
 SELECT <DISTINCT> object-item <,object-item>...   <INTO :macro-variable-specification 
<, :macro-variable-specification>...>  
 FROM from-list  
 <WHERE sql-expression>  
 <GROUP BY group-by-item 
<,group-by-item>...>  
 <HAVING sql-expression>  
 <ORDER BY order-by-item 
<,order-by-item>...>;  
 UPDATE table-name|sas/access-view|proc-sql-view <AS alias>  SET column=sql-expression <,column=sql-expression>...  
 <SET column=sql-expression <,column=sql-expression>...>  
 <WHERE sql-expression>;  
 VALIDATE query-expression;

Note that PROC SQL executes each statement as soon as it is submitted, so it does not need a RUN statement; a QUIT statement ends the procedure. The following sample program illustrates how PROC SQL generates a subset of an existing data set based on a condition:

LIBNAME PROJECT1 'c:\temp';
/* The WORLD libref used below must already be assigned with a LIBNAME statement */
PROC SQL;
  create table world.africa as
    Select NAME,CONTIN,DODEV,POPULATE,GNP,MILEXPED,PEDEXPED
    from PROJECT1.COUNTRY
    where DODEV EQ 1;
QUIT;

This program stores seven variables from the original COUNTRY data set for the observations in which the development variable, DODEV, is equal to 1. The following PROC SQL example inserts values into a SAS data set (or a table, as it is called in SQL):

libname proclib 'c:\temp';
options nodate pageno=1 linesize=80 pagesize=40;
proc sql;
   create table proclib.paylist
       (IdNum char(4),
        Gender char(1),
        Jobcode char(3),
        Salary num,
        Birth num informat=date7.
                  format=date7.,
        Hired num informat=date7.
                  format=date7.);
 
  insert into proclib.paylist
    values('1639','F','TA1',42260,'26JUN70'd,'28JAN91'd)
    values('1065','M','ME3',38090,'26JAN54'd,'07JAN92'd)
    values('1400','M','ME1',29769,'05NOV67'd,'16OCT90'd)
    values('1561','M',null,36514,'30NOV63'd,'07OCT87'd)
    values('1221','F','FA3',.,'22SEP63'd,'04OCT94'd);
 
  title 'PROCLIB.PAYLIST Table';
select *
   from proclib.paylist;

Simple SQL-based queries can also be performed in SAS by typing a query in the command box. Such queries can generate sub-tables and calculate functions of existing variables, such as an average or a sum.
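A query that calculates such summary functions can be sketched in PROC SQL as follows. It assumes the CLASS data set created in the PROC CORR example. The column aliases N, AVG_WEIGHT and TOTAL_WEIGHT are chosen here for illustration:

```sas
proc sql;
   select sex,
          count(*)    as n,              /* number of rows in each group */
          avg(weight) as avg_weight,     /* mean of WEIGHT in each group */
          sum(weight) as total_weight    /* sum of WEIGHT in each group  */
      from class
      group by sex;
quit;
```

The averages should agree with the PROC MEANS output by SEX shown earlier.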

Writing Macros

The macro facility is a tool for extending and customizing the SAS System and for reducing the amount of text you must enter to do common tasks. The macro facility allows you to assign a name to character strings or groups of SAS programming statements. From that point on, you can work with the names rather than with the text itself.

When you use a macro facility name in a SAS program or from a command prompt, the macro facility generates SAS statements and commands as needed. The rest of the SAS System receives those statements and uses them in the same way it uses the ones you enter in the standard manner.

The macro facility has two components:

the macro processor is the portion of the SAS System that does the work.

the macro language is the syntax that you use to communicate with the macro processor.

When the SAS System compiles program text, two delimiters trigger macro processor activity:

&name refers to a macro variable. The form &name is called a macro variable reference.

%name refers to a macro.

The text substitution produced by the macro processor is completed before the program text is compiled and executed. The macro facility uses statements and functions that resemble those that you use in the DATA step; however, an important difference is that macro language elements can only trigger text substitution and are not present during program or command execution.

Note: Three SAS statements begin with a % that are not part of the macro facility. These elements are the %INCLUDE, %LIST, and %RUN statements. These statements are documented in your base SAS documentation.

Macro variables are an efficient way of replacing text strings in SAS code. The simplest way to define a macro variable is to use the %LET statement to assign the macro variable a name (subject to standard SAS naming conventions), and a value. Here is a simple example:

%let city=New Orleans;

Now you can use the macro variable CITY in SAS statements where you'd like the text New Orleans to appear. You refer to the variable by preceding the variable name with an ampersand (&), as in the following TITLE statement:

title "Data for &city";

The macro processor resolves the reference to the macro variable CITY, and the statement becomes

title "Data for New Orleans";

Macros allow you to substitute text in a program and to do many other things. A SAS program can contain any number of macros, and you can invoke a macro any number of times in a single program.

To help you learn how to define your own macros, this section presents a few examples you can model your own macros after. Each of these examples is fairly simple; by mixing and matching the various techniques, you can create advanced, flexible macros that are capable of performing complex tasks.

Each macro you define has a distinct name, which is subject to the standard SAS naming conventions. (See the base SAS language documentation for more information on SAS naming conventions.) A macro definition is placed between a %MACRO statement and a %MEND (macro end) statement, as follows:

%MACRO macro-name;

macro definition

%MEND macro-name;

The macro-name specified in the %MEND statement must match the macro-name specified in the %MACRO statement.

Note: While specifying the macro-name in the %MEND statement is not required, it is recommended. It makes matching %MACRO and %MEND statements while debugging easier.

Example:

%macro plot;
   proc plot;
      plot income*age;
   run;
%mend plot;

Later in the program you can invoke the macro as follows:

data temp;
   set in.permdata;
   if age>=20;
run;

%plot

proc print;
run;

The following example illustrates a more sophisticated use of macros to run procedures conditionally. By using the %IF-%THEN-%ELSE macro statements, you can conditionally generate SAS code with a macro. Here is an example:

%macro whatstep(info=,mydata=);
   %if &info=print %then
      %do;
         proc print data=&mydata;
         run;
      %end;

   %else %if &info=report %then
      %do;
         options nodate nonumber ps=18 ls=70 fmtsearch=(sasuser);
         proc report data=&mydata nowd;
            column manager dept sales;
            where sector='se';
            format manager $mgrfmt. dept $deptfmt. sales dollar11.2;
            title 'Sales for the Southeast Sector';
         run;
      %end;
%mend whatstep;

In this example, the macro WHATSTEP uses keyword parameters, which are set to default null values. When you call a macro that uses keyword parameters, specify the parameter name followed by an equal sign and the value you want to assign the parameter. Here, the macro WHATSTEP is called with INFO set to print and MYDATA set to grocery:

%whatstep(info=print,mydata=grocery)

This produces the following statements:

proc print data=grocery;
run;

Because the macro processor is case sensitive, the previous program does not work if you specify PRINT instead of print. To make your macro more robust, use the %UPCASE macro function. For more information on this function, refer to Chapter 13, "Macro Language Dictionary."
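One way to apply %UPCASE is sketched below. The macro name WHATSTEP2 is hypothetical and covers only the print branch. Like the call above, it assumes a data set named GROCERY exists:

```sas
%macro whatstep2(info=, mydata=);
   /* Compare in upper case so that print, Print and PRINT all match */
   %if %upcase(&info) = PRINT %then
      %do;
         proc print data=&mydata;
         run;
      %end;
%mend whatstep2;

%whatstep2(info=Print, mydata=grocery)
```

Because the value of INFO is converted to upper case before the comparison, the constant on the right-hand side must also be written in upper case.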

Using SAS Solutions and Tools

SAS provides a set of ready-to-use solutions, applications, and tools in its latest version of the software. The following is a sample of the new developments in the SAS System. You can access many of these tools by choosing Solutions under the new menu. They are:

Analysis

Using the ANALYST application for statistics tasks
One-Way ANOVA
Linear Regression
Simple Statistics
Summary Statistics

One-way ANOVA

1                                                            11:32 Wednesday, November 8, 2000   1
 
                                        The ANOVA Procedure
 
                                      Class Level Information
  
                              Class         Levels    Values
 
                              CONTIN             5    AA AF EU LA NA 
 
 
                                   Number of observations    141
1                                                            11:32 Wednesday, November 8, 2000   2
 
                                        The ANOVA Procedure
  
 Dependent Variable: MILEXP   MILITARY EXPENDITURE IN MILLIONS OF $
 
                                                Sum of
        Source                      DF         Squares     Mean Square    F Value    Pr > F
 
        Model                        4     1.922911E12    480727755652       0.33    0.8560
 
        Error                      136    1.9684182E14    1.4473663E12                     
 
        Corrected Total            140    1.9876473E14                                     
 
 
                        R-Square     Coeff Var      Root MSE    MILEXP Mean
 
                        0.009674      684.5572       1203065       175743.6
 
 
        Source                      DF        Anova SS     Mean Square    F Value    Pr > F
 
        CONTIN                       4     1.922911E12    480727755652       0.33    0.8560
1                                                            11:32 Wednesday, November 8, 2000   3
 
                                        The ANOVA Procedure
 
                         Levene's Test for Homogeneity of MILEXP Variance
                           ANOVA of Squared Deviations from Group Means
  
                                         Sum of        Mean
                   Source        DF     Squares      Square    F Value    Pr > F
 
                   CONTIN         3    1.714E26    5.712E25       0.44    0.7266
                   Error        135    1.763E28    1.306E26                     
 
 
                                     Welch's ANOVA for MILEXP
  
                              Source          DF    F Value    Pr > F
 
                              CONTIN      4.0000       1.38    0.3281
                              Error       7.3883                     
1                                                            11:32 Wednesday, November 8, 2000   4
 
                                        The ANOVA Procedure
 
                         Level of            ------------MILEXP-----------
                         CONTIN        N             Mean          Std Dev
 
                         AA           40       274978.440       1578118.16
                         AF           45       224822.767       1490324.43
                         EU           30        77385.607        210782.17
                         LA           24         3255.146          4793.82
                         NA            2       631995.300        835510.58
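
Output of this kind can also be requested directly in a program. The following is a sketch only, not the code generated by the Analyst application: it assumes the data are stored in a data set called world (a hypothetical name), and it uses the variable names CONTIN and MILEXP that appear in the listing above.

```sas
/* WORLD is a hypothetical data set name; substitute your own */
proc anova data=world;
   class contin;                          /* grouping variable: continent       */
   model milexp = contin;                 /* one-way ANOVA of MILEXP by CONTIN  */
   means contin / hovtest=levene welch;   /* group means, Levene's test for     */
                                          /* equal variances, Welch's ANOVA     */
run;
quit;                                     /* PROC ANOVA is interactive; QUIT ends it */
```

The HOVTEST=LEVENE and WELCH options on the MEANS statement correspond to the Levene and Welch tables in the listing.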

Applications Development

Developing EIS and OLAP applications
Creating and enhancing customized applications
Using pre-defined Report Templates in an application
Creating a custom desktop environment
Source Control Manager (SCM)

New developments

Business Geographics

Address matching and geo-coding
Geographic reporting and map visualization
Using the SAS/AF Map Class in your applications

Connectivity

Remote library services
Compute services
Remote objecting services
Submitting SAS code to remote systems

Data Access

Importing and exporting data (using the Import/Export Wizard)
Using the External File Interface
Accessing databases

Data Management

Editing and browsing your data
Subsetting tables and applying a WHERE clause
Data Management Procedures

Data Presentation

Printing information from the SAS System

Database Marketing

Data visualization

Graphical Reporting

Using pre-defined Report Templates for graphing
Creating 3D Business Graphs
Mapping your data

Online Analytical Processing (OLAP)

Using multidimensional data in reports
Creating a multidimensional database

Report Writing

Report writing procedures

SAS for UNIX


Last updated: 01/17/07 by Karl Ho

the macro processor	is the portion of the SAS System that does the work.
the macro language	is the syntax that you use to communicate with the macro processor.

&name	refers to a macro variable. The form &name is called a macro variable reference.
%name	refers to a macro.
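
The two forms can be seen side by side in a short sketch. The macro variable name dsn, the macro name show, and the data set grocery are chosen for illustration only.

```sas
/* Create a macro variable named DSN */
%let dsn = grocery;

/* Define a macro named SHOW; &dsn is a macro variable reference */
%macro show;
   proc print data=&dsn;
      /* Double quotes are required for &dsn to resolve inside a string */
      title "Listing of &dsn";
   run;
%mend show;

/* %show is a macro call; the macro processor replaces it with the PROC PRINT step */
%show
```

Here &dsn resolves to the text grocery, and %show generates the PROC PRINT step for that data set.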