Running SAS in Batch Mode on Unix

by Robert A. Yaffee
Statistics and Social Science Group
Academic Computing Facility
Courant Institute of Mathematical Sciences
251 Mercer Street
New York, New York 10012


September 7, 1995

Table of Contents

  1. Preparation of your SAS command (program) file
  2. Submitting the command file for processing
  3. Reviewing the output of the command file
  4. Debugging the program
  5. Resubmission of the debugged program file
  6. Printing your output

1. Preparation of the SAS batch program.

The Academic Computing Facility operates an IBM RS-6000/C-20 which runs Unix (IBM AIX) to serve most university statistical computing needs. The internet address of the IBM RS-6000/C-20 is stats1.acf.nyu.edu and its university users have internet access. Very high performance needs can be serviced by access to supercomputers.

Prepare your SAS command file with your favorite UNIX editor. At present, vi, ex, and pico are available on stats1. The program file should contain an options statement, a data step, and one or more proc steps. The data step may contain the data set name, infile and input statements, labels, assorted assignment and transformation statements, and missing value statements. Options and titles usually precede it. Your statistical procedures follow your data step.

A Typical Basic Annotated SAS Command File contains

	1. Configuration information
	
	   An options statement provides formatting 
	   of output to 80 column, 60 line output:

		options linesize=80 pagesize=60;

	2. Title information

	   SAS can handle up to 10 title lines for 
		definition of project, phase, task, 
		author, etc.
	   Title 'Panel Survey of Health Care Utilization';
	   Title2 'Practices in Greater New York City Area';
	   Title3 'Wave Five: 1993';

	3. Format Definitions
	   Nominal and ordinal variables have coded 
		values that need descriptive labels.  
		SAS calls these value labels formats, 
		and they are set up with the format 
		procedure. The name of a format for a 
		character variable begins with a 
		dollar sign:

	   Proc Format;
		value sx 1 = 'Female' 2 = 'Male';
		value ag 1 = 'Disagree strongly' 
			 2 = 'Disagree' 3 = 'Unsure' 
			 4 = 'Agree' 5 = 'Agree strongly';
		value wk 1 = 'farmer' 2 = 'blue collar' 
			 3 = 'sales' 4 = 'Clerical' 
			 5 = 'Technical' 6 = 'Managerial' 
			 7 = 'Executive' 8 = 'Professional';
		value ls 1 = 'Poor' 2 = 'Moderate' 
			 3 = 'Luxurious';
		value $agg 'min' = 'Minor'
			   'adu' = 'Adult';

	4. Data Step Definitions
	   The data set is defined with a name
	
	   Data one;

	5. Data file definition
           The location of the external data file is
	   specified with an infile statement. On the
	   unix system, the data file in this instance
	   is located in the /u1/userid/sas subdirectory
	   and the file is called mydata.dat


	   infile '/u1/userid/sas/mydata.dat';


	6. Input format
	   Each variable is defined. The variable name, 
	   type, and location of the variable in the 
	   data file are specified. SAS variable names 
	   are generally limited to 8 letters and numbers.
	   Character variables have a dollar sign after 
	   their variable name, whereas numeric variables 
	   do not.  The column location in the data 
	   file is specified by a starting column, a dash, 
	   and an ending column. In this example, line 1 
	   of each record in the data file holds id, age, 
	   gender, sex, occupation, family size, and income 
	   (the last two are used in the transformations 
	   of step 8; their column positions here are 
	   illustrative). Line 2 holds the social security 
	   number and city of residence within the specified
	   column ranges.  The social security number and
	   city of residence are character variables.

		Input #1 id 1-4 age 6-8 gender 10 sex 12 
		occup 14 famsize 16-17 income 19-24
		#2 ssn $ 1-11 cityname $ 13-23;

	7. Variable Label information
	   Each variable has a name and a label. The label
	   typically can be up to 40 columns in length. 
	   Longer labels will be truncated and result in 
	   notes of warning.	

	   Label id = 'Respondent id'
                 age = 'Age of Respondent in Years'
                 gender = 'Sex of Respondent'
	         ssn = 'Social Security Number'
		 cityname = 'City of Residence';


	8.  Variable Transformations and new definitions
	    Assignment statements permit new variable 
	    definitions.

	    /*Bureau of Census National Poverty Threshold*/

	    If famsize = 4 and income lt 16000 
		then lifesty = 1;
	    if famsize = 4 and income ge 16000 and 
		income lt 60000 then lifesty = 2;
	    if famsize = 4 and income ge 60000
		then lifesty = 3;
	    label lifesty = 'Lifestyle of Respondent in NYC';
	    
	9.  Selection statements
		
		The researcher may wish to restrict an 
		analysis to a subset of cases, for 
		example to examine females only. 
		Such subsetting is accomplished with a 
		subsetting if statement, which keeps 
		only the cases that meet the condition.

		if sex = 1;

	
	10. Recoding statements

		Variables may be recoded with 
		conditional if statements or arrays.
		
		The lifestyle variable may be recoded 
		as poor or not poor with the following 
		statement.
		
		If lifesty = 2 or lifesty = 3 
			then lifesty = 2;

		Character variables may also be created. See
		the Proc Format above for an example
		of how to set up formats for character 
		variables. The values assigned here must 
		match the values listed in that format.

		if age < 21 then agegroup = 'min';
		if age ge 21 then agegroup = 'adu';

		Arrays can be used to recode groups 
		of variables. For example:

		array coping (14) var1 - var14;
		   do i = 1 to 14;
		   if coping (i) = 9 then coping (i) = 5;
		end;

	11. Missing value handling statements.

		Conditional if statements can 
		recode invalid values as missing.
		Here an out-of-range code for sex 
		is set to the SAS missing value,
		a period.
		
		if sex = 3 then sex = .;

	12. Format assignments
		Before the statistical procedures,
		it is necessary to assign formats
		to their respective variables.
		Note the $ before the agg format
		name (agegroup is a character variable).

		format sex sx.;
		format age ag.;
		format agegroup $agg.;
		format lifesty ls.;
		format occup wk.;

	13. Data may be sorted with the sort 
		procedure. 

		Proc Sort; by id;
		run;

	14. Data Preview may be undertaken 
		before analysis begins.

		The data may be listed out with a 
		proc print statement:

		proc print; 
		run;

	15. Statistical Procedures

		A range check of the data should be 
		undertaken before analysis as well.  
		This may be executed with a frequencies
		procedure.

		Proc Freq; 
		run;

		Other statistical procedures may be 
		appended to the program in order of 
		preference. Examples might be Proc Means, 
		Proc Anova, Proc GLM, Proc Reg, Proc Factor, 
		Proc Discrim, Proc Mixed, etc.

Therefore the actual SAS Program File without annotation would appear as follows:

	options linesize=80 pagesize=60;
	Title 'Panel Survey of Health Care Utilization';
	Title2 'Practices in Greater New York City Area';
	Title3 'Wave Five: 1993';

	Proc Format;
		value sx 1 = 'Female' 2 = 'Male';
		value ag 1 = 'Disagree strongly'
			 2 = 'Disagree' 3 = 'Unsure'
			 4 = 'Agree' 5 = 'Agree strongly';
		value wk 1 = 'farmer' 2 = 'blue collar'
			 3 = 'sales' 4 = 'Clerical'
			 5 = 'Technical' 6 = 'Managerial'
			 7 = 'Executive' 8 = 'Professional';
		value ls 1 = 'Poor' 2 = 'Moderate'
			 3 = 'Luxurious';
		value $agg 'min' = 'Minor'
			   'adu' = 'Adult';

	Data one;
		infile '/u1/userid/sas/mydata.dat';
		Input #1 id 1-4 age 6-8 gender 10 sex 12
		      occup 14 famsize 16-17 income 19-24
		      #2 ssn $ 1-11 cityname $ 13-23;
		Label id = 'Respondent id'
		      age = 'Age of Respondent in Years'
		      gender = 'Sex of Respondent'
		      ssn = 'Social Security Number'
		      cityname = 'City of Residence'
		      lifesty = 'Lifestyle of Respondent in NYC';

		/*Bureau of Census National Poverty Threshold*/

		If famsize = 4 and income lt 16000
			then lifesty = 1;
		if famsize = 4 and income ge 16000 and
			income lt 60000 then lifesty = 2;
		if famsize = 4 and income ge 60000
			then lifesty = 3;

		If sex = 1;

		If lifesty = 2 or lifesty = 3
			then lifesty = 2;

		If age < 21 then agegroup = 'min';
		If age ge 21 then agegroup = 'adu';

		If sex = 3 then sex = .;

		format sex sx.;
		format age ag.;
		format agegroup $agg.;
		format lifesty ls.;
		format occup wk.;

	Proc Sort; by id;
	run;

	Proc print;
	run;

	Proc Freq;
	run;

                                  

2. Submitting the Command file for processing.

If you name your program (command) file with a .sas suffix, you may submit the job in batch mode by typing sas followed by the filename without the suffix. For example, if you name your SAS program file hmwk1.sas, then you may submit the program for processing by typing:
	sas hmwk1
The output of your run will appear in a log file and, if there are no fatal SAS programming errors, a listing file. The log file will be called hmwk1.log. It will contain a list of the SAS statements and notes on how SAS processed each step. If nonfatal errors were made, the log will contain a warning for each. If fatal errors were committed, error diagnostics for each will appear in hmwk1.log. Logical errors that are syntactically admissible will not be revealed, so the user must review the program logic, apart from the errors indicated, before being sure that the program is correct.
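
The first pass over the log can be partly automated from the Unix prompt. The following is a minimal sketch, assuming that error and warning messages in the log begin their lines with the words ERROR and WARNING. The function name check_sas_log is hypothetical; it is not a SAS or stats1 command.

```shell
# check_sas_log is a hypothetical helper, not part of SAS.
# It counts the log lines that begin with ERROR or WARNING.
check_sas_log() {
    log="$1"
    if [ ! -f "$log" ]; then
        echo "log file not found: $log" >&2
        return 2
    fi
    # grep -c prints 0 but exits nonzero when nothing matches, hence "|| true"
    errors=$(grep -c '^ERROR' "$log" || true)
    warnings=$(grep -c '^WARNING' "$log" || true)
    echo "errors=$errors warnings=$warnings"
    # Succeed only when the log shows no error lines
    [ "$errors" -eq 0 ]
}

# Typical use after a batch run:
#   sas hmwk1
#   check_sas_log hmwk1.log || grep -n '^ERROR' hmwk1.log
```

A clean count does not replace reading the log, since notes about uninitialized variables or unexpected numbers of observations often point to logical errors.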

3. Reviewing the Output of the SAS Program.

When there are no programming errors left in the command file, the submitted program will produce a listing file, in this instance called hmwk1.lst. This output file will contain the results of the processing. If statistical procedures were included in the program, then the output (listing) file will contain the results of the statistical analysis. The user should review these results carefully to check for errors before accepting them as valid. If variables are not being read correctly or if an assignment statement was incorrectly formulated, the output may be flawed. Upon finding such flaws, the user should review the program file for the errors.
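
Before reading the results, it helps to confirm that a listing was actually produced. The sketch below pages through the listing only when the file exists and is not empty, and otherwise points back to the log. The function name show_listing is hypothetical, and the use of the PAGER variable with more as a fallback is an assumption about your Unix environment.

```shell
# show_listing is a hypothetical helper, not part of SAS.
# Argument: the program name without a suffix, e.g. hmwk1
show_listing() {
    base="$1"
    if [ -s "$base.lst" ]; then
        # Page through the listing with $PAGER, or with more if PAGER is unset
        ${PAGER:-more} "$base.lst"
    else
        echo "No listing in $base.lst; check $base.log for errors." >&2
        return 1
    fi
}

# Typical use:
#   show_listing hmwk1
```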

4. Debugging the SAS program file.

The user may edit the SAS command file and in so doing correct the errors found in the program logic. The user should then save the SAS program file again. In our example, it is called hmwk1.sas.

5. Resubmitting the SAS program file.

The user may resubmit the SAS command file by typing:
	sas hmwk1      
The output file will be called hmwk1.lst
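
When a program is resubmitted several times, it is easy to mistake output left over from an earlier run for new results. One way to guard against this, sketched below, is to delete the old log and listing before each resubmission. The function name resubmit_sas and the SAS_CMD variable are hypothetical; SAS_CMD defaults to the sas command used above.

```shell
# resubmit_sas is a hypothetical wrapper, not part of SAS.
# SAS_CMD is an assumed variable; it defaults to the sas command.
# Argument: the program name without a suffix, e.g. hmwk1
resubmit_sas() {
    base="$1"
    if [ ! -f "$base.sas" ]; then
        echo "program file not found: $base.sas" >&2
        return 2
    fi
    # Remove the log and listing from the previous run
    rm -f "$base.log" "$base.lst"
    # Submit the program again in batch mode
    ${SAS_CMD:-sas} "$base"
}

# Typical use, from the directory that holds hmwk1.sas:
#   resubmit_sas hmwk1
```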

6. Printing the output.

The SAS output file, hmwk1.lst, may be printed by typing:
	lpr -Pnyu_acf_th_hp4si_1 hmwk1.lst 
This command on stats1 will print hmwk1.lst on the Tisch Hall LC-7 Hewlett Packard LaserJet hp4si printer. You may direct the printed output to another printer. To see the printer addresses available on stats1, type:
	printers
You may check the status of your print command after issuing it by typing:
	lpq
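
The print command above can be wrapped so that a missing or empty listing is never sent to the printer. This is a sketch only: the function name print_listing is hypothetical, and the default printer is the one named in the lpr command above.

```shell
# print_listing is a hypothetical helper, not a stats1 command.
# Arguments: program name without a suffix, then an optional printer name
print_listing() {
    base="$1"
    printer="${2:-nyu_acf_th_hp4si_1}"
    if [ ! -s "$base.lst" ]; then
        echo "nothing to print: $base.lst is missing or empty" >&2
        return 1
    fi
    lpr -P"$printer" "$base.lst"
}

# Typical use:
#   print_listing hmwk1
#   lpq
```
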
For additional assistance either with statistics or statistical programming, users may phone Frank LoPresti, (212) 998-3398, at the ITS Statistics and Social Science Group.

Last updated 1/28/05.