Prepare your SAS command file with your favorite UNIX editor. At present, vi, ex, and pico are available on stats1. This program file should contain your options statement and your data and proc steps. Your data step may contain options, titles, the data set name, input, labels, assorted assignments, transformation statements, and missing value statements. Your statistical procedures follow your data step.
A Typical Basic Annotated SAS Command File contains
1. Configuration information
An options statement sets the output format,
here 80 columns wide and 60 lines per page:
options linesize=80 pagesize=60;
2. Title information
SAS can handle up to 10 title lines for
definition of project, phase, task,
author, etc.
Title 'Panel Survey of Health Care Utilization';
Title2 'Practices in Greater New York City Area';
Title3 'Wave Five: 1993';
3. Format Definitions
Nominal and ordinal variables have values
that require labels called formats.
The formats are set up with the format
procedure:
Proc Format;
value sx 1 = 'Female' 2='Male';
value ag 1 = 'Disagree strongly'
2='Disagree' 3 = 'Unsure'
4 = 'Agree' 5 = 'Agree strongly';
value wk 1 = 'farmer' 2 ='blue collar'
3='sales' 4 = 'Clerical'
5 = 'Technical' 6 = 'Managerial'
7 = 'Executive' 8 = 'Professional';
value ls 1 = 'Poor' 2 = 'Moderate'
3 = 'Luxurious';
value $agg 'min' = 'Minor'
'adu' = 'Adult';
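To confirm that formats were defined as intended, the format catalog can be listed. The following is a minimal sketch, assuming the formats are stored in the default WORK library; it repeats two of the formats above so that it runs on its own.

```sas
/* Define one numeric and one character format */
proc format;
   value sx 1 = 'Female' 2 = 'Male';
   value $agg 'min' = 'Minor' 'adu' = 'Adult';
run;

/* FMTLIB prints the contents of the selected formats */
/* stored in the WORK format catalog                   */
proc format library=work fmtlib;
   select sx $agg;
run;
```

Note that the character format is written $agg, with no space between the dollar sign and the name.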
4. Data Step Definitions
The data set is defined with a name
Data one;
5. Data file definition
The location of the external data file is
specified with an infile statement. On the
unix system, the data file in this instance
is located in the /u1/userid/sas subdirectory
and the file is called mydata.dat
infile '/u1/userid/sas/mydata.dat';
6. Input format
Each variable is defined. The variable name,
type, and location of the variable in the
data file is specified. SAS variable names
are generally limited to 8 letters and numbers.
Character variables have a dollar sign after
their variable name, whereas numeric variables
do not. The column location in the data
file is specified by a starting column, a dash,
and an ending column. On line 1 in the
data file, variables id, age, gender, sex, and
occupation may be found. On line 2 in the data
file, the social security number and city of
residence are located within the specified
column ranges. The social security number and
city of residence are character variables.
Input #1 id 1-4 age 6-8 gender 10 sex 12
occup 14 #2 ssn $ 1-11 cityname $ 13-23;
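The two-line column input above can be tried without an external file by placing records in the program itself. This is only a sketch: the data set name demo and the two records are made up for illustration and do not come from the survey.

```sas
data demo;
   /* #1 reads the first line of each record, #2 the second */
   input #1 id 1-4 age 6-8 gender 10 sex 12 occup 14
         #2 ssn $ 1-11 cityname $ 13-23;
   datalines;
0001  34 1 1 5
123-45-6789 Brooklyn
0002  19 2 2 3
987-65-4321 Yonkers
;
run;

/* List the data to verify each variable landed in the right columns */
proc print data=demo;
run;
```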
7. Variable Label information
Each variable has a name and a label. The label
typically can be up to 40 columns in length.
Longer labels will be truncated and result in
notes of warning.
Label id = 'Respondent id'
age = 'Age of Respondent in Years'
gender = 'Sex of Respondent'
ssn = 'Social Security Number'
cityname = 'City of Residence';
8. Variable Transformations and new definitions
Assignment statements permit new variable
definitions.
/*Bureau of Census National Poverty Threshold*/
If famsize = 4 and income < 16000
then lifesty= 1;
if famsize = 4 and income ge 16000 and
income lt 60000 then lifesty = 2;
if famsize = 4 and income ge 60000
then lifesty = 3;
label lifesty = 'Lifestyle of Respondent in NYC';
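The same assignment can be written as an if/else chain, which makes the boundaries easier to check. This is a sketch with made-up records; the data set name lifedemo is hypothetical. SAS treats a missing numeric value as smaller than any number, so a missing income would otherwise satisfy "income < 16000"; the sketch excludes missing incomes explicitly.

```sas
data lifedemo;
   input famsize income;
   /* Only classify four-person families with a known income */
   if famsize = 4 and income ne . then do;
      if income lt 16000 then lifesty = 1;
      else if income lt 60000 then lifesty = 2;
      else lifesty = 3;
   end;
   datalines;
4 12000
4 16000
4 75000
4 .
3 20000
;
run;

proc print data=lifedemo;
run;
```

Records that do not meet the condition keep a missing value for lifesty.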
9. Selection statements
The researcher may wish to subset
out cases for analysis. He may wish
to examine females only for an analysis.
Such subsetting is accomplished with a
subsetting if statement.
if sex = 1;
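A subsetting if removes the other cases from the data set being built. If the full data set is still needed later, one option is to write the subset to a second data set. A minimal sketch, assuming data set one already exists; the name females is hypothetical.

```sas
/* Copy only the cases with sex = 1 into a new data set; */
/* data set one is left unchanged                        */
data females;
   set one;
   if sex = 1;
run;
```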
10. Recoding statements
Variables may be recoded with
conditional if statements or arrays.
The lifestyle variable may be recoded
as poor or not poor with the following
statement.
If lifesty = 2 or lifesty = 3
then lifesty = 2;
Character variables may also be created. See
the Proc Format above for an example
of how to set up formats for character
variables.
if age < 21 then agegroup = 'min';
if age ge 21 then agegroup = 'adu';
Arrays can be used to recode groups
of variables. For example:
array coping (14) var1 - var14;
do i = 1 to 14;
if coping (i) = 9 then coping (i) = 5;
end;
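A smaller, self-contained version of the array recode can be run to see the effect. This sketch uses three made-up variables instead of fourteen, and the data set name recoded is hypothetical. The dim function returns the number of elements in the array, so the loop limit does not have to be typed twice.

```sas
data recoded;
   input var1-var3;
   array coping (3) var1-var3;
   /* Recode every 9 to 5, one array element at a time */
   do i = 1 to dim(coping);
      if coping (i) = 9 then coping (i) = 5;
   end;
   drop i;   /* the loop counter is not needed in the output */
   datalines;
1 9 3
9 9 2
4 5 1
;
run;

proc print data=recoded;
run;
```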
11. Missing value handling statements.
Conditional if statements can
recode invalid values to missing.
if sex = 3 then sex = .;
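When more than one invalid code may occur, the valid codes can be listed instead. A sketch with made-up records, assuming 1 and 2 are the only valid codes for sex; the data set name cleaned is hypothetical.

```sas
data cleaned;
   input id sex;
   /* Any value other than 1 or 2 is set to missing */
   if sex not in (1, 2) then sex = .;
   datalines;
1 1
2 3
3 2
4 9
;
run;

proc print data=cleaned;
run;
```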
12. Format assignments
Before the statistical procedures,
it is necessary to assign formats
to their respective variables.
Note the $ before the agegroup
format (it is a character variable).
format sex sx.;
format age ag.;
format agegroup $agg.;
format lifesty ls.;
format occup wk.;
13. Data may be sorted with the sort
procedure.
Proc Sort; by id;
run;
14. Data Preview may be undertaken
before analysis begins.
The data may be listed out with a
proc print statement:
proc print;
run;
15. Statistical Procedures
A range check of the data should be
undertaken before analysis as well.
This may be executed with a frequencies
procedure.
Proc Freq;
run;
Other statistical procedures may be
appended to the program in order of
preference. Examples might be Proc Means,
Proc Anova, Proc GLM, Proc Reg, Proc Factor,
Proc Discrim, Proc Mixed, etc.
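As an illustration, a range check and a simple descriptive summary might look like the following. This is a sketch, assuming data set one has been created as described above; the choice of variables is only an example.

```sas
/* Frequency tables show every value that occurs, */
/* so out-of-range codes are easy to spot         */
proc freq data=one;
   tables sex occup;
run;

/* Basic descriptive statistics for a numeric variable */
proc means data=one n mean std min max;
   var age;
run;
```

Naming the data set with data= is optional; without it, SAS uses the most recently created data set.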
Therefore the actual SAS Program File without annotation
would appear as follows:
options linesize=80 pagesize=60;
Title 'Panel Survey of Health Care Utilization';
Title2 'Practices in Greater New York City Area';
Title3 'Wave Five: 1993';
Proc Format;
value sx 1 = 'Female' 2='Male';
value ag 1 = 'Disagree strongly'
2='Disagree' 3 = 'Unsure'
4 = 'Agree' 5 = 'Agree strongly';
value wk 1 = 'farmer' 2 ='blue collar'
3='sales' 4 = 'Clerical'
5 = 'Technical' 6 = 'Managerial'
7 = 'Executive' 8 = 'Professional';
value ls 1 = 'Poor' 2 = 'Moderate'
3 = 'Luxurious';
value $agg 'min' = 'Minor'
'adu' = 'Adult';
Data one;
infile '/u1/userid/sas/mydata.dat';
Input #1 id 1-4 age 6-8 gender 10 sex 12
occup 14 #2 ssn $ 1-11 cityname $ 13-23;
Label id = 'Respondent id'
age = 'Age of Respondent in Years'
gender = 'Sex of Respondent'
ssn = 'Social Security Number'
cityname = 'City of Residence';
/*Bureau of Census National Poverty
Threshold*/
If famsize = 4 and income < 16000
then lifesty= 1;
if famsize = 4 and income ge 16000 and
income lt 60000 then lifesty = 2;
if famsize = 4 and income ge 60000
then lifesty = 3;
If sex = 1;
If lifesty = 2 or lifesty = 3
then lifesty = 2;
If age < 21 then agegroup = 'min';
If age ge 21 then agegroup = 'adu';
If sex = 3 then sex = .;
format sex sx.;
format age ag.;
format agegroup $agg.;
format lifesty ls.;
format occup wk.;
Proc Sort; by id;
run;
Proc print;
run;
Proc Freq;
run;
Run the program, here assumed to be saved as hmwk1.sas, by typing:
sas hmwk1
The output file will be called hmwk1.lst
To print the output, type:
lpr -Pnyu_acf_th_hp4si_1 hmwk1.lst
This command on stats1 will print hmwk1.lst on the Tisch Hall LC-7 Hewlett Packard LaserJet hp4si printer. You may direct the printed output to another printer by checking the printer addresses on stats1 by typing:
printers
You may check the status of your print command after issuing it by typing:
lpq
For additional assistance either with statistics or statistical programming, users may phone Frank LoPresti, (212) 998-3398, at the ITS Statistics and Social Science Group.
last updated 1/28/05.