PaulDickman.com

Using PROC GENMOD for logistic regression (SAS version 6)

Note that these notes refer to version 6 of the SAS system. In version 8 it is preferable to use PROC LOGISTIC for logistic regression. See the notes Logistic regression in SAS version 8.

Download the handout from seminar I (MS Word format).

Download the SAS code from seminar II (a .SAS file).

PROC GENMOD is a procedure which was introduced in SAS version 6.09 (approximately 1993) for fitting generalised linear models. Generalised linear models include classical linear models with normal errors, logistic and probit models for binary data, and log-linear and Poisson regression models for count data.

PROC GENMOD uses a class statement for specifying categorical (classification) variables, so indicator variables do not have to be constructed in advance, as is the case with, for example, PROC LOGISTIC. Interactions can be fitted by specifying, for example, age*sex. The response variable or the explanatory variable can be character (see the example below), while PROC LOGISTIC requires explanatory variables to be numeric.

Another advantage of the class statement is that by using the TYPE3 option on the model statement, PROC GENMOD will automatically report likelihood ratio test statistics for the effect of each term in the model. For a categorical variable with k levels, the test will be based on (k-1) degrees of freedom.

By default, PROC GENMOD uses a corner point parameterisation for categorical variables where the last category of each variable is used as the reference category. One method for specifying a reference category is to define a format for the variable using a space as the first character of the formatted value for all categories except the reference category and specifying the order=formatted option in PROC GENMOD. Since a space is sorted before all other characters, GENMOD will use the desired category as the reference.

PROC GENMOD is documented in SAS/STAT Software: Changes and Enhancements through Release 6.12.

data file1;
input year $ dose $ reject $ count;
cards;
<1973 <3.0 yes 4
<1973 >=3.0 yes 2
<1973 <3.0 no 9
<1973 >=3.0 no 16
1973+ <3.0 yes 13
1973+ >=3.0 yes 2
1973+ <3.0 no 10
1973+ >=3.0 no 12
;
run;


/* Fit a logistic regression model using PROC GENMOD */
proc genmod;
class dose year;
freq count;
model reject = dose year / error=bin link=logit type3;
make 'parmest' out=parmest;
run;


/*
PROC GENMOD does not report the odds ratio
directly, only the estimated betas (log odds ratios), 
but we can exponentiate these in a data step to 
get estimated odds ratios*/

data parmest;
set parmest;
if df gt 0;
or=exp(estimate);
low_or=exp(estimate-1.96*stderr);
hi_or=exp(estimate+1.96*stderr);
run;

proc print data=parmest label noobs;
title2 'Estimated odds ratios and 95% CIs';
var parm level1 estimate stderr or low_or hi_or;
format estimate stderr or low_or hi_or 6.3;
label
parm='Parameter'
level1='Level'
estimate='Beta estimate'
stderr='Standard Error'
or='Estimated OR'
low_or='Lower limit 95% CI'
hi_or='Upper limit 95% CI'
;
run;