SAS Tips: Missing values

Numeric missing values are represented by a single period (.).

Character missing values are represented by a single blank enclosed in quotes (' ').

Special numeric missing values are represented by a single period followed by a single letter or an underscore (for example .A, .S, .Z, ._).

Special missing values

These are only available for numeric variables and are used for distinguishing between different types of missing values.

Responses to a questionnaire, for example, could be missing for one of several reasons (refused, illness, dead, not home). By using special missing values, each of these reasons can be tabulated separately, while the values are still treated as missing by SAS in data analysis.

data survey;
missing A I R;
input id q1;
cards;
8401 2
8402 A
8403 1
8404 1
8405 2
8406 3
8407 A
8408 1
8409 R
8410 2
;

proc format;
value q1f
.A='Not home'
.R='Refused'
;
run;

proc freq data=survey;
table q1 / missprint;
format q1 q1f.;
run;

Sort order for missing values

There is a serious logic error in the following code:

```sas
if age < 20 then agecat=1;
else if age < 50 then agecat=2;
else if age ge 50 then agecat=3;
else if age=. then agecat=9;
```

Since missing values are considered smaller than all numbers, records with age=. will be coded to agecat=1.

| Sort order | Symbol | Description |
|---|---|---|
| smallest | _ | underscore |
| | . | period |
| | A-Z | special missing values A (smallest) through Z (largest) |
| | -n | negative numbers |
| | 0 | zero |
| largest | +n | positive numbers |
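
The sort order can be checked directly. The following is a minimal sketch, assuming a small made-up data set (the name sortdemo and its values are purely illustrative) that mixes ordinary, special, and non-missing values. After sorting, the printed rows should follow the order given in the table: underscore, period, A, Z, then the negative number, zero, and the positive number.

```sas
/* Hypothetical data set: one variable holding a mix of values */
data sortdemo;
  missing A Z _;   /* read A, Z and _ in the data as special missing values */
  input x;
  cards;
5
.
A
-3
_
0
Z
;

/* Sort ascending: missing values come before all numbers */
proc sort data=sortdemo;
  by x;
run;

proc print data=sortdemo;
run;
```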

Working with missing values

When transforming or creating SAS variables, the first part of the code should deal with the case where variables are missing.

if age=. then agecat=.;
else if age < 20 then agecat=1;
else if age < 50 then agecat=2;
else agecat=3;

SAS version 8 introduced a new function, MISSING, that accepts either a character or numeric variable as the argument and returns 1 if the argument contains a missing value and 0 otherwise. As such, the code above can be improved as follows:

if missing(age) then agecat=.;
else if age < 20 then agecat=1;
else if age < 50 then agecat=2;
else agecat=3;

The MISSING function is particularly useful if you use special missing values, since 'if age=.' matches only the ordinary missing value and will not identify special missing values such as .A.
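
The difference can be seen in a small test. This is a sketch only, assuming a made-up data set agedemo with one ordinary and one special missing value; the variable names plain_test, missing_test, and le_z_test are hypothetical. Each comparison evaluates to 1 (true) or 0 (false).

```sas
/* Hypothetical data set: a number, an ordinary missing, and a special missing */
data agedemo;
  missing R;                       /* read R in the data as the special missing value .R */
  input age;
  plain_test   = (age = .);        /* 1 only for the ordinary missing value */
  missing_test = missing(age);     /* 1 for ordinary and special missing values */
  le_z_test    = (age <= .Z);      /* 1 for any missing value, since .Z is the largest */
  cards;
25
.
R
;

proc print data=agedemo;
run;
```

For the row with age=.R, plain_test is 0 while missing_test and le_z_test are both 1, so only the last two tests would route that record to the missing branch.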

Note that any arithmetic operation involving a missing value returns a missing value. In

total=q1+q2+q3+q4+q5+q6;

total will be missing if any of q1 through q6 is missing. An alternative is to use:

total=sum(of q1-q6);

in which missing values are ignored, so they are effectively treated as zero. The result is missing only if all of the arguments are missing.
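
The two approaches can be compared side by side. The sketch below assumes a made-up data set totals with three illustrative records (none, some, and all answers missing); the names total_plus, total_sum, n_answered, and n_missing are hypothetical. The N and NMISS functions are included because a sum over partly missing items is often easier to interpret alongside a count of how many items were answered.

```sas
/* Hypothetical data: six questionnaire items per record */
data totals;
  input q1-q6;
  total_plus = q1+q2+q3+q4+q5+q6;   /* missing if any item is missing */
  total_sum  = sum(of q1-q6);       /* sum of the non-missing items */
  n_answered = n(of q1-q6);         /* number of non-missing items */
  n_missing  = nmiss(of q1-q6);     /* number of missing items */
  cards;
1 2 3 4 5 6
1 . 3 4 . 6
. . . . . .
;

proc print data=totals;
run;
```

In the second record total_plus is missing while total_sum is 14; in the third record both are missing, because SUM has no non-missing arguments to add.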

Even if you think that a variable should not contain any missing values, you should always write your code under the assumption that there may be missing values.
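
One quick way to check that assumption is to count the missing values before any analysis. This is a minimal sketch, assuming the survey data set created earlier on this page; the N and NMISS statistics in PROC MEANS report the number of non-missing and missing values of q1, with special missing values counted as missing.

```sas
/* Count non-missing (N) and missing (NMISS) values of q1 */
proc means data=survey n nmiss;
  var q1;
run;
```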

Paul Dickman
Professor of Biostatistics

Biostatistician working with register-based cancer epidemiology.