PaulDickman.com

Estimating and modelling relative survival using Stata

By Paul Dickman, Enzo Coviello, and Michael Hills

This page contains details of Stata commands (for version 8 or higher) for estimating and modelling relative survival. Expected survival can be calculated using the Ederer I, Ederer II, or Hakulinen methods. Estimation using a period approach is supported. A sample data set containing information on colon carcinoma diagnosed in Finland is provided. Further details of the command can be found here.

The package can be installed by typing the following command at the Stata command prompt:

net install http://www.pauldickman.com/rsmodel/stata_colon/strs, all

This command installs the following files into the Stata PLUS directory:

strs.ado 
strs.hlp
ht.ado 
rs.ado 
esteve.ado 

In addition, the following ancillary files will be copied into the current directory:

colon.dta 
popmort.dta 
survival.do 
models.do
survival_period.do

Sample do files are provided to reproduce the estimates reported in Table I of the paper Dickman et al (2004). Two input data files are provided; colon.dta contains the cancer patient data and popmort.dta contains data on expected probabilities of death for the Finnish general population.

Running survival.do produces life table estimates of relative survival stratified by sex, age, and calendar period of diagnosis. In addition, two output data sets are created (one containing grouped data and one containing individual patient data) which are used as input data sets for modelling. Running models.do estimates a relative survival regression model using several different approaches (described in Dickman et al (2004)). Period estimation is illustrated in survival_period.do.

strs is the command for estimating relative survival (see the help file for details and survival.do for an example). The various approaches to modelling excess mortality are defined using ado files; ht.ado (Hakulinen-Tenkanen), esteve.ado (Estève et al.), and rs.ado (Poisson regression). An example of how to fit the models is provided in models.do.
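As a rough illustration of the workflow described above, the following minimal sketch estimates relative survival with strs and then fits a Poisson regression model for excess mortality. It is not a copy of survival.do or models.do. The variable names in colon.dta (dx, exit, status, id, sex), the merge variables (_year, _age), the br() option, the variables in the grouped output file (end, d, d_star, y), and the way the rs link is passed to glm are all assumptions that do not appear on this page; check the strs help file and the supplied do files before relying on them.

```stata
* Sketch only: names and options not stated on this page are assumptions.
use colon, clear

* Declare survival time in years since diagnosis.
* Assumes dx = diagnosis date, exit = exit date, id = patient identifier,
* and that status values 1 and 2 code deaths.
stset exit, origin(dx) failure(status==1 2) id(id) scale(365.24)

* Life tables in annual intervals up to 10 years, one table per sex.
* br() and the mergeby() variable names are assumed, not taken from this page.
* save(replace) writes grouped.dta and individ.dta for later modelling.
strs using popmort, br(0(1)10) mergeby(_year sex _age) by(sex) save(replace)

* Poisson regression for excess mortality on the grouped output,
* restricted to the first five years of follow-up.
* Assumes d = deaths, d_star = expected deaths, y = person-time at risk.
use grouped if end <= 5, clear
xi: glm d i.end i.sex, family(poisson) link(rs d_star) lnoffset(y) eform
```

With eform, glm reports exponentiated coefficients, which under this model would be read as excess hazard ratios.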

The data files (colon.dta and popmort.dta) are also available in Stata version 7 format [Download ZIP archive].

Version History
(strs can be updated using the Stata adoupdate command)
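For example, an update might be requested as in the sketch below. adoupdate is not available in every Stata release, so check help adoupdate in your own installation first.

```stata
* Check for a newer version of the strs package and install it if one is found.
adoupdate strs, update
```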

20091120 Version 1.2.9 (minor changes only)
More informative error message when some records do not merge with popmort file.
More informative message when late entry is detected. Set n_prime to missing when late entry is detected.

20080604 Version 1.2.8
Major bug fix: Exit times (deaths or censorings) that occurred on the boundaries of life table intervals were previously classified (incorrectly) into the earlier interval rather than the later interval. This is because stsplit uses intervals that are open on the left and closed on the right whereas life table intervals are closed on the left and open on the right.
New feature: Cumulative incidence of death due to cancer and cumulative incidence of death due to other causes in the presence of competing risks can be calculated using the method of Cronin and Feuer (2000) via the new cuminc option.
New option: keep() can be used to restrict the variables written to the 'individual' data file.
New option: savstand specifies that standardised estimates be saved to an output data set.
Fix: Command exits with an error if missing values are found for any variable listed in the mergeby() option.
Fix: The variables start or end (but not both) can be suppressed when using the list() option. If one of these two is specified then the other is suppressed. If neither is specified then both are listed (as in previous versions).
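The boundary issue behind the major bug fix in version 1.2.8 can be illustrated with a small toy example. This is only a sketch of the two interval conventions for annual intervals, using made-up exit times; it is not the code used inside strs.

```stata
clear
input double t
0.5
1
1.5
2
end

* Life table convention [start, end): an exit at exactly t = 1 falls in [1, 2).
generate start_lifetable = floor(t)

* Left-open, right-closed convention (start, end]: the same exit falls in (0, 1].
generate start_leftopen = ceil(t) - 1

list, noobs
```

The two conventions agree for exit times strictly inside an interval (0.5 and 1.5 here) and differ only for exit times that fall exactly on a boundary (1 and 2 here).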

20070702 Version 1.2.5
Corrected bug that gave incorrect estimates if brenner option was used together with if qualifier.
Corrected bug in calculating cumulative survival when interval specific survival was zero (everyone dies during the interval). In previous versions the cumulative survival was multiplied by 1 when it should be multiplied by zero.
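The cumulative survival issue can be reproduced with a toy life table. The sketch below shows one way in which a zero interval-specific survival can end up being treated as a factor of 1, namely when the product is computed through logarithms. Whether this is how the error arose in strs is an assumption; the page does not say. The variable names and values are made up for illustration.

```stata
clear
input interval double p
1 0.8
2 0.5
3 0
4 0.9
end
sort interval

* Product via logs: ln(0) is missing and sum() treats missing as zero,
* so the interval with p = 0 is skipped, i.e. effectively multiplied by 1.
generate double cp_logsum = exp(sum(ln(p)))

* Direct running product: a zero carries through to all later intervals.
generate double cp = p in 1
replace cp = cp[_n-1] * p in 2/l

list, noobs
```

In this example cp_logsum stays at about 0.4 in interval 3 and becomes about 0.36 in interval 4, whereas cp is 0 from interval 3 onwards, which is the intended behaviour.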

20061008 Version 1.2.4
strs now exits with a warning if ederer1 and brenner options are used together.
Improved code for period analysis in survival_period.do.

20060504 Version 1.2.3

20051128 Version 1.2.0
New algorithm for Hakulinen estimates of period survival.
Incorporation of weights to provide standardised survival estimates, including the 'alternative approach' developed by Brenner et al. (not yet fully tested). See the standstrata and brenner options.
Data no longer saved to grouped.dta and individ.dta by default - use the new save(replace) option (or saveind and savgroup to specify the filenames).
New option notables to suppress listing of the life tables.
Improved error reporting.

20041124 Version 1.1.0
A major upgrade thanks to help from Enzo Coviello. Added Ederer I and Hakulinen estimates (period analysis can only be performed with Ederer II). Many 'options' are now truly optional. Improved error checking. Added a list option for specifying variables to be printed and a format option. The command now runs without a by option (producing a single life table for all patients).

20041008 Version 1.0.1
Corrected an error in the formula for the standard error of the interval-specific relative survival (r). The line
quietly gen se_r=se_p/r
was changed to
quietly gen se_r=se_p/p_star
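The reasoning behind this correction can be sketched as follows. It assumes that p is the observed interval-specific survival, that p_star (written p^* below) is the expected interval-specific survival and is treated as a known constant, and that r = p / p_star. These definitions are the usual ones in relative survival but are not spelled out on this page.

```latex
r = \frac{p}{p^*}, \qquad
\operatorname{Var}(r) = \frac{\operatorname{Var}(p)}{(p^*)^2}
\quad\Longrightarrow\quad
\operatorname{se}(r) = \frac{\operatorname{se}(p)}{p^*}
```

Under these assumptions, dividing se_p by r rather than by p_star gives a different quantity whenever r and p_star differ.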

20040809 Version 1.0