PaulDickman.com

Estimating and modelling relative survival using Stata

By Paul Dickman, Enzo Coviello, and Michael Hills

This page contains details of Stata commands (for Stata version 9 or higher) for estimating and modelling relative survival. Expected survival can be calculated using the Ederer I, Ederer II, or Hakulinen methods. Estimation using a period approach is supported. A sample data set containing information on colon carcinoma diagnosed in Finland is provided. Further details of the command can be found here.

The package can be installed by typing the following command at the Stata command prompt:

net install http://www.pauldickman.com/rsmodel/stata_colon/strs, all

This command installs the following files into the Stata PLUS directory:

strs.ado 
strs.hlp
ht.ado 
rs.ado 
esteve.ado 

In addition, the following ancillary files will be copied into the current directory:

colon.dta 
melanoma.dta 
popmort.dta 
survival.do 
models.do
survival_period.do
four_methods.do

Sample do files are provided to reproduce the estimates reported in Table I of the paper Dickman et al (2004). Two input data files are provided; colon.dta contains the cancer patient data and popmort.dta contains data on expected probabilities of death for the Finnish general population.

Running survival.do produces life table estimates of relative survival stratified by sex, age, and calendar period of diagnosis. In addition, two output data sets are created (one containing grouped data and one containing individual patient data) which are used as input data sets for modelling. Running models.do estimates a relative survival regression model using several different approaches (described in Dickman et al (2004)). Period estimation is illustrated in survival_period.do.
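
As a rough sketch of the kind of call survival.do makes, the commands below declare the survival-time data and then run strs. The variable names (exit, dx, status, id, sex, year8594, agegrp, _year, _age), the status coding, and the br() option are assumptions based on the bundled sample files rather than anything stated on this page; check them against survival.do and the strs help file before relying on them.

```stata
* Sketch only: variable names and status codes are assumed, not taken from this page
use colon, clear

* Deaths from any cause (assumed codes 1 and 2) count as failures;
* time is measured in years since diagnosis
stset exit, origin(dx) failure(status==1 2) id(id) scale(365.24)

* Annual intervals up to 10 years; popmort.dta is merged on calendar year, sex, and age
* save(replace) writes the grouped and individual output data sets
strs using popmort, br(0(1)10) mergeby(_year sex _age) by(sex year8594 agegrp) save(replace)
```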

strs is the command for estimating relative survival (see the help file for details and survival.do for an example). The various approaches to modelling excess mortality are defined in separate ado files: ht.ado (Hakulinen-Tenkanen), esteve.ado (Estève et al.), and rs.ado (Poisson regression). An example of how to fit the models is provided in models.do.

The data files (colon.dta and popmort.dta) are also available in Stata version 7 format [Download version 7 ZIP archive] and Stata version 9/10 format [Download version 9 ZIP archive].

Version History
(strs can be updated using the Stata adoupdate command)
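
A minimal example of the update step, assuming strs was originally installed with net install as described above. In more recent Stata releases the same task is handled by ado update, so treat the exact command as dependent on your Stata version.

```stata
* Check for a newer version of the strs package and install it if one is found
adoupdate strs, update
```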

20120304 Version 1.3.7

  • Fixed a bug that caused an error when using the calyear option (the error caused the program to stop and no incorrect results were presented).

20120216 Version 1.3.6

  • The Pohar Perme estimator of net survival (pohar option) can now be used in the presence of late entry (e.g., period analysis).
  • The -calyear- option (introduced in version 1.3.5) for use with the pohar option can now also be used with Ederer II estimation. The code has also been optimised (no longer required to split the data twice) and should be faster.
  • Better control of the output when standardised estimates are requested.
  • Pohar Perme estimates are now saved in the -individual- file.

20110914 Version 1.3.5

  • Updated code for the estimator of net survival proposed by Pohar Perme et al (the (pohar) option). The new code has been checked by comparing the results with the R function written by Maja Pohar Perme. When the (pohar) option is specified the new option (calyear) is available. This option splits at each calendar year, thereby enabling a slightly more precise computation of the weights used by this estimator. When the (calyear) option is used the results produced by -strs- match almost completely those obtained in R. However, this increases the memory needs and can reduce the speed of computation. Even without the (calyear) option the results agree to the 4th significant digit. The do file four_methods.do contains code for estimating relative/net survival using 4 methods (Ederer I, Ederer II, Hakulinen, and Pohar Perme) and presenting the estimates graphically.
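
A hedged sketch of how the pohar and calyear options might be combined in a call. It assumes colon.dta has already been loaded and stset as in survival.do; the br() option, the mergeby() variable names, and the by() variable are assumptions based on the sample files, so consult the help file and four_methods.do for the authoritative syntax.

```stata
* Sketch only: assumes the data are already stset as in survival.do
* pohar requests the Pohar Perme estimator;
* calyear additionally splits follow-up at each calendar year
strs using popmort, br(0(1)10) mergeby(_year sex _age) by(sex) pohar calyear
```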

20110614 Version 1.3.3

  • EXPERIMENTAL: New option (pohar) for the estimator of net survival proposed by Pohar Perme et al (Biometrics 2011 in press). This has not been fully tested and should be considered experimental, although it gives comparable results when applied to the Slovenian data used in the R package by Maja Pohar Perme (we do not expect identical results since -strs- splits time). The do file four_methods.do contains code for estimating relative/net survival using 4 methods (Ederer I, Ederer II, Hakulinen, and Pohar Perme) and presenting the estimates graphically.
  • Standard errors of cr_e2 (se_cr_e2) and cr_hak (se_cr_hak) are now calculated.
  • Standard errors and confidence intervals for standardized survival estimates are calculated using the method described by Corazziari et al (2004). The same approach has been applied to the standardized cumulative incidence estimates.
  • Confidence intervals for age standardized estimates are now computed even when the survival probability is zero (i.e., all cases die within the interval) for some intervals.
  • Added an additional check on the standardized survival estimates when the period or hybrid approach is used. They are dropped from the first interval in which they are not computed.

20101105 Version 1.3.2
Corrected a bug (introduced with Stata 11) that led to incorrect confidence limits for survival estimates when P or CP were zero. Updated models.do to use Stata version 11 syntax for factor variables.

20100926 Version 1.3.1
Corrected a bug whereby the check (using isid) added in 1.3.0 incorrectly returned an error when the filename (for the population mortality file) contained spaces.

20100617 Version 1.3.0
Now requires Stata version 9.
Prior to version 1.3.0 strs reported incorrect standard errors (for both all-cause and relative survival) when period analysis was performed.
This has now been corrected. See standard_errors.pdf for details.
Added a check (using the isid command) that the mergeby variables uniquely index the observations in the population mortality file.
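
The kind of uniqueness check described above can be illustrated on a toy population mortality file. This is a sketch of what isid does, not the code inside strs; the variable names and values are made up for illustration.

```stata
* Toy population mortality file (values are invented for illustration)
clear
input sex _year _age prob
1 1985 50 0.990
1 1985 51 0.989
2 1985 50 0.995
end

* Passes silently: each sex/_year/_age combination appears exactly once
isid sex _year _age

* Duplicate the first row; isid now fails, which is the situation the check guards against
expand 2 in 1
capture isid sex _year _age
scalar isid_rc = _rc
display "isid return code after duplication: " isid_rc
```

A nonzero return code means the merge variables do not uniquely identify rows, so a patient record could match more than one expected probability.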

20091120 Version 1.2.9 (this is the last version of strs that will run in Stata version 8)
More informative error message when some records do not merge with popmort file.
More informative message when late entry is detected. Set n_prime to missing when late entry is detected.

20080604 Version 1.2.8
Major bug fix: Exit times (deaths or censorings) that occurred on the boundaries of life table intervals were previously classified (incorrectly) into the earlier interval rather than the later interval. This is because stsplit uses intervals that are open on the left and closed on the right whereas life table intervals are closed on the left and open on the right.
New feature: Cumulative incidence of death due to cancer and cumulative incidence of death due to other causes in the presence of competing risks can be calculated using the method of Cronin and Feuer (2000) via the new cuminc option.
New option: keep() can be used to restrict the variables written to the 'individual' data file.
New option: savstand specifies that standardised estimates be saved to an output data set.
Fix: Command exits with an error if missing values found for any variable listed in the mergeby() option.
Fix: The variables start or end (but not both) can be suppressed when using the list() option. If one of these two is specified then the other is suppressed. If neither is specified then both are listed (as in previous versions).
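
The boundary issue fixed in version 1.2.8 comes down to which interval an exit time lying exactly on a break point belongs to. The toy example below contrasts the two conventions for annual intervals; it illustrates the conventions only and is not the code used inside strs.

```stata
* Toy exit times in years; observations 2 and 4 fall exactly on interval boundaries
clear
input id t
1 0.5
2 1.0
3 1.999
4 2.0
end

* Life table convention [start, end): an exit at exactly t = 1 belongs to [1, 2)
gen start_lifetable = floor(t)

* (start, end] convention: an exit at exactly t = 1 belongs to (0, 1]
gen start_leftopen = ceil(t) - 1

list id t start_lifetable start_leftopen
```

The two columns differ only for exits that fall exactly on a boundary, which is why the misclassification affected only those records.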

20070702 Version 1.2.5
Corrected bug that gave incorrect estimates if brenner option was used together with if qualifier.
Corrected bug in calculating cumulative survival when interval specific survival was zero (everyone dies during the interval). In previous versions the cumulative survival was multiplied by 1 when it should be multiplied by zero.
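
To see why a zero interval-specific survival needs care, the toy example below computes cumulative survival in two ways. The log-sum shortcut is shown only as one plausible way a zero can be skipped; it is an assumption for illustration and not necessarily how the bug arose in strs.

```stata
* Toy interval-specific survival; everyone dies during interval 3
clear
input interval p
1 0.8
2 0.5
3 0
4 0.9
end

* Running product, computed row by row: a zero propagates to all later intervals
gen double cp = p in 1
replace cp = cp[_n-1] * p in 2/l

* Log-sum shortcut: ln(0) is missing and sum() treats missing as 0,
* so the zero interval is silently skipped (effectively multiplied by 1)
gen double cp_logsum = exp(sum(ln(p)))

list interval p cp cp_logsum
```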

20061008 Version 1.2.4
strs now exits with a warning if ederer1 and brenner options are used together.
Improved code for period analysis in survival_period.do.

20060504 Version 1.2.3

20051128 Version 1.2.0
New algorithm for hakulinen estimates of period survival.
Incorporation of weights to provide standardised survival estimates, including the 'alternative approach' developed by Brenner et al. (not yet fully tested). See the standstrata and brenner options.
Data no longer saved to grouped.dta and individ.dta by default - use the new save(replace) option (or saveind and savgroup to specify the filenames).
New option notables to suppress listing of the life tables.
Improved error reporting.

20041124 Version 1.1.0
A major upgrade thanks to help from Enzo Coviello. Added Ederer I and Hakulinen estimates (period analysis can only be performed with Ederer II). Many 'options' are now truly optional. Improved error checking. Added a list option for specifying variables to be printed and a format option. The command now runs without a by option (producing a single life table for all patients).

20041008 Version 1.0.1
Corrected an error in the formula for the standard error of the interval-specific relative survival (r). The line
quietly gen se_r=se_p/r
was changed to
quietly gen se_r=se_p/p_star
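
The corrected line follows from treating the expected survival as a known constant. Assuming r is defined as p/p_star, with p the observed and p_star the expected interval-specific survival, dividing p by a constant scales its standard error by the same constant, giving se_r = se_p/p_star. The values below are made up for illustration.

```stata
* Toy values: observed survival, expected survival, and standard error of observed survival
clear
input p p_star se_p
0.80 0.95 0.02
end

* Interval-specific relative survival (assumed definition)
gen double r = p/p_star

* Corrected formula: p_star is treated as fixed, so it simply rescales se_p
gen double se_r = se_p/p_star

* Formula used before version 1.0.1, shown for comparison only
gen double se_r_old = se_p/r

list p p_star se_p r se_r se_r_old
```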

20040809 Version 1.0