Estimating and modelling relative survival using Stata
By Paul Dickman and Enzo Coviello
This page describes the Stata package for estimating and modelling relative survival. Expected survival can be calculated using the Ederer I, Ederer II, or Hakulinen methods. Estimation using a period or hybrid approach is supported, as is age standardisation. Sample data sets and do files with worked examples are provided. Further details of the command can be found in our two papers in The Stata Journal [strs and stnet].
The package can be installed by typing the following command at the Stata command prompt:
net install http://www.pauldickman.com/rsmodel/stata_colon/strs, all
This command installs the following files into the Stata PLUS directory:
In addition, the following ancillary files will be copied into the current working directory:
Sample do files are provided to reproduce the estimates reported in Table I of the paper Dickman et al (2004). Two input data files are provided; colon.dta contains the cancer patient data and popmort.dta contains data on expected probabilities of death for the corresponding general population.
Running survival.do produces life table estimates of relative survival stratified by sex, age, and calendar period of diagnosis. In addition, two output data sets are created (one containing grouped data and one containing individual patient data) which are used as input data sets for modelling. Running models.do estimates a relative survival regression model using several different approaches (described in Dickman et al (2004)). Period estimation is illustrated in survival_period.do.
strs is the command for estimating relative survival (see the help file for details and survival.do for an example). The various approaches to modelling excess mortality are defined using ado files; ht.ado (Hakulinen-Tenkanen), esteve.ado (Estève et al.), and rs.ado (Poisson regression). An example of how to fit the models is provided in models.do.
(strs can be updated using the Stata adoupdate command)
20170416 Version 184.108.40.206
- Corrected bug (line 691) that resulted in incorrect estimates of Pohar Perme (actuarial) estimator when all individuals in an interval died.
20161116 Version 220.127.116.11
- The variable end is now generated as a float (line 1240) to avoid potential problems if the user has "set type double".
20150507 Version 1.4.2
- Sample data sets have been modified. The cancer registry that provided us with these data asked us to remove references to the source and to randomly permute the dates of diagnosis. As such, some estimates may differ slightly compared to those shown in published papers.
- The temporary variable __000000 was saved to the individual data set, which could cause problems for future operations. [See this post]
- More informative error messages when _age or _year were present in the patient data.
- Our two papers have been published in The Stata Journal [strs and stnet].
20150218 Version 1.4.1
- Minor bug fixes.
20150209 Version 1.4.0
- Crude probabilities of death (cuminc option) are now estimated when late entry is detected (e.g., period analysis) or when the ht (hazard-transformation) option is used. We thank Ron Dewar (Cancer Care Nova Scotia) for bringing this issue to our attention and helping with the implementation and testing of the new code. See strs_technical_140.pdf for technical details.
20131106 Version 1.3.9
- esteve.ado updated to correct a bug that gave incorrect results when interval widths were other than 1.
- Fixed bug that caused strs to give an error if the time origin was both non-zero and constant for all observations.
- Improved variance formula for Pohar Perme estimator.
- The savstand() option now works as it should when the brenner option is used (previously it did not have any effect).
20130329 Version 1.3.8
- Improved algorithm for the Pohar Perme estimator. Thanks to Karri Seppä and Arun Pokhrel. Instead of weighting by the cumulative expected survival at the end of the interval, weights are based on the cumulative expected survival at the midpoint of the interval.
- New option, -ht-, that causes -strs- to use the hazard transformation approach for cohort/complete analysis (rather than the default actuarial approach)
- Corrected a bug that led to incorrect (too wide) confidence intervals for the Pohar Perme standardised estimates.
- Corrected a bug that caused -strs- to erroneously report late entry in some circumstances when an -if- expression was used.
- Corrected a bug that caused -strs- to report an incorrect number censored in the last interval when the -calyear- option is specified.
- Corrected a bug that caused -strs- to report incorrect confidence intervals for standardised estimates when relative/net survival is greater than 1. Confidence intervals are no longer calculated when RSR > 1.
- CIs for standardised estimates now respect the -format- option (%6.4f by default).
- -strs- implements two alternative approaches to estimating survival, the actuarial approach and the hazard transformation (ht) approach. In the actuarial approach all calculations are done on the survival scale whereas in the ht approach we first estimate the cumulative hazard and then transform to the survival scale. In version 1.3.7, the ht approach was the default if late entry was detected (i.e., period analysis) whereas the actuarial approach was the default if there was no late entry (i.e., cohort estimation).
- The choice of default approach remains unchanged in 1.3.8, although a new option, -ht-, has been added that forces -strs- to use the hazard transformation approach.
- Ederer I estimates are not available using the ht approach. Requesting Ederer I estimates together with the ht option or in the presence of late entry will cause an error.
- The variables w (number censored in the interval) and n_prime (effective number at risk) are not calculated when the ht approach is used so requesting these in the list option will cause an error.
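As a rough numerical illustration of the difference between the two approaches, the sketch below computes an interval-specific survival probability both ways for a single made-up life table interval. It assumes the textbook forms p = 1 - d/n_prime (with n_prime = n - w/2) for the actuarial approach and p = exp(-width*d/y) for the hazard transformation approach. The variable names n, d, y, width, p_act, and p_ht and all values are hypothetical, and the code is not taken from strs.ado.

```stata
* Hypothetical single interval: n at risk, d deaths, w censored,
* y person-years at risk, width = interval length in years
clear
input double(n d w y width)
100 20 10 85 1
end

* Actuarial approach: work directly on the survival scale
generate double n_prime = n - w/2
generate double p_act = 1 - d/n_prime

* Hazard transformation approach: estimate the hazard, then transform
generate double haz = d/y
generate double p_ht = exp(-haz*width)

list n_prime p_act haz p_ht
```

With these values the two estimates are close but not identical, which is the usual pattern when intervals are short relative to the hazard.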
20120304 Version 1.3.7
- Fixed a bug that caused an error when using the calyear option (the error caused the program to stop and no incorrect results were presented).
20120216 Version 1.3.6
- The Pohar Perme estimator of net survival (pohar option) can now be used in the presence of late entry (e.g., period analysis).
- The -calyear- option (introduced in version 1.3.5) for use with the pohar option can now also be used with Ederer II estimation. The code has also been optimised (no longer required to split the data twice) and should be faster.
- Better control of the output when standardised estimates are requested.
- Pohar Perme estimates are now saved in the -individual- file.
20110914 Version 1.3.5
- Updated code for the estimator of net survival proposed by Pohar Perme et al (the pohar option). The new code has been checked by comparing the results with the R function written by Maja Pohar Perme. When the pohar option is specified, the new calyear option is available. This option splits at each calendar year, thereby enabling a slightly more precise computation of the weights used by this estimator. When the calyear option is used, the results produced by -strs- match almost completely those obtained in R. However, this increases the memory needs and can reduce the speed of computation. Even without the calyear option the results agree to the 4th significant digit. The do file four_methods.do contains code for estimating relative/net survival using 4 methods (Ederer I, Ederer II, Hakulinen, and Pohar Perme) and presenting the estimates graphically.
20110614 Version 1.3.3
- EXPERIMENTAL: New option (pohar) for the estimator of net survival proposed by Pohar Perme et al (Biometrics 2011 in press). This has not been fully tested and should be considered experimental, although it gives comparable results when applied to the Slovenian data used in the R package by Maja Pohar Perme (we do not expect identical results since -strs- splits time). The do file four_methods.do contains code for estimating relative/net survival using 4 methods (Ederer I, Ederer II, Hakulinen, and Pohar Perme) and presenting the estimates graphically.
- Standard errors of cr_e2 (se_cr_e2) and cr_hak (se_cr_hak) are now calculated.
- Standard errors and confidence intervals for standardized survival estimates are calculated using the method described by Corazziari et al (2004). The same approach has been applied to the standardized cumulative incidence estimates.
- Confidence intervals for age standardized estimates are now computed even when the survival probability is zero (i.e., all cases die within the interval) for some intervals.
- Added an additional check on the standardized survival estimates when the period or hybrid approach is used. They are dropped from the first interval in which they are not computed.
20101105 Version 1.3.2
Corrected a bug (introduced with Stata 11) that led to incorrect confidence limits for survival estimates when P or CP were zero. Updated models.do to use Stata version 11 syntax for factor variables.
20100926 Version 1.3.1
Corrected a bug whereby the check (using isid) added in 1.3.0 incorrectly returned an error when the filename (for the population mortality file) contained spaces.
20100617 Version 1.3.0
Now requires Stata version 9.
Prior to version 1.3.0 strs reported incorrect standard errors (for both all-cause and relative survival) when period analysis was performed.
This has now been corrected. See standard_errors.pdf for details.
Added a check (using the isid command) that the mergeby variables uniquely index the observations in the population mortality file.
20091120 Version 1.2.9 (this is the latest version of strs that will run in Stata version 8)
More informative error message when some records do not merge with popmort file.
More informative message when late entry is detected. Set n_prime to missing when late entry is detected.
20080604 Version 1.2.8
Major bug fix: Exit times (deaths or censorings) that occurred on the boundaries of life table intervals were previously classified (incorrectly) into the earlier interval rather than the later interval. This is because stsplit uses intervals that are open on the left and closed on the right whereas life table intervals are closed on the left and open on the right.
New feature: Cumulative incidence of death due to cancer and cumulative incidence of death due to other causes in the presence of competing risks can be calculated using the method of Cronin and Feuer (2000) via the new cuminc option.
New option: keep() can be used to restrict the variables written to the 'individual' data file.
New option: savstand specifies that standardised estimates be saved to an output data set.
Fix: Command exits with an error if missing values found for any variable listed in the mergeby() option.
Fix: The variables start or end (but not both) can be suppressed when using the list() option. If one of these two is specified then the other is suppressed. If neither is specified then both are listed (as in previous versions).
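The interval boundary issue addressed by the major bug fix in this version can be illustrated with a small sketch. It places four made-up exit times into one-year intervals under each convention. The variable names t, lt_start, and oc_start are hypothetical, and the code only mimics the two conventions with floor() and ceil() rather than calling stsplit.

```stata
* Hypothetical exit times in years since diagnosis
clear
input double t
0.5
1.0
1.5
2.0
end

* Life table convention [start, end): an exit at t = 1.0
* belongs to the interval starting at 1
generate lt_start = floor(t)

* (start, end] convention: the same exit is placed in the
* interval starting at 0
generate oc_start = ceil(t) - 1

list
```

Only the exit times that fall exactly on a boundary (1.0 and 2.0 here) are classified differently; all other exit times land in the same interval under both conventions.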
20070702 Version 1.2.5
Corrected bug that gave incorrect estimates if brenner option was used together with if qualifier.
Corrected bug in calculating cumulative survival when interval specific survival was zero (everyone dies during the interval). In previous versions the cumulative survival was multiplied by 1 when it should be multiplied by zero.
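To make the zero-survival case concrete, the sketch below builds cumulative survival from made-up interval-specific probabilities in two ways. The explicit running product reaches zero once an interval has p = 0. The exp(sum(ln(p))) shortcut does not, because ln(0) is missing and Stata's sum() treats missing as zero, which is equivalent to multiplying by 1. This is shown as one way such behaviour can arise, not as a description of the earlier strs code; the variable names p, cp, and cp_logsum are hypothetical.

```stata
* Hypothetical interval-specific survival; everyone dies in interval 3
clear
input double p
0.8
0.5
0
0.9
end

* Explicit running product: cp[_n-1] already holds the updated value
generate double cp = p
replace cp = cp[_n-1]*p in 2/l

* Log-sum shortcut: ln(0) is missing and sum() treats missing as zero
generate double cp_logsum = exp(sum(ln(p)))

list
```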
20061008 Version 1.2.4
strs now exits with a warning if ederer1 and brenner options are used together.
Improved code for period analysis in survival_period.do.
20060504 Version 1.2.3
20051128 Version 1.2.0
New algorithm for hakulinen estimates of period survival.
Incorporation of weights to provide standardised survival estimates, including the 'alternative approach' developed by Brenner et al. (not yet fully tested). See the standstrata and brenner options.
Data no longer saved to grouped.dta and individ.dta by default - use the new save(replace) option (or saveind and savgroup to specify the filenames).
New option notables to suppress listing of the life tables.
Improved error reporting.
20041124 Version 1.1.0
A major upgrade thanks to help from Enzo Coviello. Added Ederer I and Hakulinen estimates (period analysis can only be performed with Ederer II). Many 'options' are now truly optional. Improved error checking. Added a list option for specifying variables to be printed and a format option. The command now runs without a by option (producing a single life table for all patients).
20041008 Version 1.0.1
Corrected an error in the formula for the standard error of the interval-specific relative survival (r). The line
quietly gen se_r=se_p/r
was changed to
quietly gen se_r=se_p/p_star
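The corrected line is what one obtains from r = p/p_star when the expected survival p_star is treated as a known constant, so that se(r) = se(p)/p_star. A minimal sketch with made-up values (not taken from the sample data) compares the two versions; se_r_old is a hypothetical name used here only to hold the pre-1.0.1 result.

```stata
* Hypothetical interval: observed survival p, its standard error se_p,
* and expected survival p_star
clear
input double(p se_p p_star)
0.90 0.02 0.95
end

* Interval-specific relative survival
generate double r = p/p_star

* Formula used before version 1.0.1 (incorrect)
generate double se_r_old = se_p/r

* Corrected formula
generate double se_r = se_p/p_star

list
```

The difference is small whenever r is close to p_star, as it is here, so the error would have been easy to overlook in practice.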
20040809 Version 1.0