Date: Mon, 21 Oct 2002 09:31:03 -0400
Reply-To: "Karl K." <karlstudboy@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Karl K." <karlstudboy@HOTMAIL.COM>
Subject: Lifetest Hazard Function Plots
Hi, All. My only statistician who knows survival analysis departed in mid-
project, and unfortunately the only person I have who is competent to
finish the project is, uh, me. I have plenty of experience supervising
the analyses and interpreting the results, but we all know it's a little
different when you have to actually DO something.
The event for which we're estimating the survival function is a
complication from treatment. A full course of treatment (according to
guidelines) takes about 4 months; about 25% of patients experience the
targeted complication, and the literature indicates that the risk for
developing the complication increases up to about day 10, then drops off
until about day 20, then has a smaller peak at about day 30, then drops
off again and levels off at a low level until the end of treatment (about
day 130). In other words, the literature says that most of the people
likely to develop the complication get it right away, a few more get it a
couple weeks later, and, by then, everybody who's gonna get it has gotten it.
Here's the catch: the literature is based on clinical trials. My task is
to validate that model with a sample of 1,000 patients treated "in the
wild", i.e., I have naturalistic retrospective data. For the first 50
days or so, my estimated hazard function plot corresponds to the
literature from trials as I described above. But, because the data don't
come from trials, the docs can delay or interrupt treatment, which they do
for a variety of reasons, not just the one complication I'm studying.
This means that, although it's a 130-day regimen, the distribution of
treatment length looks like any other length-of-stay distribution:
it's highly skewed to the right. As a result
(at least, I think that's what's causing this), my hazard plot doesn't
flatten out after day 50 like the literature says it should. In fact,
once you get out past about day 130, you get ever-increasing spikes in the
hazard estimates that are much higher than the theoretical maximum
risk, "known" to occur at about day 10.
I THINK this is due to the fact that I have so few observations, censored
or otherwise, out past day 130. When an event occurs out there, it puts a
huge spike in the hazard plot.
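To put rough numbers on that suspicion, here is a minimal sketch in Python (not SAS) of the standard actuarial (life-table) hazard estimate for a single interval. I'm assuming this is the formula behind the life-table hazard plot; the function name and all of the counts are made up for illustration and do not come from my data.

```python
def life_table_hazard(n_entering, n_events, n_censored, width):
    """Actuarial (life-table) hazard estimate at the midpoint of one interval."""
    # Censored subjects are assumed to be at risk for half the interval.
    effective_n = n_entering - n_censored / 2.0
    # Subjects with an event are likewise counted for half the interval.
    return n_events / (width * (effective_n - n_events / 2.0))


# Hypothetical counts, NOT taken from the actual 1,000-patient sample.
# Early 5-day interval near the expected peak: large risk set, many events.
early = life_table_hazard(n_entering=900, n_events=40, n_censored=10, width=5)
# Late 5-day interval past day 130: a handful at risk, a single event.
late = life_table_hazard(n_entering=6, n_events=1, n_censored=2, width=5)

print(f"early interval hazard: {early:.4f} per day")
print(f"late interval hazard:  {late:.4f} per day")
```

With these invented counts, one event among six remaining patients gives a hazard estimate nearly five times the early-peak estimate, even though it rests on a single observation. That is the kind of spike I mean.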
My questions, then, are: 1) is this interpretation accurate, given what
little info I've shared, and 2) is there anything I can legitimately do
about it (e.g., by screwing around with the "intervals" option) to make my
hazard plot look more like what's expected from the literature (without
compromising scientific integrity)?
Thanks in advance, and my apologies for such a long posting.
Karl (running SAS 8.2 on WinXP)