```
Date: Wed, 25 Jul 2007 15:09:53 -0500
Reply-To: "Hashmi, Syed S"
Sender: "SPSSX(r) Discussion"
From: "Hashmi, Syed S"
Subject: Re: Random date generator
Comments: To: Richard Ristow
In-Reply-To: <7.0.1.0.2.20070725144713.03929528@mindspring.com>
Content-Type: text/plain; charset="us-ascii"

> -----Original Message-----
> From: Richard Ristow [mailto:wrristow@mindspring.com]
> Sent: Wednesday, July 25, 2007 2:05 PM
>
> >There is a small subset of the population where I have the complete
> >stop date but am missing the start day (I have the year and month) and
> >am also missing the duration. I had to come up with some way to
> >impute a start date for these cases for analysis. (which will be done
> >with and without these specific cases). I know that the event could
> >not be more than a month long. I was planning calculate the earliest
> >possible start date (e_startdt) up to a month before the stop date and
> >then randomly pick a date between e_startdt and the stop date.
>
> OUCH! I would not do this. Period.
>
> *MAYBE* the start dates and durations you get this way will be vaguely
> representative of the population of events, though I doubt it. Are your
> durations roughly uniformly distributed from 0 to 30 days? For goodness
> sake, you ought to check that before proceeding.
>
> But even if they're representative of the population, they have nothing
> to do with the individual cases for which they're 'imputed'. No
> analysis using those 'dates' will be the least trustworthy.
>
> A far better approach is to use true missing-value interpolation on the
> *durations*, not the dates. (See SPSS 'MVA'.) I'm not clear how many
> durations you'd have to impute. If it's near 50%, that won't be at all
> reliable, either.
>
> -Good luck,
> Richard

Richard,

Thanks for your input. I realize that I was stepping into extremely
treacherous territory when I decided to impute dates and select random
ones.
As for the durations being roughly uniformly distributed, that's what
it looks like from the data I do have. Initially, I'd assumed that
durations would have a mean of about 7 days, but the data I do have
doesn't seem to show that. It's more or less uniformly distributed.
There were some durations that were >30 days, but I doubt they're
true. Therefore, I decided to go ahead with the uniform distribution
(although the whole imputation and random selection still bothers me).

The reason that I'm trying to get an idea about the dates, especially
the event start dates, is the nature of the study question. I'm
looking at the occurrence of certain events during pregnancy. However,
these events of interest have to occur within the first trimester or,
if I narrow it down further, the first two months of pregnancy.
Therefore, I have to know if an event occurred within a certain period
of time after the last menstrual date as reported by the woman. At the
end of the day, the variables for all the events get filtered down to
a single dichotomous variable - Y/N, did the event occur during the
period of interest? I will do the analysis with and without the cases
where the dates have been imputed from incomplete data.

I hadn't previously thought of using true missing-value interpolation
on the durations, but I'll look into it. I've never done that before,
so I will have to read up a bit on it. I might have an issue with the
number of missings, though, since more cases have at least some part
of the date than have a duration value.

Thanks again for your advice. It's always nice to get a fresh look at
an issue.

- Shahurkh
```
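For anyone following the thread, the procedure described in the message could be sketched roughly as below. This is only an illustration in Python, not the poster's SPSS syntax: the function names `impute_start_date` and `in_window` are hypothetical, the 30-day cap stands in for "not more than a month long", and the 60-day window stands in for "the first two months". The draw is uniform over the days consistent with the known start year/month and the stop date, which is exactly the assumption Richard questions above.

```python
import calendar
import random
from datetime import date, timedelta


def impute_start_date(stop, start_year, start_month, max_days=30, rng=random):
    """Draw a start date uniformly from the days consistent with what is known.

    max_days=30 is an assumed reading of "not more than a month long".
    """
    # Window implied by the known start year and month.
    month_first = date(start_year, start_month, 1)
    last_day = calendar.monthrange(start_year, start_month)[1]
    month_last = date(start_year, start_month, last_day)

    # Earliest start allowed by the maximum duration (e_startdt in the post).
    e_startdt = stop - timedelta(days=max_days)

    lo = max(e_startdt, month_first)
    hi = min(stop, month_last)
    if lo > hi:
        raise ValueError("known start month is inconsistent with the stop date")

    # randint is inclusive on both ends, so every day in [lo, hi] is equally likely.
    return lo + timedelta(days=rng.randint(0, (hi - lo).days))


def in_window(start, lmp, window_days=60):
    """Y/N flag: did the event start within window_days after the last menstrual date?

    window_days=60 is an assumed stand-in for "the first two months".
    """
    return 0 <= (start - lmp).days <= window_days


# Hypothetical case: stop date known, start known only to be in February 2007.
rng = random.Random(2007)  # fixed seed so the imputation is reproducible
stop = date(2007, 3, 10)
start = impute_start_date(stop, 2007, 2, rng=rng)
flag = in_window(start, lmp=date(2007, 1, 1))
```

Passing a seeded `random.Random` keeps the imputed dates reproducible, which makes the planned with-and-without comparison easier to rerun.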
