LISTSERV at the University of Georgia
Date:         Wed, 25 Jul 2007 15:05:07 -0400
Reply-To:     Richard Ristow <wrristow@mindspring.com>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Richard Ristow <wrristow@mindspring.com>
Subject:      Re: Random date generator
Comments: To: "Hashmi, Syed S" <Syed.S.Hashmi.1@uth.tmc.edu>
Comments: cc: Melissa Ives <mives@chestnut.org>,
          Gene Maguin <emaguin@buffalo.edu>
In-Reply-To:  <820E5CA81E7A2544A9D7E0A25025E4810107AFCA@e2k0305.chestnut.net>
Content-Type: text/plain; charset="us-ascii"; format=flowed

Somehow I missed or deleted the original posting in this thread. Anyway, on Thursday, July 19, 2007 9:48 PM Hashmi, Syed S asked,

>A dataset that I'm analyzing has a set of dates for events (start and
>stop dates) as well as how long those events occured for. The data
>for each date is in three variables (month, day, year). The years are
>pretty complete if they are filled in but the month and day might are
>sometimes listed as the exact month or date and other times they're
>listed as beginning, middle or end of the year (for the month
>variable) or the month (for the day variable).
>
>I have [two dates as three variables each, plus a duration] duration).
>I have the complete start and stop date for about half the cases. The
>rest are missing either parts of one of the dates (eg. day) or for
>both. If I have one of the dates and a duration, I can calculate the
>other date.

So far, so good, though be careful about how precise your 'durations' are.
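To illustrate the point about precision, here is a minimal sketch (in Python rather than SPSS syntax; the function names and the 3-day tolerance are made up for illustration, not taken from your data). If the duration is only known to within a few days, the computed date is really a range of dates:

```python
from datetime import date, timedelta

def start_from_stop(stop, duration_days):
    """Start date implied by a stop date and a duration in whole days."""
    return stop - timedelta(days=duration_days)

def start_range(stop, duration_days, tolerance_days):
    """Earliest and latest start dates when the duration is only
    known to within +/- tolerance_days (hypothetical tolerance)."""
    earliest = stop - timedelta(days=duration_days + tolerance_days)
    # A duration cannot be negative, so clamp the short end at zero.
    latest = stop - timedelta(days=max(duration_days - tolerance_days, 0))
    return earliest, latest

stop = date(2007, 7, 19)
print(start_from_stop(stop, 14))   # single implied start date
print(start_range(stop, 14, 3))    # range if the duration is 14 +/- 3 days
```

A duration recorded as "about two weeks" deserves the second treatment, not the first.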

>There is a small subset of the population where I have the complete
>stop date but am missing the start day (I have the year and month) and
>am also missing the duration. I had to come up with some way to
>impute a start date for these cases for analysis. (which will be done
>with and without these specific cases). I know that the event could
>not be more than a month long. I was planning calculate the earliest
>possible start date (e_startdt) up to a month before the stop date and
>then randomly pick a date between e_startdt and the stop date.

OUCH! I would not do this. Period.

*MAYBE* the start dates and durations you get this way will be vaguely representative of the population of events, though I doubt it. Picking a date at random within the month before the stop date amounts to assuming the durations are roughly uniformly distributed from 0 to 30 days. Are they, in the cases where you know them? For goodness' sake, you ought to check that before proceeding.
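One rough way to run that check on the cases with known durations is a chi-square goodness-of-fit test against a uniform distribution. The sketch below is Python, not SPSS syntax; the function name, the choice of six bins, and the sample durations are all assumptions for illustration, and it presumes durations are whole days between 0 and 30:

```python
def chi_square_uniform(durations, max_days=30, n_bins=6):
    """Chi-square statistic comparing whole-day durations in 0..max_days
    with a uniform distribution over those days.
    Returns (statistic, observed counts per bin)."""
    if not durations:
        raise ValueError("need at least one duration")
    width = (max_days + 1) / n_bins

    def bin_of(d):
        return min(int(d / width), n_bins - 1)

    # How many distinct day values fall in each bin; expected counts
    # under uniformity are proportional to this.
    days_in_bin = [0] * n_bins
    for d in range(max_days + 1):
        days_in_bin[bin_of(d)] += 1

    observed = [0] * n_bins
    for d in durations:
        if not 0 <= d <= max_days:
            raise ValueError(f"duration out of range: {d}")
        observed[bin_of(d)] += 1

    n = len(durations)
    stat = 0.0
    for obs, k in zip(observed, days_in_bin):
        expected = n * k / (max_days + 1)
        stat += (obs - expected) ** 2 / expected
    return stat, observed

# Hypothetical durations, heavily weighted toward short events.
known = [1, 2, 2, 3, 3, 3, 4, 5, 5, 6, 7, 8, 10, 12, 15, 21, 28] * 4
stat, counts = chi_square_uniform(known)
# With 6 bins there are 5 degrees of freedom; the 5% critical value is 11.07.
print(counts, round(stat, 2), "reject uniform" if stat > 11.07 else "consistent with uniform")
```

A histogram of the known durations would tell you much the same thing at a glance; the statistic just puts a number on it.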

But even if they're representative of the population, they have nothing to do with the individual cases for which they're 'imputed'. No analysis using those 'dates' will be in the least trustworthy.

A far better approach is to use true missing-value imputation on the *durations*, not the dates. (See the SPSS 'MVA', Missing Value Analysis, procedure.) I'm not clear how many durations you'd have to impute. If it's near 50%, that won't be at all reliable, either.

-Good luck, Richard

