Date: Wed, 27 Feb 2008 01:32:41 -0500
Reply-To: Dave Birch <davebirch@LYCOS.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Dave Birch <davebirch@LYCOS.COM>
Subject: Re: Missing data that means something
Adding to Gerhard's and Mary's sage advice, I'd investigate using SAS
special missing values (.a through .z, plus ._), which can represent
different categories of missing data, e.g.:
.a for "Not Applicable"
.i for "Invalid Response"
.u for "Unknown"
Many SAS procedures let you choose whether missing values are included in
calculations or not (e.g. the MISSING option in PROC FREQ). This choice
often makes a significant difference to affinity analysis and similar work.
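For illustration only, here is a minimal Python sketch (not SAS) of the idea of tagging *why* a value is missing and then choosing whether those tags count. The `Missing` class and the `freq` function are hypothetical stand-ins that loosely mimic special missing values and an "include missing" option. They are not SAS APIs.

```python
from collections import Counter


class Missing:
    """Hypothetical stand-in for a SAS special missing value such as .a or .u."""

    def __init__(self, code: str):
        self.code = code

    def __eq__(self, other):
        return isinstance(other, Missing) and other.code == self.code

    def __hash__(self):
        return hash(("Missing", self.code))

    def __repr__(self):
        return "." + self.code


NOT_APPLICABLE = Missing("a")
INVALID = Missing("i")
UNKNOWN = Missing("u")


def freq(values, include_missing=False):
    """Count each distinct value. Tagged missing values are skipped unless
    include_missing is True, in which case each reason gets its own count."""
    counts = Counter()
    for v in values:
        if isinstance(v, Missing) and not include_missing:
            continue
        counts[v] += 1
    return counts


answers = ["Y", "N", NOT_APPLICABLE, "Y", UNKNOWN, NOT_APPLICABLE]
print(freq(answers))                        # only 'Y' and 'N' are counted
print(freq(answers, include_missing=True))  # .a and .u appear as separate levels
```

Keeping the reason code, rather than a single generic missing value, means "not applicable" and "unknown" can be treated differently later in the analysis.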
On Tue, 26 Feb 2008 10:48:36 -0600, Mary <mlhoward@AVALON.NET> wrote:
>I'd encourage adding more variables to make this clearer, certainly a
variable for whether the person has children or not, since you might know
that the person has children but not have actual ages for all of them.
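As a rough Python sketch of deriving those extra variables, the hypothetical `child_flags` function below assumes each customer's children are stored as a list of ages. `None` for the whole list means "we don't know", an empty list means "known to have no children", and a `None` entry is a child whose age is unknown.

```python
from typing import List, Optional


def child_flags(child_ages: Optional[List[Optional[float]]]) -> dict:
    """Derive explicit indicator variables from a (hypothetical) list of child ages."""
    if child_ages is None:
        # Nothing known about children at all
        return {"has_children": None, "n_children": None, "n_ages_known": None}
    return {
        "has_children": len(child_ages) > 0,
        "n_children": len(child_ages),
        "n_ages_known": sum(1 for a in child_ages if a is not None),
    }


print(child_flags([4, None]))  # {'has_children': True, 'n_children': 2, 'n_ages_known': 1}
print(child_flags([]))         # {'has_children': False, 'n_children': 0, 'n_ages_known': 0}
```

This separates "has children" from "ages known", so a customer with children of unknown age is not mixed up with a customer who has none.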
>Also watch out for a child's age of 0: I sometimes see data where
people don't know what to do with a child who is literally 0 days old
(just born) or is not born yet (the mother is pregnant, so how old is the
baby? Negative days old?).
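One way to make the 0 and "negative" cases explicit is to keep the signed age in days together with a status label. The Python sketch below assumes, hypothetically, that the data holds either a birth date or an expected due date for each child. `child_age_status` is an illustrative name, not an existing function.

```python
from datetime import date


def child_age_status(birth_or_due_date: date, as_of: date):
    """Return (status, age_in_days) for a child relative to as_of.

    A negative age means the date is a future due date (not yet born);
    0 means the child was born on as_of.
    """
    days = (as_of - birth_or_due_date).days
    if days < 0:
        status = "expected"
    elif days == 0:
        status = "born today"
    else:
        status = "born"
    return status, days


print(child_age_status(date(2008, 3, 15), date(2008, 2, 26)))  # ('expected', -18)
print(child_age_status(date(2008, 2, 26), date(2008, 2, 26)))  # ('born today', 0)
```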
>It would seem that this matters for purchasing: expectant
mothers buy TONS of stuff, as do fathers who run out on the day the baby
is born to get things that are suddenly needed. An expectant mother with
no current children would also have a different buying pattern from
someone with no children who isn't expecting.
>Perhaps you could use a particular null value for "never purchased";
for example, the date of last purchase could be set to 01/01/9999. But
then you'd have to make sure to check for that value in your code before
doing anything with the dates.
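A minimal Python sketch of that check, using the 01/01/9999 sentinel from the post. The names `NEVER_PURCHASED` and `days_since_last_purchase` are illustrative only.

```python
from datetime import date
from typing import Optional

# Sentinel suggested in the post for "never purchased"
NEVER_PURCHASED = date(9999, 1, 1)


def days_since_last_purchase(last_purchase: date, as_of: date) -> Optional[int]:
    """Days between the last purchase and as_of, or None if never purchased."""
    # Check for the sentinel first: subtracting it would give a huge negative
    # number that would silently distort any analysis.
    if last_purchase == NEVER_PURCHASED:
        return None
    return (as_of - last_purchase).days


print(days_since_last_purchase(date(2008, 2, 1), date(2008, 2, 26)))  # 25
print(days_since_last_purchase(NEVER_PURCHASED, date(2008, 2, 26)))   # None
```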
> ----- Original Message -----
> From: amw5gster@GMAIL.COM
> To: SAS-L@LISTSERV.UGA.EDU
> Sent: Tuesday, February 26, 2008 6:54 AM
> Subject: Missing data that means something
> I'm performing a cluster analysis on a sample of customers, trying
> to determine different natural levels of affinity across products. I
> believe that the life-stage of the customer (combined function of
> marital status, age, age of children) is an important influence on the
> clusters, so I'm including those variables as well.
> The problem I'm having is that many of my variables have missing
> values, not because I don't know the values, but because missing
> indicates a real value. E.g., time since last purchase of a
> particular product line: if the customer never purchased that
> product, the value is missing. Age of children is another
> example: that field is missing if the customer has no children.
> Is there a common way to account/recode for these types of variables?
> The only documentation I can find talks about missing as an indicator
> of "don't know", as opposed to null or n/a.
> Much obliged for your consideration
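One common answer to the recoding question is the "missing indicator" approach. You split each such variable into a 0/1 flag that says "structurally missing" and a numeric column where the gaps are filled with a chosen constant. The Python sketch below is illustrative. The function name is hypothetical, and the fill rule for time since last purchase (one day longer than the longest observed gap) is an assumption, not a standard.

```python
from typing import List, Optional, Tuple


def recode_structural_missing(
    values: List[Optional[float]], fill: float
) -> Tuple[List[float], List[int]]:
    """Return (filled values, 0/1 indicator) for a column where missing is meaningful."""
    indicator = [1 if v is None else 0 for v in values]
    filled = [fill if v is None else v for v in values]
    return filled, indicator


days_since = [12.0, None, 40.0, None]  # None = never purchased
# Assumed rule: treat "never" as one day longer ago than the longest observed gap
fill_value = max(v for v in days_since if v is not None) + 1
filled, never_bought = recode_structural_missing(days_since, fill_value)
print(filled)        # [12.0, 41.0, 40.0, 41.0]
print(never_bought)  # [0, 1, 0, 1]
```

Because cluster analysis is distance-based, the filled column and the 0/1 indicator sit on very different scales. You would typically standardize them before clustering.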