Date: Wed, 22 Feb 2006 11:06:50 -0800
Reply-To: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject: Re: studying repeats
Content-Type: text/plain; charset="us-ascii"
If the data aren't too large (<64k lines) I often put this sort of data
into Excel (with the libname engine) and use a pivot table to look at
the problems. For me it's very interactive and lets me rapidly drill
across and down and inspect the situation. I then use this to inform my
SAS programming. If the data are large I often subset or sample them and
do the same.
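
A rough sketch of that workflow, in case it helps. The dataset name (WORK.HAVE), the workbook path, the sample size and the seed are all made up for illustration. The Excel libname engine needs SAS/ACCESS Interface to PC Files, and PROC SURVEYSELECT needs SAS/STAT.

```sas
/* Hypothetical input: WORK.HAVE, too large to fit on one Excel sheet. */

/* Take a simple random sample small enough for a worksheet.          */
/* n= must not exceed the number of rows in WORK.HAVE.                */
proc surveyselect data=have out=have_sample
                  method=srs n=60000 seed=12345;
run;

/* Point a libref at a workbook via the Excel libname engine.         */
/* The path is an assumption; change it to suit your system.          */
libname xl excel 'C:\temp\repeats.xls';

/* Each dataset written to the libref becomes a sheet in the workbook. */
data xl.have_sample;
  set have_sample;
run;

/* Release the workbook so Excel can open it for the pivot table.     */
libname xl clear;
```

If the data already fit on one sheet, skip the sampling step and write WORK.HAVE to the libref directly.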
I agree that keeping your key fields separate is important - use
multi-level sorts instead of a concatenated key. This also lets you look
for problems within each level independently.
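
For the counting itself, something along these lines may be enough. It is only a sketch: WORK.HAVE, ZIP and PERIOD are assumed names standing in for the poster's dataset and key fields.

```sas
/* Hypothetical names: WORK.HAVE with key fields ZIP and PERIOD.      */

/* One row per repeated zip/period combination, with its row count.   */
proc sql;
  create table repeats as
  select zip, period, count(*) as n_rows
  from have
  group by zip, period
  having count(*) > 1
  order by n_rows desc, zip, period;
quit;

/* How many combinations occur twice, three times, and so on.         */
proc freq data=repeats;
  tables n_rows;
run;

/* Multi-level sort, then keep every row that belongs to a repeated   */
/* combination so the offending records can be inspected directly.    */
proc sort data=have out=sorted;
  by zip period;
run;

data dup_rows;
  set sorted;
  by zip period;
  /* A combination is unique only if its row is both first and last.  */
  if not (first.period and last.period);
run;
```

Because ZIP and PERIOD stay separate, the same BY statement also lets you check one level on its own, for example with first.zip and last.zip.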
DDS Data Extraction
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> Sent: Tuesday, February 21, 2006 2:02 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: studying repeats
> I need to look at how many times and which various combinations of values
> are repeated (zip code and time period). (Ideally, there should only
> be one instance of each zip code, time period combo, so I want to know
> more about cases where that is false.) I created a variable that is
> the concatenation of the values of zip code and time period. What is
> the best way of getting summary stats on the repeats?
> I don't want to just use nodup because I want to know what was repeated
> and how many times it was repeated.
> Ideally, I guess I would maybe do a proc freq on the concatenated
> variable, but I want to get rid of all instances where the frequency