Date: Mon, 13 Feb 2006 22:41:31 +0000
Reply-To: toby dunn <tobydunn@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: toby dunn <tobydunn@HOTMAIL.COM>
Subject: Re: Counting/extracting unique records
In-Reply-To: <200602132226.k1DLxbjk008924@mailgw.cc.uga.edu>
Content-Type: text/plain; format=flowed
Howard,
Nat brought this top my attention a little while ago. I initailly misread
the question. Upon reflection I am not sure why one would want these
numbers like this but what do I know. All I can think of is a two pass
solution using either two sql statements or one with a subquery.
Toby Dunn
From: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Counting/extracting unique records
Date: Mon, 13 Feb 2006 17:26:21 -0500
Sorry guys, but it was not a good idea to test an example where the size of
each group happens to equal the number of groups.
Make it
data temp;
format date mmddyy10.;
input MyId Date mmddyy10.;
datalines;
1 1/2/2006
1 2/2/2006
1 .
1 .
2 3/2/2006
2 1/2/2006
2 .
;
and you get
Id
Obs date MyId Cnt
1 02/02/2006 1 4
2 03/02/2006 2 3
So IdCnt does not answer the question. Of course the final entry in the Obs
column does.
On Mon, 13 Feb 2006 21:24:37 +0000, toby dunn <tobydunn@HOTMAIL.COM> wrote:
>Opps your right Jiann thanks for the catch.
>
>
>
>Toby Dunn
>
>
>
>
>
>From: Jiann-Shiun Huang <Jiann-Shiun.Huang@AMERUS.COM>
>Reply-To: Jiann-Shiun Huang <Jiann-Shiun.Huang@AMERUS.COM>
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: Re: Counting/extracting unique records
>Date: Mon, 13 Feb 2006 15:19:24 -0600
>
>Mark:
>
> A minor modification from Toby's code and tested with a small dataset
>temp. Output follows the code.
>
>data temp;
> format date mmddyy10.;
> input MyId Date mmddyy10.;
> datalines;
>1 1/2/2006
>1 2/2/2006
>2 3/2/2006
>2 1/2/2006
>;
>run;
>
>Proc sql ;
> create table test as
> select * , count(MyId) as IdCnt
> from temp
> group by MyId
> having Date = Max(Date) ;
>quit;
>
>proc print;
>run;
>
>***** Output *****
> The SAS System 07:41
>Monday, February 13, 2006 17
>
> Id
> Obs date MyId Cnt
>
> 1 02/02/2006 1 2
> 2 03/02/2006 2 2
>
>
>J S Huang
>1-515-557-3987
>fax 1-515-557-2422
>
> >>> toby dunn <tobydunn@HOTMAIL.COM> 2/13/2006 3:07:45 PM >>>
>Mark ,
>
>Untested but you could try:
>
>Proc sql ;
> create table test as
> select * , count(MyId) as IdCnt
> from Something
> group by MyId , Date
> having Date = Max(Date) ;
>quit ;
>
>
>
>
>Toby Dunn
>
>
>
>
>
>From: "Coleman, Mark" <Mark.Coleman@LIBERTYMUTUAL.COM>
>Reply-To: "Coleman, Mark" <Mark.Coleman@LIBERTYMUTUAL.COM>
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: Counting/extracting unique records
>Date: Mon, 13 Feb 2006 15:59:42 -0500
>
>Greetings,
>
>I am working with a large data set (about 440,000 records with 300
>fields per record). One of the fields (call it myID) is the principal
>identifier, and may (legitimately) appear more than once in the
>dataset.
>Another field is a date identifier, which will be unique among any set
>of myID fields. I have two questions I'm hoping someone can help me
>answer. First, I'd like a way to simply count the number of unique
>identifiers in the field myID. Next, for each unique identifier in
>myID, I'd like to extract the record with the most current date in the
>datefield. Is there a way to each of these tasks in a single step?
>I've
>found a way to do it with two queries, but I was curious if there
>might
>be a way to do it more efficiently in one step.
>
>Any help would be greatly appreciated.
>
>Thank you.
>
>Best regards,
>
>Mark
|