| Date: | Fri, 31 Jul 2009 10:03:46 -0500 |
| Reply-To: | OR Stats <stats112@GMAIL.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | OR Stats <stats112@GMAIL.COM> |
| Subject: | Re: No Dupkey by Criteria |
|
| In-Reply-To: | <200907311501.n6VAnIW6010816@mailgw.cc.uga.edu> |
| Content-Type: | text/plain; charset=ISO-8859-1 |
Thanks, All. That's great.
On Fri, Jul 31, 2009 at 10:01 AM, Paul Dorfman <sashole@bellsouth.net>wrote:
> OR Stats,
>
> Frankly, what you have described puzzles me. It stands to reason that the
> *first* instance of a duplicate key record is kept, and in my experience
> it has unfailingly been so, which also pertains to your data sample. When
> I run it, I get Y=100 and Y=300, just as I expect. As I understand, it is
> assured by EQUALS proc sort option. If I really need to make sure that
> relative order of an unsorted file is the same as that of the sorted file,
> I code it explicitly, lest the system has the default set to NOEQUALS.
>
> But your task, IIUC, is to select, from each date BY-group, the record
> with the maximal Y. It is then either a 2-step process, as Joe has
> suggested, for example,
>
> data a ;
> input date: mmddyy8. Y ;
> cards ;
> 1/3/09 50
> 1/5/09 100
> 1/5/09 200
> 1/6/09 300
> 1/6/09 10
> ;
> run ;
>
> proc sort data = a out = b ;
> by date Y ;
> run ;
>
> data c ;
> set b ;
> by date ;
> if last.date ;
> run ;
>
> Or it can be a 1-step SQL process (which is, of course, still a 2-step
> process behind-the-scenes):
>
> proc sql ;
> create table c as
> select date
> , max (Y) as Y
> from a
> group date
> order date
> ;
> quit ;
>
> Or it can be a 1-step proc means/summary just as well:
>
> proc means data = a nway noprint ;
> class date ;
> var Y ;
> output out = c (drop = _:) max= ;
> run ;
>
> Kind regards
> ------------
> Paul Dorfman
> Jax, FL
> ------------
>
>
>
> On Fri, 31 Jul 2009 09:17:24 -0500, OR Stats <stats112@GMAIL.COM> wrote:
>
> >Hello:
> >
> >In PROC SORT, can we choose which duplicate record to keep by a
> conditional
> >statement?
> >
> >For example, my data looks like this
> >
> >Date Y
> >1/3/09 50
> >1/5/09 100
> >1/5/09 200
> >1/6/09 300
> >1/6/09 10
> >
> >For 1/5, Y=200 is kept and Y=100 is deleted; whereas for 1/6, Y=300 is
> kept
> >and Y=10 is deleted. The final data would look like the following
> >Date Y
> >1/3/09 50
> >1/5/09 200
> >1/6/09 300
> >
> >Thanks!
>
|