Thanks Ian,
I thought about sorting the data by the number of distinct Y values in
each X group, so that the group with the fewest Y values gets to pick
first. This can be done easily in PROC SQL:
proc sql;
select *,count(distinct y) as ny
from have
group by x
order by ny,x,y
;
quit;
But I wasn't sure this would be enough, so I didn't post it.
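
For reference, a minimal sketch of how that ordering could feed the greedy DATA step quoted further down. The table name SORTED is an assumption, and Y is assumed to be an integer from 1 to 100 so that it can index the array. PROC SQL remerges the count back onto every row, so a remerge note in the log is expected.

```sas
/* Sketch only: rank each X group by its number of distinct Y values, */
/* then run the greedy pick over the re-ordered data.                 */
proc sql;
  create table sorted as            /* SORTED is an assumed table name */
  select *, count(distinct y) as ny
  from have
  group by x
  order by ny, x, y;
quit;

data need (keep=x y);
  /* yy(y)=1 means y is already taken; assumes y is an integer in 1..100 */
  array yy(1:100) _temporary_ (100*0);
  set sorted;
  by ny x;                          /* matches the ORDER BY above */
  retain found used;
  if first.x then found=0;
  /* take the first y in this group that no earlier group has taken */
  if found=0 and yy(y)=0 then do;
    yy(y)=1;
    used=y;
    found=1;
  end;
  /* one row per x: the chosen y, or missing if none was free */
  if last.x then do;
    if found=1 then y=used; else y=.;
    output;
  end;
run;
```

On the HAVE data quoted below, this processes A004 first and should give A004=40, A001=10, A002=50, A003=60, which is the complete assignment Ian shows. Ordering by group size is only a heuristic, though. With groups A={1,4}, B={2,4}, C={1,2}, every group has two values, the greedy pick gives 1 to A and 2 to B, and C is left with nothing even though A=4, B=2, C=1 is a complete solution.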
On Tue, 18 Dec 2007 18:55:57 +0000, Ian Whitlock <iw1junk@COMCAST.NET>
wrote:
>Summary: Difficult problem
>#iw-value=1
>
>Souga,
>
>You can be happy with Ya's solution, but I would point out
>that
>
> Obs X Y
> -----------------
> 1 A001 30
> 2 A002 50
> 3 A003 60
> 4 A004 40
>
>provides a better solution to Ya's example, since every value of
>X has a Y. Consequently, Ya's algorithm doesn't always find a
>complete solution even when one exists.
>
>However, with an appropriate sort order for Ya's example, his algorithm
>does yield a complete solution.
>
> Obs X Y
> -----------------
> 1 A004 40
> 2 A001 10
> 3 A002 50
> 4 A003 60
>
>But don't always expect it.
>
>On the other hand, Ya's solution reduces to Howard's when the file
>is properly sorted and the every-pair condition is met.
>
>Ian Whitlock
>==============
>
>Date: Tue, 18 Dec 2007 12:00:56 -0500
>Reply-To: souga soga <souga1234@GMAIL.COM>
>Sender: "SAS(r) Discussion"
>From: souga soga <souga1234@GMAIL.COM>
>Subject: Re: selecting a unique set of data.
>Comments: To: Ya Huang <ya.huang@amylin.com>
>In-Reply-To: <200712180439.lBI4WOpB010892@mailgw.cc.uga.edu>
>Content-Type: text/plain; charset=ISO-8859-1
>
>THANK YOU ALL VERY VERY MUCH, this is exactly what I need.
>Just one question: why is the (100*0) needed in this array statement?
>array yy(1:100) _temporary_ (100*0);
>
>
>On 12/17/07, Ya Huang <ya.huang@amylin.com> wrote:
>>
>> If NOT every value of X has every value of Y, or the number of observations
>> in each X varies, one way to get a unique Y is as follows: pick a Y for
>> the first X group and record it as used; then, for the next X group,
>> check whether each Y has been used and, if so, go on to the next Y in the
>> group, and so on. A temporary array in the code records the used Y values.
>>
>> data have;
>> X="A001";Y=10;OUTPUT;
>> X="A001";Y=30;OUTPUT;
>> X="A001";Y=40;OUTPUT;
>> X="A003";Y=10;OUTPUT;
>> X="A003";Y=60;OUTPUT;
>> X="A003";Y=40;OUTPUT;
>> X="A003";Y=50;OUTPUT;
>> X="A002";Y=10;OUTPUT;
>> X="A002";Y=40;OUTPUT;
>> X="A002";Y=50;OUTPUT;
>> X="A004";Y=60;OUTPUT;
>> X="A004";Y=40;OUTPUT;
>> run;
>>
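>> data need;
>>   /* yy(y)=1 marks y as already used; (100*0) starts every element at 0, */
>>   /* since _temporary_ elements without initial values start as missing. */
>>   /* Assumes y is an integer between 1 and 100.                          */
>>   array yy(1:100) _temporary_ (100*0);
>>   set have;
>>   retain found used;
>>   by x notsorted;
>>   if first.x then found=0;
>>   /* take the first y in this x group that no earlier group has taken */
>>   if found=0 and yy(y)=0 then do;
>>     yy(y)=1;
>>     used=y;
>>     found=1;
>>   end;
>>   /* one row per x group: the chosen y, or missing if none was free */
>>   if last.x then do;
>>     if found=1 then y=used; else y=.;
>>     output;
>>   end;
>> run;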
>>
>> proc print;
>> run;
>>
>> Obs    X       Y    found    used
>>
>>   1    A001    10     1       10
>>   2    A003    60     1       60
>>   3    A002    40     1       40
>>   4    A004     .     0       40
>>
>> Note that for A004, both 60 and 40 have already been used, so Y is
>> set to missing for it.
>>
>>
>>
>> On Mon, 17 Dec 2007 22:51:01 -0500, souga soga <souga1234@GMAIL.COM>
>> wrote:
>>
>> >I apologize for not being specific. What I need is in your first
>> >paragraph:
>> >
>> >"In your example every value of X has every value of Y. If this is
>> >accurate then sort by X Y, and select first record in first group of
>> >X's, second record from the second group, etc. to the last group of
>> >X's taking the last record in the group."
>> >
>> >Unfortunately I do not know how to program this. Can someone help?
>> >
>> >Thanks again,
>> >Sa
>> >
>> >On Dec 17, 2007 10:03 PM, <iw1junk@comcast.net> wrote:
>> >
>> >> Summary: Specs should precede solution.
>> >> #iw-value=1
>> >>
>> >> Souga,
>> >>
>> >> In your example every value of X has every value of Y. If this is
>> >> accurate then sort by X Y, and select first record in first group of
>> >> X's, second record from the second group, etc. to the last group of
>> >> X's taking the last record in the group.
>> >>
>> >> If your example is not accurate, then it makes no sense for anyone to
>> >> give an answer until more is known about the requirements. For
>> >> example, suppose there are 5 distinct values of X and 4 distinct values
>> >> of Y; then it is impossible to have every value of X in the output
>> >> file no matter how the Y values are distributed. Consequently there
>> >> is no solution when all values of X must be used.
>> >>
>> >> So the real question is - what do you have, and what are the
>> >> requirements for what you want? Knowing why would also help us to
>> >> know whether the problem is worth thinking about.
>> >>
>> >> It sounds like some kind of operations research problem.
>> >>
>> >> Ian Whitlock
>> >> ==============
>> >> Date: Mon, 17 Dec 2007 19:36:40 -0500
>> >> Reply-To: souga soga <souga1234@GMAIL.COM>
>> >> Sender: "SAS(r) Discussion"
>> >> From: souga soga <souga1234@GMAIL.COM>
>> >> Subject: selecting a unique set of data.
>> >> Content-Type: text/plain; charset=ISO-8859-1
>> >>
>> >> Hi,
>> >>
>> >> X="A001";Y=10;OUTPUT;
>> >> X="A001";Y=20;OUTPUT;
>> >> X="A002";Y=10;OUTPUT;
>> >> X="A002";Y=20;OUTPUT;
>> >>
>> >> I need the output to be:
>> >>
>> >> X="A001";Y=10;OUTPUT;
>> >> X="A002";Y=20;OUTPUT;
>> >>
>> >> i.e., the value of y should not repeat for x
>> >>
>> >> Thanks, Sa
>> >>
>> >>
>>