Date: Tue, 18 Dec 2007 18:55:57 +0000
From: Ian Whitlock <iw1junk@COMCAST.NET>
Sender: "SAS(r) Discussion"
Subject: Re: selecting a unique set of data.
cc: souga soga, Ya Huang

Summary: Difficult problem
#iw-value=1

Souga,

You can be happy with Ya's solution, but I would point out that

Obs    X       Y
-----------------
  1    A001    30
  2    A002    50
  3    A003    60
  4    A004    40

provides a better solution to Ya's example, since every value of X gets a Y. Consequently, Ya's algorithm doesn't always find a complete solution even when one exists.

However, when Ya's example is sorted appropriately, his algorithm does yield a complete solution:

Obs    X       Y
-----------------
  1    A004    40
  2    A001    10
  3    A002    50
  4    A003    60

But don't always expect it.
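One sort that reproduces the listing above, as a minimal sketch: order the X groups by how many Y candidates each has (fewest first), then by X and Y, and rerun Ya's data step unchanged. The group-size ordering is an assumption made here, not something stated in the thread, and the names have_sorted, n_y and need2 are made up for illustration. The data set have is the one from Ya's post.

```sas
/* Attach the number of candidate Y values to every row,   */
/* then order groups with the fewest candidates first.     */
proc sql;
  create table have_sorted as
  select a.x, a.y, b.n_y
  from have as a
       inner join
       (select x, count(*) as n_y from have group by x) as b
       on a.x = b.x
  order by b.n_y, a.x, a.y;
quit;

/* Ya's greedy step, unchanged apart from the input data set. */
data need2;
  array yy(1:100) _temporary_ (100*0);  /* 0 = Y not used yet */
  set have_sorted;
  retain found used;
  by x notsorted;
  if first.x then found=0;
  if found=0 and yy(y)=0 then do;
    yy(y)=1;      /* mark this Y as taken */
    used=y;
    found=1;
  end;
  if last.x then do;
    if found=1 then y=used; else y=.;
    output;
  end;
run;

proc print data=need2;
run;
```

On Ya's example this picks A004=40, A001=10, A002=50, A003=60. It is still a greedy pass, so it carries no guarantee on other data.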

On the other hand, Ya's solution reduces to Howard's when the file is properly sorted and the every-pair condition is met (every value of X occurs with every value of Y).
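For the every-pair case, the recipe quoted further down in this thread (sort by X Y, then take the first record of the first X group, the second record of the second group, and so on) can be sketched as follows. This is only an illustration on Souga's original four rows; the names pairs, pairs_sorted, need_diag, grp and pos are made up here.

```sas
/* Souga's original example: every X occurs with every Y. */
data pairs;
  X="A001"; Y=10; output;
  X="A001"; Y=20; output;
  X="A002"; Y=10; output;
  X="A002"; Y=20; output;
run;

proc sort data=pairs out=pairs_sorted;
  by x y;
run;

data need_diag;
  set pairs_sorted;
  by x;
  if first.x then do;
    grp + 1;    /* running number of the current X group */
    pos = 0;    /* restart the position counter inside the group */
  end;
  pos + 1;      /* position of this record within its group */
  if pos = grp then output;   /* k-th record of the k-th group */
  drop grp pos;
run;

proc print data=need_diag;
run;
```

On these four rows the step keeps A001 with Y=10 and A002 with Y=20, which is the output Souga asked for.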

Ian Whitlock
==============

Date: Tue, 18 Dec 2007 12:00:56 -0500
Reply-To: souga soga <souga1234@GMAIL.COM>
Sender: "SAS(r) Discussion"
From: souga soga <souga1234@GMAIL.COM>
Subject: Re: selecting a unique set of data.
Comments: To: Ya Huang <ya.huang@amylin.com>
In-Reply-To: <200712180439.lBI4WOpB010892@mailgw.cc.uga.edu>
Content-Type: text/plain; charset=ISO-8859-1

THANK YOU ALL VERY VERY MUCH, this is exactly what I need. Just one question: why is the (100*0) needed in the array statement?

array yy(1:100) _temporary_ (100*0);
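On the (100*0) question, a minimal sketch of what the initial-value list changes. Elements of a numeric _temporary_ array start out missing unless initial values are given, and the test yy(y)=0 in the data step is false for a missing value, so without (100*0) no Y would ever be picked. The array names a and b and the other variable names are made up for illustration.

```sas
data _null_;
  array a(1:3) _temporary_;         /* no initial values: elements start missing */
  array b(1:3) _temporary_ (3*0);   /* 3*0 repeats the initial value 0 three times */

  /* Copy one element of each array to ordinary variables for printing. */
  a2 = a(2);
  b2 = b(2);
  put a2= b2=;

  /* The comparison used in the data step: true (1) only for the initialized array. */
  a_is_zero = (a(2) = 0);
  b_is_zero = (b(2) = 0);
  put a_is_zero= b_is_zero=;
run;
```

The log should show a2=. b2=0 and a_is_zero=0 b_is_zero=1.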

On 12/17/07, Ya Huang <ya.huang@amylin.com> wrote:
>
> If NOT every value of X has every value of Y, or number of observation
> in each X varies, one way to get the unique Y is as follows, i.e., pick
> a Y for first x group, record it as used y, then for the next x group,
> check if the y has been used, if so, goes to next y in the group...
> temp array in the code is used to record used y value.
>
> data have;
>   X="A001";Y=10;OUTPUT;
>   X="A001";Y=30;OUTPUT;
>   X="A001";Y=40;OUTPUT;
>   X="A003";Y=10;OUTPUT;
>   X="A003";Y=60;OUTPUT;
>   X="A003";Y=40;OUTPUT;
>   X="A003";Y=50;OUTPUT;
>   X="A002";Y=10;OUTPUT;
>   X="A002";Y=40;OUTPUT;
>   X="A002";Y=50;OUTPUT;
>   X="A004";Y=60;OUTPUT;
>   X="A004";Y=40;OUTPUT;
> run;
>
> data need;
>   array yy(1:100) _temporary_ (100*0);
>   set have;
>   retain found used;
>   by x notsorted;
>   if first.x then found=0;
>   if found=0 and yy(y)=0 then do;
>     yy(y)=1;
>     used=y;
>     found=1;
>   end;
>   if last.x then do;
>     if found=1 then y=used; else y=.;
>     output;
>   end;
> run;
>
> proc print;
> run;
>
> Obs    X        Y    found    used
>
>   1    A001    10      1       10
>   2    A003    60      1       60
>   3    A002    40      1       40
>   4    A004     .      0       40
>
> Note that for A004, both 60 and 40 have been used, therefore
> assign a missing for it.
>
>
>
> On Mon, 17 Dec 2007 22:51:01 -0500, souga soga <souga1234@GMAIL.COM>
> wrote:
>
> >I apologize for not being specific, anyway what i need is in your first
> >paragraph
> >
> >"In your example every value of X has every value of Y. If this is
> >accurate then sort by X Y, and select first record in first group of
> >X's, second record from the second group, etc. to the last group of
> >X's taking the last record in the group."
> >
> >Unfortunately i do not know how to program this, can someone help!
> >
> >Thanks again,
> >Sa
> >
> >On Dec 17, 2007 10:03 PM, <iw1junk@comcast.net> wrote:
> >
> >> Summary: Specs should precede solution.
> >> #iw-value=1
> >>
> >> Souga,
> >>
> >> In your example every value of X has every value of Y. If this is
> >> accurate then sort by X Y, and select first record in first group of
> >> X's, second record from the second group, etc. to the last group of
> >> X's taking the last record in the group.
> >>
> >> If your example is not accurate, then it makes no sense for anyone to
> >> give an answer until more is known about the requirements. For
> >> example, suppose there are 5 distinct value of X and 4 distinct values
> >> of Y, then it is impossible to have every value of X in the output
> >> file no matter how the Y values are distributed. Consequently there is
> >> no solution when all values of X must be used.
> >>
> >> So the real question is - what do you have, and what are the
> >> requirements for what you want? Knowing why would also help us to
> >> know whether the problem is worth thinking about.
> >>
> >> It sounds like some kind of operations research problem.
> >>
> >> Ian Whitlock
> >> ==============
> >> Date: Mon, 17 Dec 2007 19:36:40 -0500
> >> Reply-To: souga soga <souga1234@GMAIL.COM>
> >> Sender: "SAS(r) Discussion"
> >> From: souga soga <souga1234@GMAIL.COM>
> >> Subject: selecting a unique set of data.
> >> Content-Type: text/plain; charset=ISO-8859-1
> >>
> >> Hi,
> >>
> >> X="A001";Y=10;OUTPUT;
> >> X="A001";Y=20;OUTPUT;
> >> X="A002";Y=10;OUTPUT;
> >> X="A002";Y=20;OUTPUT;
> >>
> >> I need the output to be:
> >>
> >> X="A001";Y=10;OUTPUT;
> >> X="A002";Y=20;OUTPUT;
> >>
> >> ie. the value of y should not repeat for x
> >>
> >> Thanks, Sa
> >>
