Date: Wed, 4 Mar 2009 01:13:36 -0500
Reply-To: Jishen Zhao <jcz50@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Jishen Zhao <jcz50@HOTMAIL.COM>
Subject: Re: Search and update an external dataset simultaneously
In-Reply-To: <445d9dbe0903021839i29eff6b9v11155c475dcf1339@mail.gmail.com>
Content-Type: text/plain; charset="Windows-1252"
Yu,
I feel a little guilty for claiming yesterday that removing the "if obsnum=last then output" statement did not solve my problem. In fact, when I checked my code this evening, I found that I had left an "output" statement in my code after removing the "if obsnum=last then ..." part. (That "output" statement sits inside the loop.) Because it ran once for every record read from ext rather than once for every record in gvn, I got 12 ids instead of 3, and the output did not make much sense.
I reran my revised code a moment ago, and it finally produced what I needed. I really appreciate your time and patience in guiding me through this process. Take care.
Jishen
From: jcz50@hotmail.com
To: zhangyu05@gmail.com
CC: sas-l@listserv.uga.edu
Subject: RE: Search and update an external dataset simultaneously
Date: Mon, 2 Mar 2009 23:56:31 -0500
Yu,
Thank you again for your help.
I am reluctant to comment out the "if obsnum=last then output" statement mainly because the order of the selected id is important to me. My sample data is oversimplified. In reality, the difference value for each id changes if another id is removed from the dataset. So, if an id is ranked number two in the first iteration, it may drop to number five in the next iteration, and still will not be selected. In other words, I need to find the best id in each iteration relative to all the available ids. This seems to suggest that each time the "last" id is selected, the "last" value probably needs to be adjusted (last -1) so that I will still be able to put out the next appropriate id.
By the way, I took a trial run. Because of the "if obsnum=last then output" statement, only one of the three selected ids could be listed. When I commented out the statement, I got 12 ids, which include different repeats of the three as well as some unqualified ids.
Jishen
Date: Mon, 2 Mar 2009 20:39:50 -0600
Subject: Re: Search and update an external dataset simultaneously
From: zhangyu05@gmail.com
To: jcz50@hotmail.com
CC: sas-l@listserv.uga.edu
Jishen,
You will output each record from gvn; the purpose of the direct access to the ext dataset is only to get the maximum difference. So the statement "if obsnum=last then output;" is really not needed here. Just comment it out and see if you get what you want.
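A minimal sketch of that change, assuming the gvn and ext datasets from your post. The explicit OUTPUT after the loop does the same thing as the implicit output you get once the statement inside the loop is gone. The variable best_tt is a name I made up for this sketch; it is there because tt itself holds whatever record was read last from ext, not the record that was chosen. The $8 length for idd is also an assumption, matching the default length of id.

```sas
data new;
  length selected $150 idd $8;   /* idd length assumed to match id */
  retain selected ' ';
  set gvn;
  /* dif, idd and best_tt are not retained, so they start out missing for each vv */
  do obsnum = 1 to last;
    set ext point=obsnum nobs=last;
    if indexw(selected, strip(id)) > 0 then continue;  /* skip ids already used */
    maxdif = vv - tt;
    if dif < maxdif then do;     /* a missing dif compares below any number */
      dif = maxdif;
      idd = id;
      best_tt = tt;              /* hypothetical name: tt of the chosen id */
    end;
  end;
  selected = catx(' ', selected, idd);
  output;                        /* once per gvn record, after the whole scan */
  keep vv idd best_tt selected;
run;
```

With the sample data I would expect this to pick CC, DD and EE in that order, but please check it against your real data.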
HTH
Yu
On Mon, Mar 2, 2009 at 6:30 PM, Jishen Zhao <jcz50@hotmail.com> wrote:
Yu,
Thank you very much for your new code. It effectively solves my problem in removing the pre-selected id in the external dataset by using the statement
if indexw(selected,id)>0 then continue;
I was thinking of using some cumbersome code to get the job done, but yours is definitely nice and simple.
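As I understand it, the bookkeeping relies on two functions: catx appends each chosen id to a blank-separated list, and indexw looks up the current id as a whole word in that list. A small sketch of just that part, with literal ids and the variable names hit and miss made up for illustration:

```sas
data _null_;
  length selected $150;
  selected = ' ';
  selected = catx(' ', selected, 'CC');   /* list is now "CC" */
  selected = catx(' ', selected, 'DD');   /* list is now "CC DD" */
  hit  = indexw(selected, 'DD');          /* positive: DD is a whole word in the list */
  miss = indexw(selected, 'D');           /* 0: a partial match does not count */
  put selected= hit= miss=;               /* write the values to the log */
run;
```

Because indexw matches whole words only, an id such as A would not be mistaken for AA.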
There is still one issue with the code, though. The statement
if obsnum=last then output;
seems to depend on the data. If the id associated with "last" is removed early in the game, the code will no longer output anything. This can be seen if the external data is modified as follows:
data ext;
input id $ tt;
cards;
AA 4
BB 5
CC 3
DD 2
EE 0
;
run;
I wonder if this can be fixed or not.
Once again, thank you for your time.
Jishen
Date: Sun, 1 Mar 2009 23:40:08 -0600
Subject: Re: Search and update an external dataset simultaneously
From: zhangyu05@gmail.com
To: jcz50@hotmail.com
OK, I presume your sample code does what you expected, except that it does not remove the pre-selected id from the external dataset.
So here is a modified version of your code:
data new;
  length selected $150.;
  retain selected ' ';
  set gvn;
  do obsnum=1 to last;
    set ext point=obsnum nobs=last;
    if indexw(selected,id)>0 then continue;
    maxdif=vv-tt;
    if dif<maxdif then
    do;
      dif=maxdif;
      idd=id;
    end;
    if obsnum=last then output;
  end;
  selected=catx(' ',selected,idd);
  keep vv idd tt selected;
run;
Yu
On Sun, Mar 1, 2009 at 11:23 PM, Jishen Zhao <jcz50@hotmail.com> wrote:
Yu,
Thank you so much for your prompt response.
Your code certainly solves my problem as it is presented. The problem is that I failed to present my case accurately in an attempt to simplify my case. In reality, the difference (vv - tt) is a product of more than twenty factors, each of which is a product by itself. This means that I just cannot determine which tt is the best candidate for a vv by sorting.
Thank you again for your time.
Jishen
Date: Sun, 1 Mar 2009 22:57:26 -0600
Subject: Re: Search and update an external dataset simultaneously
From: zhangyu05@gmail.com
To: jcz50@hotmail.com
CC: SAS-L@listserv.uga.edu
Jishen,
I am not sure I understood all of the requirements. However, based on your data and the final dataset you need, I think the following code does what you want.
Sort your external dataset first:
data ext;
input id $ tt;
cards;
AA 4
BB 5
CC 0
DD 2
EE 3
;
run;
proc sort data=ext;
by tt;
run;
data new;
  set gvn;
  if _n_<=last then
  do;
    set ext point=_n_ nobs=last;
    diff=vv-tt;
  end;
  keep vv id tt;
run;
Hth,
Yu
On Sun, Mar 1, 2009 at 10:23 PM, Jishen Zhao <jcz50@hotmail.com> wrote:
Hi. I need to create a dataset that combines a given dataset and the
information from an external dataset. While the new dataset is being
created, the external dataset needs to be updated simultaneously.
A simplified version of these datasets can be presented as follows:
Given dataset:
data gvn;
  do i=7 to 9;
    vv=i;
    output;
    drop i;
  end;
run;
External dataset:
data ext;
input id $ tt;
cards;
AA 4
BB 5
CC 0
DD 2
EE 3
;
run;
The intended new dataset should contain the original three vv's in the
given dataset, each of which is followed by an unduplicated pair of id and
tt from the external dataset that maximizes the difference between the vv
and the tt (vv - tt):
vv id tt
1 CC 0
2 DD 2
3 EE 3
Because the id and the tt cannot be repeated in the output, one record
will be removed from the external dataset every time it has been selected
for the new dataset.
I came up with the following code, but soon realized that it cannot remove
records from the external dataset as expected:
data new;
  set gvn;
  do obsnum=1 to last;
    set ext point=obsnum nobs=last;
    retain dif idd;
    if obsnum=1 then idd=id;
    maxdif=vv-tt;
    if dif<maxdif then
    do;
      dif=maxdif;
      idd=id;
    end;
    if obsnum=last then output;
  end;
  keep vv idd tt;
run;
I would greatly appreciate it if you would provide me with some guidance
in solving this problem.
Jishen