LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2001, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 13 Dec 2001 20:56:08 GMT
Reply-To:     roger_phillips@ONETEL.NET.UK
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         roger_phillips@ONETEL.NET.UK
Organization: ntl Cablemodem News Service
Subject:      Re: Using Key= to perform a table lookup

Having found this thread rather late (3 weeks+ late) I was tempted to just let it go, but since Ian makes such a good point and since some of the responses got so complicated, I felt I just had to put my ore (or is it oar?) in.

Firstly, Ian's assertion that the "default" behaviour seems ridiculous, well, why not? Whenever did the SAS developers start using sensible defaults? I sometimes think that when they do come up with a sensible default, it is more by accident than by design. :-)

Secondly, Key= is a facility which I tend to use quite heavily and it is very powerful, if you understand it's foibles.

Scenario 1 - direct access to a single record (sorry row) in a lookup table. If the data you want to access is more than a single variable (column), (when, of course, you'd use Format) use Key= with the /UNIQUE option. That's what it's for.

Scenario 2 - direct access to multiple records for the same key value for a one to many join and the main file can be guaranteed unique (PROC SORT ... NODUPKEY; should do it.) Use KEY= taking the default behaviour to loop through all records until none are left.

Scenario 3 - now it's getting tricky because we want a many to many join and yes, it happens in my work a lot and it's all down to my users' preferences. "Users are OK, but would you let your daughter marry one?".

Anyway, we must now understand that each SET KEY= statement stands on it's own, thus if I code two identical set key= statements one after the other, they will both return the same keyed record. One statement is not affected by the other. So how do I guarantee I start with the first record of a set and read through to the last, well, quite simply by cheating and it works like this:

Data Fred; /* Where would we be without Fred? */ Set DriverFile; OnFirst = 1; Drop OnFirst; Do until(_error_); * (what's with _iorc_ anyway?); If OnFirst > 0 then /* Force an error to start with */ do; /* Next time around same instruction */ /* will find first record in set */ StoredKey = KeyValue; drop StoredKey; KeyValue = .; * or something not on the keyed file; end;

else KeyValue = StoredKey;

Set KeyedFile Key=KeyValue;

If OnFirst > 0 then do; _error_ = 0; OnFirst = -1; end;

else If OnFirst < 0 then Do; OnFirst = 0; output; End;

else If not _error_ then output; End; _error_ = 0; run;

The important thing is to create a pseudo SET KEY= ... /UNIQUE; statement by setting the key value to something not in the file. (actually anything other than the current key value will work as well). We ignore the data which comes from this iteration and reset the key back to it's proper value and now the first real read will always return the first row from a set and subsequent reads will return additional rows, until all rows with that key value have been read. Simple eh?

So to answer Ian's original question, yes, I can think of a scenario where the "default" behaviour is useful, but it takes a bit of effort to make it work as you want it to.

My most extreme use of this technique required ten keyed files with multiple values and a number of different keys. The master file also potentially had multiple values for each key.

Roger Phillips

On 28-Nov-2001, WHITLOI1@WESTAT.COM (Ian Whitlock) wrote:

> Path: > news6-win.server.ntlworld.com!news1-win.server.ntlworld.com!news5-gui.server.ntli.net!ntli.net!newspeer.clara.net!news.clara.net!news2.euro.net!uunet!ash.uu.net!finch!cronkite!news.uga.edu!not-for-mail > From: WHITLOI1@WESTAT.COM (Ian Whitlock) > Newsgroups: comp.soft-sys.sas > Subject: Re: Using Key= to perform a table lookup > Message-ID: > <08B08C9FA5EBD311A2CC009027D5BF8102E2AB89@remailnt2-re01.westat.com> > Date: 28 Nov 01 20:37:03 GMT > Sender: saslmnt@LISTSERV.UGA.EDU > MIME-Version: 1.0 > Content-Type: text/plain; charset="iso-8859-1" > Reply-To: Ian Whitlock <WHITLOI1@WESTAT.COM> > Lines: 59 > Xref: news5-gui.server.ntli.net comp.soft-sys.sas:76122 > X-Received-Date: Wed, 28 Nov 2001 20:49:36 GMT > (news6-win.server.ntlworld.com) > > Subject: Using Key= to perform a table lookup > Summary: Is the default a 10 year old design bug? > Respondent: IanWhitlock@westat.com > > Alistair Gordon [Alistair_Gordon@STANDARDLIFE.COM] has fallen into a > common SASTrap. He wants to look up FUND+SEDOL combinations from > ALLSPAR multiple times in DEALSEDS. Here is his code. > > data spardate; > set allspar(keep=fund sedol date); > set dealseds key=dealkey; > if _iorc_=%sysrc(_dsenom) then > do; > error=0; > delete; > end; > run; > > There are two mistakes. The simple one, misspelling of _ERROR_, is > his mistake. The other, the failure to use the UNIQUE option of the > SET statement, is more questionable and the one I would label SASTrap. > > As the documentation states, by default, when adjacent key values are > encountered SAS will continue looking for more records with the same > key. In Alistair's case there aren't any and thus his problem. > > Note the funny situation here. If adjacent records have the same key > SAS goes onto to look for a second record with that key. If they are > not adjacent, SAS starts over with the first one. This means the > result of the lookup must depend on the physical order of the find > file, ALLSPAR in this case. Moreover, Suppose Alistair actually wanted > this feature, so that the second record with the key should match the > second record with the key in the lookup file, DEALSEDS. So what is > the second one? Now we find the lookup result also depends on the > physical order of the lookup file. Thus for consistency the order of > both files must be predetermined in order to use this KEY tool. But > the whole idea of direct access is to make free one from the physical > order. > > Add to this that you must have enough records of the same key value in > the lookup file to not run out. This combination of ideas, scared me > to the point that I resolved to never to use the default, i.e. always > use /UNIQUE. Thus, for me, the choice of defaults very poorly chosen. > My question goes further. Has anyone ever found a realistic example > where they dared to use the default in a meaningful manner? If so, I > would like to see it discussed. > > Note that it is not fair to say you use it when it is known that there > cannot be any duplicates. In this case the choice is irrelevant and > have simply chosen not to write /UNIQUE. It is not a case where you > actually need the default behavior. > > I have seen cases where one might consider the default option. Say I > have a unique set of keys to lookup and I want to find all occurrences > of these keys in the lookup file. Even here I would consider it more > stable and therefore more correct code to sequentially number the > original keys and make a new unique composite key. > > Ian Whitlock


Back to: Top of message | Previous page | Main SAS-L page