LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (February 2003, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 7 Feb 2003 11:13:32 -0500
Reply-To:     Ian Whitlock <WHITLOI1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Ian Whitlock <WHITLOI1@WESTAT.COM>
Subject:      Re: Merging datasets ...
Comments: To: "julierog@ix.netcom.com" <julierog@ix.netcom.com>
Content-Type: text/plain; charset="iso-8859-1"

Roger,

Everything you say looks good except the question. Perhaps I missed something.

IanWhitlock@westat.com

-----Original Message----- From: Roger Lustig [mailto:trovato@BELLATLANTIC.NET] Sent: Thursday, February 06, 2003 5:16 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Re: Merging datasets ...

Ian:

You're right: the solutions you give are limited by time. Don't really know whether the data step is faster; since you're not sorting first, you have to do about N*K/2 lookups, which can be a lot. (K is the number of rows in the lookup table.) Or am I missing something?

If you sort both files first, you can then optimize the data step by starting your lookups for a given w record with the lkup record that provided a range match most recently. That way, you do at most 1 read for each lkup record. Depending on how much the I/O costs for one read from the lookup data set, the unsorted approach may be pretty expensive.

After all, log2(500,000) is somewhere short of 19. However expensive each step of a search through a B+ tree may be, it's not 13,000 (=250,000/19ish) times more expensive than a read from the lkup file.

Roger

Ian Whitlock wrote: > Johny, > > Here are two fast to code solutions. > > data w ; input number ; cards ; > 23 > 24 > 435 > 45 > ; > > data lkup ; input id from to ; cards ; > 1 1 3 > 2 4 23 > 3 26 500 > ; > > proc sql ; > create table q as > select * > from w left join lkup > on number between from and to > order by number > ; > quit ; > > data q ; > set w ; > do pt = 1 to nobs until ( flag ) ; > set lkup point = pt nobs = nobs ; > if from <= number <= to then flag = 1 ; > else > if number < from then flag = -1 ; > end ; > if flag = -1 then id = . ; > run ; > > The DATA step will be somewhat faster than the SQL. These solutions are > primarily limited by time. > > If efficiency really matters and the number of records in LKUP are in the > small millions or less, then there will be an array solution which will be > more efficient than the above. > > Roger's solution with a format will also be good but then LKUP should > probably be under 500,000+. The exact limits on the array and format > solutions will depend on how much memory you have. My numbers for > relatively modest PC's. My guess is that these solutions are not worth the > effort unless you are learning, the limits are important, or you have to run > many times. > > IanWhitlock@westat.com > -----Original Message----- > From: Johny Bravo [mailto:manu_killer@POCZTA.ONET.PL] > Sent: Thursday, February 06, 2003 2:25 PM > To: SAS-L@LISTSERV.UGA.EDU > Subject: Merging datasets ... > > > Hi ! > > I have a following problem. > I have two datasets. > First dataset contains numbers, e.g.: > number > 23 > 24 > 435 > 45 > ... > And second dataset that contains intervals, e.g.: > id from to > 1 1 3 > 2 4 23 > 3 26 500 > ... .... .... > > I want to classify all numbers from first dataset > to intervals in the second. I want to have on output > dataset that contains number and an ID of particular > interval, e.g.: > number ID > 23 2 > 24 null > 435 3 > 45 3 > ... ... > ( BTW intervals in second set are separate) > > I am wondering how to do it with data step and maybe "retain". > Maybe If-clause is nessesary, i don't know :( > The main problem is how to join datasets with diffrent number > of observations. > > Thanks in advance for advice, > Best regards, > Johny Bravo ;)


Back to: Top of message | Previous page | Main SAS-L page