```Date: Fri, 7 Feb 2003 11:13:32 -0500 Reply-To: Ian Whitlock Sender: "SAS(r) Discussion" From: Ian Whitlock Subject: Re: Merging datasets ... Comments: To: "julierog@ix.netcom.com" Content-Type: text/plain; charset="iso-8859-1" Roger, Everything you say looks good except the question. Perhaps I missed something. IanWhitlock@westat.com -----Original Message----- From: Roger Lustig [mailto:trovato@BELLATLANTIC.NET] Sent: Thursday, February 06, 2003 5:16 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Re: Merging datasets ... Ian: You're right: the solutions you give are limited by time. Don't really know whether the data step is faster; since you're not sorting first, you have to do about N*K/2 lookups, which can be a lot. (K is the number of rows in the lookup table.) Or am I missing something? If you sort both files first, you can then optimize the data step by starting your lookups for a given w record with the lkup record that provided a range match most recently. That way, you do at most 1 read for each lkup record. Depending on how much the I/O costs for one read from the lookup data set, the unsorted approach may be pretty expensive. After all, log2(500,000) is somewhere short of 19. However expensive each step of a search through a B+ tree may be, it's not 13,000 (=250,000/19ish) times more expensive than a read from the lkup file. Roger Ian Whitlock wrote: > Johny, > > Here are two fast to code solutions. > > data w ; input number ; cards ; > 23 > 24 > 435 > 45 > ; > > data lkup ; input id from to ; cards ; > 1 1 3 > 2 4 23 > 3 26 500 > ; > > proc sql ; > create table q as > select * > from w left join lkup > on number between from and to > order by number > ; > quit ; > > data q ; > set w ; > do pt = 1 to nobs until ( flag ) ; > set lkup point = pt nobs = nobs ; > if from <= number <= to then flag = 1 ; > else > if number < from then flag = -1 ; > end ; > if flag = -1 then id = . ; > run ; > > The DATA step will be somewhat faster than the SQL. These solutions are > primarily limited by time. > > If efficiency really matters and the number of records in LKUP are in the > small millions or less, then there will be an array solution which will be > more efficient than the above. > > Roger's solution with a format will also be good but then LKUP should > probably be under 500,000+. The exact limits on the array and format > solutions will depend on how much memory you have. My numbers for > relatively modest PC's. My guess is that these solutions are not worth the > effort unless you are learning, the limits are important, or you have to run > many times. > > IanWhitlock@westat.com > -----Original Message----- > From: Johny Bravo [mailto:manu_killer@POCZTA.ONET.PL] > Sent: Thursday, February 06, 2003 2:25 PM > To: SAS-L@LISTSERV.UGA.EDU > Subject: Merging datasets ... > > > Hi ! > > I have a following problem. > I have two datasets. > First dataset contains numbers, e.g.: > number > 23 > 24 > 435 > 45 > ... > And second dataset that contains intervals, e.g.: > id from to > 1 1 3 > 2 4 23 > 3 26 500 > ... .... .... > > I want to classify all numbers from first dataset > to intervals in the second. I want to have on output > dataset that contains number and an ID of particular > interval, e.g.: > number ID > 23 2 > 24 null > 435 3 > 45 3 > ... ... > ( BTW intervals in second set are separate) > > I am wondering how to do it with data step and maybe "retain". > Maybe If-clause is nessesary, i don't know :( > The main problem is how to join datasets with diffrent number > of observations. > > Thanks in advance for advice, > Best regards, > Johny Bravo ;) ```

Back to: Top of message | Previous page | Main SAS-L page