LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (March 2001, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 2 Mar 2001 17:27:05 -0000
Reply-To:     sashole@bellsouth.net
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Paul Dorfman <paul_dorfman@HOTMAIL.COM>
Subject:      Re: Match Merge
Comments: To: jimcmi2@USWEST.COM
Content-Type: text/plain; format=flowed

>From: James McMillen <jimcmi2@USWEST.COM>

>I am confident I just do not understand how the SAS merge works. You have not presented any evidence to prove it :-).

>Please help.

Gladly, but you did not tell what you thought was wrong.

>I have two data sets, one with customer phone numbers and zip_codes, >one >of just zip codes. The first has 10.5 million observations, >the second >has l227. I want customers that have zips codes that >match the second >file. There are multiple customers in each zip, >but that zip only shows >on the zip file once.

For this, why even bother with sorts and merge? Try the thingie below and (just for the sake of fun), compare the run-time with that of the merge:

data indterr.btns_covered_mt indterr.btns_notcovered_mt ; array xzip (0:99999) _temporary_; if _n_=1 then do until (zeof); set indterr.Zips_state_sort2 (keep=zip_code) end=zeof; xzip (zip_code) = 1; end; set indterr.custbtns_new_sort; if xzip (zip_code) then output indterr.btns_covered_mt; else output indterr.btns_notcovered_mt; run;

Note that with this, you do not have to spend time sorting either file - it (and any other direct-addressing scheme) does not rely on any kind of input order. But of course if the files are already in order, it does not hurt (nor does it help). I assume that you have 5-digit, numeric zip codes. If they are 5-digit character, use the expression

xzip (input(zip_code,5.))

instead of you understand what. If you have 9-digits zip codes, do not try it at home by expanding the array to [0:999999999] range! Instead, write to SAS-L. The 'wide-range' situation can be dealt with in a way, much more clever than buying 20 Gb of memory :-).

If you need to drag variables other than the key (zip_code) from the small data set into the output, create a corresponding number of temporary arrays having proper types/declared lenghts, parallel to xzip.

Kind regards, ==================== Paul M. Dorfman Jacksonville, Fl ====================

>Here is my code, and the log results: (I only get one match per >zip_code, but there are multiple for each zip) > >838 data indterr.btns_covered_mt indterr.btns_notcovered_mt; >839 btns=0; >840 zips5=0; >841 merge indterr.Zips_state_sort2 (in=zips5) >842 indterr.custbtns_new_sort (in=btns) ; >843 by zip_code ; >844 If btns = 1 and zips5 = 1 then output indterr.btns_covered_mt; >845 else if btns = 1 and zips5 = 0 then output >indterr.btns_notcovered_mt ; > >846 run ; > >NOTE: There were 1227 observations read from the data set >INDTERR.ZIPS_STATE_SORT2. >NOTE: There were 10549178 observations read from the data set >INDTERR.CUSTBTNS_NEW_SORT. >NOTE: The data set INDTERR.BTNS_COVERED_MT has 1121 observations and 4 >variables. >NOTE: The data set INDTERR.BTNS_NOTCOVERED_MT has 10548057 observations >and 4 variables.

_________________________________________________________________ Get your FREE download of MSN Explorer at http://explorer.msn.com


Back to: Top of message | Previous page | Main SAS-L page