LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (March 2010, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 26 Mar 2010 13:56:52 -0400
Reply-To:     "Keintz, H. Mark" <mkeintz@WHARTON.UPENN.EDU>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Keintz, H. Mark" <mkeintz@WHARTON.UPENN.EDU>
Subject:      Re: PROC COMPARE checking unique records irrespective of position
Comments: To: maverick2 <phekuchand@GMAIL.COM>
In-Reply-To:  <0b69305a-46e8-4ac7-b923-413e5776fd7d@t9g2000prh.googlegroups.com>
Content-Type: text/plain; charset="us-ascii"

Maverick:

Many options are available to you.

1. Mark T has already presented an SQL approach.

2. Sorting datasets and then merging (or interleaving them) to find the number of records for each value of the variables found in each datasets. Would take 3 steps.

3. Use hash tables. This is a single data step. Read dataset1 into one ordered hash table keyed on your variable, read in dataset2 into another. Then traverse with a hash iterator noting which keys are in both, in hash 1 only, or hash 2 only. A bit more complicated programming, but it is a single step.

4. But I think this is one of those tasks which initially seems most suitable to SQL or DATA step programming but might be done more efficaciously via one of the many SAS PROCs. Here's your problem submitted to PROC SUMMARY (I still call it by its archaic alias PROC MEANS). It's two steps, as below:

** Step 1: Make a data view, to be run through PROC MEANS. **; data viewboth / view=viewboth; set data1 (in=in1) data2 (in=in2); present1=in1; present2=in2; run;

** Step 2: This proc means will give, for each value of VAR, the sum of the dummy variables PRESENT1 and PRESENT2, i.e. the number of records in each dataset **;

proc means data=viewboth noprint; class var; ways 1; var present1 present2; output out=want (drop=_TYPE_ _FREQ_) sum=NRECS1 NRECS2; run;

For large datasets I suspect that this might be faster than those SQL suggestions which need to actually match records in the two datasts. PROC means, on the other hand will just run from top to bottom of the concatenated datasets, keeping running tabs on record counts in each dataset per class.

Regards, Mark

> -----Original Message----- > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of > maverick2 > Sent: Friday, March 26, 2010 12:46 PM > To: SAS-L@LISTSERV.UGA.EDU > Subject: PROC COMPARE checking unique records irrespective of position > > If I have two datasets, each having one variable and all observations > being unique. only that one has more observations than the other. > > one two > > var1 var2 > ABC ABC > AAA AAA > CCC CCC > ACX XYZ > XYZ ZZZ > ZZZ > > Because of this, if I use PROC COMPARE, the output dataset shows first > 3 values matching. But even though #5 in one is equal to #4 in two > (similarly, #6 in one = #5 in two), the output dataset, from #4 and > beyond shows that these are not matching. Is there any way PROC > COMPARE can compare the values of a variable irrespective of position? > It would be helpful in this case since all observations are unique. > > Is there anyway using PROC COMPARE I can get matching observations


Back to: Top of message | Previous page | Main SAS-L page