LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (May 2010, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Sun, 16 May 2010 09:56:30 -0400
Reply-To:     OR Stats <stats112@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         OR Stats <stats112@GMAIL.COM>
Subject:      Re: Average the Dups
Comments: To: Murphy Choy <goladin@gmail.com>
In-Reply-To:  <AANLkTin4DOJgMRPQhqR4yaZ8S3zWW4c1yXqlWg1tekNj@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

The challenge is it is a very large dataset in which it would be painful to go thru every row to see where the dups occur. We would like the code itself to detect everytime there is a dup row in the data and then average the two rows; otherwise it just leaves the unique row on its own.

pseudo code;

/*my original dataset from outside of SAS called origtable*/

data mynewtable; set origtable; /*if there is a dup by id, take the average of the two rows*/ run;

What is the syntax for the condition above in the datastep? And/or what would be the syntax for the above in PROC SQL. Again, I don't know the row #'s of where the dups are occuring in the large origtable.

Thanks

On Sun, May 16, 2010 at 9:03 AM, Murphy Choy <goladin@gmail.com> wrote:

> Hi, > > I think PROC SQL is the preferred approach. > > Proc sql; > > create table result as select x as x, y as y,mean(x1) as x1,mean(y1) as y1 > from infodata group by x,y; > > quit; > > > > On Sun, May 16, 2010 at 8:57 PM, OR Stats <stats112@gmail.com> wrote: > >> Hello: >> >> I have a dataset that is >> >> id x y >> >> Every once in a while in the dataset there would be two consecutive rows >> with the same unique id. In cases of these duplicates, I would like to >> take >> the average of the two rows. How can I do this in SAS? For example the >> data would like this >> >> 1 x1 y1 >> 2 x2 y2 >> 3 x3a y3a >> 3 x3b y3b >> 4 x4 y4 >> . >> . >> . >> 74 x74a y74a >> 74 x74b y74b >> >> . >> . >> . >> The final table would look like >> 1 x1 y1 >> 2 x2 y2 >> 3 Average(x3a,x3b) Average(y3a,y3b) >> 4 x4 y4 >> . >> . >> . >> 74 Average(x74a,x74b) Average(y74a,y74b) >> >> . >> . >> . >> Can we do this in a single datastep or PROC SQL? Which is faster? >> > > > > -- > Regards, > Murphy Choy > > Certified Advanced Programmer for SAS V9 > Certified Basic Programmer for SAS V9 > DataShaping Certified SAS Professional >


Back to: Top of message | Previous page | Main SAS-L page