LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (January 2010, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 28 Jan 2010 07:44:44 -0800
Reply-To:     mlhoward@avalon.net
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Mary <mlhoward@AVALON.NET>
Subject:      Re: Multiple Observations to one observation
Comments: To: rchaar@GMAIL.COM
Content-Type: text/plain; charset="UTF-8"

pi_girl,

Not off to a good start, but it isn't a bad question. Note you asked last evening when most people had already left for the day in the Eastern and Midwestern US and Canada.

I have attempted to do things like this- what I don't understand is why you have 13 tasks on one record and 20 tasks on another- you need to explain more about what your project is.

I was dealing with genetics data last year, and had a sitution like this

id, datetime, snp1, snp2, snp3 001, 1/5/08, "AA", " ", "BB" 001, 1/6/08, "AB", "BB", " "

In this case you'd want to "roll up" the data. There are a number of approaches to doing this; it sort of depends on how many variables you have. One approach, if you have a maximum of two records per id, is to split the dataset into two datasets; one for the first record and the other for the second, then merge them back together again, such as

proc sql noprint; create table mergedtable as select set1.id, set1.datetime as time1, set1.snp1 as set1snp1, set1.snp2 as set1snp2, set1.snp3 as set1snp3, set2.snp1 as set2snp1, ... from set1 left outer join set2 on set1.id=set2.id order by set1.id; quit;

Then either in the same step or in a subsequent data step you can use the coalesec function to use the first non-missing value, which in SQL would look like this:

coalesce(set1.snp1, set2.snp1) as snp1

****

If you have more than a max of two records per subject, then this gets cumbersome. In that case then I'd split the data by columns and attempt to roll it up, so that you'd wind up with a record like this

001, "AA;AB", "BB", "BB"

This can be done in a data step with a by id statement.

Let us know more about what your data looks like rather than being so vague. I suspect the first approach is better for you, as splitting the data into two sets would allow you to use Proc Compare to check that values are the same before combining them- when I did this with genetics data then I could let the DNA people know which ones didn't match and they could reconcile which one was correct. It really isn't a good idea just to take the first value (i.e., just have SNP1 be AA in the case above); rather there needs to be a reconciliation of why the two are different, and if that's not possible, set the value to missing, not the first value.

Perhaps you could ask questions and clarify from this point and we could improve the tone back to a good point again.

-Mary

--- art297@NETSCAPE.NET wrote:

From: Arthur Tabachneck <art297@NETSCAPE.NET> To: SAS-L@LISTSERV.UGA.EDU Subject: Re: Multiple Observations to one observation Date: Wed, 27 Jan 2010 18:27:09 -0500

pi_girl,

It will be a lot easier for anyone to respond if you provide a data step that shows some sample data, along with a data step that shows the file you want as a result of the code you are asking the list to develop.

Then, you can explain the decision rules in relation to the sample data.

Art -------- On Wed, 27 Jan 2010 13:06:27 -0800, pi_girl <rchaar@GMAIL.COM> wrote:

>I have a bunch of duplicate observations, however I do not want to >throw away any of the data but duplicating them. > >I need help with some SAS code, using retain and the array statement. > >Let's start with this simple case - if there are two records which >each have data for observation one and observation two. For obs 1 >there are 13 tasks and for obs 2 there are 20 tasks. I need to create >temporary arrays for obs1 and obs2 (the values for each task are Y,N, >or N/A). If there is a duplicate record ID, the first case would be >data for obs1, blank for obs2 but blank for obs1 and data for obs2 in >the second record. So one array is an obs temp which takes the data >if it is in obs1 for record 1 and takes the data from record 2 obs 2 >and a temp obsnumb var. Eventually want one record with obs1 and obs2 >data including all tasks and dates for each. > >Another issues with the date variable is I have obs1 month and day in >char format and same with obs2, would like to combine them into two >manageable dates to use for processing, I know I'm going to have to >concatenate somehow... > >I know this issue gets complicated but I would appreciate any code >help since I'm not the greatest SAS guru in the world. Thanks in >advance for your help!


Back to: Top of message | Previous page | Main SAS-L page