LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (March 2008, week 5)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Sat, 29 Mar 2008 06:35:30 -0500
Reply-To:     "data _null_," <datanull@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "data _null_," <datanull@GMAIL.COM>
Subject:      Re: inserting "missing observations"
Comments: To: jfh@stanfordalumni.org
In-Reply-To:  <1206764626.24840.1244906567@webmail.messagingengine.com>
Content-Type: text/plain; charset=ISO-8859-1

You might like to try STDIZE to poke all those missing values with Zeros...

proc stdize missing=0 reponly out=cow_visits_complete; var visits dur kg; run;

On Fri, Mar 28, 2008 at 11:23 PM, Jack Hamilton <jfh@stanfordalumni.org> wrote: > If your input data set contains at least one record for each cow, one > record for each day, and one record for each time, you can use the > COMPLETETYPES option in PROC SUMMARY to generate the missing records, > with missing values for visits, dur, and kg. You could then go through > the data set and convert the missing values to 0. > > Consider this data set; it is your original one with 5 extra visits > added to make sure that there's at least one visit for each hour > (presumably you won't need to do this if you have a large herd of cows > that are hungry all day): > > ===== > data cow_visits; > infile cards; > input day cow time visits dur kg; > cards; > > 1 79 0 5 1534 4.5 > 1 79 4 7 1242 2.4 > 1 79 6 2 247 0.8 > 1 79 7 1 456 2.1 > 1 79 8 6 1419 5.5 > 1 79 9 7 2092 7.4 > 1 79 13 4 858 3.6 > 1 79 14 7 1627 5.5 > 1 79 15 4 1126 3.2 > 1 79 16 2 556 1.3 > 1 79 17 2 298 1.6 > 1 79 19 6 1414 5.8 > 1 79 20 3 932 4.2 > 1 79 21 6 2065 7.4 > 1 2044 0 3 942 1.7 > 1 2044 6 4 410 1.3 > 1 2044 7 4 945 3.2 > 1 2044 10 7 1451 5.4 > 1 2044 11 1 701 2 > 1 2044 13 2 725 2.5 > 1 2044 14 2 902 2.2 > 1 2044 15 3 773 1.4 > 1 2044 18 10 2504 10.5 > 1 2044 19 3 629 2.2 > 1 2044 20 5 1685 5 > 1 2044 22 5 2159 6.4 > 1 2044 23 1 534 1.1 > 1 2044 1 . . . > 1 2044 2 . . . > 1 2044 3 . . . > 1 2044 5 . . . > 1 2044 12 . . . > ;;;; > ===== > NOTE: The data set WORK.COW_VISITS has 32 observations and 6 variables > > > Then you can use PROC SUMMARY to create a complete data set: > > ===== > proc summary data=cow_visits completetypes nway missing; > class day cow time; > var visits dur kg; > output out=cow_visits_complete (drop=_type_) > sum=; > run; > ===== > NOTE: There were 32 observations read from the data set WORK.COW_VISITS. > NOTE: The data set WORK.COW_VISITS_COMPLETE has 48 observations and 7 > variables. > > Notice that there are now 48 observations (1 day * 2 cows * 24 hours). > Here are a few of them: > > ===== > Obs day cow time visits dur kg > > 1 1 79 0 5 1534 4.5 > 2 1 79 1 . . . > 3 1 79 2 . . . > 4 1 79 3 . . . > 5 1 79 4 7 1242 2.4 > 6 1 79 5 . . . > 7 1 79 6 2 247 0.8 > 8 1 79 7 1 456 2.1 > 9 1 79 8 6 1419 5.5 > 10 1 79 9 7 2092 7.4 > 11 1 79 10 . . . > 12 1 79 11 . . . > 13 1 79 12 . . . > ===== > > You now have records for the day * cow * hour combos that were missing > in the original data set. > > Convert the missing values to zero: > > ===== > data cow_visits_complete; > set cow_visits_complete; > array setzero visits dur kg; > do over setzero; > if setzero = . then > setzero = 0; > end; > run; > ===== > > and you get > > ===== > Obs day cow time visits dur kg > > 1 1 79 0 5 1534 4.5 > 2 1 79 1 0 0 0.0 > 3 1 79 2 0 0 0.0 > 4 1 79 3 0 0 0.0 > 5 1 79 4 7 1242 2.4 > 6 1 79 5 0 0 0.0 > 7 1 79 6 2 247 0.8 > 8 1 79 7 1 456 2.1 > 9 1 79 8 6 1419 5.5 > 10 1 79 9 7 2092 7.4 > 11 1 79 10 0 0 0.0 > 12 1 79 11 0 0 0.0 > 13 1 79 12 0 0 0.0 > ===== > > You can instead create a custom format and use PRELOADFMT to make sure > that all the hours are included in the output. Omit the extra empty > observations and use: > > ===== > data hourfmt; > retain fmtname 'allhours'; > do start = 0 to 23; > label = start; > output; > end; > run; > > proc format cntlin=hourfmt; > run; > > proc summary data=cow_visits completetypes nway missing; > class day cow; > class time / preloadfmt; > format time allhours.; > var visits dur kg; > output out=cow_visits_complete (drop=_type_ _freq_) > sum=; > run; > > data cow_visits_complete; > set cow_visits_complete; > format time; > array setzero visits dur kg; > do over setzero; > if setzero = . then > setzero = 0; > end; > run; > ===== > > You also might be able to do this with PROC EXPAND if you have SAS/ETS, > but you will have to ask a ETS user. A browse through the documentation > makes me think that you might have to do a lot of data conversion to > make that work - but you might be able to work directly with your > original data file (which you didn't show) instead of having to create > an intermediate summary file. > > > > > > On Fri, 28 Mar 2008 16:20:59 -0700, nchapinal@YAHOO.COM said: > > It was hard to give a name to my problem this time... > > > > I work with an electronic feeding system for cows that record every > > time a cow visits one of the feeders, so at the end of the day I get > > an excel file with all the cows that went to eat, and all the visits > > the made. I have as many observations as visits, and for each visit I > > get the duration and how much feed it was consumed. > > I want to study diurnal patterns. Therefore, I usually summarize by > > hour, and I sum all durations and intakes of the visits that started > > each hour for each cow. Therefore, I get a file like this. > > > > > > day cow time visits dur kg > > 1 79 0 5 1534 4.5 > > 1 79 4 7 1242 2.4 > > 1 79 6 2 247 0.8 > > 1 79 7 1 456 2.1 > > 1 79 8 6 1419 5.5 > > 1 79 9 7 2092 7.4 > > 1 79 13 4 858 3.6 > > 1 79 14 7 1627 5.5 > > 1 79 15 4 1126 3.2 > > 1 79 16 2 556 1.3 > > 1 79 17 2 298 1.6 > > 1 79 19 6 1414 5.8 > > 1 79 20 3 932 4.2 > > 1 79 21 6 2065 7.4 > > 1 2044 0 3 942 1.7 > > 1 2044 6 4 410 1.3 > > 1 2044 7 4 945 3.2 > > 1 2044 10 7 1451 5.4 > > 1 2044 11 1 701 2 > > 1 2044 13 2 725 2.5 > > 1 2044 14 2 902 2.2 > > 1 2044 15 3 773 1.4 > > 1 2044 18 10 2504 10.5 > > 1 2044 19 3 629 2.2 > > 1 2044 20 5 1685 5 > > 1 2044 22 5 2159 6.4 > > 1 2044 23 1 534 1.1 > > > > where time refers to the 24-h clock. > > > > The problem is that the every cow do not visit the feeder at every our > > everyday. Then, when I try to get the average per hour and day across > > all the cows... it doesn't work, of course. I would need a way to > > insert the missing times for each cow and day, so that it say 0 in the > > number of visits per hour, duration spent eating per hour and intake > > per hour.... > > So far, what I have been doing is creating a list that contains for > > each cow, all the days that she was in the trial, and for each day, > > the a column with the time from 0 to 23. When I merge that by cow and > > day, it works... but creating this file is a lot of work! there must > > be another way quicker.... > > > > I hope I made myself understood.... > > > > Thanks in advance! > > > -- > Jack Hamilton > Sacramento, California > jfh@alumni.stanford.org >


Back to: Top of message | Previous page | Main SAS-L page