Date: Sat, 29 Mar 2008 06:35:30 -0500
Reply-To: "data _null_," <datanull@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "data _null_," <datanull@GMAIL.COM>
Subject: Re: inserting "missing observations"
In-Reply-To: <1206764626.24840.1244906567@webmail.messagingengine.com>
Content-Type: text/plain; charset=ISO-8859-1
You might like to try STDIZE to poke all those missing values with Zeros...
proc stdize missing=0 reponly out=cow_visits_complete;
var visits dur kg;
run;
On Fri, Mar 28, 2008 at 11:23 PM, Jack Hamilton <jfh@stanfordalumni.org> wrote:
> If your input data set contains at least one record for each cow, one
> record for each day, and one record for each time, you can use the
> COMPLETETYPES option in PROC SUMMARY to generate the missing records,
> with missing values for visits, dur, and kg. You could then go through
> the data set and convert the missing values to 0.
>
> Consider this data set; it is your original one with 5 extra visits
> added to make sure that there's at least one visit for each hour
> (presumably you won't need to do this if you have a large herd of cows
> that are hungry all day):
>
> =====
> data cow_visits;
> infile cards;
> input day cow time visits dur kg;
> cards;
>
> 1 79 0 5 1534 4.5
> 1 79 4 7 1242 2.4
> 1 79 6 2 247 0.8
> 1 79 7 1 456 2.1
> 1 79 8 6 1419 5.5
> 1 79 9 7 2092 7.4
> 1 79 13 4 858 3.6
> 1 79 14 7 1627 5.5
> 1 79 15 4 1126 3.2
> 1 79 16 2 556 1.3
> 1 79 17 2 298 1.6
> 1 79 19 6 1414 5.8
> 1 79 20 3 932 4.2
> 1 79 21 6 2065 7.4
> 1 2044 0 3 942 1.7
> 1 2044 6 4 410 1.3
> 1 2044 7 4 945 3.2
> 1 2044 10 7 1451 5.4
> 1 2044 11 1 701 2
> 1 2044 13 2 725 2.5
> 1 2044 14 2 902 2.2
> 1 2044 15 3 773 1.4
> 1 2044 18 10 2504 10.5
> 1 2044 19 3 629 2.2
> 1 2044 20 5 1685 5
> 1 2044 22 5 2159 6.4
> 1 2044 23 1 534 1.1
> 1 2044 1 . . .
> 1 2044 2 . . .
> 1 2044 3 . . .
> 1 2044 5 . . .
> 1 2044 12 . . .
> ;;;;
> =====
> NOTE: The data set WORK.COW_VISITS has 32 observations and 6 variables
>
>
> Then you can use PROC SUMMARY to create a complete data set:
>
> =====
> proc summary data=cow_visits completetypes nway missing;
> class day cow time;
> var visits dur kg;
> output out=cow_visits_complete (drop=_type_)
> sum=;
> run;
> =====
> NOTE: There were 32 observations read from the data set WORK.COW_VISITS.
> NOTE: The data set WORK.COW_VISITS_COMPLETE has 48 observations and 7
> variables.
>
> Notice that there are now 48 observations (1 day * 2 cows * 24 hours).
> Here are a few of them:
>
> =====
> Obs day cow time visits dur kg
>
> 1 1 79 0 5 1534 4.5
> 2 1 79 1 . . .
> 3 1 79 2 . . .
> 4 1 79 3 . . .
> 5 1 79 4 7 1242 2.4
> 6 1 79 5 . . .
> 7 1 79 6 2 247 0.8
> 8 1 79 7 1 456 2.1
> 9 1 79 8 6 1419 5.5
> 10 1 79 9 7 2092 7.4
> 11 1 79 10 . . .
> 12 1 79 11 . . .
> 13 1 79 12 . . .
> =====
>
> You now have records for the day * cow * hour combos that were missing
> in the original data set.
>
> Convert the missing values to zero:
>
> =====
> data cow_visits_complete;
> set cow_visits_complete;
> array setzero visits dur kg;
> do over setzero;
> if setzero = . then
> setzero = 0;
> end;
> run;
> =====
>
> and you get
>
> =====
> Obs day cow time visits dur kg
>
> 1 1 79 0 5 1534 4.5
> 2 1 79 1 0 0 0.0
> 3 1 79 2 0 0 0.0
> 4 1 79 3 0 0 0.0
> 5 1 79 4 7 1242 2.4
> 6 1 79 5 0 0 0.0
> 7 1 79 6 2 247 0.8
> 8 1 79 7 1 456 2.1
> 9 1 79 8 6 1419 5.5
> 10 1 79 9 7 2092 7.4
> 11 1 79 10 0 0 0.0
> 12 1 79 11 0 0 0.0
> 13 1 79 12 0 0 0.0
> =====
>
> You can instead create a custom format and use PRELOADFMT to make sure
> that all the hours are included in the output. Omit the extra empty
> observations and use:
>
> =====
> data hourfmt;
> retain fmtname 'allhours';
> do start = 0 to 23;
> label = start;
> output;
> end;
> run;
>
> proc format cntlin=hourfmt;
> run;
>
> proc summary data=cow_visits completetypes nway missing;
> class day cow;
> class time / preloadfmt;
> format time allhours.;
> var visits dur kg;
> output out=cow_visits_complete (drop=_type_ _freq_)
> sum=;
> run;
>
> data cow_visits_complete;
> set cow_visits_complete;
> format time;
> array setzero visits dur kg;
> do over setzero;
> if setzero = . then
> setzero = 0;
> end;
> run;
> =====
>
> You also might be able to do this with PROC EXPAND if you have SAS/ETS,
> but you will have to ask a ETS user. A browse through the documentation
> makes me think that you might have to do a lot of data conversion to
> make that work - but you might be able to work directly with your
> original data file (which you didn't show) instead of having to create
> an intermediate summary file.
>
>
>
>
>
> On Fri, 28 Mar 2008 16:20:59 -0700, nchapinal@YAHOO.COM said:
> > It was hard to give a name to my problem this time...
> >
> > I work with an electronic feeding system for cows that record every
> > time a cow visits one of the feeders, so at the end of the day I get
> > an excel file with all the cows that went to eat, and all the visits
> > the made. I have as many observations as visits, and for each visit I
> > get the duration and how much feed it was consumed.
> > I want to study diurnal patterns. Therefore, I usually summarize by
> > hour, and I sum all durations and intakes of the visits that started
> > each hour for each cow. Therefore, I get a file like this.
> >
> >
> > day cow time visits dur kg
> > 1 79 0 5 1534 4.5
> > 1 79 4 7 1242 2.4
> > 1 79 6 2 247 0.8
> > 1 79 7 1 456 2.1
> > 1 79 8 6 1419 5.5
> > 1 79 9 7 2092 7.4
> > 1 79 13 4 858 3.6
> > 1 79 14 7 1627 5.5
> > 1 79 15 4 1126 3.2
> > 1 79 16 2 556 1.3
> > 1 79 17 2 298 1.6
> > 1 79 19 6 1414 5.8
> > 1 79 20 3 932 4.2
> > 1 79 21 6 2065 7.4
> > 1 2044 0 3 942 1.7
> > 1 2044 6 4 410 1.3
> > 1 2044 7 4 945 3.2
> > 1 2044 10 7 1451 5.4
> > 1 2044 11 1 701 2
> > 1 2044 13 2 725 2.5
> > 1 2044 14 2 902 2.2
> > 1 2044 15 3 773 1.4
> > 1 2044 18 10 2504 10.5
> > 1 2044 19 3 629 2.2
> > 1 2044 20 5 1685 5
> > 1 2044 22 5 2159 6.4
> > 1 2044 23 1 534 1.1
> >
> > where time refers to the 24-h clock.
> >
> > The problem is that the every cow do not visit the feeder at every our
> > everyday. Then, when I try to get the average per hour and day across
> > all the cows... it doesn't work, of course. I would need a way to
> > insert the missing times for each cow and day, so that it say 0 in the
> > number of visits per hour, duration spent eating per hour and intake
> > per hour....
> > So far, what I have been doing is creating a list that contains for
> > each cow, all the days that she was in the trial, and for each day,
> > the a column with the time from 0 to 23. When I merge that by cow and
> > day, it works... but creating this file is a lot of work! there must
> > be another way quicker....
> >
> > I hope I made myself understood....
> >
> > Thanks in advance!
>
>
> --
> Jack Hamilton
> Sacramento, California
> jfh@alumni.stanford.org
>
|