Date: Sun, 4 Sep 2005 14:24:23 -0400
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject: Re: Combining groups of observations with sum
There are two techniques which come to mind.
First load your data:
data have;
infile whatever;
input ijnam $ 1-8
ijesno
cid
iacct $
date :mmddyy8.
time : hhmmss8.
ipages
ochk
;
idtm = dhms(date,0,0,time); drop date time;
format idtm datetime.;
run;
Now do a DoW-powered self-interleave:
data filled_in (drop = fi_:);
do until (last.cid);
set have(in=firstpass) have;
by cid;
if firstpass then do;
if iacct ne : '980' then fi_iacct = iacct;
end;
else do;
if iacct eq : '980' and not missing(fi_iacct)
then iacct = fi_iacct;
output;
end;
end;
run;
The first pass gathers the information needed to fill in or make
substitutions. The second pass makes the assignments.
You'll notice I've not handled all of the issues in the DATA step. That's
because I think some of them can be handled by the ID statement in PROC
SUMMARY without need for preprocessing.
The final step:
proc summary data=filled_in nway;
class cid ochk;
output out=sums(drop = _type_ _freq_) sum(ipages)=;
id ijnam ijesno iacct idtm;
run;
Result:
Obs cid ochk ijnam ijesno iacct idtm ipages
1 279 622 SIDJ822P 29861 427LR200 30AUG05:14:11:49 412
2 280 622 SIDJ821P 29857 427LR200 30AUG05:14:17:57 797
3 285 622 PSDPNOTI 29275 475DS020 30AUG05:15:12:52 331
4 292 622 ZYNAPPNP 29068 460G5008 30AUG05:15:22:21 237
5 315 622 FPIPREPT 4216 98020240 30AUG05:17:05:48 149
On Thu, 1 Sep 2005 16:52:22 -0400, McWhorter, Keith <Kmcwhort@GTA.GA.GOV>
wrote:
>TMK -
>
>Thanks for your reply! Seems you understand what I was trying to
>explain!
> ;-)
>There should never be more than one non-980 value in the account field.
>What's happening is the batch job's account is there for the printing of
>the actually report or listing from that job. The 980-account is an
>overhead account. Jobs that have only a 980 are internal. The cover
>sheets and/or separator pages are getting associated with the overhead
>account, even though it's in the same CID as the job. It's just a couple
>of sheets/job so I'd just as soon throw it into the JOB's account
>number. The date/time (idtm) - let's just say the highest value.
>
>I had sorted (for this listing) by cid, ijnam, ijesno. But of course it
>can be sorted as needed.
>
>It did cross my mind that SQL might could be used. (I just took the SAS
>SQL class a few weeks ago.)
>
>There is just one more thing that I hesitated to include earlier. I
>thought I could figure it out after getting over this first hurdle. But
>since you understand what I'm doing so far... The final column (which
>may appear wrapped to the next line) OCHK - it will have a value of
>either 622 or 621 (621 meaning it was a check that was printed.) A job
>can have a mix of plain forms (such as the separator pages) and checks.
>So within a CID there can be 622s and 621s. If both are present then the
>CID would result in TWO observations: 1 reflecting the total pages
>printed on non-check forms and another showing pages printed on checks.
>
>That's all the "rules" I promise!
>
>Thanks again!
>keith
>
>-----Original Message-----
>From: Talbot Michael Katz [mailto:topkatz@MSN.COM]
>Sent: Thursday, September 01, 2005 3:37 PM
>To: SAS-L@LISTSERV.UGA.EDU; McWhorter, Keith
>Cc: Talbot Michael Katz
>Subject: Re: Combining groups of observations with sum
>
>Hi, Keith.
>
>It looks to me like it will be pretty straightforward to do what you
>want
>with PROC SQL, or with a SORT followed by a data step, but you still
>have
>some decisions to pin down. If I understand correctly, you're going to
>want one record per CID, with the sum of ipages. That's a piece of cake
>so far. But you have five other fields that possibly can take on
>multiple
>values for each CID, so you have to specify some rules for the values.
>So
>far we know that if there are two values of iacct, and one of them
>begins
>with 980, then use the other one, but if there is only a 980 value, use
>it. What happens if there are two (or more) non-980 values of iacct?
>Is
>the idtm value important? In your example for CID 315 you chose
>17:05:41
>as the idtm value. That's on the last 315 record, but it also occurs on
>some of the other 315 records, and there are some different values, too.
>Do you always want the value on the last record? If so, presumably the
>records are supposed to be sorted in some fashion, but it's not obvious
>to
>me that that is true.
>
>Once you come up with a set of rules that covers all the cases, or at
>least all the important cases, then we can help you code it. If we give
>you code before you tell us the rules, then it may do something contrary
>to what you want.
>
>-- TMK --
>"The Macro Klutz"
>
>
>On Thu, 1 Sep 2005 14:50:10 -0400, McWhorter, Keith
><Kmcwhort@GTA.GA.GOV>
>wrote:
>
>>Hey good people,
>>
>>
>>
>>I have some data that comes from our Xerox printers. We FTP it to the
>>mainframe then I am writing a program to process and format it for
>>billing purposes. (I got past a really big part in that the file has
>>about 200 vars, is semi-colon delimited with many missing values!) I've
>>kept the important vars and printed a sample list below.
>>
>>
>>
>>The "cid" is container ID. All records with the same CID belong
>>together. I want to end up with one observation for each CID with the
>>number of pages printed (ipages) added up. Example: the first two
>lines
>>below would result in one line with "ipages" having a value of 412.
>>Note that the account number (iacct) is also different. When it comes
>>down to deciding between the two account numbers, I want the one that
>>does not begin with 980.
>>
>>
>>
>>It gets worse: Look at CID 315. The jobname (ijnam) and ijesno are
>>missing. But I know that they are part of CID 315, so they (or at least
>>the resulting output record) should be populated with the job name that
>>is present in the last two observations. (I don't think I care which
>>time value is used.) CID 315 should look like this after processing:
>>
>>
>>
>>FPIPREPT 4216 315 98020240 08/30/05 17:05:41 149
>>622
>>
>>
>>
>>It had only the one account number across all obs so it's ok that it
>>begins with 980.
>>
>>
>>
>>
>>
>>
>>
>>ijnam ijesno cid iacct idtm ipages
>>ochk
>>
>>
>>
>>
>>SIDJ822P 29861 279 98004ACX 08/30/05 14:11:49 1
>>622
>>
>>SIDJ822P 29861 279 427LR200 08/30/05 14:11:27 411
>>622
>>
>>SIDJ821P 29857 280 98004ACX 08/30/05 14:17:57 1
>>622
>>
>>SIDJ821P 29857 280 427LR200 08/30/05 14:17:19 796
>>622
>>
>>PSDPNOTI 29275 285 475DS020 08/30/05 15:12:52 325
>>622
>>
>>PSDPNOTI 29275 285 98004ACX 08/30/05 14:31:05 1
>>622
>>
>>PSDPNOTI 29275 285 475DS020 08/30/05 14:30:55 5
>>622
>>
>>ZYNAPPNP 29068 292 460G5008 08/30/05 15:22:21 237
>>622
>>
>> . 315 98020240 08/30/05 17:05:46 16
>>622
>>
>> . 315 98020240 08/30/05 17:05:46 1
>>622
>>
>> . 315 98020240 08/30/05 17:05:44 63
>>622
>>
>> . 315 98020240 08/30/05 17:05:43 1
>>622
>>
>> . 315 98020240 08/30/05 17:05:41 63
>>622
>>
>> . 315 98020240 08/30/05 17:05:41 1
>>622
>>
>>FPIPREPT 4216 315 98020240 08/30/05 17:05:48 2
>>622
>>
>>FPIPREPT 4216 315 98020240 08/30/05 17:05:41 2
>>622
>>
>>
>>
>>Is this do-able? ;-) Any suggestions would be greatly appreciated!!!
>>
>>
>>
>>Thanks,
>>
>>
>>
>>
>>
>>Keith McWhorter
>>
>>IBM Support Unit Manager
>>
>>Georgia Technology Authority
>>
>>404-656-9068
>>
>>
>>
>>"Be who you are and say what you feel because those who mind don't
>>matter and those who matter don't mind." - Dr. Suess
>>
>>
|