Date: Thu, 10 Jun 2004 14:38:49 -0500
Reply-To: "Dunn, Toby" <tdunn@TEA.STATE.TX.US>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Dunn, Toby" <tdunn@TEA.STATE.TX.US>
Subject: Re: A BIIIIIIIIIG SAS Challenge !! I'm helpless! :((
Content-Type: text/plain; charset="us-ascii"
Yes, just one more example of why Ian is one of (at the head of the
list) the people that I look up to and aspire to become as programmers.
Toby Dunn
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Rob
Rohrbough
Sent: Thursday, June 10, 2004 2:36 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: A BIIIIIIIIIG SAS Challenge !! I'm helpless! :((
Seems as if I have hardly enough time to do my own work, but I have to
comment on this. I don't want to compare the father of the DOW loop to
the
Great Communicator, but this parallel is just too vivid to overlook.
While
some of us, quite rightly, point out that the purpose of the "L" is not
to
do each others jobs for them, here comes an approach from The Master
which
is both kind and insightful. That's what former President Reagan was -
even
to his political and world-wide foes.
Bravo, Ian!
Rob Rohrbough
> -----Original Message-----
> From: Ian Whitlock [mailto:iw1junk@COMCAST.NET]
> Sent: Thursday, June 10, 2004 2:04 PM
> Subject: Re: A BIIIIIIIIIG SAS Challenge !! I'm helpless! :((
>
>
> Eric <eoyount@REMOVETHIS.HOTMAIL.COM> responded to
> the Easwara Moorthy <easwar_ps@YAHOO.COM> question below
> in part with
>
> This newsgroup is here to help SAS programmers with their
> problems, teach new techniques, and discuss ideas, not do
> people's jobs for them.
>
> There is a fine line between teaching somebody the tools needed
> to solve a problem and doing someone's job. Easwara made it hard
> because little simplification took place over the business specs.
> In fact I am not sure exactly what is wanted. Instead I am
> guided by what might be a plausible interpretation, albeit with
> some quirks and warts ignored. After this simplification the
> problem is no harder than many discussed on SAS-L so I will proceed.
>
> It is really a simple merge and count type problem, made a little
> more difficult by the fact that the DOMAIN variable in the main
> dataset indicates which files are to be counted. This means
> either splitting the main table according to DOMAIN and
> processing each part separately, or combining the auxiliary
> tables and adding a DOMAIN variable to indicate the source. I
> chose the first method since that is what I thought of first.
> But the second method would avoid macro at the expense of more IF
> and ARRAY code.
>
> First let's set up some test data.
>
> /* make data */
> data Track ;
> input byDate :ddmmyy10. fini :time8. sum :time8.
> domain $ ;
> cards ;
> 01/06/2004 3:44:30 1:36:39 FAB
> 02/06/2004 3:10:12 1:03:30 FAB
> 03/06/2004 4:31:48 2:23:37 ORD
> 04/06/2004 3:55:11 1:46:53 FAB
> 05/06/2004 3:58:06 1:48:23 FAB
> 06/06/2004 3:45:49 1:37:44 ADV
> 07/06/2004 3:20:46 1:13:26 FAB
> 08/06/2004 3:56:29 1:48:15 FAB
> 09/06/2004 3:50:05 1:41:57 FAB
> 10/06/2004 3:50:05 1:41:57 FAB
> 01/05/2004 2:53:50 0:46:20 COMMON
> 02/05/2004 2:37:12 0:34:16 FAB
> ;
>
> proc sort data = tract ;
> by bydate ;
> run ;
>
> data FabRejFunc ;
> input SNOrder :$16. byDate :ddmmyy10. IsCorrected toDelete ;
> cards ;
> 1504051801003001 10/06/2004 0 0
> 1504060702164001 10/06/2004 1 0
> 9604060200330001 09/06/2004 0 1
> 9604060200330001 09/06/2004 0 0
> 9604060200330001 09/06/2004 0 1
> 9604060200330001 09/06/2004 1 1
> 9604060400048001 09/06/2004 1 0
> 9604060800317001 09/06/2004 0 1
> 9604060700074004 08/06/2004 0 0
> 9604060800009002 08/06/2004 0 1
> ;
>
> proc sort data = fabrejfunc ;
> by bydate ;
> run ;
>
> data FabRejDup ;
> input SNOrder :$16. byDate :ddmmyy10. ;
> cards ;
> 1503121200733001 10/06/2004
> 1503121200733001 10/06/2004
> 1503121601325001 10/06/2004
> 1503121700045001 10/06/2004
> 1503121700061001 09/06/2004
> 1503121700114001 09/06/2004
> 1503121700125003 08/06/2004
> 1503121700125003 08/06/2004
> ;
>
> proc sort data = FabRejDup ;
> by bydate ;
> run ;
>
> I changed the auxiliary tables from ORD to FAB in order
> to obtain some non-zero counts.
>
> The split is easy but probably over simplified because
> I don't know enough about the data.
>
> data fab ord adv ;
> set track ;
> select ( domain ) ;
> when ( "FAB" ) output fab ;
> when ( "ORD" ) output ord ;
> when ( "ADV" ) output adv ;
> otherwise /* ??? */ ;
> end ;
> run ;
>
> Now we will have to repeat the merges for each legitimate value
> of DOMAIN. This cries out for a simple macro with one parameter
> giving the current domain value. Here is the macro.
>
> %macro getcnt ( dom = fab ) ;
> %let dom = %upcase(&dom) ;
>
> data &dom.func ( drop = iscorrected todelete ) ;
> merge &dom ( in = track )
> &dom.rejfunc ( in = c drop = snorder )
> ;
> by bydate ;
> if track ;
> put bydate= ;
>
> if first.bydate then
> do ;
> count = 0 ;
> corrcnt = 0 ;
> delcnt = 0 ;
> end ;
>
> count + c ;
> corrcnt + iscorrected ;
> delcnt + todelete ;
>
> if last.bydate ;
> run ;
>
> data &dom.all ;
> merge &dom.func ( in = track )
> &dom.rejdup ( in = c drop = snorder )
> ;
> by bydate ;
> if track ;
>
> if first.bydate then dupcnt = 0 ;
> dupcnt + c ;
> if last.bydate ;
> run ;
>
> %mend getcnt ;
>
> Finally one has to call the macro for each appropriate value.
> For example, here is the call for FAB.
>
> %getcnt ( dom = fab ) ;
>
> When all the calls are made then a final concatenation and sort
> is needed to make the report.
>
> Looking back, Eric has a point. It does look a lot like applying
> the proper analysis and writing the code for a job. On the other
> hand, there is probably plenty work to fill in the missing parts
> and insure that my interpretation is a good one.
>
> It probably would be worth while and interesting for others to
> comment on Eric's point. However, in the long run, if one posses
> a question missing too much of the analysis, and presented in too
> raw a business form then the answers are liable to be useless or
> nonexistent. So it is a self-correcting situation for SAS-L.
>
> Ian_Whitlock@comcast.net
> =======================
>
> Date: Thu, 10 Jun 2004 01:44:58 -0700
> Reply-To: Easwara Moorthy <easwar_ps@YAHOO.COM>
> From: Easwara Moorthy <easwar_ps@YAHOO.COM>
> Organization: http://groups.google.com
> Subject: A BIIIIIIIIIG SAS Challenge !! I'm helpless! :((
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi All..
>
> I have one 'Track' table which gives the date/hour/duration/domain of
> the warehouse loading. and 6 rejects tables, 2 each for ADV,FAB,ORD
> Domain and each domain has functioncally rejected records(in
> XXXX_Functional dataset) and records rejected due to duplicate keys(in
> XXX_reject_duplicate dataset).
>
> Take a look at these Tables.
>
> Track table
> -----------
> byDate fini sum domain
> 01/06/2004 3:44:30 1:36:39 FAB
> 02/06/2004 3:10:12 1:03:30 FAB
> 03/06/2004 4:31:48 2:23:37 ORD
> 04/06/2004 3:55:11 1:46:53 FAB
> 05/06/2004 3:58:06 1:48:23 FAB
> 06/06/2004 3:45:49 1:37:44 ADV
> 07/06/2004 3:20:46 1:13:26 FAB
> 08/06/2004 3:56:29 1:48:15 FAB
> 09/06/2004 3:50:05 1:41:57 FAB
> 10/06/2004 3:50:05 1:41:57 FAB
> 01/05/2004 2:53:50 0:46:20 COMMON
> 02/05/2004 2:37:12 0:34:16 FAB
>
> Ord_Reject_functional Table :
> -------------------
> SN Order byDate IsCorrected toDelete
>
> 1504051801003001 10/06/2004 0 0
> 1504060702164001 10/06/2004 1 0
> 9604060200330001 09/06/2004 0 1
> 9604060200330001 09/06/2004 0 0
> 9604060200330001 09/06/2004 0 1
> 9604060200330001 09/06/2004 1 1
> 9604060400048001 09/06/2004 1 0
> 9604060800317001 09/06/2004 0 1
> 9604060700074004 08/06/2004 0 0
> 9604060800009002 08/06/2004 0 1
>
> Ord_reject_Duplicate
> --------------------
> SN_ORDER byDate
> 1503121200733001 10/06/2004
> 1503121200733001 10/06/2004
> 1503121601325001 10/06/2004
> 1503121700045001 10/06/2004
> 1503121700061001 09/06/2004
> 1503121700114001 09/06/2004
> 1503121700125003 08/06/2004
> 1503121700125003 08/06/2004
>
> (Similar to this , the tables for other domains FAB and ADV are
> available).
>
> I need the following columns in the 'Track' table.
>
> In addition to byDate , fini, sum and domain, I need
>
> Adv_rej_Func_count,Adv_Func_isCorrected,Adv_Func_toDel,Adv_dup_counts
> Ord_rej_Func_count,Ord_Func_isCorrected,Ord_Func_toDel,Ord_dup_counts
> Fab_rej_Func_count,Fab_Func_isCorrected,Fab_Func_toDel,Fab_dup_counts
> Adv_rej_Fonc_ord,Fab_rej_Fonc_ord,Ord_rej_Fonc_ord.
>
> where
>
> XXX_Func_count is count(*) of XXX_Functional_table;
> XXX_dup_count is count(*) of XXX_duplicate_table;
> XXX_XXX_ord is the no. of Orders rejected on that date.
>
> for eg.
> SN_Order
> --------
> 1504051801003001
> 1504060702164001
> 9604060200330001
> 9604060200330001
> 9604060200330001
> - the record count is 5, but order count is 3.(3 similar orders aree
> present).
>
> I need a final dataset which has all columns in Track tables and
> Ord_rej_Func_count,Ord_Func_isCorrected,Ord_Func_toDel,Ord_dup_counts
> Adv_rej_Func_count,Adv_Func_isCorrected,Adv_Func_toDel,Adv_dup_counts
> Fab_rej_Func_count,Fab_Func_isCorrected,Fab_Func_toDel,Fab_dup_counts
>
> Adv_rej_Fonc_ord,Fab_rej_Fonc_ord,Ord_rej_Fonc_ord columns
>
> for every matching date in the track and the respective reject table.
>
>
> Desired Dataset - based on Ord tables given and the column order:
>
>
> byDate**fini**sum**domain**Ord_rej_Func_count**Ord_isCorr**Ord_toD
> e*ord_dup_cnt
> ------------------------------------------------------------------
> ------------
> 01/06/2004 3:44:30 1:36:39 . . ...... (since no record for
> 1st june)
> .....
> .....
> 08/06/2004 3:56:29 1:48:15 FAB 2 0 1 2........
> 09/06/2004 3:50:05 1:41:57 FAB 6 2 4 2........
> 10/06/2004 3:50:05 1:41:57 FAB 10 1 0 4........
>
>
> I hope I've given all necessary information...I'm also working on
> it..but finding difficult in designing an approach to get the desired
> dataset. :((
>
> Please Help me...this task plays a major role in my career! :((
>
> Thanks a Million in advance!
>
> Eashwar!
>
|