Date: Fri, 24 Aug 2007 05:33:00 -0700
Reply-To: Hari <excel_hari@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Hari <excel_hari@YAHOO.COM>
Organization: http://groups.google.com
Subject: Re: Attribute and transactional level data as single record
In-Reply-To: <200708222158.l7MJOleI015787@mailgw.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"
HIC LP for HUT is from May. Right now all the MVCHUT/HIC remain
unclassified
On Aug 23, 2:58 am, nos...@HOWLES.COM ("Howard Schreier <hs AT dc-sug
DOT org>") wrote:
> On Tue, 21 Aug 2007 00:58:12 -0700, Hari <excel_h...@YAHOO.COM> wrote:
>
> [snip]
>
>
>
>
>
> >Howard,
>
> >Im pasting an example with smaller number of sample values, hope this
> >doesnt wrap
>
> >Data Trial1;
> > length ID $8 Name $20 JoinDate 8 TransactType1_ProdBought $200
> >TransactType2_OperatingSysUsed $ 200
> > LastInteractionDate 8;
> > infile datalines delimiter='#' dsd;
> > input ID $ Name $ JoinDate $ TransactType1_ProdBought $
> > TransactType2_OperatingSysUsed $ LastInteractionDate
> >$;
> > informat JoinDate LastInteractionDate date9.;
> > format JoinDate LastInteractionDate date9.;
> > Datalines;
> >223#Foo##prod1|prod99|prod3#Windows XP|Mac#18-Jun-06
> >56#Weq#30-Jun-02#prod97|prod2#Windows Vista#10-Apr-04
> >786#Rte#12-Sep-04#prod1|prod2|prod4|prod4##18-Aug-05
> >;
> >run;
>
> >My goal is to parse the 2 fields "TransactType1_ProdBought " and
> >"TransactType2_OperatingSysUsed" in to further fields so that each
> >pipe delimited value within them goes in to seperate fields. For ex,
> >in above case we can create 4 fields for TransactType1_ProdBought -->
> >TransactType1_ProdBought1, TransactType1_ProdBought2,
> >TransactType1_ProdBought3 and TransactType1_ProdBought4.
>
> >Similarly we can create 2 fields for "TransactType2_OperatingSysUsed" -
> >> TransactType2_OperatingSysUsed1 and TransactType2_OperatingSysUsed2.
>
> >The number of pipe values will be different from one record to
> >another, so one will have to dynamically find the maximum number of
> >pipes for each such field and then create as many child variables. The
> >challenge is that name of the child variables in data set will have to
> >be dynamically constructed and thats where Im getting an error (-->
> >&&ParsingVar&VarNum&&PipeCount =
> >scan(resolve(&&ParsingVar&VarNum),resolve(&PipeCount),"|"); )
>
> >Here is my macro attempt for the same.
>
> >options mprint mlogic;
>
> >%Macro ParsePipeDelimitedFields (InSASFile, ListofVarToBeParsed,
> >OutSASFile);
>
> >%Let VarCounter = 1;
> >%Let VarName = %Qscan(&ListofVarToBeParsed,&VarCounter,%str( ));
>
> >%Do %until(&VarName. = );
> >%Let ParsingVar&VarCounter = &VarName;
> >/*Determining the maximum number of pipes for
> >each variable which needs to be parsed*/
> >Proc SQL;
> >Select max(count(&&ParsingVar&VarCounter,"|"))
> > into :MaxNumOfPipeCharForParsingVar&VarCounter
> >from &InSASFile;
> >Quit;
> >%Let VarCounter = %eval(&VarCounter+1);
> >%Let VarName = %Qscan(&ListofVarToBeParsed,&VarCounter,%str( ));
> >%End;
>
> >Data &OutSASFile;
> >Set &InSASFile;
>
> >%Do VarNum = 1 %to &VarCounter - 1;
>
> >/*Generating individual fields for each parsing variable*/
> >%Do PipeCount = 1 %to &&MaxNumOfPipeCharForParsingVar&VarNum. + 1 ;
> >&&ParsingVar&VarNum&&PipeCount =
> >scan(resolve(&&ParsingVar&VarNum),resolve(&PipeCount),"|");
> >%End;
>
> >%End;
>
> >Run;
>
> >%Mend ParsePipeDelimitedFields;
>
> >%ParsePipeDelimitedFields(Trial1, TransactType1_ProdBought
> >TransactType2_OperatingSysUsed,
> > TrialOutput);
>
> I still see no need for a macro.
>
> The SCAN function is the key. If you can pre-specify the maximum number of
> product and opsys subfields, you can simply create two arrays and then loop
> through each of the composite variables using SCAN to get the individual values.
>
> However, I would suggest instead a normalized and relational approach.Split
> the data into its three natural component tables. Here is tested code:
>
> data master(keep = ID Name JoinDate LastInteractionDate)
> prod (keep = ID ProdBought)
> opsys (keep = ID OperatingSysUsed);
> set Trial1;
> output master;
> length ProdBought OperatingSysUsed $ 15;
> do i = 1 by 1 until (missing(ProdBought) );
> ProdBought = scan(TransactType1_ProdBought,i,'|');
> if not missing(ProdBought) then output prod;
> end;
> do i = 1 by 1 until (missing(OperatingSysUsed) );
> OperatingSysUsed = scan(TransactType2_OperatingSysUsed,i,'|');
> if not missing(OperatingSysUsed) then output opsys;
> end;
> run;
>
> You can easily join or two-way merge these tables as appropriate. If you
> insist you can transpose the two detail tables and then do a three-way merge
> to get to your original design.
>
> In any case, it's macro-free.- Hide quoted text -
>
> - Show quoted text -
Howard,
Thanks for the solution. It definitely makes sense to generate a
normalized output. I wasnt aware of the syntax construct do i = 1 by 1
until (condition) (surprised to see both iterative and conditional and
it avoiding the "TO" index in a nice way). I will use this for my
present requirement.
Btw, can you also please tell me for my personal learning as to why
the syntax in my original macro approach failed
(&&ParsingVar&VarNum&&PipeCount =
scan(resolve(&&ParsingVar&VarNum),resolve(&PipeCount),"|");) I would
like to correct my faulty understanding.
hp
|