LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (August 2007, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 24 Aug 2007 05:33:00 -0700
Reply-To:     Hari <excel_hari@YAHOO.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Hari <excel_hari@YAHOO.COM>
Organization: http://groups.google.com
Subject:      Re: Attribute and transactional level data as single record
Comments: To: sas-l@uga.edu
In-Reply-To:  <200708222158.l7MJOleI015787@mailgw.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"

HIC LP for HUT is from May. Right now all the MVCHUT/HIC remain unclassified

On Aug 23, 2:58 am, nos...@HOWLES.COM ("Howard Schreier <hs AT dc-sug DOT org>") wrote: > On Tue, 21 Aug 2007 00:58:12 -0700, Hari <excel_h...@YAHOO.COM> wrote: > > [snip] > > > > > > >Howard, > > >Im pasting an example with smaller number of sample values, hope this > >doesnt wrap > > >Data Trial1; > > length ID $8 Name $20 JoinDate 8 TransactType1_ProdBought $200 > >TransactType2_OperatingSysUsed $ 200 > > LastInteractionDate 8; > > infile datalines delimiter='#' dsd; > > input ID $ Name $ JoinDate $ TransactType1_ProdBought $ > > TransactType2_OperatingSysUsed $ LastInteractionDate > >$; > > informat JoinDate LastInteractionDate date9.; > > format JoinDate LastInteractionDate date9.; > > Datalines; > >223#Foo##prod1|prod99|prod3#Windows XP|Mac#18-Jun-06 > >56#Weq#30-Jun-02#prod97|prod2#Windows Vista#10-Apr-04 > >786#Rte#12-Sep-04#prod1|prod2|prod4|prod4##18-Aug-05 > >; > >run; > > >My goal is to parse the 2 fields "TransactType1_ProdBought " and > >"TransactType2_OperatingSysUsed" in to further fields so that each > >pipe delimited value within them goes in to seperate fields. For ex, > >in above case we can create 4 fields for TransactType1_ProdBought --> > >TransactType1_ProdBought1, TransactType1_ProdBought2, > >TransactType1_ProdBought3 and TransactType1_ProdBought4. > > >Similarly we can create 2 fields for "TransactType2_OperatingSysUsed" - > >> TransactType2_OperatingSysUsed1 and TransactType2_OperatingSysUsed2. > > >The number of pipe values will be different from one record to > >another, so one will have to dynamically find the maximum number of > >pipes for each such field and then create as many child variables. The > >challenge is that name of the child variables in data set will have to > >be dynamically constructed and thats where Im getting an error (--> > >&&ParsingVar&VarNum&&PipeCount = > >scan(resolve(&&ParsingVar&VarNum),resolve(&PipeCount),"|"); ) > > >Here is my macro attempt for the same. > > >options mprint mlogic; > > >%Macro ParsePipeDelimitedFields (InSASFile, ListofVarToBeParsed, > >OutSASFile); > > >%Let VarCounter = 1; > >%Let VarName = %Qscan(&ListofVarToBeParsed,&VarCounter,%str( )); > > >%Do %until(&VarName. = ); > >%Let ParsingVar&VarCounter = &VarName; > >/*Determining the maximum number of pipes for > >each variable which needs to be parsed*/ > >Proc SQL; > >Select max(count(&&ParsingVar&VarCounter,"|")) > > into :MaxNumOfPipeCharForParsingVar&VarCounter > >from &InSASFile; > >Quit; > >%Let VarCounter = %eval(&VarCounter+1); > >%Let VarName = %Qscan(&ListofVarToBeParsed,&VarCounter,%str( )); > >%End; > > >Data &OutSASFile; > >Set &InSASFile; > > >%Do VarNum = 1 %to &VarCounter - 1; > > >/*Generating individual fields for each parsing variable*/ > >%Do PipeCount = 1 %to &&MaxNumOfPipeCharForParsingVar&VarNum. + 1 ; > >&&ParsingVar&VarNum&&PipeCount = > >scan(resolve(&&ParsingVar&VarNum),resolve(&PipeCount),"|"); > >%End; > > >%End; > > >Run; > > >%Mend ParsePipeDelimitedFields; > > >%ParsePipeDelimitedFields(Trial1, TransactType1_ProdBought > >TransactType2_OperatingSysUsed, > > TrialOutput); > > I still see no need for a macro. > > The SCAN function is the key. If you can pre-specify the maximum number of > product and opsys subfields, you can simply create two arrays and then loop > through each of the composite variables using SCAN to get the individual values. > > However, I would suggest instead a normalized and relational approach.Split > the data into its three natural component tables. Here is tested code: > > data master(keep = ID Name JoinDate LastInteractionDate) > prod (keep = ID ProdBought) > opsys (keep = ID OperatingSysUsed); > set Trial1; > output master; > length ProdBought OperatingSysUsed $ 15; > do i = 1 by 1 until (missing(ProdBought) ); > ProdBought = scan(TransactType1_ProdBought,i,'|'); > if not missing(ProdBought) then output prod; > end; > do i = 1 by 1 until (missing(OperatingSysUsed) ); > OperatingSysUsed = scan(TransactType2_OperatingSysUsed,i,'|'); > if not missing(OperatingSysUsed) then output opsys; > end; > run; > > You can easily join or two-way merge these tables as appropriate. If you > insist you can transpose the two detail tables and then do a three-way merge > to get to your original design. > > In any case, it's macro-free.- Hide quoted text - > > - Show quoted text -

Howard,

Thanks for the solution. It definitely makes sense to generate a normalized output. I wasnt aware of the syntax construct do i = 1 by 1 until (condition) (surprised to see both iterative and conditional and it avoiding the "TO" index in a nice way). I will use this for my present requirement.

Btw, can you also please tell me for my personal learning as to why the syntax in my original macro approach failed (&&ParsingVar&VarNum&&PipeCount = scan(resolve(&&ParsingVar&VarNum),resolve(&PipeCount),"|");) I would like to correct my faulty understanding.

hp


Back to: Top of message | Previous page | Main SAS-L page