Date: Sun, 12 Aug 2007 12:23:15 -0700
Reply-To: Hari <excel_hari@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Hari <excel_hari@YAHOO.COM>
Organization: http://groups.google.com
Subject: Attribute and transactional level data as single record
Content-Type: text/plain; charset="iso-8859-1"
Hi,
I have received customer level data from a party as a flat-file
(Character 29 is the delimite) and it has been delivered in a very
unweildy format, and I cant ask them to re-do because of constraints
in data pulls.
The customer level information like their unique ID, name etc has been
unwittingly combined with all the transactional level data (lets say
their purchases) and everything is put in a SINGLE record.
Basically, different records of each particular kind of transaction of
a customer has been compressed together with a pipe delimiter
seperating them and this has been put as a single field value as part
of the customer table.
INPUT Sample data (# represents character 29 in the below data, 2
consecutive # represent missing field value)
ID#Name#JoinDate#TransactType1_ProdBought#TransactType2_OperatingSysUsed#LastInteractionDate
223#Foo#23-May-03#Product1|Product99|Product3#Windows XP|Windows
Vista#18-Jun-06
56#Weq#30-Jun-02#Product97|Product2#Windows Vista#10-Apr-04
786#Rte#12-Sep-04#Product1|Product2|Product4|Product4##18-Aug-05
I want to parse the each of the pipe delimted values within the
transactional kind of fields in to seperate new fields
Output desired within SAS (using # below to represent 2 different
fields in SAS)
ID#Name#JoinDate#TransactType1_ProdBought1#TransactType1_ProdBought2#TransactType1_ProdBought3#TransactType1_ProdBought4#TransactType2_OperatingSysUsed1#TransactType2_OperatingSysUsed2#LastInteractionDate
223#Foo#23-May-03#Product1#Product99#Product3#Windows XP|Windows
Vista#18-Jun-06
56#Weq#30-Jun-02#Product97#Product2#Windows Vista#10-Apr-04
786#Rte#12-Sep-04#Product1#Product2#Product4#Product4##18-Aug-05
I have 20 different kinds of transactional fields and the maximum
number of values within each transaction field is unknown. Some
customers might have just 1 or no value and others might have as much
as 20 different values (like 20 different products bought). The number
of new fields created for each transactional field will depend on the
maximum of the count of pipe characters across all records and this
will need to be done for every transactional field seperately.
What is an efficient way to do the above parsing/processing?
(After the parsing I intend to use Ron F's mutliple response macro to
generate frequencies for the different values of each transaction
type)
Please guide me.
regards,
hp