Date: Sun, 29 Jan 2006 14:47:24 -0500
Reply-To: sashole@bellsouth.net
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Paul M. Dorfman" <sashole@BELLSOUTH.NET>
Organization: Sashole of Florida
Subject: Re: Inefficient data transformation
In-Reply-To: <1138521753.502404.27430@g49g2000cwa.googlegroups.com>
Content-Type: text/plain; charset="us-ascii"
Hari,
If your input flat file is intrinsically sorted by the perceived ID,
as it would be in the temp file IN below,
filename in temp ;
data _null_ ;
input ;
file in ;
put _infile_ ;
cards ;
54,23,1.45,65,89,"DSD"
14,63,78.09,98,15,"DSD"
98,63,89.34,76,17,"DSD"
97,25,90.76,36,29,"DSD"
76,43,6.45,23,98,"KUY"
14,57,98.09,23,15,"KUY"
65,46,89.04,76,17,"SET"
98,87,89.23,71,37,"SET"
102,24,79.98,67,56,"SET"
run ;
then you can simply pipe it through a view:
data v / view = v ;
infile in ;
input ;
infile = put (_infile_, $64.) ;
id = put (scan (_infile_, -1, ','), $5.) ;
run ;
and then write it out to another flat file, OUT, as desired:
filename out temp ;
data _null_ ;
array inf [99999] $ 64 _temporary_ ;
do n = 1 by 1 until (last.id) ;
set v ;
by id ;
inf [n] = infile ;
end ;
file out ;
put _n_ n ;
do n = 1 to n ;
infile = inf [n] ;
put infile ;
end ;
run ;
In principle, the reading from IN, control-break processing, anad writing to OUT
can be all done in a single DATA step, but it would be much more logically
convoluted compared to the cannibalaizing on the SAS autovariable last.id. The
stack array INF is used instead of the double-DoW to avoid an extra pipe from
the original file. 99999 and $64 are chosen merely to be 'large enough'; you can
adjust both up or down in accordance with your likings.
Kind regards
------------
Paul Dorfman
Jax, FL
------------
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On
> Behalf Of Hari
> Sent: Sunday, January 29, 2006 3:03 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Inefficient data transformation
>
> Hi,
>
> (I had a problem to solve and I dont know how to use DATA
> steps to do what I want to do. So, I used to solve the
> problem in a complicated manner. I think Data step magic can
> be used to solve this problem elegantly, please guide me for
> the same.)
>
> I have to export a SAS data file in to text format in such
> way that each ID (in my case Market is ID) in my data file is
> grouped together (sort on ID) and just before the starting
> row of each ID a new row is inserted which has numerical ID
> (equivalent to market)in first column and second column has
> count of number of markets.
>
> So, if my starting SAS file (I have put commas between
> variables/values) looks like the folowing :-
>
> InterCeptDumm,IndepVar1,IndepVar2,IndepVar3,DependentVar,Market
> 54,23,1.45,65,89,"DSD"
> 14,63,78.09,98,15,"DSD"
> 98,63,89.34,76,17,"DSD"
> 97,25,90.76,36,29,"DSD"
> 76,43,6.45,23,98,"KUY"
> 14,57,98.09,23,15,"KUY"
> 65,46,89.04,76,17,"SET"
> 98,87,89.23,71,37,"SET"
> 102,24,79.98,67,56,"SET"
>
> Then I would like to see my Data file as
>
> InterCeptDumm,IndepVar1,IndepVar2,IndepVar3,DependentVar
> 1,4
> 54,23,1.45,65,89,"DSD"
> 14,63,78.09,98,15,"DSD"
> 98,63,89.34,76,17,"DSD"
> 97,25,90.76,36,29,"DSD"
> 2,2
> 76,43,6.45,23,98,"KUY"
> 14,57,98.09,23,15,"KUY"
> 3,3
> 65,46,89.04,76,17,"SET"
> 98,87,89.23,71,37,"SET"
> 102,24,79.98,67,56,"SET"
>
> In the above file , I have assigned a numeric ID of 1 to market "DSD"
> and it is stored in the first column (IndepVar1) and the
> number of rows for "DSD" is stored in second column as
> frequency count.
>
> I did it using Proc SQL in the following manner (which i
> think is unneccasarily complicated)
>
> a) I first used a Proc Freq to calculate number of rows for
> each market. Then saved this as a new table.
>
> b) The table in a) is sorted and a new column was used to
> generate numeric ID's (using _n_ ) for the same. Name for
> this new column was same as First column of my Original file
> which is InterceptDumm
>
> c) Then in the original table a new column was created with
> values for each row as _n_ + 1.
>
> d) Then b) and c) are stacked one below the other and then
> sorted on 2 fields, market name and the newly created column
> in both tables.
>
> e) The file in d) was sorted now based on
>
> %Macro CreateHBRegDataFile(DependentVar, InputSASFile,
> MoreMktCustom, FinalHBRegData);
>
>
> /*Generating Count of Markets*/
> Proc Freq Data = &InputSASFile;
> Tables Mkt / out = MarketCount;
> Run;
>
> /*Generating Row Id for each of the Markets*/ Data
> MarketCustomCount (drop = PERCENT); Set Marketcount; /*Put
> the variable name for the _n_ variable same as the first IV
> (InterceptDumm - InterceptDumm is the first IV) in your Main
> data set*/ InterCeptDumm = _n_; /*This dummyID will be used
> for sorting after merging the 2 data sets*/ /*This will help
> us keep the rows from MarketCustomCount */ /*as first row of
> final file*/ DummyId = 1; /*Put the variable name for the
> COUNT variable same as the Second IV in your*/ /*Main data
> set*/ Rename COUNT = IndepVar1; run;
>
> /*Note we are not using product variable here*/ /*as the
> estimation for HB-Reg is done at product level.*/ data
> InputSASFileWithDummyId; Retain
> InterCeptDumm,IndepVar1,IndepVar2,IndepVar3,
> &DependentVar,Market;
> set &InputSASFile;
> DummyId = _n_ + 1;
> InterCeptDumm = 1;
> run;
>
> proc sort data = InputSASFileWithDummyId out =
> InputSASFileWithDummyId; by Mkt DummyId; run;
>
>
> /*Probably the following proc SQL code to select distinct
> rows is not needed. When I wrote this code I had probably
> generated Frequency count using Proc Freq on a bigger data
> file,but cant check now as I dont have SAS in front of me*/
> Proc SQL; Create table &MoreMktCustom as select distinct
> MktFile.InterCeptDumm, MktFile.IndepVar1, MktFile.DummyID,
> MktFile.Mkt from MarketCustomCount as MktFile,
> InputSASFileWithDummyId as BaseFile where BaseFile.mkt =
> MktFile.mkt; quit;
>
> Data MergeMe;
> set &MoreMktCustom InputSASFileWithDummyId; run;
>
> proc sort data = MergeMe out = SortMeAfterMerging; by Mkt
> DummyId; run;
>
> Data &FinalHBRegData (drop = Mkt DummyId); set
> SortMeAfterMerging; run;
>
> proc export data= &FinalHBRegData
> outfile="D:\Hari\AB - Project\Working\Exported files\2006
> 01 27\&FinalHBRegData..csv"
> dbms= csv replace;
>
> run;
>
> Proc Datasets nolist;
> Copy out = FinalSAS move;
> Select &MoreMktCustom &FinalHBRegData;
> Delete MarketCount MarketCustomCount InputSASFileWithDummyId
> MergeMe SortMeAfterMerging; Quit;
>
>
> %Mend CreateHBRegDataFile;
>
>
> /******************************************************
> Calling the macro for a Linear Model
>
> ******************************************************/
>
> %CreateHBRegDataFile
> (VolumeIndex,ProductA_Final,ProductA_MktList,ProductA_HBReg_Linear);
>
> Regards,
> Hari
> India
>
|