LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (January 2006, week 5)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Sun, 29 Jan 2006 14:47:24 -0500
Reply-To:     sashole@bellsouth.net
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Paul M. Dorfman" <sashole@BELLSOUTH.NET>
Organization: Sashole of Florida
Subject:      Re: Inefficient data transformation
Comments: To: Hari <excel_hari@YAHOO.COM>
In-Reply-To:  <1138521753.502404.27430@g49g2000cwa.googlegroups.com>
Content-Type: text/plain; charset="us-ascii"

Hari,

If your input flat file is intrinsically sorted by the perceived ID, as it would be in the temp file IN below,

filename in temp ;

data _null_ ; input ; file in ; put _infile_ ; cards ; 54,23,1.45,65,89,"DSD" 14,63,78.09,98,15,"DSD" 98,63,89.34,76,17,"DSD" 97,25,90.76,36,29,"DSD" 76,43,6.45,23,98,"KUY" 14,57,98.09,23,15,"KUY" 65,46,89.04,76,17,"SET" 98,87,89.23,71,37,"SET" 102,24,79.98,67,56,"SET" run ;

then you can simply pipe it through a view:

data v / view = v ; infile in ; input ; infile = put (_infile_, $64.) ; id = put (scan (_infile_, -1, ','), $5.) ; run ;

and then write it out to another flat file, OUT, as desired:

filename out temp ;

data _null_ ; array inf [99999] $ 64 _temporary_ ; do n = 1 by 1 until (last.id) ; set v ; by id ; inf [n] = infile ; end ; file out ; put _n_ n ; do n = 1 to n ; infile = inf [n] ; put infile ; end ; run ;

In principle, the reading from IN, control-break processing, anad writing to OUT can be all done in a single DATA step, but it would be much more logically convoluted compared to the cannibalaizing on the SAS autovariable last.id. The stack array INF is used instead of the double-DoW to avoid an extra pipe from the original file. 99999 and $64 are chosen merely to be 'large enough'; you can adjust both up or down in accordance with your likings.

Kind regards ------------ Paul Dorfman Jax, FL ------------

> -----Original Message----- > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On > Behalf Of Hari > Sent: Sunday, January 29, 2006 3:03 AM > To: SAS-L@LISTSERV.UGA.EDU > Subject: Inefficient data transformation > > Hi, > > (I had a problem to solve and I dont know how to use DATA > steps to do what I want to do. So, I used to solve the > problem in a complicated manner. I think Data step magic can > be used to solve this problem elegantly, please guide me for > the same.) > > I have to export a SAS data file in to text format in such > way that each ID (in my case Market is ID) in my data file is > grouped together (sort on ID) and just before the starting > row of each ID a new row is inserted which has numerical ID > (equivalent to market)in first column and second column has > count of number of markets. > > So, if my starting SAS file (I have put commas between > variables/values) looks like the folowing :- > > InterCeptDumm,IndepVar1,IndepVar2,IndepVar3,DependentVar,Market > 54,23,1.45,65,89,"DSD" > 14,63,78.09,98,15,"DSD" > 98,63,89.34,76,17,"DSD" > 97,25,90.76,36,29,"DSD" > 76,43,6.45,23,98,"KUY" > 14,57,98.09,23,15,"KUY" > 65,46,89.04,76,17,"SET" > 98,87,89.23,71,37,"SET" > 102,24,79.98,67,56,"SET" > > Then I would like to see my Data file as > > InterCeptDumm,IndepVar1,IndepVar2,IndepVar3,DependentVar > 1,4 > 54,23,1.45,65,89,"DSD" > 14,63,78.09,98,15,"DSD" > 98,63,89.34,76,17,"DSD" > 97,25,90.76,36,29,"DSD" > 2,2 > 76,43,6.45,23,98,"KUY" > 14,57,98.09,23,15,"KUY" > 3,3 > 65,46,89.04,76,17,"SET" > 98,87,89.23,71,37,"SET" > 102,24,79.98,67,56,"SET" > > In the above file , I have assigned a numeric ID of 1 to market "DSD" > and it is stored in the first column (IndepVar1) and the > number of rows for "DSD" is stored in second column as > frequency count. > > I did it using Proc SQL in the following manner (which i > think is unneccasarily complicated) > > a) I first used a Proc Freq to calculate number of rows for > each market. Then saved this as a new table. > > b) The table in a) is sorted and a new column was used to > generate numeric ID's (using _n_ ) for the same. Name for > this new column was same as First column of my Original file > which is InterceptDumm > > c) Then in the original table a new column was created with > values for each row as _n_ + 1. > > d) Then b) and c) are stacked one below the other and then > sorted on 2 fields, market name and the newly created column > in both tables. > > e) The file in d) was sorted now based on > > %Macro CreateHBRegDataFile(DependentVar, InputSASFile, > MoreMktCustom, FinalHBRegData); > > > /*Generating Count of Markets*/ > Proc Freq Data = &InputSASFile; > Tables Mkt / out = MarketCount; > Run; > > /*Generating Row Id for each of the Markets*/ Data > MarketCustomCount (drop = PERCENT); Set Marketcount; /*Put > the variable name for the _n_ variable same as the first IV > (InterceptDumm - InterceptDumm is the first IV) in your Main > data set*/ InterCeptDumm = _n_; /*This dummyID will be used > for sorting after merging the 2 data sets*/ /*This will help > us keep the rows from MarketCustomCount */ /*as first row of > final file*/ DummyId = 1; /*Put the variable name for the > COUNT variable same as the Second IV in your*/ /*Main data > set*/ Rename COUNT = IndepVar1; run; > > /*Note we are not using product variable here*/ /*as the > estimation for HB-Reg is done at product level.*/ data > InputSASFileWithDummyId; Retain > InterCeptDumm,IndepVar1,IndepVar2,IndepVar3, > &DependentVar,Market; > set &InputSASFile; > DummyId = _n_ + 1; > InterCeptDumm = 1; > run; > > proc sort data = InputSASFileWithDummyId out = > InputSASFileWithDummyId; by Mkt DummyId; run; > > > /*Probably the following proc SQL code to select distinct > rows is not needed. When I wrote this code I had probably > generated Frequency count using Proc Freq on a bigger data > file,but cant check now as I dont have SAS in front of me*/ > Proc SQL; Create table &MoreMktCustom as select distinct > MktFile.InterCeptDumm, MktFile.IndepVar1, MktFile.DummyID, > MktFile.Mkt from MarketCustomCount as MktFile, > InputSASFileWithDummyId as BaseFile where BaseFile.mkt = > MktFile.mkt; quit; > > Data MergeMe; > set &MoreMktCustom InputSASFileWithDummyId; run; > > proc sort data = MergeMe out = SortMeAfterMerging; by Mkt > DummyId; run; > > Data &FinalHBRegData (drop = Mkt DummyId); set > SortMeAfterMerging; run; > > proc export data= &FinalHBRegData > outfile="D:\Hari\AB - Project\Working\Exported files\2006 > 01 27\&FinalHBRegData..csv" > dbms= csv replace; > > run; > > Proc Datasets nolist; > Copy out = FinalSAS move; > Select &MoreMktCustom &FinalHBRegData; > Delete MarketCount MarketCustomCount InputSASFileWithDummyId > MergeMe SortMeAfterMerging; Quit; > > > %Mend CreateHBRegDataFile; > > > /****************************************************** > Calling the macro for a Linear Model > > ******************************************************/ > > %CreateHBRegDataFile > (VolumeIndex,ProductA_Final,ProductA_MktList,ProductA_HBReg_Linear); > > Regards, > Hari > India >


Back to: Top of message | Previous page | Main SAS-L page