LISTSERV at the University of Georgia
Date:   Wed, 30 Apr 2003 08:38:07 -0700
Reply-To:   QYing <qiying.zhou@EQUIFAX.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   QYing <qiying.zhou@EQUIFAX.COM>
Organization:   http://groups.google.com/
Subject:   Re: Reading in variable length record with multiple segments in one record
Content-Type:   text/plain; charset=ISO-8859-1

After reading and digesting everybody's kind replies, I have three questions.

(1) Is it true that on the INFILE CARDS line, column=xx does not work correctly but column= does? Someone alluded to this in one message (Roger? Howard?) but I cannot find that message any more. (I have a hard time reading ALL the messages related to my question.) I couldn't find any answers in the SAS doc either. In the replies where length= was used with INFILE CARDS, people also had another mechanism in place to exit the loop, so there was no problem as long as the incorrect length was greater than the actual length.

(2) Ian, in your code below, how did you capture the INFORMAT (Roger also suggested this) with the INPUT statement? They are not in the buffer at the "loc"...

proc format ;
  invalue len
    "A" = 4
    "B" = 6
    "C" = 7
    other = .
  ;
......
INPUT @loc segid $1. @loc len len1. @ ;
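One possible reading of the snippet above, offered as a sketch rather than as Ian's own explanation: the second @loc moves the pointer back, so the same one-byte segment ID is read twice, first as character into segid and then through the user-defined LEN informat, which translates "A", "B", "C" into 4, 6, 7. The DATA _NULL_ step, the PUT statement, and the three sample lines are assumptions added for illustration only.

```sas
proc format ;
  /* same informat as in the quoted code: segment ID -> content length */
  invalue len "A" = 4 "B" = 6 "C" = 7 other = . ;
run ;

data _null_ ;
  infile cards ;
  /* read column 1 as character, then re-read column 1 through LEN */
  input @1 segid $1. @1 len len1. ;
  put segid= len= ;   /* write both values to the log */
cards ;
A1234
B123456
C1234567
;
run ;
```

If this reading is right, the length never has to be in the buffer at "loc": it comes from the informat's lookup on the segment ID character that is there.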

(3) Combining everybody's wisdom and applying it to the closer-to-reality version of my data:

A1234B123456B123456C1234567&&A1234&&A1234C1234567C1234567&&

where (1) "&&" separates records; (2) yes, there will be ABC's inside of the segments; and (3) segment A will only appear once, at the beginning of the record.

...I have come up with the following. Have I covered all my bases, and is this the most parsimonious? Any other suggestions? Many thanks!

data test;
  infile 'oneline' column=col length=reclen ;
  length content $8;
  input @;
  do until (col > reclen);
    input segid $1. @@;
    select (segid);
      when ('A') do;
        input +(-1) content $5. @;
        id+1;
      end ;
      when ('B') input +(-1) content $7. @;
      when ('C') input +(-1) content $8. @;
      otherwise delete;
    end;
    output;
  end;
run;

LOG
*********************************************
after INFILE reclen = 0 col = 1
buffer = A1234B123456B123456C1234567&&A1234&&A1234C1234567C1234567&&
after INPUT segid reclen = 59 col = 2 segid = A
after A col = 6 id = 1
after INPUT segid reclen = 59 col = 7 segid = B
after B col = 13 id = 1
after INPUT segid reclen = 59 col = 14 segid = B
after B col = 20 id = 1
after INPUT segid reclen = 59 col = 21 segid = C
after C col = 28 id = 1
after INPUT segid reclen = 59 col = 29 segid = &
after INFILE reclen = 59 col = 29
after INPUT segid reclen = 59 col = 30 segid = &
after INFILE reclen = 59 col = 30
after INPUT segid reclen = 59 col = 31 segid = A
after A col = 35 id = 2
after INPUT segid reclen = 59 col = 36 segid = &
after INFILE reclen = 59 col = 36
after INPUT segid reclen = 59 col = 37 segid = &
after INFILE reclen = 59 col = 37
after INPUT segid reclen = 59 col = 38 segid = A
after A col = 42 id = 3
after INPUT segid reclen = 59 col = 43 segid = C
after C col = 50 id = 3
after INPUT segid reclen = 59 col = 51 segid = C
after C col = 58 id = 3
after INPUT segid reclen = 59 col = 59 segid = &
after INFILE reclen = 59 col = 59
after INPUT segid reclen = 59 col = 60 segid = &
after INFILE reclen = 59 col = 60
NOTE: 1 record was read from the infile 'oneline'.
      The minimum record length was 59.
      The maximum record length was 59.
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.TEST has 8 observations and 3 variables.

WHITLOI1@WESTAT.COM (Ian Whitlock) wrote in message news:<9B501B3774931C469BCCCC021BE537223D463B@remailnt2-re01.westat.com>...
> Peter Crawford has kindly pointed out that my new "improved" mail is leaving
> out some line feeds, hence the odd comment in my previous message.
> With this new understanding it is worth simplifying the INPUT program.
>
> proc format ;
>
> invalue len
> "A" = 4
> "B" = 6
> "C" = 7
> other = .
> ;
> run ;
>
> DATA test;
> *INFILE 'person.dat' lrecl=10000 truncover length = reclen recfm = v;
> infile cards length = reclen ;
> length content $ 7 ;
> input @ ;
> loc = 1;
> ID + 1 ;
> do while (loc < reclen );
> INPUT @loc segid $1. @loc len len1. @ ;
> if segid = " " then leave ;
> else
> if segid not in ( "A" "B" "C" ) then abort ;
> loc + 1 ;
> INPUT @loc content $varying7. len @;
> loc+len;
> output ;
> end;
> cards ;
> A1234B123456B123456C1234567
> A1234
> ABCDEABCDEBCCCCCCCAAAAAAA
> A1234C1234567C1234567
> RUN;
>
> I added a line of data to show that the data can be any printable characters
> as long as SEGID appears at the promised places.
>
> Sorry for my confusion.
>
> IanWhitlock@westat.com
> -----Original Message-----
> From: QYing [mailto:qiying.zhou@EQUIFAX.COM]
> Sent: Thursday, April 24, 2003 4:13 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Reading in variable length record with multiple segments in one
> record
>
>
> Could someone please shed light on this problem I am trying to solve?
> Thanks.
>
> Ying
> **************************************************************
> Data in: (the real records are 1000+ to 7000+ bytes long)
> A1234B123456B123456C1234567 A1234 A1234C1234567C1234567 etc
> **************************************************************
> Desired outcome:
> id - segid - content
> 1 - A - A1234
> 1 - B - B123456
> 1 - B - B123456
> 1 - C - C1234567
> 2 - A - A1234
> 3 - A - A1234
> 3 - C - C1234567
> 3 - C - C1234567
> *************************************************************
> I tried the following idea and it went into loops.
> *************************************************************
> DATA test;
> INFILE 'person.dat' lrecl=10000 truncover length = reclen recfm = v; input
> @@; loc = 1; do while (loc < reclen );
> INPUT @loc segid $1. @@;
> if segid='A' then do;
> loc=1;
> INPUT @loc content $4. segid @;
> loc+4;
> end;
> if segid='B' then do;
> INPUT @loc content $6. segid @;
> loc+6;
> end;
> if segid='C' then do;
> INPUT @loc content $7. segid @;
> loc+7;
> end;
> end;
> RUN;
>
> proc print;
> run;

