LISTSERV at the University of Georgia
Date:         Sat, 9 Apr 2011 18:06:47 -0400
Reply-To:     Arthur Tabachneck <art297@ROGERS.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Arthur Tabachneck <art297@ROGERS.COM>
Subject:      Re: Trouble reading a very large ASCII file perhaps due to
              '0d0a'x (Carriage-Return + Line-Feed) within variable: SAS v 9.13
Comments: To: Matthew Zack <mmz1@CDC.GOV>

Matthew,

If your guess is correct, simply add the IGNOREDOSEOF option to your INFILE statement.
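
A minimal sketch of that change, assuming the 651-byte layout from your post. The fileref BIGFILE, the path, and the data set name WANT are placeholders, not your real names. Under Windows, IGNOREDOSEOF makes SAS treat an embedded Control-Z ('1A'x) as ordinary data rather than as an end-of-file marker:

```sas
/* Sketch only: the fileref and path are placeholders for the real file. */
filename bigfile 'C:\data\yourhugefile.txt';

data want;
    /* IGNOREDOSEOF (Windows): read an embedded Control-Z ('1A'x) as data */
    /* instead of stopping at it as an end-of-file marker.                */
    infile bigfile lrecl=651 pad truncover ignoredoseof;
    input @1 line $char651.;
    /* ...parse LINE into the individual fields here... */
run;
```

If the guess is right, the NOTE in the log should report far more than 580,376 records read. If the count does not change, the cause is probably something other than Control-Z.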

Art
------
On Sat, 9 Apr 2011 21:36:39 +0000, Zack, Matthew M. (CDC/ONDIEH/NCCDPHP) <mmz1@CDC.GOV> wrote:

>Thank you.
>
>The end-of-file marker (Control-Z) within an ASCII field seems the most likely possibility now.
>I'll have to read up on the ENCODING option if the other solutions that SAS-Lers suggested
>do not work.
>
>Matthew Zack
>
>
>-----Original Message-----
>From: Lingqun [mailto:lingqun@gmail.com]
>Sent: Saturday, April 09, 2011 2:52 PM
>To: Zack, Matthew M. (CDC/ONDIEH/NCCDPHP)
>Cc: SAS-L@LISTSERV.UGA.EDU
>Subject: Re: Trouble reading a very large ASCII file perhaps due to '0d0a'x (Carriage-Return + Line-Feed) within variable: SAS v 9.13
>
>You may try option ENCODING=
>
>On Apr 9, 2011, at 12:49 PM, "Zack, Matthew M. (CDC/ONDIEH/NCCDPHP)" <mmz1@CDC.GOV> wrote:
>
>> Thank you for your suggestion.
>>
>> I'll try it out.
>>
>> Matthew Zack
>>
>> From: Gabriel Rosas [mailto:rosas.gabe@gmail.com]
>> Sent: Saturday, April 09, 2011 11:47 AM
>> To: Zack, Matthew M. (CDC/ONDIEH/NCCDPHP)
>> Subject: Re: Trouble reading a very large ASCII file perhaps due to '0d0a'x (Carriage-Return + Line-Feed) within variable: SAS v 9.13
>>
>> I think you're going to have to read it in byte by byte and re-write the text file before reading it in properly. The following is untested code.
>>
>> filename fixfile temp;
>>
>> data _null_;
>>   infile yourhugefile recfm=n lrecl=651;
>>   file fixfile lrecl=651;
>>   recpos=1;
>>   do while(recpos<652);
>>     input chktmp $char1. @;
>>     if chktmp='0d'x then do;
>>       input chktmp $char1. @;
>>       if chktmp='0a'x then recpos+2;
>>     end;
>>     else put chktmp +(-1) @;
>>     recpos+1;
>>   end;
>>   put;
>> run;
>>
>>
>> On Sat, Apr 9, 2011 at 10:24 AM, Zack, Matthew M. (CDC/ONDIEH/NCCDPHP) <mmz1@cdc.gov<mailto:mmz1@cdc.gov>> wrote:
>> This text file is ~ 3.9 GB long and is being read using a SAS DATA step with INFILE/INPUT statements
>> under Windows XP. The record length is 651, and only some of the variables/fields/columns on each record
>> are being read. One of the records has a carriage-return+line-feed in the middle of one of these variables
>> so that SAS stops reading and writing observations at that record (N=580,376). This record shows up in the incomplete SAS data set using the SAS Analyst as being truncated within this specific variable; all preceding variables with this record look OK, and all succeeding variables within this record are blank.
>>
>> Given the size of the file and the record length, the total number of records on the file should be closer
>> to 6,000,000 (ten times the number I can read in). I don't have a file viewer/text editor with hex capabilities
>> that can "see" if other problems are affecting the records beyond record # 580,376.
>>
>> I've tried the following combinations of INFILE/INPUT statement options without successfully reading
>> or writing these 6 million records (the NOTE to the SAS LOG indicates that only 580,376 records have
>> been read and written):
>>
>> 1. INFILE options LRECL=651, PAD, TRUNCOVER, and MISSOVER:
>>
>> INFILE filename LRECL=651 PAD TRUNCOVER;
>> INPUT . . . ;
>>
>> or
>>
>> INFILE filename LRECL=651 PAD MISSOVER;
>> INPUT . . . ;
>>
>> 2. INFILE option LENGTH=xxx with two INPUT statements, one of which has a $VARYINGW. informat:
>>
>> LENGTH LINE $ 651;
>> INFILE filename LENGTH=linelen;
>> INPUT @;
>> INPUT @1 LINE $VARYING651. LINELEN;
>> . . . subsequent statements to parse the variable, LINE, into distinct variables/fields. . .;
>>
>> 3. Removing the carriage-return + line feed:
>>
>> LENGTH LINE LINE2 $ 651;
>> INFILE filename LRECL=651 PAD TRUNCOVER;
>> INPUT @1 LINE $CHAR651.;
>> LINE2=COMPRESS(LINE,'0d0a'x);
>> . . . subsequent statements to parse the variable, LINE2, into distinct variables/fields. . .;
>>
>> 4. Using the INFILE statement options, FIRSTOBS=nnnn and OBS=nnnnn, to read past the troublesome record,
>> perhaps with two separate DATA steps to read records before and after this record:
>>
>> DATA TEMP1;
>> INFILE filename FIRSTOBS=1 OBS=580375 LRECL=651 PAD TRUNCOVER;
>> INPUT . . . .;
>> OUTPUT TEMP1;
>> RUN;
>>
>> DATA TEMP2;
>> INFILE filename FIRSTOBS=580377 LRECL=651 PAD TRUNCOVER;
>> INPUT . . . .;
>> OUTPUT TEMP2;
>> RUN;
>>
>> PROC APPEND DATA=TEMP2 BASE=TEMP1;
>> RUN;
>>
>> PROC DATASETS LIBRARY=WORK NOLIST;
>> DELETE TEMP2 / MEMTYPE=DATA;
>> QUIT;
>>
>>
>> 5. Reading only variables in text column positions before the variable truncated by the Carriage-Return
>> and Line-Feed (for example, VAR8 starting in column 230) on record number 580,376:
>>
>> DATA TEMP1;
>> INFILE filename FIRSTOBS=1 OBS=580375 LRECL=651 PAD TRUNCOVER;
>> INPUT @1 var1 $char20. @35 var2 $char13. . . . . . . var7 218-223;
>> OUTPUT TEMP1;
>> RUN;
>>
>> Because none of these attempted solutions reads beyond the truncated record number 580,376, 90% of the records
>> are missing from the final SAS data set.
>>
>> Could this be a problem with Windows XP (address space limitations) or SAS version 9.13?
>>
>> Any other ideas for a solution?
>>
>> Thank you.
>>
>> Matthew Zack

