LISTSERV at the University of Georgia
Date:   Sat, 9 Apr 2011 11:06:22 -0400
Reply-To:   Arthur Tabachneck <art297@ROGERS.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Arthur Tabachneck <art297@ROGERS.COM>
Subject:   Re: Trouble reading a very large ASCII file perhaps due to '0d0a'x (Carriage-Return + Line-Feed) within variable: SAS v 9.13
Comments:   To: Matthew Zack <mmz1@CDC.GOV>

Matthew,

It could be something as simple as a DOS end-of-file character ('1A'x, Ctrl-Z) residing in that position. One easy way to see the hex characters would be to use the LIST statement, although you could also read the record character by character and print the values as hex.

I would try the following as an initial test:

data _null_;
  infile in ignoredoseof;
  input;
  if _n_ gt 580370 then do;
    list;
  end;
  if _n_ > 580378 then stop;
run;
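If the LIST output is hard to scan, the character-by-character approach could look something like the sketch below. It is untested against your file and assumes the same fileref (in) and the 651-byte record length from your post; the variable names line, c and pos are just placeholders. It prints the record number, column and hex value of every control character found in the records around 580,376.

```sas
/* Sketch only: assumes fileref IN and 651-byte records, as in the original post. */
data _null_;
  length line $651 c $1;
  infile in ignoredoseof lrecl=651 truncover;
  input line $char651.;
  if _n_ > 580370 then do;
    /* walk the record one character at a time */
    do pos = 1 to length(line);
      c = substr(line, pos, 1);
      /* report control characters (below '20'x), e.g. '1A'x, '0D'x, '0A'x */
      if rank(c) < 32 then put _n_= pos= c= $hex2.;
    end;
  end;
  if _n_ > 580378 then stop;
run;
```

A line in the log such as c=1A would point to an embedded DOS end-of-file character rather than a carriage-return + line-feed.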

Let us know if that is of any help,
Art
-------
On Sat, 9 Apr 2011 14:24:37 +0000, Zack, Matthew M. (CDC/ONDIEH/NCCDPHP) <mmz1@CDC.GOV> wrote:

>This text file is ~ 3.9 GB long and is being read using a SAS DATA step with INFILE/INPUT statements
>under Windows XP. The record length is 651, and only some of the variables/fields/columns on each record
>are being read. One of the records has a carriage-return+line-feed in the middle of one of these variables
>so that SAS stops reading and writing observations at that record (N=580,376). This record shows up in the
>incomplete SAS data set using the SAS Analyst as being truncated within this specific variable; all preceding
>variables with this record look OK, and all succeeding variables within this record are blank.
>
>Given the size of the file and the record length, the total number of records on the file should be closer
>to 6,000,000 (ten times the number I can read in). I don't have a file viewer/text editor with hex capabilities
>that can "see" if other problems are affecting the records beyond record # 580,376.
>
>I've tried the following combinations of INFILE/INPUT statement options without successfully reading
>or writing these 6 million records (the NOTE to the SAS LOG indicates that only 580,376 records have
>been read and written):
>
> 1. INFILE options LRECL=651, PAD, TRUNCOVER, and MISSOVER:
>
>      INFILE filename LRECL=651 PAD TRUNCOVER;
>      INPUT . . . ;
>
>    or
>
>      INFILE filename LRECL=651 PAD MISSOVER;
>      INPUT . . . ;
>
> 2. INFILE option LENGTH=xxx with two INPUT statements, one of which has a $VARYINGW. informat:
>
>      LENGTH LINE $ 651;
>      INFILE filename LENGTH=linelen;
>      INPUT @;
>      INPUT @1 LINE $VARYING651. LINELEN;
>      . . . subsequent statements to parse the variable, LINE, into distinct variables/fields. . .;
>
> 3. Removing the carriage-return + line feed:
>
>      LENGTH LINE LINE2 $ 651;
>      INFILE filename LRECL=651 PAD TRUNCOVER;
>      INPUT @1 LINE $CHAR651.;
>      LINE2=COMPRESS(LINE,'0d0a'x);
>      . . . subsequent statements to parse the variable, LINE2, into distinct variables/fields. . .;
>
> 4. Using the INFILE statement options, FIRSTOBS=nnnn and OBS=nnnnn, to read past the troublesome record,
>    perhaps with two separate DATA steps to read records before and after this record:
>
>      DATA TEMP1;
>      INFILE filename FIRSTOBS=1 OBS=580375 LRECL=651 PAD TRUNCOVER;
>      INPUT . . . .;
>      OUTPUT TEMP1;
>      RUN;
>
>      DATA TEMP2;
>      INFILE filename FIRSTOBS=580377 LRECL=651 PAD TRUNCOVER;
>      INPUT . . . .;
>      OUTPUT TEMP2;
>      RUN;
>
>      PROC APPEND DATA=TEMP2 BASE=TEMP1;
>      RUN;
>
>      PROC DATASETS LIBRARY=WORK NOLIST;
>      DELETE TEMP2 / MEMTYPE=DATA;
>      QUIT;
>
> 5. Reading only variables in text column positions before the variable truncated by the Carriage-Return
>    and Line-Feed (for example, VAR8 starting in column 230) on record number 580,376:
>
>      DATA TEMP1;
>      INFILE filename FIRSTOBS=1 OBS=580375 LRECL=651 PAD TRUNCOVER;
>      INPUT @1 var1 $char20. @35 var2 $char13. . . . . . . var7 218-223;
>      OUTPUT TEMP1;
>      RUN;
>
>Because none of these attempted solutions reads beyond the truncated record number 580,376, 90% of the records
>are missing from the final SAS data set.
>
>Could this be a problem with Windows XP (address space limitations) or SAS version 9.13?
>
>Any other ideas for a solution?
>
>Thank you.
>
>Matthew Zack

