Matthew,
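It could be something as simple as a DOS end-of-file character (Ctrl-Z,
hex 1A) residing in that position. Under Windows, SAS treats that character
as the end of the file unless IGNOREDOSEOF is specified on the INFILE
statement. One easy way to see the hex characters would be to use the LIST
statement, although you could just read character by character and print the
values as hex.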
I would try the following as an initial test:
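data _null_;
  infile in ignoredoseof;      /* treat Ctrl-Z as data, not end of file */
  input;
  if _n_ gt 580370 then list;  /* dump the records around the bad one */
  if _n_ gt 580378 then stop;  /* no need to read the rest of the file */
run;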
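A rough sketch of the other approach, printing the values as hex, is below.
It assumes the same fileref IN, and it assumes the damaged field sits near
column 230 (the example column from your note), so it reads columns 221-240
into a made-up variable CHUNK and writes them with the $HEX40. format. LEN
comes from the LENGTH= option and shows whether a record is shorter than 651
bytes. The columns and the record window are guesses to adjust; a 1A in the
hex output would be the DOS end-of-file character.

```sas
data _null_;
  length chunk $ 20;
  /* IN is the assumed fileref; LEN receives the length of each record */
  infile in ignoredoseof lrecl=651 truncover length=len;
  /* columns 221-240 are an assumption based on the column-230 example */
  input @221 chunk $char20.;
  if _n_ ge 580374 then
    put _n_= len= chunk= $hex40.;  /* 20 bytes shown as 40 hex digits */
  if _n_ ge 580378 then stop;      /* stop once the window is printed */
run;
```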
Let us know if that is of any help,
Art
-------
On Sat, 9 Apr 2011 14:24:37 +0000, Zack, Matthew M. (CDC/ONDIEH/NCCDPHP)
<mmz1@CDC.GOV> wrote:
>This text file is ~ 3.9 GB long and is being read using a SAS DATA step
>with INFILE/INPUT statements under Windows XP. The record length is 651,
>and only some of the variables/fields/columns on each record are being
>read. One of the records has a carriage-return+line-feed in the middle of
>one of these variables so that SAS stops reading and writing observations
>at that record (N=580,376). This record shows up in the incomplete SAS
>data set using the SAS Analyst as being truncated within this specific
>variable; all preceding variables with this record look OK, and all
>succeeding variables within this record are blank.
>
>Given the size of the file and the record length, the total number of
>records on the file should be closer to 6,000,000 (ten times the number
>I can read in). I don't have a file viewer/text editor with hex
>capabilities that can "see" if other problems are affecting the records
>beyond record # 580,376.
>
>I've tried the following combinations of INFILE/INPUT statement options
>without successfully reading or writing these 6 million records (the NOTE
>to the SAS LOG indicates that only 580,376 records have been read and
>written):
>
> 1. INFILE options LRECL=651, PAD, TRUNCOVER, and MISSOVER:
>
> INFILE filename LRECL=651 PAD TRUNCOVER;
> INPUT . . . ;
>
> or
>
> INFILE filename LRECL=651 PAD MISSOVER;
> INPUT . . . ;
>
> 2. INFILE option LENGTH=xxx with two INPUT statements, one of which has
>    a $VARYINGW. informat:
>
> LENGTH LINE $ 651;
> INFILE filename LENGTH=linelen;
> INPUT @;
> INPUT @1 LINE $VARYING651. LINELEN;
> . . . subsequent statements to parse the variable, LINE, into
>       distinct variables/fields. . .;
>
> 3. Removing the carriage-return + line feed:
>
> LENGTH LINE LINE2 $ 651;
> INFILE filename LRECL=651 PAD TRUNCOVER;
> INPUT @1 LINE $CHAR651.;
> LINE2=COMPRESS(LINE,'0d0a'x);
> . . . subsequent statements to parse the variable, LINE2, into
>       distinct variables/fields. . .;
>
> 4. Using the INFILE statement options, FIRSTOBS=nnnn and OBS=nnnnn, to
>    read past the troublesome record, perhaps with two separate DATA steps
>    to read records before and after this record:
>
> DATA TEMP1;
> INFILE filename FIRSTOBS=1 OBS=580375 LRECL=651 PAD TRUNCOVER;
> INPUT . . . .;
> OUTPUT TEMP1;
> RUN;
>
> DATA TEMP2;
> INFILE filename FIRSTOBS=580377 LRECL=651 PAD TRUNCOVER;
> INPUT . . . .;
> OUTPUT TEMP2;
> RUN;
>
> PROC APPEND DATA=TEMP2 BASE=TEMP1;
> RUN;
>
> PROC DATASETS LIBRARY=WORK NOLIST;
> DELETE TEMP2 / MEMTYPE=DATA;
> QUIT;
>
>
> 5. Reading only variables in text column positions before the variable
>    truncated by the Carriage-Return and Line-Feed (for example, VAR8
>    starting in column 230) on record number 580,376:
>
> DATA TEMP1;
> INFILE filename FIRSTOBS=1 OBS=580375 LRECL=651 PAD TRUNCOVER;
> INPUT @1 var1 $char20. @35 var2 $char13. . . . . . . var7 218-223;
> OUTPUT TEMP1;
> RUN;
>
>Because none of these attempted solutions reads beyond the truncated
>record number 580,376, 90% of the records are missing from the final SAS
>data set.
>
>Could this be a problem with Windows XP (address space limitations) or SAS
>version 9.1.3?
>
>Any other ideas for a solution?
>
>Thank you.
>
>Matthew Zack