| Date: | Mon, 30 Oct 2000 12:48:01 -0500 |
| Reply-To: | Howard Schreier <Howard_Schreier@ITA.DOC.GOV> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Howard Schreier <Howard_Schreier@ITA.DOC.GOV> |
| Subject: | Re: Infile Statement |
|---|
I have found that sometimes input files which deviate from conventions
require pre-processing, by which I mean a step which reads the file, fixes
the problem(s) and then writes another *external* file which in turn can be
read in by a subsequent DATA step.
Here is a much simplified example of your problem. I stored the following
lines in a file named BROKEN.DAT:
"A","1",""
"B","2","To: Jo Wye
123 Elm St.
Cary NC 27513"
"C","3","Call for details"
Then I ran the following program:
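data _null_;
   infile 'broken.dat' length=lastcol;  /* LASTCOL = length of current line  */
   file 'fixed.dat' lrecl=5000;
   do until (lastchar='"');
      input @;                          /* load next line and hold it, which */
                                        /* sets LASTCOL                      */
      input @lastcol lastchar $1.;      /* read the final character, then    */
                                        /* release the input line            */
      put _infile_ ' ' @;               /* copy the line plus a blank, hold  */
                                        /* the output record open            */
   end;
   put;                                 /* close the output record           */
run;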
It's actually fairly simple. It reads the file and copies entire lines to
the output file, concatenating them until it detects a double-quote
character in the trailing position. But the logic is very sensitive with
respect to when buffers are loaded and released.
The output file contains:
"A","1",""
"B","2","To: Jo Wye 123 Elm St. Cary NC 27513"
"C","3","Call for details"
There are no longer newline sequences embedded within fields, so a
straightforward INPUT statement should be able to handle it.
The logic used is not airtight, however. Suppose the data entry person
typed:
They said "OK"
(and sounded sincere)
There is a double quote right before the newline, which should be converted
to two consecutive double quotes when the comma-separated file is built.
Whether or not that conversion happens, the line still ends in a double
quote, so it will fool my simple program. But you can't check for
consecutive double quotes either, because that is of course the
representation of a null value. Even the sequence {comma, double quote,
double quote} could be internal to a field if the person typed
They said "OK,"
and sounded sincere
So it's a nasty problem.
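One partial way around it, offered only as an untested sketch: count
double quotes instead of looking at the last character. This assumes
the forms really do double any embedded quote when they build the file,
and it assumes a SAS release that has the COUNTC function (SAS 9 or
later). Under those assumptions a record is complete exactly when the
running count of double quotes is even. The variable name NQUOTES is
mine.

```sas
data _null_;
   infile 'broken.dat' lrecl=5000;
   file 'fixed.dat' lrecl=32767;
   retain nquotes 0;                    /* running count for current record  */
   input;                               /* load one line into _INFILE_       */
   nquotes + countc(_infile_, '"');     /* add this line's double quotes     */
   if mod(nquotes, 2) = 0 then do;
      put _infile_;                     /* even count: write, close record   */
      nquotes = 0;
   end;
   else put _infile_ ' ' @@;            /* odd count: we are inside a quoted */
                                        /* field, keep the output record open*/
run;
```

If a form does not double embedded quotes, the count can still come out
even, but the resulting record is malformed, so this is no cure for bad
input.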
It would help greatly if your form contained a hidden field at the end,
pre-loaded with some end-of-record indicator (like "#*#EOR#*#"). If I
understand correctly, there are multiple implementations of the form (such
as HTML, MS Access, Acrobat, whatever), so getting that done consistently
might be easier said than done.
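If the marker were there, the pre-processing step could be along these
lines. This is a sketch, not something I have run: the file name
MARKED.DAT and the variables N and COMPLETE are made up for
illustration, and it assumes the marker arrives as the last quoted field
on the final line of each record.

```sas
data _null_;
   infile 'marked.dat' lrecl=5000;      /* hypothetical input file name      */
   file 'fixed.dat' lrecl=32767;
   input;                               /* load one line into _INFILE_       */
   n = length(_infile_);                /* length without trailing blanks    */
   complete = 0;
   if n >= 11 then                      /* marker plus its quotes = 11 chars */
      if substr(_infile_, n - 10, 11) = '"#*#EOR#*#"' then complete = 1;
   if complete then put _infile_;       /* marker found: write, close record */
   else put _infile_ ' ' @@;            /* no marker: keep the record open   */
run;
```

The marker stays in FIXED.DAT as one extra trailing field, so the DATA
step that reads the file would need one throwaway variable at the end of
its INPUT list.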
On Tue, 24 Oct 2000 14:50:25 -0400, Anita Heckenbach
<aheck@GIPSADC.USDA.GOV> wrote:
>Hi all,
>
>I am sent data from 59 customers. They fill in forms (created in a
>multitude of languages), but they are supposed to send me a comma separated
>text file with the variables in a certain order.
>
>The data for the most part is good. However, there are a couple of offices
>that use the return (enter) button within the Remark field, which makes it
>unreadable by my program.
>
>Below is part of my infile statement. Is there a way for me to get to
>capture these errant Remark fields? Or, is there something I can suggest to
>the programmers of these forms so that when data entry folks enter the data
>and hit the return (enter) button, no harm is done?
>
>data txt;
>length type $1 ssp 6. anloc 6. lot $20 agid $20 subid $20 intype $1 appno
$17 loc $50 city $30 state $2 phone $12 ob $50
>cert $8 cdate $12 ctime $4 sertype $2 purcode $1 oldcert $8 edicode 3.
ediadd 3. move $1 dest 4. carrtype $1 carrid $30 samp $1
>topfeet $3 datesamp $12
>timesamp $4 grade $2 grain $1 class $4 quant 8. unit $2 inspect 5. inspdate
$12 timein $4 remark remark1 $250
>factor1 $4 result1 $8 factr1 $160 factor2 $4 result2 $8 factr2 $160 factor3
$4 result3 $8 factr3 $160 factor4 $4 result4 $8
>factr4 $160
>factor5 $4 result5 $8 factr5 $160 factor6 $4 result6 $8 factr6 $160 factor7
$4 result7 $8 factr7 $160 factor8 $4 result8 $8
>factr8 $160
>factor9 $4 result9 $8 factr9 $160 factor10 $4 result10 $8 factr10 $160
factor11 $4 result11 $8 factr11 $160 factor12 $4 result12 $8
>factr12 $160
>factor13 $4 result13 $8 factr13 $160 factor14 $4 result14 $8 factr14 $160
factor15 $4
>result15 $8 factr15 $160 factor16 $4 result16 $8 factr16 $160
>factor17 $4 result17 $8 factr17 $160 factor18 $4 result18 $8 factr18 $160
>
>infile "C:\nqdb\unzip\&filename" dsd lrecl=2000 missover;
>
>input type ssp anloc lot agid subid intype appno loc city state phone ob
cert cdate ctime sertype
>purcode oldcert edicode ediadd move dest carrtype carrid samp topfeet
datesamp timesamp grade
>grain class quant unit inspect inspdate timein remark remark1
>factor1 result1 factr1 factor2 result2 factr2 factor3 result3 factr3
factor4 result4 factr4
>factor5 result5 factr5 factor6 result6 factr6 factor7 result7 factr7
factor8 result8 factr8
>factor9 result9 factr9 factor10 result10 factr10 factor11 result11 factr11
factor12 result12 factr12
>factor13 result13 factr13 factor14 result14 factr14 factor15 result15
factr15 factor16 result16 factr16
>factor17 result17 factr17 factor18 result18 factr18 factor19 result19
factr19 factor20 result20 factr20;
>
>run;
>
>Below is a snippet of an errant Remark field. All is fine until after the
>87%, then it reads everything from Barley 13.0% as another record.
>
>
>"I","461660","461660","2000092621",192674,"Mark Small Lot #4-529 Dry
>Creek","O","","","xxxx","WA","xxxxx","xxxxxxxxxx","","20000919","","OS","O"
,"","","",,"","",,"","","20000919","","NG","M","XGR","",,"07381","20000919",
""," "," Wheat 87.0%
>Barley 13.0%
>FM & Fines
>2.0%","","","","M","10.8","","","","","","","","","","","","","","","","","
"
>,"","","","","","","","","","","","","","","","","","","","","","","","",""
,
>"","","","","","","","","","","","",""
>
>anita
>
>Anita D. Heckenbach
>Information Technology Staff
>aheck@gipsadc.usda.gov
>816-823-4639