LISTSERV at the University of Georgia
Date:   Mon, 30 Oct 2000 12:48:01 -0500
Reply-To:   Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Subject:   Re: Infile Statement

I have found that input files which deviate from conventions sometimes require pre-processing: a step which reads the file, fixes the problem(s), and writes another *external* file, which a subsequent DATA step can then read.

Here is a much simplified example of your problem. I stored the following lines in a file named BROKEN.DAT:

"A","1",""
"B","2","To: Jo Wye
123 Elm St.
Cary NC 27513"
"C","3","Call for details"

Then I ran the following program:

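data _null_;
   infile 'broken.dat' length=lastcol;   /* LASTCOL = length of the current line */
   file 'fixed.dat' lrecl=5000;
   do until (lastchar='"');              /* loop until a line ends in a double quote */
      input @;                           /* load the next line and hold it */
      input @lastcol lastchar $1.;       /* read its last character, release the line */
      put _infile_ ' ' @;                /* append the line plus a blank, hold the output line */
   end;
   put;                                  /* release the assembled output line */
run;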

It's actually fairly simple. It reads the file and copies entire lines to the output file, concatenating them until it detects a double-quote character in the trailing position. But the logic is very sensitive with respect to when buffers are loaded and released.

The output file contains:

"A","1",""
"B","2","To: Jo Wye 123 Elm St. Cary NC 27513"
"C","3","Call for details"

There are no longer newline sequences embedded within fields, so a straightforward INPUT statement should be able to handle it.

The logic used is not airtight, however. Suppose the data entry person typed:

They said "OK"
(and sounded sincere)

There is a double quote right before the newline, which should be converted to two consecutive double quotes when the comma-separated file is built. Either way, it will fool my simple program. But you can't check for consecutive double quotes either, because that is of course the representation of a null value. Even the sequence {comma, double quote, double quote} could be internal to a field if the person typed

They said "OK,"
and sounded sincere

So it's a nasty problem.
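One possible variation, offered only as a sketch and not as something I have run against your files: count double quotes instead of testing the last character. It rests on an assumption that may not hold for every one of your form implementations, namely that quotes typed inside a field are always doubled when the comma-separated file is built. If so, a complete record always contains an even number of double quotes, and a record is still open while the running count is odd. The COUNTC function used here requires SAS 9 or later.

```sas
data _null_;
   infile 'broken.dat' lrecl=5000;
   file 'fixed.dat' lrecl=5000;
   input;                              /* load one physical line into _INFILE_ */
   nquotes + countc(_infile_, '"');    /* sum statement: running quote count, retained */
   put _infile_ ' ' @@;                /* append the line, hold output across iterations */
   if mod(nquotes, 2) = 0 then do;     /* even count: assume the record is complete */
      put;                             /* release the assembled output line */
      nquotes = 0;
   end;
run;
```

If some files arrive with embedded quotes left undoubled, the parity is thrown off and this fails just as the trailing-quote test does.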

It would help greatly if your form contained a hidden field at the end, pre-loaded with some end-of-record indicator (like "#*#EOR#*#"). If I understand correctly, there are multiple implementations of the form (such as HTML, MS Access, Acrobat, whatever), so getting that done consistently might be easier said than done.
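If such a marker were in place, the pre-processing step could become much more robust. A minimal sketch, assuming the marker text is exactly #*#EOR#*#, that it appears only in the last field of each record, and using MARKED.DAT as a made-up name for the incoming file:

```sas
data _null_;
   infile 'marked.dat' lrecl=5000;     /* hypothetical input file name */
   file 'fixed.dat' lrecl=5000;
   input;                              /* load one physical line into _INFILE_ */
   put _infile_ ' ' @@;                /* append the line, hold output across iterations */
   /* release the output line only when this physical line carries the marker */
   if index(_infile_, '#*#EOR#*#') > 0 then put;
run;
```

The marker then arrives as one extra trailing field, which the later INPUT statement can read into a throwaway variable or simply leave unread.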

On Tue, 24 Oct 2000 14:50:25 -0400, Anita heckenbach <aheck@GIPSADC.USDA.GOV> wrote:

>Hi all,
>
>I am sent data from 59 customers. They fill in forms (created in a multitude of languages), but they are supposed to send me a comma separated text file with the variables in a certain order.
>
>The data for the most part is good. However, there are a couple of offices that use the return (enter) button within the Remark field, which makes it unreadable by my program.
>
>Below is part of my infile statement. Is there a way for me to get to capture these errant Remark fields? Or, is there something I can suggest to the programmers of these forms so that when data entry folks enter the data and hit the return (enter) button, no harm is done?
>
>data txt;
>length type $1 ssp 6. anloc 6. lot $20 agid $20 subid $20 intype $1 appno $17 loc $50 city $30 state $2 phone $12 ob $50
>cert $8 cdate $12 ctime $4 sertype $2 purcode $1 oldcert $8 edicode 3. ediadd 3. move $1 dest 4. carrtype $1 carrid $30 samp $1
>topfeet $3 datesamp $12
>timesamp $4 grade $2 grain $1 class $4 quant 8. unit $2 inspect 5. inspdate $12 timein $4 remark remark1 $250
>factor1 $4 result1 $8 factr1 $160 factor2 $4 result2 $8 factr2 $160 factor3 $4 result3 $8 factr3 $160 factor4 $4 result4 $8
>factr4 $160
>factor5 $4 result5 $8 factr5 $160 factor6 $4 result6 $8 factr6 $160 factor7 $4 result7 $8 factr7 $160 factor8 $4 result8 $8
>factr8 $160
>factor9 $4 result9 $8 factr9 $160 factor10 $4 result10 $8 factr10 $160 factor11 $4 result11 $8 factr11 $160 factor12 $4 result12 $8
>factr12 $160
>factor13 $4 result13 $8 factr13 $160 factor14 $4 result14 $8 factr14 $160 factor15 $4
>result15 $8 factr15 $160 factor16 $4 result16 $8 factr16 $160
>factor17 $4 result17 $8 factr17 $160 factor18 $4 result18 $8 factr18 $160
>
>infile "C:\nqdb\unzip\&filename" dsd lrecl=2000 missover;
>
>input type ssp anloc lot agid subid intype appno loc city state phone ob cert cdate ctime sertype
>purcode oldcert edicode ediadd move dest carrtype carrid samp topfeet datesamp timesamp grade
>grain class quant unit inspect inspdate timein remark remark1
>factor1 result1 factr1 factor2 result2 factr2 factor3 result3 factr3 factor4 result4 factr4
>factor5 result5 factr5 factor6 result6 factr6 factor7 result7 factr7 factor8 result8 factr8
>factor9 result9 factr9 factor10 result10 factr10 factor11 result11 factr11 factor12 result12 factr12
>factor13 result13 factr13 factor14 result14 factr14 factor15 result15 factr15 factor16 result16 factr16
>factor17 result17 factr17 factor18 result18 factr18 factor19 result19 factr19 factor20 result20 factr20;
>
>run;
>
>Below is a snippet of an errant Remark field. All is fine until after the 87%, then it reads everything from Barley 13.0% as another record.
>
>
>"I","461660","461660","2000092621",192674,"Mark Small Lot #4-529 Dry
>Creek","O","","","xxxx","WA","xxxxx","xxxxxxxxxx","","20000919","","OS","O" ,"","","",,"","",,"","","20000919","","NG","M","XGR","",,"07381","20000919", ""," "," Wheat 87.0%
>Barley 13.0%
>FM & Fines
>2.0%","","","","M","10.8","","","","","","","","","","","","","","","",""," "
>,"","","","","","","","","","","","","","","","","","","","","","","","","" ,
>"","","","","","","","","","","","",""
>
>anita
>
>Anita D. Heckenbach
>Information Technology Staff
>aheck@gipsadc.usda.gov
>816-823-4639

