Date: Fri, 10 Aug 2007 09:47:16 +1000
Reply-To: d@dkvj.biz
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David Johnson <d@DKVJ.BIZ>
Subject: Re: Multi-file INFILE statement only honors FIRSTOBS option for
first file processed?
In-Reply-To: <OFAF1EA538.4A020B43-ON85257331.005C4470-85257331.005CF94A@dom.com>
Content-Type: text/plain; charset="us-ascii"
Nat and others have given sound advice on solving this issue. I will add a
little thought on this.
When I talk to clients, I work from the assumption that they have two
extremely valuable assets: their staff and their data. Too often, it seems,
one or both are treated with some disrespect. For the data, I always take
the view that it should be "cared and fed" lest we lose something in the
process. There's an old song that suggests we don't know what we've got
'til it's gone; sometimes we don't even notice the absence.
So when I am faced with a file read where I will drop irrelevant data, such
as headers, underscores, blank lines or total rows, I account for every one
of these and verify at the end that what I have dropped and kept still
accounts for the full contents of the file.
To do this I explicitly "Output" the lines that I want to keep, remove all
reference to "FirstObs" processing and change the "delete" rows to this type
of structure:
If _InFile_ Eq: "CONSUMNO" Then Do; /* Be aware of case sensitivity */
   DropHead + 1; /* Sum statement: count the dropped header rows */
   Input; /* Release the held record vector */
End;
On the last input record I then output the count(s) of the line types I
drop, and if the sum of these and the records in the output table match the
lines in the file(s) input, then I have neither duplicated nor dropped any
data.
It takes a little longer, but I hold it as a code template for repetition
and save a lot of time verifying all the appropriate data has been captured.
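
A minimal sketch of how the template described above might look when applied
to Roy's step. The names Kept and LastRec, the TRUNCOVER option and the
UPCASE call are assumptions added for illustration, not part of the original
code, and the input fields after quantity remain elided as in Roy's post.

```sas
/* Sketch only: Kept, LastRec, TRUNCOVER and UPCASE are illustrative additions. */
filename oic 'N:\oncology_infusion_center\IP401B-*.txt';

data gnu (drop = DropHead Kept);
   /* END= flags the last record of the last file.                        */
   /* TRUNCOVER (assumption) stops a short line flowing onto the next one. */
   infile oic end = LastRec truncover;

   input @;                               /* load the record and hold it */

   if upcase(_infile_) =: 'CONSUMNO' then do;
      DropHead + 1;                       /* count the header row */
      input;                              /* release the held record */
   end;
   else do;
      input
         @1   consumno           $char8.
         @12  acct               $char9.
         @22  attending_provider $char30.
         @53  service_date       date11.
         @113 quantity           3.0
         /* remaining fields elided in the original post */
      ;
      Kept + 1;                           /* count the rows written */
      output;                             /* explicit output: only data rows are kept */
   end;

   /* On the last record, write the tallies to the log for reconciliation. */
   if LastRec then
      put 'Headers dropped: ' DropHead / 'Rows kept: ' Kept / 'Lines read: ' _n_;
run;
```

The tallies in the log can then be compared with the observation count of
the output table and with an independent line count of the input files.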
Kind regards
David
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU]On Behalf Of Nat
Wooding
Sent: Thursday, 9 August 2007 2:56 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Multi-file INFILE statement only honors FIRSTOBS option for
first file processed?
Roy
You say that the first line of each file contains the variable names, and I
hope that these are consistent and that the first one is (based on your input
statement) consumno. Rather than worry about skipping the lines, I would
first check the line to see if it starts with this string and, if so, delete
it. Then read your data.
* ================================== ;
filename oic 'N:\oncology_infusion_center\IP401B-*.txt' ;
data gnu ;
infile oic firstobs = 2 ;* the firstobs is no longer needed;
input @; * new line;
if _infile_ =:'consumno' then delete;* new line;
input
@1 consumno $char8.
@12 acct $char9.
@22 attending_provider $char30.
@53 service_date date11.
@113 quantity 3.0
<etc.>
;
run ;
* ================================== ;
You may have mixed case issues so you may want to do an upcase function on
the _infile_ before checking for consumno and I hope that different files
don't have different spellings since you would need to find the varieties.
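
If the header spellings do turn out to vary, one possible alternative is to
skip by position rather than by content. The sketch below assumes the EOV=
option of the INFILE statement behaves as documented with a wildcard fileref
(it is set to 1 when the first record of each file after the first is read,
and it is not reset automatically). NewFile is a hypothetical variable name,
TRUNCOVER is an added assumption, and the remaining fields stay elided as in
the original post.

```sas
/* Sketch only: NewFile and TRUNCOVER are illustrative assumptions. */
filename oic 'N:\oncology_infusion_center\IP401B-*.txt';

data gnu (drop = NewFile);
   infile oic eov = NewFile truncover;

   input @;                            /* load the record and hold it */

   /* _n_ = 1 covers the first file, for which EOV= is never set. */
   if _n_ = 1 or NewFile then do;
      NewFile = 0;                     /* EOV= must be reset by hand */
      delete;                          /* skip the first line of each file */
   end;

   input
      @1   consumno           $char8.
      @12  acct               $char9.
      @22  attending_provider $char30.
      @53  service_date       date11.
      @113 quantity           3.0
      /* remaining fields elided in the original post */
   ;
run;
```

This drops line 1 of every file regardless of what it contains, so it is only
safe if every file really does begin with a header row.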
As to the n= and the "outside the printed range" notes, I don't think that
I have ever encountered either.
Good luck
Nat
Nat Wooding
Environmental Specialist III
Dominion, Environmental Biology
4111 Castlewood Rd
Richmond, VA 23234
Phone:804-271-5313, Fax: 804-271-2977
"Pardee, Roy"
<pardee.r@GHC.ORG>
Sent by: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
08/08/2007 12:22 PM
Please respond to: "Pardee, Roy" <pardee.r@GHC.ORG>
To: SAS-L@LISTSERV.UGA.EDU
Subject: Multi-file INFILE statement only honors FIRSTOBS option for
first file processed?
Hey All,
I've got a bunch of text files I need to read in, each of which has a
'header row' of var names that I need sas to skip over. So I wrote the
following:
* ================================== ;
filename oic 'N:\oncology_infusion_center\IP401B-*.txt' ;
data gnu ;
infile oic firstobs = 2 ;
input
@1 consumno $char8.
@12 acct $char9.
@22 attending_provider $char30.
@53 service_date date11.
@113 quantity 3.0
<etc.>
;
run ;
* ================================== ;
This works just fine if I edit that filename statement so it only refers
to a single file. But if I leave it as written, I see things like:
NOTE: Invalid data for service_date in line 1985 53-63.
NOTE: Invalid data for quantity in line 1985 113-115.
NOTE: Invalid data errors for file OIC occurred outside the printed
range.
NOTE: Increase available buffer lines with the INFILE n= option.
(I'm confused by the literal data that sas prints out around those
NOTEs--some of it looks like actual data, and some of it looks like the
header row.)
I've tried removing the specific file complained about & re-running,
only to have SAS start complaining about a different file. This leads
me to the theory that the FIRSTOBS = 2 option is only being applied to
the first file.
Is that plausible? And more to the point--how do I get sas to skip line
1 of every file?
Many thanks in advance!
-Roy
Roy Pardee
Research Analyst/Programmer
Group Health Center For Health Studies (Cancer Research Network)
(206) 287-2078
Google Talk: rpardee