LISTSERV at the University of Georgia
Date:         Fri, 10 Aug 2007 09:47:16 +1000
Reply-To:     d@dkvj.biz
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         David Johnson <d@DKVJ.BIZ>
Subject:      Re: Multi-file INFILE statement only honors FIRSTOBS option for
              first file processed?
In-Reply-To:  <OFAF1EA538.4A020B43-ON85257331.005C4470-85257331.005CF94A@dom.com>
Content-Type: text/plain; charset="us-ascii"

Nat and others have given sound advice on solving this issue. I will add a little thought on this.

When I talk to clients, I work from the assumption that they have two extremely valuable assets: their staff and their data. Too often, it seems, one or both are treated with some disrespect. For the data, I always take the view that it should be "cared for and fed" lest we lose something in the process. There's an old song that suggests we don't know what we've got 'til it's gone; sometimes we don't even notice the absence.

So when I am faced with a file read where I will drop irrelevant data, such as headers, underscores, blank lines or total rows, I account for every one of these and verify at the end that what I have dropped and kept still accounts for the full contents of the file.

To do this I explicitly "Output" the lines that I want to keep, remove all reference to "FirstObs" processing and change the "delete" rows to this type of structure.

If _InFile_ Eq: "CONSUMNO" Then Do; /* Be aware of case sensitivity */
   DROPHEAD ++ 1;
   Input; /* Release the held record vector */
End;

On the last input record I then output the count(s) of the line types I drop, and if the sum of these and the records in the output table match the lines in the file(s) input, then I have neither duplicated nor dropped any data.
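A minimal sketch of how such a template might look, reusing the fileref and fields from Roy's original step. This is an illustration under stated assumptions, not the actual template: the counters KEPT, DROPBLANK and TOTAL and the END= variable LASTREC are hypothetical names, TRUNCOVER is added so that a short line cannot pull in the next record, UPCASE is used to sidestep the case issue, and the fields elided in the original are left out.

```sas
filename oic 'N:\oncology_infusion_center\IP401B-*.txt' ;

data gnu (drop = drophead dropblank kept total) ;
  /* No FIRSTOBS=; LASTREC (hypothetical name) flags the final record read */
  infile oic truncover end = lastrec ;

  input @ ;                              /* load the record and hold it      */

  if upcase(_infile_) =: 'CONSUMNO' then
    drophead + 1 ;                       /* header row: count it, keep none  */
  else if missing(_infile_) then
    dropblank + 1 ;                      /* blank line: count it, keep none  */
  else do ;
    input @1   consumno           $char8.
          @12  acct               $char9.
          @22  attending_provider $char30.
          @53  service_date       date11.
          @113 quantity           3.0 ;  /* remaining fields omitted here    */
    kept + 1 ;
    output ;                             /* explicit OUTPUT for kept lines   */
  end ;

  /* On the last record, report every category so the totals can be checked */
  if lastrec then do ;
    total = kept + drophead + dropblank ;
    putlog 'NOTE: Reconciliation: ' kept= drophead= dropblank= total= ;
  end ;
run ;
```

The TOTAL written to the log can then be compared with the record counts SAS itself reports for the input file(s), and KEPT with the observation count of the output table.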

It takes a little longer, but I hold it as a code template for repetition and save a lot of time verifying all the appropriate data has been captured.

Kind regards

David

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Nat Wooding
Sent: Thursday, 9 August 2007 2:56 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Multi-file INFILE statement only honors FIRSTOBS option for first file processed?

Roy

You say that the first line of each file contains the variable names, and I hope that these are consistent and that the first one is (based on your input statement) consumno. Rather than worry about skipping the lines, I would first check the line to see if it has this string and, if so, delete it. Then, read your data.

* ================================== ;
filename oic 'N:\oncology_infusion_center\IP401B-*.txt' ;

data gnu ;
  infile oic firstobs = 2 ; * the firstobs is no longer needed;
  input @ ; * new line;

  if _infile_ =: 'consumno' then delete ; * new line;
  input
    @1    consumno            $char8.
    @12   acct                $char9.
    @22   attending_provider  $char30.
    @53   service_date        date11.
    @113  quantity            3.0
    <etc.>
  ;
run ;
* ================================== ;

You may have mixed-case issues, so you may want to apply the upcase function to _infile_ before checking for consumno. I hope that different files don't have different spellings, since you would then need to find all the varieties.
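A short sketch of that case-insensitive check, assuming the same fileref and layout as above. TRUNCOVER is an addition here, and only the first two fields are shown for brevity.

```sas
filename oic 'N:\oncology_infusion_center\IP401B-*.txt' ;

data gnu ;
  infile oic truncover ;
  input @ ;                                  /* load and hold the record     */

  /* Upcase the whole record so Consumno, CONSUMNO, consumno all match */
  if upcase(_infile_) =: 'CONSUMNO' then delete ;

  input @1  consumno $char8.
        @12 acct     $char9. ;               /* remaining fields omitted     */
run ;
```

The `=:` comparison only looks at the leading characters, so a header that starts with blanks or with a differently spelled name would still slip through.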

As to the n= and the "outside the printed range" notes, I don't think that I have ever encountered either.

Good luck

Nat

Nat Wooding
Environmental Specialist III
Dominion, Environmental Biology
4111 Castlewood Rd
Richmond, VA 23234
Phone: 804-271-5313, Fax: 804-271-2977

From: "Pardee, Roy" <pardee.r@GHC.ORG>
Sent by: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
Date: 08/08/2007 12:22 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Multi-file INFILE statement only honors FIRSTOBS option for first file processed?
Please respond to: "Pardee, Roy" <pardee.r@GHC.ORG>

Hey All,

I've got a bunch of text files I need to read in, each of which has a 'header row' of var names that I need sas to skip over. So I wrote the following:

* ================================== ;
filename oic 'N:\oncology_infusion_center\IP401B-*.txt' ;

data gnu ;
  infile oic firstobs = 2 ;
  input
    @1    consumno            $char8.
    @12   acct                $char9.
    @22   attending_provider  $char30.
    @53   service_date        date11.
    @113  quantity            3.0
    <etc.>
  ;
run ;
* ================================== ;

This works just fine if I edit that filename statement so it only refers to a single file. But if I leave it as written, I see things like:

NOTE: Invalid data for service_date in line 1985 53-63.
NOTE: Invalid data for quantity in line 1985 113-115.
NOTE: Invalid data errors for file OIC occurred outside the printed range.
NOTE: Increase available buffer lines with the INFILE n= option.

(I'm confused by the literal data that sas prints out around those NOTEs--some of it looks like actual data, and some of it looks like the header row.)

I've tried removing the specific file complained about & re-running, only to have SAS start complaining about a different file. This leads me to the theory that the FIRSTOBS = 2 option is only being applied to the first file.

Is that plausible? And more to the point--how do I get sas to skip line 1 of every file?
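For reference, one way to skip the first line of every file without relying on the header text is sketched below. It uses the INFILE option FILENAME=, which stores the name of the file currently being read, and drops a record whenever that name changes. The variable names FNAME and PREVNAME and the length of 260 are assumptions made for illustration, TRUNCOVER is an addition, and only the first two fields are shown.

```sas
filename oic 'N:\oncology_infusion_center\IP401B-*.txt' ;

data gnu (drop = prevname) ;
  length fname prevname $ 260 ;     /* FNAME itself is not written to GNU  */
  retain prevname ' ' ;

  infile oic filename = fname truncover ;
  input @ ;                         /* load the record; FNAME is now set   */

  if fname ne prevname then do ;    /* first record of a new physical file */
    prevname = fname ;
    delete ;                        /* skip that file's header row         */
  end ;

  input @1  consumno $char8.
        @12 acct     $char9. ;      /* remaining fields omitted            */
run ;
```

This treats the first record of each file as a header unconditionally, so it is only appropriate when every file really does start with one.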

Many thanks in advance!

-Roy

Roy Pardee
Research Analyst/Programmer
Group Health Center For Health Studies (Cancer Research Network)
(206) 287-2078
Google Talk: rpardee


