Date: Thu, 15 Jul 2010 14:17:42 -0400
Reply-To: Chang Chung <chang_y_chung@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Chang Chung <chang_y_chung@HOTMAIL.COM>
Subject: Re: parsing infile from web URL
On Thu, 15 Jul 2010 13:12:15 -0400, Miller, Jeremy T. (CDC/OID/NCPDCID) wrote:
>I'm creating some files from a web URL to create a relational DB. I'm
>creating a list of CBSA divisions, CBSA names, then a list of
>FIPS/FIPS_C/county_names that I could use.
>It's easy to cut off the top of a text document with FIRSTOBS=, but
>what is the best way to stop processing further down the
>text? For example, in this URL, beginning on line 1959, there is a
>note that I do not want to process, so I would like to STOP there.
>Obviously, I could just download the file and strip the offending text
>both above and below to make this a non-issue, but I WANT to know how to
>do it the other way.
>filename source URL
>data msa_names (drop=flag:);
> infile source firstobs=12 truncover ;
> input flag1 $ 1 flag2 $ 25-26 CBSA 1-5 @;
> if flag1 = "*" then stop ;
> if flag2 ne " " ;
> input @25 CBSA_nm $79. ;
>This "works," but you'll notice in the log a note for invalid data.
>Should I do some type of pre-parsing to stop the input before invalid
>data can come in?
>Again, I don't just want something that works; I would like to know the
>appropriate method to stop parsing INPUT if you know that only a certain
>portion of text has data WITHOUT altering the original text.
I think it is nicer to download the file once and work on it locally,
instead of asking the remote server to send you the data over and over.
To avoid the invalid-data notes, I find it easier to read the fields into
character variables and convert them to numeric later. It also helps to take
advantage of the structure of the input file: the first three fields sit in
fixed columns, so name the variables after those fields and test which ones
are blank. Your subsetting IF statement then becomes more readable. HTH.
%let metroareas = http://www.census.gov/population/www/metroareas;
filename source url "&metroareas/lists/2008/List4.txt";

data cbsa(keep=cbsa name);
  infile source firstobs=12 truncover;
  /* read the three code fields as character and hold the line */
  input cbsa_ $ 1-5 div_ $ 9-13 fips_ $ 17-21 @;
  /* keep only lines with a CBSA code and no division or FIPS code */
  if not missing(cbsa_) and missing(div_) and missing(fips_);
  cbsa = input(cbsa_, 5.0);
  input @25 name $79.;
run;

filename source clear;

/* check */
proc print data=cbsa;
run;
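
On the original question of stopping before the trailing note without
touching the file: one possible sketch, not run against the live file, is to
peek at column 1, hold the line, and STOP when the marker shows up. It assumes
(as the question's own code does) that the note begins with "*" in column 1.
The data set name cbsa_stop and the variable first_char are made up for
illustration, and the ?? modifier is only there to keep the log quiet if a
non-numeric code slips through.

```sas
/* Sketch only: the "*" marker in column 1 is an assumption taken from the
   question; cbsa_stop and first_char are illustrative names. */
filename source url
  "http://www.census.gov/population/www/metroareas/lists/2008/List4.txt";

data cbsa_stop(keep=cbsa name);
  infile source firstobs=12 truncover;
  /* look at column 1 only and hold the line for the next INPUT */
  input @1 first_char $char1. @;
  /* end the whole step at the first line of the note */
  if first_char = '*' then stop;
  /* re-read the held line: three code fields plus the name, as character */
  input cbsa_ $ 1-5 div_ $ 9-13 fips_ $ 17-21 @25 name $79.;
  /* keep only lines with a CBSA code and no division or FIPS code */
  if not missing(cbsa_) and missing(div_) and missing(fips_);
  /* ?? suppresses the invalid-data note and the _ERROR_ flag */
  cbsa = input(cbsa_, ?? 5.);
run;

filename source clear;
```

If the last data line number is known and stable, the OBS= option on the
INFILE statement is a simpler alternative, but it breaks silently when the
file grows, which is why the sketch tests the content instead.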