TW Socos <bodybuilder@SCIENTIST.COM> replied (to Richard Devenezia):
> Thank you for the info. I am behind a firewall and have tried the
> suggestions you provided, but to no avail. Here is the info I am
> receiving when running w/ the DEBUG option.
>
> NOTE: >>> GET /cgi-bin/Phoenix/redirect.pl?pil=MAY2004&depth=0&Print=144 HTTP/1.0
> NOTE: >>> Accept: */*.
> NOTE: >>> Authorization: Basic
> NOTE: >>> Accept-Language: en
> NOTE: >>> Accept-Charset: iso-8859-1,*,utf-8
> NOTE: >>> User-Agent: SAS/URL
> NOTE: >>>
> NOTE: <<< HTTP/1.0 400 Host Header Required
> NOTE: <<< Date: Tue, 08 Jun 2004 20:00:41 GMT
> NOTE: <<< Via: HTTP/1.1 www.wrh.noaa.gov (Traffic-Server/4.0.15.1-Dell [c s f ])
> NOTE: <<< Cache-Control: no-store
> NOTE: <<< Content-Type: text/html
> NOTE: <<< Content-Language: en
> NOTE: <<< Content-Length: 468
> NOTE: <<<
>
> ERROR: Bad request. Use the debug option for more info.
> NOTE: The SAS System stopped processing this step because of errors.
>
> Interestingly, this is not a problem when downloading data from other
> websites (e.g., census data from US census site, etc., yahoo financial
> data, etc.). Any info you can provide would be very helpful.
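Okay, the DEBUG option has provided some help here. You're getting the
HTTP handshake, and the website sees your request. Note what comes back,
though: "400 Host Header Required". Your request goes out as HTTP/1.0
with no Host: header, and the server (or the Traffic-Server proxy named
in the Via: line) is refusing it on that basis. The URL is also a
redirect script, so it looks to me as though you can't quite do the
redirect that you need for this particular webpage either. That's
probably why you can snag other webpages, but not this one.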
I suggest you try snagging the page using Perl or Python or Java, and
seeing how far you get with that approach. In Perl, you could use the
WWW::Mechanize module, or go with something a little lighter, like the
LWP::Simple or LWP::UserAgent modules. I'm going to take a wild guess
and advise you to start with the LWP::UserAgent module, as the
redirection above might take a bit of work using just LWP::Simple.
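If you go the Python route instead, here is a minimal sketch of the same idea using only the standard library. It relies on urllib supplying the Host: header that the "400 Host Header Required" reply is asking for, and on urlopen following redirects by default. The host name in URL is my assumption (lifted from the Via: line in your log; the real site may differ), and the names build_request, fetch and the User-Agent string are made up for illustration.

```python
import urllib.request

# ASSUMPTION: the host is guessed from the Via: header in the debug log;
# the path and query string are the ones shown in the GET line.
URL = ("http://www.wrh.noaa.gov/cgi-bin/Phoenix/redirect.pl"
       "?pil=MAY2004&depth=0&Print=144")


def build_request(url):
    """Build a GET request; urllib adds the Host: header itself when sending."""
    # The User-Agent value is an arbitrary, hypothetical label.
    return urllib.request.Request(url, headers={"User-Agent": "page-fetch-sketch"})


def fetch(url, timeout=30):
    """Fetch url, following any redirects, and return the body as text."""
    # urlopen follows HTTP redirects by default and picks up proxy settings
    # from the http_proxy environment variable, which matters behind a firewall.
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "iso-8859-1"
        return resp.read().decode(charset, errors="replace")


# Usage (needs network access, so it is left commented out):
#   html = fetch(URL)
#   with open("page.html", "w") as out:
#       out.write(html)
```

Once the page is saved to a file, SAS can read it with an ordinary FILENAME and INFILE instead of the URL access method.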
While you're at it, you could use the Perl module HTML::Parser to clean
off the HTML tags so that SAS would get just the desired text.
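For the tag-stripping step, Python's standard html.parser module plays roughly the role that HTML::Parser plays in Perl. The sketch below is only one way to do it: TextExtractor and html_to_text are names I made up, and I am assuming you want the visible text, one chunk per line, with script and style bodies dropped.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping tags and <script>/<style> bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0  # nesting depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style blocks.
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text of an HTML string, one text chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

For example, html_to_text("<h1>Forecast</h1><p>Rain &amp; wind</p>") gives the two lines "Forecast" and "Rain & wind", which SAS can then read as plain text.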
HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician