LISTSERV at the University of Georgia
Date:   Wed, 9 Jun 2004 14:59:36 -0700
Reply-To:   cassell.david@EPAMAIL.EPA.GOV
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject:   Re: Infile Data from CGI Generated Web Page
Content-type:   text/plain; charset=US-ASCII

TW Socos <bodybuilder@SCIENTIST.COM> replied (to Richard Devenezia):
> Thank you for the info. I am behind a firewall and have tried the
> suggestions you provided, but to no avail. Here is the info I am
> receiving when running w/ the DEBUG option.
>
> NOTE: >>> GET /cgi-bin/Phoenix/redirect.pl?pil=MAY2004&depth=0&Print=144
> HTTP/1.0
> NOTE: >>> Accept: */*.
> NOTE: >>> Authorization: Basic
> NOTE: >>> Accept-Language: en
> NOTE: >>> Accept-Charset: iso-8859-1,*,utf-8
> NOTE: >>> User-Agent: SAS/URL
> NOTE: >>>
> NOTE: <<< HTTP/1.0 400 Host Header Required
> NOTE: <<< Date: Tue, 08 Jun 2004 20:00:41 GMT
> NOTE: <<< Via: HTTP/1.1 www.wrh.noaa.gov (Traffic-Server/4.0.15.1-Dell
> [c s f ])
> NOTE: <<< Cache-Control: no-store
> NOTE: <<< Content-Type: text/html
> NOTE: <<< Content-Language: en
> NOTE: <<< Content-Length: 468
> NOTE: <<<
>
> ERROR: Bad request. Use the debug option for more info.
> NOTE: The SAS System stopped processing this step because of errors.
>
> Interestingly, this is not a problem when downloading data from other
> websites (e.g., census data from US census site, etc., yahoo financial
> data, etc.). Any info you can provide would be very helpful.

Okay, the DEBUG option has provided some help here. You're getting the HTTP handshake, and the website sees your request. It looks to me as though you can't quite do the redirect that you need for this particular webpage. That's probably why you can snag other webpages, but not this one.

I suggest you try snagging the page using Perl or Python or Java, and see how far you get with that approach. In Perl, you could use the WWW::Mechanize module, or go with something a little lighter, like the LWP::Simple or LWP::UserAgent modules. I'm going to take a wild guess and advise you to start with the LWP::UserAgent module, as the redirection above might take a bit of work using just LWP::Simple.
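
If you go the Python route instead, a minimal sketch using only the standard library might look like the following. Note that the trace above shows the server answering "400 Host Header Required" to an HTTP/1.0 request that carries no Host line; urllib sends a Host header on its own and follows ordinary redirects by default, so it may get further than the URL access method did. The host name in URL is a made-up placeholder (the trace shows only the path and query string), and the User-Agent value and the helper names build_request and fetch are my own inventions. I have not tried this against the actual site.

```python
import urllib.request

# Hypothetical placeholder: the trace shows only the path and query string,
# not the host, so substitute the real server name here.
URL = ("http://example.invalid/cgi-bin/Phoenix/redirect.pl"
       "?pil=MAY2004&depth=0&Print=144")


def build_request(url):
    """Build a GET request with headers similar to those in the DEBUG trace."""
    req = urllib.request.Request(url)
    req.add_header("Accept", "*/*")
    req.add_header("Accept-Language", "en")
    req.add_header("User-Agent", "python-urllib-sketch")  # arbitrary name
    return req


def fetch(url, timeout=30):
    """Return (final_url, text) for the page.

    urlopen adds the Host header itself and follows 301/302/303/307
    redirects for GET requests, so no extra handling is written here.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "iso-8859-1"
        return resp.geturl(), resp.read().decode(charset, errors="replace")


# Typical use, once URL points at the real server:
#     final_url, page = fetch(URL)
#     with open("page.html", "w") as out:
#         out.write(page)
```

Since you are behind a firewall: urllib picks up proxy settings from the http_proxy environment variable if it is set. Writing the result to a file lets SAS read it with an ordinary INFILE statement.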

While you're at it, you could use the Perl module HTML::Parser to clean off the HTML tags so that SAS would get just the desired text.
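
The same tag-stripping idea can be sketched in Python with the standard html.parser module, which plays roughly the role HTML::Parser does in Perl. The class name TextExtractor and the function strip_tags are hypothetical names of mine, and I am assuming that the text you want is simply everything outside the tags, minus script and style blocks.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; into & etc.
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Trim each line and drop the empty ones left behind by the markup.
        lines = (line.strip() for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def strip_tags(html):
    """Return the plain text of an HTML string, one non-empty line per row."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at end of input
    return parser.text()
```

Line breaks come only from newlines already present in the page text, so adjacent elements with no newline between them end up on the same output line. That tends to suit preformatted data pages, but you may need to adjust it for your page.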

HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician

