Date: Thu, 15 Jan 2004 23:13:45 -0500
Reply-To: "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
Subject: Re: Reading the data from the Internet
"Dea Talu" <deatalu@HOTMAIL.COM> wrote in message
news:200401152212.i0FMClQ06577@listserv.cc.uga.edu...
> I basically have the same question as Edle. However, I don't believe the
> data at http://10xgroup.com/indc/?id=indc_nagp_report#top is in csv format.
> I tried Paul Choate's code from an old e-mail, but couldn't make it run.
> The problem is, I don't even know what type of file the table can be
> saved as.
>
> Dea
>
Dea:
Perl is ready to rock and roll for you. If the data in an html page has a
'header row' (i.e. column labels row), you can use the perl module
HTML::TableExtract to extract it.
So, if you have Perl installed, pipe the output of the Perl program below
(i.e. filename extract pipe "perl <perl-program>"; ) to a data step that
reads the table data extracted from deep within the html page.
snip
use LWP::Simple;      # only needed for the commented-out get() below
use LWP::UserAgent;
use HTML::TableExtract;

my $url = "http://10xgroup.com/indc/?id=indc_nagp_report#top";

# 10xgroup requires a user agent of Explorer, so this doesn't work:
# my $html_string = get ( $url ) or die "$!\n";

# Spoof the user agent so 10xgroup coughs up the html containing the data
my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/5.0");
my $request  = HTTP::Request->new('GET', $url);
my $response = $ua->request($request);
die "auugh" unless $response->is_success;
my $html_string = $response->content;

# Match only tables whose header row contains these column labels
my $te = HTML::TableExtract->new(
    headers => [qw(Region High Low Wtd Change Daily No. No. Delivery)] );
$te->parse($html_string);

# Examine all matching tables
foreach my $ts ($te->table_states) {
    print "Table (", join(',', $ts->coords), "):\n";
    # Print each row as one comma-separated line
    foreach my $row ($ts->rows) {
        print join(',', @$row), "\n";
    }
}
The Perl program outputs this:
Table (3,9):
  EAST,,,,,,,,
    Algonquin Citygates, $33.0000, $16.5000, $19.6028,-36.7972,25,300,5,7,Algonquin Gas Transmissions Co. - Citygates
    Col Gas TCO,$6.5000,$5.8500,$6.2618,+.1389, 343,500, 49, 28,Columbia Gas Co. - TCO Pool (Appalachia)
    Dominion-South,$7.5000,$5.7200,$7.1119,+.0135, 409,300, 68, 33,Dominion - South Point
    Dracut, $18.0000, $18.0000, $18.0000,-32.8108,10,000,2,3,Maritimes & Tennessee Gas Pipeline Co. - Dracut Interconnect
    Iroquois-Z2, $55.0000, $13.0000, $19.5852,-32.4848,66,300, 14, 14,Iroquois - Zone 2
    TETCO-M3, $45.0000,$8.3500, $17.2220,-24.0263, 156,900, 46, 24,Texas Eastern - M3 Zone
    Transco-Z6 (non-NY), $43.0000,$8.8500, $12.5921,-17.5070,63,600, 21, 19,Transcontinental Gas Pipeline Corp. - Zone 6 (non-NY)
    Transco-Z6 (NY), $60.0000,$9.0000, $23.4641,-23.6373,73,800, 26, 17,Transcontinental Gas Pipeline Corp. - Zone 6 (NY)
.....
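
On the SAS side, the piping step mentioned earlier might look roughly like the sketch below. It assumes the Perl program has been saved as extract.pl (a made-up file name) and that perl is on the path; the data set name rawlines, the variable name line, and the 1000-character width are likewise just placeholders. It only pulls each output line into a character variable and drops the "Table (...)" banner line. Splitting the line into fields is left out on purpose, because the daily volume values shown above (e.g. 25,300) contain commas themselves, so a plain comma-delimited read would likely misalign the columns.

```sas
/* extract.pl is a hypothetical name for the Perl program shown above */
filename extract pipe "perl extract.pl";

data rawlines;
  /* Read whatever the Perl program writes to standard output */
  infile extract lrecl=32767 truncover;
  /* $char. keeps leading blanks, so the indentation survives */
  input line $char1000.;
  /* Drop the "Table (r,c):" banner printed before the rows */
  if line =: 'Table (' then delete;
run;
```

From there the fields can be picked apart with whatever parsing suits the layout, or the join character in the Perl program can be changed to something that does not occur in the data.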
--
Richard A. DeVenezia
http://www.devenezia.com/downloads/sas/macros/?m=xmlib