LISTSERV at the University of Georgia
Date:         Thu, 15 Jan 2004 23:13:45 -0500
Reply-To:     "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
Subject:      Re: Reading the data from the Internet

"Dea Talu" <deatalu@HOTMAIL.COM> wrote in message
news:200401152212.i0FMClQ06577@listserv.cc.uga.edu...
> I basically have the same question with Edle. However, I don't believe the
> data at http://10xgroup.com/indc/?id=indc_nagp_report#top is in csv format.
> I tried Paul Choate's code from a an old e-mail, but couldn't make it run.
> The problem is, I don't even know what in type of file the table can be
> save.
>
> Dea
>

Dea:
Perl is ready to rock and roll for you. If the data in an HTML page has a 'header row' (i.e., a row of column labels), you can use the Perl module HTML::TableExtract to extract it.

So, if you have Perl installed, pipe the output of this Perl program (i.e., filename extract pipe "perl <perl-program>";) into a data step that reads the table data extracted from deep within the HTML page.

snip

use LWP::Simple;
use LWP::UserAgent;
use HTML::TableExtract;

my $url = "http://10xgroup.com/indc/?id=indc_nagp_report#top";

# 10xgroup requires user agent of Explorer so this don't work
# my $html_string = get ( $url ) or die "$!\n";

# Spoof the user agent so 10xgroup coughs up the html containing the data
my $ua = new LWP::UserAgent;
$ua->agent("Mozilla/5.0");

my $request = new HTTP::Request('GET', $url);
my $response = $ua->request($request);
die "auugh" unless $response->is_success;

$html_string = $response->content;

$te = new HTML::TableExtract( headers => [qw(Region High Low Wtd Change Daily No. No. Delivery)] );
$te->parse($html_string);

# Examine all matching tables
foreach $ts ($te->table_states) {
    print "Table (", join(',', $ts->coords), "):\n";
    foreach $row ($ts->rows) {
        print join(',', @$row), "\n";
    }
}

The Perl program outputs this:

Table (3,9):
  EAST,,,,,,,,
    Algonquin Citygates, $33.0000, $16.5000, $19.6028,-36.7972,25,300,5,7,Algonquin Gas Transmissions Co. - Citygates
    Col Gas TCO,$6.5000,$5.8500,$6.2618,+.1389, 343,500, 49, 28,Columbia Gas Co. - TCO Pool (Appalachia)
    Dominion-South,$7.5000,$5.7200,$7.1119,+.0135, 409,300, 68, 33,Dominion - South Point
    Dracut, $18.0000, $18.0000, $18.0000,-32.8108,10,000,2,3,Maritimes & Tennessee Gas Pipeline Co. - Dracut Interconnect
    Iroquois-Z2, $55.0000, $13.0000, $19.5852,-32.4848,66,300, 14, 14,Iroquois - Zone 2
    TETCO-M3, $45.0000,$8.3500, $17.2220,-24.0263, 156,900, 46, 24,Texas Eastern - M3 Zone
    Transco-Z6 (non-NY), $43.0000,$8.8500, $12.5921,-17.5070,63,600, 21, 19,Transcontinental Gas Pipeline Corp. - Zone 6 (non-NY)
    Transco-Z6 (NY), $60.0000,$9.0000, $23.4641,-23.6373,73,800, 26, 17,Transcontinental Gas Pipeline Corp. - Zone 6 (NY)
.....
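
One thing to watch when reading this output: the values in the Daily column carry thousands separators (25,300), so joining cells with commas gives data rows a different field count than the region heading rows. Here is a minimal sketch of one way around that, assuming tab-delimited output suits the reading data step and that the leading padding on the region names is non-breaking spaces from the page markup. clean_row is a hypothetical helper, not part of the program above, and the sample row is copied from the output shown.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original program): tidy the cells of one
# extracted row and join them with tabs instead of commas, so commas inside
# a cell (e.g. 25,300) no longer split the field.
sub clean_row {
    my @cells = @_;
    for my $cell (@cells) {
        $cell = '' unless defined $cell;   # guard against undefined cells
        $cell =~ s/\xA0/ /g;               # assumed non-breaking spaces -> plain spaces
        $cell =~ s/^\s+|\s+$//g;           # trim leading and trailing whitespace
    }
    return join("\t", @cells);
}

# Sample row modeled on the output above; "\xA0" stands in for the assumed
# non-breaking-space padding in front of the region name.
my @row = (
    "\xA0\xA0\xA0\xA0Algonquin Citygates",
    ' $33.0000', ' $16.5000', ' $19.6028', '-36.7972',
    '25,300', '5', '7',
    'Algonquin Gas Transmissions Co. - Citygates',
);

print clean_row(@row), "\n";
```

In the extraction loop, print clean_row(@$row), "\n"; would take the place of print join(',', @$row), "\n"; and the data step would then read the lines with a tab delimiter.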

--
Richard A. DeVenezia
http://www.devenezia.com/downloads/sas/macros/?m=xmlib

