LISTSERV at the University of Georgia
Date:         Fri, 22 Oct 1999 16:18:50 +0000
Reply-To:     kmself@ix.netcom.com
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Karsten M. Self" <kmself@IX.NETCOM.COM>
Organization: Self Analysis
Subject:      Re: Stripping out HTML tags
Comments: To: "Steven E. Stevens" <sstevens@LATERALTHOUGHT.COM>
Content-Type: text/plain; charset=us-ascii

> Date: Thu, 21 Oct 1999 12:04:40 -0400
> From: "Steven E. Stevens" <sstevens@LATERALTHOUGHT.COM>
> Subject: Stripping out HTML tags
>
> Would anyone be willing to share (or point me to) a chunk of SAS code
> (datastep or macro) to strip out some or all HTML tags from character
> strings? Am running SAS V7/8, so 200 byte character variable limitation is
> not an issue. Thanks in advance for any responses...

Here is yet another non-SAS response.

The easiest solution I could think of would be to use an existing browser to dump the rendered text. Lynx, a text-based web browser, can do just this from the command line. Under Unix, its output can be read directly as SAS input through a FILENAME PIPE; on other platforms, you would generally dump the output to a file and read that file with SAS.

lynx -dump <filename or URL>
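A minimal sketch of a wrapper around that command, assuming a Unix-like shell with Lynx on the PATH. The function name `html2txt` is hypothetical; `-dump`, `-nolist` (suppress the numbered link list appended to dumps), `-force_html` and `-stdin` are Lynx command-line options, but check `lynx -help` on your own build before relying on them.

```shell
#!/bin/sh
# html2txt (hypothetical name): render HTML to plain text with Lynx.
# Usage: html2txt FILE_OR_URL    or    html2txt < page.html
html2txt() {
    if [ $# -gt 0 ]; then
        # Render a local file or a URL. -nolist drops the "References"
        # link list that -dump otherwise appends to the output.
        lynx -dump -nolist -force_html "$1"
    else
        # No argument: read the HTML from standard input. -force_html makes
        # Lynx treat that input as HTML rather than plain text.
        lynx -dump -nolist -force_html -stdin
    fi
}
```

On Unix the equivalent `lynx -dump ...` command line can go straight into the quoted command of a FILENAME PIPE statement; elsewhere, something like `html2txt page.html > page.txt` leaves a plain-text file for an ordinary INFILE statement. Note that Lynx indents and rewraps the rendered text, so the SAS side should not assume original line breaks survive.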

http://lynx.browser.org/

From the site:

Lynx is a text browser for the World Wide Web. Lynx 2.8.2 runs on Un*x, VMS, Windows 95/98/NT but not 3.1 or 3.11, on DOS (386 or higher) and OS/2 EMX. The current developmental version is also available for testing. Ports to Mac are in beta test.

Perl also has several HTML-to-text modules. See O'Reilly's "Perl Cookbook" (Christiansen & Torkington), pp. 714 ff., for more information.
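Where neither Lynx nor Perl is available, a much cruder substitute is a regular-expression pass with sed. This is a swapped-in technique, not one of the Perl modules: it assumes every tag opens and closes on the same line, does nothing special for `<script>`/`<style>` bodies, comments, or attribute values containing `>`, and decodes only the three entities shown. The function name `strip_tags` is hypothetical.

```shell
#!/bin/sh
# strip_tags (hypothetical name): crude HTML tag removal with sed.
# Reads standard input, writes standard output.
strip_tags() {
    # 1. Delete anything from '<' up to the next '>' on the same line.
    # 2. Decode &lt; and &gt;, then &amp; last, so that "&amp;lt;"
    #    becomes the literal text "&lt;" rather than "<".
    #    (In a sed replacement, '\&' means a literal ampersand.)
    sed -e 's/<[^>]*>//g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&amp;/\&/g'
}
```

For example, `printf '%s\n' '<p>Hello <b>world</b></p>' | strip_tags` prints `Hello world`. Because tags are removed before entities are decoded, an escaped `&lt;b&gt;` in the text survives as a literal `<b>` instead of being stripped.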

For those who like to browse text-only but don't like how Lynx handles tables or frames, may I suggest w3m:

http://freshmeat.net/appindex/1999/06/09/928951047.html
http://ei5nazha.yz.yamagata-u.ac.jp/~aito/w3m/eng/

--
Karsten M. Self (kmself@ix.netcom.com)
    What part of "Gestalt" don't you understand?

SAS for Linux: http://www.netcom.com/~kmself/SAS/SAS4Linux.html
Mailing list: "subscribe sas-linux" to mailto:majordomo@cranfield.ac.uk
  9:10am  up 21:28,  2 users,  load average: 0.25, 0.17, 0.09

