LISTSERV at the University of Georgia
Date:         Thu, 17 May 2001 15:59:00 -0700
Reply-To:     Lauren Haworth <haworthl@GENE.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Lauren Haworth <haworthl@GENE.COM>
Organization: Genentech, Inc.
Subject:      Re: PDF to SAS??
Comments: To: CBristow@dpra.com
Content-Type: text/plain; charset=us-ascii

Carol,

Two web sites, www.planetpdf.com and www.pdfzone.com, list a number of PDF tools. I did a quick search and came up with:

ACE: converts PDF tables to HTML or tab-delimited text (free for non-commercial use, http://www.ces.census.gov/ace/)

Redwing: converts PDF tables and text to a variety of formats ($349, http://www.datawatch.com/docs/products/index.html)

-- Lauren

Carol Bristow wrote:

> I have a very large report, generated out of a database, that was provided in
> PDF format. We are hoping to go back to the source to get the data in DBF
> format so that I don't have to do anything. In the meantime I'm doing a
> little research on tools to convert the PDF file into something useful for
> analyses, just in case. Since I know the folks that read this list use data
> from a wide variety of sources, I'm hoping someone has dealt with this
> situation.
>
> The report has a moderately complex structure, with report headers on the top
> of the page that I want to ignore, about five lines of data fields that get
> printed at the top of each page (1-n pages per site) of information for the
> site, and then a bunch of records below that with activity-level info at the
> site, along with subactivities associated with an activity. And of course,
> to make the report easier to read, duplicate info in a column is suppressed
> on the report, but I need it in the resulting dataset so that I can use it
> for analyses.
>
> The hard way is to get the PDF into a text-based format (anyone know of an
> easy way to do this?), and then write a program to parse it out. I've parsed
> report output before, although nothing quite this messy, so I know that I
> *can* do it, but I'd really rather not go there if I can help it. So the
> question is, does anyone know of a product that might let me do this without
> spending a whole lot of money (since I don't have to do this all the time)?
> I know, I'm asking for an awful lot. ;-)
>
> I tried doing some web searches, but everything I checked so far seems to be
> aimed at putting PDFs in a database and allowing text searching of the PDFs.
> And they all say call for pricing, which is a good indication to me that they
> are probably going to want a whole lot more than we're going to want to spend!
>
> Hoping someone has some good suggestions,
>
> Carol Bristow
> DPRA Incorporated
> cbristow@dpra.com
> 703-841-8025
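
One piece of the "hard way" described above, restoring values that the report suppresses when they repeat down a column, might be sketched as follows. This is only an illustration in Python, assuming the PDF has already been converted to text and split into one record per row; the function name fill_suppressed and the column names site, activity, and subactivity are hypothetical and not taken from the actual report. In SAS the same carry-down idea is usually written with a RETAIN statement in a DATA step.

```python
def fill_suppressed(rows, columns):
    """Carry the last non-blank value down each listed column.

    rows    -- list of dicts, one per parsed report line (assumed layout)
    columns -- names of the columns whose repeated values were blanked
    """
    last_seen = {}          # most recent non-blank value per column
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for col in columns:
            value = (new_row.get(col) or "").strip()
            if value:
                last_seen[col] = value
            else:
                # Blank here means "same as the line above" on the report.
                value = last_seen.get(col, "")
            new_row[col] = value
        filled.append(new_row)
    return filled


# Hypothetical records as they might look after parsing the text dump.
parsed = [
    {"site": "A", "activity": "Drilling",   "subactivity": "Setup"},
    {"site": "",  "activity": "",           "subactivity": "Sampling"},
    {"site": "",  "activity": "Monitoring", "subactivity": "Readings"},
    {"site": "B", "activity": "Drilling",   "subactivity": "Setup"},
]

for record in fill_suppressed(parsed, ["site", "activity"]):
    print(record["site"], record["activity"], record["subactivity"], sep="\t")
```

The sketch treats every blank as a suppressed duplicate. If a column can be legitimately empty, or if a value should not carry across a change of site, the loop would need an extra rule for that case.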

