Date: Thu, 17 May 2001 15:59:00 -0700
Reply-To: Lauren Haworth <haworthl@GENE.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Lauren Haworth <haworthl@GENE.COM>
Organization: Genentech, Inc.
Subject: Re: PDF to SAS??
Content-Type: text/plain; charset=us-ascii
Carol,
Two web sites, www.planetpdf.com and www.pdfzone.com, list a number of PDF tools.
I did a quick search and came up with:
ACE: converts PDF tables to HTML or tab-delimited text (free for non-commercial use,
http://www.ces.census.gov/ace/)
Redwing: converts PDF tables and text to a variety of formats ($349,
http://www.datawatch.com/docs/products/index.html)
-- Lauren
Carol Bristow wrote:
> I have a very large report, generated out of a database, that was provided in
> PDF format. We are hoping to go back to the source to get the data in DBF
> format so that I don't have to do anything. In the meantime I'm doing a
> little research on tools to convert the PDF file into something useful for
> analyses, just in case. Since I know the folks that read this list use data
> from a wide variety of sources, I'm hoping someone has dealt with this
> situation.
>
> The report has a moderately complex structure, with report headers on the top
> of the page that I want to ignore, about five lines of data fields that get
> printed at the top of each page (1-n pages per site) of information for the
> site, and then a bunch of records below that with activity-level info at the
> site, along with subactivities associated with an activity. And of course,
> to make the report easier to read, duplicate info in a column is suppressed
> on the report, but I need it in the resulting dataset so that I can use it
> for analyses.
>
> The hard way is to get the PDF into a text-based format (anyone know of an
> easy way to do this?), and then write a program to parse it out. I've parsed
> report output before, although nothing quite this messy, so I know that I
> *can* do it, but I'd really rather not go there if I can help it. So the
> question is, does anyone know of a product that might let me do this without
> spending a whole lot of money (since I don't have to do this all the time)?
> I know, I'm asking for an awful lot. ;-)
>
> I tried doing some web searches, but everything I checked so far seems to be
> aimed at putting PDFs in a database and allowing text searching of the PDFs.
> And they all say call for pricing, which is a good indication to me that they
> are probably going to want a whole lot more than we're going to want to spend!
>
> Hoping someone has some good suggestions,
>
> Carol Bristow
> DPRA Incorporated
> cbristow@dpra.com
> 703-841-8025
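
For the "hard way" Carol describes, here is a minimal Python sketch of one piece of it: restoring values that the report suppresses as duplicates. It assumes the PDF has already been converted to tab-delimited text and the page headers have been stripped. The column names, the sample rows, and the fill_down helper are all hypothetical and are not taken from her report.

```python
import csv
import io


def fill_down(rows, key_columns):
    """Return copies of rows with blanks in key_columns replaced by the
    most recent non-blank value seen in that column."""
    last_seen = {}
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for col in key_columns:
            value = (new_row.get(col) or "").strip()
            if value:
                last_seen[col] = value
                new_row[col] = value
            else:
                # Blank cell: assume the report suppressed a repeated value.
                new_row[col] = last_seen.get(col, "")
        filled.append(new_row)
    return filled


# Hypothetical tab-delimited extract; names and values are made up.
SAMPLE = (
    "site\tactivity\tsubactivity\n"
    "A01\tDrilling\tSetup\n"
    "\t\tSampling\n"
    "\tCapping\tInspection\n"
    "B02\tDrilling\tSetup\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
for row in fill_down(rows, ["site", "activity"]):
    print(row["site"], row["activity"], row["subactivity"], sep="\t")
```

This treats every blank in the listed columns as a suppressed duplicate, so it would give wrong results if a blank can also mean a genuinely missing value, or if the report leaves a lower-level column blank when a higher-level one changes.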