LISTSERV at the University of Georgia
Date:         Mon, 4 Dec 2006 19:12:17 -0500
Reply-To:     John Birken <zbq5@CDC.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         John Birken <zbq5@CDC.GOV>
Subject:      Re: How to Read PDF/RTF data in to SAS datasets
Content-Type: text/plain; charset=ISO-8859-1

Dwi:

You're facing a tricky problem. I have to cope with PDF-to-text conversions of 50,000+ lines many times a year. It can be automated, as you ask, but I recommend you learn to do it manually first. If you don't work with many new data sets per day, automating is probably not worth it, unless you want some programming fun. The upcoming SAS version, now in beta, is supposed to be able to handle PDFs, but few people have it.

For us mortals (not Ron Fehd):

I import the text into Excel, setting the columns where I want them. If your data has many page titles and lots of formatting, as mine does, you might have to clean it up in Excel; for long data tables you can put some conditional statements in SAS or Excel to ignore the garbage. It all depends on how much garbage you have.
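As a rough illustration of the "conditional statements to ignore the garbage" idea, here is a minimal Python sketch rather than SAS or Excel. It assumes that every real data row consists only of whitespace-separated numbers and that anything else (page titles, blank lines) is garbage; the names `is_data_row` and `drop_garbage` and the sample lines are made up for this example, and your own layout may need a different rule.

```python
import re

# Assumption: a data field is a plain integer or decimal, optionally negative.
# Adjust this pattern if your report has dates, codes, or thousands separators.
NUMERIC_FIELD = re.compile(r"^-?\d+(\.\d+)?$")

def is_data_row(line):
    """Return True when every whitespace-separated field looks numeric."""
    fields = line.split()
    return bool(fields) and all(NUMERIC_FIELD.match(f) for f in fields)

def drop_garbage(lines):
    """Keep only the lines that pass the data-row test."""
    return [line for line in lines if is_data_row(line)]

# Hypothetical sample of text saved from a PDF, with repeated page titles.
sample = [
    "Annual Report        Page 1",
    "101  23.5  7",
    "",
    "102  19.0  12",
    "Annual Report        Page 2",
    "103  -4.25  3",
]
clean = drop_garbage(sample)
```

A rule this simple will also let through a title line that happens to be all numbers, so it is worth eyeballing a sample of what gets kept and what gets dropped.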

Here are the operations:
* Save the PDF as plain text.
* Open the txt file from Excel.
* In the Text Import Wizard, step 1, set the delimiter to spaces ONLY.
* Numbers between the spaces appear in their own columns.
* Excel seems to know to ignore the title garbage when setting the columns.
* You can then import this into SAS, or read it as a column or flat file.
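For anyone who wants to script the space-delimited step instead of clicking through Excel, the same splitting can be sketched in Python. This is only an illustration under assumptions: runs of spaces are treated as a single delimiter, no field contains an embedded space, and the function names and sample text are invented for the example. The CSV it produces is one format that SAS or Excel can then import.

```python
import csv
import io

def text_to_rows(text):
    """Split each non-empty line on runs of whitespace, one field per column."""
    return [line.split() for line in text.splitlines() if line.strip()]

def rows_to_csv(rows):
    """Render the rows as CSV text (use open(..., newline='') to write a file)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

# Hypothetical sample of plain text saved from a PDF table.
sample_text = "101  23.5  7\n102  19.0  12\n"
rows = text_to_rows(sample_text)
csv_text = rows_to_csv(rows)
```

If your columns contain text with embedded spaces, a plain split will break them apart, and a fixed-width read is probably the safer route.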

If your data is clean (no page titles or fancy formatting), just save the PDF as plain text and read it into SAS without Excel.

If you find this time-consuming and want to do other things, BUY one of the many shareware and commercial software packages that do this. They run from $29 to $89; one or two wanted well over $100 (they don't have a clue). I recently tried a $29 one that advertised that it does not expire (the expensive ones expire in 5 days). It converted PDF to xls/doc/txt and more just fine, but only for 3 pages; for more pages you have to purchase it. It's a lot better than the expensive ones that stamp their own insignia all over the output and expire in 5 days. I would have to see your data to be more specific.

If you need additional details, post the question or e-mail me directly.

HTH, John, jbirken@cdc.gov

