LISTSERV at the University of Georgia
Date:         Mon, 11 Sep 2006 10:58:11 -0400
Reply-To:     "Fehd, Ronald J. (CDC/CCHIS/NCPHI)" <rjf2@CDC.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Fehd, Ronald J. (CDC/CCHIS/NCPHI)" <rjf2@CDC.GOV>
Subject:      Re: Help: How to extract information such as TITLE,
              AUTHOR  from PDF files?
Content-Type: text/plain; charset="us-ascii"

> From: medpower
>
> I encountered a problem. Would you be so kind as to help me find a way?
>
> I have to process hundreds of PDF files, but they are named
> just XXX01, XXX02, XXX03. Obviously that makes searching or
> querying hard. Now I want to extract these files' TITLE,
> AUTHOR, etc. automatically with SAS.

Adobe v6, regular, allows you to open a PDF, select all, copy to the clipboard, then paste into a text editor. Professional has the batch processing facility, but I have not worked with it.

Adobe v7, professional, has the batch processing facility which allows you to extract text (extended).

I'm not sure what the difference is, but I do know that plain extract text on my conference CDs produced usable text files for fewer than 80% of the PDFs, while extract text (extended) got me 99+%.
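
Once the batch run has produced text files, a short script can turn them into a searchable index. What follows is only a sketch in Python, not something I have run against real conference papers. It assumes the extracted files sit in one folder as XXX01.txt, XXX02.txt, and so on, and that the title and author happen to be the first two non-blank lines of each file; that is a guess about the layout, and the names first_lines and build_index are made up for illustration.

```python
import csv
from pathlib import Path


def first_lines(text, n=2):
    """Return the first n non-blank lines of text, stripped of whitespace."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return lines[:n]


def build_index(text_dir, out_csv, n=2):
    """Write one CSV row per .txt file: file stem plus its first n lines.

    Assumption: title and author are the first non-blank lines. Check a
    sample of the output by eye before trusting it.
    """
    rows = []
    for path in sorted(Path(text_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        found = first_lines(text, n)
        # Pad short files so every row has the same number of columns.
        found += [""] * (n - len(found))
        rows.append([path.stem] + found)

    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file"] + [f"line{i + 1}" for i in range(n)])
        writer.writerows(rows)
    return rows
```

Calling build_index("extracted_text", "index.csv") would write a CSV that SAS can read with PROC IMPORT, so the searching and querying can still be done there.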

In short: buy professional tools. Hacking SAS to read a PDF may be more of AF Learning Experience than you really have time or patience for.

Ron Fehd, the pdf maven
CDC, Atlanta GA USA
RJF2 at cdc dot gov

