Date: Mon, 11 Sep 2006 10:58:11 -0400
Reply-To: "Fehd, Ronald J. (CDC/CCHIS/NCPHI)" <rjf2@CDC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Fehd, Ronald J. (CDC/CCHIS/NCPHI)" <rjf2@CDC.GOV>
Subject: Re: Help: How to extract information such as TITLE,
AUTHOR from PDF files?
Content-Type: text/plain; charset="us-ascii"
> From: medpower
> I encountered a problem. Would you be so kind as to help me find a way?
>
> I have to process hundreds of PDF files, but they are named
> just XXX01, XXX02, XXX03. Obviously that makes searching
> or querying hard. Now I want to extract these files'
> TITLE, AUTHOR, etc. automatically with SAS.
Adobe v6, regular,
allows you to open a PDF, select all, copy to the clipboard,
then move to a text editor and paste.
Adobe v6, professional, has the batch processing facility,
but I have not worked with it.
Adobe v7, professional, has the batch processing facility,
which allows you to extract text (extended).
I'm not sure what the difference is from plain text extraction,
but I do know that extracting text from my conference CDs
got me text files for fewer than 80% of the PDFs,
while extracting text (extended) got me 99+%.
In short: buy professional tools.
Hacking SAS to read a pdf
may be more of AF Learning Experience
than you really have time or patience to experience.
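
For a sense of what the hacking involves, here is a minimal sketch,
in Python rather than SAS, that scans the raw bytes of a PDF for
/Title and /Author entries. It assumes the metadata is stored as
plain, unencrypted literal strings; PDFs that keep it in compressed
object streams, XMP packets, or hex/UTF-16 strings will come back empty.
The names pdf_info and scan_folder are hypothetical, made up for this
illustration.

```python
import re
from pathlib import Path

# Matches "/Title (...)" or "/Author (...)" where the value is a PDF
# literal string without nested parentheses; backslash escapes are allowed.
_INFO_RE = re.compile(rb"/(Title|Author)\s*\(((?:\\.|[^\\()])*)\)")


def pdf_info(data: bytes) -> dict:
    """Return the first /Title and /Author literal strings found in raw PDF bytes."""
    found = {}
    for key, raw in _INFO_RE.findall(data):
        name = key.decode("ascii")
        if name not in found:
            # Undo only the simplest escapes: \( \) and \\
            value = re.sub(rb"\\([()\\])", rb"\1", raw)
            found[name] = value.decode("latin-1")
    return found


def scan_folder(folder: str) -> dict:
    """Map each *.pdf file name in a folder to whatever pdf_info finds in it."""
    return {p.name: pdf_info(p.read_bytes())
            for p in sorted(Path(folder).glob("*.pdf"))}
```

Calling scan_folder on the directory holding XXX01.pdf, XXX02.pdf, ...
gives one dictionary per file; an empty dictionary means the simple
pattern found nothing and a real PDF tool is needed for that file.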
Ron Fehd the pdf maven CDC Atlanta GA USA RJF2 at cdc dot gov