Date: Sun, 4 Feb 2007 09:57:37 -0500
Reply-To: Wensui Liu <liuwensui@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Wensui Liu <liuwensui@GMAIL.COM>
Subject: Re: why infile much faster than proc import
In-Reply-To: <200702041418.l14BkEmH029671@mailgw.cc.uga.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
good summary, Peter!
I totally agree with you that the syntax of the data step is more
intuitive than that of proc import in this case.
On 2/4/07, Peter Crawford <peter.crawford@blueyonder.co.uk> wrote:
> On Sat, 3 Feb 2007 14:34:42 -0500, Wensui Liu <liuwensui@GMAIL.COM> wrote:
>
> >I just did a speed comparison of csv file import between infile and
> >proc import and realized infile is much much ... ... faster.
> >
> >what's the trick behind it?
> >
> >thanks.
> >--
> >WenSui Liu
> >A lousy statistician who happens to know a little programming
> >(http://spaces.msn.com/statcompute/blog)
>
>
> WenSui Liu
>
> I'm not surprised by the results you reported.
>
> I know this may not be in mainstream thinking, but I would only use
> proc import for a preliminary look at column structure of a file.
> Even then, I would only do that for an input file of unknown origins.
>
> Generally we know where a file has come from and what its structure
> should be. So, generally, I would use a datastep infile statement to
> connect the process with the data, and input statements to parse the
> data. And I can expect to be precise in my definition and results.
> That is something proc import cannot match when reading plain text,
> because it does not know the information structure, so it has to try
> to discover what columns are present in the input file.
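
To make the point above concrete, here is a minimal sketch of a data step that states the structure up front instead of leaving it to be discovered. The file name sales.csv and the four columns are hypothetical, chosen only for illustration.

```sas
/* Hypothetical file and columns, for illustration only. */
data work.sales;
    /* dsd: comma-delimited with quoted fields; firstobs=2 skips a header row; */
    /* truncover: short records set the remaining variables to missing.        */
    infile 'sales.csv' dsd firstobs=2 truncover;
    length region $20;               /* declare the character width explicitly */
    informat sale_date yymmdd10.;    /* dates arrive as yyyy-mm-dd (assumed)   */
    format sale_date date9.;
    input id region $ sale_date amount;
run;
```

Every type, width, and informat is declared here, so nothing about the columns has to be guessed from the data.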
>
> If we want to import from something that is not plain text, why use
> proc import? Instead we can use the relevant SAS library engine to
> deliver the data directly. Under the covers the excel engine
> generates syntax that looks like some flavour of sql.
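
A rough sketch of the library-engine route, assuming SAS/ACCESS Interface to PC Files is licensed. The workbook path and the sheet name are hypothetical.

```sas
/* Hypothetical workbook path and sheet name, for illustration only. */
libname xl excel 'C:\data\budget.xls';

data work.budget;
    /* A worksheet shows up as a table whose name ends in $, */
    /* so it has to be written as a name literal.            */
    set xl.'Sheet1$'n;
run;

libname xl clear;   /* release the workbook when done */
```

Once the libref is assigned, the sheet can be read like any other SAS table, with no import step in between.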
>
> Proc import will never satisfy my need to be specific and precise
> about parsing input text. The data step provides all the
> flexibility and control for handling text in complex structure.
> For simple structure, data step syntax is probably easier to learn
> than proc import (but that's just my opinion).
>
> Peter Crawford
> Crawford Software Consultancy
>
--
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)