Date: Fri, 4 Dec 2009 08:45:01 -0600
Reply-To: Joe Matise <snoopy369@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Joe Matise <snoopy369@GMAIL.COM>
Subject: Re: Data Compression Question
In-Reply-To: <100FCDB28E638D4B903CB7D2056E43C7039E34DE@USFCH-MAIL1.lewin.com>
Content-Type: text/plain; charset=ISO-8859-1
In many instances, the time saved by 'fewer I/O operations' far outweighs the
CPU overhead of decompression. That is not always true, but on large datasets
like yours it usually is (it depends on the contents of the dataset). Try just
running
data whatever;
set whatever;
run;
on the compressed and uncompressed versions. I'd be surprised if there weren't
a significant time difference.
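
A slightly fuller version of that test might look like the sketch below. The
dataset names (work.whatever, work.copy_plain, work.copy_binary) are
placeholders, not names from the original question, and the numbers will
depend on your disks and on the data. FULLSTIMER makes the log report real
time and CPU time separately for each step, which is the comparison that
matters here.

```sas
/* Placeholder names: work.whatever stands in for the large dataset. */
options fullstimer;   /* log real time and CPU time for every step */

/* Write an uncompressed copy. */
data work.copy_plain (compress=no);
    set work.whatever;
run;

/* Write a binary-compressed copy of the same data. */
data work.copy_binary (compress=binary);
    set work.whatever;
run;

/* Time a full read of each copy; _null_ means no output is written. */
data _null_;
    set work.copy_plain;
run;

data _null_;
    set work.copy_binary;
run;
```

If the compressed read shows higher CPU time but lower real time, the I/O
savings are outweighing the decompression cost for this dataset.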
-Joe
On Fri, Dec 4, 2009 at 8:27 AM, Kirby, Ted <ted.kirby@lewin.com> wrote:
> The following is an extract from the SAS Online documentation:
>
> <begin quote>
> Advantages of compressing a file include
>
> reduced storage requirements for the file
>
> fewer I/O operations necessary to read from or write to the data during
> processing.
>
>
> However, disadvantages of compressing a file are that
>
> more CPU resources are required to read a compressed file because of the
> overhead of uncompressing each observation
>
> there are situations when the resulting file size may increase rather
> than decrease
> <end quote>
>
> My question is: does the "fewer I/O operations" speed up processing more
> than "the overhead of uncompressing each observation" slows it down? I
> seem to recall hearing at some point in my life that the I/O is the
> slowest component of SAS processing, but am not sure.
>
> The reason for my question is that we have a 51.1 GB dataset that, upon
> binary compression, is reduced to 3.18 GB. Obviously for saving disk
> space compression is a no-brainer. However, I was not sure of the value
> of data compression from a processing standpoint. (The dataset has
> 6,698,765 observations and 982 variables.)
>
> Thanks