Date: Fri, 4 Dec 2009 09:59:20 -0500
Reply-To: Michael Raithel <michaelraithel@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Michael Raithel <michaelraithel@WESTAT.COM>
Subject: Re: Data Compression Question
Dear SAS-L-ers,
Ted Kirby posted the following interesting question:
>
> The following is an extract from the SAS Online documentation:
>
> <begin quote>
> Advantages of compressing a file include
>
> reduced storage requirements for the file
>
> fewer I/O operations necessary to read from or write to the data during
> processing.
>
>
> However, disadvantages of compressing a file are that
>
> more CPU resources are required to read a compressed file because of the
> overhead of uncompressing each observation
>
> there are situations when the resulting file size may increase rather
> than decrease
> <end quote>
>
> My question is: does the "fewer I/O operations" speed up processing more
> than "the overhead of uncompressing each observation" slows it down? I
> seem to recall hearing at some point in my life that I/O is the
> slowest component of SAS processing, but am not sure.
>
> The reason for my question is that we have a 51.1 GB dataset that, upon
> binary compression, is reduced to 3.18 GB. Obviously, for saving disk
> space, compression is a no-brainer. However, I was not sure of the value
> of data compression from a processing standpoint. (The dataset has
> 6,698,765 observations and 982 variables.)
>
Ted, great question, and nicely laid out!
The answer is: Yes, fewer I/O operations speed up processing more than the overhead of uncompressing the data slows it down. As you stated, I/O is the slowest operation in a computer program (tape mounts aside). By compressing a SAS data set, you squeeeeeeze it down to fewer data set pages (blocks), so SAS needs fewer I/Os to haul those pages from disk into memory. Your data set compresses unusually well: 3.18 GB is only about 6% of 51.1 GB, so a full pass reads roughly one-sixteenth as many pages. The pages are then decompressed in computer memory (so that they don't get the bends http://en.wikipedia.org/wiki/Decompression_sickness ), and memory operates far faster than the I/O subsystem. Think microseconds versus milliseconds.
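To put rough numbers on that trade-off, here is a back-of-envelope sketch in Python (not SAS) that uses the sizes and observation count from Ted's post. It asks how slow decompression would have to be, per observation, before it cancels out the I/O saved by reading fewer pages on one sequential pass. The 64 KB page size and the 1 ms cost per page read are hypothetical placeholders, not measurements; "GB" is assumed to mean 2**30 bytes; and the model ignores caching, read-ahead, and SAS's own page and observation overhead. Treat it as an illustration of the reasoning, not a prediction.

```python
import math

GIB = 2**30  # assumption: "GB" in Ted's post read as binary gigabytes


def pages_needed(size_bytes: float, page_bytes: int) -> int:
    """Number of fixed-size pages needed to hold size_bytes."""
    return math.ceil(size_bytes / page_bytes)


def break_even_cpu_us_per_obs(uncompressed_bytes: float,
                              compressed_bytes: float,
                              n_obs: int,
                              page_bytes: int,
                              io_ms_per_page: float) -> float:
    """Largest per-observation decompression cost (microseconds) at which
    compression still saves wall-clock time on one sequential pass.

    I/O time saved  = pages avoided * io_ms_per_page
    CPU time added  = n_obs * cpu_us_per_obs
    Setting the two equal and solving for cpu_us_per_obs gives break-even.
    """
    pages_saved = (pages_needed(uncompressed_bytes, page_bytes)
                   - pages_needed(compressed_bytes, page_bytes))
    io_seconds_saved = pages_saved * io_ms_per_page / 1_000
    return io_seconds_saved * 1_000_000 / n_obs


# Sizes and observation count come from Ted's post.
UNCOMPRESSED = 51.1 * GIB
COMPRESSED = 3.18 * GIB
N_OBS = 6_698_765
# Hypothetical placeholders, not measurements:
PAGE_BYTES = 64 * 1024    # assumed page (buffer) size
IO_MS_PER_PAGE = 1.0      # assumed cost to read one page from disk

print(round(100 * (1 - COMPRESSED / UNCOMPRESSED), 1))  # 93.8 (% smaller)
print(pages_needed(UNCOMPRESSED, PAGE_BYTES))           # 837223
print(pages_needed(COMPRESSED, PAGE_BYTES))             # 52102
print(round(break_even_cpu_us_per_obs(UNCOMPRESSED, COMPRESSED, N_OBS,
                                      PAGE_BYTES, IO_MS_PER_PAGE), 1))  # 117.2
```

Under those made-up costs, compression keeps winning as long as decompressing one observation takes less than about 117 microseconds. The break-even scales in direct proportion to the assumed I/O cost per page, so on a faster disk it shrinks, which is one more reason to benchmark on your own system.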
My experience with SAS data set compression has been that it leads to faster wall-clock times when sequentially processing large, compressed SAS data sets. Definitely give it a try... and hey, benchmark your particular example by comparing the real time and CPU time notes that SAS writes to the log for a pass through each version.
Ted, best of luck in all of your SAS endeavors!
I hope that this suggestion proves helpful now, and in the future!
Of course, all of these opinions and insights are my own, and do not reflect those of my organization or my associates. All SAS code and/or methodologies specified in this posting are for illustrative purposes only and no warranty is stated or implied as to their accuracy or applicability. People deciding to use information in this posting do so at their own risk.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Michael A. Raithel
"The man who wrote the book on performance"
E-mail: MichaelRaithel@westat.com
Author: Tuning SAS Applications in the MVS Environment
Author: Tuning SAS Applications in the OS/390 and z/OS Environments, Second Edition
http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=58172
Author: The Complete Guide to SAS Indexes
http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=60409
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
My dog is worried about the economy because Alpo is up to 99
cents a can. That's almost $7.00 in dog money. - Joe Weinstein
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++