Date: Mon, 26 Feb 2001 07:52:34 GMT
Reply-To: "Andrew H. Karp" <sfbay0001@AOL.COMNOSPAM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Andrew H. Karp" <sfbay0001@AOL.COMNOSPAM>
Organization: AOL http://www.aol.com
Subject: Re: SAS Compress vs GZIP
I am not a big fan of data set compression in the SAS System. In several tests
I have run on large data sets in the MVS environment, compression did reduce
the size of SAS data sets composed largely of character variables with
repeating blanks or repeating values. I have not yet had a chance to try the
COMPRESS=BINARY option, new in Versions 7 and 8, which is intended to compress
numeric data as well.
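To see why blank-padded character variables shrink so well, here is a minimal run-length encoding (RLE) sketch. It is an illustration of the idea only, not SAS's actual on-disk format. The functions `rle_encode`, `rle_decode` and `encoded_size`, the 2-bytes-per-run cost model, and the sample NAME ($40) / CITY ($30) observation are all hypothetical.

```python
# Minimal run-length encoding sketch (hypothetical; NOT the SAS file format).
# Each run of identical bytes becomes a (byte, count) pair; counts are capped
# at 255 so each count would fit in a single byte.

def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Collapse runs of identical bytes into (byte, count) pairs."""
    runs: list[tuple[int, int]] = []
    for b in data:
        if runs and runs[-1][0] == b and runs[-1][1] < 255:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs


def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Expand (byte, count) pairs back into the original bytes."""
    return b"".join(bytes([b]) * n for b, n in runs)


def encoded_size(runs: list[tuple[int, int]]) -> int:
    """Assumed cost model: one byte for the value, one byte for the count."""
    return 2 * len(runs)


# Hypothetical observation: NAME ($40) = "SMITH", CITY ($30) = "SONOMA",
# each padded with trailing blanks as fixed-width character variables are.
record = b"SMITH".ljust(40) + b"SONOMA".ljust(30)
runs = rle_encode(record)
print(len(record), encoded_size(runs))  # 70 26
```

The 70-byte observation collapses to 13 runs because each block of trailing blanks becomes a single run. Note the trade-off: this naive scheme would double the size of data with no repeated bytes, and every read has to call something like `rle_decode`, which hints at where the extra CPU time goes.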
To me, the big drawback to data set compression is that CPU requirements often
more than double when a DATA or PROC step reads a compressed data set instead
of an uncompressed one, because every observation must be decompressed as it
is read.
I therefore don't recommend applying the SAS data set compression options to
data sets that will be used frequently or that will have many DATA or PROC
steps run against them.
My suggestion is to use the LENGTH statement instead, to reduce the byte
lengths of variables. This is particularly useful for numeric variables that
hold small integer values and do not need the default 8 bytes. You will find
that your I/O and, perhaps, your CPU requirements decrease when you run SAS
tasks against a data set whose variables have appropriate lengths, and the
data set will require less storage space.
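Why small integers are the safe case: a shorter numeric length keeps only the leading bytes of the 8-byte floating-point value. The sketch below models that on an IEEE 754 host (UNIX or Windows), assuming the stored value is simply the first `length` bytes of the big-endian double with the rest zeroed. MVS uses IBM hexadecimal floating point, so its limits differ. The names `stored_value`, `largest_exact_integer` and `storage_bytes`, and the sample variable names, are hypothetical.

```python
import struct


def stored_value(x: float, length: int) -> float:
    """Model a shortened SAS numeric on an IEEE host (assumption): keep the
    leading `length` bytes (sign, exponent, high mantissa bits), zero the rest."""
    if not 3 <= length <= 8:
        raise ValueError("this sketch models lengths 3 through 8 only")
    raw = struct.pack(">d", x)
    return struct.unpack(">d", raw[:length] + b"\x00" * (8 - length))[0]


def largest_exact_integer(length: int) -> int:
    """Sign + exponent use 12 bits, leaving 8*length - 12 mantissa bits;
    with the implicit leading 1, every integer up to 2**(8*length - 11)
    survives truncation exactly."""
    return 2 ** (8 * length - 11)


def storage_bytes(n_obs: int, lengths: dict[str, int]) -> int:
    """Raw data bytes only; ignores page, header and observation overhead."""
    return n_obs * sum(lengths.values())


for L in range(3, 9):
    print(L, largest_exact_integer(L))

print(stored_value(8192.0, 3))  # exact: fits in 3 bytes
print(stored_value(8193.0, 3))  # low-order bit lost, comes back as 8192.0

# Hypothetical data set: 1,000,000 observations of five small integer codes.
default = {v: 8 for v in ("AGE", "VISIT", "SITE", "DOSE", "GROUP")}
shorter = {v: 3 for v in default}
print(storage_bytes(1_000_000, default), storage_bytes(1_000_000, shorter))
```

Under these assumptions, length 3 holds every integer up to 8,192 exactly, while fractions such as 0.1 lose precision at any length below 8. That is why shortening is best reserved for integer-valued variables.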
I like to use PKZIP/UNZIP to compress a SAS data set before e-mailing it or
archiving it. GNU gzip, as I understand it, performs a similar function in the
UNIX environment.
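For anyone who would rather script the gzip step than run it by hand, here is a small sketch using Python's standard `gzip` module. It mirrors what `gzip claims.sas7bdat` does at a UNIX shell, except that it keeps the original file. The file name `claims.sas7bdat` and the helper names `gzip_file` and `gunzip_file` are hypothetical. `.sas7bdat` is the Version 7/8 data set extension on UNIX and Windows.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src: Path) -> Path:
    """Write src + '.gz' beside the original, streaming in chunks so a
    large data set file never has to fit in memory."""
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def gunzip_file(gz: Path) -> Path:
    """Restore the original file by dropping the trailing '.gz' suffix."""
    dst = gz.with_suffix("")
    with gzip.open(gz, "rb") as fin, dst.open("wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


if __name__ == "__main__":
    archive = gzip_file(Path("claims.sas7bdat"))  # hypothetical file name
    print(archive)
```

Unlike the COMPRESS= option, this costs CPU only when you pack or unpack the file, not on every DATA or PROC step that reads it. That fits the archiving and e-mailing use described above.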
I hope my comments/experience are of use.
Andrew H. Karp
Sierra Information Services, Inc.
A SAS Institute Quality Partner in the USA
19229 Sonoma Highway PMB 264
Sonoma, CA 95476 USA
707/996-7380 (voice)
SierraInfo@AOL.COM
http://www.SierraInformation.com