Date: Wed, 21 Jul 2010 11:05:39 -0400
Reply-To: Paul Dorfman <sashole@BELLSOUTH.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Paul Dorfman <sashole@BELLSOUTH.NET>
Subject: Re: In search of a more efficient program
Andy,
Please see interjections below.
On Wed, 21 Jul 2010 07:05:30 -0400, Andy Arnold <awasas@COX.NET> wrote:
>Background:
>I've inherited a large, complex SAS program. Most files are quite small;
>however, some are extremely large and have become a problem.
>The files in question have 12-24 fields that are mostly numeric with a few
>short (1-4 character) fields. The files that are killing me have 20M records
>and one has 160M records.
>The files don't use SAS compression because the records are too short to
>make compression cost & time effective.
Compression would most likely hurt you even more under the circumstances.
>Problem 1:
>What are the trade-offs if I force numeric length to 4 or 2? Does SAS
>always use a NUM 8 format internally? If I force short numeric field
>lengths, will SAS have to convert them up to length=8 and back down again in
>order to use the data?
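Yes to both points. SAS always computes with 8-byte floating-point numbers,
so a shorter numeric is expanded to 8 bytes when read and truncated again
when written. The trade-off is precision: the shorter the length, the fewer
significant digits survive, so it is only safe for integers small enough to
fit. Also note that length 2 is accepted only on the mainframe; elsewhere the
minimum is 3.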
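To make the precision cost concrete, here is a rough sketch in Python (not SAS code) that mimics what happens to a value stored with a shortened length. It assumes an IEEE 754 platform and assumes that the shortened value is simply the leading bytes of the 8-byte double, zero-filled on the way back up; the function name store_with_length is made up for illustration.

```python
import struct

def store_with_length(value: float, length: int) -> float:
    """Mimic a round trip through a shortened numeric (assumption: IEEE 754).

    Keeps only the first `length` bytes of the 8-byte double and zero-fills
    the rest, which is what expanding back to 8 bytes would see.
    """
    if not 3 <= length <= 8:
        raise ValueError("length must be between 3 and 8")
    raw = struct.pack(">d", value)  # big-endian: most significant bytes first
    truncated = raw[:length] + b"\x00" * (8 - length)
    return struct.unpack(">d", truncated)[0]

# Integers survive only up to a limit; fractions generally do not survive at all.
for n in (2_097_152, 2_097_153, 0.1):
    print(n, "->", store_with_length(n, 4))
```

Under these assumptions a length of 4 leaves 21 significant bits, so 2,097,152 comes back intact while 2,097,153 does not, and a fraction such as 0.1 comes back changed.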
>Problem 2:
>What are the trade-offs if I use an index instead of a sort? The problem
>sort takes an hour, uses a single numeric key with about 10 distinct values,
>sorts 20M records that are about 100 bytes long. I've been successful with
>indexing before; in that situation, I needed to sort the file in 15
>different sequences.
It would make a low-performing index because of its low cardinality (too few
distinct values): with about 10 values over 20M records, each key value
selects roughly 2M records on average. However, it may still work better than
sorting the whole enchilada 15 different ways. Usually, if something like
this needs to be done, it is a good indication that the entire approach needs
to be rethought. Once it is, it often turns out that the sorts and/or indices
are not needed at all.
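As one illustration of how a rethink can remove the sort, here is a minimal sketch in Python (not SAS code) of grouping by a key with only a handful of distinct values. It is a plain bucket distribution, not necessarily what your program needs, and the function name group_by_key and the sample records are made up for illustration. It also assumes everything fits in memory; at 20M records of 100 bytes the buckets would more realistically be separate output files.

```python
from collections import defaultdict

def group_by_key(records, key):
    """Put records in key order with one pass over the data.

    Each record goes into the bucket for its key value; only the few
    distinct key values are sorted, never the records themselves.
    """
    buckets = defaultdict(list)
    for rec in records:
        buckets[key(rec)].append(rec)
    # Original order is preserved within each bucket.
    return [rec for k in sorted(buckets) for rec in buckets[k]]

# Hypothetical sample: (key, payload) pairs.
records = [(3, "c1"), (1, "a1"), (3, "c2"), (2, "b1"), (1, "a2")]
print(group_by_key(records, key=lambda r: r[0]))
```

The work is proportional to the number of records plus the cost of ordering about 10 key values, rather than a full comparison sort of 20M records.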
>
>Thanks for your input and advice.
>
>--Andy
Kind regards
------------
Paul Dorfman
Jax, FL
------------