LISTSERV at the University of Georgia
Date:         Wed, 21 Jul 2010 11:05:39 -0400
Reply-To:     Paul Dorfman <sashole@BELLSOUTH.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Paul Dorfman <sashole@BELLSOUTH.NET>
Subject:      Re: In search of a more efficient program
Comments: To: Andy Arnold <awasas@COX.NET>

Andy,

Please see interjections below.

On Wed, 21 Jul 2010 07:05:30 -0400, Andy Arnold <awasas@COX.NET> wrote:

>Background:
>I've inherited a large, complex SAS program. Most files are quite small;
>however, some are extremely large and have become a problem.
>The files in question have 12-24 fields that are mostly numeric with a few
>short (1-4 character) fields. The files that are killing me have 20M records
>and one has 160M records.
>The files don't use SAS compression because the records are too short to
>make compression cost & time effective.

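Compression would most likely hurt you even more under the circumstances: with short, mostly numeric records, the per-observation overhead that compression adds can outweigh the bytes it saves, and you still pay the CPU cost of compressing and decompressing every record.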

>Problem 1:
>What are the trade-offs if I force numeric length to 4 or 2? Does SAS
>always use a NUM 8 format internally? If I force short numeric field
>lengths, will SAS have to convert them up to length=8 and back down again in
>order to use the data?

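Yes to both points. SAS always works with numerics as 8-byte floating-point values, so a variable stored with a shorter length is expanded to 8 bytes when it is read and truncated back when it is written. The price for the disk space saved is that extra CPU work plus lost precision: fewer stored bytes mean fewer mantissa bits, so only integers up to a certain size are kept exactly, and non-integer values such as 0.1 generally are not. Note also that a numeric length of 2 is allowed only on the mainframe; on Windows and UNIX the minimum is 3.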
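To make the precision trade-off concrete, here is a small Python model (not SAS code). It assumes that SAS on Windows/UNIX stores a short numeric by keeping only the leading bytes of the 8-byte big-endian IEEE 754 double and zero-filling the dropped bytes when the value is expanded back to 8 bytes. The function names `stored_value` and `largest_exact_integer` are hypothetical, chosen for illustration only.

```python
import struct


def stored_value(x: float, length: int) -> float:
    """Model of reading back a numeric stored with LENGTH `length`.

    Assumption: only the first `length` bytes of the big-endian IEEE 754
    double are kept; the dropped low-order mantissa bytes come back as
    zeros when the value is expanded to 8 bytes again.
    """
    if not 3 <= length <= 8:
        raise ValueError("this model covers lengths 3 through 8")
    raw = struct.pack(">d", x)               # full 8-byte big-endian double
    kept = raw[:length] + bytes(8 - length)  # truncate, then zero-fill
    return struct.unpack(">d", kept)[0]


def largest_exact_integer(length: int) -> int:
    """1 sign bit + 11 exponent bits leave 8*length - 12 explicit mantissa
    bits; with the implicit leading 1 that is 8*length - 11 significant
    bits, so every integer up to 2**(8*length - 11) is stored exactly."""
    return 2 ** (8 * length - 11)


if __name__ == "__main__":
    for n in (3, 4, 8):
        limit = largest_exact_integer(n)
        # Is the limit kept exactly? Is the next integer up kept exactly?
        print(n, limit, stored_value(limit, n) == limit,
              stored_value(limit + 1, n) == limit + 1)
    print(stored_value(0.1, 4) == 0.1)  # a fraction loses its low-order bits
```

Under this model a length of 4 keeps integers exact up to 2,097,152 and a length of 3 up to 8,192, which agrees with the limits SAS documents for Windows and UNIX. Codes, flags and small counts are good candidates for shortening; anything with a fractional part, or large amounts, should stay at 8.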

>Problem 2:
>What are the trade-offs if I use an index instead of a sort? The problem
>sort takes an hour, uses a single numeric key with about 10 distinct values,
>sorts 20M records that are about 100 bytes long. I've been successful with
>indexing before; in that situation, I needed to sort the file in 15
>different sequences.

It would make a poorly performing index because of its low cardinality (too few distinct values): with about 10 values, each key value points, on average, to some 10% of the file, or about 2M of the 20M records. However, it still may work better than sorting the whole enchilada 15 different ways. Usually, needing to do something like this is a good indication that the entire approach should be questioned and rethought; oftentimes it then turns out that the sorts and/or indices are not needed at all.

>
>Thanks for your input and advice.
>
>--Andy
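A quick way to see why a key with about 10 distinct values makes a weak index: each value selects a large slice of the file. The Python sketch below (not SAS) only computes that fraction per key value; the function `key_selectivity` and the evenly spread sample key are hypothetical stand-ins for the real data, whose distribution may be far less even.

```python
from collections import Counter
from typing import Hashable, Iterable


def key_selectivity(keys: Iterable[Hashable]) -> dict:
    """Return, for each distinct key value, the fraction of rows it matches."""
    counts = Counter(keys)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


if __name__ == "__main__":
    n_rows = 20_000_000
    # Hypothetical key spread evenly over 10 values, scaled down to 20,000 rows.
    sample = [i % 10 for i in range(20_000)]
    for value, frac in sorted(key_selectivity(sample).items()):
        print(f"key={value}: {frac:.0%} of rows, ~{frac * n_rows:,.0f} of 20M")
```

An index pays off when a lookup touches a small fraction of the rows; when every value pulls in around a tenth of the file, reading through the index means fetching millions of records in key order, and a plain sequential pass is often just as fast or faster.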

Kind regards
------------
Paul Dorfman
Jax, FL
------------

