LISTSERV at the University of Georgia
Date:         Tue, 14 Nov 2006 14:26:16 -0500
Reply-To:     "Richard A. DeVenezia" <rdevenezia@WILDBLUE.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Richard A. DeVenezia" <rdevenezia@WILDBLUE.NET>
Subject:      Re: Bottleneck reading large datasets
Comments: To: sas-l@uga.edu

"Gerstle, John (CDC/CCID/NCHHSTP) (CTR)" wrote:
> SAS v9.1.3, SP 4, on WinXP PC machines
>
> Here's the gist of the problem...
>
> We have several SAS datasets available on a fileserver, some are small
> to medium in size and some are large. All the datasets are (SAS)
> compressed. The large datasets are several hundred MB's in size
> (~million records). As we read in the large datasets, we only keep the
> variables we need and/or we use where statements to select the cases
> we want.
>
> Of the group of 13 programmers, 10 can read in a million plus records,
> only keeping six variables, in about minute and a half. The other
> three, well... , it's taking 20 minutes to run the same code reading
> the same data (I'm one of the three :-(). (It's actually faster to
> copy/paste the dataset into the SAS temp folder.) All the programmers
> have the same SAS install.
>
> From what I've determined, the bottleneck appears as the dataset being
> read is copied to the PC's SAS temporary (WORK) folder. Once it is
> copied, processing speed is what one would expect (i.e., normal).
> Creating a (large) dataset on the server (where one has read/write
> access) does not take more time than what one would expect (i.e.,
> writing does not seem to experience the same bottleneck).

Take the analysis outside SAS. On each of the PCs, copy the same large file from the fileserver to local disk. Try to do this when there is no contention from others on the network. If the times to complete the task vary, then the issue becomes one of diagnosing the network topology. Are the slower machines on the other side of routers or switches that affect throughput to them? Do they have noticeable hardware differences from the other PCs? In an effort to isolate the problem, you might use a single laptop to trace copy rates at various network drops. I'm sure the network folks have better tools for tracing throughput.
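
If you want the copy timed the same way on every PC, a small script can do it. What follows is only a minimal Python sketch, not a network diagnostic tool: `time_copy` is a hypothetical helper name, and the path in the usage comment is a placeholder for whichever large file you pick on your fileserver. It copies the file to a local folder, reports elapsed seconds and MB/s, and deletes the local copy afterwards.

```python
import os
import shutil
import tempfile
import time


def time_copy(src, dst_dir=None):
    """Copy src into dst_dir; return (elapsed_seconds, megabytes_per_second).

    dst_dir defaults to the local temp folder. The copy is removed afterwards
    so repeated runs do not fill the local disk.
    """
    dst_dir = dst_dir or tempfile.gettempdir()
    dst = os.path.join(dst_dir, "copytest_" + os.path.basename(src))
    size_mb = os.path.getsize(src) / (1024 * 1024)

    start = time.perf_counter()
    shutil.copyfile(src, dst)          # plain byte-for-byte copy, no SAS involved
    elapsed = time.perf_counter() - start

    os.remove(dst)
    rate = size_mb / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate


# Example use (placeholder path, substitute your own file):
#   elapsed, rate = time_copy(r"\\fileserver\share\bigfile.sas7bdat")
#   print(f"{elapsed:.1f} s, {rate:.1f} MB/s")
```

Run it against the same file on a fast PC and a slow PC and compare the MB/s figures. A second run on the same machine may come out faster because of caching, so the first run on each PC is probably the fairest comparison.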

Good luck finding your vampire.

-- Richard A. DeVenezia http://www.devenezia.com/

