Date: Tue, 14 Nov 2006 14:26:16 -0500
Reply-To: "Richard A. DeVenezia" <rdevenezia@WILDBLUE.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Richard A. DeVenezia" <rdevenezia@WILDBLUE.NET>
Subject: Re: Bottleneck reading large datasets
"Gerstle, John (CDC/CCID/NCHHSTP) (CTR)" wrote:
> SAS v9.1.3, SP 4, on WinXP PC machines
>
> Here's the gist of the problem...
>
> We have several SAS datasets available on a fileserver, some are small
> to medium in size and some are large. All the datasets are (SAS)
> compressed. The large datasets are several hundred MB's in size
> (~million records). As we read in the large datasets, we only keep the
> variables we need and/or we use where statements to select the cases
> we want.
>
> Of the group of 13 programmers, 10 can read in a million plus records,
> only keeping six variables, in about a minute and a half. The other
> three, well... , it's taking 20 minutes to run the same code reading
> the same data (I'm one of the three :-(). (It's actually faster to
> copy/paste the dataset into the SAS temp folder.) All the programmers
> have the same SAS install.
>
> From what I've determined, the bottleneck appears as the dataset being
> read is copied to the PC's SAS temporary (WORK) folder. Once it is
> copied, processing speed is what one would expect (i.e., normal).
> Creating a (large) dataset on the server (where one has read/write
> access) does not take more time than what one would expect (i.e.,
> writing does not seem to experience the same bottleneck).
Take the analysis outside SAS. On each of the PCs, copy the same large file
from the fileserver to local disk. Try to do this when there is no
contention from others on the network. If the times to complete the task
vary, then the issue becomes one of diagnosing the network topology. Are
the machines that are slower to complete on the other side of routers or
switches that affect throughput to them? Do they have noticeable hardware
differences from the other PCs?
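
If you want numbers rather than stopwatch impressions, the copy test can be
scripted. What follows is only a rough sketch, assuming Python happens to be
available on the PCs; the function name timed_copy and the paths in the
comments are made up for illustration and are not from the original problem.
It copies one file and reports elapsed seconds and MB per second.

```python
import shutil
import time
from pathlib import Path


def timed_copy(src, dst):
    """Copy src to dst; return (elapsed_seconds, megabytes_per_second)."""
    src, dst = Path(src), Path(dst)
    size_mb = src.stat().st_size / (1024 * 1024)

    start = time.perf_counter()
    shutil.copyfile(src, dst)  # plain byte-for-byte copy, no SAS involved
    elapsed = time.perf_counter() - start

    # Guard against a zero reading on very small files.
    rate = size_mb / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate


# Example call (hypothetical paths, substitute your own share and folder):
#   seconds, rate = timed_copy(r"\\fileserver\share\big.sas7bdat",
#                              r"C:\temp\big.sas7bdat")
#   print(f"{seconds:.1f} s, {rate:.1f} MB/s")
```

Run it with the same source file on a fast PC and on a slow one and compare
the rates. A repeat run on the same machine may be served partly from cache,
so the first run on each PC is probably the most telling.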
To isolate the problem, you might use a single laptop to measure copy rates
at various network drops. I'm sure the network folks have better tools for
measuring throughput.
Good luck finding your vampire.
--
Richard A. DeVenezia
http://www.devenezia.com/