Date: Thu, 17 Aug 2006 22:19:08 -0400
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject: Re: How slow is normal with large datasets?
It seems that some people suspect the proc and others suspect the network.
I would experiment by running a do-nothing DATA step a few times against the
same data set:
That may help focus on the problem.
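
A do-nothing DATA step of the kind described above might look like the following sketch. The libref and data set name (MYLIB.BIGDATA) are hypothetical placeholders, and turning on FULLSTIMER is an added suggestion rather than part of the original advice.

```sas
/* Hypothetical libref and data set name; substitute your own. */
options fullstimer;       /* ask SAS to write detailed timing statistics to the log */

data _null_;              /* _NULL_: no output data set is created */
   set mylib.bigdata;     /* read every observation and do nothing with it */
run;
```

Run it several times and compare the "real time" and "cpu time" figures in the log. If real time is far larger than CPU time, or varies a lot from run to run, that would point toward I/O, the network, or contention with other users rather than toward PROC LOGISTIC itself.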
On Thu, 17 Aug 2006 02:56:54 -0700, Paul <ptvonhippel@CHECKFREE.COM> wrote:
>I'm analyzing a large data set with 1.5M rows and 50 variables. With 30
>regressors, PROC LOGISTIC takes at least 15 minutes to give results --
>and can take much longer if other users are running SAS jobs as well.
>It's an imbalanced data set -- a thousand ones, a million zeroes, and
>sampling weights that vary from 1 to 100. The DATA step is slow, too,
>so I can spend the bulk of my day just coding up a couple of new
>variables and running a few regressions that use them.
>In my previous job, I did academic research using data sets that were
>at least 10 times smaller. So I've learned to use SAS more efficiently
>-- avoiding sorts, keeping only the variables I need, using
>pass-through SQL. These make a big difference, but I just can't seem to
>get the speed down to where I really feel I'm interacting with the
>Do I need to accept fate, or should I be pounding the table for a
>better technical setup? We're using SAS Enterprise Guide on a remote
>server in another state. I'm sharing resources with up to 4 other users.
>I'd be interested in hearing comparisons (either "That sounds about
>right" or "I run more complicated analyses on larger data sets and
>never have to wait more than a minute for results"). And I'd be
>interested in hearing suggestions about diagnosing the problem. I'm not
>sure if our connection is slow, or if it's fundamentally inappropriate
>to be using EG for this purpose. Maybe I should have my own SAS
>installation on my local PC?